Commit Graph

53 Commits

Author SHA1 Message Date
Giuseppe Ottaviano
69bb0e0285 [caffe2] Avoid some double (and triple) lookups in workspace (#53319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53319

Noticed these in profiles.

Also switch to `unordered_map`.
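
A minimal sketch of the kind of lookup change described above, assuming the workspace keeps blobs in a name-keyed map; the struct and member names here are illustrative, not the actual workspace internals:

```cpp
#include <memory>
#include <string>
#include <unordered_map>

struct Blob {};
using BlobMap = std::unordered_map<std::string, std::unique_ptr<Blob>>;

// Double lookup: one hash/probe for the existence check, another for the access.
Blob* GetBlobTwice(BlobMap& blobs, const std::string& name) {
  if (blobs.count(name) == 0) {
    return nullptr;
  }
  return blobs.at(name).get();  // second lookup of the same key
}

// Single lookup: find() probes the map once and the iterator is reused.
Blob* GetBlobOnce(BlobMap& blobs, const std::string& name) {
  auto it = blobs.find(name);
  return it == blobs.end() ? nullptr : it->second.get();
}
```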

Test Plan: Unit tests.

Reviewed By: swolchok

Differential Revision: D26504408

fbshipit-source-id: 9e14d55909a4af019058b8c27c67ee2348cd02a9
2021-03-04 22:57:02 -08:00
Jane Xu
71ca600af9 Renaming CAFFE2_API to TORCH_API (#49496)
Summary:
Since caffe2 and torch have been consolidated, CAFFE2_API should be merged with TORCH_API. Addresses a TODO.

Manually edited some references of the removed `CAFFE2_API`:
* `CONTRIBUTING.md`
* `caffe2/proto/CMakeLists.txt`
* `cmake/ProtoBuf.cmake`
* `c10/macros/Export.h`
* `torch/csrc/WindowsTorchApiMacro.h`
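
For illustration, a hedged sketch of how a per-library API macro of this kind typically dispatches onto the shared c10 export/import macros; the `TORCH_BUILD_MAIN_LIB` guard and the example function are assumptions for this sketch, not the verbatim contents of `c10/macros/Export.h`:

```cpp
#include <c10/macros/Export.h>  // provides C10_EXPORT / C10_IMPORT

#if defined(TORCH_BUILD_MAIN_LIB)
#define TORCH_API C10_EXPORT  // building the library itself: export the symbols
#else
#define TORCH_API C10_IMPORT  // consuming the library: import them
#endif

// Example usage on a declaration (hypothetical function):
TORCH_API void torchApiExample();
```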

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49496

Reviewed By: malfet, samestep

Differential Revision: D25600726

Pulled By: janeyx99

fbshipit-source-id: 7e068d959e397ac183c097d7e9a9afeca5ddd782
2020-12-18 10:54:50 -08:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
Yinghai Lu
2a8d5a132c Fix workspace destruction ordering (#23096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23096

Nets can have state that depends on the rest of the state in the Workspace, so they should be destructed first.
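
A minimal sketch of the ordering fix described, assuming the Workspace owns nets and blobs in separate maps; the member names are illustrative:

```cpp
#include <map>
#include <memory>
#include <string>

struct Blob {};
struct NetBase {};  // a net may cache pointers into the workspace's blobs

class Workspace {
 public:
  ~Workspace() {
    // Nets can hold state referring to blobs in this workspace, so tear the
    // nets down first; the blob map is destroyed afterwards by the implicit
    // member destruction.
    net_map_.clear();
  }

 private:
  // Note: members are destroyed in reverse declaration order, so even without
  // the explicit clear(), declaring blob_map_ before net_map_ destroys nets
  // first. The explicit clear() just makes the intent obvious.
  std::map<std::string, std::unique_ptr<Blob>> blob_map_;
  std::map<std::string, std::unique_ptr<NetBase>> net_map_;
};
```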

Reviewed By: ajyu

Differential Revision: D16382987

fbshipit-source-id: 3fd030ba206e2d0e897abb9e31c95bdaeb9482b7
2019-07-19 16:49:50 -07:00
Benny Chen
f25322fb97 Fix issues under caffe round 1
Summary: Some automation to fix uninitialized members in caffe2 code. Ran canary to make sure I don't have any regressions in prod, but I'm not sure how to test comprehensively for caffe2.
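
A small, hypothetical example of the class of fix being automated (in-class default member initializers); the struct below is not from the actual diff:

```cpp
// Give members deterministic defaults so a default-constructed object is
// never read uninitialized.
struct OperatorStats {
  // before: `int run_count_;` left indeterminate until first assignment
  int run_count_ = 0;
  float last_run_ms_ = 0.0f;
  bool has_cuda_ = false;
};
```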

Reviewed By: ezyang

Differential Revision: D13776185

fbshipit-source-id: fb2a479971cc0276d8784be1c44f01252410bd24
2019-01-23 19:04:59 -08:00
Yangqing Jia
7d5f7ed270 Using c10 namespace across caffe2. (#12714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714

This is a short change to enable the c10 namespace in caffe2. We did not enable it before due to gflags global variable confusion, but that should be mostly cleaned up now. Right now, the plan on record is that namespace caffe2 and namespace aten will be full supersets of namespace c10.

Most of the diff is codemod; the only two non-codemod changes are in caffe2/core/common.h, where

```
using namespace c10;
```

is added, and in Flags.h, where instead of creating aliasing variables in the c10 namespace, we put them directly in the global namespace to match gflags (with the same behavior when gflags is not built in).
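
A hedged sketch of that design choice; the flag name is hypothetical and the exact c10 macro spellings may differ from the real Flags.h:

```cpp
// Illustrative only: the point is that the declared symbol is a global
// FLAGS_* variable, matching what gflags' DECLARE_* macros produce, rather
// than an alias living inside namespace c10.
#ifdef C10_USE_GFLAGS
#include <gflags/gflags.h>
DECLARE_bool(caffe2_example_flag);       // gflags exposes FLAGS_caffe2_example_flag at this scope
#else
extern bool FLAGS_caffe2_example_flag;   // fallback: declared directly in the global namespace
#endif

bool ExampleFlagIsSet() { return FLAGS_caffe2_example_flag; }
```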

Reviewed By: dzhulgakov

Differential Revision: D10390486

fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b
2018-10-17 12:57:19 -07:00
Yangqing Jia
38f3d1fc40 move flags to c10 (#12144)
Summary:
Still in flux.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12144

Reviewed By: smessmer

Differential Revision: D10140176

Pulled By: Yangqing

fbshipit-source-id: 1a313abed022039333e3925d19f8b3ef2d95306c
2018-10-04 02:09:56 -07:00
Yangqing Jia
9c49bb9ddf Move registry fully to c10 (#12077)
Summary:
This does 6 things:

- add c10/util/Registry.h as the unified registry util
  - clean up some APIs, such as the export conditions
- fully remove aten/core/registry.h
- fully remove caffe2/core/registry.h
- remove a bogus aten/registry.h
- unify all macros
- set up registry testing in c10

Also, an important note: we used to mark the templated Registry class as EXPORT. This should not happen, because one should almost never export a template class; this PR fixes that.
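
A simplified sketch of that export note, using names that are not the real c10/util/Registry.h API: the template itself stays unexported, and only the ordinary accessor for a concrete registry instance is marked for export.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

#ifndef C10_EXPORT
#define C10_EXPORT __attribute__((visibility("default")))  // simplified stand-in for the real macro
#endif

// The class template carries no export attribute: each consumer can
// instantiate it, and exporting a template class is almost never correct.
template <class ObjectType>
class Registry {
 public:
  using Creator = std::function<std::unique_ptr<ObjectType>()>;
  void Register(const std::string& key, Creator creator) {
    creators_[key] = std::move(creator);
  }

 private:
  std::unordered_map<std::string, Creator> creators_;
};

// What does get exported is the ordinary, non-template accessor for a concrete
// registry instance (the real header wraps this in declare/define macros).
struct OperatorBase {};
C10_EXPORT Registry<OperatorBase>* OperatorRegistry();  // declaration only in this sketch
```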
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12077

Reviewed By: ezyang

Differential Revision: D10050771

Pulled By: Yangqing

fbshipit-source-id: 417b249b49fed6a67956e7c6b6d22374bcee24cf
2018-09-27 03:09:54 -07:00
Yangqing Jia
28dba2f928 Unify all *_EXPORT and *_IMPORT macros across c++ backend (#12019)
Summary:
TSIA. Right now we should basically use C10_EXPORT and C10_IMPORT for explicitly marking dllexport and dllimport, as a continued effort of the C10 unification.

This is a codemod, done by mechanically applying the following changes:

CAFFE2_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
AT_CORE_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
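
For reference, a hedged sketch of what such export/import macros conventionally expand to; the real c10 definitions may differ in detail:

```cpp
#ifdef _WIN32
#define C10_EXPORT __declspec(dllexport)
#define C10_IMPORT __declspec(dllimport)
#else
#define C10_EXPORT __attribute__((visibility("default")))
#define C10_IMPORT C10_EXPORT  // no export/import distinction outside Windows
#endif
```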
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12019

Reviewed By: ezyang, teng-li

Differential Revision: D10016276

Pulled By: Yangqing

fbshipit-source-id: a420d62c43d1110105fc88f9e9076e28a3203164
2018-09-25 17:41:05 -07:00
Sebastian Messmer
8f0db9bbbb Removing some dependency edges from Blob to other caffe2 (#12043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12043

Re-trying D9979976, this time with all call sites fixed.

D9979976 got reverted because, it seems, there was a call site that wasn't covered by sandcastle.
I fixed it and used 'grep' to ensure there aren't any more call sites in fbsource.

Reviewed By: ezyang

Differential Revision: D10026392

fbshipit-source-id: cd341514a8e53a40147ea0ee3e52f63bb6444157
2018-09-25 11:40:24 -07:00
Maciej Bargiel
2cdf98a74d Back out "Removing some dependency edges from Blob to other caffe2"
Summary: The controller you requested could not be found. Original commit changeset: 2ea17724e223

Differential Revision: D10026321
Ninja: stable broken

fbshipit-source-id: faf87cb7cc0f78c2c10d4aa6fceea279cd27acd6
2018-09-25 01:11:14 -07:00
Sebastian Messmer
17a65bf9b6 Removing some dependency edges from Blob to other caffe2 (#11923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11923

This is pre-work to allow moving Blob to ATen/core, which cannot depend on caffe2 anymore.
(1) Removing the Blob -> Tensor dependency allows us to move Blob to ATen/core and use it inside IValue without having to wait for the Tensor merge to be complete.
(2) In the final Blob design, we want it to be a very small class that doesn't have any special treatment for Tensor (or, to be more correct, doesn't allow storing Tensor anymore), so this is the direction we want to go anyway.

This changes call sites that will have to be moved to IValue later, but they cannot be moved to IValue directly, because for that, IValue first needs to be able to store Blob, which in turn first needs this diff and some other changes coming up in future diffs.

Codemods:
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)\\.IsTensorType\\(" "BlobIsTensorType(\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)->IsTensorType\\(" "BlobIsTensorType(*\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)\\.GetMutableTensor\\(" "BlobGetMutableTensor(\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)->GetMutableTensor\\(" "BlobGetMutableTensor(*\\1, "

It is, however, not only these codemods, because regex-based refactoring was only able to match a small number of the call sites. To catch more, I would've needed an AST-aware tool like clangr, which I didn't figure out how to use.
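
A minimal sketch of what the rewritten call sites rely on, using a greatly simplified stand-in for Blob and an integer placeholder for the device type; signatures are approximations, not the actual caffe2 declarations:

```cpp
struct Tensor {};

// Greatly simplified stand-in for caffe2::Blob (the real class is type-erased).
struct Blob {
  Tensor tensor;
  bool holds_tensor = false;
};

// After the change, Tensor-specific behavior lives in free functions that take
// the Blob as an argument, instead of member functions on Blob itself.
inline bool BlobIsTensorType(const Blob& blob, int /*device_type*/) {
  return blob.holds_tensor;
}
inline Tensor* BlobGetMutableTensor(Blob& blob, int /*device_type*/) {
  blob.holds_tensor = true;
  return &blob.tensor;
}

// Call sites are rewritten mechanically:
//   blob.IsTensorType(CPU)       ->  BlobIsTensorType(blob, CPU)
//   blob->GetMutableTensor(CPU)  ->  BlobGetMutableTensor(*blob, CPU)
```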

Reviewed By: ezyang

Differential Revision: D9979976

fbshipit-source-id: 2ea17724e223b5b73b44f99362727759ca689e61
2018-09-24 22:57:05 -07:00
Edward Yang
91797c0672 Replace direct include of caffe2.pb.h with an intermediary header caffe2_pb.h (#10946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10946

```
codemod -d . --extensions cc,cpp,cu,cuh,h caffe2/proto/caffe2.pb.h caffe2/proto/caffe2_pb.h
```
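
A hedged sketch of what such an intermediary header might contain; beyond the single include, everything about the real file is an assumption:

```cpp
// caffe2/proto/caffe2_pb.h (sketch): a thin wrapper so the rest of the code
// base includes this stable header rather than the generated protobuf header
// directly. The real file may add helpers around the generated types.
#pragma once
#include "caffe2/proto/caffe2.pb.h"
```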

Reviewed By: houseroad

Differential Revision: D9539945

fbshipit-source-id: 497d04720e8e7e61c05ffe1b23733d0cb774de7e
2018-08-28 11:57:08 -07:00
Yinghai Lu
8044dc4eb8 Support new Reshape semantics (#10848)
Summary:
Since ONNX opset version >5, Reshape has changed semantics to take a shape tensor as input instead of relying on the `shape` attribute to decide what shape to reshape to. The ONNXIFI op has been postponing this change because some of the backends, such as TensorRT, were not ready. Now that the backends have adopted these semantics, we can remove the legacy mode and output opset version 7 ONNX models.

This change also flushes out some bugs and new requirements:
- Convert shape info into an int64 tensor
- Fix a bug where we output the shape tensor in the mapped workspace instead of the original workspace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10848

Reviewed By: houseroad

Differential Revision: D9495121

Pulled By: yinghai

fbshipit-source-id: a6f44a89274c35b33fae9a429813ebf21d9a3d1a
2018-08-24 11:46:41 -07:00
Andrei Maximov
432b3adffc Print blob sizes on fatal signal (#10766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10766

Added a `Workspace::ForEach(...)` API for accessing the global set of
existing Workspace instances. This is used in the signal handler to print blob
info on the thread receiving a fatal signal.
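
A hedged sketch of how an enumeration API like this might be used; the exact ForEach signature and the printing details are assumptions for illustration:

```cpp
#include <iostream>
#include "caffe2/core/workspace.h"

// Walk every live Workspace and print the names of its blobs, e.g. from a
// fatal-signal handler.
void PrintAllBlobNames() {
  caffe2::Workspace::ForEach([](caffe2::Workspace* ws) {
    for (const auto& name : ws->Blobs()) {  // Blobs() lists blob names in this workspace
      std::cerr << name << '\n';
    }
  });
}
```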

Reviewed By: mraway

Differential Revision: D9147768

fbshipit-source-id: a94d0b5e6c88390a969ef259ecb8790173af01a4
2018-08-23 13:39:55 -07:00
Orion Reblitz-Richardson
488ea824ed Additional changes to make GPU builds work (#10507)
Summary:
A continuation of https://github.com/pytorch/pytorch/pull/10504 for GPU, torch, etc. builds.

I was testing with

```
FULL_CAFFE2=1 python setup.py build_deps | tee ~/log.txt
cat ~/log.txt | egrep 'undefined refer' | sort | less
```

I'll rebase on master when Yangqing's changes in 10504 land, but putting up for some testing.

cc mingzhe09088 anderspapitto ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10507

Reviewed By: Yangqing

Differential Revision: D9359606

Pulled By: orionr

fbshipit-source-id: c2a3683b3ea5839689f5d2661da0bc9055a54cd2
2018-08-16 13:25:27 -07:00
Yangqing Jia
0a809fc8b1 build changes to make cpu unified build working. (#10504)
Summary:
Properly annotated all APIs for the CPU front end. Checked with cmake using

cmake -DUSE_ATEN=ON -DUSE_CUDA=OFF -DBUILD_ATEN=ON

and the resulting libcaffe2.so has about 11k symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10504

Reviewed By: ezyang

Differential Revision: D9316491

Pulled By: Yangqing

fbshipit-source-id: 215659abf350af7032e9a4b0f28a856babab2454
2018-08-15 17:22:36 -07:00
Edward Yang
ad76fc8807 s/DISABLE_COPY_AND_ASSIGN/AT_DISABLE_COPY_AND_ASSIGN/ (#10275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10275

Remove forwarding declaration in caffe2/core/common.h

```
codemod -d caffe2 --extensions cc,cpp,cu,cuh,h \\bDISABLE_COPY_AND_ASSIGN AT_DISABLE_COPY_AND_ASSIGN
```
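
For context, a hedged sketch of what a macro of this kind conventionally expands to; the real definition lives in the ATen/c10 headers and may differ in detail:

```cpp
// Delete the copy operations so instances of the class cannot be copied or
// copy-assigned.
#define AT_DISABLE_COPY_AND_ASSIGN(classname) \
  classname(const classname&) = delete;       \
  classname& operator=(const classname&) = delete

class Workspace {
 public:
  Workspace() = default;
  AT_DISABLE_COPY_AND_ASSIGN(Workspace);
};
```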

Reviewed By: mingzhe09088

Differential Revision: D9184809

fbshipit-source-id: 958cf5162b0d92b83ea9c2597abb77320ca57ce8
2018-08-07 08:54:26 -07:00
Jerry Zhang
aebf3b47ae Remove template parameter from Tensor (#9939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939

Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove the template parameter.
This changes the templating at call sites; the core implementations will change later.

Before, the Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserving the same semantics. For example, one has to specify a device type in order to create a Tensor; there are no uninitialized tensors. More specifically, the changes are:

1. We added an extra *DeviceType* argument to most of the Tensor constructors, e.g. Tensor(DeviceType type).
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in to enable us to call the templated Copy function. Previously it could be on a different context than the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter Blob::GetMutableTensor that verifies both that the Blob contains a Tensor and that it is of the correct type.
4. The Tensor type is no longer default-constructible (as we don't have unknown-device tensors), so some of the code handling STL containers needs to change.

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
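
A hedged before/after of a typical call site; the signatures below are approximations of the described API, not verbatim declarations:

```cpp
// Before: the device was baked into the type.
//   caffe2::Tensor<caffe2::CPUContext> t(dims);
//   auto* cached = blob->GetMutable<caffe2::Tensor<caffe2::CPUContext>>();
//
// After: the device is a runtime property passed to the constructor, and the
// Blob getter is a specialized helper that also checks the stored device type.
//   caffe2::Tensor t(dims, caffe2::CPU);
//   auto* cached = blob->GetMutableTensor(caffe2::CPU);
```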

Reviewed By: ezyang, houseroad

Differential Revision: D9024330

fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
2018-07-27 10:56:39 -07:00
Jerry Zhang
969b62f276 Revert D8121878: Remove template parameter from Tensor
Differential Revision:
D8121878

Original commit changeset: 4a5e9a677ba4

fbshipit-source-id: d8e2c0bb145b52fbcca323b22d1d3346f0b3249e
2018-07-26 14:02:04 -07:00
Jerry Zhang
cd5adc7b5f Remove template parameter from Tensor (#13)
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove the template parameter.
This changes the templating at call sites; the core implementations will change later.

Before, the Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserving the same semantics. For example, one has to specify a device type in order to create a Tensor; there are no uninitialized tensors. More specifically, the changes are:

1. We added an extra *DeviceType* argument to most of the Tensor constructors, e.g. Tensor(DeviceType type).
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in to enable us to call the templated Copy function. Previously it could be on a different context than the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter Blob::GetMutableTensor that verifies both that the Blob contains a Tensor and that it is of the correct type.
4. The Tensor type is no longer default-constructible (as we don't have unknown-device tensors), so some of the code handling STL containers needs to change.

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: xw285cornell

Differential Revision: D8121878

fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
2018-07-26 10:25:23 -07:00
Yinghai Lu
4d2a0b889f [Caffe2] Use mapped workspace instead of renaming when working on renamed nets (#6717)
* Use mapped workspace instead of renaming when working on renamed nets

* Comments
2018-04-18 19:14:11 -07:00
Yinghai Lu
6252706feb
[Caffe2] Workspace centric API for TensorRT transformation (#6678)
* Workspace centric API for trt transformation

* Merge SSA rewrite code
2018-04-17 21:23:27 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Marat Dukhan
224493d9ce NNPACK: Use new bindings and custom thread pool
Summary:
This change should dramatically (~10X) improve the performance of convolution with the NNPACK engine.
Closes https://github.com/caffe2/caffe2/pull/1730

Reviewed By: sf-wind

Differential Revision: D6695895

Pulled By: Maratyszcza

fbshipit-source-id: 26291916811ef4cb819a59aec848c4e23668e568
2018-01-11 10:48:12 -08:00
Ilia Cherniavskii
d28720b90a Backpropagation for While op
Summary: Adds support for backprop to While op, fixes gradient computation for Pow

Reviewed By: azzolini

Differential Revision: D6456875

fbshipit-source-id: 9f660317ad6f3898ff7d8ce43098f85c3426409b
2017-12-18 16:03:45 -08:00
Alexander Sidorov
bfdd864631 Automatically pretranspose FCs in BlackBoxPredictor
Summary:
Pretransposing FCs seems to offset the losses we get from low batch sizes in AdIndexer. First I confirmed this on local benchmarks (see the previous diff). Then in https://fburl.com/yuo49onj I showed how this change saves 19% of FC time on AdIndexer, which is already $0.4M in cap. exp. and over 3 years gives 5x more ROI.

We can also reuse this code for later, more efficient gemm implementations. E.g., msmelyan is working on a new fp16 gemm which would cut bandwidth usage 2x; we can reuse the code in this diff for the repacking required by the new gemm.

In this diff I had to take care of memory usage. Here are several
possible approaches to the transformation:

1. Perform the transformation on the fly, copying the memory. This is what is done in skinny gemm (FC with engine SKINNY).

Cons: slow first execution, memory is replicated for each thread

2. Copy the weights in the operator constructor. In dbg mode, verify on the fly that the hash of the original weights is unchanged.

Cons: memory is still replicated for each thread

3. Copy the weights in the Predictor constructor.

Cons: if we have 2 predictors sharing the same weight blob (via PredictorContainer), we still get 3x the memory, i.e. the original weights plus a copy for each of the two predictors in the container.

4. Replace the weights in the Predictor constructor, taking care of the mapping to support weight sharing within a Predictor container.

This is the approach taken in this diff; it solves the issues above and doesn't create any memory overhead.

Cons: The logic became complex and requires a mutex at initialization time.

Reviewed By: akyrola

Differential Revision: D6214593

fbshipit-source-id: 25da6ba7bfd39fc8f4b578094d3f334c7957490d
2017-11-09 17:35:32 -08:00
Hassan Eslami
5388948b59 CreateLocalBlob for workspace
Summary: Adds the ability to create a local blob in the workspace even if the blob exists in the parent workspace. This is to support cases where a user wants to create a local copy of the blob and hide the blob from the parent workspace.
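
A minimal usage sketch of the described behavior; method names follow the summary, but the exact signatures are approximations:

```cpp
#include "caffe2/core/workspace.h"

// The child workspace normally resolves "w" to the parent's blob;
// CreateLocalBlob shadows it with a local blob instead.
void Example() {
  caffe2::Workspace parent;
  parent.CreateBlob("w");            // lives in the parent

  caffe2::Workspace child(&parent);  // child falls back to parent lookups
  caffe2::Blob* local = child.CreateLocalBlob("w");  // local blob hides the parent's "w"
  (void)local;
}
```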

Reviewed By: akyrola

Differential Revision: D6194386

fbshipit-source-id: 92c064159ac635ee76c211abc013b72bd8752447
2017-11-01 21:32:47 -07:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Ilia Cherniavskii
f8f5e79f5f Backpropagation for If operator
Summary:
Adding backward pass support for If operator:
 - Implemented necessary changes to Do operator and generation of gradient Do operator to properly forward gradient blobs in and out of subnet
 - Using WorkspaceManager to keep track of workspaces used by Do, in case we need to have access to local blobs to compute gradients (also important for loop's backprop)
 - Update to Workspace to handle blob binding from multiple parent workspaces
 - Implemented generation of gradient If operator
 - Unit test to build and train a net with If control op

Reviewed By: azzolini

Differential Revision: D5745096

fbshipit-source-id: 1023c90a2113716254424d1e50b9e560fe9083e5
2017-09-18 16:17:42 -07:00
Ilia Cherniavskii
67a55b81e3 Forward blobs into workspace
Summary:
Better isolation for workspaces, allowing selected blobs to be forwarded from a parent to a child workspace, possibly under new names. Used for proper isolation of subnets (loops, then/else branches, etc.) from the outer workspace.
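
A minimal sketch of the forwarding described above; the constructor taking a forwarding map, and the direction of that map, are assumptions for illustration:

```cpp
#include <string>
#include <unordered_map>
#include "caffe2/core/workspace.h"

void Example() {
  caffe2::Workspace parent;
  parent.CreateBlob("outer/x");

  // Assumed mapping: name in the child -> name in the parent.
  std::unordered_map<std::string, std::string> forwarded = {{"x", "outer/x"}};
  caffe2::Workspace child(&parent, forwarded);

  // Inside the child, "x" resolves to the parent's "outer/x"; unrelated parent
  // blobs stay hidden, which is the isolation the summary is after.
}
```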

Reviewed By: azzolini

Differential Revision: D5681667

fbshipit-source-id: e61a2c7c98ee2abf1f0761905f4bfae47c201c32
2017-08-22 18:45:56 -07:00
Victor Gao
34be12353b comment out unused parameters
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.
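
A tiny, hypothetical example of the mechanical change:

```cpp
// before:  void RunStep(int iterations, bool verbose);  // unused names trigger -Wunused-parameter
// after: the names are commented out, silencing the warning without changing
// the signature.
void RunStep(int /*iterations*/, bool /*verbose*/) {}
```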

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
2017-07-21 15:14:43 -07:00
Junjie Bai
5881aa0a78 Use shared_ptr to share OperatorDef across threads
Reviewed By: akyrola

Differential Revision: D5434291

fbshipit-source-id: 89f470d1e2dcde36c3273d86565b1952d7682808
2017-07-17 23:49:59 -07:00
Alexander Sidorov
75fc49833f An observer for every created net and op
Reviewed By: akyrola

Differential Revision: D5319289

fbshipit-source-id: 1140caef6d608ab3e37d22311e5c8a7e489470d5
2017-06-27 18:07:03 -07:00
Alexander Sidorov
c8410859d9 Operator python stacktraces, attempt 2
Summary:
Last time I used a uuid filled into OperatorDef, and operator_tracebacks was populated using traceback.extract_stack. There were several issues with this approach:

1. A random field in OperatorDef breaks workflows relying on memoization, i.e. when computation is skipped based on a previously computed result.
2. Adding one more field revealed that RNNs are not forward compatible with respect to new fields there. The prototxt format seems to not allow forward compatibility (thanks jamesr66a for the investigation!). For RNNs we need to switch to a more resilient approach. azzolini's proposed change to OperatorDef / NetDef would allow that by just nesting NetDef directly inside OperatorDef, without the need for extra serialization.
3. traceback.extract_stack is very slow when the executable is on a remote filesystem. It does one or more os.stat calls for each frame on the stack. In some cases it ended up adding up to 15 extra minutes to model construction.

In this diff I use a different approach which should fix all those problems above.

1. and 2. are solved by not adding a new field at all. Instead I report the operator index with respect to the net it runs in. Thanks akyrola and dzhulgakov for the idea. The downside here is that operator list manipulation breaks the logic, and separately created ops are not covered at all.
3. I solved this by operating on raw frames, without using the traceback and inspect modules, which end up doing a lot of filesystem calls. See the function extract_stacktace in core.py, which has additional comments.

Reviewed By: dzhulgakov

Differential Revision: D5286285

fbshipit-source-id: 626dd0f5f6b8b1d86bd6bf519078b122f43ddcaa
2017-06-25 19:32:58 -07:00
Alexander Sidorov
83e6a0bec8 Revert uuid change to OperatorDef protobuf
Summary:
a few issues:

1. Randomization hurts memoization.
2. Even if we make it non-random, we can get key collisions when loading it back.
3. RNNs use prototxt for the step net, and apparently it's not forward compatible like normal protobuf is.

I am thinking of a better, less invasive solution now.

Reviewed By: jamesr66a

Differential Revision: D5272118

fbshipit-source-id: ab577fad04fbfc632e1fceffa923377a0d3da1be
2017-06-19 16:47:31 -07:00
Alexander Sidorov
eebda50b79 Operator python traceback
Summary: This is going to show a Python Caffe2 user where a failed operator was created. The motivation for not putting this information directly in the protobuf is to avoid making it too verbose and to keep the ability to read a net's protobuf after a simple print() call.

Reviewed By: jamesr66a

Differential Revision: D5226047

fbshipit-source-id: 7edfe850e05a2ec209577142aa3368664a57a108
2017-06-13 18:50:02 -07:00
Alisson Gusatti Azzolini
db1d62caf7 Move RunPlan to a separate file
Summary: This RunPlan is getting complex and confusing. The first step to clean it up is to move it out of workspace.cc to better mark separation of concerns.

Reviewed By: kennyhorror

Differential Revision: D5100721

fbshipit-source-id: 4be0559eba1abb8bb1ddc3818698763c2e014ef2
2017-05-24 11:07:15 -07:00
Yangqing Jia
cf317d1106 create_net: explicitly specify if one wants to overwrite the network.
Summary:
This is from a discussion with dzhulgakov: as a step towards revisiting the
core.Net autonaming, we will first guard against accidental overwrites of
existing networks in the workspace.

ajtulloch since we are doing Predictors in mobile, this should be safe right?

azzolini - I assume this would be safe, but would love to get your approval.

akyrola - would this hurt xray?

Reviewed By: dzhulgakov

Differential Revision: D4897725

fbshipit-source-id: aa41271927ad6671f07a53b9505283623f8c49e5
2017-04-17 21:46:53 -07:00
Aapo Kyrola
1e5140aa76 option to recompute blobs backward pass with massive memory savings
Summary:
This diff adds an option to recurrent_net to define some cell blobs to be recomputed on the backward step, so they don't need to be stored in the step workspace. This is done by modifying the backward step to automatically include all operators that are needed to produce the output that is to be recomputed, and by storing those blobs in a shared workspace. To enable the shared workspace, I had to modify the stepworkspaces blob to also store a forward shared workspace. Making it a class field won't work since the lifecycle of the blob does not match the lifecycle of the operator.

For basic LSTM, the performance hit is quite modest (about 15% with one setting, but your mileage might vary). For attention models, I am sure this is beneficial as computing the attention blobs is not expensive.

For basic LSTM, the memory saving is wonderful: each forward workspace only has 4 bytes (for timestep).

I also modified the neural_mt LSTM Cells, but there is no test available, so I am not 100% sure I did it correctly. Please have a look.

Added options to LSTM, MILSTM and LSTMAttention to enable memory mode.

Reviewed By: urikz

Differential Revision: D4853890

fbshipit-source-id: d8d0e0e75a5330d174fbfa39b96d8e4e8c446baa
2017-04-11 13:03:48 -07:00
Aapo Kyrola
a2065f3c1e report capacity bytes as part of workspace blob stats
Summary: Instead of reporting the total number of elements of a tensor, report the number of bytes. But report the capacity of the tensor, not the current number of bytes.

Reviewed By: jamesr66a, salexspb

Differential Revision: D4851633

fbshipit-source-id: 464d552f41f1b5f25753b0e7001d299b6dac1966
2017-04-07 19:16:37 -07:00
Aapo Kyrola
ffd298376a option to print tensor shapes at exit
Summary:
Added the Caffe2 command line option --caffe2_print_blob_sizes_at_exit=1, which, when enabled, prints all tensor sizes in the workspace destructor. Handy especially when using sub-workspaces, as with RNNs. Note that the sizes are numbers of elements, not bytes. The output is designed to be easily copy-pasteable into Excel.

TODO: add sorting

Reviewed By: jamesr66a

Differential Revision: D4844628

fbshipit-source-id: 11608a1710ae5c89bbd741edb506d25496606185
2017-04-06 21:36:04 -07:00
Alexander Sidorov
51ae65d76f RNN: reuse memory for gradients of internal blobs of the cell net
Summary:
The main idea is that on the backward pass we don't need to store all the backward outputs in memory. This diff addresses only the ones used internally in each private workspace, by creating a shared workspace that shares them all within the backward pass.

Another thing we can do - get rid of state_grad blobs, but this would be a different effort.

See comments for more detailed description.

Reviewed By: urikz

Differential Revision: D4784900

fbshipit-source-id: 2dd8fe1b1215217ce92c09d918582d76c3051630
2017-03-29 15:20:50 -07:00
Ou Jin
eeb7279020 compile execution step
Summary:
When the execution step is representing things like:
for loop
  execution_step
     net1
  execution_step
     net2
     net3
the preparation cost for the execution step is too high.
This diff moves most of the shared information into the CompiledExecutionStep to save time.

After the change, the benchmark results for the parameter server handler are as follows (be aware that the first two have some variance):
INFO:__main__:==Summary==
INFO:__main__:Time <function case_if at 0x7f7160c32938> 0.0752924203873
INFO:__main__:Time <function case_loop at 0x7f7160c329b0> 0.0677666187286
INFO:__main__:Time <function case_simple_net at 0x7f7160c32a28> 0.0605396509171
INFO:__main__:Time <function case_one_loop at 0x7f7160c32aa0> 0.0611681699753

Before the change:
INFO:__main__:==Summary==
INFO:__main__:Time <function case_if at 0x7f19d079f848> 0.100815701485
INFO:__main__:Time <function case_loop at 0x7f19d079f8c0> 0.0864136457443
INFO:__main__:Time <function case_simple_net at 0x7f19d079f938> 0.0614696979523
INFO:__main__:Time <function case_one_loop at 0x7f19d079f9b0> 0.0598972082138

Reviewed By: azzolini

Differential Revision: D4643926

fbshipit-source-id: 5a4b97230ba778e0ff5cbafc8a216335a191068a
2017-03-08 23:49:41 -08:00
Aapo Kyrola
dcefc74a0c Shape and Type Inference Part1
Summary:
This is a bit of a large diff, sorry about that. It includes basic shape and type inference functionality, based on YQ's Schema scaffolding. I added some helper functions to make it easier to write simple translations.

Bigger refactoring was needed for ConvPoolBase so that we could use the shape inference already there in the schema.

I annotated enough operators to be able to infer forward-pass shapes for a basic convnet, and added a test for that. I intend to bootcamp some annotations and annotate enough to handle ResNets fully. Need to think about gradients and whether they could be annotated in an easier way.

Only shapes are exposed to Python for now; types will follow later. Also, the inference is not yet called anywhere but the unit test.

Also I am not sure if everything is in the best location in the code, but shouldn't be hard to move stuff around.

Reviewed By: dzhulgakov

Differential Revision: D4436818

fbshipit-source-id: eebee5937ccc9ac09c245465302388a1fae6933c
2017-02-02 22:29:22 -08:00
Liang Xiong
1aafeb3565 clean up memory of c2/sigrid predictor
Summary: Trying to optimize c2 predictor memory usage, mainly by removing the unused dbreader and dper metadata.

Differential Revision: D4232595

fbshipit-source-id: dcd7aa7dd09587ec9811a9e5ec725e0c22757665
2016-11-29 15:18:39 -08:00
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00
Yangqing Jia
d1e9215184 fbsync 2016-10-07 13:08:53 -07:00
Yangqing Jia
b23e51d467 chunky sync 2016-09-06 15:55:19 -07:00
Yangqing Jia
6463eebc7b chunky sync - build scripts to be written 2016-07-21 10:16:42 -07:00