Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53319
Noticed these in profiles.
Also switch to `unordered_map`.
Test Plan: Unit tests.
Reviewed By: swolchok
Differential Revision: D26504408
fbshipit-source-id: 9e14d55909a4af019058b8c27c67ee2348cd02a9
Summary:
Since caffe2 and torch have been consolidated, CAFFE2_API should be merged with TORCH_API. Addresses a TODO.
Manually edited some references to the removed `CAFFE2_API` in:
* `CONTRIBUTING.md`
* `caffe2/proto/CMakeLists.txt`
* `cmake/ProtoBuf.cmake`
* `c10/macros/Export.h`
* `torch/csrc/WindowsTorchApiMacro.h`
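After the merge, the arrangement in `c10/macros/Export.h` reduces to a pattern roughly like the sketch below; the build-guard name is an assumption, not the verbatim header.
```cpp
// Hedged sketch, not the actual header: C10_EXPORT / C10_IMPORT wrap the
// platform visibility attributes (stubbed here for self-containment).
#ifndef C10_EXPORT
#define C10_EXPORT __attribute__((__visibility__("default")))
#define C10_IMPORT C10_EXPORT
#endif

// A single TORCH_API now marks the symbols CAFFE2_API used to cover.
#ifdef CAFFE2_BUILD_MAIN_LIB
#define TORCH_API C10_EXPORT
#else
#define TORCH_API C10_IMPORT
#endif

struct TORCH_API Example {};  // visible to both torch and caffe2 consumers
```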
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49496
Reviewed By: malfet, samestep
Differential Revision: D25600726
Pulled By: janeyx99
fbshipit-source-id: 7e068d959e397ac183c097d7e9a9afeca5ddd782
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23096
Nets can have state that depends on the rest of the state in the Workspace. Hence, they should be destructed first.
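A minimal sketch of the destruction-order fix, with member names assumed:
```cpp
// Hedged sketch (member names assumed): destroy nets before blobs so a
// net's state never outlives the Workspace state it depends on.
#include <map>
#include <memory>
#include <string>

struct Blob {};
struct NetBase { /* may hold pointers into the Workspace's blobs */ };

class Workspace {
 public:
  ~Workspace() {
    net_map_.clear();   // nets first...
    blob_map_.clear();  // ...then the state they may reference
  }
 private:
  std::map<std::string, std::unique_ptr<NetBase>> net_map_;
  std::map<std::string, std::unique_ptr<Blob>> blob_map_;
};
```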
Reviewed By: ajyu
Differential Revision: D16382987
fbshipit-source-id: 3fd030ba206e2d0e897abb9e31c95bdaeb9482b7
Summary: Some automation to fix uninitialized members in caffe2 code. Ran a canary to make sure I don't have any regressions in prod, but I'm not sure how to test comprehensively for caffe2.
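The class of fix looks roughly like this; the type and members below are hypothetical, purely for illustration:
```cpp
// Hedged example: give every scalar member an in-class initializer so no
// instance is ever constructed with indeterminate values.
struct ConvArgs {
  int stride_h = 1;       // was: int stride_h;
  int stride_w = 1;       // was: int stride_w;
  bool legacy_pad = false;
  float* bias = nullptr;  // raw pointers default to null
};
```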
Reviewed By: ezyang
Differential Revision: D13776185
fbshipit-source-id: fb2a479971cc0276d8784be1c44f01252410bd24
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714
This is a short change to enable the c10 namespace in caffe2. We did not enable
it before due to gflags global-variable confusion, but that should be mostly
cleaned up now. The plan on record is that namespace caffe2 and namespace aten
will be full supersets of namespace c10.
Most of the diff is codemod; the only two non-codemod changes are in caffe2/core/common.h, where
```
using namespace c10;
```
is added, and in Flags.h, where instead of creating aliasing variables in the c10 namespace, we put them directly in the global namespace to match gflags (with the same behavior when gflags is not built in).
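A minimal, self-contained sketch of the resulting arrangement (all names here are hypothetical, for illustration only):
```cpp
// Hedged sketch: caffe2 re-exports c10, and flag variables live in the
// global namespace so they line up with the symbols gflags would generate.
namespace c10 {
inline int example_util() { return 42; }
} // namespace c10

namespace caffe2 {
using namespace c10; // caffe2 becomes a superset of c10
} // namespace caffe2

// Matches gflags' convention: FLAGS_* in the global namespace, whether or
// not gflags itself is built in.
bool FLAGS_caffe2_example_flag = false;

int main() {
  return caffe2::example_util() == 42 && !FLAGS_caffe2_example_flag ? 0 : 1;
}
```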
Reviewed By: dzhulgakov
Differential Revision: D10390486
fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b
Summary:
This does 7 things:
- adds c10/util/Registry.h as the unified registry util
- cleans up some APIs such as the export condition
- fully removes aten/core/registry.h
- fully removes caffe2/core/registry.h
- removes a bogus aten/registry.h
- unifies all macros
- sets up registry testing in c10
Also, an important note: we used to mark the templated Registry class as EXPORT. This should not happen, because one should almost never export a template class; this PR fixes that.
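A minimal sketch of the unified pattern, with simplified signatures (the real header also handles priorities, help messages, and registration macros):
```cpp
// Hedged sketch of a creator registry (simplified from the real header).
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

template <class ObjectT, class... Args>
class Registry {
 public:
  using Creator = std::function<std::unique_ptr<ObjectT>(Args...)>;

  void Register(const std::string& key, Creator creator) {
    registry_[key] = std::move(creator);
  }
  std::unique_ptr<ObjectT> Create(const std::string& key, Args... args) {
    auto it = registry_.find(key);
    return it == registry_.end() ? nullptr : it->second(args...);
  }

 private:
  // The template itself carries no export macro; only the functions that
  // hand out singleton registry instances need to be exported.
  std::unordered_map<std::string, Creator> registry_;
};
```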
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12077
Reviewed By: ezyang
Differential Revision: D10050771
Pulled By: Yangqing
fbshipit-source-id: 417b249b49fed6a67956e7c6b6d22374bcee24cf
Summary:
TSIA. Right now we should basically use C10_EXPORT and C10_IMPORT for explicitly marking dllexport and dllimport, as a continued effort of the C10 unification.
This is a codemod, mechanically applying the following changes:
```
CAFFE2_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
AT_CORE_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
```
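The target macros reduce to the usual platform visibility attributes, roughly as in this sketch (the real header also handles static builds and per-library guards):
```cpp
// Hedged sketch of the C10 export macros, not the exact header.
#ifdef _WIN32
#define C10_EXPORT __declspec(dllexport)
#define C10_IMPORT __declspec(dllimport)
#else
#define C10_EXPORT __attribute__((__visibility__("default")))
#define C10_IMPORT C10_EXPORT
#endif

C10_EXPORT int answer() { return 42; }  // symbol visible across the DSO boundary
```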
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12019
Reviewed By: ezyang, teng-li
Differential Revision: D10016276
Pulled By: Yangqing
fbshipit-source-id: a420d62c43d1110105fc88f9e9076e28a3203164
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12043
Re-trying D9979976, this time with all call sites fixed.
It seems D9979976 got reverted because a call site wasn't covered by sandcastle.
I fixed it and used `grep` to ensure there aren't any more call sites in fbsource.
Reviewed By: ezyang
Differential Revision: D10026392
fbshipit-source-id: cd341514a8e53a40147ea0ee3e52f63bb6444157
Summary: Original commit changeset: 2ea17724e223
Differential Revision: D10026321
Ninja: stable broken
fbshipit-source-id: faf87cb7cc0f78c2c10d4aa6fceea279cd27acd6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11923
This is pre-work to allow moving Blob to ATen/core, which cannot depend on caffe2 anymore.
(1) Removing the Blob -> Tensor dependency allows us to move Blob to ATen/core and use it inside IValue without having to wait for the Tensor merge to be complete.
(2) In the final Blob design, we want it to be a very small class that doesn't have any special treatment for Tensor (or, to be more correct, doesn't allow storing Tensor anymore), so this is the direction we want to go in any case.
This changes call sites that will have to be moved to IValue later. They cannot be moved to IValue directly because, for that, IValue first needs to be able to store Blob, which in turn requires this diff and some other changes coming up in future diffs.
Codemods:
```
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)\\.IsTensorType\\(" "BlobIsTensorType(\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)->IsTensorType\\(" "BlobIsTensorType(*\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)\\.GetMutableTensor\\(" "BlobGetMutableTensor(\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)->GetMutableTensor\\(" "BlobGetMutableTensor(*\\1, "
```
It is, however, not only these codemods, because regex-based refactoring was only able to match a small number of the call sites. To catch more, I would have needed an AST-aware tool like clangr, which I didn't figure out how to use.
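For reference, a hedged sketch of the shape of the new call sites, with stand-in types rather than the real caffe2 declarations:
```cpp
#include <memory>
#include <typeinfo>

struct Tensor {};  // stand-in; the real class is much richer

// Stand-in Blob: type-erased storage with no Tensor-specific interface.
class Blob {
 public:
  template <class T>
  bool IsType() const { return type_ == &typeid(T); }
  template <class T>
  T* GetMutable() {  // get-or-construct semantics
    if (!IsType<T>()) { ptr_ = std::make_shared<T>(); type_ = &typeid(T); }
    return static_cast<T*>(ptr_.get());
  }
 private:
  std::shared_ptr<void> ptr_;  // shared_ptr keeps the correct deleter
  const std::type_info* type_ = nullptr;
};

// The former Blob methods become free functions, so Blob itself no longer
// depends on Tensor and can move to ATen/core.
bool BlobIsTensorType(const Blob& blob) { return blob.IsType<Tensor>(); }
Tensor* BlobGetMutableTensor(Blob* blob) { return blob->GetMutable<Tensor>(); }
```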
Reviewed By: ezyang
Differential Revision: D9979976
fbshipit-source-id: 2ea17724e223b5b73b44f99362727759ca689e61
Summary:
Since ONNX opset version >5, Reshape changed semantics to take a shape tensor as input instead of relying on the `shape` attribute to decide what shape to reshape to. The ONNXIFI op has been postponing this change because some backends, such as TensorRT, were not ready. Now that the backends have adopted the new semantics, we can remove the legacy mode and output opset version 7 ONNX models.
This change also flushes out some bugs and new requirements:
- Convert shape info into an int64 tensor
- Fix a bug where we output the shape tensor in the mapped workspace instead of the original workspace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10848
Reviewed By: houseroad
Differential Revision: D9495121
Pulled By: yinghai
fbshipit-source-id: a6f44a89274c35b33fae9a429813ebf21d9a3d1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10766
Added a `Workspace::ForEach(...)` API for accessing the global set of
existing Workspace instances. This is used in the signal handler to print blob
info on the thread receiving a fatal signal.
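A minimal sketch of how such global tracking can work, assuming a mutex-guarded registry (the actual `ForEach` signature may differ):
```cpp
// Hedged sketch: every Workspace registers itself in a global set, and
// ForEach visits all live instances, e.g. to dump blob info on a crash.
#include <functional>
#include <mutex>
#include <unordered_set>

class Workspace {
 public:
  Workspace() { std::lock_guard<std::mutex> g(mu_); all_.insert(this); }
  ~Workspace() { std::lock_guard<std::mutex> g(mu_); all_.erase(this); }

  static void ForEach(const std::function<void(Workspace*)>& fn) {
    std::lock_guard<std::mutex> g(mu_);
    for (Workspace* ws : all_) fn(ws);
  }

 private:
  static std::mutex mu_;
  static std::unordered_set<Workspace*> all_;
};
std::mutex Workspace::mu_;
std::unordered_set<Workspace*> Workspace::all_;
```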
Reviewed By: mraway
Differential Revision: D9147768
fbshipit-source-id: a94d0b5e6c88390a969ef259ecb8790173af01a4
Summary:
A continuation of https://github.com/pytorch/pytorch/pull/10504 for GPU, torch, etc. builds.
I was testing with
```
FULL_CAFFE2=1 python setup.py build_deps | tee ~/log.txt
cat ~/log.txt | egrep 'undefined refer' | sort | less
```
I'll rebase on master when Yangqing's changes in 10504 land, but I'm putting this up for some testing.
cc mingzhe09088 anderspapitto ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10507
Reviewed By: Yangqing
Differential Revision: D9359606
Pulled By: orionr
fbshipit-source-id: c2a3683b3ea5839689f5d2661da0bc9055a54cd2
Summary:
Properly annotated all APIs for the CPU front end. Checked with cmake using
```
cmake -DUSE_ATEN=ON -DUSE_CUDA=OFF -DBUILD_ATEN=ON
```
and the resulting libcaffe2.so has about 11k symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10504
Reviewed By: ezyang
Differential Revision: D9316491
Pulled By: Yangqing
fbshipit-source-id: 215659abf350af7032e9a4b0f28a856babab2454
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13
Pull Request resolved: https://github.com/pytorch/translate/pull/166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125
Closes https://github.com/pytorch/pytorch/pull/9125
Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites; the core implementations will change later.
Before, the Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we make the device a runtime property (stored inside the tensor) but preserve the same semantics. For example, one has to specify a device type in order to create a Tensor: there are no uninitialized tensors. More specifically, the changes are:
1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. `Tensor(DeviceType type)`.
2. The semantics of the constructor `Tensor(const Tensor<SrcContext>& src, ContextForCopy* context)` changed: the second context is passed in to enable us to call the templated Copy function. Previously it could be a different context from source and target; now we enforce that the context, if provided, has the same device type as `src`.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter `Blob::GetMutableTensor` that verifies both that the Blob contains a Tensor and that it is of the correct type.
4. The Tensor type is not default-constructible any more (as we don't have unknown-device tensors), so some of the code handling STL containers needs to change.
Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
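A hedged, heavily simplified sketch of the resulting shape of the class (the real Tensor also carries storage, dims, and so on):
```cpp
// Device type is now a runtime field rather than a template parameter.
enum class DeviceType { CPU, CUDA };

class Tensor {
 public:
  explicit Tensor(DeviceType type) : type_(type) {}  // no default ctor:
                                                     // no unknown-device tensors
  DeviceType GetDeviceType() const { return type_; }
 private:
  DeviceType type_;
};

int main() {
  Tensor t(DeviceType::CPU);  // device chosen at runtime, not compile time
  return t.GetDeviceType() == DeviceType::CPU ? 0 : 1;
}
```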
Reviewed By: ezyang, houseroad
Differential Revision: D9024330
fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13
Pull Request resolved: https://github.com/pytorch/translate/pull/166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125
Closes https://github.com/pytorch/pytorch/pull/9125
Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites; the core implementations will change later.
Before, the Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we make the device a runtime property (stored inside the tensor) but preserve the same semantics. For example, one has to specify a device type in order to create a Tensor: there are no uninitialized tensors. More specifically, the changes are:
1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. `Tensor(DeviceType type)`.
2. The semantics of the constructor `Tensor(const Tensor<SrcContext>& src, ContextForCopy* context)` changed: the second context is passed in to enable us to call the templated Copy function. Previously it could be a different context from source and target; now we enforce that the context, if provided, has the same device type as `src`.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter `Blob::GetMutableTensor` that verifies both that the Blob contains a Tensor and that it is of the correct type.
4. The Tensor type is not default-constructible any more (as we don't have unknown-device tensors), so some of the code handling STL containers needs to change.
Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
Reviewed By: xw285cornell
Differential Revision: D8121878
fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
Summary: Adds support for backprop to While op, fixes gradient computation for Pow
Reviewed By: azzolini
Differential Revision: D6456875
fbshipit-source-id: 9f660317ad6f3898ff7d8ce43098f85c3426409b
Summary:
Pretransposing FCs seems to offset the losses we get from low
batch sizes in AdIndexer. First I confirmed this on local benchmarks (see the
previous diff). Then in https://fburl.com/yuo49onj I showed how this
change saves 19% of FC time on AdIndexer, which is already $0.4M in
capital expenditure and over 3 years gives 5x more ROI.
We can also reuse this code for later, more efficient gemm
implementations. For example, msmelyan is working on a new fp16 gemm that
would cut bandwidth usage 2x; we can reuse the code in this diff for the
repacking required by the new gemm.
In this diff I had to take care of memory usage. Here are several
possible approaches to the transformation:
1. Perform the transpose on the fly, copying the memory. This is what is done in skinny gemm (FC with engine SKINNY).
   Cons: slow first execution; memory is replicated for each thread.
2. Copy the weights in the operator constructor; in dbg mode, verify on the fly that the hash of the original weights is unchanged.
   Cons: memory is still replicated for each thread.
3. Copy the weights in the Predictor constructor.
   Cons: if we have 2 predictors sharing the same weight blob (via PredictorContainer), we still get 3x more memory, i.e. the original weights plus one copy for each of the predictors in a container.
4. Replace the weights in the Predictor constructor, taking care of the mapping to support weight sharing within a Predictor container.
   This is the approach taken in this diff (see the sketch below); it solves the issues above and doesn't create any memory overhead.
   Cons: the logic became complex and requires a mutex at initialization time.
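A hedged sketch of approach 4, with all names hypothetical: repack each weight once under a mutex, keyed by the original blob, so predictors sharing a weight also share the repacked copy.
```cpp
#include <map>
#include <memory>
#include <mutex>
#include <vector>

struct Matrix { std::vector<float> data; int rows = 0, cols = 0; };

Matrix Transpose(const Matrix& w) {
  Matrix t{std::vector<float>(w.data.size()), w.cols, w.rows};
  for (int r = 0; r < w.rows; ++r)
    for (int c = 0; c < w.cols; ++c)
      t.data[c * w.rows + r] = w.data[r * w.cols + c];
  return t;
}

// Get-or-create the pretransposed copy for a weight blob. The mutex is
// only taken at predictor initialization time, as noted above.
std::shared_ptr<Matrix> GetOrCreatePretransposed(const Matrix* original) {
  static std::mutex mu;
  static std::map<const Matrix*, std::weak_ptr<Matrix>> cache;
  std::lock_guard<std::mutex> g(mu);
  if (auto cached = cache[original].lock()) return cached;
  auto repacked = std::make_shared<Matrix>(Transpose(*original));
  cache[original] = repacked;
  return repacked;
}
```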
Reviewed By: akyrola
Differential Revision: D6214593
fbshipit-source-id: 25da6ba7bfd39fc8f4b578094d3f334c7957490d
Summary: Adds the ability to create a local blob in the workspace even if the blob exists in the parent workspace. This is to support cases where a user wants to create a local copy of the blob and hide the blob from the parent workspace.
Reviewed By: akyrola
Differential Revision: D6194386
fbshipit-source-id: 92c064159ac635ee76c211abc013b72bd8752447
Summary:
Adding backward pass support for If operator:
- Implemented necessary changes to Do operator and generation of gradient Do operator to properly forward gradient blobs in and out of subnet
- Using WorkspaceManager to keep track of workspaces used by Do, in case we need to have access to local blobs to compute gradients (also important for loop's backprop)
- Update to Workspace to handle blob binding from multiple parent workspaces
- Implemented generation of gradient If operator
- Unit test to build and train a net with If control op
Reviewed By: azzolini
Differential Revision: D5745096
fbshipit-source-id: 1023c90a2113716254424d1e50b9e560fe9083e5
Summary:
Better isolation for workspaces, allowing selected blobs to be forwarded from a parent to a child workspace, possibly under new names. Used for proper isolation of subnets (loops, then/else branches, etc.) from the outer workspace.
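A hedged sketch of the forwarding mechanism, with API names assumed: the child resolves selected names, possibly renamed, to blobs owned by the parent.
```cpp
#include <memory>
#include <string>
#include <unordered_map>

struct Blob {};

class Workspace {
 public:
  explicit Workspace(Workspace* parent = nullptr) : parent_(parent) {}

  // Make the parent's blob `parent_name` visible here as `local_name`.
  void ForwardBlob(const std::string& local_name, const std::string& parent_name) {
    if (parent_) forwarded_[local_name] = parent_->GetBlob(parent_name);
  }

  Blob* GetBlob(const std::string& name) {
    auto fwd = forwarded_.find(name);
    if (fwd != forwarded_.end()) return fwd->second;
    auto own = blobs_.find(name);
    return own != blobs_.end() ? own->second.get() : nullptr;
  }

  Blob* CreateBlob(const std::string& name) {
    auto& slot = blobs_[name];
    if (!slot) slot = std::make_unique<Blob>();
    return slot.get();
  }

 private:
  Workspace* parent_;
  std::unordered_map<std::string, std::unique_ptr<Blob>> blobs_;
  std::unordered_map<std::string, Blob*> forwarded_;  // possibly renamed
};
```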
Reviewed By: azzolini
Differential Revision: D5681667
fbshipit-source-id: e61a2c7c98ee2abf1f0761905f4bfae47c201c32
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.
Reviewed By: igorsugak
Differential Revision: D5454343
fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
Summary:
Last time I used a uuid filled into OperatorDef, and operator_tracebacks was populated using traceback.extract_stack. There were several issues with this approach:
1. A random field in OperatorDef breaks workflows relying on memoization, i.e. when computation is skipped based on an already-computed result.
2. Adding one more field revealed that RNNs are not forward compatible with respect to new fields there. The prototxt format seems not to allow forward compatibility (thanks jamesr66a for the investigation!). For RNNs we need to switch to a more resilient approach. azzolini's proposed change to OperatorDef / NetDef would allow that by nesting NetDef directly inside OperatorDef without the need for extra serialization.
3. traceback.extract_stack is very slow when the executable is on a remote filesystem. It does one or more os.stat calls for each frame on the stack. In some cases this added up to 15 extra minutes of model construction.
In this diff I use a different approach, which should fix all the problems above.
1 and 2 are solved by not adding a new field at all. Instead I report the operator's index with respect to the net it runs in. Thanks akyrola and dzhulgakov for the idea. The downside is that operator-list manipulation breaks the logic, and separately created ops are not covered at all.
3 is solved by operating on raw frames without using the traceback and inspect modules, which end up doing a lot of filesystem calls. See the function extract_stacktrace in core.py with additional comments.
Reviewed By: dzhulgakov
Differential Revision: D5286285
fbshipit-source-id: 626dd0f5f6b8b1d86bd6bf519078b122f43ddcaa
Summary:
a few issues:
1. Randomization hurts memoization.
2. Even if we make it non-random, we can get key collisions when loading it back.
3. RNNs use prototxt for the step net, and apparently it is not forward compatible the way normal protobuf is.
I am thinking of a better, less invasive solution now.
Reviewed By: jamesr66a
Differential Revision: D5272118
fbshipit-source-id: ab577fad04fbfc632e1fceffa923377a0d3da1be
Summary: This is going to show a python Caffe2 user where a failed operator was created. The motivation for not putting this information right in the protobuf is to avoid making it too verbose and to keep the ability to read the protobufs of a net after a simple print() call.
Reviewed By: jamesr66a
Differential Revision: D5226047
fbshipit-source-id: 7edfe850e05a2ec209577142aa3368664a57a108
Summary: This RunPlan code is getting complex and confusing. The first step in cleaning it up is to move it out of workspace.cc to better mark the separation of concerns.
Reviewed By: kennyhorror
Differential Revision: D5100721
fbshipit-source-id: 4be0559eba1abb8bb1ddc3818698763c2e014ef2
Summary:
This is from a discussion with dzhulgakov: as a step towards revisiting core.Net autonaming, we will first guard against accidental overwrites of existing networks in the workspace.
ajtulloch since we are doing Predictors in mobile, this should be safe right?
azzolini - I assume this would be safe, but would love to get your approval.
akyrola - would this hurt xray?
Reviewed By: dzhulgakov
Differential Revision: D4897725
fbshipit-source-id: aa41271927ad6671f07a53b9505283623f8c49e5
Summary:
This diff adds an option to recurrent_net to define some cell blobs to be recomputed on the backward step, so that they don't need to be stored in the step workspace. This is done by modifying the backward step to automatically include all operators that are needed to produce the output that is to be recomputed, and by storing those blobs in a shared workspace. To enable the shared workspace, I had to modify the stepworkspaces blob to also store a forward shared workspace; making it a class field won't work, since the lifecycle of the blob does not match the lifecycle of the operator.
For basic LSTM, the performance hit is quite modest (about 15% with one setting), but your mileage may vary. For attention models, I am sure this is beneficial, as computing the attention blobs is not expensive.
For basic LSTM, the memory saving is wonderful: each forward workspace only has 4 bytes (for the timestep).
I also modified the neural_mt LSTM cells, but there is no test available, so I am not 100% sure I did it correctly. Please have a look.
Added options to LSTM, MILSTM and LSTMAttention to enable the memory mode.
Reviewed By: urikz
Differential Revision: D4853890
fbshipit-source-id: d8d0e0e75a5330d174fbfa39b96d8e4e8c446baa
Summary: Instead of reporting the total number of elements of a tensor, report the number of bytes; specifically, report the capacity of the tensor, not the bytes currently in use.
Reviewed By: jamesr66a, salexspb
Differential Revision: D4851633
fbshipit-source-id: 464d552f41f1b5f25753b0e7001d299b6dac1966
Summary:
Added the Caffe2 command-line option `--caffe2_print_blob_sizes_at_exit=1` which, when enabled, prints all tensor sizes in the workspace destructor. Handy especially when using sub-workspaces, as with RNNs. Note that the sizes are numbers of elements, not bytes. The output is designed to be easily copy-pasteable into Excel.
TODO: add sorting
Reviewed By: jamesr66a
Differential Revision: D4844628
fbshipit-source-id: 11608a1710ae5c89bbd741edb506d25496606185
Summary:
The main idea is that on the backward pass we don't need to keep all the backward outputs in memory. This diff addresses only the ones used internally in each private workspace, by creating a shared workspace that shares them all within the backward pass.
Another thing we can do is get rid of the state_grad blobs, but that would be a separate effort.
See the comments for a more detailed description.
Reviewed By: urikz
Differential Revision: D4784900
fbshipit-source-id: 2dd8fe1b1215217ce92c09d918582d76c3051630
Summary:
When the execution step represents something like:
```
for loop
  execution_step
    net1
  execution_step
    net2
    net3
```
the preparation cost for the execution step is too high.
This diff moves most of the shared information into the CompiledExecutionStep to save time.
After the change, the benchmark result for the parameter server handler is as follows (be aware that the first two have some variance):
```
INFO:__main__:==Summary==
INFO:__main__:Time <function case_if at 0x7f7160c32938> 0.0752924203873
INFO:__main__:Time <function case_loop at 0x7f7160c329b0> 0.0677666187286
INFO:__main__:Time <function case_simple_net at 0x7f7160c32a28> 0.0605396509171
INFO:__main__:Time <function case_one_loop at 0x7f7160c32aa0> 0.0611681699753
```
Before the change:
```
INFO:__main__:==Summary==
INFO:__main__:Time <function case_if at 0x7f19d079f848> 0.100815701485
INFO:__main__:Time <function case_loop at 0x7f19d079f8c0> 0.0864136457443
INFO:__main__:Time <function case_simple_net at 0x7f19d079f938> 0.0614696979523
INFO:__main__:Time <function case_one_loop at 0x7f19d079f9b0> 0.0598972082138
```
Reviewed By: azzolini
Differential Revision: D4643926
fbshipit-source-id: 5a4b97230ba778e0ff5cbafc8a216335a191068a
Summary:
This is a bit of a large diff, sorry about that. It includes basic shape- and type-inference functionality, based on YQ's Schema scaffolding. I added some helper functions to make it easier to write simple translations.
A bigger refactoring was needed for ConvPoolBase so that we could use the shape inference already there in the schema.
I annotated enough operators to be able to infer forward-pass shapes for a basic convnet, and added a test for that. I intend to bootcamp some annotations and annotate enough to handle ResNets fully. Need to think about gradients and whether they could be annotated in an easier way.
Only shapes are exposed to Python for now; types will follow later. Also, the inference is not yet called anywhere but the unit test.
Also, I am not sure everything is in the best location in the code, but it shouldn't be hard to move stuff around.
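A loose sketch of how per-operator inference functions can hang off the schema (names modeled on the approach described, not the exact API):
```cpp
#include <functional>
#include <vector>

struct TensorShape { std::vector<long> dims; };
using ShapeInferenceFn =
    std::function<std::vector<TensorShape>(const std::vector<TensorShape>&)>;

// Minimal schema object holding an optional shape-inference function.
struct OpSchema {
  OpSchema& TensorInferenceFunction(ShapeInferenceFn fn) {
    shape_fn = std::move(fn);
    return *this;
  }
  ShapeInferenceFn shape_fn;
};

// Example annotation: elementwise ops propagate the input shape unchanged.
OpSchema relu_schema = OpSchema().TensorInferenceFunction(
    [](const std::vector<TensorShape>& in) {
      return std::vector<TensorShape>{in[0]};
    });
```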
Reviewed By: dzhulgakov
Differential Revision: D4436818
fbshipit-source-id: eebee5937ccc9ac09c245465302388a1fae6933c