Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11688
As a first step to remove static context(merge with allocator), we'll create a
global registries for context constructors, and remove CreateContext function from tensor.
Reviewed By: ezyang, dzhulgakov
Differential Revision: D9779821
fbshipit-source-id: 8b239ea50af7a0556fde2382f58f79194f0e3dc1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11817
Blob::Serialize() and Blob::Deserialize() are now free functions SerializeBlob(), DeserializeBlob() instead.
This takes away access to Blob internals from them and makes future refactorings easier.
Reviewed By: ezyang
Differential Revision: D9882726
fbshipit-source-id: 3251ebd4b53fc12f5e6924a6e4a8db3846ab3729
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13
Pull Request resolved: https://github.com/pytorch/translate/pull/166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125
Closes https://github.com/pytorch/pytorch/pull/9125
Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later
Before Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserve the same semantics. For example, one has to specify device type in order to create a Tensor - there are no uninitialized tensors. More specifically the changes are:
1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. (Tensor(DeviceType type)),
2. Semantics of constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context); is changed, in this constructor, the second context is passed in to enable us to call the templated Copy function, it could be in a different context as source and target previously, now we'll enforce that the context should have same device type as src, if it is provided.
3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change
Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
Reviewed By: ezyang, houseroad
Differential Revision: D9024330
fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13
Pull Request resolved: https://github.com/pytorch/translate/pull/166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125
Closes https://github.com/pytorch/pytorch/pull/9125
Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later
Before Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserve the same semantics. For example, one has to specify device type in order to create a Tensor - there are no uninitialized tensors. More specifically the changes are:
1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. (Tensor(DeviceType type)),
2. Semantics of constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context); is changed, in this constructor, the second context is passed in to enable us to call the templated Copy function, it could be in a different context as source and target previously, now we'll enforce that the context should have same device type as src, if it is provided.
3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change
Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
Reviewed By: xw285cornell
Differential Revision: D8121878
fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
Summary:
Implementation of polling async net executor.
Notes:
- New net executor async_polling - schedules CPU and GPU ops asynchronously, uses single polling thread
- Events: update to Caffe2 events to support async CPU events, adding new methods:
Query() - non-blocking checking of event states: INITIALIZED -> RECORDED -> SUCCESS/FAILED
ErrorMessage() - when operation runs asynchronously and fails calling this on event will give error message
- Tasks: using existing DAGNet's algorithm to compute CPU and GPU chains, a separate task for each chain
- Polling: using single thread to query state of events - for CPU tasks atomically queries task state, for GPU task - uses cudaEventQuery; using Event
- Scheduling of CPU ops: using global thread pools
- Scheduling of GPU ops: using GPU thread pool per GPU device
Reviewed By: dzhulgakov
Differential Revision: D5985110
fbshipit-source-id: a9de7fcbb71d046a3aa1b573072b89a65dfeee8c
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.
Reviewed By: igorsugak
Differential Revision: D5454343
fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
Summary:
At the moment serialization can tak up to 3x memory of the largest blob:
original blob, BlobProto, SerializeAsString version of the blob. As a result in
certain cases serialization takes more memory than it should and it hurts
utilization/max model size per machines.
This diff is adding IOBound ThreadPool that should set quite strict limitation
on the extra memory overhead per one blob.
Reviewed By: dzhulgakov
Differential Revision: D5012887
fbshipit-source-id: 12dbb9d3efab136411ddeffd519b602cf606661e
Summary: Seems like a lot of confusion in the group lately has been about missing CUDA operators. Let's make it clearer in the error message.
Reviewed By: azzolini
Differential Revision: D4737037
fbshipit-source-id: 56c7819df909bf954510296703bff5f221fa8ae7
Summary: Makes it much nicer to spot errors, especially in iPython notebook.
Reviewed By: kennyhorror
Differential Revision: D4465726
fbshipit-source-id: c0adaf5168248a70987ff9d5dfce54a622ff2219
Summary: Previous implementation was just concatenating string which I believe is wrong. Instead let's turn off chunking when we don't ask for it.
Reviewed By: kennyhorror
Differential Revision: D4461311
fbshipit-source-id: 8b9a3325a40a1cd0a8ffeeb20a17bf9f57b7b0a9
Summary: This allows us to serialize things between MKLMemory and a TensorProto.
Reviewed By: dzhulgakov
Differential Revision: D4218044
fbshipit-source-id: 934181493b482cb259c17ff4b17008eac52fd885
Summary:
One more small batch of CHECKs that left in C2 codebase. Most of the left overs
should be in tests/GPU only code.
Reviewed By: Yangqing
Differential Revision: D4243782
fbshipit-source-id: a4a03c116ea8ba16facd2efc135746d5921f19d5
(1) various bugfixes.
(2) Tensor is now a class independent from its data type. This allows us
to write easier type-independent operators.
(3) code convention changes a bit: dtype -> T, Tensor<*Context> -> Tensor* alias.
(4) ParallelNet -> DAGNet to be more consistent with what it does.
(5) Caffe's own flags library instead of gflags.
(6) Caffe's own logging library instead of glog, but glog can be chosen with
compile-time definition -DCAFFE2_USE_GOOGLE_GLOG. As a result, glog macros
like CHECK, DCHECK now have prefix CAFFE_, and LOG(*) now becomes
CAFFE_LOG_*.
(7) an optional protobuf inclusion, which can be chosen with USE_SYSTEM_PROTOBUF
in build_env.py.
(2) blob serialization comments
(3) cudnn: putting it under a separate device name
so we can explicitly choose cudnn instead of
having CUDA device prioritizing it.
(4) note that mint is not available with ipython
due to zeromq conflict
(5) db_throughput utility
(6) added gprofiler
(1) added blob serialization.
(2) registry can now use key types other than string.
(3) changed load_save_op so they interface with a db.
(4) change sgd iter op: it does increments so we can resume an iter.
(5) mnist linear classifier tests snapshot functionality.
(6) added protodb which is a small wrapper over TensorProtos.