Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45177
## Motivation
* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are now blocking thus being non-cancellable. If an error
occurs we need to be able to safely stop all net execution so we can throw
the exception to the caller.
## Summary
* When an error occurs in a net or it got cancelled, running ops will have the
`Cancel` method called.
This diff adds `Cancel` method to the `SafeEnqueueBlobsOp`
and `SafeDequeueBlobsOp` to have the call queue->close() to force all the
blocking ops to return.
* Adds unit test that verified the error propagation.
Test Plan:
## Unit test added to verify that queue ops propagate errors
```
buck test caffe2/caffe2/python:hypothesis_test -- test_safe_dequeue_blob__raises_exception_when_hang --stress-runs 1000
```
```
Summary
Pass: 1000
ListingSuccess: 1
```
Reviewed By: d4l3k
Differential Revision: D23846967
fbshipit-source-id: c7ddd63259e033ed0bed9df8e1b315f87bf59394
Summary:
## Motivation
* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are now blocking thus being non-cancellable. If an error
occurs we need to be able to safely stop all net execution so we can throw
the exception to the caller.
* When an error occurs in a net or it got cancelled, running ops will have the
`Cancel` method called.
* This diff adds `Cancel` method to the `SafeEnqueueBlobsOp`
and `SafeDequeueBlobsOp` to have the call queue->close() to force all the
blocking ops to return.
* Adds unit test that verified the error propagation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44495
Test Plan:
## Unit Test added to verify that queue ops propagate errors
```
buck test caffe2/caffe2/python:hypothesis_test
```
Reviewed By: dzhulgakov
Differential Revision: D23236088
Pulled By: dahsh
fbshipit-source-id: daa90d9ee32483fb51195e269a52cf5987bb0a5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915
Since we now have C++14, we don't need these c10::guts helpers anymore
ghstack-source-id: 95777609
Test Plan: waitforsandcastle
Differential Revision: D18869639
fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16373
motivation: https://github.com/pytorch/pytorch/pull/12407
This is a manual diff.
most of the fixes should be:
```
auto* Y = Output(0);
Y->Resize(dims);
Y->raw_mutable_data(dtype);
```
-->
```
auto* Y = Output(0, dims, at::dtype(dtype));
```
But there might be other cases.
Reviewed By: dzhulgakov
Differential Revision: D13725460
fbshipit-source-id: 649a4b0e42f62cda1a60171dd9fa3e440dc9dca1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18123
the motivation of this fix is to resolve things like:
for(auto i = 0; i < N; i++) where N is bigger than int32
These instances of comparison were found by enabling -Wsign-compare
There are way too many things to fix, so issuing this as a series of fixes
The plan is to fix all these issues and then enable this flag into Caffe2 to catch future instances
Reviewed By: ZolotukhinM
Differential Revision: D14497094
fbshipit-source-id: bca3927a2188bd33a508fa503ba221c220cdaefe
Summary:
Implementation LeakyRelu operator for mkl-dnn,the speed-up of a single operation is up to 10X on BDW.
Implementation rashape operator for mkl-dnn,it will resolve occasionally crash issue which use fallback reshape operator.
Implementation CreateBlobQueue and SafeEnqueueBlobs operators,it will resolve crash issue which use fallback operators.
Fallback CreateBlobsQueueDBOp,TensorProtosDBInput,CloseBlobsQueue operators.
Implement adam operator for mkl-dnn,the speed-up of a single operator is up to 6X on BDW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11696
Reviewed By: yinghai
Differential Revision: D10100438
Pulled By: wesolwsk
fbshipit-source-id: 0b6e06897cc11e0a8e349d80a870b1e72e47f10d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14269
Removes reference to Context proper and instead adds a bool argument for async copy (the same as `copy_`)
For CopyFrom - I haven't tweaked all callsites yet. Instead I rely on a terrible hack that pointer to context is implicitly converted to bool when passed, haha :) It's not a good code and I propose to fix it in a follow up diff (maybe using clangr tooling).
Reviewed By: ezyang
Differential Revision: D13117981
fbshipit-source-id: 7cb1dc2ba6a4c50ac26614f45ab8318ea96e3138
Summary:
Hi guys,
I'd like to build Caffe2 with more supported options in Windows with Microsoft Visual Studios.
This is the first pull request.
Running scripts/build_windows_shared.bat is able to build Caffe2 with both CMAKE_BUILD_TYPE=Debug and CMAKE_BUILD_TYPE=Release with Visual Studio 14 2015.
CUDA is 9.0, cudnn is 7.0.5, glog, gflags and lmdb are supported on my system.
Python is 3.5, Detectron works from python interface as well.
It was even possible to debug detectron code and step into caffe2_gpu.dll with pdbs built.
What is disappointing, that c10/experimental ops don't build with this Visual Studio generator, I added special option INCLUDE_EXPERIMENTAL_C10_OPS (default ON) to deal with it in build_windows_shared.bat.
After this pull request the next step is to add Visual Studio 2017 support in the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13550
Reviewed By: ezyang
Differential Revision: D13042597
Pulled By: orionr
fbshipit-source-id: f313f909f599cd582a1d000eff766eef3a9fc4fc
Summary:
xw285cornell
- To make hip files to have unique filename extension we change hip files from _hip.cc to .hip (it's the only blessing option other than .cu in hipcc 3d51a1fb01/bin/hipcc (L552)).
- Change to use host compiler to compile .cc|.cpp files. Previously we use hcc to compile them which is unnecessary
- Change the hipify script to not replace "gpu" with "hip" in the filename of the generated hipified files. Previously we do this because hcc has a bug when linking files that have same filename. We have now changed to use host linker to do linking so this is unnecessary anymore.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14036
Reviewed By: xw285cornell
Differential Revision: D13091813
Pulled By: bddppq
fbshipit-source-id: ea3d887751d8abb39d75f5d5104aa66ce66b9ee0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12180
I had to fix a lot of call sites, because a lot of places assume that
you can actually get a const vector&, and if the internal representation
of sizes in a tensor is NOT a vector, it's not possible to fulfill
this API contract.
Framework changes:
- I deleted TensorImpl::dims(); caffe2::Tensor::dims() just forwards to
sizes() now.
- De-templatized SetDims; now it is an explicit list of ArrayRef and
variadic overloads. This makes implicit conversions work again,
so I don't need to explicitly list the std::vector cases too.
- As a knock-on effect, this causes Reset() to accept at::IntList as well as
const std::vector<int64_t>&
- Edited variadic overloads of SetDims to all forward to the underlying
arbitrary-dim implementation, reducing code duplication. (It's probably
marginally less efficient in the new world.)
- Replace Tensor constructor accepting const std::vector<int64_t>& with at::IntList
- Make MKLTensor accept ArrayRef along with vector in constructor and
Reset (unfortunately, no implicit conversions here, since it's templated on
index type.)
- There are a few other places, like cudnn, where I changed functions
that previously took const std::vector<int64_t>& to take at::IntList
instead.
Classification of call site changes:
- 'const std::vector<int64_t>& x_dims = x.dims()' ==>
'at::IntList x_dims = x.dims()'
- 'std::vector<int64_t> x_dims = x.dims()' ==>
'std::vector<int64_t> x_dims = x.dims().vec()' (we need a copy!)
Usually this is because we're about to mutably modify the vector
to compute some new dimension. However, it also very commonly occurs in the
form: 'x_dims_ = x.dims()' because we frequently cache sizes in operators.
- Instead of constructing std::vector<int64_t>{blah, blah}, construct an
at::IntList directly
ArrayRef changes:
- cbegin()/cend() iterators, they operate the same aas begin()/end() because
everything on ArrayRef is const.
- Moved operator<< into ArrayRef.h, so that it's always available when
working with ArrayRef. I also templated it, so it now works on an
ArrayRef of any type.
- Add operator== overload for ArrayRef, and also add variants to permit
comparison of ArrayRef with std::vector, a very common operation.
(The non-templated version of operator== can get these automatically
via implicit conversion, but with templates C++ refuses to do
any explicit conversions.)
I'm planning to audit all dims() call sites to make sure they don't
expect 'auto x = t.dims()' to give you an x whose lifetime can validly
outlive the tensor.
I opted not to do a dims() to sizes() rename, because dims() also matches
the protobufs accessor. Bad news!
Reviewed By: jerryzh168
Differential Revision: D10111759
fbshipit-source-id: a2a81dc4b92c22ad4b3b8ef4077a7e97b6479452
Summary:
TSIA. Right now we should basically use C10_EXPORT and C10_IMPORT for explicitly marking dllexport and dllimport, as a continued effort of the C10 unification.
This is a codemod by mechanically doing the following change:
CAFFE2_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
AT_CORE_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12019
Reviewed By: ezyang, teng-li
Differential Revision: D10016276
Pulled By: Yangqing
fbshipit-source-id: a420d62c43d1110105fc88f9e9076e28a3203164
Summary:
Properly annotated all apis for cpu front. Checked with cmake using
cmake -DUSE_ATEN=ON -DUSE_CUDA=OFF -DBUILD_ATEN=ON
and resulting libcaffe2.so has about 11k symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10504
Reviewed By: ezyang
Differential Revision: D9316491
Pulled By: Yangqing
fbshipit-source-id: 215659abf350af7032e9a4b0f28a856babab2454
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13
Pull Request resolved: https://github.com/pytorch/translate/pull/166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125
Closes https://github.com/pytorch/pytorch/pull/9125
Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later
Before Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserve the same semantics. For example, one has to specify device type in order to create a Tensor - there are no uninitialized tensors. More specifically the changes are:
1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. (Tensor(DeviceType type)),
2. Semantics of constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context); is changed, in this constructor, the second context is passed in to enable us to call the templated Copy function, it could be in a different context as source and target previously, now we'll enforce that the context should have same device type as src, if it is provided.
3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change
Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
Reviewed By: ezyang, houseroad
Differential Revision: D9024330
fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13
Pull Request resolved: https://github.com/pytorch/translate/pull/166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125
Closes https://github.com/pytorch/pytorch/pull/9125
Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later
Before Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserve the same semantics. For example, one has to specify device type in order to create a Tensor - there are no uninitialized tensors. More specifically the changes are:
1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. (Tensor(DeviceType type)),
2. Semantics of constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context); is changed, in this constructor, the second context is passed in to enable us to call the templated Copy function, it could be in a different context as source and target previously, now we'll enforce that the context should have same device type as src, if it is provided.
3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change
Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
Reviewed By: xw285cornell
Differential Revision: D8121878
fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
Summary:
* Likely need to test this so bad formatting can't be added in the future, but cleaning all operators so we at least have good examples.
* Formatting between our internal Facebook operator catalog and external caffe2.ai catalog are still slightly different. We'll work on this.
Closes https://github.com/caffe2/caffe2/pull/1846
Reviewed By: pjh5
Differential Revision: D6848570
Pulled By: orionr
fbshipit-source-id: b9bc0bfccb243d0440bd7b2406858cad8dc37e92
Summary:
Preivously in SafeDequeueOp, the in.dims()[0] would fail if in.ndim()=0.
However the error message if not informative. I added a Caffe_Enforce,
which would print out the input and output blob name. This is very helpful for
future debugging as well.
Differential Revision: D6821421
fbshipit-source-id: b07e5829a2c580aaaac88b0d9ff8d05f6da11713
Summary: Weighted sampling reader dequeue randomly chooses a hive reader to read a mini-batch. This diff allows dequeue to output the index of the randomly chosen table to a specific blob.
Reviewed By: kennyhorror
Differential Revision: D6621070
fbshipit-source-id: 754b981fc2bcfdb0146d2a0a5b677e7cfe74211b
Summary: This is a reapplication of the earlier PR due to xplat move. Original author is Christoph Conrads <christoph.conrads@fluent.ai> christoph-conrads .
Reviewed By: houseroad
Differential Revision: D6379736
fbshipit-source-id: b7482ecf3b9487a528c15e92976e915791210002
Summary: This will allow to do data reading in small batches and concat the batches later on.
Reviewed By: kennyhorror
Differential Revision: D5739129
fbshipit-source-id: 66a8087e5f9d10d654e367c6111ac90cbf54224e
Summary:
(1) BlobsQueue is causing a gcc error (google search suggeste it was a
bug, but we'll put the implementation in a separate cc file).
(2) Preparing for cuda 9: update cub.
(3) Prepare for cudnn 7: update cudnn rnn op.
(4) Fix an MSVC issue
Reviewed By: sf-wind, jerryzh168
Differential Revision: D5574352
fbshipit-source-id: 230820ce3ceaa32bee8323bdc509de352c93fcf2
Summary:
Caffe2: add a DB that's wrapped around a BlobsQueue as an adapter for data from non-DB interface.
This is useful for bridging the gap between DB interface data processing ops (TensorProtosDBInput, ImageInputOp etc.) and data that's coming from arbitrary Python or the pretty intricate Hive reader.
Reviewed By: akyrola
Differential Revision: D5554560
fbshipit-source-id: 01bb0056410f9ade205367d5fefc721f91f5b629
Summary: Add check that every time we register a caffe operator to CPU or GPU that documentation is added for the particular operator.
Reviewed By: dzhulgakov
Differential Revision: D5443110
fbshipit-source-id: 3793c3d29bea1228078cb30bdf8243ac0ab90664
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.
Reviewed By: igorsugak
Differential Revision: D5454343
fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
Summary: Add lint rule to check that every time we register a caffe operator to CPU or GPU that documentation is added for the particular operator.
Reviewed By: dzhulgakov
Differential Revision: D5348078
fbshipit-source-id: c3fa22fc7ca8066d5fc8fa780b23d7867fd3380e
Summary: As title. This helps with (quite common) cases where data input is stuck for reason or another, and the net execution never proceeds and is stuck forever.
Reviewed By: andrewwdye
Differential Revision: D5409885
fbshipit-source-id: 840261fd5964408f788fc0f50ece0d74193694ac
Summary:
RFC. This is a naive implementation of Rebatchin Queue for MultiTask
effort. Full disclaimer, I'm very new to Caffe/Machine Learning and I'm doing
dodge science here (under Dmytros supervision), so please be extra tough on
this review so I
can learn best practices :)
Differential Revision: D4871970
fbshipit-source-id: 924820ef0fce45b5e2bdabeec9885cbafa23a880