Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16478
This diff includes an example registration of a caffe2 op in torch. A previous attempt ran into a static initialization order bug.
Reviewed By: smessmer
Differential Revision: D13854304
fbshipit-source-id: ec463ce2272126d08a5163d1599361ee5b718bbc
Summary:
Just noticed while building on a machine without cuDNN present - it was building, but the runtime failed since some methods weren't bound.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16701
Differential Revision: D13937247
Pulled By: dzhulgakov
fbshipit-source-id: c81f05be7a9e64a1a8591036dcf8692c0ed4064e
Summary:
Add the Winograd conv method. Users can select direct conv or Winograd conv in the model file.
We closed the original PR https://github.com/pytorch/pytorch/pull/12154 and created this new one for easier rebasing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15196
Differential Revision: D13463721
Pulled By: yinghai
fbshipit-source-id: c5cd5c8aa7622ae7e52aeabd3dbb8ffb99b9b4ee
Summary:
- Skip the test due to flaky behavior on AMD/ROCm
- The fix is expected in ROCm 2.2 (HSA runtime)
bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16639
Differential Revision: D13915231
Pulled By: bddppq
fbshipit-source-id: 66e1d275836337170b15ceb9d60cfdd3242d4df8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16630
Two PRs landed concurrently - enforcing tensor constraints and refactoring c10. Since it's not prod code, disable the test and I'll let Sebastian fix it properly.
Reviewed By: ezyang
Differential Revision: D13908117
fbshipit-source-id: 381c5626078b794afa1fc7a95cb1ea529650424c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15388
This is another pass to make the perfkernels code safer from illegal instruction errors.
Removed the dependency on c10/util/Logging.h.
We err on the safe side at the expense of some verbosity.
Reviewed By: dskhudia
Differential Revision: D13502902
fbshipit-source-id: 4f833115df885c5b4f8c1ca83b9badea1553f944
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16246
The op schema says it returns multiple values, so let's actually return multiple values instead of one tuple.
For some reason, this did work when called from Python (probably some auto-unpacking),
but once called from JIT, it segfaulted. This diff fixes that.
Reviewed By: dzhulgakov
Differential Revision: D13780147
fbshipit-source-id: fe94f82f4c53b7454f77c4484fca4ac9dc444475
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16374
This fixes the original attempt in OSS (adds it to the CMake and Python build files).
Reviewed By: smessmer
Differential Revision: D13821061
fbshipit-source-id: 82f0dade0145fd04bdf8e3cb3954b5790e918162
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16191
LogDevice-related modifications for the generic feature type.
We directly convert the generic feature structures to JSON strings, which correspond to the column input in offline and dper.
Reviewed By: itomatik
Differential Revision: D13551909
fbshipit-source-id: 807830c50bee569de202530bc3700374757793a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16350
Example usage of the new caffe2 integration
Reviewed By: smessmer
Differential Revision: D13408546
fbshipit-source-id: 87240ca7f48d653a70241d243aa0eb25efa67611
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16335
Group conv is not implemented with the EIGEN engine, so this diff disables the related tests.
Reviewed By: jamesr66a
Differential Revision: D13807204
fbshipit-source-id: 41f6de43da40882f57e64474520e185733caefb7
Summary:
Based on offline discussion it should be less surprising to the users of existing code. Thus caffe2::Tensor is now a move-only class (as it used to be); explicit calls to UnsafeSharedInstance() are necessary to get shared_ptr behavior.
This change also identified a few places that misused the copy constructor; those are fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15416
Reviewed By: Yangqing
Differential Revision: D13524598
fbshipit-source-id: aea12d6dff77342606fa88ce4ddddbff266245a7
Summary:
1. Add some Gloo communication operators to the related fallback list;
2. Work around compile errors when using a fallback operator whose CPU operator inherits directly from 'OperatorBase', like PrefetchOperator;
3. Add new CPU context support for some Python module files and the resnet50 training example file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11330
Reviewed By: yinghai
Differential Revision: D13624519
Pulled By: wesolwsk
fbshipit-source-id: ce39d57ddb8cd7786db2e873bfe954069d972f4f
Summary:
bypass-lint
- Change all Caffe2 builds to use setup.py instead of cmake
- Add a -cmake- Caffe2 build configuration that uses cmake and only builds cpp
- Move skipIfCI logic from onnx test scripts to the rest of CI logic
- Removal of old PYTHONPATH/LD_LIBRARY_PATH/etc. env management
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15917
Reviewed By: orionr
Differential Revision: D13637583
Pulled By: pjh5
fbshipit-source-id: c5c5639db0251ba12b6e4b51b2ac3b26a8953153
Summary:
This is a follow-up to #13945, where we had to turn off some TRT tests because some ops were not ready to accept ONNX opset 9+ models. This PR fixes Reshape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15380
Differential Revision: D13649825
Pulled By: houseroad
fbshipit-source-id: b72e62803de5b63cc001c3fe4b3bf64dfa996e94
Summary:
Implement the LeakyRelu operator for MKL-DNN; the speed-up of a single operation is up to 10X on BDW.
Implement the reshape operator for MKL-DNN; it resolves an occasional crash issue when using the fallback reshape operator.
Implement the CreateBlobQueue and SafeEnqueueBlobs operators; this resolves a crash issue when using the fallback operators.
Fall back the CreateBlobsQueueDBOp, TensorProtosDBInput, and CloseBlobsQueue operators.
Implement the Adam operator for MKL-DNN; the speed-up of a single operator is up to 6X on BDW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11696
Reviewed By: yinghai
Differential Revision: D10100438
Pulled By: wesolwsk
fbshipit-source-id: 0b6e06897cc11e0a8e349d80a870b1e72e47f10d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15865
Factored out code used in tests for the Add, Mul, and Sub operators
into two new methods: one to generate the test vectors, and one
to run the actual tests given a caffe2 and Python operator.
Reviewed By: houseroad
Differential Revision: D13526955
fbshipit-source-id: 8970ba5a1305ca19a54a14b51816d4a19f19d678
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15553
Add a unit test and implementation of the NHWC layout for the Resize operator.
Also, add a parallel-loop pragma to the old NCHW layout.
Reviewed By: jspark1105
Differential Revision: D13540762
fbshipit-source-id: eebf252bf0d1efdff180a171d804181045f100a5
Summary:
Enable conv+add fusion, same as conv+sum.
Caution: only element-wise add without scalar broadcast is supported on IDEEP.
Otherwise, the fusion is illegal.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15268
Differential Revision: D13577375
Pulled By: yinghai
fbshipit-source-id: 92c9c4b667c5ca5f7a262a5bffaa8aa68eeff3bd
Summary:
Hello,
This is a little patch to fix `DeprecationWarning: invalid escape sequence`.
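For context, a minimal illustration of the kind of string that triggers this warning and the usual raw-string fix (an assumed example, not the patched code itself):

```python
import re

# A pattern written as "foo\d" (single backslash in a normal string literal)
# triggers "DeprecationWarning: invalid escape sequence \d" when compiled.
# Two equivalent fixes:
pattern_raw = r"foo\d+"      # raw string literal
pattern_escaped = "foo\\d+"  # explicitly escaped backslash

assert pattern_raw == pattern_escaped
assert re.match(pattern_raw, "foo123") is not None
```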
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15733
Differential Revision: D13587291
Pulled By: soumith
fbshipit-source-id: ce68db2de92ca7eaa42f78ca5ae6fbc1d4d90e05
Summary:
Support size 0 in any of the tensor dimensions in MKL-DNN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15295
Differential Revision: D13573747
Pulled By: yinghai
fbshipit-source-id: 5bf7a0b9e2567e80f44981a7823be5407fc94e53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15625
3D group conv (both NCHW and NHWC layout) was not correct.
Added group=2 in test_1d_convolution and test_3d_convolution in conv_test
Reviewed By: protonu
Differential Revision: D13562099
fbshipit-source-id: 586e8a7574a2764f2a3b559db6c2415b3ab90453
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15417
Right now the way we test whether a Blob contains a CPU tensor in ```PythonOpBase``` is broken, which means the non-CPU path might never be taken.
Searching through the codebase, the non-GPU path is used in PythonDLPack, and that is used in PytorchOp, which is unused. So we'll remove the non-GPU path in this diff.
Reviewed By: dzhulgakov
Differential Revision: D13495011
fbshipit-source-id: 9fe9537f05026d2a2cf7051efa81d184de722710
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15632
Just formatting and a few lints.
Reviewed By: yinghai
Differential Revision: D13562403
fbshipit-source-id: c56f8ee61f68cdaccc0828a764ff729454f68259
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15588
Use the NHWC2NCHW and NCHW2NHWC functions, which are easier to understand than code using transpose and generalize to non-2D convolutions.
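A rough NumPy sketch of what such helpers do (illustrative only; the actual NHWC2NCHW/NCHW2NHWC utilities are C++): moving the channel axis between the last position and position 1 generalizes directly to 1D and 3D convolutions.

```python
import numpy as np

def nhwc_to_nchw(x):
    # Move the trailing channel axis to position 1; works for any spatial rank,
    # e.g. NHWC -> NCHW for 2D or NDHWC -> NCDHW for 3D.
    return np.moveaxis(x, -1, 1)

def nchw_to_nhwc(x):
    # Inverse: move the channel axis from position 1 back to the end.
    return np.moveaxis(x, 1, -1)

x = np.random.rand(2, 5, 7, 3)                # NHWC
assert nhwc_to_nchw(x).shape == (2, 3, 5, 7)  # NCHW
assert np.array_equal(nchw_to_nhwc(nhwc_to_nchw(x)), x)
```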
Reviewed By: csummersea
Differential Revision: D13557674
fbshipit-source-id: c4fdb8850503ea58f6b17b188513ae2b29691ec0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15082
We didn't have a unit test for low-precision rowwise Adagrad.
Reviewed By: chocjy
Differential Revision: D13300732
fbshipit-source-id: 46e7bdfc82c5a6855eeb6f653c0a96b0b3a20546
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15389
SparseLengthsMean was generating uninitialized data for empty inputs (lengths == 0). We should return zeros.
The unit tests were also not covering this special case, which is fixed by this diff.
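A small NumPy reference of the intended behavior (a sketch, not the operator code): segments with length 0 contribute a row of zeros instead of uninitialized data.

```python
import numpy as np

def sparse_lengths_mean_ref(data, indices, lengths):
    # data: (N, D) values; indices: rows to gather; lengths: segment sizes.
    out = np.zeros((len(lengths), data.shape[1]), dtype=data.dtype)
    offset = 0
    for i, length in enumerate(lengths):
        if length > 0:  # empty segments (length == 0) stay all-zero
            out[i] = data[indices[offset:offset + length]].mean(axis=0)
        offset += length
    return out

data = np.arange(12, dtype=np.float32).reshape(4, 3)
print(sparse_lengths_mean_ref(data, np.array([0, 2, 3]), np.array([2, 0, 1])))
```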
Reviewed By: salexspb
Differential Revision: D13515970
fbshipit-source-id: 3c35265638f64f13f0262cee930c94f8628005da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15458
Many nets in the wild seem to have outputs that are never produced by the net.
Reviewed By: ZolotukhinM
Differential Revision: D13534185
fbshipit-source-id: 2b23b39c28404c53f68868f3bf6df53c5fea9eab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15453
Just move things around to facilitate further development. No logic change.
Reviewed By: rdzhabarov
Differential Revision: D13533959
fbshipit-source-id: eebab1306939e802aacffb24a711d372fd67916c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15174
Previously, Caffe2 maintained a separate per-thread per-device
current logical CUDA stream ID. In this PR, we switch Caffe2 over
to using c10::Stream to manage the current stream, and also
manage the allocation of cudaStream_t objects.
This results in a slight behavior change: previously, Caffe2
would have been willing to allocate an arbitrary number of
CUDA streams, depending on how high the logical stream IDs
went. The c10::Stream pool has a fixed number of streams; once
you exceed it, it wraps around.
Reviewed By: dzhulgakov
Differential Revision: D13451550
fbshipit-source-id: da6cf33ee026932a2d873835f6e090f7b8a7d8dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15366
Swap the old implementation with one that is slightly easier to understand.
I ran the tests and compared the number of chains against the old algorithm. This one outperforms it on every test, but we have yet to see if that impacts performance at all.
- old chain: 34, nomnigraph chain: 25
- old chain: 46, nomnigraph chain: 34
- old chain: 228, nomnigraph chain: 188
- old chain: 397, nomnigraph chain: 338
Reviewed By: ilia-cher
Differential Revision: D13057451
fbshipit-source-id: ccd050bfead6eb94ab9c7b0a70b09a22c2b9e499
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15250
This adds `__repr__` methods to all of the classes under task.py. This makes the objects much easier to interact with when using them in an interactive manner, such as in a Jupyter notebook.
The default `__repr__` method just returns the object ID, which is very unhelpful.
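As an illustration of the pattern (a hypothetical stand-in, not the actual task.py code), a `__repr__` that surfaces a few key fields reads much better in a notebook than the default `<... object at 0x...>`:

```python
class Task(object):  # hypothetical stand-in for the classes in task.py
    def __init__(self, name, node, outputs=None):
        self.name = name
        self.node = node
        self.outputs = outputs or []

    def __repr__(self):
        # Show the fields a user actually cares about instead of the object id.
        return "Task(name={!r}, node={!r}, outputs={})".format(
            self.name, self.node, len(self.outputs))

print(Task("reader", "trainer:0"))  # Task(name='reader', node='trainer:0', outputs=0)
```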
Reviewed By: hanli0612
Differential Revision: D13475758
fbshipit-source-id: 6e1b166ec35163b9776c797b6a2e0d002560cd29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15191
OSS: just splitting out basic flags from a unit test, so I can extend them in another test where I need to add additional flags.
Reviewed By: yinghai
Differential Revision: D13159184
fbshipit-source-id: 9823e792cf0ed8d0379235c44564862b7d784845
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15110
Support casting to string on CPU.
Reviewed By: intermilan
Differential Revision: D13429381
fbshipit-source-id: b737a1ba1237b10f692d5c42b42a544b94ba9fd1
Summary:
The speed-up of a single operation is up to 3X.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15106
Differential Revision: D13429596
Pulled By: bddppq
fbshipit-source-id: f8d987cafeac9bef9c3daf7e43ede8c6a4ee2ce5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14631
Adding an empty name scope to allow people to jump out from the current name scope.
This could be useful when you want to access a blob from a parent or sibling scope.
Facebook:
e.g.: we encountered a potential use case in D13124249 (it's a large diff; please search for EmptyNameScope in that diff): we need to access a blob declared in the root name scope from a device name scope (the device name scope is used by the parallel_GPU API). `EmptyNameScope` can help us do that with ease.
I referenced `EmptyDeviceScope` (D6103412) while implementing this one.
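A minimal sketch of the idea, assuming a thread-local name-scope stack like caffe2's core.NameScope (names and details here are illustrative, not the actual implementation):

```python
import contextlib

_namescope = []  # hypothetical stand-in for the current name-scope stack

@contextlib.contextmanager
def NameScope(name):
    _namescope.append(name)
    try:
        yield
    finally:
        _namescope.pop()

@contextlib.contextmanager
def EmptyNameScope():
    # Temporarily clear the whole stack so blob names resolve against the root scope.
    saved = list(_namescope)
    del _namescope[:]
    try:
        yield
    finally:
        _namescope[:] = saved

def scoped(blob):
    return "/".join(_namescope + [blob])

with NameScope("gpu_0"):
    assert scoped("w") == "gpu_0/w"
    with EmptyNameScope():
        assert scoped("w") == "w"  # reaches the root-scope blob
```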
Reviewed By: yinghai
Differential Revision: D13272240
fbshipit-source-id: d4cde5abcc2336e456b6c6ef086266ef94d86da8
Summary:
…done once
This allows no-op builds to work correctly even when BUILD_CAFFE2_OPS is on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14982
Differential Revision: D13413960
Pulled By: zdevito
fbshipit-source-id: 6e5412a8c375af8a47c76f548cdd31cff15f3853
Summary:
Currently in caffe2, one cannot properly fetch the content of Int8 blobs.
Upon digging into the source code, it turns out that the relevant source code is not being compiled. Adding the source to CMakeLists.txt fixes this issue.
First time ever doing a pull request. Please let me know if there's any rule I should follow. Thanks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15047
Differential Revision: D13417583
Pulled By: bddppq
fbshipit-source-id: dd39575971a3012635edbf97a045d80e4b62a8eb
Summary:
Fix auto grad summing for IfOp where an intermediate output needs renaming.
Bug before this diff:
- we only renamed the output of IfOp without changing the subnet ops' outputs
- this results in a "blob not found" error
The unit test provides an example.
This diff fixes that for IfOp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14772
Differential Revision: D13327090
Pulled By: harouwu
fbshipit-source-id: ec40ee88526ace3619c54551e223dd71158a02f8
Summary:
This will let us install tests and other Caffe2 python code as a part of running Caffe2 tests in PyTorch.
Broken out of https://github.com/pytorch/pytorch/pull/13733/
cc pjh5 yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14898
Reviewed By: pjh5
Differential Revision: D13381123
Pulled By: orionr
fbshipit-source-id: 0ec96629b0570f6cc2abb1d1d6fce084e7464dbe
Summary:
This pull request contains changes for:
1. Added MIOpen RNN API miopenGetRNNLayerBiasSize and miopenGetRNNLayerParamSize.
2. Fixed usage of API miopenGetRNNLayerParam.
3. Modifying the RNN test to run using MIOpen engine.
Differential Revision: D13355699
Pulled By: bddppq
fbshipit-source-id: 6f750657f8049c5446eca893880b397804120b69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13756
This implements general Gather operator for arbitrary axis, sharing the code with BatchGather.
- CPU gather & batch gather logic is now shared through caffe2::gather_helper, for any axis.
- Shared CUDA kernel moved to gather_op.cuh, for any axis.
- Gradients of axis > 0 delegate to BatchGatherGradientOp which now has axis argument.
- BatchGatherOp doc strings updated to have correct rank (q + (r -1)) and output.
- Added tests for axis == 2.
GatherOp supports index wrapping for axis == 0 by default, which was added earlier for ONNX.
This diff also extends it to work in the CUDA kernel. Added a "wrap_indices" argument which specifies
whether this wrapping should be done; set it to true if you'd like wrapping for any axis.
TBD: Update gradients to support negative indices (separate diff).
TBD: Once we have operator versioning, we'd like to update GatherOp to NOT support axis 0 wrapping
by default, but rather do it only if wrap_indices is set.
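A NumPy sketch of the wrapping arithmetic (an illustrative reference, not the kernel code): negative indices are shifted by the dimension size along the gather axis before indexing.

```python
import numpy as np

def wrap_indices(indices, dim_size):
    # Shift negative indices by the size of the gather axis, so -1 refers to
    # the last entry along that axis.
    indices = np.asarray(indices)
    return np.where(indices < 0, indices + dim_size, indices)

data = np.arange(6).reshape(2, 3)
idx = wrap_indices(np.array([-1, 0]), data.shape[0])  # -> [1, 0]
print(np.take(data, idx, axis=0))                     # rows 1 then 0
```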
Reviewed By: dzhulgakov
Differential Revision: D12983815
fbshipit-source-id: 8add9d67b47fe8c5ba7a335f581ca0530b205cd7
Summary:
The goal of this PR is to unify the CUDA and HIP device types in the caffe2 Python front end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14221
Differential Revision: D13148564
Pulled By: bddppq
fbshipit-source-id: ef9bd2c7d238200165f217097ac5727e686d887b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14268
Removes the need for Context in Tensor by doing simple dispatch for CopyBytes. It'd eventually be subsumed by Roy Li's changes of a proper copy_ op, but before that is done, let's get a clear picture of how copies are implemented and clean up some cruft in the CopyFrom implementation.
Note that with these changes one can probably get rid of Context::CopyFromCPU/CopyToCPU, but it's a matter for follow-up diffs.
This diff doesn't change the API of Tensor yet, but relies on the fact that passing `Context` to CopyFrom makes copy async if the device is CUDA and doesn't have any effect otherwise (that's how Context methods are implemented).
This doesn't change semantics of copy async implementation - as before it blindly calls cudaMemcpyAsync which probably means that it can be misused if invoked separately outside of operator body. I'll leave it for the follow up copy_ unification.
For Extend() we always do an async copy - it makes sense as it's an in-place device-to-device operation, and only a further op would observe the result.
Note: there are now three ways of invoking copy in C2 code - templated CopyBytes, virtual CopyFromCPU/etc, and double-dispatch free method here. Hopefully we can get rid of the second one.
Also, please advise whether it's c10-worthy :)
Reviewed By: ezyang
Differential Revision: D13117987
fbshipit-source-id: a6772d6dcf3effaf06717da3a656fc9873b310b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14342
Sometimes, when we are creating a TaskGroup, we are in fact creating a TaskGroup for a distributed job. In some cases, we may want to register a few nets as "remote" to a TaskGroup. The remote nets should have sufficient attributes describing where they should be executed later on.
This diff adds the remote net attribute to the TaskGroup class. It exposes two minimal functionalities: adding a remote net, and getting all remote nets added to a TaskGroup.
Reviewed By: d4l3k
Differential Revision: D13188320
fbshipit-source-id: efe947aec30817e9512a5e18be985713b9356bdc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14143
ConvTranspose has a per-operator attribute rename, which meant that the
global attribute rename for kernels => kernel_shape was not applied.
Changing the behavior so that the global renames always apply, but per-op
renames can override those for specific attributes.
Note: The python frontend path isn't actually used for ConvTranspose, but I
thought it would be good to make it consistent.
Reviewed By: yinghai
Differential Revision: D13113395
fbshipit-source-id: cd3f124b4b5c753a506d297138b7d002b51bfb38
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14196
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13641
The FeedTensor function used to take a pointer to a Tensor and feed the content using Resize
and mutable_data, but since Tensor is a pointer now, we can just return a Tensor instead.
Reviewed By: dzhulgakov
Differential Revision: D13091163
fbshipit-source-id: 9abf2fd320baca76e050530c500dd29f8e2d0211
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14214
This is to pick up the residual task of T36325466 to make sure that input/output binding of c2 Onnxifi op is positional.
Reviewed By: dzhulgakov
Differential Revision: D13134470
fbshipit-source-id: d1b916dade65c79133b86507cd54ea5166fa6810
Summary:
Add "axis" and "axis_w" arguments in FC to support customized axix to reduce dim.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12971
Reviewed By: bddppq
Differential Revision: D12850675
Pulled By: yinghai
fbshipit-source-id: f1cde163201bd7add53b8475329db1f038a73019
Summary:
xw285cornell
- To make HIP files have a unique filename extension, we change them from _hip.cc to .hip (it's the only blessed option other than .cu in hipcc 3d51a1fb01/bin/hipcc (L552)).
- Change to use the host compiler to compile .cc|.cpp files. Previously we used hcc to compile them, which is unnecessary.
- Change the hipify script to not replace "gpu" with "hip" in the filenames of the generated hipified files. Previously we did this because hcc has a bug when linking files that have the same filename. We have now changed to use the host linker for linking, so this is no longer necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14036
Reviewed By: xw285cornell
Differential Revision: D13091813
Pulled By: bddppq
fbshipit-source-id: ea3d887751d8abb39d75f5d5104aa66ce66b9ee0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13795
Add the ability to generate a dot string for a single subgraph, plus Python bindings (which is pretty useful for model exploration in Python).
Restructure the DotGenerator class a bit to make it easier to implement this feature.
Reviewed By: bwasti
Differential Revision: D13010512
fbshipit-source-id: 825665438394b7e6968ab6da167b477af82a7b62
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14074
Expose inducesEdges and addNode to Python's NNSubgraph. This makes it easy to manually construct an NNSubgraph in Python.
Reviewed By: bwasti
Differential Revision: D13092885
fbshipit-source-id: a94ed0b318162e27e3a4b5a4954eb6d169da7405
Summary:
Currently, after performing export, the predict_net proto ends up with two external_input
entries for the input data, because external_input is extended twice: once separately using
the input blob, and once by extending with all the external_input entries from the proto,
in which the input blob is already included.
Signed-off-by: Parth Raichura <parth.raichura@softnautics.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12979
Differential Revision: D12916349
Pulled By: soumith
fbshipit-source-id: 4d4a1c68c0936f8de3f4e380aea1393fe193cd2d
Summary:
Avoid false failures by checking for the presence of the test data in setup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13627
Differential Revision: D13090324
Pulled By: ezyang
fbshipit-source-id: e85571943d168c0007212d7b1a5b99ffa0c39235
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13825
async_polling was an intermediate step towards async_scheduling and is not used
Reviewed By: yinghai
Differential Revision: D13019059
fbshipit-source-id: eee6ba53e7f476ddb481afba3bf1768303864d32
Summary:
This pull request contains changes for:
1. Removing ConvTranspose related changes from caffe2/operators/hip/conv_op_miopen.cc
2. Adding the file caffe2/operators/hip/conv_transpose_op_miopen.cc
3. Modifying the tests to run convTranspose op using MIOpen engine
Differential Revision: D13055099
Pulled By: bddppq
fbshipit-source-id: ca284f8f9a073005b22013c375cc958257815865
Summary: Currently LambdaRank applies exponential emphasis on relevance, i.e., g = 2^rel when calculating DCG; this diff adds an option that supports g = rel in the loss function.
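A small NumPy illustration of the two gain choices when computing DCG (a sketch of the formula only, not the loss implementation):

```python
import numpy as np

def dcg(relevances, exponential_gain=True):
    rel = np.asarray(relevances, dtype=np.float64)
    gain = 2.0 ** rel if exponential_gain else rel          # g = 2^rel vs. g = rel
    discounts = 1.0 / np.log2(np.arange(2, len(rel) + 2))   # 1 / log2(rank + 1)
    return float(np.sum(gain * discounts))

ranked_relevances = [3, 2, 0, 1]
print(dcg(ranked_relevances, exponential_gain=True))   # emphasizes highly relevant items
print(dcg(ranked_relevances, exponential_gain=False))  # linear gain, g = rel
```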
Reviewed By: itomatik
Differential Revision: D9891514
fbshipit-source-id: 64730d467a665670edd37e6dc1c077987991d1a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13641
The FeedTensor function used to take a pointer to a Tensor and feed the content using Resize
and mutable_data, but since Tensor is a pointer now, we can just return a Tensor instead.
Reviewed By: ezyang
Differential Revision: D12873145
fbshipit-source-id: 653735c20d611ff6ac9e380d8b3c721cb396a28f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13798
The semantics of C2 and ONNX Concat are a bit different. C2 Concat accepts an "add_axis" arg and will raise the dimensionality if it is set. This is equivalent to attaching a Reshape after a plain Concat in ONNX.
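A NumPy sketch of the difference (illustrative): with add_axis, C2's Concat behaves like a stack, which in ONNX terms is a plain Concat followed by a Reshape.

```python
import numpy as np

a = np.random.rand(2, 3)
b = np.random.rand(2, 3)

plain = np.concatenate([a, b], axis=1)    # (2, 6): ONNX-style Concat
with_add_axis = np.stack([a, b], axis=1)  # (2, 2, 3): like C2 Concat with add_axis=1

# The add_axis result equals the plain concat followed by a reshape.
assert np.array_equal(with_add_axis, plain.reshape(2, 2, 3))
```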
Reviewed By: rdzhabarov
Differential Revision: D13012867
fbshipit-source-id: da23e555bae709fd2a373b04dcb9db4e984ae315
Summary:
There was a bug in the uniqueness check that made only the first run unique.
Reviewed By: duc0
Differential Revision: D13013504
fbshipit-source-id: ecf7526d0fafd7968f1301734123f93968efef46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13812
Original commit changeset: 2cf95bdc5ed8
Looks like in iOS, `uint64_t` is not the same as `size_t`. :( Fixed it here.
Reviewed By: houseroad
Differential Revision: D13017390
fbshipit-source-id: d33854ce341225aba372fb945c3704edc14f9411
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13745
We need to support types besides `int64` and `float`.
Reviewed By: bddppq, rdzhabarov
Differential Revision: D12967258
fbshipit-source-id: 688076e6f504b2bf24bba89714df87a678c5638a
Summary:
Add a markdown document summarizing the coverage of serialized operator tests. This currently only takes into account what has been covered by the tests with respect to the entire registry of c2 operators.
Next, we will break down the coverage by which operators have unit tests associated with them, which have hypothesis tests, and which have tests more specifically calling assertReferenceChecks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13703
Reviewed By: dzhulgakov
Differential Revision: D12970810
Pulled By: ajyu
fbshipit-source-id: 4f0cd057b1cf734371333e24d26cbab630a170e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13377
* Enable junk fill for the default CPU allocator. The first diff only enables this for the tests. A second diff will change the default of zero-fill to false.
* Fix tests to use the 64-bit counters that IterOp and LearningRateOp demand.
* Fix kernels that use uninitialized memory.
Reviewed By: salexspb
Differential Revision: D10866512
fbshipit-source-id: 17860e77e63a203edf46d0da0335608f77884821
Summary:
I was hitting this error:
caffe2/caffe2/operators/stats_put_ops.h:66:25: runtime error: 9.22337e+18 is outside the range of representable values of type 'long'
So the assignment from int64_t to float loses some precision, and because of that we overflow.
Reproduced this issue with diff D12945013.
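A quick demonstration of the failure mode (not the operator code): the largest int64 is not exactly representable as a float and rounds up to a value just outside the int64 range.

```python
import numpy as np

max_int64 = np.iinfo(np.int64).max  # 9223372036854775807
as_float = float(max_int64)         # nearest float is 2**63 = 9.223372036854776e+18

print(as_float)              # 9.223372036854776e+18, the value in the error above
print(as_float > max_int64)  # True: the rounded value already exceeds int64's range,
                             # so converting it back to int64 overflows
```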
Reviewed By: mlappelbaum, jdshi-fb
Differential Revision: D12927086
fbshipit-source-id: 7eae7fe25ab49d5ac15279335bd5b1fa89d6e683
Summary: Add fetching of the real-number representation for int8 tensors in workspace.py.
Reviewed By: harouwu
Differential Revision: D12936556
fbshipit-source-id: f8756a37bce21c93d44d52faf5da9c9bd6473f4a
Summary:
We updated the description of upsample_op in onnx: https://github.com/onnx/onnx/pull/1467
Therefore, we need to support the new upsample_op in caffe2-onnx backend as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13272
Reviewed By: houseroad
Differential Revision: D12833656
Pulled By: zrphercule
fbshipit-source-id: 21af5282abaae12d2d044e4018a2b152aff79917
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12733
Conv in NHWC layout only works for 2D images. This has been a pain point when implementing quantized 3D convolution because we need NHWC layout for best performance (note that NHWC layout in general gives better performance on CPU, not just for quantized operators). For example, our quantized ops have functionality to measure quantization error operator by operator, but this requires running a shadow fp32 operator, which is not easy when no 3D conv in NHWC layout is available (currently we're doing layout conversion on the fly for the shadow fp32 operator, which is error prone). Some Caffe2 frameworks like brew generate an error when we try to create a 3D conv op in NHWC layout. This was also a blocker for using aibench, because aibench uses brew.
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D10333829
fbshipit-source-id: 2d203ee1db833cd3f9d39353219e3894b46c4389
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13554
D10233252 broke the ROCm test.
We don't have group conv in NHWC for HIP yet, and this diff omits the related tests.
Reviewed By: hyuen
Differential Revision: D12917880
fbshipit-source-id: 9baf36a8cb061ee8cf393b2c438a2d1460ce5cd8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12428
Group conv in NHWC layout was enabled on CPU after D7547497.
In D7547497, the unit test for group conv in NHWC layout on CPU was enabled in group_conv_test.py but not in conv_test.py. This diff also enables it in conv_test.py.
Reviewed By: BIT-silence
Differential Revision: D10233252
fbshipit-source-id: aeeaf3eedc60e1cf6321b5a1dbe6a561e3aacbde
Summary:
Essentially this makes cuDNN think of those kernels as Nx1 ones.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12902
Reviewed By: BIT-silence
Differential Revision: D10852862
Pulled By: soumith
fbshipit-source-id: 7416cf6d131177340d21cbf1d42c1daa6c7cad8c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13437
revert
transform the NCHW Convolution operators to NHWC and the tensors around these operators
Reviewed By: bwasti
Differential Revision: D12871789
fbshipit-source-id: 6509a29fa1654424d22904df0d3e60f8cd9c0ec7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13436
revert
Add a utility function to convert a list of caffe2_pb2.Argument to a dictionary.
Reviewed By: bwasti
Differential Revision: D12871811
fbshipit-source-id: 486ad09f3f37723c92a946c486ce3e24a649b4e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13429
Made the SSA transformation idempotent. This ensures that if a caffe2 graph is already in SSA form, the names of the ONNX model's inputs/outputs match those of the caffe2 graph.
Avoid evaluating the model by running it if the shapes of all the blobs are present in the value_info map. This speeds up the conversion and decreases its memory usage for medium to large nets.
Reviewed By: abadams
Differential Revision: D12873354
fbshipit-source-id: d695b28e610562afa9a41c2d4da05be212ccb488
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13332
Add a utility function to convert a list of caffe2_pb2.Argument to a dictionary.
Reviewed By: bwasti
Differential Revision: D10861211
fbshipit-source-id: da2fcc3e3b4dbf8decbe14a8e2d5621b3fcc377f
Summary: Made the clangr rule more robust and it discovered more callsites.
Reviewed By: smessmer
Differential Revision: D12825017
fbshipit-source-id: 3be1eeb7ea697b36ef89e78ba64c0ee1259439c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13206
Add has device option for checking if a node has a device option set
Reviewed By: bwasti
Differential Revision: D12815365
fbshipit-source-id: 58477df93777f470cfb30cd75f02a659a7017b7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13132
Expose more of the C++ API to python
Reviewed By: duc0
Differential Revision: D10855086
fbshipit-source-id: 98cc89bc72ef91ed1c59c1a19688e047765cf90b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13203
Minor changes in the test workflow to run the model on CPUs
Reviewed By: stephenyan1231
Differential Revision: D9925797
fbshipit-source-id: b7b1fb2658ab68b1ffc2b1f7b314958ea4732b32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13004
Implement BucketWeighted model layer, which learns a weight for each possible score in an IdScoreList. Here, we assume that the scores in the IdScoreList have already been converted into the appropriate 'buckets'. If this is not done, then essentially each score represents its own bucket.
We assume that the scores/buckets are integers, and if max_score is not set, we assume that the maximum cardinality of the score is less than or equal to the cardinality of the ids.
Reviewed By: chonglinsun
Differential Revision: D10413186
fbshipit-source-id: 743e643a1b36adf124502a8b6b29976158cdb130
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12843
This adds a cuda implementation for the UpsampleBilinearOp and UpsampleBilinearGradientOp.
The CUDA code is based off of the corresponding ResizeNearest operators but with bilinear interpolation logic taken from the CPU implementation.
Reviewed By: houseroad
Differential Revision: D10453776
fbshipit-source-id: b29ac330b72465974ddb27c0587bca590773fdec
Summary:
This is mostly for reusing all the cudnn test cases in our python operator_tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12278
Differential Revision: D10842592
Pulled By: bddppq
fbshipit-source-id: 4b3ed91fca64ff02060837b3270393bc2f9a9898
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13007
No reason to use the hook if it's set, this helps fbcode traces.
This slightly pessimizes the stack trace for ATen functions,
because we are no longer skipping all of the frames we should.
This is probably OK.
Reviewed By: Yangqing
Differential Revision: D10518499
fbshipit-source-id: be54e490df3c3fde7ff894b5b1473442ffc7ded3
Summary:
TSIA - we want to deprecate numba in fbcode when moving to new compiler tiers.
Converted the old test to a non-numba regular python op test.
Reviewed By: xw285cornell
Differential Revision: D10519910
fbshipit-source-id: 0e9188a6d0fc159100f0db704b106fbfde3c5833
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12848
Updated all non-test uses of protobuf::MessageLite::SerializeAsString to call
SerializeAsString_EnforceCheck so that the return value is checked and an
exception is thrown on failure.
Most of the affected code was called from classes derived from BlobSerializeBase.
Didn't touch most tests and ENFORCE calls because they usually do checks
anyway.
Original commit changeset: c0760e73ecc7
Reviewed By: dzhulgakov
Differential Revision: D10453456
fbshipit-source-id: d2f2b7b4578e721924354149f08f627c7e3bf070
Summary:
- The exhaustive_search attribute will be blacklisted so it
will be discarded from the converted ONNX model. At present
it throws an error while verifying the ONNX model.
Signed-off-by: Parth Raichura <parth.raichura@softnautics.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12805
Differential Revision: D10502374
Pulled By: ezyang
fbshipit-source-id: 0926dfa3237a8a431184e7f7250146e5b0cbfb85
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12900
The workspace will sometimes be populated with input tensors for shape inference, but net.external_input() is not a reliable way to tell weights from inputs in the workspace. We saw some use cases where net.external_input() is empty. In this case, we need to give the user an option to provide an input hint.
Reviewed By: bddppq
Differential Revision: D10476822
fbshipit-source-id: 1a3fa2df69b959d5b952a7824eba9e6c713f4f07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12897
The UnsafeCoalesce op is from the memonger days, when we tried to coalesce operators
for more efficient computation kernels. It creates a little bit of an unsafe
underlying memory storage pattern.
With the new tensor unification I am not sure if it is still safe for us to do
so, so I propose we delete it for the sake of safety.
Reviewed By: bddppq, ilia-cher
Differential Revision: D10475980
fbshipit-source-id: b1a838c9f47d681c309ee8e2f961b432236e157e
Summary:
This test flushes out the issue that IDEEP cannot handle tensors with dims like (0, 2), which is a valid tensor shape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8459
Differential Revision: D10419328
Pulled By: yinghai
fbshipit-source-id: c5efcd152364a544180a8305c47a2a2d126ab070
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12736
This updates UpsampleBilinearOp and UpsampleBilinearGradientOp to support scales to bring it inline with ResizeNearestOp https://github.com/pytorch/pytorch/pull/12720.
Reviewed By: houseroad
Differential Revision: D10416228
fbshipit-source-id: f339b7e06979c9c566afb4cee64a2d939b352957
Summary: Added 2 years ago in D3665603, never used, kill it.
Reviewed By: ezyang
Differential Revision: D10421336
fbshipit-source-id: 1b027a9ef2b71d0dd2c572cd4338bc8e046320d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12799
Updated all non-test uses of protobuf::MessageLite::SerializeAsString to call
SerializeAsString_EnforceCheck so that the return value is checked and an
exception is thrown on failure.
Most of the affected code was called from classes derived from BlobSerializeBase.
Didn't touch most tests and ENFORCE calls because they usually do checks
anyway.
Reviewed By: ezyang
Differential Revision: D10416438
fbshipit-source-id: cb842e3e26b0918829d71267a375d4dd40600d58
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12681
- Get rid of NodeMatchCriteria as a template parameter, which was too generic. So MatchNode<NodeMatchCriteria> becomes MatchNode<GraphType>, and MatchStore stores the predicate on GraphType::NodeRef.
- Similarly, get rid of NNNodeMatchCriteria
Now one can just pass in a function pointer NodeRef -> bool to NNMatchNode constructor directly like this
mg.createNode(is<Relu>)
- Merge static utilities in SubgraphMatcher class into MatchGraph class
- Rename MatchNode to MatchPredicate
Change use cases and tests to make it work
Reviewed By: ZolotukhinM
Differential Revision: D10386907
fbshipit-source-id: 43874bd154e3d7c29ce07b4b74eca8a7a9f3078a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11711
Added GPU support for spatial batch normalization. This functions by reducing values from GPUs onto a CPU and broadcasting those results back to each GPU. We have run several experiments, and found these results to be better than those without spatial bn: https://fb.quip.com/fr7HAeDliPB8
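A NumPy sketch of the reduce-and-broadcast idea (illustrative only, not the GPU implementation): per-device sums are combined on the host into global batch statistics, which every device then uses.

```python
import numpy as np

def combined_batch_stats(device_batches):
    # Each device contributes its local count, sum, and sum of squares
    # (in spatial BN these would be per-channel over N, H, W).
    counts = [x.size for x in device_batches]
    sums = [x.sum() for x in device_batches]
    sq_sums = [(x ** 2).sum() for x in device_batches]

    n = sum(counts)                    # "reduce onto the CPU"
    mean = sum(sums) / n
    var = sum(sq_sums) / n - mean ** 2
    return mean, var                   # "broadcast back to each GPU"

shards = [np.random.rand(8, 16) for _ in range(4)]  # pretend: one shard per GPU
mean, var = combined_batch_stats(shards)
all_data = np.concatenate([s.ravel() for s in shards])
assert np.isclose(mean, all_data.mean()) and np.isclose(var, all_data.var())
```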
Reviewed By: enosair
Differential Revision: D9547420
fbshipit-source-id: ccbd2937efd6cfd61182fff2f098fb7c5ae8aeb1
Summary: Add a mapping for conversion -- this will help with debugging as well but is directly used by the TUI stacked on top of this
Reviewed By: duc0
Differential Revision: D10396130
fbshipit-source-id: cdd39278f0ed563bb828b1aebbbd228f486d89c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12685
In this diff, we push the fake run of the net into the ONNXIFI transformer, because
1. We cannot do shape inference for every op
2. Since the net has been SSA rewritten, we cannot use shape info from outer workspace directly.
In addition, this diff adds input shape info when querying the `onnxBackendCompatibility` function.
Reviewed By: bddppq
Differential Revision: D10390164
fbshipit-source-id: 80475444da2170c814678ed0ed3298e28a1fba92
Summary:
Simple additions that make it vastly easier to use nomnigraph in
python
Reviewed By: duc0
Differential Revision: D10383027
fbshipit-source-id: 441a883b84d4c53cca4f9c6fcc70e58692b8f782
Summary:
This seems to be a typo that never got caught - no actual functionality changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12688
Differential Revision: D10391704
Pulled By: Yangqing
fbshipit-source-id: ce633776957628c4881956c5423bfab78294d512
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12661
Was disabled since workspace.has_mkldnn is now set to false
Reviewed By: yinghai
Differential Revision: D10383913
fbshipit-source-id: ad6dc705f0606b3711e8b450dc384ad3ebb87686
Summary:
The pytorch.org site redirects all of the http:// requests to the https:// site anyway, so the comments and error messages might as well refer directly to the https:// site. The GitHub project description should also be updated to point to https://pytorch.org
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12636
Differential Revision: D10377099
Pulled By: soumith
fbshipit-source-id: f47eaba1dd3eecc5dbe62afaf7022573dc3fd039
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12382
Implement fp16 -> (uint8 + scale and bias in fp32).
This is similar to fp32 rowwise quantization.
We could have stored the scale and bias in fp16, but we're not too motivated since we are not saving much, and those values have to be converted to fp32 for processing anyway since x86 doesn't support half-float operations.
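A NumPy sketch of rowwise 8-bit quantization with fp32 scale and bias (a reference illustration of the scheme described above, not the perfkernels code):

```python
import numpy as np

def rowwise_quantize(x_fp16):
    x = x_fp16.astype(np.float32)                  # process in fp32, as on x86
    bias = x.min(axis=1, keepdims=True)            # per-row bias, kept in fp32
    scale = (x.max(axis=1, keepdims=True) - bias) / 255.0
    scale = np.where(scale == 0, 1.0, scale)       # avoid divide-by-zero on constant rows
    q = np.rint((x - bias) / scale).astype(np.uint8)
    return q, scale.astype(np.float32), bias.astype(np.float32)

def rowwise_dequantize(q, scale, bias):
    return q.astype(np.float32) * scale + bias

x = np.random.randn(4, 8).astype(np.float16)
q, scale, bias = rowwise_quantize(x)
print(np.abs(rowwise_dequantize(q, scale, bias) - x.astype(np.float32)).max())
```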
Reviewed By: csummersea
Differential Revision: D10220463
fbshipit-source-id: 6c382026de881f03798c2e5fc43abfc80f84ea1f
Summary:
This resolves an issue where the `model.Copy` operation would
copy to the wrong GPU, such that the below `net.Sum` operation
would use an input argument for which p2p access was not enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12342
Differential Revision: D10343181
Pulled By: ezyang
fbshipit-source-id: fd2d6d0ec6c09cda2db0a9a4f8086b3560e5a3ec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12178
Fisher GAN calls processor_util.add_mlp, which injects the layer norm through the
normalizer. We allow using an alternative implementation of LayerNorm in the normalizer.
Reviewed By: Wakeupbuddy
Differential Revision: D9235528
fbshipit-source-id: 88c126c658102926613242ef84a481f6de1676ed
Summary:
1. Fix the BN translator:
IntelCaffe and NVCaffe fuse BN+Scale, and the "BatchNorm" op contains 5 params, including scale and bias.
2. Fix the Scale translator:
the translated outputs of Scale have the same names as those of Conv;
all their names are output + '_w' and output + '_b'.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10056
Differential Revision: D10099205
Pulled By: yinghai
fbshipit-source-id: 73a73868e3e16c495e8b233fdb1d373d556a9537
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12390
Introduce a no-op optimizer for when we don't want updates to happen but don't want to affect downstream processes.
Reviewed By: mlappelbaum
Differential Revision: D10209812
fbshipit-source-id: 2af4ebc0fb42e78ea851c3a9f4860f3d224037b6
Summary:
Changes in this PR:
1. Intermediate Docker image is shared from build stage to test stage through ECR, in order to fix the Caffe2 flaky CUDA tests.
2. There are ~7 Caffe2 operator tests that are only flaky in `caffe2_py2_gcc4_8_ubuntu14_04_test` on CPU. Disabling those tests on that config only, which is okay to do because we are still running those tests in other test jobs.
After this PR is merged, CircleCI will be running on master automatically, and will be running on PRs if the author rebased their PR onto the newest master (which we will ask all the authors to do when we switch off Jenkins for Linux).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12389
Differential Revision: D10224267
Pulled By: yf225
fbshipit-source-id: dd1a90a425c3d13b870d3d328cb301eee2e6e2cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12180
I had to fix a lot of call sites, because a lot of places assume that
you can actually get a const vector&, and if the internal representation
of sizes in a tensor is NOT a vector, it's not possible to fulfill
this API contract.
Framework changes:
- I deleted TensorImpl::dims(); caffe2::Tensor::dims() just forwards to
sizes() now.
- De-templatized SetDims; now it is an explicit list of ArrayRef and
variadic overloads. This makes implicit conversions work again,
so I don't need to explicitly list the std::vector cases too.
- As a knock-on effect, this causes Reset() to accept at::IntList as well as
const std::vector<int64_t>&
- Edited variadic overloads of SetDims to all forward to the underlying
arbitrary-dim implementation, reducing code duplication. (It's probably
marginally less efficient in the new world.)
- Replace Tensor constructor accepting const std::vector<int64_t>& with at::IntList
- Make MKLTensor accept ArrayRef along with vector in constructor and
Reset (unfortunately, no implicit conversions here, since it's templated on
index type.)
- There are a few other places, like cudnn, where I changed functions
that previously took const std::vector<int64_t>& to take at::IntList
instead.
Classification of call site changes:
- 'const std::vector<int64_t>& x_dims = x.dims()' ==>
'at::IntList x_dims = x.dims()'
- 'std::vector<int64_t> x_dims = x.dims()' ==>
'std::vector<int64_t> x_dims = x.dims().vec()' (we need a copy!)
Usually this is because we're about to mutably modify the vector
to compute some new dimension. However, it also very commonly occurs in the
form: 'x_dims_ = x.dims()' because we frequently cache sizes in operators.
- Instead of constructing std::vector<int64_t>{blah, blah}, construct an
at::IntList directly
ArrayRef changes:
- cbegin()/cend() iterators; they operate the same as begin()/end() because
everything on ArrayRef is const.
- Moved operator<< into ArrayRef.h, so that it's always available when
working with ArrayRef. I also templated it, so it now works on an
ArrayRef of any type.
- Add operator== overload for ArrayRef, and also add variants to permit
comparison of ArrayRef with std::vector, a very common operation.
(The non-templated version of operator== can get these automatically
via implicit conversion, but with templates C++ refuses to do
any explicit conversions.)
I'm planning to audit all dims() call sites to make sure they don't
expect 'auto x = t.dims()' to give you an x whose lifetime can validly
outlive the tensor.
I opted not to do a dims() to sizes() rename, because dims() also matches
the protobufs accessor. Bad news!
Reviewed By: jerryzh168
Differential Revision: D10111759
fbshipit-source-id: a2a81dc4b92c22ad4b3b8ef4077a7e97b6479452
Summary:
All usages of the `ndarray` construct have now been guarded with `USE_NUMPY`. This eliminates the requirement of NumPy while building PyTorch from source.
Fixes #11757
Reviewed By: Yangqing
Differential Revision: D10031862
Pulled By: SsnL
fbshipit-source-id: 32d84fd770a7714d544e2ca1895a3d7c75b3d712
Summary:
If a block is missing control inputs during caffe2 net execution, this PR adds them back and removes the un-SSA semantics.
jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12224
Differential Revision: D10135408
Pulled By: wanchaol
fbshipit-source-id: 746c870bde54ed4ca627167361db1b3f36cd235c
Summary:
Original commit changeset: f5614a5d2607
D9986213 is causing Multifeed Aggregator a [huge performance difference](https://our.intern.facebook.com/intern/ads/analyze_canary/412951953278781781/) and has been blocking the aggregator push since last Friday night: https://fburl.com/feedtools/b6izvwjz
We need to land this revert ASAP to unblock aggregator push.
Reviewed By: orionr
Differential Revision: D10123245
fbshipit-source-id: d83da8e00a1250f5d09811a0a587c127e377aab2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11349
Special case BatchGather and BatchGatherGradient for block_size=1. This makes BatchGather 3-4X faster and BatchGatherGradient 10X for this case.
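For reference, a NumPy sketch of what BatchGather computes (illustrative, not the optimized kernel): block_size = 1 is the case with no trailing dimensions, where the gather degenerates to simple per-row indexing.

```python
import numpy as np

def batch_gather_ref(data, indices):
    # data: (N, M, ...); indices: (K,) gathered along axis 1 for every batch row.
    return data[:, indices]

data_2d = np.arange(12).reshape(3, 4)  # block_size == 1: no trailing dims
print(batch_gather_ref(data_2d, np.array([2, 0])))

data_3d = np.arange(60).reshape(3, 4, 5)  # block_size == 5
print(batch_gather_ref(data_3d, np.array([2, 0])).shape)  # (3, 2, 5)
```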
Reviewed By: jspark1105, ilia-cher
Differential Revision: D7218043
fbshipit-source-id: ea12042239a8adc92b9efcbd0b66e354fb43f4c7
Summary:
The speed-up of a single operation is up to 6X on BDW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11686
Reviewed By: yinghai
Differential Revision: D9828129
Pulled By: wesolwsk
fbshipit-source-id: 7dbacea90609e18438f6fe1229c641937d0696c8
Summary:
This does several things:
- add c10/util/Registry.h as the unified registry util
- cleaned up some APIs such as export condition
- fully remove aten/core/registry.h
- fully remove caffe2/core/registry.h
- remove a bogus aten/registry.h
- unifying all macros
- set up registry testing in c10
Also, an important note: we used to mark the templated Registry class as EXPORT - this should not happen, because one should almost never export a template class. This PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12077
Reviewed By: ezyang
Differential Revision: D10050771
Pulled By: Yangqing
fbshipit-source-id: 417b249b49fed6a67956e7c6b6d22374bcee24cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12021
TestPilot runs stress tests in parallel. These fail for serialized tests because extracting (and subsequently deleting) binary data during the process isn't thread-safe. Extract zips into a tempfile to avoid this problem.
Also remove some accidentally checked-in zips of a test that we didn't end up including for now.
Reviewed By: houseroad
Differential Revision: D10013682
fbshipit-source-id: 6e13b850b38dee4106d3c10a9372747d17b67c5a