Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19083
As we have discussed, there are too many AdjustBatch ops; they incur reallocation overhead and hurt performance. We will eliminate these ops by
- inlining the input adjust batch op into Glow
- inlining the output adjust batch op into OnnxifiOp, and doing so only conditionally.
This is the C2 part of the change and requires a corresponding change on the Glow side to work e2e.
Reviewed By: rdzhabarov
Differential Revision: D14860582
fbshipit-source-id: ac2588b894bac25735babb62b1924acc559face6
Summary:
Almost there; feel free to review.
These c10 operators are exported to the _caffe2 domain.
TODO:
- [x] let the onnx checker pass
- [x] test tensor list as argument
- [x] test caffe2 backend and converter
- [x] check the c10 schema can be exported to onnx
- [x] refactor the test case to share some code
- [x] fix the problem in ONNX_ATEN_FALLBACK
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18210
Reviewed By: zrphercule
Differential Revision: D14600916
Pulled By: houseroad
fbshipit-source-id: 2592a75f21098fb6ceb38c5d00ee40e9e01cd144
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18716
Might be useful as an intermediate stage for some systems that currently use Caffe2 nets as an execution mechanism.
Not sure it's a good idea altogether; please comment.
Limitations:
- only Tensor types as inputs/outputs
- the entire module is serialized as a zip archive inside a proto in a Caffe2 db; it'd be subject to the 4GB limit and is likely very slow. For small models it'd work though.
- no autograd, though it can be attached in principle
- no way to retrieve parameters inside the script module from C2 runtime perspective (though they potentially can be alias-fetched and stored as individual blobs)
- after deserialization, the returned python wrappers don't have the correct type (as we don't do the module_lookup trick)
Build-wise, I had to add a dependency from pybind_state to libtorch.so. I don't think we build the Caffe2 python frontend independently anymore, so it should be fine.
Reviewed By: amirshim, houseroad
Differential Revision: D14339599
fbshipit-source-id: 88a37a8abd1f1c4703e5ef937031f222535d4080
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18560
We have to import python protobuf here **before** we load the cpp extension.
Otherwise it breaks under certain build conditions if the cpp implementation of
protobuf is used. Presumably there's some registry in the protobuf library and
the python side has to initialize the dictionary first, before static
initialization in the python extension does so. Otherwise, duplicated protobuf
descriptors are created, which can lead to obscure errors like:
Parameter to MergeFrom() must be instance of same class: expected caffe2.NetDef got caffe2.NetDef.
I think it also fixes https://github.com/facebookarchive/caffe2/issues/1573
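A minimal sketch of the import order this enforces (a sketch, assuming the standard caffe2 module layout):
```python
# Sketch: import the python protobuf bindings first so protobuf's
# descriptor registry is populated by the python side ...
from caffe2.proto import caffe2_pb2

# ... and only then load the pybind cpp extension, whose static
# initialization would otherwise register duplicate descriptors.
from caffe2.python import core
```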
Reviewed By: ezyang, iroot900
Differential Revision: D14622054
fbshipit-source-id: 2499eb88ecdee85ff8d845859048f7ae5da2a480
Summary:
The mkldnn-bridge is upgraded in this PR to support DNNLOWP operators.
Meanwhile, the APIs in caffe2 have been updated to use the latest version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16308
Differential Revision: D14697018
Pulled By: yinghai
fbshipit-source-id: ca952589098accb08295fd5aa92924c61e74d69c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18740
Test utilities for writing Caffe2/PyTorch performance microbenchmarks. A brief description of the file structure:
* benchmark_core.py : core utilities for running microbenchmark tests
* benchmark_caffe2.py : Caffe2-specific benchmark utilities
* benchmark_pytorch.py: PyTorch-specific benchmark utilities
* benchmark_runner.py : main function. Currently it can run the microbenchmark tests in a stand-alone mode. The next step is to integrate this with AI-PEP.
The utilities are located at https://github.com/pytorch/pytorch/tree/master/test to have access to both the Caffe2 and PyTorch Python frontends.
Includes two operator microbenchmarks, supporting both Caffe2 and PyTorch:
* MatMul
* Add
Reference: PyTorch benchmarks: https://github.com/pytorch/benchmark/tree/master/timing/python. In this work, we start with the two example binary operators MatMul and Add, but eventually we should also cover unary operators like those in the PyTorch benchmark repo.
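As a rough illustration of the stand-alone mode (not the benchmark_core.py API itself, which may differ), a minimal microbenchmark for the two ops could look like:
```python
import time
import torch

def benchmark(op, *args, iters=100):
    # warm up to exclude one-time allocation and dispatch costs
    for _ in range(10):
        op(*args)
    start = time.perf_counter()
    for _ in range(iters):
        op(*args)
    return (time.perf_counter() - start) / iters

a, b = torch.randn(256, 256), torch.randn(256, 256)
print("MatMul: %.6f s/iter" % benchmark(torch.matmul, a, b))
print("Add:    %.6f s/iter" % benchmark(torch.add, a, b))
```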
Reviewed By: zheng-xq
Differential Revision: D13887111
fbshipit-source-id: b7a56b95448c9ec3e674b0de0ffb96af4439bfce
Summary:
For MKL-DNN, the filter data is reordered to the primitive format, which takes a lot of time.
So this patch provides a method to convert the filter format before training.
Also, "OptimizeForIdeep" is renamed to "OptimizeForMkldnn" in this patch.
This patch depends on https://github.com/pytorch/pytorch/pull/12866
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15171
Differential Revision: D14590741
Pulled By: yinghai
fbshipit-source-id: 07971c9977edac3c8eec08ca2c39cda639683492
Summary:
In the blob feeder for the ideep device, the wrong device option was given, leading to a crash.
This patch corrects the device option to fix the bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18552
Differential Revision: D14679838
Pulled By: yinghai
fbshipit-source-id: bde11e6a6fe44822166881dcb7c9bd0b34b4ecf3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18512
Ceil and Floor have been supported since version 6 of ONNX: export them using the native ONNX ops instead of an ATen op.
Similarly, support for the Where op was added in version 9, so we don't need to wrap that op in an ATen op either.
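A quick way to see the new behavior (an illustrative sketch; exact node names depend on the exporter version):
```python
import io
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return torch.ceil(x) + torch.floor(x)

buf = io.BytesIO()
torch.onnx.export(M(), torch.randn(2, 3), buf)
# the exported graph now contains native Ceil/Floor nodes
# instead of ATen fallback ops
```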
Reviewed By: houseroad
Differential Revision: D14635130
fbshipit-source-id: d54a2b6e295074a6214b5939b21051a6735c9958
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18494
Today we have some C2 end2end test runs that require reading model data from external filesystems (for example, Gluster and AWS). This can be a source of flaky tests when the external filesystems are unreachable during the run.
In this diff, we add try/catch logic around the places where we download models and open model files from external systems. If such an attempt fails, we catch the exception and let unittest skip the current test instead of failing it.
I also refactored the code a little by removing some duplicated logic for downloading and building the c2 model data; it had been duplicated across two classes and a few functions.
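The pattern is roughly the following (a sketch; the real helper wraps the Gluster/AWS download logic):
```python
import unittest
import urllib.request

class End2EndModelTest(unittest.TestCase):
    def _fetch_model(self, url):
        try:
            # stand-in for the real model download helper
            return urllib.request.urlopen(url, timeout=30).read()
        except Exception as e:
            # external filesystem unreachable: skip instead of failing
            self.skipTest("model download failed: %s" % e)
```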
Reviewed By: yinghai
Differential Revision: D14442241
fbshipit-source-id: da8bf56c8d096efa34ca2070de5cd10a18aad70c
Summary:
Argument order is allowed to differ.
ajyu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18466
Differential Revision: D14627258
Pulled By: bddppq
fbshipit-source-id: 430e1fb1bea2c5639a547ae7c1652368788c86b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18155
- Make a python decorator caffe2_flaky for caffe2 operator unit tests.
- The environment variable CAFFE2_RUN_FLAKY_TESTS is now used to mark flaky-test mode
During a test run:
- If flaky-test mode is on, only flaky tests are run
- If flaky-test mode is off, only non-flaky tests are run
Mark ctc_beam_search_decoder_op_test as flaky.
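A hypothetical sketch of the decorator (the shipped version may differ; the runner handles the complementary skip of non-flaky tests):
```python
import os
import unittest

def caffe2_flaky(test_item):
    # flaky tests run only when CAFFE2_RUN_FLAKY_TESTS is set
    flaky_mode = bool(os.getenv("CAFFE2_RUN_FLAKY_TESTS"))
    return unittest.skipUnless(flaky_mode, "flaky-test mode is off")(test_item)
```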
Reviewed By: ezyang, salexspb
Differential Revision: D14468816
fbshipit-source-id: dceb4a48daeb5437ad9cc714bef3343e9761f3a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18129
A lot of tensor inference functions assume the operator passes the schema,
so call Verify to make sure this is actually the case.
I created a diff before to add checking in Concat (https://github.com/pytorch/pytorch/pull/17110), but I encountered a lot more places where this is assumed (for example, ElementwiseOpShapeInference).
Reviewed By: mdschatz
Differential Revision: D14503933
fbshipit-source-id: cf0097b8c3e4beb1cded6b61e092a6adee4b8fcb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18257
Support adding an op in global_init_net, because pred_init_net is per-thread and just doesn't cut it.
Reviewed By: jspark1105
Differential Revision: D14552695
fbshipit-source-id: 53dd44c84ad019019ab9f35fc04d076b7f941ddc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17905
Support adding an op in global_init_net, because pred_init_net is per-thread and just doesn't cut it.
Reviewed By: jspark1105
Differential Revision: D14114134
fbshipit-source-id: 112bb2ceb9d3d5e663dd430585567f4eaa2db35f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18040
Add a flag that fails the run if a floating point exception is detected during operator execution.
Sample exception:
Exception [enforce fail at operator.h:837] !std::fetestexcept(FE_DIVBYZERO). Division by zero floating point exception (FE_DIVBYZERO) reported.
Error from operator:
input: "1" input: "0" output: "out" name: "" type: "Div"
Reviewed By: jspark1105
Differential Revision: D14467731
fbshipit-source-id: fad030b1d619a5a661ff2114edb947e4562cecdd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18084
The data_strategy parameter was not used in some of the unit tests for optimizers.
Reviewed By: hyuen
Differential Revision: D14487830
fbshipit-source-id: d757cd06aa2965f4c0570a4a18ba090b98820ef4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18036
- Add macros to export c10 cuda operators to caffe2 frontend
- Instead of having a separate caffe2 registry for the c10 operator wrappers, use the existing caffe2 registries
Reviewed By: ezyang
Differential Revision: D14467495
fbshipit-source-id: 7715ed2e38d2bbe16f1446ae82c17193a3fabcb9
Summary:
According to https://docs.python.org/3/tutorial/inputoutput.html, it is good practice to use the "with" keyword when dealing with file objects; otherwise you should call f.close() to close the file and immediately free any system resources it uses. Thus, I change the file-opening code to use "with open() as f".
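For example:
```python
# the file is closed automatically, even if an exception is raised
with open("predict_net.pb", "rb") as f:
    data = f.read()
```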
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18017
Differential Revision: D14475112
Pulled By: ezyang
fbshipit-source-id: d1c0821e39cb8a09f86d6d08b437b4a99746416c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17548
Expose half-float operators to OSS.
common/math/Float16.h is the original implementation;
it is substituted by caffe2/c10/util/Half.h.
From the comments, it seems that neither implementation handles denormals.
Reviewed By: jspark1105
Differential Revision: D14244200
fbshipit-source-id: f90ba28c5bf6a2b451b429cc4925b8cc376ac651
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17726
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17725
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17461
Implementing a standalone LSTM operator in Caffe2, adapted from this ATen implementation: diffusion/FBS/browse/master/fbcode/caffe2/aten/src/ATen/native/RNN.cpp. The trickiest part of this exercise was that caffe2::Tensor has no copy constructor, which made it necessary to implement a custom templated copy constructor for the different tensor containers used in the code. Also, there was no easy way to use off-the-shelf C2 operators in my code, so I had to copy some code that does basic matmul, cat, split, transpose, and linear as utility functions.
Two things missing:
- Profiling this implementation against the current ONNXified LSTM op
- Making this operator available for use in PyTorch
Reviewed By: dzhulgakov
Differential Revision: D14351575
fbshipit-source-id: 3b99b53212cf593c7a49e45580b5a07b90809e64
Summary:
Observed the test `TestGroupConvolution.test_group_convolution` to fail with the following error:
```
Falsifying example: test_group_convolution(self=<caffe2.python.operator_test.group_conv_test.TestGroupConvolution testMethod=test_group_convolution>, stride=3, pad=0, kernel=5, size=8, group=4, input_channels_per_group=7, output_channels_per_group=8, batch_size=2, order='NHWC', engine='', use_bias=False, gc=, dc=[, device_type: 1])
You can reproduce this example by temporarily adding reproduce_failure('3.59.1', b'AAAA') as a decorator on your test case
```
The example generated by hypothesis has `group=2, order='NHWC', dc=[, device_type: 1]`.
I think this example should be skipped.
I have mimicked the change from [PR#13554](https://github.com/pytorch/pytorch/pull/13554) to skip this example.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17715
Differential Revision: D14346642
Pulled By: ezyang
fbshipit-source-id: b1f1fef09f625fdb43d31c7213854e61a96381ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17623
Despite its generic-sounding name, caffe2::DeviceGuard actually
only worked on CUDA devices. Rename it to something that more
clearly spells out its applicability.
I'm not sure if it's the right call, but in this patch I added
'using CUDAGuard = c10::cuda::CUDAGuard', as this seems to be more
in line with how the Caffe2 codebase is currently written. More
idiomatic c10 namespace style would be to say cuda::CUDAGuard.
Willing to change this if people shout.
This is a respin of D13156470 (#14284)
Reviewed By: dzhulgakov
Differential Revision: D14285504
fbshipit-source-id: 93b8ab938b064572b3b010c307e1261fde0fff3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17461
Implementing a standalone LSTM operator in Caffe2, adapted from this ATen implementation: diffusion/FBS/browse/master/fbcode/caffe2/aten/src/ATen/native/RNN.cpp. The trickiest part of this exercise was that caffe2::Tensor has no copy constructor, which made it necessary to implement a custom templated copy constructor for the different tensor containers used in the code. Also, there was no easy way to use off-the-shelf C2 operators in my code, so I had to copy some code that does basic matmul, cat, split, transpose, and linear as utility functions.
Two things missing:
- Profiling this implementation against the current ONNXified LSTM op
- Making this operator available for use in PyTorch
Reviewed By: dzhulgakov
Differential Revision: D14160172
fbshipit-source-id: c33e3f9e8aeae578b64d97593cb031a251216029
Summary:
Because of two separate python extensions with different pybind
instances, I have to go through a void* conversion. Since it's hidden from
the user, it's fine.
New APIs added on the C2 side:
- workspace.FetchTorch('blob')
- workspace.Workspace.current.blobs['blob'].to_torch()
- workspace.FeedBlob('blob', pytorch_tensor)
Works on CPU and GPU.
The only glitches are with resizing, because of the variable/tensor split,
but data sharing works properly.
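A usage sketch of these APIs (assuming a CPU workspace and a blob named 'x'):
```python
import numpy as np
import torch
from caffe2.python import workspace

workspace.FeedBlob("x", np.random.randn(2, 3).astype(np.float32))
t = workspace.FetchTorch("x")                           # torch.Tensor view of the blob
t2 = workspace.Workspace.current.blobs["x"].to_torch()  # same, via the blob object
workspace.FeedBlob("y", torch.ones(2, 3))               # feed a pytorch tensor back
```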
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17190
Reviewed By: ezyang
Differential Revision: D14163882
Pulled By: dzhulgakov
fbshipit-source-id: d18e5b8fcae026f393c842a1149e972515732de2
Summary:
These were previously merged to resolve #17051. However, since that issue was resolved upstream, and the change was causing problems like https://github.com/abjer/tsds/issues/8, I think it's time to revert it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17567
Differential Revision: D14265241
Pulled By: kostmo
fbshipit-source-id: 7fa2b7dd4ebc5148681acb439cf82d983898694e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17549
Currently Dropout is only enabled in training; we enable the option of having dropout in Eval as well.
This follows [1]. The functionality will be used for uncertainty estimation in the exploration project.
[1] Gal, Yarin, and Zoubin Ghahramani. "Dropout as a bayesian approximation: Representing model uncertainty in deep learning." international conference on machine learning. 2016.
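Conceptually (a PyTorch sketch of the same idea, not the Caffe2 operator change itself), dropout kept active at eval time yields Monte Carlo uncertainty estimates:
```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(4, 8), torch.nn.ReLU(),
    torch.nn.Dropout(0.5), torch.nn.Linear(8, 1))
model.eval()
for m in model.modules():
    if isinstance(m, torch.nn.Dropout):
        m.train()  # keep dropout stochastic at eval time

x = torch.randn(16, 4)
with torch.no_grad():
    preds = torch.stack([model(x) for _ in range(20)])
mean, var = preds.mean(0), preds.var(0)  # predictive mean and uncertainty
```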
Reviewed By: Wakeupbuddy
Differential Revision: D14216216
fbshipit-source-id: 87c8c9cc522a82df467b685805f0775c86923d8b
Summary:
MKL-DNN supports multi-node mode but not multi-device mode; this commit adds multi-device support for MKL-DNN. This commit depends on https://github.com/pytorch/pytorch/pull/11330
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12856
Differential Revision: D13735075
Pulled By: ezyang
fbshipit-source-id: b63f92b7c792051f5cb22e3dda948013676e109b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16723
Removed the obsolete argument correct_transform_coords in the bbox_transform op.
* It existed only for backward compatibility; we should not have models using it now.
Differential Revision: D13937430
fbshipit-source-id: 504bb066137ce408c12dc9dcc2e0a513bad9b7ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17194
We found that there is a per-row absolute error due to the int8 quantization,
and a table-wide relative error when fp16 is used.
Reviewed By: csummersea
Differential Revision: D14113353
fbshipit-source-id: c7065aa9d15c453c2e5609f421ad0155145af889
Summary:
Similar to softmax, there are issues with randomly getting NaN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17170
Differential Revision: D14110515
Pulled By: bddppq
fbshipit-source-id: 5c97661184d45a02122fd69d35a839fdf4520c8c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17062
From jiyan's training jobs, it seems we found a quantization bug:
- fp32 is fine
- fp32 -> rowwise int8 is fine
- fp16 is fine
- fp16 -> rowwise int8 is not fine
We preconvert everything to fp32 and use the existing code, so there is no need to change the epsilon for the fp16 case, since at conversion time everything is a float.
Reviewed By: jspark1105
Differential Revision: D14063271
fbshipit-source-id: 747297d64ed8c6fdf4be5bb10ac584e1d21a85e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17158
Because of the Reshape op, the batch size can change. This diff addresses the first-order issues raised by a multiple-batch-size system. We need to export a different real_batch_size for each max_batch_size input and attach it to the right output.
It also fixes a false exception.
Reviewed By: ipiszy
Differential Revision: D14099541
fbshipit-source-id: 0fa9e86826f417a11d2b5dd2ee60dff64a7ce8c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17074
There is some common functionality in backend lowering. This diff creates a base class that hosts this common code.
Reviewed By: ipiszy
Differential Revision: D14073192
fbshipit-source-id: 9617603d0e73db6f7fcc5572756b9dbab506dae5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17046
As we are moving to bound shape inference, we can remove the awkward fake-inference-run path and make the code cleaner.
Reviewed By: ipiszy
Differential Revision: D14061501
fbshipit-source-id: b3ace98b3dabef3c3359086a0bb1410518cefa26
Summary:
For >2D input, the code previously used the static shape captured during tracing and reshaped before/after `Gemm`.
Now we add `-1` to the first `Reshape`, and use `Shape(X) => Slice(outer) => Concat(with -1 for inner) => Reshape` for the second.
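The shape logic is illustrated by this numpy sketch (illustrative only; the actual change emits the corresponding ONNX nodes):
```python
import numpy as np

def gemm_nd(x, w, b):
    outer = x.shape[:-1]                       # Shape(X) => Slice(outer)
    y2d = x.reshape(-1, x.shape[-1]) @ w + b   # first Reshape uses -1
    return y2d.reshape(*outer, -1)             # Concat(outer, -1) => Reshape

y = gemm_nd(np.random.randn(2, 3, 4), np.random.randn(4, 5), np.zeros(5))
assert y.shape == (2, 3, 5)
```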
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16184
Differential Revision: D14070754
Pulled By: ezyang
fbshipit-source-id: 86c69e9b254945b3406c07e122e57a00dfeba3df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16691
Previous diffs already introduced a macro that registers caffe2 CPU kernels with c10.
This now also registers the CUDA kernels with it.
Reviewed By: bwasti
Differential Revision: D13901619
fbshipit-source-id: c15e5b7081ff10e5219af460779b88d6e091a6a6
Summary:
The second input (`lengths`) is not supported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16727
Differential Revision: D14054105
Pulled By: houseroad
fbshipit-source-id: 36b8d00460f9623696439e1bd2a6bc60b7bb263c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16932
During the onnxifi transformation, the net's SSA is rewritten. In the last step, the weight
names are changed back to what they were before. This diff keeps the weight
names unchanged throughout the process.
Reviewed By: yinghai
Differential Revision: D13972597
fbshipit-source-id: 7c29857f788a674edf625c073b345f2b44267b33
Summary:
Implement the ExpandDims op and fall back to CPU if needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15264
Differential Revision: D13808797
Pulled By: yinghai
fbshipit-source-id: 7795ec303a46e85f84e5490273db0ec76e8b9374
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16643
The test was disabled in D13908117 because it conflicted with another diff that was about to land.
The merge conflict is now fixed, and this re-lands it.
Reviewed By: ezyang
Differential Revision: D13911775
fbshipit-source-id: b790f1c3a3f207916eea41ac93bc104d011f629b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16548
With this macro, a caffe2 operator can now directly be registered with c10.
No need to write custom wrapper kernels anymore.
Differential Revision: D13877076
fbshipit-source-id: e56846238c5bb4b1989b79855fd44d5ecf089c9c
Summary:
This PR is a follow-up of #15460; it does the following:
* remove the undefined tensor semantic in jit script/tracing mode
* change the ATen/JIT schema for at::index and other index-related ops to `Tensor?[]`, to align with what at::index really does and to adopt `optional[tensor]` in JIT
* change python_print to correctly print the exported script
* register both TensorList and ListOfOptionalTensor in JIT ATen ops to support both
* keep backward compatibility for `torch.jit.annotate(Tensor, None)`
List of follow-ups:
* remove the undefined tensor semantic in jit autograd, autodiff and grad_of
* remove prim::Undefined fully
For easy review, please turn on `hide whitespace changes` in the diff settings.
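For reference, a minimal sketch of an optional-tensor annotation in JIT script (current syntax, not this PR's diff):
```python
import torch
from typing import Optional

@torch.jit.script
def masked_add(x: torch.Tensor, y: Optional[torch.Tensor] = None) -> torch.Tensor:
    # `y` has JIT type Tensor?; the None check refines it to Tensor
    if y is not None:
        return x + y
    return x
```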
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16379
Differential Revision: D13855677
Pulled By: wanchaol
fbshipit-source-id: 0e21c14d7de250c62731227c81bfbfb7b7da20ab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16676
This op is used for changing the batch size (first dimension) of a tensor.
Reviewed By: bertmaher, ipiszy
Differential Revision: D13929200
fbshipit-source-id: 4f2c3faec072d468be8301bf00c80d33adb3b5b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16785
There's no EIGEN engine implemented for DeformConv, but the unit test was checking for it.
Reviewed By: BIT-silence
Differential Revision: D13967306
fbshipit-source-id: e29c19f59f5700fc0501c59f45d60443b87ffedc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16478
This diff includes an example registration of a caffe2 op in torch. A previous attempt ran into a static initialization order bug.
Reviewed By: smessmer
Differential Revision: D13854304
fbshipit-source-id: ec463ce2272126d08a5163d1599361ee5b718bbc
Summary:
Just noticed while building on a machine without cudnn present: the build succeeded, but the runtime failed since some methods weren't bound.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16701
Differential Revision: D13937247
Pulled By: dzhulgakov
fbshipit-source-id: c81f05be7a9e64a1a8591036dcf8692c0ed4064e
Summary:
Add the Winograd conv method. Users can select direct conv or Winograd conv in the model file.
We closed the original PR https://github.com/pytorch/pytorch/pull/12154 and created this new one for easier rebasing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15196
Differential Revision: D13463721
Pulled By: yinghai
fbshipit-source-id: c5cd5c8aa7622ae7e52aeabd3dbb8ffb99b9b4ee
Summary:
- Skip the test due to flaky behavior on AMD/ROCm
- The fix is expected in ROCm 2.2 (HSA runtime)
bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16639
Differential Revision: D13915231
Pulled By: bddppq
fbshipit-source-id: 66e1d275836337170b15ceb9d60cfdd3242d4df8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16630
Two PRs landed concurrently: one enforcing tensor constraints and one refactoring c10. Since it's not prod code, disable the test; I'll let Sebastian fix it properly.
Reviewed By: ezyang
Differential Revision: D13908117
fbshipit-source-id: 381c5626078b794afa1fc7a95cb1ea529650424c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15388
This is another pass to make the perfkernels code safer from illegal instruction errors.
Removed the dependency on c10/util/Logging.h.
We err on the safe side at the expense of some verbosity.
Reviewed By: dskhudia
Differential Revision: D13502902
fbshipit-source-id: 4f833115df885c5b4f8c1ca83b9badea1553f944
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16246
The op schema says it returns multiple values, so let's actually return multiple values instead of one tuple.
For some reason, this did work when called from python (probably some auto-unpacking),
but once called from JIT, it segfaulted. This diff fixes that.
Reviewed By: dzhulgakov
Differential Revision: D13780147
fbshipit-source-id: fe94f82f4c53b7454f77c4484fca4ac9dc444475
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16374
This fixes the original attempt in OSS (adds it to the CMake and python build files).
Reviewed By: smessmer
Differential Revision: D13821061
fbshipit-source-id: 82f0dade0145fd04bdf8e3cb3954b5790e918162
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16191
Logdevice-related modifications for the generic feature type.
We directly convert the generic feature structures to JSON strings, which corresponds to the column input in offline and dper.
Reviewed By: itomatik
Differential Revision: D13551909
fbshipit-source-id: 807830c50bee569de202530bc3700374757793a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16350
Example usage of the new caffe2 integration
Reviewed By: smessmer
Differential Revision: D13408546
fbshipit-source-id: 87240ca7f48d653a70241d243aa0eb25efa67611
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16335
Group conv is not implemented with the EIGEN engine, so this diff disables the related tests.
Reviewed By: jamesr66a
Differential Revision: D13807204
fbshipit-source-id: 41f6de43da40882f57e64474520e185733caefb7
Summary:
Based on offline discussion, this should be less surprising to users of the existing code. Thus caffe2::Tensor is now a move-only class (as it used to be); explicit calls to UnsafeSharedInstance() are necessary to get shared_ptr behavior.
This change also identified a few places that misused the copy constructor; those are fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15416
Reviewed By: Yangqing
Differential Revision: D13524598
fbshipit-source-id: aea12d6dff77342606fa88ce4ddddbff266245a7
Summary:
1. Add some gloo communication operators to the related fallback lists.
2. Work around compile errors when using fallback operators whose CPU operator inherits directly from 'OperatorBase', like PrefetchOperator.
3. Add new CPU-context support for some python module files and the resnet50 training example file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11330
Reviewed By: yinghai
Differential Revision: D13624519
Pulled By: wesolwsk
fbshipit-source-id: ce39d57ddb8cd7786db2e873bfe954069d972f4f
Summary:
bypass-lint
- Change all Caffe2 builds to use setup.py instead of cmake
- Add a -cmake- Caffe2 build configuration that uses cmake and only builds cpp
- Move skipIfCI logic from onnx test scripts to the rest of CI logic
- Removal of old PYTHONPATH/LD_LIBRARY_PATH/etc. env management
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15917
Reviewed By: orionr
Differential Revision: D13637583
Pulled By: pjh5
fbshipit-source-id: c5c5639db0251ba12b6e4b51b2ac3b26a8953153
Summary:
This is a follow-up on #13945, where we had to turn off some TRT tests because some ops were not ready to accept ONNX opset 9+ models. This PR fixes Reshape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15380
Differential Revision: D13649825
Pulled By: houseroad
fbshipit-source-id: b72e62803de5b63cc001c3fe4b3bf64dfa996e94
Summary:
Implement the LeakyRelu operator for MKL-DNN; the speed-up of a single operation is up to 10X on BDW.
Implement the reshape operator for MKL-DNN; it resolves an occasional crash when using the fallback reshape operator.
Implement the CreateBlobsQueue and SafeEnqueueBlobs operators; this resolves a crash when using the fallback operators.
Fall back for the CreateBlobsQueueDBOp, TensorProtosDBInput, and CloseBlobsQueue operators.
Implement the adam operator for MKL-DNN; the speed-up of a single operator is up to 6X on BDW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11696
Reviewed By: yinghai
Differential Revision: D10100438
Pulled By: wesolwsk
fbshipit-source-id: 0b6e06897cc11e0a8e349d80a870b1e72e47f10d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15865
Factored out code used in the tests for the Add, Mul, and Sub operators
into two new methods: one to generate the test vectors, and a second
to run the actual test given a caffe2 operator and a python reference.
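A sketch of the resulting structure (helper names assumed):
```python
import numpy as np
from caffe2.python import core, workspace

def make_test_vectors(shape=(4, 4), seed=0):
    rng = np.random.RandomState(seed)
    return (rng.randn(*shape).astype(np.float32),
            rng.randn(*shape).astype(np.float32))

def run_binary_op_test(op_name, py_op):
    a, b = make_test_vectors()
    workspace.FeedBlob("a", a)
    workspace.FeedBlob("b", b)
    workspace.RunOperatorOnce(core.CreateOperator(op_name, ["a", "b"], ["c"]))
    np.testing.assert_allclose(workspace.FetchBlob("c"), py_op(a, b), rtol=1e-5)

for name, ref in [("Add", np.add), ("Mul", np.multiply), ("Sub", np.subtract)]:
    run_binary_op_test(name, ref)
```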
Reviewed By: houseroad
Differential Revision: D13526955
fbshipit-source-id: 8970ba5a1305ca19a54a14b51816d4a19f19d678
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15553
Add unit test and implementation of NHWC layout for Resize operator.
Also, add pragma parallel loop to old NCHWC layout.
Reviewed By: jspark1105
Differential Revision: D13540762
fbshipit-source-id: eebf252bf0d1efdff180a171d804181045f100a5