Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16691
Previous diffs already introduced a macro that registers caffe2 CPU kernels with c10.
This diff now also registers the CUDA kernels with c10.
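Schematically, the CUDA registration is one extra macro invocation next to the existing CPU one. A minimal sketch, assuming macro and class names along the lines of the CPU macro from the previous diffs (an approximation, not an exact quote of the API):

```cpp
// Sketch only -- macro and class names here are assumptions.
// Already present from the previous diffs (CPU kernel):
C10_REGISTER_CAFFE2_OPERATOR_CPU(MyOp, caffe2::MyOp<caffe2::CPUContext>)

// What this diff adds (CUDA kernel for the same c10 operator):
C10_REGISTER_CAFFE2_OPERATOR_CUDA(MyOp, caffe2::MyOp<caffe2::CUDAContext>)
```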
Reviewed By: bwasti
Differential Revision: D13901619
fbshipit-source-id: c15e5b7081ff10e5219af460779b88d6e091a6a6
Summary:
The second input (`lengths`) is not supported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16727
Differential Revision: D14054105
Pulled By: houseroad
fbshipit-source-id: 36b8d00460f9623696439e1bd2a6bc60b7bb263c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16932
During the onnxifi transformation, the net's SSA is rewritten. At the last
step, the weight names are changed back to what they were before. This diff
keeps the weight names unchanged throughout the process.
Reviewed By: yinghai
Differential Revision: D13972597
fbshipit-source-id: 7c29857f788a674edf625c073b345f2b44267b33
Summary:
Implement the ExpandDims op and fall back to CPU if needed.
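For context, ExpandDims inserts size-1 dimensions at the requested positions without touching the data. A minimal generic sketch of the shape transformation (not the actual operator code):

```cpp
#include <cstdint>
#include <vector>

// Insert a size-1 dimension at each index in `dims` (assumed sorted
// ascending and expressed relative to the output shape), e.g.
// shape [2, 3] with dims [0, 2] becomes [1, 2, 1, 3].
std::vector<int64_t> expand_dims(std::vector<int64_t> shape,
                                 const std::vector<int>& dims) {
  for (int d : dims) {
    shape.insert(shape.begin() + d, 1);
  }
  return shape;
}
```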
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15264
Differential Revision: D13808797
Pulled By: yinghai
fbshipit-source-id: 7795ec303a46e85f84e5490273db0ec76e8b9374
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16643
The test was disabled in D13908117 because it conflicted with another diff that was about to land.
The merge conflict is now fixed, and the test is being re-landed.
Reviewed By: ezyang
Differential Revision: D13911775
fbshipit-source-id: b790f1c3a3f207916eea41ac93bc104d011f629b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16548
With this macro, a caffe2 operator can now be registered directly with c10;
there is no need to write custom wrapper kernels anymore.
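A minimal sketch of the intended usage, assuming an existing caffe2 operator class; the macro names here approximate the ones this diff introduces rather than quoting them exactly:

```cpp
// Sketch only -- names are approximations, not the exact API.
// In the header, so the c10 op can be referenced elsewhere:
C10_DECLARE_CAFFE2_OPERATOR(LayerNorm)

// In the .cc file, next to the usual REGISTER_CPU_OPERATOR(...):
C10_REGISTER_CAFFE2_OPERATOR_CPU(
    LayerNorm,
    caffe2::LayerNormOp<caffe2::CPUContext>)
```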
Differential Revision: D13877076
fbshipit-source-id: e56846238c5bb4b1989b79855fd44d5ecf089c9c
Summary:
This PR is a follow-up to #15460. It does the following:
* remove the undefined tensor semantics in JIT script/tracing mode
* change the ATen/JIT schema for at::index and other index-related ops to take `Tensor?[]`, aligning with what at::index really does and adopting `Optional[Tensor]` in JIT (see the schema sketch below)
* change python_print to correctly print the exported script
* register both TensorList and ListOfOptionalTensor variants of the JIT ATen ops to support both
* keep backward compatibility for `torch.jit.annotate(Tensor, None)`
List of follow-ups:
* remove the undefined tensor semantics in JIT autograd, autodiff and grad_of
* fully remove prim::Undefined
For easier review, please turn on `hide whitespace changes` in the diff settings.
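Schematically, the at::index schema change looks like this (a sketch of the before/after declarations, not a verbatim quote):

```cpp
// Before: indices were a plain TensorList, and a "skipped" dimension was
// encoded as an undefined tensor:
//   aten::index(Tensor self, Tensor[] indices) -> Tensor
// After: indices are optional tensors, so a skipped dimension is an
// explicit None, matching what at::index actually accepts:
//   aten::index(Tensor self, Tensor?[] indices) -> Tensor
```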
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16379
Differential Revision: D13855677
Pulled By: wanchaol
fbshipit-source-id: 0e21c14d7de250c62731227c81bfbfb7b7da20ab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16676
This op changes the batch size (the first dimension) of a tensor.
Reviewed By: bertmaher, ipiszy
Differential Revision: D13929200
fbshipit-source-id: 4f2c3faec072d468be8301bf00c80d33adb3b5b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16785
There is no EIGEN engine implemented for DeformConv, but the unit test was checking for it.
Reviewed By: BIT-silence
Differential Revision: D13967306
fbshipit-source-id: e29c19f59f5700fc0501c59f45d60443b87ffedc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16478
This diff includes an example registration of a caffe2 op in torch. A previous attempt ran into a static initialization order bug.
Reviewed By: smessmer
Differential Revision: D13854304
fbshipit-source-id: ec463ce2272126d08a5163d1599361ee5b718bbc
Summary:
Noticed while building on a machine without cuDNN present: the build succeeded, but the runtime failed because some methods weren't bound.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16701
Differential Revision: D13937247
Pulled By: dzhulgakov
fbshipit-source-id: c81f05be7a9e64a1a8591036dcf8692c0ed4064e
Summary:
Add a Winograd convolution method. Users can select either the direct conv or the Winograd conv in the model file.
We closed the original PR https://github.com/pytorch/pytorch/pull/12154 and created this new one for easier rebasing.
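As a sketch of what the selection could look like on the caffe2 side: caffe2 picks an op's kernel implementation through the OperatorDef `engine` field, so a model file can opt into the Winograd path per Conv op. The engine string below is an assumption, not necessarily what this PR uses:

```cpp
#include "caffe2/proto/caffe2_pb.h"

// Sketch: "WINOGRAD" is an assumed engine name; an empty engine keeps the
// default direct convolution.
caffe2::OperatorDef MakeWinogradConv() {
  caffe2::OperatorDef conv;
  conv.set_type("Conv");
  conv.set_engine("WINOGRAD");
  return conv;
}
```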
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15196
Differential Revision: D13463721
Pulled By: yinghai
fbshipit-source-id: c5cd5c8aa7622ae7e52aeabd3dbb8ffb99b9b4ee
Summary:
- Skip the test due to flaky behavior on AMD/ROCm.
- The fix is expected in ROCm 2.2 (HSA runtime).
bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16639
Differential Revision: D13915231
Pulled By: bddppq
fbshipit-source-id: 66e1d275836337170b15ceb9d60cfdd3242d4df8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16630
Two PRs landed concurrently: one enforcing tensor constraints and one refactoring c10. Since this isn't production code, disable the test; I'll let Sebastian fix it properly.
Reviewed By: ezyang
Differential Revision: D13908117
fbshipit-source-id: 381c5626078b794afa1fc7a95cb1ea529650424c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15388
This is another pass to make the perfkernels code safer from illegal instruction errors.
Removed the dependency on c10/util/Logging.h.
We err on the safe side at the expense of some verbosity.
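For background, the illegal instruction errors come from executing code compiled for an ISA extension (e.g. AVX2) on a CPU that lacks it; the cure is to keep such kernels behind a runtime dispatch check. A generic sketch of the technique, not the actual perfkernels code:

```cpp
#include <cstddef>

// The two kernels live in separate translation units: sum_avx2 is built
// with -mavx2, sum_base with baseline flags only.
void sum_avx2(const float* x, std::size_t n, float* out);
void sum_base(const float* x, std::size_t n, float* out);

// Runtime dispatch: the AVX2 path is never even reached on CPUs without
// AVX2, which is exactly what prevents the illegal instruction crash.
void sum(const float* x, std::size_t n, float* out) {
  if (__builtin_cpu_supports("avx2")) {  // GCC/Clang builtin
    sum_avx2(x, n, out);
  } else {
    sum_base(x, n, out);
  }
}
```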
Reviewed By: dskhudia
Differential Revision: D13502902
fbshipit-source-id: 4f833115df885c5b4f8c1ca83b9badea1553f944
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16246
The op schema says it returns multiple values, so let's actually return multiple values instead of one tuple.
For some reason, this worked when called from Python (probably due to some auto-unpacking),
but once called from the JIT, it segfaulted. This diff fixes that.
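Why the distinction matters: the JIT interpreter is stack-based, so an op whose schema declares two outputs must leave two entries on the stack; pushing a single tuple leaves it one entry short and the interpreter reads past the end. A minimal generic sketch of that failure mode (not the actual c10 API):

```cpp
#include <tuple>
#include <variant>
#include <vector>

using Value = std::variant<int, std::tuple<int, int>>;
using Stack = std::vector<Value>;

// Correct for a schema with two outputs: push two stack entries.
void kernel_two_values(Stack& s) {
  s.push_back(1);
  s.push_back(2);
}

// Buggy: one tuple entry where the interpreter expects two values, so the
// caller pops past the end of the stack.
void kernel_one_tuple(Stack& s) {
  s.push_back(std::make_tuple(1, 2));
}
```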
Reviewed By: dzhulgakov
Differential Revision: D13780147
fbshipit-source-id: fe94f82f4c53b7454f77c4484fca4ac9dc444475
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16374
This fixes the original attempt in OSS (adds the needed entries to the CMake and Python build files).
Reviewed By: smessmer
Differential Revision: D13821061
fbshipit-source-id: 82f0dade0145fd04bdf8e3cb3954b5790e918162