Summary:
This is an extension to the original PR https://github.com/pytorch/pytorch/pull/21765
1. Increase coverage of support for different opsets, along with comments and blacklisting.
2. Add backend tests for both caffe2 and onnxruntime on opset 7 and opset 8.
3. Reuse the onnx model tests in caffe2 for onnxruntime.
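A small sketch (not from this PR) of the kind of export these tests exercise, picking an older opset via `torch.onnx.export`'s `opset_version` argument:
```python
import torch

# Tiny stand-in model; any traceable module would do.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
dummy = torch.randn(1, 3, 32, 32)

# Pick an older opset explicitly; the exported file can then be checked against
# a backend such as caffe2 or onnxruntime, which is what the new tests do.
torch.onnx.export(model, dummy, "model_opset8.onnx", opset_version=8)
```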
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22421
Reviewed By: zrphercule
Differential Revision: D16225518
Pulled By: houseroad
fbshipit-source-id: 01ae3eed85111a83a0124e9e95512b80109d6aee
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22631
Test Plan:
test suite
Imported from OSS
Differential Revision: D16185040
fbshipit-source-id: 9b83749f6c9cd05d13f54a3bb4801e263293252b
Summary:
Having the NVRTC stub in ATen is necessary to call driver APIs in ATen. This is currently blocking https://github.com/pytorch/pytorch/pull/22229.
`DynamicLibrary` is also moved as it is used in the stub code, and seems general enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22362
Differential Revision: D16131787
Pulled By: ezyang
fbshipit-source-id: add2ee8a8865229578aa00001a00d5a6671e0e73
Summary:
After the Variable/Tensor merge, code paths in ATen need to be able to check whether a tensor requires gradient, and throw errors in places where a `requires_grad=true` tensor is not allowed (such as https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Utils.h#L76-L78 and https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/SparseTensorImpl.cpp#L86). Since the `GradMode` thread-local variable controls whether a tensor should accumulate gradients, we need to be able to check this variable from ATen when we determine whether a tensor requires gradient, hence the PR to move `GradMode` / `AutoGradMode` / `NoGradGuard` to ATen.
Note that we intentionally don't merge `at::GradMode` and `at::NonVariableTypeMode`, with the following reasoning:
Semantically, `at::GradMode` and `at::NonVariableTypeMode` actually mean different things: `at::GradMode` controls whether a tensor should accumulate gradients, and `at::NonVariableTypeMode` controls whether a Variable should be treated as a non-Variable tensor in type dispatches. There are places where we *don't* want the tensor to accumulate gradients, but *still* want the Variable to be treated as a Variable. Here is one example:
```python
# torch/tensor.py
with torch.no_grad():
    ...
    new_tensor = self.new()  # `at::GradMode` is false at this point
    ...
```
```cpp
// tools/autograd/templates/python_variable_methods.cpp
static PyObject * THPVariable_new(PyObject* self, PyObject* args, PyObject* kwargs)
{
  ...
  // If we merge `at::GradMode` and `at::NonVariableTypeMode`, then since `at::GradMode` is false
  // and `self_.type()` checks `at::GradMode` to decide whether to return a non-Variable type, it
  // will return a non-Variable type here, which is not what we want (and throws a "Tensor that
  // was converted to Variable was not actually a Variable" error).
  return THPVariable_Wrap(torch::utils::legacy_tensor_new(self_.type(), args, kwargs));
  ...
}
```
For the above reason, we cannot merge `at::GradMode` and `at::NonVariableTypeMode`, as they have different purposes.
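As a quick illustration of what the `at::GradMode` thread-local controls on the Python side (standard `torch.no_grad()` behavior, not code from this PR):
```python
import torch

x = torch.ones(2, requires_grad=True)

with torch.no_grad():      # flips the GradMode thread-local off
    y = x * 2              # y is still a regular tensor/Variable, just not tracked by autograd
print(y.requires_grad)     # False: ops run under no_grad don't accumulate gradients
print(x.requires_grad)     # True: GradMode doesn't change existing tensors
```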
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18573
Differential Revision: D16134413
Pulled By: yf225
fbshipit-source-id: 6140347e78bc54206506499c264818eb693cdb8a
Summary:
As per attached tasks, these are noops and are being deprecated/removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22113
Reviewed By: philipjameson
Differential Revision: D15901131
fbshipit-source-id: 3acf12208f692548afe4844be13717a49d74af32
Summary:
This is useful for measuring the inference performance of your models. This is a very basic benchmark for now: we don't support batching on the benchmark side, and no inter- or intra-op parallelism is supported yet, just caller-based parallelism.
The main philosophy here is that the user should be able to provide inputs from Python and just stack them within the benchmark. The API should be exactly the same as passing inputs to module.forward.
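A minimal usage sketch of the kind of API this describes; the `ThroughputBenchmark` name and its `add_input` / `benchmark` methods are assumptions about this PR's interface, not taken from the summary above:
```python
import torch
# Hypothetical import path and class name for this PR's benchmark utility.
from torch.utils import ThroughputBenchmark

module = torch.jit.script(torch.nn.Linear(16, 8))

bench = ThroughputBenchmark(module)
# Inputs are provided from Python exactly as they would be passed to module.forward.
bench.add_input(torch.randn(1, 16))

# Caller-based parallelism: several calling threads invoke the module concurrently.
stats = bench.benchmark(num_calling_threads=4, num_warmup_iters=10, num_iters=100)
print(stats)
```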
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20766
Test Plan: Added a new unit test
Differential Revision: D15435461
Pulled By: salexspb
fbshipit-source-id: db08829dc3f4398bb1d8aa16cc4a58b6c72f16c6
Summary:
This makes it so we can see the output of prim::Print in environments like IPython notebooks, which override sys.stdout.
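For illustration, a tiny script where the TorchScript `print` (lowered to prim::Print) now shows up in a notebook cell's captured stdout:
```python
import torch

@torch.jit.script
def foo(x):
    # This print is lowered to prim::Print in the TorchScript IR; with this change
    # its output goes through sys.stdout, so environments that override sys.stdout
    # (e.g. IPython notebooks) can capture and display it.
    print(x.sum())
    return x + 1

foo(torch.ones(3))
```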
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21625
Differential Revision: D15756793
Pulled By: jamesr66a
fbshipit-source-id: 7d9a14b2e229ed358e784318e9d862677db2c461
Summary:
This changes our compiler so it first emits Loads & Stores, and then transforms the graph to SSA in a follow-up pass. When a variable is set, we emit a prim::Store, and when a variable is referenced, we emit a prim::Load.
```
a = 1
print(a)
```
becomes:
```
%a.1 : int = prim::Constant[value=1]()
prim::Store[name="a"](%a.1)
%a : int = prim::Load[name="a"]()
prim::Print(%a)
```
In the follow-up pass, convertToSSA, the values are turned into SSA form with the Loads & Stores removed. This change will enable breaks and continues because you can transform the graph with the variable naming information still intact.
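For reference, a rough sketch (not an exact dump) of what the same snippet could look like after convertToSSA removes the Loads & Stores:
```
%a.1 : int = prim::Constant[value=1]()
prim::Print(%a.1)
```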
There are still some remaining jitter and edge-case issues that I have to look through, but I think this is still ready for review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21101
Differential Revision: D15723353
Pulled By: eellison
fbshipit-source-id: 3269934d4bc24ddaf3a87fdd20620b0f954d83d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20674
A few targets in caffe2/caffe2/distribute need to be split too, otherwise they won't compile. Also some cleanups, and rename select_gpu_type to gpu_library_selector.
Differential Revision: D15406019
fbshipit-source-id: 6455ab885b248502b48d4c7565597e00fecfd547
Summary:
#19975 was split into 2 PRs.
This one:
Introduces the MemoryFormat argument to the `x.is_contiguous(memory_format=torch.channels_last)` and `y = x.contiguous(memory_format=torch.channels_last)` functions.
At this moment both functions just operate on strides and don't store any tensor state.
(Original RFC #19092)
-----
Expands the functionality of two tensor functions, `.is_contiguous` and `.contiguous` (both the Python and C++ API).
Note: We had several complaints about the `.to(memory_format)` function, and decided not to support it.
1. `.contiguous` now supports an optional keyword-only argument - `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.
- Using `torch.contiguous_format` will preserve the existing `.contiguous()` behavior.
- Calling `x.contiguous(memory_format=torch.channels_last)` returns a new tensor which maintains the same semantic layout (NCHW), but has a different memory allocation pattern.
`x.contiguous(memory_format=torch.channels_last)` expects the input tensor to be 3d, 4d or 5d, and fails otherwise.
2. `.is_contiguous` now supports an optional keyword-only argument - `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.
- `x.is_contiguous(memory_format=torch.contiguous_format)` preserves the same functionality as `x.is_contiguous()` and remains unchanged.
- `x.is_contiguous(memory_format=torch.channels_last)` returns true if A) the input tensor is contiguous in memory AND B) it is allocated in memory in NHWC (or similar for 3d, 5d) format.
Note: By the end of phase one, `x.is_contiguous(memory_format=torch.channels_last)` will calculate the state of the Tensor on every call. This functionality is going to be updated later.
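A brief usage sketch of the two functions with the new argument (behavior as described above):
```python
import torch

x = torch.randn(2, 3, 4, 5)                           # 4d NCHW tensor, default layout

y = x.contiguous(memory_format=torch.channels_last)   # same NCHW semantics, NHWC-style strides
print(y.is_contiguous())                                       # False: not contiguous in the default sense
print(y.is_contiguous(memory_format=torch.channels_last))      # True
print(x.is_contiguous(memory_format=torch.contiguous_format))  # True: same as x.is_contiguous()
```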
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20455
Differential Revision: D15341577
Pulled By: VitalyFedyunin
fbshipit-source-id: bbb6b4159a8a49149110ad321109a3742383185d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20234
The differences with the existing function _dist_broadcast_coalesced
are that this one works for both CPU and CUDA tensors and that it has a
maximum number of in-flight operations.
This should be the final change needed to have only a single version
of DistributedDataParallel that supports both CPU and CUDA models, or
even a mix of both.
See #17757 for more information.
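A minimal sketch (not code from this change) of the end state this enables: the same DistributedDataParallel wrapper over either a CPU or a CUDA model.
```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes the usual env:// variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE) are set.
dist.init_process_group(backend="gloo", init_method="env://")

cpu_model = torch.nn.Linear(8, 4)            # stays on CPU
ddp_cpu = DDP(cpu_model)                     # gradients reduced via the gloo process group

if torch.cuda.is_available():
    cuda_model = torch.nn.Linear(8, 4).cuda()
    ddp_cuda = DDP(cuda_model, device_ids=[0])   # same wrapper for a CUDA model
```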
Reviewed By: mrshenli
Differential Revision: D15228099
fbshipit-source-id: a2113ba6b09b68cb5328f49f4c1960031eb43c93
Summary:
* adds TORCH_API and AT_CUDA_API in places
* refactor code generation Python logic to separate caffe2/torch outputs
* fix hip and asan
* remove profiler_cuda from hip
* fix gcc warnings for enums
* Fix PythonOp::Kind
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19554
Differential Revision: D15082727
Pulled By: kostmo
fbshipit-source-id: 83a8a99717f025ab44b29608848928d76b3147a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19607
Explicit is better than implicit - it's pretty hard to debug where a particular file is if it's not greppable.
As a follow-up step, we should look at whether we can just include build_variables.py in CMake directly to share the setup of the two build systems.
Reviewed By: ezyang
Differential Revision: D15023348
fbshipit-source-id: 600ef2d1871bc28530c6a02681b284f7499904df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19282
This is largely a hack because we need to use the function schema parser from ATen/core
but aren't clear yet on what the final software architecture should look like.
- Add the function schema parser files from jit to the ATen/core build target.
- Also move the ATen/core build target one directory up to allow this.
We only change the build targets and don't move the files yet because this is likely
not the final build setup and we want to avoid repeated interruptions
for other developers. cc zdevito
Reviewed By: dzhulgakov
Differential Revision: D14931922
fbshipit-source-id: 26462e2e7aec9e0964706138edd3d87a83b964e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19281
String<->Number conversions aren't available in the STL used in our Android environment.
This diff adds workarounds for that so that the function schema parser can be compiled for Android.
Reviewed By: dzhulgakov
Differential Revision: D14931649
fbshipit-source-id: d5d386f2c474d3742ed89e52dff751513142efad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19280
We want to use the function schema parser from ATen/core, but with as little dependencies as possible.
This diff moves the function schema parser into its own file and removes some of its dependencies.
Reviewed By: dzhulgakov
Differential Revision: D14931651
fbshipit-source-id: c2d787202795ff034da8cba255b9f007e69b4aea