Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46501
Gradients in this method will not be modified.
ghstack-source-id: 114851646
Test Plan: waitforbuildbot
Reviewed By: pritamdamania87
Differential Revision: D24374300
fbshipit-source-id: a2941891008f9f197a5234b50260218932d2d37d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46601
* except excluded tests and magic methods.
https://github.com/pytorch/pytorch/issues/38731
Previously, we'd only run these tests for in-place operations. Since this adds a lot more tests, the following issues that came up when running them were fixed:
- Updated schema of conj() to reflect existing behaviour.
- Updated deepEquals method in check_alias_annotation.cpp to re-use the overloaded == operator. Previous implementation did not cover all types of IValues.
- Corrected the order in which inputs are passed during autograd testing of 'view' & 'reshape'.
- Substituted aten::ger with the function it's aliased to, aten::outer, for testing (see the snippet below), since the alias annotation checking code doesn't handle aliased operators properly.
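For context, a quick Python-level illustration of the ger/outer equivalence relied on above (a sketch, not part of this diff):
```python
import torch

# torch.ger is an alias of torch.outer; both compute the same outer product,
# which is why the test substitutes the underlying op.
a = torch.arange(1., 4.)  # tensor([1., 2., 3.])
b = torch.arange(1., 3.)  # tensor([1., 2.])
assert torch.equal(torch.ger(a, b), torch.outer(a, b))
print(torch.outer(a, b))
```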
ghstack-source-id: 114830903
Test Plan: Ran all tests in test:jit and verified they pass.
Reviewed By: eellison
Differential Revision: D24424955
fbshipit-source-id: 382d7e2585911b81b1573f21fff1d54a5e9a2054
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46657
This is used to simulate the fake quantize operation for ops with fixed quantization parameters, e.g. hardsigmoid.
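For reference, a minimal sketch of the fake-quantize math with fixed qparams; the scale/zero_point below are an illustrative choice for an op whose output lies in [0, 1] (like hardsigmoid), not necessarily the exact values wired up by this diff:
```python
import torch

def fake_quantize_fixed_qparams(x, scale=1.0 / 256.0, zero_point=0,
                                quant_min=0, quant_max=255):
    # Quantize with the fixed parameters, clamp to the integer range,
    # then dequantize back to float to simulate the quantization error.
    q = torch.clamp(torch.round(x / scale) + zero_point, quant_min, quant_max)
    return (q - zero_point) * scale

x = torch.nn.functional.hardsigmoid(torch.randn(4))
print(fake_quantize_fixed_qparams(x))
```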
Test Plan:
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D24451406
fbshipit-source-id: 26cc140c00f12bdec9a8f9dc880f4c425f4d4074
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46679
The current way of importing configs causes a runtime error when a single benchmark is launched directly with buck (e.g. `/buck-out/gen/caffe2/benchmarks/operator_benchmark/pt/conv_test.par`). The diff fixes that issue.
ghstack-source-id: 114857978
Test Plan: waitforsandcastle
Reviewed By: vkuzo
Differential Revision: D24459631
fbshipit-source-id: 29df17e66962a8604dbb7b8b9106713c3c19bed5
Summary:
I am adding documentation for building the C++-only libtorch.so without invoking Python in the build and install process. This works on my Ubuntu 20.04 system and is designed to be operating system agnostic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44196
Reviewed By: zou3519
Differential Revision: D24421066
Pulled By: malfet
fbshipit-source-id: e77c222703353ff7f7383fb88f7bce705f88b7bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46572
When `num_samples == 0`, the grid size becomes zero. Although CUDA silently proceeds with the launch, `cudaGetLastError()` will later complain about `Error: invalid configuration argument`, so the failure actually surfaces at some later point, which makes it really hard to debug.
Reviewed By: jianyuh
Differential Revision: D24409874
fbshipit-source-id: ca54de13b1ab48204bbad265e3f55b56b94a1a2f
Summary:
Added a workaround for the cases when NVCC tries to compile the object for the sm_30 GPU compute capability, to avoid the error message stating that the `__ldg` intrinsic is not defined.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46535
Reviewed By: zou3519
Differential Revision: D24422445
Pulled By: ezyang
fbshipit-source-id: 66e8eb1cbe42d848cfff46d78720d72100e628f8
Summary:
There's some code which uses `six.PY3`, similar to:
```python
if six.PY3:
    print("Python 3+ code")
else:
    print "Python 2 code"
```
Where:
```python
PY3 = sys.version_info[0] == 3
```
When run on Python 4, this will run the Python 2 code! Instead, use `six.PY2` and avoid `six.PY3`.
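A minimal sketch of the suggested pattern, mirroring the snippet above:
```python
import six

# Branch on PY2 instead of PY3 so that any future major version
# (e.g. a hypothetical Python 4) falls into the modern branch by default.
if six.PY2:
    print("Python 2 code")
else:
    print("Python 3+ code")
```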
---
Similarly, there are some `sys.version_info[0] == 3` checks, which are better written as `sys.version_info[0] >= 3`.
---
Also, it's better to avoid comparing against the `sys.version` string, as that assumes each version component is exactly one character long, which will break in Python 3.10:
```pycon
>>> sys.version
'3.8.1 (v3.8.1:1b293b6006, Dec 18 2019, 14:08:53) \n[Clang 6.0 (clang-600.0.57)]'
>>> sys.version < "3.3"
False
>>> fake_v3_10 = '3.10.1 (v3.8.1:1b293b6006, Dec 18 2019, 14:08:53) \n[Clang 6.0 (clang-600.0.57)]'
>>> fake_v3_10 < "3.3"
True
```
---
Finally, I think the intention here is to skip when the Python version is < 3.6:
```python
unittest.skipIf(sys.version_info[0] < 3 and sys.version_info[1] < 6, "dict not ordered")
```
However, it will really skip for Python 0.0-0.5, 1.0-1.5 and 2.0-2.5. It's best to compare to the `sys.version_info` tuple and not `sys.version_info[1]`:
```python
unittest.skipIf(sys.version_info < (3, 6), "dict not ordered")
```
---
Found using https://github.com/asottile/flake8-2020:
```console
$ pip install -U flake8-2020
$ flake8 --select YTT
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32389
Reviewed By: zou3519
Differential Revision: D24424662
Pulled By: ezyang
fbshipit-source-id: 1266c4dbcc8ae4d2e2e9b1d7357cba854562177c
Summary:
Fixes issues when building certain PyTorch extensions where the cpp files do NOT compile if flags such as `__HIP_NO_HALF_CONVERSIONS__` are defined.
cc jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46273
Reviewed By: zou3519
Differential Revision: D24422463
Pulled By: ezyang
fbshipit-source-id: 7a43d1f7d59c95589963532ef3bd3c68cb8262be
Summary:
This PR makes it possible to cast the parameters of nn.Module to complex dtypes.
The following code works with the proposed changes.
```python
In [1]: import torch
In [2]: lin = torch.nn.Linear(5, 1).to(torch.complex64)
In [3]: lin(torch.zeros(3, 5, dtype=torch.complex64))
Out[3]:
tensor([[-0.1739+0.j],
        [-0.1739+0.j],
        [-0.1739+0.j]], grad_fn=<AddmmBackward>)
```
Fixes https://github.com/pytorch/pytorch/issues/43477.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44788
Reviewed By: zou3519
Differential Revision: D24307225
Pulled By: anjali411
fbshipit-source-id: dacc4f5c8c9a99303f74d1f5d807cd657b3b69b5
Summary:
Resolves one item in https://github.com/pytorch/pytorch/issues/46321
This PR sets up DistExamplesTest, which will be used as the class to implement future tests for examples and is run as part of CI. It also creates a dist_examples folder and includes the [batch server example](https://github.com/pytorch/examples/blob/master/distributed/rpc/batch/parameter_server.py), which is slightly modified so that it can be tested.
Run test:
pytest test/distributed/rpc/test_tensorpipe_agent.py -k test_batch_updating_parameter_server -vs
pytest test/distributed/rpc/test_process_group_agent.py -k test_batch_updating_parameter_server -vs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46510
Reviewed By: mrshenli
Differential Revision: D24379296
Pulled By: H-Huang
fbshipit-source-id: 1c102041e338b022b7a659a51894422addc0e06f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46249
This saves 15kb of binary size on iOS and increases binary size on Android x86 by 30kb. It also reduces size a bit for Android arm. I've talked to Martin and we should land this, since Android binary size is much less important because of Voltron.
ghstack-source-id: 114177627
Test Plan: bsb
Reviewed By: ezyang
Differential Revision: D23057150
fbshipit-source-id: 43bd62901b81daf08ed96de561d711357689178f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46573
Original commit changeset: 7dd709b585f8
ghstack-source-id: 114730143
Test Plan: Verified on circleci that previously broken test is fixed.
Reviewed By: zdevito
Differential Revision: D24413096
fbshipit-source-id: 439568c631c4556b8ed6af20fcaa4b1375e554cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46046
*_like functions are used in PyTorch to create a new tensor with the same shape as the input tensor, but we don't always preserve the layout permutation of that tensor. The current behavior is that, for a dense and non-overlapping tensor, its layout permutation is preserved. For example, passing a channels-last contiguous tensor t with shape/stride (2, 4, 3, 2)/(24, 1, 8, 4) to empty_like(t) will create a new tensor with exactly the same shape/stride as the input tensor t. However, if the input tensor is non-dense or overlapping, we simply create a contiguous tensor based on the input tensor's shape, so the layout permutation is lost.
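A quick illustration of the existing dense, non-overlapping case described above (a sketch, not part of this diff):
```python
import torch

# Channels-last contiguous tensor with shape (2, 4, 3, 2) and strides (24, 1, 8, 4);
# empty_like preserves the layout permutation for this dense, non-overlapping input.
t = torch.randn(2, 4, 3, 2).to(memory_format=torch.channels_last)
print(t.stride())                    # (24, 1, 8, 4)
print(torch.empty_like(t).stride())  # (24, 1, 8, 4)
```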
This PR preserves the layout permutation for non-dense or overlapping tensors. The stride propagation rule used in this PR is exactly the same as the one used in TensorIterator. The behavior changes are listed below:
| code | old | new |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------|------------------------------------------------------|
| #strided tensors<br>a=torch.randn(2,3,8)[:,:,::2].permute(2,0,1)<br>print(a.stride())<br>print(a.exp().stride())<br>print((a+a).stride())<br>out = torch.empty(0)<br>torch.add(a,a,out=out)<br>print(out.stride()) | (2, 24, 8) <br>(6, 3, 1) <br>(1, 12, 4) <br>(6, 3, 1) | (2, 24, 8)<br>(1, 12, 4)<br>(1, 12, 4)<br>(1, 12, 4) |
| #memory dense tensors<br>a=torch.randn(3,1,1).as_strided((3,1,1), (1,3,3))<br>print(a.stride(), (a+torch.randn(1)).stride())<br>a=torch.randn(2,3,4).permute(2,0,1)<br>print(a.stride())<br>print(a.exp().stride())<br>print((a+a).stride())<br>out = torch.empty(0)<br>torch.add(a,a,out=out)<br>print(out.stride()) | (1, 3, 3) (1, 1, 1)<br>(1, 12, 4)<br>(6, 3, 1)<br>(1, 12, 4)<br>(6, 3, 1) | (1, 3, 3) (1, 3, 3)<br>(1, 12, 4)<br>(1, 12, 4)<br>(1, 12, 4)<br>(1, 12, 4) |
This is to solve the non-dense tensor layout problem in #45505
TODO:
- [x] Fix all the BC broken test cases in pytorch
- [ ] Investigate if any fb internal tests are broken
This change will cover all kinds of non-dense tensors.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D24288970
Pulled By: glaringlee
fbshipit-source-id: 320fd4e0d1a810a12abfb1441472298c983a368d
Summary: It creates CPU overload issues when OpenMP is enabled and OMP_NUM_THREADS=1 is not set.
Test Plan: buck test //caffe2/caffe2/quantization/server:quantize_dnnlowp_op_test
Reviewed By: jspark1105
Differential Revision: D24437305
fbshipit-source-id: 426209fc33ce0d4680c478f584716837ee62cb5e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46356
Adding the flag `-Werror=cast-function-type` to ensure we don't allow
any invalid casts (ex: PyCFunction casts).
For more details see: https://github.com/pytorch/pytorch/issues/45419
ghstack-source-id: 114632980
Test Plan: waitforbuildbot
Reviewed By: albanD
Differential Revision: D24319759
fbshipit-source-id: 26ce4650c220e8e9dd3550245f214c7e6c21a5dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45538
This is used to simulate the fake quantize operation for ops with fixed quantization parameters, e.g. hardsigmoid.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D24004795
fbshipit-source-id: fc4797f80842daacd3b3584c5b72035774634edd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46337
We plan to pass around the mappings instead of using the global registration API, in order to keep
the mappings local to the transformations the user is performing.
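A rough sketch of the design direction, using placeholder names rather than the actual quantization APIs touched by this diff:
```python
# Placeholder mapping and function names; the point is that the mapping becomes
# an explicit argument instead of a globally registered table, so it only
# affects the transformation the caller is currently performing.
DEFAULT_MAPPING = {"Linear": "QuantizedLinear"}

def convert(model_desc, mapping=None):
    # Fall back to the default table only when the caller does not pass one.
    mapping = DEFAULT_MAPPING if mapping is None else mapping
    return {name: mapping.get(kind, kind) for name, kind in model_desc.items()}

# The custom mapping stays local to this call; no global registration needed.
print(convert({"fc": "Linear"}, mapping={"Linear": "MyQuantizedLinear"}))
```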
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D24317436
fbshipit-source-id: 81569b88f05eeeaa9595447e482a12827aeb961f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46227
Follow-up from https://github.com/pytorch/pytorch/issues/45419: in this PR I've removed as many PyCFunction casts as I could from the codebase.
The only ones I didn't remove were the ones with `METH_VARARGS | METH_KEYWORDS`, which take 3 parameters instead of 2 and had to be cast. Example:
`{"copy_", (PyCFunction)(void(*)(void))THPStorage_(copy_), METH_VARARGS | METH_KEYWORDS, nullptr},`
ghstack-source-id: 114632704
Test Plan: waitforbuildbot
Reviewed By: albanD
Differential Revision: D24269435
fbshipit-source-id: 025cfd43a9a2a3e59f6b2951c1a78749193d77cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46219
- Refactor StaticRuntime and group common data structures, the jit graph, and the script module into a separate struct `InferenceModule`:
```
struct InferenceModule {
  explicit InferenceModule(const torch::jit::Module& m);
  explicit InferenceModule(std::shared_ptr<torch::jit::Graph> g);
  torch::jit::Module module;
  std::shared_ptr<torch::jit::Graph> graph;
  std::unique_ptr<c10::FunctionSchema> schema;
  std::unordered_map<Value*, size_t> value_to_reg;
  std::vector<size_t> input_regs;  // inputs to the graph
  std::vector<size_t> output_regs; // outputs of the graph
  std::vector<size_t> internals;
};
```
which is stored in the PyTorchPredictor, as well as the static runtime, and shared across threads. Then this is what's left inside the Static Runtime:
```
mutable std::vector<IValue> reg_;
// The nodes we need to run
std::vector<ProcessedNode> nodes_;
```
`reg_` holds all the weights and activations, which differ across threads at runtime. `nodes_` holds the op nodes and input/output registers, and is the same across threads for now. We could potentially put other stateful data structures in it, so I kept it inside the static runtime. It could easily be moved into the `InferenceModule` if we decide not to put anything else into `ProcessedNode`.
- Added StaticRuntimeOptions so we can toggle certain optimizations on/off, for testing and benchmarking. `cleanup_activations` is an example.
- Integration with PyTorchPredictor. Added a lockfree stack in the PyTorchPredictor to hold all the static runtime instances. Benchmark shows that the `push` and `pop` combo takes about 80 ns, which is quite acceptable.
This diff focuses on threading model only. Benchmarks will be separate.
Reviewed By: bwasti
Differential Revision: D24237078
fbshipit-source-id: fd0d6347f02b4526ac17dec1f731db48424bade1
Summary:
Simplifies some parts of build.sh and removes old references in the code to non-existent trusty images.
There are other parts of the code where trusty is referenced for travis (most of them in third party directories) and I did not touch those. https://github.com/pytorch/pytorch/search?q=trusty
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46594
Reviewed By: seemethere
Differential Revision: D24426796
Pulled By: janeyx99
fbshipit-source-id: 428c52893d2d35c1ddd1fd2e65a4b6575f260492
Summary:
This diff changes `TensorExprKernel::generateStmt` to use flattened loops instead of flattened tensors.
Checked all tests on CPU as well as CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46539
Reviewed By: nickgg
Differential Revision: D24395956
Pulled By: navahgar
fbshipit-source-id: f3792903f2069bda37b571c9f0a840e6fb02f189
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: 23cb1db72b
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46578
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: YazhiGao
Differential Revision: D24415308
fbshipit-source-id: c353dcf86cfd833a571a509930a17d09277a73e4
Summary:
Instead of installing Open MPI for build and test jobs with environment *-xenial-cuda*, install Open MPI into the relevant Docker images. This would save time and remove duplication in our scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46569
Reviewed By: walterddr
Differential Revision: D24409534
Pulled By: janeyx99
fbshipit-source-id: 6152f2f5daf63744d907dd234bc12d2a5ec58f3d