Summary:
Make the following changes:
- When there are more than 10k errors, cuda-memcheck only shows the first 10k; in this case we shouldn't raise an Exception
- Add an UNDER_CUDA_MEMCHECK environment variable to allow disabling `pin_memory` tests when running cuda-memcheck.
- Add a `--ci` command option; when turned on, the script writes its output to stdout instead of a file, and exits with an error if cuda-memcheck fails
- Add a `--nohang` command option; when turned on, a hang is treated as a pass instead of an error
- Do simple filtering on the tests to run: skip a test if `'cpu'` is in its name but `'cuda'` is not
- Add `--split` and `--rank` to allow splitting the work (NVIDIA CI has a 3-hour limit, so the work has to be split to satisfy it)
- The error summary could be `ERROR SUMMARY: 1 error` or `ERROR SUMMARY: 2 errors`; the tail could be `error` or `errors`, so the line is not always the same length. The script is fixed to handle both cases (see the sketch after this list).
- Ignore errors from `cufft`
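For the summary-line fix, a minimal sketch of parsing that tolerates both the singular and plural forms (the regex and names here are illustrative, not the script's actual code):
```
import re

# Match cuda-memcheck's summary line, which ends in "error" or "errors"
# depending on the count.
SUMMARY_RE = re.compile(r'ERROR SUMMARY: (\d+) errors?')

def parse_error_count(line):
    match = SUMMARY_RE.search(line)
    return int(match.group(1)) if match else None

assert parse_error_count('========= ERROR SUMMARY: 1 error') == 1
assert parse_error_count('========= ERROR SUMMARY: 2 errors') == 2
```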
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29243
Differential Revision: D18941701
Pulled By: mruberry
fbshipit-source-id: 2048428f32b66ef50c67444c03ce4dd9491179d2
Summary:
Tests for unique_dim will be refactored in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31211
Differential Revision: D19034968
Pulled By: ngimel
fbshipit-source-id: 855d326b37638b5944f11fbbce03394cf000daf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30892
Fixes all outstanding lints and actually installs a properly configured
flake8
Test Plan: Imported from OSS
Differential Revision: D18862825
Pulled By: suo
fbshipit-source-id: 08e9083338a7309272e17bb803feaa42e348aa85
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30826
Previously the scalar_check for the reduction='none' case was
`input.dim() <= 1`, but it should be target based, i.e.
`target.dim() == 0`. This follows from the "correct cases", i.e.:
(N, C) X (N,) -> (N,)
(C,) X () -> ()
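A quick illustration of these shapes, assuming the op in question is nll_loss (the PR title isn't included in this log):
```
import torch
import torch.nn.functional as F

inp = torch.randn(4, 3)           # (N, C)
tgt = torch.tensor([0, 2, 1, 0])  # (N,)
print(F.nll_loss(inp, tgt, reduction='none').shape)    # torch.Size([4])

inp0 = torch.randn(3)             # (C,)
tgt0 = torch.tensor(1)            # ()
print(F.nll_loss(inp0, tgt0, reduction='none').shape)  # torch.Size([])
```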
Test Plan: Imported from OSS
Differential Revision: D18833660
Pulled By: gchanan
fbshipit-source-id: 26338b842a8311718c4b89da3e2f1b726d5409b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30768
The behavior didn't match the documentation, because the documentation (for 'none' reduction) reads:
input X target -> output
(N, C) X (N, C) -> (N,)
(C,) X (C,) -> ()
but the latter case would output (1,). This also changes the case to:
() X (C,) -> ()
from:
() X (C,) -> (C,)
which makes more sense with the above formulas.
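As a hedged illustration of the documented shapes: multilabel_margin_loss is one loss whose 'none'-reduction shapes match the formulas above (the exact op this PR touches is an assumption, since the title isn't in this log):
```
import torch
import torch.nn.functional as F

inp = torch.randn(4, 3)                    # (N, C)
tgt = torch.zeros(4, 3, dtype=torch.long)  # (N, C)
print(F.multilabel_margin_loss(inp, tgt, reduction='none').shape)    # torch.Size([4])

inp1 = torch.randn(3)                      # (C,)
tgt1 = torch.zeros(3, dtype=torch.long)    # (C,)
print(F.multilabel_margin_loss(inp1, tgt1, reduction='none').shape)  # torch.Size([])
```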
Restacked version of: https://github.com/pytorch/pytorch/pull/30748
Test Plan: Imported from OSS
Differential Revision: D18821554
Pulled By: gchanan
fbshipit-source-id: 3df77c51cf25648cb5fab62a68b09f49c91dab4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30670
Also turn off scalar_check for grad_input: it isn't necessary because the input can't be 0-dimensional.
Test Plan: Imported from OSS
Differential Revision: D18784523
Pulled By: gchanan
fbshipit-source-id: 246d30970457075a0403dd0089317659a2cd2dd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30669
The inputs can't be 0-d, so we don't need that check in the scalar_check.
Test Plan: Imported from OSS
Differential Revision: D18784524
Pulled By: gchanan
fbshipit-source-id: d44222dffc91880a6e8c7be69e6e146e60040d43
Summary:
With the CI failure caused by 8bbafa0b32 fixed (the lambdas in the CUDA kernels had an incorrect return type)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30521
Differential Revision: D18770151
Pulled By: ailzhang
fbshipit-source-id: 02f0fe1d5718c34d24da6dbb5884ee8b247ce39a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30527
When we introduced dtype.is_signed we allowed support for
quantized types, but we're not sure what the correct result should be.
See discussion at https://github.com/pytorch/pytorch/pull/29511
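For reference, is_signed on ordinary (non-quantized) dtypes behaves as expected:
```
import torch

# For quantized dtypes the correct answer was still under discussion
# in the linked PR.
print(torch.int8.is_signed)     # True
print(torch.uint8.is_signed)    # False
print(torch.float32.is_signed)  # True
```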
Test Plan: Imported from OSS
Differential Revision: D18765410
Pulled By: nairbv
fbshipit-source-id: c87cfe999b604cfcbbafa561e04d0d5cdbf41e6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30434
These are all pointwise ops that are implemented correctly with respect to shapes in THC.
Test Plan: Imported from OSS
Differential Revision: D18699087
Pulled By: gchanan
fbshipit-source-id: 82cb91b00c77bfaca75be497c87fc7ae52daf46c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29953
The underlying function handles it correctly.
Test Plan: Imported from OSS
Differential Revision: D18548055
Pulled By: gchanan
fbshipit-source-id: cc2d0ae37d9689423363d115c6a653cb64840528
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29952
The underlying op handles the check correctly.
Test Plan: Imported from OSS
Differential Revision: D18548048
Pulled By: gchanan
fbshipit-source-id: 9ac6fde743408e59ccdfc61bd574ebe6e2862238
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29923
Note that this changes the behavior of masked_select when both "self" and "mask" are 0-dimensional.
In previous versions of PyTorch, this would return a 0-dimensional tensor. But the documentation reads:
"Returns a new 1-D tensor which indexes the input tensor according to the boolean mask mask which is a BoolTensor."
Test Plan: Imported from OSS
Differential Revision: D18539560
Pulled By: gchanan
fbshipit-source-id: 1637ed2c434fcf8ceead0073aa610581f4a19d21
Summary:
Migrate index_add on CPU from TH to ATen.
I couldn't find a replacement for get1d and set1d, so the pointer arithmetic is done in place.
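For reference, the user-visible semantics of the migrated op are unchanged:
```
import torch

# index_add_ accumulates rows of `src` into `x` at the positions
# given by `index`.
x = torch.zeros(5, 3)
index = torch.tensor([0, 4, 2])
src = torch.ones(3, 3)
x.index_add_(0, index, src)
print(x)
```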
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28421
Test Plan: existing tests
Differential Revision: D18060971
Pulled By: ggoossen
fbshipit-source-id: 413719990cdb2fe578964cde14e93577e48a4342
Summary:
This adds backend support (i.e., bgemm) for gemm-style matrix multiplications with bf16 data and output to PyTorch on ROCm.
It also enables the operators that depend on bgemm.
With this change, bf16 matrices on ROCm can be multiplied on the GPU.
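A minimal sketch of what this enables (it assumes a ROCm build with bf16 support; PyTorch exposes ROCm GPUs under the 'cuda' device name):
```
import torch

# Multiply two bfloat16 matrices on the GPU.
a = torch.randn(4, 4, dtype=torch.bfloat16, device='cuda')
b = torch.randn(4, 4, dtype=torch.bfloat16, device='cuda')
print(torch.mm(a, b))
```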
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27719
Differential Revision: D18653514
Pulled By: bddppq
fbshipit-source-id: 805db923579bec6fc8fd1c51eeb5b1ef85a96758
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28347
gchanan, I am generating a warning as follows:
```
(torch_new) prasun@prasun-xps:~/dev/explore-array-computing$ python arange_test.py
Trying 45...
Before arange shape is torch.Size([1, 45])
After arange shape is torch.Size([1, 45])
Trying 46...
Before arange shape is torch.Size([1, 46])
After arange shape is torch.Size([1, 46])
Trying 47...
Before arange shape is torch.Size([1, 47])
After arange shape is torch.Size([1, 47])
Trying 48...
Before arange shape is torch.Size([1, 48])
After arange shape is torch.Size([1, 48])
Trying 49...
Before arange shape is torch.Size([1, 49])
../aten/src/ATen/native/RangeFactories.cpp:163: UserWarning: Size of out Tensor does not match the result Tensor. The output Tensor will be resized!
After arange shape is torch.Size([50])
Traceback (most recent call last):
File "arange_test.py", line 10, in <module>
assert len(line.shape) == 2
AssertionError
```
Is this alright?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29195
Differential Revision: D18638184
Pulled By: ezyang
fbshipit-source-id: a93e4ce615b5a315570f9951021ef74fc1d895a6
Summary:
Enabled basic support for bfloat16 on CUDA.
Tested via unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27259
Differential Revision: D17728661
Pulled By: izdeby
fbshipit-source-id: 99efb6bc4aec029fe6bbc8a68963dca9c9dc5810
Summary:
Stacked PRs
* https://github.com/pytorch/pytorch/issues/29244 - Use custom CRC
* **https://github.com/pytorch/pytorch/issues/29232 - Add zipfile serialization**
This adds a serialization method that uses a zipfile (https://github.com/pytorch/pytorch/issues/26567). Right now it is
guarded behind the `_use_new_zipfile_serialization` flag. In release mode its performance seems about the same as, or slightly better than, the current serialization in some simple benchmarks on large and small tensors.
Follow ups:
* Flip the `_use_new_zipfile_serialization` flag
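Opting in looks like this while the flag is still off by default:
```
import torch

# Save with the new zipfile-based format, then load it back.
torch.save({'weight': torch.randn(3)}, 'checkpoint.pt',
           _use_new_zipfile_serialization=True)
state = torch.load('checkpoint.pt')
```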
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29232
Differential Revision: D18332036
Pulled By: driazati
fbshipit-source-id: 1bac0847c4d599612cba905f2cac8248783be2f4