Commit Graph

1169 Commits

ashishfarmer
bcdff7eb67 Fix for tests on ROCm (#37616)
Summary:
This pull request fixes and re-enables two of the tests disabled in https://github.com/pytorch/pytorch/issues/37427
1. `test_sparse_add_out_bfloat16` in test_sparse.py is fixed to use the updated `atol` argument instead of `prec` for `assertEqual`.
2. The conversion of `flt_min` to `int64` diverges on HIP compared to numpy, so the change removes that conversion from the `test_float_to_int_conversion_finite` test case in test_torch.py.

cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37616

Differential Revision: D21379876

Pulled By: ezyang

fbshipit-source-id: 2bfb41d67874383a01330c5d540ee516b3b07dcc
2020-05-04 07:16:54 -07:00
Pavel Belevich
b1790794f6 Enforce Tensor.random_ check that from and to are in tensor dtype bounds (#37507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37507

Replace `TORCH_WARN` with `TORCH_CHECK` if `Tensor.random_()`'s `from` or `to-1` is out of bounds for the tensor's dtype. The previous warning said "This warning will become an error in version 1.6 release, please fix the code in advance", so the time has come.

Related to #33106
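
For illustration, a minimal sketch of the enforced behavior on a `uint8` tensor (whose dtype can only hold values in [0, 255]); the exact error text may differ:

```python
import torch

t = torch.empty(4, dtype=torch.uint8)
t.random_(0, 256)        # fine: to - 1 == 255 still fits in uint8

try:
    t.random_(0, 1000)   # to - 1 == 999 is out of bounds for uint8
except RuntimeError as e:
    print(e)             # previously only a warning; now a hard error
```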

Test Plan: Imported from OSS

Differential Revision: D21349413

Pulled By: pbelevich

fbshipit-source-id: ac7c196a48fc58634611e427e65429a948119e40
2020-05-01 12:58:45 -07:00
anjali411
1f09f7ea44 Python API for Complex Storage and storage copy logic (#35771)
Summary:
Following up on https://github.com/pytorch/pytorch/pull/35851: cross-dtype storage copy is not used internally, so I have not included cross-dtype copy for complex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35771

Differential Revision: D21319650

Pulled By: anjali411

fbshipit-source-id: 07c72996ee598eba0cf401ad61534494d6f5b5b3
2020-05-01 11:47:22 -07:00
kshitij12345
22708be5af Migrate tan from TH to ATen (CUDA) (#36906)
Summary:
Closes https://github.com/pytorch/pytorch/issues/24641

Benchmark with same build settings on same system.
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti

```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.tan(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.tan(a); torch.cuda.synchronize()',
                              setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                              number=t))
```

Before:

```
torch.tan(a) a.numel() == 10000 for 20000 times torch.half
0.28325206200003095
torch.tan(a) a.numel() == 10000 for 20000 times torch.float
0.28363607099998944
torch.tan(a) a.numel() == 10000 for 20000 times torch.double
0.43924326799998425
torch.tan(a) a.numel() == 100000 for 20000 times torch.half
0.3754699589999859
torch.tan(a) a.numel() == 100000 for 20000 times torch.float
0.38143782899999223
torch.tan(a) a.numel() == 100000 for 20000 times torch.double
1.7672172019999834
```

After:

```
torch.tan(a) a.numel() == 10000 for 20000 times torch.half
0.28982524599996395
torch.tan(a) a.numel() == 10000 for 20000 times torch.float
0.29121579000002384
torch.tan(a) a.numel() == 10000 for 20000 times torch.double
0.4599610559998837
torch.tan(a) a.numel() == 100000 for 20000 times torch.half
0.3557764019997194
torch.tan(a) a.numel() == 100000 for 20000 times torch.float
0.34793807599999127
torch.tan(a) a.numel() == 100000 for 20000 times torch.double
1.7564662459999454
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36906

Differential Revision: D21335320

Pulled By: VitalyFedyunin

fbshipit-source-id: efab9c175c60fb09223105380d48b93a81994fb0
2020-05-01 10:17:19 -07:00
Hong Xu
cd48fb5030 Vectorize linspace on CPU. (#27957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27957

Benchmark (gcc 8.3, Debian Buster, turbo off, Release build, Intel(R) Xeon(R) E-2136):

```python
import timeit
for dtype in ('torch.double', 'torch.float', 'torch.uint8', 'torch.int8', 'torch.int16', 'torch.int32', 'torch.int64'):
    for n, t in [(40_000, 50000),
                (400_000, 5000)]:
        print(f'torch.linspace(0, 10, {n}, dtype={dtype}) for {t} times')
        print(timeit.timeit(f'torch.linspace(0, 10, {n}, dtype={dtype})', setup=f'import torch', number=t))
```

Before:

```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.3964195849839598
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
1.2374563289922662
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.8631796519621275
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
1.6991038109990768
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.8358083459897898
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.7214750979910605
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.8356257299892604
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.706238206999842
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.7463878280250356
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
1.6172360889613628
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.8656846070080064
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
1.714238062966615
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
1.8272205490502529
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
1.6409171230043285
```

After:

```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.0077099470072426
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.8227124120458029
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.0058343949494883
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.8376779520185664
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.903041019977536
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.7576498500420712
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.7628699769848026
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.6204477970022708
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
2.0970272019621916
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
1.9493417189805768
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
2.29020385700278
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
2.1212510910118
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.3479344319785014
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
2.156775983981788
```

Test Plan: Imported from OSS

Differential Revision: D20773454

Pulled By: VitalyFedyunin

fbshipit-source-id: ebeef59a90edde581669cc2afcc3d65929c8ac79
2020-04-30 14:26:24 -07:00
kshitij12345
7e9cc4df85 Migrate cos and cos_ from TH to ATen (CUDA) (#36653)
Summary:
Benchmark with same build settings on same system.

Closes https://github.com/pytorch/pytorch/issues/24545
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti

```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.cos(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.cos(a); torch.cuda.synchronize()',
                             setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                             number=t))
```

Before:

```
torch.cos(a) a.numel() == 10000 for 20000 times torch.half
0.2797315450006863
torch.cos(a) a.numel() == 10000 for 20000 times torch.float
0.283109110998339
torch.cos(a) a.numel() == 10000 for 20000 times torch.double
0.3648525129974587
torch.cos(a) a.numel() == 100000 for 20000 times torch.half
0.34239949499897193
torch.cos(a) a.numel() == 100000 for 20000 times torch.float
0.33680364199972246
torch.cos(a) a.numel() == 100000 for 20000 times torch.double
1.0512770260102116
```

After:

```
torch.cos(a) a.numel() == 10000 for 20000 times torch.half
0.285825898999974
torch.cos(a) a.numel() == 10000 for 20000 times torch.float
0.2781305120001889
torch.cos(a) a.numel() == 10000 for 20000 times torch.double
0.34188826099989456
torch.cos(a) a.numel() == 100000 for 20000 times torch.half
0.29040409300023384
torch.cos(a) a.numel() == 100000 for 20000 times torch.float
0.28678944200009937
torch.cos(a) a.numel() == 100000 for 20000 times torch.double
1.065477349000048
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36653

Differential Revision: D21164675

Pulled By: VitalyFedyunin

fbshipit-source-id: 5dd5d3af47c2a5527e1f4ab7669c2ed9a2293cee
2020-04-29 15:52:24 -07:00
Jesse Brizzi
bca82801e7 add support for generating Vandermonde matrices (#36725)
Summary:
Adds support for generating Vandermonde matrices based on the NumPy implementation found [here](https://github.com/numpy/numpy/blob/v1.17.0/numpy/lib/twodim_base.py#L475-L563).

Adds a test to ensure the generated matrix matches the NumPy implementation. Note that the tests are limited to torch.long and torch.double due to differences in how PyTorch and NumPy handle type promotion.
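
A short usage sketch, assuming the new op is exposed as `torch.vander` and mirrors `numpy.vander`'s `N`/`increasing` arguments:

```python
import torch
import numpy as np

x = torch.tensor([1, 2, 3, 5], dtype=torch.long)
print(torch.vander(x, N=3, increasing=True))
# Should match the NumPy reference implementation:
print(np.vander(x.numpy(), N=3, increasing=True))
```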
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36725

Differential Revision: D21075138

Pulled By: jessebrizzi

fbshipit-source-id: 6bb1559e8247945714469b0e2b07c6f4d5fd1fd0
2020-04-29 13:16:26 -07:00
Nikita Shulga
1bb66a0cd4 Extend some of the basic ops to kHalf (#37121)
Summary:
Added enough operators to make sure that all unit tests from ATen/basic are passing, except for MM and IntArrayRefExpansion
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37121

Test Plan: `./bin/basic --gtest_filter=BasicTest.BasicTestHalfCPU` + `python -c "import torch; x = torch.tensor([2], dtype=torch.half); print(torch.isfinite(x+x))"`

Differential Revision: D21296863

Pulled By: malfet

fbshipit-source-id: e03d7a6939df11f611a9b317543bac52403cd009
2020-04-29 10:49:16 -07:00
ashishfarmer
bbd2350c99 Disable tests failing on test2 in ROCm CI (#37427)
Summary:
This pull request disables the unit tests that were observed to be failing once `test2` was enabled. These tests will be looked at and fixed one by one as soon as possible, but until then they are disabled to unblock `test2`.
The pull request also disables fftPlanDestroy for rocFFT to avoid double-freeing FFT handles

cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37427

Differential Revision: D21302909

Pulled By: ezyang

fbshipit-source-id: ecadda3778e65b7f4f97e24b932b96b9ce928616
2020-04-29 09:56:28 -07:00
Pavel Belevich
ec8517b6df Move exponential_() to DistributionTemplates (#37456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37456

Fixes #37370

Test Plan: Imported from OSS

Differential Revision: D21290781

Pulled By: pbelevich

fbshipit-source-id: 2f516b5112b9ce1c9ba8967b3758decf86d65676
2020-04-29 08:07:35 -07:00
Pavel Belevich
06168bf17d Move geometric_() to DistributionTemplates (#37418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37418

Fixes #37369

Test Plan: Imported from OSS

Differential Revision: D21290757

Pulled By: pbelevich

fbshipit-source-id: 42133f35edcbe716a07987bef2e68a4cdc27236a
2020-04-29 08:07:30 -07:00
Pavel Belevich
ce6077d7a8 Move log_normal_() to DistributionTemplates (#37392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37392

Fixes #37368

Test Plan: Imported from OSS

Differential Revision: D21290740

Pulled By: pbelevich

fbshipit-source-id: 15a76b2625d2ca8187c25333a86eecd111a259c6
2020-04-29 08:06:05 -07:00
kshitij12345
4e3dc34c47 add complex support to reciprocal_cuda kernel (#36749)
Summary:
dylanbespalko anjali411

Not sure if the test should be added to `test_torch` or `test_complex`.
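
For reference, a minimal sketch of what the kernel enables (requires a CUDA device; printed formatting may vary):

```python
import torch

z = torch.tensor([1 + 1j, 2j], device='cuda')   # complex64 by default
print(torch.reciprocal(z))                       # ~ tensor([0.5-0.5j, 0.0-0.5j], device='cuda:0')
```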
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36749

Differential Revision: D21290529

Pulled By: anjali411

fbshipit-source-id: 07bc282e4c9480cd015ec5db104e79728437cd90
2020-04-28 21:51:46 -07:00
Emilio Castillo
273c464145 Fix TensorIterator::view_offsets_ size (#37214)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37084

There are three alternatives for this design; this PR implements the first one.

When a tensor is a scalar (`ndim==0`), accessing `view_offsets_[0]` during reductions yields an invalid offset for the index that is the output of `argmax` and `argmin`.

fba9b9a023/aten/src/ATen/native/cpu/Reduce.h (L217)

This also happens in cuda code:
fba9b9a023/aten/src/ATen/native/cuda/Reduce.cuh (L797)

The second alternative is to check the size of `view_offsets` before accessing it. But this introduces some burden.

The third alternative is related to the way that inputs are treated in `argmax` and `argmin`
depending on the `dim` argument value.

fba9b9a023/aten/src/ATen/native/ReduceOps.cpp (L775-L780)

If `dim` is not specified, then the scalar gets reshaped into a 1-dim tensor and everything works properly, since now `view_offsets` has an actual entry.
If `dim` is specified, the input remains a scalar, causing the issue we see here.

This PR tries to solve it in a generic way for every case, so I went with option 1. I am willing to discuss it and change the approach if you think one of the other alternatives is better.
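
For reference, a minimal repro sketch of the failure mode in https://github.com/pytorch/pytorch/issues/37084 (a 0-dim tensor reduced with an explicit `dim`):

```python
import torch

t = torch.tensor(5.0)     # 0-dim (scalar) tensor, t.ndim == 0
print(t.argmax())         # dim omitted: the scalar is reshaped to 1-d internally -> tensor(0)
print(t.argmax(dim=0))    # dim given: the input stays 0-dim; this path read an invalid view offset
```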
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37214

Differential Revision: D21258320

Pulled By: ngimel

fbshipit-source-id: 46223412187bbba4bfa7337e3f1d2518db72dea2
2020-04-28 18:08:51 -07:00
anjali411
b8ec165c0d Fix failing test in test_torch.py (#37362)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37362

Differential Revision: D21264829

Pulled By: anjali411

fbshipit-source-id: cec6af84630378f03cb3863c85e161776af236cd
2020-04-27 16:42:11 -07:00
Mike Ruberry
b64fc3c4b5 Changes warnings generated in cpp to show point of Python origination (#36052)
Summary:
Today in PyTorch, warnings triggered in C++ are printed to Python users like this:

`../aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.`

This may be unhelpful to Python users, who have complained it's difficult to relate these messages back to their programs. After this PR, warnings that go through the PyWarningHandler and allow it to add context print like this:

```
test/test_torch.py:16463: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead. (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:81.)
  cpu_result = getattr(cpu_tensor, op_str)(*cpu_args)
```

This relates the warning back to the user's program. The information about the cpp file and line number is preserved in the body of the warning message.

Some warnings, like those generated in the JIT, already account for a user's Python context, and so they specify that they should be printed verbatim and are unaffected by this change. Warnings originating in Python and warnings that go through c10's warning handler, which prints to cerr, are also unaffected.

A test is added to test_torch.py for this behavior. The test relies on uint8 indexing being deprecated and its warning originating from its current header file, which is an unfortunate dependency. We could implement a `torch.warn` function, instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36052

Differential Revision: D20887740

Pulled By: mruberry

fbshipit-source-id: d3515c6658a387acb7fccaf83f23dbb452f02847
2020-04-25 21:18:58 -07:00
Xiang Gao
d7f7c290e3 addmv migration [resubmit] (#37236)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37236

Differential Revision: D21232988

Pulled By: anjali411

fbshipit-source-id: ac6c0ee018aef3c841b039d76e6e1fbb3cd0292d
2020-04-25 07:43:27 -07:00
anjali411
4f3946a89b Added complex dtypes to get_all_math_dtypes, complex acc type for cpu, fixed rdiv and pow for complex (#37193)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/36730 https://github.com/pytorch/pytorch/issues/36057
Partially resolves: https://github.com/pytorch/pytorch/issues/36671
```
>>> 2j / torch.tensor([4], dtype = torch.complex64)
tensor([(0.0000+0.5000j)], dtype=torch.complex64)
>>> 1 / torch.tensor(3+4j)
tensor((0.1200-0.1600j), dtype=torch.complex64)
```
rdiv is more generally broken for all dtypes because it doesn't promote the types properly
eg.
```
>>> 1 / torch.tensor(2)
tensor(0)
>>> 2j / torch.tensor(4)
tensor(0)
```
so that issue should be fixed in a separate PR

Adding CPU acc types for complex
Added cumsum, cumprod for complex dtypes

Added complex dtypes to get_all_math_dtypes to expand testing for complex dtypes

Old PR - https://github.com/pytorch/pytorch/pull/36747
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37193

Differential Revision: D21229373

Pulled By: anjali411

fbshipit-source-id: 8a086136d8c10dabe62358d276331e3f22bb2342
2020-04-24 15:05:50 -07:00
Alexander Fix
2baff9476e Test test_is_nonzero make expected exception inline (#37128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37128

In certain build modes (in fbcode, building a .par) the mechanism to get test output "expect" files doesn't work.
All other tests in test_torch.py already had assertExpectedInline instead of assertExpected, with the expected result inline in the file.
There was no equivalent for assertExpectedRaises, so I added one, and changed the tests for test_is_nonzero (the only test using this)
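
A hedged sketch of the resulting inline-expect pattern (the helper and import path are PyTorch's internal test utilities; exact names and signatures may differ across versions):

```python
import torch
from torch.testing._internal.common_utils import TestCase, run_tests

class TestIsNonzeroInline(TestCase):
    def test_is_nonzero_empty(self):
        # the expected exception text lives inline in the test file
        # instead of a separate .expect file
        self.assertExpectedRaisesInline(
            RuntimeError,
            lambda: torch.tensor([]).is_nonzero(),
            """Boolean value of Tensor with no values is ambiguous""",
        )

if __name__ == "__main__":
    run_tests()
```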

Test Plan: CI, specifically the test test_is_nonzero should pass

Reviewed By: malfet

Differential Revision: D21197651

fbshipit-source-id: 2a07079efdcf1f0b0abe60e92cadcf55d81d4b13
2020-04-24 13:12:31 -07:00
moto
5a27ec09b8 Add Inverse Short Time Fourier Transform in ATen native (#35569)
Summary:
Ported `torchaudio`'s implementation (tests and documentation as well) to ATen.

Note
 - Batch packing/unpacking is performed in Python. The ATen implementation expects a 4D input tensor.
 - `hop_length` is initialized the same way as in the `stft` implementation. [Torchaudio's version tried to mimic the same behavior but is slightly different](7da61a4bee/torchaudio/functional.py (L152-L157)).

Closes https://github.com/pytorch/pytorch/issues/34827
Relates https://github.com/pytorch/pytorch/issues/3775
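
A round-trip usage sketch (written against the current `torch.stft`/`torch.istft` signatures, which may differ slightly from the API at the time of this commit):

```python
import torch

x = torch.randn(1, 16000)                 # batch of one 1-second signal at 16 kHz
window = torch.hann_window(400)
spec = torch.stft(x, n_fft=400, hop_length=160, window=window, return_complex=True)
y = torch.istft(spec, n_fft=400, hop_length=160, window=window, length=x.shape[-1])
print(torch.allclose(x, y, atol=1e-4))    # True: istft reconstructs the signal up to numerical error
```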
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35569

Differential Revision: D21178090

Pulled By: mthrok

fbshipit-source-id: 2701a8b241a36a6fb1b740c2fb2b07cb938185d4
2020-04-24 12:14:55 -07:00
kshitij12345
e98cdfa26f Migrate tanh from TH to ATen (CUDA) (#36995)
Summary:
Closes https://github.com/pytorch/pytorch/issues/24642

Benchmark with same build settings on same system.
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti

```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.tanh(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.tanh(a); torch.cuda.synchronize()',
                              setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                              number=t))
```

Before:

```
torch.tanh(a) a.numel() == 10000 for 20000 times torch.half
0.2816318240002147
torch.tanh(a) a.numel() == 10000 for 20000 times torch.float
0.2728829070001666
torch.tanh(a) a.numel() == 10000 for 20000 times torch.double
0.39797203200214426
torch.tanh(a) a.numel() == 100000 for 20000 times torch.half
0.3228214350019698
torch.tanh(a) a.numel() == 100000 for 20000 times torch.float
0.31780802399953245
torch.tanh(a) a.numel() == 100000 for 20000 times torch.double
1.3745740449994628
```

After:

```
torch.tanh(a) a.numel() == 10000 for 20000 times torch.half
0.27825374500025646
torch.tanh(a) a.numel() == 10000 for 20000 times torch.float
0.27764024499992956
torch.tanh(a) a.numel() == 10000 for 20000 times torch.double
0.3771585260001302
torch.tanh(a) a.numel() == 100000 for 20000 times torch.half
0.2995866400015075
torch.tanh(a) a.numel() == 100000 for 20000 times torch.float
0.28355561699936516
torch.tanh(a) a.numel() == 100000 for 20000 times torch.double
1.393811182002537
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36995

Differential Revision: D21163353

Pulled By: ngimel

fbshipit-source-id: e2216ff62cdfdd13b6a56daa63d4ef1440d991d4
2020-04-23 12:29:27 -07:00
Taylor Robie
7aec364bdf extend gather shape check to handle incorrectly sized outputs (#37102)
Summary:
Fixes a safety issue (Nonsense values and segfaults) introduced by https://github.com/pytorch/pytorch/pull/36875 when in-place gather tries to use incorrect shapes.

Consider the following block of code:
```python
import torch

k0 = 8
k1 = 8
m = 100

x = torch.rand((k0, k1))
ind = torch.randint(0, k0, (m, k1))
output = torch.empty((m, k1))

print(torch.gather(x, 0, ind, out=output))
print(torch.gather(x, 1, ind, out=output))
```

The first gather is legal, the second is not (`ind` and `output` would need to be transposed). Previously this was caught when the kernel tried to restride inputs for TensorIterator, but we can no longer rely on those checks and must test explicitly. If `m` is small, the second gather returns gibberish; if it is large enough to push the read out of the memory block, the program segfaults.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37102

Differential Revision: D21190580

Pulled By: robieta

fbshipit-source-id: 80175620d24ad3380d78995f7ec7dbf2627d2998
2020-04-23 11:47:01 -07:00
Anjali Chourdia
c306f2ed08 Revert D20660338: [pytorch][PR] Migrate addmv and mv from legacy to ATen native (CUDA & CPU)
Test Plan: revert-hammer

Differential Revision:
D20660338

Original commit changeset: db1f521f1241

fbshipit-source-id: 8616ddd7bbd8f00351cfc45331a09b0bc9aa28ea
2020-04-23 10:46:45 -07:00
Gao, Xiang
a38c6e0454 Migrate addmv and mv from legacy to ATen native (CUDA & CPU) (#30898)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/24605 https://github.com/pytorch/pytorch/issues/24535 https://github.com/pytorch/pytorch/issues/24739 https://github.com/pytorch/pytorch/issues/24680 https://github.com/pytorch/pytorch/issues/30986

This does not fix https://github.com/pytorch/pytorch/issues/29984, it will be fixed in later PR.

Most of this PR just follows the same logic as TH and THC, except for the handling of n-dimensional zero-sized tensors, specifically the case:
```
(m,).addmv((m, 0), (0,), beta, alpha)
```

# Legacy code bugs and how this PR deals with them

The above is a case where BLAS semantics often mismatch PyTorch's: for BLAS and cuBLAS, the above is a no-op, but for PyTorch it is a scalar-vector multiplication `output = beta * input`. The handling of this case is already very poor in the legacy code, and it is poorly tested:

For the CPU implementation, there are two code paths:
- Path 1: when dtype is float or double and `USE_BLAS`, then use BLAS
- Path 2: when other dtypes or not `USE_BLAS`, use a fallback kernel in PyTorch

For the CUDA implementation, there are also two code paths:
- Path 1: when float or double, then use `cublasSgemv` or `cublasDgemv` in cuBlas
- Path 2: when half, dispatch to `addmm`

`test_blas_alpha_beta_empty` is supposed to cover all cases, but unfortunately it only tests Path 1 of CUDA and Path 1 of CPU, and both uncovered paths (Path 2 for CPU and Path 2 for CUDA) are buggy in the legacy code. In this PR, I expanded the coverage of `test_blas_alpha_beta_empty`, but unfortunately I had to skip the `half` dtype on CUDA 9. See the description below for details:

## Bug in the CPU implementation

For the CPU implementation, the fallback kernel in Path 2 already has the same semantics as PyTorch, not BLAS. But the code that tries to correct BLAS semantics to match PyTorch also runs on this case, leading to a double correction: `output = beta * input` becomes `output = beta * beta * input`.

This leads to the issue https://github.com/pytorch/pytorch/issues/30986 I just opened, and it is fixed in this PR.

## Bug in the CUDA implementation

For the CUDA implementation, path 2 dispatches to
```
(m, 1).addmm((m, 0), (0, 1), beta, alpha)
```
But unfortunately, on some old CUDA versions with old GPUs and the half dtype, the above is also a no-op, which is definitely not correct.

But from what I see, on newer CUDA versions or newer GPUs this is not a problem. This is a PyTorch bug in `addmm`, so I opened a new issue https://github.com/pytorch/pytorch/issues/31006 to track it. It is most likely a dependency bug originating from cuBLAS, and it only affects a rarely used edge case on old hardware and software, so it would be a `won't_fix` unless real requirements strongly indicate that it should be fixed.

This issue already exists in the legacy code, and this PR does not make it worse. To prevent it from bothering us, I disabled the test of the `half` dtype for CUDA 9 when expanding the coverage of `test_blas_alpha_beta_empty`.

I promoted a CircleCI CUDA 10.1 test to `XImportant` so that it runs on PRs, because Path 2 of the CUDA implementation is only covered by this configuration. Let me know if I should revert this change.

## An additional problem

In the legacy `addmv` code, the `bfloat16` dtype is enabled and dispatched to `addmm`, but from what I tested `addmm` does not support `bfloat16`. I do the same thing in the new code. Let me know if I should do it differently.

# Benchmark

Code:
```python
import torch
print(torch.__version__)

for i in range(1000):
    torch.arange(i, device='cuda')

print('cpu')
for i in 10, 100, 1000, 10000:
    a = torch.randn((i,))
    b = torch.randn((i, i))
    c = torch.randn((i,))
    %timeit a.addmv(b, c, alpha=1, beta=2)

print('cuda')
for i in 10, 100, 1000, 10000:
    a = torch.randn((i,)).cuda()
    b = torch.randn((i, i)).cuda()
    c = torch.randn((i,)).cuda()
    torch.cuda.synchronize()
    %timeit a.addmv(b, c, alpha=1, beta=2); torch.cuda.synchronize()
```

Before:
```
1.5.0a0+2b45368
cpu
2.74 µs ± 30.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
8.5 µs ± 85.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
686 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
74 ms ± 410 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
cuda
The slowest run took 4.81 times longer than the fastest. This could mean that an intermediate result is being cached.
27.6 µs ± 23 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
17.3 µs ± 151 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
20.5 µs ± 369 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
756 µs ± 6.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

After:
```
1.5.0a0+66b4034
cpu
3.29 µs ± 20 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
9.09 µs ± 7.41 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
687 µs ± 7.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
73.8 ms ± 453 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
cuda
18.2 µs ± 478 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
17.7 µs ± 299 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
21.5 µs ± 2.38 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
751 µs ± 35.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30898

Differential Revision: D20660338

Pulled By: anjali411

fbshipit-source-id: db1f521f124198f63545064026f93fcb16b68f18
2020-04-23 06:56:49 -07:00
Alexander Fix
b889e0da8a [torch] Excluding test_fft_input_modification without MKL (#36680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36680

If torch is compiled without MKL, this test fails because torch.fft requires MKL support.

Test Plan: CI

Reviewed By: malfet

Differential Revision: D21051362

fbshipit-source-id: dd2e2c7d323622c1c25fc4c817b85d83d2241b3a
2020-04-22 21:58:02 -07:00
Ailing Zhang
efcbcca454 Revert D21138687: [pytorch][PR] Added complex dtypes to get_all_math_dtypes, complex acc type for cpu, fixed rdiv and pow for complex
Test Plan: revert-hammer

Differential Revision:
D21138687

Original commit changeset: ad3602ccf86c

fbshipit-source-id: 69eb031c1a7c3d5e4b9f4241fbdada8d5980535d
2020-04-22 14:49:45 -07:00
Emilio Castillo
5fc391a646 Enforce type promotion in torch.cat (#35030)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35014

The CUDA `cat` implementation doesn't use `TensorIterator`, so some checks need to be done manually in the code.
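
For reference, a small sketch of the promotion being enforced (shown on CPU for portability; the fix targets the CUDA path, so substitute `device='cuda'` to exercise it):

```python
import torch

a = torch.ones(2, dtype=torch.float32)
b = torch.ones(2, dtype=torch.float64)
print(torch.cat([a, b]).dtype)   # torch.float64: inputs are promoted to a common dtype
```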
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35030

Differential Revision: D21155853

Pulled By: nairbv

fbshipit-source-id: 9e78bb7591f806734e12555831157061c925ff40
2020-04-22 13:35:07 -07:00
kshitij12345
a00d6758b8 Migrate cosh and cosh_ from TH to ATen (CUDA) (#36654)
Summary:
Closes https://github.com/pytorch/pytorch/issues/24546

Benchmark with same build settings on same system.
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti

```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.cosh(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.cosh(a); torch.cuda.synchronize()',
                              setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                              number=t))
```

Before:

```
torch.cosh(a) a.numel() == 10000 for 20000 times torch.half
0.2813017509997735
torch.cosh(a) a.numel() == 10000 for 20000 times torch.float
0.28355878599904827
torch.cosh(a) a.numel() == 10000 for 20000 times torch.double
0.27810572300040803
torch.cosh(a) a.numel() == 100000 for 20000 times torch.half
0.3239932899996347
torch.cosh(a) a.numel() == 100000 for 20000 times torch.float
0.321233343998756
torch.cosh(a) a.numel() == 100000 for 20000 times torch.double
0.5546665399997437
```

After:

```
torch.cosh(a) a.numel() == 10000 for 20000 times torch.half
0.2905335750001541
torch.cosh(a) a.numel() == 10000 for 20000 times torch.float
0.27596429500044906
torch.cosh(a) a.numel() == 10000 for 20000 times torch.double
0.30358699899989006
torch.cosh(a) a.numel() == 100000 for 20000 times torch.half
0.30139567500009434
torch.cosh(a) a.numel() == 100000 for 20000 times torch.float
0.30246640400036995
torch.cosh(a) a.numel() == 100000 for 20000 times torch.double
0.5403946970000106

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36654

Differential Revision: D21164606

Pulled By: VitalyFedyunin

fbshipit-source-id: 55e88f94044957f81599ae3c12cda38a3e2c985c
2020-04-22 10:16:24 -07:00
David Reiss
e75fb4356b Remove (most) Python 2 support from Python code (#35615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615

Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).

Test Plan: CI

Differential Revision: D20842886

Pulled By: dreiss

fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
2020-04-22 09:23:14 -07:00
anjali411
25eb250d77 Added complex dtypes to get_all_math_dtypes, complex acc type for cpu, fixed rdiv and pow for complex (#36747)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/36730 https://github.com/pytorch/pytorch/issues/36057
Partially resolves: https://github.com/pytorch/pytorch/issues/36671
```
>>> 2j / torch.tensor([4], dtype = torch.complex64)
tensor([(0.0000+0.5000j)], dtype=torch.complex64)
>>> 1 / torch.tensor(3+4j)
tensor((0.1200-0.1600j), dtype=torch.complex64)
```
rdiv is more generally broken for all dtypes because it doesn't promote the types properly
eg.
```
>>> 1 / torch.tensor(2)
tensor(0)
>>> 2j / torch.tensor(4)
tensor(0)
```
so that issue should be fixed in a separate PR

Adding CPU acc types for complex
Added cumsum, cumprod for complex dtypes

Added complex dtypes to get_all_math_dtypes to expand testing for complex dtypes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36747

Differential Revision: D21138687

Pulled By: anjali411

fbshipit-source-id: ad3602ccf86c70294a6e71e564cb0d46c393dfab
2020-04-22 08:52:41 -07:00
Mike Ruberry
4a2372bc90 Implements torch.isclose for complex tensors (#36456)
Summary:
Previously torch.isclose would raise a RuntimeError when called on complex tensors. This update makes torch.isclose run on complex tensors and be consistent with [NumPy](https://numpy.org/doc/1.18/reference/generated/numpy.isclose.html). However, NumPy's handling of NaN, -inf, and inf values is odd, so I adopted Python's [cmath.isclose](https://docs.python.org/3/library/cmath.html) behavior when dealing with them. See https://github.com/numpy/numpy/issues/15959 for more on NumPy's behavior.

While implementing complex isclose I also simplified the isclose algorithm to:

- A is close to B if A and B are equal, if equal_nan is true then NaN is equal to NaN
- If A and B are finite, then A is close to B if `abs(a - b) <= (atol + abs(rtol * b))`

This PR also documents torch.isclose, since it was undocumented, and adds multiple tests for its behavior to test_torch.py since it had no dedicated tests.

The PR leaves equal_nan=True with complex inputs an error for now, pending the outcome of https://github.com/numpy/numpy/issues/15959.
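
A short sketch of the resulting complex behavior (default `rtol=1e-5`, `atol=1e-8`; closeness is evaluated on the magnitude of the complex difference):

```python
import torch

a = torch.tensor([1.0 + 1.0j, 2.0 + 2.0j])
b = torch.tensor([1.0 + 1.0j, 2.0 + 2.000001j])
print(torch.isclose(a, b))   # tensor([True, True]): abs(a - b) <= atol + abs(rtol * b) elementwise
```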
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36456

Differential Revision: D21159853

Pulled By: mruberry

fbshipit-source-id: fb18fa7048e6104cc24f5ce308fdfb0ba5e4bb30
2020-04-21 19:53:55 -07:00
Mike Ruberry
a850d8a526 Fixes exponential with lambda=0 (#36837)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/36798.

In the future more thorough testing would be nice.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36837

Differential Revision: D21102342

Pulled By: mruberry

fbshipit-source-id: 4fae45677e54b403296033720dfb13abca47f3a4
2020-04-21 17:34:07 -07:00
Jesse Brizzi
28f439d4f4 add absolute alias for abs (#36597)
Summary:
Adds an `absolute` alias for the `abs` function to match NumPy's support of both:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.absolute.html

Adds a test to ensure the outputs from `abs` and `absolute` are the same.
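
A tiny usage sketch of the alias:

```python
import torch

x = torch.tensor([-1.5, 0.0, 2.0])
print(torch.equal(torch.abs(x), torch.absolute(x)))   # True: absolute is an alias of abs
```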
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36597

Differential Revision: D21024458

Pulled By: jessebrizzi

fbshipit-source-id: 4f2987e7bc7cde444d0a93e833a0350844b48d44
2020-04-20 14:49:51 -07:00
Mike Ruberry
0f0d69009e Makes CUDA -float->uint8 cast consistent with CPU (#36832)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/36807. Also updates the cast testing to catch issues like this better.

In the future a more constexpr based approach to casting would be nice.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36832

Differential Revision: D21120822

Pulled By: mruberry

fbshipit-source-id: 9504ddd36cfe6d9f9f545fc277fef36855c1b221
2020-04-19 23:33:38 -07:00
Natalia Gimelshein
1b3741aa7f [WIP] reenable bfloat16 masked_select (#36859)
Summary:
Try reenabling bfloat16 masked_select and see if the Windows tests pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36859

Differential Revision: D21109535

Pulled By: ngimel

fbshipit-source-id: ca260943e6575d8e788e9fd87161a0d40d3d44fb
2020-04-19 15:41:32 -07:00
Brian Vaughan
54ed6fd3ee Use both absolute and relative tolerance in testing (#34258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34258

This PR allows both atol and rtol to be specified, uses defaults based on the prior analysis (spreadsheet attached to https://github.com/pytorch/pytorch/pull/32538), but retains the absolute tolerance behavior in cases where precision was previously specified explicitly.
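
For reference, a sketch of the combined check this enables, in the same form used by `torch.isclose` (illustrative values only):

```python
import torch

actual = torch.tensor([1.0000, 2.0001])
expected = torch.tensor([1.0000, 2.0000])
atol, rtol = 1e-5, 1e-4
# close when |actual - expected| <= atol + rtol * |expected|, elementwise
print((actual - expected).abs() <= atol + rtol * expected.abs())   # tensor([True, True])
```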

Test Plan: Imported from OSS

Differential Revision: D21110255

Pulled By: nairbv

fbshipit-source-id: 57b3a004c7d5ac1be80ee765f03668b1b13f4a7e
2020-04-19 06:16:49 -07:00
Xiang Gao
6ba734bae9 Vectorize reduction when reducing on fastest striding dimension [resubmit] (#36873)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36873

Differential Revision: D21109194

Pulled By: ngimel

fbshipit-source-id: eb18c6b4394f19a6c5eca45ef4ce97d623e051bd
2020-04-18 16:27:00 -07:00
Yuxin Wu
a64ea8ea04 Back out "Vectorize reduction when reducing on fastest striding dimension" (#36854)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36854

Original commit changeset: ea3f7f29709c

Test Plan: n/a

Differential Revision: D21103684

fbshipit-source-id: e4862b32bf9815486e5fa7e05b9816550e9b0263
2020-04-17 19:53:30 -07:00
Xiang Gao
d92005ff73 Vectorize reduction when reducing on fastest striding dimension (#36709)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36709

Test Plan: Imported from OSS

Differential Revision: D21083393

Pulled By: ngimel

fbshipit-source-id: ea3f7f29709c9a6e5b3ec45ba809cb2cf6c5e0c8
2020-04-17 10:12:49 -07:00
Mike Ruberry
d7fabfd5df Implements complex isfinite and isinf (#36648)
Summary:
Implements complex isfinite and isinf, consistent with NumPy.

A complex value is finite if and only if both its real and imaginary part are finite.

A complex value is infinite if and only if its real or imaginary part are infinite.

Old isfinite, isinf, and isnan tests are modernized: instead of fixtures, the torch results are compared with NumPy. A new test is added for complex isfinite, isinf, and isnan. The docs for each function are updated to clarify what finite, infinite, and NaN values are.

The new tests rely on a new helper, _np_compare, that we'll likely want to generalize in the near future and use in more tests.

Addresses part of the complex support tasks. See https://github.com/pytorch/pytorch/issues/33152.
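
A small sketch of these rules on a complex tensor:

```python
import torch

z = torch.tensor([complex(1, 2), complex(float('inf'), 0), complex(0, float('nan'))])
print(torch.isfinite(z))   # tensor([ True, False, False]): finite iff both parts are finite
print(torch.isinf(z))      # tensor([False,  True, False]): infinite iff either part is infinite
print(torch.isnan(z))      # tensor([False, False,  True]): NaN iff either part is NaN
```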
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36648

Differential Revision: D21054766

Pulled By: mruberry

fbshipit-source-id: d947707c5437385775c82f4e6c722349ca5a2174
2020-04-16 09:09:02 -07:00
anjali411
9e016f77a8 Added complex types to get_all_dtypes and turned on masked_fill for complex (#36335)
Summary:
1. Added complex dtypes to get_all_dtypes to unify testing of complex dtypes with the other dtypes, so that complex behavior doesn't get out of sync with what is supported for other dtypes.
2. resolves https://github.com/pytorch/pytorch/issues/36322, https://github.com/pytorch/pytorch/issues/36327
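
For reference, a minimal sketch of the newly enabled `masked_fill` on a complex tensor:

```python
import torch

x = torch.zeros(3, dtype=torch.complex64)
mask = torch.tensor([True, False, True])
x.masked_fill_(mask, 1 + 2j)
print(x)   # tensor([1.+2.j, 0.+0.j, 1.+2.j])
```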
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36335

Differential Revision: D21045603

Pulled By: anjali411

fbshipit-source-id: 5089306b66fdc18148e831f56298da5de673be67
2020-04-16 08:24:45 -07:00
lixinyu
1e7155caa5 Bucketization (#7284) (#34577)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34577

Test Plan: Imported from OSS

Differential Revision: D20380975

Pulled By: glaringlee

fbshipit-source-id: d75939bc54d98675f88d7037491a8420ac20847a
2020-04-15 10:32:51 -07:00
Vasiliy Kuznetsov
16e90eba59 hardsigmoid: add cuda kernels (#36351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36351

Adds CUDA kernels for hardsigmoid, to enable its use in training.

Note: the update to the cpu backward pass is to keep the cpu vs cuda
logic consistent, no change in functionality.
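
A minimal training-style sketch (requires a CUDA device; uses the functional form `torch.nn.functional.hardsigmoid`):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, device='cuda', requires_grad=True)
y = F.hardsigmoid(x)    # forward now runs a dedicated CUDA kernel
y.sum().backward()      # backward runs on CUDA as well, enabling training
print(x.grad)
```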

Test Plan:
add CI for the forward pass
run this for the backward pass:
https://gist.github.com/vkuzo/95957d365600f9ad10d25bd20f58cc1a

Imported from OSS

Differential Revision: D20955589

fbshipit-source-id: dc198aa6a58e1a7996e1831f1e479c398ffcbc90
2020-04-15 10:15:49 -07:00
xiaobingsuper
1a0b95e7e4 bfloat16: enable basic math function (#35172)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35172

Test Plan: Imported from OSS

Differential Revision: D20721146

Pulled By: ngimel

fbshipit-source-id: 25b2176d0a431706c51a7086e0642aff814d7148
2020-04-14 17:18:21 -07:00
Kurt Mohler
ce3555a635 Relanding masked_select cuda port from TH to ATen (#36539)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33054
Relanding PR https://github.com/pytorch/pytorch/issues/35429
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36539

Differential Revision: D21007226

Pulled By: ngimel

fbshipit-source-id: 3c66ad073ff8e767ad120bc94120379d40346018
2020-04-14 14:03:59 -07:00
Natalia Gimelshein
f3f640d479 move test_abs to device-generic tests (#36465)
Summary:
Per title. test_abs used to be marked as a slow test and run on CPU only. Conceptually similar tests are done in TestTorchMathOps, so it's a matter of adding an `abs` test there. Two remaining checks (correct abs for large-valued long tensors, and correct abs for signed zeros) are factored into separate tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36465

Differential Revision: D21000248

Pulled By: ngimel

fbshipit-source-id: 8bc8b0da936b1c10fe016ff2f0dbb5ea428e7e61
2020-04-14 09:48:08 -07:00
Wanchao Liang
3526627f46 Use unittest assertWarns instead (#36411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36411

This PR removes the PyTorch-specific assertWarns definition and uses the unittest one; it also formats some tests.
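
For reference, a minimal sketch of the standard unittest pattern the tests now rely on:

```python
import unittest
import warnings

class Example(unittest.TestCase):
    def test_warns(self):
        # unittest's built-in context manager replaces the custom helper
        with self.assertWarns(UserWarning):
            warnings.warn("deprecated behavior", UserWarning)

if __name__ == "__main__":
    unittest.main()
```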

Test Plan: Imported from OSS

Differential Revision: D20998159

Pulled By: wanchaol

fbshipit-source-id: 1280ecff2dd293b95a639d13cc7417fc819c2201
2020-04-13 15:56:42 -07:00
Kurt Mohler
2bc49a4b85 block_diag dense (#33449)
Summary:
Add block_diag function for dense tensors, based on scipy.linalg.block_diag

Closes https://github.com/pytorch/pytorch/issues/31932
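
A short usage sketch (mirroring `scipy.linalg.block_diag`):

```python
import torch

A = torch.tensor([[1, 2], [3, 4]])
B = torch.tensor([[5]])
print(torch.block_diag(A, B))
# tensor([[1, 2, 0],
#         [3, 4, 0],
#         [0, 0, 5]])
```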
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33449

Differential Revision: D20943099

Pulled By: zou3519

fbshipit-source-id: 8b5c9476fb5af959aafa4169612c660396d9b717
2020-04-13 10:04:55 -07:00
Max Balandat
379e4d9cad [pytorch] Make behavior of SobolEngine consistent w/ other RNG functions (#36427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36427

Addresses https://github.com/pytorch/pytorch/issues/36341

Test Plan: unit tests

Reviewed By: ldworkin

Differential Revision: D20952703

fbshipit-source-id: 28055f4c4c0f8012c2d96e473b822fa455dd833c
2020-04-13 07:53:33 -07:00
Mike Ruberry
b92f8d9b7e Revert D20950587: [pytorch][PR] Added complex types to get_all_dtypes and turned on masked_fill for complex
Test Plan: revert-hammer

Differential Revision:
D20950587

Original commit changeset: ba7c372a28f0

fbshipit-source-id: 487ac59a971b1ecefd20fd446385ba12334d9695
2020-04-12 21:33:17 -07:00