pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Edward Yang	e362ee6f8a	Make it illegal to directly construct _TensorBase (#56150 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56150 See #56017 for full context; the short story is that by making it illegal to directly construct _TensorBase, we need only write a single tp_dealloc function which will work universally for all _TensorBase subclasses, rather than having to write two versions, one for _TensorBase itself, and others for Python subclasses of _TensorBase. This means simpler code. The subtlety here is that we only install our custom `tp_new` for direct subclasses of TensorBase. This is important, because overriding the `tp_new` also overrides any user defined constructor. Fortunately class Tensor(_TensorBase) has no nontrivial constructors and doesn't mind, but other subclasses like Parameter definitely mind! Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D28028746 Pulled By: ezyang fbshipit-source-id: 3c03a14666ad1ded1145fe676afb0a7623cdb9bb	2021-04-28 09:25:25 -07:00
Arindam Roy	5d7e48c9fc	Disable one test in rocm (#56951 ) Summary: The test seems to be failing in ROCM 4.1 on CI node. Disabling the same for now. The test will be re-enabled for ROCM when CI transitions to 4.2. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56951 Reviewed By: zou3519 Differential Revision: D28059808 Pulled By: ezyang fbshipit-source-id: a9b064b7525ae6dce89c51fe29ff07f37b7ac796	2021-04-28 08:58:51 -07:00
Yukio Siraichi	cf17fd6dd5	Fix multinomial CUDA misalignment and non-deterministic behavior (#55364 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/46702 - fails on probability distribution with odd items - trying to access an `acc_type` (`float`) in a `scalar_t` (`float16`) aligned memory - produce unrepeatable result for large input tensor - parallel cumsum not monotonic at some positions ### Fixes - computing cumsum on `acc_type` (`float`) instead of using `scalar_t` (`float16`) fixed both issues - the non-monotonic behavior may happen even using `float`, though - in these cases, deterministic behavior may be achieved by eliminating the race condition when writing the result, using the atomic function `atomicMax` Pull Request resolved: https://github.com/pytorch/pytorch/pull/55364 Reviewed By: mruberry Differential Revision: D28031666 Pulled By: ngimel fbshipit-source-id: 0fc6289e0b9ea2d31ef3771e7ca370de8f5c02de	2021-04-27 12:04:32 -07:00
Yu Guo	f5c24cc891	add deterministic path for index_copy_cpu (#56900 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56900 use serial copy with iter.serial_for_each in the deterministic mode Test Plan: buck test mode/opt //caffe2/test:torch -- test_index_copy_deterministic ✓ Pass: caffe2/test:torch - test_index_copy_deterministic_cpu (test_torch.TestTorchDeviceTypeCPU) (5.581) buck test mode/opt //caffe2/test:torch_cuda -- test_nondeterministic_alert_index_copy ✓ ListingSuccess: caffe2/test:torch_cuda - main (11.565) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_index_copy_cuda_float64 (test_torch.TestTorchDeviceTypeCUDA) (29.172) ✓ Pass: caffe2/test:torch_cuda - main (29.172) Reviewed By: ngimel Differential Revision: D27992992 fbshipit-source-id: cebeefd8508553f9dbc4145819fe90dd625502f3	2021-04-26 16:57:47 -07:00
Yu Guo	72c3ee073f	add deterministic path for index_add_cuda (#56521 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56521 index_add_cuda is non-deterministic due to cuda atomicAdd. Here we add a deterministic code path with index_put(accumulate=True) Test Plan: buck test mode/opt //caffe2/test:torch_cuda -- test_index_add_deterministic ✓ ListingSuccess: caffe2/test:torch_cuda - main (12.289) ✓ Pass: caffe2/test:torch_cuda - test_index_add_deterministic_cuda (test_torch.TestTorchDeviceTypeCUDA) (27.190) ✓ Pass: caffe2/test:torch_cuda - main (27.190) Summary Pass: 2 ListingSuccess: 1 buck test mode/opt //caffe2/test:torch_cuda -- test_nondeterministic_alert ✓ ListingSuccess: caffe2/test:torch_cuda - main (16.088) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReflectionPad1d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_kthvalue_cuda_float64 (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReplicationPad1d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_bincount_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_index_put_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_EmbeddingBag_max_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_MaxPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AdaptiveAvgPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_histc_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_linear_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AdaptiveMaxPool2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_FractionalMaxPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AdaptiveAvgPool2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_NLLLoss_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_put_accumulate_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_grid_sample_2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_put_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_trilinear_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_bicubic_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReflectionPad2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_scatter_add_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AvgPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_grid_sample_3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_CTCLoss_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_FractionalMaxPool2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReplicationPad3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_index_copy_cuda_float64 (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReplicationPad2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_median_cuda_float64 (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_gather_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_bilinear_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654) ✓ Pass: caffe2/test:torch_cuda - main (37.654) Summary Pass: 32 ListingSuccess: 1 Reviewed By: ngimel Differential Revision: D27861072 fbshipit-source-id: c33731017b863751f3e3068a23135129c555b66f	2021-04-26 12:14:58 -07:00
kshitij12345	9eee14704a	OpInfo: roll and rot90 (#56770 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/54261 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56770 Reviewed By: ngimel Differential Revision: D27987820 Pulled By: mruberry fbshipit-source-id: c6b86cdc1b89d91eeda2215020137582e7c20c65	2021-04-25 22:12:38 -07:00
kshitij12345	9e027d7ea3	[OpInfo] Add opinfo for `transpose` and its aliases (#56122 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/54261 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56122 Reviewed By: ezyang Differential Revision: D27962878 Pulled By: mruberry fbshipit-source-id: cfd84bb0dcedeb98233a10e2c9754281f7cb76af	2021-04-25 21:58:16 -07:00
kshitij12345	298db67220	[OpInfo] Add Function Variant and Opinfo for permute (#56125 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/54261 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56125 Reviewed By: ezyang Differential Revision: D27960312 Pulled By: mruberry fbshipit-source-id: b9dd89f7e69d7dff29f3b53828656c13df898fa5	2021-04-25 21:26:44 -07:00
Kurt Mohler	1f04494c0e	Consolidate nondeterministic error tests (#55631 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/51498 Pull Request resolved: https://github.com/pytorch/pytorch/pull/55631 Reviewed By: malfet Differential Revision: D27909953 Pulled By: mruberry fbshipit-source-id: 9115b2433f9c276555be55bd51b270a7a2846829	2021-04-22 23:37:01 -07:00
Sam Estep	75024e228c	Add lint for unqualified `type: ignore` (#56290 ) Summary: The other half of https://github.com/pytorch/pytorch/issues/56272. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290 Test Plan: CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed: - https://github.com/pytorch/pytorch/runs/2384511062 - https://github.com/pytorch/pytorch/actions/runs/765036024 Reviewed By: seemethere Differential Revision: D27867219 Pulled By: samestep fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235	2021-04-21 08:07:23 -07:00
Brandon Lin	d806b06167	Support int32 indices in torch.repeat_interleave (#55102 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55102 To avoid casting a tensor to `.long()`, we introduce support for int32 in `torch.repeat_interleave`. Reviewed By: ezyang Differential Revision: D27478235 fbshipit-source-id: 08b4cce65fe94ff10535ddc07e1ba2bacea6a2cf	2021-04-19 09:07:25 -07:00
Winston Smith	36b476ccdd	Added OpInfos for eq, ne, ge, gt, le, and lt (#55709 ) Summary: A https://github.com/pytorch/pytorch/issues/54261 task Added OpInfos for `eq`, `ne`, `ge`, `gt`, `le`, and `lt`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/55709 Reviewed By: jbschlosser Differential Revision: D27760382 Pulled By: mruberry fbshipit-source-id: 30d8c9633c69a097c1e4a9daf4178c617c0a9093	2021-04-17 22:52:47 -07:00
Victor Bittorf	52f1a07b63	Python API for Vitals (#53238 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53238 There is a tension for the Vitals design: (1) we want a macro based logging API for C++ and (2) we want a clean python API. Furthermore, we want to this to work with "print on destruction" semantics. The unfortunate resolution is that there are (2) ways to define vitals: (1) Use the macros for local use only within C++ - this keeps the semantics people enjoy (2) For vitals to be used through either C++ or Python, we use a global VitalsAPI object. Both these go to the same place for the user: printing to stdout as the globals are destructed. The long history on this diff shows many different ways to try to avoid having 2 different paths... we tried weak pointers & shared pointers, verbose switch cases, etc. Ultimately each ran into an ugly trade-off and this cuts the difference better the alternatives. Test Plan: buck test mode/dev caffe2/test:torch -- --regex vital buck test //caffe2/aten:vitals Reviewed By: orionr Differential Revision: D26736443 fbshipit-source-id: ccab464224913edd07c1e8532093f673cdcb789f	2021-04-15 16:06:43 -07:00
Nikita Shulga	6daa1760d7	Skip geqrf test if compiled without LAPACK (#56105 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/55929 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56105 Reviewed By: walterddr Differential Revision: D27785443 Pulled By: malfet fbshipit-source-id: 9701f693a71f77259c0a6371106e7185cc49a803	2021-04-15 08:07:51 -07:00
Yu Guo	8596ac186b	deterministic code path for gather_backward for dim = 1 (#55573 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55573 provide a deterministic code path for gather_backward when dim = 1 Test Plan: buck test //caffe2/test:torch -- test_gather_backward ✓ Pass: caffe2/test:torch - test_gather_backward_one_dim (test_torch.TestTorch) (1.099) ✓ Pass: caffe2/test:torch - test_gather_backward_deterministic_path (test_torch.TestTorch) (1.166) test on GPU buck test mode/opt //caffe2/test:torch_cuda -- test_gather_backward_deterministic Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/1407375070421778 ✓ ListingSuccess: caffe2/test:torch_cuda - main (7.484) ✓ Pass: caffe2/test:torch_cuda - test_gather_backward_deterministic_path_cuda (test_torch.TestTorchDeviceTypeCUDA) (26.145) ✓ Pass: caffe2/test:torch_cuda - main (26.145) Summary Pass: 2 ListingSuccess: 1 Reviewed By: ngimel Differential Revision: D27632008 fbshipit-source-id: ec27475332a3b36360cc014193256c21cba77d63	2021-04-13 15:18:00 -07:00
Kurt Mohler	5a45b1b2f2	Add nondeterministic alert for `index_put_` when `accumulate=False` (#55827 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/55516 Pull Request resolved: https://github.com/pytorch/pytorch/pull/55827 Reviewed By: yinghai Differential Revision: D27725794 Pulled By: ngimel fbshipit-source-id: f6b5b3e635170524fdb5a0141ebd27925c37e8d9	2021-04-13 14:28:16 -07:00
Winston Smith	aceceb3d5c	Reland #50999 (Added pow() on CPU for float16 & bfloat16) (#55280 ) Summary: #### Reason for relanding Line 1607 of `torch/testing/_internal/common_methods_invocations.py` of https://github.com/pytorch/pytorch/issues/50999 had `dtype` instead of `dtype=torch.bool`, so 4 of the 9 sample inputs for `bool` had incorrect dtype. This bug was caught by https://github.com/pytorch/pytorch/issues/54949. 1. Added support for pow() on CPU for `float16` (`Half`) and `bfloat16` types. Both `pow(Tensor, Scalar)` and `pow(Tensor, Tensor)` are now supported for the aforementioned types. However autograd isn't supported for `Float16` on CPU yet, as `log_vml_cpu` can't be enabled for it. 2. heitorschueroff added `pow_tensor_scalar_optimized_kernel` to refactor & simplify `PowKernel.cpp`. It provides a common path for all the complex types & floating point types (except Float16, due to lack of complete AVX2 vectorization support for it). It replaced code that had previously been duplicated for (float, double) and complex types, so PowKernel.cpp looks a lot cleaner now. 3. Enabled (unskipped) some tests for `erf`, `erfc`,`erfinv`, `tan` and `linalg.vector.norm` which were being skipped earlier due to `pow()` not having been implemented for `float16` & `bfloat16`. 4. Added an OpInfo for `pow()` & enabled some test cases for `pow()`. 5. Extended the coverage of existing tests for `pow` in `test_binary_ufuncs.py` in order to enable comparison with `numpy`, even with discontiguous tensors, and added a test to ensure that a runtime error is raised for `pow`'s inplace variant if resizing the base tensor is required during its invocation. 6. Added `float16` & `bfloat16` to `square`'s dtype lists in its `UnaryUfuncInfo`. 7. Removed redundant `dtypesIfCPU` and `dtypesIfCUDA` from `OpInfo`s where they are equal to `dtypes`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/55280 Reviewed By: jbschlosser Differential Revision: D27591772 Pulled By: heitorschueroff fbshipit-source-id: c7420811b32595bb3353149a61e54a73f2eb352b	2021-04-13 13:23:29 -07:00
albanD	505f6f325f	port addcdiv to opinfo (#55518 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55518 Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D27649411 Pulled By: albanD fbshipit-source-id: cfb0a235d94ef62589acbeb9bf11d2ea17248484	2021-04-13 06:21:10 -07:00
albanD	9ccae89102	port addcmul to OpInfo (#55517 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55517 Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D27649413 Pulled By: albanD fbshipit-source-id: e1faf25cf7f9c3636f62db1512aee78fd7c4f9b6	2021-04-13 06:19:33 -07:00
Wenlei Xie	561b507843	Eliminate device guard in generic dispatch key kernel wrappers (#55131 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55131 Benchmark `zeros_out`: ```python from torch.utils.benchmark import Timer counts = Timer( stmt="""at::zeros_out(t, {1});""", setup="auto t = at::empty({1});", language="cpp", ).collect_callgrind(number=1_000) print(counts) ``` With device guard: ``` <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f834f095ca0> at::zeros_out(t, {1}); setup: auto t = at::empty({1}); All Noisy symbols removed Instructions: 1396022 1396022 Baseline: 0 0 1000 runs per measurement, 1 thread ``` Without device guard: ``` <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f25e48927c0> at::zeros_out(t, {1}); setup: auto t = at::empty({1}); All Noisy symbols removed Instructions: 1296022 1296022 Baseline: 0 0 1000 runs per measurement, 1 thread ``` We see about `7.7%` improvement. ghstack-source-id: 126295368 Test Plan: ``` buck build //caffe2/aten/... buck test mode/dev mode/no-gpu //caffe2/test:torch -- 'caffe2/test:torch - test_msnpu_error (test_torch.TestTorch)' ``` Reviewed By: ezyang Differential Revision: D27496584 fbshipit-source-id: 97f783a809b77b28f77a93096d69b3da9ee69df7	2021-04-12 15:42:19 -07:00
Mike Ruberry	399b66c813	Ports logdet from method_tests() to op_db (#55743 ) Summary: Per title. Also updates some tensor construction helpers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/55743 Reviewed By: ngimel Differential Revision: D27702060 Pulled By: mruberry fbshipit-source-id: f64b7bee855733ad1f4fd182819ceec5831d9878	2021-04-11 20:39:16 -07:00
Yukio Siraichi	93bf0ae6fc	Remove legacy constructor calls from pytorch codebase. (#54142 ) Summary: Follow up from https://github.com/pytorch/pytorch/issues/53889 Related to https://github.com/pytorch/pytorch/issues/47112 Removing every occurrence of the legacy constructor call present in PyTorch at: - _docs_ - _benchmarks_ - _test_ - _caffe2_ - _CONTRIBUTING.md_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/54142 Reviewed By: ngimel Differential Revision: D27699450 Pulled By: mruberry fbshipit-source-id: 530aa3f5746cc8bc1407d5d51b2bbd8075e30546	2021-04-11 15:45:17 -07:00
Nikita Shulga	add49e7e4e	Enforce PEP263 for PyTorch python codebase (#55346 ) Summary: All python files containing non-ASCII characters should be correctly annotated with `# -- coding: utf-8 --` comment Delete number of superfluous UTF-8 characters, most commonly UTF-8 opening closing quotation mark U+2019 (’) instead of ascii apostrophe ', for example `Module’s`->`Module's` Pull Request resolved: https://github.com/pytorch/pytorch/pull/55346 Reviewed By: samestep Differential Revision: D27582044 Pulled By: malfet fbshipit-source-id: c1cd89655915858ff3a41f675cdfffff795a8e44	2021-04-06 18:31:38 -07:00
lezcano	fd02fc5d71	Port put_ and take from TH to ATen (#53356 ) Summary: The two ports were don together, as they can be implemented with the same kernel. In TH, they were already implemented with the same kernel. Resolves https://github.com/pytorch/pytorch/issues/24751 Resolves https://github.com/pytorch/pytorch/issues/24614 Resolves https://github.com/pytorch/pytorch/issues/24640 Resolves https://github.com/pytorch/pytorch/issues/24772 This port makes sure that it interacts correctly with the "deterministic algorithms" flag, as done in https://github.com/pytorch/pytorch/pull/51388 This PR also makes these two functions correct in the following aspects (all of them added to the tests as well): - Support for complex numbers - Correct handling of scalar inputs and zero-dimensional inputs - Implementation that does not do any copies nor sorting of any of the input tensors - Faster and more correct implementation of the backwards (now it works as it should when `source.shape() != index.shape()`) - Now `put_(..., accumulate=True)` is implemented correctly with atomic operations on GPU / CPU (when possible) and is deterministic (modulo the loss of precision that might happen due to the reordering of a sum of floats) - Adds the `torch.put` function that was missing, (`index_put` exists, for example) - Corrected docs It also adds a much more thorough testing to the operations and their gradients. There is a BC-breaking change, and that is that now we check that the inputs do not overlap in the `put_` operation. This was handled (some of the cases, other cases were wrong) in the TH implementation by making contiguous copies of the inputs. How should we handle this one? Edit. Benchmarks: <details> <summary>Script</summary> ```python from IPython import get_ipython import torch from itertools import product torch.manual_seed(13) torch.set_num_threads(1) ipython = get_ipython() cpu = torch.device('cpu') cuda = torch.device('cuda') def run_test(ndims, size, index_len, device, cmd): print(f"cmd: {cmd}, ndims: {ndims}, tensor_size: {size}, index_len: {index_len}, device: {device}") large_tensor = torch.rand(([size] ndims), device=device) small_tensor = torch.rand((index_len,), device=device) index = torch.randint(size * ndims, (index_len,), dtype=torch.long, device=device) if cmd == "put": command = "large_tensor.put_(index, small_tensor, accumulate=False)" if device == cuda: command += "; torch.cuda.synchronize()" elif cmd == "accumulate": command = "large_tensor.put_(index, small_tensor, accumulate=True)" if device == cuda: command += "; torch.cuda.synchronize()" elif cmd == "take": command = "torch.take(large_tensor, index)" if device == cuda: command += "; torch.cuda.synchronize()" ipython.magic(f"timeit {command}") print() for method, device in product(["accumulate", "put", "take"], [cpu, cuda]): run_test(3, 1000, 10, device, method) run_test(3, 1000, 1000, device, method) run_test(3, 1000, 10000, device, method) run_test(2, 10000, 100000, device, method) ``` </details> ```python put_(accumulate=False) ``` <details> <summary>ATen CPU (1.5x - 2x speedup)</summary> ```python cmd: put, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu 1.05 µs ± 2.35 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) cmd: put, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu 3.15 µs ± 5.13 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: put, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu 21.6 µs ± 13.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cmd: put, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu 238 µs ± 781 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) ``` </details> <details> <summary>TH CPU</summary> ```python cmd: put, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu 722 ns ± 2.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) cmd: put, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu 4.89 µs ± 18.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: put, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu 42.5 µs ± 96.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cmd: put, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu 428 µs ± 774 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) ``` </details> <details> <summary>ATen GPU (same speed)</summary> ```python cmd: put, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda 8.99 µs ± 16 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: put, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda 10.4 µs ± 24.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: put, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda 10.4 µs ± 11.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: put, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda 15.6 µs ± 1.12 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) ``` </details> <details> <summary>TH GPU</summary> ```python cmd: put, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda 8.44 µs ± 31.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: put, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda 9.09 µs ± 4.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: put, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda 9.77 µs ± 0.998 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: put, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda 15.8 µs ± 5.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) ``` </details> ```python put_(accumulate=True) ``` <details> <summary>ATen CPU (x2 speedup)</summary> ```python cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu 1.12 µs ± 2.91 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu 3.14 µs ± 2.05 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu 20.8 µs ± 25.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cmd: accumulate, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu 264 µs ± 263 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) ``` </details> <details> <summary>TH CPU</summary> ```python cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu 814 ns ± 1.87 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu 5.11 µs ± 6.02 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu 43.9 µs ± 49.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cmd: accumulate, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu 442 µs ± 1.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) ``` </details> <details> <summary>ATen GPU (3x - 11x speedup)</summary> ```python cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda 9.01 µs ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda 10.4 µs ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda 10.3 µs ± 44.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: accumulate, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda 12.6 µs ± 19 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) ``` </details> <details> <summary>TH GPU</summary> ```python cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda 34.7 µs ± 131 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda 38.2 µs ± 116 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda 61.2 µs ± 50.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cmd: accumulate, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda 140 µs ± 24.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) ``` </details> ```python take() ``` <details> <summary>ATen CPU (1.1x speedup)</summary> ```python cmd: take, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu 1.18 µs ± 2.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) cmd: take, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu 2.79 µs ± 2.96 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu 16.6 µs ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu 161 µs ± 984 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) ``` </details> <details> <summary>TH CPU</summary> ```python cmd: take, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu 1.1 µs ± 3.14 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) cmd: take, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu 2.93 µs ± 7.31 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu 18.6 µs ± 14.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu 178 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) ``` </details> <details> <summary>ATen GPU (same speed)</summary> ```python cmd: take, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda 9.38 µs ± 23.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda 10.7 µs ± 9.77 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda 10.6 µs ± 107 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda 11.5 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) ``` </details> <details> <summary>TH GPU</summary> ```python cmd: take, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda 9.31 µs ± 7.57 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda 9.52 µs ± 5.78 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda 9.73 µs ± 17.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda 11.7 µs ± 5.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) ``` </details> cc mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/53356 Reviewed By: mruberry Differential Revision: D27520243 Pulled By: ngimel fbshipit-source-id: e3979349c2c62d2949e09fb05e5fd4883fbc9093	2021-04-05 18:05:38 -07:00
Edward Yang	3acbaf834e	Make structured functions properly check device/dtype of explicit out args (#55150 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55150 Somehow I forgot to add these checks. Now they're in here. Thanks ngimel for noticing. This is probably a slight efficiency hit on TensorIterator, which is probably already doing all these checks. Would be good to follow up on this, though it may not be easily fixable with the TI rewrite. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: zhangguanheng66 Differential Revision: D27523879 Pulled By: ezyang fbshipit-source-id: 458e617dbc6de6fcfa9e5841148b30b99f52e001	2021-04-05 14:42:43 -07:00
kshitij12345	0a81034dd0	Port atan2 to structured kernel (#55130 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/55070 Pull Request resolved: https://github.com/pytorch/pytorch/pull/55130 Reviewed By: gchanan Differential Revision: D27502777 Pulled By: ezyang fbshipit-source-id: 9c368e2c3670f5633e059024ccff8b3e95e2733e	2021-04-05 00:12:42 -07:00
Nikita Shulga	8377e6221a	Revert D27478225: [pytorch][PR] Added pow() on CPU for float16 & bfloat16 Test Plan: revert-hammer Differential Revision: D27478225 (`6d030c14cf`) Original commit changeset: d309dd98d5a9 fbshipit-source-id: e0518f15185b41946caf3a8456c7af3f52e5a910	2021-04-03 10:26:44 -07:00
Winston Smith	6d030c14cf	Added pow() on CPU for float16 & bfloat16 (#50999 ) Summary: Added the functionality desired in https://github.com/pytorch/pytorch/issues/50789. 1. Added support for pow() on CPU for `float16` (`Half`) and `bfloat16` types. Both `pow(Tensor, Scalar)` and `pow(Tensor, Tensor)` are now supported for the aforementioned types. However autograd isn't supported for `Float16` on CPU yet, as `log_vml_cpu` can't be enabled for it. 2. heitorschueroff added `pow_tensor_scalar_optimized_kernel` to refactor & simplify `PowKernel.cpp`. It provides a common path for all the complex types & floating point types (except Float16, due to lack of complete AVX2 vectorization support for it). It replaced code that had previously been duplicated for (float, double) and complex types, so PowKernel.cpp looks a lot cleaner now. 3. Enabled (unskipped) some tests for `erf`, `erfc`,`erfinv`, `linalg.norm` and `linalg.vector.norm` which were being skipped earlier due to `pow()` not having been implemented for `float16` & `bfloat16`. 4. Added an OpInfo for `pow()` & enabled some test cases for `pow()`. 5. Extended the coverage of existing tests for `pow` in `test_binary_ufuncs.py` in order to enable comparison with `numpy`, even with discontiguous tensors, and added a test to ensure that a runtime error is raised for `pow`'s inplace variant if resizing the base tensor is required during its invocation. 6. Added `float16` & `bfloat16` to `square`'s dtype lists in its `UnaryUfuncInfo`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50999 Reviewed By: zou3519 Differential Revision: D27478225 Pulled By: heitorschueroff fbshipit-source-id: d309dd98d5a96d0cb9b08281757bb1c65266d011	2021-04-02 15:57:06 -07:00
lezcano	36c27fd0ac	SVD docs improved (#54002 ) Summary: - Corrected a few errata in the SVD docs - Made the notation more uniform (refer to `Vh` in `linalg.svd`, always use double tilts...) - Wrote a better explanation about why the gradients of `U` and `V` are not well-defined when the input is complex or real but has repeated singular values. The previous one pointed to a somewhat obscure post on gauge theory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/54002 Reviewed By: malfet Differential Revision: D27459502 Pulled By: mruberry fbshipit-source-id: f5c35eca02d35dadd2fc0eeadfacc8824f409400	2021-04-01 09:31:40 -07:00
Kurt Mohler	6c235ef267	Allow `std=0` in `torch.normal`, and error if `std<0` (#51317 ) Summary: Part of https://github.com/pytorch/pytorch/issues/49998 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51317 Reviewed By: bdhirsh Differential Revision: D27253939 Pulled By: mruberry fbshipit-source-id: af7a72c3d91549b1a88b73849b6973e7619dc50b	2021-03-31 21:06:07 -07:00
Edward Yang	6c8d783830	Generate no-op meta functions for all inplace operations (#54901 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54901 Some subtleties: - Need to make sure not to clobber composite definitions when deciding when to generate - I was lazy and so I didn't make inplace on TensorList work, nor did I make inplace functions that returned void work - A few tests started complaining that these noop meta functions weren't raising the errors they needed. This is tracked in https://github.com/pytorch/pytorch/issues/54897 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D27407232 Pulled By: ezyang fbshipit-source-id: 5e706a267496368acdafd128942c310954e43d29	2021-03-30 09:31:39 -07:00
Edward Yang	1f36ce6e4d	Restore storage on meta tensors; increase meta coverage (#53973 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53973 Two parts to this PR; I had to put them together because adding support for X causes more test code to be exercised, which in turn may require a fix for Y. The first part is restoring the concept of storage to meta tensors. Previously, meta tensors had a nullptr storage (e.g., `meta_tensor.storage()` is an error.) As I was increasing the coverage of meta tensors, I started running into test cases (specifically memory overlap tests) that were failing because not having storage meant I couldn't check for memory overlap. After some discussion, we decided that it would make sense for meta tensors to model this as well (we already model strides, so getting accurate view information also seems useful). This PR does that by: * Rewrite all of the factory functions in MetaTensor.cpp to use the generic versions (which are very carefully written to not actually poke at the data pointer, so everything works out). The key idea here is we give meta tensors a special allocator, MetaAllocator, which always returns a nullptr even if you ask for a nonzero number of bytes. resize_ is also made generic; the normal variant can be used directly rather than having to instruct it to avoid resizing storage * Turn on memory overlap checking in TensorIterator even for meta tensors * Although meta tensors now have storage, the concept of meta storage is NOT exposed to Python land (as it would imply I would have to codegen MetaFloatStorage, MetaDoubleStorage, etc. classes). So `x.storage()` still raises an error and I have a cludge in `__deepcopy__` to break storage sharing upon deep copy (this is wrong, but no tests exercise this at the moment). The second part is adding more support for the most used functions in the test suite. * Inplace operations have very simple meta functions. I added `fill_`, `zero_`, `random_`, `uniform_` and `normal_`. In the case of random, I take advantage of pbelevich's templates for defining random kernels, so that I can reuse the common scaffolding, and then just register a noop stub that actually does the RNG. (Look, another structured kernels tiny variant!) * `copy_` is now implemented. Copying into a meta tensor is always OK, but copying out of a meta tensor raises an error (as we don't know what the "correct" data to copy out is in this case) * `empty_strided` usage from structured kernels now is implemented (TBH, this could have been done as soon as `empty_strided` was added) * Meta was missing in a few places in TensorOptions/DispatchKey utility functions, so I added them * Autograd engine now correctly homes meta tensors with CPU tensors (they have -1 device index so CUDA queues wouldn't work anyway) * `apply_`, `map_` and `map2_` are special cased to no-op on meta tensor self. These count as inplace operations too but they are implemented a little differently. Getting more meta function support triggers a number of bugs in the test suite, which I then fix: - Linear algebra functions sometimes don't report NotImplementedError because they get swallowed by catch all try blocks. This is tracked in https://github.com/pytorch/pytorch/issues/53739 - dlpack obviously doesn't work with meta tensors, I just disabled the test Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D27036572 Test Plan: Imported from OSS Reviewed By: agolynski, bdhirsh Pulled By: ezyang fbshipit-source-id: 7005ecf4feb92a643c37389fdfbd852dbf00ac78	2021-03-29 08:37:46 -07:00
Xiang Gao	eec48303c0	Make index_add take a scalar argument alpha (#54176 ) Summary: ``` index_add(Tensor self, int dim, Tensor index, Tensor source) -> Tensor ``` now becomes ``` index_add(Tensor self, int dim, Tensor index, Tensor source, Scalar alpha=1) -> Tensor ``` Generally, this sounds useful and harmless, and inside PyTorch, we are already needing this feature in `add_out_dense_sparse_cuda`, see the `SparseCUDATensorMath.cu` change in this PR. Test not added yet. Will add if after discussion we believe this is a good idea. - [ ] TODO: add test Pull Request resolved: https://github.com/pytorch/pytorch/pull/54176 Reviewed By: ngimel Differential Revision: D27319198 Pulled By: mruberry fbshipit-source-id: fe43be082d1230c87c5313458213d5252be2ff23	2021-03-28 00:22:45 -07:00
lezcano	5870346173	Port index_copy from TH to ATen (#52203 ) Summary: The design of the `TensorIterator` was similar to that in https://github.com/pytorch/pytorch/pull/50578 Resolves https://github.com/pytorch/pytorch/issues/24670 Resolves https://github.com/pytorch/pytorch/issues/24523 Timings: <details> <summary>Script</summary> ```python from IPython import get_ipython import torch torch.manual_seed(13) torch.set_num_threads(1) ipython = get_ipython() cpu = torch.device('cpu') cuda = torch.device('cuda') def run_test(ndims, size, index_len, device): print(f"ndims: {ndims}, tensor_size: {size}, index_len: {index_len}, device: {device}") x = torch.rand(([size] ndims), device=device) index = torch.randint(size, (index_len,), dtype=torch.long, device=device) for d in range(ndims): shape_t = [size] * d + [index_len] + [size] * (ndims - d - 1) t = torch.rand(*shape_t, device=device) command = "x.index_copy(d, index, t)" if device == cuda: command = command + "; torch.cuda.synchronize()" ipython.magic(f"timeit {command}") print() run_test(3, 700, 10, cpu) run_test(3, 700, 100, cpu) run_test(3, 700, 700, cpu) run_test(2, 10000, 10000, cpu) run_test(3, 700, 10, cuda) run_test(3, 700, 100, cuda) run_test(3, 700, 700, cuda) run_test(2, 10000, 10000, cuda) ``` </details> <details> <summary>CPU ATen</summary> ``` ndims: 3, tensor_size: 700, index_len: 10, device: cpu 327 ms ± 309 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 329 ms ± 456 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 378 ms ± 1.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 3, tensor_size: 700, index_len: 100, device: cpu 348 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 359 ms ± 330 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 526 ms ± 686 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 3, tensor_size: 700, index_len: 700, device: cpu 560 ms ± 19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 552 ms ± 2.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 932 ms ± 2.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 2, tensor_size: 10000, index_len: 10000, device: cpu 163 ms ± 5.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 302 ms ± 5.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` </details> <details> <summary>CUDA ATen</summary> ``` ndims: 3, tensor_size: 700, index_len: 10, device: cuda 9.63 ms ± 441 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 9.65 ms ± 230 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 12.4 ms ± 881 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) ndims: 3, tensor_size: 700, index_len: 100, device: cuda 10.8 ms ± 1.51 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 11 ms ± 417 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 21.2 ms ± 18.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ndims: 3, tensor_size: 700, index_len: 700, device: cuda 19 ms ± 4.42 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 17.8 ms ± 493 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 25.8 ms ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ndims: 2, tensor_size: 10000, index_len: 10000, device: cuda 5.59 ms ± 109 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 10 ms ± 25.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) ``` </details> <details> <summary>CPU TH</summary> ``` ndims: 3, tensor_size: 700, index_len: 10, device: cpu 333 ms ± 2.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 327 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 366 ms ± 753 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 3, tensor_size: 700, index_len: 100, device: cpu 336 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 345 ms ± 914 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 884 ms ± 4.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 3, tensor_size: 700, index_len: 700, device: cpu 441 ms ± 3.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 514 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 7.46 s ± 6.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 2, tensor_size: 10000, index_len: 10000, device: cpu 141 ms ± 233 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 1.13 s ± 855 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` </details> <details> <summary>CUDA TH</summary> ``` ndims: 3, tensor_size: 700, index_len: 10, device: cuda 9.64 ms ± 390 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 9.68 ms ± 3.26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 13.9 ms ± 928 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) ndims: 3, tensor_size: 700, index_len: 100, device: cuda 11.6 ms ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 12.1 ms ± 3.72 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 30.3 ms ± 27.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ndims: 3, tensor_size: 700, index_len: 700, device: cuda 27.2 ms ± 19.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 30.6 ms ± 43.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 146 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ndims: 2, tensor_size: 10000, index_len: 10000, device: cuda 6.5 ms ± 3.99 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 64.7 ms ± 55.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ``` </details> According to these we see a slight performance improvement across both CPU and GPU. cc: nikitaved Pull Request resolved: https://github.com/pytorch/pytorch/pull/52203 Reviewed By: jbschlosser Differential Revision: D27066572 Pulled By: mruberry fbshipit-source-id: 6101e461cf731afa3db042a383b723d3d6bfdc26	2021-03-22 22:36:35 -07:00
kshitij12345	afb560065c	[testing] OpInfo for sgn and sign (#53885 ) Summary: Reference https://github.com/pytorch/pytorch/issues/42515 TODO: * [x] Check rendered docs. https://11525594-65600975-gh.circle-artifacts.com/0/docs/generated/torch.sgn.html Pull Request resolved: https://github.com/pytorch/pytorch/pull/53885 Reviewed By: ejguan Differential Revision: D27114318 Pulled By: mruberry fbshipit-source-id: 678179d87741aacd3b50f03dc460207c5aa29589	2021-03-22 09:39:40 -07:00
lezcano	9d9986fd10	Support for Half / bfloat16 / index_select and better testing (#53898 ) Summary: Added the support for half / bfloat / bool for `index_select`, as suggested by ngimel in https://github.com/pytorch/pytorch/issues/49707#issuecomment-788140578 For the tests to pass, I also added the support for `index_add`. I added `OpInfo` tests for `index_add` and more thorough forward tests for `index_select` to test these changes. While doing so, I found that the support for scalar types in the derivative of `index_add` was not correct, so I corrected it. Resolves https://github.com/pytorch/pytorch/issues/49707 It should also resolve similar issues that I encountered when porting `index_copy`, `take` and `put`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/53898 Reviewed By: mruberry Differential Revision: D27193294 Pulled By: ngimel fbshipit-source-id: 5a0af2c62a0cf24f3cc9c74f230ab4f3712bbb7a	2021-03-19 20:37:48 -07:00
Edward Yang	49f1336106	Add Tensor::is_cpu, genericize TensorIterator (#54079 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54079 Fixes https://github.com/pytorch/pytorch/issues/53815 Instead of testing if something is CUDA, we instead test if something is not CPU. This in the general theming of "Don't be so darn CUDA centric". Intruigingly, we didn't have a is_cpu() method on Tensor. Which seems like a big oversight and one of the reasons how we ended up in this mess. So in it goes. Maybe we should also get this for Python bindings as well (but in that case, should probably look into redoing all of the is_X bindings so they aren't done manually). Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D27109507 Pulled By: ezyang fbshipit-source-id: abbe72c2e688c452ffe098d206cb79938b5824b1	2021-03-19 09:10:24 -07:00
Edward Yang	3c457043fb	Also propagate storage_access_should_throw_ when copying tensor metadata (#53816 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53816 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D27036574 Pulled By: ezyang fbshipit-source-id: 71e61b0aa3d46159c9af1112c262cbfa7eaa1879	2021-03-16 15:18:37 -07:00
Edward Yang	547f435763	Fix restriding logic for structured kernels (#53759 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53759 Fixes #53587, see issue for in-depth explanation of the bug. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D26971342 Pulled By: ezyang fbshipit-source-id: 805983fed2658e27fb033f36a71fd30950a29328	2021-03-14 20:41:23 -07:00
Edward Yang	d47d246206	Add 'noarch' tests which only run in one CI config (#53747 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53747 Fixes #53743 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D26971343 Pulled By: ezyang fbshipit-source-id: cee7aa10063ae674f741406a3af830e4b4f128df	2021-03-14 20:39:07 -07:00
Brian Hirsh	c68cc24cee	update upsample tests in test_nn.py to test for memory_format (#53665 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53665 ngimel pointed out to me where we already test the behavior of the `Upsample` ops in `test_nn.py`. This PR deleting my bespoke tests in `test_torch.py` and updates those in `test_nn.py` to test memory format properly. There were two reasons the original test didn't pick up on a memory format regression: - They didn't test the memory format of the output tensor explicitly, i.e. `output.is_contiguous(memory_format=...)` - Even with that change, the test tensors were to simple to fail the tests. From some trial and error, it looks like one of the first two dimensions in the inputs needs to be > 1 in order for the `channels_last` memory format to actually re-order the strides. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D26929683 Pulled By: bdhirsh fbshipit-source-id: d17bc660ff031e9b3e2c93c60a9e9308e56ea612	2021-03-10 14:21:14 -08:00
Natalia Gimelshein	6aa5148df2	Filter 0's returned by exponential distribution (#53480 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/48841 for half datatype (it was fixed for other datatypes before). The reason for https://github.com/pytorch/pytorch/issues/48841 happening for half was that `exponential_` for half was producing 0s. Exponential distribution implementation on cuda is here `e08aae2613/aten/src/ATen/native/cuda/DistributionTemplates.h (L535-L545)` with `transformation::exponential` defined here `e08aae2613/aten/src/ATen/core/TransformationHelper.h (L113-L123)` It takes a uniformly distributed random number and takes `log` of it. If necessary, the result is then converted to low precision datatype (half). To avoid 0's, before applying `log`, ones are replaced with std::nextafter(1,0). This seems fine, because log(1-eps) is still representable in half precision (`torch.tensor([1.], device="cuda").nextafter(torch.tensor([0.], device="cuda")).log().half()` produces 5.96e-8) , so casting to `scalar_t` should work. However, since fast log approximation is used (`__logf`), the log result is ~3e-9 instead of more accurate 5.96e-8, and underflows when casting to half. Using `::log` instead of fast approximation fixes it, however, it comes with ~20% perf penalty on exponential kernel for fp32 datatype, probably more for half. Edit: alternative approach used now is to filter all small values returned by transformation. The result is equivalent to squashing of 1's to 1-eps that was used before, and computing correct log of 1-eps (which is -eps, exactly equal even for doubles). This doesn't incur noticeable performance hit. Pull Request resolved: https://github.com/pytorch/pytorch/pull/53480 Reviewed By: mruberry Differential Revision: D26924622 Pulled By: ngimel fbshipit-source-id: dc1329e4773bf91f26af23c8afa0ae845cfb0937	2021-03-10 00:35:31 -08:00
Brian Hirsh	233b9490c2	fix channels_last bug in upsample kernels (#53535 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53535 During the port to structured kernels for upsample kernels, I missed that a subset of them explicitly pass `memory_format` information from the input to the output tensors. Note 1: I added the logic into the `meta` function of each op, which feels morally correct since this logic affects the output shape/metadata. One consequence is that all backend implementations will get the logic. I synced with fmassa that this seems reasonable. Note 2: This logic used to happen in the following operators, which this PR fixes: - upsample_nearest3d - upsample_trilinear3d - upsample_nearest2d - upsample_bilinear2d I explicitly didn't patch the other upsample kernels, which look like they never forwarded memory_format information: - `upsample_bicubic2d` (maybe this should though? `UpSampleBicubic2d.cpp` isn't currently written to do anything different for `channels_last` tensors) - All of the `upsample_{mode}1d` operators. Probably because, afaik, channels_last isn't supported for 3d tensors - The corresponding backwards operator for every upsample op. Note 3: I'm also wondering why memory_format isn't just directly a part of the `tensor::options()` method, which would cause all ops to universally forward memory_format information from input to output tensors, rather than just the upsample ops. My guess is: - BC-breakage. I'm not sure whether this would really break people, but it's an API change - performance. `tensor::options()` is called everywhere, and adding a call to `suggest_memory_format()` would probably noticeably hit microbenchmarks. We could probably deal with that by making `memory_format` a precomputed field on the tensor? Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D26891540 Pulled By: bdhirsh fbshipit-source-id: b3845f4dd5646b88bf738b9e41fe829be6b0e5cf	2021-03-09 15:23:53 -08:00
Jane Xu	d0b32156f0	move test to CUDA only (#53561 ) Summary: Helps make master green by removing this hefty memory allocating from CPU test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/53561 Reviewed By: malfet, albanD Differential Revision: D26897941 Pulled By: janeyx99 fbshipit-source-id: 9f6c2d55f4eea1ab48665f7819fc113f21991036	2021-03-08 16:32:14 -08:00
mattip	54a2498919	Modify tests to use assertWarnsOnceRegex instead of maybeWarnsRegex (#52387 ) Summary: Related to https://github.com/pytorch/pytorch/issues/50006 Follow on for https://github.com/pytorch/pytorch/issues/48560 to ensure TORCH_WARN_ONCE warnings are caught. Most of this is straight-forward find-and-replace, but I did find one place where the TORCH_WARN_ONCE warning was not wrapped into a python warning. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52387 Reviewed By: albanD Differential Revision: D26773387 Pulled By: mruberry fbshipit-source-id: 5be7efbc8ab4a32ec8437c9c45f3b6c3c328f5dd	2021-03-08 03:32:14 -08:00
Edward Yang	758fb94fcb	Prefix assert_async with underscore, fix some bugs in assert_async CUDA testing (#53276 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53276 - One of the tests had a syntax error (but the test wasn't fine grained enough to catch this; any error was a pass) - Doesn't work on ROCm Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D26820048 Test Plan: Imported from OSS Reviewed By: mruberry Pulled By: ezyang fbshipit-source-id: b02c4252d10191c3b1b78f141d008084dc860c45	2021-03-05 17:36:01 -08:00
Edward Yang	cfd9360d09	Revert D26837780: Revert D26819810: Revert D26815021: Revert D26744062: Add assert_async Test Plan: revert-hammer Differential Revision: D26837780 Original commit changeset: 21567cab5c0f fbshipit-source-id: 8ea735e5fdc97e32ae3fafd40297a1b8a7cd34b0	2021-03-04 20:45:35 -08:00
Edward Yang	1accffe450	Revert D26819810: Revert D26815021: Revert D26744062: Add assert_async Test Plan: revert-hammer Differential Revision: D26819810 Original commit changeset: e528260e1aa9 fbshipit-source-id: 21567cab5c0ff5f5e60a699d4d4678773a567c30	2021-03-04 18:48:56 -08:00
Edward Yang	9e5e5a7d96	Revert D26815021: Revert D26744062: Add assert_async Test Plan: revert-hammer Differential Revision: D26815021 Original commit changeset: 972eaafcdf14 fbshipit-source-id: e528260e1aa91df1873c73af00aa57addd671607	2021-03-04 09:28:25 -08:00
Mike Ruberry	b864457743	Revert D26744062: Add assert_async Test Plan: revert-hammer Differential Revision: D26744062 (`12d63cc2f5`) Original commit changeset: be6d2653afe5 fbshipit-source-id: 972eaafcdf14d96abdec3dea6bcbd5cac1f3d759	2021-03-04 04:11:25 -08:00
Edward Yang	12d63cc2f5	Add assert_async (#53086 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53086 Fixes #36853 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D26744062 Pulled By: ezyang fbshipit-source-id: be6d2653afe584adf67a05b5d43185b40764650d	2021-03-03 16:18:07 -08:00
Edward Yang	0f81a69a96	Make meta a device (getting rid of empty_meta) (#53143 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53143 Meta is now an honest to goodness device type, like cpu, so you can use device='meta' to trigger allocation of meta tensors. This way better than empty_meta since we now have working API for most factory functions (they don't necessarily work yet, though, because need to register Meta versions of those functions.) Some subtleties: - I decided to drop the concept of CPU versus CUDA meta tensors; meta tensors are device agnostic. It's hard to say exactly what the correct level of abstraction here is, but in this particular case implementation considerations trump semantic considerations: it is way easier to have just a meta device, than to have a meta device AND a cpu device AND a cuda device. This may limit the applicability of meta tensors for tracing models that do explicit cpu()/cuda() conversions (unless, perhaps, we make those operations no-ops on meta tensors). - I noticed that the DeviceType uppercase strings are kind of weird. Are they really supposed to be all caps? That's weird. - I moved the Meta dispatch key to live with the rest of the "device" dispatch keys. - I intentionally did NOT add a Backend for Meta. For now, I'm going to hope meta tensors never exercise any of the Backend conversion code; even if it does, better to fix the code to just stop converting to and from Backend. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: samestep Differential Revision: D26763552 Pulled By: ezyang fbshipit-source-id: 14633b6ca738e60b921db66a763155d01795480d	2021-03-03 11:24:13 -08:00
Natalia Gimelshein	e5e54ada61	fix logcumsumexp functor to properly handle infs and nans (#52947 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/52213 Nans were previously inconsistently propagated due to std::min always returning first argument if one of the args in nan when reduction functor was called on 2 `-inf` arguments, `std::min(x,y) - std::max(x,y)` resulted in `-inf - (-inf)` = nan, even though logcumsumexp is well defined for `-inf, -inf` pair. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52947 Reviewed By: H-Huang Differential Revision: D26718456 Pulled By: ngimel fbshipit-source-id: a44433889da352cc959786dd15b6361a68fcfed7	2021-03-02 10:58:01 -08:00
kshitij12345	f5617b0932	[testing] Add Opinfo for torch.frac and minor fixes (#52660 ) Summary: Reference : https://github.com/pytorch/pytorch/issues/42515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/52660 Reviewed By: ailzhang Differential Revision: D26618151 Pulled By: mruberry fbshipit-source-id: cf0df38e46f44d3afff6e0015af5a840c661aa0e	2021-03-01 04:58:31 -08:00
Nikita Vedeneev	0048d97eda	remove index_fill side-effect for scalar tensors (#52209 ) Summary: `index_fill` silently promotes zero dim Tensors to 1-dim Tensors. This PR fixes that. Was: ``` In [1]: import torch In [2]: x = torch.tensor(1) In [3]: idx = torch.tensor(0).long() In [4]: x.dim() Out[4]: 0 In [5]: x.index_fill(0, idx, -1).dim() Out[5]: 1 ``` Now: ``` In [6]: x.index_fill(0, idx, -1).dim() Out[6]: 0 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/52209 Reviewed By: ejguan Differential Revision: D26446470 Pulled By: ngimel fbshipit-source-id: 4737e6941a7216b57f3416b59362817834df3a3a	2021-02-25 00:35:27 -08:00
Jane Xu	09516d2d0c	Reenables skipped tests for all CUDA versions except 11.2 (#52359 ) Summary: This PR adds functionality to skip a test based on CUDA version. This way, we can be more specific when skipping a test, such as when the test only fails for a particular CUDA version. This allows us to add back the skipped tests for CUDA 11.2 for other CUDA versions, such as 10.1 and 11.1. I tested this locally (by using 11.0 instead of 11.2), but will run all the CI to make sure it works. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52359 Reviewed By: walterddr Differential Revision: D26487951 Pulled By: janeyx99 fbshipit-source-id: 45c71cc6105ffd9985054880009cf68ea5ef3f6a	2021-02-19 15:30:55 -08:00
Nikita Vedeneev	9699c703c2	Stable sort for the CPU take 2. (#51790 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/38681. A duplicate of https://github.com/pytorch/pytorch/pull/50052 created to become importable to the fb internal tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51790 Reviewed By: agolynski Differential Revision: D26279045 Pulled By: glaringlee fbshipit-source-id: 348e171dee9c370a76002b65d0c82c329f57a421	2021-02-19 09:28:57 -08:00
Xiong Wei	c7b0005831	Enhance Tensor.unflatten to support -1 as the inferred size (#51955 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/51719, https://github.com/pytorch/pytorch/issues/28142 Change - Update `torch.Tensor.unflatten` to support users pass`-1` as the inferred size for both tensors and named tensors. - Examples of using `-1` in the `unflatten` function are added to the docs. - Fix the rendered issue of original `unflatten` docs by removing a blank line between its example section. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51955 Reviewed By: agolynski Differential Revision: D26467198 Pulled By: zou3519 fbshipit-source-id: 6a3ede25561223187273796427ad0cb63f125364	2021-02-18 08:37:41 -08:00
Ailing Zhang	83fa713f2b	Fix test to use proper condition. (#52216 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52216 Test Plan: Imported from OSS Reviewed By: pbelevich Differential Revision: D26427506 Pulled By: ailzhang fbshipit-source-id: ba4f2f66794cb2843926e5566eb4d25582f7fb2b	2021-02-12 12:59:35 -08:00
Kshiteej K	d7ea0fe75a	[testing] Add OpInfo for rad2deg and deg2rad (#51283 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/50006 We should probably add aliases for these operators to be consistent with NumPy names i.e. `np.degrees` and `np.radians`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51283 Reviewed By: ngimel Differential Revision: D26171163 Pulled By: mruberry fbshipit-source-id: 1869604ed400820d95f6ff50a0e3cba1de1ffa84	2021-02-10 19:45:10 -08:00
Jane Xu	bff8194522	Replace 11.1 with 11.2 on CI for Windows (#51598 ) Summary: Adding CUDA 11.2 to Windows CI. Disabled tests: The following ran into `CUDA error: misaligned address` for CUDA 11.2: (issue linked below) `test_where_scalar_valid_combination_cuda_complex128` in test_torch.py `test_sgn_complex_cuda` in test_autograd.py The following ran into `CUDA error: too many resources requested for launch` for CUDA 11.2: (https://github.com/pytorch/pytorch/issues/52002) test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_float64 test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_float64 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51598 Reviewed By: mrshenli Differential Revision: D26344965 Pulled By: janeyx99 fbshipit-source-id: 3c9a4ed16d748969e96593220ec0a9f33e1ffcef	2021-02-10 17:59:11 -08:00
vfdev	8b0cb5ede3	OpInfo: Added clamp and trunc tests with aliases (#51167 ) Summary: Description: - Added clamp, trunc tests with aliases - Added tests for aliases for asin(h), acos(h), etc - fixed 'fix' alias implementation - fixed annotations in test_jit_alias_remapping - updated native_functions.yaml aliases guidelines Blocked by https://github.com/pytorch/pytorch/issues/50368 cc mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/51167 Reviewed By: gchanan Differential Revision: D26245753 Pulled By: mruberry fbshipit-source-id: e17b657f0515139735a8a677b1ae284904f98aef	2021-02-10 05:36:18 -08:00
Mike Ruberry	594a66d778	Warn about floor_divide performing incorrect rounding (#50281 ) (#50281 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50281 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51745 Test Plan: Imported from OSS Reviewed By: ngimel Pulled By: mruberry Differential Revision: D26257855 fbshipit-source-id: e5d497cf07b0c746838ed081c5d0e82fb4cb701b	2021-02-10 03:13:34 -08:00
kshitij12345	768662913a	Migrate masked_fill__cuda to ATen (#51404 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/49543 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51404 Reviewed By: mrshenli Differential Revision: D26329833 Pulled By: ngimel fbshipit-source-id: 510988888fad015239ab4766eb391a89b742130b	2021-02-09 22:57:03 -08:00
mattip	b97a040f71	ENH: toggle TORCH_WARN_ONCE to TORCH_WARN for tests (#48560 ) Summary: Toward fixing https://github.com/pytorch/pytorch/issues/47624 ~Step 1: add `TORCH_WARN_MAYBE` which can either warn once or every time in c++, and add a c++ function to toggle the value. Step 2 will be to expose this to python for tests. Should I continue in this PR or should we take a different approach: add the python level exposure without changing any c++ code and then over a series of PRs change each call site to use the new macro and change the tests to make sure it is being checked?~ Step 1: add a python and c++ toggle to convert TORCH_WARN_ONCE into TORCH_WARN so the warnings can be caught in tests Step 2: add a python-level decorator to use this toggle in tests Step 3: (in future PRs): use the decorator to catch the warnings instead of `maybeWarnsRegex` Pull Request resolved: https://github.com/pytorch/pytorch/pull/48560 Reviewed By: ngimel Differential Revision: D26171175 Pulled By: mruberry fbshipit-source-id: d83c18f131d282474a24c50f70a6eee82687158f	2021-02-08 08:21:19 -08:00
wanyu2018umac	444203c52f	Fix torch.cdist backward CUDA error due to illegal gridDim setting (#51569 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/49928 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51569 Reviewed By: mruberry Differential Revision: D26215694 Pulled By: ngimel fbshipit-source-id: 0710417e6a802424e2dcada325f27452c95d042f	2021-02-02 20:41:24 -08:00
Jeffrey Wan	b18eeaa80a	Implement `np.diff` for single order differences (#50569 ) Summary: Implements `np.diff` for single order differences only: - method and function variants for `diff` and function variant for `diff_out` - supports out variant, but not in-place since shape changes - adds OpInfo entry, and test in `test_torch` - automatic autograd because we are using the `Math` dispatch _Update: we only support Tensors for prepend and append in this PR. See discussion below and comments for more details._ Currently there is a quirk in the c++ API based on how this is implemented: it is not possible to specify scalar prepend and appends without also specifying all 4 arguments. That is because the goal is to match NumPy's diff signature of `diff(int n=1, int dim=-1, Union[Scalar, Tensor] prepend=None, Union[Scalar, Tensor] append)=None` where all arguments are optional, positional and in the correct order. There are a couple blockers. One is c++ ambiguity. This prevents us from simply doing `diff(int n=1, int dim=-1, Scalar? prepend=None, Tensor? append=None)` etc for all combinations of {Tensor, Scalar} x {Tensor, Scalar}. Why not have append, prepend not have default args and then write out the whole power set of {Tensor, Scalar, omitted} x {Tensor, Scalar, omitted} you might ask. Aside from having to write 18 overloads, this is actually illegal because arguments with defaults must come after arguments without defaults. This would mean having to write `diff(prepend, append, n, dim)` which is not desired. Finally writing out the entire power set of all arguments n, dim, prepend, append is out of the question because that would actually involve 2 * 2 * 3 * 3 = 36 combinations. And if we include the out variant, that would be 72 overloads! With this in mind, the current way this is implemented is actually to still do `diff(int n=1, int dim=-1, Scalar? prepend=None, Tensor? append=None)`. But also make use of `cpp_no_default_args`. The idea is to only have one of the 4 {Tensor, Scalar} x {Tensor, Scalar} provide default arguments for the c++ api, and add `cpp_no_default_args` for the remaining 3 overloads. With this, Python api works as expected, but some calls such as `diff(prepend=1)` won't work on c++ api. We can optionally add 18 more overloads that cover the {dim, n, no-args} x {scalar-tensor, tensor-scalar, scalar-scalar} x {out, non-out} cases for c++ api. _[edit: counting is hard - just realized this number is still wrong. We should try to count the cases we do cover instead and subtract that from the total: (2 * 2 * 3 * 3) - (3 + 2^4) = 17. 3 comes from the 3 of 4 combinations of {tensor, scalar}^2 that we declare to be `cpp_no_default_args`, and the one remaining case that has default arguments has covers 2^4 cases. So actual count is 34 additional overloads to support all possible calls]_ _[edit: thanks to https://github.com/pytorch/pytorch/issues/50767 hacky_wrapper is no longer necessary; it is removed in the latest commit]_ hacky_wrapper was also necessary here because `Tensor?` will cause dispatch to look for the `const optional<Tensor>&` schema but also generate a `const Tensor&` declaration in Functions.h. hacky_wrapper allows us to define our function as `const Tensor&` but wraps it in optional for us, so this avoids both the errors while linking and loading. _[edit: rewrote the above to improve clarity and correct the fact that we actually need 18 more overloads (26 total), not 18 in total to complete the c++ api]_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/50569 Reviewed By: H-Huang Differential Revision: D26176105 Pulled By: soulitzer fbshipit-source-id: cd8e77cc2de1117c876cd71c29b312887daca33f	2021-02-02 20:25:16 -08:00
Max Balandat	a990ff7001	[SobolEngine] Fix edge case of dtype of first sample (#51578 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51578 https://github.com/pytorch/pytorch/pull/49710 introduced an edge case in which drawing a single sample resulted in ignoring the `dtype` arg to `draw`. This fixes this and adds a unit test to cover this behavior. Test Plan: Unit tests Reviewed By: danielrjiang Differential Revision: D26204393 fbshipit-source-id: 441a44dc035002e7bbe6b662bf6d1af0e2cd88f4	2021-02-02 14:24:56 -08:00
vfdev	b106250047	Introduced AliasInfo for OpInfo (#50368 ) Summary: Introduced AliasInfo for OpInfo. Context: Split of https://github.com/pytorch/pytorch/issues/49158 cc mruberry , please let me know if you'd like to see here more code to cover > [ ] fold test_op_aliases.py into OpInfo-based testing in test_ops.py from https://github.com/pytorch/pytorch/issues/50006 and/or add `UnaryUfuncInfo('abs')` as discussed https://github.com/pytorch/pytorch/pull/49158/files#r548774221 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50368 Reviewed By: ngimel Differential Revision: D26177261 Pulled By: mruberry fbshipit-source-id: 2e3884a387e8d5365fe05945375f0a9d1b5f5d82	2021-02-02 00:10:09 -08:00
kshitij12345	4b65a27a35	[testing] Add OpInfo for round and logit (#51272 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/50006 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51272 Reviewed By: ngimel Differential Revision: D26177020 Pulled By: mruberry fbshipit-source-id: 4728b14c7a42980c7ca231ca1946430e0e38ed5b	2021-02-01 21:15:40 -08:00
Nikita Vedeneev	b198cf4f1c	port `index_fill_` from TH to ATen. (#50578 ) Summary: As per title. The port is based on TensorIterator. Supports complex input. Resolves https://github.com/pytorch/pytorch/issues/24714. Resolves https://github.com/pytorch/pytorch/issues/24577. Resolves https://github.com/pytorch/pytorch/issues/36328. Possibly resolves https://github.com/pytorch/pytorch/issues/48230 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50578 Reviewed By: ngimel Differential Revision: D26049539 Pulled By: anjali411 fbshipit-source-id: 2be4e78f7a01700c593a9e893e01f69191e51ab1	2021-02-01 16:08:37 -08:00
kshitij12345	50fa415a4d	[testing] Add OpInfo for ceil and floor (#51198 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/50006 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51198 Reviewed By: malfet Differential Revision: D26105099 Pulled By: mruberry fbshipit-source-id: 6cfa89f42b87cca66dbc5bf474d17a6cad7eb45a	2021-02-01 10:10:36 -08:00
Max Balandat	449098c2d2	[SobolEngine] Update direction numbers to 21201 dims (#49710 ) Summary: Performs the update that was suggested in https://github.com/pytorch/pytorch/issues/41489 Adjust the functionality to largely match that pf the scipy companion PR https://github.com/scipy/scipy/pull/10844/, including - a new `draw_base2` method - include zero as the first point in the (unscrambled) Sobol sequence The scipy PR is also quite opinionated if the `draw` method doesn't get called with a base 2 number (for which the resulting sequence has nice properties, see the scipy PR for a comprehensive discussion of this). Note that this update is a breaking change in the sense that sequences generated with the same parameters after as before will not be identical! They will have the same (better, arguably) distributional properties, but calling the engine with the same seed will result in different numbers in the sequence. Pull Request resolved: https://github.com/pytorch/pytorch/pull/49710 Test Plan: ``` from torch.quasirandom import SobolEngine sobol = SobolEngine(3) sobol.draw(4) sobol = SobolEngine(4, scramble=True) sobol.draw(5) sobol = SobolEngine(4, scramble=True) sobol.draw_base2(2) ``` Reviewed By: malfet Differential Revision: D25657233 Pulled By: Balandat fbshipit-source-id: 9df50a14631092b176cc692b6024aa62a639ef61	2021-02-01 08:44:31 -08:00
kshitij12345	a88e1d3ddf	[complex] Complex support for masked_scatter and autograd support for masked_scatter and masked_select (#51281 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/33152 Changes * Enable complex support for masked_scatter * Enable half support for masked_scatter CPU * Enable complex autograd support for masked_scatter CPU and masked_select (both CPU and CUDA). Note: Complex Support for masked_scatter CUDA is disabled as it depends on `masked_fill` which is yet to be ported to ATen. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51281 Reviewed By: ailzhang Differential Revision: D26127561 Pulled By: anjali411 fbshipit-source-id: 6284926b934942213c5dfc24b5bcc8538d0231af	2021-01-29 13:49:31 -08:00
kshitij12345	eaf5ca09dc	Migrate masked_scatter_ CUDA to ATen (#50039 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/49542 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50039 Reviewed By: heitorschueroff Differential Revision: D26096247 Pulled By: ngimel fbshipit-source-id: ec1810d3412e0d7ab6b950265a3123519ad886c1	2021-01-27 14:17:02 -08:00
kshitij12345	6d098095eb	[numpy] torch.lgamma: promote integer inputs to float (#50140 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/42515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50140 Reviewed By: mrshenli Differential Revision: D25951094 Pulled By: mruberry fbshipit-source-id: e53f1dbddff889710f05d43dbc9587382d3decb0	2021-01-27 12:08:46 -08:00
Peter Bell	9b6d463704	Move std and var tests to OpInfos (#50901 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50901 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D26083289 Pulled By: mruberry fbshipit-source-id: 7e14ff37bba46dd456e0bc0aa9c4e0a632d0734c	2021-01-27 10:50:51 -08:00
mattip	345844d9d8	test, fix deepcopy of tensor with grad (#50663 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/3307 Previously, `self.grad` was not ~cloned~ deepcopied to the returned tensor in `deepcopy`. Added a test and an implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50663 Reviewed By: heitorschueroff Differential Revision: D26074811 Pulled By: albanD fbshipit-source-id: 536dad36415f1d03714b4ce57453f406ad802b8c	2021-01-26 16:19:53 -08:00
anjali411	e544d74c55	[CPU] Add torch.trace for complex tensors (#50380 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50380 Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D25949361 Pulled By: anjali411 fbshipit-source-id: 9910bc5b532c9bf3add530221d643b2c41c62d01	2021-01-23 09:04:31 -08:00
kshitij12345	a291b254ee	Migrate masked_scatter_ CPU to ATen (#49732 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/49541 Reference: https://github.com/pytorch/pytorch/issues/24507 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49732 Reviewed By: ejguan Differential Revision: D25991438 Pulled By: ngimel fbshipit-source-id: a43bd0bfe043d8e32a6cadbbf736a0eaa697e7ec	2021-01-22 12:05:56 -08:00
Kurt Mohler	8ab1a1495d	Rename `set_deterministic` to `use_deterministic_algorithms` (#49904 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/49100 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49904 Reviewed By: ezyang, mrshenli Differential Revision: D25956761 Pulled By: mruberry fbshipit-source-id: 86a59289d50825a0ebbd7c358b483c8d8039ffa6	2021-01-22 11:27:07 -08:00
Kyle Chen	16faabe7f0	[ROCm] re-enable tests (#50691 ) Summary: Signed-off-by: Kyle Chen <kylechen@amd.com> cc: jeffdaily re-enable test_torch.py and test_unary_ufuncs.py tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/50691 Reviewed By: mruberry Differential Revision: D25967842 Pulled By: ngimel fbshipit-source-id: dc0f6cb68fe4d151c2719bdf67ead96e1396acf2	2021-01-20 11:23:39 -08:00
Xinyu Li	7526e38cd3	Revert "Stable sort for CPU (#50052 )" (#50752 ) Summary: This reverts commit `c99f356051`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50752 Reviewed By: zou3519 Differential Revision: D25958146 Pulled By: glaringlee fbshipit-source-id: f4068d038f9bd337bac8b673eaeb46a4646f6c77	2021-01-19 18:21:25 -08:00
kshitij12345	316f0b89c3	[testing] Port `torch.{repeat, tile}` tests to use OpInfo machinery (#50199 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/50013 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50199 Reviewed By: ngimel Differential Revision: D25949791 Pulled By: mruberry fbshipit-source-id: 10eaf2d749fac8c08847f50461e72ad1c75c61e3	2021-01-19 06:02:27 -08:00
nikitaved	c458558334	kill `multinomial_alias_setup/draw` (#50489 ) Summary: As per title. Partially Fixes https://github.com/pytorch/pytorch/issues/49421. These functions appear to be dead code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50489 Reviewed By: mruberry Differential Revision: D25948912 Pulled By: ngimel fbshipit-source-id: 108723bd4c76cbc3535eba902d6f74597bfdfa58	2021-01-19 00:23:58 -08:00
76181208+imaginary-person@users.noreply.github.com	3f052ba07b	Remove unnecessary dtype checks for complex types & disable complex dispatch for CPU min/max pointwise ops (#50465 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/50064 PROBLEM DESCRIPTION: 1. Had not removed dtype checks for complex types in the previous PR (https://github.com/pytorch/pytorch/issues/50347) for this issue. These type-checks were added in https://github.com/pytorch/pytorch/issues/36377, but are no longer necessary, as we now rely upon dispatch macros to produce error messages. 2. dtype checks in `clamp_max()` and `clamp_min()` for complex inputs had not been removed either. 3. For min/max pointwise ops in TensorCompareKernel.cpp, complex dispatch had not been removed for min/max functions. ### FIX DESCRIPTION: FIX SUMMARY: 1. Removed dtype checks added in https://github.com/pytorch/pytorch/issues/36377, and added 3 more in TensorCompare.cpp. 2. Removed dtype checks for complex inputs in `clamp_max()` and `clamp_min()`. 3. Disabled complex dispatch for min/max pointwise ops in TensorCompareKernel.cpp. 4. Error messages in the exceptions raised due to min/max ops not being implemented are now checked for containing the text _not support_ (which can also be present in _not supported_), or _not implemented_, so one of them should be a part of error messages, in order for them to be informative. REASON FOR NOT CHANGING DISPATCH FOR CUDA AND CLAMP OPS: As for the CUDA min/max operations, their kernels do not seem to be compiled & dispatched for complex types anyway, so no further changes seem to be required. Basically, the dispatch macros currently being used don't have cases for complex types. For example, 1. the reduce CUDA ops use [AT_DISPATCH_ALL_TYPES_AND2 (`678fe9f077`)](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Dispatch.h#L548-L575) in [ReduceMinMaxKernel.cu](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/ReduceMinMaxKernel.cu), and that macro doesn't allow complex types. 2. In [MinMaxElementwiseKernel.cu](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/MaxMinElementwiseKernel.cu), the CUDA pointwise ops use [`AT_DISPATCH_FLOATING_TYPES_AND2 (`678fe9f077`)`](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Dispatch.h#L240-L263) for non-integral & non-boolean types, and this marco doesn't have a case for complex types either. 3. [clamp CUDA ops](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/UnaryOpsKernel.cu#L170-L211) use `AT_DISPATCH_ALL_TYPES_AND2 (`678fe9f077`)`, which doesn't have a case for complex types. Similarly, [CPU clamp min/max ops](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp#L428-L458) use the `AT_DISPATCH_ALL_TYPES_AND `dispatch macro, which doesn't have a case for complex types. REASON FOR ADDING 3 dtype CHECKS: There are a few cases in which the methods corresponding to `min_stub()` or `max_stub()` are not called, so dispatch macros don't get invoked, resulting in no exceptions being raised. Hence, `dtype` checks are necessary at 3 places to raise exceptions: 1. `52dcc72999/aten/src/ATen/native/TensorCompare.cpp (L342)` 2. `52dcc72999/aten/src/ATen/native/TensorCompare.cpp (L422)` 3. `52dcc72999/aten/src/ATen/native/TensorCompare.cpp (L389)` The first dtype check requirement can be verified from the following example Python code based on `test_complex_unsupported()`: ``` import unittest import torch class MyTestCase(unittest.TestCase): def test_1(self): t = torch.tensor((1 + 1j), device='cpu', dtype=torch.complex128) with self.assertRaises(Exception): torch.max(t, dim=0) if __name__ == '__main__': unittest.main() ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/50465 Reviewed By: mruberry Differential Revision: D25938106 Pulled By: ngimel fbshipit-source-id: 95e2df02ba8583fa3ce87d4a2fdcd60b912dda46	2021-01-17 22:00:05 -08:00
nikitaved	c99f356051	Stable sort for CPU (#50052 ) Summary: Fixes [https://github.com/pytorch/pytorch/issues/38681](https://github.com/pytorch/pytorch/issues/38681) for the CPU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50052 Reviewed By: mrshenli Differential Revision: D25900823 Pulled By: glaringlee fbshipit-source-id: 1a3fa336037d0aa2344d79f46dcacfd478a353d1	2021-01-15 19:34:27 -08:00
kshitij12345	5546a12fe3	remove redundant tests from tensor_op_tests (#50096 ) Summary: All these Unary operators have been an entry in OpInfo DB. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50096 Reviewed By: zhangguanheng66 Differential Revision: D25870048 Pulled By: mruberry fbshipit-source-id: b64e06d5b9ab5a03a202cda8c22fdb7e4ae8adf8	2021-01-12 04:53:12 -08:00
kshitij12345	9f832c8d3e	[numpy] torch.exp: promote integer inputs to float (#50093 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/42515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50093 Reviewed By: H-Huang Differential Revision: D25803549 Pulled By: mruberry fbshipit-source-id: e6f245b5e728f2dca6072f8c359f03dff63aa14d	2021-01-08 06:30:18 -08:00
Thomas Viehmann	def8aa5499	Remove cpu half and dead code from multinomial (#50063 ) Summary: Based on ngimel's (Thank you!) feedback, cpu half was only accidental, so I'm removing it. This lets us ditch the old codepath for without replacement in favour of the new, better one. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50063 Reviewed By: mruberry Differential Revision: D25772449 Pulled By: ngimel fbshipit-source-id: 608729c32237de4ee6d1acf7e316a6e878dac7f0	2021-01-05 19:46:33 -08:00
anjali411	8fb5f16931	Complex backward for indexing, slicing, joining, and mutating ops (#49552 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49552 This PR: 1. Migrates independent autograd test for `hstack`, `dstack`, `vstack`, `movedim`, `moveaxis` from `test_autograd.py` to the new `OpInfo` based tests. 2. Migrates autograd test for `gather`, `index_select` from the method_tests to the new `OpInfo` based tests. 2. Enables complex backward for `stack, gather, index_select, index_add_` and adds tests for complex autograd for all the above mentioned ops. Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D25682511 Pulled By: anjali411 fbshipit-source-id: 5d8f89db4a9ec340ab99a6196987d44a23e2c6c6	2021-01-04 19:44:15 -08:00
kshitij12345	42d2e31cd6	[numpy] `torch.rsqrt` : promote integer inputs to float (#47909 ) Summary: Reference https://github.com/pytorch/pytorch/issues/42515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/47909 Reviewed By: ngimel Differential Revision: D25730876 Pulled By: mruberry fbshipit-source-id: c87a8f686e1dd64e511640e0278021c4a584ccf2	2020-12-30 10:33:14 -08:00
kshitij12345	963f7629b5	[numpy] `torch.digamma` : promote integer inputs to float (#48302 ) Summary: BC-breaking Note: This PR updates PyTorch's digamma function to be consistent with SciPy's special.digamma function. This changes the result of the digamma function on the nonpositive integers, where the gamma function is not defined. Since the gamma function is undefined at these points, the (typical) derivative of the logarithm of the gamma function is also undefined at these points, and for negative integers this PR updates digamma to return NaN. For zero, however, it returns -inf to be consistent with SciPy. Interestingly, SciPy made a similar change, which was noticed by at least one user: https://github.com/scipy/scipy/issues/9663#issue-396587679. SciPy's returning of negative infinity at zero is intentional: `59347ae8b8/scipy/special/cephes/psi.c (L163)` This change is consistent with the C++ standard for the gamma function: https://en.cppreference.com/w/cpp/numeric/math/tgamma PR Summary: Reference https://github.com/pytorch/pytorch/issues/42515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48302 Reviewed By: ngimel Differential Revision: D25664087 Pulled By: mruberry fbshipit-source-id: 1168e81e218bf9fe5b849db0e07e7b22e590cf73	2020-12-24 22:42:55 -08:00
Kshiteej K	3f4b98d568	[numpy] `torch.erfinv`: promote integer inputs to float (#49155 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/42515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49155 Reviewed By: ngimel Differential Revision: D25664234 Pulled By: mruberry fbshipit-source-id: 630fd1d334567d78c8130236a67dda0f5ec02560	2020-12-23 14:22:03 -08:00
Kshiteej K	461aafe389	[numpy] `torch.angle`: promote integer inputs to float (#49163 ) Summary: BC-Breaking Note: This PR updates PyTorch's angle operator to be consistent with NumPy's. Previously angle would return zero for all floating point values (including NaN). Now angle returns `pi` for negative floating point values, zero for non-negative floating point values, and propagates NaNs. PR Summary: Reference: https://github.com/pytorch/pytorch/issues/42515 TODO: * [x] Add BC-Breaking Note (Prev all real numbers returned `0` (even `nan`)) -> Fixed to match the correct behavior of NumPy. Pull Request resolved: https://github.com/pytorch/pytorch/pull/49163 Reviewed By: ngimel Differential Revision: D25681758 Pulled By: mruberry fbshipit-source-id: 54143fe6bccbae044427ff15d8daaed3596f9685	2020-12-22 18:43:14 -08:00
Xiang Gao	50b361a821	Enable BF16 for indexing on CUDA (#48801 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/48801 Reviewed By: glaringlee Differential Revision: D25542914 Pulled By: ngimel fbshipit-source-id: 4113eb2729d15b40a89268172cc37122b5213624	2020-12-14 17:24:31 -08:00
Chester Liu	3a943e9f82	Use Unicode friendly API on Win32 in THAllocator (#47905 ) Summary: This replaces the narrow character set APIs with the wide character set ones in `THAllocator.cpp`. This fixes the potential crashes caused by passing non-ASCII characters in `torch::from_file` on Windows. See: https://github.com/pytorch/pytorch/issues/47422 Pull Request resolved: https://github.com/pytorch/pytorch/pull/47905 Reviewed By: zhangguanheng66 Differential Revision: D25399146 Pulled By: ezyang fbshipit-source-id: 0a183b65de171c48ed1718fa71e773224eaf196f	2020-12-14 14:24:20 -08:00
Brian Hirsh	f54ab8fbfe	Revert "Revert D25003113: make validate debug-only in Device copy ctr" (#49123 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49123 This reverts commit `7a4a2df225`. Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D25463531 Pulled By: bdhirsh fbshipit-source-id: 7c7ecdc1d63ffd137b84a129887c424b2083a958	2020-12-14 07:33:37 -08:00
kiyosora	15200e385a	Enable torch.where() to support Float16 & BFloat16 type inputs (#49004 ) Summary: Fixed https://github.com/pytorch/pytorch/issues/49075 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49004 Reviewed By: zou3519 Differential Revision: D25495225 Pulled By: H-Huang fbshipit-source-id: 09418ee5503f65c8862e40119c5802779505a4db	2020-12-11 13:36:41 -08:00
kshitij12345	eb9516eaa4	[numpy] `torch.exp{2, m1}`: promote integer inputs to float (#48926 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/42515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48926 Reviewed By: zhangguanheng66 Differential Revision: D25392344 Pulled By: mruberry fbshipit-source-id: ddbabcfd58cc4c944153b1a224cc232efa022104	2020-12-10 00:14:22 -08:00

1 2 3 4 5 ...

1730 Commits