Summary:
Added the functionality desired in https://github.com/pytorch/pytorch/issues/50789.
1. Added support for pow() on CPU for `float16` (`Half`) and `bfloat16` types.
Both `pow(Tensor, Scalar)` and `pow(Tensor, Tensor)` are now supported for the aforementioned types (see the sketch after this list).
However, autograd isn't supported for `float16` on CPU yet, as `log_vml_cpu` can't be enabled for it.
2. heitorschueroff added `pow_tensor_scalar_optimized_kernel` to refactor & simplify `PowKernel.cpp`.
It provides a common path for all the complex and floating-point types (except `float16`, due to the lack of complete AVX2 vectorization support for it). It replaced code that had previously been duplicated for the (`float`, `double`) and complex types,
so `PowKernel.cpp` looks a lot cleaner now.
3. Enabled (unskipped) some tests for `erf`, `erfc`, `erfinv`, `linalg.norm` and `linalg.vector_norm` that were previously being skipped because `pow()` hadn't been implemented for `float16` & `bfloat16`.
4. Added an OpInfo for `pow()` & enabled some test cases for `pow()`.
5. Extended the coverage of existing tests for `pow` in `test_binary_ufuncs.py` in order to enable comparison with `numpy`, even with discontiguous tensors, and added a test to ensure that a runtime error is raised for `pow`'s inplace variant if resizing the base tensor is required during its invocation.
6. Added `float16` & `bfloat16` to `square`'s dtype lists in its `UnaryUfuncInfo`.
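For illustration, a minimal sketch of the new CPU dtype support described in item 1 (an assumed example, not taken from the PR's tests):
```python
import torch

# pow on CPU with the newly supported reduced-precision dtypes.
x = torch.tensor([1., 2., 3., 4.], dtype=torch.bfloat16)   # CPU bfloat16
print(torch.pow(x, 2))    # pow(Tensor, Scalar)
print(torch.pow(x, x))    # pow(Tensor, Tensor)

y = torch.tensor([1., 2., 3., 4.], dtype=torch.float16)    # CPU float16 (Half)
print(torch.pow(y, 2))    # forward works; autograd for float16 on CPU is still unsupported
```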
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50999
Reviewed By: zou3519
Differential Revision: D27478225
Pulled By: heitorschueroff
fbshipit-source-id: d309dd98d5a96d0cb9b08281757bb1c65266d011
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53669
This PR does two things:
* Ports `pow` to be structured
* Fixes a bug with how pow handles mixed cpu and cuda tensors
**bug fix**
Pow is a binary op, and all binary ops that use TensorIterator are currently written to handle the case where one of the inputs is a CUDA tensor and the other is a zero-dimensional CPU tensor.
`pow` incidentally only handles one of the two cases: it fails when the CUDA tensor is passed as the exponent, e.g. `torch.pow(torch.tensor(2.0, device='cpu'), torch.tensor([2, 2], device='cuda'))`. Porting `pow` to structured happened to change the error that was raised from a `TORCH_CHECK` in TensorIterator to an `INTERNAL_ASSERT` in `Loops.cuh`, so I ended up fixing the error and updating the tests. I added more details in a comment on the PR.
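A minimal sketch of the two orderings (an assumed example requiring a CUDA device; after the fix both should behave consistently):
```python
import torch

base = torch.tensor(2.0)                        # zero-dimensional CPU tensor
exp = torch.tensor([2.0, 3.0], device='cuda')   # CUDA tensor

# The previously failing case: the CUDA tensor passed as the exponent.
print(torch.pow(base, exp))   # expected: tensor([4., 8.], device='cuda:0')
# The case that already worked: the CUDA tensor passed as the base.
print(torch.pow(exp, base))   # expected: tensor([4., 9.], device='cuda:0')
```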
**notes on the structured port**
Pow is a little weird, so I wrote down a couple of issues I noticed during the port:
* Multiple independent overloads. `pow` has two overloads that have their own cpu/cuda kernels, meaning one doesn't call the other. I had to update the names of the kernel overloads to make the compiler happy, since the codegen would otherwise try to generate two classes with the same name. `pow` actually has 3 overloads that all have `out` variants, so I ported all 3 to structured; one of them just happens to redispatch to one of the others in most cases.
* Name propagation. Is name propagation implemented per operator, or is it expected to work for most/all ops by default? Right now it looks like it happens for TensorIterator ops by default. For ops that don't use TensorIterator, we need to explicitly pass the names through to the `set_output()` call in the meta function. This happened to matter for `pow` because it has 3 overloads, but only two of them directly use TensorIterator. I had to pass names directly to `set_output` in the 3rd overload to make the tests happy (see the sketch after this list).
* Lack of `const Tensor &` in the C++ API. It's a goal to slowly make all `Tensor &` arguments const as part of the structured port, but in this case I needed to explicitly cast constness away because one structured kernel called back into the C++ API, which still has ordinary `Tensor &` arguments. This probably isn't something we'll fix soon, since we have boxing logic that actually relies on the `Tensor &` / `const Tensor &` distinction in some places.
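A quick Python-level check of the name-propagation behavior mentioned above might look like this (an assumed sketch using the experimental named-tensor API, not taken from the PR's test suite):
```python
import torch

# Names on the inputs are expected to propagate to pow's output.
x = torch.rand(3, names=('N',))
print(torch.pow(x, 2).names)   # ('N',) -- Tensor-Scalar overload
print(torch.pow(x, x).names)   # ('N',) -- Tensor-Tensor overload
```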
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D27029821
Pulled By: bdhirsh
fbshipit-source-id: c1786e770de6e6c2474b9a48210b88057ab1018e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51706
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50280
As mentioned in gh-43874, this adds a `rounding_mode={'true', 'trunc', 'floor'}`
argument so `torch.div` can be used as a replacement for `floor_divide` during
the transitional period.
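For illustration, usage of the new argument might look like this (a sketch, not taken from the PR; plain `torch.div` without `rounding_mode` performs true division):
```python
import torch

a = torch.tensor([7., -7.])
b = torch.tensor([2., 2.])

print(torch.div(a, b))                          # tensor([ 3.5000, -3.5000])
print(torch.div(a, b, rounding_mode='trunc'))   # tensor([ 3., -3.])
print(torch.div(a, b, rounding_mode='floor'))   # tensor([ 3., -4.])
```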
I've included dedicated kernels for truncated and floor division which
aren't strictly necessary for float, but do perform significantly better (~2x) than
doing true division followed by a separate rounding kernel.
Note: I introduce new overloads for `aten::div` instead of just adding a default
`rounding_mode` because various JIT passes rely on the exact operator schema.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D26123271
Pulled By: mruberry
fbshipit-source-id: 51a83717602114597ec9c4d946e35a392eb01d46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48668
Combine tests for `fmod` and `remainder`.
## BC-breaking Note:
In order to give the `remainder` operator type promotion, we have to introduce a BC-breaking change.
### 1.7.1:
In the case where the second argument is a python number, the result is cast to the dtype of the first argument.
```python
>>> torch.remainder(x, 1.2)
tensor([0, 0, 0, 0, 0], dtype=torch.int32)
```
### This PR:
In the case where the second argument is a python number, the dtype of result is determined by type promotion of both inputs.
```python
>>> torch.remainder(x, 1.2)
tensor([1.0000, 0.8000, 0.6000, 0.4000, 0.2000])
```
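For reference, the outputs above are consistent with an integer input such as the following (`x` here is an assumed example, not taken from the PR):
```python
import torch

x = torch.arange(1, 6, dtype=torch.int32)   # tensor([1, 2, 3, 4, 5], dtype=torch.int32)
torch.remainder(x, 1.2)
# 1.7.1:    tensor([0, 0, 0, 0, 0], dtype=torch.int32)
# this PR:  tensor([1.0000, 0.8000, 0.6000, 0.4000, 0.2000])
```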
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D25869136
Pulled By: ejguan
fbshipit-source-id: 8e5e87eec605a15060f715952de140f25644008c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48278
Remove various lines from tests that assumed no type promotion, following the change introduced in #47323.
## BC-breaking Note:
In order to give the `fmod` operator type promotion, we have to introduce a BC-breaking change.
### 1.7.1:
In the case where the second argument is a python number, the result is cast to the dtype of the first argument.
```python
>>> torch.fmod(x, 1.2)
tensor([0, 0, 0, 0, 0], dtype=torch.int32)
```
### Prior PR:
Check the BC-breaking note of #47323
### This PR:
In the case where the second argument is a python number, the dtype of result is determined by type promotion of both inputs.
```python
>>> torch.fmod(x, 1.2)
tensor([1.0000, 0.8000, 0.6000, 0.4000, 0.2000])
```
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D25869137
Pulled By: ejguan
fbshipit-source-id: bce763926731e095b75daf2e934bff7c03ff0832
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50105
There should be no functional change here.
A couple of reasons:
1) This function is generally an anti-pattern (https://github.com/pytorch/pytorch/issues/49758) and it is good to minimize its usage in the code base.
2) `pow` itself has a fair amount of smarts, like not broadcasting scalar/tensor combinations, and we should defer to it.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D25786172
Pulled By: gchanan
fbshipit-source-id: 89de03aa0b900ce011a62911224a5441f15e331a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49926
While investigating https://github.com/pytorch/pytorch/issues/49758, I changed the xlogy kernel to use the recommended wrapped_scalar_tensor pattern instead of moving the scalar to the GPU as a tensor.
While this doesn't avoid a synchronization (there is no synchronization in the move, as it's done via a fill), it does significantly speed up the GPU kernel (almost ~50%; benchmark in the PR comments).
From looking at the nvprof output, it looks like this code path avoids broadcasting. Aside: this seems unnecessary, as there is nothing special from the point of view of broadcasting whether the Tensor is ()-sized or marked as a wrapped scalar. Still, this is a useful change to make, as we avoid extra kernel launches and dispatches to create and fill the tensor.
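For context, the affected path is the Tensor-Scalar form of the op (an assumed example requiring a CUDA device, not taken from the PR's benchmark):
```python
import torch

x = torch.rand(1024, device='cuda')
# The Python number 2.0 is kept as a wrapped scalar rather than being
# created and filled as a separate GPU tensor.
y = torch.xlogy(x, 2.0)   # elementwise x * log(2.0)
```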
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D25724215
Pulled By: gchanan
fbshipit-source-id: 4adcd5d8b3297502672ffeafc77e8af80592f460
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49214
**BC-Breaking**
Before this PR, `%=` didn't actually do the operation in place and returned a new tensor.
After this PR, the `%=` operation is actually in place and the modified input tensor is returned.
Before PR,
```python
>>> import torch
>>> a = torch.tensor([11,12,13])
>>> id(a)
139627966219328
>>> a %= 10
>>> id(a)
139627966219264
```
After PR,
```python
>>> import torch
>>> a = torch.tensor([11,12,13])
>>> id(a)
139804702425280
>>> a %= 10
>>> id(a)
139804702425280
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49390
Reviewed By: izdeby
Differential Revision: D25560423
Pulled By: zou3519
fbshipit-source-id: 2b92bfda260582aa4ac22c4025376295e51f854e
Summary:
The test should verify that all listed conditions throw, not just the first one.
Refactor duplicated constants.
Use `self.assertTrue()` instead of suppressing the flake8 `B015: Pointless Comparison` warning.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48597
Reviewed By: mruberry
Differential Revision: D25222734
Pulled By: malfet
fbshipit-source-id: 7854f755a84f23a1a52dc74402582e34d69ff984
Summary:
Creates multiple new test suites to have fewer tests in test_torch.py, consistent with previous test suite creation like test_unary_ufuncs.py and test_linalg.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47356
Reviewed By: ngimel
Differential Revision: D25202268
Pulled By: mruberry
fbshipit-source-id: 75fde3ca76545d1b32b86d432a5cb7a5ba8f5bb6