pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-08 07:39:33 +01:00

Author	SHA1	Message	Date
Mike Ruberry	0891c908bb	Revert D33768645: Set correct device id on efficientzerotensors Test Plan: revert-hammer Differential Revision: D33768645 (`5dd6cd55ba`) Original commit changeset: 66ce9907630b Original Phabricator Diff: D33768645 (`5dd6cd55ba`) fbshipit-source-id: 4bb1ad46f01cd33aeb813bdc123741cf665194a8 (cherry picked from commit `8ca385b1d8`)	2022-01-26 17:01:32 +00:00
anjali411	5dd6cd55ba	Set correct device id on efficientzerotensors (#71611 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71611 Fixes https://github.com/pytorch/pytorch/issues/71160 Test Plan: Imported from OSS Reviewed By: pbelevich, ngimel Differential Revision: D33768645 Pulled By: anjali411 fbshipit-source-id: 66ce9907630b65a12c0775077147a7e72ff4cee4 (cherry picked from commit `3af98a4d70`)	2022-01-25 23:32:11 +00:00
Jonathan Colen	33403f4848	edge_order check in torch.gradient only applies to dim argument (#67926 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/67919 The compatibility check on `edge_order` in `pre_check_gradient` now looks only at the dimensions given by the `dim` argument if it is present; otherwise, it checks all dimensions. Previously, it checked all dimensions regardless of the `dim` argument and threw unnecessary errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/67926 Reviewed By: albanD Differential Revision: D33760621 Pulled By: mruberry fbshipit-source-id: d490cd8610c68ff3787e670fc947de3cbf2db062 (cherry picked from commit `45bc56de9e`)	2022-01-25 21:29:31 +00:00
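A minimal sketch of the behavior described above, assuming a PyTorch build that includes this fix. The 5x2 tensor has only 2 points along dim 1, so `edge_order=2` (which needs at least `edge_order + 1 = 3` points) should fail when every dimension is differentiated, but succeed when `dim=0` is passed explicitly.

```python
import torch

# 5 points along dim 0, only 2 points along dim 1.
t = torch.arange(10.0).reshape(5, 2)

# Differentiating over every dimension still checks dim 1,
# which is too small for edge_order=2.
try:
    torch.gradient(t, edge_order=2)
    all_dims_failed = False
except RuntimeError:
    all_dims_failed = True

# With dim=0, only dim 0 is checked, so the call succeeds.
# torch.gradient returns a tuple with one tensor per requested dim.
(grad_dim0,) = torch.gradient(t, dim=0, edge_order=2)
print(all_dims_failed, grad_dim0)
```

Because each row increases by 2 along dim 0, every entry of `grad_dim0` is 2.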
Mike Ruberry	e0d829a266	Kill the test_torch.py mixin and creates test_scatter_gather_ops (#71691 ) Summary: Per title. Also annotates test_torch.py with additional cleanup tasks and adds empty sample inputs to elementwise unary and binary OpInfos. Pull Request resolved: https://github.com/pytorch/pytorch/pull/71691 Reviewed By: ngimel Differential Revision: D33735126 Pulled By: mruberry fbshipit-source-id: 8cc097a7581a8b620540c95b2a5889c1165ecf23 (cherry picked from commit `5c6a245a3f`)	2022-01-24 09:32:32 +00:00
Mike Ruberry	7680a0ae9d	Deprecates _aminmax (#71576 ) Summary: Replaces https://github.com/pytorch/pytorch/pull/62432. Existing callsites are updated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/71576 Reviewed By: ngimel Differential Revision: D33689960 Pulled By: mruberry fbshipit-source-id: fad1ba78347ecec7fd48f21862c3eb606662b8f4 (cherry picked from commit `6cd438e9a1`)	2022-01-21 09:23:29 +00:00
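For callers migrating off the deprecated private `_aminmax`, the public `torch.aminmax` returns a `(min, max)` named tuple. The sketch below uses arbitrary tensor values.

```python
import torch

t = torch.tensor([[3.0, -1.0], [7.0, 0.5]])

# Reduce over all elements: returns a named tuple with .min and .max.
overall = torch.aminmax(t)

# Reduce along a dimension; `dim` is keyword-only.
per_row = torch.aminmax(t, dim=1)
print(overall.min, overall.max, per_row.min, per_row.max)
```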
Peter Bell	17bb68618f	Copy: Fix CPU transpose path ignoring neg and conj bits (#69026 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69026 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D33064533 Pulled By: anjali411 fbshipit-source-id: 98c25586a1707ac2324f69f652ce5a14dd59c0ad	2022-01-14 10:13:33 -08:00
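The following sketch shows the kind of check this fix protects, assuming CPU tensors. Copying a lazily conjugated, transposed view into a contiguous destination should materialize the conjugation. The 64x64 shape is an assumption, chosen to exceed the `MIN_SZ = 60 * 60` threshold for the transpose path that the `copy_transpose_valid` entry (#64425) further down this log mentions. Whether that path is actually taken is an internal detail.

```python
import torch

torch.manual_seed(0)
x = torch.randn(64, 64, dtype=torch.complex64)

# A lazily conjugated (conj bit set), transposed view of x.
src = x.conj().t()
dst = torch.empty(64, 64, dtype=torch.complex64)
dst.copy_(src)

# dst must hold the materialized conjugate transpose, not the raw data.
expected = torch.conj_physical(x).t()
print(torch.equal(dst, expected))
```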
Emilio Castillo	8dfff8b2e2	Fix scatter for empty indexes (#70662 ) Summary: This PR fixes an issue with `scatter` where the output is garbage for zero-sized indexes. ```py import torch null_index = torch.zeros((0, 4), dtype=torch.int64) null_arr = torch.zeros((0, 4)) zeros_arr = torch.zeros((1, 4)) result = zeros_arr.scatter(0, null_index, null_arr) print(null_index) print(null_arr) print(zeros_arr) print(result) ``` ``` tensor([], size=(0, 4), dtype=torch.int64) tensor([], size=(0, 4)) tensor([[0., 0., 0., 0.]]) tensor([[1.7036e+19, 2.9965e+32, 3.9133e-14, 1.3585e-19]]) ``` The output array is never filled if the `index` argument has 0 elements. Pull Request resolved: https://github.com/pytorch/pytorch/pull/70662 Reviewed By: dagitses Differential Revision: D33476807 Pulled By: albanD fbshipit-source-id: 97dbdd9c0133899e58828c43ecba81838807b8af	2022-01-07 09:20:43 -08:00
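With the fix in place, scattering an empty index should leave the values of `self` untouched. This minimal check reuses the shapes from the reproduction above.

```python
import torch

null_index = torch.zeros((0, 4), dtype=torch.int64)
null_src = torch.zeros((0, 4))
zeros_arr = torch.zeros((1, 4))

# Out-of-place scatter returns a new tensor. With no indices, it should be
# an unchanged copy of zeros_arr rather than uninitialized memory.
result = zeros_arr.scatter(0, null_index, null_src)
print(result)
```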
Peter Bell	917d56a7e4	Copy: Fix conj bit being ignored on type mismatch (#68963 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68963 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D33064492 Pulled By: anjali411 fbshipit-source-id: 043f927d6bfff46bf5f8ea6fce9409f250bf8ff8	2022-01-05 17:59:32 -08:00
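Here is a minimal sketch of the type-mismatch case, assuming a CPU build with this fix. Copying a lazily conjugated `complex64` view into a `complex128` destination should apply the conjugation during the dtype conversion.

```python
import torch

x = torch.tensor([1 + 2j, 3 - 4j], dtype=torch.complex64)
src = x.conj()                      # conj bit set, data not yet conjugated
dst = torch.empty(2, dtype=torch.complex128)

dst.copy_(src)                      # dtypes differ, so a converting copy runs
print(dst.tolist())
```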
Brian Hirsh	457ba1dd3e	Porting index_add to structured kernels, add an out variant (#65993 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65993 This PR attempts to port `index_add` to structured kernels, but does more than that: * Adds an `out=` variant to `index_add` * Revises `native_functions.yaml` registrations to avoid multiple entries, passing a default value for `alpha` instead. * Updates the `derivatives.yaml` file so that autograd works * Revises error messages, please see: https://github.com/pytorch/pytorch/pull/65993#issuecomment-945441615 Follow-up PRs in the near future will attempt to refactor the OpInfo test and take another look at the tests in `test/test_torch.py` for this function (hence the use of ghstack for this). ~This is WIP because there are tests failing for `Dimname` variant on mobile/android builds, and I'm working on fixing them.~ Issue tracker: https://github.com/pytorch/pytorch/issues/55070 Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D32646426 fbshipit-source-id: b035ecf843a9a27d4d1e18b202b035adc2a49ab5	2021-12-14 11:57:13 -08:00
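A short sketch of the new `out=` variant combined with `alpha` scaling, using arbitrary values. Rows of `source`, multiplied by `alpha`, are added into the rows of `input` selected by `index`. The result goes to `out`, and `input` itself is left unchanged.

```python
import torch

x = torch.zeros(3, 2)
index = torch.tensor([0, 2])
source = torch.ones(2, 2)
out = torch.empty(3, 2)

# out = x with source rows (scaled by alpha) added at rows 0 and 2.
torch.index_add(x, 0, index, source, alpha=2, out=out)
print(out)
```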
kshitij12345	5b2586fe09	[testing] Ignore expected_regex in assertRaisesRegex for non-native device (#68723 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/29719 Pull Request resolved: https://github.com/pytorch/pytorch/pull/68723 Reviewed By: zou3519 Differential Revision: D32797061 Pulled By: mruberry fbshipit-source-id: 3bcae6d3d62d180059dbe39be520b0e7f9aea19f	2021-12-02 14:52:27 -08:00
Emilio Castillo	533e72e0a4	Fix DLPack CUDA stream convention (#67618 ) Summary: Apparently, for the array API, the CUDA legacy default stream and the per-thread default stream should be identified as 1 and 2 instead of 0 and 1: https://data-apis.org/array-api/latest/API_specification/array_object.html?dlpack-self-stream-none#dlpack-self-stream-none. This caused a problem in interop with CuPy: https://github.com/cupy/cupy/pull/5970#discussion_r739912926. cc rgommers leofang mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/67618 Reviewed By: albanD Differential Revision: D32521805 Pulled By: mruberry fbshipit-source-id: 95777e4014e5edf1f88ba10adc03c6e34c13248d	2021-11-18 08:36:05 -08:00
kshitij12345	d5d2096dab	[testing] make @dtypes mandatory when using @dtypesIf (#68186 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/53647 With this change, if a test uses `dtypesIf` but forgets to add `dtypes`, the following error is raised: ``` AssertionError: dtypes is mandatory when using dtypesIf however 'test_exponential_no_zero' didn't specify it ``` Tested locally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/68186 Reviewed By: VitalyFedyunin Differential Revision: D32468581 Pulled By: mruberry fbshipit-source-id: 805e0855f988b77a5d8d4cd52b31426c04c2200b	2021-11-18 08:29:31 -08:00
rusty1s	9807787135	`scatter_reduce` (#68115 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/63780 Basic functionality of a `scatter_reduce` algorithm with `reduce="sum"`: * `scatter_reduce` is named as `scatter_reduce2` due to compilation issues * It currently re-uses functionality from `scatter_add` * Tests are missing: WIP The error when the `scatter_reduce` naming is used: ``` In file included from aten/src/ATen/core/TensorBody.h:3, from ../aten/src/ATen/core/Tensor.h:3, from ../aten/src/ATen/DeviceGuard.h:4, from ../aten/src/ATen/ATen.h:11, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/Operators.h:13949:18: error: redefinition of ‘struct at::_ops::scatter_reduce’ 13949 | struct TORCH_API scatter_reduce { | ^~~~~~~~~~~~~~ aten/src/ATen/Operators.h:13817:18: note: previous definition of ‘struct at::_ops::scatter_reduce’ 13817 | struct TORCH_API scatter_reduce { | ^~~~~~~~~~~~~~ aten/src/ATen/Operators.h:13960:18: error: redefinition of ‘struct at::_ops::scatter_reduce_out’ 13960 | struct TORCH_API scatter_reduce_out { | ^~~~~~~~~~~~~~~~~~ aten/src/ATen/Operators.h:13839:18: note: previous definition of ‘struct at::_ops::scatter_reduce_out’ 13839 | struct TORCH_API scatter_reduce_out { | ^~~~~~~~~~~~~~~~~~ In file included from ../aten/src/ATen/core/Tensor.h:3, from ../aten/src/ATen/DeviceGuard.h:4, from ../aten/src/ATen/ATen.h:11, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/core/TensorBody.h: In member function ‘at::Tensor at::Tensor::scatter_reduce(int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>) const’: aten/src/ATen/core/TensorBody.h:3976:83: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’ 3976 | return at::_ops::scatter_reduce::call(const_cast<Tensor&>(*this), dim, index, reduce, output_size); | ^~~~~~ | | | c10::string_view {aka c10::basic_string_view<char>} In file included from aten/src/ATen/core/TensorBody.h:3, from ../aten/src/ATen/core/Tensor.h:3, from ../aten/src/ATen/DeviceGuard.h:4, from ../aten/src/ATen/ATen.h:11, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/Operators.h:13824:109: note: initializing argument 4 of ‘static at::Tensor at::_ops::scatter_reduce::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view)’ 13824 | static at::Tensor call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce); | ~~~~~~~~~~~~~~~~~~~^~~ In file included from ../aten/src/ATen/ATen.h:15, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/Functions.h: In function ‘at::Tensor at::scatter_reduce(const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>)’: aten/src/ATen/Functions.h:7119:61: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’ 7119 | return at::_ops::scatter_reduce::call(self, dim, index, reduce, output_size); | ^~~~~~ | | | c10::string_view {aka c10::basic_string_view<char>} In file included from aten/src/ATen/core/TensorBody.h:3, from ../aten/src/ATen/core/Tensor.h:3, from ../aten/src/ATen/DeviceGuard.h:4, from ../aten/src/ATen/ATen.h:11, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/Operators.h:13824:109: note: initializing argument 4 of ‘static at::Tensor at::_ops::scatter_reduce::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view)’ 13824 | static at::Tensor call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce); | ~~~~~~~~~~~~~~~~~~~^~~ In file included from ../aten/src/ATen/ATen.h:15, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/Functions.h: In function ‘at::Tensor& at::scatter_reduce_out(at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>)’: aten/src/ATen/Functions.h:7124:65: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’ 7124 | return at::_ops::scatter_reduce_out::call(self, dim, index, reduce, output_size, out); | ^~~~~~ | | | c10::string_view {aka c10::basic_string_view<char>} In file included from aten/src/ATen/core/TensorBody.h:3, from ../aten/src/ATen/core/Tensor.h:3, from ../aten/src/ATen/DeviceGuard.h:4, from ../aten/src/ATen/ATen.h:11, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/Operators.h:13846:111: note: initializing argument 4 of ‘static at::Tensor& at::_ops::scatter_reduce_out::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view, at::Tensor&)’ 13846 | static at::Tensor & call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce, at::Tensor & out); | ~~~~~~~~~~~~~~~~~~~^~~ In file included from ../aten/src/ATen/ATen.h:15, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/Functions.h: In function ‘at::Tensor& at::scatter_reduce_outf(const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>, at::Tensor&)’: aten/src/ATen/Functions.h:7129:65: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’ 7129 | return at::_ops::scatter_reduce_out::call(self, dim, index, reduce, output_size, out); | ^~~~~~ | | | c10::string_view {aka c10::basic_string_view<char>} In file included from aten/src/ATen/core/TensorBody.h:3, from ../aten/src/ATen/core/Tensor.h:3, from ../aten/src/ATen/DeviceGuard.h:4, from ../aten/src/ATen/ATen.h:11, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/Operators.h:13846:111: note: initializing argument 4 of ‘static at::Tensor& at::_ops::scatter_reduce_out::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view, at::Tensor&)’ 13846 | static at::Tensor & call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce, at::Tensor & out); | ~~~~~~~~~~~~~~~~~~~^~~ In file included from aten/src/ATen/NativeFunctions.h:6, from ../aten/src/ATen/TensorIndexing.h:12, from ../aten/src/ATen/ATen.h:20, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/NativeMetaFunctions.h: At global scope: aten/src/ATen/NativeMetaFunctions.h:496:18: error: redefinition of ‘struct at::meta::structured_scatter_reduce’ 496 | struct TORCH_API structured_scatter_reduce : public at::impl::MetaBase { | ^~~~~~~~~~~~~~~~~~~~~~~~~ aten/src/ATen/NativeMetaFunctions.h:481:18: note: previous definition of ‘struct at::meta::structured_scatter_reduce’ 481 | struct TORCH_API structured_scatter_reduce : public at::impl::MetaBase { | ^~~~~~~~~~~~~~~~~~~~~~~~~ ninja: build stopped: subcommand failed. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/68115 Reviewed By: albanD Differential Revision: D32488450 Pulled By: cpuhrsch fbshipit-source-id: 65e79c6d0555c0d5715535bb52aade8d5fcd9722	2021-11-17 19:53:12 -08:00
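The `reduce="sum"` case described here has the same semantics as `scatter_add`. The sketch below compares the two, with one caveat. It uses the `scatter_reduce(dim, index, src, reduce, *, include_self=True)` signature from later PyTorch releases, not the `reduce`/`output_size` signature that appears in this commit's compiler log. Those later releases may also emit a beta-status warning for it.

```python
import torch

src = torch.tensor([1.0, 2.0, 3.0, 4.0])
index = torch.tensor([0, 1, 0, 1])

# scatter_add: out[index[i]] += src[i]
via_add = torch.zeros(2).scatter_add(0, index, src)

# The same sum reduction through scatter_reduce. include_self defaults to
# True, so the initial zeros are part of each sum.
via_reduce = torch.zeros(2).scatter_reduce(0, index, src, reduce="sum")
print(via_add, via_reduce)
```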
Mikayla Gawarecki	cac3cd1433	add torch.diff support for n greater than 1 (#67260 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67260 Addresses #54853 Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D31930294 Pulled By: mikaylagawarecki fbshipit-source-id: 97c7a27e9200c6688242680ff96b73dfff828479	2021-11-17 09:16:33 -08:00
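With `n > 1`, `torch.diff` applies the first-order difference `n` times. This sketch on a sequence of squares makes that concrete.

```python
import torch

t = torch.tensor([1, 4, 9, 16, 25])

# n=2 is the same as applying diff twice.
second = torch.diff(t, n=2)
repeated = torch.diff(torch.diff(t))
print(second)
```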
Nick Anderson	f9ea41f257	Fixes spelling error writeable to writable, improves warning, and documentation (#67664 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/46741 pytorchbot contributors: nickleus27, yanivsagy, and khanhthien123 SmrutiSikha, this is mostly your work. We just did some very minor cleanup. cc mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/67664 Reviewed By: gchanan Differential Revision: D32311838 Pulled By: mruberry fbshipit-source-id: 0e5d4d888caeccb0fd7c80e6ff11b1b1fa8e00d6	2021-11-11 13:05:00 -08:00
Kurt Mohler	db014b8529	Add `set_deterministic_debug_mode` and `get_deterministic_debug_mode` (#67778 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/67386 Pull Request resolved: https://github.com/pytorch/pytorch/pull/67778 Reviewed By: ngimel Differential Revision: D32310661 Pulled By: mruberry fbshipit-source-id: 300129e96ca51c22fa711182ce6a9f4d4d2ce57f	2021-11-11 12:48:29 -08:00
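This sketch shows how the debug-mode strings map onto integer levels, and how `"warn"` relates to the `use_deterministic_algorithms` flag (see #66233, further down this log, for `warn_only`). The final reset keeps the global state from being left modified.

```python
import torch

levels = {}
for mode in ("default", "warn", "error"):
    torch.set_deterministic_debug_mode(mode)
    levels[mode] = torch.get_deterministic_debug_mode()

# "warn" turns deterministic algorithms on, in warn-only form.
torch.set_deterministic_debug_mode("warn")
warn_enabled = torch.are_deterministic_algorithms_enabled()

# Restore the default (nondeterministic algorithms allowed, no warnings).
torch.set_deterministic_debug_mode("default")
print(levels, warn_enabled)
```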
Thomas Viehmann	33b7790907	Fix conv_transpose3d backward with non-contiguous grad_out (#67829 ) Summary: Many thanks to Forest Yang (meowmix) from the forum for reporting it with a minimal reproduction. Pull Request resolved: https://github.com/pytorch/pytorch/pull/67829 Reviewed By: malfet Differential Revision: D32184786 Pulled By: albanD fbshipit-source-id: b63dbd3148b5def2109deb2f4612c08f55f59dfb	2021-11-05 08:34:21 -07:00
soulitzer	83e8612d11	Clean up test autograd (#67413 ) Summary: Partially fixes https://github.com/pytorch/pytorch/issues/66066 This PR: - cleans up op-specific testing from test_autograd. test_autograd should be reserved for testing generic autograd functionality - tests related to an operator are better colocated - see the tracker for details What to think about when moving tests to their correct test suite: - naming: make sure it's not too generic - how the test is parametrized: sometimes we need to add/remove a device/dtype parameter - whether it can be merged with existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/67413 Reviewed By: jbschlosser, albanD Differential Revision: D32031480 Pulled By: soulitzer fbshipit-source-id: 8e13da1e58a38d5cecbfdfd4fe2b4fe6f816897f	2021-11-03 15:26:09 -07:00
kshitij12345	885a8e53ba	replace onlyOnCPUAndCUDA with onlyNativeDeviceTypes (#65201 ) Summary: Reference https://github.com/pytorch/pytorch/issues/53849 Replace `onlyOnCPUAndCUDA` with `onlyNativeDeviceTypes`, which covers the `cpu`, `cuda`, and `meta` device types. Pull Request resolved: https://github.com/pytorch/pytorch/pull/65201 Reviewed By: mrshenli Differential Revision: D31299718 Pulled By: mruberry fbshipit-source-id: 2d8356450c035d6a314209ab51b2c237583920fd	2021-11-01 09:22:34 -07:00
kshitij12345	c00806beda	Add skipXLA and expectedFailureXLA decorator (#66857 ) Summary: Add skipXLA and expectedFailureXLA decorator and relevant test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/66857 Reviewed By: ngimel Differential Revision: D32039856 Pulled By: mruberry fbshipit-source-id: 3c99d5e06c1c7684d1f798c11c783bd6ebea9899	2021-10-29 19:53:36 -07:00
jjsjann123	1ec732bc46	Add fp16/fp32 autocasting to JIT/TorchScript (#63939 ) Summary: Adds mixed-precision autocasting support between fp32/fp16 to TorchScript/JIT. A more in-depth description can be found at [torch/csrc/jit/JIT-AUTOCAST.md](https://github.com/pytorch/pytorch/pull/63939/files#diff-1f1772aaa508841c5bb58b74ab98f49a1e577612cd9ea5c386c8714a75db830b) This PR implements an autocast optimization pass (torch/csrc/jit/passes/autocast.cpp) that inserts casting ops per AMP rules, mimicking the behavior of eager autocast. The pass also takes into consideration the context of `torch.cuda.amp.autocast` and only inserts casting ops within the enabled context manager, giving feature parity with eager AMP autocast. We currently provide JIT AMP autocast as a prototype feature, so it is off by default and can be turned on via `torch._C._jit_set_autocast_mode(True)`. The JIT support for autocast is subject to different constraints than the eager mode implementation (mostly because TorchScript is statically typed); the restrictions on user-facing Python code are described in torch/csrc/jit/JIT-AUTOCAST.md. This is a prototype, and it has implementation limitations that were necessary to keep this PR small and get something functioning upstream quickly, so we can iterate on designs. A few limitations/challenges are not properly resolved in this PR: 1. Autocast inserts cast operations, which affect the scalar type of output tensors feeding downstream operations. We are not currently propagating the updated scalar types, which could give wrong results for operations subject to type promotion rules. 2. Backward for autodiff in JIT misses the casting of dgrad to the input scalar type, which autograd does in eager mode. This forces us to explicitly mark the casting operation for certain operations (e.g. binary ops); otherwise, we might feed a dgrad whose scalar type mismatches the input. This could potentially break gradient functions consuming dgrad (e.g. gemm backward, which assumes grad_output has the same scalar type as the input). 3. The `torch.autocast` API has an optional argument `dtype`, which is not currently supported in JIT autocast; we require a static value. Credit goes mostly to: tlemo kevinstephano Pull Request resolved: https://github.com/pytorch/pytorch/pull/63939 Reviewed By: navahgar Differential Revision: D31093381 Pulled By: eellison fbshipit-source-id: da6e26c668c38b01e296f304507048d6c1794314	2021-10-27 12:11:36 -07:00
Nikita Shulga	77beccaedb	Do not build PyTorch with caffe2 by default (#66658 ) Summary: Caffe2 has been deprecated for a while, but is still included in every PyTorch build. We should stop building it by default, although CI should still validate that Caffe2 code is buildable. Build even fewer dependencies when compiling mobile builds without Caffe2. Introduce `TEST_CAFFE2` in torch.common.utils. Skip `TestQuantizedEmbeddingOps` and `TestJit.test_old_models_bc` if the code is compiled without Caffe2. Should be landed after https://github.com/pytorch/builder/pull/864 Pull Request resolved: https://github.com/pytorch/pytorch/pull/66658 Reviewed By: driazati, seemethere, janeyx99 Differential Revision: D31669156 Pulled By: malfet fbshipit-source-id: 1cc45e2d402daf913a4685eb9f841cc3863e458d	2021-10-21 20:32:47 -07:00
Kurt Mohler	94f4e9a995	Enable warning tests for nondeterministic backward functions (#66736 ) Summary: Followup from https://github.com/pytorch/pytorch/issues/66233 Since https://github.com/pytorch/pytorch/issues/50209 was fixed, we can enable these warning tests now cc mruberry kurtamohler Pull Request resolved: https://github.com/pytorch/pytorch/pull/66736 Reviewed By: zou3519 Differential Revision: D31723385 Pulled By: mruberry fbshipit-source-id: dc1922a6d0c45cc80020db85710e755a89113861	2021-10-21 12:51:53 -07:00
Jane Xu	8a65047acc	[skip ci] Set test owners for everything considered with module: tests (#66865 ) Summary: Action following https://github.com/pytorch/pytorch/issues/66232 cc mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/66865 Reviewed By: anjali411 Differential Revision: D31771147 Pulled By: janeyx99 fbshipit-source-id: 8bebe5ac2098364ef1ee93b590abb5f4455b0f89	2021-10-20 09:37:03 -07:00
lezcano	0974215c4d	Prefer mT and mH over transpose(-2, -1) and transpose(-2, -1).conj() (#64181 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64181 This PR replaces all the calls to: - `transpose(-2, -1)` or `transpose(-1, -2)` by `mT()` in C++ and `mT` in Python - `conj().transpose(-2, -1)` or `transpose(-2, -1).conj()` or `conj().transpose(-1, -2)` or `transpose(-1, -2).conj()` by `mH()` in C++ and `mH` in Python. It also simplifies two pieces of code, and fixes one bug where a pair of parentheses were missing in the function `make_symmetric_matrices`. Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D31692896 Pulled By: anjali411 fbshipit-source-id: e9112c42343663d442dc5bd53ff2b492094b434a	2021-10-18 13:02:25 -07:00
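The replacements named above can be checked directly. `mT` transposes the last two dimensions, and `mH` additionally conjugates. This sketch uses a random batch of complex matrices and calls `resolve_conj()` so that comparisons happen on materialized values.

```python
import torch

torch.manual_seed(0)
x = torch.randn(2, 3, 4, dtype=torch.complex64)  # a batch of two 3x4 matrices

# mT == transpose(-2, -1)
mt_matches = torch.equal(x.mT, x.transpose(-2, -1))

# mH == transpose(-2, -1).conj(); resolve_conj() materializes the lazy conj.
mh_matches = torch.equal(
    x.mH.resolve_conj(), x.transpose(-2, -1).conj().resolve_conj()
)
print(x.mT.shape, mt_matches, mh_matches)
```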
Kurt Mohler	a25648953c	Add `warn_only` kwarg to `use_deterministic_algorithms` (#66233 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/64883 Adds a `warn_only` kwarg to `use_deterministic_algorithms`. When enabled, calling an operation that does not have a deterministic implementation will raise a warning, rather than an error. `torch.testing._internal.common_device_type.expectedAlertNondeterministic` is also refactored and documented in this PR to make it easier to use and understand. cc mruberry kurtamohler Pull Request resolved: https://github.com/pytorch/pytorch/pull/66233 Reviewed By: bdhirsh Differential Revision: D31616481 Pulled By: mruberry fbshipit-source-id: 059634a82d54407492b1d8df08f059c758d0a420	2021-10-15 13:54:59 -07:00
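Here is a minimal sketch of toggling the new flag and reading it back. Global state is restored at the end, so nothing leaks into later code.

```python
import torch

# Nondeterministic ops will warn instead of raising.
torch.use_deterministic_algorithms(True, warn_only=True)
enabled = torch.are_deterministic_algorithms_enabled()
warn_only = torch.is_deterministic_algorithms_warn_only_enabled()

torch.use_deterministic_algorithms(False)  # restore the global default
print(enabled, warn_only)
```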
anjali411	a82fcd3560	Disable .numpy() and .tolist() for tensor subclasses and fix .tolist() for conjugated and negated tensors (#66082 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66082 Fixes https://github.com/pytorch/pytorch/issues/66024 #65779 cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved albanD Test Plan: Imported from OSS Reviewed By: Gamrix, albanD Differential Revision: D31615588 Pulled By: anjali411 fbshipit-source-id: c3e65ef0fe301630eb76732ccd7819683c09aa19	2021-10-13 13:57:51 -07:00
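The `.tolist()` part of this fix can be illustrated with a lazily conjugated tensor. Only the conj bit is set, yet `tolist()` should still return the conjugated Python values.

```python
import torch

z = torch.tensor([1 + 2j, 3 - 4j])  # complex64 with the default float dtype
zc = z.conj()                       # lazy: only the conj bit is set

# tolist() must honor the conj bit.
values = zc.tolist()
print(zc.is_conj(), values)
```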
lezcano	82a216c45b	Add tensor.{adjoint(),H,mT,mH} methods and properties (#64179 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64179 This PR follows the discussion in https://github.com/pytorch/pytorch/issues/45063#issuecomment-904431478 Fixes https://github.com/pytorch/pytorch/issues/45063 cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D30730483 Pulled By: anjali411 fbshipit-source-id: 821d25083f5f682450f6812bf852dc96a1cdf9f2	2021-10-13 07:44:43 -07:00
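For a 2-D tensor, `.H`, `.mH`, and `.adjoint()` all produce the conjugate transpose as lazy views. This sketch materializes each one with `resolve_conj()` before comparing, using arbitrary entries.

```python
import torch

a = torch.tensor([[1 + 1j, 2], [3, 4 - 2j]])

h = a.H.resolve_conj()
adj = a.adjoint().resolve_conj()
mh = a.mH.resolve_conj()
print(h)
```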
Kurt Mohler	5883523c1d	Remove dtype from torch.Storage and use only torch.ByteStorage (#62030 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62030 Remove dtype tracking from the Python Storage interface, remove all the different `<type>Storage` classes except for `ByteStorage`, and update serialization accordingly, while maintaining as much FC/BC as possible. Fixes https://github.com/pytorch/pytorch/issues/47442 * THE SERIALIZATION FORMAT IS FULLY FC/BC. We worked very hard to make sure this is the case. We will probably want to break FC at some point to make the serialization structure of tensors make more sense, but not today. * There is now only a single torch.ByteStorage class. Methods like `Tensor.set_` no longer check that the dtype of storage is appropriate. * As we no longer know what the dtype of a storage is, we've removed the size method from Storage, replacing it with nbytes. This is to help catch otherwise silent errors where you confuse the number of elements with the number of bytes. * `Storage._new_shared` takes an `nbytes` kwarg and will reject previous positional-only calls. `Storage._new_with_file` and `_set_from_file` require explicit element size arguments. * It's no longer possible to convert storages to different types using the float/double/etc methods. Instead, do the conversion using a tensor. * It's no longer possible to allocate a typed storage directly using FloatStorage/DoubleStorage/etc constructors. Instead, construct a tensor and extract its storage. The classes still exist, but they are used purely for unpickling. * The preexisting serialization format stores dtype with storage, and in fact this dtype is used to determine the dtype of the tensor overall. To accommodate this case, we introduce a new TypedStorage concept that exists only during unpickling and is used to temporarily store the dtype so we can construct a tensor.
If you overrode the handling of pickling/unpickling, you MUST add handling for TypedStorage or your serialization code will degrade to standard file-based serialization. Original pull request: https://github.com/pytorch/pytorch/pull/59671 Reviewed By: soulitzer, ngimel Differential Revision: D29466819 Pulled By: ezyang fbshipit-source-id: 4a14e5d3c2b08e06e558683d97f7378a3180b00e	2021-10-05 13:50:34 -07:00
Philip Meier	aebde1bc2b	deprecate device getter from `torch.testing` namespace (#63844 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63844 Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D31141433 Pulled By: mruberry fbshipit-source-id: a29331278ab99a19e225e2cb357458e3db4f9732	2021-09-29 02:40:52 -07:00
Alban Desmaison	7c62b6e973	add deepcopy support to subclasses (#65584 ) Summary: Happy to get any feedback on how to make this code cleaner! This: - Fix Tensor attribute deepcopy BC-breaking? - Add a test for Tensor attribute deepcopy - Fix subclass deepcopy - Moves the subclass serialization tests into their own class not to interfere with other serialization test logic - Add a test for subclass deepcopy cc ezyang gchanan Pull Request resolved: https://github.com/pytorch/pytorch/pull/65584 Reviewed By: gchanan Differential Revision: D31206590 Pulled By: albanD fbshipit-source-id: 74a8f0767f4933b9c941fbea880a8fd1b893ea2f	2021-09-27 14:36:22 -07:00
Kshiteej K	ff6b475d4a	[fix] don't expose unique_dim in torch (#63080 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/62793 This is mostly a quick fix. I think the more correct fix could be renaming `unique_dim` to `_unique_dim`, which could be BC-breaking for C++ users (maybe). Maybe there is something else I am missing. ~~Not sure how to add a test for it.~~ Have tested it locally. We can add a test like the following. Tested this locally: it fails currently but passes with the fix. ```python def test_wildcard_import(self): exec('from torch import *') ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/63080 Reviewed By: gchanan Differential Revision: D30738711 Pulled By: zou3519 fbshipit-source-id: b86d0190e45ba0b49fd2cffdcfd2e3a75cc2a35e	2021-09-14 18:19:17 -07:00
Victor Quach	8131bc85d0	Raise TypeError on assigned grad with wrong type (#64876 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/64813 Raises a TypeError when the value assigned to `.grad` is neither a Tensor nor None. Adds tests. cc ezyang gchanan Pull Request resolved: https://github.com/pytorch/pytorch/pull/64876 Reviewed By: anjali411 Differential Revision: D30901678 Pulled By: soulitzer fbshipit-source-id: dbb3cb5fd0bbac6918e0b2e2f51d340daa43dee0	2021-09-13 16:41:45 -07:00
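A minimal sketch of the new check: assigning a Python list to `.grad` is rejected with `TypeError`, while a matching Tensor or `None` is still accepted.

```python
import torch

t = torch.zeros(3, requires_grad=True)

try:
    t.grad = [1.0, 2.0, 3.0]   # a list, not a Tensor
    raised = False
except TypeError as err:
    raised = True
    print(err)

t.grad = torch.ones(3)  # a Tensor of matching shape/dtype is accepted
t.grad = None           # and so is None
```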
Emilio Castillo	1cb3507ed3	Adds DLPack support (#57110 ) Summary: Partially fixes https://github.com/pytorch/pytorch/issues/55090 Depends on https://github.com/pytorch/pytorch/issues/55365 Inspired by https://github.com/dmlc/dlpack/issues/57#issuecomment-774482973 Question: in PyTorch, we can't create streams or easily synchronize them from just an integer. Should we add an [`ExternalStream`](https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.ExternalStream.html) object like the one we have in CuPy? TODO: Add tests. Would like some feedback, as this design needs quite a few iterations. rgommers leofang Pull Request resolved: https://github.com/pytorch/pytorch/pull/57110 Reviewed By: saketh-are Differential Revision: D30761481 Pulled By: mruberry fbshipit-source-id: e85d78df3c1f8defc2a698878da89cd843cb1209	2021-09-12 19:47:15 -07:00
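On CPU, the protocol can be exercised with a round trip through PyTorch itself. This sketch assumes `torch.from_dlpack` accepts any object implementing `__dlpack__`/`__dlpack_device__`, which `torch.Tensor` does as of this change. It shows that the imported tensor shares memory rather than copying it.

```python
import torch
import torch.utils.dlpack

x = torch.arange(4.0)

# Import through the __dlpack__ protocol: no copy, same underlying buffer.
y = torch.from_dlpack(x)
y[0] = 100.0  # visible through x as well

# The capsule-based helpers remain available too.
capsule = torch.utils.dlpack.to_dlpack(x)
z = torch.utils.dlpack.from_dlpack(capsule)
print(x, y.data_ptr() == x.data_ptr())
```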
Alban Desmaison	d8ae3cc318	Add more error checking in subclass creation (#64746 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64746 This extracts the error checking that used to be in the PR above. We are not going to land the proposed fix there, but I think we want this error checking in right now as these would lead to respectively a memory leak and arbitrary memory read/write. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D30867569 Pulled By: albanD fbshipit-source-id: bf468033fb8b49fcb26eed423f5fad82b4a46c56	2021-09-10 16:49:10 -07:00
Philip Meier	26b7ff5aea	deprecate dtype getters from `torch.testing` namespace (#63554 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554 Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this is twofold: 1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example, `torch.testing.floating_types()` will only give you `float32` and `float64`, skipping `float16` and `bfloat16`. 2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries. We thought about [providing a replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: if we keep the getters, either the namespace gets messy again whenever a new dtype is added, or we need to somehow version the return values of the getters. Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D30662206 Pulled By: mruberry fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56	2021-09-07 08:58:51 -07:00
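As the summary notes, the getters are easy to emulate downstream. The following is a hypothetical replacement; `CANDIDATE_DTYPES` and `floating_dtypes` are illustrative names, not a PyTorch API. It filters by `torch.dtype.is_floating_point` and makes the inclusion of `float16`/`bfloat16` an explicit choice.

```python
import torch

# Illustrative candidate list (hypothetical); extend as needed.
CANDIDATE_DTYPES = (
    torch.float16, torch.bfloat16, torch.float32, torch.float64,
    torch.int32, torch.complex64,
)


def floating_dtypes(include_reduced_precision=True):
    """Return the real floating-point dtypes from CANDIDATE_DTYPES."""
    dtypes = [d for d in CANDIDATE_DTYPES if d.is_floating_point]
    if not include_reduced_precision:
        dtypes = [d for d in dtypes if d not in (torch.float16, torch.bfloat16)]
    return tuple(dtypes)


print(floating_dtypes())
print(floating_dtypes(include_reduced_precision=False))
```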
Ivan Yashchuk	a91a278d60	Fix `copy_transpose_valid` condition for `copy_same_type_transpose_` (#64425 ) Summary: Thanks to ngimel for the hint where the problem might be (https://github.com/pytorch/pytorch/issues/64358#issuecomment-910868849)! I added a test that fails on master to verify the fix. The shape `(60, 60)` was chosen because of `MIN_SZ = 60 * 60` in `copy_transpose_valid`. Fixes https://github.com/pytorch/pytorch/issues/64358 Pull Request resolved: https://github.com/pytorch/pytorch/pull/64425 Reviewed By: mruberry Differential Revision: D30752725 Pulled By: ngimel fbshipit-source-id: f40370ea8365c94e30f8e8a3dcab5f3b3462464a	2021-09-03 18:50:33 -07:00
Kushashwa Ravi Shrimali	76e187aa08	Port `gather` to structured kernel (#63312 ) Summary: Will add a description once this is ready for review. cc: ysiraichi ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/63312 Reviewed By: iramazanli Differential Revision: D30597447 Pulled By: ezyang fbshipit-source-id: d36e59835c2f4b38e286032dd2a1111a7e16b7e5	2021-09-02 01:36:21 -07:00
anjali411	5d80a48cef	Add fast path for addmm when the inputs are conjugate (#59380 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59380 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D28898374 Pulled By: anjali411 fbshipit-source-id: eab0e64d37bb57c18b54cabb8e5c00666338ba04	2021-09-01 16:34:02 -07:00
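The commit message does not describe the fast path. One identity such a path can rely on is that conjugation distributes over products and sums. That makes `beta*conj(S) + alpha*(conj(A) @ conj(B))` equal to `conj(conj(beta)*S + conj(alpha)*(A @ B))`, so the product can run on the un-conjugated data and be conjugated only once. Below is a pure-Python sketch of that identity, assuming this is the case being optimized. The helper names are hypothetical and this is not ATen's code.

```python
def matmul(a, b):
    """Naive product of two matrices stored as lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def conj(m):
    """Element-wise complex conjugate of a matrix."""
    return [[x.conjugate() for x in row] for row in m]

def addmm(s, a, b, beta=1, alpha=1):
    """beta * s + alpha * (a @ b), mirroring the semantics of torch.addmm."""
    p = matmul(a, b)
    return [[beta * sv + alpha * pv for sv, pv in zip(srow, prow)]
            for srow, prow in zip(s, p)]

S = [[1 + 1j, 2 - 1j], [3j, 4 + 0j]]
A = [[1 + 2j, 3 - 1j], [2 + 0j, 1 + 1j]]
B = [[1j, 1 + 0j], [2 - 2j, 1 + 3j]]
beta, alpha = 2 - 1j, 1 + 1j

# Slow: materialize every conjugate, then multiply.
slow = addmm(conj(S), conj(A), conj(B), beta, alpha)
# Fast: multiply the original data, conjugating the scalars and the result once.
fast = conj(addmm(S, A, B, beta.conjugate(), alpha.conjugate()))
```

With small integer-valued entries every operation is exact, so `slow == fast` holds bit for bit.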
Philip Meier	401bbb2aa0	remove componentwise comparison of complex values in TestCase.assertEqual (#63572 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63572 Addresses #61906. The issue will be fixed later in the stack, when `torch.testing.assert_close` gets the same treatment. cc ezyang gchanan Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D30633527 Pulled By: mruberry fbshipit-source-id: c2002a4998a7a75cb2ab83f87190bde43a9d4f7c	2021-08-30 12:36:45 -07:00
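To see why componentwise comparison of complex values is looser than intended, compare it with a magnitude-based check of the form `|actual - expected| <= atol + rtol * |expected|`, the criterion `torch.testing.assert_close` documents. Whether `assertEqual` adopts exactly this formula is an assumption here. Both helper functions are hypothetical.

```python
def close_componentwise(actual, expected, rtol, atol):
    """Old style: check the real and imaginary parts separately."""
    return (abs(actual.real - expected.real) <= atol + rtol * abs(expected.real)
            and abs(actual.imag - expected.imag) <= atol + rtol * abs(expected.imag))

def close_magnitude(actual, expected, rtol, atol):
    """Treat complex values as points in the plane and compare the distance."""
    return abs(actual - expected) <= atol + rtol * abs(expected)

expected = 1 + 1j
actual = expected + (0.0009 + 0.0009j)  # each component is off by 9e-4

componentwise = close_componentwise(actual, expected, rtol=0.0, atol=1e-3)  # True
magnitude = close_magnitude(actual, expected, rtol=0.0, atol=1e-3)  # False: |diff| is about 1.27e-3
```

Each component is within tolerance on its own, but the actual distance between the two complex numbers is not.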
Kushashwa Ravi Shrimali	d37636901e	[Doc] `make_tensor` to `torch.testing` module (#63925 ) Summary: This PR aims to add `make_tensor` to the `torch.testing` module in PyTorch docs. TODOs: * [x] Add examples cc: pmeier mruberry brianjo Pull Request resolved: https://github.com/pytorch/pytorch/pull/63925 Reviewed By: ngimel Differential Revision: D30633487 Pulled By: mruberry fbshipit-source-id: 8e5a1f880c6ece5925b4039fee8122bd739538af	2021-08-30 12:25:40 -07:00
mingfeima	b0782f0f32	add BFloat16 support for bernoulli and Dropout on CPU (#56372 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56372 Test Plan: Imported from OSS Reviewed By: heitorschueroff Differential Revision: D28836792 Pulled By: VitalyFedyunin fbshipit-source-id: ede951d172a59276e11383fd767778ab959b5a6b	2021-08-25 12:01:27 -07:00
Aaron Bockover	c78ab28441	Add support for the ONNX Runtime Eager Mode backend (#58248 ) Summary: This PR implements the necessary hooks/stubs/enums/etc for complete ONNX Runtime (ORT) Eager Mode integration. The actual extension will live out of tree at https://github.com/pytorch/ort. We have been [working on this at Microsoft](https://github.com/microsoft/onnxruntime-pytorch/tree/eager-ort/torch_onnxruntime) for the last few months, and are finally ready to contribute the PyTorch core changes upstream (nothing major or exciting, just the usual boilerplate for adding new backends). The ORT backend will allow us to ferry [almost] all torch ops into granular ONNX kernels that ORT will eagerly execute against any devices it supports (therefore, we only need a single ORT backend from a PyTorch perspective). Pull Request resolved: https://github.com/pytorch/pytorch/pull/58248 Reviewed By: astaff Differential Revision: D30344992 Pulled By: albanD fbshipit-source-id: 69082b32121246340d686e16653626114b7714b2	2021-08-20 11:17:13 -07:00
Philip Meier	99203580a9	Updates internal `assert_allclose` callsites in favor of `assert_close` (#61841 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61841 Redo of #60863. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D30408145 Pulled By: mruberry fbshipit-source-id: 0b34ebc7f23ba38ecd89640b61d8aca59b7eab58	2021-08-19 12:50:41 -07:00
Thomas J. Fan	07b00fc324	ENH Migrate nll_loss2d from THC to ATen (#62826 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/24608 Fixes https://github.com/pytorch/pytorch/issues/24607 With the following benchmark, the backward pass runs a little slower. This is strange since the implementation should be exactly the same. <details> <summary>Benchmark script</summary> ```python from itertools import product import torch import torch.nn as nn import torch.nn.functional as F import time torch.manual_seed(0) MS_PER_SECOND = 1000 def _time(): torch.cuda.synchronize() return time.perf_counter() * MS_PER_SECOND device = "cuda" C = 3 n_runs = 30 reductions = ["none", "sum", "mean"] Ns = [128, 256, 512] Hs = [128, 256, 512] for reduction, N, H in product(reductions, Ns, Hs): total_fwd_time = 0 total_back_time = 0 if reduction == "none": grad_out = torch.randn(N, H, H, device=device) else: grad_out = torch.randn(1)[0] for _ in range(n_runs): input = torch.randn(N, C, H, H, device=device, requires_grad=True) target = torch.rand(N, H, H, device=device).mul(3).floor().long() # forward start = _time() result = F.nll_loss(input, target, reduction=reduction) total_fwd_time += _time() - start result = F.nll_loss(input, target, reduction=reduction) for _ in range(n_runs): # backward start = _time() result.backward(grad_out, retain_graph=True) total_back_time += _time() - start fwd_avg = total_fwd_time / n_runs bwd_avg = total_back_time / n_runs print( f"input size({N}, {C}, {H}, {H}), reduction: {reduction}, fwd: {fwd_avg:.2f} (ms), back: {bwd_avg:.2f} (ms)" ) ``` </details> <details> <summary>master results</summary> ``` input size(128, 3, 128, 128), reduction: none, fwd: 0.34 (ms), back: 0.57 (ms) input size(128, 3, 256, 256), reduction: none, fwd: 2.56 (ms), back: 3.85 (ms) input size(128, 3, 512, 512), reduction: none, fwd: 14.54 (ms), back: 16.62 (ms) input size(256, 3, 128, 128), reduction: none, fwd: 1.26 (ms), back: 1.78 (ms) input size(256, 3, 256, 256), reduction: 
none, fwd: 7.07 (ms), back: 8.22 (ms) input size(256, 3, 512, 512), reduction: none, fwd: 29.38 (ms), back: 33.29 (ms) input size(512, 3, 128, 128), reduction: none, fwd: 3.41 (ms), back: 4.05 (ms) input size(512, 3, 256, 256), reduction: none, fwd: 14.32 (ms), back: 16.46 (ms) input size(512, 3, 512, 512), reduction: none, fwd: 59.20 (ms), back: 66.68 (ms) input size(128, 3, 128, 128), reduction: sum, fwd: 0.08 (ms), back: 0.21 (ms) input size(128, 3, 256, 256), reduction: sum, fwd: 0.21 (ms), back: 0.73 (ms) input size(128, 3, 512, 512), reduction: sum, fwd: 0.82 (ms), back: 2.86 (ms) input size(256, 3, 128, 128), reduction: sum, fwd: 0.12 (ms), back: 0.39 (ms) input size(256, 3, 256, 256), reduction: sum, fwd: 0.42 (ms), back: 1.45 (ms) input size(256, 3, 512, 512), reduction: sum, fwd: 1.53 (ms), back: 5.66 (ms) input size(512, 3, 128, 128), reduction: sum, fwd: 0.21 (ms), back: 0.74 (ms) input size(512, 3, 256, 256), reduction: sum, fwd: 0.78 (ms), back: 2.86 (ms) input size(512, 3, 512, 512), reduction: sum, fwd: 2.98 (ms), back: 11.23 (ms) input size(128, 3, 128, 128), reduction: mean, fwd: 0.07 (ms), back: 0.21 (ms) input size(128, 3, 256, 256), reduction: mean, fwd: 0.21 (ms), back: 0.73 (ms) input size(128, 3, 512, 512), reduction: mean, fwd: 0.82 (ms), back: 2.86 (ms) input size(256, 3, 128, 128), reduction: mean, fwd: 0.13 (ms), back: 0.39 (ms) input size(256, 3, 256, 256), reduction: mean, fwd: 0.42 (ms), back: 1.45 (ms) input size(256, 3, 512, 512), reduction: mean, fwd: 1.54 (ms), back: 5.65 (ms) input size(512, 3, 128, 128), reduction: mean, fwd: 0.22 (ms), back: 0.74 (ms) input size(512, 3, 256, 256), reduction: mean, fwd: 0.78 (ms), back: 2.87 (ms) input size(512, 3, 512, 512), reduction: mean, fwd: 2.98 (ms), back: 11.23 (ms) ``` </details> <details> <summary>PR results</summary> ``` input size(128, 3, 128, 128), reduction: none, fwd: 0.33 (ms), back: 0.59 (ms) input size(128, 3, 256, 256), reduction: none, fwd: 2.51 (ms), back: 3.92 (ms) input 
size(128, 3, 512, 512), reduction: none, fwd: 14.52 (ms), back: 17.05 (ms) input size(256, 3, 128, 128), reduction: none, fwd: 1.23 (ms), back: 1.85 (ms) input size(256, 3, 256, 256), reduction: none, fwd: 7.07 (ms), back: 8.45 (ms) input size(256, 3, 512, 512), reduction: none, fwd: 29.39 (ms), back: 34.21 (ms) input size(512, 3, 128, 128), reduction: none, fwd: 3.40 (ms), back: 4.18 (ms) input size(512, 3, 256, 256), reduction: none, fwd: 14.33 (ms), back: 16.90 (ms) input size(512, 3, 512, 512), reduction: none, fwd: 59.04 (ms), back: 68.36 (ms) input size(128, 3, 128, 128), reduction: sum, fwd: 0.07 (ms), back: 0.25 (ms) input size(128, 3, 256, 256), reduction: sum, fwd: 0.21 (ms), back: 0.86 (ms) input size(128, 3, 512, 512), reduction: sum, fwd: 0.82 (ms), back: 3.33 (ms) input size(256, 3, 128, 128), reduction: sum, fwd: 0.12 (ms), back: 0.46 (ms) input size(256, 3, 256, 256), reduction: sum, fwd: 0.42 (ms), back: 1.70 (ms) input size(256, 3, 512, 512), reduction: sum, fwd: 1.53 (ms), back: 6.58 (ms) input size(512, 3, 128, 128), reduction: sum, fwd: 0.21 (ms), back: 0.87 (ms) input size(512, 3, 256, 256), reduction: sum, fwd: 0.78 (ms), back: 3.34 (ms) input size(512, 3, 512, 512), reduction: sum, fwd: 2.98 (ms), back: 13.07 (ms) input size(128, 3, 128, 128), reduction: mean, fwd: 0.07 (ms), back: 0.26 (ms) input size(128, 3, 256, 256), reduction: mean, fwd: 0.21 (ms), back: 0.86 (ms) input size(128, 3, 512, 512), reduction: mean, fwd: 0.82 (ms), back: 3.34 (ms) input size(256, 3, 128, 128), reduction: mean, fwd: 0.12 (ms), back: 0.46 (ms) input size(256, 3, 256, 256), reduction: mean, fwd: 0.42 (ms), back: 1.72 (ms) input size(256, 3, 512, 512), reduction: mean, fwd: 1.53 (ms), back: 6.60 (ms) input size(512, 3, 128, 128), reduction: mean, fwd: 0.21 (ms), back: 0.87 (ms) input size(512, 3, 256, 256), reduction: mean, fwd: 0.78 (ms), back: 3.33 (ms) input size(512, 3, 512, 512), reduction: mean, fwd: 2.98 (ms), back: 13.07 (ms) ``` </details> Pull Request 
resolved: https://github.com/pytorch/pytorch/pull/62826 Reviewed By: bdhirsh Differential Revision: D30282279 Pulled By: ngimel fbshipit-source-id: 4aa0ff3f8af0632957417931d332ec486a12b52d	2021-08-12 18:07:15 -07:00
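For context on the op being migrated: `nll_loss2d` takes, for each pixel, the negated log-probability of that pixel's target class and then applies the reduction. Below is a pure-Python reference sketch that assumes no class weights and no `ignore_index`, both of which the real op supports. Without weights, `mean` is simply the sum divided by the number of pixels.

```python
def nll_loss2d(log_probs, target, reduction="mean"):
    """log_probs: N x C x H x W nested lists; target: N x H x W class indices."""
    # Per-pixel loss: minus the log-probability of the target class.
    losses = [[[-log_probs[n][target[n][h][w]][h][w]
                for w in range(len(target[n][h]))]
               for h in range(len(target[n]))]
              for n in range(len(target))]
    if reduction == "none":
        return losses
    flat = [x for plane in losses for row in plane for x in row]
    if reduction == "sum":
        return sum(flat)
    if reduction == "mean":
        return sum(flat) / len(flat)
    raise ValueError(f"unknown reduction: {reduction!r}")

# N=1, C=2, H=1, W=2
log_probs = [[[[-0.5, -2.0]], [[-1.0, -0.25]]]]
target = [[[0, 1]]]
out_none = nll_loss2d(log_probs, target, "none")  # [[[0.5, 0.25]]]
out_sum = nll_loss2d(log_probs, target, "sum")    # 0.75
out_mean = nll_loss2d(log_probs, target, "mean")  # 0.375
```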
Shen Li	1022443168	Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: revert-hammer Differential Revision: D30279364 (`b004307252`) Original commit changeset: c1ed77dfe43a fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e	2021-08-12 11:45:01 -07:00
Zsolt Dollenstein	b004307252	[codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: manual inspection & sandcastle Reviewed By: zertosh Differential Revision: D30279364 fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a	2021-08-12 10:58:35 -07:00
Matti Picus	658540f43f	remove deprecated is_deterministic and set_deterministic (#62158 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/58096 Pull Request resolved: https://github.com/pytorch/pytorch/pull/62158 Reviewed By: mruberry Differential Revision: D29909634 Pulled By: ezyang fbshipit-source-id: ccffbcf8f378e39bd2c7fbeace7ed1cbbe003981	2021-08-04 16:45:23 -07:00
Natalia Gimelshein	d783617216	enable warnings on cuda synchronization (#62092 ) Summary: This creates a `torch.cuda.set_warn_on_synchronization()` function that warns or errors when a synchronizing operation is performed. We could wrap it in a context manager for ease of use, but it would be a lie, because it sets global, not thread-local, state. Since it's intended for debugging, maybe that's ok though. Like all `torch.cuda.*` functions, it goes through CPython, not pybind, so the argument is converted to long before being passed to the c10 function. I'll make the Python argument a Python enum class, but without pybind it'll still have to go through the long conversion. For a test script ``` import torch torch.cuda.set_warn_on_synchronization(1) x=torch.randn(10, device="cuda") x.nonzero() y=torch.randn((), device="cuda") if y: print("something") torch.multinomial(x.abs(), 10, replacement=False) torch.randperm(20000, device="cuda") ind = torch.randint(10, (3,), device="cuda") mask = torch.randint(2, (10,), device="cuda", dtype=torch.bool) val = torch.randn((), device="cuda") x[mask]=1. x[mask] = val torch.cuda.synchronize() ``` the output is ``` /../playground/sync_warn_test.py:4: UserWarning: called a synchronizing operation (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:145.) x.nonzero() /../playground/sync_warn_test.py:7: UserWarning: called a synchronizing operation (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:145.) if y: something /../playground/sync_warn_test.py:9: UserWarning: called a synchronizing operation (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:145.) torch.multinomial(x.abs(), 10, replacement=False) /../playground/sync_warn_test.py:15: UserWarning: called a synchronizing operation (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:145.) 
x[mask] = val ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/62092 Reviewed By: mruberry Differential Revision: D29968792 Pulled By: ngimel fbshipit-source-id: cc6f817212c164727ed99ecf6ab050dc29631b9e	2021-07-30 09:13:01 -07:00
Jagadish Krishnamoorthy	64d61901eb	[ROCm] Skip test_masked_scatter_large_tensor_cuda (#61313 ) Summary: Refer to https://github.com/pytorch/pytorch/issues/60190. Skipping the unit test until the hipcub issue is fixed. Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/61313 Reviewed By: iramazanli Differential Revision: D29626664 Pulled By: malfet fbshipit-source-id: db2a390d2a3e28ec05a5032a50aa9a35c86b96ca	2021-07-09 10:27:08 -07:00
kshitij12345	5e9bcf9101	fix: support removing hook in the hook (#61250 ) Summary: Fixes: https://github.com/pytorch/pytorch/issues/58354 Problem: Once a hook is called (`05c1e5b655/torch/csrc/autograd/python_hook.cpp (L51-L54)`), if the hook calls `handle.remove()` while executing and there are no other references to the hook function object, then Python is free to garbage-collect it. At the subsequent call to `05c1e5b655/torch/csrc/autograd/python_hook.cpp (L54)`, `hook` points to invalid memory. Thus, when we try to fetch the name of `hook` in `check_single_result` (`05c1e5b655/torch/csrc/autograd/python_hook.cpp (L175-L177)`), we get a segfault. Solution: Temporarily extend the lifetime of `hook` with `Py_INCREF` until we have verified the result. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61250 Reviewed By: iramazanli Differential Revision: D29623826 Pulled By: soulitzer fbshipit-source-id: c71322311f19066cafb7203980668868c59d4e5e	2021-07-09 09:27:58 -07:00
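The sketch below is a Python-level analogy of the fix, not the C++ code itself. When a hook may remove itself, the dispatcher should iterate over a snapshot of the registry and keep its own reference to each hook until that hook's result has been checked. In Python the local variable provides that reference automatically; in the C++ binding it had to be taken explicitly with `Py_INCREF`. `HookedValue`, `HookHandle`, and `fire` are hypothetical names.

```python
class HookHandle:
    """Hypothetical removable handle, loosely modeled on PyTorch's RemovableHandle."""

    def __init__(self, hooks, key):
        self._hooks = hooks
        self._key = key

    def remove(self):
        self._hooks.pop(self._key, None)


class HookedValue:
    """Hypothetical object whose hooks run on every call to fire()."""

    def __init__(self):
        self._hooks = {}
        self._next_key = 0

    def register_hook(self, fn):
        key = self._next_key
        self._next_key += 1
        self._hooks[key] = fn
        return HookHandle(self._hooks, key)

    def fire(self, grad):
        # Snapshot the hooks: a hook may remove itself and mutate the dict.
        for hook in list(self._hooks.values()):
            # The local name 'hook' keeps the function alive even after it
            # removed itself, so it is still valid when inspected below.
            result = hook(grad)
            if result is not None:
                if not isinstance(result, float):
                    raise TypeError(f"hook {hook.__name__!r} returned {type(result).__name__}")
                grad = result
        return grad


v = HookedValue()
handles = {}

def make_hook():
    def double_once(grad):
        handles["h"].remove()  # drop the registry's reference to this very function
        return grad * 2
    return double_once

handles["h"] = v.register_hook(make_hook())
first = v.fire(1.0)   # 2.0: the hook ran and removed itself
second = v.fire(1.0)  # 1.0: no hooks remain
```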
Heitor Schueroff	f32f85e6da	Implemented torch.corrcoef (#60420 ) Summary: Implements `torch.corrcoef` similar to [`np.corrcoef`](https://numpy.org/doc/stable/reference/generated/numpy.corrcoef.html) using `torch.cov` implemented in https://github.com/pytorch/pytorch/pull/58311. closes https://github.com/pytorch/pytorch/issues/1254 Pull Request resolved: https://github.com/pytorch/pytorch/pull/60420 Reviewed By: mruberry Differential Revision: D29474687 Pulled By: heitorschueroff fbshipit-source-id: f3c7c5610363aebd88274a51fc77e3cf879cb611	2021-06-30 12:36:02 -07:00
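`corrcoef` normalizes the covariance matrix as `R[i][j] = C[i][j] / sqrt(C[i][i] * C[j][j])`, so the covariance's normalization constant cancels out. Below is a pure-Python sketch that uses the same rows-are-variables convention as `np.corrcoef`. Unlike a library implementation, it does not guard against zero-variance rows.

```python
import math

def corrcoef(rows):
    """Pearson correlation matrix; each row is a variable, each column an observation."""
    n = len(rows[0])
    means = [sum(r) / n for r in rows]
    centered = [[x - m for x in r] for r, m in zip(rows, means)]
    # Unnormalized covariance: the 1/(n - 1) factor would cancel below anyway.
    c = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered] for ri in centered]
    std = [math.sqrt(c[i][i]) for i in range(len(rows))]
    return [[c[i][j] / (std[i] * std[j]) for j in range(len(rows))]
            for i in range(len(rows))]

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # perfectly correlated with x
z = [4.0, 3.0, 2.0, 1.0]   # perfectly anti-correlated with x
r = corrcoef([x, y, z])    # approximately [[1, 1, -1], [1, 1, -1], [-1, -1, 1]]
```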
Victor Bittorf	91c076eadc	Add TorchVitals for DataLoader (#60959 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60959 Add TorchVitals for DataLoader; this indicates that the data loader was enabled. This is a no-op if the TORCH_VITALS environment variable is not set. Test Plan: buck test mode/dbg caffe2/test:torch -- --regex vitals Reviewed By: VitalyFedyunin Differential Revision: D29445146 fbshipit-source-id: d5778fff3dafb3c0463fec7a498bff4905597518	2021-06-29 14:08:32 -07:00
Heitor Schueroff	ec9c03c234	Implemented torch.cov (#58311 ) Summary: Based on https://github.com/pytorch/pytorch/pull/50466 Adds the initial implementation of `torch.cov` similar to `numpy.cov`. For simplicity, we removed support for many parameters in `numpy.cov` that are either redundant, such as `bias`, or have simple workarounds, such as `y` and `rowvar`. cc PandaBoi closes https://github.com/pytorch/pytorch/issues/19037 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58311 Reviewed By: jbschlosser Differential Revision: D29431651 Pulled By: heitorschueroff fbshipit-source-id: 167dea880f534934b145ba94291a9d634c25b01b	2021-06-29 14:02:39 -07:00
kshitij12345	956faea585	[fix] cauchy sampling inf on cuda (#60186 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/59144 As pointed out by ngimel, the issue is indeed with calling `tan`. However, the C++ `std::tan` [documentation](https://en.cppreference.com/w/cpp/numeric/math/tan) states that ``` The function has mathematical poles at π(1/2 + n); however no common floating-point representation is able to represent π/2 exactly, thus there is no value of the argument for which a pole error occurs. ``` All of `torch.tan`, `numpy.tan` and `math.tan` are compliant with the above statement. <details> ```python import torch import math import numpy as np # Single Precision print(torch.tan(torch.tensor(math.pi, device='cuda', dtype=torch.float32) * 0.5)) print(np.tan(np.array(np.pi, dtype=np.float32) * 0.5)) # Double Precision print(math.tan(math.pi * 0.5)) print(torch.tan(torch.tensor(math.pi, device='cuda', dtype=torch.double) * 0.5)) print(np.tan(np.array(np.pi, dtype=np.float64) * 0.5)) ``` Output ``` tensor(-22877334., device='cuda:0') -22877332.42885646 1.633123935319537e+16 tensor(1.6331e+16, device='cuda:0', dtype=torch.float64) 1.633123935319537e+16 ``` </details> So this issue stems from the use of `__tanf`, a faster approximation of tan from the CUDA library (for float16, bfloat16 and float). `8a839c5478/aten/src/ATen/NumericUtils.h (L91-L100)` The fix in the PR is to use the slower but more accurate version. Benchmark: ``` [ cauchy : input dtype torch.float16 device cuda ] \| Before \| After 1 threads: ------------------------------------- (128,) \| 3.8 \| 4.3 (256, 128) \| 3.8 \| 4.2 (2, 512, 256) \| 3.8 \| 4.2 (2, 64, 256, 128) \| 22.8 \| 29.6 (4, 2, 512, 256, 128) \| 649.6 \| 869.3 Times are in microseconds (us). 
[ cauchy : input dtype torch.bfloat16 device cuda ] \| Before \| After 1 threads: ------------------------------------- (128,) \| 3.8 \| 4.3 (256, 128) \| 3.8 \| 4.3 (2, 512, 256) \| 3.8 \| 4.3 (2, 64, 256, 128) \| 23.8 \| 30.8 (4, 2, 512, 256, 128) \| 682.5 \| 904.2 Times are in microseconds (us). [ cauchy : input dtype torch.float32 device cuda ] \| Before \| After 1 threads: -------------------------------------- (128,) \| 3.8 \| 4.2 (256, 128) \| 3.7 \| 4.2 (2, 512, 256) \| 3.7 \| 4.2 (2, 64, 256, 128) \| 35.3 \| 37.1 (4, 2, 512, 256, 128) \| 1020.0 \| 1058.3 Times are in microseconds (us). [- cauchy : input dtype torch.float64 device cuda ] \| Before \| After 1 threads: ---------------------------------------- (128,) \| 3.8 \| 4.2 (256, 128) \| 8.0 \| 8.0 (2, 512, 256) \| 46.0 \| 46.0 (2, 64, 256, 128) \| 669.2 \| 669.4 (4, 2, 512, 256, 128) \| 21255.0 \| 21262.1 Times are in microseconds (us). ``` <details> Benchmark Script: ```python import torch import itertools import time from torch.utils.benchmark import Timer from torch.utils.benchmark import Compare import sys import pickle print('Using pytorch %s' % (torch.__version__)) cuda_shapes = [(128,), (256, 128), (2, 512, 256), (2, 64, 256, 128), (4, 2, 512, 256, 128)] cuda_dtypes = [torch.half, torch.bfloat16, torch.float, torch.double] results = [] repeats = 10 for device in ['cuda']: dtypes = cuda_dtypes shapes = cuda_shapes for dtype in dtypes: for shape in shapes: t = torch.randn(shape, device=device, dtype=dtype) * 10 tasks = [("t.cauchy_()", "After", "")] timers = [Timer(stmt=stmt, label=f"cauchy : input dtype {dtype} device {device}", sub_label=f"{(shape)}", description=desc, globals=globals()) for stmt, desc, label in tasks] for i, timer in enumerate(timers * repeats): results.append( timer.blocked_autorange() ) print(f"\r{i + 1} / {len(timers) * repeats}", end="") sys.stdout.flush() with open('after-pr.pkl', 'wb') as f: pickle.dump(results, f) comparison = Compare(results) comparison.print() ``` 
Compare Script: ``` import torch import itertools import time from torch.utils.benchmark import Timer from torch.utils.benchmark import Compare import sys import pickle with open('before-pr.pkl', 'rb') as f: before_results = pickle.load(f) with open('after-pr.pkl', 'rb') as f: after_results = pickle.load(f) comparison = Compare(before_results + after_results) comparison.print() ``` </details> TODO: * [x] Add comment Pull Request resolved: https://github.com/pytorch/pytorch/pull/60186 Reviewed By: jbschlosser Differential Revision: D29433897 Pulled By: ngimel fbshipit-source-id: 9c5f14b83e3372bed72369f70eed9256c04385c6	2021-06-28 12:49:30 -07:00
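Cauchy sampling is typically done by inverse-CDF transformation, `median + sigma * tan(pi * (u - 0.5))` for uniform `u`. This is why the accuracy of `tan` near `±pi/2` decides whether an infinity can appear. Below is a pure-Python sketch; treat the exact formula used in the CUDA kernel as an assumption. `cauchy_sample` is a hypothetical helper.

```python
import math
import random

def cauchy_sample(u, median=0.0, sigma=1.0):
    """Inverse-CDF transform of a uniform u in [0, 1) into a Cauchy(median, sigma) sample."""
    return median + sigma * math.tan(math.pi * (u - 0.5))

rng = random.Random(0)
samples = [cauchy_sample(rng.random(), median=1.0, sigma=2.0) for _ in range(1000)]

# u = 0 maps to the pole at -pi/2, but pi/2 is not exactly representable,
# so an accurate tan returns a huge *finite* value (about -1.6e16) instead of -inf.
edge = cauchy_sample(0.0)
```

A fast approximation that loses accuracy near the pole can round such an argument onto an infinity, which is the failure mode the commit describes for `__tanf`.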
Victor Bittorf	8b6487c650	Add CUDA Vital (#58059 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58059 Add CUDA.used vital sign which is true only if CUDA was "used" which technically means the context was created. Also adds the following features: - Force vitals to be written even if vitals are disabled, to enable testing when the env variable is not set from the start of execution - Add a read_vitals call for python to read existing vital signs. Test Plan: buck test mode/dbg caffe2/test:torch -- --regex basic_vitals Reviewed By: xuzhao9 Differential Revision: D28357615 fbshipit-source-id: 681bf9ef63cb1458df9f1c241d301a3ddf1e5252	2021-06-25 16:31:11 -07:00
Masaki Kozuki	a404cc9a7b	CUDA `addcmul` and `addcdiv` do math in float for 16 bits I/O (#60715 ) Summary: Currently, foreach `addcmul` and `addcdiv` cast the scalar to float so that the actual math is done in FP32 when the tensor dtype is Float16/BFloat16, while regular `addcmul` and `addcdiv` do not. ### Reproducible steps to see the behavioral difference ```ipython In [1]: import torch; torch.__version__ Out[1]: '1.9.0' In [2]: a, b, c = torch.tensor([60000.0], device='cuda', dtype=torch.half), torch.tensor([60000.0], device='cuda', dtype=torch.half), torch.tensor([-1.0], device='cuda', dtype=torch.half) In [4]: torch.addcmul(a, b, c, value=2) Out[4]: tensor([-inf], device='cuda:0', dtype=torch.float16) In [5]: torch._foreach_addcmul([a], [b], [c], value=2)[0] Out[5]: tensor([-60000.], device='cuda:0', dtype=torch.float16) ``` ### How does foreach cast? Foreach addcmul and addcdiv cast the scalar to `opmath_t` (almost equivalent to acc_type) here: `42c8439b6e/aten/src/ATen/native/cuda/ForeachPointwiseOp.cu (L30)` and cast inputs and results here: `42c8439b6e/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L133-L135)` Related to https://github.com/pytorch/pytorch/issues/58833 #60227 https://github.com/pytorch/pytorch/issues/60454 cc ptrblck mcarilli ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/60715 Reviewed By: albanD Differential Revision: D29385715 Pulled By: ngimel fbshipit-source-id: 8bb2db19ab66fc99d686de056a6ee60f9f71d603	2021-06-25 10:21:35 -07:00
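The `-inf` vs `-60000.` difference above can be reproduced without a GPU by rounding to IEEE binary16 with the standard `struct` module's `'e'` format. In this simplified model, every intermediate is rounded to float16 in one path, while the other computes in a wider type and rounds once at the end. Which intermediates the real kernel rounds is an assumption here; `to_half` and both `addcmul_*` helpers are hypothetical.

```python
import math
import struct

def to_half(x):
    """Round a Python float to IEEE binary16, returning +/-inf on overflow."""
    if math.isinf(x) or math.isnan(x):
        return x
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:  # struct refuses values beyond the float16 range
        return math.copysign(math.inf, x)

def addcmul_half_math(a, b, c, value):
    """a + value * b * c with every intermediate rounded to float16."""
    scaled = to_half(value * to_half(b * c))
    return to_half(a + scaled)

def addcmul_opmath(a, b, c, value):
    """Compute in a wider type (Python float) and round to float16 once."""
    return to_half(a + value * b * c)

a, b, c = 60000.0, 60000.0, -1.0
half_result = addcmul_half_math(a, b, c, value=2)  # -inf: -120000 overflows float16 (max 65504)
opmath_result = addcmul_opmath(a, b, c, value=2)   # -60000.0
```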
Ilqar Ramazanli	90cd57ee16	To add edge_order=2 and documentation for gradient operator (#58165 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/56036 Fixes https://github.com/pytorch/pytorch/issues/56130 * The gradient operator computes all interior points using second-order accurate central differences. However, the edge points are currently computed only with a first-order method. This PR adds second-order methods for the edge points as well. * Currently, there is no detailed description of how the gradient operator is computed with the second-order method, or of how to use its parameters correctly. We add a detailed explanation of each parameter and of the return value of the gradient operator, along with a description of the second-order computation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58165 Reviewed By: mruberry Differential Revision: D29305321 Pulled By: iramazanli fbshipit-source-id: 0e0e418eed801c8510b8babe2ad3d064479fb4d6	2021-06-23 03:35:15 -07:00
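The difference between the two edge orders is visible in 1-D with uniform spacing `h`. Interior points use the central difference `(f[i+1] - f[i-1]) / (2h)`. `edge_order=1` uses one-sided first differences at the ends. `edge_order=2` uses the one-sided second-order stencils `(-3f0 + 4f1 - f2) / (2h)` and `(3f[n-1] - 4f[n-2] + f[n-3]) / (2h)`, as `numpy.gradient` does. Below is a pure-Python sketch for a single dimension with scalar spacing only. `gradient` here is a hypothetical helper, not `torch.gradient`.

```python
def gradient(f, h=1.0, edge_order=1):
    """1-D gradient of samples f with uniform spacing h, in the style of numpy.gradient."""
    if edge_order not in (1, 2):
        raise ValueError("edge_order must be 1 or 2")
    n = len(f)
    if n < edge_order + 1:
        raise ValueError("need at least edge_order + 1 samples")
    g = [0.0] * n
    # Interior: second-order accurate central differences.
    for i in range(1, n - 1):
        g[i] = (f[i + 1] - f[i - 1]) / (2 * h)
    if edge_order == 1:
        # First-order one-sided differences at the two edges.
        g[0] = (f[1] - f[0]) / h
        g[-1] = (f[-1] - f[-2]) / h
    else:
        # Second-order one-sided differences at the two edges.
        g[0] = (-3 * f[0] + 4 * f[1] - f[2]) / (2 * h)
        g[-1] = (3 * f[-1] - 4 * f[-2] + f[-3]) / (2 * h)
    return g

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x for x in xs]         # f(x) = x^2, so f'(x) = 2x
g1 = gradient(ys, edge_order=1)  # [1.0, 2.0, 4.0, 6.0, 7.0]: the edges are off
g2 = gradient(ys, edge_order=2)  # [0.0, 2.0, 4.0, 6.0, 8.0]: exact for a quadratic
```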
Philip Meier	0c916c8a4e	up the priority of numpy array comparisons in self.assertEqual (#59067 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/58988. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59067 Reviewed By: jbschlosser Differential Revision: D28986642 Pulled By: heitorschueroff fbshipit-source-id: 3ef2d26b4010fc3519d0a1a020ea446ffeb46ba0	2021-06-22 13:07:07 -07:00
praneeth	9b30fb8528	add support for constant (#60166 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/58739 Add support for constants according to python array API stipulation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/60166 Reviewed By: anjali411 Differential Revision: D29253958 Pulled By: mruberry fbshipit-source-id: 0bc86b74d3a4eb3ec4a65c941ec2710747402db1	2021-06-21 20:47:21 -07:00
Thomas J. Fan	c16f87949f	ENH Adds nn.ReflectionPad3d (#59791 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/27655 This PR adds a C++ and Python version of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to share more code between the forward and backward passes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791 Reviewed By: gchanan Differential Revision: D29242015 Pulled By: jbschlosser fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56	2021-06-21 10:53:14 -07:00
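Reflection padding mirrors the input about its edge without repeating the edge element, so each padding amount must be smaller than the corresponding input dimension. Below is a pure-Python sketch over a single `D x H x W` volume (no batch or channel dimensions). It assumes the `(left, right, top, bottom, front, back)` padding order that the `nn.*Pad3d` modules document. The helper names are hypothetical.

```python
def reflect_index(i, n):
    """Map a padded coordinate i (possibly < 0 or >= n) back into [0, n) by reflection."""
    if i < 0:
        i = -i
    if i >= n:
        i = 2 * (n - 1) - i
    return i

def reflection_pad3d(vol, padding):
    """vol: D x H x W nested lists; padding: (left, right, top, bottom, front, back)."""
    left, right, top, bottom, front, back = padding
    d, h, w = len(vol), len(vol[0]), len(vol[0][0])
    if not (max(left, right) < w and max(top, bottom) < h and max(front, back) < d):
        raise ValueError("padding must be smaller than the corresponding input dimension")
    return [[[vol[reflect_index(z, d)][reflect_index(y, h)][reflect_index(x, w)]
              for x in range(-left, w + right)]
             for y in range(-top, h + bottom)]
            for z in range(-front, d + back)]

# Width-only padding on a 1 x 1 x 4 volume: the edge values are not repeated.
row = reflection_pad3d([[[1, 2, 3, 4]]], (2, 2, 0, 0, 0, 0))   # [[[3, 2, 1, 2, 3, 4, 3, 2]]]
# Width and front (depth) padding on a 2 x 1 x 2 volume.
vol = reflection_pad3d([[[1, 2]], [[3, 4]]], (1, 1, 0, 0, 1, 0))
```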
Peter Bell	e8e3394ea8	Recognize transposed dense tensors as a form of partial overlap (#59014 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59014 Fixes #48401 `assert_no_overlap` currently has a false-negative where it recognizes the transpose of a contiguous tensor as fully overlapping. This happens because the memory regions do fully overlap, but of course the strides are different, so the actual elements don't all overlap. This change errs slightly in the other direction: by requiring strides to match exactly, we get false positives in some unusual situations, e.g. ``` torch.add(a, a, out=a.view([1, *a.shape])) ``` or when the strides of length-1 dimensions are replaced, etc. However, I think these are sufficiently obscure that it's okay to error, and the common cases like in-place operations still work as before. Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D29040928 Pulled By: ngimel fbshipit-source-id: 5a636c67536a3809c83f0d3117d2fdf49c0a45e6	2021-06-18 16:29:25 -07:00
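The full-versus-partial distinction can be made concrete with a brute-force oracle over storage offsets. Two views "fully" overlap only when they have an identical layout, meaning the same offset, sizes, and strides, so element `i` of one is element `i` of the other. Any other shared memory counts as partial. This mirrors the stricter rule described above, including its false positive for `a.view([1, *a.shape])`. It is an illustration, not how ATen implements the check; the helper names and the `(offset, sizes, strides)` tuples are hypothetical.

```python
def addresses(offset, sizes, strides):
    """All storage offsets touched by a strided view, in logical (row-major) order."""
    result = [offset]
    for size, stride in zip(sizes, strides):
        result = [base + k * stride for base in result for k in range(size)]
    return result

def overlap_kind(a, b):
    """a, b: (offset, sizes, strides) views into the same storage."""
    if a == b:
        return "full"      # identical layout: elements match one-to-one
    if set(addresses(*a)) & set(addresses(*b)):
        return "partial"   # shared memory, but elements are not matched one-to-one
    return "none"

x = (0, (3, 3), (3, 1))          # contiguous 3x3
x_t = (0, (3, 3), (1, 3))        # its transpose: same memory region, different strides
other = (9, (3, 3), (3, 1))      # a disjoint 3x3 block further along the storage
x_view = (0, (1, 3, 3), (9, 3, 1))  # like a.view([1, *a.shape]): flagged as partial
```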
Mike Ruberry	92513038e8	Revert D28994140: [pytorch][PR] Implemented torch.cov Test Plan: revert-hammer Differential Revision: D28994140 (`23c232554b`) Original commit changeset: 1890166c0a9c fbshipit-source-id: 73dfe1b00464e38f004f99960cdeeb604ed4b20a	2021-06-13 02:33:37 -07:00
Heitor Schueroff	23c232554b	Implemented torch.cov (#58311 ) Summary: Based on https://github.com/pytorch/pytorch/pull/50466 Adds the initial implementation of `torch.cov` similar to `numpy.cov`. For simplicity, we removed support for many parameters in `numpy.cov` that are either redundant, such as `bias`, or have simple workarounds, such as `y` and `rowvar`. cc PandaBoi TODO - [x] Improve documentation Pull Request resolved: https://github.com/pytorch/pytorch/pull/58311 Reviewed By: mruberry Differential Revision: D28994140 Pulled By: heitorschueroff fbshipit-source-id: 1890166c0a9c01e0a536acd91571cd704d632f44	2021-06-11 09:40:50 -07:00
Kimish Patel	4f79270b89	[PyTorch ] Thread parallel bmm across batch dim (#59596 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59596 Parallelize batch matmul across the batch dim. This was found to improve perf for some use cases on mobile. ghstack-source-id: 130989569 Test Plan: CI unit tests Reviewed By: albanD Differential Revision: D26833417 fbshipit-source-id: 9b84d89d29883a6c9d992d993844dd31a25f76b1	2021-06-10 08:25:40 -07:00
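Parallelizing `bmm` across the batch dimension works because each `out[i] = a[i] @ b[i]` is independent, so batch elements can be handed to different threads with no synchronization beyond the final join. Below is a structural sketch with Python's `ThreadPoolExecutor`. Because of the GIL it will not speed up pure-Python arithmetic; the actual change presumably uses ATen's native intra-op thread pool. `bmm_parallel` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul(a, b):
    """Naive product of two matrices stored as lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def bmm_parallel(batch_a, batch_b, max_workers=4):
    """Batched matmul in which each batch element is an independent task."""
    if len(batch_a) != len(batch_b):
        raise ValueError("batch sizes differ")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order, so out[i] corresponds to batch element i.
        return list(pool.map(matmul, batch_a, batch_b))

A = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
B = [[[5, 6], [7, 8]], [[1, 2], [3, 4]]]
out = bmm_parallel(A, B)  # [[[19, 22], [43, 50]], [[3, 4], [1, 2]]]
```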
Yukio Siraichi	84061dadad	Add reduce variants for `scatter` operation. (#57015 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/56463 #56464 - Add reduce variants for `scatter` in both _native_functions.yaml_ and _TensorAdvancedIndexing.cpp_ - Add `OpInfo` tests and reduce tests in _test_torch.py_ - Fix default reduce argument for `scatter_` in `_tensor_docs.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/57015 Reviewed By: mrshenli Differential Revision: D28162657 Pulled By: ezyang fbshipit-source-id: 4d37ed1569ce8560aca1085c9cf5349f11427c4f	2021-06-08 13:37:26 -07:00
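The reduce variants change what happens when several source elements target the same slot. Instead of overwriting, they combine into it with `out[index[i]] op= src[i]`, where `op` is addition for `reduce="add"` and multiplication for `reduce="multiply"`. Below is a 1-D pure-Python sketch of that rule (`scatter_1d` is hypothetical; the real op works along a given `dim` of N-D tensors). Note that with duplicate indices and no reduction, PyTorch does not guarantee which write wins, whereas this sequential sketch keeps the last one.

```python
def scatter_1d(self, index, src, reduce=None):
    """Out-of-place 1-D scatter: out[index[i]] (op)= src[i]."""
    out = list(self)
    for i, j in enumerate(index):
        if reduce is None:
            out[j] = src[i]          # plain scatter: later writes overwrite earlier ones here
        elif reduce == "add":
            out[j] += src[i]
        elif reduce == "multiply":
            out[j] *= src[i]
        else:
            raise ValueError(f"unsupported reduce: {reduce!r}")
    return out

base = [1, 1, 1, 1]
index = [0, 2, 0]
src = [5, 6, 7]
plain = scatter_1d(base, index, src)                      # [7, 1, 6, 1]
added = scatter_1d(base, index, src, reduce="add")        # [13, 1, 7, 1]
multiplied = scatter_1d(base, index, src, reduce="multiply")  # [35, 1, 6, 1]
```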
Mike Ruberry	de40c8e495	Adds remaining OpInfos and removes redundant test generators (#55558 ) Summary: Per title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/55558 Reviewed By: ngimel Differential Revision: D28922522 Pulled By: mruberry fbshipit-source-id: 89cefd93788bc8aa0683f4583cf5caa81aa2dc93	2021-06-06 14:52:26 -07:00
Natalia Gimelshein	344ecb2e71	flip via TI (#59509 ) Summary: Resubmit of https://github.com/pytorch/pytorch/issues/58747 Pull Request resolved: https://github.com/pytorch/pytorch/pull/59509 Reviewed By: mruberry Differential Revision: D28918665 Pulled By: ngimel fbshipit-source-id: b045c7b35eaf22e53b1bc359ffbe5a4fda05dcda	2021-06-05 15:43:29 -07:00
Natalia Gimelshein	5117ac3bb4	Revert D28877076: [pytorch][PR] torch.flip via TI Test Plan: revert-hammer Differential Revision: D28877076 (`d82bc3feb8`) Original commit changeset: 4fa6eb519085 fbshipit-source-id: c81e7d3283ff6822db913bf9f49a1533268755d0	2021-06-04 23:03:53 -07:00
lezcano	d82bc3feb8	torch.flip via TI (#58747 ) Summary: Implements an idea by ngimel to improve the performance of `torch.flip` via a clever hack into TI to bypass the fact that TI is not designed to work with negative indices. Something that might be added is vectorisation support on CPU, given how simple the implementation is now. Some low-hanging fruit that I did not implement: - Write it as a structured kernel - Migrate the tests to opinfos - Have a look at `cumsum_backward` and `cumprod_backward`, as I think that they could be implemented faster with `flip`, now that `flip` is fast. Edit: this operation already has OpInfos, and it cannot be migrated to a structured kernel because it implements quantisation. Summary of the PR: - x1.5-3 performance boost on CPU - x1.5-2 performance boost on CUDA - Comparable performance across dimensions, regardless of the strides (thanks TI) - Simpler code <details> <summary> Test Script </summary> ```python from itertools import product import torch from torch.utils.benchmark import Compare, Timer def get_timer(size, dims, num_threads, device): x = torch.rand(size, device=device) timer = Timer( "torch.flip(x, dims=dims)", globals={"x": x, "dims": dims}, label=f"Flip {device}", description=f"dims: {dims}", sub_label=f"size: {size}", num_threads=num_threads, ) return timer.blocked_autorange(min_run_time=5) def get_params(): sizes = ((1000,)*2, (1000,)*3, (10000,)*2) for size, device in product(sizes, ("cpu", "cuda")): threads = (1, 2, 4) if device == "cpu" else (1,) list_dims = [(0,), (1,), (0, 1)] if len(size) == 3: list_dims.append((0, 2)) for num_threads, dims in product(threads, list_dims): yield size, dims, num_threads, device def compare(): compare = Compare([get_timer(*params) for params in get_params()]) compare.trim_significant_figures() compare.colorize() compare.print() compare() ``` </details> <details> <summary> Benchmark PR </summary> 
![image](https://user-images.githubusercontent.com/3291265/119139954-81e46d80-ba3b-11eb-9aad-e825e515d41b.png) </details> <details> <summary> Benchmark master </summary> ![image](https://user-images.githubusercontent.com/3291265/119139915-76914200-ba3b-11eb-9aa8-84b3ca220c93.png) </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/58747 Reviewed By: agolynski Differential Revision: D28877076 Pulled By: ngimel fbshipit-source-id: 4fa6eb519085950176cb3a9161eeb3b6289ec575	2021-06-04 20:13:38 -07:00
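The index arithmetic behind a flip is simple. Flipping dimension `d` is equivalent to starting at the last element along `d` (offset `+= (size[d] - 1) * stride[d]`) and walking backwards with stride `-stride[d]`. That negative stride is what TI does not natively support, hence the hack. The sketch below shows only this underlying arithmetic on a flat Python list. How the PR encodes it inside TI is not described here and is not reproduced; PyTorch tensors themselves do not expose negative strides. Both helpers are hypothetical.

```python
def strided_view(storage, offset, sizes, strides):
    """Materialize a 2-D strided view of a flat storage list."""
    (rows, cols), (rs, cs) = sizes, strides
    return [[storage[offset + i * rs + j * cs] for j in range(cols)] for i in range(rows)]

def flip_params(offset, sizes, strides, dims):
    """Flip along `dims` by moving the start to the last element and negating the stride."""
    strides = list(strides)
    for d in dims:
        offset += (sizes[d] - 1) * strides[d]
        strides[d] = -strides[d]
    return offset, sizes, tuple(strides)

storage = [0, 1, 2, 3, 4, 5]          # a contiguous 2x3 tensor
sizes, strides = (2, 3), (3, 1)
flip_cols = strided_view(storage, *flip_params(0, sizes, strides, dims=[1]))     # [[2, 1, 0], [5, 4, 3]]
flip_both = strided_view(storage, *flip_params(0, sizes, strides, dims=[0, 1]))  # [[5, 4, 3], [2, 1, 0]]
```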
Elton Leander Pinto	2119efd234	`reflection_pad1d_backward`: Port to structured (#59103 ) Summary: Tracking Issue: https://github.com/pytorch/pytorch/issues/55070 Port `reflection_pad1d_backward` to structured kernel. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59103 Test Plan: Pre-existing tests Reviewed By: jbschlosser Differential Revision: D28836043 Pulled By: ezyang fbshipit-source-id: 4c3b0880edf305896f540113dcab70c8af24253b	2021-06-04 10:23:53 -07:00
Edward Yang	f05d5bec48	Preserve PyObject even when it goes dead (#56017 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56017 Fixes #55686 This patch is seemingly straightforward, but some of the changes are very subtle. For the general algorithmic approach, please first read the quoted issue. Based on the algorithm, there are some fairly straightforward changes: - New boolean on TensorImpl tracking if we own the pyobj or not - PythonHooks virtual interface for requesting deallocation of pyobj when TensorImpl is being released and we own its pyobj, and implementation of the hooks in python_tensor.cpp - Modification of THPVariable to MaybeOwned its C++ tensor, directly using swolchok's nice new class And then, there is python_variable.cpp. Some of the changes follow the general algorithmic approach: - THPVariable_NewWithVar is simply adjusted to handle MaybeOwned and initializes as owned (like before) - THPVariable_Wrap adds the logic for reverting ownership back to PyObject when we take out an owning reference to the Python object - THPVariable_dealloc attempts to resurrect the Python object if the C++ tensor is live, and otherwise does the same old implementation as before - THPVariable_tryResurrect implements the resurrection logic. It is modeled after CPython code, so read the cited logic and see if it is faithfully replicated - THPVariable_clear is slightly updated for MaybeOwned and also to preserve the invariant that if owns_pyobj, then pyobj_ is not null. This change is slightly dodgy: the previous implementation has a comment mentioning that the pyobj nulling is required to ensure we don't try to reuse the dead pyobj. I don't think, in this new world, this is possible, because the invariant says that the pyobj only dies if the C++ object is dead too. But I still unset the field for safety. And then... there is THPVariableMetaType. 
colesbury explained in the issue why this is necessary: when destructing an object in Python, you start off by running the tp_dealloc of the subclass before moving up to the parent class (much in the same way C++ destructors work). The deallocation process for a vanilla Python-defined class does irreparable harm to the PyObject instance (e.g., the finalizers get run) making it no longer valid attempt to resurrect later in the tp_dealloc chain. (BTW, the fact that objects can resurrect but in an invalid state is one of the reasons why it's so frickin' hard to write correct __del__ implementations). So we need to make sure that we actually override the tp_dealloc of the bottom most subclass of Tensor to make sure we attempt a resurrection before we start finalizing. To do this, we need to define a metaclass for Tensor that can override tp_dealloc whenever we create a new subclass of Tensor. By the way, it was totally not documented how to create metaclasses in the C++ API, and it took a good bit of trial error to figure it out (and the answer is now immortalized in https://stackoverflow.com/q/67077317/23845 -- the things that I got wrong in earlier versions of the PR included setting tp_basicsize incorrectly, incorrectly setting Py_TPFLAGS_HAVE_GC on the metaclass--you want to leave it unset so that it inherits, and determining that tp_init is what actually gets called when you construct a class, not tp_call as another not-to-be-named StackOverflow question suggests). Aside: Ordinarily, adding a metaclass to a class is a user visible change, as it means that it is no longer valid to mixin another class with a different metaclass. However, because _C._TensorBase is a C extension object, it will typically conflict with most other metaclasses, so this is not BC breaking. The desired new behavior of a subclass tp_dealloc is to first test if we should resurrect, and otherwise do the same old behavior. 
In an initial implementation of this patch, I implemented this by saving the original tp_dealloc (which references subtype_dealloc, the "standard" dealloc for all Python defined classes) and invoking it. However, this results in an infinite loop, as it attempts to call the dealloc function of the base type, but incorrectly chooses subclass type (because it is not a subtype_dealloc, as we have overridden it; see `b38601d496/Objects/typeobject.c (L1261)` ) So, with great reluctance, I must duplicate the behavior of subtype_dealloc in our implementation. Note that this is not entirely unheard of in Python binding code; for example, Cython `c25c3ccc4b/Cython/Compiler/ModuleNode.py (L1560)` also does similar things. This logic makes up the bulk of THPVariable_subclass_dealloc To review this, you should pull up the CPython copy of subtype_dealloc `b38601d496/Objects/typeobject.c (L1230)` and verify that I have specialized the implementation for our case appropriately. Among the simplifications I made: - I assume PyType_IS_GC, because I assume that Tensor subclasses are only ever done in Python and those classes are always subject to GC. (BTW, yes! This means I have broken anyone who has extend PyTorch tensor from C API directly. I'm going to guess no one has actually done this.) - I don't bother walking up the type bases to find the parent dealloc; I know it is always THPVariable_dealloc. Similarly, I can get rid of some parent type tests based on knowledge of how THPVariable_dealloc is defined - The CPython version calls some private APIs which I can't call, so I use the public PyObject_GC_UnTrack APIs. - I don't allow the finalizer of a Tensor to change its type (but more on this shortly) One alternative I discussed with colesbury was instead of copy pasting the subtype_dealloc, we could transmute the type of the object that was dying to turn it into a different object whose tp_dealloc is subtype_dealloc, so the stock subtype_dealloc would then be applicable. 
We decided this would be kind of weird and didn't do it that way. TODO: - More code comments - Figure out how not to increase the size of TensorImpl with the new bool field - Add some torture tests for the THPVariable_subclass_dealloc, e.g., involving subclasses of Tensors that do strange things with finalizers - Benchmark the impact of taking the GIL to release C++ side tensors (e.g., from autograd) - Benchmark the impact of adding a new metaclass to Tensor (probably will be done by separating out the metaclass change into its own change) - Benchmark the impact of changing THPVariable to conditionally own Tensor (as opposed to unconditionally owning it, as before) - Add tests that this actually indeed preserves the Python object Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D27765125 Pulled By: ezyang fbshipit-source-id: 857f14bdcca2900727412aff4c2e2d7f0af1415a	2021-06-03 10:50:36 -07:00
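The point that `tp_init` (not `tp_call`) runs when a class is constructed has a pure-Python counterpart. Below is a minimal sketch using a hypothetical metaclass `Meta`, which is not `THPVariableMetaType`. Its `__init__` fires for the base class and again for every subclass, and that is the hook the patch uses to install a custom `tp_dealloc`. Its `__call__` fires only when a class is called to make an instance. Python code cannot set `tp_dealloc` directly, so the sketch only records which hook runs:

```python
created = []  # records which metaclass hook fired, in order


class Meta(type):
    def __init__(cls, name, bases, namespace):
        # Runs once per class statement, for the base and for every subclass:
        # the Python-level counterpart of the metaclass's tp_init slot.
        super().__init__(name, bases, namespace)
        created.append(("init", name))

    def __call__(cls, *args, **kwargs):
        # Runs when a class is *called* to build an instance,
        # not when the class itself is defined.
        created.append(("call", cls.__name__))
        return super().__call__(*args, **kwargs)


class Base(metaclass=Meta):
    pass


class Sub(Base):  # inherits Meta, so Meta.__init__ fires again here
    pass


obj = Sub()
```

After this runs, `created` is `[("init", "Base"), ("init", "Sub"), ("call", "Sub")]`. Defining `Sub` never touched `__call__`, which is why a per-subclass hook has to live in the init slot.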
Thomas J. Fan	7f2e620105	FIX Validates that weights are 2d in embedding (#59314 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/55185 Pull Request resolved: https://github.com/pytorch/pytorch/pull/59314 Reviewed By: H-Huang Differential Revision: D28837753 Pulled By: jbschlosser fbshipit-source-id: 683378244c61b0937c95563f91ef87ab09fd1653	2021-06-02 12:52:21 -07:00
Natalia Gimelshein	12418a4f86	Back out "Revert D28664514: [pytorch][PR] various TensorIterator speed improvements" Summary: Original commit changeset: fcad039b7dc8 Test Plan: Existing tests Reviewed By: mruberry Differential Revision: D28720186 fbshipit-source-id: 14ac99ee2d7cafb86b20c979f8917beeefd616e1	2021-05-26 12:22:48 -07:00
Edward Yang	17fb651a3b	Make torch.Tensor(torch.tensor(1.0)) work (#58885 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58885 Fixes #58884 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D28687510 Pulled By: ezyang fbshipit-source-id: 81325f501cc3e83cbac02f7c44ded9d396356bb8	2021-05-26 11:33:05 -07:00
Natalia Gimelshein	8398ebaa86	Revert D28664514: [pytorch][PR] various TensorIterator speed improvements Test Plan: revert-hammer Differential Revision: D28664514 (`8a28bbeeb9`) Original commit changeset: 2e03cf90b37a fbshipit-source-id: fcad039b7dc823fec8afa694ab74a7ac5011f8ab	2021-05-26 10:49:58 -07:00
Xiang Gao	c88333484f	[resubmit] masked_scatter thrust->cub (#58865 ) Summary: See ae7760cf50bb2cddff4663a07b9d68decf4b6c75 for the fix Pull Request resolved: https://github.com/pytorch/pytorch/pull/58865 Reviewed By: mruberry Differential Revision: D28657940 Pulled By: ngimel fbshipit-source-id: 9155c710b0e18ebb3bfa2dabfdd117355ac30840	2021-05-25 11:00:50 -07:00
Natalia Gimelshein	8a28bbeeb9	various TensorIterator speed improvements (#58810 ) Summary:
1) stop pushing back to the strides vector for 1D tensors; those strides are never used in the loop anyway
2) avoid calling get_data_ptrs unless necessary
3) don't call into assert_no_partial_overlap if tensorImpls are the same (assert_no_partial_overlap has this comparison too, but only after a couple of nested function calls)
4) use is_non_overlapping_and_dense instead of is_contiguous in memory overlap (for some reason it is faster than is_contiguous, though I had hoped that once is_contiguous was non-virtualized the two would be the same)
Altogether, this brings the instruction count down from ~110K to 102735 for the following binary inplace benchmark:
```
In [2]: timer = Timer("m1.add_(b);", setup="at::Tensor m1=torch::empty({1}); at::Tensor b = torch::empty({1});", language="c++", timer=timeit.default_timer)
   ...: stats=timer.collect_callgrind(number=30, repeats=3)
   ...: print(stats[1].as_standardized().stats(inclusive=False))
```
Similar improvements for unary inplace. Update: stride packing is restored for now; the count is now 104295, so packing is worth ~52 instructions, and we should think about how to remove it safely. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58810 Reviewed By: bhosmer Differential Revision: D28664514 Pulled By: ngimel fbshipit-source-id: 2e03cf90b37a411d9994a7607402645f1d8f3c93	2021-05-25 10:44:51 -07:00
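The two quoted counts can be reconciled with the ~52 figure if the callgrind counts are totals over the 30 calls of `m1.add_(b)` requested by `number=30`. That reading is an assumption, since the message does not say so explicitly. On that basis, stride packing costs this much per call:

$$
\frac{104295 - 102735}{30} = \frac{1560}{30} = 52 \ \text{instructions per call}
$$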
Serhat Yilmaz	b4f3a989da	[torch][repeat_interleave] Fix ambiguous function call (#58881 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58881 A new parameter was recently added to the function in PR https://github.com/pytorch/pytorch/pull/58417. However, this introduced ambiguity in the following call: `some_tensor.repeat_interleave(some_integer_value)`. Making the new parameter optional avoids the issue. Reviewed By: ezyang, ngimel Differential Revision: D28653820 fbshipit-source-id: 5bc0b1f326f069ff505554b51e3b24d60e69c843	2021-05-25 00:31:32 -07:00
Yu Guo	74c12da451	add deterministic path for scatter_add_cuda for 1D tensors (#58761 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58761 Previously we implemented a deterministic path for gather_backward in https://github.com/pytorch/pytorch/pull/55573, which replaced the non-deterministic scatter_add_cuda call there. It is better to move this path inside scatter_add so that scatter_add itself can benefit from it. Test Plan: buck test mode/opt //caffe2/test:torch_cuda -- test_scatter_add_one_dim_deterministic ✓ ListingSuccess: caffe2/test:torch_cuda - main (5.063) ✓ Pass: caffe2/test:torch_cuda - test_scatter_add_one_dim_deterministic_cuda (test_torch.TestTorchDeviceTypeCUDA) (30.909) ✓ Pass: caffe2/test:torch_cuda - main (30.909) Summary Pass: 2 ListingSuccess: 1 buck test mode/opt //caffe2/test:torch_cuda -- test_gather_backward ✓ ListingSuccess: caffe2/test:torch_cuda - main (4.613) ✓ Pass: caffe2/test:torch_cuda - test_gather_backward_deterministic_path_cuda (test_torch.TestTorchDeviceTypeCUDA) (25.369) buck test mode/opt //caffe2/test:torch_cuda -- test_nondeterministic_alert ✓ ListingSuccess: caffe2/test:torch_cuda - main (5.356) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_CTCLoss_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_put_accumulate_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReplicationPad1d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_scatter_add_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_FractionalMaxPool2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AdaptiveAvgPool2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - 
test_nondeterministic_alert_AvgPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_grid_sample_2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_NLLLoss_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_put_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_median_cuda_float64 (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_gather_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_bincount_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_histc_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReflectionPad1d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_bilinear_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReplicationPad2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_bicubic_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_grid_sample_3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_MaxPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AdaptiveAvgPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_EmbeddingBag_max_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - 
test_nondeterministic_alert_interpolate_trilinear_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AdaptiveMaxPool2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReflectionPad2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_FractionalMaxPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_kthvalue_cuda_float64 (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_linear_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReplicationPad3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146) ✓ Pass: caffe2/test:torch_cuda - main (28.146) Summary Pass: 30 ListingSuccess: 1 Reviewed By: ngimel Differential Revision: D28585659 fbshipit-source-id: 1ad003d4130501ceff5f6a7a870ca3dbc9a3f1f2	2021-05-23 21:36:02 -07:00
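A deterministic scatter-add has to fix the order in which contributions to the same output slot are summed. Floating-point addition is not associative, and atomic adds on the GPU combine contributions in whatever order threads arrive. Below is a minimal pure-Python sketch of one common strategy: a stable sort by destination, followed by a per-segment reduction. The function name is hypothetical, and the commit does not say that this is the exact algorithm the CUDA path uses:

```python
from typing import List, Sequence


def scatter_add_1d_deterministic(out: Sequence[float], index: Sequence[int],
                                 src: Sequence[float]) -> List[float]:
    """Return a copy of out with out[index[i]] += src[i] in a fixed order."""
    result = list(out)
    # Stable sort of source positions by destination: positions that share a
    # destination stay in input order, so the summation order never varies.
    order = sorted(range(len(index)), key=lambda i: index[i])
    start = 0
    while start < len(order):
        dest = index[order[start]]
        end = start
        total = 0.0
        # Sum one run of equal destinations left to right. In a parallel
        # kernel each run could be reduced by its own thread, with no atomics.
        while end < len(order) and index[order[end]] == dest:
            total += src[order[end]]
            end += 1
        result[dest] += total
        start = end
    return result
```

For example, `scatter_add_1d_deterministic([0.0] * 3, [2, 0, 2, 1], [1.0, 2.0, 3.0, 4.0])` gives `[2.0, 4.0, 4.0]`, and repeating the call always reproduces the same bits.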
kshitij12345	ee3ea31f12	OpInfo: split, split_with_sizes (#58184 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/54261 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58184 Reviewed By: ngimel Differential Revision: D28627271 Pulled By: mruberry fbshipit-source-id: e6c0d2b005904ddebc9dab76685403530a6f6519	2021-05-23 15:47:35 -07:00
Serhat Yilmaz	4ca4640bae	[torch][repeat_interleave] remove stream synchronization if output size is given (#58417 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58417 Same as title. Test Plan: Rely on CI signal. Update unit test to exercise new code path as well. Reviewed By: ngimel Differential Revision: D28482927 fbshipit-source-id: 3ec8682810ed5c8547b1e8d3869924480ce63dcd	2021-05-22 20:53:28 -07:00
Natalia Gimelshein	9e261de630	Revert D28547564: [pytorch][PR] masked_scatter thrust->cub Test Plan: revert-hammer Differential Revision: D28547564 (`5152cf8647`) Original commit changeset: 83aeddfaf702 fbshipit-source-id: d5259afb584e0f6c0a11de4d4cb3d56a2a562eb7	2021-05-21 09:18:34 -07:00
Xiang Gao	5152cf8647	masked_scatter thrust->cub (#56750 ) Summary: Benchmark:
```python
import torch
import itertools

def run50_sync(f):
    for _ in range(50):
        f()
    torch.cuda.synchronize()

run50_sync(lambda: torch.randperm(1000000, device='cuda'))

def benchmark(M):
    a = torch.randn(M, device='cuda')
    m = torch.randint(1, (M,), dtype=torch.long, device='cuda').bool()
    v = torch.randn(M, device='cuda')
    torch.cuda.synchronize()
    %timeit run50_sync(lambda:a.masked_scatter_(m, v))

for M in (100, 1000, 100000, 10000000):
    print(M)
    benchmark(M)
```
Before:
```
100
8.65 ms ± 80.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1000
8.75 ms ± 72.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000
9.27 ms ± 87.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000
33.6 ms ± 358 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
After:
```
100
8.04 ms ± 37.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1000
8.09 ms ± 38.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000
8.63 ms ± 76.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000
31.9 ms ± 298 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56750 Reviewed By: ailzhang Differential Revision: D28547564 Pulled By: ngimel fbshipit-source-id: 83aeddfaf7023f9f9501c6b1e2faf91e8b6277b1	2021-05-20 10:27:58 -07:00
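For each `True` position in the mask, a CUDA `masked_scatter` has to work out which source element that position receives. One common way to do this is an exclusive prefix sum over the mask, followed by an independent per-element gather. The sketch below shows that approach in pure Python, on the assumption that it matches the scan-based structure thrust and cub are used for; the commit does not spell out the kernel. `masked_scatter` here is a hypothetical 1-D helper, not the PyTorch API:

```python
from itertools import accumulate
from typing import List, Sequence


def masked_scatter(x: Sequence[float], mask: Sequence[bool],
                   source: Sequence[float]) -> List[float]:
    """Copy source elements, in order, into the positions where mask is True."""
    # Exclusive prefix sum of the mask: at a True position it counts the True
    # positions before it, i.e. the index of the source element to take.
    inclusive = list(accumulate(int(m) for m in mask))
    exclusive = [c - int(m) for c, m in zip(inclusive, mask)]
    if inclusive and inclusive[-1] > len(source):
        raise ValueError("source has fewer elements than mask has True entries")
    # After the scan every position is independent of the others, which is
    # what lets a GPU do the final step as a plain elementwise gather.
    return [source[e] if m else v for v, m, e in zip(x, mask, exclusive)]
```

For example, `masked_scatter([0, 0, 0, 0], [True, False, True, True], [10, 20, 30, 40])` gives `[10, 0, 20, 30]`. The source is consumed front to back, and any surplus source elements are ignored.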
lezcano	452569dffb	cfloat and cdouble functions (#58137 ) Summary: This adds the methods `Tensor.cfloat()` and `Tensor.cdouble()`. I was not able to find the tests for `.float()` functions. I'd be happy to add similar tests for these functions once someone points me to them. Fixes https://github.com/pytorch/pytorch/issues/56014 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58137 Reviewed By: ejguan Differential Revision: D28412288 Pulled By: anjali411 fbshipit-source-id: ff3653cb3516bcb3d26a97b9ec3d314f1f42f83d	2021-05-13 21:13:37 -07:00
kshitij12345	6b1eeef601	OpInfo: squeeze (#58080 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/54261 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58080 Reviewed By: agolynski Differential Revision: D28379485 Pulled By: mruberry fbshipit-source-id: 2b288036f595a5bd6b948a072494ee87f82322ce	2021-05-12 21:29:31 -07:00
Yu Guo	8a45006765	enable deterministic path for index_copy_cuda with index_put (#58144 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58144 Reland D28291041 (`14badd9929`), which was reverted due to a type error from Tuple[torch.Tensor]; it seems that mypy requires Tuple[torch.Tensor, torch.Tensor, torch.Tensor]. Test Plan: buck test mode/opt //caffe2/test:torch_cuda -- test_index_copy_deterministic ✓ ListingSuccess: caffe2/test:torch_cuda - main (9.229) ✓ Pass: caffe2/test:torch_cuda - test_index_copy_deterministic_cuda (test_torch.TestTorchDeviceTypeCUDA) (25.750) ✓ Pass: caffe2/test:torch_cuda - main (25.750) Reviewed By: ngimel Differential Revision: D28383178 fbshipit-source-id: 38896fd6ddd670cfcce36e079aee7ad52adc2a28	2021-05-12 16:26:50 -07:00
kshitij12345	d09abf004c	OpInfo: narrow (#58082 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/54261 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58082 Reviewed By: agolynski Differential Revision: D28379371 Pulled By: mruberry fbshipit-source-id: 484e560b1e6ceba234e497585ed308a27cd8b7a0	2021-05-12 15:39:15 -07:00
Mike Ruberry	c911c30520	Revert D28291041: enable deterministic path for index_copy_cuda with index_put Test Plan: revert-hammer Differential Revision: D28291041 (`14badd9929`) Original commit changeset: 7f0cf3ec7280 fbshipit-source-id: 6117bc6e5b2044ce70d4e4a19bccd8c183ea3702	2021-05-12 03:33:57 -07:00
Kurt Mohler	c7fb0a0e82	Remove beta warning for use_deterministic_algorithms (#58074 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/58073 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58074 Reviewed By: ngimel Differential Revision: D28373676 Pulled By: mruberry fbshipit-source-id: cae9a92ebbf6ac5f8d3008aa6a6a9cd5c1041c9f	2021-05-12 03:30:12 -07:00
Yu Guo	14badd9929	enable deterministic path for index_copy_cuda with index_put (#57870 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57870 This is similar to index_add_cuda with index_put and accumulate=True. Test Plan: buck test mode/opt //caffe2/test:torch_cuda -- test_index_copy_deterministic ✓ ListingSuccess: caffe2/test:torch_cuda - main (9.229) ✓ Pass: caffe2/test:torch_cuda - test_index_copy_deterministic_cuda (test_torch.TestTorchDeviceTypeCUDA) (25.750) ✓ Pass: caffe2/test:torch_cuda - main (25.750) Reviewed By: ngimel Differential Revision: D28291041 fbshipit-source-id: 7f0cf3ec72805f3617fd1de9ff03e1d49114fed8	2021-05-12 00:32:35 -07:00
Yu Guo	a07a0190f9	enable deterministic path for index_put with accumulate=False on CPU and CUDA (#57839 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57839 We reuse `index_put_accum_kernel`, renaming it to `index_put_deterministic_kernel`, and add a bool `accumulate` to `index_backward_kernel`. Test Plan: buck test mode/opt //caffe2/test:torch -- test_index_put_non_accumulate_deterministic ✓ Pass: caffe2/test:torch - test_index_put_non_accumulate_deterministic_cpu (test_torch.TestTorchDeviceTypeCPU) (5.120) Summary Pass: 1 Skip: 1 ↻ caffe2/test:torch - test_index_put_non_accumulate_deterministic_meta (test_torch.TestTorchDeviceTypeMETA) ListingSuccess: 1 buck test mode/opt //caffe2/test:torch_cuda -- test_index_put_non_accumulate_deterministic ✓ ListingSuccess: caffe2/test:torch_cuda - main (6.397) ✓ Pass: caffe2/test:torch_cuda - test_index_put_non_accumulate_deterministic_cuda (test_torch.TestTorchDeviceTypeCUDA) (26.030) ✓ Pass: caffe2/test:torch_cuda - main (26.030) Summary Pass: 2 ListingSuccess: 1 Reviewed By: ngimel Differential Revision: D28290699 fbshipit-source-id: df8bbe7af2e72017566161b05b85737fda4ceb3f	2021-05-12 00:31:19 -07:00
Ilqar Ramazanli	8b816e9010	To implement gradient for Pytorch (#54617 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/56129 Pull Request resolved: https://github.com/pytorch/pytorch/pull/54617 Reviewed By: anjali411 Differential Revision: D28057452 Pulled By: iramazanli fbshipit-source-id: 9bd86679282d34f5e5393e6447121586517eb4f0	2021-05-11 18:52:20 -07:00
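`torch.gradient` estimates derivatives with finite differences in the style of `numpy.gradient`. Below is a minimal one-dimensional sketch, assuming uniform spacing `h` and `edge_order=1`. Interior points use the central difference (f[i+1] - f[i-1]) / (2h), and the two endpoints use one-sided differences. `gradient_1d` is a hypothetical helper, not the PyTorch implementation:

```python
from typing import List, Sequence


def gradient_1d(f: Sequence[float], spacing: float = 1.0) -> List[float]:
    """Finite-difference estimate of df/dx on a uniform 1-D grid."""
    n = len(f)
    if n < 2:
        # One-sided differences at the edges need at least two samples.
        raise ValueError("need at least two samples")
    h = spacing
    out = [0.0] * n
    # First-order one-sided differences at the two endpoints.
    out[0] = (f[1] - f[0]) / h
    out[-1] = (f[-1] - f[-2]) / h
    # Second-order central differences everywhere in between.
    for i in range(1, n - 1):
        out[i] = (f[i + 1] - f[i - 1]) / (2 * h)
    return out
```

For samples of x squared at x = 1, 2, 3, 4 (`[1, 4, 9, 16]`), this gives `[3.0, 4.0, 6.0, 7.0]`. The interior values match the exact derivative 2x, and the endpoints are only first-order accurate.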
kshitij12345	502eb664ae	OpInfo: chunk (#57935 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/54261 Pull Request resolved: https://github.com/pytorch/pytorch/pull/57935 Reviewed By: ngimel Differential Revision: D28346217 Pulled By: mruberry fbshipit-source-id: 331995aa18fd2983fc2122a9af31fba43ab9839c	2021-05-11 10:16:10 -07:00
Edward Yang	da8cc355a3	Relax tp_new so that it is OK to call (#57544 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57544 Instead of removing tp_new from the superclass (which causes super().__new__ to not work), I now still install tp_new on the superclass, but verify that you are not trying to directly construct _TensorBase. Fixes https://github.com/pytorch/pytorch/issues/57421 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D28189475 Pulled By: ezyang fbshipit-source-id: 9397a3842a77f5428d182dd62244b42425bca827	2021-05-05 09:04:39 -07:00
Peter Bell	33eea146ee	torch.clamp with tensor min and max (#52695 ) Summary: Fixes gh-2793 Pull Request resolved: https://github.com/pytorch/pytorch/pull/52695 Reviewed By: mruberry Differential Revision: D27395977 Pulled By: ezyang fbshipit-source-id: f86aa240feb034d42e4c45447e72218f6a773c24	2021-05-03 12:56:16 -07:00
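With tensor bounds, clamp is applied elementwise as min(max(x, lo), hi), and each element gets its own lower and upper bound. Below is a minimal pure-Python sketch over equal-length lists. It assumes no broadcasting, which the real `torch.clamp` does support, and `clamp` here is a hypothetical helper:

```python
from typing import List, Sequence


def clamp(x: Sequence[float], lo: Sequence[float],
          hi: Sequence[float]) -> List[float]:
    """Elementwise min(max(x, lo), hi) with per-element bounds."""
    if not (len(x) == len(lo) == len(hi)):
        raise ValueError("this sketch does not broadcast; lengths must match")
    # max is applied before min, so the upper bound wins wherever lo > hi.
    return [min(max(v, l), h) for v, l, h in zip(x, lo, hi)]
```

For example, `clamp([1, 5, 10], [2, 2, 2], [8, 4, 9])` gives `[2, 4, 9]`. Because `max` is applied first, a lower bound above the upper bound yields the upper bound (`clamp([0], [5], [3])` gives `[3]`).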
kshitij12345	154eca0309	OpInfo: ravel, view, view_as (#56910 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/54261 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56910 Reviewed By: ngimel Differential Revision: D28141867 Pulled By: mruberry fbshipit-source-id: bff49d40d7e3bb36bc83d1405bd77f5529eeffe9	2021-05-02 22:10:36 -07:00
Ivan Yashchuk	eaf00bf7d4	Skip linalg.qr saved mode check if compiled without LAPACK (#56284 ) Summary: This PR also removes qr and eig tests from test/test_torch.py. They were not skipped if compiled without LAPACK and they are now replaced with OpInfos. Fixes https://github.com/pytorch/pytorch/issues/55929 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56284 Reviewed By: ejguan Differential Revision: D27827077 Pulled By: mruberry fbshipit-source-id: 1dceb955810a9fa34bb6baaccbaf0c8229444d3a	2021-05-02 16:07:07 -07:00
kshitij12345	41099ef71c	OpInfo: mvlgamma (#56907 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/42515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56907 Reviewed By: astaff Differential Revision: D28118669 Pulled By: mruberry fbshipit-source-id: f54ad6dc64ddb6bcfca5c5c7fd8f395cd9761128	2021-05-01 20:51:01 -07:00
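`mvlgamma` is the multivariate log-gamma function: log Gamma_p(a) = p(p-1)/4 * log(pi) + sum over j = 1..p of log Gamma(a + (1 - j)/2), defined for a > (p - 1)/2. Below is a minimal sketch built on Python's `math.lgamma`, for scalar inputs only. `mvlgamma` here is a hypothetical helper, not the OpInfo or the ATen kernel:

```python
import math


def mvlgamma(a: float, p: int) -> float:
    """Multivariate log-gamma log Gamma_p(a) for scalar a and integer p >= 1."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    if a <= (p - 1) / 2:
        raise ValueError("a must be greater than (p - 1) / 2")
    # Constant term p(p-1)/4 * log(pi), then p shifted log-gamma terms.
    const = p * (p - 1) / 4 * math.log(math.pi)
    return const + sum(math.lgamma(a + (1 - j) / 2) for j in range(1, p + 1))
```

With `p=1` the constant term vanishes and the function reduces to the ordinary `math.lgamma(a)`.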
Wenlei Xie	20085f6d23	Support auto generation of device check (#56872 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56872 ghstack-source-id: 127914018 Test Plan: auto test Reviewed By: ezyang Differential Revision: D27986429 fbshipit-source-id: 0da8413b0b8e6810fcea27ed1de499f11f68bd1f	2021-05-01 12:02:09 -07:00


1831 Commits