Commit Graph

186 Commits

Author SHA1 Message Date
Hameer Abbasi
b1907f5ebc Fix pickling for Tensor subclasses (redo) (#47732)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47051
Redo of https://github.com/pytorch/pytorch/issues/47115
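A minimal sketch of the behavior this re-lands (an editorial illustration, not code from the PR; `MyTensor` is a hypothetical subclass and the round-trip result is an assumption based on the linked issue):

```python
import pickle
import torch

class MyTensor(torch.Tensor):
    pass

t = torch.ones(3).as_subclass(MyTensor)
t2 = pickle.loads(pickle.dumps(t))   # round-trip through pickle
print(type(t2))                      # expected: <class '__main__.MyTensor'>, not plain torch.Tensor
```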

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47732

Reviewed By: izdeby

Differential Revision: D25465382

Pulled By: ezyang

fbshipit-source-id: 3a8d57281a2d6f57415d5735d34ad307f3526638
2021-02-01 07:32:52 -08:00
mattip
345844d9d8 test, fix deepcopy of tensor with grad (#50663)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/3307

Previously, `self.grad` was not ~~cloned~~ deepcopied to the returned tensor in `deepcopy`. Added a test and an implementation.
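A minimal sketch of the fixed behavior (assuming a build that includes this change):

```python
import copy
import torch

x = torch.ones(3, requires_grad=True)
x.sum().backward()            # populate x.grad

y = copy.deepcopy(x)
print(y.grad)                 # expected: tensor([1., 1., 1.]) rather than None
print(y.grad is x.grad)       # False: the grad is deepcopied, not shared
```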

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50663

Reviewed By: heitorschueroff

Differential Revision: D26074811

Pulled By: albanD

fbshipit-source-id: 536dad36415f1d03714b4ce57453f406ad802b8c
2021-01-26 16:19:53 -08:00
Taylor Robie
d31a760be4 move has_torch_function to C++, and make a special case object_has_torch_function (#48965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48965

This PR pulls `__torch_function__` checking entirely into C++, and adds a special `object_has_torch_function` method for ops that only have one arg, which lets us skip tuple construction and unpacking. We can now also do away with the Python-side fast bailout for `Tensor` (e.g. `if any(type(t) is not Tensor for t in tensors) and has_torch_function(tensors)`) because it's actually slower than checking with the Python C API.
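For reference, the removed Python-side pattern versus a plain check through the public `torch.overrides` helper (a sketch for illustration; the benchmarks themselves live in #48966):

```python
import torch
from torch.overrides import has_torch_function

tensors = (torch.ones(2), torch.ones(2))

# Old Python-side fast bailout, roughly as quoted above (removed by this PR):
if any(type(t) is not torch.Tensor for t in tensors) and has_torch_function(tensors):
    pass  # would dispatch to __torch_function__

# With the C++ check, calling has_torch_function directly is already cheap:
print(has_torch_function(tensors))   # False for plain Tensors
```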

Test Plan: Existing unit tests. Benchmarks are in #48966

Reviewed By: ezyang

Differential Revision: D25590732

Pulled By: robieta

fbshipit-source-id: 6bd74788f06cdd673f3a2db898143d18c577eb42
2021-01-10 19:23:35 -08:00
Taylor Robie
632a4401a6 clean up imports for tensor.py (#48964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48964

Stop importing overrides within methods now that the circular dependency is gone, and also organize the imports while I'm at it because they're a jumbled mess.

Test Plan: Existing unit tests. Benchmarks are in #48966

Reviewed By: ngimel

Differential Revision: D25590730

Pulled By: robieta

fbshipit-source-id: 4fa929ce8ff548500f3e55d0475f3f22c1fccc04
2021-01-10 19:23:32 -08:00
Samuel Marks
e6779d4357 [*.py] Rename "Arguments:" to "Args:" (#49736)
Summary:
I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings.

```sh
(pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do
    printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" | paste -s -d+ -- | bc)"; done
Args:      1095
Arguments: 0336
```

It is easy enough to extend my parsers to support both variants; however, it looks like `Arguments:` is wrong anyway, as per:

  - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md)

  - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md)

  - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst)

Therefore, only `Args:` is valid. This PR replaces `Arguments:` with `Args:` throughout the codebase.

PS: For related PRs, see tensorflow/tensorflow/pull/45420

PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736

Reviewed By: albanD

Differential Revision: D25710534

Pulled By: soumith

fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619
2020-12-28 09:34:47 -08:00
Jeffrey Wan
5ab9593098 torch.reciprocal: promote integer inputs to float (#49102)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49091
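A quick sketch of the new promotion behavior (assuming this change is in the build):

```python
import torch

t = torch.tensor([2, 4], dtype=torch.int32)
r = torch.reciprocal(t)
print(r)         # tensor([0.5000, 0.2500]) -- integer inputs are now promoted to float
print(r.dtype)   # torch.float32
```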

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49102

Reviewed By: VitalyFedyunin

Differential Revision: D25639541

Pulled By: soulitzer

fbshipit-source-id: 1dd360bd7b77f106d606143d8d3961610bac8cb7
2020-12-18 16:17:30 -08:00
Taylor Robie
0639387ff1 move Tensor comparisons back to C (#48018)
Summary:
It seems that the machinery to handle comparison methods in C rather than Python already exists, unless I'm missing something. (There is a wrapper for `TypeError_to_NotImplemented_`, and the Python code gen handles `__torch_function__`, which are the two things `_wrap_type_error_to_not_implemented` does.) The performance change is quite stark:

```
import torch
from torch.utils.benchmark import Timer

global_dict = {
    "x": torch.ones((2, 2)),
    "y_scalar": torch.ones((1,)),
    "y_tensor": torch.ones((2, 1)),
}

for stmt in ("x == 1", "x == y_scalar", "x == y_tensor"):
    print(Timer(stmt, globals=global_dict).blocked_autorange(min_run_time=5), "\n")
```

### Before:
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7f3d1289dc10>
x == 1
  Median: 12.86 us
  IQR:    0.65 us (12.55 to 13.20)
  387 measurements, 1000 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7f3d1289d1d0>
x == y_scalar
  Median: 6.03 us
  IQR:    0.33 us (5.91 to 6.24)
  820 measurements, 1000 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7f3d2b9e2050>
x == y_tensor
  Median: 6.34 us
  IQR:    0.33 us (6.16 to 6.49)
  790 measurements, 1000 runs per measurement, 1 thread
```

### After:
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7fbdba2a16d0>
x == 1
  Median: 6.88 us
  IQR:    0.40 us (6.74 to 7.14)
  716 measurements, 1000 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7fbdd2e07ed0>
x == y_scalar
  Median: 2.98 us
  IQR:    0.19 us (2.89 to 3.08)
  167 measurements, 10000 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7fbdd33e4510>
x == y_tensor
  Median: 3.03 us
  IQR:    0.13 us (2.97 to 3.10)
  154 measurements, 10000 runs per measurement, 1 thread
```

There's still a fair bit of work left. The equivalent NumPy call still has about 6x less overhead than the new numbers, and PyTorch 0.4 sits at about 1.25 us across the board (no scalar cliff). But it's a start.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48018

Reviewed By: gchanan

Differential Revision: D25026257

Pulled By: robieta

fbshipit-source-id: 093b06a1277df25b4b7cc0d4e585b558937b10a1
2020-11-18 15:25:41 -08:00
Hameer Abbasi
7908bf27d5 Fix output type of torch.max for Tensor subclasses. (#47110)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47090
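A minimal sketch of the fixed behavior (`MyTensor` is a hypothetical subclass for illustration; the expected output type is an assumption based on the linked issue):

```python
import torch

class MyTensor(torch.Tensor):
    pass

t = torch.arange(6.).as_subclass(MyTensor)
print(type(torch.max(t)))   # expected: MyTensor after this fix, not plain torch.Tensor
```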

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47110

Reviewed By: ngimel

Differential Revision: D24649568

Pulled By: ezyang

fbshipit-source-id: 9374cf0c562de78e520bcb03415db273c1dd76a3
2020-11-10 19:45:36 -08:00
Edward Yang
35491412d1 Revert D24649817: [pytorch][PR] Fix pickling for Tensor subclasses.
Test Plan: revert-hammer

Differential Revision:
D24649817 (c4209f1115)

Original commit changeset: 1872faa36030

fbshipit-source-id: b9832cea45552bd8776909118c4324fbd61fd414
2020-11-05 10:25:48 -08:00
Hameer Abbasi
c4209f1115 Fix pickling for Tensor subclasses. (#47115)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47051

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47115

Reviewed By: ejguan

Differential Revision: D24649817

Pulled By: ezyang

fbshipit-source-id: 1872faa3603085f07c0a8a026404161d0715720d
2020-11-04 19:25:32 -08:00
Jeffrey Wan
f5073b0c5a Add inputs argument to autograd.backward() (#46855)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46373

As noted in https://github.com/pytorch/pytorch/issues/46373, there needs to be a flag passed into the engine that indicates whether it was executed through the backward API or the grad API. The flag is tentatively named `accumulate_grad` since, functionally, the backward API accumulates grad into `.grad` while the grad API captures the grad and returns it.

Changes not necessary for the Python API (C++, TorchScript) are moved to a new PR.
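A minimal sketch of the new argument (assuming the Python API added by this PR):

```python
import torch

x = torch.ones(2, requires_grad=True)
y = torch.ones(2, requires_grad=True)
loss = (x * y).sum()

# Only accumulate gradients into x; y is left untouched.
torch.autograd.backward(loss, inputs=[x])
print(x.grad)   # tensor([1., 1.])
print(y.grad)   # None
```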

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46855

Reviewed By: ngimel

Differential Revision: D24649054

Pulled By: soulitzer

fbshipit-source-id: 6925d5a67d583eeb781fc7cfaec807c410e1fc65
2020-11-02 14:32:38 -08:00
Nikita Vedeneev
c31ced4246 make torch.lu differentiable. (#46284)
Summary:
As per title. Limitations: only for batches of square full-rank matrices.
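A minimal sketch of differentiating through the factorization, using the `torch.lu` API as it existed at the time (newer releases expose the same thing as `torch.linalg.lu_factor`); the random batch is assumed to be full rank:

```python
import torch

A = torch.randn(4, 3, 3, requires_grad=True)   # batch of square matrices
LU, pivots = torch.lu(A)
LU.sum().backward()                             # backward through the LU factor
print(A.grad.shape)                             # torch.Size([4, 3, 3])
```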

CC albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46284

Reviewed By: zou3519

Differential Revision: D24448266

Pulled By: albanD

fbshipit-source-id: d98215166268553a648af6bdec5a32ad601b7814
2020-10-23 10:13:46 -07:00
Michael Carilli
5640b79bf8 Allow consumer ops to sync on GraphRoot's gradient (#45787)
Summary:
Currently, a GraphRoot instance doesn't have an associated stream.  Streaming backward synchronization logic assumes the instance ran on the default stream, and tells consumer ops to sync with the default stream.  If the gradient the GraphRoot instance passes to consumer backward ops was populated on a non-default stream, we have a race condition.

The race condition can exist even if the user doesn't give a manually populated gradient:
```python
with torch.cuda.stream(side_stream):
    # loss.backward() implicitly synthesizes a one-element 1.0 tensor on side_stream
    # GraphRoot passes it to consumers, but consumers first sync on default stream, not side_stream.
    loss.backward()

    # Internally to backward(), streaming-backward logic takes over, stuff executes on the same stream it ran on in forward,
    # and the side_stream context is irrelevant.  GraphRoot's interaction with its first consumer(s) is the spot where
    # the side_stream context causes a problem.
```

This PR fixes the race condition by associating a GraphRoot instance, at construction time, with the current stream(s) on the device(s) of the grads it will pass to consumers. (I think this relies on GraphRoot executing in the main thread, before the backward thread(s) fork, because the grads were populated on the main thread.)

The test demonstrates the race condition. It fails reliably without the PR's GraphRoot diffs and passes with the GraphRoot diffs.

With the GraphRoot diffs, manually populating an incoming-gradient arg for `backward` (or `torch.autograd.grad`) and the actual call to `autograd.backward` will have the same stream-semantics relationship as any other pair of ops:
```python
# implicit population is safe
with torch.cuda.stream(side_stream):
    loss.backward()

# explicit population in side stream then backward in side stream is safe
with torch.cuda.stream(side_stream):
    kickoff_grad = torch.ones_like(loss)
    loss.backward(gradient=kickoff_grad)

# explicit population in one stream then backward kickoff in another stream
# is NOT safe, even with this PR's diffs, but that unsafety is consistent with
# stream-semantics relationship of any pair of ops
kickoff_grad = torch.ones_like(loss)
with torch.cuda.stream(side_stream):
    loss.backward(gradient=kickoff_grad)

# Safe, as you'd expect for any pair of ops
kickoff_grad = torch.ones_like(loss)
side_stream.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side_stream):
    loss.backward(gradient=kickoff_grad)
```
This PR also adds the last three examples above to cuda docs and references them from autograd docstrings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45787

Reviewed By: nairbv

Differential Revision: D24138376

Pulled By: albanD

fbshipit-source-id: bc4cd9390f9f0358633db530b1b09f9c1080d2a3
2020-10-07 08:53:53 -07:00
Mike Ruberry
6d37126a10 Makes rdiv consistent with div (#45407)
Summary:
In addition to making rdiv consistent with div, this PR significantly expands division testing, accounting for floor_divide actually performing truncation division, too.
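A small sketch of the now-consistent behavior (outputs assume the fix; `torch.div` here performs true division):

```python
import torch

t = torch.tensor([4], dtype=torch.int32)
print(1 / t)                              # tensor([0.2500]) -- reflected division now does true division...
print(torch.div(torch.tensor([1]), t))    # tensor([0.2500]) -- ...matching torch.div
```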

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45407

Reviewed By: ngimel

Differential Revision: D23974967

Pulled By: mruberry

fbshipit-source-id: 82b46b07615603f161ab7cd1d3afaa6d886bfe95
2020-09-29 08:34:01 -07:00
Rong Rong
bea7901e38 Enable torch.tensor typechecks (#45077)
Summary:
this fixes https://github.com/pytorch/pytorch/issues/42983.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45077

Reviewed By: ezyang

Differential Revision: D23842493

Pulled By: walterddr

fbshipit-source-id: 1c516a5ff351743a187d00cba7ed0be11678edf1
2020-09-24 08:22:06 -07:00
Rong Rong
4a0aa69a66 Fix undefined variable 'namedshape' in tensor.py (#45085)
Summary:
Hot Fix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45085

Reviewed By: malfet, seemethere

Differential Revision: D23824444

Pulled By: walterddr

fbshipit-source-id: c9f37b394d281b7ef44b14c30699bb7510a362a7
2020-09-22 08:52:47 -07:00
Peter Bell
caea1adc35 Complex support for stft and istft (#43886)
Summary:
Ref https://github.com/pytorch/pytorch/issues/42175, fixes https://github.com/pytorch/pytorch/issues/34797

This adds complex support to `torch.stft` and `torch.istft`. Note that there are really two issues with complex here: complex signals, and returning complex tensors.

## Complex signals and windows
`stft` currently assumes all signals are real and uses `rfft` with `onesided=True` by default. Similarly, `istft` always takes a complex fourier series and uses `irfft` to return real signals.

For `stft`, I now allow complex inputs and windows by calling the full `fft` if either are complex. If the user gives `onesided=True` and the signal is complex, then this doesn't work and raises an error instead. For `istft`, there's no way to automatically know what to do when `onesided=False` because that could either be a redundant representation of a real signal or a complex signal. So there, the user needs to pass the argument `return_complex=True` in order to use `ifft` and get a complex result back.

## stft returning complex tensors
The other issue is that `stft` returns a complex result, represented as a `(... X 2)` real tensor. I think ideally we want this to return proper complex tensors, but to preserve BC I've had to add a `return_complex` argument to manage this transition. `return_complex` defaults to `False` for real inputs to preserve BC, but defaults to `True` for complex inputs, where there is no BC to consider.

In order to make `return_complex=True` the default everywhere without a sudden BC-breaking change, a simple transition plan could be:
1. introduce `return_complex`, defaulted to false when BC is an issue but giving a warning. (this PR)
2. raise an error in cases where `return_complex` defaults to false, making it a required argument.
3. change `return_complex` default to true in all cases.
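A small sketch of the new argument from step 1 above (assuming a build where `return_complex` is available):

```python
import torch

x = torch.randn(2048)
spec = torch.stft(x, n_fft=512, return_complex=True)
print(spec.dtype)    # torch.complex64 -- a proper complex tensor instead of a (..., 2) real tensor
print(spec.shape)    # (n_fft // 2 + 1, num_frames) with the default onesided=True
```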

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43886

Reviewed By: glaringlee

Differential Revision: D23760174

Pulled By: mruberry

fbshipit-source-id: 2fec4404f5d980ddd6bdd941a63852a555eb9147
2020-09-18 01:39:47 -07:00
Nikita Shulga
0c01f136f3 [BE] Use f-string in various Python functions (#44161)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44161

Reviewed By: seemethere

Differential Revision: D23515874

Pulled By: malfet

fbshipit-source-id: 868cf65aedd58fce943c08f8e079e84e0a36df1f
2020-09-04 07:38:25 -07:00
Supriya Rao
4db8ca1129 [quant] Create nn.quantized.dynamic.EmbeddingBag (#43088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43088

Create a quantized module that the user can use to perform embedding bag quantization.
The module uses `EmbeddingPackedParams` to store the weights, which can be serialized/deserialized using TorchBind custom classes (C++ get/setstate code).
A following PR will add support for `from_float` to convert from a float module to the quantized module.

Test Plan:
python test/test_quantization.py TestDynamicQuantizedModule.test_embedding_bag_api

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23167519

fbshipit-source-id: 029d7bb44debf78c4ef08bfebf267580ed94d033
2020-08-21 11:45:02 -07:00
Adam Thompson
1c616c5ab7 Add complex tensor dtypes for the __cuda_array_interface__ spec (#42918)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42860

The `__cuda_array_interface__` tensor specification is missing the appropriate datatypes for the newly merged complex64 and complex128 tensors. This PR addresses this issue by casting:

* `torch.complex64` to 'c8'
* `torch.complex128` to 'c16'
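A small sketch of the exported typestr for the list above (requires a CUDA device; the exact byte-order prefix in the typestr is an assumption):

```python
import torch

if torch.cuda.is_available():
    t = torch.zeros(4, dtype=torch.complex64, device="cuda")
    print(t.__cuda_array_interface__["typestr"])   # expected to report the 8-byte complex type, 'c8'
```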

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42918

Reviewed By: izdeby

Differential Revision: D23130219

Pulled By: anjali411

fbshipit-source-id: 5f8ee8446a71cad2f28811afdeae3a263a31ad11
2020-08-14 10:26:23 -07:00
Heitor Schueroff de Souza
62bd2ddec7 Implemented non-named version of unflatten (#42563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42563

Moved the logic for non-named unflatten from the Python nn module to aten/native so it can be reused by the nn module later. Fixed some inconsistencies between the docs and the code logic.
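A minimal sketch of the non-named version (assuming the `Tensor.unflatten(dim, sizes)` signature added here):

```python
import torch

t = torch.randn(4, 6)
print(t.unflatten(1, (2, 3)).shape)   # torch.Size([4, 2, 3]) -- dim 1 is split into (2, 3)
```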

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23030301

Pulled By: heitorschueroff

fbshipit-source-id: 7c804ed0baa5fca960a990211b8994b3efa7c415
2020-08-12 13:14:28 -07:00
Hameer Abbasi
3d46e02ea1 Add __torch_function__ for methods (#37091)
Summary:
According to pytorch/rfcs#3

From the goals in the RFC:

1. Support subclassing `torch.Tensor` in Python (done here)
2. Preserve `torch.Tensor` subclasses when calling `torch` functions on them (done here)
3. Use the PyTorch API with `torch.Tensor`-like objects that are _not_ `torch.Tensor`
   subclasses (done in https://github.com/pytorch/pytorch/issues/30730)
4. Preserve `torch.Tensor` subclasses when calling `torch.Tensor` methods. (done here)
5. Propagating subclass instances correctly also with operators, using
   views/slices/indexing/etc. (done here)
6. Preserve subclass attributes when using methods or views/slices/indexing. (done here)
7. A way to insert code that operates on both functions and methods uniformly
   (so we can write a single function that overrides all operators). (done here)
8. The ability to give external libraries a way to also define
   functions/methods that follow the `__torch_function__` protocol. (will be addressed in a separate PR)

This PR makes the following changes:

1. Adds the `self` argument to the arg parser.
2. Dispatches on `self` as well if `self` is not `nullptr`.
3. Adds a `torch._C.DisableTorchFunction` context manager to disable `__torch_function__`.
4. Adds a `torch::torch_function_enabled()` and `torch._C._torch_function_enabled()` to check the state of `__torch_function__`.
5. Dispatches all `torch._C.TensorBase` and `torch.Tensor` methods via `__torch_function__`.
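A minimal sketch of items 2, 3 and 5 above (`LoggingTensor` is a hypothetical subclass; that the default `super().__torch_function__` preserves the subclass is an assumption based on the goals listed earlier):

```python
import torch

class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        print(f"dispatching {getattr(func, '__name__', func)}")
        return super().__torch_function__(func, types, args, kwargs)

t = torch.ones(2).as_subclass(LoggingTensor)
u = t + 1                        # Tensor methods/operators now dispatch through __torch_function__
print(type(u))                   # expected: LoggingTensor (subclass preserved)

with torch._C.DisableTorchFunction():
    v = t + 1                    # no __torch_function__ dispatch inside the new context manager
```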

TODO:

- [x] Sequence Methods
- [x] Docs
- [x] Tests

Closes https://github.com/pytorch/pytorch/issues/28361

Benchmarks in https://github.com/pytorch/pytorch/pull/37091#issuecomment-633657778

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37091

Reviewed By: ngimel

Differential Revision: D22765678

Pulled By: ezyang

fbshipit-source-id: 53f8aa17ddb8b1108c0997f6a7aa13cb5be73de0
2020-08-05 20:44:13 -07:00
Tongzhou Wang
c935712d58 Use unbind for tensor.__iter__ (#40884)
Summary:
Unbind, which has a special backward using cat, is arguably better than multiple selects, whose backward creates and adds a bunch of tensors as big as `self`.
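For illustration, iteration and `unbind` now produce the same views (a minimal sketch):

```python
import torch

t = torch.arange(6.).reshape(2, 3)
rows = list(t)                # __iter__ now goes through t.unbind(0)
print(rows[0])                # tensor([0., 1., 2.])
print(t.unbind(0)[0])         # the same row, as produced by unbind
```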

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40884

Reviewed By: pbelevich

Differential Revision: D22363376

Pulled By: zou3519

fbshipit-source-id: 0911cdbb36f9a35d1b95f315d0a2f412424e056d
2020-07-06 10:53:15 -07:00
Michael Carilli
8066fba226 [RELAND2] Change AccumulateGrad to yield .grads that match weights' memory layout (#40358)
Summary:
https://github.com/pytorch/pytorch/pull/40129 fixed the error responsible for the first revert, but exposed another error in the same test.

This PR is intended as the "master copy" for merge, and it runs on full CI.
Two other PRs (restricted to run on a small subset of CI) support debugging DDP failures/hangs with multiple devices per process (`test_c10d.py:DistributedDataParallelTest.test_grad_layout_1devicemodule_2replicaperprocess`).
- https://github.com/pytorch/pytorch/pull/40290 tries the test with purely rowmajor contiguous params on an untouched master.  In other words https://github.com/pytorch/pytorch/pull/40290 contains none of this PR's diffs aside from the test itself.
- https://github.com/pytorch/pytorch/pull/40178, for comparison, tries the test with this PR's diffs.

Both fail the same way, indicating failure is unrelated to this PR's other diffs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40358

Differential Revision: D22165785

Pulled By: albanD

fbshipit-source-id: ac7cdd79af5c080ab74341671392dca8e717554e
2020-06-22 17:13:21 -07:00
Mike Ruberry
4f761f325c Back out "[pytorch][PR] Removes dunder div"
Summary: NVIDIA's Apex is updating to no longer rely on this behavior, but we're reverting this Python2->Python3 update to unblock internal apex users.

Test Plan: Sandcastle + OSS CI.

Reviewed By: ngimel

Differential Revision: D22146782

fbshipit-source-id: f9483d2cbf9dc3a469ad48a6c863edea3ae51070
2020-06-19 18:31:20 -07:00
Alban Desmaison
08227fea4f Revert D22079377: [pytorch][PR] [RELAND] Change AccumulateGrad to yield .grads that match weights' memory layout
Test Plan: revert-hammer

Differential Revision:
D22079377

Original commit changeset: 9bd2b7e0c34f

fbshipit-source-id: c22cc349d790caa574eace0d63980854c33e5a59
2020-06-17 10:17:27 -07:00
Michael Carilli
1ec8ece2b9 [RELAND] Change AccumulateGrad to yield .grads that match weights' memory layout (#40129)
Summary:
https://github.com/pytorch/pytorch/pull/34904 was reverted because it had a misconfigured 4 GPU test that for some reason wasn't caught by external CI ([example failure](https://app.circleci.com/pipelines/github/pytorch/pytorch/181719/workflows/cfb37cd9-9a0c-4738-898b-d683934cd308/jobs/5868948/steps)).

This PR reverts the revert, and adds diffs that should repair the misconfigured test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40129

Differential Revision: D22079377

Pulled By: albanD

fbshipit-source-id: 9bd2b7e0c34fdaf887497b52037cfe82cba709c1
2020-06-17 09:02:54 -07:00
Mike Ruberry
9d588f7ce2 Removes dunder div (#39151)
Summary:
BC-breaking note:

If a user is using one of these dunders directly, it will no longer be available. Users should update to the Python 3-compatible dunders.

Original PR note:

`__div__` (and `__idiv__` and `__rdiv__`) are no longer special dunders in Python 3. This PR replaces them with the `__truediv__` (`__itruediv__`, `__rtruediv__`) dunders, since we no longer support Python 2.
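A small migration sketch (the division results themselves are unchanged):

```python
import torch

t = torch.tensor([3.0])
print(t / 2)               # tensor([1.5000]) -- plain operator, unaffected
print(t.__truediv__(2))    # tensor([1.5000]) -- replaces the removed t.__div__(2)
```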
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39151

Differential Revision: D22075713

Pulled By: mruberry

fbshipit-source-id: d318b47b51f7cc4c3728b1606a34d81e49ba0fa1
2020-06-16 23:02:20 -07:00
Alban Desmaison
f1e575a0bf Revert D20496044: [pytorch][PR] Change AccumulateGrad to yield .grads that match weights' memory layout
Test Plan: revert-hammer

Differential Revision:
D20496044

Original commit changeset: 248d680f4b1b

fbshipit-source-id: 6462b25e3fb9c8596c1da443389089f09c32df4d
2020-06-16 10:38:40 -07:00
Michael Carilli
2beb9690c3 Change AccumulateGrad to yield .grads that match weights' memory layout (#34904)
Summary:
Currently, whether `AccumulateGrad`  [steals](67cb018462/torch/csrc/autograd/functions/accumulate_grad.h (L42)) or [clones](67cb018462/torch/csrc/autograd/functions/accumulate_grad.h (L80)) an incoming gradient, the gradient ends up rowmajor contiguous, regardless of its param's layout.  If the param's layout is channels last, or otherwise not rowmajor contiguous, later kernels that apply gradients to params are forced into an uncoalesced memory access pattern for either the param or the gradient.  This may not sound like a big deal but for any binary op on large tensors it's a >3X increase in gmem traffic => 3X slowdown.

The present PR changes `AccumulateGrad` to prefer, where possible, stashing gradients that match their params' layouts (["Gradient Layout Contract"](https://github.com/pytorch/pytorch/pull/34904/files#diff-ef1a56d24f66b280dcdb401502d6a796R29-R38)).

Allowing `AccumulateGrad` to stash non-rowmajor-contiguous grads means DDP allreduces and DP reduces must allow non-rowmajor-contiguous grads.  This PR extends DDP and DP to allow gradients with non-rowmajor-contiguous strides as long as their layout is nonoverlapping and dense.

For good measure, I include changes that allow all five nccl primitives (allreduce, reduce, broadcast, allgather, reducescatter) to act on non-rowmajor-contiguous tensors (again as long as each input's layout is nonoverlapping and dense, and as long as all tensors participating in a given collective have the same layout).  The primitive comm changes aren't necessary to enable the DDP changes, but I wasn't sure this would end up true until I had written both sets of changes.  I think primitive comm enablement is reasonable to keep in the PR, especially since the code for it is simple.

Channels last params will be a major beneficiary of this PR, but I don't see it as channels-last-specific fix.  The spirit is layout matching in general:
- Grads should be stashed with memory layouts matching their params.
- Src and dst tensors on opposite ends of collectives should have matching dense layouts.

This PR also updates autograd docs to describe potential BC-breaking changes below.

## BC notes
ngimel albanD gchanan

#### BC-breaking
In the common case where the user lets AccumulateGrad decide grad layouts, strides for grads of dense but non-rowmajor-contiguous params will change.  Any user code that was accustomed to `view(-1)`ing these grads will break.

Also, the circumstances under which a grad can be stolen directly from the backward function that created it, as opposed to deep-copied by AccumulateGrad, have changed.  In most cases we expect silent performance improvement, because we expect channels-last-aware backward kernels will create channels last gradients for channels last params.  Now those can be stolen, whereas before this PR they were cloned and made rowmajor contiguous.  IMO this is a mild BC breakage.  Param backward hooks still see grads come in with whatever format the backward kernel gave them.  The only BC breakage potential I see is if user code relies somehow on a grad in a hook having or not having the same deep memory as the eventual `param.grad`.  Any such users hopefully know they're off the edge of the map and understand how to update their expectations.

#### BC escape hatches
At albanD's recommendation, this PR's changes to AccumulateGrad do not alter the pre-PR code's decisions about whether grad is accumulated in or out of place.  Accumulations of new grads onto an existing `.grad` attribute were (usually) in-place before this PR and remain in-place after this PR, keeping the existing `.grad`'s layout.  After this PR, if the user wants to force accumulation into a grad with a particular layout, they can preset `param.grad` to a zeroed tensor with the desired strides or call `grad.contiguous(desired format)`.  This likely won't be as performant as letting AccumulateGrad establish grad layouts by cloning or stealing grads with contract-compliant strides, but at least users have a control point.

One limitation (present before this PR and unchanged by this PR):  Presetting `param.grad` does not ensure in-place accumulation all the time.  For example, if `create_graph=True`, or if incoming `new_grad` is dense and existing `variable_grad` is sparse, accumulation occurs out of place, and the out-of-place result may not match the existing grad's strides.
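A minimal sketch of the preset-`.grad` escape hatch described above (assuming the common in-place accumulation path; the sizes are arbitrary):

```python
import torch

param = torch.randn(8, 3, 4, 4).to(memory_format=torch.channels_last).requires_grad_()
# Preset .grad with the desired strides so accumulation keeps this layout:
param.grad = torch.zeros_like(param, memory_format=torch.channels_last)

(param * 2).sum().backward()
print(param.grad.is_contiguous(memory_format=torch.channels_last))   # True
```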

----------------------------
I also noticed some potential DDP improvements that I considered out of scope but want to mention for visibility:
1. make sure Reducer's ops sync with AccumulateGrad streams
2. ~~to reduce CPU overhead and incur fewer kernel launches, lazily create flat `contents` tensors by a single `cat` kernel only when a bucket is full, instead of `copy_`ing grads into `contents` individually as soon as they are received.~~  PR includes a [minor change](https://github.com/pytorch/pytorch/pull/34904/files#diff-c269190a925a4b0df49eda8a8f6c5bd3R312-R315) to divide grads while copying them into flat buffers, instead of copying them in, then dividing separately.  Without cat+div fusion, div-while-copying is the best we can do.
3. https://github.com/pytorch/pytorch/issues/38942
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34904

Differential Revision: D20496044

Pulled By: albanD

fbshipit-source-id: 248d680f4b1bf77b0a986451844ec6e254469217
2020-06-16 08:43:31 -07:00
Nikita Shulga
c6b69a4e4d Delete Python <= 3.5 specific checks from the code (#39879)
Summary:
- Remove PY3 and PY34 checks from `torch/testing/_internal/common_utils.py`
- Remove PY35 global var from `torch.jit.annotations`
- Always call `try_get_real_signature` in `torch/jit/annotations.py`
- Use `map` instead of `imap`, since Python 2 is no longer supported, so `map` is always lazy.
- Remove all pre-Python-3.6 checks from `torch/_six.py` and `torch/_appdirs.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39879

Differential Revision: D22037811

Pulled By: malfet

fbshipit-source-id: af0c79f976569c2059d39ecb49c6b8285161734f
2020-06-15 08:16:06 -07:00
Alban Desmaison
d6715e6364 Improve warnings to actually point at user code (#39143)
Summary:
These warnings' goal is to show the user where to be careful in their code, so make them point to the user's code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39143

Differential Revision: D21764201

Pulled By: albanD

fbshipit-source-id: f1369d1b0e71d93af892ad3b7b1b3030e6699c59
2020-05-29 06:45:24 -07:00
Brian
389e16c33b torch.pow Add type promotion support and fix issue with __rpow__ (#37098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37098

### **Cherry-picked from another stack:**
Some code review already occurred here: https://github.com/pytorch/pytorch/pull/32582

### Summary:

Fixes: https://github.com/pytorch/pytorch/issues/32436

The issue caused incorrect handling of dtypes for scalar ** tensor.
e.g. before this change:
```
>>> 5.5 ** torch.ones(5, dtype=torch.int32)
tensor([5, 5, 5, 5, 5], dtype=torch.int32)
```
should return a float tensor.

Also fixes a number of incorrect cases:
 * tensors to negative powers were giving incorrect results (1 instead
    of 0 or error)
 * Behavior wasn't consistent between cuda/cpu
 * large_value ** 1 in some cases gave a result not equal
    to large_value because of truncation in conversion to double and back.

BC-breaking:

Previously incorrect behavior (in 1.4):
```
>>> a
tensor([1, 1, 1, 1, 1], dtype=torch.int32)
>>> a.pow_(.5)
tensor([1, 1, 1, 1, 1], dtype=torch.int32)
```

After this change:
`RuntimeError: result type Float can't be cast to the desired output type Int`

Test Plan: Imported from OSS

Differential Revision: D21686207

Pulled By: nairbv

fbshipit-source-id: e797e7b195d224fa46404f668bb714e312ea78ac
2020-05-26 08:29:51 -07:00
Robert Porter
8fe2a5e91b Fixes type annotations for named tensors #27846 (#36890)
Summary:
This enables type checking for named tensors, and fixes the underlying problems.

The bulk of the fix is modifying `gen_pyi.py` to generate reasonable types in `torch/__init__.pyi`.  I took two approaches:  First, I tried to take a generic approach and added `DimnameList` to the magic list of variable argument lists.  Unfortunately that was insufficient for many of the method signatures, so I also added manual definitions for `rename`, `refine_names`, and `unflatten` in `__init__.pyi.in`.

Finally there were a few problems in the doctests that had to be cleaned up so that `test/test_type_hints.py` will run successfully.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36890

Differential Revision: D21259192

Pulled By: zou3519

fbshipit-source-id: 2a9e7d7bec9be5ae3ae2995078c6abfa3eca103c
2020-04-28 06:51:22 -07:00
anjali411
4f3946a89b Added complex dtypes to get_all_math_dtypes, complex acc type for cpu, fixed rdiv and pow for complex (#37193)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/36730 https://github.com/pytorch/pytorch/issues/36057
Partially resolves: https://github.com/pytorch/pytorch/issues/36671
```
>>> 2j / torch.tensor([4], dtype = torch.complex64)
tensor([(0.0000+0.5000j)], dtype=torch.complex64)
>>> 1 / torch.tensor(3+4j)
tensor((0.1200-0.1600j), dtype=torch.complex64)
```
rdiv is more generally broken for all dtypes because it doesn't promote the types properly, e.g.:
```
>>> 1 / torch.tensor(2)
tensor(0)
>>> 2j / torch.tensor(4)
tensor(0)
```
so that issue should be fixed in a separate PR

- Added CPU acc types for complex
- Added cumsum, cumprod for complex dtypes
- Added complex dtypes to get_all_math_dtypes to expand testing for complex dtypes

Old PR - https://github.com/pytorch/pytorch/pull/36747
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37193

Differential Revision: D21229373

Pulled By: anjali411

fbshipit-source-id: 8a086136d8c10dabe62358d276331e3f22bb2342
2020-04-24 15:05:50 -07:00
moto
5a27ec09b8 Add Inverse Short Time Fourier Transform in ATen native (#35569)
Summary:
Ported `torchaudio`'s implementation (test, and documentation as well) to ATen.

Note
 - Batch packing/unpacking is performed in Python. The ATen implementation expects a 4D input tensor.
 - `hop_length` is initialized in the same way as in the `stft` implementation. [The torchaudio version tried to mimic the same behavior but is slightly different](7da61a4bee/torchaudio/functional.py (L152-L157)).

Closes https://github.com/pytorch/pytorch/issues/34827
Relates https://github.com/pytorch/pytorch/issues/3775
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35569

Differential Revision: D21178090

Pulled By: mthrok

fbshipit-source-id: 2701a8b241a36a6fb1b740c2fb2b07cb938185d4
2020-04-24 12:14:55 -07:00
Ailing Zhang
efcbcca454 Revert D21138687: [pytorch][PR] Added complex dtypes to get_all_math_dtypes, complex acc type for cpu, fixed rdiv and pow for complex
Test Plan: revert-hammer

Differential Revision:
D21138687

Original commit changeset: ad3602ccf86c

fbshipit-source-id: 69eb031c1a7c3d5e4b9f4241fbdada8d5980535d
2020-04-22 14:49:45 -07:00
David Reiss
e75fb4356b Remove (most) Python 2 support from Python code (#35615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615

Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).

Test Plan: CI

Differential Revision: D20842886

Pulled By: dreiss

fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
2020-04-22 09:23:14 -07:00
anjali411
25eb250d77 Added complex dtypes to get_all_math_dtypes, complex acc type for cpu, fixed rdiv and pow for complex (#36747)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/36730 https://github.com/pytorch/pytorch/issues/36057
Partially resolves: https://github.com/pytorch/pytorch/issues/36671
```
>>> 2j / torch.tensor([4], dtype = torch.complex64)
tensor([(0.0000+0.5000j)], dtype=torch.complex64)
>>> 1 / torch.tensor(3+4j)
tensor((0.1200-0.1600j), dtype=torch.complex64)
```
rdiv is more generally broken for all dtypes because it doesn't promote the types properly, e.g.:
```
>>> 1 / torch.tensor(2)
tensor(0)
>>> 2j / torch.tensor(4)
tensor(0)
```
so that issue should be fixed in a separate PR

- Added CPU acc types for complex
- Added cumsum, cumprod for complex dtypes
- Added complex dtypes to get_all_math_dtypes to expand testing for complex dtypes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36747

Differential Revision: D21138687

Pulled By: anjali411

fbshipit-source-id: ad3602ccf86c70294a6e71e564cb0d46c393dfab
2020-04-22 08:52:41 -07:00
Zhu, Haozhe
bd3c6e8e91 avoid large vector copy when query per_channel q_params (#31040)
Summary:
The quantizer uses `std::vector` to save per-channel scales and zero_points, but querying scales (zero_points) requires returning a tensor. This leads to using `std::vector` to initialize tensors, which costs a lot of time. So I changed the quantizer to save per-channel scales and zero_points directly as tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31040

Differential Revision: D19701070

Pulled By: jerryzh168

fbshipit-source-id: 9043f16c44b74dd8289b8474e540171765a7f92a
2020-02-19 16:24:24 -08:00
Brian Stark
17d4ef9e9e Support using scalar tensor for split (#32493)
Summary:
split requires an int input; however, in tracing, operators such as size(axis) return a tensor, which is different behavior than when not tracing. As such, split needs to be modified to handle these cases.

Fixes https://github.com/pytorch/pytorch/issues/27551
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32493

Reviewed By: hl475

Differential Revision: D19538254

Pulled By: houseroad

fbshipit-source-id: c8623009de5926aa38685e08121f4b48604bd8c0
2020-02-07 17:16:43 -08:00
Alban Desmaison
717274c001 Add useful warnings for t.grad when it won't be populated for known reasons (#30531)
Summary:
Fix https://github.com/pytorch/pytorch/issues/2362 and https://github.com/pytorch/pytorch/issues/19778

To avoid issues with frozen models, we only consider warning for Tensors that require gradients and are neither leaves nor retain gradients.
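A quick sketch of when the new warning fires (assuming this change is present):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2                 # non-leaf that requires grad and does not retain it
y.sum().backward()
print(y.grad)             # None, and accessing it now emits a UserWarning suggesting retain_grad()
```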
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30531

Differential Revision: D18832767

Pulled By: albanD

fbshipit-source-id: 743e863dc14ab57713e66da78b2e4d759dfba0ff
2019-12-11 09:47:18 -08:00
Elias Ellison
f48a8901c5 Add floor_divide function (#30493)
Summary:
Adds `torch.floor_divide`, following NumPy's `floor_divide` API. I only implemented the out-of-place version; I can add the in-place version if requested.

Also fixes  https://github.com/pytorch/pytorch/issues/27512
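A quick usage sketch of the new function:

```python
import torch

a = torch.tensor([7., 9.])
print(torch.floor_divide(a, 2))   # tensor([3., 4.])
```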
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30493

Differential Revision: D18896211

Pulled By: eellison

fbshipit-source-id: ee401c96ab23a62fc114ed3bb9791b8ec150ecbd
2019-12-10 07:51:39 -08:00
Michael Suo
62b10721fb Actually make flake8 do something (#30892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30892

Fixes all outstanding lints and actually installs a properly configured
flake8

Test Plan: Imported from OSS

Differential Revision: D18862825

Pulled By: suo

fbshipit-source-id: 08e9083338a7309272e17bb803feaa42e348aa85
2019-12-06 17:50:50 -08:00
Seiya Tokui
1d7b40f1c4 Fix reading __cuda_array_interface__ without strides (#24947)
Summary:
When converting a contiguous CuPy ndarray to a Tensor via `__cuda_array_interface__`, an error occurs due to incorrect handling of default strides. This PR fixes the problem and makes `torch.tensor(cupy_ndarray)` work for contiguous inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24947

Differential Revision: D18838986

Pulled By: ezyang

fbshipit-source-id: 2d827578f54ea22836037fe9ea8735b99f2efb42
2019-12-06 07:36:27 -08:00
Igor Fedan
75309b45f3 explicitly provide memory format when calling to clone() at Indexing.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28660

Test Plan: Imported from OSS

Differential Revision: D18333346

Pulled By: ifedan

fbshipit-source-id: 06590205d883a5096388a4ae318389244130972d
2019-11-07 05:38:32 -08:00
Jerry Zhang
23193c155f Quantized Tensor support copy (#28612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28612

att

Test Plan:
python test/test_quantized_tensor.py

Imported from OSS

Differential Revision: D18255247

fbshipit-source-id: 814b12640fdf9d79b27482ee642ce430dbaeea68
2019-11-01 17:40:17 -07:00
Peter Bell
f33813d589 Return NotImplemented from all binary math ops (#27423)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26333

Fixes the operators missed in https://github.com/pytorch/pytorch/issues/26507 and includes a test for all operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27423

Differential Revision: D17835390

Pulled By: ezyang

fbshipit-source-id: 7a1351c7ccc8ad11454dbaa00d3701dcee4f06a8
2019-10-28 14:28:33 -07:00
Richard Zou
0fbbc7acb4 Allow align_to to take in partially named tensors (#27308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27308

Currently, `tensor.align_to(*names)` has the restriction that the
`tensor` must be fully named. This doesn't need to be the case, when
using Ellipsis, we "expand the ellipsis to all unmentioned dimensions,
in the order in which they appear in the original tensor".

For example, consider `tensor: Tensor[None, None, C]`.

`tensor.align_to(C, None, None)` is ambiguous because the user might
have wanted to switch the order of the None dimensions and there is no
way to specify that using this API. However, `tensor.align_to('C', ...)`
isn't ambiguous: we can select the two unnamed dimensions in the order
in which they appear.
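A minimal sketch of the new behavior, using the same `Tensor[None, None, C]` example as above (named tensors are experimental, so a warning may be emitted):

```python
import torch

t = torch.zeros(2, 3, 5, names=(None, None, 'C'))   # Tensor[None, None, C]
aligned = t.align_to('C', ...)                       # partially named input is now allowed
print(aligned.names)    # ('C', None, None) -- unnamed dims keep their original order
print(aligned.shape)    # torch.Size([5, 2, 3])
```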

To actually implement this, we write a brand-new `align_to(names,
ellipsis_idx)` function in c++ that is separate from the regular
`align_to(names)` implementation. Ideally we would support "..." as a
special name in c++ and combine the two implementations; we'll need to
support "..." in c++ in the future but that requires a bit of extra work.
In this PR, Python processees the ellipsis and then calls the correct
overload.

Test Plan: - run tests

Differential Revision: D17745179

Pulled By: zou3519

fbshipit-source-id: 9fed06d224215cfb7efecd8c002604baab3c45e6
2019-10-09 16:28:45 -07:00
zou3519
59b14a7620 Documentation for named tensors (#27173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27173

`docs/source/named_tensor.rst` is the entry point; most users will land
either here or the named tensor tutorial when looking to use named
tensors. We should strive to make this as readable, concise, and understandable
as possible.

`docs/source/name_inference.rst` lists all of the name inference rules.
It should be clear but it's hard to make it concise.

Please let me know if anything doesn't make sense and please propose
alternative wordings and/or restructuring to improve the documentation.
This should ultimately get cherry-picked into the 1.3 branch as one
monolithic commit so it would be good to get all necessary changes made
in this PR and not have any follow ups.

Test Plan: - built and reviewed locally with `cd docs/ && make html`.

Differential Revision: D17763046

Pulled By: zou3519

fbshipit-source-id: c7872184fc4b189d405b18dad77cad6899ae1522
2019-10-08 22:22:30 -07:00