pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Natalia Gimelshein	50b91103a9	add self cuda time to avoid double/quadruple counting (#45209 ) Summary: In profiler, cuda did not report self time, so for composite functions there was no way to determine which function is really taking time. In addition, "total cuda time" reported was frequently more than total wallclock time. This PR adds "self CUDA time" in profiler, and computes total cuda time based on self cuda time, similar to how it's done for CPU. Also, slight formatting changes to make table more compact. Before:
```
--------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------
Name                  Self CPU total %  Self CPU total    CPU total %       CPU total         CPU time avg      CUDA total %      CUDA total        CUDA time avg     Number of Calls
--------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------
aten::matmul          0.17%             890.805us         99.05%            523.401ms         5.234ms           49.91%            791.184ms         7.912ms           100
aten::mm              98.09%            518.336ms         98.88%            522.511ms         5.225ms           49.89%            790.885ms         7.909ms           100
aten::t               0.29%             1.530ms           0.49%             2.588ms           25.882us          0.07%             1.058ms           10.576us          100
aten::view            0.46%             2.448ms           0.46%             2.448ms           12.238us          0.06%             918.936us         4.595us           200
aten::transpose       0.13%             707.204us         0.20%             1.058ms           10.581us          0.03%             457.802us         4.578us           100
aten::empty           0.14%             716.056us         0.14%             716.056us         7.161us           0.01%             185.694us         1.857us           100
aten::as_strided      0.07%             350.935us         0.07%             350.935us         3.509us           0.01%             156.380us         1.564us           100
aten::stride          0.65%             3.458ms           0.65%             3.458ms           11.527us          0.03%             441.258us         1.471us           300
--------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------
Self CPU time total: 528.437ms
CUDA time total: 1.585s
Recorded timeit time: 789.0814 ms
```
Note recorded timeit time (with proper cuda syncs) is 2 times smaller than "CUDA time total" reported by profiler
After
```
--------------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
Name                  Self CPU %     Self CPU       CPU total %    CPU total      CPU time avg   Self CUDA      Self CUDA %    CUDA total     CUDA time avg  # of Calls
--------------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
aten::matmul          0.15%          802.716us      99.06%         523.548ms      5.235ms        302.451us      0.04%          791.151ms      7.912ms        100
aten::mm              98.20%         519.007ms      98.91%         522.745ms      5.227ms        790.225ms      99.63%         790.848ms      7.908ms        100
aten::t               0.27%          1.406ms        0.49%          2.578ms        25.783us       604.964us      0.08%          1.066ms        10.662us       100
aten::view            0.45%          2.371ms        0.45%          2.371ms        11.856us       926.281us      0.12%          926.281us      4.631us        200
aten::transpose       0.15%          783.462us      0.22%          1.173ms        11.727us       310.016us      0.04%          461.282us      4.613us        100
aten::empty           0.11%          591.603us      0.11%          591.603us      5.916us        176.566us      0.02%          176.566us      1.766us        100
aten::as_strided      0.07%          389.270us      0.07%          389.270us      3.893us        151.266us      0.02%          151.266us      1.513us        100
aten::stride          0.60%          3.147ms        0.60%          3.147ms        10.489us       446.451us      0.06%          446.451us      1.488us        300
--------------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
Self CPU time total: 528.498ms
CUDA time total: 793.143ms
Recorded timeit time: 788.9832 ms
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45209 Reviewed By: zou3519 Differential Revision: D23925491 Pulled By: ngimel fbshipit-source-id: 7f9c49238d116bfd2db9db3e8943355c953a77d0	2020-09-28 21:51:13 -07:00
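The self-time accounting described in #45209 can be illustrated with a small sketch. This is not the profiler's code: `Event`, `self_time_us`, and `walk` are hypothetical names, and the timings are made-up round numbers. It only shows why summing per-event totals counts a child's time again in every ancestor, while summing self times counts each microsecond once.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Event:
    """Hypothetical profiler event: a name, its total time, and nested child events."""
    name: str
    total_us: float
    children: List["Event"] = field(default_factory=list)


def self_time_us(event: Event) -> float:
    # Self time is the event's total time minus the time spent in its direct children.
    return event.total_us - sum(child.total_us for child in event.children)


def walk(event: Event) -> Iterator[Event]:
    # Depth-first traversal over the event and all of its descendants.
    yield event
    for child in event.children:
        yield from walk(child)


# Made-up timings: a composite op (matmul) that calls two nested ops.
mm = Event("aten::mm", 790.0)
t = Event("aten::t", 1.0)
matmul = Event("aten::matmul", 791.5, [t, mm])

# Summing totals counts the nested ops twice (once directly, once inside matmul).
naive_total = sum(e.total_us for e in walk(matmul))
# Summing self times counts every event exactly once.
self_total = sum(self_time_us(e) for e in walk(matmul))
```

With these numbers the naive sum is roughly twice the self-time sum, which matches the kind of gap between "CUDA time total" and the timeit measurement reported in the "Before" table.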
Heitor Schueroff de Souza	96f8755034	Fixed handling of nan for evenly_distribute_backward (#45280 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45280 Performance is the same on CPU, and on CUDA it is only 1-1.05x slower. This change is necessary for the future nan ops, including nan(min|max|median). Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D23908796 Pulled By: heitorschueroff fbshipit-source-id: c2b57acbe924cfa59fbd85216811f29f4af05088	2020-09-28 15:57:02 -07:00
lcskrishna	a4486fe7ba	[ROCm] Print name irrespective of seq number assignment for roctx traces (#45229 ) Summary: Recent changes to the seq_num correlation behavior in the profiler (PR https://github.com/pytorch/pytorch/issues/42565) changed the behavior of emit_nvtx(record_shapes=True), which no longer prints the name of the operator properly. This PR dumps the name in roctx traces irrespective of the assigned sequence number, for ROCm only. cc: jeffdaily sunway513 Pull Request resolved: https://github.com/pytorch/pytorch/pull/45229 Reviewed By: zou3519 Differential Revision: D23932902 Pulled By: albanD fbshipit-source-id: c782667ff002b70b51f1cc921afd1b1ac533b39d	2020-09-28 15:03:47 -07:00
Rohan Varma	23dfca8351	Support record_shapes in RPC profiling (#44419 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44419 Closes https://github.com/pytorch/pytorch/issues/39969 This PR adds support for propagation of input shapes over the wire when the profiler is invoked with `record_shapes=True` over RPC. Previously, we did not respect this argument. This is done by saving the shapes as an ivalue list and recovering it as the type expected (`std::vector<std::vector<int>>` on the client). Test is added to ensure that remote ops have the same `input_shapes` as if the op were run locally. ghstack-source-id: 112977899 Reviewed By: pritamdamania87 Differential Revision: D23591274 fbshipit-source-id: 7cf3b2e8df26935ead9d70e534fc2c872ccd6958	2020-09-26 13:26:44 -07:00
Brian Hirsh	439930c81b	adding a beta parameter to the smooth_l1 loss fn (#44433 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44433 Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time. Fixing some type errors, updated fn signature in a few more files. Removing my usage of Scalar, making beta a double everywhere instead. Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D23636720 Pulled By: bdhirsh fbshipit-source-id: caea2a1f8dd72b3b5fd1d72dd886b2fcd690af6d	2020-09-25 16:36:28 -07:00
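For context on what the `beta` parameter controls, here is a minimal scalar sketch of the smooth L1 formula as it is commonly documented: quadratic when the absolute error is below `beta`, linear otherwise. The function name `smooth_l1` and its scalar signature are illustrative assumptions, not the operator added in #44433, which works on tensors and applies a reduction.

```python
def smooth_l1(prediction: float, target: float, beta: float = 1.0) -> float:
    """Scalar smooth L1 loss (illustrative sketch, not the PyTorch operator)."""
    if beta < 0:
        raise ValueError("beta must be non-negative")
    diff = abs(prediction - target)
    if diff >= beta:
        # Linear region. With beta == 0 every input lands here, so the loss
        # reduces to plain L1 and the division below is never reached.
        return diff - 0.5 * beta
    # Quadratic region for small errors.
    return 0.5 * diff * diff / beta


small_error = smooth_l1(0.5, 0.0)            # quadratic branch
large_error = smooth_l1(2.0, 0.0)            # linear branch
pure_l1 = smooth_l1(2.0, 0.0, beta=0.0)      # beta == 0 behaves like L1
```

The two branches agree at `diff == beta`, so the loss stays continuous as `beta` changes.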
Rohan Varma	27ab9bc0f9	[RPC profiling] Extend RPC profiling to support async function execution over RPC. (#44664 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44664 Closes https://github.com/pytorch/pytorch/issues/39971. This PR adds support for functions decorated with `rpc.functions.async_execution` to be profiled over RPC as builtins, jit functions, and blocking python UDFs currently can be. The reasoning for this is to provide complete feature support in terms of RPC profiling and the various types of functions users can run. To enable this, the PR below this enables calling `disableProfiler()` safely from another thread. We use that functionality to defer disabling the profiler on the server until the future corresponding to the RPC request completes (rather than only the blocking `processRPC` call as was done previously). Since when the future completes we've kicked off the async function and the future corresponding to it has completed, we are able to capture any RPCs the function would have called and the actual work done on the other node. For example, if the following async function is run on a server over RPC:
```
import time

import torch
import torch.distributed.rpc as rpc


def slow_add(x, y):
    time.sleep(1)
    return torch.add(x, y)


@rpc.functions.async_execution
def slow_async_add(to, x, y):
    return rpc.rpc_async(to, slow_add, args=(x, y))
```
we expect to see the original RPC profiled, the nested RPC profiled, and the actual torch.add() work. All of these events should be recorded with the correct node id.
Here is an example profiling output: ``` ------------------------------------------------------------------------------------------------------------------------- --------------- --------------- --------------- -------- ------- --------------- --------------- --------------- Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg Number of Calls Node ID ------------------------------------------------------------------------------------------------------------------------- --------------- --------------- --------------- -------- ------- --------------- --------------- --------------- rpc_async#slow_async_add(worker1 -> worker2) 0.00% 0.000us 0 1.012s 1.012s 1 1 aten::empty 7.02% 11.519us 7.02% 11.519us 11.519us 1 1 rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3) 0.00% 0.000us 0 1.006s 1.006s 1 2 rpc_async#slow_async_add(worker1 -> worker2)#remote_op: aten::empty 7.21% 11.843us 7.21% 11.843us 11.843us 1 2 rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::add 71.94% 118.107us 85.77% 140.802us 140.802us 1 3 rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::empty 13.82% 22.695us 13.82% 22.695us 22.695us 1 3 ------------------------------------------------------------------------------------------------------------------------- --------------- --------------- --------------- -------- ------- --------------- --------------- --------------- Self CPU time total: 164.164us ``` This PR also moves a bunch of the profiling logic to `rpc/utils.cpp` to declutter `request_callback` code. 
ghstack-source-id: 112868470 Test Plan: ``` rvarm1@devbig978:fbcode (52dd34f6)$ buck test mode/no-gpu mode/dev-nosan //caffe2/test/distributed/rpc:process_group_agent -- test_rpc_profiling_async_function --print-passing-details --stress-runs 1 ``` Reviewed By: mrshenli Differential Revision: D23638387 fbshipit-source-id: eedb6d48173a4ecd41d70a9c64048920bd4807c4	2020-09-25 13:19:26 -07:00
Brian Hirsh	2739a7c599	Byte-for-byte compatibility fixes in codegen (#44879 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44879 Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D23825163 Pulled By: bdhirsh fbshipit-source-id: 4d8028274f82c401b393c4fe1b9e32de3f4909c6	2020-09-25 08:06:50 -07:00
kshitij12345	00e704e757	[fix] torch.repeat : dim-0 backward (#45212 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/45201 Pull Request resolved: https://github.com/pytorch/pytorch/pull/45212 Reviewed By: mrshenli Differential Revision: D23905545 Pulled By: albanD fbshipit-source-id: c5bf9cf481c8cf3ccc1fdbfb364006b29f67dc9f	2020-09-25 07:53:00 -07:00
Yanli Zhao	c6500bcf14	[reland] Make grad point to bucket buffer in DDP to save memory usage (#44344 ) Summary: [test all] Pull Request resolved: https://github.com/pytorch/pytorch/pull/44344 reland #41954 Add one argument in the DDP API to enable/disable letting grads point to views. When it is disabled, behavior is the same as DDP right now; when it is enabled, both variable.grad() and grad in the distautograd context point to the bucket buffer in DDP to save memory usage. In this case, grad will be a view of bucket buffer tensors; to make it compatible with optimizer.zero_grad(), we made changes in #41283. Also note that we cannot make variable.grad() point to the bucket buffer at construction time, because we want to keep grad undefined for unused parameters. ghstack-source-id: 112845787 Test Plan: 1. When grad_is_view=false: a. roberta_base, peak memory usage 8250MB, p50 per iteration latency 0.923second, https://www.internalfb.com/intern/fblearner/details/218029699/?notif_channel=cli b. resnet, peak memory usage 3089MB, p50 per iteration latency 0.120second, https://www.internalfb.com/intern/fblearner/details/218029035/?notif_channel=cli c. accuracy benchmark, distributed=false, .accuracy 40.914535522461, .loss: 1.6370717287064; distributed=true, .accuracy: 39.966053009033, .loss: 1.6849111318588 https://www.internalfb.com/intern/fblearner/details/218035688/?notif_channel=cli d. classy vision uru production flow, https://www.internalfb.com/intern/fblearner/details/219065811/?notif_channel=cli e. pytext flow, https://www.internalfb.com/intern/fblearner/details/219137458/?notif_channel=cli 2. When grad_is_view=true: a. roberta_base, peak memory usage 7183MB, p50 per iteration latency 0.908second, https://www.internalfb.com/intern/fblearner/details/217882539?tab=operator_details b. resnet, peak memory usage 2988 MB, p50 per iteration latency 0.119second, https://www.internalfb.com/intern/fblearner/details/218028479/?notif_channel=cli c. 
accuracy benchmark, distributed=false, .accuracy 41.713260650635, .loss: 1.69939661026; distributed=true, .accuracy: 39.966053009033, .loss: 1.6849111318588, https://www.internalfb.com/intern/fblearner/details/218037058/?notif_channel=cli d. classy vision uru production flow, expected, can not work well with apex.amp https://www.internalfb.com/intern/fblearner/details/219205218/?notif_channel=cli e. pytext flow, detach_() related error, expected, as pytext zero_grad depends on apex repo where detach_() is called. also seeing the warning in finalize_bucket_dense due to tied weights, which is expected. https://www.internalfb.com/intern/fblearner/details/219150229/?notif_channel=cli Reviewed By: mrshenli Differential Revision: D23588186 fbshipit-source-id: f724d325b954ef6f06ede31759bf01dd29a6f5e5	2020-09-24 20:54:51 -07:00
Rohan Varma	70d2e4d1f6	[RPC profiling] Allow disableProfiler() to be called from another thread. (#44653 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44653 Per an offline discussion with ilia-cher, this changes the profiler so that the `disableProfiler()` event consolidation logic can be called from different threads (i.e. threads where the profiler was not explicitly enabled). This is needed to support the functionality enabled by D23638387, where we defer profiling event collection until executing an async callback that can execute on a different thread, to support RPC async function profiling. This is done by introducing 2 flags, `cleanupTLSState` and `consolidate`, which control whether we should clean up thread local settings (we don't do this when calling `disableProfiler()` on non-main threads) and whether we should consolidate all profiled events. Backwards compatibility is ensured since both options are true by default. Added a test in `test_misc.cpp` to test this. ghstack-source-id: 112605620 Reviewed By: mrshenli Differential Revision: D23638499 fbshipit-source-id: f5bbb0d41ef883c5e5870bc27e086b8b8908f46b	2020-09-22 21:16:58 -07:00
Rohan Varma	1bd6533d60	Remove thread_local RecordFunctionGuard from profiler. (#44646 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44646 Per a discussion with ilia-cher, this is not needed anymore and removing it would make some future changes to support async RPC profiling easier. Tested by ensuring profiling tests in `test_autograd.py` still pass. ghstack-source-id: 112605618 Test Plan: CI Reviewed By: mrshenli Differential Revision: D23683998 fbshipit-source-id: 4e49a439509884fe04d922553890ae353e3331ab	2020-09-22 21:15:31 -07:00
anjali411	58b6ab69e5	torch.sgn for complex tensors (#39955 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39955 resolves https://github.com/pytorch/pytorch/issues/36323 by adding `torch.sgn` for complex tensors. `torch.sgn` returns `x/abs(x)` for `x != 0` and returns `0 + 0j` for `x==0`. This PR doesn't test the correctness of the gradients. That will be done as part of auditing all the ops in the future, once we decide the autograd behavior (JAX vs TF) and add gradcheck. Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D23460526 Pulled By: anjali411 fbshipit-source-id: 70fc4e14e4d66196e27cf188e0422a335fc42f92	2020-09-22 08:24:53 -07:00
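The rule stated in #39955 (`x/abs(x)` for nonzero `x`, `0 + 0j` for zero) can be sketched for a single Python complex number. The scalar function `sgn` below is only an illustration of that definition using the standard library; it is not the `torch.sgn` kernel, which operates elementwise on tensors.

```python
def sgn(z: complex) -> complex:
    """Complex sign of a scalar: z / |z| for nonzero z, and 0 for z == 0."""
    magnitude = abs(z)
    if magnitude == 0:
        # The definition special-cases zero to avoid dividing by zero.
        return 0j
    return z / magnitude


unit = sgn(3 + 4j)   # a point on the unit circle in the direction of 3 + 4j
zero = sgn(0j)       # stays zero
```

For nonzero input the result always has magnitude 1 (up to floating-point rounding), so `sgn` keeps only the phase of the number.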
anjali411	9f67176b82	Complex gradcheck logic (#43208 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43208 This PR adds gradcheck for complex. The logic used for complex gradcheck is described in Section 3.5.3 here: https://arxiv.org/pdf/1701.00392.pdf More concretely, this PR introduces the following changes: 1. Updates get_numerical_jacobian to take as input a scalar value for vector (v). Adds gradcheck logic for C -> C, C-> R, R -> C. For R -> C functions, only the real value of gradient is propagated. 2. Adds backward definition for `torch.complex` and also adds a test to verify the definition added. 3. Updates backward for `mul`, `sin`, `cos`, `sinh`, `cosh`. 4. Adds tests for all `torch.real`, `torch.imag`, `torch.view_as_real`, `torch.view_as_complex`, `torch.conj`. Follow up tasks: 1. Add more thorough tests for R -> C cases. Specifically, add R->C test variants for functions. for e.g., `torch.mul(complex_tensor, real_tensor)` 2. Add back commented test in `common_methods_invocation.py`. 3. Add more special case checking for complex gradcheck to make debugging easier. 4. Update complex autograd note. 5. disable complex autograd for operators not tested for complex. Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D23655088 Pulled By: anjali411 fbshipit-source-id: caa75e09864b5f6ead0f988f6368dce64cf15deb	2020-09-20 22:05:04 -07:00
Peter Bell	da7863f46b	Add one dimensional FFTs to torch.fft namespace (#43011 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43011 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D23751850 Pulled By: mruberry fbshipit-source-id: 8dc5fec75102d8809eeb85a3d347ba1b5de45b33	2020-09-19 23:32:22 -07:00
Richard Zou	69f6d94caa	Register diag_backward, diagonal_backward, infinitely...gelu_backward as operators (#44422 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44422 See #44052 for context. Test Plan: - `pytest test/test_autograd.py -v` - `pytest test/test_nn.py -v` Reviewed By: mrshenli Differential Revision: D23607691 Pulled By: zou3519 fbshipit-source-id: 09fbcd66b877af4fa85fd9b2f851ed3912ce84d6	2020-09-10 18:43:18 -07:00
Richard Zou	7ff7e6cfc8	Register cummaxmin_backward, cumprod_backward as operators (#44410 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44410 See #44052 for context. One of the cumprod_backward overloads was unused so I just deleted it. Test Plan: - `pytest test/test_autograd.py -v` Reviewed By: mrshenli Differential Revision: D23605503 Pulled By: zou3519 fbshipit-source-id: f9c5b595e62d2d6e71f26580ba96df15cc9de4f7	2020-09-10 18:43:15 -07:00
Richard Zou	08b431f54c	Add trace_backward, masked_select_backward, and take_backward as ops (#44408 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44408 See #44052 for context. Test Plan: - `pytest test/test_autograd.py -v` Reviewed By: mrshenli Differential Revision: D23605504 Pulled By: zou3519 fbshipit-source-id: b9b1646d13caa6e536d08669c29bfc2ad8ff89a3	2020-09-10 18:41:07 -07:00
Nikita Shulga	4bead6438a	Enable torch.autograd typechecks (#44451 ) Summary: To help with further typing, move dynamically added native contributions from `torch.autograd` to `torch._C._autograd` Fix invalid error handling pattern in `89ac30afb8/torch/csrc/autograd/init.cpp (L13-L15)` `PyImport_ImportModule` already raises a Python exception, and nullptr should be returned to properly propagate the error to the Python runtime. And all native methods/types in `torch/autograd/__init__.py` after `torch._C._init_autograd()` has been called Use f-strings instead of `.format` in test_type_hints.py Fixes https://github.com/pytorch/pytorch/issues/44450 Pull Request resolved: https://github.com/pytorch/pytorch/pull/44451 Reviewed By: ezyang Differential Revision: D23618261 Pulled By: malfet fbshipit-source-id: fa5f739d7cff8410641128b55b810318c5f636ae	2020-09-10 13:37:29 -07:00
Kenichi Maehashi	cb90fef770	Fix return value of PyErr_WarnEx ignored (SystemError) (#44371 ) Summary: This PR fixes unexpected `SystemError` when warnings are emitted and warning filters are set. ## Current behavior ``` $ python -Werror >>> import torch >>> torch.range(1, 3) UserWarning: torch.range is deprecated in favor of torch.arange and will be removed in 0.5. Note that arange generates values in [start; end), not [start; end]. The above exception was the direct cause of the following exception: Traceback (most recent call last): File "<stdin>", line 1, in <module> SystemError: <built-in method range of type object at 0x7f38c7703a60> returned a result with an error set ``` ## Expected behavior ``` Traceback (most recent call last): File "<stdin>", line 1, in <module> UserWarning: torch.range is deprecated and will be removed in a future release because its behavior is inconsistent with Python's range builtin. Instead, use torch.arange, which produces values in [start, end). ``` ## Note Python exception must be raised if `PyErr_WarnEx` returns `-1` ([python docs](https://docs.python.org/3/c-api/exceptions.html#issuing-warnings)). This PR fixes warnings raised in the following code: ```py import torch torch.range(1, 3) torch.autograd.Variable().volatile torch.autograd.Variable().volatile = True torch.tensor(torch.tensor([])) torch.tensor([]).new_tensor(torch.tensor([])) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/44371 Reviewed By: mrshenli Differential Revision: D23598410 Pulled By: albanD fbshipit-source-id: 2fbcb13fe4025dbebaf1fd837d4c8e0944e05010	2020-09-10 10:15:21 -07:00
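The "Expected behavior" in #44371 is the ordinary Python semantics of warning filters: when warnings are escalated to errors, the warning itself surfaces as the exception. A pure-Python sketch of that behavior follows; `deprecated_range` is a hypothetical stand-in for a deprecated function, and the example uses only the standard `warnings` module rather than the C API that the PR actually fixes.

```python
import warnings


def deprecated_range(start: int, end: int) -> list:
    """Hypothetical deprecated helper that warns and then returns an inclusive range."""
    warnings.warn("deprecated_range is deprecated; use range instead", UserWarning)
    return list(range(start, end + 1))


# Equivalent in spirit to running under `python -Werror`, but scoped to this block.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        deprecated_range(1, 3)
        outcome = "returned normally"
    except UserWarning as exc:
        # The warning is raised as a regular exception of the warning's class.
        outcome = f"raised UserWarning: {exc}"
```

In C extension code the same situation shows up as `PyErr_WarnEx` returning `-1`, at which point the caller has to stop and propagate the error instead of returning a result, which is the case the PR addresses.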
Richard Zou	9a5a732866	Register some backwards functions as operators (#44052 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44052

Summary
=======
This PR registers the following backwards functions as operators:
- slice_backward
- select_backward
- gather_backward
- index_select_backward (the backward function for index_select)
- select_index_backward (previously known as index_select_backward, but is actually the backward function for max.dim, min.dim, etc)

In the future, I'd like to register more backward functions as operators so that we can write batching rules for the backward functions. Batching rules for backward functions make it possible to compute batched gradients.

Motivation
==========
The rationale behind this PR is that a lot of backwards functions (27 in total) are incompatible with BatchedTensor due to using in-place operations. Sometimes we can allow the in-place operations, but other times we can't. For example, consider select_backward:
```
Tensor select_backward(const Tensor& grad, IntArrayRef input_sizes, int64_t dim, int64_t index) {
  auto grad_input = at::zeros(input_sizes, grad.options());
  grad_input.select(dim, index).copy_(grad);
  return grad_input;
}
```
and consider the following code:
```
x = torch.randn(5, requires_grad=True)
def select_grad(v):
    torch.autograd.grad(x[0], x, v)
vs = torch.randn(B0)
batched_grads = vmap(select_grad)(vs)
```
For the batched gradient use case, `grad` is a BatchedTensor. The physical version of `grad` has size `(B0,)`. However, select_backward creates a `grad_input` of shape `(5,)`, and tries to copy `grad` to a slice of it.

Other approaches
================
I've considered the following:
- register select_backward as an operator (this PR)
- have a branch inside select_backward for if `grad` is batched.
  - this is OK, but what if we have more tensor extensions that want to override this?
- modify select_backward to work with BatchedTensor, by creating a new operator for the "select + copy_ behavior".
  - select + copy_ isn't used elsewhere in derivative formulas so this doesn't seem useful

Test Plan
=========
- `pytest test/test_autograd.py -v`
- Registering backward functions may impact performance. I benchmarked select_backward to see if registering it as an operator led to any noticeable performance overheads: https://gist.github.com/zou3519/56d6cb53775649047b0e66de6f0007dc. The TL;DR is that the overhead is pretty minimal.

Test Plan: Imported from OSS Reviewed By: ezyang, fbhuba Differential Revision: D23481183 Pulled By: zou3519 fbshipit-source-id: 125af62eb95824626dc83d06bbc513262ee27350	2020-09-04 08:30:39 -07:00
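To make the shape problem in #44052 concrete, here is a pure-Python sketch of what select_backward computes in the one-dimensional case, using lists in place of tensors. The names `select_backward_1d` and `batched_select_backward_1d` are hypothetical, and the batched helper is only an illustration of the result a batching rule would need to produce, not PyTorch's implementation.

```python
from typing import List


def select_backward_1d(grad: float, input_size: int, index: int) -> List[float]:
    """Gradient of x[index] with respect to a 1-D x: zeros everywhere except at index."""
    grad_input = [0.0] * input_size
    grad_input[index] = grad
    return grad_input


def batched_select_backward_1d(
    grads: List[float], input_size: int, index: int
) -> List[List[float]]:
    # One grad per batch entry yields one full-size grad_input per batch entry,
    # so the batched result has shape (B0, input_size) rather than (input_size,).
    return [select_backward_1d(g, input_size, index) for g in grads]


single = select_backward_1d(2.5, input_size=5, index=0)
batched = batched_select_backward_1d([1.0, 2.0, 3.0], input_size=5, index=0)
```

The unbatched version allocates a buffer of shape `(5,)` and writes into it in place; with a batch of `B0` incoming gradients that buffer is too small, which is the incompatibility the commit message describes.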
albanD	73f009a2aa	refactor manual function definitions (#43711 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43711 this makes them available in forward if needed No change to the file content, just a copy-paste. Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D23454146 Pulled By: albanD fbshipit-source-id: 6269a4aaf02ed53870fadf8b769ac960e49af195	2020-09-02 09:23:21 -07:00
Pritam Damania	f1624b82b5	Preserve python backtrace in autograd engine errors. (#43684 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43684 This PR attempts to address #42560 by capturing the appropriate exception_ptr in the autograd engine and passing it over to the Future. As part of this change, there is a significant change to the Future API, where we now only accept an exception_ptr as part of setError. For the example in #42560, the exception trace would now look like:
```
> Traceback (most recent call last):
>   File "test_autograd.py", line 6914, in test_preserve_backtrace
>     Foo.apply(t).sum().backward()
>   File "torch/tensor.py", line 214, in backward
>     torch.autograd.backward(self, gradient, retain_graph, create_graph)
>   File "torch/autograd/__init__.py", line 127, in backward
>     allow_unreachable=True)  # allow_unreachable flag
>   File "torch/autograd/function.py", line 87, in apply
>     return self._forward_cls.backward(self, *args)
>   File "test_autograd.py", line 6910, in backward
>     raise ValueError("something")
> ValueError: something
```
ghstack-source-id: 111109637 Test Plan: waitforbuildbot Reviewed By: albanD Differential Revision: D23365408 fbshipit-source-id: 1470c4776ec8053ea92a6ee1663460a3bae6edc5	2020-09-01 01:28:47 -07:00
Ralf Gommers	4c19a1e350	Move torch/autograd/grad_mode.pyi stubs inline (#43415 ) Summary: - Add `torch._C` bindings from `torch/csrc/autograd/init.cpp` - Renamed `torch._C.set_grad_enabled` to `torch._C._set_grad_enabled` so it doesn't conflict with torch.set_grad_enabled anymore This is a continuation of gh-38201. All I did was resolve merge conflicts and finish the annotation of `_DecoratorContextManager.__call__` that ezyang started in the first commit. ~Reverts commit `b5cd3a80bb`, which was only motivated by not having `typing_extensions` available.~ (JIT can't be made to understand `Literal[False]`, so keep as is). Pull Request resolved: https://github.com/pytorch/pytorch/pull/43415 Reviewed By: ngimel Differential Revision: D23301168 Pulled By: malfet fbshipit-source-id: cb5290f2e556b4036592655b9fe54564cbb036f6	2020-08-31 16:14:41 -07:00
mfkasim91	576880febf	Print all traceback for nested backwards in detect_anomaly (#43626 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/43405. This pull request adds a feature of printing all tracebacks if a `detect_anomaly` mode detects `nan` in nested backward operations. The way I did it is by assigning a node as a parent to all nodes it produces during its backward calculation. Then if one of the children produces `nan`, it will print the traceback from the parent and grand parents (if any). The parent is assigned in `parent_node_` member in `Node` class which is accessible in C++ by function `node->parent()` and in Python by `node.parent_function`. A node has a parent iff: 1. it is created from a backward operation, and 2. created when anomaly mode and grad mode are both enabled. An example of this feature: import torch def example(): x = torch.tensor(1.0, requires_grad=True) y = torch.tensor(1e-8, requires_grad=True) # small to induce nan in n-th backward a = x * y b = x * y z1 = a / b # can produce nan in n-th backward as long as https://github.com/pytorch/pytorch/issues/43414 is unsolved z = z1 * z1 gy , = torch.autograd.grad( z , (y,), create_graph=True) gy2, = torch.autograd.grad(gy , (y,), create_graph=True) gy3, = torch.autograd.grad(gy2, (y,), create_graph=True) gy4, = torch.autograd.grad(gy3, (y,), create_graph=True) return gy4 with torch.autograd.detect_anomaly(): gy4 = example() with output: example.py:16: UserWarning: Anomaly Detection has been enabled. This mode will increase the runtime and should only be enabled for debugging. with torch.autograd.detect_anomaly(): /home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py:190: UserWarning: Error detected in DivBackward0. 
Traceback of forward call that caused the error: File "example.py", line 17, in <module> gy4 = example() File "example.py", line 12, in example gy3, = torch.autograd.grad(gy2, (y,), create_graph=True) File "/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 190, in grad return Variable._execution_engine.run_backward( (Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:61.) return Variable._execution_engine.run_backward( /home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py:190: UserWarning: Traceback of forward call that induces the previous calculation: File "example.py", line 17, in <module> gy4 = example() File "example.py", line 11, in example gy2, = torch.autograd.grad(gy , (y,), create_graph=True) File "/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 190, in grad return Variable._execution_engine.run_backward( (Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:65.) return Variable._execution_engine.run_backward( /home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py:190: UserWarning: Traceback of forward call that induces the previous calculation: File "example.py", line 17, in <module> gy4 = example() File "example.py", line 8, in example z1 = a / b # can produce nan in n-th backward as long as https://github.com/pytorch/pytorch/issues/43414 is unsolved (Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:65.) 
return Variable._execution_engine.run_backward( Traceback (most recent call last): File "example.py", line 17, in <module> gy4 = example() File "example.py", line 13, in example gy4, = torch.autograd.grad(gy3, (y,), create_graph=True) File "/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 190, in grad return Variable._execution_engine.run_backward( RuntimeError: Function 'DivBackward0' returned nan values in its 1th output. cc & thanks to albanD Pull Request resolved: https://github.com/pytorch/pytorch/pull/43626 Reviewed By: malfet Differential Revision: D23397499 Pulled By: albanD fbshipit-source-id: aa7435ec2a7f0d23a7a02ab7db751c198faf3b7d	2020-08-31 08:23:07 -07:00
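The parent-chain idea from #43626 can be sketched without PyTorch. The `Node` class and `anomaly_report` function below are hypothetical stand-ins: each node created during a backward pass remembers the node whose backward produced it, and on an error the report walks from the failing node up through its parents and grandparents.

```python
from typing import List, Optional


class Node:
    """Hypothetical autograd node that records where it was created and its parent."""

    def __init__(self, name: str, created_at: str, parent: Optional["Node"] = None):
        self.name = name
        self.created_at = created_at
        self.parent = parent


def anomaly_report(failing: Node) -> List[str]:
    # Start with the node that produced nan, then follow the parent links upward.
    lines = [f"Error detected in {failing.name}; forward call: {failing.created_at}"]
    ancestor = failing.parent
    while ancestor is not None:
        lines.append(f"Induced by {ancestor.name}; forward call: {ancestor.created_at}")
        ancestor = ancestor.parent
    return lines


# Made-up chain: a forward division, then nodes created by two nested backward calls.
forward_div = Node("DivBackward0", "example.py line 8")
second_order = Node("DivBackward0", "example.py line 11", parent=forward_div)
third_order = Node("DivBackward0", "example.py line 12", parent=second_order)

report = anomaly_report(third_order)
```

The report has one entry per level of nesting, mirroring the sequence of "Traceback of forward call that induces the previous calculation" warnings in the output above.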
Ashkan Aliabadi	4e39c310eb	Move torch/csrc/utils/hash.h to c10/util/hash.h. (#42503 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42503 Test Plan: Imported from OSS Reviewed By: IvanKobzarev Differential Revision: D23252331 Pulled By: AshkanAliabadi fbshipit-source-id: 3c4c0e27b9a7eec8560e374c2a3ba5f1c65dae48	2020-08-29 17:47:00 -07:00
Pritam Damania	931b8b4ac8	Use ivalue::Future in autograd engine and DistEngine. (#43676 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43676 This is one part of https://github.com/pytorch/pytorch/issues/41574 to ensure we consolidate everything around ivalue::Future. I've removed the use of torch/csrc/utils/future.h from the autograd engines and used ivalue::Future instead. ghstack-source-id: 110895545 Test Plan: waitforbuildbot. Reviewed By: albanD Differential Revision: D23362415 fbshipit-source-id: aa109b3f8acf0814d59fc5264a85a8c27ef4bdb6	2020-08-29 02:15:26 -07:00
Iurii Zdebskyi	4cb8d306e6	Add _foreach_add_(TensorList tensors, Scalar scalar) API (#42531 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42531 [First PR: Add private API to support tensor lists: _foreach_add(TensorList tensors, Scalar scalar)](https://github.com/pytorch/pytorch/pull/41554). Motivation [GitHub issue](https://github.com/pytorch/pytorch/issues/38655) Current PyTorch optimizer implementations are not efficient in cases when we work with a lot of small feature tensors. Starting a lot of kernels slows down the whole process. We need to reduce the number of kernels that we start. As an example, we should be looking at [NVIDIA's Apex](https://github.com/NVIDIA/apex). In order to track progress, we will pick PyTorch's DCGAN model with the Adam optimizer and, once the optimizer is reimplemented with tensor lists, benchmark the model performance against the original model version, Apex's version with the original Adam optimizer, and its FusedAdam optimizer. Current API restrictions - List can't be empty (will be fixed in upcoming PRs). - All tensors in the list must have the same dtype, device and size. Broadcasting At this point we don't support broadcasting. What is 'Fast' and 'Slow' route In particular cases, we can't process an op with a fast list CUDA kernel. Still, we can do with a regular for-loop where the op will be applied to each tensor individually through the dispatch mechanisms. There are a few checks that decide whether the op will be performed via a 'fast' or 'slow' path. To go the fast route, - All tensors must have strided layout - All tensors must be dense and not have overlapping memory - The resulting tensor type must be the same. --------------- In this PR - Adding a `std::vector<Tensor> _foreach_add_(TensorList tensors, Scalar scalar)` API - Resolving some additional comments from previous [PR](https://github.com/pytorch/pytorch/pull/41554). Tests Tested via unit tests TODO 1. Properly handle empty lists Plan for the next PRs 1. 
APIs - Binary Ops for list with Scalar - Binary Ops for list with list - Unary Ops for list - Pointwise Ops 2. Complete tasks from TODO 3. Rewrite PyTorch optimizers to use for-each operators for performance gains. Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D23331892 Pulled By: izdeby fbshipit-source-id: c585b72e1e87f6f273f904f75445618915665c4c	2020-08-28 14:34:46 -07:00
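The 'fast' versus 'slow' route decision described in the commit above can be sketched in plain Python. This is only an illustration under stated assumptions: `TensorInfo`, `can_use_fast_route` and `foreach_add_sketch` are hypothetical names, not PyTorch APIs, and "the resulting tensor type must be the same" is read here as "the result dtype matches each input dtype".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TensorInfo:
    """Hypothetical stand-in for the tensor properties the checks inspect."""
    layout: str          # e.g. "strided" or "sparse"
    dense: bool          # True if memory is dense and non-overlapping
    dtype: str           # e.g. "float32"
    values: List[float]  # toy payload instead of real tensor storage


def can_use_fast_route(tensors: List[TensorInfo], result_dtype: str) -> bool:
    # Mirrors the three conditions listed in the commit message.
    return all(
        t.layout == "strided" and t.dense and t.dtype == result_dtype
        for t in tensors
    )


def foreach_add_sketch(tensors: List[TensorInfo], scalar: float,
                       result_dtype: str) -> str:
    """Add `scalar` to every tensor in place and report the route taken."""
    if can_use_fast_route(tensors, result_dtype):
        route = "fast"  # would correspond to one list kernel launch
    else:
        route = "slow"  # would correspond to one dispatched op per tensor
    # In this toy model both routes compute the same result; only the
    # number of kernel launches would differ in the real implementation.
    for t in tensors:
        t.values[:] = [v + scalar for v in t.values]
    return route
```

Both routes give the same numerical result; the checks only decide how the work would be launched.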
Haoran Li	f35e069622	Back out "Make grad point to bucket buffer in DDP to save memory usage" (#43557 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43557 backout the diff that caused some errors in pytext distributed training Test Plan: Tested by rayhou who verified reverting the diff works Differential Revision: D23320238 fbshipit-source-id: caa0fe74404059e336cd95fdb41373f58ecf486e	2020-08-25 18:04:39 -07:00
rakshithvasudev	0cb52cb458	Autograd better error (#43308 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/5025 Thanks for the conversation in the issue thread. Hopefully this fixes it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/43308 Reviewed By: ezyang Differential Revision: D23241918 Pulled By: suraj813 fbshipit-source-id: e1efac13f5ce590196f227149f011c973c2bbdde	2020-08-21 05:50:33 -07:00
Yanli Zhao	97d594b9f7	Make grad point to bucket buffer in DDP to save memory usage (#41954 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41954 Make both variable.grad() and grad in distautograd context point to bucket buffer in DDP to save memory usage. In this case, grad will be a view of bucket buffer tensors; in order to make it compatible with optimizer.zero_grad(), we made changes in https://github.com/pytorch/pytorch/pull/41283. Also note that we cannot make variable.grad() point to the bucket buffer during construction time, because we want to keep grad undefined for unused parameters. ghstack-source-id: 110260297 Test Plan: unit tests, For roberta_base model with ~1GB parameters, peak memory dropped ~1GB (8250MB-7183MB). Per iteration latency (0.982s ->0.909s), 8% speed up https://www.internalfb.com/intern/fblearner/details/211713882?tab=operator_details https://www.internalfb.com/intern/fblearner/details/211772923?tab=operator_details For resnet model with ~97M parameters, peak memory dropped ~100MB (3089MB -> 2988MB). Per iteration latency has no change (0.122s -> 0.123s) https://www.internalfb.com/intern/fblearner/details/211713577?tab=operator_details https://www.internalfb.com/intern/fblearner/details/211712582?tab=operator_details accuracy benchmark is expected as well https://www.internalfb.com/intern/fblearner/details/213237067?tab=Outputs Reviewed By: mrshenli Differential Revision: D22707857 fbshipit-source-id: b5e767cfb34ccb3d067db2735482a86d59aea7a4	2020-08-20 15:33:44 -07:00
Pritam Damania	133e9f96e1	Use c10 threadpool for GPU to CPU distributed autograd continuations. (#42511 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42511 DistEngine currently only has a single thread to execute GPU to CPU continuations as part of the backward pass. This would be a significant performance bottleneck in cases where we have such continuations and would like to execute these using all CPU cores. To alleviate this in this PR, we have the single thread in DistEngine only dequeue work from the global queue, but then hand off execution of that work to the c10 threadpool where we call "execute_graph_task_until_ready_queue_empty". For more context please see: https://github.com/pytorch/pytorch/issues/40255#issuecomment-663298062. ghstack-source-id: 109997718 Test Plan: waitforbuildbot Reviewed By: albanD Differential Revision: D22917579 fbshipit-source-id: c634b6c97f3051f071fd7b994333e6ecb8c54155	2020-08-17 15:04:19 -07:00
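The hand-off pattern in the commit above (a single thread that only dequeues from a global queue and passes execution to a thread pool) can be sketched with the Python standard library. This is an analogy, not the DistEngine code: `run_continuations` and its parameters are hypothetical, and `ThreadPoolExecutor` stands in for the c10 threadpool.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def run_continuations(tasks, num_workers=4):
    """Run zero-argument callables; return their results in sorted order."""
    global_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def execute(task):
        # Runs on a pool worker, so many tasks can execute concurrently.
        value = task()
        with results_lock:
            results.append(value)

    def dispatcher(pool):
        # The single dispatcher thread only dequeues and hands work off;
        # it never executes a task itself.
        while True:
            task = global_queue.get()
            if task is None:  # sentinel: no more work
                break
            pool.submit(execute, task)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        thread = threading.Thread(target=dispatcher, args=(pool,))
        thread.start()
        for task in tasks:
            global_queue.put(task)
        global_queue.put(None)
        thread.join()
    # Leaving the with-block waits for all submitted work to finish.
    return sorted(results)
```

Because the dispatcher never runs a task, a slow continuation no longer blocks the ones queued behind it.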
Sebastian Messmer	20e0e54dbe	Allow Tensor& in the unboxing logic (#42712 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42712 Previously, operators taking Tensor& as arguments or returning it couldn't be c10-full because the unboxing logic didn't support it. This adds temporary support for that. We're planning to remove this again later, but for now we need it to make those ops c10-full. See https://docs.google.com/document/d/19thMVO10yMZA_dQRoB7H9nTPw_ldLjUADGjpvDmH0TQ for the full plan. This PR also makes some ops c10-full that now can be. ghstack-source-id: 109693706 Test Plan: unit tests Reviewed By: bhosmer Differential Revision: D22989242 fbshipit-source-id: 1bd97e5fa2b90b0860784da4eb772660ca2db5a3	2020-08-12 17:33:23 -07:00
Richard Zou	bda0007620	Improve calling backward() and grad() inside vmap error messages (#42876 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42876 Previously, the error messages were pretty bad. This PR adds nice error messages for the following cases: - user attempts to call .backward() inside vmap for any reason whatsoever - user attempts to call autograd.grad(outputs, inputs, grad_outputs), where outputs or inputs is being vmapped over (so they are BatchedTensors). The case we do support is calling autograd.grad(outputs, inputs, grad_outputs) where `grad_outputs` is being vmapped over. This is the case for batched gradient support (e.g., user passes in a batched grad_output). Test Plan: - new tests: `pytest test/test_vmap.py -v` Reviewed By: ezyang Differential Revision: D23059836 Pulled By: zou3519 fbshipit-source-id: 2fd4e3fd93f558e67e2f0941b18f0d00d8ab439f	2020-08-12 10:05:31 -07:00
Hameer Abbasi	75a15d3d01	Follow-up for pytorch/pytorch#37091. (#42806 ) Summary: This is a follow-up PR for https://github.com/pytorch/pytorch/issues/37091, fixing some of the quirks of that PR as that one was landed early to avoid merge conflicts. This PR addresses the following action items: - [x] Use error-handling macros instead of a `try`-`catch`. - [x] Renamed and added comments to clarify the use of `HANDLED_FUNCTIONS_WRAPPERS` in tests. `HANDLED_FUNCTIONS_NAMESPACES` was already removed in the last PR as we had a way to test for methods. This PR does NOT address the following action item, as it proved to be difficult: - [ ] Define `__module__` for whole API. Single-line repro-er for why this is hard: ```python >>> torch.Tensor.grad.__get__.__module__ = "torch.Tensor.grad" Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'method-wrapper' object has no attribute '__module__' ``` Explanation: Methods defined in C/properties don't always have a `__dict__` attribute or a mutable `__module__` slot for us to modify. The documentation action items were addressed in the following commit, with the additional future task of adding the rendered RFCs to the documentation: `552ba37c05` Pull Request resolved: https://github.com/pytorch/pytorch/pull/42806 Reviewed By: smessmer Differential Revision: D23031501 Pulled By: ezyang fbshipit-source-id: b781c97f7840b8838ede50a0017b4327f96bc98a	2020-08-12 09:11:33 -07:00
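The `__module__` limitation quoted in the commit above is not specific to PyTorch; it can be reproduced with a built-in method-wrapper in plain Python. In this minimal sketch, `try_set_module` and `plain_function` are hypothetical helper names introduced only for the illustration.

```python
def try_set_module(obj, name):
    """Return True if obj.__module__ could be assigned, False otherwise."""
    try:
        obj.__module__ = name
    except AttributeError:
        # Objects implemented in C without a __dict__ or a writable
        # __module__ slot end up here.
        return False
    return True


# A bound slot method of a built-in type is a 'method-wrapper' object.
wrapper = (1).__add__
wrapper_kind = type(wrapper).__name__


def plain_function():
    pass


wrapper_ok = try_set_module(wrapper, "some.module")       # not assignable
function_ok = try_set_module(plain_function, "some.module")  # assignable
```

Ordinary Python functions accept the assignment, which is why the difficulty only appears for methods and properties defined in C.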
Peter Bell	2878efb35d	Use `C10_API_ENUM` to fix invalid attribute warnings (#42464 ) Summary: Using the macro added in https://github.com/pytorch/pytorch/issues/38988 to fix more attribute warnings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/42464 Reviewed By: malfet Differential Revision: D22916943 Pulled By: ezyang fbshipit-source-id: ab9ca8755cd8b89aaf7f8718b4107b4b94d95005	2020-08-12 09:02:49 -07:00
Heitor Schueroff de Souza	ffc3da35f4	Don't materialize output grads (#41821 ) Summary: Added a new option in AutogradContext to tell autograd to not materialize output grad tensors, that is, don't expand undefined/None tensors into tensors full of zeros before passing them as input to the backward function. This PR is the second part that closes https://github.com/pytorch/pytorch/issues/41359. The first PR is https://github.com/pytorch/pytorch/pull/41490. Pull Request resolved: https://github.com/pytorch/pytorch/pull/41821 Reviewed By: albanD Differential Revision: D22693163 Pulled By: heitorschueroff fbshipit-source-id: a8d060405a17ab1280a8506a06a2bbd85cb86461	2020-08-11 04:27:07 -07:00
Mike Ruberry	9c8021c0b1	Adds torch.linalg namespace (#42664 ) Summary: This PR adds the `torch.linalg` namespace as part of our continued effort to be more compatible with NumPy. The namespace is tested by adding a single function, `torch.linalg.outer`, and testing it in a new test suite, test_linalg.py. It follows the same pattern that https://github.com/pytorch/pytorch/pull/41911, which added the `torch.fft` namespace, did. Future PRs will likely: - add more functions to torch.linalg - expand the testing done in test_linalg.py, including legacy functions, like torch.ger - deprecate existing linalg functions outside of `torch.linalg` in preference to the new namespace Pull Request resolved: https://github.com/pytorch/pytorch/pull/42664 Reviewed By: ngimel Differential Revision: D22991019 Pulled By: mruberry fbshipit-source-id: 39258d9b116a916817b3588f160b141f956e5d0b	2020-08-07 10:18:30 -07:00
Ilia Cherniavskii	f9a6c14364	Fix sequence numbers in profiler output (#42565 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42565 After recent changes to the record function we record more ranges in profiler output and also keep emitting sequence numbers for all ranges. Sequence numbers are used by external tools to correlate forward and autograd ranges and with many ranges having the same sequence number it becomes impossible to do this. This PR ensures that we set sequence numbers only for the top-level ranges and only in case when autograd is enabled. Test Plan: nvprof -fo trace.nvvp --profile-from-start off python test_script.py test_script https://gist.github.com/ilia-cher/2baffdd98951ee2a5f2da56a04fe15d0 then examining ranges in nvvp Reviewed By: ngimel Differential Revision: D22938828 Pulled By: ilia-cher fbshipit-source-id: 9a5a076706a6043dfa669375da916a1708d12c19	2020-08-06 19:12:05 -07:00
Ilia Cherniavskii	a53fdaa23f	Remove ProfiledType (#42570 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42570 ProfiledType doesn't do anything and is not used atm, removing Test Plan: CI Reviewed By: ezyang Differential Revision: D22938664 Pulled By: ilia-cher fbshipit-source-id: 037c512938028f44258b702bbcde3f8c144f4aa0	2020-08-06 01:52:08 -07:00
Mike Ruberry	ccfce9d4a9	Adds fft namespace (#41911 ) Summary: This PR creates a new namespace, torch.fft (torch::fft) and puts a single function, fft, in it. This function is a simplified version of NumPy's [numpy.fft.fft](https://numpy.org/doc/1.18/reference/generated/numpy.fft.fft.html?highlight=fft#numpy.fft.fft) that accepts no optional arguments. It is intended to demonstrate how to add and document functions in the namespace, and is not intended to deprecate the existing torch.fft function. Adding this namespace was complicated by the existence of the torch.fft function in Python. Creating a torch.fft Python module makes this name ambiguous: does it refer to a function or module? If the JIT didn't exist, a solution to this problem would have been to make torch.fft refer to a callable class that mimicked both the function and module. The JIT, however, cannot understand this pattern. As a workaround it's required to explicitly `import torch.fft` to access the torch.fft.fft function in Python: ``` import torch.fft t = torch.randn(128, dtype=torch.cdouble) torch.fft.fft(t) ``` See https://github.com/pytorch/pytorch/issues/42175 for future work. Another possible future PR is to get the JIT to understand torch.fft as a callable class so it need not be imported explicitly to be used. Pull Request resolved: https://github.com/pytorch/pytorch/pull/41911 Reviewed By: glaringlee Differential Revision: D22941894 Pulled By: mruberry fbshipit-source-id: c8e0b44cbe90d21e998ca3832cf3a533f28dbe8d	2020-08-06 00:20:50 -07:00
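The "callable class that mimicked both the function and module" idea mentioned in the commit above can be illustrated in plain Python. This is a toy sketch of the pattern the PR decided against, not PyTorch code: `CallableNamespace`, `toy_fft` and `_toy_transform` are hypothetical names, and the transform just reverses a list instead of computing an FFT.

```python
import types


class CallableNamespace(types.ModuleType):
    """A module object that can also be called like a function."""

    def __call__(self, values):
        # Calling the namespace forwards to its `fft` attribute, so both
        # `ns(values)` and `ns.fft(values)` work.
        return self.fft(values)


def _toy_transform(values):
    # Placeholder computation; a real namespace would hold the actual op.
    return list(reversed(values))


ns = CallableNamespace("toy_fft")
ns.fft = _toy_transform

as_function = ns([1, 2, 3])      # function-style access
as_module = ns.fft([1, 2, 3])    # module-style access
```

As the commit notes, the JIT cannot understand this pattern, which is why an explicit `import torch.fft` was required instead.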
Hameer Abbasi	3d46e02ea1	Add __torch_function__ for methods (#37091 ) Summary: According to pytorch/rfcs#3 From the goals in the RFC: 1. Support subclassing `torch.Tensor` in Python (done here) 2. Preserve `torch.Tensor` subclasses when calling `torch` functions on them (done here) 3. Use the PyTorch API with `torch.Tensor`-like objects that are _not_ `torch.Tensor` subclasses (done in https://github.com/pytorch/pytorch/issues/30730) 4. Preserve `torch.Tensor` subclasses when calling `torch.Tensor` methods. (done here) 5. Propagating subclass instances correctly also with operators, using views/slices/indexing/etc. (done here) 6. Preserve subclass attributes when using methods or views/slices/indexing. (done here) 7. A way to insert code that operates on both functions and methods uniformly (so we can write a single function that overrides all operators). (done here) 8. The ability to give external libraries a way to also define functions/methods that follow the `__torch_function__` protocol. (will be addressed in a separate PR) This PR makes the following changes: 1. Adds the `self` argument to the arg parser. 2. Dispatches on `self` as well if `self` is not `nullptr`. 3. Adds a `torch._C.DisableTorchFunction` context manager to disable `__torch_function__`. 4. Adds a `torch::torch_function_enabled()` and `torch._C._torch_function_enabled()` to check the state of `__torch_function__`. 5. Dispatches all `torch._C.TensorBase` and `torch.Tensor` methods via `__torch_function__`. TODO: - [x] Sequence Methods - [x] Docs - [x] Tests Closes https://github.com/pytorch/pytorch/issues/28361 Benchmarks in https://github.com/pytorch/pytorch/pull/37091#issuecomment-633657778 Pull Request resolved: https://github.com/pytorch/pytorch/pull/37091 Reviewed By: ngimel Differential Revision: D22765678 Pulled By: ezyang fbshipit-source-id: 53f8aa17ddb8b1108c0997f6a7aa13cb5be73de0	2020-08-05 20:44:13 -07:00
Ailing	dae94ed022	Keep manual_kernel_registration only effective in aten codegen. (#42386 ) Summary: This PR removes manual registration in aten/native codebase. And it separates manual device/catchall kernel registration from manual VariableType kernel registration. The first one remains as manual_kernel_registration in native_functions.yaml. The second one is moved to tools/ codegen. Difference in generated TypeDefault.cpp: https://gist.github.com/ailzhang/897ef9fdf0c834279cd358febba07734 No difference in generated VariableType_X.cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/42386 Reviewed By: agolynski Differential Revision: D22915649 Pulled By: ailzhang fbshipit-source-id: ce93784b9b081234f05f3343e8de3c7a704a5783	2020-08-05 10:31:35 -07:00
Sebastian Messmer	1542c41a67	Change C++ frontend to take optional<Tensor> arguments (#41947 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41947 Previously, if an op took an optional `Tensor?` argument, the C++ frontend (i.e. `at::op()` and `Tensor::op()`) were generated to take `Tensor`. A previous PR (https://github.com/pytorch/pytorch/pull/41610) changed the kernels to be written with `c10::optional<Tensor>` instead of `Tensor`, but that did not touch the C++ frontend yet. This PR changes the C++ frontend API to take `c10::optional<Tensor>` instead of `Tensor` as well. This should be mostly bc conserving. Since `Tensor` implicitly converts to `c10::optional<Tensor>`, any old code calling an op with a `Tensor` would still work. There are likely corner cases that get broken though. For example, C++ only ever does one implicit conversion. So if you call an op with a non-tensor object that gets implicitly converted to a `Tensor`, then that previously worked since the API took a `Tensor` and C++ allows one implicit conversion. Now it wouldn't work anymore because it would require two implicit conversions (to `Tensor` and then to `c10::optional<Tensor>`) and C++ doesn't do that. The main reasons for doing this are - Make the C++ API more sane. Those arguments are optional and that should be visible from the signature. - Allow easier integration for XLA and Autocast. Those backends generate code to wrap operators and forward operator arguments to calls to at::op(). After https://github.com/pytorch/pytorch/pull/41610, there was a mismatch because they had to implement operators with `optional<Tensor>` but call `at::op()` with `Tensor`, so they had to manually convert between those. After this PR, they can just forward the `optional<Tensor>` in their call to `at::op()`. 
ghstack-source-id: 108873705 Test Plan: unit tests Reviewed By: bhosmer Differential Revision: D22704832 fbshipit-source-id: f4c00d457b178fbc124be9e884a538a3653aae1f	2020-07-31 16:11:55 -07:00
Sebastian Messmer	3a19af2427	Make operators with optional Tensor? arguments c10-full (#41610 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41610 Previously, operators that have a `Tensor?` (i.e. optional tensor) in their schema implemented it using `Tensor` in C++ and filled in an undefined tensor for the None case. The c10 operator library, however, expects `Tensor?` to be represented as `optional<Tensor>`, so those operators couldn't be c10-full yet and still had to use codegenerated unboxing instead of templated unboxing. This PR changes that. It extends the `hacky_wrapper_for_legacy_signatures` to not only take care of TensorOptions, but now also map between signatures taking `Tensor` and `optional<Tensor>`. For this, it requires an additional template parameter, the expected signature, and it uses that to go argument-by-argument and unwrap any optionals it finds. ghstack-source-id: 108873701 Test Plan: waitforsandcastle Reviewed By: bhosmer Differential Revision: D22607879 fbshipit-source-id: 57b2fb01a294b804f82cd55cd70f0ef4a478e14f	2020-07-31 16:09:08 -07:00
Wojciech Baranowski	48569cc330	Reland split (#41567 ) Summary: Take 3 Pull Request resolved: https://github.com/pytorch/pytorch/pull/41567 Reviewed By: zou3519 Differential Revision: D22586331 Pulled By: albanD fbshipit-source-id: ca08199da716d64a335455610edbce752fee224b	2020-07-21 08:06:27 -07:00
Ilia Cherniavskii	e7a09b4d17	RecordFunction in Dispatcher (#37587 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37587 Lifting RecordFunction up into the dispatcher code Test Plan: Imported from OSS Differential Revision: D21374246 fbshipit-source-id: 19f9c1719e6fd3990e451c5bbd771121e91128f7	2020-07-17 22:20:05 -07:00
Heitor Schueroff de Souza	cf811d2fb3	retain undefined tensors in backward pass (#41490 ) Summary: Leave undefined tensors / None returned from custom backward functions as undefined/None instead of creating a tensor full of zeros. This change improves performance in some cases. This is BC-Breaking: Custom backward functions that return None will now see it potentially being propagated all the way up to AccumulateGrad nodes. Potential impact is that the .grad field of leaf tensors as well as the result of autograd.grad may be undefined/None where it used to be a tensor full of zeros. Also, autograd.grad may raise an error; if so, consider using allow_unused=True ([see doc](https://pytorch.org/docs/stable/autograd.html?highlight=autograd%20grad#torch.autograd.grad)) if it applies to your case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/41490 Reviewed By: albanD Differential Revision: D22578241 Pulled By: heitorschueroff fbshipit-source-id: f4966f4cb520069294f8c5c1691eeea799cc0abe	2020-07-17 12:42:50 -07:00
Rohan Varma	3c862c80cf	Move list size constants for profiler::Event and profiler::ProfilerConfig into (#40474 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40474 These constants are unnecessary since there is an enum, and we can add the size at the end of the enum and it will be equal to the list size. I believe that this is the typical pattern used to represent enum sizes. ghstack-source-id: 107969012 Test Plan: CI Reviewed By: ezyang Differential Revision: D22147754 fbshipit-source-id: 7064a897a07f9104da5953c2f87b58179df8ea84	2020-07-17 12:00:18 -07:00
Alban Desmaison	b1d4e33c8b	Revert D22552377: [pytorch][PR] Reland split unsafe version Test Plan: revert-hammer Differential Revision: D22552377 (`5bba973afd`) Original commit changeset: 1d1b713d2429 fbshipit-source-id: 8194458f99bfd5f077b7daa46ca3e81b549adc1b	2020-07-16 15:24:19 -07:00
albanD	45c5bac870	[WIP] Fix cpp grad accessor API (#40887 ) Summary: Update the API to access grad in cpp to avoid unexpected thread safety issues. In particular, with the current API, a check like `t.grad().defined()` is not thread safe. - This introduces `t.mutable_grad()` that should be used when getting a mutable version of the saved gradient. This function is not thread safe. - The `Tensor& grad()` API is now removed. We could not do a deprecation cycle as most of our call sites use non-const Tensors that use the non-const overload. This would lead to most calls hitting the warning. This would be too verbose for all the users. Pull Request resolved: https://github.com/pytorch/pytorch/pull/40887 Reviewed By: ezyang Differential Revision: D22343932 Pulled By: albanD fbshipit-source-id: d5eb909bb743bc20caaf2098196e18ca4110c5d2	2020-07-16 09:11:12 -07:00

1042 Commits