Fixes https://github.com/pytorch/pytorch/issues/96887
We error out in BOTH cases: when the graph is created and when it is not.
This is still BC-breaking, but not as severe, because we limit the change to the case where someone uses `setup_context`.
This makes the `setup_context` and non-`setup_context` versions diverge in their behavior (a sketch of the `setup_context` style is shown below):
- With the non-`setup_context` version, saved variables are assumed to have the grad_fn of the inputs.
- With the `setup_context` version, we now produce an error for this case.
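For reference, a minimal sketch of the `setup_context` style of `torch.autograd.Function` that this change concerns; the operation and names are illustrative, not the exact case from the linked issue:
```py
import torch

class Square(torch.autograd.Function):
    # setup_context-style Function: forward takes no ctx ...
    @staticmethod
    def forward(x):
        return x ** 2

    @staticmethod
    def setup_context(ctx, inputs, output):
        # ... and saving tensors for backward happens here instead.
        x, = inputs
        ctx.save_for_backward(x)

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return 2 * x * grad_out

x = torch.randn(3, requires_grad=True)
Square.apply(x).sum().backward()
```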
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97212
Approved by: https://github.com/zou3519
### Introduction
Removing unnecessary weight gradient calculation is very important for applications that need higher-order derivatives during training. However, this is not supported by the current autograd engine.
In more detail: the backward function of a `matmul` operator (e.g., `linear`, `addmm`, `mm`) performs two matmuls, one for the `input gradient` and another for the `weight gradient`. For a typical neural network with a few linear layers and activation functions, if the user calls `torch.autograd.grad()` to compute the derivative of the network output `y` w.r.t. the network input `x`, only the `input gradient` of each `matmul` is needed and the `weight gradient` is discarded. However, the current PyTorch autograd engine always calculates the `weight gradient` if `weight` requires grad (which is the case when higher-order derivatives are computed during training).
The figure attached shows the autograd graph of the following code snippet:
```py
y = torch.nn.functional.linear(x, weight, bias)
y = y.pow(2)
# first order derivative
y__x, = torch.autograd.grad(y, x, grad_outputs=grad_outputs, create_graph=True)
# second order derivative
y__x__x, = torch.autograd.grad(y__x, x, grad_outputs=grad_outputs, create_graph=True)
```
The path with ❌ is not needed when calculating derivatives.
<img width="50%" alt="image" src="https://user-images.githubusercontent.com/9999318/182018117-719c5a23-bcc6-4a63-8e8d-1bca3ebda2e3.png">
### Issue
Related issue: https://github.com/pytorch/pytorch/issues/56500
### Method
When calling `torch.autograd.grad`, `exec_info_` is created for each GraphTask, which allows filtering out paths on the graph that are not needed. However, when the GraphTask calls into a node, the node still does not know whether its edges are needed or not. In the case of matmul, `weight.requires_grad` is `True`, so the weight gradient is always calculated.
Following https://github.com/pytorch/pytorch/issues/56500#issuecomment-825694656, this PR passes the graph task's thread-local `exec_info_` into the node, so that it can trim unnecessary edges during `torch.autograd.grad` calls.
### Benchmark
Benchmark script: https://gist.github.com/yueyericardo/24158433a2021c51eeef9c3e2722df99
Benchmark result:
6 hidden layers, batch size 10000, on A100
FP32 result
| hessian benchmark | FP32 (before) | FP32 (After) | FP32 (Functorch v0.1.1) |
| ----------------------------- | ------------- | ----------------- | ----------------------- |
| Linear + ReLU (no backward) | 55.658 ms | 29.392 ms (1.90X) | 29.547 ms (1.90X) |
| Linear + ReLU (with backward) | 81.173 ms | 54.917 ms (1.47X) | 68.988 ms (1.18X) |
TF32 result
| hessian benchmark | TF32 (before) | TF32 (after) | TF32 (Functorch v0.1.1) |
| ----------------------------- | ------------- | ----------------- | ----------------------- |
| Linear + ReLU (no backward) | 19.801 ms | 11.259 ms (1.76X) | 10.754 ms (1.84X) |
| Linear + ReLU (with backward) | 29.167 ms | 20.466 ms (1.42X) | 22.784 ms (1.28X) |
For FP32, we get a 1.9X speedup for the Hessian calculation and a 1.47X speedup during training, which is even faster than the functorch `vmap(jacfwd(jacrev(...)))` implementation. (functorch has a performance regression in v0.2.0, https://github.com/pytorch/functorch/issues/989, so we are using v0.1.1 for the benchmark.)
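For context, a rough sketch of the functorch per-sample Hessian pattern referenced above; the actual benchmark code is in the linked gist, and the shapes and names here are illustrative:
```py
import torch
from functorch import vmap, jacrev, jacfwd  # functorch v0.1.1-style API

weight = torch.randn(64, 64)
bias = torch.randn(64)

def f(x):
    # Per-sample scalar function: Linear + square, summed over features.
    return torch.nn.functional.linear(x, weight, bias).pow(2).sum()

x = torch.randn(10000, 64)
# Per-sample Hessian of f w.r.t. x, vectorized over the batch with vmap.
hess = vmap(jacfwd(jacrev(f)))(x)  # shape: (10000, 64, 64)
```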
@zou3519 does functorch also include similar optimizations during Hessian calculation? If not, what do we need to do so that functorch can also benefit from this PR?
### Testing
- [x] We need to figure out a way to unit test this.
### Thanks
Thanks for the great blog: [How Computational Graphs are Executed in PyTorch | PyTorch](https://pytorch.org/blog/how-computational-graphs-are-executed-in-pytorch/)
cc @zasdfgbnm @albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82544
Approved by: https://github.com/soulitzer
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72008
Fixes #71119
Technically BC-breaking: when an input does not require grad, it was previously returned as-is instead of as a view, because it didn't need to be. Now we also return a view in that case (whether or not forward AD runs).
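For context, a minimal, hedged sketch of the custom-Function forward-AD path this stack extends; it does not reproduce the exact aliasing case described above, and the operation and names are illustrative:
```py
import torch
import torch.autograd.forward_ad as fwAD

class Scale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x * 2

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out * 2

    @staticmethod
    def jvp(ctx, x_tangent):
        # Forward-mode rule: the tangent of (x * 2) is x_tangent * 2.
        return x_tangent * 2

x = torch.randn(3)  # note: the primal input does not require grad
with fwAD.dual_level():
    dual_x = fwAD.make_dual(x, torch.ones_like(x))
    out = Scale.apply(dual_x)
    primal, tangent = fwAD.unpack_dual(out)
```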
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33859553
Pulled By: soulitzer
fbshipit-source-id: 81b3fa371f4c0904630878500aa190492c562367
(cherry picked from commit ee74bc8234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71569
Not sure if this is the right API
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33695395
Pulled By: soulitzer
fbshipit-source-id: 652b5758f15d901f98ff0da94e977030c7f3415b
(cherry picked from commit 9421a6846a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71531
Based on the comment above the original internal assert, this is the desired check. Two options were considered:
1. Don't error, and automatically make jvp return a view for that tensor output (this is easier than I originally thought: https://github.com/pytorch/pytorch/pull/71531#discussion_r789211877)
2. Error (what this PR currently does)
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33695399
Pulled By: soulitzer
fbshipit-source-id: dba49890a55ad1dd59ed5c41faa96bf7cfc9e562
(cherry picked from commit fdb0f266f5)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63612
This makes Tensor inherit from a new class TensorBase, which provides a subset of Tensor that doesn't
directly depend on native_functions.yaml. Code that only includes TensorBase.h will thus not need to
be rebuilt every time someone changes an operator signature.
Making `Tensor` inherit from this class means that `const TensorBase&` parameters will be callable
with an ordinary `Tensor`. I've also made `Tensor` constructible and assignable from `TensorBase` to
minimize friction in code mixing the two types.
To help enforce that `Tensor.h` and `Functions.h` aren't accidentally included, I've added an error
into `Operators.h` if `TORCH_ASSERT_NO_OPERATORS` is defined. We can either set this in the build
system for certain folders, or just define it at the top of any file.
I've also included an example of manually special-casing the commonly used `contiguous` operator.
The inline function's slow path defers to `TensorBase::__dispatch_contiguous` which is defined in
`Tensor.cpp`. I've made it so `OptionalTensorRef` is constructible from `TensorBase`, so I can
materialize a `Tensor` for use in dispatch without actually increasing its refcount.
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D30728580
Pulled By: ezyang
fbshipit-source-id: 2cbc8eee08043382ee6904ea8e743b1286921c03
Summary:
Switches most of the simple for loops outside of `jit` directories to use `c10::irange`.
Generated with D28874212.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59481
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D28909681
fbshipit-source-id: ec9ab1bd602933238d9d0f73d4d8d027b75d9d85
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54610
The `.is_view()` method actually only refers to backward-mode views.
This is not a problem right now in master (and thus I didn't revert the other PR) because nothing creates forward AD views yet.
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D27396756
Pulled By: albanD
fbshipit-source-id: 64ff11c6f2486c6430714988d1cf6ecf3d80dccb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52422
As mentioned in https://github.com/pytorch/pytorch/issues/52415,
`torch.utils.checkpoint` doesn't support checkpointing for functions that have
non-tensor inputs and outputs.
This PR resolves the issue by ensuring the autograd machinery ignores the
non-tensor inputs and outputs and processes the tensors accordingly.
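A minimal sketch of the kind of call this enables; the function and its arguments are illustrative, not taken from the PR's tests:
```py
import torch
from torch.utils.checkpoint import checkpoint

def run(x, scale, mode):
    # `scale` (a float) and `mode` (a string) are non-tensor inputs.
    out = x * scale
    return torch.relu(out) if mode == "relu" else out

x = torch.randn(4, 8, requires_grad=True)
y = checkpoint(run, x, 2.0, "relu")
y.sum().backward()
```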
ghstack-source-id: 124406867
Test Plan:
1) unit test
2) waitforbuildbot
Reviewed By: albanD
Differential Revision: D26507228
fbshipit-source-id: 0a5a1591570814176185362e83ad18dabd9c84b0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54103
The goal is to reduce the spread of static casts in the autograd code, as per the comment in https://github.com/pytorch/pytorch/pull/49097#discussion_r543695091
I wasn't sure how to use a virtual method here, but a simple method in impl cleans it up quite nicely.
Test Plan: Imported from OSS
Reviewed By: agolynski
Differential Revision: D27117840
Pulled By: albanD
fbshipit-source-id: 5f277dde34ccf6bc20f76583b906ff3528cde5aa
Summary:
Added a new option in AutogradContext that tells autograd not to materialize output grad tensors, that is, not to expand undefined/None tensors into tensors full of zeros before passing them as input to the backward function.
This PR is the second part that closes https://github.com/pytorch/pytorch/issues/41359. The first PR is https://github.com/pytorch/pytorch/pull/41490.
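For intuition, a sketch using the analogous Python-side API, `ctx.set_materialize_grads`; the Function here is illustrative:
```py
import torch

class TwoOutputs(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.set_materialize_grads(False)
        return x.clone(), x.clone()

    @staticmethod
    def backward(ctx, g0, g1):
        # With materialization disabled, grads for unused outputs arrive as
        # None instead of zero-filled tensors, so handle them explicitly.
        grad = 0
        if g0 is not None:
            grad = grad + g0
        if g1 is not None:
            grad = grad + g1
        return grad

x = torch.randn(3, requires_grad=True)
a, b = TwoOutputs.apply(x)
a.sum().backward()  # only `a` is used, so g1 is None inside backward
```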
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41821
Reviewed By: albanD
Differential Revision: D22693163
Pulled By: heitorschueroff
fbshipit-source-id: a8d060405a17ab1280a8506a06a2bbd85cb86461
Summary:
Update the API for accessing grad in C++ to avoid unexpected thread-safety issues.
In particular, with the current API, a check like `t.grad().defined()` is not thread safe.
- This introduces `t.mutable_grad()`, which should be used when getting a mutable version of the saved gradient. This function is **not** thread safe.
- The `Tensor& grad()` API is now removed. We could not do a deprecation cycle, because most of our call sites use non-const Tensors and would hit the non-const overload, so most calls would trigger the warning, which would be too verbose for users.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40887
Reviewed By: ezyang
Differential Revision: D22343932
Pulled By: albanD
fbshipit-source-id: d5eb909bb743bc20caaf2098196e18ca4110c5d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33896
Fixes #32625. Previously, we'd receive an error message if a custom function returned a view of an input inside a no_grad block:
```
import torch
from torch.autograd import Function

class Alias(Function):
    @staticmethod
    def forward(ctx, x):
        return x[:]

    @staticmethod
    def backward(ctx, gx):
        return gx

inp = torch.rand(2, requires_grad=True)

with torch.no_grad():
    # Used to error out
    output = Alias.apply(inp)
```
After this change, the error no longer happens. The behavior becomes consistent with what we would get if we had implemented an operator that does the same thing as the custom function:
- the output requires grad
- we are able to detect (and error out) if the user tries to modify the output in-place outside of the no_grad block.
Test Plan: - new test
Differential Revision: D20345601
Pulled By: zou3519
fbshipit-source-id: 7f95b4254f52ddbf989d26f449660403bcde1c78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32839
As mentioned in the updated comment in `variable.h`, this disambiguates code like:
```python
var = torch.rand((), requires_grad=True)
base = torch.rand(10, requires_grad=True)
with torch.no_grad():
    view = base[1]
    view.copy_(var)
torch.autograd.grad(base.sum(), var)  # <- what should it return?
```
Given that there is no consensus on what should happen here (does the gradient flow through the view created in the no_grad block or not?), this special case is detected and forbidden.
As mentioned in the error message:
- If you want the copy to be tracked, move both operations out of the no_grad block.
- If you do not want it to be tracked, move both inside the no_grad block.
This implies that any custom Function that returns views does not allow in-place modification of its output. I'll add a PR to the stack to relax this to a DeprecationWarning for now, and we will make it an actual error in 1.6.
This replaces https://github.com/pytorch/pytorch/pull/26607
cc sublee
Test Plan: Imported from OSS
Differential Revision: D19814114
Pulled By: albanD
fbshipit-source-id: ff2c9d97c8f876d9c31773a2170e37b06d88bed7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33069
This PR adds the following (a sketch of compliant usage follows the list):
- Warn when a non-input Tensor is given to `mark_dirty()`, as it is not needed.
- Raise an error if we modify in-place an input that is a view while the Function has multiple outputs. This setting is not handled by `CopySlices` and would otherwise raise a cryptic error during the backward.
- Raise an error if an input is modified in-place but not returned, as that prevents the graph rewrite from being done correctly.
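For reference, a minimal sketch of a custom Function that follows these rules (modify the input in-place, mark it dirty, and return it); the operation is illustrative:
```py
import torch

class InplaceDouble(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        x.mul_(2)
        ctx.mark_dirty(x)  # only inputs modified in-place should be marked
        return x           # the dirtied input must also be returned

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out * 2

# Clone first: in-place ops on leaf tensors that require grad are not allowed.
x = torch.randn(3, requires_grad=True).clone()
y = InplaceDouble.apply(x)
y.sum().backward()
```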
Test Plan: Imported from OSS
Differential Revision: D19791563
Pulled By: albanD
fbshipit-source-id: 4d8806c27290efe82ef2fe9c8c4dc2b26579abd1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33068
The version counter is already tracked if we use PyTorch's functions, but not if the user unpacks the Tensor and modifies it by hand or with a third-party library.
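For intuition, the version counter that in-place PyTorch ops bump; `_version` is an internal attribute, used here only for illustration:
```py
import torch

x = torch.randn(3)
print(x._version)   # 0
x.add_(1)           # in-place ops that go through PyTorch bump the counter
print(x._version)   # 1
x.numpy()[0] = 5.0  # writing through an external view does not bump it
print(x._version)   # still 1
```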
Test Plan: Imported from OSS
Differential Revision: D19791564
Pulled By: albanD
fbshipit-source-id: a73c0f73d8fd0c0e5bf838f14bed54fa66937840
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29161.
I looked a bit at the code changes related to this and think I have all of the use cases of `DeprecatedTypeProperties` covered in the message, but suggestions from someone with more context on this would be very much appreciated :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30281
Differential Revision: D18830818
Pulled By: ezyang
fbshipit-source-id: 1a7fcee15354ae09e6644577e7fa33bd26acfe20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29665
Our intention is to merge the static distinction between Tensor and
Variable. Ordinarily, this would entail merging the methods of Tensor
and Variable. But there are a lot of "private"-ish methods on Variable
that we don't actually want to dump onto the Tensor class. So, as prep
work, we move all of those methods off of Variable and into
the torch::autograd::impl namespace (impl as in: please don't use this, end users).
This ends up being a fairly large patch because all of
the call sites have to play ball too.
While I was on the topic, I also moved any of the touched functions into
the C++ file, so that modifying them would not trigger a recompilation of
all of torch.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18496169
Pulled By: ezyang
fbshipit-source-id: afb203252620ec274be596b3e7b1d84d321bad3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28620
All Tensors are Variables now, they just happen to have requires_grad=False. Tensors ALWAYS have `VariableTensorId` in their type set.
When constructing this patch, I had to make decisions about what I would fix in this patch, and what I would leave for follow up PRs. Here is the cleanup that happens in this patch:
- The `is_variable` property is removed from TensorOptions. I removed this immediately because, unlike Tensor::is_variable, TensorOptions::is_variable doesn't respect our VariableTensorId thread-local state. This means that there were a bunch of places where TensorOptions::is_variable was false, which is obviously bogus in a world where tensor and variable are merged. Instead of keeping the method as a function that always returns true, I opted to remove it entirely (it's not public API). All places where we set `is_variable` are deleted.
- Knock on effect: there is no longer a separate DeprecatedTypeProperties for the variable and non-variable versions of type.
- Knock on effect: instead of asserting on TensorOptions::is_variable, instead we just test `at::impl::variable_is_excluded()`
- There is now only one copy of the cuDNN RNN dropout cache, not two (I'm not sure why we had two to begin with)
Some cleanup that doesn't happen in this patch:
- Eliminating unnecessary uses of `make_variable`
- Eliminating `Tensor::is_variable`
The most subtle part of this patch is retaining tracing behavior: the fact that everything is a Variable means that more code gets routed to VariableType than before; this can change traces. I identified two places where we didn't appropriately turn off VariableType, mostly factory functions:
- `torch.tensor` must turn off VariableType before invoking `at::empty` to construct the tensor, as it subsequently does direct data access
- `tensor_slow` (invoked when you pass a Python scalar to a tensor argument) must turn off VariableType before calling `scalar_to_tensor` so the scalar gets traced as constant, rather than as a call to `scalar_to_tensor`.
Honestly, these are all giant hacks, and should be replaced with a more specialized guard that just toggles tracing.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: dreiss
Differential Revision: D18171156
Pulled By: ezyang
fbshipit-source-id: 5b6a045beba37492647e350190f495114e86504d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25467
Use Layout/Device more directly in these cases.
ghstack-source-id: 89289651
Test Plan: sandcastle and ossci
Differential Revision: D17131883
fbshipit-source-id: ab3c6d1c879b7f26f20a2378364c852dc37508fc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24393
Ability to register a hook on a variable, similar to the Python autograd API. `register_hook` takes a function as an argument and creates a CppFunctionPreHook, similar to PyFunctionPreHook.
It returns the index of the hook, which can be passed to `remove_hook` to disable the hook.
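For comparison, the Python-side pattern this mirrors (the Python API returns a removable handle rather than an index):
```py
import torch

x = torch.randn(3, requires_grad=True)
handle = x.register_hook(lambda grad: grad * 2)  # runs when grad w.r.t. x is computed
x.sum().backward()
print(x.grad)    # gradient of ones, scaled by 2 by the hook
handle.remove()  # analogous to remove_hook on the C++ side
```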
Test Plan: Added tests.
Differential Revision: D16861722
fbshipit-source-id: d08047f932e38c7bde04283a18b2d0311c8ad604
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23618
For example, `save_for_backward({Variable(), x, Variable()})` should be allowed, so that this is consistent with the Python API behaviour.
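For reference, the Python-side behaviour being matched (saving `None` placeholders alongside tensors), as a minimal sketch:
```py
import torch

class KeepMiddle(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(None, x, None)  # None entries are allowed
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        a, x, b = ctx.saved_tensors
        assert a is None and b is None
        return 2 * x * grad_out

x = torch.randn(3, requires_grad=True)
KeepMiddle.apply(x).sum().backward()
```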
Test Plan: Added a test similar to the python test `test_save_none_for_backward` from test_autograd.py.
Differential Revision: D16589402
fbshipit-source-id: 847544ad8fc10772954d8629ad5a62bfdc1a66c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23572
### **(The stack from #23020 was moved into this PR)**
Adding API for custom autograd operations, with user defined forward and backward, [like in python](https://pytorch.org/docs/stable/notes/extending.html#extending-torch-autograd).
The custom operation should be a subclass of `Function`, with static `forward` and `backward` functions. `forward()` can accept any arguments, similar to the Python API, and `backward()` should accept a `variable_list` as an argument.
Both `forward()` and `backward()` accept an `AutogradContext*` which can be used to share data between them.
Variables can be saved in the context using `save_for_backward()`, and other data can be saved in the `saved_data` map in the form of `<std::string, at::IValue>` pairs. Variables saved in forward can be accessed with `get_saved_variables()`.
Example usage:
```
class MyFunction : public Function<MyFunction> {
public:
static variable_list forward(AutogradContext *ctx, int n, Variable var) {
// Save data for backward in context
ctx->saved_data["n"] = n;
return {var};
}
static variable_list backward(AutogradContext *ctx, variable_list grad_output) {
// Use data saved in forward
auto n = ctx->saved_data["n"].toInt();
return {grad_output[0]*n};
}
};
```
Then, it can be used with:
```
Variable x;
MyFunction::apply(6, x);
```
Also, AutogradContext has methods to mark outputs as non-differentiable and mark inputs as dirty, similar to the [Python API](ff23a02ac4/torch/autograd/function.py (L26)).
Test Plan: Added tests for the custom autograd function API based on test_autograd.py. Currently only the tests for the basic functionality have been added. More tests will be added later.
Differential Revision: D16583428
fbshipit-source-id: 0bd42f19ce37bcd99d3080d16195ad74d40d0413
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22631
Test Plan:
test suite
Imported from OSS
Differential Revision: D16185040
fbshipit-source-id: 9b83749f6c9cd05d13f54a3bb4801e263293252b