pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Tugsbayasgalan Manlaibaatar	bf7307adf8	Support inference_mode decorator (#109274 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109274 Approved by: https://github.com/williamwen42	2023-09-27 22:21:42 +00:00
FFFrog	70f2adaec3	Setup_context does not contain default values of forward() (#108561 ) Fixes #108529 As the title shown. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108561 Approved by: https://github.com/soulitzer	2023-09-19 16:23:52 +00:00
Emil Laftchiev	f2639a2c37	Back out "Dynamo support for autograd.Function w/ once_differentiable (#108686 )" (#109199 ) Summary: Original commit changeset: e11cddf1fecc Original Phabricator Diff: D49064185 Test Plan: Comparing PT1 and PT2 performance on the IG Feed Model with this diff backed out: N4274204 Comparing the PT1 and PT2 performance on IG Feed with this diff committed: N4271093 Reviewed By: zou3519 Differential Revision: D49230047 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109199 Approved by: https://github.com/zou3519, https://github.com/xw285cornell	2023-09-13 15:43:20 +00:00
Richard Zou	ef2bbe1ae1	Dynamo support for autograd.Function w/ once_differentiable (#108686 ) Fixes #106893 There are two main changes: - Before this PR, the function returned by once_differentiable was included in skipfiles (because its .co_code is torch/autograd/function.py). This PR adds a mechanism to tell Dynamo to inline a function, no matter if it is included in skipfiles. - A bugfix: when we are introspecting the backward, we need to turn the grad mode off. This is to accurately model the eager-mode semantics: In eager-mode PyTorch, if second-order gradients were not requested, then the grad mode is off. torch.compile does not work with higher-order gradients and just assumes we do first-order gradients, so this is OK. Test Plan: - new test Differential Revision: [D49064185](https://our.internmc.facebook.com/intern/diff/D49064185) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108686 Approved by: https://github.com/voznesenskym	2023-09-08 16:10:32 +00:00
David Berard	06b173780d	[dynamo] "TorchDynamo Cache Lookup" event: use C++ api (#108436 ) Background: "TorchDynamo Cache Lookup" events appear in traces to indicate a dynamo cache lookup; it's useful to check when cache lookups are taking a long time. To add a profiler event, one can use the `torch.profiler.record_function` context manager, or the C++ equivalent. Previously, the python version was used; first, when the profiler was enabled, callbacks for record_function_enter and record_function_exit were registered; then those would be called before and after every cache lookup. This PR: Instead of calling the python bindings for `torch.profiler.record_function`, directly call the C++ implementation. This simplifies a lot of the code for binding C/C++. It also improves performance; previously there was a lot of overhead in the "TorchDynamo Cache Lookup" event, making the event artificially take a long time. After this change the events now appear shorter, because there's less overhead in starting/stopping the event: in other words, the profiler no longer distorts the results as much. Performance results: I ran using the script below on a cpu-only 1.6GHz machine. I report the median time (from 100 measurements) of a "TorchDynamo Cache Lookup" event before and after this PR. I think it is reasonable to consider the difference to be due to a reduction in overhead. <details> <summary>Benchmarking script</summary> ```python def fn(x, y): return (x * y).relu() a, b = [torch.rand((4, 4), requires_grad=True) for _ in range(2)] opt_fn = torch.compile(fn) opt_fn(a, b) opt_fn(a, b) with torch.profiler.profile() as prof: opt_fn(a, b) ``` </details> Median before PR: 198-228 us (median of 100, measured 5 times) Median after PR: 27us Pull Request resolved: https://github.com/pytorch/pytorch/pull/108436 Approved by: https://github.com/anijain2305, https://github.com/jansel	2023-09-04 04:37:26 +00:00
Jirka Borovec	9178deedff	removing some redundant str splits (#106089 ) drop some redundant string splits, no factual changes, just cleaning the codebase Pull Request resolved: https://github.com/pytorch/pytorch/pull/106089 Approved by: https://github.com/albanD, https://github.com/malfet	2023-09-01 00:22:58 +00:00
David Berard	8c66f97c9b	[profiler] move _enable_dynamo_cache_lookup_profiler (#107720 ) _enable_dynamo_cache_lookup_profiler used to get turned on when running `__enter__` or `__exit__` with the profiler. But it's possible to turn the profiler on and off without the context manager (e.g. with a schedule and calling `.step()`). Instead, we should put these calls (which are supposed to be executed when the profiler turns on/off) where `_enable_profiler()` and `_disable_profiler()` are called. This puts `_enable_dynamo_cache_lookup_profiler` and `_set_is_profiler_enabled` into `_run_on_profiler_(start\|stop)` and calls that on the 3 places where `_(enable\|disable)_profiler` get called. Differential Revision: [D48619818](https://our.internmc.facebook.com/intern/diff/D48619818) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107720 Approved by: https://github.com/wconstab	2023-08-23 23:41:35 +00:00
Aaron Gokaslan	660e8060ad	[BE]: Update ruff to 0.285 (#107519 ) This updates ruff to 0.285, which is faster and fixes a number of false negatives with regard to f-strings. I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, there seem to be no instances of it in our codebase, so enabling the rule keeps it that way. :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519 Approved by: https://github.com/ezyang	2023-08-22 23:16:38 +00:00
PyTorch MergeBot	d59a6864fb	Revert "[BE]: Update ruff to 0.285 (#107519 )" This reverts commit `88ab3e4322`. Reverted https://github.com/pytorch/pytorch/pull/107519 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR breaks internal tests. @ezyang, can you please help them get unblocked? It seems like one of the strings was probably accidentally modified ([comment](https://github.com/pytorch/pytorch/pull/107519#issuecomment-1688833480))	2023-08-22 19:53:32 +00:00
David Berard	614b865721	[profiler] _RecordFunctionFast - faster python bindings for record_function (#107195 ) torch.profiler.record_function is relatively slow; for example, in some benchmarks I was running, x.view_as(x) was ~2us, and ~16-17us when wrapped in a record_function context. The reasons for this are: dispatcher overhead from going through an op (the main source of overhead), python binding / python conversion overhead, and some overhead from the context manager. This new implementation is faster, but it won't work with torchscript. Based on the benchmarks I was running, it adds 0.5-0.7us overhead per call when the profiler is turned off. To use it, you can just: ```python with torch._C._profiler_manual._RecordFunctionFast("title"): torch.add(x, y) ``` It implements a context manager in python which directly calls the record_function utilities, instead of calling through an op. * The context manager is implemented directly in python because the overhead from calling a python function seems non-negligible * All the record_function calls, python object conversions are guarded on checks for whether the profiler is enabled or not. It seems like this saves a few hundred nanoseconds. For more details about the experiments I ran to choose this implementation, see [my record_functions experiments branch](https://github.com/pytorch/pytorch/compare/main...davidberard98:pytorch:record-function-fast-experiments?expand=1). This also adds a `torch.autograd.profiler._is_profiler_enabled` global variable that can be used to check whether a profiler is currently enabled. 
It's useful for further reducing the overhead, like this: ```python if torch.autograd.profiler._is_profiler_enabled: with torch._C._profiler_manual._RecordFunctionFast("title"): torch.add(x, y) else: torch.add(x, y) ``` On BERT_pytorch (CPU-bound model), if we add a record_function inside CachedAutotuning.run: * Naive torch.profiler.record_function() is a ~30% slowdown * Always wrapping with RecordFunctionFast causes a regression of ~2-4%. * Guarding with an if statement - any regression is within noise Selected benchmark results: these come from a 2.20GHz machine, GPU build but only running CPU ops; running `x.view_as(x)`, with various record_functions applied (with profiling turned off). For more detailed results see "record_functions experiments branch" linked above (those results are on a different machine, but show the same patterns). Note that the results are somewhat noisy, assume 0.05-0.1us variations ``` Baseline:: 1.7825262546539307 us # Just running x.view_as(x) profiled_basic:: 13.600390434265137 us # torch.profiler.record_function(x) + view_as precompute_manual_cm_rf:: 2.317216396331787 us # torch._C._profiler_manual._RecordFunctionFast(), if the context is pre-constructed + view_as guard_manual_cm_rf:: 1.7994389533996582 us # guard with _is_profiler_enabled + view_as ``` Differential Revision: [D48421198](https://our.internmc.facebook.com/intern/diff/D48421198) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107195 Approved by: https://github.com/albanD, https://github.com/aaronenyeshi	2023-08-22 18:48:30 +00:00
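The guard described in the commit above (check a cheap flag before constructing any profiling context) can be sketched in plain Python. This is only an illustration of the pattern, not PyTorch's implementation: `STATE`, `EVENTS`, `record_region`, and `guarded_work` are hypothetical names standing in for `torch.autograd.profiler._is_profiler_enabled` and `_RecordFunctionFast`.

```python
import time
from contextlib import contextmanager

# Hypothetical stand-ins; none of these names are part of PyTorch.
STATE = {"profiler_enabled": False}
EVENTS = []  # (label, elapsed_seconds) pairs collected while profiling


@contextmanager
def record_region(label):
    """Record how long the wrapped block took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        EVENTS.append((label, time.perf_counter() - start))


def work(x, y):
    return x + y


def guarded_work(x, y):
    # Check the flag first, so the disabled path pays for one dict lookup
    # instead of building and entering a context manager.
    if STATE["profiler_enabled"]:
        with record_region("work"):
            return work(x, y)
    return work(x, y)
```

With the flag off, `guarded_work` never touches `record_region`, which is the reason the commit reports the guarded variant as indistinguishable from the baseline.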
Aaron Gokaslan	b1e8e01e50	[BE]: Apply PYI autofixes to various types (#107521 ) Applies some autofixes from the ruff PYI rules to improve the typing of PyTorch. I haven't enabled most of these ruff rules yet as they do not have autofixes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107521 Approved by: https://github.com/ezyang	2023-08-20 02:42:21 +00:00
Aaron Gokaslan	88ab3e4322	[BE]: Update ruff to 0.285 (#107519 ) This updates ruff to 0.285, which is faster and fixes a number of false negatives with regard to f-strings. I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, there seem to be no instances of it in our codebase, so enabling the rule keeps it that way. :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519 Approved by: https://github.com/ezyang	2023-08-20 01:36:18 +00:00
soulitzer	aa04b0536b	Fix inference_mode decorator pass mode as kwarg (#107349 ) Fixes https://fb.workplace.com/groups/1405155842844877/permalink/7330520550308347/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/107349 Approved by: https://github.com/albanD ghstack dependencies: #107296	2023-08-17 17:12:31 +00:00
andreasfloros	c9c90765c1	grad_mode decorators without paren (#107086 ) This PR implements the feature described in #107036 for `no_grad`, `enable_grad` and `inference_mode`. Users can still use the above as before but they can also use them without parentheses. For example: ```python import torch a = torch.ones(1, requires_grad=True) def do_something(): print(2 * a) with torch.no_grad(): do_something() # tensor([2.]) torch.no_grad()(do_something)() # tensor([2.]) torch.no_grad(do_something)() # tensor([2.]) do_something() # tensor([2.], grad_fn=<MulBackward0>) ``` For `inference_mode`, decorating without parentheses is equivalent to decorating with the default `mode=True`, similar to how dataclasses behave (https://docs.python.org/3/library/dataclasses.html#module-contents) Closes #107036 Pull Request resolved: https://github.com/pytorch/pytorch/pull/107086 Approved by: https://github.com/albanD	2023-08-15 05:25:33 +00:00
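One way to make a context manager usable as a decorator both with and without parentheses is to intercept construction in `__new__`. The sketch below is plain Python under stated assumptions: `STATE` and `no_tracking` are hypothetical names, and the code is not claimed to match the PR's actual implementation.

```python
import functools

# Hypothetical global switch standing in for autograd's grad mode.
STATE = {"tracking": True}


class no_tracking:
    """Usable as `with no_tracking():`, `@no_tracking()` or `@no_tracking`."""

    def __new__(cls, orig_func=None):
        if orig_func is None:
            # Called as no_tracking(): build a normal instance.
            return super().__new__(cls)
        # Called as no_tracking(func): return the wrapped function instead.
        return cls()(orig_func)

    def __enter__(self):
        self.prev = STATE["tracking"]
        STATE["tracking"] = False

    def __exit__(self, exc_type, exc, tb):
        STATE["tracking"] = self.prev
        return False

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # A fresh instance per call keeps the wrapper re-entrant.
            with self.__class__():
                return func(*args, **kwargs)

        return wrapper
```

Because `__new__` returns a plain function in the bare-decorator case, Python skips `__init__` for it, so no extra bookkeeping is needed.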
poseljacob	a25eee1d77	_force_original_view_tracking to work as both context manager and function (#106706 ) Fix _force_original_view_tracking to work as a function as well as a context manager, as stated by documentation. Applied similar fixes to PR: https://github.com/pytorch/pytorch/pull/105291 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106706 Approved by: https://github.com/albanD	2023-08-07 23:29:22 +00:00
MooYeh	fb6652b56e	[profiler] add profiler parsing support for custom device. (#106142 ) We hope PyTorch profiling parsing ability can also be applicable to custom devices. Based on previous work https://github.com/pytorch/pytorch/pull/101554, we have made supplementary updates to PyTorch profiling to extend its parsing capabilities for custom devices. These modifications do not affect the original logic of the code and mainly include the following aspects: 1. Added the relevant logic for use_device in torch.profiler.profiler._KinetoProfile. 2. In torch.autograd.profiler and torch.autograd.profiler_util, custom devices profiling data parsing ability has been added using privateuse1 and use_device attributes. 3. In torch._C._autograd.pyi and torch._C._autograd.pyi, custom devices related attributes have been added. The underlying C++ logic will be added in subsequent pull requests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106142 Approved by: https://github.com/aaronenyeshi	2023-08-02 20:23:22 +00:00
Alex Settle	9ba0558d48	Add sequence_nr to aot_autograd to map forward ops to their corresponding backward ops (#103129 ) Fixes #102375 Sequence_nr increments in the forward pass and decrements in the backward pass. Backward ops with the same sequence_nr as a forward op represent the backward implementation for the op. The long term goal is to make this information available to the profiler so users can observe which ops are fused by the inductor openai triton kernels. Added a test for this feature test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_aot_sequence_nr. The test case uses aot_export_module() to create a joint fwd/bwd fx graph. Then it walks all the nodes in fx graph using fx_graph.graph.nodes. The seq_nr of each node is recorded in node.meta. During the fwd pass the seq_nr increments and it decrements during the bwd pass. This allows the user to map forward ops to their corresponding bwd ops which is useful for performance analysis. Expected output from the test case. 
SeqNr|OrigAten|SrcFn
0|aten.convolution.default|l__self___conv1
0|aten.add.Tensor|l__self___bn1
1|aten._native_batch_norm_legit_functional.default|l__self___bn1
2|aten.relu.default|l__self___relu1
3|aten.add.Tensor|add
4|aten.view.default|flatten
5|aten.t.default|l__self___fc1
6|aten.unsqueeze.default|l__self___fc1
7|aten.mm.default|l__self___fc1
8|aten.squeeze.dim|l__self___fc1
9|aten.add.Tensor|l__self___fc1
10|aten.sub.Tensor|l__self___loss_fn
11|aten.abs.default|l__self___loss_fn
12|aten.mean.default|l__self___loss_fn
12|aten.ones_like.default|
12|aten.expand.default|
12|aten.div.Scalar|
11|aten.sgn.default|
11|aten.mul.Tensor|
8|aten.unsqueeze.default|
7|aten.t.default|
7|aten.mm.default|
7|aten.t.default|
7|aten.t.default|
7|aten.mm.default|
6|aten.squeeze.dim|
5|aten.t.default|
4|aten.view.default|
2|aten.threshold_backward.default|
1|aten.native_batch_norm_backward.default|
0|aten.convolution_backward.default|
0|aten.add.Tensor|
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103129 Approved by: https://github.com/soulitzer	2023-08-02 00:52:52 +00:00
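The forward-to-backward mapping described in the commit above can be illustrated by grouping rows on their sequence number. This is a minimal sketch in plain Python: `ROWS` holds a few rows copied from the listing, `pair_by_seq_nr` is a hypothetical helper, and treating an empty SrcFn as "backward op" is an assumption read off the listing, not a documented rule.

```python
from collections import defaultdict

# A few rows copied from the listing above: (seq_nr, aten op, source fn).
ROWS = [
    (0, "aten.convolution.default", "l__self___conv1"),
    (2, "aten.relu.default", "l__self___relu1"),
    (7, "aten.mm.default", "l__self___fc1"),
    (7, "aten.mm.default", ""),
    (2, "aten.threshold_backward.default", ""),
    (0, "aten.convolution_backward.default", ""),
]


def pair_by_seq_nr(rows):
    """Group ops by sequence number into forward and backward lists."""
    pairs = defaultdict(lambda: {"forward": [], "backward": []})
    for seq_nr, op, src_fn in rows:
        # Assumption: an empty SrcFn marks a backward op, as in the listing.
        side = "forward" if src_fn else "backward"
        pairs[seq_nr][side].append(op)
    return dict(pairs)
```

For example, `pair_by_seq_nr(ROWS)[0]` pairs the forward convolution with `aten.convolution_backward.default`, which is the kind of lookup the commit describes as useful for performance analysis.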
Edward Z. Yang	3bf922a6ce	Apply UFMT to low traffic torch modules (#106249 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106249 Approved by: https://github.com/Skylion007	2023-07-29 23:37:30 +00:00
Furkan Akkurt	3959695fbd	Fix typo ; Update grad_mode.py (#106045 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/106045 Approved by: https://github.com/albanD, https://github.com/soulitzer	2023-07-27 00:24:50 +00:00
Jason Ansel	c902b84e0b	Compiled autograd (#103822 ) This branch: 1) converts the autograd tape into an FX graph 2) caches that conversion using a "shadow" graph 3) compiles and runs the generated FX graph instead of the normal autograd What works currently: 1) Caching, capture, and initial integration 2) Backwards hooks 3) Inlining AotAutograd generated subgraphs 4) torch.compiling the generated FX graph 5) Auto-detecting dynamic shapes based on changes Future work 1) Larger scale testing 1) Boxed calling convention, so memory can be freed incrementally 1) Support hooks on SavedTensor 1) Additional testing by running eager autograd tests under compiled_autograd.enable() Pull Request resolved: https://github.com/pytorch/pytorch/pull/103822 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-07-24 21:12:05 +00:00
Justin Chu	4cc1745b13	[BE] f-stringify torch/ and scripts (#105538 ) This PR is a follow up on the pyupgrade series to convert more strings to use f-strings using `flynt`. - https://docs.python.org/3/reference/lexical_analysis.html#f-strings - https://pypi.org/project/flynt/ Command used: ``` flynt torch/ -ll 120 flynt scripts/ -ll 120 flynt tools/ -ll 120 ``` and excluded `collect_env.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538 Approved by: https://github.com/ezyang, https://github.com/malfet	2023-07-21 19:35:24 +00:00
Justin Chu	79c5e33349	[BE] Enable ruff's UP rules and autoformat nn/ mps/ and torch/ (#105436 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105436 Approved by: https://github.com/malfet, https://github.com/albanD	2023-07-21 07:38:46 +00:00
poseljacob	1aba399138	allow set_multithreading_enabled to act as function and context manager (#105291 ) Fixes #104985 Implemented `set_multithreading_enabled` C++ function to directly alter state rather than using `MultithreadingEnabled` class, which was automatically resetting the state when the object was destroyed. This behavior more closely aligns with set_grad_enabled which does work as expected. This allows us to change python class `set_multithreading_enabled` to act as both a function and context manager. I also added a getter: `torch._C.is_multithreading_enabled` Pull Request resolved: https://github.com/pytorch/pytorch/pull/105291 Approved by: https://github.com/albanD	2023-07-18 16:55:40 +00:00
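The "function and context manager" behavior can be sketched in plain Python: the constructor changes the state right away, and only `__exit__` restores it. `STATE` and `set_flag` are hypothetical names for illustration; the real change lives in PyTorch's C++ and Python sources and may differ in detail.

```python
# Hypothetical global state standing in for the multithreading switch.
STATE = {"flag": True}


class set_flag:
    """Works as a plain call (`set_flag(False)`) or as `with set_flag(False):`."""

    def __init__(self, mode):
        self.prev = STATE["flag"]
        # Takes effect immediately and persists: nothing resets it when
        # this object is garbage collected.
        STATE["flag"] = mode

    def __enter__(self):
        return None

    def __exit__(self, exc_type, exc, tb):
        # Only the context-manager form restores the previous value.
        STATE["flag"] = self.prev
        return False
```

The contrast with an RAII-style guard is that destroying the object has no side effect here, so the plain-call form keeps the new value, which is the behavior the commit describes for `set_grad_enabled`.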
shibo19	2961ea80f5	Deprecate "Type" and support more devices for save_on_cpu (#103245 ) Fixes #ISSUE_NUMBER 1. The class named "Type" is no longer used anywhere, so I added a warning message ahead of removing it in the future. 2. Add an argument (default "cuda") to save_on_cpu so that it can support more device types (like privateuse1) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103245 Approved by: https://github.com/soulitzer	2023-06-09 05:05:01 +00:00
Richard Li	f1f57e1e54	trigger tracing for MTIA events (#102288 ) Summary: trigger tracing for MTIA events on python side when ProfilerActivity.MTIA is specified Test Plan: Test diff: D45437426 ``` hg graft D45437426 ``` - in one terminal ``` cd ~/fbsource/fbcode buck2 run -j 8 \ //infra_asic_fpga/firmware/tools/mad/service:mad_service ``` - in another terminal Pytorch profiler ``` buck run mode/dev-nosan -j 8 //caffe2/torch/fb/acc_runtime/afg/tests:test_afg -- -m kernel_add ``` Differential Revision: D46122853 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102288 Approved by: https://github.com/aaronenyeshi	2023-06-05 15:10:31 +00:00
soulitzer	9866408167	Multihooks should not keep tensor alive in closure (#102859 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/102859 Approved by: https://github.com/albanD	2023-06-02 22:05:25 +00:00
dujinhang	2e8ce910bb	[Profiler][1/N] add profiler support for custom device. (#101554 ) 1. `torch.autograd.profiler` interface parameters changed. (use `self.use_device` instead of `self.use_cuda` facilitates access by other devices and integrate it in subsequent pr) 2. Modify `ProfilerEventStub`(aka `std::shared_ptr<CUevent_st>`) to `ProfilerVoidEventStub`(aka `std::shared_ptr<void>`) so that `ProfilerStubs` can be inherited by any `{device}Methods`. In addition, `cuda_event_start_` is renamed to `device_event_start_` , cuda and other devices can use this event pointer if needed. 4. custom device support using legacy profiling(add `ProfilerState::KINETO_PRIVATEUSE1_FALLBACK` option) 5. add `privateuse1Stubs` register (parse results and test cases are added in subsequent pr) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101554 Approved by: https://github.com/aaronenyeshi	2023-06-02 09:19:19 +00:00
Richard Zou	74f10b9ea5	Switch most Python RAII guard usages to context manager (#102642 ) There are some I can't easily switch due to reasons like: - Dynamo modelling the guard - BC concerns (for torch.autograd.set_multithreading_enabled) Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/102642 Approved by: https://github.com/albanD	2023-06-01 16:28:37 +00:00
lkct	e7681b53e3	Fix typing for `setup_context` in `autograd` (#101464 ) The original only matches a tuple of length 1, but it's intended to match any length. Also, it now aligns with the docstring at L320 `d5cba0618a/torch/autograd/function.py (L320)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/101464 Approved by: https://github.com/soulitzer, https://github.com/kit1980	2023-05-16 18:41:35 +00:00
David Berard	935100cbde	[profiler] When record_inputs=True, record scalar lists of length <= 30 (#100593 ) Many ops take as inputs scalars or scalar lists which are important to understand the properties of the op. For example, convolution ops' behavior and output shapes often depend on padding and strides, which are provided as scalars or lists of scalars. This will record scalar lists when record_inputs=True. Details: During collection (and this was true before this PR as well), we serialize values and tensor metadata into an InputOutputEncoder. After collection occurs, we deserialize these values to attach the information to each of the events. This PR does this: - Adds support for serializing scalar lists during collection / serialization - Adds an extra field called "Concrete Args" - Splits up the deserialization process into two steps - one for generating "input shapes" and one for generating "concrete args". We split up input shapes and concrete args to avoid interrupting any previous workflows that relied on the specific data in the input shapes category; additionally, it's just a better description. Note that single scalars will remain in the "input shapes" category as they were already in that category in the past. Differential Revision: [D45798431](https://our.internmc.facebook.com/intern/diff/D45798431) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100593 Approved by: https://github.com/aaronenyeshi	2023-05-16 07:58:46 +00:00
Jane Xu	4a7ee79bf9	[BE] super small comment update to gradcheck.py (#101103 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101103 Approved by: https://github.com/soulitzer	2023-05-12 16:41:44 +00:00
Oleh Lokshyn	35991df5d6	fix(docs): torch.autograd.graph.Node.register_hook can override grad_inputs, not grad_outputs (#100272 ) Fixes #99165 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100272 Approved by: https://github.com/soulitzer	2023-04-29 00:10:12 +00:00
Kiersten Stokes	bafa2c4724	Change 'w.r.t.' to 'wrt' in function docstrings to fix doc rendering (#100028 ) Fixes #72428 according to decision reached in comments. I've left other instances of `w.r.t.` intact (e.g. in parameter/return descriptions, in comments, etc) because there were many, and I didn't want to go out of scope. That being said, I'm happy to change those as well if we'd prefer the consistency! I've also fixed a typo that I came across while grepping for instances. Will update with screenshots once docs are built. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100028 Approved by: https://github.com/albanD	2023-04-25 23:53:26 +00:00
Aaron Gokaslan	e2a3817dfd	[BE] Enable C419 rule for any all shortcircuiting (#99890 ) Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.JIT allow for simple generator expressions which allows us to enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280 but I split it off into this PR so that it can be easily reverted should anything break. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890 Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet	2023-04-25 15:02:13 +00:00
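The short-circuiting that this rule is after can be seen by counting predicate calls. The sketch below is plain Python with a hypothetical helper, `count_checks`; it only illustrates why a generator inside `any`/`all` can do less work than a list comprehension.

```python
def count_checks(values, use_generator):
    """Return how many times the predicate ran inside any()."""
    calls = 0

    def is_big(v):
        nonlocal calls
        calls += 1
        return v > 1

    if use_generator:
        # Generator: any() stops at the first truthy result.
        any(is_big(v) for v in values)
    else:
        # List comprehension: every element is evaluated before any() runs.
        any([is_big(v) for v in values])
    return calls
```

With `values = [1, 2, 3, 4]`, the list form evaluates the predicate for every element, while the generator form stops as soon as `2` satisfies it.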
Kazuaki Ishizaki	f7fe6e148e	[test] Make environment variable name better (#97356 ) This PR intends to use better (or correct?) environment variable name (`TORCH_DOCTEST_ANOMALY` instead of `TORCH_DOCTEST_ANOMOLY`) in test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97356 Approved by: https://github.com/malfet, https://github.com/kit1980	2023-03-30 06:21:28 +00:00
Sergii Dymchenko	477f3f555f	Simplify by using yield from (#97831 ) The issues were found by SIM104 flake8-simplify in a local run. I'll take a look on adding the check to the CI separately. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97831 Approved by: https://github.com/Skylion007	2023-03-29 19:15:24 +00:00
soulitzer	d0abc31428	Remove unnecessary retain_grad call from gradcheck (#96923 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96923 Approved by: https://github.com/albanD	2023-03-27 13:38:28 +00:00
Pearu Peterson	9d5ac03b9a	Deprecate gradcheck check_sparse_nnz argument as duplicate of masked argument (#97187 ) As in the title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97187 Approved by: https://github.com/soulitzer	2023-03-22 14:11:03 +00:00
Qi Zhu	086ce765a5	Add new parameter `materialize_grads` to torch.autograd.grad() (#97015 ) Fixes #44189 Adds a new parameter, zero_grad_unused, to the torch.autograd.grad() function. This parameter allows for the gradient to be set to 0 instead of None when a variable is unused, which can be helpful for higher-order partial differentials. Here is an example of using this new parameter to solve d^3y/dx^3 given y = a * x: ```python x = torch.tensor(0.5, dtype=torch.float32, requires_grad=True) a = torch.tensor(1, dtype=torch.float32, requires_grad=True) y = x * a dydx = torch.autograd.grad(y, x, create_graph=True, allow_unused=True) d2ydx2 = torch.autograd.grad(dydx, x, allow_unused=True, zero_grad_unused=True) try: d3ydx3 = torch.autograd.grad(d2ydx2, x, allow_unused=True, zero_grad_unused=True) except RuntimeError as e: assert False, "Should not raise error" ``` With `zero_grad_unused`, d2ydx2 could be 0 instead of None, enabling d3ydx3 to be calculated as defined in math without throwing an error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97015 Approved by: https://github.com/soulitzer	2023-03-18 03:11:12 +00:00
albanD	985fc66b30	Bind increment_version to python (#96852 ) Should be convenient when writing python-only kernels (with triton) that don't have access to the C++ APIs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96852 Approved by: https://github.com/soulitzer	2023-03-17 20:36:33 +00:00
Luke Confait	46eaf4be7d	Fix Typo in pytorch/torch/autograd/__init__.py (#97024 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97024 Approved by: https://github.com/Skylion007, https://github.com/soulitzer	2023-03-17 16:24:18 +00:00
Pearu Peterson	2abcafcfd8	Add masked_grad kw argument to to_dense (#96095 ) As in the title. The `masked_grad` kw argument is required for `to_dense` backward to distinguish the expected semantics of sparse tensors. `masked_grad=True` means that the `to_dense` backward will apply a mask to the returned gradient where the mask is defined by the input indices. The default semantics implies `masked_grad==True` for BC but see the [comment](https://github.com/pytorch/pytorch/pull/96095/files#diff-d4df180433a09071e891d552426911c227b30ae9b8a8e56da31046e7ecb1afbeR501-R513) in `to_dense_backward`. As a consequence, existing code that is run through autograd engine must replace `.to_dense()` calls with `.to_dense(masked_grad=False)`. For example, ```python torch.autograd.gradcheck(lambda x: torch.sum(x, [0]).to_dense()) torch.autograd.gradcheck(lambda x: torch.sparse.sum(x, [0]).to_dense()) ``` (recall, gradcheck has `masked=False` as default) must be updated to ```python torch.autograd.gradcheck(lambda x: torch.sum(x, [0]).to_dense(masked_grad=False)) torch.autograd.gradcheck(lambda x: torch.sparse.sum(x, [0]).to_dense(masked_grad=True), masked=True) ``` Fixes https://github.com/pytorch/pytorch/issues/95550 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96095 Approved by: https://github.com/cpuhrsch	2023-03-16 21:38:11 +00:00
Will Constable	784dd583a6	Automatically register/clear dynamo profiler hooks while profiling (#96199 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96199 Approved by: https://github.com/jansel	2023-03-14 21:19:33 +00:00
Andrew Gu	457396fcdc	[Autograd] `expand_as` instead of `clone` to get `AccumulateGrad` (#96356 ) This PR makes a minor change to the multi-grad hook implementation. This should decrease peak memory since we avoid one `clone()` per tensor passed into the multi-grad hook. Let me know if there are technical reasons why we need to clone. If so, is there a way for some use cases to not clone? Before with `clone()`: ![Screenshot 2023-03-08 at 6 08 41 PM](https://user-images.githubusercontent.com/31054793/223873111-ad9105ab-2958-45a1-a2f5-18e9b254c710.png) After with `expand_as()` -- no more "Memcpy DtoD" kernels: ![Screenshot 2023-03-08 at 6 08 48 PM](https://user-images.githubusercontent.com/31054793/223873104-670b6abc-cd5c-4d1e-a316-cea1bef5832a.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96356 Approved by: https://github.com/soulitzer	2023-03-09 21:58:42 +00:00
Pearu Peterson	b89fda51cd	Implement sparse semantics support in gradcheck (2nd try) (#95405 ) Replaces https://github.com/pytorch/pytorch/pull/94714 that was reverted due to https://github.com/pytorch/pytorch/pull/94714#issuecomment-1442355648 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95405 Approved by: https://github.com/albanD	2023-02-27 17:48:02 +00:00
Jane Xu	6dc81f7bdd	Update docs that Parameters are immune to no_grad mode (#95232 ) Fixes https://github.com/pytorch/pytorch/issues/83998 ![image](https://user-images.githubusercontent.com/31798555/220971800-4af57d92-9f15-4e13-bfe4-73e2ff1cd943.png) ![image](https://user-images.githubusercontent.com/31798555/221019508-d7330a16-7f01-4d37-a1af-a4905e9596c4.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95232 Approved by: https://github.com/soulitzer	2023-02-23 23:33:19 +00:00
Zain Rizvi	808879ec8b	Revert "Implement sparse semantics support in gradcheck (#94714 )" (#95386 ) This reverts commit `7ac511c29a` from https://github.com/pytorch/pytorch/pull/94714 since it breaks periodic. Git thinks there's a merge conflict due to an unfortunately located newline deletion, so reverting this one manually Details behind the failure in https://github.com/pytorch/pytorch/pull/94714#issuecomment-1442160593 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95386 Approved by: https://github.com/clee2000	2023-02-23 18:02:37 +00:00
PyTorch MergeBot	cb6e38d89d	Revert "Update docs that Parameters are immune to no_grad mode (#95232 )" This reverts commit `5783cee2a3`. Reverted https://github.com/pytorch/pytorch/pull/95232 on behalf of https://github.com/ZainRizvi due to This caused the test_doc_examples test to fail on trunk	2023-02-23 17:43:45 +00:00
Jane Xu	5783cee2a3	Update docs that Parameters are immune to no_grad mode (#95232 ) Fixes https://github.com/pytorch/pytorch/issues/83998 ![image](https://user-images.githubusercontent.com/31798555/220971800-4af57d92-9f15-4e13-bfe4-73e2ff1cd943.png) ![image](https://user-images.githubusercontent.com/31798555/220971892-35554d17-fc44-4211-9017-7a5555ae3bb1.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95232 Approved by: https://github.com/soulitzer	2023-02-23 16:41:54 +00:00
kshitij12345	3b966a6ce3	[autograd] disable backward/grad for complex scalar output (#92753 ) Fixes https://github.com/pytorch/pytorch/issues/92750 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92753 Approved by: https://github.com/ezyang	2023-02-23 11:38:27 +00:00


820 Commits