Commit Graph

214 Commits

Author SHA1 Message Date
lezcano
8597d37536 Implement numpy(force=True) (#109636)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109636
Approved by: https://github.com/ezyang
ghstack dependencies: #109634
2023-09-20 20:06:13 +00:00
Edward Z. Yang
103260a43b Re-define check for typing classes. (#109201)
This PR fixes the `is_typing` function, which checks whether a value is an instance of a class
from the `typing` package.
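
A minimal, hedged sketch of such a check (illustrative only, not dynamo's actual implementation): a value counts as typing-related if its class comes from the `typing` module.

```python
import typing

def is_typing(value) -> bool:
    # typing constructs like typing.List[int] are instances of classes
    # defined in the typing module (e.g. typing._GenericAlias)
    return type(value).__module__ == "typing"

assert is_typing(typing.List[int])
assert not is_typing([1, 2, 3])
```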

This reverts commit b09c09f7bb3adb6a5b8a107a5b96757b569daa8d.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109201
Approved by: https://github.com/ezyang
2023-09-20 00:04:56 +00:00
leslie-fang-intel
55c19a3c6d Inductor: Increase multiplier to 3 for Inductor AMP benchmark correctness check (#109097)
**Summary**
As reported in https://github.com/pytorch/pytorch/issues/108333, we found that some models failed the benchmark's correctness check. However, the end-to-end model accuracy ([test script](https://gist.github.com/leslie-fang-intel/aac8b3c2b450532fd0517c758bb845e0)) differs by less than 0.1% when comparing AMP with FP32, so these correctness-check failures are likely false alarms. This PR uses a multiplier of 3 instead of 2 to avoid them. Model end-to-end accuracy test results are:

| Model (SPR) | FP32 Imperative TOP1 Accuracy | FP32 Imperative TOP5 Accuracy | BF16 AMP Inductor TOP1 Accuracy | BF16 AMP Inductor TOP5 Accuracy | BF16/FP32 Relative Loss TOP1 Accuracy | BF16/FP32 Relative Loss TOP5 Accuracy |
| -- | -- | -- | -- | -- | -- | -- |
| gluon_inception_v3 | 73.262 | 90.774 | 73.256 | 90.802 | -0.01% | 0.03% |
| mobilenetv2_100 | 72.89 | 90.996 | 72.826 | 90.946 | -0.09% | -0.05% |
| mobilenetv3_large_100 | 75.72 | 92.55 | 75.764 | 92.554 | 0.06% | 0.00% |
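
A hedged sketch of the idea, assuming an allclose-style comparison whose tolerance is scaled by the multiplier (the benchmark harness's actual `same()` logic is more involved):

```python
import torch

def amp_correctness_check(ref_fp32: torch.Tensor, res_amp: torch.Tensor,
                          base_tol: float = 1e-2, multiplier: float = 3.0) -> bool:
    # Scale the base tolerance by the multiplier (raised from 2 to 3 in this
    # PR) before the comparison, to avoid false alarms on AMP runs whose
    # end-to-end accuracy is effectively unchanged.
    tol = base_tol * multiplier
    return torch.allclose(ref_fp32, res_amp.float(), atol=tol, rtol=tol)
```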

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109097
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-09-16 10:02:56 +00:00
Yukio Siraichi
dfdc0b63c9 Bisect FX node asserts on ValidationException. (#107493)
This PR introduces a binary search for producing smaller validation errors when they occur.

We do that by bisecting the sequence of `torch._assert` FX nodes recorded as the source
expression of the translation validator (TV) by `ShapeEnv.evaluate_expr` calls. Then, we
raise the error caused by the earliest node.

In summary, the changes are:
- Call `bisect` on `ValidationError` @ _torch/_dynamo/convert_frame.py_
- Implement the binary search @ _torch/fx/experimental/symbolic_shapes.py_

Edit: moved `ShapeEnv` replay-recording to #107989
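
A hedged, generic sketch of the bisection described above, where `validate` stands in for re-running the translation validator on a prefix of the recorded `torch._assert` nodes (the real implementation lives in symbolic_shapes.py):

```python
def bisect_earliest_failure(nodes, validate):
    # Invariants: validate(nodes[:lo]) succeeds, validate(nodes[:hi]) fails.
    lo, hi = 0, len(nodes)
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if validate(nodes[:mid]):
            lo = mid  # prefix is still fine; the culprit comes later
        else:
            hi = mid  # the failure already reproduces within this prefix
    # nodes[hi - 1] is the earliest node whose inclusion makes validation fail
    return nodes[hi - 1]
```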

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107493
Approved by: https://github.com/ezyang
ghstack dependencies: #107989
2023-09-15 15:18:12 +00:00
Emil Laftchiev
f2639a2c37 Back out "Dynamo support for autograd.Function w/ once_differentiable (#108686)" (#109199)
Summary:
Original commit changeset: e11cddf1fecc

Original Phabricator Diff: D49064185

Test Plan:
Comparing PT1 and PT2 performance on the IG Feed Model with this diff backed out: N4274204

Comparing the PT1 and PT2 performance on IG Feed with this diff committed: N4271093

Reviewed By: zou3519

Differential Revision: D49230047

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109199
Approved by: https://github.com/zou3519, https://github.com/xw285cornell
2023-09-13 15:43:20 +00:00
Nakul Camsamudram
3b265e021f Support Optional typehint without graph breaking (#108970)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108970
Approved by: https://github.com/anijain2305
2023-09-11 16:42:44 +00:00
Richard Zou
ef2bbe1ae1 Dynamo support for autograd.Function w/ once_differentiable (#108686)
Fixes #106893

There are two main changes:
- Before this PR, the function returned by once_differentiable was
included in skipfiles (because its .co_filename is
torch/autograd/function.py). This PR adds a mechanism to tell Dynamo
to inline a function, even if it is included in skipfiles.
- A bugfix: when we are introspecting the backward, we need to turn the
grad mode off. This is to accurately model the eager-mode semantics:
In eager-mode PyTorch, if second-order gradients were not requested, then
the grad mode is off. torch.compile does not work with higher-order
gradients and just assumes we do first-order gradients, so this is OK.
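
For context, a small self-contained example of the pattern this PR makes traceable (the function itself is illustrative, not taken from the issue):

```python
import torch
from torch.autograd.function import once_differentiable

class MySin(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.sin()

    @staticmethod
    @once_differentiable  # backward runs with grad disabled, first-order only
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * x.cos()

@torch.compile(backend="eager")
def f(x):
    return MySin.apply(x)

y = f(torch.randn(4, requires_grad=True))
y.sum().backward()
```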

Test Plan:
- new test

Differential Revision: [D49064185](https://our.internmc.facebook.com/intern/diff/D49064185)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108686
Approved by: https://github.com/voznesenskym
2023-09-08 16:10:32 +00:00
Evgeni Burovski
1f20531939 fall back to eager on NotImplementedError (#107863)
Follow-up to https://github.com/pytorch/pytorch/pull/107710:

Help dynamo fall back to eager when compiling unimplemented numpy constructs:

- arrays of strings
- (arg){min, max} for complex types
- various arguments typed as NotImplemented (`np.ones(4, order="F")` etc)
- numpy functions which torch._numpy does not implement

To test, run (we do not implement arrays of strings)

```
import torch
import numpy as np

@torch.compile(fullgraph=False)
def fn():
    return np.asarray(["L", "U"])
```

and observe that it compiles with `fullgraph=False` and fails with `fullgraph=True`.

Fixes https://github.com/pytorch/pytorch/issues/107970

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107863
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-09-07 21:22:20 +00:00
Elias Ellison
e18f512b81 Update accuracy checking for nan, floats (#108202)
Fixes inference accuracy for `doctr_reco_predictor` and `pyhpc_turbulent_kinetic_energy`.

For the `same(float, float)` comparison we weren't going through the more rigorous tensor comparison path which takes into account the fp64 base results.

Also return True when the fp64 base results are not well formed (nan).

I debugged these models and the sources of divergence were innocuous:
`doctr_reco_predictor` - can be fixed by turning off layout optimization, decomp for batch norm

`pyhpc_turbulent_kinetic_energy` - divergence is caused because the fused kernel keeps precision in fp32 instead of casting back and forth between fp32 and bf16. The fused kernel has better precision anyway.
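
A hedged sketch of the fixed behavior (not the actual `torch._dynamo.utils.same` implementation): float/float comparisons are promoted so they go through the fp64-aware path, and a malformed (nan) fp64 baseline makes the check pass.

```python
import math
import torch

def same_scalar(ref, res, fp64_ref=None, tol=1e-3) -> bool:
    if fp64_ref is not None and math.isnan(fp64_ref):
        # fp64 baseline is not well formed: treat the comparison as passing
        return True
    if fp64_ref is not None:
        # widen the tolerance by how far the reference already drifts from fp64
        tol = max(tol, 4 * abs(ref - fp64_ref))
    return torch.allclose(torch.tensor(ref), torch.tensor(res), atol=tol, rtol=tol)
```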

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108202
Approved by: https://github.com/jansel
2023-09-01 02:54:01 +00:00
Brian Hirsh
5efd63b1b8 better support for fakeifying and dynamoing through torch_dispatch subclasses (with dynamic shapes) (#107415)
There is already some support for plumbing `__torch_dispatch__` tensor subclasses through dynamo, but this PR beefs it up a bit and adds a test. In particular:

(1) Fakeifying tensor subclasses didn't properly set autograd metadata (requires_grad, is_leaf) on the newly fakeified wrapper subclass. I don't actually have a test for this in this PR, but it's tested pretty heavily later in my aot autograd tests

(2) Fakeifying tensor subclasses didn't properly track source information for dynamic shapes on the inner tensors. I added a new `WrapperSubclassFieldSource` subclass, that represents a source coming from a tensor field on a wrapper subclass, which I use in the fakeifying logic, and again in symbolic_shapes.py to generate proper guards.

(3) `_make_wrapper_subclass()`: I marginally updated this code to work better with dynamic shapes. One thing that's a bit weird about `_make_wrapper_subclass`: it has two overloads, and the first explicitly does not support dynamic shapes (and the second does not support kwargs). I think that later we probably want to consolidate, or at least make the first overload work with dynamic shapes, but I didn't want to handle that in this PR (so these smaller changes seemed like a strict improvement).
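
A hedged sketch of the kind of `__torch_dispatch__` wrapper subclass involved here (illustrative, not the test subclass from this PR); note how `requires_grad` is forwarded at construction, which is the autograd-metadata point in (1):

```python
import torch

class WrapperTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, inner: torch.Tensor):
        # _make_wrapper_subclass creates an outer tensor sharing the inner
        # tensor's metadata; requires_grad must be propagated explicitly.
        return torch.Tensor._make_wrapper_subclass(
            cls, inner.size(), dtype=inner.dtype, device=inner.device,
            requires_grad=inner.requires_grad,
        )

    def __init__(self, inner: torch.Tensor):
        self.inner = inner

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        unwrap = lambda t: t.inner if isinstance(t, WrapperTensor) else t
        out = func(*map(unwrap, args), **{k: unwrap(v) for k, v in kwargs.items()})
        return WrapperTensor(out) if isinstance(out, torch.Tensor) else out
```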

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107415
Approved by: https://github.com/ezyang
2023-08-29 02:36:48 +00:00
Animesh Jain
9d2ffc5dfa [reland][Dynamo] cache_size policy #107496 (#108069)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108069
Approved by: https://github.com/yanboliang
2023-08-28 22:06:54 +00:00
Tugsbayasgalan Manlaibaatar
485de73004 Improve unbacked symint error msg (#107806)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107806
Approved by: https://github.com/avikchaudhuri
2023-08-25 01:07:09 +00:00
lezcano
207b06d099 [dynamo] Wrap ndarray dunder methods (#107689)
Fixes https://github.com/pytorch/pytorch/issues/107437

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107689
Approved by: https://github.com/ezyang
ghstack dependencies: #107687, #107688, #107710, #107711, #107746
2023-08-23 13:55:36 +00:00
lezcano
612c8a8c84 Guard numpy imports in the dynamo folder (#107299)
Fixes https://github.com/pytorch/pytorch/issues/107228

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107299
Approved by: https://github.com/atalman
2023-08-21 19:07:20 +00:00
Edward Z. Yang
36bb7a1f42 Add fast traceback utilities (#107358)
This adds some utilities for conveniently working with fast combined CapturedTraceback from Python. The main goal of these utilities is to make it easier for people to use CapturedTraceback as a drop-in replacement for `traceback.extract_stack`, which is 20x slower than CapturedTraceback.

I port symbolic shapes to use the new CapturedTraceback code, to validate that the APIs work and are useful.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107358
Approved by: https://github.com/zdevito, https://github.com/albanD
ghstack dependencies: #107438
2023-08-18 19:05:54 +00:00
Michael Lazos
e0d6072f69 Add API to mark input tensors static for cudagraphs (#107154)
Adds an API to mark a tensor as a static input.
To make this trigger recompiles properly, I'll need to update the tensor match checks to also check for this new attribute.

An additional concern is memory: the tensors will be kept alive, but this is the current behavior for nn modules and parameters.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107154
Approved by: https://github.com/eellison
2023-08-16 04:38:19 +00:00
Yanbo Liang
fbfb9a1648 [Dynamo] Improve PT2 fbcode logging observability (#106932)
Summary:
https://docs.google.com/document/d/1D5K3_ELsda3tIUeSyNL_2yee-M3jVWbirqSQ5BDNvHQ/edit

This is the revamped version of D47908299.

For each frame, we will record a list of compilation metrics, e.g., backend_compile time, entire_frame_compile time, cache_size, co_filename, co_firstlineno, co_name, guards, graph input_count, graph node_count, graph op_count.

With the help of job info: mast_job_name, global_rank, we can satisfy the requirements from `Things I’ve used/wanted to use our logging to determine` in https://docs.google.com/document/d/1D5K3_ELsda3tIUeSyNL_2yee-M3jVWbirqSQ5BDNvHQ/edit (or add more metrics for this framework)

Test Plan:
```
buck2 test //caffe2/test:test_dynamo
```

Differential Revision: D48142400

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106932
Approved by: https://github.com/anijain2305
2023-08-11 20:46:04 +00:00
lezcano
a9dca53438 NumPy support in torch.compile (#106211)
RFC: https://github.com/pytorch/rfcs/pull/54
First commit is the contents of https://github.com/Quansight-Labs/numpy_pytorch_interop/

We have already been using this in core for the last few months as an external dependency. This PR pulls all of it into core.

In the next commits, I do a number of things in this order
- Fix a few small issues
- Make the tests that this PR adds pass
- Bend backwards until lintrunner passes
- Remove the optional dependency on `torch_np` and simply rely on the upstreamed code
- Fix a number of dynamo tests that were passing before (they were not testing anything, I think) and are not passing now.

Missing from this PR (but not blocking):
- Have a flag that deactivates tracing NumPy functions and simply breaks. There used to be one, but it stopped working after the merge and I removed it. @lezcano to investigate.
- https://github.com/pytorch/pytorch/pull/106431#issuecomment-1667079543. @voznesenskym to submit a fix after we merge.

All the tests in `tests/torch_np` take about 75s to run.

This was work by @ev-br, @rgommers, @honno, and me. I did not create this PR via ghstack (which would have been convenient), as this is a collaboration and ghstack doesn't allow for shared contributions.
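
A minimal usage sketch of what lands here (the function is illustrative): NumPy calls inside a compiled region are traced through the upstreamed `torch._numpy` layer instead of breaking the graph.

```python
import numpy as np
import torch

@torch.compile
def mean_center(x: torch.Tensor) -> torch.Tensor:
    a = x.numpy()            # traced, not a graph break
    return torch.from_numpy(a - np.mean(a))

print(mean_center(torch.arange(4.0)))
```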

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106211
Approved by: https://github.com/ezyang
2023-08-11 00:39:32 +00:00
Jason Lu
bc88028e8e Back out "Reland "Make adding buffers more like adding parameters (#104069)" (#106224)" (#106743)
Summary:
Original commit changeset: 81319beb97f3

Original Phabricator Diff: D47961182

Test Plan: revert to maintain backward compat with legacy ads_dper3 production package. Read details in: S357822

Reviewed By: atuljangra

Differential Revision: D48131623

@diff-train-skip-merge
(D48131623 landed internally)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106743
Approved by: https://github.com/malfet
2023-08-08 15:27:34 +00:00
Thomas Ortner
cc21fa75a3 Enable dynamic shapes of torch.nn.Parameter (#105855)
This PR adds a new configuration that enables shapes of torch.nn.Parameter to be treated as dynamic in order to avoid extensive recompilation when Parameters are used instead of Tensors.

This feature addresses part of issue #105279

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105855
Approved by: https://github.com/ezyang
2023-08-08 05:40:01 +00:00
Mikayla Gawarecki
d8e5f2aa6d Reland "Make adding buffers more like adding parameters (#104069)" (#106224)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106224
Approved by: https://github.com/atalman, https://github.com/albanD
2023-07-31 17:18:56 +00:00
Michael Voznesensky
8549abc347 Grab bag of DTensor enablement stuff (Enable whole graph capture for DTensor) (#105787)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105787
Approved by: https://github.com/ezyang
2023-07-30 00:17:45 +00:00
Aaron Gokaslan
6d43c89f37 [BE]: Update Ruff to 0.0.280 (#105724)
Removes unused loop values in Python dictionary iteration. Automated fix from Ruff master.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724
Approved by: https://github.com/ezyang, https://github.com/janeyx99
2023-07-22 23:03:34 +00:00
angelayi
b0a04331b4 [dynamo] Fix import if numpy is not installed (#105711)
This [line](https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/allowed_functions.py#L18) results in an import issue if numpy is not installed.
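
A minimal sketch of the kind of guarded import this fix implies (the actual change in allowed_functions.py may differ):

```python
try:
    import numpy as np
except ModuleNotFoundError:
    np = None  # torch must keep working without numpy installed

def numpy_available() -> bool:
    return np is not None
```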

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105711
Approved by: https://github.com/yanboliang, https://github.com/ezyang
2023-07-21 05:52:32 +00:00
William Wen
777fc0bb58 [dynamo] fine-grained bytecode-source attribution in python 3.11 (#104676)
Since Python 3.11 bytecode contains end-line and column information, we attribute the source code corresponding to each bytecode in a more accurate way. For example, we can highlight a function call in a series of nested function calls, or highlight a function call spanning multiple lines.

Sample:
```python
import torch
import torch._dynamo
from functorch.experimental.control_flow import cond

def h(x):
    return x * 5

def true_fn(x):
    return x * 2

def false_fn(x):
    return x * 3

def f(pred, x):
    x = h(
        h(h(x))
    )
    x = x[1:][:2]
    torch._dynamo.graph_break()
    x = cond(pred, true_fn, false_fn, [x])

opt_f = torch.compile(f, backend="eager")
opt_f(torch.tensor(True), torch.randn(3, 3, 3, 3))
```

Output:
```
$ TORCH_LOGS="trace_call" python playground9.py
TRACE inlined call h from f /scratch/williamwen/work/pytorch/playground9.py:16
        h(h(x))
          ~^^^
TRACE FX call mul from h /scratch/williamwen/work/pytorch/playground9.py:6 (inline depth: 1)
    return x * 5
           ~~^~~
TRACE inlined call h from f /scratch/williamwen/work/pytorch/playground9.py:16
        h(h(x))
        ~^^^^^^
TRACE FX call mul_1 from h /scratch/williamwen/work/pytorch/playground9.py:6 (inline depth: 1)
    return x * 5
           ~~^~~
TRACE inlined call h from f /scratch/williamwen/work/pytorch/playground9.py:15
    x = h(
        ~^
        h(h(x))
        ^^^^^^^
    )
    ^
TRACE FX call mul_2 from h /scratch/williamwen/work/pytorch/playground9.py:6 (inline depth: 1)
    return x * 5
           ~~^~~
TRACE FX call getitem from f /scratch/williamwen/work/pytorch/playground9.py:18
    x = x[1:][:2]
        ~^^^^
TRACE FX call getitem_1 from f /scratch/williamwen/work/pytorch/playground9.py:18
    x = x[1:][:2]
        ~~~~~^^^^
TRACE inlined call true_fn from <resume in f> /scratch/williamwen/work/pytorch/playground9.py:20
    x = cond(pred, true_fn, false_fn, [x])
        ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TRACE FX call mul from true_fn /scratch/williamwen/work/pytorch/playground9.py:9 (inline depth: 1)
    return x * 2
           ~~^~~
TRACE inlined call false_fn from <resume in f> /scratch/williamwen/work/pytorch/playground9.py:20
    x = cond(pred, true_fn, false_fn, [x])
        ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TRACE FX call mul from false_fn /scratch/williamwen/work/pytorch/playground9.py:12 (inline depth: 1)
    return x * 3
           ~~^~~
TRACE FX call cond from <resume in f> /scratch/williamwen/work/pytorch/playground9.py:20
    x = cond(pred, true_fn, false_fn, [x])
        ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104676
Approved by: https://github.com/ezyang
2023-07-20 17:18:52 +00:00
Andrey Talman
c6653b65d8 Back out "Make adding buffers more like adding parameters (#104069)" (#105581)
Summary:
D47537831 is breaking pyper tests: https://fb.workplace.com/groups/802176577445480/posts/1018902842439518/

with `TypeError: register_buffer() takes 3 positional arguments but 4 were given`

Original commit changeset: d4b4069fbd38

Original Phabricator Diff: D47537831

Test Plan:
```
buck2 run //caffe2/torch/fb/training_toolkit/integration_tests/training_lifecycle/cogwheel_tests/pyper_release_v2:cogwheel_smallworld_inline_cvr_infer_pyper_pyper__canary_offline_training-launcher -- --run-harness-in-tupperware --build-fbpkg ads_dper3 --build-fbpkg training_platform
```

Reviewed By: atalman

Differential Revision: D47600140

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105581
Approved by: https://github.com/mikaylagawarecki
2023-07-20 03:39:53 +00:00
kshitij12345
e137ac6c59 [dynamo][torch_np] support linalg, random and fft module (#105320)
Support tracing through `np.linalg` with `torch_np` installed. Will update with other modules if this approach makes sense.

TODO:
* [x] Add test for `fft` and `random`.

Fixes https://github.com/pytorch/pytorch/issues/105269

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105320
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-07-19 11:06:37 +00:00
Michael Lazos
1597dd7a54 Report guard failures with recompiles logging (#105500)
Fixes #ISSUE_NUMBER
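
A hedged usage sketch, assuming the guard-failure messages surface under the `recompiles` logging artifact (run with `TORCH_LOGS="recompiles"`):

```python
import torch

@torch.compile
def f(x):
    return x * 2

f(torch.randn(3))
f(torch.randn(4, 4))  # a shape change triggers a recompile; the failed guard is logged
```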

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105500
Approved by: https://github.com/Chillee, https://github.com/anijain2305
2023-07-19 02:20:44 +00:00
Wanchao Liang
cb23373264 [dynamo] allow tensor subclass fakification in dynamo (#105308)
This PR adds the necessary plumbing through torchdynamo to allow tensor
subclasses with a certain contract (i.e., with `__tensor_flatten__` and
`__tensor_unflatten__`) to go through the dynamo fakification pass by
fakeifying the tensor subclass's internal components.

Some of the tensor subclass contract logic is mostly borrowed from
https://github.com/pytorch/pytorch/pull/97540

Added some tests to verify that simply passing a tensor subclass
(i.e., DTensor) through dynamo eager works as expected.
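
A hedged sketch of that contract; the exact signatures are an assumption here (they have changed across PyTorch versions), but the essential point is that the subclass names its inner tensor attributes so dynamo can fakeify each of them:

```python
import torch

class PairTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, a: torch.Tensor, b: torch.Tensor):
        return torch.Tensor._make_wrapper_subclass(
            cls, a.size(), dtype=a.dtype, device=a.device)

    def __init__(self, a: torch.Tensor, b: torch.Tensor):
        self.a, self.b = a, b

    def __tensor_flatten__(self):
        # attribute names of the inner tensors, plus optional metadata
        return ["a", "b"], None

    @staticmethod
    def __tensor_unflatten__(inner_tensors, meta):
        return PairTensor(inner_tensors["a"], inner_tensors["b"])
```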

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105308
Approved by: https://github.com/ezyang
2023-07-18 17:28:04 +00:00
Aleksandar Samardžić
5d473a950f Make conversions from/to sparse semi-structured always @torch.compile-d (#105272)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105272
Approved by: https://github.com/ezyang
2023-07-18 04:51:28 +00:00
lezcano
a26afb9848 Better comparisons for np.ndarrays in dynamo (#105333)
This takes tolerances into account.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105333
Approved by: https://github.com/larryliu0820
2023-07-17 20:20:50 +00:00
ekamiti
32d422f335 Make adding buffers more like adding parameters (#104069)
Add semantics for creating a buffer object that mirror creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same, as the `register_buffer` method has not been changed. The `persistent` parameter in the `Buffer` type indicates whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. The remaining changes are test changes to make sure that the `Buffer` type can be used as a drop-in replacement for `register_buffer`, as it just leads to `register_buffer` being called. This new functionality still allows normal tensors to be used as buffers, so these changes are intended to be backwards compatible.

Fixes #35735
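
A hedged usage sketch, assuming the class is exposed as `torch.nn.Buffer` (as it eventually shipped); `register_buffer` remains equivalent:

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        # assignment registers the buffer, like assigning an nn.Parameter
        self.running_stat = nn.Buffer(torch.zeros(3), persistent=False)
        # equivalent to:
        # self.register_buffer("running_stat", torch.zeros(3), persistent=False)

print(dict(M().named_buffers()))
```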

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
2023-07-17 17:59:05 +00:00
Animesh Jain
95232c216b [dynamo] Bugfix for enums (#105306)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105306
Approved by: https://github.com/yanboliang
2023-07-17 16:39:16 +00:00
lezcano
b190f46514 Allow NumPy code in torch.compile to run on cuda (#104699)
This can be achieved by doing `torch.set_default_device("cuda")`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104699
Approved by: https://github.com/ezyang, https://github.com/larryliu0820
2023-07-06 18:43:09 +00:00
Animesh Jain
8c191d8eef [dynamo][ac] Reland #104397 - Remove disable monkeypatching of utils.checkpoint (#104665)
NO CHANGE from before. The ancestor diff was reverted, so this diff got reverted as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104665
Approved by: https://github.com/wconstab
2023-07-06 00:48:02 +00:00
Animesh Jain
4005152b92 [dynamo] Organize higherorderops variable trackers (#104565)
The main changes are moving the higher-order ops from torch.py to higher_order_ops.py and creating smaller subclasses of HigherOrderOp for cond, map, etc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104565
Approved by: https://github.com/zou3519
2023-07-05 22:19:26 +00:00
PyTorch MergeBot
40f53912cf Revert "[dynamo][ac] Remove disable monkeypatching of utils.checkpoint (#104397)"
This reverts commit 537a6c0651.

Reverted https://github.com/pytorch/pytorch/pull/104397 on behalf of https://github.com/huydhn due to This has been reverted internally by D47216591, so I need to also revert it on OSS to keep them in sync ([comment](https://github.com/pytorch/pytorch/pull/104397#issuecomment-1621086360))
2023-07-05 06:11:08 +00:00
Animesh Jain
537a6c0651 [dynamo][ac] Remove disable monkeypatching of utils.checkpoint (#104397)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104397
Approved by: https://github.com/wconstab
2023-06-30 02:27:06 +00:00
Animesh Jain
2bb83cd45c [dynamo][ac] Minor refactor for better code organization and a bugfix (#104276)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104276
Approved by: https://github.com/zou3519
2023-06-29 12:57:59 +00:00
cdzhan
c06bb82ba1 fix specialization when you pass an unspec int into slicing on a Python list. (#104142)
Fixes #103545

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104142
Approved by: https://github.com/malfet, https://github.com/jansel
2023-06-28 13:13:07 +00:00
Animesh Jain
75dab587ef [dynamo] FSDP + AC + torch.compile (#103953)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103953
Approved by: https://github.com/wanchaol
2023-06-24 01:40:56 +00:00
Vinay Kumar Burugu
3c28431a0f Feature: Dump compile_times when TORCH_LOGS=dynamo is enabled. (#104057)
Partial implementation of  https://github.com/pytorch/pytorch/issues/103173. This PR only implements the feature to dump compile_times at the end of the session using the atexit handler.
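
A hedged sketch of inspecting the same data programmatically via `torch._dynamo.utils.compile_times` (the atexit dump added here prints a similar summary when `TORCH_LOGS=dynamo` is set):

```python
import torch
import torch._dynamo

@torch.compile
def f(x):
    return x + 1

f(torch.randn(8))
print(torch._dynamo.utils.compile_times(repr="str"))
```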

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104057
Approved by: https://github.com/ezyang
2023-06-23 05:25:09 +00:00
Thiago Crepaldi
6f655d4195 Add symbolic tracing support to torch._dynamo.export (fake input + weights) (#100017)
Fixes #95900
Using the following repro as guide:

```python
import torch
import torch._dynamo
from torch._subclasses import fake_tensor
from torch.fx.experimental.symbolic_shapes import ShapeEnv
from torch._dynamo.output_graph import config
class Model(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)
        self.linear2 = torch.nn.Linear(2, 2)

    def forward(self, x):
        out = self.linear(x)
        out = self.linear2(out)
        return out

fake_mode = fake_tensor.FakeTensorMode(allow_non_fake_inputs=False,
                                       allow_fallback_kernels=True,
                                       shape_env=ShapeEnv(
                                            allow_scalar_outputs=config.capture_scalar_outputs,
                                            allow_dynamic_output_shape_ops=config.capture_dynamic_output_shape_ops,
                                            frame_id=0
                                        ),
)
# Fakeifying input/model before calling torch._dynamo.export
with fake_mode:
    fake_x = torch.rand(5, 2, 2)
    model = Model()

# Calling torch._dynamo.export without active fake mode
graph_module, guards = torch._dynamo.export(
    model,
    fake_x,
    aten_graph=True,
    fake_mode=fake_mode
)
graph_module.print_readable()
graph_module.graph.print_tabular()
```

Summary of changes:

* Plumb fake_mode through the torch.export API. When specified, it replaces the creation of a new FakeTensorMode at InstructionTranslator on behalf of OutputGraph.
* Hack FakeTensor.__new__ to prevent a torch.Tensor._make_subclass call for inputs that are already fakeified by the user. This probably needs to be fixed in a nicer way. Any idea?
* Removed a few asserts that didn't want faked tensors coming from the user script.
* Added torch._subclasses.fake_tensor.FakeTensor to the type list on a few assert checks to allow fake inputs.

The changes above allowed symbolic tracing with both static and dynamic shapes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100017
Approved by: https://github.com/ezyang
2023-06-15 21:28:10 +00:00
Mengwei Liu
96c23fe212 [dynamo][numpy] Add support for builtin functions (#103457)
In order to be able to run stuff like:
```
def f(x):
    a = x.numpy()
    return a + a
```
This PR adds a branch in `BuiltinVariable` to handle the `NumpyNdarrayVariable` case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103457
Approved by: https://github.com/ezyang
2023-06-15 09:18:45 +00:00
Animesh Jain
16c2090b2d [benchmark][compile] Limit number of bounding boxes to 5 (#103413)
Depends on https://github.com/pytorch/benchmark/pull/1729

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103413
Approved by: https://github.com/ezyang
2023-06-15 01:06:40 +00:00
Edward Z. Yang
ddf4cd69ec Delete ifdyn and ifunspec combinators (#103596)
Replaced with expect tests for ease of updating.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103596
Approved by: https://github.com/voznesenskym
2023-06-15 00:14:17 +00:00
Animesh Jain
bd0ed940b7 [activation checkpoint][dynamo] Wrap AC into Tag based higher order op (#102935)
These are the numbers with this PR

![image](https://github.com/pytorch/pytorch/assets/13822661/63e991d5-80e2-4e94-8e4b-243621c3990e)

There are 3 main followups:
* A naive partitioner gives a better memory footprint than the min-cut partitioner here. Currently, we are using the min-cut partitioner. Waiting for @Chillee to discuss this further, to either modify min-cut or add a naive partitioner.
* aot_eager has a < 1x memory footprint. This is true even for non-AC models. This could hide some inefficiency somewhere.
* inductor is giving very different memory numbers between AOT-traced-AC (duplicate early) vs this implementation. This leads to some inefficiency in inductor that we need to resolve.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102935
Approved by: https://github.com/jansel
2023-06-14 20:15:43 +00:00
Edward Z. Yang
8b015c166c Don't test dynamic_shapes in tensor_always_has_static_shape (#103517)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103517
Approved by: https://github.com/anijain2305
2023-06-14 07:04:17 +00:00
Mengwei Liu
2eac8bd2b8 [dynamo][numpy] Support ndarray methods (#97537)
This PR adds universal support for ndarray methods. After #100839 each `NumpyNdarrayVariable` should wrap a `torch.Tensor`. This PR adds a `numpy_method_wrapper` which converts the `torch.Tensor` to `torch_np.ndarray` and then calls the numpy ndarray method. We then also try to return a `torch.Tensor` (returning the value as-is if it is not ndarray-like).
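
A hedged sketch of the wrapper idea, using plain NumPy in place of `torch_np` and illustrative names (not dynamo's actual helper): call the requested ndarray method and convert the result back to a `torch.Tensor`.

```python
import numpy as np
import torch

def numpy_method_wrapper(method_name: str):
    def wrapper(tensor: torch.Tensor, *args, **kwargs):
        arr = tensor.detach().cpu().numpy()
        result = getattr(arr, method_name)(*args, **kwargs)
        # return a torch.Tensor when the method produced an ndarray,
        # otherwise pass the value through as-is
        return torch.from_numpy(result) if isinstance(result, np.ndarray) else result
    return wrapper

reshape = numpy_method_wrapper("reshape")
print(reshape(torch.arange(6), (2, 3)))
```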

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97537
Approved by: https://github.com/ezyang
2023-06-12 17:21:31 +00:00
Edward Z. Yang
12cd1dbba0 Handle recursive tuple in clone_inputs (#102979)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102979
Approved by: https://github.com/wconstab
2023-06-05 22:11:48 +00:00
Michael Lazos
c46af25bb3 Initialize optimizer in dynamo to avoid graph break and tracing slowness (#102640)
On calls to `_init_group`, rather than tracing through it, we extract python values from the arguments and call the initialization directly. This avoids having to trace this function, which is very slow with large parameters, and also avoids graph breaking on it. This is sound in this case because the state is only initialized once in the eager case. Guards on the state and params are generated explicitly rather than via tracing the initialization.

Caveats:
`_init_group` also gathers various state tensors into lists, via mutating list arguments, to pass to the functional optimizer implementation. These state tensors exist on the optimizer itself, but we don't know exactly how the gathering is done and which tensors correspond to which attributes of the optimizer module (each optimizer has different states). To rectify this, we keep weak pointers to all of the tensors collected in the lists in globals (similar to how parameter keys are stored for dictionaries). These pointers are guaranteed to be alive as long as the optimizer object is alive (provided the internal state is not interfered with), and they are guarded with weakref guards.
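
A hedged illustration of the weak-pointer bookkeeping described above (names are illustrative, not dynamo's internals):

```python
import weakref
import torch

_state_refs = []  # weakrefs to gathered optimizer state tensors, kept in globals

def remember_state_tensors(tensors):
    _state_refs.extend(weakref.ref(t) for t in tensors)

def state_tensors_alive() -> bool:
    # a guard-like check: the gathered tensors must still be alive
    return all(ref() is not None for ref in _state_refs)

opt = torch.optim.SGD([torch.nn.Parameter(torch.randn(2))], lr=0.1, momentum=0.9)
remember_state_tensors(p for group in opt.param_groups for p in group["params"])
print(state_tensors_alive())
```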

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102640
Approved by: https://github.com/jansel
2023-06-03 15:49:51 +00:00
Mengwei Liu
c304fddf68 [dynamo][numpy] Support graph break for numpy ndarray (#100839)
Issue: #93684

In previous PRs #95849 #99560 we redirect `numpy.*`, `<tensor>.numpy()` calls to `torch_np.*` methods and attributes, by creating `NumpyNdarrayVariable` for those calls.

We need to handle `NumpyNdarrayVariable` when graph break happens.

This PR did 2 things:
1. In `codegen.py` we made sure we can reconstruct the value wrapped by `NumpyNdarrayVariable` to be a `torch_np.ndarray` in the stack whenever we recompile the subgraph.
2. In `builder.py` we can wrap the value to be `NumpyNdarrayVariable` and save it as graph input.

-----

Starting from commit 6:

## A new design for supporting numpy in dynamo

In short the core concept doesn't change: we still convert `numpy` API calls to `torch_np` API calls. However, instead of wrapping a `torch_np.ndarray` in `NumpyNdarrayVariable`, the new design wraps a `torch.Tensor`.

The reason for doing this change is because we need to keep `torch.Tensor` everywhere in the captured graph, so that it works well with the backend of dynamo. See discussions in https://github.com/Quansight-Labs/numpy_pytorch_interop/issues/142 for details.

### Flow
This is an example showing how we think about dynamo working on a simple function:
```python
def f(x: torch.Tensor, y: torch.Tensor):
    a, b = x.numpy(), y.numpy()
    c = np.add(a, b)
    return torch.from_numpy(c)
```
```

              +------------+             +------------+
 torch.Tensor |            |numpy.ndarray|            |
 -------------- .numpy()   --------------|            |
              |            |             |            |             +------------------+
              +------------+             | numpy.add  |numpy.ndarray|                  |torch.Tensor
              +------------+             |            --------------| torch.from_numpy --------------
 torch.Tensor |            |numpy.ndarray|            |             |                  |
 -------------- .numpy()   --------------|            |             +------------------+
              |            |             |            |
              +------------+             +------------+

              +------------+             +----------------+
 torch.Tensor |            |torch.Tensor |                |
 -------------- .detach()  --------------|                |
              |            |             |                |                +----------------+            +------------+
              +------------+             |                |torch_np.ndarray|                |torch.Tensor|            |torch.Tensor
                                         | torch_np.add   -----------------| util.to_tensor -------------| .detach()  --------------
              +------------+             |                |                |                |            |            |
 torch.Tensor |            |torch.Tensor |                |                +----------------+            +------------+
 -------------- .detach()  --------------|                |
              |            |             |                |
              +------------+         |   +----------------+                                   |
                                     |                       wrapper on torch_np.add          |
                                     +--------------------------------------------------------+
```

### Approach

`torch_np` APIs can take both `torch_np.ndarray` and `torch.Tensor`. What we need to do is to have a wrapper for these APIs to convert the return value back to `torch.Tensor`. This way only the wrapper shows up in the captured graph, with `torch.Tensor`s as input and `torch.Tensor` as output.

If we have a graph break or we've traced to the end of the program, we need to inspect all the `NumpyNdarrayVariable` in the stack and convert them back to `numpy.ndarray`, to make sure the compiled version is still behaving the same as the eager version.

### Examples
Here's an example of the graph generated:

```python
def fn(x: np.ndarray, y: np.ndarray):
    a = x.real
    b = y.real
    torch._dynamo.graph_break()
    return np.add(a, 1), np.add(b, 1)
```

Graph generated:

```
[2023-05-16 10:31:48,737] torch._dynamo.output_graph.__graph: [DEBUG] TRACED GRAPH
 __compiled_fn_0 <eval_with_key>.0 opcode         name            target                                                      args                    kwargs
-------------  --------------  ----------------------------------------------------------  ----------------------  --------
placeholder    l_x_            L_x_                                                        ()                      {}
placeholder    l_y_            L_y_                                                        ()                      {}
call_function  from_numpy      <built-in method from_numpy of type object at 0x12b1fdc80>  (l_x_,)                 {}
call_function  from_numpy_1    <built-in method from_numpy of type object at 0x12b1fdc80>  (l_y_,)                 {}
call_function  attr_wrapper    <function attr_wrapper at 0x12e8693a0>                      (from_numpy, 'real')    {}
call_function  attr_wrapper_1  <function attr_wrapper at 0x12e8693a0>                      (from_numpy_1, 'real')  {}
output         output          output                                                      ((),)                   {}

[2023-05-16 10:31:48,908] torch._dynamo.output_graph.__graph: [DEBUG] TRACED GRAPH
 __compiled_fn_2 <eval_with_key>.1 opcode         name           target                                                      args                             kwargs
-------------  -------------  ----------------------------------------------------------  -------------------------------  --------
placeholder    l_a_           L_a_                                                        ()                               {}
placeholder    l_b_           L_b_                                                        ()                               {}
call_function  from_numpy     <built-in method from_numpy of type object at 0x12b1fdc80>  (l_a_,)                          {}
call_function  from_numpy_1   <built-in method from_numpy of type object at 0x12b1fdc80>  (l_b_,)                          {}
call_function  wrapped_add    <Wrapped function <original add>>                           (from_numpy, 1)                  {}
call_function  wrapped_add_1  <Wrapped function <original add>>                           (from_numpy_1, 1)                {}
output         output         output                                                      ((wrapped_add, wrapped_add_1),)  {}

```
### Changes

* `codegen.py`: reconstruct `numpy.ndarray` from `NumpyNdarrayVariable` by adding bytecode to call `utils.to_numpy_helper()`.
*  `output_graph.py`: getting rid of legacy code that does exactly what `codegen.py` does, but which only handled the return case and not the graph break case.
*  `utils.py`: added helpers to convert `numpy.ndarray` to `torch.Tensor` and vice versa. Also added a wrapper class that takes in a function. In `__call__` it calls the function and converts its output to `torch.Tensor` (or a list of them).
* `builder.py`: add method to wrap `numpy.ndarray` graph inputs into `NumpyNdarrayVariable`, by calling `torch.numpy` in the proxy.
* `misc.py`: `numpy` API calls goes into `NumpyVariable` and we find the function with the same name in `torch_np` module, then wrap it with the wrapper defined in `utils.py`.
* `tensor.py`, `torch.py`: proxy `tensor.numpy()` to be `torch.detach()` but wrap it with `NumpyNdarrayVariable`. Similarly, `torch.from_numpy()` -> `torch.detach()` but wrap it with `TensorVariable`. In `NumpyNdarrayVariable`, do the similar `torch_np.ndarray` to `torch.Tensor` wrapping for attributes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100839
Approved by: https://github.com/ezyang
2023-06-03 00:54:25 +00:00
Edward Z. Yang
90b1b17c9f Fix string concatenation with non-string (#102728)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102728
Approved by: https://github.com/Skylion007
2023-06-01 20:02:03 +00:00
Animesh Jain
2fa1b563da [dynamo] Activation checkpoint higher order ops - Reland 101028 (#101790)
https://github.com/pytorch/pytorch/pull/101028 was reverted due to internal breakage. Relanding.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101790
Approved by: https://github.com/zou3519
2023-05-18 19:09:14 +00:00
Yanbo Liang
7052fb37bd [Dynamo] Improve handling UnspecializedNNModuleVariable side effect (#101141)
Fixes #101102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101141
Approved by: https://github.com/jansel
2023-05-16 03:57:13 +00:00
PyTorch MergeBot
d0db7d624d Revert "[dynamo] Activation checkpointing as higher order op (#101028)"
This reverts commit de15e740a1.

Reverted https://github.com/pytorch/pytorch/pull/101028 on behalf of https://github.com/jeanschmidt due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/101028#issuecomment-1548280970))
2023-05-15 17:47:08 +00:00
Michael Lazos
d75f93603a Flatten exceptions in dynamo (#100779)
Fixes https://github.com/pytorch/pytorch/issues/93571

[before and after](https://gist.github.com/mlazos/256b0e8f0f98495752a22b960e9f4fcb)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100779
Approved by: https://github.com/ezyang
2023-05-13 00:58:57 +00:00
Animesh Jain
de15e740a1 [dynamo] Activation checkpointing as higher order op (#101028)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101028
Approved by: https://github.com/voznesenskym, https://github.com/zou3519
2023-05-12 03:17:41 +00:00
Jerry Zhang
c3f3cb5b0f [quant][pt2e] Support conv bn fusion in convert step for QAT flow (#100442)
Summary:
This PR adds support for folding bn weights into conv for the QAT flow; this is equivalent
to the QAT branch of `from_float` in the eager mode quantized conv module: https://github.com/pytorch/pytorch/blob/main/torch/ao/nn/quantized/modules/conv.py#L223
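
For reference, the standard conv+bn folding arithmetic this fusion relies on (a hedged sketch; the QAT convert step additionally has to account for fake-quant nodes around the weights):

```python
import torch

def fold_bn_into_conv(conv_w, conv_b, bn_rm, bn_rv, bn_w, bn_b, eps=1e-5):
    # scale = gamma / sqrt(running_var + eps)
    scale = bn_w / torch.sqrt(bn_rv + eps)
    fused_w = conv_w * scale.reshape(-1, 1, 1, 1)
    if conv_b is None:
        conv_b = torch.zeros_like(bn_rm)
    fused_b = (conv_b - bn_rm) * scale + bn_b
    return fused_w, fused_b
```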

Items that need followup:
* There are some workarounds here because quantize_per_tensor uses float/int args and dynamo does not support these args; this needs a fix after we change the quantized model representation and also change these args to Tensor.

Test Plan: buck2 test @//mode/opt //caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_convert_qat_conv_bn_fusion (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'

Reviewed By: andrewor14

Differential Revision: D45344281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100442
Approved by: https://github.com/kimishpatel
2023-05-09 19:43:51 +00:00
Bin Bao
86ddfc7f68 [inductor] Move cpp wrapper trigger logic to inner_compile (#100611)
Summary: This enables cpp wrapper for backward as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100611
Approved by: https://github.com/jansel
2023-05-08 15:24:02 +00:00
Animesh Jain
3f025c607c summarize graph breaks (#100696)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100696
Approved by: https://github.com/yanboliang
2023-05-05 22:27:47 +00:00
Edward Z. Yang
ce1ad1c143 Add load_storage (#100519)
This adds a new operator debugprims::load_storage which does the unusual thing of loading a tensor from disk (via ContentStoreReader). This will be used in a later PR to implement delta debugging in the minifier, even when the repro is too big to fit into memory. The way it works is that you specify a name of the tensor you want to load, as well as enough metadata to reconstruct the tensor, if the store isn't available. If there is an active content store, we read and return the tensor from that store; otherwise we use `rand_strided` to create it.

I needed some infra improvements to do this:

* `custom_op` now supports factory functions. Factory functions have to be registered specially via `impl_factory`
* I modified `clone_input` to also support dtype conversion, which I use to change the dtype of a loaded tensor if necessary.
* ContentStore needs to work with a device argument, so we torch.load directly to the correct device. This is for fake tensor support.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100519
Approved by: https://github.com/zou3519, https://github.com/anijain2305
2023-05-05 05:25:03 +00:00
Animesh Jain
8994d9e610 [dynamo] Hide guard_fail_hook behind a flag to improve cache lookup time (+10% DebertaV2) (#100590)
For TorchDynamo eager backend, DebertaV2 speedup improves from 0.77x to 0.87x.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100590
Approved by: https://github.com/voznesenskym, https://github.com/wconstab
2023-05-04 18:52:21 +00:00
Edward Z. Yang
c7e9f40653 Misc accuracy improvements on minifier (#100447)
The changes:

* Add config knob `same_two_models_use_fp64` for toggling whether or not to use fp64
* Add a test showing that RMSE is superior to atol/rtol
* Add a `--strict-accuracy` option, which allows testing against integral/boolean accuracy; by default, only regular (floating-point) accuracy is now checked. There's a test which exercises this; it's a little delicate, but I had trouble thinking of a good test otherwise.
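
A hedged sketch of the RMSE-style comparison argued for above (the minifier's actual thresholding differs in detail): accept the compiled result if its RMSE against the fp64 baseline is not much worse than eager's.

```python
import torch

def rmse(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return torch.sqrt(torch.mean((a.double() - b.double()) ** 2))

def accuracy_ok(fp64_ref, eager_res, compiled_res, slack: float = 2.0) -> bool:
    # `slack` is an illustrative multiplier, not the tool's actual constant
    return rmse(fp64_ref, compiled_res) <= slack * rmse(fp64_ref, eager_res) + 1e-8
```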

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100447
Approved by: https://github.com/voznesenskym
2023-05-04 02:51:26 +00:00
kshitij12345
8b64dee5d2 [fix] torch_compile_debug don't log with 0 (#100462)
Fixes https://github.com/pytorch/pytorch/issues/99906

Tested locally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100462
Approved by: https://github.com/mlazos
2023-05-03 08:23:09 +00:00
Richard Zou
984a2397ba Refactor OutputGraph (#99987)
This PR splits OutputGraph into two classes:
- SubgraphTracer (handles FX-tracing)
- OutputGraph (handles Dynamo-specific output graph logic, like
tracking graph inputs, compiling the graph, and executing it).

The motivation behind this is in the next PR up in the stack.
TL;DR is: in order to do higher-order operators, we need nested
SubgraphTracer, one for each level of nesting of the higher-order
operators.

I'm happy to flatten the stack into a single PR, but this separation made
it easier for me to test. Lmk if you want the stack flattened.

Test Plan:
- existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99987
Approved by: https://github.com/anijain2305, https://github.com/voznesenskym
2023-05-02 17:11:02 +00:00
Michael Voznesensky
aafc6ce8cc Produce constant variables in cases where a SymNode is created with a constant (#100144)
` AOT_DYNAMIC_SHAPES=1 TORCHDYNAMO_DYNAMIC_SHAPES=1  benchmarks/dynamo/huggingface.py --performance  --training --amp --backend eager --disable-cudagraphs --device cuda --only AllenaiLongformerBase --explain`

Looks promising!

Goes from:

Dynamo produced 173 graphs covering 2760 ops with 160 graph breaks (14 unique)

To:

Dynamo produced 6 graphs covering 2298 ops with 15 graph breaks (7 unique)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100144
Approved by: https://github.com/ezyang
2023-05-01 21:32:11 +00:00
Edward Z. Yang
2d8deffc1e Refactor repro/minifier into CLI; add analyze (#100226)
This is a two part PR; I can split it if you really want me to.

The first part is a refactor of the after aot repro/minifier scripts to come with a command line interface. I maintain exact BC with the previous interface (so, e.g., you still get a repro.py and a run_minifier.py that do the same thing as before), but each of these scripts also take command line arguments now which you can use to customize what actually happens. Check `run_repro` for full documentation on the arguments.

The second part of this is an implementation of `analyze` subcommand on the new CLI for any repro.

<img width="1277" alt="image" src="https://user-images.githubusercontent.com/13564/235045677-8545aab7-5e83-4813-bbec-47783dc60122.png">

This facility is oriented towards accuracy debugging. It does several things:

1. It will run your model twice and check for nondeterminism in inductor/float64, *even* on intermediate inputs (our benchmarking nondeterminism test only checks for nondeterminism on the final output). This makes localizing which operator is nondeterministic easy.
2. It will run your compiled model side-by-side with eager and float64 variants, and then report when things diverge too far from RMSE delta from float64.

Importantly, it does all this without requiring every intermediate to be held in memory (which will cause an OOM on large repros, such as the one I tested this on.)

Some other minor improvements:

* MinifierTestBase now has an easy to comment out spot that you can use to retain the temporary directory; good for debugging
* We print "running minifier" and "running repro" in MinifierTestBase to make it easier to orient where logs are coming from
* same takes a `log_error` optional argument which you can use to reroute the error logs when things mismatch
* counters["inductor"]["intermediate_hooks"] tracks the number of intermediate hooks we've codegen'ed; good for populate the tqdm interface
* torch.fx.interpreter gets an official `boxed_run` interface which uses the boxed arguments calling convention and doesn't retain inputs unnecessarily long
* torch.utils._content_store gets compute_tensor_metadata/read_tensor_metadata helper functions for computing tensor information without serializing it

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100226
Approved by: https://github.com/bertmaher, https://github.com/bdhirsh, https://github.com/anijain2305
2023-05-01 11:12:38 +00:00
PyTorch MergeBot
89c43f4108 Revert "Produce constant variables in cases where a SymNode is created with a constant (#100144)"
This reverts commit d7bdfd3454.

Reverted https://github.com/pytorch/pytorch/pull/100144 on behalf of https://github.com/ezyang due to ci failure is real ([comment](https://github.com/pytorch/pytorch/pull/100144#issuecomment-1529587039))
2023-05-01 11:10:48 +00:00
Michael Voznesensky
d7bdfd3454 Produce constant variables in cases where a SymNode is created with a constant (#100144)
` AOT_DYNAMIC_SHAPES=1 TORCHDYNAMO_DYNAMIC_SHAPES=1  benchmarks/dynamo/huggingface.py --performance  --training --amp --backend eager --disable-cudagraphs --device cuda --only AllenaiLongformerBase --explain`

Looks promising!

Goes from:

Dynamo produced 173 graphs covering 2760 ops with 160 graph breaks (14 unique)

To:

Dynamo produced 6 graphs covering 2298 ops with 15 graph breaks (7 unique)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100144
Approved by: https://github.com/ezyang
2023-04-30 17:13:57 +00:00
Animesh Jain
03806eddbf [dynamo] Compile torchvision augmentations (#100292)
Resolves https://github.com/pytorch/pytorch/issues/100112

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100292
Approved by: https://github.com/jansel
2023-04-29 02:59:41 +00:00
Larry Liu
f5853342ea [dynamo][numpy] Handle return value being numpy ndarray (#99560)
On top of #95849 this PR is trying to handle the special case when dealing with numpy.

Consider the following example:

```
def f(x: torch.Tensor) -> np.ndarray:
    a = x.numpy()
    return a.T
```
In the previous PR this would error out, because we translate `a.T` into a method call on `torch_np.ndarray.T`, which is also a `torch_np.ndarray`.

This PR handles this case, by conditionally converting a `torch_np.ndarray` to `np.ndarray` before returning, to match the original behavior.

The compiled version will be:

```
def f(x):
    ___tmp_0 = __compiled_fn_0(x)
    if isinstance(___tmp_0, torch_np.ndarray):
        return ___tmp_0.tensor.numpy()
    else:
        return ___tmp_0
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99560
Approved by: https://github.com/jansel, https://github.com/yanboliang
2023-04-27 16:18:35 +00:00
Larry Liu
687afeb686 [dynamo][numpy] Add NumpyTensorVariable to translate ndarray attribute calls to tensor attributes (#95849)
Issue: #93684

# Problem

Reduce graph breaks when dynamo compiles python functions containing numpy functions and ndarray operations.

# Design (as I know it)

* Use torch_np.ndarray (a wrapper of a tensor) to back a `VariableTracker`: `NumpyTensorVariable`.
* Translate all attribute and method calls on ndarray to their torch_np.ndarray equivalents.

This PR adds `NumpyTensorVariable` and supports:
1.  tensor to ndarray, ndarray to tensor
2. numpy functions such as numpy.meshgrid()
3. ndarray attributes such as `itemsize`, `stride`

Next PR will handle returning `np.ndarray` and add support for ndarray methods
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95849
Approved by: https://github.com/ezyang
2023-04-27 16:18:35 +00:00
Animesh Jain
3dcc7b396c [easy] iterate dict with sorted keys for accuracy checking (#99793)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99793
Approved by: https://github.com/jansel
2023-04-24 21:26:35 +00:00
Edward Z. Yang
f602b3a6ae Preserve mark_dynamic when cloning inputs (#99617)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99617
Approved by: https://github.com/ngimel, https://github.com/voznesenskym, https://github.com/anijain2305
2023-04-22 19:46:31 +00:00
Michael Voznesensky
0ac0d9d224 Pass locals to enum_repr to correctly make the guard str for enums (#99680)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99680
Approved by: https://github.com/jansel
2023-04-21 07:14:49 +00:00
Yanbo Liang
05809c7d3b [Dynamo] No graph break for explicit calling Conv{1/2/3}d.forward & ConvTranspose{1/2/3}d.forward (#99015)
Before this PR, if users call ```Conv2d(x)```, dynamo handles it well (no graph break) and puts a ```call_module``` op in the FX graph. However, if users explicitly call ```Conv2d.forward(x)``` in another ```forward``` function, the inlining would fail (causing a graph break). This PR fixes this issue by translating the explicit ```Conv2d.forward(x)``` to ```Conv2d(x)```.
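
A hedged sketch of the pattern this enables (with `fullgraph=True` to surface any graph break):

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)

    def forward(self, x):
        # explicit .forward() call that used to cause a graph break
        return self.conv.forward(x)

opt = torch.compile(M(), backend="eager", fullgraph=True)
opt(torch.randn(1, 3, 16, 16))
```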

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99015
Approved by: https://github.com/jansel, https://github.com/wconstab
2023-04-15 08:04:13 +00:00
Michael Voznesensky
10fbdcf72c Re-PR of 90269 - Force all nn_module associated tensors to be static (#99108)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99108
Approved by: https://github.com/ezyang
2023-04-14 05:53:48 +00:00
Angela Yi
1d077f28ed [export] Constraints API (#98433)
Wrapper for users to insert constraints into model code.

The constraints will not be maintained in the graph after tracing through make_fx, so retracing with dynamo/make_fx will not work. This will be supported after torch._assert support is implemented. Then we can convert the constrain_range calls to torch._asserts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98433
Approved by: https://github.com/avikchaudhuri, https://github.com/tugsbayasgalan
2023-04-13 21:20:10 +00:00
PyTorch MergeBot
ab761605ae Revert "[export] Constraints API (#98433)"
This reverts commit 1510eb4072.

Reverted https://github.com/pytorch/pytorch/pull/98433 on behalf of https://github.com/izaitsevfb due to Breaks internal tests, asked by author to revert
2023-04-12 23:37:19 +00:00
PyTorch MergeBot
629377ea8b Revert "Replace _dynamo.config with an object instead of module (#96455)"
This reverts commit 420104a886.

Reverted https://github.com/pytorch/pytorch/pull/96455 on behalf of https://github.com/jansel due to BC breaking, was landed prematurely
2023-04-12 15:06:14 +00:00
Angela Yi
1510eb4072 [export] Constraints API (#98433)
Wrapper for users to insert constraints into model code.

The constraints will not be maintained in the graph after tracing through make_fx, so retracing with dynamo/make_fx will not work. This will be supported after torch._assert support is implemented. Then we can convert the constrain_range calls to torch._asserts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98433
Approved by: https://github.com/avikchaudhuri, https://github.com/tugsbayasgalan
2023-04-12 01:32:44 +00:00
Han Qi
420104a886 Replace _dynamo.config with an object instead of module (#96455)
Summary:
    Replace _dynamo.config with an object instead of module

    Current usage patterns of setting and reading fields on config will work
    unchanged.

    The only changes needed going forward are:
    1. import torch._dynamo.config will no longer work. However, just doing
       import torch._dynamo is sufficient to access the dynamo config
       as torch._dynamo.config.

    2. Files inside the _dynamo folder need to access the config via
       from torch._dynamo.config_util import config instead of
       from torch._dynamo import config, because _dynamo/__init__.py
       imports some of those files and importing config from it would
       create a circular import. (A short sketch of these import patterns
       follows below.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96455
Approved by: https://github.com/williamwen42
2023-04-11 21:23:32 +00:00
Edward Z. Yang
b8b840be3d Convert logging f-strings to use % format, part five (#98765)
This does some annoying but simple cases by hand.
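A minimal example of the pattern being converted (not taken from the patch): an f-string is formatted eagerly even when the log level filters the message, while %-style arguments are only formatted if the record is actually emitted.

```python
import logging

log = logging.getLogger(__name__)
graph = "...large graph dump..."

# before: formatted eagerly, even if DEBUG logging is disabled
log.debug(f"compiled graph:\n{graph}")

# after: formatting deferred until a handler actually emits the record
log.debug("compiled graph:\n%s", graph)
```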

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98765
Approved by: https://github.com/wanchaol
2023-04-11 13:17:59 +00:00
Edward Z. Yang
822464567f Lazily format graphs for debug printing (#98776)
The current code unconditionally formats the graphs, which is a
waste of CPU if no one looks at them.
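A minimal sketch (not the PR's actual code) of deferring the formatting behind a wrapper whose __str__ only runs when a handler renders the record:

```python
import logging

log = logging.getLogger(__name__)

class LazyGraphStr:
    def __init__(self, gm):
        self.gm = gm

    def __str__(self):
        # the expensive formatting happens here, and only if the record is emitted
        return str(self.gm.graph)

# usage: log.debug("%s", LazyGraphStr(gm))
```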

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98776
Approved by: https://github.com/albanD, https://github.com/mlazos
2023-04-10 22:41:33 +00:00
Edward Z. Yang
b09722f540 Convert logging f-strings to use % format, part two (#98700)
This hits multi-line logging strings

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98700
Approved by: https://github.com/voznesenskym
2023-04-10 12:19:31 +00:00
Edward Z. Yang
9a8f71f23e Convert logging f-strings to use % format (#98697)
Codemod done with
https://gist.github.com/ezyang/2e8b0463cdc6be278478495b23ff0530 with
assistance from ChatGPT.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98697
Approved by: https://github.com/voznesenskym
2023-04-10 12:19:31 +00:00
YJ Shi
5ceae85f1c [Dynamo] Include UserDict in clone_inputs (#97725)
Fixes #97724

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97725
Approved by: https://github.com/yanboliang
2023-04-08 00:19:35 +00:00
Horace He
c75dd7c413 grab bag of changes (#98572)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98572
Approved by: https://github.com/shunting314, https://github.com/mlazos
2023-04-07 20:02:59 +00:00
Will Constable
390c51bf87 Skip nnmodule hook guards by default (#98371)
This PR makes basic nnmodule forward hooks work by default, without any overhead. However, it leaves silent correctness issues if users modify or remove their hooks later, so it also emits a warning. (A minimal usage sketch follows the lists below.)

- the usual case is to not use hooks, so avoid guard overhead here
- registering any hook before compile will trigger a warning about hook support
- registering a hook later (or removing one) requires user knowledge and opting in;
  currently this isn't warnable (but maybe we can observe compiled nnmodules to make it
  warnable).

Why skip hook guards by default instead of not tracing __call__/hooks by default?
- avoid having a mode flag that alters dynamo tracing behavior (harder to test both codepaths
  in CI with full coverage)
- the most basic hook use case (registering a hook before compile and never removing it)
  will work by default with this PR, while it would require enablement and incur overhead
  under the 'not tracing __call__' proposal.
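A minimal sketch of the supported-by-default case described above (hook registered before compile and never removed; names here are illustrative):

```python
import torch
import torch.nn as nn

mod = nn.Linear(4, 4)

def log_output(module, inputs, output):
    print("forward hook fired, output shape:", output.shape)

# register the hook before compiling and never modify/remove it afterwards
mod.register_forward_hook(log_output)

compiled = torch.compile(mod)
compiled(torch.randn(2, 4))
```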

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98371
Approved by: https://github.com/jansel
2023-04-07 15:10:51 +00:00
Edward Z. Yang
d01ee10b25 Add detect_fake_mode (#98321)
This replaces fake_mode_from_tensors, but it preferentially looks for a fake mode
in the TracingContext, and then for an active fake mode on the dispatch stack,
before groveling through the tensors to find one.

This advances PegasusForCausalLM, which was previously failing because we
generated a graph that had a (non-fake) parameter and a SymInt, and we
therefore failed to detect the correct fake mode.
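A simplified sketch of the lookup order described above (not the actual implementation):

```python
from torch._subclasses.fake_tensor import FakeTensor

def detect_fake_mode_sketch(inputs, tracing_context=None, active_modes=()):
    # 1) prefer a fake mode recorded on the TracingContext
    if tracing_context is not None and getattr(tracing_context, "fake_mode", None) is not None:
        return tracing_context.fake_mode
    # 2) then any fake mode already active on the dispatch stack
    if active_modes:
        return active_modes[0]
    # 3) finally, grovel through the inputs for a FakeTensor and use its mode
    for t in inputs:
        if isinstance(t, FakeTensor):
            return t.fake_mode
    return None
```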

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98321
Approved by: https://github.com/voznesenskym
2023-04-05 22:15:16 +00:00
Yanbo Liang
b1c2925493 [Dynamo] Support typing.Union and typing.Optional (#98384)
Fixes #98265

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98384
Approved by: https://github.com/ezyang
2023-04-05 21:31:52 +00:00
Michael Voznesensky
b1e60bfb6a Pass f_locals as a dict rather than kwargs (#98107)
Fixes https://github.com/pytorch/pytorch/issues/97688

One big problem is that instead of printing x < y we now print
`E["x"] < E["y"]`, so all of the tests wobbled and I'm mad.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98107
Approved by: https://github.com/ezyang
2023-04-04 00:30:08 +00:00
Yanbo Liang
a6bd21d935 [Dynamo] Eagerly initializing Lazy Module to reduce graph breaks (#97946)
Fixes a Meta-internal use case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97946
Approved by: https://github.com/wconstab
2023-04-03 22:24:43 +00:00
Jason Ansel
35b3309539 Fix graph break from inline patched init (#98150)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98150
Approved by: https://github.com/anijain2305, https://github.com/yanboliang
2023-04-03 01:11:30 +00:00
Michael Lazos
ee9a9b7add Remove old logging callsites (#98095)
Get around GH first issue, OSS only changes for https://github.com/pytorch/pytorch/pull/97182

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98095
Approved by: https://github.com/anijain2305
2023-04-01 00:57:37 +00:00
William Wen
14ef91cea6 [dynamo 3.11] small bug fixes (#96508)
Bugs fixed:
	- CALL_FUNCTION_EX expects null pop in symbolic_convert
	- make_function_with_closure codegen requires a push_null
	- copy over the closure in eval_frame.c
	- add JUMP_FORWARD to terminal opcodes
	- enum repr fix in utils.py
	- fix symbolic_convert's break_graph_if_unsupported wrapper

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96508
Approved by: https://github.com/jansel
2023-03-31 18:18:12 +00:00
David Berard
c218309f88 [dynamo] profiler.record_function on all dynamo_timed functions (#96495)
**Summary**: profiler.record_function inserts an event into the chrome trace generated by the pytorch profiler. This PR adds record_function around every function annotated with @dynamo_timed.

dynamo_timed and the CLI viewer torch._dynamo.utils.compile_times() are already useful on their own; but for identifying _when_ these get called, it's nice to be able to view them in the profiler chrome trace.

Why not just turn on python stack traces in the profiler to get this information? Dynamo compilation is implemented in python and therefore produces a huge number of events when it records compilation steps. The resulting trace files are often too large to load in chrome://tracing, and they take a long time to generate. Additionally, the stack traces are deep enough that they are often hard to read. This approach produces much more readable traces with lower overhead.

**Tests**:
- Added in test/dynamo/test_profiler.py. Verified in https://github.com/pytorch/pytorch/actions/runs/4559322864/jobs/8043307798?pr=96495 that the tests are actually running.
- Performance run with `ciflow/inductor-perf-compare` shows no noticeable change in compilation time or speedup numbers. Geomean speedup changes from 1.275 -> 1.277. Geomean compilation times change from 54.2s -> 53.8s. That's likely just due to noise. All individual benchmark numbers regressed by no more than 5% between the two runs; and we see improvements of around the same magnitude, suggesting this is, again, just noise. For meta employees, you can see the results in a google sheets here: https://docs.google.com/spreadsheets/d/1Ki69XvcgxcA3ZnqC5n_jav5KiD4u7Wojlad3VTnIdlk/edit?usp=sharing

**Example**:

Run this:

```python
import torch

def gn(x):
    return x.sin().cos()

def fn(x, y):
    return x.sin() * y.cos()

x, y = [torch.rand((2, 2), device='cuda') for _ in range(2)]

# just to clear out any lazy initialization
with torch.profiler.profile() as prof:
    torch.compile(gn)(x)

with torch.profiler.profile() as prof:
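    # this second trace captures the dynamo compilation steps (for fn) as record_function events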
    torch.compile(fn)(x, y)

prof.export_chrome_trace("./dynamo_timed_profile.json")
```

and we can see that the resulting trace shows important dynamo steps, even when python tracing is turned off.

Screenshot (chrome trace showing dynamo compilation steps, 2023-03-29): https://user-images.githubusercontent.com/5067123/228712263-8ae67ab9-1a52-4765-a9c2-7c5cf0abe2f5.png

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96495
Approved by: https://github.com/ngimel, https://github.com/mlazos
2023-03-30 21:49:02 +00:00
Edward Z. Yang
fb7f983357 Graph break on operators that fake tensor doesn't support (#97708)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97708
Approved by: https://github.com/eellison
2023-03-28 19:49:54 +00:00
vfdev
0f424f7f05 Fixed broken link to troubleshooting.html docs page (#97330)
Seen first in error message:
```
[2023-03-22 10:30:39,786] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (64)
   function: '<resume in paste_mask_in_image>' (/vision/torchvision/models/detection/roi_heads.py:407)
   reasons:  w == 857
to diagnose recompilation issues, see https://pytorch.org/docs/master/dynamo/troubleshooting.html.
[2023-03-22 10:30:40,036] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (64)
   function: '<resume in paste_mask_in_image>' (/vision/torchvision/models/detection/roi_heads.py:406)
   reasons:  ___stack0 == 207
to diagnose recompilation issues, see https://pytorch.org/docs/master/dynamo/troubleshooting.html.
```

Broken link:
- https://pytorch.org/docs/master/dynamo/troubleshooting.html.

Good link:
- https://pytorch.org/docs/master/compile/troubleshooting.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97330
Approved by: https://github.com/zou3519
2023-03-22 16:40:21 +00:00