Commit Graph

4261 Commits

Guilherme Leobas
8603a1c870 Support generators (#141055)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141055
Approved by: https://github.com/zou3519
2025-02-08 22:42:12 +00:00
eellison
92b7e610ab [Inductor changes] Invoke Quant (#139102)
Adds a `invoke_quant` higher order operator as proposed [here](https://docs.google.com/document/d/1s2PfJlq6Q1F8l11CkTIC69BW1rEnGEgs6YmBC7hu8rA/edit?tab=t.0).

The primary motivations are

- Unifying scattered reasoning for quant operators throughout the code base

- Ease of pattern matching - see this very large pattern match expression [here](949fdd2997/torch/_inductor/fx_passes/post_grad.py (L390-L426)). Compared to the pattern I have in the tests:

```
        @register_graph_pattern(
            CallFunction(
                torch.ops.aten.mm,
                CallFunction(
                    torch.ops.higher_order.invoke_quant,
                    Ignored(),
                    Ignored(),
                    Ignored(),
                    scheme="nf4",
                ),
                Arg(),
            ),
            pass_dict=test_pass,
        )
```

- Ability to specify inductor-specific logic, like codegen'ing the operators in lower precision, or forcing fusion to a matmul.

Example graph:

``` Python
 ===== AFTER POST GRAD =====
 /data/users/eellison/pytorch/torch/fx/_lazy_graph_module.py class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: "f32[8][1]cpu", arg1_1: "f32[8][1]cpu"):
         # File: /data/users/eellison/pytorch/torch/_higher_order_ops/invoke_quant.py:87 in __call__, code: return invoke_quant_tracer(*args, **kwargs, quant_options=self)  # type: ignore[call-arg]
        repeated_subgraph0 = self.repeated_subgraph0
        invoke_quant: "f32[8][1]cpu" = torch.ops.higher_order.invoke_quant(repeated_subgraph0, arg0_1, arg1_1, scheme = 'nf4');  repeated_subgraph0 = arg0_1 = arg1_1 = None
        return (invoke_quant,)

    class repeated_subgraph0(torch.nn.Module):
        def forward(self, arg0_1: "f32[8][1]cpu", arg1_1: "f32[8][1]cpu"):
             # File: /data/users/eellison/pytorch/torch/_higher_order_ops/invoke_quant.py:87 in __call__, code: return invoke_quant_tracer(*args, **kwargs, quant_options=self)  # type: ignore[call-arg]
            mul: "f32[8][1]cpu" = torch.ops.aten.mul.Tensor(arg0_1, arg1_1);  arg0_1 = None
            add: "f32[8][1]cpu" = torch.ops.aten.add.Tensor(mul, arg1_1);  mul = arg1_1 = None
            return add
```

The schema for `invoke_quant` is `torch.ops.higher_order.invoke_quant(subgraph, *args, scheme=None)` where the scheme will not always be present.

I wasn't sure exactly how the inductor-specific configurations like `codegen_low_precision` should be passed through. I didn't want to stuff them all in as kwargs, and I didn't want to have them affect pattern matching, so they will be stored in the meta of the node itself. Following that, I wanted the invocation of the hop to match how it will show up in the graph, so I decided to have it be an object that is then invoked for the tracing.

```
invoke_quant = InvokeQuant(codegen_low_precision=True)
invoke_quant(gn, (x, y), scheme="nf4")
```
Todo: stop requiring the packing of args in a tuple; will do so following https://github.com/pytorch/pytorch/pull/139162.

Feedback welcome.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139102
Approved by: https://github.com/Chillee
2025-02-08 19:30:19 +00:00
James Wu
76c8a2dc48 Fix get_top() to return the base level event of the stack, not the most recently started event (#146649)
`get_top()` is really confusing when talking about a stack, because it can mean either the most recently started event on the stack or the top-level event in perfetto (which displays the stack upside down). Rename it to `get_outermost` and fix the associated bug, so that it returns the correct value from the stack.
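
A minimal sketch of the distinction, assuming a simple list-as-stack model (all names besides `get_outermost` are illustrative):

```python
# Events are pushed as they start, so the base-level event sits at index 0.
stack = ["dynamo", "aot_autograd", "inductor"]

def get_outermost(stack):
    # the base-level event, which perfetto displays as the top row
    return stack[0]

def old_get_top(stack):
    # the most recently started event -- the wrong place to attach
    # frame-level fields like guard_latency_us
    return stack[-1]

assert get_outermost(stack) == "dynamo"
assert old_get_top(stack) == "inductor"
```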

Running nanogpt now puts `guard_latency_us` correctly in the `dynamo` event:
```
tlp python benchmarks/dynamo/torchbench.py --backend inductor --device cuda --only nanogpt --amp --cold-start-latency --print-compilation-time --training --performance 2>&1 --dynamic-shapes | tee out.log
```
<img width="1281" alt="image" src="https://github.com/user-attachments/assets/4eeb371a-4d81-415a-acc4-7d303a4b2a93" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146649
Approved by: https://github.com/masnesral, https://github.com/anijain2305
2025-02-07 18:04:50 +00:00
Animesh Jain
ee45ea599d [dynamo] Actionable message on recompilations for fullgraph=True (#146550)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146550
Approved by: https://github.com/zou3519, https://github.com/StrongerXi
ghstack dependencies: #146553
2025-02-07 17:28:43 +00:00
Animesh Jain
fa0956951c [dynamo] Remove the suggestion to use suppress_errors on compiler error (#146553)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146553
Approved by: https://github.com/zou3519, https://github.com/jansel
2025-02-07 17:28:43 +00:00
Animesh Jain
99ddbb4802 [dynamo][fullgraph] Do not skip frame with fullgraph=True (#146527)
Earlier, if there were no ops in the graph, fullgraph=True would also fall back to eager. This hid issues in testing, where we silently fell back to eager and did not test the optimized bytecode. As can be seen in the PR, I had to fix several tests when I forced the use of the optimized bytecode in the absence of a graph. A few failing tests will be fixed in follow-up PRs.
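
A minimal illustration of the kind of frame affected, assuming an op-free traced graph:

```python
import torch

@torch.compile(fullgraph=True)
def identity(x):
    # The traced graph contains no ops; previously this silently fell back
    # to eager, so the generated bytecode was never exercised in tests.
    return x

identity(torch.ones(2))
```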

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146527
Approved by: https://github.com/zou3519, https://github.com/StrongerXi
2025-02-06 18:56:07 +00:00
Animesh Jain
e2e265e27b [dynamo] Use polyfill to implement comparison operators (#144485)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144485
Approved by: https://github.com/jansel
2025-02-06 17:27:07 +00:00
Simon Fan
a14c780c4c [dynamo] fix dynamo_compile logging on RecompileLimitExceeded (#146544)
Logging branches based on whether RecompileLimitExceeded was raised or not. If we exceed the limit, we fall back to eager before even trying to analyze the frame. We handle RecompileLimitExceeded outside of the try/catch/finally that edits the metrics context:
72405b0c0f/torch/_dynamo/convert_frame.py (L908-L935).

dynamo_config and recompile_reason are both known before we raise RecompileLimitExceeded, so we can add them with the rest of the "common" metrics, which are logged when the metrics-context decorator exits, which is always called.
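
A minimal sketch of that control flow; every name here is an illustrative stand-in, not dynamo's actual internals:

```python
class RecompileLimitExceeded(Exception):
    pass

class MetricsContext:
    # stand-in for the context that accumulates dynamo_compile metrics
    def __init__(self):
        self.metrics = {}

    def update(self, **kwargs):
        self.metrics.update(kwargs)

    def log(self):
        print("dynamo_compile:", self.metrics)

def convert_frame(ctx, limit_hit):
    if limit_hit:
        # Both values are known before raising, so record them with the
        # rest of the "common" metrics.
        ctx.update(recompile_reason="cache limit hit", dynamo_config="{...}")
        raise RecompileLimitExceeded  # handled outside the try/finally below
    try:
        ctx.update(compiled=True)  # the normal compile path edits the context
    finally:
        pass

ctx = MetricsContext()
try:
    convert_frame(ctx, limit_hit=True)
except RecompileLimitExceeded:
    pass  # fall back to eager
ctx.log()  # common metrics are still logged on context exit
```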

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146544
Approved by: https://github.com/masnesral
2025-02-06 16:20:42 +00:00
PyTorch MergeBot
1b79d47635 Revert "[dynamo] check for incompatible configs (#146513)"
This reverts commit aab7925418.

Reverted https://github.com/pytorch/pytorch/pull/146513 on behalf of https://github.com/atalman due to inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_dynamo_bisect [GH job link](https://github.com/pytorch/pytorch/actions/runs/13174131431/job/36772837627) [HUD commit link](4a545eb85d) ([comment](https://github.com/pytorch/pytorch/pull/146513#issuecomment-2639860568))
2025-02-06 13:42:25 +00:00
Animesh Jain
340cfe4f28 [dynamo][fbcode] Turn on inline_inbuilt_nn_modules (#145407)
As title.

Some internal testing at https://fb.workplace.com/groups/241460628989036/permalink/411650015303429/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145407
Approved by: https://github.com/ezyang, https://github.com/jansel
2025-02-06 13:18:35 +00:00
Simon Fan
aab7925418 [dynamo] check for incompatible configs (#146513)
internal: https://fb.workplace.com/groups/1075192433118967/permalink/1599802033991335/

Assuming flags don't change during compilation, we shouldn't allow incompatible configs to be set at torch.compile wrap time.

Not in this PR: For flags that need to change during compilation, we'd have to be strict about where they can be used in the compile lifecycle

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146513
Approved by: https://github.com/williamwen42
2025-02-06 07:39:52 +00:00
bobrenjc93
389c5c0842 print out partial fx graph for all data-dependent errors (#146363)
The previous implementation didn't catch the following type of error:

```
torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not extract specialized integer from data-dependent expression u2 (unhinted: u2).  (Size-like symbols: none)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146363
Approved by: https://github.com/angelayi, https://github.com/bdhirsh
ghstack dependencies: #146298, #146296
2025-02-06 04:21:34 +00:00
Simon Fan
72405b0c0f [ca] refactor compile reasons and log to tlparse (#146386)
This PR accumulates compile reasons inside each CacheNode and logs them to tlparse on each CA compile. This defines a compile as an autograd structure change, and a recompile as a dynamic shape change.

sample tlparse: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpdbo7gt/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100

for compiles:
```python
[
  "!0: Cache miss due to new autograd node: torch::autograd::GraphRoot (NodeCall 0) with key size 39, previous key sizes=[]"
]
```

for recompiles:
```python
[
  "!0: Cache miss due to new autograd node: torch::autograd::GraphRoot (NodeCall 0) with key size 39, previous key sizes=[]",
  "!1: Cache miss due to 7 changed tensor shapes (total of 7): sizes[0], sizes[1], sizes[2], sizes[3], sizes[4], sizes[5], sizes[6]"
]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146386
Approved by: https://github.com/jansel
ghstack dependencies: #146229
2025-02-05 23:33:21 +00:00
Simon Fan
e20b0c82d1 [ca] no longer require is_traceable annotations for c++ autograd functions (#146229)
This PR removes the CA compile-time error for C++ autograd functions, and supports them by having dynamo graph break on them (instead of allow_in_graph). The CppNode's collect calls are kept as-is for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146229
Approved by: https://github.com/jansel, https://github.com/zou3519
2025-02-05 08:49:17 +00:00
clr
93d98aca31 inductor: Don't throw an internal error when a nn.module is missing a attribute (#145122)
If an nn.Module getattr call throws, we should make sure that we don't crash with an internal error.

Note that I couldn't figure out how to test this, so advice would be welcome. I have my best attempt at https://github.com/pytorch/pytorch/pull/145799, but it doesn't seem to reproduce the crash.
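
A minimal repro shape for the failure mode, assuming the crash comes from tracing an attribute access whose lookup itself raises (illustrative, not the PR's actual test):

```python
import torch

class Broken(torch.nn.Module):
    def __getattr__(self, name):
        # Attribute lookup raises something other than AttributeError.
        raise RuntimeError(f"lookup of {name!r} failed")

mod = Broken()

@torch.compile
def f(x):
    # Tracing this access should surface a user-facing error,
    # not an internal compiler error.
    return x + mod.scale
```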

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145122
Approved by: https://github.com/jansel
2025-02-05 05:49:32 +00:00
Michael Lazos
616ac94175 [Dynamo] Fix spammy optimizer warning (#146374)
Fixes https://discuss.pytorch.org/t/torch-compile-optimizer-step-generates-excessive-warning-messages/216067/7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146374
Approved by: https://github.com/anijain2305
2025-02-05 01:03:49 +00:00
clr
4e194bbfd6 dynamo: fsdp throw unimplemented vs attribute error (#146188)
Rather than throwing a full exception for FSDP, just return unimplemented and respect the user's options (i.e. fullgraph vs. graph break).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146188
Approved by: https://github.com/jansel
2025-02-04 21:45:55 +00:00
Yanbo Liang
07b9fe0690 [Trace PyDispatcher] Add CustomFunctionHigherOrderOperatorVariable (#146272)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146272
Approved by: https://github.com/zou3519
ghstack dependencies: #146270, #146271
2025-02-04 20:55:51 +00:00
Aaron Gokaslan
7f65a20884 [BE]: Enable ruff SLOT checks (#146276)
This enables a check that a class which only inherits from immutable classes like str, tuple, and NamedTuple also defines `__slots__`, so its instances don't allocate memory unnecessarily. This also ensures contributors think about how they define classes that subclass NamedTuple and str, of which we have many in our codebase.
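
A minimal sketch of what the rule flags (illustrative classes):

```python
class BadLabel(str):   # flagged: str subclass without __slots__,
    pass               # so every instance carries a needless __dict__

class GoodLabel(str):  # compliant: instances allocate no __dict__
    __slots__ = ()

assert hasattr(BadLabel("x"), "__dict__")
assert not hasattr(GoodLabel("x"), "__dict__")
```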

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146276
Approved by: https://github.com/aorenste
2025-02-04 19:18:23 +00:00
bobrenjc93
c591ad0c03 dump partial fx graph to stderr when dynamo tracing fails with guard on data-dependent (#146296)
As discussed with @avikchaudhuri and @bdhirsh last week, this can be quite useful when debugging.

The following code produces a data dependent error

```
import torch
from torch import nn

# UserError: Could not guard on data-dependent expression Eq(507 - u0, 0) (unhinted: Eq(507 - u0, 0)).  (Size-like symbols: u0)
class Repro(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, cache, update, pos):
        _, _, max_seq_len, _ = cache.shape
        _, _, seqlen, _ = update.shape

        pos_item = pos[0].item() # u0
        torch._check(pos_item + seqlen <= max_seq_len) # u0 + 502 <= 507
        torch._check(pos_item >= 0)
        before = cache.narrow(2, 0, pos_item)

        # FAIL
        # Laith: why can't we make unbacked expressions size-like?
        after = cache.narrow(2, (pos_item + seqlen), (max_seq_len - pos_item - seqlen))

        # PASS
        end = torch.tensor(max_seq_len - pos_item - seqlen).item()
        after = cache.narrow(2, (pos_item + seqlen), end)

        return torch.cat([before, update, after], dim=2)

repro = Repro()

bsz = 1
n_heads = 4
max_seq_len = 512
head_dim = 64
seqlen = 5
pos_item = 1

cache = torch.zeros(bsz, n_heads, max_seq_len, head_dim)
update = torch.ones(bsz, n_heads, seqlen, head_dim)
pos = torch.tensor([pos_item])
example_inputs = (cache, update, pos)

torch.export.export(repro, example_inputs)
```

This is what it now prints out

```
class GraphModule(torch.nn.Module):
    def forward(self, L_cache_: "f32[1, 4, 512, 64][131072, 32768, 64, 1]cpu", L_update_: "f32[1, 4, 5, 64][1280, 320, 64, 1]cpu", L_pos_: "i64[1][1]cpu"):
        l_cache_ = L_cache_
        l_update_ = L_update_
        l_pos_ = L_pos_

         # File: /data/users/bobren/a/pytorch/r1.py:14 in forward, code: pos_item = pos[0].item() # u0
        getitem: "i64[][]cpu" = l_pos_[0];  l_pos_ = None
        item: "Sym(u0)" = getitem.item();  getitem = None

         # File: /data/users/bobren/a/pytorch/r1.py:15 in forward, code: torch._check(pos_item + seqlen <= max_seq_len) # u0 + 502 <= 507
        add: "Sym(u0 + 5)" = item + 5
        le: "Sym(u0 + 5 <= 512)" = add <= 512;  add = None
        _check = torch._check(le);  le = _check = None

         # File: /data/users/bobren/a/pytorch/r1.py:16 in forward, code: torch._check(pos_item >= 0)
        ge: "Sym(u0 >= 0)" = item >= 0
        _check_1 = torch._check(ge);  ge = _check_1 = None

         # File: /data/users/bobren/a/pytorch/r1.py:17 in forward, code: before = cache.narrow(2, 0, pos_item)
        before: "f32[1, 4, u0, 64][131072, 32768, 64, 1]cpu" = l_cache_.narrow(2, 0, item);  before = None

         # File: /data/users/bobren/a/pytorch/r1.py:21 in forward, code: after = cache.narrow(2, (pos_item + seqlen), (max_seq_len - pos_item - seqlen))
        add_1: "Sym(u0 + 5)" = item + 5
        sub: "Sym(512 - u0)" = 512 - item;  item = None
        sub_1: "Sym(507 - u0)" = sub - 5;  sub = None
        narrow_1 = l_cache_.narrow(2, add_1, sub_1);  l_cache_ = add_1 = sub_1 = narrow_1 = None

Traceback (most recent call last):
  File "/data/users/bobren/a/pytorch/torch/_dynamo/utils.py", line 3075, in run_node
    return getattr(args[0], node.target)(*args[1:], **kwargs)
  File "/data/users/bobren/a/pytorch/torch/utils/_stats.py", line 27, in wrapper
    return fn(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1267, in __torch_dispatch__
    return self.dispatch(func, types, args, kwargs)
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1808, in dispatch
    return self._cached_dispatch_impl(func, types, args, kwargs)
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1369, in _cached_dispatch_impl
    output = self._dispatch_impl(func, types, args, kwargs)
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 2282, in _dispatch_impl
    decomposition_table[func](*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/_decomp/decompositions.py", line 759, in slice_forward
    return self.as_strided(sizes, strides, storage_offset)
  File "/data/users/bobren/a/pytorch/torch/utils/_stats.py", line 27, in wrapper
    return fn(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1267, in __torch_dispatch__
    return self.dispatch(func, types, args, kwargs)
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1808, in dispatch
    return self._cached_dispatch_impl(func, types, args, kwargs)
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1370, in _cached_dispatch_impl
    entry = self._make_cache_entry(state, key, func, args, kwargs, output)
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1640, in _make_cache_entry
    output_info = self._get_output_info_for_cache_entry(
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1583, in _get_output_info_for_cache_entry
    synth_output = self._output_from_cache_entry(
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1738, in _output_from_cache_entry
    return self._get_output_tensor_from_cache_entry(
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1709, in _get_output_tensor_from_cache_entry
    empty.set_(storage, storage_offset, shape, stride)
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/sym_node.py", line 564, in guard_size_oblivious
    r = self.shape_env.evaluate_expr(
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/recording.py", line 263, in wrapper
    return retlog(fn(*args, **kwargs))
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/symbolic_shapes.py", line 6468, in evaluate_expr
    return self._evaluate_expr(
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/symbolic_shapes.py", line 6658, in _evaluate_expr
    raise self._make_data_dependent_error(
torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not guard on data-dependent expression Ne(507 - u0, 1) (unhinted: Ne(507 - u0, 1)).  (Size-like symbols: u0)

Caused by: after = cache.narrow(2, (pos_item + seqlen), (max_seq_len - pos_item - seqlen))  # r1.py:21 in forward (utils/_stats.py:27 in wrapper)
For more information, run with TORCH_LOGS="dynamic"
For extended logs when we create symbols, also add TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="u0"
If you suspect the guard was triggered from C++, add TORCHDYNAMO_EXTENDED_DEBUG_CPP=1
For more debugging help, see https://docs.google.com/document/d/1HSuTTVvYH1pTew89Rtpeu84Ht3nQEFTYhAX3Ypa_xJs/edit?usp=sharing
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146296
Approved by: https://github.com/zou3519
ghstack dependencies: #146298
2025-02-04 19:12:39 +00:00
Aaron Gokaslan
292af3cc89 [BE][Ez]: ISC001 Auto concatenate implicit one line strings (#146408)
Apply the ruff rule about implicit string concatenation; this autofixes strings that are all the same type and on the same line. These lines were likely broken up by autoformatters in the past. All fixes are automated using the autofixes in ISC001.
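
What the autofix does, on an illustrative line:

```python
# before: two implicitly concatenated literals of the same type on one line,
# typically left behind by an autoformatter
msg = "Hello, " "world"

# after the ISC001 autofix: a single literal
msg = "Hello, world"
```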

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146408
Approved by: https://github.com/justinchuby, https://github.com/janeyx99
2025-02-04 19:07:04 +00:00
rzou
f38a2ea0d4 [Dynamo] Better unsupported message for Fake Tensor Exception (#146357)
I cannot repro this. But this line shows up in internal logs, and I want
to know what the exception is and the context inside it. All of the
exceptions_allowed_to_be_fallback are dataclasses, so they should print
nicely.

Test Plan:
- code reading

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146357
Approved by: https://github.com/williamwen42
2025-02-04 18:52:11 +00:00
Brian Hirsh
e68f5087d8 update _unsafe_set_version_counter to accept lists of tensors (#137921)
See the comment [here](https://github.com/pytorch/pytorch/issues/132014#issuecomment-2379547400) (cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @XilunWu @rec) - this PR updates `_unsafe_set_version_counter` to accept a list of tensors, for overhead-sensitive users (e.g. distributed) who need to hide VC bumps from autograd on a large list of tensors without wanting to suffer the overhead of going from python->C++ separately for every tensor in the list.

I left the binding in pybind and used a `std::vector`. If we **really** need to optimize overhead even further, we could write a manual cpython binding.

I use this updated API in the next PR to fix FSDP2, so that it properly hides the VC of all `all_gather_buffer` tensors in its call to `split_with_sizes_copy.out(all_gather_buffers)`.
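
A hedged usage sketch; the binding path (`torch._C._autograd`) and the exact list-overload signature are assumptions based on the description above:

```python
import torch
from torch._C._autograd import _unsafe_set_version_counter  # assumed path

tensors = [torch.randn(4) for _ in range(3)]
versions = [t._version for t in tensors]

for t in tensors:
    t.mul_(2)  # each in-place op bumps that tensor's version counter

# One python->C++ transition for the whole list, instead of one per tensor,
# restoring the version counters so autograd never sees the bumps.
_unsafe_set_version_counter(tensors, versions)
```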

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137921
Approved by: https://github.com/awgu, https://github.com/albanD
2025-02-04 04:51:11 +00:00
Animesh Jain
487400f47f [dynamo] Support functools.partial variables through inspect.signature (#146339)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146339
Approved by: https://github.com/jansel
ghstack dependencies: #146322, #146116
2025-02-04 04:39:39 +00:00
Animesh Jain
5f53889850 [dynamo][builtin-skipfiles-cleanup] Remove inspect (#146116)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146116
Approved by: https://github.com/williamwen42, https://github.com/zou3519, https://github.com/jansel
ghstack dependencies: #146322
2025-02-04 03:36:07 +00:00
Animesh Jain
0da07a6d1d [dynamo][skip-function] Add missing unimplemented line (#146322)
This is a missing line from the merged PR in the stack below. Let's try to get this in quickly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146322
Approved by: https://github.com/StrongerXi, https://github.com/jansel, https://github.com/mlazos
2025-02-03 22:11:55 +00:00
Yanbo Liang
15e12d5ec3 [Trace PyDispatcher] Support temporarily_pop_interpreter_stack ctx manager (#146271)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146271
Approved by: https://github.com/zou3519
ghstack dependencies: #146270
2025-02-03 21:47:54 +00:00
Simon Fan
1d4adf4e1f [dynamo] log recompile reason to dynamo_compile (#146117)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146117
Approved by: https://github.com/bobrenjc93
2025-02-03 21:04:04 +00:00
Harmen Stoppels
01554c7b5a fix incorrect literal strings / accidental tuples (#146037)
* `expr,` is short for `(expr,)`
* literal strings over multiple lines need to escape the newline with `\` or use `(...)` (see the sketch below).
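
A minimal sketch of both bug classes:

```python
# accidental tuple: the trailing comma makes this ("error",), not a str
msg = "error",
assert isinstance(msg, tuple)

# an intended single string split across lines must escape the newline
# or use parentheses
long_msg = "first part " \
           "second part"
long_msg = ("first part "
            "second part")
```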

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146037
Approved by: https://github.com/Skylion007
2025-02-03 15:08:11 +00:00
Animesh Jain
fa48757180 [dynamo] misc fixes for inspect (#146283)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146283
Approved by: https://github.com/jansel
ghstack dependencies: #146075
2025-02-03 04:26:10 +00:00
Animesh Jain
c0ec2e0a0d [dynamo][functions] Improve getattr on functions (#146075)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146075
Approved by: https://github.com/jansel
2025-02-03 02:01:57 +00:00
Yanbo Liang
511d0dd558 [Dynamo][Trace PyDispatcher] Support calling id function over class (#146269)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146269
Approved by: https://github.com/anijain2305
2025-02-02 22:29:30 +00:00
Animesh Jain
cef856faa9 [dynamo][enum] Trace through enum.py for enum construction (#146070)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146070
Approved by: https://github.com/jansel
ghstack dependencies: #146062, #146198, #146258, #146214
2025-02-02 03:12:36 +00:00
Animesh Jain
31fb691782 [dynamo] Graph break on tensor.retain_grad (#146214)
Fixes https://github.com/pytorch/pytorch/issues/146212

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146214
Approved by: https://github.com/jansel
ghstack dependencies: #146062, #146198, #146258
2025-02-02 03:12:36 +00:00
Animesh Jain
529eb8d558 [dynamo] Add return to python_type (#146258)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146258
Approved by: https://github.com/jansel
ghstack dependencies: #146062, #146198
2025-02-02 03:12:36 +00:00
Nikita Shulga
e56dcf2772 [CPUInductor] Fix SVE256 detection (#146207)
This PR removes `torch.cpu._is_arm_sve_supported()` and replaces it with the stable `torch.backends.cpu.get_cpu_capability()`.

I should have reviewed https://github.com/pytorch/pytorch/pull/134672 more thoroughly, because it introduced a duplicate but slightly different API for detecting CPU architectures, which resulted in runtime crashes on systems that support SVE128 rather than SVE256.

Fixes https://github.com/pytorch/pytorch/issues/145441

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146207
Approved by: https://github.com/angelayi
2025-02-01 18:51:34 +00:00
Shangdi Yu
a97a906dd9 Add "//caffe2:libtorch" to minifier TARGET file (#146203)
Summary: as title. To avoid errors like "undefined symbol: aoti_torch_device_type_cpu" when compiling minifier_launcher.py

Test Plan: CI

Differential Revision: D68978430

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146203
Approved by: https://github.com/desertfire
2025-02-01 05:37:23 +00:00
Animesh Jain
1de41e6918 [dynamo][exceptions][3.10] Clean symbolic stack on exception handling (#146198)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146198
Approved by: https://github.com/williamwen42
ghstack dependencies: #146062
2025-02-01 02:51:44 +00:00
Animesh Jain
f25f1163dc [dynamo] Support frozenset({..}).__contains__ (#146062)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146062
Approved by: https://github.com/Skylion007, https://github.com/jansel
2025-01-31 23:22:58 +00:00
Animesh Jain
781aceee9c [dynamo] Revert abc change due to internal failures (#146177)
xref - https://www.internalfb.com/tasks/?t=191383874

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146177
Approved by: https://github.com/StrongerXi
ghstack dependencies: #146141
2025-01-31 21:28:06 +00:00
William Wen
49df8de8be [dynamo] disable eval_frame callback in _TorchDynamoContext __enter__/__exit__ (#145981)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145981
Approved by: https://github.com/jansel
2025-01-31 20:40:59 +00:00
Animesh Jain
667b94d1c2 [hotfix][dynamo] Skip linecache due to a flaky issue (#146141)
A large number of jit + dynamo wrapped tests fail in linecache tracing.
We need further debugging. Skipping for now to stem the bleeding.

https://github.com/pytorch/pytorch/issues/146076

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146141
Approved by: https://github.com/StrongerXi
2025-01-31 17:45:06 +00:00
PyTorch MergeBot
f5a61ba0a3 Revert "inductor: Don't throw an internal error when a nn.module is missing a attribute (#145122)"
This reverts commit d100e9ae74.

Reverted https://github.com/pytorch/pytorch/pull/145122 on behalf of https://github.com/ZainRizvi due to Sorry but this is failing internally. See D68924977 for details ([comment](https://github.com/pytorch/pytorch/pull/145122#issuecomment-2627880860))
2025-01-31 17:39:23 +00:00
Aaron Orenstein
57d8278ab9 pickler for GraphModule (#141659)
Pickling a GraphModule needs some special handling to wrap things that normally can't be pickled - but async compile needs to pass them across a wire, so we need to be able to serialize them - this adds some helpers to enable that.

Differential Revision: [D68921318](https://our.internmc.facebook.com/intern/diff/D68921318)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141659
Approved by: https://github.com/jamesjwu
2025-01-31 05:34:28 +00:00
Pian Pawakapan
ffb424eab6 [dynamo/export] call local_scalar_dense when full() value is scalar tensor (#144999)
Fixes https://github.com/pytorch/pytorch/issues/144907
```
        class Foo(torch.nn.Module):
            def forward(self, val):
                return torch.full((80, 2), val, dtype=torch.float32)

        export(Foo(), args=(torch.tensor(1),))
```

When we have a `torch.full` call like above, where the fill value is a scalar Tensor and not a scalar value, the FX graph from `_dynamo.export()` contains a single node: the full op. We run into a `PendingUnbackedSymbolNotFound` error, because the `item()` call is implicit; the UnbackedSymInt is extracted but goes directly into the data of the output tensor value, and we're then unable to locate it when we try to compute unbacked bindings.

On the other hand, non-strict export doesn't face this, because an explicit `item()`, or `local_scalar_dense` node is inserted, and the unbacked binding is directly the example value of that node.

This adds a dynamo handler to imitate what happens in non-strict.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144999
Approved by: https://github.com/angelayi
2025-01-31 02:45:43 +00:00
Oguz Ulgen
ccd27e8129 Turn on fx graph cache and automatic dynamic pgo local caches in fbcode (#146065)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146065
Approved by: https://github.com/jamesjwu
2025-01-31 01:11:48 +00:00
Animesh Jain
1e3d1738a4 [dynamo][polyfills]Support getrecursionlimit (#145989)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145989
Approved by: https://github.com/StrongerXi, https://github.com/jansel
ghstack dependencies: #145986, #145987, #145994
2025-01-31 00:47:31 +00:00
Animesh Jain
e7bb608d02 [dynamo][dicts] Support construction of types.MappingProxyType (#145994)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145994
Approved by: https://github.com/StrongerXi, https://github.com/jansel
ghstack dependencies: #145986, #145987
2025-01-31 00:47:31 +00:00
Animesh Jain
4665bc2cc0 [dynamo][functions] Support id on function (#145987)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145987
Approved by: https://github.com/StrongerXi, https://github.com/jansel, https://github.com/mlazos
ghstack dependencies: #145986
2025-01-31 00:47:23 +00:00
Animesh Jain
56307dc370 [dynamo][dicts] Raise exception on pop (#145986)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145986
Approved by: https://github.com/Skylion007, https://github.com/williamwen42, https://github.com/StrongerXi, https://github.com/jansel
2025-01-31 00:47:13 +00:00
Aaron Orenstein
23695ea002 Fix dynamo use of list[int] in graph break (#145554)
This reintroduces the change backed out by #145393 and fixes the underlying problem.

Although using a BuiltinVariable was better than nothing when we saw a GenericAlias, it had problems: if there was a graph break and we had to reconstruct the original python code, BuiltinVariable reconstructed it as a simple `list` instead of `list[int]`.

This changes it to use a TypingVariable instead and then teaches TypingVariable how to reconstruct.

Original commit changeset: 77b9193acb23

python test/dynamo/test_repros.py ReproTests.test_graph_break_on_jit_isinstance

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145554
Approved by: https://github.com/anijain2305
ghstack dependencies: #145551, #145552, #145553
2025-01-30 22:21:40 +00:00
Aaron Orenstein
fbb076cc45 Fix call to create_load_global (#145553)
There is no version of create_load_global() that takes three parameters - any use of this function will fail. I think this is probably the correct fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145553
Approved by: https://github.com/anijain2305
ghstack dependencies: #145551, #145552
2025-01-30 22:21:40 +00:00
Aaron Orenstein
ccbbc88bbb Turn on mypy for _dynamo/variables/builtin.py (#145552)
The fact that mypy errors were ignored was hiding several bugs in builtin.py (for example the previous diff's incorrect override and use of `call_getattr`)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145552
Approved by: https://github.com/anijain2305, https://github.com/Skylion007
ghstack dependencies: #145551
2025-01-30 22:21:32 +00:00
Aaron Orenstein
f3120f6d26 Remove incorrect BuiltinVariable.call_hasattr() (#145551)
BuiltinVariable.call_hasattr() overrides the base class - but actually behaves differently. The base is `obj.call_hasattr(tx, attr)` but BuiltinVariable's version is `<unused>.call_hasattr(tx, obj, attr)`.

The BuiltinVariable version is used as a pattern from `call_self_handler()` for `BuiltinVariable(hasattr)`. I think the other version is just used for internal `hasattr(obj, name)` so I renamed that one to `call_obj_hasattr`.
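
A minimal sketch of the shadowing described above (method bodies are illustrative):

```python
class VariableTracker:
    def call_obj_hasattr(self, tx, name):
        # renamed from call_hasattr: "does *self* have attribute `name`?"
        raise NotImplementedError

class BuiltinVariable(VariableTracker):
    def call_hasattr(self, tx, obj, name):
        # models a traced call to builtins.hasattr(obj, name); the extra
        # `obj` parameter is why this never truly overrode the base method
        return obj.call_obj_hasattr(tx, name)
```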

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145551
Approved by: https://github.com/anijain2305
2025-01-30 22:21:19 +00:00
clr
d100e9ae74 inductor: Don't throw an internal error when a nn.module is missing a attribute (#145122)
If an nn.Module getattr call throws, we should make sure that we don't crash with an internal error.

Note that I couldn't figure out how to test this, so advice would be welcome. I have my best attempt at https://github.com/pytorch/pytorch/pull/145799, but it doesn't seem to reproduce the crash.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145122
Approved by: https://github.com/jansel
2025-01-30 21:55:29 +00:00
Yidi Wu
7e7341bddd [hop] fix unbacked_bindings meta for while_loop (#143559)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143559
Approved by: https://github.com/zou3519
2025-01-30 21:33:09 +00:00
Thomas Bohnstingl
9f9904172d [scan] scan dim handling in user-facing scan() (#145179)
This PR introduces the capability for the scan dim to be handled in the user-facing scan() call. Internally, the scan dim is always shifted to dim 0, and the scan is then performed over that dim.

This is a follow-up PR from https://github.com/bohnstingl/pytorch/pull/3
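
A minimal sketch of the dim-normalization idea using a plain Python loop; the helper and the loop are illustrative, not the HOP's actual implementation:

```python
import torch

def scan_over_dim(combine_fn, init, xs, dim):
    xs = torch.movedim(xs, dim, 0)  # shift the user's scan dim to dim 0
    carry, ys = init, []
    for x in xs:                    # the scan always runs over dim 0
        carry = combine_fn(carry, x)
        ys.append(carry)
    return torch.movedim(torch.stack(ys), 0, dim)  # restore original layout

# cumulative sum along dim=1 of a (3, 5) tensor
out = scan_over_dim(torch.add, torch.zeros(3), torch.ones(3, 5), dim=1)
assert torch.equal(out[:, -1], torch.full((3,), 5.0))
```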

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145179
Approved by: https://github.com/ydwu4
2025-01-30 21:09:07 +00:00
Yidi Wu
a3698ebd5c [while_loop] specialize when cond_fn returns constants (#144515)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144515
Approved by: https://github.com/zou3519
2025-01-30 19:02:34 +00:00
IvanKobzarev
894ef8c1e3 [torchbench] Inductor freezing bfloat16 conv folding needs high tolerance (#145623)
Issue:
https://github.com/pytorch/pytorch/issues/144888

Torchbench of the timm lcnet_050 model fails on accuracy in the case of `--freezing` `--inference` `--bfloat16`:
`res_error==0.12`
If inductor's convolution constant folding is turned off: `res_error==0.016`

`float16 error ~ 0.00669`
`float16 without conv folding ~ 0.0018`

Convolution folding increases the error by almost an order of magnitude.

I think we should revisit and try to do something to improve the accuracy of conv folding, e.g. doing conv folding at compilation time with float64?

At the moment I am adding counters to identify whether convolution folding happened and, in the case of bfloat16 with conv folding, increasing the tolerance multiplier to the max level (10) to pass the accuracy test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145623
Approved by: https://github.com/eellison
2025-01-30 12:46:35 +00:00
PyTorch MergeBot
1185b81c51 Revert "[dynamo] Use polyfill to implement comparison operators (#144485)"
This reverts commit d1f82de2bf.

Reverted https://github.com/pytorch/pytorch/pull/144485 on behalf of https://github.com/huydhn due to This seems to break dynamo tests in trunk after landing ([comment](https://github.com/pytorch/pytorch/pull/144485#issuecomment-2622893294))
2025-01-29 21:30:42 +00:00
Animesh Jain
d1f82de2bf [dynamo] Use polyfill to implement comparison operators (#144485)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144485
Approved by: https://github.com/jansel
2025-01-29 17:37:40 +00:00
Animesh Jain
4499d60d56 [dynamo][builin-skipfiles-cleanup] Remove types (#145909)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145909
Approved by: https://github.com/zou3519
ghstack dependencies: #145856, #145875, #145878, #145892
2025-01-29 16:47:02 +00:00
Animesh Jain
3f77002b96 [dynamo][builtin-skipfiles-cleanup] remove abc, enum, importlib (#145892)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145892
Approved by: https://github.com/williamwen42, https://github.com/StrongerXi
ghstack dependencies: #145856, #145875, #145878
2025-01-29 05:30:06 +00:00
Animesh Jain
236793684d [dynamo][builtin-skipfiles-cleanup] Remove threading, _collections_abc, _weakrefset (#145878)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145878
Approved by: https://github.com/williamwen42, https://github.com/StrongerXi
ghstack dependencies: #145856, #145875
2025-01-29 05:30:06 +00:00
Animesh Jain
a479656cd2 [dynamo][builtin-skipfiles-removal] Remove logging (#145875)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145875
Approved by: https://github.com/williamwen42
ghstack dependencies: #145856
2025-01-29 05:29:58 +00:00
Animesh Jain
64ee57847b [dynamo][builtin-skipfiles-cleanup] Remove some builtins (#145856)
[dynamo][builtin-skipfiles-cleanup] Remove more builtins

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145856
Approved by: https://github.com/zou3519
2025-01-29 05:29:47 +00:00
Thomas Bohnstingl
82859f6185 [associative_scan] scan dim handling in user-facing associative_scan() (#139864)
This PR implements the user-facing dim change, i.e., that the scan dim provided by the user is always moved to dim 0 and then the associative_scan operation always operates on dim 0.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139864
Approved by: https://github.com/ydwu4
2025-01-28 23:58:10 +00:00
PyTorch MergeBot
3481c2aec4 Revert "[dynamo] save/restore system random state more carefully (#145750)"
This reverts commit e3d3f2b22e.

Reverted https://github.com/pytorch/pytorch/pull/145750 on behalf of https://github.com/eellison due to bisected perf regression ([comment](https://github.com/pytorch/pytorch/pull/145750#issuecomment-2620028414))
2025-01-28 20:51:07 +00:00
Ryan Guo
eaff13275e [dynamo] Properly branch on an unspecialized NN module (#145786)
User-defined NN modules might have their own `__len__` or `__bool__`
methods which Dynamo needs to trace through, so that side effects and/or
reads of buffered writes are properly handled.

This patch removes the special `UnspecializedNNModuleVariable` branch in
Dynamo's branch handling, and lets these cases fall into the
`UserDefinedObjectVariable` branch, which handles the aforementioned
cases correctly.

Fixes #145284.
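
A minimal repro shape, assuming the issue appears when branching on a module with user-defined truthiness (illustrative):

```python
import torch

class Gate(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.items = []

    def __bool__(self):
        # user-defined truthiness that Dynamo must trace through
        return len(self.items) > 0

gate = Gate()

@torch.compile
def f(x):
    gate.items.append(1)  # buffered write...
    if gate:              # ...must be visible when branching on the module
        return x + 1
    return x - 1

f(torch.ones(2))
```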

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145786
Approved by: https://github.com/williamwen42
2025-01-28 19:45:17 +00:00
Ryan Guo
eaec97ab1f [dynamo] Properly prune dead input cell object (#145781)
This patch models input cell objects as "newly created" rather than
"pre-existing" python objects (see the added documentation for why this
actually captures the semantics more accurately).

This enables the `SideEffects.prune_dead_object_new` algorithm to prune
away writes to input cell objects which are no longer relevant; this
didn't happen prior to this patch because we modelled them as
pre-existing objects, which forces us to codegen their attribute
mutations.

Fixes #145564.
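
A minimal sketch of a write to an input cell that becomes dead, assuming nothing observable reads the cell afterwards (illustrative):

```python
import torch

def make_fn():
    counter = 0  # lives in a cell captured by `f`: an input cell to the frame

    @torch.compile
    def f(x):
        nonlocal counter
        counter += 1  # write to the input cell; if nothing reads it later,
        return x + 1  # the side effect can now be pruned

    return f

f = make_fn()
f(torch.ones(2))
```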

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145781
Approved by: https://github.com/williamwen42, https://github.com/jansel
2025-01-28 18:28:13 +00:00
Animesh Jain
80a0412b76 [dynamo][builtin-skipfiles-cleanup] Remove posixpath (#145828)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145828
Approved by: https://github.com/zou3519
ghstack dependencies: #145744, #145753, #145826
2025-01-28 16:14:34 +00:00
Animesh Jain
6824a4a75d [dynamo][builtin-skipfiles-cleanup] Remove re (#145826)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145826
Approved by: https://github.com/zou3519
ghstack dependencies: #145744, #145753
2025-01-28 16:14:34 +00:00
Animesh Jain
4307e6c008 [dynamo][builtin-skipfile-cleanup] Remove signal (#145753)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145753
Approved by: https://github.com/zou3519
ghstack dependencies: #145744
2025-01-28 16:14:23 +00:00
Animesh Jain
5c5306e8bc [dynamo][builtin-skiplist-cleanup] Remove weakref (#145744)
WeakKeyDictionary already works very nicely with the UserDefinedObject Variable Tracker.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145744
Approved by: https://github.com/jansel
2025-01-28 07:55:12 +00:00
Burak Turk
01a4d86b31 add pt2 callbacks for backward pass and prevent duplicate callbacks (#145732)
Summary: This change adds callbacks for lazy backwards compilation while preventing duplicate callbacks from being fired.

Differential Revision: D68577593

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145732
Approved by: https://github.com/mlazos
2025-01-28 03:50:02 +00:00
William Wen
e3d3f2b22e [dynamo] save/restore system random state more carefully (#145750)
Reattempt of https://github.com/pytorch/pytorch/pull/145435 since the state of the linked internal diff appears to be messed up.

Note: I have verified that the previously failing internal tests now pass internally.

Differential Revision: [D68723334](https://our.internmc.facebook.com/intern/diff/D68723334)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145750
Approved by: https://github.com/StrongerXi
2025-01-28 01:34:13 +00:00
Ryan Guo
5a4d959cdb [dynamo] Properly model torch profiler context objects (#145537)
Prior to this patch, Dynamo conveniently modelled torch profiler context
objects (e.g., `torch.profiler.profile`) as `NullContextVariable`
because `torch.compile` ignores the effect of these profiler contexts.

However, the semantics of these profiler contexts diverges from
`contextlib.nullcontext` in the `__enter__` function, where the former
returns `self` and the latter returns `None`. This causes subtle errors,
as observed in #125021.

This patch adds back a `ProfilerContextVariable`, which addresses the
aforementioned semantic discrepancy.

Fixes #125021.
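
The divergence is easy to check directly (standard-library and profiler behavior):

```python
import contextlib
import torch

with contextlib.nullcontext() as ctx:
    assert ctx is None  # nullcontext.__enter__ returns None

with torch.profiler.profile() as prof:
    pass
assert prof is not None  # profile.__enter__ returns self
```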

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145537
Approved by: https://github.com/zou3519, https://github.com/williamwen42
2025-01-28 00:03:36 +00:00
Colin L. Rice
c1161957a4 inductor_config_logging: Don't drop keys (#144700)
This bit me while I was trying to debug some trace issues.
In general this config is already quite large when dumping, so adding
more fields doesn't make it significantly worse.

Also, a number of the items we are type-checking for (except the test
configs) don't even show up. Primarily this will help us when debugging
rocm, halide, and trace configs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144700
Approved by: https://github.com/ezyang
2025-01-27 23:47:25 +00:00
PyTorch MergeBot
2de53b3b65 Revert "pickler for GraphModule (#141659)"
This reverts commit c6ad08357b.

Reverted https://github.com/pytorch/pytorch/pull/141659 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally, please take a look at D68694181 for more details. ([comment](https://github.com/pytorch/pytorch/pull/141659#issuecomment-2617045120))
2025-01-27 22:39:30 +00:00
Animesh Jain
993b229665 [dynamo][dicts] Fix dict.__new__ bug (#145723)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145723
Approved by: https://github.com/jansel, https://github.com/StrongerXi
ghstack dependencies: #145519, #145547, #145558
2025-01-27 21:42:43 +00:00
Animesh Jain
7e1c7253e9 [dynamo][builtin-skipfile-cleanup] Support tuple.__new__ (#145558)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145558
Approved by: https://github.com/jansel, https://github.com/StrongerXi
ghstack dependencies: #145519, #145547
2025-01-27 21:42:43 +00:00
Ryan Guo
bfaf76bfc6 [dynamo] clear out traced frames at the start of test_log_traced_frames (#145640)
The test was flaky in CI, and this patch fixes it.

Fixes #137461.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145640
Approved by: https://github.com/williamwen42
2025-01-27 20:49:59 +00:00
Randolf Scholz
835e770bad Use typing.IO[bytes] instead of io.BytesIO in annotations (#144994)
Fixes #144976

Using approach ① `IO[bytes]`, but could also try with a protocol.

## Notes:

- moved `torch.serialization.FILE_LIKE` to `torch.types.FileLike`
- Use `FileLike` annotation where it makes sense
- made sure those functions also support `os.PathLike`
- Replaced `isinstance(x, io.BytesIO)` with `isinstance(x, (io.IOBase, IO))` where appropriate.
- Replaced `BinaryIO` with `IO[bytes]` (the two ABCs are almost identical, the only difference is that `BinaryIO` allows `bytearray` input to `write`, whereas `IO[bytes]` only `bytes`)
- needed to make `torch.serialization._opener` generic to avoid LSP violations.
- skipped `torch/onnx/verification` for now (functions use `BytesIO.getvalue`, which is not part of the `IO[bytes]` ABC, but it kind of seems that this is redundant, as e.g. `onnx.load` supports `str | PathLike[str] | IO[bytes]` directly...); a small sketch of the pattern follows.
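
A minimal sketch of the annotation pattern from the notes above (the `save_blob` helper is illustrative, not from the PR):

```python
import io
import os
from typing import IO, Union

FileLike = Union[str, os.PathLike, IO[bytes]]  # mirrors torch.types.FileLike

def save_blob(f: FileLike, data: bytes) -> None:
    if isinstance(f, (str, os.PathLike)):
        with open(f, "wb") as fh:
            fh.write(data)
    else:
        f.write(data)  # any IO[bytes], not just io.BytesIO

save_blob(io.BytesIO(), b"\x00")  # in-memory buffers still work
```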

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144994
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2025-01-27 18:08:07 +00:00
rzou
ea141d8134 functional compiled autograd (#144707)
This PR squashes together the following commits:

https://github.com/pytorch/pytorch/pull/144115
https://github.com/pytorch/pytorch/pull/143417
https://github.com/pytorch/pytorch/pull/143405
https://github.com/pytorch/pytorch/pull/143387
https://github.com/pytorch/pytorch/pull/143304
https://github.com/pytorch/pytorch/pull/143296

This is a refactor of compiled autograd to use "functional autograd". The end goal is that it gets compiled autograd's initial capture to stop specializing on Tensor metadata, therefore allowing compiled autograd to better handle Tensor subclasses.

For more information, please read the commit messages for each PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144707
Approved by: https://github.com/bdhirsh, https://github.com/xmfan, https://github.com/jansel
2025-01-27 05:20:56 +00:00
Aaron Orenstein
c6ad08357b pickler for GraphModule (#141659)
Pickling a GraphModule needs some special handling to wrap things that normally can't be pickled - but async compile needs to pass them across a wire, so we need to be able to serialize them - this adds some helpers to enable that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141659
Approved by: https://github.com/jamesjwu
2025-01-26 19:29:13 +00:00
Edward Z. Yang
90448f0128 Output of nonzero is transposed, fix fake tensor (#144695)
Needs this companion executorch PR: https://github.com/pytorch/executorch/pull/7657

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144695
Approved by: https://github.com/bobrenjc93, https://github.com/albanD
2025-01-26 01:07:22 +00:00
Xuehai Pan
0afdee4c39 [dynamo] raise IndexError when inserting into a full deque (#139379)
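
For reference, the CPython behavior being matched here:

```python
from collections import deque

d = deque([1, 2, 3], maxlen=3)
try:
    d.insert(1, 99)  # inserting into a full bounded deque
except IndexError as e:
    print(e)  # "deque already at its maximum size"
```
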
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139379
Approved by: https://github.com/jansel
2025-01-25 18:04:49 +00:00
Yuanhao Ji
cc1ecead07 [Dynamo] Allow format() to handle int (#144956)
Fixes #144830

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144956
Approved by: https://github.com/jansel
2025-01-25 04:12:45 +00:00
Animesh Jain
ef60de07a0 [dynamo] Log guard latency (#145132)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145132
Approved by: https://github.com/ezyang
ghstack dependencies: #145509
2025-01-25 03:01:18 +00:00
Shangdi Yu
4cc5e880f9 Add accuracy issue support in AOTI Minifier (#145539)
Summary:

Add three more repro levels for AOTI minifier (level 2 already exists). They are the same as the existing dynamo minifier repro levels.

Now AOTI minifier can minify and repro programs that have numerical accuracy issues as well.

1: Dumps the original graph out to repro.py if compilation fails.
2: Dumps a minifier_launcher.py if AOTI fails.
3: Always dumps a minifier_launcher.py. Good for segfaults.
4: Dumps a minifier_launcher.py if the accuracy check fails.

Refactor AOTI minifier unit tests to be cleaner and better re-use the existing minifier testing code. We do not need to manually patch {"aot_inductor.dump_aoti_minifier": True} into each test now; this config is generated in the test code.

Differential Revision: D68294638

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145539
Approved by: https://github.com/desertfire
2025-01-24 23:07:19 +00:00
Aishwarya Sivaraman
457facf7e2 [caffe2] Use the manifold cache backend as the default (#144773)
Test Plan: CI

D68155591

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144773
Approved by: https://github.com/izaitsevfb
2025-01-24 19:48:34 +00:00
Michael Lazos
8eea554332 [Dynamo] Fix names collisions with foreach decomps (#145479)
Fixes https://github.com/pytorch/pytorch/issues/138698

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145479
Approved by: https://github.com/yanboliang
2025-01-24 18:46:58 +00:00
Animesh Jain
74cfb4f364 [dynamo][refactor] Move collections.namedtuple out of SkipFunctionVariable (#145547)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145547
Approved by: https://github.com/zou3519
ghstack dependencies: #145519
2025-01-24 17:39:33 +00:00
Animesh Jain
9132f4b7ce [dynamo][guards] Log guard latency to tlparse (#145509)
Example
![image](https://github.com/user-attachments/assets/1503ee59-ff35-46d9-9b61-16352a4a30e2)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145509
Approved by: https://github.com/ezyang
2025-01-24 16:33:29 +00:00
Animesh Jain
53fc921ce2 [dynamo][trace-rules-cleanup] Remove functools from the Builtins skiplist (#145519)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145519
Approved by: https://github.com/yanboliang, https://github.com/zou3519
2025-01-24 06:02:03 +00:00
PyTorch MergeBot
6f60c65a3a Revert "[dynamo] Log guard latency (#145132)"
This reverts commit 0a310d7388.

Reverted https://github.com/pytorch/pytorch/pull/145132 on behalf of https://github.com/anijain2305 due to CI failures observed after PR was merged ([comment](https://github.com/pytorch/pytorch/pull/145132#issuecomment-2611268421))
2025-01-24 00:11:50 +00:00
PyTorch MergeBot
6dd8283381 Revert "[compiled autograd] Proxy opaque nodes for built-in autograd nodes (#143296)"
This reverts commit 5531fafffe.

Reverted https://github.com/pytorch/pytorch/pull/143296 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))
2025-01-23 23:34:13 +00:00
PyTorch MergeBot
9553301ade Revert "[compiled autograd] Proxy nodes for user-defined C++ torch::autograd::Function (#143387)"
This reverts commit 784bb2127c.

Reverted https://github.com/pytorch/pytorch/pull/143387 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))
2025-01-23 23:34:13 +00:00
PyTorch MergeBot
16c4f8c395 Revert "[compiled autograd] Always proxy autograd.Function nodes; handle AOT backwards (#143405)"
This reverts commit ec820fe57c.

Reverted https://github.com/pytorch/pytorch/pull/143405 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))
2025-01-23 23:34:13 +00:00
PyTorch MergeBot
3f6cfd0156 Revert "[compiled autograd] stop specializing on metadata during initial trace (#143417)"
This reverts commit 99dd1bf1b9.

Reverted https://github.com/pytorch/pytorch/pull/143417 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))
2025-01-23 23:34:12 +00:00
PyTorch MergeBot
ab082863a1 Revert "[compiled autograd] support Tensor Subclasses in AOTBackward (#144115)"
This reverts commit 082c28c3c6.

Reverted https://github.com/pytorch/pytorch/pull/144115 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))
2025-01-23 23:34:12 +00:00
Animesh Jain
0a310d7388 [dynamo] Log guard latency (#145132)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145132
Approved by: https://github.com/ezyang
ghstack dependencies: #145351, #145420
2025-01-23 23:30:07 +00:00
Nikhil Gupta
41b38f755c Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392)" (#145505)
https://github.com/pytorch/pytorch/pull/134124 was reverted by https://github.com/pytorch/pytorch/pull/145392 due to KleidiAI clone issue.

1. This reverts commit 0940eb6d44 (https://github.com/pytorch/pytorch/pull/145392) and fixes the KleidiAI mirror issue.
2. KleidiAI is now cloned from the github mirror instead of arm gitlab.

Change-Id: I7d6eee7214cd117d3057d615936fcc3ee6052fa2

Fixes https://github.com/pytorch/pytorch/issues/145273

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145505
Approved by: https://github.com/malfet
2025-01-23 18:50:59 +00:00
Animesh Jain
015c6d6fdb [dynamo][guards] Turn on profiling of guard manager (#145420)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145420
Approved by: https://github.com/ezyang
ghstack dependencies: #145351
2025-01-23 18:17:43 +00:00
Animesh Jain
c58198184b [dynamo][dicts] Insert LENGTH guard on an if condition on dict (#145432)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145432
Approved by: https://github.com/williamwen42, https://github.com/jansel
2025-01-23 04:40:56 +00:00
Animesh Jain
5a18f1e1eb [dynamo] Support fx map_aggregate (#145351)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145351
Approved by: https://github.com/zou3519
2025-01-23 03:19:30 +00:00
Li Yu (ads)
e6a84be3d3 [PyTorch] Add backend aot_eager_decomp_partition_with_mode (#143250)
Summary:
## Why
To make it possible to run torch dispatch mode inside compiled modules. This is to enable running MemoryTrackerMode (in next diff) to collect memory usage of compiled modules.

## What
Add a backend aot_eager_decomp_partition_with_mode.
Add an enable_log flag to the backend to control the compilation logging (which can be very verbose and slow down the run of the mode).

Test Plan:
unittest

E2e tested in the next diff, which shows that the memory read from the mode passed to this backend is very close to the actual job's memory snapshot.

Differential Revision: D67227144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143250
Approved by: https://github.com/bdhirsh
2025-01-22 23:20:59 +00:00
PyTorch MergeBot
f0a210bf5d Revert "Output of nonzero is transposed, fix fake tensor (#144695)"
This reverts commit 693d8c7e94.

Reverted https://github.com/pytorch/pytorch/pull/144695 on behalf of https://github.com/izaitsevfb due to breaking internal tests, see D68461259 ([comment](https://github.com/pytorch/pytorch/pull/144695#issuecomment-2608443589))
2025-01-22 23:04:50 +00:00
rzou
082c28c3c6 [compiled autograd] support Tensor Subclasses in AOTBackward (#144115)
Compiled autograd's initial trace traces through the AOTBackward
epilogue. The Tensor Subclass code is not traceable. This PR changes it
so that when we see Tensor Subclass constructors, we proxy nodes for
their construction into the graph.

Test Plan:
- New basic test with TwoTensor
- Existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144115
Approved by: https://github.com/jansel, https://github.com/xmfan, https://github.com/bdhirsh
ghstack dependencies: #143296, #143304, #143387, #143405, #143417
2025-01-22 21:51:07 +00:00
rzou
99dd1bf1b9 [compiled autograd] stop specializing on metadata during initial trace (#143417)
The previous PRs built up to this. We change compiled autograd's initial
trace to stop baking in metadata.

While tracing, we allocate some weirdly shaped tensors that we can put
proxies on. The initial trace should not be accessing any metadata of
these tensors (it will likely error out if it does because of how weird
the shapes are).

This involved fixing various sites where we specialize on the
metadata, like:
- we change CopySlices's apply_with_saved to proxy some calls
  into the graph (this change is fairly hard to split out by itself).
- we stop calling InputBuffer::add
- we delete the weird metadata from the graph so that no graph passes
  can make use of it.

Test Plan:
- tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143417
Approved by: https://github.com/jansel, https://github.com/xmfan
ghstack dependencies: #143296, #143304, #143387, #143405
2025-01-22 21:51:07 +00:00
rzou
ec820fe57c [compiled autograd] Always proxy autograd.Function nodes; handle AOT backwards (#143405)
We will always proxy autograd.Function nodes in compiled autograd's
initial graph capture (previously there was an
option to proxy vs trace into the autograd.Function)

We have some requirements for the AOTBackward. Compiled Autograd runs
accumulate grad reordering passes on the AOTBackward graph directly
after the initial graph capture, so we can't just proxy a single node for it.

Instead, we:
- proxy the AOTBackward prologue function into the CA graph
- copy-paste the AOTBackward graph into the CA graph
- trace directly through the epilogue (the traced nodes go into the CA
  graph).

Tracing through the epilogue is safe (assuming no Tensor subclasses)
because the only thing the epilogue does is drop some outputs. The
Tensor subclass situation was already broken so this doesn't regress
anything but this PR sets it up to be fixed (in a followup, where we
will proxy "make_subclass" calls into the graph from the epilogue).

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143405
Approved by: https://github.com/jansel, https://github.com/xmfan
ghstack dependencies: #143296, #143304, #143387
2025-01-22 21:50:56 +00:00
rzou
784bb2127c [compiled autograd] Proxy nodes for user-defined C++ torch::autograd::Function (#143387)
We define a functional version of a C++ torch::autograd::Function. The
functional version reconstructs the ctx object and then calls
backward with it.

Some more details:
- we define how to pack/unpack ctx.saved_data into an IValue. It's a
  Dict[str, IValue], so it wasn't difficult.
- every call to CppNode::apply_with_saved binds a new function to
  Python. This is because we're unable to reuse a previously bound
  function: the schema may change depending on what the user
  actually puts into their Dict[str, IValue].
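
A rough Python-flavored sketch of the "functional version" idea (the real implementation is C++; all names below are invented for illustration):

```python
from types import SimpleNamespace

def apply_functional(backward_fn, saved_data: dict, saved_tensors: list, grads: list):
    # Reconstruct a ctx carrying the unpacked Dict[str, IValue] and saved
    # tensors, then run the user's backward against it.
    ctx = SimpleNamespace(saved_data=saved_data, saved_tensors=tuple(saved_tensors))
    return backward_fn(ctx, *grads)
```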

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143387
Approved by: https://github.com/jansel, https://github.com/xmfan
ghstack dependencies: #143296, #143304
2025-01-22 21:50:47 +00:00
rzou
5531fafffe [compiled autograd] Proxy opaque nodes for built-in autograd nodes (#143296)
This PR is on the way to getting compiled autograd's initial capture to
stop specializing on Tensor metadata.

This PR changes compiled autograd's initial capture to proxy an opaque
(w.r.t. Dynamo) function into the graph for all built-in codegen'ed
autograd nodes and validate_outputs.

We changed each codegen'ed apply_with_saved (e.g.
MulBackward0::apply_with_saved) to call into Python to proxy a function
(compiled_autograd.ops.MulBackward0) into the graph. Then, we use the
node's InputMetadata to "guess" at the properties of the output Tensors
to create some new FakeTensors.

Some details:
- MulBackward0::apply_with_saved lives in libtorch_cpu, but needs to
  call into Python via libtorch_python. There is an indirection
  (PyCompilerInterface) to do this.
- MulBackward0::apply_with_saved passes a C++ function to Python. To make
  our lives easier, every codegen'ed apply_with_saved passes a C++
  function with the same signature
  `(variable_list, ivalue_list) -> variable_list`.
- We define how to pack arbitrary C++ types into IValue via a helper
  IValuePacker struct and codegen functional variants of each builtin
  C++ autograd node (e.g. MulBackward0_apply_functional_ivalue).
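
To make the uniform `(variable_list, ivalue_list) -> variable_list` shape concrete, here is a hand-written Python analogue of what a functional MulBackward0 computes (the argument layout is assumed for illustration):

```python
def mul_backward0_apply_functional(grads: list, saved: list):
    (grad_out,) = grads
    self_, other = saved  # what MulBackward0 saved during the forward
    # d(self * other)/d(self) = other; d(self * other)/d(other) = self
    return [grad_out * other, grad_out * self_]
```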

MulBackward0 before this PR:
https://gist.github.com/zou3519/a80381d5fa38e970e413fcd91b0530de

MulBackward0 after this PR:
https://gist.github.com/zou3519/0c2eee8b3d8d96232b51ef430b53c5b0

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143296
Approved by: https://github.com/jansel
2025-01-22 21:50:29 +00:00
albanD
0940eb6d44 Reverting the PR adding Kleidiai-based int4 kernels (#145392)
Mitigation for https://github.com/pytorch/pytorch/issues/145273
Reverting https://github.com/pytorch/pytorch/pull/134124 and https://github.com/pytorch/pytorch/pull/144074

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145392
Approved by: https://github.com/ZainRizvi, https://github.com/malfet, https://github.com/atalman, https://github.com/digantdesai
2025-01-22 20:11:49 +00:00
Isuru Fernando
0efa843392 Dynamic shape guards in C++ (#139899)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139899
Approved by: https://github.com/anijain2305, https://github.com/albanD, https://github.com/jansel
ghstack dependencies: #143385, #143164
2025-01-22 14:58:35 +00:00
Isuru Fernando
fbaef0ac03 Add a language option for symbolic shape guards (#143164)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143164
Approved by: https://github.com/ezyang
ghstack dependencies: #143385
2025-01-22 14:58:35 +00:00
Aaron Orenstein
1ce533867f Teach dynamo to handle GenericAlias without a graph break (#145240)
Dynamo wasn't handling the new PEP585 type annotations:
```
x = list[Foo]
```
Although this worked in py3.9, it caused an `unimplemented` (Unexpected type in sourceless builder) in py3.12.

This fixes it to treat them as a BuiltinVariable.
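
For example, code like the following should now compile without a graph break (a minimal sketch; `Foo` is a placeholder class):

```python
import torch

class Foo:
    pass

@torch.compile(backend="eager", fullgraph=True)
def f(x):
    t = list[Foo]  # PEP 585 generic alias; previously unsupported on py3.12
    return x + 1

f(torch.ones(3))
```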

Fixes #145226

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145240
Approved by: https://github.com/anijain2305
2025-01-22 01:55:51 +00:00
rzou
1e8d6d6f0e [SkipFiles] New modules added to torch.* are inlined by default (#145279)
This PR:
- makes it so that new modules added to torch are inlined by default
- adds a list of the previously "skipped by default" modules to avoid
  regressing anything. This is a new MOD_SKIPLIST list that is consulted
  in trace_rules.check_file.
- Follow-up work will go through this list, one-by-one, and try to delete
  modules. I think we should be able to delete almost everything,
  except for torch._dynamo.

Test Plan
- existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145279
Approved by: https://github.com/yanboliang
2025-01-21 23:24:12 +00:00
Edward Z. Yang
693d8c7e94 Output of nonzero is transposed, fix fake tensor (#144695)
Needs this companion executorch PR: https://github.com/pytorch/executorch/pull/7657

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144695
Approved by: https://github.com/bobrenjc93, https://github.com/albanD
2025-01-21 20:50:09 +00:00
Animesh Jain
19584b28fd [dynamo][dicts] Consolidate dict(..) construction (#144342)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144342
Approved by: https://github.com/StrongerXi
2025-01-20 04:42:06 +00:00
Aaron Orenstein
a79100ab11 PEP585 update - torch/_dynamo (#145105)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145105
Approved by: https://github.com/bobrenjc93
2025-01-18 20:47:11 +00:00
Yanbo Liang
43a00d73b3 [Trace Python Dispatcher] Support FuncTorchInterpreter (#144444)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144444
Approved by: https://github.com/williamwen42, https://github.com/zou3519
ghstack dependencies: #144439
2025-01-17 02:26:37 +00:00
Yanbo Liang
5d02575aa1 [Trace Python dispatcher] Support torch.DispatchKey & torch.DispatchKeySet (#144439)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144439
Approved by: https://github.com/zou3519
2025-01-17 02:26:36 +00:00
William Wen
3a50aba7d3 [dynamo] add option to not skip on empty graph (#144885)
Temporary fix to https://github.com/pytorch/pytorch/issues/144360.

Turning the config on globally will cause a bunch of tests to fail, which needs to be addressed in followups.

I had a previous attempt at https://github.com/pytorch/pytorch/pull/144712, but this is a more complicated change and will likely be absorbed into work to refactor Dynamo's exception handling.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144885
Approved by: https://github.com/jansel
2025-01-17 02:12:20 +00:00
Nikita Shulga
a61a65ff82 [MPSInductor] Add Worker.current_device method (#145023)
That just returns 0, as multi-gpu is not currently supported by MPS

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145023
Approved by: https://github.com/dcci
2025-01-17 01:41:01 +00:00
PyTorch MergeBot
5e6e6200bf Revert "[dynamo][dicts] Consolidate dict(..) construction (#144342)"
This reverts commit a54a784b82.

Reverted https://github.com/pytorch/pytorch/pull/144342 on behalf of https://github.com/kit1980 due to breaking internal builds, see D68125388 ([comment](https://github.com/pytorch/pytorch/pull/144342#issuecomment-2597184167))
2025-01-17 00:32:09 +00:00
Laith Sakka
c3fcb3606d Profile compile_inner instead of _compile_inner (#144930)
Summary: title

Test Plan: NA

Reviewed By: jamesjwu

Differential Revision: D67990492

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144930
Approved by: https://github.com/jamesjwu
2025-01-16 23:59:27 +00:00
Colin L. Rice
95c363cc9b dynamo: Don't crash with internal error if getattr on a tensor fails (#144817)
This prevents crashes when getattr is called on a tensor for something
which doesn't exist.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144817
Approved by: https://github.com/williamwen42, https://github.com/jansel
2025-01-16 22:04:06 +00:00
Colin L. Rice
6492851125 symbolic_convert: Don't fail when we hit a undefined name (#144784)
We're using a python builtin NameError here,
instead of throwing an Unsupported exception. This causes the
NameError to get wrapped in an InternalTorchDynamoError
instead of just causing a graph break and letting the user code fail
directly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144784
Approved by: https://github.com/williamwen42, https://github.com/jansel
2025-01-16 01:47:48 +00:00
Colin L. Rice
926f9056a9 speculation_log: Raise a unique error for divergence issues (#144785)
This is primarily sent for discussion and to see what tests fail due to
this. The idea is that rather than capturing this as a regex on the
fail_reason, we just give it a unique failure type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144785
Approved by: https://github.com/ezyang
2025-01-16 00:49:43 +00:00
Colin L. Rice
b88dcb4835 dynamo: Don't crash when tracing a missing attr on a constant. (#144593)
dynamo: Don't crash when tracing a missing attr on a constant.

Previously this threw an InternalTorchDynamoError: AttributeError: 'NoneType' object has no attribute 'max'
instead of skipping the bad call during tracing and letting a
normal AttributeError surface.
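
A minimal repro sketch of the failure mode described above:

```python
import torch

@torch.compile(backend="eager")
def f(x):
    y = None
    return y.max(x)  # missing attr on a constant

try:
    f(torch.ones(3))
except AttributeError:
    # after this PR: a plain AttributeError, not an InternalTorchDynamoError
    print("got the expected AttributeError")
```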

There are two questions that I would love reviewer comment on.
1) Is throwing unimplemented the right thing here? Or should I throw
   something like ObservedAttributeError?
2) Do we need to worry about performance with this code? In particular,
   should we just catch the exception? Or maybe cache the lookup result?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144593
Approved by: https://github.com/jansel
2025-01-15 20:23:43 +00:00
Simon Fan
898a90c6bb [dynamo][hop] Introduce FlexAttentionBackwardHighOrderVariable (#144533)
FIXES https://github.com/pytorch/pytorch/issues/143180

This PR adds a new variable mapping to SourcelessBuilder to represent the flex attention intermediates. The variable proxies a call to the HOP, and carries over the graph state (subgraphs represented as UnspecializedNNModuleVariable) to the dynamo output graph. This is safe to do because the nn modules used in flex attention have either been speculated on before, or are outputs of make_fx of the forward.

tlparse of `TestCompiledAutograd.test_flex_attention`: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpiWendk/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100

```python
class GraphModule(torch.nn.Module):
    def forward(self, L_inputs_ : list):
         ...
         # File: /data/users/xmfan/core/b/pytorch/torch/_dynamo/compiled_autograd.py:832 in set_node_origin, code: CompiledFunctionBackward0 (NodeCall 1)
        ...
        fw_graph0_0 = self.fw_graph0_0
        joint_graph0_0 = self.joint_graph0_0
        mask_graph0_0 = self.mask_graph0_0
        flex_attention_backward = torch.ops.higher_order.flex_attention_backward(aot0_primals_1, aot0_primals_1, aot0_primals_1, aot0_detach_3, aot0_detach_5, aot0_expand_5, aot0_zeros_1, fw_graph0_0, joint_graph0_0, (1, 1, aot0_ones, aot0_zeros, None, None, aot0__to_copy_1, aot0__to_copy_2, None, None, 1073741824, 1073741824, mask_graph0_0), 0.125, {'PRESCALE_QK': False, 'ROWS_GUARANTEED_SAFE': False, 'BLOCKS_ARE_CONTIGUOUS': False, 'WRITE_DQ': True, 'OUTPUT_LOGSUMEXP': True}, (), ());  aot0_primals_1 = aot0_detach_3 = aot0_detach_5 = aot0_expand_5 = aot0_zeros_1 = fw_graph0_0 = joint_graph0_0 = aot0_ones = aot0_zeros = aot0__to_copy_1 = aot0__to_copy_2 = mask_graph0_0 = None
        aot0_getitem_4: "bf16[1, 1, s0, s1][s0*s1, s0*s1, s1, 1]cuda:0" = flex_attention_backward[0]
        aot0_getitem_5: "bf16[1, 1, s0, s1][s0*s1, s0*s1, s1, 1]cuda:0" = flex_attention_backward[1]
        aot0_getitem_6: "bf16[1, 1, s0, s1][s0*s1, s0*s1, s1, 1]cuda:0" = flex_attention_backward[2];  flex_attention_backward = None
        ...

    class fw_graph0_0(torch.nn.Module):
        def forward(self, arg0_1: "bf16[][]cuda:0", arg1_1: "i32[][]cuda:0", arg2_1: "i32[][]cuda:0", arg3_1: "i32[][]cuda:0", arg4_1: "i32[][]cuda:0"):
            return arg0_1

    class joint_graph0_0(torch.nn.Module):
        def forward(self, arg0_1: "bf16[][]cuda:0", arg1_1: "i32[][]cuda:0", arg2_1: "i32[][]cuda:0", arg3_1: "i32[][]cuda:0", arg4_1: "i32[][]cuda:0", arg5_1: "bf16[][]cuda:0"):
            return [arg5_1, None, None, None, None]

    class mask_graph0_0(torch.nn.Module):
        def forward(self, arg0_1: "i32[][]cuda:0", arg1_1: "i32[][]cuda:0", arg2_1: "i32[][]cuda:0", arg3_1: "i32[][]cuda:0"):
             # File: /data/users/xmfan/core/b/pytorch/torch/_dynamo/compiled_autograd.py:832 in set_node_origin, code: CompiledFunctionBackward0 (NodeCall 1)
            new_ones: "b8[][]cuda:0" = torch.ops.aten.new_ones.default(arg0_1, [], dtype = torch.bool, device = device(type='cuda', index=0), pin_memory = False);  arg0_1 = None
            return new_ones

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144533
Approved by: https://github.com/zou3519
2025-01-15 18:40:57 +00:00
Edward Z. Yang
ee8f833d13 Undo leading underscore on ctx for breakpoint (#144864)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144864
Approved by: https://github.com/Skylion007
2025-01-15 18:00:58 +00:00
Sujoy Saraswati
7e1c1e65eb Graph freezing preparation for non-Inductor backends (#139902)
Enable preparing a module's named parameters and buffers in the tracing context so that non-Inductor backends can implement graph freezing.

Fixes #139272

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139902
Approved by: https://github.com/eellison, https://github.com/masnesral, https://github.com/gujinghui
2025-01-15 11:25:04 +00:00
James Wu
7d71ddbe5d Add non_c_binding torch functions to allowlist for AOTAutogradCache, confirm no special handlers for them (#144802)
Differential Revision: [D68173093](https://our.internmc.facebook.com/intern/diff/D68173093/)

This diff marks any function in torch_non_c_binding_in_graph_functions as safe to cache. These functions should be safe to cache because they are part of the torch API and do not save global state (or, if they do, dynamo creates unique guards around the constants they return).
A function that's allowed in a dynamo graph is safe to cache for AOTAutograd purposes as long as:
- It's functional (i.e. does not access global state);
- or its value is constant folded away (and guarded against by dynamo)

The tricky cases are functions that dynamo uses special handlers to track. These special handlers can sometimes close over stuff that's safe for dynamo locally, but isn't encoded anywhere when cached across processes. An example of this is `DTensor.from_local`, where various DeviceMesh information doesn't change in the same dynamo process, but can change across multiple processes. The handler for `DTensor.from_local` closes over these and dynamo creates a proxy for the function call. This is not safe to cache.

That said, most special handlers are in fact functional and safe. So I add a unit test to test_trace_rules.py that confirms that any function with special handlers in dynamo added to this list needs to be audited to be safe to cache.

The safe handlers on that list either:
- don't access global state;
- guard on global state; or
- always return a constant that never changes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144802
Approved by: https://github.com/bdhirsh
2025-01-15 05:41:36 +00:00
Simon Fan
9cd6f46130 [ca] raise error message on AOT Autograd caching (#144595)
FIXES https://github.com/pytorch/pytorch/issues/144175, bandaid

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144595
Approved by: https://github.com/bdhirsh
2025-01-15 05:09:42 +00:00
Nikita Shulga
18786c65e5 [BE] Extend test_remove_no_ops (#144795)
----

- Use `is_dtype_supported` to skip dtype promotions portion of the test on unsupported device
- Extend it to use `torch.float16` so promotions could be checked there
- Implement `CpuInterface.is_bfloat16_supported`, which returns true (which looks to be the case, even if bf16 is supported via emulation)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144795
Approved by: https://github.com/Skylion007
ghstack dependencies: #144509, #144798
2025-01-15 05:00:26 +00:00
Nikita Shulga
9157a748a6 [MPSInductor] Add dummy properties (#144509)
For compute capability (which is an empty string, same as CPU).
And for multicore count return 8, as this is the smallest number of GPU cores on Apple silicon.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144509
Approved by: https://github.com/jansel
2025-01-14 20:12:38 +00:00
James Wu
e58c823ab8 Implement increment and add_to_set for CompileEventLogger (#143427)
This diff implements `increment` and `add_to_set`, which are features of MetricsContext, but not ChromiumEventLogger. This allows us to add a bunch of other metricscontext callsites to use CompileEventLogger instead.

Differential Revision: [D67354867](https://our.internmc.facebook.com/intern/diff/D67354867/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143427
Approved by: https://github.com/masnesral
2025-01-14 02:42:49 +00:00
Animesh Jain
a54a784b82 [dynamo][dicts] Consolidate dict(..) construction (#144342)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144342
Approved by: https://github.com/StrongerXi
2025-01-13 22:24:56 +00:00
Ryan Guo
4ceca4d60f [dynamo] Avoid graph break on updates to obj.__dict__ (#144419)
`obj.__dict__` is handled specially in Dynamo, and prior to this patch
we only support read and membership check on that dictionary object.

This patch adds support for writes and some documentation.
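
A small sketch of the newly supported pattern:

```python
import torch

class Holder:
    pass

@torch.compile(backend="eager", fullgraph=True)
def f(obj, x):
    obj.__dict__["scale"] = 2  # write to obj.__dict__; previously a graph break
    return x * obj.scale

f(Holder(), torch.ones(3))
```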

Fixes #143756.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144419
Approved by: https://github.com/jansel, https://github.com/anijain2305
2025-01-13 21:04:10 +00:00
Yanbo Liang
3355103233 [Dynamo] Supports autograd.Function forward returns constant (#144597)
Fixes #144142

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144597
Approved by: https://github.com/jansel
2025-01-12 03:53:10 +00:00
Simon Fan
8fa47c9455 [dynamo] log compiler collective duration to tlparse chromium trace (#144372)
To show wall time in tlparse for the synchronous compiler collective. Can eliminate the leading hypothesis from https://fb.workplace.com/groups/1075192433118967/permalink/1578670289437843.

<img width="1296" alt="image" src="https://github.com/user-attachments/assets/b17d4efb-8573-43e5-af58-c51af05acb54" />

sample: https://gist.github.com/xmfan/19eeaa80d55a4e7c168e150355ec7392
rank 0: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpr5WNMt/rank_0/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10
rank 1: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpr5WNMt/rank_1/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144372
Approved by: https://github.com/ezyang
2025-01-11 03:10:39 +00:00
Colin L. Rice
0cd9320c7f easy: dynamo_config: sort keys and set values (#143317)
This will create consistent ordering of keys when writing, as well as
sort sets before serializing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143317
Approved by: https://github.com/masnesral
ghstack dependencies: #143307
2025-01-11 03:08:04 +00:00
Sam Ginzburg
074aca3ed2 [user triton] add support for @triton.heuristics after @triton.autotune (#142208)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142208
Approved by: https://github.com/zou3519
2025-01-11 02:18:26 +00:00
PyTorch MergeBot
473b745cb9 Revert "[dynamo] Avoid graph break on updates to obj.__dict__ (#144419)"
This reverts commit c8595ba7d0.

Reverted https://github.com/pytorch/pytorch/pull/144419 on behalf of https://github.com/clee2000 due to newly added test fails internally D68004708 ([comment](https://github.com/pytorch/pytorch/pull/144419#issuecomment-2583265412))
2025-01-10 16:59:38 +00:00
bobrenjc93
1fe3af2c68 Migrate from Tuple -> tuple in torch/_dynamo (#144261)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144261
Approved by: https://github.com/aorenste, https://github.com/zou3519
2025-01-10 07:45:57 +00:00
Ryan Guo
c8595ba7d0 [dynamo] Avoid graph break on updates to obj.__dict__ (#144419)
`obj.__dict__` is handled specially in Dynamo, and prior to this patch
we only support read and membership check on that dictionary object.

This patch adds support for writes and some documentation.

Fixes #143756.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144419
Approved by: https://github.com/jansel, https://github.com/anijain2305
2025-01-10 05:22:04 +00:00
Guilherme Leobas
bf6dd955cd Fix max(map(...)) (#142443)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142443
Approved by: https://github.com/zou3519
2025-01-10 01:44:37 +00:00
Shangdi Yu
66ce13b497 Revert D67299312: Multisect successfully blamed "D67299312: [AoTI Minifier] UX Improvement" for one test failure (#144475)
Summary:
This diff partially reverts D67299312.
D67299312 ([AoTI Minifier] UX Improvement, by yushangdi) was blamed by Multisect for the test failure referenced in the title.

Differential Revision: D67963019

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144475
Approved by: https://github.com/zhxchen17, https://github.com/angelayi
2025-01-09 23:27:55 +00:00
Colin L. Rice
73278e6a5d easy: sort dictionary keys for inductor config when publishing (#143307)
This means we should get consistent logging strings for the same
config on different ranks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143307
Approved by: https://github.com/xmfan
2025-01-09 18:01:20 +00:00
Xuehai Pan
dcc3cf7066 [BE] fix ruff rule E226: add missing whitespace around operator in f-strings (#144415)
The fixes are generated by:

```bash
ruff check --fix --preview --unsafe-fixes --select=E226 .
lintrunner -a --take "RUFF,PYFMT" --all-files
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144415
Approved by: https://github.com/huydhn, https://github.com/Skylion007
2025-01-08 21:55:00 +00:00
Aaron Gokaslan
373541fbf4 [BE]: Remove unnecessary copy of gradients in util (#144329)
No need to copy gradients to CPU too

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144329
Approved by: https://github.com/awgu, https://github.com/cyyever
2025-01-08 16:52:15 +00:00
Nikita Shulga
708ce3c008 Add is_dtype_supported predicate to DeviceInterface (#144355)
Which will return true by default, unless the dtype is bf16

For the MPS device it will return false if the dtype is double

Check that it works by refactoring `test_inf`, which should now expect a TypeError raised when invoked with an unsupported dtype
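
A hedged sketch of the behavior described above (the class/method layout is assumed):

```python
import torch

class DeviceInterface:
    @staticmethod
    def is_dtype_supported(dtype: torch.dtype) -> bool:
        # default: true unless the dtype is bf16
        return dtype != torch.bfloat16

class MpsInterface(DeviceInterface):
    @staticmethod
    def is_dtype_supported(dtype: torch.dtype) -> bool:
        # MPS: false for double (bf16 handling elided in this sketch)
        return dtype != torch.float64
```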

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144355
Approved by: https://github.com/jansel, https://github.com/dcci
2025-01-08 13:59:46 +00:00
William Wen
f700035090 [3.13t] use sysconfig to check for Python nogil builds (#144361)
`sys._is_gil_enabled()` wasn't working in certain cases, according to @atalman
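
The sysconfig-based check looks roughly like this:

```python
import sysconfig

# 1 on free-threaded (nogil) CPython builds; 0 or None otherwise
is_nogil_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
```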

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144361
Approved by: https://github.com/atalman
2025-01-08 13:00:32 +00:00
Animesh Jain
2ac41404a8 [dynamo][dicts] Guarding lazily on dict keys (#143997)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143997
Approved by: https://github.com/jansel
2025-01-08 03:56:33 +00:00
Oguz Ulgen
9ee242213b [RFC] Introduce cache hot loading APIs (a.k.a. "Mega-cache") (#143341)
This PR essentially introduces two new APIs
* torch.compiler.save_cache_artifacts
* torch.compiler.load_cache_artifacts

which aim to create a mega cache experience where the user can start collecting cache artifacts, and later call the save API to fetch them. In the next attempt, the user can "hot load" the cache artifacts via the load function.

This bundling approach reduces the need to rely on porting individual files one by one, or relying on many network requests.

Note that these APIs CANNOT log to structured logging as these functions will be called before and after compilation, as opposed to during compilation. Due to this limitation, the API returns a struct that the user can log with.
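
A usage sketch of the two APIs; the return shapes below follow this description but should be treated as an approximation:

```python
import torch

model = torch.nn.Linear(8, 8)
compiled = torch.compile(model)
compiled(torch.randn(2, 8))

# Collect everything that was cached during compilation so far.
artifacts = torch.compiler.save_cache_artifacts()

# In a later process: hot-load the bundle before compiling again.
if artifacts is not None:
    artifact_bytes, cache_info = artifacts
    torch.compiler.load_cache_artifacts(artifact_bytes)
```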

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143341
Approved by: https://github.com/jansel
2025-01-07 23:13:24 +00:00
Yanbo Liang
430d54ee20 [Dynamo] Add functorch C++ bindings as in graph functions (#144309)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144309
Approved by: https://github.com/williamwen42
ghstack dependencies: #144306, #144307, #144308
2025-01-07 22:25:01 +00:00
Yanbo Liang
d146763f6f [Dynamo] Inline functions in torch._ops (#144308)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144308
Approved by: https://github.com/williamwen42
ghstack dependencies: #144306, #144307
2025-01-07 22:25:01 +00:00
Yanbo Liang
242a4a3f83 [Dynamo] Inline functions in torch._functorch.pyfunctorch (#144307)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144307
Approved by: https://github.com/williamwen42
ghstack dependencies: #144306
2025-01-07 22:24:53 +00:00
Yanbo Liang
4417be65e5 [Dynamo] Inline functions in torch._functorch.autograd_function (#144306)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144306
Approved by: https://github.com/williamwen42
2025-01-07 22:24:46 +00:00
Simon Fan
f4969c8235 fix torch.compile + ddp + non-reentrant AC pack hook firing count (#144271)
FIXES https://github.com/pytorch/pytorch/issues/144035

In order to preserve hook firing semantics, we disabled pack/unpack hooks for torch.compile: https://github.com/pytorch/pytorch/pull/123196. In DDP under torch.compile, there is another callsite where we also need to disable hooks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144271
Approved by: https://github.com/bdhirsh, https://github.com/soulitzer
2025-01-07 21:08:52 +00:00
Simon Fan
d38af6e8bc [ca] dedup node names when AOT bwd graph is reused multiple times (#144202)
This error started popping up in HUD CA benchmarks:
```python
 File "/data/users/xmfan/core/b/pytorch/torch/_dynamo/compiled_autograd.py", line 371, in dce
    self.fx_tracer.graph.eliminate_dead_code(is_impure)
  File "/data/users/xmfan/core/b/pytorch/torch/fx/graph.py", line 1862, in eliminate_dead_code
    self.lint()
  File "/data/users/xmfan/core/b/pytorch/torch/fx/graph.py", line 1753, in lint
    raise RuntimeError(f"Node redefined name {node.name}!")
RuntimeError: Node redefined name aot0_expand!
```

We added CA initial capture's renaming (https://github.com/pytorch/pytorch/pull/133148) to help debug issues with AOT backward, but it errors out when we have multiple instances of the same AOT backward. This likely only showed up now because of increased hierarchical graph reuse. I fix it by appending a counter to the node name.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144202
Approved by: https://github.com/bdhirsh, https://github.com/jansel
2025-01-07 20:23:09 +00:00
Shangdi Yu
72e8f34715 [AoTI Minifier] UX Improvement (#143330)
Summary:
- When a user specifies the `TORCHINDUCTOR_MAX_AUTOTUNE=1` env variable, we add `config.max_autotune=True` to the generated minifier_launcher (see the sketch after this list)
- We should do this for other inductor configs as well in a followup Diff
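
For example, with `TORCHINDUCTOR_MAX_AUTOTUNE=1` set, the generated minifier_launcher.py would now carry the config explicitly, roughly:

```python
# excerpt of a generated minifier_launcher.py (illustrative)
import torch._inductor.config

# previously dropped when TORCHINDUCTOR_MAX_AUTOTUNE=1 lived only in the env
torch._inductor.config.max_autotune = True
```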

Currently in the dynamo and aoti minifiers, if a config is overwritten by an env variable, the config will not show up in the config list in the minifier_launcher.py file. As a result, when running the minifier_launcher, users need to re-apply the same env variable.
 This is:
1) not convenient for the users
2) if they copy-paste the minifier_launcher.py to us without including the env variable, we could be confused and unable to reproduce the error.

Underlying implementation change:

- Add an `env_default` parameter to `codegen_config()`. If set, configs overridden by the env are not considered default.

Test Plan:
```
 buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:utils -- -r test_codegen_config
```

Differential Revision: D67299312

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143330
Approved by: https://github.com/jansel, https://github.com/eellison
2025-01-07 20:04:19 +00:00
Guilherme Leobas
4c8d661348 Set enable_trace_contextlib_contextmanager flag to True (#140604)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140604
Approved by: https://github.com/zou3519
ghstack dependencies: #136033
2025-01-06 16:56:22 +00:00
yijun-lee
d4609af1ca Propagate callable parameter types using ParamSpec (#142306) (#144047)
Fixes #142306

This PR includes typing improvements and refactoring for the following files:
- __init__.py
- decorators.py
- _ops.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144047
Approved by: https://github.com/XuehaiPan, https://github.com/Skylion007

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2025-01-06 16:16:18 +00:00
Animesh Jain
f6488d85a0 [dynamo][user-defined] Remove __getattribute__ checks and add getsetdescriptor (#144173)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144173
Approved by: https://github.com/jansel
2025-01-05 13:48:15 +00:00
PyTorch MergeBot
b01556bd8a Revert "[dynamo][dicts] Guarding lazily on dict keys (#143997)"
This reverts commit f5df082fab.

Reverted https://github.com/pytorch/pytorch/pull/143997 on behalf of https://github.com/jeanschmidt due to Seems to have introduced internal ci redness in some tests, D67828366 ([comment](https://github.com/pytorch/pytorch/pull/143997#issuecomment-2571587599))
2025-01-05 11:09:45 +00:00
James Wu
f2d6cfa677 Introduce CompileEventLogger, replace usages of metrics_context and chromium_event with it (#143420)
**Problem statement**: I want to be able to centralize and simplify the process by which people add columns/data to existing spans. We have MetricsContext and ChromiumEventLogger, and there are various choices you can make to decide where and when to log different levels of observability for your events. To resolve this, I want a central API for "adding to events under dynamo_timed".

**CompileEventLogger** is intended as a frontend for MetricsContext and ChromiumEventLogger so we can use the same class for handling everything.

CompileEventLogger is intended to be used within a `dynamo_timed()` context. Its purpose is to 1. log to existing events that are in progress (i.e. within dynamo_timed), and 2. log instant events to chromium that are independent of any specific span.

CompileEventLogger has three log levels:

- CHROMIUM: Log only to chromium events, visible via tlparse.
- PT2_COMPILE: Log to chromium_events + pt2_compile_events
- COMPILATION_METRIC: Log to compilation metrics in addition to the toplevel chromium and pt2_compile_event.

In addition, we have a function CompileEventLogger.add() that automagically chooses the correct log level. For now, it is conservative, and will never automagically choose to log CompilationMetrics (though I could imagine it figuring out that the metadata are all keys in CompilationMetrics and therefore loggable there).

The goal here is to make one single interface to log stuff for observability reasons, and make it as easy as possible.
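
A hypothetical usage sketch: only `CompileEventLogger.add()` and `dynamo_timed` are named in this description; the import path and argument shape below are assumptions.

```python
from torch._dynamo.utils import CompileEventLogger, dynamo_timed  # assumed location

with dynamo_timed("my_phase"):  # an event is now in progress
    # add() attaches metadata to the in-progress event and picks the
    # log level automatically (argument shape assumed for illustration)
    CompileEventLogger.add("my_phase", cache_state="hit", num_graphs=1)
```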

Not included in this diff:
- V1 of this diff will not have implementations of `increment` and `add_to_set` which MetricsContext has, so those usages are not replaced yet. But I'll add those in a followup.

- We don't handle `RuntimeMetricsContext`. It's unclear if I want that to be part of this, because under RuntimeMetricsContext there might not be a toplevel event to log to, so chromium events doesn't make sense in that context. So I might leave that separate for now.

Differential Revision: [D67346203](https://our.internmc.facebook.com/intern/diff/D67346203/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143420
Approved by: https://github.com/aorenste
2025-01-04 22:40:34 +00:00
Animesh Jain
f5df082fab [dynamo][dicts] Guarding lazily on dict keys (#143997)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143997
Approved by: https://github.com/jansel
ghstack dependencies: #144129, #144130, #144141, #144158, #144163, #144160
2025-01-04 18:13:00 +00:00
Animesh Jain
816328fa51 [dynamo][lazy] LazyVT utils to get original value/source and is_hashable (#144160)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144160
Approved by: https://github.com/williamwen42, https://github.com/jansel
ghstack dependencies: #144129, #144130, #144141, #144158, #144163
2025-01-04 06:23:05 +00:00
Sam Ginzburg
ec1f56fdcf [user triton] add support for prune_configs_by in @triton.autotune (#142207)
This PR adds support for prune_configs_by in the @triton.autotune decorator [docs](https://triton-lang.org/main/python-api/generated/triton.autotune.html#triton.autotune). Supporting this lets users reduce autotuning time by running user-supplied code (early_config_prune, perf_model) to prune the provided list of configs.

We implement this by realizing args/kwargs in call_triton_kernel(...), and then calling kernel.prune_configs(...).
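
A small example of the now-supported decorator usage; the config values and pruning rule are illustrative, while the `prune_configs_by` keys follow Triton's documented API:

```python
import triton
import triton.language as tl

def early_config_prune(configs, named_args, **kwargs):
    # toy rule: drop the largest block sizes before benchmarking
    return [c for c in configs if c.kwargs["BLOCK"] <= 1024]

@triton.autotune(
    configs=[triton.Config({"BLOCK": b}) for b in (256, 512, 1024, 2048)],
    key=["n"],
    prune_configs_by={"early_config_prune": early_config_prune},
)
@triton.jit
def add_one(x_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    tl.store(x_ptr + offs, tl.load(x_ptr + offs, mask=mask) + 1, mask=mask)
```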

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142207
Approved by: https://github.com/zou3519, https://github.com/aakhundov
2025-01-04 03:50:28 +00:00
Animesh Jain
087c625261 [dynamo] Trace torch.typename (#144163)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144163
Approved by: https://github.com/yanboliang, https://github.com/williamwen42, https://github.com/jansel
ghstack dependencies: #144129, #144130, #144141, #144158
2025-01-04 02:52:58 +00:00
Animesh Jain
3292220c43 [dynamo][easy] Move symnode helpers to utils (#144158)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144158
Approved by: https://github.com/williamwen42, https://github.com/jansel
ghstack dependencies: #144129, #144130, #144141
2025-01-04 02:52:58 +00:00
Xiaodong Wang
0a94bb432e [ROCm] CK Flash Attention Backend (#143695)
Replace https://github.com/pytorch/pytorch/pull/138947 for re-import.

Replaces https://github.com/ROCm/pytorch/pull/1592

This PR contains the initial implementation of SDPA with composable_kernel backend. The CK path can be forced by simply calling torch.backends.cuda.preferred_rocm_fa_library("ck"). Similarly, you can force the incumbent aotriton implementation by passing in "aotriton" or "default". As you'd expect, not setting this option will result in aotriton to be used as the backend. In the case of CK, if pytorch deems flash attention usable, then it will use the CK path in all the same places aotriton would have been used. This PR makes no changes to the heuristics which select which attention scheme to use (i.e. flash attention vs memory efficient attention vs math etc etc). It only gets called when flash attention is both enabled (via USE_FLASH_ATTENTION) and is selected at runtime by the existing heuristics.

Files located in pytorch/aten/src/ATen/native/transformers/hip/flash_attn/ck/mha* have been pulled from https://github.com/Dao-AILab/flash-attention courtesy of @tridao's hard work; he is credited as a co-author.

NOTE: In order to use this backend, the user MUST set USE_CK_FLASH_ATTENTION=1 in their environment when they build PyTorch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143695
Approved by: https://github.com/malfet

Co-authored-by: Andy Lugo <Andy.LugoReyes@amd.com>
Co-authored-by: Jithun Nair <jithun.nair@amd.com>
2025-01-03 22:01:36 +00:00
Yidi Wu
c36f94b373 [while_loop][dynamo] auto-unspecialize int input and output to unbacked symints (#143106)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143106
Approved by: https://github.com/zou3519
ghstack dependencies: #143105, #143545
2025-01-03 19:01:07 +00:00
Yidi Wu
5660709856 [hop][BE] unify meta checking with check_meta_consistency (#143545)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143545
Approved by: https://github.com/zou3519
ghstack dependencies: #143105
2025-01-03 19:01:07 +00:00
PyTorch MergeBot
8d63a4a409 Revert "Set enable_trace_contextlib_contextmanager flag to True (#140604)"
This reverts commit 1c817fe671.

Reverted https://github.com/pytorch/pytorch/pull/140604 on behalf of https://github.com/guilhermeleobas due to breaking one of the benchmarks (moco) ([comment](https://github.com/pytorch/pytorch/pull/140604#issuecomment-2569640837))
2025-01-03 18:23:53 +00:00
Animesh Jain
c5c897c3a1 [dynamo][easy] Miscellaneous fixes (#144141)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144141
Approved by: https://github.com/williamwen42
ghstack dependencies: #144129, #144130
2025-01-03 18:22:56 +00:00
Xuehai Pan
d9507548d8 [dynamo][BE] move zip_longest polyfill to submodule polyfills.itertools (#144067)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144067
Approved by: https://github.com/yanboliang
ghstack dependencies: #144066
2025-01-03 08:08:31 +00:00
Xuehai Pan
fb1beb31d2 [dynamo][BE] move dropwhile polyfill to submodule polyfills.itertools (#144066)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144066
Approved by: https://github.com/jansel
2025-01-03 08:08:31 +00:00
Animesh Jain
dec1a6d0f0 [dynamo] Separate out GetItemSource and DictGetItemSource (#143926)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143926
Approved by: https://github.com/jansel
2025-01-01 02:39:41 +00:00
Vinayak Pandey
16a57e232c removed dead code for dynamo flag dead_code_elimination (#140938)
Fixes #136862

1.  removed dead code from torch/_dynamo/convert_frame.py
2.  ran `lintrunner -a` and all the tests passed.
3. ran the unit tests and everything seems to be in order.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140938
Approved by: https://github.com/zou3519
2024-12-31 09:27:43 +00:00
Kasperi Apell
a7915c56f6 Propagate callable parameter types using ParamSpec (#142306) (#143797)
The codebase has a few locations where callable parameter type information is lost because the unpacked *args and **kwargs are typed as Any. Refactor these instances to retain type information using typing_extensions.ParamSpec.

Also, in these functions, enforce the return type with a TypeVar.
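
For reference, the pattern being applied looks like this:

```python
from typing import Callable, TypeVar

from typing_extensions import ParamSpec

P = ParamSpec("P")
R = TypeVar("R")

def with_logging(fn: Callable[P, R]) -> Callable[P, R]:
    # the wrapper keeps fn's real parameter types instead of degrading to Any
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        print(f"calling {fn.__name__}")
        return fn(*args, **kwargs)

    return wrapper
```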

Addresses #142306

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143797
Approved by: https://github.com/Skylion007

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>
2024-12-29 23:03:14 +00:00
Animesh Jain
01980cac38 [dynamo] Make ConstDictKeySource a subclass of ChainedSource (#143924)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143924
Approved by: https://github.com/jansel
2024-12-28 05:59:45 +00:00
Animesh Jain
c3c27aef34 [dynamo] Remove HFPretrained config hack (#143698)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143698
Approved by: https://github.com/williamwen42, https://github.com/jansel
ghstack dependencies: #143888
2024-12-28 02:03:13 +00:00
Nikita Shulga
1e65dec2b9 [Dynamo] Add MPSDevice interface (#143891)
That simply checks if device is available and whether or not it supports bf16

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143891
Approved by: https://github.com/jansel
2024-12-27 20:31:44 +00:00
Animesh Jain
a87cd5283b [dynamo] Trace through overridden __getattribute__ method (#143888)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143888
Approved by: https://github.com/jansel
2024-12-27 18:10:00 +00:00
Animesh Jain
0f474a960b [dynamo] Remove dead code after introducing UserDefinedDictVariable (#143699)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143699
Approved by: https://github.com/williamwen42, https://github.com/yanboliang, https://github.com/jansel
ghstack dependencies: #143722
2024-12-27 04:51:35 +00:00
Animesh Jain
e296bab614 [dynamo] Remove DICT_SUBCLASS_GUARD_MANAGER and use dict.keys (#143722)
In hindsight, we never needed a DICT_SUBCLASS_GUARD_MANAGER, because Dynamo would inline through the overridden keys method. In this PR, we ensure that while creating guards and constructing variable trackers, we get the `d.keys()` value by using `dict.keys(d)`. This ensures that we do not call the overridden keys method, so the C++ guard can use `PyDict_Next` directly to check the guards.
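
A quick illustration of why `dict.keys(d)` matters here:

```python
class LoudDict(dict):
    def keys(self):  # overridden keys() that guard construction must not trigger
        print("user keys() called!")
        return super().keys()

d = LoudDict(a=1, b=2)
list(d.keys())       # prints "user keys() called!"
list(dict.keys(d))   # silent: bypasses the override, matching what PyDict_Next sees
```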

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143722
Approved by: https://github.com/jansel
2024-12-27 04:51:35 +00:00
PyTorch MergeBot
26364428f5 Revert "[dynamo] Remove DICT_SUBCLASS_GUARD_MANAGER and use dict.keys (#143722)"
This reverts commit fe95cbe018.

Reverted https://github.com/pytorch/pytorch/pull/143722 on behalf of https://github.com/wdvr due to failing internal tests ([comment](https://github.com/pytorch/pytorch/pull/143722#issuecomment-2563127017))
2024-12-26 22:04:36 +00:00
PyTorch MergeBot
ee25daef5a Revert "[dynamo] Remove dead code after introducing UserDefinedDictVariable (#143699)"
This reverts commit 7d1c666139.

Reverted https://github.com/pytorch/pytorch/pull/143699 on behalf of https://github.com/wdvr due to failing internal tests ([comment](https://github.com/pytorch/pytorch/pull/143722#issuecomment-2563127017))
2024-12-26 22:04:35 +00:00
Aaron Orenstein
3df12d38cf dynamo tracing perf: cache cleaned_instructions: 33.7 -> 30.0 (#143070)
See #143056 for overall docs.

This PR: Cache the interesting/expensive bits of `cleaned_instructions()`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143070
Approved by: https://github.com/jansel
2024-12-26 19:02:08 +00:00
Jason Ansel
9035fb5a7b [dynamo] Add types to exc.py (#143626)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143626
Approved by: https://github.com/yanboliang
ghstack dependencies: #143552, #143610
2024-12-24 21:48:32 +00:00
Jason Ansel
9e5f3fdfc7 [dynamo] Shorten tracebacks for backend compiler errors (#143552)
Fixes #143406

After this PR the error for missing Triton is:
```py
Traceback (most recent call last):
  File "/home/jansel/pytorch/repro.py", line 51, in <module>
    fp32_compiled = optimized_model(low_input)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/eval_frame.py", line 580, in _fn
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3624, in create_backend
    raise TritonMissing(inspect.currentframe())
torch._dynamo.exc.TritonMissing: Cannot find a working triton installation. Either the package is not installed or it is too old. More information on installing Triton can be found at: https://github.com/triton-lang/triton

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```

Setting `TORCHDYNAMO_VERBOSE=1` yields something like the old error:
```py
Traceback (most recent call last):
  File "/home/jansel/pytorch/repro.py", line 51, in <module>
    fp32_compiled = optimized_model(low_input)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/eval_frame.py", line 580, in _fn
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/eval_frame.py", line 576, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 1383, in __call__
    return self._torchdynamo_orig_callable(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 1167, in __call__
    result = self._inner_convert(
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 548, in __call__
    return _compile(
           ^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 988, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 716, in compile_inner
    return _compile_inner(code, one_graph, hooks, transform)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_utils_internal.py", line 95, in wrapper_function
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 751, in _compile_inner
    out_code = transform_code_object(code, transform)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object
    transformations(instructions, code_options)
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 232, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 663, in transform
    tracer.run()
  File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 2870, in run
    super().run()
  File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 1053, in run
    while self.step():
          ^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 963, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 3050, in RETURN_VALUE
    self._return(inst)
  File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 3035, in _return
    self.output.compile_subgraph(
  File "/home/jansel/pytorch/torch/_dynamo/output_graph.py", line 1102, in compile_subgraph
    self.compile_and_call_fx_graph(
  File "/home/jansel/pytorch/torch/_dynamo/output_graph.py", line 1383, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/output_graph.py", line 1433, in call_user_compiler
    return self._call_user_compiler(gm)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/output_graph.py", line 1463, in _call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__
    compiled_gm = compiler_fn(gm, example_inputs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/__init__.py", line 2314, in __call__
    return compile_fx(model_, inputs_, config_patches=self.config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 1880, in compile_fx
    return aot_autograd(
           ^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/backends/common.py", line 83, in __call__
    cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 1145, in aot_module_simplified
    compiled_fn = AOTAutogradCache.load(
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/_aot_autograd/autograd_cache.py", line 754, in load
    compiled_fn = dispatch_and_compile()
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile
    compiled_fn, _ = create_aot_dispatcher_function(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function
    return _create_aot_dispatcher_function(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function
    compiled_fn, fw_metadata = compiler_fn(
                               ^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 676, in aot_dispatch_autograd
    compiled_fw_func = aot_config.fw_compiler(fw_module, adjusted_flat_args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 489, in __call__
    return self.compiler_fn(gm, example_inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 1758, in fw_compiler_base
    return inner_compile(
           ^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 572, in compile_fx_inner
    return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 686, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile
    compiled_fn = graph.compile_to_module().call
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/graph.py", line 1975, in compile_to_module
    return self._compile_to_module()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/graph.py", line 1981, in _compile_to_module
    self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
                                                             ^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/graph.py", line 1916, in codegen
    self.scheduler.codegen()
  File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3667, in codegen
    return self._codegen()
           ^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3761, in _codegen
    if device is not None and self.get_backend(device).ready_to_flush():
                              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3631, in get_backend
    self.backends[device] = self.create_backend(device)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3624, in create_backend
    raise TritonMissing(inspect.currentframe())
torch._dynamo.exc.TritonMissing: Cannot find a working triton installation. Either the package is not installed or it is too old. More information on installing Triton can be found at: https://github.com/triton-lang/triton

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```

This PR also strips dynamo stack frames from other types of backend compile errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143552
Approved by: https://github.com/yanboliang
2024-12-24 21:48:23 +00:00
Animesh Jain
7d1c666139 [dynamo] Remove dead code after introducing UserDefinedDictVariable (#143699)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143699
Approved by: https://github.com/williamwen42, https://github.com/yanboliang, https://github.com/jansel
ghstack dependencies: #143722
2024-12-24 02:00:18 +00:00
Animesh Jain
fe95cbe018 [dynamo] Remove DICT_SUBCLASS_GUARD_MANAGER and use dict.keys (#143722)
In hindsight, we never needed a DICT_SUBCLASS_GUARD_MANAGER, because Dynamo would inline through the overridden keys method. In this PR, we ensure that while creating guards and constructing variable trackers, we get the `d.keys()` value by using `dict.keys(d)`. This ensures that we do not call the overridden keys method, so the C++ guard can use `PyDict_Next` directly to check the guards.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143722
Approved by: https://github.com/jansel
2024-12-24 02:00:18 +00:00
Sam Larsen
4271a95590 [logging] A few fixes/updates to record_compilation_metrics (#143332)
Summary: Mostly cosmetic, but one bug fix:
* Bug fix: Make sure compile_id is converted to a string in the compilation metrics so it's printed as, e.g., "0/1" instead of "[0, 1]"
* Sort collections in `collection_to_str`
* Print non-string elements as `"<unknown>"` instead of None (since we don't expect non-strings)
* Move the population of the legacy metrics and any pre-processing to a new factory method in CompilationMetrics

Test Plan:
```
python test/dynamo/test_structured_trace.py
python test/dynamo/test_utils.py
```
Internal testing: https://fburl.com/scuba/dynamo_compile/sandbox/l0me8auf

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143332
Approved by: https://github.com/ppanchalia
2024-12-23 23:10:11 +00:00
Aaron Orenstein
06b4b96b34 dynamo tracing perf: no re in arg_ref: 33.9 -> 33.7 (#143069)
See #143056 for overall docs.

This PR: Avoid use of Python `re` and move the valid-varname check in
`GuardBuilder.arg_ref()` into C++.
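
A rough Python-level equivalent of the check (the landed version lives in C++; the function name is illustrative):

```python
def is_valid_varname(name: str) -> bool:
    # str.isidentifier() does roughly what a compiled regex such as
    # r"^[A-Za-z_][A-Za-z0-9_]*$" would, without touching the `re` module.
    return name.isidentifier()

assert is_valid_varname("L_x_")
assert not is_valid_varname("2fast")
```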

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143069
Approved by: https://github.com/jansel
2024-12-23 05:32:09 +00:00
Oguz Ulgen
dc55704b48 Rename cache limit to recompile limit in configs (#143709)
This PR renames every cache_limit to recompile_limit via sed.

Old config options are maintained via `Config(alias='xyz')`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143709
Approved by: https://github.com/jansel
2024-12-22 10:03:57 +00:00
Aaron Orenstein
9bf4b1c2e9 dynamo tracing perf: c++ strip_function_call: 49.12 -> 47.77 (#143063)
See #143056 for overall docs.

This PR: Convert `strip_function_call()` into C++

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143063
Approved by: https://github.com/jansel
ghstack dependencies: #143057, #143062
2024-12-22 06:38:46 +00:00
Aaron Orenstein
3ec04d30d5 dynamo tracing perf: kill import: 50.36 -> 49.12 (#143062)
See #143056 for overall docs.

This PR: Stop importing in the body of `BuiltinVariable.call_getattr()`
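
The general pattern, sketched with illustrative names (not the actual Dynamo code):

```python
# Before: the import statement runs on every call; even a cached module
# import costs a sys.modules lookup plus a name binding.
def call_getattr_slow(obj, name):
    import operator
    return operator.attrgetter(name)(obj)

# After: resolve the module once, at module import time.
import operator

def call_getattr_fast(obj, name):
    return operator.attrgetter(name)(obj)
```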

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143062
Approved by: https://github.com/jansel
ghstack dependencies: #143057
2024-12-22 06:38:46 +00:00
Aaron Orenstein
f2b744b9ca dynamo tracing perf: import_module: 59.92 -> 52.9 (#143057)
See #143056 for overall docs.

This PR: Using `importlib.import_module()` within the hot path of
symbolic_convert is slow. Memoize it.
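
A minimal sketch of the memoization (the helper name is illustrative):

```python
import functools
import importlib

@functools.lru_cache(maxsize=None)
def import_module_cached(name: str):
    # Repeat lookups become a single cache-dict hit instead of a call into
    # importlib's resolution machinery.
    return importlib.import_module(name)

mod = import_module_cached("math")
```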

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143057
Approved by: https://github.com/jansel
2024-12-22 06:38:38 +00:00
Tom Ritchford
f1cbf4b1b5 Enable ruff's unused variable checking everywhere in pytorch (#136965)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136965
Approved by: https://github.com/cyyever, https://github.com/albanD
2024-12-22 02:33:11 +00:00
Simon Fan
a8953c36f5 [compiled autograd] log compilation time to perfetto (#140964)
https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmprli4iy/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100
```
[
  {
    "args": {
      "compile_id": "0/-/-",
      "graph_id": 0
    },
    "cat": "dynamo_timed",
    "name": "compiled_autograd",
    "ph": "B",
    "pid": 0,
    "tid": 0,
    "ts": 1733886868992655.8
  },
  {
    "args": {
      "compile_id": "0/-/-",
      "graph_id": 0
    },
    "cat": "dynamo_timed",
    "name": "compiled_autograd",
    "ph": "E",
    "pid": 0,
    "tid": 0,
    "ts": 1733886869130681.0
  },
  {
    "args": {
      "compile_id": "0/0/0"
    },
    "cat": "dynamo_timed",
    "name": "dynamo",
    "ph": "B",
    "pid": 0,
    "tid": 0,
    "ts": 1733886869134350.5
  },
  {
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140964
Approved by: https://github.com/masnesral
ghstack dependencies: #141907, #143175
2024-12-21 04:23:25 +00:00
Animesh Jain
0da004f3dd [dynamo] Remove transformers ModelOutput hack (#143567)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143567
Approved by: https://github.com/williamwen42, https://github.com/jansel
ghstack dependencies: #143548
2024-12-21 01:46:14 +00:00
Animesh Jain
4627cfd1f9 [dynamo] Support user defined dicts (#143548)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143548
Approved by: https://github.com/yanboliang, https://github.com/jansel, https://github.com/williamwen42
2024-12-21 01:46:14 +00:00
Simon Fan
ffd1b53f26 [aot] refactor dynamo source and cudagraphs static idx logic (#141748)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141748
Approved by: https://github.com/ezyang
2024-12-21 01:20:53 +00:00
Simon Fan
d88ebbf822 cleanup chromium event log on dynamo exit rather than on entry (#143175)
Clearing at dynamo start is an issue because it throws away events from compiled autograd.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143175
Approved by: https://github.com/Skylion007, https://github.com/jamesjwu
ghstack dependencies: #141907
2024-12-21 00:41:24 +00:00
Simon Fan
4ee166b82f [ca] add compiled autograd to CompileId (#141907)
tlparse PR: https://github.com/ezyang/tlparse/pull/83

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141907
Approved by: https://github.com/ezyang
2024-12-21 00:41:24 +00:00
PyTorch MergeBot
ad7ab5ef84 Revert "[logging] A few fixes/updates to record_compilation_metrics (#143332)"
This reverts commit a9c753bbc8.

Reverted https://github.com/pytorch/pytorch/pull/143332 on behalf of https://github.com/malfet due to Surprisingly failure is caused by this PR ([comment](https://github.com/pytorch/pytorch/pull/143332#issuecomment-2557899120))
2024-12-21 00:06:44 +00:00
Sam Larsen
a9c753bbc8 [logging] A few fixes/updates to record_compilation_metrics (#143332)
Summary: Mostly cosmetic, but one bug fix:
* Bug fix: Make sure compile_id is converted to a string in the compilation metrics so it's printed as, e.g., "0/1" instead of "[0, 1]"
* Sort collections in `collection_to_str`
* Print non-string elements as `"<unknown>"` instead of None (since we don't expect non-strings)
* Move the population of the legacy metrics and any pre-processing to a new factory method in CompilationMetrics

Test Plan:
```
python test/dynamo/test_structured_trace.py
python test/dynamo/test_utils.py
```
Internal testing: https://fburl.com/scuba/dynamo_compile/sandbox/l0me8auf

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143332
Approved by: https://github.com/ppanchalia
2024-12-20 21:42:32 +00:00
Colin L. Rice
a94f259a69 pgo: Log feature use (#142819)
This will cause dynamo_compile to populate the feature column if we have
a hit for PGO.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142819
Approved by: https://github.com/ezyang
2024-12-20 20:22:20 +00:00
Aaron Orenstein
8ce0bc282a dynamo tracing perf: bytecode_transform improvements: 34.86 -> 33.9 (#143068)
See #143056 for overall docs.

This PR: Use slots on InstructionExnTabEntry and Instruction. Stop doing
Python version checks in the middle of `convert_instruction()` and
`inst_has_op_bits()`.
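
A hedged sketch of both changes (the field set and version threshold are illustrative):

```python
import sys
from typing import Optional

class Instruction:
    # __slots__ drops the per-instance __dict__: less memory and faster
    # attribute access for the many instruction objects built during tracing.
    __slots__ = ("opcode", "opname", "arg", "argval")

    def __init__(self, opcode: int, opname: str, arg: Optional[int], argval: object):
        self.opcode = opcode
        self.opname = opname
        self.arg = arg
        self.argval = argval

# Hoist the interpreter-version check out of per-instruction code:
_PY311_PLUS = sys.version_info >= (3, 11)  # computed once at import time
```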

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143068
Approved by: https://github.com/jansel
ghstack dependencies: #143065, #143067
2024-12-20 20:06:42 +00:00
Aaron Orenstein
5feb2d7b41 dynamo tracing perf: don't call expensive _set_guard_export_info if it's a duplicate guard: 37.66 -> 34.86 (#143067)
See #143056 for overall docs.

This PR: Move the call to `_set_guard_export_info()` after the duplicate-guard
check in `GuardBuilder.DUPLICATE_INPUT()`.
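
A hedged sketch of the reordering; the dedup bookkeeping shown here is an assumption, not the actual guard code:

```python
def DUPLICATE_INPUT(self, guard, source_b):
    key = (guard.name, source_b.name())
    if key in self._seen_duplicates:
        return                          # cheap duplicate check runs first
    self._set_guard_export_info(guard)  # expensive; now only for new guards
    self._seen_duplicates.add(key)
```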

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143067
Approved by: https://github.com/jansel
ghstack dependencies: #143065
2024-12-20 20:06:42 +00:00
Aaron Orenstein
7d4e7fbfc1 dynamo tracing perf: no import on hot path: 47.62 -> 47.26 (#143065)
See #143056 for overall docs.

This PR: Removed another `import` in the body of the hot path.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143065
Approved by: https://github.com/jansel
2024-12-20 20:06:42 +00:00
Nikhil Gupta
94737e8a2a [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)
Description:
1. Quantize Linear Layer Weights to 4-bits:
Quantize the weights of the Linear layer to 4 bits, using symmetric quantization.
Pack two 4-bit weights into one uint8 container.
Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32.

2. Prepare Quantized Weights, Scales, and Optional Bias:
After quantizing, obtain the quantized_weights, scales, and groupsize.
If the original Linear layer has a bias, prepare it as well.

3. Pack the Weights Efficiently:
Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias.
```python
packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features)
```
Input parameters should include:
in_features and out_features (the same as the Linear layer’s corresponding parameters).

4. Perform Dynamic Quantized Matrix Multiplication:
Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights.
```python
output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features)
```
Inputs required include:
The input tensor, packed_weights, groupsize, and the in_features and out_features.

API Usage: https://github.com/pytorch/pytorch/issues/143289

Model Perf:
7B Transformer model:
Prefill: 340 t/s
Decode: 40 t/s
2B Transformer model:
Prefill: 747 t/s
Decode: 80 t/s

Tests:
python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight
Ran 1 test in 0.016s

OK

python test/test_linalg.py -k test__dyn_quant_matmul_4bit
Ran 8 tests in 0.077s

OK

python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit
Ran 8 tests in 11.454s

Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124
Approved by: https://github.com/digantdesai, https://github.com/malfet
2024-12-20 19:32:03 +00:00
bobrenjc93
4f8b7c4272 Revert "refactor tensorify restart logic to use sources (#141517)" (#143623)
This reverts commit 30d8b30db7.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143623
Approved by: https://github.com/mlazos
2024-12-20 15:38:34 +00:00
Guilherme Leobas
1c817fe671 Set enable_trace_contextlib_contextmanager flag to True (#140604)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140604
Approved by: https://github.com/zou3519
ghstack dependencies: #136033
2024-12-20 12:02:27 +00:00
Guilherme Leobas
673cc88fd6 Add support for contextmanager in Dynamo (#136033)
Fixes #130559

* Intro

This PR adds support for `@contextmanager` in Dynamo. We chose to limit the
scope of this work to only `@contextmanager` and plan to handle generators fully
in #141055 (still in draft).

* Motivation

Dynamo lacks support for generator functions. When it encounters one, it traces
it as if it were a regular function. This is problematic because it can lead to
incorrect behavior. To illustrate, consider the test case below:

```python
import torch
import contextlib

@contextlib.contextmanager
def set_default_dtype(dtype):
    old_dtype = torch.get_default_dtype()
    try:
        torch.set_default_dtype(dtype)
        yield
    finally:
        torch.set_default_dtype(old_dtype)

@torch.compile(backend="eager", fullgraph=True)
def fn():
    with set_default_dtype(torch.float64):
        x = torch.tensor([3.0, 3.0 + 5.0j])
    return x
```

Before this work, Dynamo would not stop at the `yield`, and the graph produced
would contain both calls to `set_default_dtype` executed one after the other.
This is incorrect because the context manager should execute code before and
after the `yield`.

* List of changes

`YIELD_VALUE` now raises an exception (`YieldValueOp`) to signal that control
flow must be suspended and returned to the caller. Additionally, `RETURN_VALUE`
behaves differently in a generator function. Unlike regular functions, where
`RETURN_VALUE` indicates the final result, in generators it signifies that the
generator is exhausted and implicitly raises `StopIteration`.
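
A plain-Python reminder of those semantics:

```python
def gen():
    yield 1
    return "done"

g = gen()
print(next(g))      # 1 -- execution suspends at the yield (YIELD_VALUE)
try:
    next(g)
except StopIteration as e:
    print(e.value)  # "done" -- RETURN_VALUE surfaces as StopIteration
```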

A new `VariableTracker` named `FunctionDecoratedByContextlibContextManagerVariable`
was introduced to handle `@contextmanager`. This variable tracker acts not just
as a wrapper for the original function but also maintains an internal `tx`
(InstructionTranslator) object to suspend and return control flow to the parent
tracer when a `yield` is encountered.

* Corner cases

Returning a context manager from a compiled function is not supported. This
would require PyTorch to synchronize the generator state between Dynamo and the
interpreter. Any attempt to return it will result in an `IncorrectUsage`
exception.

Graph breaks require special handling as well. In the event of a graph break,
the frame associated with the context manager is skipped, and the context
manager runs in eager mode.

* This PR is breaking my code

There is a configuration flag (`enable_trace_contextlib`) that can be set to
`False` to disable tracing context managers. If this still causes crashes,
please revert this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136033
Approved by: https://github.com/zou3519
2024-12-20 12:02:20 +00:00
Michael Lazos
270ad513c8 [Dynamo] only import einops if version is lower than 0.7.0 (#142847)
Fixes internal xref (https://fb.workplace.com/groups/257735836456307/posts/804793021750583/?comment_id=805229281706957&reply_comment_id=805232695039949)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142847
Approved by: https://github.com/zou3519
2024-12-20 07:46:49 +00:00
Michael Lazos
fd23cf5848 [Dynamo] check node class first for graph dedup (#143609)
as title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143609
Approved by: https://github.com/williamwen42
2024-12-20 04:09:46 +00:00
Jane Xu
a0cff096bc Improve cond error messaging (#143595)
Discovered by @drisspg and me while trying out a simple toy example and being way too confused :')

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143595
Approved by: https://github.com/zou3519, https://github.com/ydwu4
2024-12-20 01:19:20 +00:00
PyTorch MergeBot
8136daff5a Revert "[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)"
This reverts commit 4b82251011.

Reverted https://github.com/pytorch/pytorch/pull/134124 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it breaks lots of internal build ([comment](https://github.com/pytorch/pytorch/pull/134124#issuecomment-2555953189))
2024-12-19 23:33:17 +00:00
PyTorch MergeBot
145fd5bad0 Revert "[Dynamo] only import einops if version is lower than 0.7.0 (#142847)"
This reverts commit a96387a481.

Reverted https://github.com/pytorch/pytorch/pull/142847 on behalf of https://github.com/huydhn due to This has been reverted internally D67436053 ([comment](https://github.com/pytorch/pytorch/pull/142847#issuecomment-2555942351))
2024-12-19 23:22:44 +00:00
bobrenjc93
8850a7b62c add some logging for tensorify (#143391)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143391
Approved by: https://github.com/jamesjwu
2024-12-19 20:06:26 +00:00
Yanbo Liang
c46cfc245f [Dynamo] Support dict_keys from nested dict object (#143557)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143557
Approved by: https://github.com/williamwen42
ghstack dependencies: #143374, #143547
2024-12-19 19:02:55 +00:00
Yanbo Liang
5fa287aa82 [Dynamo] Rename Dict{View/Keys/Values} to Dict{View/Keys/Values}Variable (#143547)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143547
Approved by: https://github.com/williamwen42
ghstack dependencies: #143374
2024-12-19 19:02:55 +00:00
Nikhil Gupta
4b82251011 [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)
Description:
1. Quantize Linear Layer Weights to 4-bits:
Quantize the weights of the Linear layer to 4 bits, using symmetric quantization.
Pack two 4-bit weights into one uint8 container.
Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32.

2. Prepare Quantized Weights, Scales, and Optional Bias:
After quantizing, obtain the quantized_weights, scales, and groupsize.
If the original Linear layer has a bias, prepare it as well.

3. Pack the Weights Efficiently:
Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias.
```python
packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features)
```
Input parameters should include:
in_features and out_features (the same as the Linear layer’s corresponding parameters).

4. Perform Dynamic Quantized Matrix Multiplication:
Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights.
```python
output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features)
```
Inputs required include:
The input tensor, packed_weights, groupsize, and the in_features and out_features.

API Usage: https://github.com/pytorch/pytorch/issues/143289

Model Perf:
7B Transformer model:
Prefill: 340 t/s
Decode: 40 t/s
2B Transformer model:
Prefill: 747 t/s
Decode: 80 t/s

Tests:
python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight
Ran 1 test in 0.016s

OK

python test/test_linalg.py -k test__dyn_quant_matmul_4bit
Ran 8 tests in 0.077s

OK

python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit
Ran 8 tests in 11.454s

Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124
Approved by: https://github.com/digantdesai, https://github.com/malfet
2024-12-19 18:51:26 +00:00
William Wen
e1e83015d2 [dynamo, 3.13t] raise error if torch.compile is attempted in 3.13t (nogil) (#143404)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143404
Approved by: https://github.com/colesbury, https://github.com/atalman
2024-12-19 18:10:01 +00:00
bobrenjc93
171e6a934f Don't 1 specialize if stride is contiguous (#143365)
Fixes: https://github.com/pytorch/pytorch/issues/142024

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143365
Approved by: https://github.com/ezyang
2024-12-19 15:22:47 +00:00
Animesh Jain
465f282a24 [reland][dynamo][guards] Consider tensors as immutable for dict tag matches (#141085)
Reland - https://github.com/pytorch/pytorch/pull/139560

As mentioned in https://github.com/pytorch/pytorch/pull/130341, using `static py::object` can lead to segfaults. I suspect this is the reason for the import system error seen internally (https://www.internalfb.com/sevmanager/view/469592). In this PR, I am removing the `static` part. This is fine and also the right thing to do, because it will catch the case where the user changes the flag in the same process while compiling two different functions.

Unfortunately, there is no easy way to trigger this segfault, so I can't write a test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141085
Approved by: https://github.com/jansel

Co-authored-by: William Wen <williamwen@meta.com>
2024-12-19 15:16:10 +00:00
Aaron Orenstein
da06d47bdb dynamo tracing perf: slight improvement on __instancecheck__: 47.77 -> 47.62 (#143064)
See #143056 for overall docs.

This PR: Switch out an `isinstance()` for an `is` in the very hot
`VariableTrackerMeta.__instancecheck__`.
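
The general shape of the micro-optimization (the class and check are illustrative, not the actual Dynamo code):

```python
class Node:
    pass

obj = Node()

# isinstance() dispatches through the metaclass's __instancecheck__ and may
# walk the MRO; on a hot path that overhead adds up.
isinstance(obj, Node)

# When only one exact type can match, an identity test on the type object
# is a single pointer comparison.
type(obj) is Node
```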

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143064
Approved by: https://github.com/ezyang, https://github.com/jansel
2024-12-19 09:19:35 +00:00
Yanbo Liang
2ffdcab04c [Dynamo] Add DictKeySetVariable to capture dict_keys passed outside of compiled region (#143374)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143374
Approved by: https://github.com/williamwen42, https://github.com/jansel
2024-12-19 06:39:27 +00:00
PyTorch MergeBot
14fe1f7190 Revert "[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)"
This reverts commit d3ff2d42c2.

Reverted https://github.com/pytorch/pytorch/pull/134124 on behalf of https://github.com/malfet due to This broke S390 builds, includes cpuinfo unconditionally ([comment](https://github.com/pytorch/pytorch/pull/134124#issuecomment-2552560208))
2024-12-19 01:05:11 +00:00
Michael Lazos
5c3996cab2 [Dynamo] topologically sort duplicated graph regions (#143523)
Ensure regions are topologically sorted

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143523
Approved by: https://github.com/williamwen42
2024-12-19 00:43:48 +00:00
Michael Lazos
4eafbe5288 [Dynamo] Flatten slices during graph deduplication (#143522)
I encountered this issue while debugging torchtune - overall, we need to make sure not to miss nodes that appear as slice arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143522
Approved by: https://github.com/williamwen42
2024-12-18 23:12:34 +00:00
Ryan Guo
5380407af5 [dynamo] Properly model root frame globals during inlining (#143447)
This patch updates `InliningInstructionTranslator.STORE_GLOBAL` to
properly check whether `self.f_globals` is the same as the root frame's
`f_globals`. See the added comments for why this is important.
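
Why the check matters: an inlined function's `STORE_GLOBAL` targets its defining module's globals, which may differ from the root frame's:

```python
# helper.py
counter = 0

def bump():
    global counter      # STORE_GLOBAL resolves against helper.py's globals
    counter += 1

# main.py
import helper

counter = 100           # an unrelated global that shares the name
helper.bump()
print(helper.counter)   # 1
print(counter)          # 100 -- the inliner must keep these namespaces apart
```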

Fixes #143425.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143447
Approved by: https://github.com/zou3519
2024-12-18 23:04:02 +00:00
Nikhil Gupta
d3ff2d42c2 [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)
Description:
1. Quantize Linear Layer Weights to 4-bits:
Quantize the weights of the Linear layer to 4 bits, using symmetric quantization.
Pack two 4-bit weights into one uint8 container.
Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32.

2. Prepare Quantized Weights, Scales, and Optional Bias:
After quantizing, obtain the quantized_weights, scales, and groupsize.
If the original Linear layer has a bias, prepare it as well.

3. Pack the Weights Efficiently:
Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias.
```python
packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features)
```
Input parameters should include:
in_features and out_features (the same as the Linear layer’s corresponding parameters).

4. Perform Dynamic Quantized Matrix Multiplication:
Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights.
```python
output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features)
```
Inputs required include:
The input tensor, packed_weights, groupsize, and the in_features and out_features.

API Usage: https://github.com/pytorch/pytorch/issues/143289

Model Perf:
7B Transformer model:
Prefill: 340 t/s
Decode: 40 t/s
2B Transformer model:
Prefill: 747 t/s
Decode: 80 t/s

Tests:
python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight
Ran 1 test in 0.016s

OK

python test/test_linalg.py -k test__dyn_quant_matmul_4bit
Ran 8 tests in 0.077s

OK

python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit
Ran 8 tests in 11.454s

Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124
Approved by: https://github.com/digantdesai, https://github.com/malfet
2024-12-18 22:30:07 +00:00
Yidi Wu
1e201422ed [export] add is_exporting flag (#142425)
We added an `is_exporting` flag under `torch.compiler.is_exporting`. This comes in handy when we want to apply special logic at the user level and the system level (e.g., higher up in the stack). A usage sketch follows the list below.

In increasing scope:
- `_is_fx_tracing` is set to True when we use under symbolic_trace or make_fx.
- `is_exporting` is set to True when we're doing strict or non-strict export, which internally has a step that calls make_fx and set _is_fx_tracing to be True.
- `is_compiling` is set to True when we're either doing strict, non-strict export or torch.compile.
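
A hedged usage sketch (the branch body is illustrative):

```python
import torch

def preprocess(x: torch.Tensor) -> torch.Tensor:
    if torch.compiler.is_exporting():
        # Export-only behavior, e.g. avoid a data-dependent fast path that
        # strict/non-strict export cannot trace (illustrative).
        return x.contiguous()
    return x
```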

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142425
Approved by: https://github.com/avikchaudhuri
2024-12-18 21:36:28 +00:00
qiurc
90cc43f270 Support garbage collection after pt2 compilation (#143364)
Summary:
Support garbage collection after pt2 compilation.
Add a jk to control the global rollout/rollback of this functionality.
Add an env var to control an individual job's rollout.

Test Plan:
Test the model training job with/without these changes.

Reviewers:
@yuxihu, @ezyang, @Yuzhen11

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143364
Approved by: https://github.com/ezyang
2024-12-18 07:25:11 +00:00
Michael Lazos
a96387a481 [Dynamo] only import einops if version is lower than 0.7.0 (#142847)
Fixes internal xref (https://fb.workplace.com/groups/257735836456307/posts/804793021750583/?comment_id=805229281706957&reply_comment_id=805232695039949)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142847
Approved by: https://github.com/zou3519
2024-12-17 20:50:25 +00:00
William Wen
18261e9f39 [dynamo] implement framelocals mapping as c++ object (#140063)
Implements https://github.com/pytorch/pytorch/issues/93753 - move frame local guard accessors to C++.

Before, we used dict accessors on a Python dict representing the frame's fastlocals that we manually build. We move this accessor to C++ and additionally use the fastlocal index whenever possible.

Some implementation notes:
- `FrameLocalsMapping` is now initialized as a C++ vector of `PyObject`s. We do not just use the frame's localsplus/fastlocals buffer because we also unbox cells.
- `FrameLocalsMapping` can still be converted into a Python dict representing the frame's fastlocals, but it is done lazily (sketched after this list).
- We update `LeafGuard`, `GuardAccessor`, and `GuardManager`'s `check_nopybind` methods to accept `FrameLocalsMapping`. By default, we convert the `FrameLocalsMapping` to a Python dict and run the original `check_nopybind` on it, but in some cases, conversion is not needed.
- We add a new guard accessor `FrameLocalsGuardAccessor`, which is similar to `DictGetItemGuardAccessor` but has special handling for `FrameLocalsMapping`. We create a separate class to emphasize different use cases, but we could probably combine these two (can do in a follow up)
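
A hedged Python sketch of the lazy-conversion idea (the real implementation is C++ and works on `PyObject*`s directly):

```python
class FrameLocalsMapping:
    def __init__(self, names, values):
        self._names = names    # fastlocal names, in slot order
        self._values = values  # unboxed fastlocal/cell values
        self._dict = None      # materialized only on demand

    def to_dict(self):
        # Most guards can index by slot; build the dict only when a guard
        # genuinely needs dict semantics.
        if self._dict is None:
            self._dict = dict(zip(self._names, self._values))
        return self._dict
```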

dynamo_guard_eval.py microbenchmark update:
- 713.2us -> 630.0us (3.10)
- 598.8us -> 530.7us (3.12)

Other followups:
- Add `FrameLocalsMapping` version for `check_verbose_nopybind` in order to match behavior between `check_nopybind` and `check_verbose_nopybind`. This can prevent difficult debugging situations where guards fail (`check_nopybind` returns false) but no guard error message is generated (`check_verbose_nopybind` succeeds).
- Rewrite the `SHAPE_ENV` guard into C++ - it is a fairly common guard that results in `FrameLocalsMapping` needing to convert to a dict

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140063
Approved by: https://github.com/jansel
ghstack dependencies: #142117, #142430
2024-12-17 18:54:27 +00:00
PyTorch MergeBot
e3d754419f Revert "[reland][dynamo][guards] Consider tensors as immutable for dict tag matches (#141085)"
This reverts commit 1bf983077f.

Reverted https://github.com/pytorch/pytorch/pull/141085 on behalf of https://github.com/huydhn due to The diff D66211131 has been commandeered internally and is it not part of the train anymore.  If codev is needed, pls reland this accordingly ([comment](https://github.com/pytorch/pytorch/pull/141085#issuecomment-2549092225))
2024-12-17 17:21:14 +00:00
Guilherme Leobas
487343346e Prevent users from seeing hardcoded print stmt when hypothesis is not installed (#142398)
Fixes: #142357

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142398
Approved by: https://github.com/zou3519
2024-12-17 16:59:05 +00:00
PyTorch MergeBot
969b07b96f Revert "[ROCm] CK Flash Attention Backend (#138947)"
This reverts commit 500d02921b.

Reverted https://github.com/pytorch/pytorch/pull/138947 on behalf of https://github.com/atalman due to Breaks default windows checkout ([comment](https://github.com/pytorch/pytorch/pull/138947#issuecomment-2548998359))
2024-12-17 16:46:57 +00:00
drisspg
5160a725c8 [FlexAttention] Fix broken eager tracing (#143344)
Fixes #143331

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143344
Approved by: https://github.com/Chillee
ghstack dependencies: #143299
2024-12-17 09:42:36 +00:00
Andy Lugo
500d02921b [ROCm] CK Flash Attention Backend (#138947)
Replaces https://github.com/ROCm/pytorch/pull/1592

This PR contains the initial implementation of SDPA with the composable_kernel backend. The CK path can be forced by simply calling `torch.backends.cuda.preferred_rocm_fa_library("ck")`. Similarly, you can force the incumbent aotriton implementation by passing in "aotriton" or "default". As you'd expect, not setting this option will result in aotriton being used as the backend. In the case of CK, if PyTorch deems flash attention usable, then it will use the CK path in all the same places aotriton would have been used. This PR makes no changes to the heuristics which select which attention scheme to use (i.e., flash attention vs. memory-efficient attention vs. math, etc.). It only gets called when flash attention is both enabled (via `USE_FLASH_ATTENTION`) and is selected at runtime by the existing heuristics.

Files located in pytorch/aten/src/ATen/native/transformers/hip/flash_attn/ck/mha* have been pulled from https://github.com/Dao-AILab/flash-attention, courtesy of @tridao's hard work; he is a co-author.

NOTE: In order to use this backend, the user MUST set USE_CK_FLASH_ATTENTION=1 in their environment when they build PyTorch.
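
With such a build, a hedged usage sketch (the selector name comes from the description above; shapes are illustrative):

```python
import torch
import torch.nn.functional as F

torch.backends.cuda.preferred_rocm_fa_library("ck")  # force composable_kernel

q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.half)
out = F.scaled_dot_product_attention(q, k, v)

torch.backends.cuda.preferred_rocm_fa_library("default")  # back to aotriton
```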

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138947
Approved by: https://github.com/pruthvistony, https://github.com/xw285cornell, https://github.com/leitian

Co-authored-by: Xiaodong Wang <xw285@cornell.edu>
2024-12-17 02:18:07 +00:00
William Wen
1b6b86fad7 [dynamo] disable eval frame callback around most of _TorchDynamoContext wrapper function (#143211)
Internal xref: https://fb.workplace.com/groups/1075192433118967/permalink/1559636954674510/

If the `_fn` returned by `_TorchDynamoContext.__call__` makes an external function call, Dynamo is recursively invoked. This can cause issues if the added calls are not skipped by Dynamo, so we should disable the eval frame callback as much as possible.
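
A hedged sketch of the idea; `set_eval_frame` stands in for the C-extension helper that installs or removes Dynamo's frame-evaluation callback, and the bookkeeping call is hypothetical:

```python
def _fn(*args, **kwargs):
    prior = set_eval_frame(None)   # disable: the bookkeeping below must
    try:                           # not trigger recursive Dynamo entry
        do_wrapper_bookkeeping()   # hypothetical setup/teardown work
        set_eval_frame(callback)   # re-arm only around the real call
        return fn(*args, **kwargs)
    finally:
        set_eval_frame(prior)
```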

Differential Revision: [D67211749](https://our.internmc.facebook.com/intern/diff/D67211749)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143211
Approved by: https://github.com/jansel
2024-12-16 18:38:58 +00:00
Animesh Jain
1bf983077f [reland][dynamo][guards] Consider tensors as immutable for dict tag matches (#141085)
Reland - https://github.com/pytorch/pytorch/pull/139560

As mentioned in https://github.com/pytorch/pytorch/pull/130341, using `static py::object` can lead to segfaults. I suspect this is the reason for the import system error seen internally (https://www.internalfb.com/sevmanager/view/469592). In this PR, I am removing the `static` part. This is fine and also the right thing to do, because it will catch the case where the user changes the flag in the same process while compiling two different functions.

Unfortunately, there is no easy way to trigger this segfault, so I can't write a test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141085
Approved by: https://github.com/jansel

Co-authored-by: William Wen <williamwen@meta.com>
2024-12-16 18:38:32 +00:00