This PR was inspired by internal models that were cache missing due to PGO. At a high level, the problem looks as follows:
Run 1, Invocation 1: We do static compile, save some example values in PGO/automatic dynamic
Run 1, Invocation 2: We detect varying inputs, do a dynamic compile, get a dynamic graph, and save it to PGO. Crucially, what we save to PGO is a superset of what is actually dynamic: if we notice an input was varying, we mark it as dynamic in PGO even if that value later gets specialized. When a value gets specialized, we actually remove the symbol from the graph. This results in an interesting conundrum where, although we produce the same isomorphic graph, PGO makes the second run cache miss. Let's see how....
Run 2, Invocation 1: We fetch the PGO, over-mark things as dynamic, get an fx graph, look it up in the cache and... whoops! cache miss! This is because of the aforementioned behavior where the PGO profile causes us to over-allocate symbols. In practice this means we end up saving a graph in the cache with symbols x:s1, y:s3, and on the second attempt we cache miss with x:s1, y:s6, where symbols s3, s4, s5 were all optimistically marked dynamic by PGO and subsequently specialized.
We solve this problem by hashing the source names, which gives a reasonably stable symbol assignment, and using linear probing to prevent catastrophic symbol collisions.
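A minimal sketch of the idea (the helper name and table size here are illustrative, not the actual ShapeEnv code): hash the source name to pick a stable starting index, then linearly probe until the slot is free.
```python
import hashlib
from typing import Dict


def assign_symbol_index(source_name: str, taken: Dict[int, str], table_size: int = 2**12) -> int:
    # Deterministic hash of the source name: the same input source gets the
    # same starting slot on every run, independent of how many other sources
    # were optimistically marked dynamic and later specialized.
    idx = int(hashlib.sha256(source_name.encode()).hexdigest(), 16) % table_size
    # Linear probing: walk forward until we find a free slot (or our own name).
    while idx in taken and taken[idx] != source_name:
        idx = (idx + 1) % table_size
    taken[idx] = source_name
    return idx


taken: Dict[int, str] = {}
print(assign_symbol_index("L['x'].size()[0]", taken))
print(assign_symbol_index("L['y'].size()[0]", taken))
```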
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149665
Approved by: https://github.com/Mingming-Ding, https://github.com/laithsakka
When AOTDispatch prepares the AOT backward graph, it does not know the real tangents the user will supply when they run backward.
So AOTD guesses the tangents. Previously, we guessed that the memory format of the tangents would match the memory format of the corresponding outputs, and if the tangents specified at runtime did not have the memory format we guessed during compilation, AOTD coerced (copied) them to the guessed memory_format.
But as Horace found, there are popular use cases where the outputs of the compiled region are in a specific memory format, e.g. a 4D tensor with dims 1 and 2 transposed:
https://github.com/karpathy/nanoGPT/blob/master/model.py#L57
This PR changes the logic so that AOTD expects tangents with the same "strideness" as the outputs. As a result it avoids coercion in the transposed-dims case (see the illustrative repro below).
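For illustration, here is a small standalone repro of the transposed-output pattern (a sketch of the scenario, not the PR's actual test):
```python
import torch


@torch.compile(backend="aot_eager")
def f(x):
    # 4D output with dims 1 and 2 transposed (the attention-style layout from
    # the nanoGPT link above): the output is non-contiguous, so tangents
    # supplied at backward typically share that strideness.
    return (x + 1).transpose(1, 2)


x = torch.randn(2, 3, 4, 5, requires_grad=True)
out = f(x)
# A tangent that matches the output's strides; with this change AOTD should
# not need to coerce (copy) it to a guessed contiguous memory format.
out.backward(torch.randn_like(out))
```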
Limitations:
We keep guessing memory_format for:
1/ Dynamic shapes (needs more changes)
2/ Tensor subclasses (needs more changes)
Other changes:
test_torchinductor was always creating contiguous tangents via `torch.randn()`; they are changed to `torch.randn_like()` so that the computation is compared with the same strideness
(e.g. for cuda float16, strideness affects numerics for fft ops).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144579
Approved by: https://github.com/bdhirsh
The bug was reported by an internal user.
AOTD classifies outputs that are aliases of graph intermediates into different categories:
...
- output is an alias of an intermediate whose base is already an output
- output is an alias of an intermediate whose base is not an output
If we look at the fn:
```
def fn(x):
    ix = x + 1
    a = ix.transpose(0, 1)
    return a.detach(), a
```
Output 0: a detach view of alias `a`, where `a` is already an output.
Output 1: an alias of intermediate `ix`; an additional output `ix` is then added internally.
Output 0's base is TensorAlias(a) in this case, but it could also be a plain Tensor.
Adding runtime unwrapping solves this problem.
Alternatively we could track the base of a.detach() all the way to ix; in that case the base would always be a Tensor, not a TensorAlias.
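Roughly, the runtime unwrapping amounts to the following sketch, where `TensorAlias` is a stand-in dataclass and `maybe_unwrap_base` is a hypothetical helper, not the actual AOTAutograd code:
```python
from dataclasses import dataclass

import torch


@dataclass
class TensorAlias:
    # Stand-in for AOTAutograd's internal alias wrapper; illustrative only.
    alias: torch.Tensor


def maybe_unwrap_base(base):
    # Runtime unwrapping: the recorded base may arrive either as a plain
    # Tensor or wrapped in TensorAlias, so unwrap defensively before
    # regenerating the aliased output.
    return base.alias if isinstance(base, TensorAlias) else base
```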
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147638
Approved by: https://github.com/bdhirsh
Fix: #141974
This PR makes the `ViewMeta` sequence, present in functional tensors,
serializable with pickle. In order to accomplish that, it makes
`ViewMeta` an abstract class with overridable `forward` and `reverse`
functions. In this context, each operation that used to instantiate
`ViewMeta` should now create a new specialized class that inherits from
`ViewMeta`. Therefore, this PR also uses codegen for creating these
specializations.
In summary, these are the changes this PR introduces:
- `ViewMeta` is turned into an abstract class (see
_FunctionalStorageImpl.cpp_). `forward` and `reverse` are pure virtual
functions that need to be implemented. `to_out_index` should be
implemented by operations that might return more than 1 output.
- New `ViewMeta` specializations for `resize_` and `_unsafe_view` are
created (see _FunctionalizeFallbackKernel.h_).
- New templates _ViewMetaClasses.{cpp,h}_ are created. They hold the
declaration and definition of the `ViewMeta` specializations, which
are automatically generated in the ATen codegen (see _gen.py_).
- New `_functionalization` Python sub-module is created (see
_Module.cpp_). It serves as namespace for the `ViewMeta`
specializations and `InverseReturnMode` enum.
- New template _ViewMetaClassesPythonBinding.cpp_ is created. It holds
the automatically generated Python bindings for the `ViewMeta`
specializations, which are generated in the torch codegen (see
_generate_code.py_).
Note that this PR makes use of codegen at 2 different moments:
- ATen codegen (_gen.py_): generates the `ViewMeta` specialized classes.
- Torch codegen (_generate_code.py_): generates the Python bindings for
them.
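For intuition, here is a pure-Python sketch of the pattern; the real specializations are C++ classes produced by the codegen, so the names below are illustrative only:
```python
from abc import ABC, abstractmethod

import torch


class ViewMetaSketch(ABC):
    # Pure-Python illustration of the abstract forward/reverse pattern only;
    # the real ViewMeta specializations are generated C++ classes.
    @abstractmethod
    def forward(self, base: torch.Tensor) -> torch.Tensor:
        ...

    @abstractmethod
    def reverse(self, base: torch.Tensor, mutated_view: torch.Tensor) -> torch.Tensor:
        ...


class TransposeViewMeta(ViewMetaSketch):
    def __init__(self, dim0: int, dim1: int) -> None:
        # Only plain, picklable state is stored, which is what makes the
        # ViewMeta sequence serializable (the previous design captured this
        # state opaquely in callables).
        self.dim0, self.dim1 = dim0, dim1

    def forward(self, base: torch.Tensor) -> torch.Tensor:
        return base.transpose(self.dim0, self.dim1)

    def reverse(self, base: torch.Tensor, mutated_view: torch.Tensor) -> torch.Tensor:
        return mutated_view.transpose(self.dim0, self.dim1)
```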
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143712
Approved by: https://github.com/bdhirsh
@tugsbayasgalan found a bug for nested subclasses:
E.g. we have
TwoTensor(TwoTensor(t1, t2), t0).
After right_inverse we have:
rebuilt_stack == [(TwoTensor, meta, ["a", "b"]), (TwoTensor, meta, ["a", "b"])]
plain_tensors == [t0, t1, t2]
We first put the plain tensors, and only then the nested TwoTensor.
But when we unflatten:
todo == [t0, t1, t2]
we first create TwoTensor(t1, t2),
put it into todo: [t0, TwoTensor(t1, t2)],
and as a result get
TwoTensor(t0, TwoTensor(t1, t2)),
which swaps the original a and b :)
So the fix needs to be different: we need to preserve the order of elements in the stack for plain tensors and subclasses.
I will think about the fix.
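For illustration, a self-contained sketch of the swap described above; `TwoTensor` here is just a namedtuple stand-in and the rebuild loop is a simplification of the real unflatten code:
```python
from collections import namedtuple

# "TwoTensor" is a namedtuple stand-in for the wrapper subclass and the
# tensors are strings, purely to make the ordering visible.
TwoTensor = namedtuple("TwoTensor", ["a", "b"])

t0, t1, t2 = "t0", "t1", "t2"
# Original value: TwoTensor(TwoTensor(t1, t2), t0)
rebuilt_stack = ["inner", "outer"]   # the inner subclass was pushed first
todo = [t0, t1, t2]                  # plain tensors, in flattening order

# Naive unflattening: rebuild each subclass from the *last* two entries and
# append it back, as described above.
for _ in rebuilt_stack:
    b, a = todo.pop(), todo.pop()
    todo.append(TwoTensor(a, b))

print(todo[-1])
# TwoTensor(a='t0', b=TwoTensor(a='t1', b='t2')) -- the nested TwoTensor ends
# up in attribute "b" instead of "a": the original a and b are swapped.
```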
Fix:
Keep the order of inner_tensor_attr_names according to when they were added to the stack (first plain-tensor attributes, then subclass attributes).
Test:
```
python test/functorch/test_aotdispatch.py -k test_subclass_parameters
```
Differential Revision: [D67032477](https://our.internmc.facebook.com/intern/diff/D67032477)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142481
Approved by: https://github.com/tugsbayasgalan, https://github.com/bdhirsh
This implements a new class, AOTDispatchCompiler, which is just a wrapper around a callable that returns an OutputCode. We can then use it in AOTDispatch to decide whether or not to use the cache: if fw_compiler, bw_compiler and inference_compiler are all AOTDispatchCompilers, then we enable caching.
This type is pretty close to _CompiledFxGraphCallable, except it's not allowed to take any kwargs. I'm not sure how to consolidate the two ideas just yet: unfortunately, there's no way to properly annotate the types to make them related. But a lot of the time, the input to this function will be a partially applied _CompiledFxGraphCallable.
This allows the PR above this one to enable AOTAutogradCache everywhere without increasing instruction count or enabling the cache on unit tests that use aot_eager or other non-inductor compilers.
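A rough sketch of the idea (names and signatures below are illustrative, not the actual class):
```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

import torch


@dataclass
class AOTDispatchCompilerSketch:
    # Illustrative only: a plain callable from (gm, example_inputs) to an
    # OutputCode-like object, with no kwargs allowed, so its presence can be
    # used as a cacheability signal.
    compiler_fn: Callable[[torch.fx.GraphModule, Sequence[Any]], Any]

    def __call__(self, gm: torch.fx.GraphModule, example_inputs: Sequence[Any]) -> Any:
        return self.compiler_fn(gm, example_inputs)


def should_use_cache(fw_compiler, bw_compiler, inference_compiler) -> bool:
    # Caching is enabled only when all three compilers are wrapped.
    return all(
        isinstance(c, AOTDispatchCompilerSketch)
        for c in (fw_compiler, bw_compiler, inference_compiler)
    )
```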
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142205
Approved by: https://github.com/oulgen, https://github.com/bdhirsh
Allow mutations for subclasses that are non-contiguous.
Changes:
Remove the assert in collect_metadata_analysis.
Main requested testcase:
Compilation of NJT.index_put()
A test is added in test_nestedtensor.py that compiles NJT.index_put().
It decomposes to NJT split/unbind, which needed additional `torch._check`/`torch._check_is_size` calls for NJT.unbind() and guard_size_oblivious() usage in _meta_registrations and _inductor/lowering.py (roughly as sketched below).
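An illustrative, heavily simplified sketch of the kind of check this refers to (not the actual meta registration):
```python
import torch
from torch.fx.experimental.symbolic_shapes import guard_size_oblivious


def split_lengths_from_offsets(offsets: torch.Tensor) -> list:
    # Illustrative only: split lengths derived from offsets are asserted to be
    # valid sizes, and the "is this split empty?" comparison is made
    # size-oblivious so a [0, 0, 0] split does not force specialization.
    lengths = []
    prev = 0
    for off in offsets.tolist():
        length = off - prev
        torch._check_is_size(length)
        if guard_size_oblivious(length == 0):
            pass  # an empty split is still allowed
        lengths.append(length)
        prev = off
    return lengths


print(split_lengths_from_offsets(torch.tensor([0, 0, 0])))  # [0, 0, 0]
```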
Special case:
If a tangent is mutated outside of the graph, it does not participate in the backward graph; autograd in this case sets that tangent to a zeros tensor.
We handle it separately in CompiledFunction.backward: we skip processing for this tangent and broadcast it to the number of expected unwrapped subclass arguments.
Disabling 2 tests for dynamo:
1/ For nested tensor: a symbolic-shapes issue on a nested_tensor index operation that does splits [0, 0, 0]; it fails with "pending unbacked symints". This PR does not add more .tolist()/.item() ops than there were before.
2/ Since we no longer fail with an exception in collect_metadata_analysis, new paths for dynamo started working, and one started failing with something strange: set_ handling in storage_offset (because of the views test) updates the storage "cpu" -> "meta".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139630
Approved by: https://github.com/bdhirsh
A parametrization cannot be registered for parameters that are not direct children of the module.
We have to iterate through all submodules and register the parametrization at every level (see the sketch below).
The original testcase did not cover nested modules; a submodule is added to the test.
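A minimal sketch of the approach, using a trivial `Identity` parametrization and a hypothetical `register_everywhere` helper (not the actual test code):
```python
import torch.nn as nn
from torch.nn.utils import parametrize


class Identity(nn.Module):
    # Trivial parametrization, used only to demonstrate per-submodule registration.
    def forward(self, x):
        return x


def register_everywhere(model: nn.Module) -> None:
    # register_parametrization only sees a module's *direct* parameters, so we
    # walk every submodule and register at the level that owns each parameter.
    for module in list(model.modules()):
        for name, _ in list(module.named_parameters(recurse=False)):
            parametrize.register_parametrization(module, name, Identity())


model = nn.Sequential(nn.Linear(4, 4), nn.Sequential(nn.Linear(4, 2)))
register_everywhere(model)
print(parametrize.is_parametrized(model[1][0], "weight"))  # True
```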
Testing:
```
python test/functorch/test_aotdispatch.py -k test_subclass_parameters
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142155
Approved by: https://github.com/bdhirsh
This PR fixes the issue where AOTAutograd would produce a guard that used a symbolic value
coming from the base of one of the inputs.
```python
@torch.compile(backend="aot_eager", dynamic=True)
def f(a, b):
    a.add_(1)
    b.add_(1)
    return a


x = torch.ones(10)
f(x[1:], x[1:])
```
In the example above, AOTAutograd functionalizes the mutation by making use of the
`as_strided_scatter` operation (illustrated below), which produces the guard `s0 >= s1 + 1`, where:
- `s0`: corresponds to `x.size()[0]`
- `s1`: corresponds to `a.size()[0]`
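Roughly, the functionalized mutation looks like the following sketch (an illustration of where such a guard can come from, not the exact AOTAutograd output):
```python
import torch

# The in-place update of the view gets rewritten as a scatter of the updated
# slice back into the base; as_strided_scatter's meta checks that the view
# (size s1, storage_offset 1) fits inside the base (size s0), which is where a
# guard like s0 >= s1 + 1 can appear.
x = torch.ones(10)          # base, size s0
a = x[1:]                   # view, size s1, storage_offset 1
updated = a + 1             # functionalized "a.add_(1)"
new_x = torch.as_strided_scatter(
    x, updated, size=a.size(), stride=a.stride(), storage_offset=a.storage_offset()
)
```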
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139554
Approved by: https://github.com/bdhirsh
FXGraphCache supports freezing, but AOTAutogradCache does not. This is due to the fact that when freezing is turned on, instead of using the constants from the graph module that was saved on cache miss, we have to take the constants from the AOTAutograd generated graph module. This PR does two things:
- It bypasses AOTAutogradCache when freezing is turned on. We should have always been doing this.
- It refactors the code to be way more clear about the constants we're using and when we're using them.
Basically, there are two possible sets of constants we can grab from the compiled fx graph.
1. If freezing is turned off, we save the constants directly in CompiledFxGraph.
2. If freezing is turned on, we save the *names* of the constants in CompiledFxGraph, and use the runtime GraphModule's actual constant values: we reconstruct them from the saved names + the new graph module from AOTDispatch.
We implement two different classes for doing just this (sketched below): one that has access to the post-AOTDispatch gm, which supports freezing, and one that doesn't, which does not support freezing. Then we construct the wrappers and unwrap the result as needed.
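A hedged sketch of the two lookup strategies; the class and method names below are illustrative, not the actual cache code:
```python
from typing import Dict, List

import torch


class ConstantsFromGraph:
    # Freezing off: the constant tensors were saved directly in the compiled graph.
    def __init__(self, saved_constants: Dict[str, torch.Tensor]):
        self.saved_constants = saved_constants

    def get(self, name: str) -> torch.Tensor:
        return self.saved_constants[name]


class ConstantsFromRuntimeGm:
    # Freezing on: only the constant *names* were saved; the values are looked
    # up on the GraphModule produced by AOTDispatch at runtime.
    def __init__(self, saved_names: List[str], runtime_gm: torch.fx.GraphModule):
        self.saved_names = saved_names
        self.runtime_gm = runtime_gm

    def get(self, name: str) -> torch.Tensor:
        assert name in self.saved_names
        return getattr(self.runtime_gm, name)
```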
This makes it clear that the gm passed to AOTAutogradCache is *not* part of post compile, only the cache key generated from it is.
The whole flow is pretty confusing, but hopefully this gives us better types and static information for understanding what the different codepaths are doing.
Will add a specific AOTAutogradCache test to confirm we bypass freezing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141897
Approved by: https://github.com/ezyang, https://github.com/masnesral
I am going to break apart the arguments passed to the constituents
to only pass exactly what is needed, so easy access to the insides
is helpful here.
This also moves two helper functions to output_code.py.
Also set _boxed_call in the constructor.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141877
Approved by: https://github.com/jamesjwu, https://github.com/jansel
Co-authored-by: James Wu <jjwu@meta.com>
Based on discussion here: https://github.com/pytorch/pytorch/pull/138731
Introduces the ability for a subclass to implement type conversion to an expected_type:
```
def __coerce_same_metadata_as_tangent__(
    self, expected_metadata: Any, expected_type: Optional[Type] = None
):
```
Here `expected_type=None` means the subclass's own class is expected.
E.g. for `DTensor` we may find an `AsyncCollectiveTensor` tangent where we expected a `Tensor`; in this case
the method is called at runtime with `expected_type=Tensor`.
An implementation is added to AsyncCollectiveTensor that just triggers `wait()` (sketched below).
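Sketch of the idea (a stand-in class, not the in-tree AsyncCollectiveTensor implementation):
```python
from typing import Any, Optional, Type


class AsyncCollectiveTensorSketch:
    # Stand-in for AsyncCollectiveTensor (which is really a Tensor subclass);
    # only the coercion hook is illustrated here.
    def __init__(self, elem, work):
        self.elem = elem  # underlying tensor
        self.work = work  # pending collective handle

    def wait(self):
        self.work.wait()
        return self.elem

    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
        if expected_type is not type(self):
            # A plain Tensor (or another type) was expected: resolve the
            # pending collective and hand back the wrapped tensor.
            return self.wait()
        return self
```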
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139095
Approved by: https://github.com/bdhirsh
Reason:
Currently we have multiple runtime traversals over the tangents:
- To check that types and structure are identical to what we guessed during tracing time
- Coerce metadata
- Coerce memory_format
- Unwrap_tensor_subclass
All of them traverse the tangents' tree of subclasses via __tensor_flatten__ calls.
Change:
Do everything (including flattening) in one runtime traversal.
Implementation details:
1. Add memory_format information to SubclassCreationMeta; for PlainTensors keep not only the (int) unwrapped_index but the memory_format too (see the sketch below). Preparing the memory_format is optional (controlled by with_memory_format=True).
2. Remove the unused subclass_utils.create_metadata_for_subclass, which has no usages inside torch and would require updating the logic.
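Illustrative sketch of the extra per-plain-tensor bookkeeping (names are hypothetical, not the actual dataclass):
```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class PlainTensorMetaSketch:
    # Alongside the flat position of the unwrapped tensor we now also remember
    # the memory format guessed at trace time, so runtime coercion can happen
    # in the same traversal as flattening.
    unwrapped_index: int
    memory_format: Optional[torch.memory_format] = None
```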
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139068
Approved by: https://github.com/bdhirsh
Original Issue: https://github.com/pytorch/pytorch/issues/134644
We assume trace_tangents have the same memory_format as the corresponding inputs, outputs, and intermediates during the first tracing.
=>
Tracing time:
- Store trace_tangents_memory_formats in the metadata
- Coerce tangents to the deduced memory_format
Runtime:
- Coerce tangents to the tracing-time memory format from the metadata
Subclasses logic:
- Previously the tangent-coercion logic did not handle the nested-subclasses case; this is fixed here.
For subclasses we deduce the memory format of the subclass_tensor first, then of each element of the subclass:
[subclass_tensor_memory_format, subclass_tensor_elem0_memory_format, ... ]
If a subclass element (one of the __tensor_flatten__()[0] tensors) is itself a subclass, its place holds a nested list of the same structure (see the sketch below).
The recursive traversal of the subclass tree is expensive, so we do memory format deduction and coercion at the same time, keeping only one traversal (the `coerce_tangent_and_suggest_memory_format` method). With this approach there is no regression compared with the previous logic, which also did one traversal.
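A small sketch of the nested structure, assuming a hypothetical `deduce_memory_formats` helper built on `suggest_memory_format`:
```python
import torch
from torch._prims_common import suggest_memory_format


def deduce_memory_formats(t):
    # For a subclass, record the wrapper's format first, then one entry
    # (possibly a nested list) per inner tensor from __tensor_flatten__;
    # a plain tensor contributes a single memory format.
    fmt = suggest_memory_format(t)
    if hasattr(t, "__tensor_flatten__"):
        attrs, _ = t.__tensor_flatten__()
        return [fmt] + [deduce_memory_formats(getattr(t, a)) for a in attrs]
    return fmt


print(deduce_memory_formats(torch.randn(2, 3, 8, 8).to(memory_format=torch.channels_last)))
# torch.channels_last
```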
Other small change:
Remove a duplicated, unrelated comment.
Testing:
```
python test/functorch/test_aotdispatch.py -k test_channels_last_grads_no_force_contiguous
```
Benchmarking:
After change:
```
└─ $ PYTORCH_AOTD_DEBUG_PROFILE=1 python test/functorch/test_aotdispatch.py -k test_benchmark_grads_no_force_contiguous
Benchmark SUBCLASS avg_bwd_duration:4.059906005859375 ms
Benchmark NO_SUBCLASS avg_bwd_duration:3.1563830375671387 ms
```
Before change:
```
BEFORE_CHANGE SUBCLASS 4.1194
```
No significant change in processing time.
(We do a single traversal of the subclass tree to collect memory_formats and coerce during tracing.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135225
Approved by: https://github.com/bdhirsh
Cleanup:
1/ We do not need to unwrap_subclasses() in the freezing wrapper, as it will be wrapped by the AOTD wrappers, which include SubclassesWrapper.
2/ No need to use weak references for the unwrapped list; dynamo optimizers need to clean the unwrapped list along with the original params_flat.
Verified fbcode compiled_optimizers tests.
Differential Revision: [D63393651](https://our.internmc.facebook.com/intern/diff/D63393651)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136549
Approved by: https://github.com/bdhirsh
Original issue:
https://github.com/pytorch/ao/issues/890
The problem:
TracingContext.flat_params contain the original params, with subclasses not desugared,
while the inductor freezing API works on AOT graphs, where subclasses are already desugared.
flat_params are used only for this logic, and storing desugared subclasses in them fixes the issue (roughly as sketched below).
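Roughly, "desugaring" means recursively replacing subclass params with their inner plain tensors; a hedged sketch with a hypothetical helper, not the actual code:
```python
def desugar(params):
    # Recursively replace tensor subclasses by their inner plain tensors via
    # __tensor_flatten__, which is the form the AOT graph (and now
    # flat_params) works with.
    out = []
    for p in params:
        if hasattr(p, "__tensor_flatten__"):
            attrs, _ = p.__tensor_flatten__()
            out.extend(desugar([getattr(p, a) for a in attrs]))
        else:
            out.append(p)
    return out
```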
Testing:
```
python test/functorch/test_aotdispatch.py -k test_inductor_freezing_with_subclasses
```
Torch AO original failure:
```
python test/integration/test_integration.py -k test_int8_weight_only_quant_with_freeze
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136265
Approved by: https://github.com/bdhirsh
Summary:
After the previous refactor, we can now call load_with_key directly from AOTAutogradCache to use the remote FXGraphCache.
This does *not* implement a remote AOTAutogradCache. It just allows AOTAutogradCache to work with remote FXGraphCache.
Test Plan: (Meta only tests)
Reviewed By: aorenste
Differential Revision: D62384944
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136173
Approved by: https://github.com/oulgen