pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Simon Fan	7e0edafe86	[compiled autograd][dynamo] improve lifted autograd.Function.backward handling and fallback to pseudo-eager (#125661 ) - `FakeContext` hides all fields other than ctx.saved_tensors, so dynamo errors when the autograd.Function.backward uses other attrs on ctx, and it also doesn't allow fallback to eager. - If we remove it, we still can't fall back to eager: node variables are already freed (ctx.saved_tensors throws) - However, we can fall back to "pseudo-eager" by using a duck-typed ctx and routing the ctx.saved_tensors to lifted tensors - Dynamo tries to inline external_utils.call_backward, treats BackwardCFunction as an AutogradFunctionContextVariable (only used up until we create the fake context: FakeBackwardCFunction) - we call_function backward from the forward class AutogradFunctionVariable, and we still pass in the fake context as a UserDefinedObjectVariable (can later use AutogradFunctionContextVariable + HOO graph speculate) Fixes #125489 #124827 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125661 Approved by: https://github.com/jansel	2024-05-08 21:00:37 +00:00
Yu, Guangye	d17be10df1	make torch.amp.autocast more generic (#125103 ) # Motivation As discussed in [#124479](https://github.com/pytorch/pytorch/pull/124479), `torch.amp.autocast` can NOT be completely equivalent to `torch.cuda.amp.autocast` and `torch.cpu.amp.autocast` since `torch.amp.autocast` does NOT have the default `dtype` for CPU (`torch.bfloat16` by default) and CUDA (`torch.float16` by default) respectively. We would like `torch.amp.autocast` to be more generic to help developers and customers write device-agnostic code, because there is not enough reason to add a device-specific autocast `torch.xxx.amp.autocast` for each device backend. # Solution When `None` is passed to `dtype`, we should use `torch.get_autocast_dtype` to get the related dtype for each backend. Meanwhile, `torch.get_autocast_dtype` needs to be supported in the JIT path for BC. # Additional Context With this PR, `torch.amp.autocast(device_type='cuda')` is equivalent to `torch.cuda.amp.autocast`. Add two new UTs to cover this change in the eager and JIT paths respectively. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125103 Approved by: https://github.com/albanD, https://github.com/jgong5, https://github.com/gujinghui	2024-05-08 12:13:26 +00:00
ydwu4	461ffaaaf3	[dynamo] support torchbind object input (#124978 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124978 Approved by: https://github.com/jansel	2024-05-07 03:02:00 +00:00
Edward Z. Yang	b6bcd09173	Get rid of tabular and sizes, beef up verbosity of output graph (#125507 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/125507 Approved by: https://github.com/Chillee, https://github.com/jansel ghstack dependencies: #125505	2024-05-06 13:41:58 +00:00
Aaron Gokaslan	1dd42e42c4	[BE]: Try TCH autofixes on torch/ (#125536 ) Tries TCH autofixes and see what breaks Pull Request resolved: https://github.com/pytorch/pytorch/pull/125536 Approved by: https://github.com/ezyang	2024-05-05 23:13:59 +00:00
Edward Z. Yang	650a248d3e	Rename is_unspecialized to pass_arg_as_tensor, add comment (#125496 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/125496 Approved by: https://github.com/lezcano ghstack dependencies: #125395, #125419, #125483, #125494	2024-05-05 16:57:50 +00:00
Animesh Jain	071ee40793	[dynamo][nn module] Check for duplicate tensors in register_attr_or_module (#125421 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/125421 Approved by: https://github.com/mlazos ghstack dependencies: #125439	2024-05-03 05:08:09 +00:00
Animesh Jain	e68d65dae2	[dynamo][cpp-guards] Differentiate dict guards wrt to guarding on key order (#124779 ) We guard on key order 1) When a key is a non-constant object 2) When we actually need key order - like .values, .items etc For dicts/OrderedDicts that do not require key order guarding, we just rely on the usual `GuardManager + DictGetItemGuardAccessor`. This is faster than going through the `list(d.keys())` based design for OrderedDicts. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124779 Approved by: https://github.com/jansel	2024-04-25 08:20:35 +00:00
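The two guarding strategies in the commit above can be illustrated in plain Python. This is only a sketch and not Dynamo's guard code: `make_key_order_guard` and `make_per_key_guard` are hypothetical helper names, and the per-key check here (a type check on each accessed key) is an assumption chosen for illustration.

```python
def make_key_order_guard(example):
    """Guard that snapshots the full key order, as needed for .values()/.items()."""
    expected = list(example.keys())

    def check(d):
        # Walks every key, so cost grows with the size of the dict.
        return list(d.keys()) == expected

    return check


def make_per_key_guard(example, used_keys):
    """Guard that only looks up the keys the traced code actually read."""
    expected_types = {k: type(example[k]) for k in used_keys}

    def check(d):
        # Ignores ordering entirely; only the accessed keys are checked.
        return all(k in d and type(d[k]) is t for k, t in expected_types.items())

    return check


example = {"a": 1, "b": 2.0}
order_guard = make_key_order_guard(example)
per_key_guard = make_per_key_guard(example, ["a"])
reordered = {"b": 3.0, "a": 5}
```

With these definitions, `order_guard(reordered)` fails because the key order changed, while `per_key_guard(reordered)` still passes, which is the behaviour one wants when the traced code never depended on ordering.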
Animesh Jain	59a1f1f308	[dynamo][inline inbuilt nn modules] Do not inline for export (#124814 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124814 Approved by: https://github.com/jansel	2024-04-25 06:35:31 +00:00
Yu, Guangye	25f321b84f	Refactor autocast C++ APIs to be device-agnostic (#124359 ) # Motivation This PR aims to refactor autocast C++ APIs to be device-agnostic and deprecate the device-specific autocast C++ APIs. In C++ side, - `is_enabled()` -> `is_enabled(device_type)`. - `set_enabled(new_enabled)` -> `set_enabled(device_type, new_enabled)`. - `get_autocast_dtype()` -> `get_autocast_dtype(device_type)` - `set_autocast_dtype(dtype)` -> `set_autocast_dtype(device_type, dtype)` These following C++ APIs are deprecated and should be removed in PyTorch 2.5 - `is_cpu_enabled` - `set_cpu_enabled` - `get_autocast_cpu_dtype` - `set_autocast_cpu_dtype` - `is_xpu_enabled` - `set_xpu_enabled` - `get_autocast_xpu_dtype` - `set_autocast_xpu_dtype` - `is_ipu_enabled` - `set_ipu_enabled` - `get_autocast_ipu_dtype` - `set_autocast_ipu_dtype` - `is_hpu_enabled` - `set_hpu_enabled` - `get_autocast_hpu_dtype` - `set_autocast_hpu_dtype` - `is_xla_enabled` - `set_xla_enabled` - `get_autocast_xla_dtype` - `set_autocast_xla_dtype` - `is_privateuseone_enabled` - `set_privateuseone_enabled` - `get_autocast_privateuseone_dtype` - `set_autocast_privateuseone_dtype` In Python side, provide 4 generic autocast APIs: - `torch.is_autocast_enabled(device_type)` - `torch.set_autocast_enabled(device_type, new_enabled)` - `torch.get_autocast_dtype(device_type)` - `torch.set_autocast_dtype(device_type, dtype)` # Additional Context We will submit another PR to refactor autocast Python APIs based on this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124359 Approved by: https://github.com/jgong5, https://github.com/albanD	2024-04-23 10:38:50 +00:00
Boyuan Feng	aa2da0cdd2	[Export] Add runtime assert to non-strict export (#123681 ) This PR moves insert_deferred_runtime_asserts from dynamo to torch.fx.passes and uses it to add runtime assertion for non-strict export. Differential Revision: D55944267 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123681 Approved by: https://github.com/tugsbayasgalan, https://github.com/angelayi	2024-04-18 16:13:27 +00:00
Edward Z. Yang	bebdbb63ce	Introduce set_example_value and use it throughout Dynamo (#124176 ) I'm going to set up some extra behavior when we set example value, so I need a convenient place to interpose. I cannot easily do it on meta itself because it's a generic dict with no interposition point. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/124176 Approved by: https://github.com/oulgen ghstack dependencies: #124105, #124059	2024-04-17 22:57:11 +00:00
Simon Fan	67bd43b510	[compiled autograd][dynamo] use aliases for stack restore when partial graphs steal inputs (#124127 ) same idea as https://github.com/pytorch/pytorch/pull/123359, but for when we restore stack variables after calling a partial graph: Illustrated by the test case: before: ```python def function(inputs): graph_out_0 = __compiled_fn_2(inputs) getitem_1 = graph_out_0[0] add = inputs[1] <---- error inputs is already cleared del graph_out_0 add_1 = add + getitem_1 add = None getitem_1 = None cpu = add_1.cpu() add_1 = None return (cpu,) ``` after: ```python def function(inputs): inputs_ref_0 = inputs[1] graph_out_1 = __compiled_fn_2(inputs) getitem_1 = graph_out_1[0] add = inputs_ref_0 del graph_out_1 add_1 = add + getitem_1 add = None getitem_1 = None cpu = add_1.cpu() add_1 = None return (cpu,) ``` Co-authored-by: Jason Ansel <jansel@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/124127 Approved by: https://github.com/jansel	2024-04-16 17:01:34 +00:00
William Wen	9309580d69	[dynamo, 3.12] handle possibility of NULL local variables during graph breaks (#124095 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124095 Approved by: https://github.com/jansel	2024-04-16 08:44:43 +00:00
Animesh Jain	bb0c768c5b	[dynamo][refactor] Move LazyGraphModule handling (#124113 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124113 Approved by: https://github.com/jansel ghstack dependencies: #124078	2024-04-16 06:39:45 +00:00
Simon Fan	540b451e91	[compiled autograd][dynamo] Codegen aliases to keep grad mutated tensors alive (#123359 ) The current codegen is problematic if __compiled_fn_0 clears the inputs list, since we need it for assignment afterwards ```python def forward(inputs): __compiled_fn_0 = ... # The actual function needs to be provided graph_out_0 = __compiled_fn_0(inputs) # clears inputs temp_list = [] temp_list.append(graph_out_0[0]) inputs[4].grad = graph_out_0[1] # inputs is empty, index error inputs[7].grad = graph_out_0[2] inputs[8].grad = graph_out_0[3] inputs[9].grad = graph_out_0[3] del graph_out_0 return temp_list ``` With this fix, we use aliases to keep the tensors alive ```python def forward(inputs): __compiled_fn_0 = ... # The actual function needs to be provided inputs_ref_1 = inputs[9] inputs_ref_2 = inputs[4] inputs_ref_3 = inputs[8] inputs_ref_4 = inputs[7] graph_out_0 = __compiled_fn_0(inputs) temp_list = [] temp_list.append(graph_out_0[0]) inputs_ref_2.grad = graph_out_0[1] inputs_ref_4.grad = graph_out_0[2] inputs_ref_3.grad = graph_out_0[3] inputs_ref_1.grad = graph_out_0[3] del graph_out_0 return temp_list ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/123359 Approved by: https://github.com/jansel ghstack dependencies: #123630, #123674, #122353	2024-04-12 10:29:09 +00:00
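The aliasing idea used in this commit (and reused for stack restore in #124127) can be reproduced without PyTorch. In this minimal sketch, `compiled_fn` is a hypothetical stand-in for a compiled graph that steals its inputs by clearing the list; it is not generated Dynamo code.

```python
def compiled_fn(inputs):
    # Stand-in for a compiled graph that consumes and clears its input list.
    outs = [x * 2 for x in inputs]
    inputs.clear()
    return outs


def without_aliases(inputs):
    graph_out_0 = compiled_fn(inputs)
    # inputs is empty by now, so this indexing raises IndexError.
    return inputs[1] + graph_out_0[0]


def with_aliases(inputs):
    # Take the alias before the call, while the list is still populated.
    inputs_ref_0 = inputs[1]
    graph_out_0 = compiled_fn(inputs)
    return inputs_ref_0 + graph_out_0[0]
```

Calling `with_aliases([1, 2, 3])` succeeds because the alias holds its own reference to the element, whereas `without_aliases([1, 2, 3])` fails on the cleared list.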
Animesh Jain	7283c37c98	[dynamo] Keep guards on global function (#123423 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123423 Approved by: https://github.com/jansel	2024-04-09 04:23:11 +00:00
Oguz Ulgen	287680176b	Use graph.find_nodes in dynamo (#122257 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122257 Approved by: https://github.com/jansel ghstack dependencies: #121565, #122255, #122256	2024-04-07 18:51:18 +00:00
Animesh Jain	8c84fe3c86	[dynamo][guards] Forward fix for #123302 (#123485 ) For some reason, adding a `TYPE_CHECK` in DATA_PTR_MATCH guard in https://github.com/pytorch/pytorch/issues/123302 increases optimizer guard overhead for `MT5ForConditionalGeneration` by 10x. There is nothing special about MT5. As we are going to move towards the CPP guards soon, there is no reason to investigate this deeper. We can use `ID_MATCH` instead of `DATA_PTR` match. Today neither can be serialized, so there is no reason to prefer one over the other. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123485 Approved by: https://github.com/mlazos	2024-04-06 02:34:06 +00:00
Guilherme Leobas	32f9453c2a	[dynamo] Emit FUNCTORCH_STACK_MATCH guard in vmap(compile(f)) case (#122786 ) Fixes: #122201 Pull Request resolved: https://github.com/pytorch/pytorch/pull/122786 Approved by: https://github.com/zou3519	2024-04-05 15:04:16 +00:00
Michael Lazos	512759a3d7	Fix for tensor attribute missing (#123313 ) Tensors would sometimes be realized after we already registered attrs on the root nn module. This ensures all stack values are realized before registering attrs on the root nn module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123313 Approved by: https://github.com/anijain2305	2024-04-04 21:11:04 +00:00
rzou	fd60752786	Turn _allow_unsafe_data_ptr_access into a config option (#123291 ) We're not planning on having this flag around for very long (see deprecation in next PR), so it's better as a config option. Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/123291 Approved by: https://github.com/eellison ghstack dependencies: #123261, #123282	2024-04-04 20:35:24 +00:00
Animesh Jain	5b45ec8892	[dynamo][guards] Use DATA_PTR instead of ID_MATCH for tensors (#123302 ) We should sparingly use ID_MATCH guards. When it comes to performance, ID_MATCH is much faster than DATA_PTR for Python guards. However, the difference is very small in C++. So, it's worth just using DATA_PTR_MATCH. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123302 Approved by: https://github.com/mlazos ghstack dependencies: #123285	2024-04-04 03:52:50 +00:00
Michael Lazos	3e2b7e6052	[dynamo][guard overhead] Data ptr guard optimizer state tensors (#122858 ) Stricter (but faster) guarding on optimizer state tensors Pull Request resolved: https://github.com/pytorch/pytorch/pull/122858 Approved by: https://github.com/anijain2305	2024-04-03 21:42:06 +00:00
Animesh Jain	5d0ac887b9	[dynamo][higher order ops] Make the subgraph sourceless (#123071 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123071 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #123046, #123058, #123059	2024-04-01 21:09:41 +00:00
William Wen	abe4a0e9eb	[dynamo] pop result of print reordering (#122744 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122744 Approved by: https://github.com/jansel ghstack dependencies: #122146, #122335, #122354, #122355, #122356, #122449, #122455, #122456, #122530, #122737, #122738, #122739, #122740, #122741, #122742, #122743	2024-03-27 20:39:39 +00:00
rzou	c81c9ba472	Disallow {FakeTensor,FunctionalTensor}.data_ptr (#122514 ) This PR: - disallows FakeTensor.data_ptr when it is called inside PT2 or fx tracing. - disallows FunctionalTensor.data_ptr (python FunctionalTensor is only used in PT2) The motivation behind this is that the leading cause of segfaults when using custom ops with PT2 is calling .data_ptr on FunctionalTensor or FakeTensor. This change is BC-breaking. If your code broke as a result of this, it's because there was a bug in it (these .data_ptr should never be accessed!). You can either fix the bug (recommended) or get the previous behavior back with: ``` from torch._subclasses.fake_tensor import FakeTensor from torch._subclasses.functional_tensor import FunctionalTensor data_ptr = 0 if isinstance(tensor, (FakeTensor, FunctionalTensor)) else tensor.data_ptr() ``` Test Plan: - existing tests Differential Revision: [D55366199](https://our.internmc.facebook.com/intern/diff/D55366199) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122514 Approved by: https://github.com/ezyang, https://github.com/albanD, https://github.com/yifuwang, https://github.com/kurtamohler	2024-03-26 23:55:42 +00:00
Joel Schlosser	07b618e2d4	Graph break cleanly in Dynamo for module parametrization (#121041 ) Fixes #118795 This is a graph breaking partial fix for #120914. We still need -actual- module parametrization tracing support, but at least it doesn't blow up hard now. Background: Module parametrization injects a property as the module parameter attribute that calls a `nn.Module` whose forward takes in a module parameter and returns a reparametrized module parameter. Example: ``` class MyParametrization(nn.Module): def forward(X): # This reparametrization just negates the original parameter value return -X m = nn.Linear(...) p = MyParametrization() register_parametrization(m, "weight", p) # Accessing the "weight" attribute will invoke p's forward() on m's original weight and return the output as the new weight. # m.weight here is now an injected property that does the above instead of an actual Parameter. # This property is defined in torch/nn/utils/parametrize.py. m.weight # NB: Parametrization changes the module type (e.g. torch.nn.utils.parametrize.ParametrizedLinear) print(type(m)) ``` Problem 1: Dynamo has special tracing rules for things in `torch.nn`. Parametrizing a module changes the type of the module and the parametrized attribute, so now these rules wrongly affect tracing here. To fix this: * For parametrized modules, call `convert_to_unspecialized()` to restart analysis where Dynamo starts inlining the module. Problem 2: The issue seen in #118795 is that Dynamo will see a dynamically constructed tensor when `m.weight` is called and introduce that to its `tensor_weakref_to_sizes_strides` cache during fake-ification. This tensor is also made to be a graph input, since it's a module parameter. When guards are created for this module parameter input, the logic calls `m.weight` again and tries to look the result up in the cache, but this is a different tensor now, giving the `KeyError` symptom. 
To fix this: * Replace Dynamo's `tensor_weakref_to_sizes_strides` cache with a `input_source_to_sizes_strides` cache. * This cache was originally introduced in #100128. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121041 Approved by: https://github.com/anijain2305	2024-03-26 23:44:51 +00:00
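The `KeyError` symptom in Problem 2 follows from how an injected property behaves. The sketch below imitates it in plain Python with lists instead of tensors; `Linear` and `register_parametrization_sketch` are hypothetical stand-ins, not the `torch.nn.utils.parametrize` implementation.

```python
class Linear:
    def __init__(self, weight):
        self.weight = weight


def register_parametrization_sketch(module, name, fn):
    """Replace an attribute with a property that recomputes it on every access."""
    original = module.__dict__.pop(name)
    module.__dict__["_original_" + name] = original

    def getter(self):
        return fn(self.__dict__["_original_" + name])

    cls = type(module)
    # The module's type changes, mirroring ParametrizedLinear in the commit.
    module.__class__ = type("Parametrized" + cls.__name__, (cls,), {name: property(getter)})


m = Linear([1.0, 2.0])
register_parametrization_sketch(m, "weight", lambda w: [-x for x in w])

first = m.weight
second = m.weight

# A cache keyed on object identity misses on the second access,
# while a cache keyed on the source expression still hits.
identity_cache = {id(first): "sizes/strides"}
source_cache = {"m.weight": "sizes/strides"}
```

Because `first` and `second` are equal but distinct objects, `id(second)` is absent from `identity_cache`, which corresponds to the lookup failure described above; keying on the source name avoids it.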
Yifu Wang	09cb42ce29	[dynamo] delete graph_out_{n} after restoring local vars (#122658 ) At graph breaks, we create a graph_out_{n} symbol to hold the graph output and use it to restore the local vars. In addition to their own symbols, the local vars are kept alive by the symbol we created. This means that if the graph break is the last usage of one of the symbols, the symbol would still be kept alive upon graph resumption. This PR: delete the graph_out_{n} symbol after restoring local vars so the lifetime of the local vars is governed by themselves. ## Example Problem Tensor `b`'s last usage is in the graph break. However, it won't be deallocated until `bar()` completes. In the orignal issue report by @Yuzhen11, `b` is a large tensor and `bar()` is an expensive computation. ```python import torch def foo(a): return torch.mm(a, a) @torch._dynamo.disable() def graph_break_fn(a): ret = a.bfloat16() return ret def bar(c): return torch.mm(c, c) def fn(a): b = foo(a) c = graph_break_fn(b) # del b return bar(c) fn_compiled = torch.compile(fn, backend="eager") a = torch.randn(10000, 10000, device="cuda", requires_grad=True) fn_compiled(a).sum().backward() ``` Bytecode before this PR: ``` ORIGINAL BYTECODE fn /home/yifu/microbench/del2.py line 18 19 0 LOAD_GLOBAL 0 (foo) 2 LOAD_FAST 0 (a) 4 CALL_FUNCTION 1 6 STORE_FAST 1 (b) 20 8 LOAD_GLOBAL 1 (graph_break_fn) 10 LOAD_FAST 1 (b) 12 CALL_FUNCTION 1 14 STORE_FAST 2 (c) 22 16 LOAD_GLOBAL 2 (bar) 18 LOAD_FAST 2 (c) 20 CALL_FUNCTION 1 22 RETURN_VALUE MODIFIED BYTECODE fn /home/yifu/microbench/del2.py line 18 18 0 LOAD_GLOBAL 3 (__compiled_fn_0) 2 LOAD_FAST 0 (a) 4 CALL_FUNCTION 1 6 STORE_FAST 3 (graph_out_0) 8 LOAD_GLOBAL 1 (graph_break_fn) 10 LOAD_FAST 3 (graph_out_0) 12 LOAD_CONST 1 (0) 14 BINARY_SUBSCR 20 16 CALL_FUNCTION 1 18 LOAD_GLOBAL 4 (__resume_at_14_1) 20 ROT_TWO 22 CALL_FUNCTION 1 24 RETURN_VALUE ORIGINAL BYTECODE torch_dynamo_resume_in_fn_at_20 /home/yifu/microbench/del2.py line 20 20 0 LOAD_FAST 0 (___stack0) 2 
JUMP_ABSOLUTE 9 (to 18) 4 LOAD_GLOBAL 0 (foo) 6 LOAD_FAST 1 (a) 8 CALL_FUNCTION 1 10 STORE_FAST 2 (b) 12 LOAD_GLOBAL 1 (graph_break_fn) 14 LOAD_FAST 2 (b) 16 CALL_FUNCTION 1 >> 18 STORE_FAST 3 (c) 22 20 LOAD_GLOBAL 2 (bar) 22 LOAD_FAST 3 (c) 24 CALL_FUNCTION 1 26 RETURN_VALUE MODIFIED BYTECODE torch_dynamo_resume_in_fn_at_20 /home/yifu/microbench/del2.py line 20 20 0 LOAD_GLOBAL 3 (__compiled_fn_2) 2 LOAD_FAST 0 (___stack0) 4 CALL_FUNCTION 1 6 UNPACK_SEQUENCE 1 8 RETURN_VALUE ``` Bytecode after this PR: ``` ORIGINAL BYTECODE fn /home/yifu/microbench/del2.py line 18 19 0 LOAD_GLOBAL 0 (foo) 2 LOAD_FAST 0 (a) 4 CALL_FUNCTION 1 6 STORE_FAST 1 (b) 20 8 LOAD_GLOBAL 1 (graph_break_fn) 10 LOAD_FAST 1 (b) 12 CALL_FUNCTION 1 14 STORE_FAST 2 (c) 22 16 LOAD_GLOBAL 2 (bar) 18 LOAD_FAST 2 (c) 20 CALL_FUNCTION 1 22 RETURN_VALUE MODIFIED BYTECODE fn /home/yifu/microbench/del2.py line 18 18 0 LOAD_GLOBAL 3 (__compiled_fn_0) 2 LOAD_FAST 0 (a) 4 CALL_FUNCTION 1 6 STORE_FAST 3 (graph_out_0) 8 LOAD_GLOBAL 1 (graph_break_fn) 10 LOAD_FAST 3 (graph_out_0) 12 LOAD_CONST 1 (0) 14 BINARY_SUBSCR 16 DELETE_FAST 3 (graph_out_0) 20 18 CALL_FUNCTION 1 20 LOAD_GLOBAL 4 (__resume_at_14_1) 22 ROT_TWO 24 CALL_FUNCTION 1 26 RETURN_VALUE ORIGINAL BYTECODE torch_dynamo_resume_in_fn_at_20 /home/yifu/microbench/del2.py line 20 20 0 LOAD_FAST 0 (___stack0) 2 JUMP_ABSOLUTE 9 (to 18) 4 LOAD_GLOBAL 0 (foo) 6 LOAD_FAST 1 (a) 8 CALL_FUNCTION 1 10 STORE_FAST 2 (b) 12 LOAD_GLOBAL 1 (graph_break_fn) 14 LOAD_FAST 2 (b) 16 CALL_FUNCTION 1 >> 18 STORE_FAST 3 (c) 22 20 LOAD_GLOBAL 2 (bar) 22 LOAD_FAST 3 (c) 24 CALL_FUNCTION 1 26 RETURN_VALUE MODIFIED BYTECODE torch_dynamo_resume_in_fn_at_20 /home/yifu/microbench/del2.py line 20 20 0 LOAD_GLOBAL 3 (__compiled_fn_2) 2 LOAD_FAST 0 (___stack0) 4 CALL_FUNCTION 1 6 UNPACK_SEQUENCE 1 8 RETURN_VALUE ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/122658 Approved by: https://github.com/jansel, https://github.com/anijain2305	2024-03-26 22:49:05 +00:00
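The lifetime effect of the added `DELETE_FAST` can be observed with a weak reference. This sketch assumes CPython's reference counting; `Big`, `graph_break_fn`, and `still_alive_after_last_use` are hypothetical names used only for illustration.

```python
import weakref


class Big:
    """Stand-in for a large tensor."""


def graph_break_fn(b):
    # Stand-in for the function that runs at the graph break.
    return object()


def still_alive_after_last_use(delete_graph_out):
    graph_out_0 = (Big(),)            # the compiled graph returns b in a tuple
    ref = weakref.ref(graph_out_0[0])
    b = graph_out_0[0]                # restore the local variable
    if delete_graph_out:
        del graph_out_0               # what the PR adds after restoring locals
    c = graph_break_fn(b)
    del b                             # last usage of b has passed
    return ref() is not None
```

Under CPython, `still_alive_after_last_use(False)` reports that the object is still alive because the tuple keeps a reference, while `still_alive_after_last_use(True)` reports that it was released as soon as `b` went away.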
IvanKobzarev	9b095c3fe6	[dynamo] Config to not emit runtime asserts (#122603 ) Repeat of https://github.com/pytorch/pytorch/pull/122406, which was squashed & merged by mistake Differential Revision: [D55312394](https://our.internmc.facebook.com/intern/diff/D55312394) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122603 Approved by: https://github.com/ezyang	2024-03-25 21:17:44 +00:00
Edward Z. Yang	47a9725de9	Implement prefer_deferred_runtime_asserts_over_guards (#122090 ) Fixes https://github.com/pytorch/pytorch/issues/121749 As promised, it is pretty easy. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/122090 Approved by: https://github.com/lezcano	2024-03-25 16:31:16 +00:00
William Wen	23524710e6	[dynamo] use proxies to nn.Module in dynamo generated GraphModules (#120756 ) Fixes remaining refleaks found when debugging https://github.com/pytorch/pytorch/issues/119607, tests added in https://github.com/pytorch/pytorch/pull/120657. Also fixes some tests that xfail: https://github.com/pytorch/pytorch/issues/120631 (not entirely sure why), but introduced tests now fail. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120756 Approved by: https://github.com/jansel	2024-03-21 21:23:12 +00:00
IvanKobzarev	8a94005d46	[dynamo][runtime_asserts] Ignore failures on sorting sympy relations (#122205 ) Differential Revision: [D55075500](https://our.internmc.facebook.com/intern/diff/D55075500) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122205 Approved by: https://github.com/ezyang	2024-03-20 13:25:37 +00:00
Jason Ansel	c05bf0037d	[dynamo] Remove copy_graphstate/restore_graphstate (#122067 ) Some dead code cleanup. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122067 Approved by: https://github.com/oulgen	2024-03-19 15:37:53 +00:00
Animesh Jain	c568b84794	[dynamo][guards] Move backend match to eval_frame (#121954 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121954 Approved by: https://github.com/jansel	2024-03-17 06:52:10 +00:00
Jason Ansel	0b7d9711d4	[dynamo] Add support for nn.Parameter constructor (part 2) (#120965 ) This handles the case where the tensor isn't an input. The changes to dynamo tests are cases where we would previously fall back to eager. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120965 Approved by: https://github.com/yanboliang ghstack dependencies: #121735	2024-03-16 04:29:58 +00:00
Animesh Jain	a623666066	[dynamo][compile-time] Make output_graph new_var linear (#121858 ) Fixes https://github.com/pytorch/pytorch/issues/121679 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121858 Approved by: https://github.com/jansel	2024-03-15 03:20:04 +00:00
Jason Ansel	3c8c7e2a46	[dynamo] Tweak naming for module hook bw_state (#121609 ) Some minor changes not related to the other PRs in the stack Pull Request resolved: https://github.com/pytorch/pytorch/pull/121609 Approved by: https://github.com/yanboliang	2024-03-12 16:27:56 +00:00
Animesh Jain	78b4793c96	[dynamo][compile-time] Caching VTs to reduce compile-time (#121031 ) Reduces the `torch.compile(backend="eager")` for this code ~~~ def fn(x): for _ in range(10000): # x = torch.sin(x) x = torch.ops.aten.sin(x) # x = sin(x) return x ~~~ From 18 seconds to 12 seconds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121031 Approved by: https://github.com/jansel	2024-03-12 09:19:50 +00:00
Jason Ansel	7cc476ea16	[dynamo] Fix support for nn.Parameter constructor (part 1) (#120163 ) This captures calls to `torch.nn.Parameter` by lifting them to graph inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120163 Approved by: https://github.com/albanD, https://github.com/yanboliang ghstack dependencies: #121086	2024-03-11 05:14:42 +00:00
IvanKobzarev	9a45001905	[dynamo] relax missing symbols runtime assert (#121339 ) Differential Revision: [D54603361](https://our.internmc.facebook.com/intern/diff/D54603361) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121339 Approved by: https://github.com/ezyang	2024-03-07 14:53:38 +00:00
angelayi	c844b377fa	[dynamo] Reorder logs (#116106 ) Currently when there is a print/warning in the graph, dynamo graph breaks causing export to fail. However export would like to just skip over these print/warning calls: https://github.com/pytorch/pytorch/issues/113792. Additionally there's a torch.compile feature request to "reorder prints" so that instead of graph breaking when hitting prints/logging, we can skip over these prints to create larger compiled graphs, and then print the results out after those compiled graphs: https://github.com/pytorch/pytorch/issues/93739. This PR also adds the `reorderable_logging_functions` config for users to register logging functions to be reordered (like `print` or a custom logging function). Printout of the bytecode after reordering the prints looks like the following: P914736600 There are some limitations to the printing right now: * You can only register logging functions, not methods * Inputs to the logging functions can only be tensors, constants, and format strings * Inputs to the logging functions which will later be mutated in-place will not be printed correctly TODO: Add the following tests * print function with argument of nested data structure; * print function with argument of nested data structure being updated inside of compile region (this would test if we handle side effect correctly); * custom defined logging functions with nn.Module or nn.Module attribute arguments; * custom defined logging functions with submodule input/output as arguments (we need to handle the mapping and fused-out value); * custom defined logging functions with tensor argument and mutation inside of the function (TBD: this may increase memory usage); Pull Request resolved: https://github.com/pytorch/pytorch/pull/116106 Approved by: https://github.com/yanboliang	2024-03-01 17:04:24 +00:00
PyTorch MergeBot	63b259492a	Revert "[dynamo] Reorder logs (#116106 )" This reverts commit `c5472628ff`. Reverted https://github.com/pytorch/pytorch/pull/116106 on behalf of https://github.com/clee2000 due to landrace with `342e7929b8`, which removed the import for warnings. Should be an easy fix after rebase `c5472628ff` ([comment](https://github.com/pytorch/pytorch/pull/116106#issuecomment-1972586180))	2024-03-01 06:25:46 +00:00
Angela Yi	c5472628ff	[dynamo] Reorder logs (#116106 ) Currently when there is a print/warning in the graph, dynamo graph breaks causing export to fail. However export would like to just skip over these print/warning calls: https://github.com/pytorch/pytorch/issues/113792. Additionally there's a torch.compile feature request to "reorder prints" so that instead of graph breaking when hitting prints/logging, we can skip over these prints to create larger compiled graphs, and then print the results out after those compiled graphs: https://github.com/pytorch/pytorch/issues/93739. This PR also adds the `reorderable_logging_functions` config for users to register logging functions to be reordered (like `print` or a custom logging function). Printout of the bytecode after reordering the prints looks like the following: P914736600 There are some limitations to the printing right now: * You can only register logging functions, not methods * Inputs to the logging functions can only be tensors, constants, and format strings * Inputs to the logging functions which will later be mutated in-place will not be printed correctly TODO: Add the following tests * print function with argument of nested data structure; * print function with argument of nested data structure being updated inside of compile region (this would test if we handle side effect correctly); * custom defined logging functions with nn.Module or nn.Module attribute arguments; * custom defined logging functions with submodule input/output as arguments (we need to handle the mapping and fused-out value); * custom defined logging functions with tensor argument and mutation inside of the function (TBD: this may increase memory usage); Pull Request resolved: https://github.com/pytorch/pytorch/pull/116106 Approved by: https://github.com/yanboliang	2024-03-01 04:48:44 +00:00
Jason Ansel	01ec8df6d8	[Compiled Autograd] Introduce BackwardState capture (#120382 ) This adds support for backwards hooks that are both: 1) Interior to the graph; and 2) Dynamically generated (e.g. lambdas) We do this by creating a BackwardState object that is used to register the hooks in the forward, then populated by dynamo after the forwards runs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120382 Approved by: https://github.com/xmfan	2024-02-28 20:36:47 +00:00
Edward Z. Yang	1a1fc1047d	Add structured trace logs (#120289 ) Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit How to read the diff: * Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes) * torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs * torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implemented as a logger named torch.__trace which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines. * torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log. * test/dynamo/test_structured_trace.py the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not to be very stable. https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289 Approved by: https://github.com/Skylion007 ghstack dependencies: #120712	2024-02-28 01:01:41 +00:00
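The metadata/payload split and the string interning described in this commit can be sketched with the standard library alone. Everything below is an assumption made for illustration: the logger name `sketch.__trace`, the helpers `emit_trace` and `intern_string`, and the tab-indented payload format are hypothetical and do not reproduce `torch._logging`.

```python
import io
import json
import logging

# A logger cut off from the normal hierarchy, with its own handler.
trace_log = logging.getLogger("sketch.__trace")
trace_log.propagate = False
trace_log.setLevel(logging.DEBUG)
buffer = io.StringIO()
trace_log.addHandler(logging.StreamHandler(buffer))

_interned = {}


def emit_trace(name, metadata, payload=None):
    """Write one single-line JSON metadata record, then an optional payload."""
    lines = [json.dumps({name: metadata})]
    if payload is not None:
        # The payload may span several lines; indent it to set it apart.
        lines.extend("\t" + line for line in payload.splitlines())
    trace_log.debug("\n".join(lines))


def intern_string(s):
    """Return a small integer id for s, logging the mapping only once."""
    if s not in _interned:
        _interned[s] = len(_interned)
        emit_trace("str", [s, _interned[s]])
    return _interned[s]


file_id = intern_string("/path/to/model.py")
emit_trace(
    "graph",
    {"compile_id": 0, "file": file_id},
    payload="def forward(x):\n    return x + 1",
)
```

After this runs, `buffer` holds one JSON line recording the interned filename, one JSON metadata line that refers to the file by its integer id, and the indented payload lines. Repeated filenames then cost only an integer per record.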
PyTorch MergeBot	f3dd2a544c	Revert "Add structured trace logs (#120289 )" This reverts commit `9dfaef962c`. Reverted https://github.com/pytorch/pytorch/pull/120289 on behalf of https://github.com/kit1980 due to breaking internal builds, see D54230697 ([comment](https://github.com/pytorch/pytorch/pull/120289#issuecomment-1967477120))	2024-02-27 19:49:05 +00:00
Edward Z. Yang	9dfaef962c	Add structured trace logs (#120289 ) Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit How to read the diff: * Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes) * torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs * torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implemented as a logger named torch.__trace which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). There's a teensy bit of FB specific code to automatically enable trace logging if a /logs directory exists. `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines. * torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log. * test/dynamo/test_structured_trace.py the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not to be very stable. https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs. Testing that the fbcode detection works at https://www.internalfb.com/mlhub/pipelines/runs/fblearner/534553450 (Meta-only) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289 Approved by: https://github.com/Skylion007	2024-02-27 00:04:23 +00:00
angelayi	759204253f	[export] Change runtime asserts to using assert_scalar (#119608 ) By changing runtime symbolic asserts to using assert_scalar, the asserts can call into `expect_true` and modify the shape env so that we can run through the traced graph module with fake tensors. With assert_async, the asserts only get hit during runtime, but that means if we run the graph module with fake tensors, the asserts will not affect the shape env, so later data dependent calls to the fake tensors may result in GuardOnDataDependentSymNode errors. https://github.com/pytorch/pytorch/issues/119587 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119608 Approved by: https://github.com/ezyang	2024-02-26 17:56:12 +00:00
PyTorch MergeBot	47300221c2	Revert "[export] Change runtime asserts to using assert_scalar (#119608 )" This reverts commit `f4d641ba2f`. Reverted https://github.com/pytorch/pytorch/pull/119608 on behalf of https://github.com/huydhn due to This break ONNX trunk job `65fd8b6730` ([comment](https://github.com/pytorch/pytorch/pull/119608#issuecomment-1947436402))	2024-02-15 22:25:24 +00:00

300 Commits