Summary:
While testing exportability for PT2 Inference models, we found various cases of invalid op inputs during tracing, for example errors like `a and b must have same reduction dim`, `expected scalar type Long but found Int`, etc. Looking more closely, these happened due to the same few meta kernels & eager kernels producing mismatched outputs upstream (e.g. different output tensor dtype, int output).
This PR adds checks to catch mismatched outputs in real tensor prop upstream, so errors are raised at the mismatched op instead of at the downstream ops that take its outputs as inputs. Relies a lot on utils from [CrossRefFakeMode](929797dedb/torch/_subclasses/fake_utils.py (L78))
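For illustration, a minimal sketch of the kind of check being added (the helper name and exact fields are hypothetical; the real checks reuse the CrossRefFakeMode utils):
```
def check_fake_vs_real(op, fake_out, real_out):
    # Raise at the op that produced the mismatch, not at a downstream consumer.
    if fake_out.dtype != real_out.dtype:
        raise RuntimeError(
            f"{op}: fake kernel dtype {fake_out.dtype} != real dtype {real_out.dtype}"
        )
    if fake_out.dim() != real_out.dim():
        raise RuntimeError(f"{op}: fake kernel rank != real output rank")
```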
Follow ups: could add more checks, and maybe have a flag to only enable these for cases like draft mode, so perf doesn't suffer?
Test Plan: test_export, test_fake_tensor
Differential Revision: D64210055
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137747
Approved by: https://github.com/zou3519
As discussed w/ @ezyang offline, one way to de-risk the `specialize_float=False` rollout is to specialize all backed symfloats that we fail to tensorify away. This diff does a few things:
1) It fixes a bug where item_memo gets dropped (due to incorrect epoch invalidation)
2) It updates the tensorify pass to do the backup specialization (see the sketch after the roadmap below)
This pass was originally part of the [PR](https://github.com/pytorch/pytorch/pull/137782) that flips `specialize_float=False`, but we learned that the blast radius is simply too large. We've pivoted to a more milestone-driven approach where we learn from the failures of the aforementioned PR and cherry-pick fixes into main first. After this current PR lands, our strategy is as follows:
1) Integrate turning off specialize float only in the automatic dynamic pass.
2) Put up a canary diff that only turns off specialize float in `backend=eager` mode to sniff out symfloat related bugs in dynamo due to code paths we previously never exercised.
3) Put up a canary diff that only turns off specialize float in `backend=aot_eager` mode to sniff out symfloat related bugs in aotautograd due to code paths we previously never exercised.
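As a rough illustration of the backup specialization in 2) (assumptions: `scale` is a plain Python float input; the exact behavior depends on the config rollout above):
```
import torch

@torch.compile(dynamic=True)
def f(x, scale):
    return x * scale

# With specialize_float=False, `scale` is traced as a backed SymFloat and the
# tensorify pass tries to rewrite it as a 0-dim tensor; if that rewrite fails,
# this diff falls back to specializing on the concrete value (here 0.5).
f(torch.randn(4), 0.5)
```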
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138868
Approved by: https://github.com/ezyang
This PR changes real_tensor_prop to also infer fake kernels when the
operator doesn't have one.
We infer the fake output to have the same properties as the real
output, with unbacked symints in the sizes and some stride order.
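A hedged sketch of the setup this enables (the op name is made up; note the absence of a `register_fake`):
```
import torch

@torch.library.custom_op("mylib::twice", mutates_args=())
def twice(x: torch.Tensor) -> torch.Tensor:
    return torch.cat([x, x])

# No register_fake() registration: real tensor prop runs the real kernel and
# infers a fake kernel whose sizes are unbacked symints, with the real
# output's dtype/device and a matching stride order.
```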
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139213
Approved by: https://github.com/pianpwk
ghstack dependencies: #139212
When we see a custom op:
- check that its mutation annotations are correct
- check that its aliasing constraints match our constraints for custom ops.
Otherwise, there may be undefined behavior.
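A minimal example of the kind of bug these checks catch (op name hypothetical): the op mutates its input, but its registration claims otherwise.
```
import torch

@torch.library.custom_op("mylib::bad_inplace", mutates_args=())
def bad_inplace(x: torch.Tensor) -> torch.Tensor:
    x.add_(1)          # actually mutates x...
    return x.clone()   # ...but mutates_args=() says it doesn't
```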
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139212
Approved by: https://github.com/angelayi
### Summary
The fake impl for `nonzero` sets the output SymInt's upper bound to `sys.maxsize - 1` if there are any SymInts in the original input tensor shape. This PR constrains the range more intelligently by using the upper bounds of each SymInt in the input tensor shape.
See https://github.com/pytorch/pytorch/pull/134899 for a merged solution to a similar problem for a different op.
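A sketch of the tightened bound: `nonzero` returns at most one row per input element, so the output SymInt's upper bound can be the product of the per-dim upper bounds.
```
import math

def nonzero_upper_bound(dim_upper_bounds):
    # e.g. an input bounded by (8, 16) yields at most 128 rows,
    # instead of sys.maxsize - 1 as before.
    return math.prod(dim_upper_bounds)

assert nonzero_upper_bound([8, 16]) == 128
```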
### Test plan
Added unit test to verify upper bound reduction calculation (`python test/export/test_export.py TestExport.test_nonzero_dynamic`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137663
Approved by: https://github.com/ezyang
Summary:
* Fixed real tensor tracing w/ torchbind objs by passing the cloned torchbind obj. For now I just catch the exception and emit an error message if the `_clone` fails, but this is up for discussion.
* Separate question: should we require people to set up FakeScriptObjects and such for draft mode?
* Prevent side effects from happening when we do the first pass of custom ops profiling by cloning/copying everything. Not sure if deepcopying the model will succeed in all cases... But also I guess this path can be removed once custom ops profiling turns into one pass.
Test Plan: `buck2 run @//mode/dev-nosan //scripts/angelayi/draft_export:test_draft_export`
Reviewed By: ydwu4
Differential Revision: D64124825
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138797
Approved by: https://github.com/ydwu4
I was debugging an internal NE divergence for a while that ended up being caused by a bad meta kernel. I added a config option and an explicit backend `aot_eager_decomp_partition_crossref` to enable FakeCrossRefMode when running the graph. I added an explicit backend because I suspect it will be useful for internal models, but I'm also happy to leave it as just a config option.
It only tests ops that have a meta kernel, to avoid the memory overhead of hitting the fallback path and running in eager.
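Usage sketch for the new backend (name as added in this PR):
```
import torch

@torch.compile(backend="aot_eager_decomp_partition_crossref")
def f(x):
    # meta kernels for these ops are cross-checked against eager outputs
    return x.sin().cos()

f(torch.randn(8))
```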
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138651
Approved by: https://github.com/zou3519, https://github.com/bdhirsh
`unwrap_tensor_subclasses` -> `get_plain_tensors` is used at runtime; for small models this overhead is significant in comparison with the small compiled kernel.
1/ Removing asserts from the runtime path
2/ Removing list creation by using an optional output list that arguments are appended into (see the sketch below)
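A sketch of the output-list pattern from 2/ (structure assumed; the real code lives in the subclass unwrapping utils):
```
from torch.utils._python_dispatch import is_traceable_wrapper_subclass

def get_plain_tensors(t, out):
    # The caller passes one output list; recursive calls append into it
    # instead of each allocating and returning a fresh list.
    attrs, _ctx = t.__tensor_flatten__()
    for attr in attrs:
        inner = getattr(t, attr)
        if is_traceable_wrapper_subclass(inner):
            get_plain_tensors(inner, out)
        else:
            out.append(inner)
    return out
```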
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138498
Approved by: https://github.com/bdhirsh
Summary: Prototyping custom op meta kernel generation. The rest of the changes are in fbcode/scripts/angelayi
Test Plan: followup diff (D63837739)
Differential Revision: D63837740
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137277
Approved by: https://github.com/zou3519
Summary: Previously we had a very bad bug where we didn't allow any decomp on CIA (CompositeImplicitAutograd) ops. This never mattered before because we never had to actually push CIA decomps to the Python key level in export.
Test Plan: CI
Differential Revision: D63363749
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136600
Approved by: https://github.com/bdhirsh
Summary:
Since https://www.internalfb.com/diff/D62215095 landed, there have been many silent errors due to the dependency between functional_tensor and config.
```
File "/tmp/torch_deploy_zip5YRJC1/torch_python_modules.zip/torch/export/__init__.py", line 64, in <module>
File "/tmp/torch_deploy_zip5YRJC1/torch_python_modules.zip/torch/export/dynamic_shapes.py", line 23, in <module>
File "/tmp/torch_deploy_zip5YRJC1/torch_python_modules.zip/torch/export/exported_program.py", line 26, in <module>
File "/tmp/torch_deploy_zip5YRJC1/torch_python_modules.zip/torch/_higher_order_ops/__init__.py", line 1, in <module>
File "/tmp/torch_deploy_zip5YRJC1/torch_python_modules.zip/torch/_higher_order_ops/cond.py", line 6, in <module>
File "/tmp/torch_deploy_zip5YRJC1/torch_python_modules.zip/torch/_subclasses/functional_tensor.py", line 9, in <module>
File "/tmp/torch_deploy_zip5YRJC1/torch_python_modules.zip/torch/_inductor/config.py", line 44, in <module>
```
https://fburl.com/logarithm/ol5kx0ee
complaining about a circular dependency; this fixes it.
Test Plan: buck test multipy/runtime:test_deploy_embedded_cuda_interp_without_cuda_available -- --run-disabled TorchpyTest.AcquireMultipleSessionsInDifferentPackages
Reviewed By: aorenste
Differential Revision: D62616765
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135926
Approved by: https://github.com/aorenste, https://github.com/oulgen, https://github.com/Skylion007
This PR:
* Implements the pre-existing `nt.to_padded_tensor(padding_val)` ATen op via the FBGEMM kernel + appropriate view gymnastics (since that kernel only handles 2D values)
* Introduces a new `_nested_from_padded_tensor` op for the reverse conversion, implemented via the reverse FBGEMM kernel + view gymnastics
* Note: there is currently no public API for this; the design is deferred to a future PR
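Hedged usage sketch of the conversion (shapes are examples; `_nested_from_padded_tensor` remains private per the note above):
```
import torch

nt = torch.nested.nested_tensor(
    [torch.randn(2, 8), torch.randn(5, 8)], layout=torch.jagged
)
padded = nt.to_padded_tensor(0.0)  # dense (2, 5, 8), zero-padded
```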
TODO:
* ~~Propagate min / max sequence length via the new factory function `_nested_from_padded_tensor`~~
* ~~Verify that Inductor does computation fusion via test logic~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125947
Approved by: https://github.com/soulitzer
The idea behind the tracking is the following: whenever we see a tensor, if it is a root tensor (i.e. it does not have any view metas), we consider it the base of all the tensors that share its storage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135141
Approved by: https://github.com/zou3519
Summary:
When exporting for training with `tolist`, we do not hit `FunctionalTensor.tolist` since we do not functionalize. Unfortunately, this means we hit `FakeTensor.tolist`, which creates unbacked symints that are not backed by proxies.
Rather than trying to patch up this low-level implementation, we replace it with essentially what `FunctionalTensor.tolist` does, which is higher-level: we desugar to `item()` calls and let them take care of unbacked symints.
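Roughly, the desugaring looks like this (a sketch only; the actual implementation goes through the functionalization path):
```
def tolist_via_item(t):
    # Reduce tolist() to per-element item() calls; item() already produces
    # proxy-backed unbacked symints during tracing.
    if t.dim() == 0:
        return t.item()
    return [tolist_via_item(t[i]) for i in range(t.shape[0])]
```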
Test Plan:
Some expected failures are gone now.
Also found a test for `tolist` that was written when `FunctionalTensor.tolist` was implemented but not really doing much; repurposed it now to exercise more modes.
Differential Revision: D62197742
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135131
Approved by: https://github.com/ezyang
We should not try to do const propagation on unrecognized types (e.g. subclasses).
For those types, returning NotImplemented will jump to the next __torch_dispatch__.
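A sketch of the condition (helper name assumed):
```
import torch

def has_subclass_args(args):
    # Any tensor whose type is not exactly torch.Tensor is a subclass and
    # disqualifies const propagation; we then return NotImplemented so the
    # next __torch_dispatch__ gets a chance to handle the call.
    return any(
        isinstance(a, torch.Tensor) and type(a) is not torch.Tensor for a in args
    )
```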
Test:
```
python test/functorch/test_aotdispatch.py -k test_aot_test_subclasses_with_tensor_factories
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135033
Approved by: https://github.com/zou3519, https://github.com/bdhirsh
## Summary
At the moment, the fake impl for `masked_select` simply sets the upper bound of its size-like SymInt to `sys.maxsize` (9223372036854775807, the max value for a signed int64) if there are any SymInts in the original input tensor shape. This PR constrains the range more intelligently by using the upper bounds of each SymInt in the input tensor shape.
This solves an issue where a model being lowered to Executorch errors during memory planning because the memory allocated for `masked_select` ended up exceeding the 64-bit address space (`INT_MAX * size(dtype)`).
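End-to-end sketch of the kind of check in the test plan (module and bounds are examples):
```
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return torch.masked_select(x, x > 0)

dim = torch.export.Dim("dim", max=128)
ep = torch.export.export(M(), (torch.randn(16),), dynamic_shapes={"x": {0: dim}})
# With TORCH_LOGS="export", the masked_select output's SymInt should now be
# bounded above by 128 rather than sys.maxsize.
```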
## Test plan
- Passes existing unit tests (tests case where upper bound is inf)
- Added unit test to verify upper bound reduction calculation
- Tested end-to-end by exporting with TORCH_LOGS="export" and ensuring that the range for `masked_select`'s SymInt size has the correct upper bound
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134899
Approved by: https://github.com/ezyang
- The new implementation (auto_functionalized_v2) is enabled by default but can be disabled using an inductor flag.
- In export mode the old implementation is used.
**Motivation**
Previous functionalization fails to re-inplace arguments when they are views over other tensors;
see issue https://github.com/pytorch/pytorch/issues/131192.
The new functionalization makes re-inplacing views easier.
**A) Functionalizations pass**
Consider a program:
```
def func(t):
    x = t[0]
    y = t[1]
    foo(x, y)  # custom operator with x, y mutable
    return (x, y, t)
```
- To functionalize `foo`, we generate a function that operates on the base tensors of the inputs (x.base() and y.base()) and record how to regenerate each view out of its base; for argument x we record ```ViewInfo = (x.base(), x.size(), x.stride(), x.storage_offset())```.
- Due to some limitations on the torch.export arguments format, we have to generate a lot of arguments, but this is something we can simplify in the future. For the example above we get the following function:
```
auto_functionalized = torch.ops.higher_order.auto_functionalized(torch.ops.mylib.foo.default,
_x_base_index = 0, _x_size = (), _x_stride = (), _x_storage_offset = 0 ,
_y_base_index = 0,_y_size = (), _y_stride = (), _y_storage_offset = 1 ,
_all_bases = [arg0_1])
```
- In the code above:
  - `_all_bases` (here `[t]`) refers to the unique set of bases of all of foo's arguments.
  - For each argument x we have `_x_base_index`, `_x_size`, `_x_stride`, and `_x_storage_offset`, which can be used to regenerate x from `_all_bases[_x_base_index]` or from a copy of that base.
  - The output of auto_functionalized is foo's output, followed by one tensor per base in `_all_bases`: a copy of that base after applying the mutations of all the arguments that are views of it.
  - Each use of a base in `_all_bases` (or of a view of it) after the call to foo is replaced with a view of the new output.
For the function above, after functionalization we get:
```
def forward(self, arg0_1: "f32[2][1]cpu"):
auto_functionalized = torch.ops.higher_order.auto_functionalized(torch.ops.mylib.foo.default, _x_base_index = 0, _x_size = (), _x_stride = (), _x_storage_offset = 0, _y_base_index = 0, _y_size = (), _y_stride = (), _y_storage_offset = 1, _all_bases = [arg0_1])
getitem_1: "f32[2][1]cpu" = auto_functionalized[1]; auto_functionalized = None
copy_: "f32[2][1]cpu" = torch.ops.aten.copy_.default(arg0_1, getitem_1); arg0_1 = copy_ = None
# No stacktrace found for following nodes
select_2: "f32[][]cpu" = torch.ops.aten.select.int(getitem_1, 0, 0)
select_3: "f32[][]cpu" = torch.ops.aten.select.int(getitem_1, 0, 1); getitem_1 = None
return (select_2, select_3)
```
**B) Semantics of auto_functionalize**
The new semantics of auto_functionalize is as follows:
1. For each base in _all_bases, create a copy of the base. (If a base is re-inplaced we do not need to copy it.)
2. For each arg, regenerate the arg from the copy of its base using the view information above.
3. Return the original foo output followed by the new bases.
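A minimal sketch of these semantics (the ViewInfo fields are assumed to match the recorded _x_* arguments above):
```
def auto_functionalized_v2_semantics(op, view_infos, all_bases):
    # 1. Copy each base (the re-inplace pass below may skip this copy).
    base_copies = [b.clone() for b in all_bases]
    # 2. Regenerate each mutable arg as a view over the copied base.
    args = [
        base_copies[vi.base_index].as_strided(vi.size, vi.stride, vi.storage_offset)
        for vi in view_infos
    ]
    # 3. Run the op on the regenerated views; mutations land in the copies.
    out = op(*args)
    # Return the original output followed by the new bases.
    return out, *base_copies
```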
**C) Re-inplace pass**
Since auto_functionalize now copies the bases, what we actually re-inplace is the bases
(the pass runs just like before, but on the bases instead of the args):
1. For each base b in _all_bases, check if there is any use of the base (or its aliases/views) after auto_functionalize (before it is overwritten with the copy); if there is none, re-inplace it (avoid copying it in step 1 above).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134409
Approved by: https://github.com/zou3519
The issue:
Const propagation only checks that arguments do not contain FakeTensors. If an argument is a subclass, it passes this condition.
As a result, const propagation executes without FakeTensorMode, and tensor factories inside Subclass.__torch_dispatch__ produce tensors that are not fakified.
Solution:
If there are subclass arguments, do not consider const propagation doable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134855
Approved by: https://github.com/zou3519
Summary:
With training IR, we cannot rely on trapping `to()` in `FunctionalTensor` because the regular decomposition kicks in first, and that can cause it to be optimized away.
So instead we preserve it until we functionalize, and then replace it explicitly with `_to_copy()`.
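A hedged illustration (the module is an example; the exact graph output depends on the export pipeline):
```
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x.to(torch.float64) + 1.0

ep = torch.export.export_for_training(M(), (torch.randn(2),))
# x.to(...) is preserved in the training IR; after functionalization it is
# replaced with an explicit torch.ops.aten._to_copy call.
```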
Test Plan: expected test failures go away
Differential Revision: D61883878
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134622
Approved by: https://github.com/zhxchen17, https://github.com/tugsbayasgalan
See #121528 for additional context.
In #120682, we moved the attention kernels from meta_registrations to fake_impls with the intent of fixing the device handling for seed/offset: these are typically on CPU. We needed to put the registrations in fake_impls to do this because meta_registrations doesn't have a way to specify device, whereas fake_impls does. But when we tried to actually fix the device types (#120839), we had to revert the PR because it broke cudagraph handling (during which seed/offset _are_ on CUDA).
Now, we want to put the registrations back in meta_registrations so that we can call these kernels with meta tensors. The use case is later in this stack - we want to be able to use the flop counter with these kernels.
Also - I specifically skip the `compare_tensor_meta()` check in test_fake / test_fake_autocast tests for the `_efficient_attention_forward` and `_flash_attention_forward` kernels, which fails because of the device mismatch from the seed/offset tensors. Then we can un-skip these opinfos. I verified that the efficient_attention_forward bug (#120842) is now caught by these opinfos if I revert the fix from this PR.
Differential Revision: [D61687369](https://our.internmc.facebook.com/intern/diff/D61687369)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134288
Approved by: https://github.com/drisspg
Support for effectful operations in backward:
1/ AOTDispatch collects metadata from the forward fn only, so backward may use effectful ops that were not used in forward => allow token discovery during the joint function. FunctionalTensorMode holds _tokens; in the joint function, after tracing forward, we memoize _tokens as `_tokens_forward_output`.
2/ Tokens are added as primal inputs (forward) in EffectTokensWrapper. Tokens that will be used in backward are among the partitioner's saved values; we have no control over the positions at which they are saved in the forward outputs.
3/ If new tokens are discovered in backward after tracing joint_fn, they are manually appended to the end of the primals in the resulting graph (_aot_autograd/utils.py).
4/ All effectful ops during backward are marked with the 'must_be_in_backward' partitioner tag, to prevent the partitioner from placing them in forward. For this, functional_tensor_mode got a new optional state `self._effects_partitioner_tag` for effectful ops, set after tracing forward. There are additional changes in the partitioner to improve the functionality of 'must_be_in_backward'.
5/ Unlifting tokens now runs for both forward and backward.
- As tokens saved for backward are placed at non-static positions, we identify the input and output tokens to erase via the inputs and outputs of `with_effects` operations.
- In forward we can have input tokens, discovered in backward, that are not used by with_effects ops in forward but are saved for backward. We identify them by their position in the forward inputs.
6/ Added AOT debug logging for graphs before unlifting and before adding the additional primals for backward tokens.
Tests:
```
python test/higher_order_ops/test_with_effects.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132638
Approved by: https://github.com/bdhirsh
Part of #134054.
This corresponds to the pytorch mypy changes from D61493706. Updating takes so
long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change.
So we land these 'type: ignore's for pytorch in advance of their actually being needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202
Approved by: https://github.com/Skylion007