pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
IvanKobzarev	b470e59c38	partitioner option to ignore partitioner_tag for abstract usage (#166725 ) Partitioner functionality is appealing to use in different scenarios (e.g. Autoparallel). We have special logic around the "partitioner_tag" from meta that is only needed for the forward/backward split. This adds an optional argument to skip it and do only a generic split based on inputs/outputs. Potentially we want to make `_extract_graph_with_inputs_outputs` public (without the underscore) :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/166725 Approved by: https://github.com/bdhirsh	2025-10-31 18:50:02 +00:00
Yuanyuan Chen	694db5f549	Use 'is' in callable comparisons (#166624 ) Just like we use `is/is not` for class comparisons, it is generally advised to use `is/is not` for comparisons against torch functions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166624 Approved by: https://github.com/Lucaskabela, https://github.com/Skylion007	2025-10-30 19:00:09 +00:00
Yuanyuan Chen	2de4cf2102	[1/N] Remove unused loop variables (#166258 ) This PR removes unused loop variables. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166258 Approved by: https://github.com/Lucaskabela, https://github.com/mlazos	2025-10-30 12:22:25 +00:00
PyTorch MergeBot	1dd6b76914	Revert "[1/N] Remove unused loop variables (#166258 )" This reverts commit `76b2c37045`. Reverted https://github.com/pytorch/pytorch/pull/166258 on behalf of https://github.com/atalman due to breaks test/distributed/test_serialization.py::TestSerialization::test_weights_only [GH job link](https://github.com/pytorch/pytorch/actions/runs/18894311802/job/53929321703) [HUD commit link](`76b2c37045`) ([comment](https://github.com/pytorch/pytorch/pull/166258#issuecomment-3460964612))	2025-10-29 11:10:37 +00:00
Yuanyuan Chen	76b2c37045	[1/N] Remove unused loop variables (#166258 ) This PR removes unused loop variables. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166258 Approved by: https://github.com/Lucaskabela, https://github.com/mlazos	2025-10-29 01:34:15 +00:00
Maggie Moss	31e42eb732	Fix pyrefly ignore syntax (#166438 ) Reformats pyrefly ignore suppressions so they only ignore one error code. pyrefly check lintrunner Pull Request resolved: https://github.com/pytorch/pytorch/pull/166438 Approved by: https://github.com/Skylion007	2025-10-29 00:02:21 +00:00
Maggie Moss	d795fb225a	[RFC] Add pyrefly to lintrunner (#165179 ) This will add pyrefly to lintrunner as a warning only, and allow us to collect feedback about the tool before switching to pyrefly as the main type checker. References the steps outlined here: https://github.com/pytorch/pytorch/issues/163283 Test plan: `lintrunner init`, then `lintrunner`; confirm that when pyrefly errors are present, results look like: https://gist.github.com/maggiemoss/e6cb2d015dd1ded560ae1329098cf33f Pull Request resolved: https://github.com/pytorch/pytorch/pull/165179 Approved by: https://github.com/ezyang	2025-10-16 20:07:09 +00:00
Brian Hirsh	ed74dc054d	add the option to disable functionalization in AOTDispatcher (#164577 ) I'm cleaning this PR up as a proper way of disabling functionalization via config in AOTDispatcher. I removed the non-functionalization related changes from the original version: (1) preventing proxy mode (and functionalization) from incorrectly decomposing CIA ops (Ed has a PR for it here: https://github.com/pytorch/pytorch/pull/164939) (2) preventing python-dispatcher-based decomps above autograd from running. I'm not doing this for now, will likely do it in a followup Pull Request resolved: https://github.com/pytorch/pytorch/pull/164577 Approved by: https://github.com/ezyang ghstack dependencies: #165372	2025-10-16 15:44:11 +00:00
Brian Hirsh	f33c7e1a43	add and fix OpInfo tests for the default partitioner (#165372 ) I noticed the default partitioner was breaking in some dynamic shape tests, so prior to turning off functionalization I want to tweak it to pass all of our OpInfo tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/165372 Approved by: https://github.com/ezyang	2025-10-16 15:44:11 +00:00
PyTorch MergeBot	b509fb9b5d	Revert "add and fix OpInfo tests for the default partitioner (#165372 )" This reverts commit `bcfea48ab7`. Reverted https://github.com/pytorch/pytorch/pull/165372 on behalf of https://github.com/malfet due to Looks like it broke slow jobs, see `331b7cc054/1` ([comment](https://github.com/pytorch/pytorch/pull/165372#issuecomment-3407567748))	2025-10-15 17:38:52 +00:00
Brian Hirsh	bcfea48ab7	add and fix OpInfo tests for the default partitioner (#165372 ) I noticed the default partitioner was breaking in some dynamic shape tests, so prior to turning off functionalization I want to tweak it to pass all of our OpInfo tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/165372 Approved by: https://github.com/ezyang ghstack dependencies: #165327	2025-10-14 23:34:34 +00:00
Yuanyuan Chen	fbe0d20a17	[2/N] More ruff SIM fixes (#165031 ) This is a follow-up to #164695 that applies ruff SIM rules to more files. Most changes simplify dict.get calls, because None is already the default value. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165031 Approved by: https://github.com/mlazos	2025-10-14 14:22:54 +00:00
PyTorch MergeBot	b8be796a57	Revert "[2/N] More ruff SIM fixes (#165031 )" This reverts commit `38095fbd13`. Reverted https://github.com/pytorch/pytorch/pull/165031 on behalf of https://github.com/albanD due to One of the changed line started to fail on trunk ([comment](https://github.com/pytorch/pytorch/pull/165031#issuecomment-3390190870))	2025-10-10 13:42:14 +00:00
Yuanyuan Chen	38095fbd13	[2/N] More ruff SIM fixes (#165031 ) This is a follow-up to #164695 that applies ruff SIM rules to more files. Most changes simplify dict.get calls, because None is already the default value. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165031 Approved by: https://github.com/mlazos	2025-10-10 05:37:46 +00:00
Maggie Moss	086dec3235	Pyrefly suppressions 6/n (#164877 ) Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 Almost there! Test plan: dmypy restart && python3 scripts/lintrunner.py -a pyrefly check step 1: delete lines in the pyrefly.toml file from the project-excludes field step 2: run pyrefly check step 3: add suppressions, clean up unused suppressions before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199 after: INFO 0 errors (5,064 ignored) Only four directories left to enable Pull Request resolved: https://github.com/pytorch/pytorch/pull/164877 Approved by: https://github.com/oulgen	2025-10-08 02:30:57 +00:00
PyTorch MergeBot	5d7360bb03	Revert "Enable all SIM rules except disabled ones (#164645 )" This reverts commit `321e602692`. Reverted https://github.com/pytorch/pytorch/pull/164645 on behalf of https://github.com/izaitsevfb due to causes lint failures ([comment](https://github.com/pytorch/pytorch/pull/164645#issuecomment-3369274351))	2025-10-05 19:32:21 +00:00
Yuanyuan Chen	321e602692	Enable all SIM rules except disabled ones (#164645 ) `SIM` rules are useful for simplifying boolean expressions and enhancing code readability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164645 Approved by: https://github.com/ezyang	2025-10-05 07:38:25 +00:00
Maggie Moss	f414aa8e0d	Add pyrefly suppressions (3/n) (#164588 ) Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 Test plan: dmypy restart && python3 scripts/lintrunner.py -a pyrefly check step 1: uncomment lines in the pyrefly.toml file step 2: run pyrefly check step 3: add suppressions, clean up unused suppressions before: https://gist.github.com/maggiemoss/bb31574ac8a59893c9cf52189e67bb2d after: 0 errors (1,970 ignored) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164588 Approved by: https://github.com/oulgen	2025-10-03 22:03:03 +00:00
Yuanyuan Chen	a43c4c3972	[5/N] Apply ruff UP035 rule (#164423 ) Continued code migration to enable ruff `UP035`. Most changes are about moving `Callable` from `typing` to `from collections.abc`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164423 Approved by: https://github.com/ezyang	2025-10-02 07:31:11 +00:00
Yidi Wu	8f6dbc0ba8	[scan] create fw and bw graphs via partitioning (#162754 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162754 Approved by: https://github.com/zou3519 ghstack dependencies: #161557, #161664, #161808, #162025, #161732	2025-09-27 18:13:15 +00:00
Basil Wong	4941719061	Enable logging for absolute memory estimation (#158799 ) Summary: Update the Auto AC logging so that it also provides the absolute memory estimations for each node. Test Plan: (aps-gem_omnifm_v2_mwb_dynamic_005_budget-f23a84c3d8): https://fburl.com/ai_infra/0r738h5r {F1980393481} * Memory Recorded in bytes --- ``` buck2 test //caffe2/test/functorch:test_ac_logging ``` https://www.internalfb.com/intern/testinfra/testrun/14918173863021573 Rollback Plan: Differential Revision: D78580107 Pull Request resolved: https://github.com/pytorch/pytorch/pull/158799 Approved by: https://github.com/jansel	2025-09-22 18:36:49 +00:00
Menglu Yu	5050cfa363	[Opitmus] fix fp8 activation quatization for duplicates forward output (#163364 ) Summary: We observe a case where the fwd graph has duplicated return nodes, which leads to errors because fx renames the node; thus we add poi info into the node name. Test Plan: ### unit test ``` CUDA_VISIBLE_DEVICES=3 buck2 test mode/opt -m ovr_config//triton:beta -c fbcode.nvcc_arch=b200a -c fbcode.platform010_cuda_version=12.8 //caffe2/test/functorch:test_aotdispatch -- test_quantize_activation_duplicate_nodes ``` Buck UI: https://www.internalfb.com/buck2/de5eccc6-4064-4214-843d-70b8e3829afe Test UI: https://www.internalfb.com/intern/testinfra/testrun/4503599937670844 Network: Up: 217KiB Down: 72KiB (reSessionID-73e5c269-4f4d-4a54-896a-79c077eea326) Executing actions. Remaining 0/2 0.1s exec time total Command: test. Finished 1 local Time elapsed: 45.9s Tests finished: Pass 2. Fail 0. Fatal 0. Skip 0. Build failure 0 ### E2E before f798417700 after Differential Revision: D82844100 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163364 Approved by: https://github.com/Yuzhen11	2025-09-20 06:33:20 +00:00
joshuamarkovic	559e8d1c20	[doc]: Small typos (#162982 ) Small typo fixes Pull Request resolved: https://github.com/pytorch/pytorch/pull/162982 Approved by: https://github.com/ezyang, https://github.com/zou3519	2025-09-16 17:42:19 +00:00
Animesh Jain	5805c4210b	[invoke_subgraph][inductor] Thread graphsafe rng input states for hops (#160713 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160713 Approved by: https://github.com/eellison	2025-08-21 20:41:29 +00:00
Edward Z. Yang	204eb4da5e	Add expanded_def option for FX printing, render descriptor, update tests (#158708 ) ---- - First, we add a new expanded_def to FX, which will expand the definitions of variables into multiple lines, one per variable definition. This makes extremely long args/return lists much more readable. - Next, we extend this mechanism to also print out descriptors on placeholders and return values, as comments, if available. This is how we will test descriptors. - We update tlparse for AOTAutograd to use this format. - We update expect tests to use this format and update their formats, so you can inspect what it can look at. There may be other tests I should update, open to suggestions. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/158708 Approved by: https://github.com/wconstab ghstack dependencies: #158624	2025-07-25 13:22:32 +00:00
Xuehai Pan	f903bc475c	[BE] add noqa for flake8 rule B036: found `except BaseException` without re-raising (#159043 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159043 Approved by: https://github.com/Skylion007	2025-07-25 02:56:34 +00:00
Menglu Yu	4657a84bc5	[Optimus][fp8_activation_quantization] Only log when there's some node to be quantized (#158129 ) Summary: We add an extra check on whether any node has been marked for quantization; otherwise we skip the quantization and the tlparse log. Rollback Plan: Differential Revision: D78173788 Pull Request resolved: https://github.com/pytorch/pytorch/pull/158129 Approved by: https://github.com/Skylion007, https://github.com/avicizhu	2025-07-15 19:22:26 +00:00
Xuehai Pan	7f14b42adf	[BE][2/16] fix typos in torch/ (torch/_*/) (#156312 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156312 Approved by: https://github.com/albanD	2025-07-12 05:47:06 +00:00
PyTorch MergeBot	e15f4248ad	Revert "[BE][2/16] fix typos in torch/ (torch/_*/) (#156312 )" This reverts commit `7a92b51196`. Reverted https://github.com/pytorch/pytorch/pull/156312 on behalf of https://github.com/XuehaiPan due to landrace ([comment](https://github.com/pytorch/pytorch/pull/156312#issuecomment-3064672250))	2025-07-12 04:40:52 +00:00
Xuehai Pan	7a92b51196	[BE][2/16] fix typos in torch/ (torch/_*/) (#156312 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156312 Approved by: https://github.com/albanD	2025-07-12 01:47:22 +00:00
Shuai Yang	6c42afe196	Introduce sync_cross_rank_decision (#156287 ) Summary: This is an improvement over `_broadcast_rank0_decision`, where we use rank0's decision and broadcast it to every rank. The issue with `_broadcast_rank0_decision` is that we observed large variance in peak memory usage. One cause is that different ranks receive different dynamically shaped tensors, and the hints of those tensors differ across ranks. If we rely only on rank0's decision and it is unlucky enough to get unrepresentative hints, then the decision it makes may not be suitable for other ranks. Here, we introduce `sync_cross_rank_decision`, which comes up with the decision after comparing all ranks' local decisions. It will: 1. all gather decisions from all ranks; 2. test each decision on the current rank and get its estimated memory usage; 3. all reduce estimated memory usage with ReduceOp.MAX, so that we know the maximum memory usage of each decision on all ranks; 4. pick the decision which gives us the minimum maximum memory usage. A graph to show more details: https://internalfb.com/excalidraw/EX484509 After applying sync_cross_rank_decision, we observed that the variance is much smaller. Rollback Plan: Differential Revision: D76714005 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156287 Approved by: https://github.com/fmassa, https://github.com/bdhirsh	2025-07-03 23:43:53 +00:00
Animesh Jain	22edb457c9	[invoke_subgraph][partitioner] Add meta val on run_and_save_rng ops (#157319 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/157319 Approved by: https://github.com/zou3519	2025-07-01 21:02:08 +00:00
IvanKobzarev	2f94f69b7c	[aotd] Support mutations of the same input in fw and bw (#155354 ) Original issue: https://github.com/pytorch/pytorch/issues/154820 The issue happens when there is a mutation of the same input in forward AND in backward. AOTD emitted copy_ after joint_function tracing, which made this fx node correspond to the side effects of both mutations (in forward and in backward). The partitioner could then put it in either forward or backward. The fix: 1/ Introduce joint_function.handle, which allows setting a "post_forward" callback so that we can check the state of the inputs after forward. We do not want to apply the mutation after joint if we already applied it in forward. For that we need "mutation_counter", and we memorize the version of the mutation that we applied for the forward mutation. 2/ Expose mutation_counter to Python. We want to keep the invariant that copy_ exists only at the end of the joint graph. 3/ We memorize mutation_counter and the state of the inputs after forward, using the post_forward handle. Emit post_forward mutations after the joint graph is fully traced. Add a "must_be_in_forward" tag to post_forward mutations (similar to the existing "must_be_in_backward") to keep them in forward. 4/ Ban recompute of the source of the mutation. Recompute can apply the same op (e.g. add) in forward and backward. For this, set MUST_SAVE for the source of the mutation in forward. proxy_tensor changes: By default, proxy tensor updates tensor_tracker. In this case applied mutations will be chained. But we want this copy_ to be independent and applied just to primals. For this, we introduce a context manager that can disable the update of tensor_tracker when adding forward mutations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155354 Approved by: https://github.com/bdhirsh	2025-06-26 14:05:54 +00:00
Xuehai Pan	162ca185ff	[BE][PYFMT] migrate PYFMT for `torch/_[a-h]*/` to `ruff format` (#144551 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144551 Approved by: https://github.com/ezyang ghstack dependencies: #148186	2025-06-25 06:16:06 +00:00
PyTorch MergeBot	e600e044a7	Revert "[aotd] Support mutations of the same input in fw and bw (#155354 )" This reverts commit `3f920f3d8f`. Reverted https://github.com/pytorch/pytorch/pull/155354 on behalf of https://github.com/malfet due to Not sure why CI was green, but it breaks tons of tests, see `930b575389/1` ([comment](https://github.com/pytorch/pytorch/pull/155354#issuecomment-2998780884))	2025-06-24 04:42:14 +00:00
Francisco Massa	ca5a40395d	[partitioner] Fix _broadcast_on_rank0 to use deterministic hash function (#153734 ) Summary: I was using python's hash, which is not deterministic across different interpreter runs. Use hashlib instead. Test Plan: Run using it https://www.internalfb.com/mlhub/pipelines/runs/mast/aps-rebase_sanity_128bs_8t_cc-8e17be61ce?job_attempt=1&version=0&tab=summary&env=prod Differential Revision: D74882405 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153734 Approved by: https://github.com/Microve	2025-06-24 00:06:23 +00:00
IvanKobzarev	3f920f3d8f	[aotd] Support mutations of the same input in fw and bw (#155354 ) Original issue: https://github.com/pytorch/pytorch/issues/154820 The issue happens when there is a mutation of the same input in forward AND in backward. AOTD emitted copy_ after joint_function tracing, which made this fx node correspond to the side effects of both mutations (in forward and in backward). The partitioner could then put it in either forward or backward. The fix: 1/ Introduce joint_function.handle, which allows setting a "post_forward" callback so that we can check the state of the inputs after forward. We do not want to apply the mutation after joint if we already applied it in forward. For that we need "mutation_counter", and we memorize the version of the mutation that we applied for the forward mutation. 2/ Expose mutation_counter to Python. We want to keep the invariant that copy_ exists only at the end of the joint graph. 3/ We memorize mutation_counter and the state of the inputs after forward, using the post_forward handle. Emit post_forward mutations after the joint graph is fully traced. Add a "must_be_in_forward" tag to post_forward mutations (similar to the existing "must_be_in_backward") to keep them in forward. 4/ Ban recompute of the source of the mutation. Recompute can apply the same op (e.g. add) in forward and backward. For this, set MUST_SAVE for the source of the mutation in forward. proxy_tensor changes: By default, proxy tensor updates tensor_tracker. In this case applied mutations will be chained. But we want this copy_ to be independent and applied just to primals. For this, we introduce a context manager that can disable the update of tensor_tracker when adding forward mutations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155354 Approved by: https://github.com/bdhirsh	2025-06-23 22:25:45 +00:00
Aaron Orenstein	54b8087f63	Improve torch.ops typing (#154555 ) Summary: Cloned https://github.com/pytorch/pytorch/pull/153558 from benjaminglass1 and fixed internal typing errors. Fixes longstanding issue where direct references to aten operations are seen as untyped by type checkers. This is accomplished by setting attributes on several classes more consistently, so that `__getattr__` can return a single type in all other cases. Decisions made along the way: 1. `torch.ops.higher_order` is now implemented by a single-purpose class. This was effectively true before, but the class implementing it attempted to be generalized unnecessarily. Fixing this simplified typing for the `_Ops` class. 2. `__getattr__` is only called when all other lookup methods have failed, so several constant special-cases in the function could be implemented as class variables. The remainder of this PR is fixing up all the bugs exposed by the updated typing, as well as all the nitpicky typing issues. Test Plan: CI Differential Revision: D75497142 Co-authored-by: Benjamin Glass <bglass@quansight.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/154555 Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/zou3519, https://github.com/benjaminglass1	2025-06-22 15:52:27 +00:00
Xuan Zhang	c2d1b225e6	[PT2][partitioners] raise getitems in partitioners to allow earlier release of buffers (#155809 ) Problem & Solution: Assume we have something like: ``` x = some_op(...) x0 = x[0] do_something_with_and_is_last_use_of(x0) do_a_bunch_of_other_things() x1 = x[1] ``` In this case, the memory associated with `x0` cannot be released until `x1 = x[1]`. Since `x1 = x[1]` does not use additional memory, it would be beneficial to move `x1 = x[1]` and all such `getitem` operations to be immediately after `x = some_op(...)`, such as ``` x = some_op(...) x0 = x[0] x1 = x[1] do_something_with_and_is_last_use_of(x0) do_a_bunch_of_other_things() ``` Results: For instance, for the `res2net101_26w_4s` model in pytorch benchmark, when running with `aot_eager` backend and with `activation_memory_budget=0.4`, the peak memory is * baseline: 7.73GiB * with the change: 6.45GiB As a sanity check, for the same setting with `inductor` backend, the peak memory is not regressed. cc and credit to @ShatianWang for noticing this issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155809 Approved by: https://github.com/fmassa, https://github.com/bdhirsh	2025-06-21 19:57:21 +00:00
PyTorch MergeBot	94f8679019	Revert "[PT2][partitioners] raise getitems in partitioners to allow earlier release of buffers (#155809 )" This reverts commit `6d3a4356f6`. Reverted https://github.com/pytorch/pytorch/pull/155809 on behalf of https://github.com/laithsakka due to pr_time_benchmarks ([comment](https://github.com/pytorch/pytorch/pull/155809#issuecomment-2985022572))	2025-06-18 16:52:19 +00:00
Xuan Zhang	6d3a4356f6	[PT2][partitioners] raise getitems in partitioners to allow earlier release of buffers (#155809 ) Problem & Solution: Assume we have something like: ``` x = some_op(...) x0 = x[0] do_something_with_and_is_last_use_of(x0) do_a_bunch_of_other_things() x1 = x[1] ``` In this case, the memory associated with `x0` cannot be released until `x1 = x[1]`. Since `x1 = x[1]` does not use additional memory, it would be beneficial to move `x1 = x[1]` and all such `getitem` operations to be immediately after `x = some_op(...)`, such as ``` x = some_op(...) x0 = x[0] x1 = x[1] do_something_with_and_is_last_use_of(x0) do_a_bunch_of_other_things() ``` Results: For instance, for the `res2net101_26w_4s` model in pytorch benchmark, when running with `aot_eager` backend and with `activation_memory_budget=0.4`, the peak memory is * baseline: 7.73GiB * with the change: 6.45GiB As a sanity check, for the same setting with `inductor` backend, the peak memory is not regressed. cc and credit to @ShatianWang for noticing this issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155809 Approved by: https://github.com/fmassa, https://github.com/bdhirsh ghstack dependencies: #155943	2025-06-18 14:38:55 +00:00
Xuan Zhang	eb2af14f8e	[PT2][partitioners] Add aten.split to view_ops list [relanding #155424 ] (#155943 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155943 Approved by: https://github.com/ShatianWang	2025-06-16 20:42:54 +00:00
PyTorch MergeBot	8372d0986a	Revert "[PT2][partitioners] Add aten.split to view_ops list (#155424 )" This reverts commit `e1db10e05a`. Reverted https://github.com/pytorch/pytorch/pull/155424 on behalf of https://github.com/clee2000 due to I think this broke inductor/test_cpu_repro.py::CPUReproTests::test_transpose_with_norm [GH job link](https://github.com/pytorch/pytorch/actions/runs/15596830833/job/43931044625) [HUD commit link](`e1db10e05a`) but idk how, reverting to see if it fixes the problem ([comment](https://github.com/pytorch/pytorch/pull/155424#issuecomment-2964717706))	2025-06-12 01:38:34 +00:00
Shatian Wang	e1db10e05a	[PT2][partitioners] Add aten.split to view_ops list (#155424 ) Summary: Add `aten.split` to view_ops list in partitioners.py Test Plan: na Rollback Plan: Differential Revision: D76011951 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155424 Approved by: https://github.com/xuanzhang816	2025-06-11 22:12:13 +00:00
Oguz Ulgen	d1947a8707	Migrate from lru_cache to cache (#155613 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155613 Approved by: https://github.com/ezyang ghstack dependencies: #155612	2025-06-11 19:44:18 +00:00
IvanKobzarev	0083032e75	[aotd] Support mutations in reordering_to_mimic_autograd_engine (#155353 ) Original issue: https://github.com/pytorch/pytorch/issues/154820 Dedicated sub-issue: https://github.com/pytorch/pytorch/issues/155242 The backward graph is reordered by reordering_to_mimic_autograd_engine in partitioners.py, which only records in the backward graph the compute that starts from tangents. A mutation of primals (inputs) in backward can be disconnected from backward. We handle this copy_ specifically, as we add this mutation in the framework and it is the only mutation that exists. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155353 Approved by: https://github.com/bdhirsh, https://github.com/zou3519	2025-06-09 16:39:47 +00:00
Aaron Gokaslan	bbda22e648	[BE][Ez]: Optimize unnecessary lambda with operator (#154722 ) Automated edits performed by FURB118. Operator is implemented in C and way faster when passed to another C method like sorted, max etc as a `key=` Pull Request resolved: https://github.com/pytorch/pytorch/pull/154722 Approved by: https://github.com/jansel	2025-05-30 23:47:10 +00:00
Menglu Yu	ba0a91b3ea	[4/n][Optimus][Auto-AC] Expose the config to skip the dynamo gaurds to avoid recompile (#154152 ) Summary: context: https://fb.workplace.com/groups/1075192433118967/permalink/1673720956599442/ Thanks to Microve for raising the existing dynamo skip API in D75196435. Dynamic shapes trigger recompilation, which increases compilation time, so we expose a config that lets users skip the dynamo guards to avoid the recompile. Note that it may quantize nodes unnecessarily, which can impact NE, QPS and memory saving; this needs verification. Differential Revision: D75248430 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154152 Approved by: https://github.com/bobrenjc93	2025-05-29 00:35:37 +00:00
Aaron Orenstein	6503b4a96e	Update to using mypy 1.15 (#154054 ) The BC break isn't real - mypy decided to start complaining about the way we were typing that function. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154054 Approved by: https://github.com/Skylion007	2025-05-24 04:30:57 +00:00
Menglu Yu	788d9cb2d7	[3/n][Optimus][Auto-AC][reland] Support any fp8 quantization type and set scaling as the default" (#154057 ) Summary: This is a reland of D74910193. We change the dtype to torch.float8_e5m2 in unit test since it is not supported. Test Plan: ``` buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/inductor:quantization ``` Differential Revision: D75169792 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154057 Approved by: https://github.com/Mingming-Ding	2025-05-22 18:26:34 +00:00
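
The four steps listed in the `sync_cross_rank_decision` commit (#156287) above can be sketched on a single process, with plain lists standing in for the all gather and the MAX all reduce. This is a minimal illustration only: `pick_minimax_decision` and `estimate_memory` are hypothetical names, not taken from the PyTorch source, and no `torch.distributed` calls are made.

```python
from typing import Callable, Sequence


def pick_minimax_decision(
    local_decisions: Sequence[str],
    estimate_memory: Callable[[int, str], int],
) -> str:
    """Simulate the cross-rank decision sync on one process (hypothetical helper)."""
    world_size = len(local_decisions)
    # Step 1 (stands in for all gather): every rank sees every local decision.
    candidates = list(local_decisions)
    # Step 2: each rank estimates its own memory usage for every candidate.
    per_rank = [
        [estimate_memory(rank, decision) for decision in candidates]
        for rank in range(world_size)
    ]
    # Step 3 (stands in for all reduce with MAX): worst case per candidate.
    worst_case = [max(column) for column in zip(*per_rank)]
    # Step 4: choose the candidate whose worst case is smallest.
    best = min(range(len(candidates)), key=worst_case.__getitem__)
    return candidates[best]


# Made-up estimates: decision "A" is cheap on rank 0 but expensive on rank 1.
costs = {(0, "A"): 10, (0, "B"): 12, (1, "A"): 30, (1, "B"): 14}
chosen = pick_minimax_decision(["A", "B"], lambda rank, d: costs[(rank, d)])
print(chosen)
```

With these made-up numbers, relying on rank 0 alone would favour "A", while the min-max rule selects "B", whose worst case across ranks is lower.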

214 Commits
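
The reordering described in the "raise getitems in partitioners" commit (#155809) above can be illustrated on a toy schedule. This is a sketch under stated assumptions, not the partitioner's actual code: `Node` and `raise_getitems` are hypothetical names, and the schedule is a plain list rather than an FX graph.

```python
from typing import NamedTuple


class Node(NamedTuple):
    # Hypothetical stand-in for a graph node: a name, an op kind, input names.
    name: str
    op: str
    inputs: tuple[str, ...]


def raise_getitems(nodes: list[Node]) -> list[Node]:
    """Move every getitem so it directly follows the node it indexes into."""
    names = {n.name for n in nodes}

    def is_raisable(n: Node) -> bool:
        # Only move getitems whose producer is part of this schedule.
        return n.op == "getitem" and n.inputs[0] in names

    # Group raisable getitems by producer, preserving their original order.
    by_producer: dict[str, list[Node]] = {}
    for n in nodes:
        if is_raisable(n):
            by_producer.setdefault(n.inputs[0], []).append(n)

    reordered: list[Node] = []

    def emit(n: Node) -> None:
        reordered.append(n)
        for g in by_producer.get(n.name, []):
            emit(g)  # also handles a getitem applied to another getitem

    for n in nodes:
        if not is_raisable(n):
            emit(n)
    return reordered


schedule = [
    Node("x", "some_op", ()),
    Node("x0", "getitem", ("x",)),
    Node("use_x0", "consume", ("x0",)),
    Node("other", "work", ()),
    Node("x1", "getitem", ("x",)),
]
raised = [n.name for n in raise_getitems(schedule)]
print(raised)
```

In the reordered schedule, `x1` sits directly after `x0`, mirroring the second code listing in the commit message.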