Commit Graph

143 Commits

Author SHA1 Message Date
nullplay
ac529df244 Native matmul (#157743)
### Implementation of #151705

This PR introduces the initial implementation of native `tl.dot` support in Inductor, with the goal of generating Triton matmul kernels directly—without relying on predefined templates.

To avoid complexity and ease the review process, I plan to split this work into two phases as outlined in #151705:

1. **Basic support** (this PR)
2. **Lazy broadcasting** for optimal performance (future PR)

### Summary of This PR

This PR implements the basic functionality. It does **not** include lazy broadcasting, so the generated kernels may involve explicit `tl.reshape` and `tl.trans` operations before calling `tl.dot`, which introduces some overhead.

### Notable Changes

1. Adds a new config flag: `config.triton.enable_native_matmul`
2. Introduces a new `ops.dot` IR node in Inductor and lowers `aten.mm` and `aten.bmm` to it when native matmul is enabled
3. Enforces tililng suitable for matmul when the native matmul flag is enabled
4. Implements code generation for `ops.dot`
5. Adds Triton autotuning heuristics: for now, I’ve copied the configuration from the existing matmul templates. However, this may not be optimal—it currently takes a long time to tune, and I think there must be a better way to tackle this.

@eellison @jansel @PaulZhang12 @shunting314

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157743
Approved by: https://github.com/jansel
2025-10-14 04:22:30 +00:00
Deng, Daisy
c9485f8ff3 [Reland][2/N]Port several test files under test/distributed to Intel GPU (#159473)
For https://github.com/pytorch/pytorch/issues/114850, we will port distributed tests to Intel GPU. This PR will work on some test files under test/distributed. We could enable Intel GPU with following methods and try the best to keep the original code styles:

- instantiate_device_type_tests()
- use "torch.accelerator.current_accelerator()" to determine the accelerator backend
- use requires_accelerator_dist_backend to allow both nccl and xccl test
- enabled XPU for some test path
- Change the hardcoded world_size according to device_count.
- Unify some common code under torch/testing/_internal for multiple backend, for example:
  Added xpu for Backend.backend_capability and dist.Backend.register_backend()

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159473
Approved by: https://github.com/guangyey, https://github.com/d4l3k
2025-09-17 06:42:27 +00:00
PyTorch MergeBot
92f9ed7ac3 Revert "[2/N]Port several test files under test/distributed to Intel GPU (#159473)"
This reverts commit fa1d409e83.

Reverted https://github.com/pytorch/pytorch/pull/159473 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break an distributed tests ([comment](https://github.com/pytorch/pytorch/pull/159473#issuecomment-3282999084))
2025-09-11 23:51:21 +00:00
Deng, Daisy
fa1d409e83 [2/N]Port several test files under test/distributed to Intel GPU (#159473)
For https://github.com/pytorch/pytorch/issues/114850, we will port distributed tests to Intel GPU. This PR will work on some test files under test/distributed. We could enable Intel GPU with following methods and try the best to keep the original code styles:

- instantiate_device_type_tests()
- use "torch.accelerator.current_accelerator()" to determine the accelerator backend
- use requires_accelerator_dist_backend to allow both nccl and xccl test
- enabled XPU for some test path
- Change the hardcoded world_size according to device_count.
- Unify some common code under torch/testing/_internal for multiple backend, for example:
  Added xpu for Backend.backend_capability and dist.Backend.register_backend()

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159473
Approved by: https://github.com/guangyey, https://github.com/d4l3k
2025-09-11 06:44:26 +00:00
PyTorch MergeBot
ff2de5d522 Revert "[2/N]Port several test files under test/distributed to Intel GPU (#159473)"
This reverts commit 040d00af04.

Reverted https://github.com/pytorch/pytorch/pull/159473 on behalf of https://github.com/jeanschmidt due to Seems to be breaking internal signals, @d4l3k please help the author to have this change landed. [D81718444](https://www.internalfb.com/diff/D81718444) ([comment](https://github.com/pytorch/pytorch/pull/159473#issuecomment-3264046983))
2025-09-07 21:06:38 +00:00
Deng, Daisy
040d00af04 [2/N]Port several test files under test/distributed to Intel GPU (#159473)
For https://github.com/pytorch/pytorch/issues/114850, we will port distributed tests to Intel GPU. This PR will work on some test files under test/distributed. We could enable Intel GPU with following methods and try the best to keep the original code styles:

- instantiate_device_type_tests()
- use "torch.accelerator.current_accelerator()" to determine the accelerator backend
- use requires_accelerator_dist_backend to allow both nccl and xccl test
- enabled XPU for some test path
- Change the hardcoded world_size according to device_count.
- Unify some common code under torch/testing/_internal for multiple backend, for example:
  Added xpu for Backend.backend_capability and dist.Backend.register_backend()

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159473
Approved by: https://github.com/guangyey, https://github.com/d4l3k
2025-09-04 12:53:17 +00:00
PyTorch MergeBot
14d67eec05 Revert "[dynamo][fsdp] Consistent behavior of int attributes (#157262)"
This reverts commit 9b4d938f04.

Reverted https://github.com/pytorch/pytorch/pull/157262 on behalf of https://github.com/ZainRizvi due to This was reverted internally. Somehow this PR didn't get reverted alongside it. See D78772867. To validate your fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/157262#issuecomment-3128148475))
2025-07-28 16:58:27 +00:00
Animesh Jain
9b4d938f04 [dynamo][fsdp] Consistent behavior of int attributes (#157262)
Reimpl of https://github.com/pytorch/pytorch/pull/150954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157262
Approved by: https://github.com/bdhirsh
2025-07-22 11:26:54 +00:00
Xuehai Pan
0d17029fea [BE][6/6] fix typos in test/ (test/distributed/) (#157640)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157640
Approved by: https://github.com/yewentao256, https://github.com/malfet
2025-07-11 14:09:37 +00:00
PyTorch MergeBot
e517066f41 Revert "[dynamo][fsdp] Consistent behavior of int attributes (#157262)"
This reverts commit 178fe7aa98.

Reverted https://github.com/pytorch/pytorch/pull/157262 on behalf of https://github.com/huydhn due to This fails some internal tests and needs to be relanded ([comment](https://github.com/pytorch/pytorch/pull/157262#issuecomment-3059463896))
2025-07-10 23:11:18 +00:00
Animesh Jain
178fe7aa98 [dynamo][fsdp] Consistent behavior of int attributes (#157262)
Reimpl of https://github.com/pytorch/pytorch/pull/150954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157262
Approved by: https://github.com/bdhirsh
2025-07-08 22:11:33 +00:00
PyTorch MergeBot
8c0df6fe17 Revert "[dynamo][fsdp] Consistent behavior of int attributes (#157262)"
This reverts commit 42b48ee672.

Reverted https://github.com/pytorch/pytorch/pull/157262 on behalf of https://github.com/jeanschmidt due to Newly introduced tests are red in internal runs, check D77593713 ([comment](https://github.com/pytorch/pytorch/pull/157262#issuecomment-3026944993))
2025-07-02 08:30:39 +00:00
Animesh Jain
42b48ee672 [dynamo][fsdp] Consistent behavior of int attributes (#157262)
Reimpl of https://github.com/pytorch/pytorch/pull/150954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157262
Approved by: https://github.com/bdhirsh
2025-06-30 22:32:52 +00:00
Edward Z. Yang
17eb649d55 Implement guard collectives (optimized version) (#156562)
This is a remix of https://github.com/pytorch/pytorch/pull/155558

Instead of mediating guard collective via a config option, in this one it's done via a `set_stance` like API. The motivation is that checking for the config value on entry on torch.compile is apparently quite expensive, according to functorch_maml_omniglot. So this makes it a bit cheaper.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156562
Approved by: https://github.com/Microve
2025-06-24 04:59:49 +00:00
Xuehai Pan
6d5c789ad5 [BE][PYFMT] migrate PYFMT for test/[a-h]*/ to ruff format (#144555)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144555
Approved by: https://github.com/ezyang
ghstack dependencies: #144551, #144554
2025-06-24 04:53:54 +00:00
Oguz Ulgen
a2a75be0f8 Rename inductor cache (#156128)
Requested by Simon on a different PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156128
Approved by: https://github.com/xmfan
2025-06-17 03:57:18 +00:00
PyTorch MergeBot
190f76fa31 Revert "Implement guard collectives (#155558)"
This reverts commit 5a5a05a6a3.

Reverted https://github.com/pytorch/pytorch/pull/155558 on behalf of https://github.com/malfet due to Hmm, may be I'm looking at the wrong metric, but c92f1075aa/1 shows that test started to pass after PR were reverted ([comment](https://github.com/pytorch/pytorch/pull/155558#issuecomment-2978337152))
2025-06-16 22:26:52 +00:00
Edward Z. Yang
5a5a05a6a3 Implement guard collectives (#155558)
When running a distributed job with compiler collectives enabled, if one rank recompiles while others do not, this leads to a deadlock (as not everyone will rendezvous with the compiler collective from the recompile). Although there aren't any convenient ways to cheaply solve this problem, if you are willing to force everyone to sync when evaluating guards, you can just force everyone to recompile if anyone requires a recompile. So the way guard collectives work is:

1. Perform compiled code lookup (evaluating guards)
2. Run a collective, communicating if you found a compiled code or not
3. If anyone requires recompile, force everyone to recompile

One current deficiency in the implementation is we can't conveniently track the time it takes to run this collective.

I need to test if we actually successfully are running the collective on a separate stream, or if we have to wait for user collectives to all finish.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155558
Approved by: https://github.com/Microve
2025-06-16 19:46:16 +00:00
PyTorch MergeBot
61b271e0f3 Revert "Implement guard collectives (#155558)"
This reverts commit 38e5e81e55.

Reverted https://github.com/pytorch/pytorch/pull/155558 on behalf of https://github.com/atalman due to Breaks CI, sorry: [GH job link](https://github.com/pytorch/pytorch/actions/runs/15683161593/job/44181274826) [HUD commit link](38e5e81e55) ([comment](https://github.com/pytorch/pytorch/pull/155558#issuecomment-2977871178))
2025-06-16 19:40:46 +00:00
Edward Z. Yang
38e5e81e55 Implement guard collectives (#155558)
When running a distributed job with compiler collectives enabled, if one rank recompiles while others do not, this leads to a deadlock (as not everyone will rendezvous with the compiler collective from the recompile). Although there aren't any convenient ways to cheaply solve this problem, if you are willing to force everyone to sync when evaluating guards, you can just force everyone to recompile if anyone requires a recompile. So the way guard collectives work is:

1. Perform compiled code lookup (evaluating guards)
2. Run a collective, communicating if you found a compiled code or not
3. If anyone requires recompile, force everyone to recompile

One current deficiency in the implementation is we can't conveniently track the time it takes to run this collective.

I need to test if we actually successfully are running the collective on a separate stream, or if we have to wait for user collectives to all finish.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155558
Approved by: https://github.com/Microve
2025-06-16 14:09:14 +00:00
Boyuan Feng
38410cf9b5 Fix DDPOptimizer issue on static tensor index (#155746)
We rely on `_try_get_metadata_from_dynamo()` to get static input indices. When the meta info is missing, it just returns an empty list of static input indices. This wrong list of static input indices lead to repeated cudagraph re-recording, which looks like a hang from the user perspective. bc3972b80a/torch/_functorch/aot_autograd.py (L1025-L1031)

The root cause is `split_module` in DDP Optimizer loses meta info and gm attributes. This PR fixes the issue by propagating these metadata from original module to submodules.
bc3972b80a/torch/_dynamo/backends/distributed.py (L515-L517)

Fixes #140395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155746
Approved by: https://github.com/xmfan, https://github.com/bdhirsh
2025-06-14 00:15:58 +00:00
William Wen
1f29190b59 [dynamo] unimplemented -> unimplemented_v2 in variables/builtin.py (#151145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151145
Approved by: https://github.com/Skylion007, https://github.com/StrongerXi, https://github.com/jansel, https://github.com/zou3519
2025-04-16 17:16:05 +00:00
clr
4e194bbfd6 dynamo: fsdp throw unimplemented vs attribute error (#146188)
Rather than throw a full exception for fsdp, instead just return unimplemented,
and respect the user options (i.e. fullgraph, vs graph break).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146188
Approved by: https://github.com/jansel
2025-02-04 21:45:55 +00:00
Aaron Orenstein
99dbc5b0e2 PEP585 update - test (#145176)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145176
Approved by: https://github.com/bobrenjc93
2025-01-22 04:48:28 +00:00
Tom Ritchford
d25e6e623f Fix unused Python variables in test/[a-d]* (#134665)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134665
Approved by: https://github.com/albanD
2024-12-13 22:13:12 +00:00
Yuanhao Ji
67ba79676f [Dynamo] Replace torch._dynamo.optimize() with torch.compile() [7/N] (#140922)
related commits:

- #139706
- #140238
- #140247
- #140253
- #140663
- #140688
- #140922
- #140924
- #140933

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140922
Approved by: https://github.com/williamwen42
2024-12-06 07:07:29 +00:00
Brian Hirsh
471017cbc9 avoid specializing strides with DDPOptimizer + inductor (#140751)
Fixes https://github.com/pytorch/pytorch/issues/140229

Fixes https://github.com/pytorch/pytorch/issues/139474

The issue was that:

(1) DDPOptimizer has some logic to partition the dynamo graph into buckets, and run AOTAutograd/inductor on each bucket

(2) doing so requires knowing the **exact** strides of the outputs of each subgraph, so we can have example inputs (with correct strides) to each of the later subgraphs to compile with

(3) there is some existing logic to do this today: we have a `fakify_first_call` flag in AOTAutograd that lets you run it with fake tensor inputs (to handle the calling convention changes that AOTAutograd performs at runtime). During this process, we query inductor for the output strides that it compiled with

(4) these outputs strides are stored in the FX graph cache as raw strings of sympy expressions. We have a function, `evaluate_symexpr`, which given the sympy string, and the ShapeEnv's `var_to_val` mapping, will evaluate the sympy string to generate concrete strides

(5) evaluating this expression will specialize on the exact values of any variables in our shape env, however. In DDPOptimizer, we want to know what inductor's stride outputs are symbolically. This requires converting the (string) sympy expression into actual `SymInts` that we can return.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140751
Approved by: https://github.com/eellison
2024-12-05 03:41:12 +00:00
Yanbo Liang
5f004f455a [Dynamo][Distributed] Fix ProcessGroup getattr (#141638)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141638
Approved by: https://github.com/williamwen42, https://github.com/jansel
2024-11-27 21:42:33 +00:00
chilli
392221b390 Made DDPOptimizer work with HOPs (#138787)
Fixes https://github.com/pytorch/pytorch/issues/137481

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138787
Approved by: https://github.com/yf225
ghstack dependencies: #138733, #138794, #138881
2024-10-25 18:59:01 +00:00
zeshengzong
e374d6850a [distributed][test] Remove unused variable and fix doc typo (#136943)
Refactor distributed test code:
- Fix TODO: Remove unused variable
- Fix doc typo
- Migrate deprecated method call `load_state_dict` and `save_state_dict`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136943
Approved by: https://github.com/H-Huang
2024-10-02 08:31:53 +00:00
Jez Ng
71aac59e93 Add Triton CPU as an Inductor backend (#133408)
The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend.

Differential Revision: [D63298968](https://our.internmc.facebook.com/intern/diff/D63298968)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133408
Approved by: https://github.com/jansel, https://github.com/blaine-rister, https://github.com/malfet
2024-09-30 20:24:52 +00:00
PyTorch MergeBot
36428f91e9 Revert "Add Triton CPU as an Inductor backend (#133408)"
This reverts commit 31c0467594.

Reverted https://github.com/pytorch/pytorch/pull/133408 on behalf of https://github.com/int3 due to internal tests failing ([comment](https://github.com/pytorch/pytorch/pull/133408#issuecomment-2379692517))
2024-09-27 16:54:27 +00:00
Jez Ng
31c0467594 Add Triton CPU as an Inductor backend (#133408)
The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend.

Differential Revision: [D63298968](https://our.internmc.facebook.com/intern/diff/D63298968)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133408
Approved by: https://github.com/jansel, https://github.com/blaine-rister, https://github.com/malfet
2024-09-26 15:35:26 +00:00
PyTorch MergeBot
d0cebedb31 Revert "Add Triton CPU as an Inductor backend (#133408)"
This reverts commit e498b02b47.

Reverted https://github.com/pytorch/pytorch/pull/133408 on behalf of https://github.com/jeanschmidt due to Broke internal signals, see D62737208 for more details ([comment](https://github.com/pytorch/pytorch/pull/133408#issuecomment-2353623816))
2024-09-16 18:33:33 +00:00
Jez Ng
e498b02b47 Add Triton CPU as an Inductor backend (#133408)
The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133408
Approved by: https://github.com/jansel
2024-09-14 21:45:19 +00:00
Animesh Jain
693897df42 [dynamo] Missing guard source keys for corner case of NNModuleVariabl… (#135041)
Potentially fixes - https://fb.workplace.com/groups/1286739428954016/permalink/1319662695661689/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135041
Approved by: https://github.com/ezyang
2024-09-11 00:43:26 +00:00
Edward Z. Yang
386b313028 Handle KeyError for compiler collective in scalars too (#135385)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135385
Approved by: https://github.com/jansel
2024-09-10 12:33:04 +00:00
Edward Z. Yang
d6b9bd3e60 Also handle compiler collective when input variable doesn't exist on all ranks (#135147)
Internal xref:
https://fb.workplace.com/groups/3095840833991792/permalink/3810738595835342/

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135147
Approved by: https://github.com/jansel
2024-09-06 13:18:36 +00:00
Pian Pawakapan
5a0e7a408f restore CSE'd node metadata in runtime asserts pass (#134516)
Adds val, and optionally stack_trace & nn_module_stack metadata back to SymInt compute nodes that we CSE, with a hook on `graph.create_node()`. Not sure if there's other metadata we want to populate here?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134516
Approved by: https://github.com/ezyang
2024-09-05 07:50:04 +00:00
Animesh Jain
32f45f01a9 [dynamo] Retire CompileProfiler (#135133)
Fixes confusion in https://github.com/pytorch/pytorch/issues/113443

We have TORCH_LOGS that supersedes CompileProfiler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135133
Approved by: https://github.com/ezyang
ghstack dependencies: #135039, #135121, #135129, #135130
2024-09-05 01:08:40 +00:00
PyTorch MergeBot
c88c19c6de Revert "restore CSE'd node metadata in runtime asserts pass (#134516)"
This reverts commit 1dfb105239.

Reverted https://github.com/pytorch/pytorch/pull/134516 on behalf of https://github.com/pianpwk due to breaking NestedTensor test ([comment](https://github.com/pytorch/pytorch/pull/134516#issuecomment-2329738450))
2024-09-04 18:41:21 +00:00
Pian Pawakapan
1dfb105239 restore CSE'd node metadata in runtime asserts pass (#134516)
Adds val, and optionally stack_trace & nn_module_stack metadata back to SymInt compute nodes that we CSE, with a hook on `graph.create_node()`. Not sure if there's other metadata we want to populate here?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134516
Approved by: https://github.com/ezyang
2024-09-04 05:56:28 +00:00
Edward Z. Yang
15c25c4580 Fix dim mismatch logic automatic dynamic not working with compiler collectives (#135025)
Fixes
https://fb.workplace.com/groups/3095840833991792/permalink/3810738595835342/

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135025
Approved by: https://github.com/albanD
2024-09-04 01:50:21 +00:00
Oguz Ulgen
0063e56949 Make FX Graph Cache work with distributed training (#133374)
During distributed training if all ranks except one hit the cache, the rank that did not hit the cache will cause a NCCL timeout since rest of the ranks will enter the collective and start the timer. This PR uses the new PTD API to increase timeout for the ranks that hit the cache by the amount of time the cache would save.

Differential Revision: [D61363722](https://our.internmc.facebook.com/intern/diff/D61363722)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133374
Approved by: https://github.com/ezyang
2024-08-16 18:51:14 +00:00
PyTorch MergeBot
07adae3dac Revert "Make FX Graph Cache work with distributed training (#133374)"
This reverts commit dcdb25453e.

Reverted https://github.com/pytorch/pytorch/pull/133374 on behalf of https://github.com/albanD due to Broke trunk ([comment](https://github.com/pytorch/pytorch/pull/133374#issuecomment-2291289260))
2024-08-15 13:43:16 +00:00
Oguz Ulgen
dcdb25453e Make FX Graph Cache work with distributed training (#133374)
During distributed training if all ranks except one hit the cache, the rank that did not hit the cache will cause a NCCL timeout since rest of the ranks will enter the collective and start the timer. This PR uses the new PTD API to increase timeout for the ranks that hit the cache by the amount of time the cache would save.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133374
Approved by: https://github.com/ezyang
ghstack dependencies: #133362, #133363
2024-08-14 22:58:48 +00:00
Oguz Ulgen
63e5b09218 Add unit test for asymmetric compilation (#133363)
Unit test for asymmetric compilation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133363
Approved by: https://github.com/jamesjwu
ghstack dependencies: #133362
2024-08-14 19:32:18 +00:00
Edward Z. Yang
296c339f98 Ensure compiler collective is called even when no graph is compiled (#132163)
It's very important to make sure we always run the compiler collective, because if we don't, we will fail to apply automatic dynamic at all.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132163
Approved by: https://github.com/jansel
2024-08-02 16:31:54 +00:00
PyTorch MergeBot
9eeb5eebab Revert "Ensure compiler collective is called even when no graph is compiled (#132163)"
This reverts commit 0d9c9716b2.

Reverted https://github.com/pytorch/pytorch/pull/132163 on behalf of https://github.com/ezyang due to test_correct_module_names ([comment](https://github.com/pytorch/pytorch/pull/132163#issuecomment-2265729449))
2024-08-02 16:16:31 +00:00
Edward Z. Yang
0d9c9716b2 Ensure compiler collective is called even when no graph is compiled (#132163)
It's very important to make sure we always run the compiler collective, because if we don't, we will fail to apply automatic dynamic at all.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132163
Approved by: https://github.com/jansel
2024-08-02 12:18:34 +00:00