Commit Graph

33948 Commits

Author SHA1 Message Date
Laith Sakka
39df901b2a introduce definitely_contiguous and use it for reshape and tensor meta data computation. (#153432)
When a tensor has unbacked symbols, it can be general enough to represent both contiguous and non-contiguous tensors.
In that case we can't really evaluate is_contiguous. In many places in the code base, we check is_contiguous to take a fast path, but the general path usually works for both contiguous and non-contiguous tensors, so in that case we probably want
to use the definitely_contiguous API.

This is applied to reshape in this PR and also to tensor metadata computation: the metadata will now have an attribute that says the tensor is contiguous when it is always contiguous. We now store that only if definitely_contiguous is true.
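
The difference between "is contiguous" and "definitely contiguous" can be sketched in plain Python, with `None` standing in for an unbacked value. This is only an illustration under that assumption: the helper names and the `None` convention are hypothetical and do not reproduce PyTorch's implementation.

```python
from typing import Optional, Sequence

Dim = Optional[int]  # None models an unbacked (unknown) size or stride


def contiguity(sizes: Sequence[Dim], strides: Sequence[Dim]) -> Optional[bool]:
    """Return True/False when row-major contiguity is provable, None otherwise.

    Deliberately conservative: any unknown value it has to look at yields None.
    """
    if any(size == 0 for size in sizes):
        return True  # a tensor with no elements is trivially contiguous
    expected = 1  # stride a row-major layout needs for the current dimension
    for size, stride in zip(reversed(sizes), reversed(strides)):
        if size is None or stride is None:
            return None  # the answer depends on a value we do not know
        if size != 1 and stride != expected:
            return False
        expected *= size
    return True


def definitely_contiguous(sizes: Sequence[Dim], strides: Sequence[Dim]) -> bool:
    """True only when contiguity is proven; 'unknown' falls back to False."""
    return contiguity(sizes, strides) is True


known = definitely_contiguous([2, 3], [3, 1])        # provably contiguous
unknown = definitely_contiguous([2, 3], [None, 1])   # cannot be proven
```

A caller that uses `definitely_contiguous` to pick a fast path simply takes the general path whenever the answer is unknown, which is safe as long as the general path handles both layouts.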

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153432
Approved by: https://github.com/bobrenjc93
2025-05-28 03:41:26 +00:00
Sidharth
54f1f29fed [dynamo] dynamic gb_type -> static gb_type (#154435)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154435
Approved by: https://github.com/williamwen42
2025-05-28 03:14:26 +00:00
ZhiweiYan-96
f12ce4e36b [Intel GPU] convolution fusion at XPU backend (#154202)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154202
Approved by: https://github.com/EikanWang, https://github.com/guangyey, https://github.com/etaf
ghstack dependencies: #140365
2025-05-28 03:14:18 +00:00
FFFrog
c6fc11af76 Fix the Problems About Defining Static Variable in Inline Function (#147095)
Refer to https://github.com/pytorch/pytorch/issues/125465 for more information.

- Remove unused header files
- Move the inline function that defines the static variable to .cc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147095
Approved by: https://github.com/cyyever, https://github.com/albanD
2025-05-28 02:47:16 +00:00
Bin Bao
a84d8c4a1c [AOTI] Support multi-arch when using package_cpp_only (#154414)
Summary: Add support of multi_arch_kernel_binary in the package_cpp_only mode. More specifically, generate specific cmake targets to compile .ptx to .fatbin and embed them in the final shared library or binary.

Differential Revision: [D75452096](https://our.internmc.facebook.com/intern/diff/D75452096)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154414
Approved by: https://github.com/angelayi
ghstack dependencies: #154412, #154413
2025-05-28 01:20:38 +00:00
Bin Bao
cde82d25b7 [AOTI] Add a multi_arch_kernel_binary option (#154413)
Summary: CUDA can support multi-arch with the fatbin format. Add this multi_arch_kernel_binary option, so the compiled model binary can run across different GPU archs.

Differential Revision: [D75452094](https://our.internmc.facebook.com/intern/diff/D75452094)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154413
Approved by: https://github.com/angelayi
ghstack dependencies: #154412
2025-05-28 01:20:38 +00:00
Bin Bao
4d8f3d537a [AOTI][refactor] Rename embed_cubin to embed_kernel_binary (#154412)
Summary: Rename as it is not CUDA specific.

Differential Revision: [D75452095](https://our.internmc.facebook.com/intern/diff/D75452095)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154412
Approved by: https://github.com/angelayi
2025-05-28 01:20:28 +00:00
iupaikov-amd
3f10c9d8af Fixed an issue with XPU skip so the test_decompose_mem_bound_mm.py suite can be ran correctly (#153245)
Fixes #153239

Replaced the custom decorator with the common one, although a better way to skip the whole suite would be to add it to the skip list in run_test.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153245
Approved by: https://github.com/jeffdaily
2025-05-27 23:10:25 +00:00
Guilherme Leobas
7359705232 Add CPython tests for unittest (#150788)
Tests:
* test_assertions.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150788
Approved by: https://github.com/williamwen42
2025-05-27 20:26:17 +00:00
Guilherme Leobas
12fc06d267 Add CPython complex tests (#152015)
Tests:
* test_complex.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152015
Approved by: https://github.com/williamwen42
2025-05-27 20:24:28 +00:00
Guilherme Leobas
3b218e56dc Add CPython tests for iter/sort (#150797)
Tests:
* test_iter.py
* test_sort.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150797
Approved by: https://github.com/williamwen42
2025-05-27 20:22:34 +00:00
Ryan Guo
75bbd4989c [dynamo] Support using symint from dispatcher-style tensor subclass (#154130)
Fixes #146932.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154130
Approved by: https://github.com/laithsakka
2025-05-27 19:05:46 +00:00
ZhiweiYan-96
0ddfd1ed43 [Intel GPU] Enable mkdnn._linear_pointwise at XPU backend (#140365)
# Motivation

This PR adds post-op fusion support for Linear. The linear-pointwise fusion is expected to be used in graph mode, e.g. with torch.compile. The FusionUtils.cpp file defines utility APIs for generating primitive attributes. These APIs will also be used for conv-pointwise fusion, which is in #140372.

# Validation
```bash
   python test/xpu/test_fusion.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140365
Approved by: https://github.com/etaf, https://github.com/guangyey, https://github.com/EikanWang
2025-05-27 15:57:15 +00:00
Xuehai Pan
7ae204c3b6 [BE][CI][Easy] Run lintrunner on generated .pyi stub files (#150732)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150732
Approved by: https://github.com/malfet, https://github.com/cyyever, https://github.com/aorenste
2025-05-27 14:58:02 +00:00
PyTorch MergeBot
11a51a11af Revert "introduce definitely_contiguous and use it for reshape and tensor meta data computation. (#153432)"
This reverts commit 5c6d7caaaa.

Reverted https://github.com/pytorch/pytorch/pull/153432 on behalf of https://github.com/malfet due to Looks like it broke flex attention tests, see https://hud.pytorch.org/hud/pytorch/pytorch/main/1?per_page=50&name_filter=g6.4xlarge&mergeEphemeralLF=true ([comment](https://github.com/pytorch/pytorch/pull/153432#issuecomment-2912562570))
2025-05-27 13:42:34 +00:00
Laith Sakka
5c6d7caaaa introduce definitely_contiguous and use it for reshape and tensor meta data computation. (#153432)
When a tensor has unbacked symbols, it can be general enough to represent both contiguous and non-contiguous tensors.
In that case we can't really evaluate is_contiguous. In many places in the code base, we check is_contiguous to take a fast path, but the general path usually works for both contiguous and non-contiguous tensors, so in that case we probably want
to use the definitely_contiguous API.

This is applied to reshape in this PR and also to tensor metadata computation: the metadata will now have an attribute that says the tensor is contiguous when it is always contiguous. We now store that only if definitely_contiguous is true.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153432
Approved by: https://github.com/bobrenjc93
2025-05-27 08:54:31 +00:00
Georgia Phillips
f8010e7b93 [nativert] Move file_util to pytorch core (#153162)
Summary: fbcode//sigmoid/core/common -> fbcode//caffe2/torch/nativert/common

Test Plan: Github CI

Differential Revision: D74328089

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153162
Approved by: https://github.com/zhxchen17
2025-05-27 03:42:47 +00:00
Will Feng
100ec0b34a [Inductor] Allow passing in custom lowering dict to register_lowering() (#154344)
This PR adds support for passing a custom lowering dict to `register_lowering()`, which allows systems that use Inductor (e.g. Helion, https://github.com/pytorch-labs/helion/pull/80) to maintain their own lowering dict instead of using the Inductor global `lowerings` dict.
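
The general pattern, a registration decorator that writes to a caller-supplied table instead of a global one, can be sketched as follows. This is a standalone illustration, not Inductor's code: `GLOBAL_LOWERINGS`, the `lowering_dict` parameter name, and the string op names are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional

# Stand-in for a process-wide registry (hypothetical; not Inductor's table).
GLOBAL_LOWERINGS: Dict[str, Callable] = {}


def register_lowering(
    op_name: str, lowering_dict: Optional[Dict[str, Callable]] = None
) -> Callable[[Callable], Callable]:
    """Return a decorator that records a function as the lowering for op_name."""
    target = GLOBAL_LOWERINGS if lowering_dict is None else lowering_dict

    def decorator(fn: Callable) -> Callable:
        target[op_name] = fn
        return fn

    return decorator


# A downstream system keeps its own table and leaves the global one untouched.
my_lowerings: Dict[str, Callable] = {}


@register_lowering("add", lowering_dict=my_lowerings)
def lower_add(a: str, b: str) -> str:
    return f"({a} + {b})"
```

Defaulting to the global table keeps existing call sites working, while callers that pass their own dict get an isolated registry.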

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154344
Approved by: https://github.com/jansel
2025-05-27 01:35:26 +00:00
Randolf Scholz
839c9c6156 Use property instead of ClassVar for Uniform.arg_constraints and Wishart.arg_constraints (#154361)
Fixes #154355

For these two distributions, the constraints depend on the actual values, and so `arg_constraints` cannot be a `ClassVar`.
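
Why a class-level attribute cannot express this is easier to see in a toy version. The classes below are hypothetical stand-ins written for this sketch; they are not the `torch.distributions` classes.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LessThan:
    upper: float

    def check(self, value: float) -> bool:
        return value < self.upper


@dataclass(frozen=True)
class GreaterThan:
    lower: float

    def check(self, value: float) -> bool:
        return value > self.lower


class ToyUniform:
    """Toy distribution whose valid `low` depends on this instance's `high`."""

    def __init__(self, low: float, high: float) -> None:
        self.low = low
        self.high = high

    @property
    def arg_constraints(self) -> Dict[str, object]:
        # Built per instance, because the bounds refer to instance values.
        return {"low": LessThan(self.high), "high": GreaterThan(self.low)}


narrow = ToyUniform(0.0, 1.0)
wide = ToyUniform(0.0, 10.0)
```

Here `narrow.arg_constraints["low"]` and `wide.arg_constraints["low"]` differ, which a single shared class variable could not represent.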

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154361
Approved by: https://github.com/Skylion007
2025-05-26 17:48:28 +00:00
PyTorch MergeBot
3f64502c98 Revert "Re-enable FakeTensor caching for SymInts (#152662)"
This reverts commit 7d11c61c26.

Reverted https://github.com/pytorch/pytorch/pull/152662 on behalf of https://github.com/malfet due to Looks like it broke bunch of inductor tests, see 187d38185e/1 ([comment](https://github.com/pytorch/pytorch/pull/152662#issuecomment-2910293593))
2025-05-26 17:13:22 +00:00
Narek Malkhasyan
21e42c5d62 More descriptive error message for torch.nanmean() with complex dtypes (#153252)
Fixes #153132

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153252
Approved by: https://github.com/colesbury
2025-05-26 05:42:57 +00:00
Aaron Orenstein
7d11c61c26 Re-enable FakeTensor caching for SymInts (#152662)
Summary:

This backs out D60320595 which itself turned off FakeTensor caching when a SymInt was present.

There have been a lot of dynamic shape fixes this year and the tests pass, so I'm assuming some of that work fixed what was breaking previously.

Test Plan: Reran the tests listed in T196779132 and they pass.

## Perf
### Instruction Counter Benchmark:
- 26% win on add_loop_eager_dynamic
- 13% win on add_loop_inductor_dynamic_gpu
### Perf Dashboard
Compilation Latency wins across the board but especially strong on the dynamic tests (like cudagraphs_dynamic) - for example MobileBertForMaskedLM went from 66s -> 50s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152662
Approved by: https://github.com/anijain2305
2025-05-26 04:17:56 +00:00
Ke Wen
062387fb53 [SymmMem] Speed up tests (#153677)
Use `MultiProcContinousTest` to avoid re-creating the ProcessGroup in each test instance.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153677
Approved by: https://github.com/fegin, https://github.com/Skylion007, https://github.com/ngimel
ghstack dependencies: #153653
2025-05-26 03:39:11 +00:00
Ke Wen
8c16d0e404 [c10d] Add support for testing SIGABRT return (#153167)
`SIGABRT` is a common return from *negative* distributed tests, which check the effectiveness of NaN asserts, watchdog throws, etc.

These errors are not detectable by traditional statements like `with self.assertRaises(RuntimeError)`.

Instead, we'd need to check for the process's return code, e.g. `SIGABRT(6)` would have a return code of -6.
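
The return-code convention can be observed with the standard library alone. The sketch below assumes a POSIX system and is independent of the c10d test utilities this PR touches; `run_child` is a helper name made up for the example.

```python
import signal
import subprocess
import sys


def run_child(code: str) -> int:
    """Run `code` in a fresh Python process and return its exit status."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        stderr=subprocess.DEVNULL,  # keep any abort message out of the output
    )
    return completed.returncode


# os.abort() raises SIGABRT in the child, so no Python exception ever
# reaches the parent; only the exit status shows what happened.
returncode = run_child("import os; os.abort()")

# On POSIX, subprocess reports death by signal as the negated signal number.
died_from_sigabrt = returncode == -signal.SIGABRT
```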

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153167
Approved by: https://github.com/fduwjj
2025-05-26 00:56:05 +00:00
Natalia Gimelshein
b04852e404 Fix deterministic indexing with broadcast (#154296)
Fixes #79987, now for real.
Also removed thrust sort path that was needed for cuda <=11.2 because we no longer support it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154296
Approved by: https://github.com/soumith
2025-05-25 21:14:50 +00:00
Justin Chu
c3100067ae [ONNX] Update onnx to 1.18 (#153746)
Update onnx python package to 1.18.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153746
Approved by: https://github.com/titaiwangms, https://github.com/cyyever, https://github.com/malfet
2025-05-25 20:58:47 +00:00
Laith Sakka
43b2716e89 PYFMT lint grandfathered files 1 (#154261)
lint:
-  test/test_fake_tensor.py
-  test/test_flop_counter.py
- torch/_export/verifier.py

These now follow the same rules as other files. It was a nightmare for me to update tests in one of the skipped files
without being able to lint them locally with lintrunner -a like other files.
Note that these files are under active development; they are not old, untouched files.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154261
Approved by: https://github.com/angelayi, https://github.com/Skylion007
2025-05-25 17:36:14 +00:00
PyTorch MergeBot
54932d865e Revert "[c10d] Add support for testing SIGABRT return (#153167)"
This reverts commit 03e102dbe8.

Reverted https://github.com/pytorch/pytorch/pull/153167 on behalf of https://github.com/malfet due to It broke lint ([comment](https://github.com/pytorch/pytorch/pull/153167#issuecomment-2907820789))
2025-05-25 13:17:27 +00:00
Keith
c4ef4090c5 Fix segfault on exit in CachingHostAllocator by signaling background thread to exit (#154117)
Fixes #152008

This PR fixes a segmentation fault that occurred when exiting the program due to improper background thread management in CachingHostAllocator.

Previously, the background thread continued running and called process_events() even after the allocator object was destroyed, leading to a crash on exit.

f12d8d60b1/aten/src/ATen/core/CachingHostAllocator.h (L218)

```cpp
// Launch the background thread and process events in a loop.
static bool background_thread_flag [[maybe_unused]] = [this] {
  getBackgroundThreadPool()->run([&]() {
    while (true) {
      process_events();  // <-- This line may cause segfault on exit
      std::this_thread::sleep_for(std::chrono::microseconds(100));
    }
  });
  return true;
}();
```

The fix adds a mechanism to signal the background thread to exit before the object is destructed, ensuring the thread stops safely.
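
The same stop-signal idea can be shown as a Python analogy. This is not the C++ change from the PR; `EventPoller` and its members are hypothetical names for the sketch.

```python
import threading
import time


class EventPoller:
    """Owns a background thread that polls until close() is called."""

    def __init__(self, interval_s: float = 0.001) -> None:
        self._stop = threading.Event()
        self.polls = 0
        self._thread = threading.Thread(
            target=self._run, args=(interval_s,), daemon=True
        )
        self._thread.start()

    def _process_events(self) -> None:
        self.polls += 1  # placeholder for real work that touches object state

    def _run(self, interval_s: float) -> None:
        # Event.wait acts as an interruptible sleep: it returns True as soon
        # as the stop flag is set, which ends the loop.
        while not self._stop.wait(interval_s):
            self._process_events()

    def close(self) -> None:
        """Signal the thread and wait for it, so it cannot outlive the object."""
        self._stop.set()
        self._thread.join()


poller = EventPoller()
time.sleep(0.05)
poller.close()
```

Joining inside `close()` is the important part: once it returns, nothing can call `_process_events()` on state that is about to be torn down.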
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154117
Approved by: https://github.com/ngimel, https://github.com/cyyever
2025-05-25 07:46:12 +00:00
Ke Wen
9d922b55ef [Distributed][CI] Rework continuous TestCase (#153653)
1. Reworked `MultiProcContinousTest` to spawn processes during `setUpClass` instead of `main` (so that we can support multiple TestClass'es in one file).

2. The child processes now run an infinite loop, monitoring test IDs passed from the main process via a task queue. In turn, the child processes inform the main process of a test's completion via a completion queue.

3. Added a test template.
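
The task-queue and completion-queue handshake described in items 1 and 2 can be sketched with `multiprocessing`. This is a simplified illustration, not the `MultiProcContinousTest` implementation: the function names, the `None` shutdown sentinel, and the use of the `fork` start method (POSIX only) are assumptions.

```python
import multiprocessing as mp


def worker_loop(rank, task_queue, completion_queue):
    """Child process: run test IDs from the task queue until told to stop."""
    while True:
        test_id = task_queue.get()
        if test_id is None:  # sentinel (an assumption of this sketch): shut down
            break
        # A real harness would look up and run the named test method here.
        completion_queue.put((rank, test_id, "ok"))


def run_suite(test_ids, world_size=2):
    """Spawn workers once, then feed them every test ID in turn."""
    ctx = mp.get_context("fork")  # assumption: POSIX host
    completion_queue = ctx.Queue()
    task_queues = [ctx.Queue() for _ in range(world_size)]
    procs = [
        ctx.Process(target=worker_loop, args=(rank, queue, completion_queue))
        for rank, queue in enumerate(task_queues)
    ]
    for proc in procs:  # done once per class, like setUpClass
        proc.start()

    results = []
    for test_id in test_ids:
        for queue in task_queues:  # every rank runs the same test
            queue.put(test_id)
        for _ in range(world_size):  # wait until all ranks report back
            results.append(completion_queue.get(timeout=30))

    for queue in task_queues:
        queue.put(None)
    for proc in procs:
        proc.join()
    return results
```

Because the workers stay alive between tests, any expensive per-process setup is paid once rather than once per test.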

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153653
Approved by: https://github.com/d4l3k, https://github.com/fegin, https://github.com/fduwjj
2025-05-25 03:49:29 +00:00
Ke Wen
03e102dbe8 [c10d] Add support for testing SIGABRT return (#153167)
`SIGABRT` is a common return from *negative* distributed tests, which check the effectiveness of NaN asserts, watchdog throws, etc.

These errors are not detectable by traditional statements like `with self.assertRaises(RuntimeError)`.

Instead, we'd need to check for the process's return code, e.g. `SIGABRT(6)` would have a return code of -6.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153167
Approved by: https://github.com/fduwjj
2025-05-25 03:48:34 +00:00
Justin Chu
10c51b11ff Bump protobuf version and refactor tensorboard tests (#154244)
In preparation for https://github.com/pytorch/pytorch/pull/153746, I am bumping protobuf to 5.29.4 and fixing the tensorboard tests first.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154244
Approved by: https://github.com/malfet, https://github.com/cyyever
2025-05-25 00:50:07 +00:00
bobrenjc93
53ecb8159a Introduce statically_known_false (#154291)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154291
Approved by: https://github.com/mengluy0125
2025-05-24 14:23:55 +00:00
xinan.lin
2dfc0e3327 [Inductor UT] Reuse test_fused_attention.py for Intel GPU. (#154110)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154110
Approved by: https://github.com/eellison, https://github.com/jansel, https://github.com/EikanWang
2025-05-24 09:51:33 +00:00
Yu, Guangye
e904d01c16 Make inductor UT to be generic (#154196)
# Motivation
https://github.com/pytorch/pytorch/pull/151773 introduced the UT `test_triton_template_generated_code_caching`, which fails on XPU;
https://github.com/pytorch/pytorch/pull/153895 introduced the UT `test_mutation_rename`, which fails on XPU.

Fixes https://github.com/pytorch/pytorch/issues/154218

# Additional Context
With this PR, both failing UTs pass on a local machine.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154196
Approved by: https://github.com/jansel
2025-05-24 02:47:46 +00:00
Nikita Shulga
975bbc63db [MPS][BE] Move fmod/remainder to Metal ops (#154280)
This accomplishes the following:
 - Fixes a correctness problem with large integer types (though it probably makes the op slower, which cannot be avoided if one wants an accurate answer)
 - Makes op faster for floating point types (as Metal kernel invocation is faster than creating MPSGraph)
 - Eliminates need for several correctness workarounds

Fixes https://github.com/pytorch/pytorch/issues/154171
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154280
Approved by: https://github.com/dcci
ghstack dependencies: #154275, #154290
2025-05-24 01:45:33 +00:00
Nikita Shulga
e5f63f4f66 [CI] Move Mac testing to 3.12 (#154177)
Prep step to completely move away from Conda during the builds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154177
Approved by: https://github.com/huydhn, https://github.com/cyyever, https://github.com/atalman
ghstack dependencies: #154237, #154268, #154271, #154269, #154270
2025-05-24 01:41:20 +00:00
leslie-fang-intel
7ba6fb69e6 [Inductor][CPP] Enable vectorized fp8 E5M2 quant dequant (#153365)
**Summary**
This PR enables the vectorization codegen with Inductor CPP backend for `FP8_E5M2` `quant` from `float32` and `dequant` to `float32`.

**Test Plan**
```
python test/inductor/test_cpu_repro.py -k test_dequant_quant_lowering_fp8_e5m2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153365
Approved by: https://github.com/jansel, https://github.com/jgong5
ghstack dependencies: #152417, #152418, #153364
2025-05-23 23:20:02 +00:00
leslie-fang-intel
b77a6504fa [Inductor][CPP] Enable vectorized fp8 quant dequant (#152418)
**Summary**
This PR enables the vectorization codegen with Inductor CPP backend for `FP8_E4M3` `quant` from `float32` and `dequant` to `float32`.

**Test Plan**
```
python test/inductor/test_cpu_repro.py -k test_dequant_quant_lowering_fp8_e4m3
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152418
Approved by: https://github.com/jansel, https://github.com/jgong5, https://github.com/CaoE
ghstack dependencies: #152417
2025-05-23 23:05:17 +00:00
Howard Huang
aa3eab2ce6 Fix tcp init when using port 0 (#154156)
I hit this in tests when calling `init_process_group(init_method="tcp://localhost:0", ...)`. Port 0 cannot be used because of a bug in the conditional, and you get the error `ValueError: Error initializing torch.distributed using tcp:// rendezvous: port number missing`
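
The commit message does not show the conditional itself. One common way this kind of bug arises is a truthiness check on the parsed port, which treats `0` as "missing". The sketch below illustrates that assumed form with the standard library; the function names are hypothetical.

```python
from urllib.parse import urlparse


def port_buggy(url: str) -> int:
    port = urlparse(url).port
    if not port:  # bug: 0 is falsy, so a valid port 0 is rejected
        raise ValueError("port number missing")
    return port


def port_fixed(url: str) -> int:
    port = urlparse(url).port
    if port is None:  # only a truly absent port is an error
        raise ValueError("port number missing")
    return port


port = port_fixed("tcp://localhost:0")
```

Port 0 matters in tests because it asks the operating system to pick any free port.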

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154156
Approved by: https://github.com/d4l3k, https://github.com/Skylion007
2025-05-23 21:41:58 +00:00
Nikita Shulga
acd0873d3b [CI] Fix TestDynamoTimed.test_ir_count for 3.12 (#154268)
Python 3.12 emits the same bytecode as 3.13 for the code in question.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154268
Approved by: https://github.com/clee2000, https://github.com/atalman
ghstack dependencies: #154237
2025-05-23 20:08:19 +00:00
PyTorch MergeBot
28af44285b Revert "[c10d] Add support for testing SIGABRT return (#153167)"
This reverts commit 499a76b844.

Reverted https://github.com/pytorch/pytorch/pull/153167 on behalf of https://github.com/malfet due to Broke lint, see fe784c5a2c/1 ([comment](https://github.com/pytorch/pytorch/pull/153167#issuecomment-2905623868))
2025-05-23 19:44:08 +00:00
Shangdi Yu
fe784c5a2c Fix torchbind path in AOTI package loader (#154265)
Summary: as title, fix the path in package loader and fix the test to take the additional dir into consideration.

Test Plan:
```
buck run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/inductor:torchbind
```

Reviewed By: angelayi

Differential Revision: D75308904

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154265
Approved by: https://github.com/clee2000, https://github.com/malfet
2025-05-23 19:32:53 +00:00
Angela Yi
3b21d79225 [export] Move PT2ArchiveWriter/Reader to torch/export (#153795)
Summary:
Before:
`from sigmoid.core.package.pt2_archive import PT2ArchiveWriter, PT2ArchiveReader, is_sigmoid_package`
After:
`from torch.export.pt2_archive import PT2ArchiveWriter, PT2ArchiveReader, is_pt2_package`

Because the two PT2ArchiveReader/Writers were merged to use the native PytorchFileReader/Writer, the open source PT2 archive now has an additional folder. However, this PR still maintains support for loading an old PT2 archive which does not have the additional folder.

Before:
```
├── archive_format
├── byteorder
├── .data
│   ├── serialization_id
│   └── version
├── data
│   ├── aotinductor

```
After:
```
├── tmp
│   ├── archive_format
│   ├── byteorder
│   ├── .data
│   │   ├── serialization_id
│   │   └── version
│   ├── data
│   │   ├── aotinductor
```

Test Plan:
`buck2 test //sigmoid/...`
https://www.internalfb.com/intern/testinfra/testrun/5348024839248187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153795
Approved by: https://github.com/zhxchen17
2025-05-23 19:04:36 +00:00
Ke Wen
499a76b844 [c10d] Add support for testing SIGABRT return (#153167)
`SIGABRT` is a common return from *negative* distributed tests, which check the effectiveness of NaN asserts, watchdog throws, etc.

These errors are not detectable by traditional statements like `with self.assertRaises(RuntimeError)`.

Instead, we'd need to check for the process's return code, e.g. `SIGABRT(6)` would have a return code of -6.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153167
Approved by: https://github.com/fduwjj
2025-05-23 19:04:28 +00:00
PyTorch MergeBot
4ff19ecf66 Revert "[export] Move PT2ArchiveWriter/Reader to torch/export (#153795)"
This reverts commit 7e80f23516.

Reverted https://github.com/pytorch/pytorch/pull/153795 on behalf of https://github.com/malfet due to Looks like it broke lots of tests, see ec368a1903/1 ([comment](https://github.com/pytorch/pytorch/pull/153795#issuecomment-2905415496))
2025-05-23 18:29:08 +00:00
Angela Yi
7e80f23516 [export] Move PT2ArchiveWriter/Reader to torch/export (#153795)
Summary:
Before:
`from sigmoid.core.package.pt2_archive import PT2ArchiveWriter, PT2ArchiveReader, is_sigmoid_package`
After:
`from torch.export.pt2_archive import PT2ArchiveWriter, PT2ArchiveReader, is_pt2_package`

Because the two PT2ArchiveReader/Writers were merged to use the native PytorchFileReader/Writer, the open source PT2 archive now has an additional folder. However, this PR still maintains support for loading an old PT2 archive which does not have the additional folder.

Before:
```
├── archive_format
├── byteorder
├── .data
│   ├── serialization_id
│   └── version
├── data
│   ├── aotinductor

```
After:
```
├── tmp
│   ├── archive_format
│   ├── byteorder
│   ├── .data
│   │   ├── serialization_id
│   │   └── version
│   ├── data
│   │   ├── aotinductor
```

Test Plan:
`buck2 test //sigmoid/...`
https://www.internalfb.com/intern/testinfra/testrun/5348024839248187

Differential Revision: D74616598

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153795
Approved by: https://github.com/zhxchen17
2025-05-23 15:40:25 +00:00
Laith Sakka
9e089bb5b6 change guard_or impl for better perf and simplicity (#153674)
PR time benchmarks have been showing regressions as we move to guard_or_false; the reason is that the previous implementation does not cache.
This new approach propagates the fallback value to eval and returns it, allowing eval to cache and reducing log spam and complexity.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153674
Approved by: https://github.com/bobrenjc93
2025-05-23 15:24:28 +00:00
Aaron Orenstein
4b7abce6a4 Fix fake tensor caching when output has unbacked (#153034)
We handle fake tensor caching in two ways:
1. If the inputs have no symbols (SymInt, etc) then we cache on the FakeTensorMode.
2. If the inputs have symbols then we cache on the ShapeEnv.

This way the symbols in the inputs and outputs are associated with the guards in place at the time of the call.

However - it's possible to have an op where there are no symbols in the inputs but there is an unbacked symbol in the output.  In this case we shouldn't cache at all because what would that really mean?

So this PR changes the caching behavior so that if there's a symbol in the output which doesn't come in some way from the input then we refuse to cache that op.

Added a test which checks for this case.

While in there I also did a couple other related changes:
1. Added negative caching - if we see that an (op, args) failed to cache previously we don't even bother trying to cache it again.
2. Reworked the inner behavior of _cached_dispatch_impl a little to make it more clear which bits we expect to be able to throw _BypassDispatchCache and add some comments.

The latest version of this also:
1. Addresses the problem that caused #153891.
    The issue was that with caching ops are required to support `__eq__`.  Unfortunately _RecordFunction is minimalistic and doesn't support that - so in the off-chance that two keys hash to the same value the `__eq__` check would raise an exception.

    Apparently this was much more common on MacOS where memory patterns end up with more reuse (so the object IDs are the same and give you the same hash value for objects that use pointer hash).

    Tested locally on MacOS where running
```
python test/inductor/test_torchinductor.py GPUTests
```
was pretty much guaranteed to fail (at least for me) somewhere around test 100-200 and passed all 800 tests after this change.

Another way to test this is to run the inductor tests with `torch._subclasses.fake_tensor._DispatchCacheKey.__hash__` monkey-patched to return a constant (causing all values to hash-collide) but this can't really be checked-in since it causes the cache lookup to turn into an O(n) lookup which takes a crazy long time to run through all the tests...

2. Folds in #153780 to ensure that exceptions raised from the op don't include the context from the cache key bypass.
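
The collision problem from item 1 above can be reproduced in miniature: a dict only calls `__eq__` when two keys share a hash, so a key type without usable equality works until the first collision. The `Key` class is a hypothetical stand-in for this sketch, not the fake tensor cache key.

```python
class Key:
    """Cache key with a controllable hash and no usable equality."""

    def __init__(self, name: str, hash_value: int) -> None:
        self.name = name
        self.hash_value = hash_value

    def __hash__(self) -> int:
        return self.hash_value

    def __eq__(self, other: object) -> bool:
        raise TypeError(f"{self.name} does not support ==")


cache = {Key("a", 1): "entry for a"}


def lookup(key: Key) -> str:
    try:
        return cache.get(key, "miss")
    except TypeError as exc:
        return f"error: {exc}"


no_collision = lookup(Key("b", 2))  # different hash: __eq__ never runs
collision = lookup(Key("c", 1))     # same hash: the dict falls back to __eq__
```

This also shows why forcing every key to the same hash is an effective stress test: every lookup is pushed through the equality path.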

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153034
Approved by: https://github.com/masnesral, https://github.com/tugsbayasgalan
2025-05-23 15:03:31 +00:00
Ke Wen
25149cd173 [c10d] Add more tests to prevent extra context (#154174)
Loop a bunch of sync ops and see if any of them creates extra context.
Requires nvml to check number of processes resident on a device.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154174
Approved by: https://github.com/atalman
2025-05-23 09:54:01 +00:00