Commit Graph

434 Commits

PyTorch MergeBot
07adae3dac Revert "Make FX Graph Cache work with distributed training (#133374)"
This reverts commit dcdb25453e.

Reverted https://github.com/pytorch/pytorch/pull/133374 on behalf of https://github.com/albanD due to Broke trunk ([comment](https://github.com/pytorch/pytorch/pull/133374#issuecomment-2291289260))
2024-08-15 13:43:16 +00:00
PyTorch MergeBot
32d890745d Revert "Add cache timings info to tlparse (#133504)"
This reverts commit 7eb31e5023.

Reverted https://github.com/pytorch/pytorch/pull/133504 on behalf of https://github.com/albanD due to Broke trunk ([comment](https://github.com/pytorch/pytorch/pull/133374#issuecomment-2291289260))
2024-08-15 13:43:16 +00:00
Oguz Ulgen
7eb31e5023 Add cache timings info to tlparse (#133504)
https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/.tmpLR1T85/rank_1/0_0_0/fx_graph_cache_hash_11.json

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133504
Approved by: https://github.com/jamesjwu
ghstack dependencies: #133362, #133363, #133374
2024-08-15 05:53:00 +00:00
Oguz Ulgen
dcdb25453e Make FX Graph Cache work with distributed training (#133374)
During distributed training, if all ranks except one hit the cache, the rank that missed will cause a NCCL timeout, since the rest of the ranks have already entered the collective and started the timer. This PR uses the new PTD API to increase the timeout for the ranks that hit the cache by the amount of time the cache saved them.
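The timeout arithmetic described above can be sketched as a toy helper (illustrative only; the actual PR calls a torch.distributed API rather than defining a function like this):

```python
from datetime import timedelta


def adjusted_collective_timeout(
    base: timedelta, time_saved_by_cache: timedelta
) -> timedelta:
    # Ranks that hit the cache enter the collective early and start the
    # NCCL watchdog; give them extra headroom equal to the compile time
    # the cache saved them, so the one cache-missing rank can finish
    # compiling before the watchdog fires. Hypothetical helper, not the
    # real PTD API used in the PR.
    return base + time_saved_by_cache
```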

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133374
Approved by: https://github.com/ezyang
ghstack dependencies: #133362, #133363
2024-08-14 22:58:48 +00:00
Xu Han
36c4ed8e49 [inductor] add FreeLibrary to DLLWrapper for Windows. (#133184)
Follow-up to https://github.com/pytorch/pytorch/pull/132630: we found that the `DLLWrapper` class has no `_dlclose` implementation for Windows.

I wrote a small test project to figure out how to make it work on Windows: https://github.com/xuhancn/ctypes_all_lifecycle/blob/main/pysrc/module_manage.py#L30-L61
Test result: https://github.com/xuhancn/ctypes_all_lifecycle/tree/main?tab=readme-ov-file#ctypes_cyclepy

So, in this PR I ported the Windows `FreeLibrary` implementation to PyTorch's `DLLWrapper`.
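The idea behind the fix can be sketched cross-platform: ctypes loads libraries but never unloads them, so the OS-specific close call has to be made by hand. The function name and structure below are illustrative, not the actual `DLLWrapper` code:

```python
import ctypes
import ctypes.util
import sys


def close_shared_library(lib: ctypes.CDLL) -> None:
    # Release a ctypes-loaded shared library by hand (sketch of the idea
    # behind giving DLLWrapper a Windows-side close).
    handle = lib._handle
    if sys.platform == "win32":
        # Windows: kernel32.FreeLibrary decrements the module refcount.
        # Declare argtypes so the 64-bit handle is not truncated.
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.FreeLibrary.argtypes = [ctypes.c_void_p]
        kernel32.FreeLibrary(handle)
    else:
        # POSIX: dlclose the handle (resolvable through libc on glibc/macOS).
        libc = ctypes.CDLL(None)
        libc.dlclose.argtypes = [ctypes.c_void_p]
        libc.dlclose(handle)
```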

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133184
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-08-12 19:55:48 +00:00
Xu Han
4a3a30c36e [inductor] remove deprecated cpp_builder implementation. (#133161)
I worked with @henrylhtsang to switch cpp_builder to the new implementation, and we have removed all dependencies on the old one.
It is now safe to delete the old implementation, which this PR does.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133161
Approved by: https://github.com/ezyang
2024-08-10 14:21:22 +00:00
Xu Han
2ad011ca73 [inductor] remove debug code of AotCodeCompiler (#132823)
Since AotCodeCompiler was switched to the new cpp_builder (https://github.com/pytorch/pytorch/pull/132766),
we can remove the debug code of AotCodeCompiler.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132823
Approved by: https://github.com/henrylhtsang
2024-08-10 08:04:48 +00:00
James Wu
f037803290 Add ChromiumEventLogger, log FXGraphCache and AOTAutogradCache (#132864)
This PR hooks ChromiumEventLogger into all @dynamo_timed events. For each dynamo_timed call, we log:
- A start event before starting the function execution
- An end event after finishing the function execution
- An extra pair of start/end events for any phase names included in dynamo.

Separately, this also gives us the ability to log instant events. I use them to log cache hits/misses as a first step. The little arrows on the bottom of the UI are cache hits/misses, and you can look at cache details by clicking each triangle.

The outputted chromium trace events can be viewed in perfetto for a timeline of an execution. Here's what it looks like for a run of nanogpt:
![image](https://github.com/user-attachments/assets/cb9e6c7a-1acf-45e6-8a27-6651d9ae6132)

And another with warm start:
![image](https://github.com/user-attachments/assets/cd9709bc-59ef-4da1-a7dd-10b1a0ab9b8f)

Trace events are based around the JSON Event format: https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview

We may want to switch to the less deprecated Protobuf format later, but so far I don't see any features we care about supported there.
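A single record in that Trace Event Format looks like the sketch below (field values are made up; the logger's real event names and args differ):

```python
import json


def chromium_event(name, ph, ts_us, pid=0, tid=0, args=None):
    # One record in the Chrome/Perfetto "Trace Event Format": ph "B"/"E"
    # bracket a duration, ph "i" is an instant event (the cache hit/miss
    # arrows at the bottom of the UI). ts is in microseconds.
    ev = {"name": name, "ph": ph, "ts": ts_us, "pid": pid, "tid": tid}
    if ph == "i":
        ev["s"] = "p"  # instant-event scope: process-wide
    if args is not None:
        ev["args"] = args
    return ev


# Hypothetical timeline: a dynamo_timed span with a cache miss inside it.
trace = [
    chromium_event("dynamo", "B", 0),
    chromium_event("fx_graph_cache_miss", "i", 1_200, args={"key": "..."}),
    chromium_event("dynamo", "E", 250_000),
]
print(json.dumps(trace))
```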

Internal FB employees can see a link to this in the tlparse output:
https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/.tmpVi1FIl/dedicated_log_torch_trace_bb4zl_bc.log/index.html

I'll also work on logging these

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132864
Approved by: https://github.com/aorenste
2024-08-10 01:15:53 +00:00
Henry Tsang
78cf8df4a0 [aoti] forward fix of [inductor] switch AotCodeCompiler to new cpp_builder. (take 3) (#133042)
Summary:
Forward fix of a test failure caused by D60773405.

The idea of D60773405 is that we need to use absolute paths. So we want to use the older version of the paths for output_so and output_o.

However, when I was copying the older definitions of output_so and output_o, I thought it was okay to simplify it a bit. See https://github.com/pytorch/pytorch/pull/131304#issuecomment-2270016609

Turns out I was wrong.

Test Plan: ci

Differential Revision: D60990594

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133042
Approved by: https://github.com/hl475, https://github.com/desertfire
2024-08-09 18:53:27 +00:00
Danielmic
32f9a809c7 Replace [[unlikely]] with unlikely(x) (#130816)
Do not use `[[unlikely]]`, as it is a C++20 language feature; see https://en.cppreference.com/w/cpp/language/attributes/likely

Fixes https://github.com/pytorch/pytorch/issues/130815

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130816
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/malfet
2024-08-07 10:38:13 +00:00
Henry Tsang
e98eac76b3 [inductor] switch AotCodeCompiler to new cpp_builder. (take 3) (#132766)
Summary: This is basically https://github.com/pytorch/pytorch/pull/131304 together with https://github.com/pytorch/pytorch/pull/132594 and absolute path fix for fbcode.

Test Plan: ci

Differential Revision: D60773405

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132766
Approved by: https://github.com/xuhancn, https://github.com/chenyang78, https://github.com/desertfire
2024-08-06 23:56:34 +00:00
eellison
18b678082e [Easy] log output code path on cache hit (#132718)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132718
Approved by: https://github.com/oulgen, https://github.com/masnesral
2024-08-06 21:59:30 +00:00
Gabriel Ferns
c3ee07c71c add missing profiler include in cpp code generation (#132419)
Summary:
When a user sets config.profiler_mark_wrapper_call, RECORD_FUNCTION annotations are added to the generated code. This requires the header <ATen/record_function.h>, but the conditional that emits the include didn't check config.profiler_mark_wrapper_call.
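The shape of the fix can be sketched as follows (flag names mirror the inductor config; the helper itself is illustrative, not the real codegen function):

```python
def wrapper_headers(profile_bandwidth: bool, profiler_mark_wrapper_call: bool) -> list:
    # Emit <ATen/record_function.h> whenever ANY flag that generates
    # RECORD_FUNCTION annotations is set -- including
    # profiler_mark_wrapper_call, which the old conditional forgot.
    headers = []
    if profile_bandwidth or profiler_mark_wrapper_call:
        headers.append("#include <ATen/record_function.h>")
    return headers
```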

Test Plan:
This case is already covered in test_profiler_mark_wrapper_call.

```
(pytorch-3.10) [gabeferns@devvm2252.cco0 ~/pytorch (missing-profile-include)]$ TORCHINDUCTOR_ABI_COMPATIBLE=1 TORCHINDUCTOR_CPP_WRAPPER=1 python test/inductor/test_torchinductor.py -k CpuTests.test_profiler_mark_wrapper_call_cpu
stats [('calls_captured', 1), ('unique_graphs', 1)]
inductor [('fxgraph_cache_miss', 1)]
aot_autograd [('total', 1), ('ok', 1)]
.
----------------------------------------------------------------------
Ran 1 test in 8.080s

OK
```

Fixes https://github.com/pytorch/pytorch/issues/131339

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132419
Approved by: https://github.com/jgong5, https://github.com/desertfire
2024-08-05 13:40:47 +00:00
Oguz Ulgen
09f9c256ad Add basic mypy annotations to inductor (#132416)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132416
Approved by: https://github.com/XuehaiPan, https://github.com/jamesjwu
ghstack dependencies: #132415
2024-08-04 18:43:37 +00:00
PyTorch MergeBot
f2ddd5e9e0 Revert "Add basic mypy annotations to inductor (#132416)"
This reverts commit 78927d37f6.

Reverted https://github.com/pytorch/pytorch/pull/132416 on behalf of https://github.com/ZainRizvi due to Sorry, this PR has entered a weird state in the diff train. Trying to revert it to skip it, and then we can try relanding it ([comment](https://github.com/pytorch/pytorch/pull/132415#issuecomment-2267631785))
2024-08-04 18:39:29 +00:00
Xuehai Pan
f7aeb394b6 [BE][Easy] Remove empty ISORT_SKIPLIST (#132572)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132572
Approved by: https://github.com/ezyang, https://github.com/justinchuby
ghstack dependencies: #129769
2024-08-04 10:24:09 +00:00
Sam Larsen
b71cd149ce Fix file lock issue in AotCodeCompiler (#132343)
Summary:
It looks like there are several places in AotCodeCompiler that write files in ways that aren't safe for concurrency. There's a filelock to cope with that, but the lock path isn't quite robust enough to prevent races. We have an internal stress test failing when executing multiple concurrent instances of the test. There's some variability in the content we write to the cpp file, which means we can get a different 'key' across different runs. The lock path includes that key, but the path for the "consts_path" is computed separately. Therefore, I see things like this:

- The computed 'key' is `cp5tgbuxuegvg5g2j7oi6u74nkf3v7mx5w3qzl6qbedtmw5tq77z`
- The lock_path (based on the key) is: `/tmp/torchinductor_slarsen/locks/cp5tgbuxuegvg5g2j7oi6u74nkf3v7mx5w3qzl6qbedtmw5tq77z.lock`
- The cpp path (which also includes the key) is: `/tmp/torchinductor_slarsen/cenzkqfnhu53mrhrdhzjtnblzyma2hgmeo7hai5yqsxzirdavurh/cp5tgbuxuegvg5g2j7oi6u74nkf3v7mx5w3qzl6qbedtmw5tq77z.cpp`
- The consts_path (not based on the key) is: `/tmp/torchinductor_slarsen/cenzkqfnhu53mrhrdhzjtnblzyma2hgmeo7hai5yqsxzirdavurh/cifbshkqkbsurzldsyi2vl5bsnhvejmavys4kktpwrzmpo4ysuoy.bin`

So we have different test instances using different lock paths but touching the same consts_path, and therefore stomping on each other's files. To fix, include the key in the consts_path.
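The fix amounts to deriving both paths from the same key, so two concurrent compilations either serialize on one lock or touch disjoint files. The directory layout below is simplified from the real codecache scheme and the helper name is hypothetical:

```python
import os


def lock_and_consts_paths(cache_dir: str, key: str):
    # Derive the consts path from the SAME key as the lock path. With
    # this invariant, no two runs can hold different locks while writing
    # the same .bin -- the race described above becomes impossible.
    lock_path = os.path.join(cache_dir, "locks", f"{key}.lock")
    consts_path = os.path.join(cache_dir, key, f"{key}_consts.bin")
    return lock_path, consts_path
```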

Test Plan: Ran internal stress test. Repro'd failure and verified this change fixes it.

Differential Revision: D60552021

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132343
Approved by: https://github.com/desertfire
2024-08-02 19:01:37 +00:00
Edward Z. Yang
290f09f829 Ban decorator usage of dynamo_timed (#132328)
This is a more manual version of https://github.com/pytorch/pytorch/pull/132073 that just manually creates the new function at each call site instead of magicking it with clone. Review with whitespace diffs off.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132328
Approved by: https://github.com/albanD
2024-08-02 12:00:46 +00:00
PyTorch MergeBot
c8958f8f84 Revert "Ban decorator usage of dynamo_timed (#132328)"
This reverts commit 9853c048eb.

Reverted https://github.com/pytorch/pytorch/pull/132328 on behalf of https://github.com/clee2000 due to seems to have broken functorch/test_aotdispatch.py::TestAOTAutograd::test_input_data_and_metadata_mutation_aliases_other_input [GH job link](https://github.com/pytorch/pytorch/actions/runs/10204547165/job/28233976446) [HUD commit link](9853c048eb).  Test passed on PR, probably a landrace, base is only 10 hours old ([comment](https://github.com/pytorch/pytorch/pull/132328#issuecomment-2263909337))
2024-08-01 20:20:28 +00:00
Oguz Ulgen
78927d37f6 Add basic mypy annotations to inductor (#132416)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132416
Approved by: https://github.com/XuehaiPan, https://github.com/jamesjwu
ghstack dependencies: #132415
2024-08-01 20:14:25 +00:00
Edward Z. Yang
9853c048eb Ban decorator usage of dynamo_timed (#132328)
This is a more manual version of https://github.com/pytorch/pytorch/pull/132073 that just manually creates the new function at each call site instead of magicking it with clone. Review with whitespace diffs off.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132328
Approved by: https://github.com/albanD
2024-08-01 19:27:58 +00:00
eellison
f32ab3b9e3 Migrate Inductor scheduler, dependencies, ir, and codegen/common to use OrderedSet (#130004)
Python's built-in set has nondeterministic iteration order. There is an internal failure we recently ran into which did not fail consistently.

See repro here: P1453035092.

Now, with these changes, it fails consistently. In follow-ups we could also consider adding a lint rule for uses of set() or set literals.
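A minimal sketch of the idea behind an OrderedSet: back it with a dict, whose insertion order is guaranteed since Python 3.7, so iteration is deterministic run-to-run even under hash randomization. (The class the PR migrates to implements the full MutableSet API; this toy version only shows the mechanism.)

```python
class OrderedSet:
    # Dict-backed set: membership tests stay O(1), but iteration follows
    # insertion order instead of hash order.
    def __init__(self, items=()):
        self._d = dict.fromkeys(items)

    def add(self, item):
        self._d[item] = None

    def discard(self, item):
        self._d.pop(item, None)

    def __contains__(self, item):
        return item in self._d

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)
```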

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130004
Approved by: https://github.com/oulgen
2024-08-01 04:37:15 +00:00
Sergii Dymchenko
d72e863b3e Fix lint after PR #130572 (#132316)
Fix lint after https://github.com/pytorch/pytorch/pull/130572

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132316
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/ZainRizvi
2024-07-31 20:00:31 +00:00
James Wu
f9e4d05c15 Save and run post compilation steps within FXGraphCache (#130572)
This PR mostly refactors by putting code into utils files so that they can be shared between codecache.py and compile_fx.py. Afterwards, it then changes compile_fx so that:
- When saving to FXGraphCache, we save onto the CompiledFXGraph all the necessary metadata for running post compile steps (realigning inputs, cudagraphification).
- When loading from FXGraphCache, we use the saved information directly, instead of calculating them from scratch.

What this does is make it so that `FXGraphCache.load()` is a perfect cache on compile_fx_inner, in that it **returns exactly what compile_fx_inner returns**. This also makes it possible for AOTAutogradCache, given a key to the fx graph cache and example inputs, to get back the full return value of compile_fx_inner.

## What's a post compile step?
We define a **post-compile** to be the set of actions that need to run after FXGraphCache either loads from the cache or misses and runs compilation. These steps include:
- Setting the tracing context's output strides
- Running cudagraphs if enabled
- Maybe realign inputs if cudagraphs didn't run

To run these steps, we save all the necessary metadata in CompiledFxGraph, and use them on a cache hit to reconstruct the object.

## Splitting cudagraphs work into pre/post compile
Cudagraphs does a lot of work on the input graph module to determine if cudagraphs can be enabled. This is the code that involves cudagraph_tests and stack traces. This will work in a world where we have access to the input graph module, but with AOTAutograd warm start, we won't have access to that information anymore. Therefore we can split cudagraphs work into two parts: on a cache miss (and therefore a full compile), we do the cudagraphs testing work, and save cudagraph_fail_reasons into the cache. Then on a cache hit, we know whether or not we can run cudagraphs, and if we can't, we can emit the correct error messages.

Implementation notes:
- We save `fx_kwargs` directly onto the CompiledFXGraph. `fx_kwargs` is already, by definition, part of the cache key, so this is safe to do when it comes to cache correctness.
- ^ Why do we do above even though FXGraphCache.load takes fx_kwargs as an argument? Because AOTAutogradCache **doesn't** have access to fx_kwargs: they're annoyingly encoded in the functools.partial() of the fw_compiler, so *only* inductor knows about these options. They're fully captured by the AOTAutogradCache key (since every key to fx_kwargs is either a global config, or a field that's deterministic based on an input graph module), but their values are still needed to run cudagraphs/postprocessing. Therefore, it's easier/safer to store it on the cached result.
- Willing to hear other approaches here if we think saving these extra fields is not reasonable, though I can't think of another way to do this that's less complicated to explain.
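The cached object described above can be sketched as a dataclass; the field names below approximate, rather than reproduce, the real CompiledFxGraph attributes:

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class CachedFxGraphSketch:
    # Illustrative subset of the metadata the PR persists alongside the
    # compiled graph so a cache hit can replay the post-compile steps
    # without the input GraphModule.
    output_strides: List[Any]
    cudagraph_fail_reasons: List[str]
    fx_kwargs: Dict[str, Any]

    def post_compile(self):
        # On a hit: decide cudagraphs from the analysis saved at
        # compile time instead of re-running it on the (unavailable)
        # input graph module.
        if self.cudagraph_fail_reasons:
            return ("skip_cudagraphs", self.cudagraph_fail_reasons)
        return ("cudagraphs", [])
```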

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130572
Approved by: https://github.com/eellison
2024-07-31 18:32:40 +00:00
PyTorch MergeBot
784a6ec5a3 Revert "Migrate Inductor scheduler, dependencies, ir, and codegen/common to use OrderedSet (#130004)"
This reverts commit 13d744464f.

Reverted https://github.com/pytorch/pytorch/pull/130004 on behalf of https://github.com/clee2000 due to broke lint [GH job link](https://github.com/pytorch/pytorch/actions/runs/10183945999/job/28170099930) [HUD commit link](13d744464f) probably a landrace, the base is 21 hours old ([comment](https://github.com/pytorch/pytorch/pull/130004#issuecomment-2260946562))
2024-07-31 16:49:21 +00:00
eellison
13d744464f Migrate Inductor scheduler, dependencies, ir, and codegen/common to use OrderedSet (#130004)
Python's built-in set has nondeterministic iteration order. There is an internal failure we recently ran into which did not fail consistently.

See repro here: P1453035092.

Now, with these changes, it fails consistently. In follow-ups we could also consider adding a lint rule for uses of set() or set literals.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130004
Approved by: https://github.com/oulgen
2024-07-31 16:22:11 +00:00
Xuehai Pan
e7eeee473c [BE][Easy][14/19] enforce style for empty lines in import segments in torch/_[a-c]*/ and torch/_[e-h]*/ and torch/_[j-z]*/ (#129765)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129765
Approved by: https://github.com/ezyang
2024-07-31 10:42:50 +00:00
PyTorch MergeBot
239d4d2489 Revert "[reland][inductor] switch AotCodeCompiler to new cpp_builder (#130127)"
This reverts commit 9606d61e0c.

Reverted https://github.com/pytorch/pytorch/pull/130127 on behalf of https://github.com/ZainRizvi due to broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/130127#issuecomment-2258871791))
2024-07-30 17:39:41 +00:00
Xu Han
9606d61e0c [reland][inductor] switch AotCodeCompiler to new cpp_builder (#130127)
Changes:
1. Switch `AotCodeCompiler` to new cpp_builder.
2. Keep `deprecated_cpp_compile_command` only for `fb_code`, since I can no longer debug it without Meta-internal environment access.
3. Add `TODO` comments asking Meta employees for help to continue this work.
4. Given item 3, only the `fb_code` path of `deprecated_cpp_compile_command` remains to be fixed, so remove `validate_new_cpp_commands`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130127
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-27 01:46:13 +00:00
PyTorch MergeBot
bb64702eb3 Revert "[reland][inductor] switch AotCodeCompiler to new cpp_builder (#130127)"
This reverts commit 520182dbff.

Reverted https://github.com/pytorch/pytorch/pull/130127 on behalf of https://github.com/clee2000 due to broke internal tests D60265910 ([comment](https://github.com/pytorch/pytorch/pull/130127#issuecomment-2253113689))
2024-07-26 16:40:03 +00:00
Bin Bao
1e24f7875e [AOTI] Fix ABI-compatible mode link issue for CPU (#131791)
Summary: Found this "cannot find -ltorch: No such file or directory" issue when collecting AOTI CPU perf for the dashboard. Debugging on the CI machine revealed two problems: 1) no valid VEC_ISA was picked; 2) when 1) happens, the libtorch path is not added to the linker search path.

This PR fixes the second problem. A later PR will fix the first problem, but somehow finding the right VEC_ISA causes a performance regression, which needs more investigation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131791
Approved by: https://github.com/zou3519, https://github.com/chenyang78
2024-07-26 09:02:13 +00:00
Xu Han
520182dbff [reland][inductor] switch AotCodeCompiler to new cpp_builder (#130127)
Changes:
1. Switch `AotCodeCompiler` to new cpp_builder.
2. Keep `deprecated_cpp_compile_command` only for `fb_code`, since I can no longer debug it without Meta-internal environment access.
3. Add `TODO` comments asking Meta employees for help to continue this work.
4. Given item 3, only the `fb_code` path of `deprecated_cpp_compile_command` remains to be fixed, so remove `validate_new_cpp_commands`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130127
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-25 21:45:40 +00:00
PyTorch MergeBot
fe2e6f0c51 Revert "[reland][inductor] switch AotCodeCompiler to new cpp_builder (#130127)"
This reverts commit dfc9bfc883.

Reverted https://github.com/pytorch/pytorch/pull/130127 on behalf of https://github.com/atalman due to Breaks CI test_dataloader.py::TestDataLoader::test_segfault [GH job link](https://github.com/pytorch/pytorch/actions/runs/10099725941/job/27930133346) [HUD commit link](2c1851f04e) ([comment](https://github.com/pytorch/pytorch/pull/130127#issuecomment-2251360224))
2024-07-25 20:44:04 +00:00
Xu Han
dfc9bfc883 [reland][inductor] switch AotCodeCompiler to new cpp_builder (#130127)
Changes:
1. Switch `AotCodeCompiler` to new cpp_builder.
2. Keep `deprecated_cpp_compile_command` only for `fb_code`, since I can no longer debug it without Meta-internal environment access.
3. Add `TODO` comments asking Meta employees for help to continue this work.
4. Given item 3, only the `fb_code` path of `deprecated_cpp_compile_command` remains to be fixed, so remove `validate_new_cpp_commands`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130127
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-25 18:34:08 +00:00
Yunqiu Guo
059f9fb30b [BE][inductor] Type annotate codecache.py and config.py (#131427)
As title.

Checked the raw JSON file for runtime types, and tried to cover all the missing annotations listed in it this time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131427
Approved by: https://github.com/eellison, https://github.com/oulgen
2024-07-25 05:54:38 +00:00
angelayi
b90aa18569 [aoti] Add initial custom op support (#127034)
Re-land of https://github.com/pytorch/pytorch/pull/125242

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127034
Approved by: https://github.com/malfet
2024-07-24 20:29:55 +00:00
Xu Han
72d17d95d7 [inductor] Enable dynamo for Windows. RC1 (#131286)
Changes:
1. Enable Windows in `check_if_inductor_supported`.
2. Disable Windows in `AotCodeCompiler`.
3. Force Windows inductor to `c++20` to support `std::enable_if_t`.
4. Temporarily disable the `test_x86inductor_quantizer` UT on Windows; it still has some issues that need to be fixed: https://github.com/pytorch/pytorch/pull/131308

With this PR, I ran the first model, `resnet18`, on Windows inductor successfully.
<img width="1036" alt="image" src="https://github.com/user-attachments/assets/2642bda1-1845-417a-aaba-39bdf22e65d6">

TODO:
1. Upgrade pytorch Windows build to `c++20`.
2. Fix and re-enable `test_x86inductor_quantizer` UT on `Windows`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131286
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-24 15:26:55 +00:00
Aaron Orenstein
0e780a7d69 [BE] Remove some mypy allow-untyped-decorators that are no longer needed (#131564)
See #131429
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131564
Approved by: https://github.com/oulgen
2024-07-24 02:00:08 +00:00
Nicolas Macchioni
2cf220956a [inductor] fix CacheBase.get_system on AMD (#131365)
Summary: CacheBase.get_system on AMD is missing the device name and HIP version; fix that.

Test Plan:
on AMD:
```
buck run fbcode//mode/opt-amd-gpu scripts/nmacchioni/repros/amd_cache_key:repro
{'device': {'name': 'gfx942:sramecc+:xnack-'}, 'version': {'triton': '3.0.006965bceb379c60d8184a4166f502457952938167bfb69592ebf48abebfb0ce9-4856d26164925fd955c779d8f67ecf47cc5754052b008714b3a580d708b13dd8-06965bceb379c60d8184a4166f502457952938167bfb69592ebf48abebfb0ce9-23d635e690d670bf61798e1259674b78c0ed5ba222ab6a455f329f27a758fc2d-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-166fbf4e6f8845f354611638861a2a9e1dc2654224c278e10b566f09549dae7e-ccd93feaad4c82c8c1604557340de15fda0a3c84fe83f5a4d1e12a07a77bf3f4-cf28658fa328f7f283ec4e6ccc6c48d7c2a8ddbdf5134d3eb35c9b38ce4ace44-b9d80690b3109c2aaf5ece450d62e93b37eb6ab38552089794b3bb36e36a22b3-36130a37af1b19a0dec569aa08d30b00c74c8f02b6b632999d86dea169146792-4a620da64e0c263067f0dbf6c721f5214a5ac315625a07dd98520502ddf7e22f-6ace95666f6a4ecd2b1a7fc7ae865d1a9239608bd020cb6e4b8d15233c2dd9b3', 'hip': '6.0.32830'}, 'hash': 'c4db04316e15953dda8648f5a43a3f208f2c0ba454666cc7d78e40527aab85ec'}
```

on Nvidia:
```
buck run fbcode//mode/opt scripts/nmacchioni/repros/amd_cache_key:repro
{'device': {'name': 'NVIDIA PG509-210'}, 'version': {'triton': '6de41ec76ecad84e618d692e6793a4ebe707ae68a0c033a222105daa72214d7c', 'cuda': '12.0.0'}, 'hash': 'b58d0aa37d80fc2932c1b7576ca876b77aa1258db1c14e27d1f201bd15376faf'}
```
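The structure of what the fix ensures ends up in the cache key can be sketched as below; the dict shapes mirror the Test Plan output above, but the helper itself is hypothetical:

```python
import hashlib
import json


def system_fingerprint(device_name: str, triton_version: str, toolkit: dict) -> dict:
    # Include the GPU name plus the toolkit version ({"hip": ...} on AMD,
    # {"cuda": ...} on NVIDIA) in the system info, then hash it into one
    # stable digest. Without these fields, caches produced on different
    # devices or toolkits would collide.
    info = {
        "device": {"name": device_name},
        "version": {"triton": triton_version, **toolkit},
    }
    info["hash"] = hashlib.sha256(
        json.dumps(info, sort_keys=True).encode()
    ).hexdigest()
    return info
```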

Differential Revision: D60062972

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131365
Approved by: https://github.com/eellison
2024-07-24 00:11:59 +00:00
Aaron Orenstein
5a0068cc69 [BE] mypy: disallow untyped decorators (#131428)
Untyped decorators strip the types from their decorated function so even if the underlying function is fully typed then callers to it don't get any benefit from type annotations.

Step 1 - Enable the error and override in all the offending files.

#131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428
Approved by: https://github.com/justinchuby, https://github.com/oulgen
2024-07-23 21:50:55 +00:00
Oguz Ulgen
4f0497c747 Divorce triton and pt2 remote caching (#131345)
Now that remote caching has evolved into various parts of PT2, we want to separate triton and pt2 caching as changes to one have caused SEVs to the other.

Differential Revision: D60047752

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131345
Approved by: https://github.com/aorenste
2024-07-23 16:28:12 +00:00
xinan.lin
8da19fec60 [Inductor] Support store SPIR-V binary file output from Intel Triton. (#130849)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130849
Approved by: https://github.com/peterbell10, https://github.com/EikanWang
2024-07-22 05:59:03 +00:00
Xuehai Pan
b6d477fd56 [BE][Easy][16/19] enforce style for empty lines in import segments in torch/_i*/ (#129768)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129768
Approved by: https://github.com/jansel
2024-07-20 16:20:58 +00:00
Xu Han
6e7b9ee8a0 [inductor] adapte windows file path (#130713)
This PR depends on https://github.com/pytorch/pytorch/pull/130132 landing successfully.
Detailed log: https://github.com/pytorch/pytorch/issues/124245#issuecomment-2211889758

After the file path was adapted for Windows, the first Windows inductor case ran successfully.

```python
import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(x)
    return a + b
opt_foo1 = torch.compile(foo)
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))
```

Result:
![image](https://github.com/user-attachments/assets/4944df47-e74d-476b-8eb5-1d1fd5abeb41)

Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130713
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/desertfire
2024-07-18 23:19:38 +00:00
Oguz Ulgen
442bfa7fc4 Fix mypy error (#130992)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130992
Approved by: https://github.com/izaitsevfb
2024-07-17 22:49:23 +00:00
Oguz Ulgen
a0da1265c5 Define key in codecache (#130979)
Test Plan:
```
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/dynamo:test_dynamo -- --exact 'caffe2/test/dynamo:test_dynamo - test_misc.py::InlineInbuiltNNModulesMiscTests::test_auto_functionalize_can_with_none_return_inline_inbuilt_nn_modules'
```

Differential Revision: D59875657

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130979
Approved by: https://github.com/jamesjwu
2024-07-17 22:44:50 +00:00
PyTorch MergeBot
874bbc53c9 Revert "Define key in codecache (#130979)"
This reverts commit 4112f68783.

Reverted https://github.com/pytorch/pytorch/pull/130979 on behalf of https://github.com/clee2000 due to broke lint on torch/_inductor/codecache.py https://github.com/pytorch/pytorch/actions/runs/9981737836/job/27586013811 f0faecd291 ([comment](https://github.com/pytorch/pytorch/pull/130979#issuecomment-2234392332))
2024-07-17 21:59:19 +00:00
Oguz Ulgen
4112f68783 Define key in codecache (#130979)
Test Plan:
```
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/dynamo:test_dynamo -- --exact 'caffe2/test/dynamo:test_dynamo - test_misc.py::InlineInbuiltNNModulesMiscTests::test_auto_functionalize_can_with_none_return_inline_inbuilt_nn_modules'
```

Differential Revision: D59875657

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130979
Approved by: https://github.com/jamesjwu
2024-07-17 21:19:13 +00:00
PyTorch MergeBot
41f5d5dcaf Revert "[inductor] adapte windows file path (#130713)"
This reverts commit e51e971a86.

Reverted https://github.com/pytorch/pytorch/pull/130713 on behalf of https://github.com/clee2000 due to sorry but I think it's still failing, this time on windows CUDA https://github.com/pytorch/pytorch/actions/runs/9971126834/job/27552761451 bb62e9d7c3.  It was not run on PR due to being on the periodic workflow, which isn't usually run on PRs due to capacity issues for windows CUDA machines.  I will add ciflow/periodic to the PR to ensure the test gets run ([comment](https://github.com/pytorch/pytorch/pull/130713#issuecomment-2234092078))
2024-07-17 19:37:16 +00:00
Oguz Ulgen
1e13cb2f28 Log cache state to structured logs (#130845)
https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/.tmpRm4MaD/0_0_0/fx_graph_cache_hash_4.json

Differential Revision: D59795574

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130845
Approved by: https://github.com/jamesjwu
2024-07-17 16:45:45 +00:00