Commit Graph

413 Commits

Author SHA1 Message Date
eellison
f32ab3b9e3 Migrate Inductor scheduler, dependencies, ir, and codegen/common to use OrderedSet (#130004)
Python's set is non deterministic. There is an internal failure which we recently ran into which did not consistently fail.

See, repro here: P1453035092.

Now, with these changes, it does consistently fail. In follow ups we could also consider adding a lintrule for uses of either set() or set literals.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130004
Approved by: https://github.com/oulgen
2024-08-01 04:37:15 +00:00
Sergii Dymchenko
d72e863b3e Fix lint after PR #130572 (#132316)
Fix lint after https://github.com/pytorch/pytorch/pull/130572

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132316
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/ZainRizvi
2024-07-31 20:00:31 +00:00
James Wu
f9e4d05c15 Save and run post compilation steps within FXGraphCache (#130572)
This PR mostly refactors by putting code into utils files so that they can be shared between codecache.py and compile_fx.py. Afterwards, it then changes compile_fx so that:
- When saving to FXGraphCache, we save onto the CompiledFXGraph all the necessary metadata for running post compile steps (realigning inputs, cudagraphification).
- When loading from FXGraphCache, we use the saved information directly, instead of calculating them from scratch.

What this does is make it so that `FXGraphCache.load()` is a perfect cache on compile_fx_inner, in that it **returns exactly what compile_fx_inner returns**. This also makes it possible for AOTAutogradCache, given a key to the fx graph cache and example inputs, to get back the full return value of compile_fx_inner.

## What's a post compile step?
We define a **post-compile** to be the set of actions that need to run after FXGraphCache either loads from the cache or misses and runs compilation. These steps include:
- Setting the tracing context's output strides
- Running cudagraphs if enabled
- Maybe realign inputs if cudagraphs didn't run

To run these steps, we save all the necessary metadata in CompiledFxGraph, and use them on a cache hit to reconstruct the object.

## Splitting cudagraphs work into pre/post compile
Cudagraphs does a lot of work on the input graph module to determine if cudagraphs can be enabled. This is the code that involves cudagraph_tests and stack traces. This will work in a world where we have access to the input graph module, but with AOTAutograd warm start, we won't have access to that information anymore. Therefore we can split cudagraphs work into two parts: on a cache miss (and therefore a full compile), we do the cudagraphs testing work, and save cudagraph_fail_reasons into the cache. Then on a cache hit, we know whether or not we can run cudagraphs, and if we can't, we can emit the correct error messages.

Implementation notes:
- We save `fx_kwargs` directly onto the CompiledFXGraph. `fx_kwargs` is already, by definition, part of the cache key, so this is safe to do when it comes to cache correctness.
- ^ Why do we do above even though FXGraphCache.load takes fx_kwargs as an argument? Because AOTAutogradCache **doesn't** have access to fx_kwargs: they're annoyingly encoded in the functools.partial() of the fw_compiler, so *only* inductor knows about these options. They're fully captured by the AOTAutogradCache key (since every key to fx_kwargs is either a global config, or a field that's deterministic based on an input graph module), but their values are still needed to run cudagraphs/postprocessing. Therefore, it's easier/safer to store it on the cached result.
- Willing to hear other approaches here if we think saving these extra fields is not reasonable, though I can't think of another way to do this that's less complicated to explain.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130572
Approved by: https://github.com/eellison
2024-07-31 18:32:40 +00:00
PyTorch MergeBot
784a6ec5a3 Revert "Migrate Inductor scheduler, dependencies, ir, and codegen/common to use OrderedSet (#130004)"
This reverts commit 13d744464f.

Reverted https://github.com/pytorch/pytorch/pull/130004 on behalf of https://github.com/clee2000 due to broke lint [GH job link](https://github.com/pytorch/pytorch/actions/runs/10183945999/job/28170099930) [HUD commit link](13d744464f) probably a landrace, the base is 21 hours old ([comment](https://github.com/pytorch/pytorch/pull/130004#issuecomment-2260946562))
2024-07-31 16:49:21 +00:00
eellison
13d744464f Migrate Inductor scheduler, dependencies, ir, and codegen/common to use OrderedSet (#130004)
Python's set is non deterministic. There is an internal failure which we recently ran into which did not consistently fail.

See, repro here: P1453035092.

Now, with these changes, it does consistently fail. In follow ups we could also consider adding a lintrule for uses of either set() or set literals.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130004
Approved by: https://github.com/oulgen
2024-07-31 16:22:11 +00:00
Xuehai Pan
e7eeee473c [BE][Easy][14/19] enforce style for empty lines in import segments in torch/_[a-c]*/ and torch/_[e-h]*/ and torch/_[j-z]*/ (#129765)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129765
Approved by: https://github.com/ezyang
2024-07-31 10:42:50 +00:00
PyTorch MergeBot
239d4d2489 Revert "[reland][inductor] switch AotCodeCompiler to new cpp_builder (#130127)"
This reverts commit 9606d61e0c.

Reverted https://github.com/pytorch/pytorch/pull/130127 on behalf of https://github.com/ZainRizvi due to broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/130127#issuecomment-2258871791))
2024-07-30 17:39:41 +00:00
Xu Han
9606d61e0c [reland][inductor] switch AotCodeCompiler to new cpp_builder (#130127)
Changes:
1. Switch `AotCodeCompiler` to new cpp_builder.
2. Only use `deprecated_cpp_compile_command` for `fb_code`, due to I can't debug anymore on no Meta internal environment access.
3. Add `TODO` comments for further some Meta employee help on contine to do this work.
4. Due to item 3, we only remaining `deprecated_cpp_compile_command` for `fb_code` to be fix, let's remove `validate_new_cpp_commands`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130127
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-27 01:46:13 +00:00
PyTorch MergeBot
bb64702eb3 Revert "[reland][inductor] switch AotCodeCompiler to new cpp_builder (#130127)"
This reverts commit 520182dbff.

Reverted https://github.com/pytorch/pytorch/pull/130127 on behalf of https://github.com/clee2000 due to broke internal tests D60265910 ([comment](https://github.com/pytorch/pytorch/pull/130127#issuecomment-2253113689))
2024-07-26 16:40:03 +00:00
Bin Bao
1e24f7875e [AOTI] Fix ABI-compatible mode link issue for CPU (#131791)
Summary: Found this "cannot find -ltorch: No such file or directory" issue when collecting AOTI CPU perf for the dashboard. Debugging on the CI machine revealed two problems: 1) no valid VEC_ISA was picked; 2) when 1 happens, libtorch path is not specified in the linker path.

This PR fixes the second problem. A later PR will fix the first problem, but somehow finding the right VEC_ISA causes a performance regression, which needs more investigation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131791
Approved by: https://github.com/zou3519, https://github.com/chenyang78
2024-07-26 09:02:13 +00:00
Xu Han
520182dbff [reland][inductor] switch AotCodeCompiler to new cpp_builder (#130127)
Changes:
1. Switch `AotCodeCompiler` to new cpp_builder.
2. Only use `deprecated_cpp_compile_command` for `fb_code`, due to I can't debug anymore on no Meta internal environment access.
3. Add `TODO` comments for further some Meta employee help on contine to do this work.
4. Due to item 3, we only remaining `deprecated_cpp_compile_command` for `fb_code` to be fix, let's remove `validate_new_cpp_commands`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130127
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-25 21:45:40 +00:00
PyTorch MergeBot
fe2e6f0c51 Revert "[reland][inductor] switch AotCodeCompiler to new cpp_builder (#130127)"
This reverts commit dfc9bfc883.

Reverted https://github.com/pytorch/pytorch/pull/130127 on behalf of https://github.com/atalman due to Breask CI test_dataloader.py::TestDataLoader::test_segfault [GH job link](https://github.com/pytorch/pytorch/actions/runs/10099725941/job/27930133346) [HUD commit link](2c1851f04e) ([comment](https://github.com/pytorch/pytorch/pull/130127#issuecomment-2251360224))
2024-07-25 20:44:04 +00:00
Xu Han
dfc9bfc883 [reland][inductor] switch AotCodeCompiler to new cpp_builder (#130127)
Changes:
1. Switch `AotCodeCompiler` to new cpp_builder.
2. Only use `deprecated_cpp_compile_command` for `fb_code`, due to I can't debug anymore on no Meta internal environment access.
3. Add `TODO` comments for further some Meta employee help on contine to do this work.
4. Due to item 3, we only remaining `deprecated_cpp_compile_command` for `fb_code` to be fix, let's remove `validate_new_cpp_commands`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130127
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-25 18:34:08 +00:00
Yunqiu Guo
059f9fb30b [BE][inductor] Type annotate codecache.py and config.py (#131427)
As title.

Checked/ Referred to the raw json file for runtime types . (and tried to cover all the missing annotations listed in the .json) this time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131427
Approved by: https://github.com/eellison, https://github.com/oulgen
2024-07-25 05:54:38 +00:00
angelayi
b90aa18569 [aoti] Add initial custom op support (#127034)
Re-land of https://github.com/pytorch/pytorch/pull/125242

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127034
Approved by: https://github.com/malfet
2024-07-24 20:29:55 +00:00
Xu Han
72d17d95d7 [inductor] Enable dynamo for Windows. RC1 (#131286)
Changes:
1. Enable Windows in `check_if_inductor_supported`.
2. Disable Windows in `AotCodeCompiler`.
3. Force Windows inductor to `c++20` to support `std::enable_if_t`.
4. Disable `test_x86inductor_quantizer` UT on `Windows` temporary, It still some issue need to be fix: https://github.com/pytorch/pytorch/pull/131308 .

Based on this PR, I have run first model `resnet18` on Windows inductor successful.
<img width="1036" alt="image" src="https://github.com/user-attachments/assets/2642bda1-1845-417a-aaba-39bdf22e65d6">

TODO:
1. Upgrade pytorch Windows build to `c++20`.
2. Fix and re-enable `test_x86inductor_quantizer` UT on `Windows`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131286
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-24 15:26:55 +00:00
Aaron Orenstein
0e780a7d69 [BE] Remove some mypy allow-untyped-decorators that are no longer needed (#131564)
See #131429
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131564
Approved by: https://github.com/oulgen
2024-07-24 02:00:08 +00:00
Nicolas Macchioni
2cf220956a [inductor] fix CacheBase.get_system on AMD (#131365)
Summary: CacheBase.get_system on AMD is missing device name and hip version, fix that

Test Plan:
on AMD:
```
buck run fbcode//mode/opt-amd-gpu scripts/nmacchioni/repros/amd_cache_key:repro
{'device': {'name': 'gfx942:sramecc+:xnack-'}, 'version': {'triton': '3.0.006965bceb379c60d8184a4166f502457952938167bfb69592ebf48abebfb0ce9-4856d26164925fd955c779d8f67ecf47cc5754052b008714b3a580d708b13dd8-06965bceb379c60d8184a4166f502457952938167bfb69592ebf48abebfb0ce9-23d635e690d670bf61798e1259674b78c0ed5ba222ab6a455f329f27a758fc2d-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-166fbf4e6f8845f354611638861a2a9e1dc2654224c278e10b566f09549dae7e-ccd93feaad4c82c8c1604557340de15fda0a3c84fe83f5a4d1e12a07a77bf3f4-cf28658fa328f7f283ec4e6ccc6c48d7c2a8ddbdf5134d3eb35c9b38ce4ace44-b9d80690b3109c2aaf5ece450d62e93b37eb6ab38552089794b3bb36e36a22b3-36130a37af1b19a0dec569aa08d30b00c74c8f02b6b632999d86dea169146792-4a620da64e0c263067f0dbf6c721f5214a5ac315625a07dd98520502ddf7e22f-6ace95666f6a4ecd2b1a7fc7ae865d1a9239608bd020cb6e4b8d15233c2dd9b3', 'hip': '6.0.32830'}, 'hash': 'c4db04316e15953dda8648f5a43a3f208f2c0ba454666cc7d78e40527aab85ec'}
```

on Nvidia:
```
buck run fbcode//mode/opt scripts/nmacchioni/repros/amd_cache_key:repro
{'device': {'name': 'NVIDIA PG509-210'}, 'version': {'triton': '6de41ec76ecad84e618d692e6793a4ebe707ae68a0c033a222105daa72214d7c', 'cuda': '12.0.0'}, 'hash': 'b58d0aa37d80fc2932c1b7576ca876b77aa1258db1c14e27d1f201bd15376faf'}
```

Differential Revision: D60062972

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131365
Approved by: https://github.com/eellison
2024-07-24 00:11:59 +00:00
Aaron Orenstein
5a0068cc69 [BE] mypy: disallow untyped decorators (#131428)
Untyped decorators strip the types from their decorated function so even if the underlying function is fully typed then callers to it don't get any benefit from type annotations.

Step 1 - Enable the error and override in all the offending files.

#131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428
Approved by: https://github.com/justinchuby, https://github.com/oulgen
2024-07-23 21:50:55 +00:00
Oguz Ulgen
4f0497c747 Divorce triton and pt2 remote caching (#131345)
Now that remote caching has evolved into various parts of PT2, we want to separate triton and pt2 caching as changes to one have caused SEVs to the other.

Differential Revision: D60047752

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131345
Approved by: https://github.com/aorenste
2024-07-23 16:28:12 +00:00
xinan.lin
8da19fec60 [Inductor] Support store SPIR-V binary file output from Intel Triton. (#130849)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130849
Approved by: https://github.com/peterbell10, https://github.com/EikanWang
2024-07-22 05:59:03 +00:00
Xuehai Pan
b6d477fd56 [BE][Easy][16/19] enforce style for empty lines in import segments in torch/_i*/ (#129768)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129768
Approved by: https://github.com/jansel
2024-07-20 16:20:58 +00:00
Xu Han
6e7b9ee8a0 [inductor] adapte windows file path (#130713)
This PR is depends on https://github.com/pytorch/pytorch/pull/130132 can be landed successful.
The detailed log: https://github.com/pytorch/pytorch/issues/124245#issuecomment-2211889758

After the file path was adapted for Windows, the first Windows inductor case was run successful.

```python
import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(x)
    return a + b
opt_foo1 = torch.compile(foo)
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))
```

Result:
![image](https://github.com/user-attachments/assets/4944df47-e74d-476b-8eb5-1d1fd5abeb41)

Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130713
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/desertfire
2024-07-18 23:19:38 +00:00
Oguz Ulgen
442bfa7fc4 Fix mypy error (#130992)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130992
Approved by: https://github.com/izaitsevfb
2024-07-17 22:49:23 +00:00
Oguz Ulgen
a0da1265c5 Define key in codecache (#130979)
Test Plan:
```
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/dynamo:test_dynamo -- --exact 'caffe2/test/dynamo:test_dynamo - test_misc.py::InlineInbuiltNNModulesMiscTests::test_auto_functionalize_can_with_none_return_inline_inbuilt_nn_modules'
```

Differential Revision: D59875657

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130979
Approved by: https://github.com/jamesjwu
2024-07-17 22:44:50 +00:00
PyTorch MergeBot
874bbc53c9 Revert "Define key in codecache (#130979)"
This reverts commit 4112f68783.

Reverted https://github.com/pytorch/pytorch/pull/130979 on behalf of https://github.com/clee2000 due to broke lint on torch/_inductor/codecache.py https://github.com/pytorch/pytorch/actions/runs/9981737836/job/27586013811 f0faecd291 ([comment](https://github.com/pytorch/pytorch/pull/130979#issuecomment-2234392332))
2024-07-17 21:59:19 +00:00
Oguz Ulgen
4112f68783 Define key in codecache (#130979)
Test Plan:
```
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/dynamo:test_dynamo -- --exact 'caffe2/test/dynamo:test_dynamo - test_misc.py::InlineInbuiltNNModulesMiscTests::test_auto_functionalize_can_with_none_return_inline_inbuilt_nn_modules'
```

Differential Revision: D59875657

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130979
Approved by: https://github.com/jamesjwu
2024-07-17 21:19:13 +00:00
PyTorch MergeBot
41f5d5dcaf Revert "[inductor] adapte windows file path (#130713)"
This reverts commit e51e971a86.

Reverted https://github.com/pytorch/pytorch/pull/130713 on behalf of https://github.com/clee2000 due to sorry but I think its still failing, this time on windows CUDA https://github.com/pytorch/pytorch/actions/runs/9971126834/job/27552761451 bb62e9d7c3.  It was not run on PR due to being on the periodic workflow, which isnt usually run on PRs due to capacity issues for windows CUDA machines.  I will add ciflow/periodic to the PR to ensure the test gets run ([comment](https://github.com/pytorch/pytorch/pull/130713#issuecomment-2234092078))
2024-07-17 19:37:16 +00:00
Oguz Ulgen
1e13cb2f28 Log cache state to structured logs (#130845)
https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/.tmpRm4MaD/0_0_0/fx_graph_cache_hash_4.json

Differential Revision: D59795574

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130845
Approved by: https://github.com/jamesjwu
2024-07-17 16:45:45 +00:00
angelayi
cbf274d4a7 [aoti] Add packaging solution (#129895)
In this PR, I added support for packaging the AOTI generated files into a zipfile, and loading it in python.

`compile_so` takes the path to the package, a device, and a desired so_path location, and compiles package into a .so, and saves to the specified location.
`load_package` takes a path to the package and device, calls _extract_so, and then creates a callable to run the compiled model.

The zipfile generated looks like the following:
```
|- version
|- archive_format
|- data
   |- aotinductor
      |- cbtnafqaqrhvwztv7xudlal4xs6sofxa5oxccyuaqtrt6aozaklx.cubin  # AOTI cuda generated cubin files
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe.cpp  # AOTI generated cpp file
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe_compile_flags  # Flags for compiling the .o
      |- c6qqtnpgwfi3dv5nb76ai773kt45ezoxfwdmd7q37lvq6fs2tnoi.o  # AOTI saved const.o
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe_linker_flags  # Flags for linking the files to form the .so
   |- constants
      |- constants.pt  # Constants saved using torch.save, can be loaded using mmap
```

The workflow is something like:
```
with torch.no_grad():
    ep = torch.export.export(
        model,
        example_inputs,
        dynamic_shapes=dynamic_shapes,
        strict=False,
    )
    gm = ep.module()
    package_path = torch._inductor.aot_compile(
        gm,
        example_inputs,
        options= {
              "aot_inductor.output_path": "my_path.pt2",  # or a directory
              "aot_inductor.package": True,
        }
    )
compiled_model = torch._inductor.package.load_package(package_path, device)
return compiled_model
```

I tried turning on loading the weights using mmap by default, but had some trouble with it, so that is just left as a todo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129895
Approved by: https://github.com/malfet
2024-07-17 13:56:58 +00:00
Xu Han
e51e971a86 [inductor] adapte windows file path (#130713)
This PR is depends on https://github.com/pytorch/pytorch/pull/130132 can be landed successful.
The detailed log: https://github.com/pytorch/pytorch/issues/124245#issuecomment-2211889758

After the file path was adapted for Windows, the first Windows inductor case was run successful.

```python
import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(x)
    return a + b
opt_foo1 = torch.compile(foo)
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))
```

Result:
![image](https://github.com/user-attachments/assets/4944df47-e74d-476b-8eb5-1d1fd5abeb41)

Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130713
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/desertfire
2024-07-17 06:36:11 +00:00
PyTorch MergeBot
5f3c356a56 Revert "[inductor] adapte windows file path (#130713)"
This reverts commit 69e9917245.

Reverted https://github.com/pytorch/pytorch/pull/130713 on behalf of https://github.com/clee2000 due to broke functorch\test_eager_transforms.py on windows https://github.com/pytorch/pytorch/actions/runs/9958208725/job/27530132704 69e9917245.  Test failure on PR is real, possibly force merged to get around lint error? ([comment](https://github.com/pytorch/pytorch/pull/130713#issuecomment-2231901793))
2024-07-16 22:07:55 +00:00
Sam Larsen
156b99cfb1 [inductor] Handle inductor counters in fx graph cache (#130635)
Summary: Similar to the handling of metrics, save inductor counter deltas in the FX graph cache entry and increment the counters appropriately on a cache hit

Test Plan: new unit test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130635
Approved by: https://github.com/eellison
2024-07-16 20:07:16 +00:00
Xu Han
69e9917245 [inductor] adapte windows file path (#130713)
This PR is depends on https://github.com/pytorch/pytorch/pull/130132 can be landed successful.
The detailed log: https://github.com/pytorch/pytorch/issues/124245#issuecomment-2211889758

After the file path was adapted for Windows, the first Windows inductor case was run successful.

```python
import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(x)
    return a + b
opt_foo1 = torch.compile(foo)
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))
```

Result:
![image](https://github.com/user-attachments/assets/4944df47-e74d-476b-8eb5-1d1fd5abeb41)

Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130713
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/desertfire
2024-07-16 13:53:39 +00:00
PyTorch MergeBot
f0d7164cb9 Revert "[inductor] switch AotCodeCompiler to new cpp_builder (#130127)"
This reverts commit 2abc7cc21b.

Reverted https://github.com/pytorch/pytorch/pull/130127 on behalf of https://github.com/izaitsevfb due to breaks meta-internal tests ([comment](https://github.com/pytorch/pytorch/pull/130127#issuecomment-2226313943))
2024-07-12 20:36:00 +00:00
Xuehai Pan
973037be6a [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199)
This PR changes the empty collection factory call to Python literals:

- `list()` -> `[]`
- `tuple()` -> `()`
- `dict()` -> `{}`

The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary:

```bash
$ python3 -m dis - <<EOS
import collections

d1 = {}
d2 = dict()

dict = collections.OrderedDict
d3 = dict()
EOS
```

```text
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (0)
              4 LOAD_CONST               1 (None)
              6 IMPORT_NAME              0 (collections)
              8 STORE_NAME               0 (collections)

  3          10 BUILD_MAP                0
             12 STORE_NAME               1 (d1)

  4          14 PUSH_NULL
             16 LOAD_NAME                2 (dict)
             18 CALL                     0
             26 STORE_NAME               3 (d2)

  6          28 LOAD_NAME                0 (collections)
             30 LOAD_ATTR                8 (OrderedDict)
             50 STORE_NAME               2 (dict)

  7          52 PUSH_NULL
             54 LOAD_NAME                2 (dict)
             56 CALL                     0
             64 STORE_NAME               5 (d3)
             66 RETURN_CONST             1 (None)
```

The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199
Approved by: https://github.com/malfet
2024-07-11 17:30:28 +00:00
James Wu
5ed72ff5f5 Reduce all tensors to their metadata in AOTAutogradCache; add tests (#128583)
This PR makes it so that all tensors are reduced to their metadata in AOTAutogradCache. Because dynamo always embeds constant tensors into the FXgraph directly, there's no risk of a constant tensor whose values are semantically important being lost here. AOTAutograd itself may take a constant tensor and set it as an attribute on an FXGraph for inductor, but Dynamo never does this.

One other thing that this diff does is add `[pickler.fast](https://docs.python.org/3/library/pickle.html#pickle.Pickler.fast)` to our pickling algorithm for cache key generation. Pickle will often memoize/intern strings when pickling, leading to false cache misses due to inconsistent memoization. Turning on pickler.fast removes this behavior.

Technically `fast` is a "deprecated" feature according to python docs. But it's still supported in py3.8-3.12, and if it ever is removed, the only downside will just be a few more cache misses, so I think it's worth just adding here (and removing later as needed)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128583
Approved by: https://github.com/oulgen
ghstack dependencies: #128335
2024-07-11 15:39:09 +00:00
Xu Han
79c41bb58a [inductor] switch CppCodeCache to new cpp_builder. (#130132)
Changes:
1. switch CppCodeCache to new cpp_builder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130132
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-11 07:03:43 +00:00
Xu Han
2abc7cc21b [inductor] switch AotCodeCompiler to new cpp_builder (#130127)
Changes:
1. Switch `AotCodeCompiler` to new cpp_builder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130127
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-10 22:28:29 +00:00
Xu Han
e235db98c9 [Inductor] Add aot_mode UT to new cpp_builder. (#130105)
Changes:
1. Add `aot_mode` parameter to `validate_new_cpp_commands` UT.
2. Switch AotCodeCompiler vec isa command gen to new cpp_builder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130105
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-09 04:08:35 +00:00
PyTorch MergeBot
f9bb258892 Revert "[Inductor] Add aot_mode UT to new cpp_builder. (#130105)"
This reverts commit 21eeedb455.

Reverted https://github.com/pytorch/pytorch/pull/130105 on behalf of https://github.com/izaitsevfb due to Breaks 46 tests internally at meta with: OSError: CUDA_HOME environment variable is not set ([comment](https://github.com/pytorch/pytorch/pull/130105#issuecomment-2215392198))
2024-07-08 21:40:03 +00:00
PyTorch MergeBot
5e467604c3 Revert "[inductor] switch AotCodeCompiler to new cpp_builder (#130127)"
This reverts commit dc5f37193f.

Reverted https://github.com/pytorch/pytorch/pull/130127 on behalf of https://github.com/izaitsevfb due to Depends on #130105 which has to be reverted ([comment](https://github.com/pytorch/pytorch/pull/130127#issuecomment-2215355259))
2024-07-08 21:25:28 +00:00
PyTorch MergeBot
09d57f577b Revert "[inductor] switch CppCodeCache to new cpp_builder. (#130132)"
This reverts commit 3957b3b349.

Reverted https://github.com/pytorch/pytorch/pull/130132 on behalf of https://github.com/izaitsevfb due to Depends on  #130105 which has to be reverted ([comment](https://github.com/pytorch/pytorch/pull/130132#issuecomment-2215352180))
2024-07-08 21:22:39 +00:00
Xu Han
3957b3b349 [inductor] switch CppCodeCache to new cpp_builder. (#130132)
Changes:
1. switch CppCodeCache to new cpp_builder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130132
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-06 18:57:44 +00:00
Xu Han
dc5f37193f [inductor] switch AotCodeCompiler to new cpp_builder (#130127)
Changes:
1. Switch `AotCodeCompiler` to new cpp_builder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130127
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-06 18:44:13 +00:00
Xu Han
01ec03bac6 [inductor] switch HalideCodeCache to new cpp_builder. (#130146)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130146
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-06 17:35:17 +00:00
James Wu
e7ab7b83bc Have torch_key hash entire torch directory (#129250)
Summary:
Title. This way, both FXGraphCache and AOTAutogradCache use the same torch_key, and we don't need to only hash specific files.

There's an argument to be made to only hash *.py and *.cpp files. Maybe we can fix the glob to do that.

We use a buck_filegroup because otherwise $SRCs gets too large. By using `$(location :torch_sources)`, we make the genrule implicitly depend on all files globbed by torch_sources.

Test Plan:
Unit tests still pass on OSS
For torch_key:

```
buck2 build caffe2:src_hash.txt -v 2 --show-output
```
See the output, then make any change to any torch file. See that the hash changes.

Reviewed By: oulgen

Differential Revision: D58875785

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129250
Approved by: https://github.com/oulgen
2024-07-05 15:37:16 +00:00
Xu Han
21eeedb455 [Inductor] Add aot_mode UT to new cpp_builder. (#130105)
Changes:
1. Add `aot_mode` parameter to `validate_new_cpp_commands` UT.
2. Switch AotCodeCompiler vec isa command gen to new cpp_builder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130105
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-04 19:08:56 +00:00
Jason Ansel
0abcca85b7 [halide-backend] Support manual schedules (#129321)
Currently using this for some by-hand hacking, but might need to implement our own scheduler later.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129321
Approved by: https://github.com/shunting314
2024-07-03 05:56:40 +00:00
Xu Han
567dd1a3ca [inductor] unificate toolchain code. (#129816)
This PR is the implemention of https://github.com/pytorch/pytorch/issues/124245#issuecomment-2197778902 plan 2, and it is continued PR to https://github.com/pytorch/pytorch/pull/129789

Changes:
1. Unificate cpp builder's toolchain code.
2. Move all build related code to `cpp_builder.py`.
3. Optimize `codecache.py`, `cpp_builder.py` and `cpu_vec_isa.py` import logical follow: https://github.com/pytorch/pytorch/issues/124245#issuecomment-2197778902

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129816
Approved by: https://github.com/jansel
2024-07-02 09:52:06 +00:00