Commit Graph

87 Commits

Author SHA1 Message Date
Shangdi Yu
d1950d4bb5 Change IR node's stack trace to be computed lazily (#160487)
Summary: When an IR node is defined via an inherited class, post_init is called once for each super().__init__() call. To avoid this duplicated work, we make the stack trace computation happen lazily.
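
A minimal sketch of the lazy-computation idea (illustrative only, not the actual Inductor code; class and attribute names are made up):

```
# Capture the raw frames cheaply, but format the stack trace string only the
# first time it is requested, so repeated __init__ calls in subclasses no
# longer pay the formatting cost.
import functools
import traceback

class IRNodeLike:
    def __init__(self):
        self._frames = traceback.extract_stack()

    @functools.cached_property
    def stack_trace(self) -> str:
        return "".join(traceback.format_list(self._frames))
```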

Test Plan:
CI

Differential Revision: D80137870

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160487
Approved by: https://github.com/angelayi
2025-08-13 21:41:25 +00:00
Sandeep Narendranath Karjala
fc80f6859e Fix collective schedule logging and runtime tests (#160260)
Summary:

- Fix collective schedule logging so that it only logs when collectives are present
- Fix the runtime estimate test to check that each op has a numeric value

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160260
Approved by: https://github.com/Skylion007
2025-08-11 20:58:52 +00:00
Sandeep Narendranath Karjala
8034b2a732 [inductor] Add TLParse artifact for logging runtime of collective and compute ops (#159730)
Summary:

- debug.py: Added a log_runtime_estimates() function that dumps runtime estimation data as a structured tlparse artifact in JSON format (see the sketch after this list)
- test_structured_trace.py: Added comprehensive test coverage for compute and collective ops
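
A rough sketch of how such a tlparse artifact is typically emitted via `torch._logging.trace_structured`; the artifact name and payload shape below are placeholders, not necessarily what `log_runtime_estimates()` uses:

```
import json
from torch._logging import trace_structured

def log_runtime_estimates_sketch(estimates):
    # estimates maps an op/kernel name to its estimated runtime, e.g. {"buf0": 0.12}
    trace_structured(
        "artifact",
        metadata_fn=lambda: {"name": "inductor_runtime_estimates", "encoding": "json"},
        payload_fn=lambda: json.dumps(estimates),
    )
```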

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159730
Approved by: https://github.com/yushangdi
ghstack dependencies: #159190
2025-08-05 22:06:32 +00:00
Sandeep Narendranath Karjala
85e74d5ace [inductor] Add logging for distributed collective ops for multi‑rank diagnostics (#159190)
This change introduces structured logging of the collective communication schedule, enabling downstream tools (e.g. TLParse) to ingest and analyze per‑rank collective‐order information for multi‑rank jobs.

- Iterates over scheduler.nodes and filters for _CollectiveKernel nodes
- Extracts each op's python_kernel_name
- Emits a structured JSON payload under the inductor_collective_schedule artifact name (see the sketch after this list)
- Dumps the full schedule list to collective_schedule.json via the PyTorch trace-structured artifact
- Added comprehensive unit tests for collective schedule tracing: test_collective_schedule_empty() and test_collective_schedule_real() verify that structured trace logging works correctly for both empty collective schedules and real collective operations (like all_reduce and wait_tensor from _c10d_functional ops).
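
A hedged sketch of the described flow; the scheduler-node attribute access below is an assumption about Inductor internals, while the `inductor_collective_schedule` artifact name comes from the bullets above:

```
import json
from torch._logging import trace_structured

def log_collective_schedule_sketch(scheduler_nodes):
    # Walk the scheduled nodes in order and keep the kernel names of collectives.
    schedule = [
        snode.node.python_kernel_name          # assumed attribute path
        for snode in scheduler_nodes
        if type(snode.node).__name__ == "_CollectiveKernel"
    ]
    trace_structured(
        "artifact",
        metadata_fn=lambda: {"name": "inductor_collective_schedule", "encoding": "json"},
        payload_fn=lambda: json.dumps(schedule),
    )
```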

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159190
Approved by: https://github.com/yushangdi, https://github.com/xmfan
2025-08-01 21:51:42 +00:00
PyTorch MergeBot
490cb3f1a4 Revert "[inductor] Add logging for distributed collective ops for multi‑rank diagnostics (#159190)"
This reverts commit bb62e1f769.

Reverted https://github.com/pytorch/pytorch/pull/159190 on behalf of https://github.com/clee2000 due to broke [GH job link](https://github.com/pytorch/pytorch/actions/runs/16658705097/job/47150840171) [HUD commit link](bb62e1f769) on mac ([comment](https://github.com/pytorch/pytorch/pull/159190#issuecomment-3141513921))
2025-07-31 22:22:13 +00:00
Sandeep Narendranath Karjala
bb62e1f769 [inductor] Add logging for distributed collective ops for multi‑rank diagnostics (#159190)
This change introduces structured logging of the collective communication schedule, enabling downstream tools (e.g. TLParse) to ingest and analyze per‑rank collective‐order information for multi‑rank jobs.

- Iterates over scheduler.nodes, filters for _CollectiveKernel nodes
- Extracts each op’s python_kernel_name
- Emits a structured JSON payload under the inductor_collective_schedule artifact name
- Dumps the full schedule list to collective_schedule.json via the PyTorch trace‑structured artifact
- Added comprehensive unit tests for collective schedule tracing: Created test_collective_schedule_empty() and test_collective_schedule_real() tests to verify structured trace logging works correctly for both empty collective schedules and real collective operations (like all_reduce and wait_tensor from _c10d_functional ops).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159190
Approved by: https://github.com/yushangdi, https://github.com/xmfan
2025-07-31 19:58:07 +00:00
Shangdi Yu
fd2c64e286 Fix duplicated sources in inductor provenance tracking (#159484)
Summary:

The `replace_hook` is called once for each user of the replaced node. This fix avoids adding duplicated node sources.

This also means that if there are two nested passes like:

```
with GraphTransformObserver(gm, "outer"):
      with GraphTransformObserver(gm, "inner"):
              .....
```

We'll only see the outer pass's pass name recorded for the replaced node in the "from_node" node meta. I think this is fine. In practice, the outer pass usually contains a more meaningful name, e.g. `decompose_auto_functionalized`, and the inner pass name is just a default pass name like `pattern_matcher`.
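
A minimal sketch of the deduplication idea (helper and key names are illustrative, not the actual hook code):

```
def record_node_source(node_meta: dict, source: dict) -> None:
    # replace_hook may fire once per user of the replaced node; only record a
    # given provenance source the first time we see it.
    sources = node_meta.setdefault("from_node", [])
    if source not in sources:
        sources.append(source)
```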

Test Plan:
```
buck2 run @mode/dev-nosan fbcode//caffe2/test:fx -- -r test_graph_transform_observer_replace
```

Differential Revision: D79203058

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159484
Approved by: https://github.com/angelayi
2025-07-30 23:03:11 +00:00
Shangdi Yu
1e86fa2e5b Add stack trace to Inductor IR nodes if inductor.config.trace.provenance_tracing=True (#158576)
Summary:
- Split `create_mapping` into `create_mapping_pre_post_grad_nodes` and `create_node_mapping_kernel_to_post_grad`
- Store a mapping from pre_grad graph node names to stack traces in `_inductor_pre_grad_node_stack_trace`
- Add `stack_traces` member to ir.Node and add it to the string representation of ir.Node
- When we create an IR node, if `inductor.config.trace.provenance_tracing=True`, we populate `stack_traces` from `origins` (sketched below). The nodes in `origins` are post-grad graph nodes. If a node has `node.stack_trace`, we store that stack trace directly. This is particularly important for backward graph nodes because they don't have a mapping to pre-grad graph nodes. If a node doesn't have `.stack_trace` (such as `linear` -> `addmm` nodes), we use the stack trace of the pre-grad graph nodes that it maps to.
  - A post-grad graph node might not have a stack trace if it corresponds to multiple pre-grad graph nodes, e.g. [GroupLinearFusion](a00442421a/torch/_inductor/fx_passes/group_batch_fusion.py (L299))
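
A rough sketch of the fallback logic described above; all names here are illustrative:

```
def collect_stack_traces(origins, pre_grad_stack_traces, post_to_pre_map):
    # origins: post-grad FX nodes that produced this IR node
    # pre_grad_stack_traces: {pre-grad node name -> stack trace string}
    # post_to_pre_map: {post-grad node name -> [pre-grad node names]}
    traces = set()
    for node in origins:
        if getattr(node, "stack_trace", None):
            traces.add(node.stack_trace)          # backward graph nodes land here
        else:
            # e.g. linear -> addmm: fall back to the mapped pre-grad nodes
            for pre_name in post_to_pre_map.get(node.name, []):
                if pre_name in pre_grad_stack_traces:
                    traces.add(pre_grad_stack_traces[pre_name])
    return traces
```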

Example:

```
scheduling ExternKernelOut(
  python_kernel_name='extern_kernels.mm',
  name=buf0,
  layout=FixedLayout('cuda:0', torch.float32, size=[8, 16], stride=[16, 1]),
  inputs=[InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[8, 10], stride=[10, 1])), ReinterpretView(
    StorageBox(
      ConstantBuffer(name='fc1_weight', layout=FixedLayout('cuda:0', torch.float32, size=[16, 10], stride=[10, 1]))
    ),
    FixedLayout('cuda:0', torch.float32, size=[10, 16], stride=[1, 10]),
    origins=OrderedSet([mm_default_1]),
    stack_traces = {,
    File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/7b4b7a52e15abb17/scripts/shangdiy/__aot__/aot#link-tree/scripts/shangdiy/aot.py", line 29, in forward,
        x = self.fc1(x),
      File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/7b4b7a52e15abb17/scripts/shangdiy/__aot__/aot#link-tree/torch/nn/modules/linear.py", line 125, in forward,
        return F.linear(input, self.weight, self.bias),
    }
  )],
  constant_args=(),
  kwargs={},
  output_view=None,
  python_kernel_name=extern_kernels.mm,
  cpp_kernel_name=at::mm_out,
  ordered_kwargs_for_cpp_kernel=(),
  op_overload=None,
  arg_properties=[{}, {}],
  allarg_properties={},
  kwarg_properties=None,
  unbacked_bindings={},
  mutation_outputs=[],
  origin_node=mm_default_1,
  origins=OrderedSet([mm_default_1]),
  stack_traces = {,
  File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/7b4b7a52e15abb17/scripts/shangdiy/__aot__/aot#link-tree/scripts/shangdiy/aot.py", line 29, in forward,
      x = self.fc1(x),
    File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/7b4b7a52e15abb17/scripts/shangdiy/__aot__/aot#link-tree/torch/nn/modules/linear.py", line 125, in forward,
      return F.linear(input, self.weight, self.bias),
  }
)
```

Test Plan:
```
buck2 run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing
```

Differential Revision: D78365534

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158576
Approved by: https://github.com/angelayi
2025-07-18 04:05:17 +00:00
Shangdi Yu
82a1ee1135 Refactor Provenance Tracking (#158399)
Summary:
As inductor provenance tracking is getting more use cases, we want to separate the inductor provenance tracking guard flag from the general `trace.enabled`, so we can enable provenance tracking without all the overhead of `trace.enabled`.

- Change the guard flag from `trace.enabled` to `trace.provenance_tracking`. It is turned on by either `TORCH_COMPILE_DEBUG=1` or `INDUCTOR_PROVENANCE=1`.
- Move the provenance tracking logic and variables out of DebugContext, because DebugContext is only enabled with `trace.enabled`. Since the variables are now global variables, add a `reset_provenance_globals()` context manager to reset them for each `compile_fx()` call (sketched after this list).
- Move `set_kernel_post_grad_provenance_tracing` from `util.py` to `debug.py`, so all provenance-related logic is now in `debug.py`.
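
A sketch of the reset idea; the variable names here are illustrative, not the actual globals:

```
import contextlib

_pre_grad_node_stack_trace: dict = {}
_kernel_to_post_grad_nodes: dict = {}

@contextlib.contextmanager
def reset_provenance_globals():
    # Clear module-level provenance state so each compile_fx() call starts fresh.
    _pre_grad_node_stack_trace.clear()
    _kernel_to_post_grad_nodes.clear()
    try:
        yield
    finally:
        _pre_grad_node_stack_trace.clear()
        _kernel_to_post_grad_nodes.clear()
```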

In the future, if we want to enable it further, we can change the provenance tracking flag to be enabled when `TORCH_TRACE` is set. I think we should do that in a separate PR, so it's easier to revert if this flag change creates any problem.

See more motivation in internal Diff

Test Plan:
```
buck2 run mode/dev-nosan fbcode//caffe2/test:fx -- -r test_graph_transform_observer
buck run mode/dev-nosan  fbcode//caffe2/test:fx -- -r graph_provenance
buck2 run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing
```

Differential Revision: D78287976

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158399
Approved by: https://github.com/angelayi
2025-07-17 00:23:00 +00:00
henrylhtsang
6c0b42fd2f [inductor][cutlass backend] Log prescreening elapsed time (#155508)
Differential Revision: [D76311352](https://our.internmc.facebook.com/intern/diff/D76311352/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155508
Approved by: https://github.com/jingsh
2025-06-12 16:48:52 +00:00
Oguz Ulgen
d1947a8707 Migrate from lru_cache to cache (#155613)
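For reference, `functools.cache` (Python 3.9+) is equivalent to `functools.lru_cache(maxsize=None)`, so the migration is mechanical:

```
from functools import cache

@cache
def normalized(name: str) -> str:
    # Same behavior as @lru_cache(maxsize=None), just shorter.
    return name.strip().lower()
```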
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155613
Approved by: https://github.com/ezyang
ghstack dependencies: #155612
2025-06-11 19:44:18 +00:00
Rachel Guo
cad0727fe1 Rename the provenance tracing artifact name for kernel <-> post_grad nodes mapping (#154046)
Summary:
Context:

Recently we've added support for a couple more kernel types besides Inductor-generated Triton kernels, such as CPU C++ kernels and extern kernels. The name that appears in the tlparse chrome link can therefore be confusing to users.

Rename `inductor_triton_kernel_to_post_grad_nodes.json` to `inductor_generated_kernel_to_post_grad_nodes.json`.

Test Plan: CI

Differential Revision: D75159042

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154046
Approved by: https://github.com/yushangdi
2025-05-22 19:20:56 +00:00
PyTorch MergeBot
3443627e07 Revert "[BE]: Enable RUFF TRY400 rule - log.exception (#153473)"
This reverts commit 4f4ecc583e.

Reverted https://github.com/pytorch/pytorch/pull/153473 on behalf of https://github.com/jeanschmidt due to seems to have broken internal signals, @albanD may I count on you to help the author merge his PR? D74837988 ([comment](https://github.com/pytorch/pytorch/pull/153473#issuecomment-2886017075))
2025-05-16 08:29:26 +00:00
Aaron Gokaslan
4f4ecc583e [BE]: Enable RUFF TRY400 rule - log.exception (#153473)
Change logging.error to logging.exception to log additional information when relevant. A few places have slipped logging.error calls into try/except blocks since I last did a cleanup here, and the rule is now stabilized, so I am enabling it codebase-wide. I have NOQA'd much of our custom exception stack-trace handling for RPC calls and distributed, and tried to fix a few cases based on whether we immediately re-raised the exception or didn't print any exception information where it could be useful.
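
A small example of what the TRY400 rule prefers inside an `except` block:

```
import logging

log = logging.getLogger(__name__)

def risky() -> None:
    raise ValueError("boom")

try:
    risky()
except Exception:
    # log.exception logs at ERROR level and automatically appends the traceback,
    # unlike log.error, which drops it unless exc_info=True is passed.
    log.exception("risky() failed")
```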

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153473
Approved by: https://github.com/albanD, https://github.com/cyyever
2025-05-15 13:36:59 +00:00
Vlad K
6a84fe65ec Fix code portability when looking for Dot (#153259)
When trying to plot a trace graph, Inductor checks if "dot" is installed. Currently, the code runs a "which dot" command.

By default, Windows doesn't have the "which" command. This patch replaces it with a more portable alternative.
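
A portable way to do such a check is `shutil.which`, shown here as an illustration of the idea; the exact replacement used in the patch may differ:

```
import shutil

# Works on Windows too, unlike shelling out to `which dot`.
has_dot = shutil.which("dot") is not None
```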

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153259
Approved by: https://github.com/Skylion007
2025-05-10 16:12:44 +00:00
Benjamin Glass
01cbf5a30a [AOTInductor] Add wrapper and kernel code to debug code logging (#153181)
This is a simple PR to make the AOTInductor wrapper and kernel code get output by `TORCH_COMPILE_DEBUG=1`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153181
Approved by: https://github.com/desertfire
2025-05-10 15:31:18 +00:00
eellison
2295efa1b3 Fix only logging ir_post_fusion with torch_compile_debug enabled (#148499)
Because we were invoking the logs through `V.debug`, they did not run unless TORCH_COMPILE_DEBUG was set. This is because of some magic in the debug [getattr](d789c22712/torch/_inductor/debug.py (L468-L480)).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148499
Approved by: https://github.com/shunting314
2025-03-05 05:35:09 +00:00
Xuehai Pan
1cb4e2df65 [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144550
Approved by: https://github.com/jansel
2025-02-28 13:33:19 +00:00
Riley Dulin
20295c017e Fix import of getArtifactLogger for ir_pre_fusion and ir_post_fusion (#147560)
Fixes #147002

There was an issue with the previous PR https://github.com/pytorch/pytorch/pull/147248 that didn't show up in CI:
a logging import in torch/_inductor/debug.py was not complete before the module was imported.
This only happened if someone imported the file directly without doing any other imports first.

Also set to off_by_default by request to reduce log spew.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147560
Approved by: https://github.com/Skylion007
2025-02-25 03:36:08 +00:00
Riley Dulin
93316cfe94 Move ir_pre_fusion.txt and ir_post_fusion.txt to TORCH_LOGS (#147248)
Fixes #147002

Moves ir_{pre, post}_fusion.txt to be controlled by TORCH_LOGS instead of TORCH_COMPILE_DEBUG.
Updated tests of these logs as well.
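
Assuming the artifact names follow the file names, enabling these dumps would look roughly like:

```
TORCH_LOGS="ir_pre_fusion,ir_post_fusion" python train.py
```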

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147248
Approved by: https://github.com/eellison
2025-02-20 00:26:17 +00:00
Shangdi Yu
a4e4368157 add node mapping processing (#146103)
Summary:
Add `node_mapping = create_node_mapping(pre_grad_graph_id, inductor_post_to_pre_grad_nodes, debug_info)`, to produce a `inductor_provenance_tracking_node_mappings.json` file. This file will be used by the provenance tracking highlighter tool to create provenance visualization.

The `inductor_triton_kernel_to_post_grad_nodes.json` and `inductor_provenance_tracking_node_mappings.json` files are not dumped if they are both empty, so they were removed from some of the `test_structured_trace` tests.

Test Plan:
CI
```
buck run mode/dev-nosan  fbcode//caffe2/test:fx -- -r graph_provenance

buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing

python test/dynamo/test_structured_trace.py
```

Differential Revision: D68190173

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146103
Approved by: https://github.com/chenyang78
2025-02-01 08:29:29 +00:00
shangdiy
6bd19e65b1 add inductor_triton_kernel_mapping_post_grad.json to tlparse (#145954)
Landing D67612181 here. The original exported PR somehow fails OSS CI, but this one doesn't (though the PR content is the same).

Add debug trace artifact to inductor_triton_kernel_mapping_post_grad.json (debug artifact for provenance tracking) to tlparse.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145954
Approved by: https://github.com/YUNQIUGUO
2025-01-30 06:18:48 +00:00
Randolf Scholz
835e770bad Use typing.IO[bytes] instead of io.BytesIO in annotations (#144994)
Fixes #144976

Using approach ① `IO[bytes]`, but could also try a protocol.

## Notes:

- moved `torch.serialization.FILE_LIKE` to `torch.types.FileLike` (roughly illustrated after this list)
- Use `FileLike` annotation where it makes sense
- made sure those functions also support `os.PathLike`
- Replaced `isinstance(x, io.BytesIO)` with `isinstance(x, (io.IOBase, IO))` where appropriate.
- Replaced `BinaryIO` with `IO[bytes]` (the two ABCs are almost identical, the only difference is that `BinaryIO` allows `bytearray` input to `write`, whereas `IO[bytes]` only `bytes`)
- needed to make `torch.serialization._opener` generic to avoid LSP violations.
- skipped `torch/onnx/verification` for now (functions use `BytesIO.getvalue`, which is not part of the `IO[bytes]` ABC, but this seems somewhat redundant anyway, as e.g. `onnx.load` supports `str | PathLike[str] | IO[bytes]` directly...)
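
Roughly what such a `FileLike` alias covers (an illustrative approximation, not the exact definition in `torch.types`):

```
import os
from typing import IO, Union

FileLike = Union[str, os.PathLike, IO[bytes]]

def save_to(f: FileLike) -> None:
    ...  # accepts a path string, a PathLike, or a binary file object
```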

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144994
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2025-01-27 18:08:07 +00:00
Shangdi Yu
4cc5e880f9 Add accuracy issue support in AOTI Minifier (#145539)
Summary:

Add three more repro levels for AOTI minifier (level 2 already exists). They are the same as the existing dynamo minifier repro levels.

Now AOTI minifier can minify and repro programs that have numerical accuracy issues as well.

1: Dumps the original graph out to repro.py if compilation fails.
2: Dumps a minifier_launcher.py if AOTI fails.
3: Always dumps a minifier_launcher.py. Good for segfaults.
4: Dumps a minifier_launcher.py if the accuracy check fails.

Refactored the AOTI minifier unit tests to be cleaner and to better reuse the existing minifier testing code. We no longer need to manually patch {"aot_inductor.dump_aoti_minifier": True} into each test; this config is generated in the test code.
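
A hypothetical usage sketch based on the config key mentioned above; the `repro_level` knob is assumed to mirror the dynamo minifier's and may be spelled differently:

```
import torch

torch._inductor.config.aot_inductor.dump_aoti_minifier = True
torch._dynamo.config.repro_level = 4  # dump minifier_launcher.py on accuracy failure
```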

Differential Revision: D68294638

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145539
Approved by: https://github.com/desertfire
2025-01-24 23:07:19 +00:00
Aaron Orenstein
893ca1dfe1 PEP585 update - torch/_inductor/[_-i]* (#145137)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145137
Approved by: https://github.com/bobrenjc93
2025-01-19 01:22:47 +00:00
Rachel Guo
9275091d6e [provenance_tracking] Dump inductor_triton_kernel_to_post_grad_nodes.json info in debug_trace (#143055)
Summary:
This diff mainly adds code changes to dump the `inductor_triton_kernel_to_post_grad_nodes.json` artifact, which contains mapping info from post-grad nodes to Inductor kernel code:
`{"inductor_triton_kernel_name": [post_grad_node_0, post_grad_node_1, ..., ], "..."}.`

Example paste: P1695235000 verified on the test model.  See "Test Plan":

We use this artifact to demonstrate provenance tracking in the frontend 3-tab highlighter tool:
https://github.com/YUNQIUGUO/compiler_explorer (copy/pasted the input files for demo purpose for now and will integrate with Shangdi's tool to 4-tab)

https://pxl.cl/66BzK

Note: Currently this only supports mapping for Inductor's `TritonKernel` type. TODO: add support for `ExternKernel` and other Inductor-generated kernel types.

Test Plan:
test_model_coverage.sh:
```
#!/bin/sh
MODEL_ENTITY_ID=644688112
SNAPSHOT_ID=32
MODULE=merge

# buck2 build --show-output mode/opt -c=python.package_style=inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.split-dwarf=true -c fbcode.nvcc_arch=a100,h100 caffe2/torch/fb/model_transform/experimental/benchmark:mts_gpu_benchmark

TORCH_COMPILE_DEBUG=1 CUDA_VISIBLE_DEVICES=0 TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCH_LOGS="+inductor, schedule, fusion, output_code" TORCH_TRACE="tmp/guorachel_tt" TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 ../buck-out/v2/gen/fbcode/d29ee94b913014f1/caffe2/torch/fb/model_transform/experimental/benchmark/__mts_gpu_benchmark__/mts_gpu_benchmark.par --model-path manifold://ads_storage_fblearner/tree/user/facebook/fblearner/predictor/${MODEL_ENTITY_ID}/${SNAPSHOT_ID}/gpu_lowering/input.predictor.disagg.gpu.merge --lower-backend AOT_INDUCTOR_EP --gpu-trace --aot-inductor-config="{'max_autotune': True}" 2>&1 | tee output.txt
```
 {F1973765026}

```
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/inductor:provenance_tracing -- --exact 'caffe2/test/inductor:provenance_tracing - test_triton_kernel_post_grad_mapping_aot_inductor (caffe2.test.inductor.test_provenance_tracing.TestProvenanceTracingArtifact)'
```

```
TORCH_LOGS="+inductor, output_code" buck2 run -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=h100 @//mode/opt fbcode//caffe2/test/inductor:provenance_tracing -- -r test_triton_kernel_post_grad_mapping_aot_inductor
```

Differential Revision: D66967510

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143055
Approved by: https://github.com/chenyang78
2024-12-18 06:51:50 +00:00
Tom Ritchford
da67a6a7bb [inductor] Replace set by OrderedSet (#138466)
Uses the set_linter from https://github.com/pytorch/pytorch/pull/138454
and considerable manual editing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138466
Approved by: https://github.com/eellison
2024-12-13 16:08:45 +00:00
Tom Ritchford
dc23f1944a Remove unused Python variables in torch/[_-a]* (#133492)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492
Approved by: https://github.com/albanD
2024-12-12 17:39:14 +00:00
PyTorch MergeBot
5c97ac9721 Revert "Remove unused Python variables in torch/[_-a]* (#133492)"
This reverts commit fda975a7b3.

Reverted https://github.com/pytorch/pytorch/pull/133492 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else.  The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/133492#issuecomment-2536635516))
2024-12-11 17:29:12 +00:00
Tom Ritchford
fda975a7b3 Remove unused Python variables in torch/[_-a]* (#133492)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492
Approved by: https://github.com/albanD
2024-12-10 21:48:44 +00:00
Shangdi Yu
02c509669a Aoti minifier flatten (#141156)
Flatten the inputs to the minifier so AOTI Minifier can handle unflattened inputs and kwargs.

- flatten the inputs in the minifier (see the sketch after this list)
- changed the "load_and_run" part of the minifier verification to run on the flattened inputs
- refactored code to keep `torch._inductor.__init__.py` clean
- updated the docs
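
A sketch of the flattening idea using `torch.utils._pytree`; the minifier's actual plumbing differs:

```
import torch
import torch.utils._pytree as pytree

args = (torch.randn(2), {"scale": 3.0})
kwargs = {"flag": True}

# Normalize (args, kwargs) into a flat list of leaves plus a spec, so a
# "load_and_run" step can operate on flattened inputs and reconstruct them later.
flat_inputs, in_spec = pytree.tree_flatten((args, kwargs))
restored_args, restored_kwargs = pytree.tree_unflatten(flat_inputs, in_spec)
```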

`python test/inductor/test_minifier.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141156
Approved by: https://github.com/desertfire
2024-12-06 07:12:45 +00:00
Jason Ansel
6eca0aee76 [inductor] Refactor ir.Layout into ir.OutputSpec (#140910)
This separates the concepts of a Layout (size/stride/etc.) and an OutputSpec (which can include multiple outputs), which should make typing easier.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140910
Approved by: https://github.com/ezyang
ghstack dependencies: #140895
2024-11-21 20:01:57 +00:00
Aaron Orenstein
06f619d999 typing ir.py - part 2 (#131846)
See #131852

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131846
Approved by: https://github.com/eellison
ghstack dependencies: #139238
2024-11-06 00:01:15 +00:00
PyTorch MergeBot
6dada2136a Revert "Refactor FxGraphDrawer to use HTML-like labels (#137726)"
This reverts commit 1e73842029.

Reverted https://github.com/pytorch/pytorch/pull/137726 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it looks like some internal components are failing after this change and need to be updated ([comment](https://github.com/pytorch/pytorch/pull/137726#issuecomment-2455332612))
2024-11-04 17:44:44 +00:00
Gabriel Ferns
1e73842029 Refactor FxGraphDrawer to use HTML-like labels (#137726)
Fixes https://github.com/pytorch/pytorch/issues/137499
Testing: Added a new unit test to make sure that the regression case succeeds.
I'm debating whether to make the borders visible. I'm partial to no borders, but that might make it harder for some people to read.
![68a2b0e3-orig_fx_graph_diagram](https://github.com/user-attachments/assets/fbc2fd98-9e76-488e-8ebe-c64fbf206932)
Vs.
![2bfe1c4f-orig_fx_graph_diagram](https://github.com/user-attachments/assets/b6bc88ba-dda2-4cf7-84ac-a615e1e03a74)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137726
Approved by: https://github.com/eellison, https://github.com/malfet
2024-11-01 23:19:50 +00:00
eellison
d90717e4e2 Add option to save real tensors in TORCH_COMPILE_DEBUG repro (#138110)
This PR adds a utility to try to construct the corresponding real tensor values of fake tensors by checking whether their meta storage is contained in the meta converter.

Then, we are able to save real tensor values for fx_graph_runnable if `TORCH_COMPILE_DEBUG_SAVE_REAL=1` is set.
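
Usage then looks roughly like the following (the script name is a placeholder, and whether both variables are needed together follows the commit title rather than a verified recipe):

```
TORCH_COMPILE_DEBUG=1 TORCH_COMPILE_DEBUG_SAVE_REAL=1 python repro.py
```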

Differential Revision: [D64502744](https://our.internmc.facebook.com/intern/diff/D64502744)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138110
Approved by: https://github.com/ezyang
2024-10-28 16:18:22 +00:00
Xuan Zhang
c05a7adb36 [inductor][debug] fix draw_buffers (#135266)
**Before:**
![image](https://github.com/user-attachments/assets/aac756f3-1349-4647-9da3-87cf105cf647)

**After:**
<img width="791" alt="image" src="https://github.com/user-attachments/assets/d72c663c-e598-42fa-ac40-9e58956f1ec1">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135266
Approved by: https://github.com/yf225
2024-09-06 04:12:41 +00:00
Aaron Orenstein
d95aedf5fd [BE] typing for decorators - fx/_compatibility (part 1) (#134202)
Part of #134054.

This corresponds to the pytorch mypy changes from D61493706. Updating takes so
long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change.
So we're landing these 'type: ignore' comments for PyTorch in advance of them actually being needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202
Approved by: https://github.com/Skylion007
2024-08-22 17:07:33 +00:00
Edward Z. Yang
5e4d8eb831 Don't generate stack entry for DebugContext.wrap (#132802)
See https://github.com/pytorch/pytorch/pull/132073 for motivation

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132802
Approved by: https://github.com/albanD
ghstack dependencies: #132801
2024-08-07 23:59:38 +00:00
Adnan Akhundov
8927fc209f [inductor] Add type hints to functions in debug.py (#131836)
Summary: ATT

Test Plan: lintrunner

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131836
Approved by: https://github.com/eellison
2024-07-28 04:54:22 +00:00
PyTorch MergeBot
945bf78894 Revert "[BE] typing for decorators - fx/_compatibility (#131568)"
This reverts commit 193f62fde9.

Reverted https://github.com/pytorch/pytorch/pull/131568 on behalf of https://github.com/clee2000 due to same as https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359 but I clicked the wrong link by accident.  This is where it actually starts ([comment](https://github.com/pytorch/pytorch/pull/131568#issuecomment-2254330781))
2024-07-28 03:43:39 +00:00
Xuan Zhang
3d7c424a75 [inductor] update users to buffers instead of scheduler nodes (#131796)
After a recent refactoring of inductor, `.users` are now associated with buffers instead of scheduler nodes.

In `debug.py`, one such usage of `.users` was not updated accordingly; this change fixes that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131796
Approved by: https://github.com/yf225
2024-07-26 03:34:26 +00:00
Aaron Orenstein
193f62fde9 [BE] typing for decorators - fx/_compatibility (#131568)
See #131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131568
Approved by: https://github.com/justinchuby, https://github.com/oulgen, https://github.com/zou3519
2024-07-25 22:24:19 +00:00
Xuehai Pan
b6d477fd56 [BE][Easy][16/19] enforce style for empty lines in import segments in torch/_i*/ (#129768)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129768
Approved by: https://github.com/jansel
2024-07-20 16:20:58 +00:00
Aaron Orenstein
ea614fb2b1 Flip default value for mypy disallow_untyped_defs [2/11] (#127839)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127839
Approved by: https://github.com/oulgen
2024-06-08 18:23:08 +00:00
_daohang_
0a6df4fca6 delete inductor config.trace.compile_profile (#127143)

https://fb.workplace.com/groups/257735836456307/posts/687858786777341/?comment_id=687861123443774&reply_comment_id=687865486776671

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127143
Approved by: https://github.com/Chillee
2024-06-07 18:05:50 +00:00
Xuehai Pan
a28bfb5ed5 [4/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort functorch (#127125)
The `usort` config in `pyproject.toml` had no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Apart from `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127125
Approved by: https://github.com/Skylion007
ghstack dependencies: #127122, #127123, #127124
2024-05-25 22:45:38 +00:00
Jason Ansel
235f24fc66 [inductor] Add FileLock around V.debug.copy (#122665)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122665
Approved by: https://github.com/ezyang
2024-03-28 03:17:33 +00:00
eellison
1d13c82559 Precompile in background (#121997)
Precompile benchmarking choices in parallel, and then wait on those choices prior to benchmarking. In the case of deferred templates, we only wait on those choices in the scheduler, allowing multiple separate lowerings to compile in parallel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121997
Approved by: https://github.com/jansel
ghstack dependencies: #121996, #120275
2024-03-20 18:34:12 +00:00
Kai Londenberg
96eff4ef70 [inductor max autotune] Detailed autotuning result logs ( machine-readable ) (#119004)
This diff introduces a new separate logging of autotuning results,
with the intention of making the results analyzable, specifically
those for the new experimental Cutlass backend.

Results are logged as text files with one JSON document corresponding to a single benchmark result per line.
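
Illustrative of the described format only (field names are made up): one JSON document per benchmark result, one per line:

```
import json

results = [
    {"choice": "cutlass_gemm_128x128", "benchmark_ms": 0.42},
    {"choice": "triton_mm_64x64", "benchmark_ms": 0.55},
]
with open("autotune_results.jsonl", "w") as f:
    for record in results:
        f.write(json.dumps(record) + "\n")
```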

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119004
Approved by: https://github.com/jansel
ghstack dependencies: #120620
2024-02-29 18:24:13 +00:00