116 Commits

Author SHA1 Message Date
Bin Bao
b0816e4714 [inductor] Fix AOTInductor output issues (#105773)
Summary: This is a follow-up on https://github.com/pytorch/pytorch/pull/105496. There are several issues with the previous fix,
1) It explicitly does copy for every output at the end of the main function;
2) When an output is ReinterpretView, no as_strided was generated for it;
3) There can be duplicated buffer declarations.

This PR fixes these issues by making sure can_reuse behaves consistently between the two AOTInductor passes, and thus always generates the same set of kernels. It also adds handling of ReinterpretView.

Differential Revision: [D47692214](https://our.internmc.facebook.com/intern/diff/D47692214)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105773
Approved by: https://github.com/jansel
2023-07-24 01:58:49 +00:00
Aaron Gokaslan
6d43c89f37 [BE]: Update Ruff to 0.0.280 (#105724)
Removes unused loop values in Python dictionary iteration. Automated fix from Ruff master.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724
Approved by: https://github.com/ezyang, https://github.com/janeyx99
2023-07-22 23:03:34 +00:00
Justin Chu
79c5e33349 [BE] Enable ruff's UP rules and autoformat nn/ mps/ and torch/ (#105436)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105436
Approved by: https://github.com/malfet, https://github.com/albanD
2023-07-21 07:38:46 +00:00
Shunting Zhang
1e87778552 [inductor] refactor wrapper benchmark code out of utils.py (#105584)
Refactor the wrapper benchmark code out of utils.py because:
1. utils.py has grown too large
2. I plan to add more code to the wrapper benchmark for multi-kernel.

This is split out from https://github.com/pytorch/pytorch/pull/103469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105584
Approved by: https://github.com/jansel
2023-07-21 00:01:35 +00:00
Bin Bao
71067631c2 [inductor] Fix an AOTInductor missing output issue (#105496)
Summary: When an output buffer is reused instead of directly referring to the passed-in output, we need to explicitly make a copy.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105496
Approved by: https://github.com/jansel
2023-07-20 08:27:31 +00:00
Bin Bao
b10de43c0a Add aot_inductor as a test backend for benchmarking (#105221)
Summary:
Original PR at https://github.com/pytorch/pytorch/pull/104977. Landing from fbcode instead.

Add an aot_inductor backend (Export+AOTInductor) in the benchmarking harness. Note it is not a dynamo backend.

Moved files from torch/_inductor/aot_inductor_include to torch/csrc/inductor as a more standard way for exposing headers
Created a caching function in benchmarks/dynamo/common.py for compiling, loading and caching the .so file, as a proxy for a pure C++ deployment, but easier for benchmarking.

Differential Revision: D47452591

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105221
Approved by: https://github.com/jansel
2023-07-18 13:16:36 +00:00
chunyuan
1fdc88f877 Inductor cpp wrapper: fix codegen of FallbackKernel with kwargs (#104575)
Fix cpp wrapper failure on TorchBench model `hf_Reformer` with `randn`:
```
random_rotations = torch.randn(rotations_shape, device=vectors.device, dtype=vectors.dtype)
```

For the cpp wrapper, when `kwargs` is not empty and the kernel is an `OpOverloadPacket`, we need to know the exact overload schema to handle the `kwargs` properly when calling the cpp kernel. This includes finding the correct order of the kwargs and getting the default value for optional args that are not provided when calling the function (`layout` in the above case).
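
The kwargs handling described above can be illustrated with a minimal sketch. The schema table and the `order_kwargs` helper are hypothetical stand-ins and not Inductor's actual data structures; the only point is that the overload schema fixes the argument order and supplies defaults for optional arguments the caller omitted.

```python
# Hypothetical, simplified stand-in for an overload schema: an ordered list of
# (kwarg name, default value) pairs. Real schemas also carry types.
RANDN_SCHEMA = [
    ("dtype", None),
    ("layout", None),
    ("device", None),
    ("pin_memory", None),
]


def order_kwargs(schema, kwargs):
    """Return kwarg values in schema order, filling defaults for missing ones."""
    unknown = set(kwargs) - {name for name, _ in schema}
    if unknown:
        raise KeyError(f"kwargs not in schema: {sorted(unknown)}")
    return [kwargs.get(name, default) for name, default in schema]


# `layout` is not provided by the caller, so its schema default is used.
ordered = order_kwargs(RANDN_SCHEMA, {"device": "cpu", "dtype": "float32"})
print(ordered)
```

A C++ call site takes positional arguments, so the generated code needs every slot filled in schema order, which is why the exact overload has to be known.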

The current support in this PR is conservative and we'll extend the functionality in subsequent PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104575
Approved by: https://github.com/jgong5, https://github.com/desertfire
2023-07-15 03:33:44 +00:00
Bin Bao
528ab477ce [reland][inductor] Register an op for mm_plus_mm (#105153)
Summary: Reland https://github.com/pytorch/pytorch/pull/104835 after fixing internal build issues

Test Plan: CI

Differential Revision: D47442849

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105153
Approved by: https://github.com/clee2000
2023-07-14 14:35:29 +00:00
Kefei Lu
4328138c1e AOT inductor: error: ‘c10::Dispatcher’ has not been declared (#104742)
Differential Revision: D47275262

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104742
Approved by: https://github.com/desertfire
2023-07-14 01:47:52 +00:00
Catherine Lee
c36dca7bc5 Revert "[inductor] Register an op for mm_plus_mm (#104835)" (#105150)
This reverts commit 9c46a1620c.

Actual revert referenced in https://github.com/pytorch/pytorch/pull/105149

#104835 is causing internal builds to fail

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105150
Approved by: https://github.com/atalman
2023-07-13 17:13:45 +00:00
Bin Bao
9c46a1620c [inductor] Register an op for mm_plus_mm (#104835)
Summary: Currently the aten version of mm_plus_mm has no cpp
implementation, and thus cpp_wrapper can not generate the correct cpp
function call for it.

Differential Revision: [D47372057](https://our.internmc.facebook.com/intern/diff/D47372057)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104835
Approved by: https://github.com/jansel, https://github.com/SherlockNoMad
2023-07-12 02:34:02 +00:00
chunyuan
ba167e6578 Inductor cpp wrapper: fix codegen of ScatterFallback (#104524)
Fix cpp wrapper failures on the TorchBench models `basic_gnn_edgecnn` and `hf_Reformer`, which contain the scatter OP.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104524
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-07-11 08:17:56 +00:00
XiaobingSuper
54f33265db inductor(re-land): support cpu fusion path for bfloat16 amp (#104399)
This PR is about the fusion of amp path.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104399
Approved by: https://github.com/jgong5, https://github.com/eellison
2023-07-10 00:58:04 +00:00
Bin Bao
a860b965f1 [inductor] Relax custom op schema checking for cpp_wrapper (#104349)
Summary: Remove fallback ops whitelist because FallbackKernel.set_cpp_kernel is doing sufficient checking

Differential Revision: [D47269612](https://our.internmc.facebook.com/intern/diff/D47269612)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104349
Approved by: https://github.com/jgong5, https://github.com/chunyuan-w, https://github.com/jansel
2023-07-09 17:31:31 +00:00
XiaobingSuper
8ce3a18b6a inductor: reduce compile time by reducing repr calls of quantize or Opaque tensor (#104696)
For quantized or opaque tensors, if they are constant values, calling the tensor `__repr__` incurs a memory copy (https://github.com/pytorch/pytorch/blob/main/torch/_tensor_str.py#L550):
db1ac4e29b/torch/_inductor/codegen/wrapper.py (L289-L292)

For CPP codegen, `WrapperCodeGen` is instantiated many times: https://github.com/pytorch/pytorch/blob/main/torch/_inductor/codegen/cpp.py#L2023, which consumes a lot of time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104696
Approved by: https://github.com/jgong5, https://github.com/desertfire
2023-07-07 01:12:34 +00:00
Yang Chen
d2281e38ae Adds the initial support for AOTInductor model and interface (#104202)
This PR combines the C++ code for the AOTInductor's model and interface with Bin Bao's changes to AOTInductor codegen.

It adds a number of AOTInductor C interfaces that can be used by an inference runtime. Under the hood of the interfaces, the model code generated by the AOTInductor's codegen is wrapped into a class, AOTInductorModel, which manages tensors and runs the model inference.

On top of AOTInductorModel, we provide one more abstract layer, AOTInductorModelContainer, which allows the user to have multiple inference runs concurrently for the same model.
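
One possible way to picture the container layer is a pool of model instances that concurrent callers borrow and return. The sketch below is written in Python for brevity and is purely illustrative: the class name `ModelContainer`, the pool size, and the toy model are assumptions, and the real AOTInductorModelContainer is C++ whose internals are not described here.

```python
import queue
import threading


class ModelContainer:
    """Toy container that hands out one model instance per concurrent run."""

    def __init__(self, make_model, num_models):
        self._available = queue.Queue()
        for _ in range(num_models):
            self._available.put(make_model())

    def run(self, inputs):
        model = self._available.get()  # blocks until an instance is free
        try:
            return model(inputs)
        finally:
            self._available.put(model)  # return the instance to the pool


# Hypothetical "model": doubles every input value.
container = ModelContainer(lambda: (lambda xs: [2 * x for x in xs]), num_models=2)
results = {}


def worker(i):
    results[i] = container.run([i, i + 1])


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results.items()))
```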

This PR also adjusts the compilation options for AOT codegen, particularly some fbcode-related changes such as libs to be linked and header-file search paths.

Note that this is the very first version of the AOTInductor model and interface, so many features (e.g. dynamic shape) are incomplete. We will support those missing features in future PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104202
Approved by: https://github.com/desertfire
2023-06-27 00:37:26 +00:00
Jason Ansel
8c54cd434f [inductor] Fix allow_buffer_reuse=False (#103630)
Fixes #103461

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103630
Approved by: https://github.com/anijain2305
2023-06-15 22:50:01 +00:00
chunyuan
17217d367f Inductor cpp wrapper: support Constant in input (#103496)
## Description
Fix cpp wrapper for models which have constants in the graph inputs.

Python wrapper directly gets the value inside the wrapper call as a global variable passed when calling:
4081e924a8/torch/_inductor/codecache.py (L757)
The constants value has been saved in `mod.__dict__` in
4081e924a8/torch/_inductor/graph.py (L874-L875)
For cpp wrapper, we need to append constants to the input args, so as to pass this python value to the `inductor_entry_cpp` function explicitly.

### Example
Example of output code for dlrm in TorchBench with this fix:
```py
module = CppWrapperCodeCache.load(cpp_wrapper_src, 'inductor_entry_cpp', 'cfkc6c36t7cggi6mnokrdm5jhesnunjg5xysv3o3x3vaqmzmpe6r', False)

def _wrap_func(f):
    def g(args):
        args_tensor = [arg if isinstance(arg, torch.Tensor) else torch.tensor(arg) for arg in args]
        constants_tensor = [constant0, constant1]
        args_tensor.extend(constants_tensor)

        return f(args_tensor)
    return g
call = _wrap_func(module.inductor_entry_cpp)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103496
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/desertfire
2023-06-15 05:01:25 +00:00
Animesh Jain
58d2c66a70 [activation checkpointing] Higher order functional rng op wrappers (#102934)
Introduces two higher order operators
* run_and_save_rng_state - Saves the current rng state and then runs the op.
* run_with_rng_state - Runs the op with the rng state supplied as an input

Ideally, we would like to use torch.compile for these operators. But currently the plan is to introduce these operators at the partitioner level, obviating the need to support them fully through the torch.compile stack. To ensure that we have good enough debugging with minifiers, we have ensured that they work with make_fx. In the future, we can move on to torch.compile.
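
The intended semantics of the two operators can be sketched with Python's standard `random` module standing in for the PyTorch RNG. This is only an analogy under that assumption: the function bodies below are not the actual higher order operator implementations, and unlike a production version this sketch does not restore the previous RNG state after replaying.

```python
import random


def run_and_save_rng_state(op, *args):
    """Save the current RNG state, then run op; return (state, result)."""
    state = random.getstate()
    return state, op(*args)


def run_with_rng_state(state, op, *args):
    """Run op with the RNG state supplied as an input."""
    random.setstate(state)
    return op(*args)


def noisy_scale(x):
    # Stand-in for an op that consumes randomness, such as dropout.
    return x * random.random()


state, first = run_and_save_rng_state(noisy_scale, 2.0)
replayed = run_with_rng_state(state, noisy_scale, 2.0)
print(first == replayed)
```

Replaying with the saved state is what lets a recomputed forward pass in activation checkpointing draw the same random values as the original one.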

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102934
Approved by: https://github.com/jansel, https://github.com/zou3519
2023-06-12 22:54:17 +00:00
Shunting Zhang
daf75c0759 [AOTAutograd] compare with stride hints (#103342)
We previously compared a FakeTensor's strides with the real tensor's strides. This caused dynamic dimensions of the FakeTensor to be specialized to static ints, which may cause a graph specialized for one shape to be used for another shape, which is wrong.

Use stride hints for the comparison instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103342
Approved by: https://github.com/malfet
2023-06-10 06:51:54 +00:00
chunyuan
d61cd03b97 Inductor cpp wrapper: support ConvTranspose and fix Convolution ir (#103308)
The changes in this PR include:
- Support ConvTranspose in cpp wrapper
- Fix cpp wrapper support for aten convolution when bias is `not None`: bias is in `args` instead of `kwargs` when it is `not None`. The change is covered by ConvTranspose dynamic shapes UT since we'll fall back to aten convolution in dynamic shape cases.
- Fix cpp wrapper support for `inf`. This is a UT added in https://github.com/pytorch/pytorch/issues/101865. The cpp wrapper UT is covered in `test_conv2d_unary` of `test_cpp_wrapper.py`. It is in the `slowTest` category and was apparently not exercised in the CI of that PR.

I will submit another PR to remove the hard-coded schema in these `ExternKernel`s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103308
Approved by: https://github.com/jgong5, https://github.com/desertfire
2023-06-10 03:53:05 +00:00
David Berard
cde4657284 [inductor] Support complex fallback for convert_element_type, _fft_c2c, view_as_real to support GoogleFnet with cpp wrapper (#103183)
Fixes #102752

These 3 fallback kernels appear in GoogleFnet because they take complex arguments - i.e., usually they aren't fallback kernels. To support this model, we added support for these 3 ops.

Details:
1. Add these 3 ops to the allowlist. I assume that we eventually want to support all fallback kernels, but for now we just add these 3 ops to the allowlist.
2. Support complex64 in cpp codegen
3. Support List[] arguments and ScalarType arguments in cpp codegen
4. Allow alias_info in schema arguments. In the original PR supporting fallback kernels for cpp wrapper, ops with schemas with non-null alias_info for any of the arguments were disallowed; but I don't think there's any reason we need to disallow these in cpp wrapper code.

Caveats:
* This has not added support for complex32 or complex128
* It only works with static shapes, not dynamic shapes. It seems like the dynamic shapes issue is unrelated to cpp wrapper, since it fails in the test_torchinductor_dynamic_shapes.py test. I checked these `test_fft_.*` tests, which I added in this PR, and verified that they were broken with dynamic shapes before any of the code changes from this PR.

**Test**:

```
benchmarks/dynamo/huggingface.py --inductor --amp --accuracy --inference --device cuda   --cpp-wrapper --only GoogleFnet
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103183
Approved by: https://github.com/desertfire, https://github.com/jgong5, https://github.com/chunyuan-w
2023-06-09 21:12:41 +00:00
Bin Bao
fbbde8df69 [inductor] fix a numel expr codegen issue (#103005)
Summary: Correctly use pexpr or cexpr for generating symbolic expression
during wrapper codegen.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103005
Approved by: https://github.com/jansel
2023-06-06 14:08:05 +00:00
Bin Bao
44fdfd3222 [inductor] Support select_algorithm with cpp_wrapper (#103003)
Summary: This is one step towards getting cpp_wrapper to work with max_autotune.
Switch to use unique kernel name to cache generated cubin file.

This is a copy of https://github.com/pytorch/pytorch/pull/102738 to solve a ghstack issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103003
Approved by: https://github.com/jansel
2023-06-06 14:08:05 +00:00
Shunting Zhang
86c7652503 [inductor] layout optimization for conv (#99773)
A convolution kernel with channels-last inputs runs much faster than a kernel with contiguous inputs. This PR leverages that to optimize tensor layouts so that we provide 'channels last' inputs to convolution. Some care needs to be taken not to convert tensor layouts back and forth between contiguous and channels last, since those extra copies hurt performance considerably.
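
For readers unfamiliar with the two layouts, the difference comes down to strides over the same logical NCHW shape. The helper names below are made up for illustration, and the sketch uses plain Python rather than tensors.

```python
def contiguous_strides(n, c, h, w):
    """Strides of a densely packed NCHW tensor (W varies fastest)."""
    return (c * h * w, h * w, w, 1)


def channels_last_strides(n, c, h, w):
    """Strides when memory is laid out as NHWC (C varies fastest)."""
    return (h * w * c, 1, w * c, c)


shape = (2, 3, 4, 5)  # N, C, H, W
print(contiguous_strides(*shape))
print(channels_last_strides(*shape))
```

Converting between the two layouts requires a full copy of the data, which is why repeated conversions are worth avoiding.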

Latest perf number [here](https://hud.pytorch.org/benchmark/compilers?startTime=Wed%2C%2024%20May%202023%2023%3A40%3A37%20GMT&stopTime=Wed%2C%2031%20May%202023%2023%3A40%3A37%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=shunting-layout-opt-19&lCommit=baa797fc100688dfb044fbcbdebcfd2591710f78&rBranch=main&rCommit=999bae0f54108ffc5b7cf2524a02a83901554b16)
- TB: 1.64x -> 1.69x
- HF: 1.79x -> 1.78x (random noise)
- TIMM: 1.51x -> 1.65x

Right now we disable layout optimization for dynamic shapes since there is a perf loss in that combination. Here is a GH issue to follow up: https://github.com/pytorch/pytorch/issues/102670

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99773
Approved by: https://github.com/jansel
2023-06-02 21:08:18 +00:00
chunyuan
4c9992d5ed Inductor cpp wrapper: cache the wrapper (#89743)
If the wrapper code has been built, directly load the .so file to avoid recompilation.
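
The caching idea can be sketched as follows. The hash-based file naming and the `build` callback are assumptions made for illustration; the actual cache key and directory layout used by Inductor are not shown in this commit message.

```python
import hashlib
import os
import tempfile


def cached_library_path(source, cache_dir):
    """Map wrapper source code to a deterministic .so path (hypothetical layout)."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, key + ".so")


def load_or_build(source, cache_dir, build):
    """Return (path, was_built); call build(source, path) only on a cache miss."""
    path = cached_library_path(source, cache_dir)
    if os.path.exists(path):
        return path, False
    build(source, path)
    return path, True


def fake_build(source, path):
    # Placeholder for invoking the C++ compiler; writes a dummy file instead.
    with open(path, "wb") as f:
        f.write(b"not a real shared object")


with tempfile.TemporaryDirectory() as cache_dir:
    _, built_first = load_or_build("int main() {}", cache_dir, fake_build)
    _, built_second = load_or_build("int main() {}", cache_dir, fake_build)
    print(built_first, built_second)
```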

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89743
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-06-02 00:02:39 +00:00
Bin Bao
c58264c3e9 [inductor] Support multiple symbolic numel expr in CudaWrapperCodeGen (#102093)
Summary: Add a set to avoid generating extra `auto` when seeing the
symbolic numel expression for the second time.
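
A toy version of this bookkeeping is shown below. The class and variable names are invented for illustration, and what the real codegen emits on a repeated expression is an assumption here; the sketch only shows how a set can prevent a second `auto` declaration of the same name.

```python
class NumelCodegen:
    """Toy emitter for C++ lines that bind symbolic numel expressions."""

    def __init__(self):
        self.lines = []
        self.declared = set()  # variable names already declared with `auto`

    def emit_numel(self, var_name, expr):
        if var_name in self.declared:
            # Redeclaring the same name with `auto` in one scope would not
            # compile in C++, so the existing variable is reused.
            return var_name
        self.lines.append(f"auto {var_name} = {expr};")
        self.declared.add(var_name)
        return var_name


gen = NumelCodegen()
gen.emit_numel("xnumel_0", "s0 * s1")
gen.emit_numel("xnumel_0", "s0 * s1")  # second sighting: no new declaration
print(gen.lines)
```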

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102093
Approved by: https://github.com/jansel
2023-05-30 16:08:00 +00:00
chunyuan
3469f100f3 support ConvUnary in Inductor cpp wrapper (#101392)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101392
Approved by: https://github.com/jgong5, https://github.com/desertfire, https://github.com/EikanWang
2023-05-26 15:52:06 +00:00
Natalia Gimelshein
68816e4fa9 Remove inplace buffers when original and mutation are both removed (#102289)
Currently, if we have an inplaced buffer that is completely internal to a fused kernel and thus doesn't need to be allocated, we still allocate it and send an unused argument to the kernel, because our analysis for removing buffers treats it separately (assuming that either the original or the mutated value is still needed).
This PR extends buffer removal to inplaced buffers that can be removed.

Generated kernel for e.g. ln changes from
```
def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, in_ptr1, in_ptr2, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr):
```
where in_out_ptr0 is unused in the kernel to
```
def triton_(in_out_ptr1, in_ptr0, in_ptr1, in_ptr2, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr):
```
and corresponding allocation/reuse lines in the wrapper are removed.
The `in_out_ptr1` is also mislabeled - it's not `in_out`, it's only written to, but this PR doesn't fix it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102289
Approved by: https://github.com/jansel
2023-05-26 02:06:36 +00:00
Bin Bao
836798e0f3 [inductor] Support precomputed_sizes in CppWrapperCodeGen (#102083)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102083
Approved by: https://github.com/jansel, https://github.com/ngimel
2023-05-25 23:14:28 +00:00
Bin Bao
fd1d442185 [inductor] Add more dynamic shapes support for CudaWrapperCodeGen (#102019)
Summary: Use size hint for autotuning; Fix some symbol arg codegen
problem. More PRs coming for fixing unit test failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102019
Approved by: https://github.com/jansel
2023-05-24 13:29:47 +00:00
Bin Bao
431344f2d0 [inductor] Refactor generate_kernel_call (#102018)
Summary: Refactor generate_kernel_call to support codegen call to Triton
kernel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102018
Approved by: https://github.com/jansel, https://github.com/jgong5
2023-05-23 15:54:49 +00:00
Jason Ansel
0c6f409cda [inductor] Refactor RNG operators (#100064)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100064
Approved by: https://github.com/ngimel
2023-05-20 03:43:33 +00:00
XiaobingSuper
bb62a3734e inductor: fix name 'inf' is not defined issue when calling external_call function (#101865)
This PR will fix https://github.com/pytorch/pytorch/issues/101695.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101865
Approved by: https://github.com/jgong5, https://github.com/desertfire, https://github.com/Skylion007
2023-05-20 01:44:21 +00:00
PyTorch MergeBot
5f07c589b0 Revert "[inductor] Refactor RNG operators (#100064)"
This reverts commit 3bbf0683a1.

Reverted https://github.com/pytorch/pytorch/pull/100064 on behalf of https://github.com/izaitsevfb due to breaks inductor tests, see D45936056 ([comment](https://github.com/pytorch/pytorch/pull/100064#issuecomment-1552093728))
2023-05-17 21:16:41 +00:00
Michael Voznesensky
39f52c0218 Switch AOT Inductor test to export, add dynamic, fix invocation bug (#101585)
Fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101585
Approved by: https://github.com/ngimel, https://github.com/desertfire
2023-05-17 05:52:08 +00:00
Jason Ansel
3bbf0683a1 [inductor] Refactor RNG operators (#100064)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100064
Approved by: https://github.com/ngimel
2023-05-17 01:29:31 +00:00
Edward Z. Yang
b94f143ace SymIntify convNd and conv_transposeNd, fix inductor symint handling (#101488)
Fixes https://github.com/pytorch/pytorch/issues/101014

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101488
Approved by: https://github.com/ngimel
2023-05-16 17:46:52 +00:00
Jiong Gong
dfc46153a7 [inductor] add graph id prefix to inductor_wrapper_call profile info (#101350)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101350
Approved by: https://github.com/jansel
2023-05-16 06:32:00 +00:00
chunyuan
cc54da4877 Inductor cpp wrapper: fix FallbackKernel support (#100788)
Fixes cpp wrapper support for kernels that are not exposed in `torch.ops.aten`. The current PR limits the support scope to `repeat_interleave.Tensor`; follow-up PRs will be submitted for more OPs.

The PR maps the python schema of the kernel to the cpp schema and uses `c10::Dispatcher::singleton().findSchemaOrThrow` to find the corresponding cpp OP.

The current support is limited and will raise `AssertionError` for unsupported cases.
The limitations are:
- only kernels that are not aliases are supported
- only kernels whose args and returns have no `alias_info` are supported
- output args must be a `Tensor`
- input args must be `Tensor`, `Optional[int]`, `Optional[float]` or `Optional[bool]`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100788
Approved by: https://github.com/jgong5, https://github.com/desertfire
2023-05-15 00:45:44 +00:00
Aaron Gokaslan
dfe484a3b3 [BE]: Bugfix functorch and some generic typing improvements (#101337)
Fixes some typing bugs found with newer versions of mypy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101337
Approved by: https://github.com/ezyang
2023-05-14 14:20:56 +00:00
Bin Bao
03433080e6 [inductor] Support FallbackKernel in cpp wrapper codegen (#100553)
Summary: This works well for ops without kwargs. For ops with kwargs, we
need to register ordered_kwargs_for_cpp_kernel for them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100553
Approved by: https://github.com/jansel
2023-05-07 14:33:53 +00:00
Edward Z. Yang
4101de342b Type torch._inductor.codegen.wrapper (#100657)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100657
Approved by: https://github.com/voznesenskym
2023-05-05 16:19:23 +00:00
Edward Z. Yang
f093ee1722 Prevent Triton from getting eagerly imported when importing torch._inductor (#100374)
This makes 'import torch._inductor.utils' go from 3.5s to 2.1s

See also https://github.com/openai/triton/issues/1599

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100374
Approved by: https://github.com/voznesenskym
2023-05-02 11:44:12 +00:00
Natalia Gimelshein
ff29722364 [inductor] Prevent reusing aliased buffers if aliases still have uses (#100332)
Fixes #100314
In dependencies, we should track not only the immediately used buffer but also the aliased buffers that point to it; otherwise we can reuse and overwrite the buffer while there are still pending uses.
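
The rule can be sketched with a small reuse check. The function name `is_safe_to_reuse` and the dictionaries used to model pending uses and aliases are hypothetical; they are not the scheduler's real data structures.

```python
def is_safe_to_reuse(buffer, remaining_uses, aliases):
    """A buffer is reusable only if it and all of its aliases have no uses left."""
    names = {buffer} | aliases.get(buffer, set())
    return all(remaining_uses.get(name, 0) == 0 for name in names)


# buf0 itself has no pending uses, but its view buf0_view still does.
remaining_uses = {"buf0": 0, "buf0_view": 1, "buf1": 0}
aliases = {"buf0": {"buf0_view"}}

print(is_safe_to_reuse("buf0", remaining_uses, aliases))
print(is_safe_to_reuse("buf1", remaining_uses, aliases))
```

Checking only `buf0` would wrongly allow reuse here, and a later read through `buf0_view` would then see overwritten data.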

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100332
Approved by: https://github.com/jansel
2023-05-02 04:05:16 +00:00
Edward Z. Yang
2d8deffc1e Refactor repro/minifier into CLI; add analyze (#100226)
This is a two part PR; I can split it if you really want me to.

The first part is a refactor of the after aot repro/minifier scripts to come with a command line interface. I maintain exact BC with the previous interface (so, e.g., you still get a repro.py and a run_minifier.py that do the same thing as before), but each of these scripts also takes command line arguments now which you can use to customize what actually happens. Check `run_repro` for full documentation on the arguments.

The second part of this is an implementation of `analyze` subcommand on the new CLI for any repro.

This facility is oriented towards accuracy debugging. It does several things:

1. It will run your model twice and check for nondeterminism in inductor/float64, *even* on intermediate inputs (our benchmarking nondeterminism test only checks for nondeterminism on the final output). This makes localizing which operator is nondeterministic easy.
2. It will run your compiled model side-by-side with eager and float64 variants, and then report when things diverge too far from RMSE delta from float64.

Importantly, it does all this without requiring every intermediate to be held in memory (which will cause an OOM on large repros, such as the one I tested this on.)
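
The accuracy comparison in point 2 can be pictured with a small sketch over plain Python lists. The `diverges` helper and its `multiplier` and `tol` values are illustrative assumptions, not the thresholds the `analyze` subcommand actually uses.

```python
import math


def rmse(values, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(values) != len(reference):
        raise ValueError("length mismatch")
    return math.sqrt(
        sum((v - r) ** 2 for v, r in zip(values, reference)) / len(reference)
    )


def diverges(compiled, eager, fp64, multiplier=2.0, tol=1e-6):
    """Flag compiled output whose error vs fp64 is much larger than eager's.

    multiplier and tol are illustrative values only.
    """
    return rmse(compiled, fp64) > multiplier * rmse(eager, fp64) + tol


fp64 = [1.0, 2.0, 3.0]
eager = [1.001, 2.0, 2.999]
close_result = diverges([1.001, 2.001, 3.0], eager, fp64)
far_result = diverges([1.5, 2.0, 3.0], eager, fp64)
print(close_result, far_result)
```

Using the float64 run as the reference means that ordinary low-precision noise shared by eager and compiled code is not reported as a divergence.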

Some other minor improvements:

* MinifierTestBase now has an easy to comment out spot that you can use to retain the temporary directory; good for debugging
* We print "running minifier" and "running repro" in MinifierTestBase to make it easier to orient where logs are coming from
* same takes a `log_error` optional argument which you can use to reroute the error logs when things mismatch
* counters["inductor"]["intermediate_hooks"] tracks the number of intermediate hooks we've codegen'ed; good for populating the tqdm interface
* torch.fx.interpreter gets an official `boxed_run` interface which uses the boxed arguments calling convention and doesn't retain inputs unnecessarily long
* torch.utils._content_store gets compute_tensor_metadata/read_tensor_metadata helper functions for computing tensor information without serializing it

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100226
Approved by: https://github.com/bertmaher, https://github.com/bdhirsh, https://github.com/anijain2305
2023-05-01 11:12:38 +00:00
Edward Z. Yang
beb7f79517 Fix intermediate hooks on inplace buffers, enable it in testing (#100322)
Fixes https://github.com/pytorch/pytorch/issues/100312

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100322
Approved by: https://github.com/ngimel
2023-04-30 13:34:44 +00:00
Edward Z. Yang
54c0edf6da Track exact origin_node on best effort basis (#100110)
Currently, we track 'origins' on IR nodes so that we have some idea about what FX IR nodes contributed to any given fused kernel. However, the origins are dumped into an undifferentiated set, so if you have, e.g., multiple outputs, you cannot easily tell which output corresponds to which FX node.

This PR introduces a more precise notion of tracking "origin_node" which says that the contents of this Buffer/Loop node correspond EXACTLY to the output of a particular FX node; e.g., if you serialized each intermediate when running the generated inductor code, you could compare them with the corresponding intermediates from the original FX graph.

Tracking origin_node in all cases requires quite a bit of effort, so this PR introduces the tracking on a strictly best effort basis. The logic in torch/_inductor/graph.py sets up the associations, but only when it is "obvious" which IR node should get the assignment, and there is work in torch/_inductor/ir.py for propagating this information around as necessary. Like origins, origin_node is not a true dataclass field (as this would break all existing positional arg call sites), instead, it is added post facto via `__post_init__`. At the moment, it is only valid for Buffer/Loop to have an origin_node, but we could imagine relaxing this in the future.

The payoff is in torch/_inductor/codegen/wrapper.py and torch/_inductor/codegen/triton.py where we currently just print the FX node name and the tensor (but a more useful integration will be coming later.)

I also introduce a debugging tool `debug_ir_traceback` which tracks tracebacks of where IRNodes were allocated, to help you understand why a node doesn't have an `origin_node`.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100110
Approved by: https://github.com/voznesenskym
2023-04-28 04:15:27 +00:00
Bin Bao
afa9d10ed6 [inductor] Support mixed device in cpp wrapper (#99950)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99950
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-04-26 16:26:56 +00:00
Bin Bao
efded3f3e9 [inductor] Add cpp_wrapper support for FallbackKernel (#99887)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99887
Approved by: https://github.com/ngimel
2023-04-26 01:03:53 +00:00