pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Bin Bao	b0816e4714	[inductor] Fix AOTInductor output issues (#105773 ) Summary: This is a follow-up on https://github.com/pytorch/pytorch/pull/105496. There are several issues with the previous fix, 1) It explicitly does copy for every output at the end of the main function; 2) When an output is ReinterpretView, no as_strided was generated for it; 3) There can be duplicated buffer declarations. This PR fixes by making sure can_reuse behave consistently between two AOTIndcutor passes, and thus always generate the same set of kernels. It also adds handling of ReinterpretView. Differential Revision: [D47692214](https://our.internmc.facebook.com/intern/diff/D47692214) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105773 Approved by: https://github.com/jansel	2023-07-24 01:58:49 +00:00
Aaron Gokaslan	6d43c89f37	[BE]: Update Ruff to 0.0.280 (#105724 ) Removes unusued loop values in python dictionary iteration. Automated fix from Ruff master Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724 Approved by: https://github.com/ezyang, https://github.com/janeyx99	2023-07-22 23:03:34 +00:00
Justin Chu	79c5e33349	[BE] Enable ruff's UP rules and autoformat nn/ mps/ and torch/ (#105436 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105436 Approved by: https://github.com/malfet, https://github.com/albanD	2023-07-21 07:38:46 +00:00
Shunting Zhang	1e87778552	[inductor] refactor wrapper benchmark code out of utils.py (#105584 ) Refactor wrapper benchmark out of utils.py since 1. utils.py gets too large 2. I plan to add more code to wrapper benchmark for multi-kernel. This is split out from https://github.com/pytorch/pytorch/pull/103469 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105584 Approved by: https://github.com/jansel	2023-07-21 00:01:35 +00:00
Bin Bao	71067631c2	[inductor] Fix an AOTInductor missing output issue (#105496 ) Summary: When an output buffer is reused instead of directly referring to the passed-in output, we need to explictly make a copy Pull Request resolved: https://github.com/pytorch/pytorch/pull/105496 Approved by: https://github.com/jansel	2023-07-20 08:27:31 +00:00
Bin Bao	b10de43c0a	Add aot_inductor as a test backend for benchmarking (#105221 ) Summary: Original PR at https://github.com/pytorch/pytorch/pull/104977. Landing from fbcode instead. Add an aot_inductor backend (Export+AOTInductor) in the benchmarking harness. Note it is not a dynamo backend. Moved files from torch/_inductor/aot_inductor_include to torch/csrc/inductor as a more standard way for exposing headers Created a caching function in benchmarks/dynamo/common.py for compiling, loading and caching the .so file, as a proxy for a pure C++ deployment, but easier for benchmarking. Differential Revision: D47452591 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105221 Approved by: https://github.com/jansel	2023-07-18 13:16:36 +00:00
chunyuan	1fdc88f877	Inductor cpp wrapper: fix codegen of FallbackKernel with kwargs (#104575 ) Fix cpp wrapper failure on TorchBench model `hf_Reformer` with `randn`: ``` random_rotations = torch.randn(rotations_shape, device=vectors.device, dtype=vectors.dtype) ``` For cpp wrapper, when `kwargs` is not empty, for `OpOverloadPacket` kernel, we need to know the exact overload schema to handle the `kwargs` properly when calling the cpp kernel: including finding the correct order of the kwargs and getting the default value for optional args without provided value when calling the function (`layout` in the above case). The current support in this PR is conservative and we'll extend the functionality in subsequent PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104575 Approved by: https://github.com/jgong5, https://github.com/desertfire	2023-07-15 03:33:44 +00:00
Bin Bao	528ab477ce	[reland][inductor] Register an op for mm_plus_mm (#105153 ) Summary: Reland https://github.com/pytorch/pytorch/pull/104835 after fixing internal build issues Test Plan: CI Differential Revision: D47442849 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105153 Approved by: https://github.com/clee2000	2023-07-14 14:35:29 +00:00
Kefei Lu	4328138c1e	AOT inductor: error: ‘c10::Dispatcher’ has not been declared (#104742 ) Differential Revision: D47275262 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104742 Approved by: https://github.com/desertfire	2023-07-14 01:47:52 +00:00
Catherine Lee	c36dca7bc5	Revert "[inductor] Register an op for mm_plus_mm (#104835 )" (#105150 ) This reverts commit `9c46a1620c`. Actual revert referenced in https://github.com/pytorch/pytorch/pull/105149 #104835 is causing internal builds to fail Pull Request resolved: https://github.com/pytorch/pytorch/pull/105150 Approved by: https://github.com/atalman	2023-07-13 17:13:45 +00:00
Bin Bao	9c46a1620c	[inductor] Register an op for mm_plus_mm (#104835 ) Summary: Currently the aten version of mm_plus_mm has no cpp implementation, and thus cpp_wrapper can not generate the correct cpp function call for it. Differential Revision: [D47372057](https://our.internmc.facebook.com/intern/diff/D47372057) Pull Request resolved: https://github.com/pytorch/pytorch/pull/104835 Approved by: https://github.com/jansel, https://github.com/SherlockNoMad	2023-07-12 02:34:02 +00:00
chunyuan	ba167e6578	Inductor cpp wrapper: fix codegen of ScatterFallback (#104524 ) Fix cpp wrapper failure on TorchBench model `basic_gnn_edgecnn` and `hf_Reformer` which contain scatter OP. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104524 Approved by: https://github.com/jgong5, https://github.com/jansel	2023-07-11 08:17:56 +00:00
XiaobingSuper	54f33265db	inductor(re-land): support cpu fusion path for bfloat16 amp (#104399 ) This PR is about the fusion of amp path. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104399 Approved by: https://github.com/jgong5, https://github.com/eellison	2023-07-10 00:58:04 +00:00
Bin Bao	a860b965f1	[inductor] Relax custom op schema checking for cpp_wrapper (#104349 ) Summary: Remove fallback ops whitelist because FallbackKernel.set_cpp_kernel is doing sufficient checking Differential Revision: [D47269612](https://our.internmc.facebook.com/intern/diff/D47269612) Pull Request resolved: https://github.com/pytorch/pytorch/pull/104349 Approved by: https://github.com/jgong5, https://github.com/chunyuan-w, https://github.com/jansel	2023-07-09 17:31:31 +00:00
XiaobingSuper	8ce3a18b6a	inductor: reduce complie time by reducing repr calls of quantize or Opaque tensor (#104696 ) For quantize or opaue tensor, if they are constant values, the calls of tensor ```__repr__``` will have memory copy(https://github.com/pytorch/pytorch/blob/main/torch/_tensor_str.py#L550): `db1ac4e29b/torch/_inductor/codegen/wrapper.py (L289-L292)` for CPP codegen, there have many times of initiation of ```WrapperCodeGen```: https://github.com/pytorch/pytorch/blob/main/torch/_inductor/codegen/cpp.py#L2023, which consumes much time. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104696 Approved by: https://github.com/jgong5, https://github.com/desertfire	2023-07-07 01:12:34 +00:00
Yang Chen	d2281e38ae	Adds the initial support for AOTInductor model and interface (#104202 ) This PR combines the C++ code for the AOTInductor's model and interface with Bin Bao's changes to AOTInductor codegen. It adds a number of AOTInductor C interfaces that can be used by an inference runtime. Under the hood of the interfaces, the model code generated by the AOTInductor's codegen is wrapped into a class, AOTInductorModel, which manages tensors and run the model inference. On top of AOTInductorModel, we provide one more abstract layer, AOTInductorModelContainer, which allows the user to have multiple inference runs concurrently for the same model. This PR also adjusts the compilation options for AOT codegen, particularly some fbcode-related changes such as libs to be linked and header-file search paths. Note that this is the very first version of the AOTInductor model and interface, so many features (e.g. dynamic shape) are incomplete. We will support those missing features in in future PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104202 Approved by: https://github.com/desertfire	2023-06-27 00:37:26 +00:00
Jason Ansel	8c54cd434f	[inductor] Fix allow_buffer_reuse=False (#103630 ) Fixes #103461 Pull Request resolved: https://github.com/pytorch/pytorch/pull/103630 Approved by: https://github.com/anijain2305	2023-06-15 22:50:01 +00:00
chunyuan	17217d367f	Inductor cpp wrapper: support Constant in input (#103496 ) ## Description Fix cpp wrapper for models which have constants in the graph inputs. Python wrapper directly gets the value inside the wrapper call as a global variable passed when calling: `4081e924a8/torch/_inductor/codecache.py (L757)` The constants value has been saved in `mod.__dict__` in `4081e924a8/torch/_inductor/graph.py (L874-L875)` For cpp wrapper, we need to append constants to the input args, so as to pass this python value to the `inductor_entry_cpp` function explicitly. ### Example Example of output code for dlrm in TorchBench with this fix: ```py module = CppWrapperCodeCache.load(cpp_wrapper_src, 'inductor_entry_cpp', 'cfkc6c36t7cggi6mnokrdm5jhesnunjg5xysv3o3x3vaqmzmpe6r', False) def _wrap_func(f): def g(args): args_tensor = [arg if isinstance(arg, torch.Tensor) else torch.tensor(arg) for arg in args] constants_tensor = [constant0, constant1] args_tensor.extend(constants_tensor) return f(args_tensor) return g call = _wrap_func(module.inductor_entry_cpp) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/103496 Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/desertfire	2023-06-15 05:01:25 +00:00
Animesh Jain	58d2c66a70	[activation checkpointing] Higher order functional rng op wrappers (#102934 ) Introduces two higher order operators * run_and_save_rng_state - Saves the current rng state and then runs the op. * run_with_rng_state - Runs the op with the rng state supplied as an input Ideally, we would like to use torch.compile for these operators. But currently the plan is to introduce these operators at the partitioner level, obviating the need to support them fully through the torch.compile stack. To ensure that we have good enough debugging with minifiers, we have ensure that they work with make_fx. In future, we can move on torch.compile. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102934 Approved by: https://github.com/jansel, https://github.com/zou3519	2023-06-12 22:54:17 +00:00
Shunting Zhang	daf75c0759	[AOTAutograd] compare with stride hints (#103342 ) We previously compare FakeTensor's strides with real tensor's strides. This cause dynamic dimension of FakeTensor being specialized to static int. This may cause a graph specialized for one shape being used by another shape which is wrong. Use stride hints for the comparison instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103342 Approved by: https://github.com/malfet	2023-06-10 06:51:54 +00:00
chunyuan	d61cd03b97	Inductor cpp wrapper: support ConvTranspose and fix Convolution ir (#103308 ) The changes in this PR include: - Support ConvTranspose in cpp wrapper - Fix cpp wrapper support for aten convolution when bias is `not None`: bias is in `args` instead of `kwargs` when it is `not None`. The change is covered by ConvTranspose dynamic shapes UT since we'll fall back to aten convolution in dynamic shape cases. - Fix cpp wrapper support for `inf`. This is a UT added in https://github.com/pytorch/pytorch/issues/101865. The cpp wrapper UT is covered in `test_conv2d_unary` of `test_cpp_wrapper.py`. It's in `slowTest` category and seems not captured in the CI of that PR. I will submit another PR to remove the hard-coded schema in these `ExternKernel`s. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103308 Approved by: https://github.com/jgong5, https://github.com/desertfire	2023-06-10 03:53:05 +00:00
David Berard	cde4657284	[inductor] Support complex fallback for convert_element_type, _fft_c2c, view_as_real to support GoogleFnet with cpp wrapper (#103183 ) Fixes #102752 These 3 fallback kernels appear in GoogleFnet because they take complex arguments - i.e., usually they aren't fallback kernels. To support this model, we added support for these 3 ops. Details: 1. Add these 3 ops to the allowlist. I assume that we eventually want to support all fallback kernels, but for now we just add these 3 ops to the allowlist. 2. Support complex64 in cpp codegen 3. Support List[] arguments and ScalarType arguments in cpp codegen 4. Allow alias_info in schema arguments. In the original PR supporting fallback kernels for cpp wrapper, ops with schemas with non-null alias_info for any of the arguments were disallowed; but I don't think there's any reason we need to disallow these in cpp wrapper code. Caveats: * This has not added support for complex32 or complex128 * It only works with static shapes, not dynamic shapes. It seems like the dynamic shapes issue is unrelated to cpp wrapper, since it fails in the test_torchinductor_dynamic_shapes.py test. I checked these `test_fft_.` tests, which I added in this PR, and verified that they were broken with dynamic shapes before any of the code changes from this PR. Test*: ``` benchmarks/dynamo/huggingface.py --inductor --amp --accuracy --inference --device cuda --cpp-wrapper --only GoogleFnet ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/103183 Approved by: https://github.com/desertfire, https://github.com/jgong5, https://github.com/chunyuan-w	2023-06-09 21:12:41 +00:00
Bin Bao	fbbde8df69	[inductor] fix a numel expr codegen issue (#103005 ) Summary: Correctly use pexpr or cexpr for generating symbolic expression during wrapper codegen. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103005 Approved by: https://github.com/jansel	2023-06-06 14:08:05 +00:00
Bin Bao	44fdfd3222	[inductor] Support select_algorithm with cpp_wrapper (#103003 ) Summary: This is one step towards getting cpp_wrapper work with max_autotune. Switch to use unique kernel name to cache generated cubin file. This is a copy of https://github.com/pytorch/pytorch/pull/102738 to solve a ghstack issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103003 Approved by: https://github.com/jansel	2023-06-06 14:08:05 +00:00
Shunting Zhang	86c7652503	[inductor] layout optimization for conv (#99773 ) convolution kernel with channels last runs much faster then kernel with contiguous inputs. The PR leverage that to optimize tensor layouts so we provide 'channels last' inputs to convolution. Some care need to be taken to not convert tensor layout between contiguous and channels last back and forth. Those extra copies hurt performance quite much. Latest perf number [here](https://hud.pytorch.org/benchmark/compilers?startTime=Wed%2C%2024%20May%202023%2023%3A40%3A37%20GMT&stopTime=Wed%2C%2031%20May%202023%2023%3A40%3A37%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=shunting-layout-opt-19&lCommit=baa797fc100688dfb044fbcbdebcfd2591710f78&rBranch=main&rCommit=999bae0f54108ffc5b7cf2524a02a83901554b16) - TB: 1.64x -> 1.69x - HF: 1.79x -> 1.78x (random noise) - TIMM: 1.51x -> 1.65x Right now we disable layout optimization for dynamic shape since there is perf loss in that combination. Here is a GH issue to followup: https://github.com/pytorch/pytorch/issues/102670 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99773 Approved by: https://github.com/jansel	2023-06-02 21:08:18 +00:00
chunyuan	4c9992d5ed	Inductor cpp wrapper: cache the wrapper (#89743 ) If the wrapper code has been built, directly load the .so file to avoid recompilation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89743 Approved by: https://github.com/jgong5, https://github.com/jansel	2023-06-02 00:02:39 +00:00
Bin Bao	c58264c3e9	[inductor] Support multiple symbolic numel expr in CudaWrapperCodeGen (#102093 ) Summary: Add a set to avoid generating extra `auto` when seeing the symbolic numel expression for the second time. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102093 Approved by: https://github.com/jansel	2023-05-30 16:08:00 +00:00
chunyuan	3469f100f3	support ConvUnary in Inductor cpp wrapper (#101392 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101392 Approved by: https://github.com/jgong5, https://github.com/desertfire, https://github.com/EikanWang	2023-05-26 15:52:06 +00:00
Natalia Gimelshein	68816e4fa9	Remove inplace buffers when original and mutation are both removed (#102289 ) Currently if we have an inplaced buffer that's completely internal to a fused kernel and thus doesn't need to be allocated, we are still allocating it and sending unused argument to a kernel, because our analysis for removing buffers treats it separately (assuming that either original or mutated value are still needed). This PR extends buffer removal to inplaced buffers that can be removed. Generated kernel for e.g. ln changes from ``` def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, in_ptr1, in_ptr2, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr): ``` where in_out_ptr0 is unused in the kernel to ``` def triton_(in_out_ptr1, in_ptr0, in_ptr1, in_ptr2, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr): ``` and corresponding allocation/reuse lines in the wrapper are removed. The `in_out_ptr1` is also mislabeled - it's not `in_out`, it's only written to, but this PR doesn't fix it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102289 Approved by: https://github.com/jansel	2023-05-26 02:06:36 +00:00
Bin Bao	836798e0f3	[inductor] Support precomputed_sizes in CppWrapperCodeGen (#102083 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/102083 Approved by: https://github.com/jansel, https://github.com/ngimel	2023-05-25 23:14:28 +00:00
Bin Bao	fd1d442185	[inductor] Add more dynamic shapes support for CudaWrapperCodeGen (#102019 ) Summary: Use size hint for autotuning; Fix some symbol arg codegen problem. More PRs coming for fixing unit test failures. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102019 Approved by: https://github.com/jansel	2023-05-24 13:29:47 +00:00
Bin Bao	431344f2d0	[inductor] Refactor generate_kernel_call (#102018 ) Summary: Refactor generate_kernel_call to support codegen call to Triton kernel Pull Request resolved: https://github.com/pytorch/pytorch/pull/102018 Approved by: https://github.com/jansel, https://github.com/jgong5	2023-05-23 15:54:49 +00:00
Jason Ansel	0c6f409cda	[inductor] Refactor RNG operators (#100064 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100064 Approved by: https://github.com/ngimel	2023-05-20 03:43:33 +00:00
XiaobingSuper	bb62a3734e	inductor: fix name 'inf' is not defined issue when calling external_call function (#101865 ) This PR will fix https://github.com/pytorch/pytorch/issues/101695. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101865 Approved by: https://github.com/jgong5, https://github.com/desertfire, https://github.com/Skylion007	2023-05-20 01:44:21 +00:00
PyTorch MergeBot	5f07c589b0	Revert "[inductor] Refactor RNG operators (#100064 )" This reverts commit `3bbf0683a1`. Reverted https://github.com/pytorch/pytorch/pull/100064 on behalf of https://github.com/izaitsevfb due to breaks inductor tests, see D45936056 ([comment](https://github.com/pytorch/pytorch/pull/100064#issuecomment-1552093728))	2023-05-17 21:16:41 +00:00
Michael Voznesensky	39f52c0218	Switch AOT Inductor test to export, add dynamic, fix invocation bug (#101585 ) Fixes Pull Request resolved: https://github.com/pytorch/pytorch/pull/101585 Approved by: https://github.com/ngimel, https://github.com/desertfire	2023-05-17 05:52:08 +00:00
Jason Ansel	3bbf0683a1	[inductor] Refactor RNG operators (#100064 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100064 Approved by: https://github.com/ngimel	2023-05-17 01:29:31 +00:00
Edward Z. Yang	b94f143ace	SymIntify convNd and conv_transposeNd, fix inductor symint handling (#101488 ) Fixes https://github.com/pytorch/pytorch/issues/101014 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101488 Approved by: https://github.com/ngimel	2023-05-16 17:46:52 +00:00
Jiong Gong	dfc46153a7	[inductor] add graph id prefix to inductor_wrapper_call profile info (#101350 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101350 Approved by: https://github.com/jansel	2023-05-16 06:32:00 +00:00
chunyuan	cc54da4877	Inductor cpp wrapper: fix FallbackKernel support (#100788 ) Fixes cpp wrapper support for kernels that are not exposed in `torch.ops.aten`. The current PR limits the support scope to `repeat_interleave.Tensor` and will submit follow-up PRs for more OPs. The PR maps the python schema of the kernel to the cpp schema and uses `c10::Dispatcher::singleton().findSchemaOrThrow` to find the corresponding cpp OP. The current support is limited and will raise `AssertionError` for unsupported cases. The limitation includes: - only support kernel that is not alias - only support kernel the args and returns of which don't have `alias_info` - only support output args to be a `Tensor` - only support input args to be `Tensor`, `Optional[int]`, `Optional[float]` and `Optional[bool]` Pull Request resolved: https://github.com/pytorch/pytorch/pull/100788 Approved by: https://github.com/jgong5, https://github.com/desertfire	2023-05-15 00:45:44 +00:00
Aaron Gokaslan	dfe484a3b3	[BE]: Bugfix functorch and some generic typing improvements (#101337 ) Fixes some typing bugs found with newer versions of mypy Pull Request resolved: https://github.com/pytorch/pytorch/pull/101337 Approved by: https://github.com/ezyang	2023-05-14 14:20:56 +00:00
Bin Bao	03433080e6	[inductor] Support FallbackKernel in cpp wrapper codegen (#100553 ) Summary: This works well for ops without kwargs. For ops with kwargs, we need to register ordered_kwargs_for_cpp_kernel for them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100553 Approved by: https://github.com/jansel	2023-05-07 14:33:53 +00:00
Edward Z. Yang	4101de342b	Type torch._inductor.codegen.wrapper (#100657 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100657 Approved by: https://github.com/voznesenskym	2023-05-05 16:19:23 +00:00
Edward Z. Yang	f093ee1722	Prevent Triton from getting eagerly imported when importing torch._inductor (#100374 ) This makes 'import torch._inductor.utils' go from 3.5s to 2.1s See also https://github.com/openai/triton/issues/1599 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100374 Approved by: https://github.com/voznesenskym	2023-05-02 11:44:12 +00:00
Natalia Gimelshein	ff29722364	[inductor] Prevent reusing aliased buffers if aliases still have uses (#100332 ) Fixes #100314 In dependencies, we should track not only immediately used buffer, but also aliased buffers that point to it, otherwise we can reuse and overwrite the buffer while there are still pending uses. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100332 Approved by: https://github.com/jansel	2023-05-02 04:05:16 +00:00
Edward Z. Yang	2d8deffc1e	Refactor repro/minifier into CLI; add analyze (#100226 ) This is a two part PR; I can split it if you really want me to. The first part is a refactor of the after aot repro/minifier scripts to come with a command line interface. I maintain exact BC with the previous interface (so, e.g., you still get a repro.py and a run_minifier.py that do the same thing as before), but each of these scripts also take command line arguments now which you can use to customize what actually happens. Check `run_repro` for full documentation on the arguments. The second part of this is an implementation of `analyze` subcommand on the new CLI for any repro. <img width="1277" alt="image" src="https://user-images.githubusercontent.com/13564/235045677-8545aab7-5e83-4813-bbec-47783dc60122.png"> This facility is oriented towards accuracy debugging. It does several things: 1. It will run your model twice and check for nondeterminism in inductor/float64, even on intermediate inputs (our benchmarking nondeterminism test only checks for nondeterminism on the final output). This makes localizing which operator is nondeterministic easy. 2. It will run your compiled model side-by-side with eager and float64 variants, and then report when things diverge too far from RMSE delta from float64. Importantly, it does all this without requiring every intermediate to be held in memory (which will cause an OOM on large repros, such as the one I tested this on.) Some other minor improvements: * MinifierTestBase now has an easy to comment out spot that you can use to retain the temporary directory; good for debugging * We print "running minifier" and "running repro" in MinifierTestBase to make it easier to orient where logs are coming from * same takes a `log_error` optional argument which you can use to reroute the error logs when things mismatch * counters["inductor"]["intermediate_hooks"] tracks the number of intermediate hooks we've codegen'ed; good for populate the tqdm interface * torch.fx.interpreter gets an official `boxed_run` interface which uses the boxed arguments calling convention and doesn't retain inputs unnecessarily long * torch.utils._content_store gets compute_tensor_metadata/read_tensor_metadata helper functions for computing tensor information without serializing it Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100226 Approved by: https://github.com/bertmaher, https://github.com/bdhirsh, https://github.com/anijain2305	2023-05-01 11:12:38 +00:00
Edward Z. Yang	beb7f79517	Fix intermediate hooks on inplace buffers, enable it in testing (#100322 ) Fixes https://github.com/pytorch/pytorch/issues/100312 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100322 Approved by: https://github.com/ngimel	2023-04-30 13:34:44 +00:00
Edward Z. Yang	54c0edf6da	Track exact origin_node on best effort basis (#100110 ) Currently, we track 'origins' on IR nodes so that we have some idea about what FX IR nodes contributed to any given fused kernel. However, the origins are dumped into an undifferentiated set, so if you have, e.g., multiple outputs, you cannot easily tell which output corresponds to which FX node. This PR introduce a more precise notion of tracking "origin_node" which says that the contents of this Buffer/Loop node corresponds EXACTLY to the output of a particular FX node; e.g., if you serialized each intermediate when running the generated inductor code, you could compare them with the corresponding intermediates from the original FX graph. Tracking origin_node in all cases requires quite a bit of effort, so this PR introduces the tracking on a strictly best effort basis. The logic in torch/_inductor/graph.py sets up the associations, but only when it is "obvious" which IR node should get the assignment, and there is work in torch/_inductor/ir.py for propagating this information around as necessary. Like origins, origin_node is not a true dataclass field (as this would break all existing positional arg call sites), instead, it is added post facto via `__post_init__`. At the moment, it is only valid for Buffer/Loop to have an origin_node, but we could imagine relaxing this in the future. The payoff is in torch/_inductor/codegen/wrapper.py and torch/_inductor/codegen/triton.py where we currently just print the FX node name and the tensor (but a more useful integration will be coming later.) I also introduce a debugging tool `debug_ir_traceback` which tracks tracebacks of where IRNodes were allocated, to help you understand why a node doesn't have an `origin_node`. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100110 Approved by: https://github.com/voznesenskym	2023-04-28 04:15:27 +00:00
Bin Bao	afa9d10ed6	[inductor] Support mixed device in cpp wrapper (#99950 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99950 Approved by: https://github.com/jgong5, https://github.com/jansel	2023-04-26 16:26:56 +00:00
Bin Bao	efded3f3e9	[inductor] Add cpp_wrapper support for FallbackKernel (#99887 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99887 Approved by: https://github.com/ngimel	2023-04-26 01:03:53 +00:00

1 2 3

116 Commits