pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
James Wu	06773663b5	Implement an AOT precompile mode for standalone_compile (#165843 ) This PR introduces an `aot` flag to standalone_compile that uses BundledAOTAutogradCacheEntry, and then allows regional_inductor to use this so that we can start aot compiling regional compiler graphs. The diff above this will attempt to allow GraphPickler to fully serialize graphs that have regionally compiled subgraphs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165843 Approved by: https://github.com/oulgen	2025-10-21 15:02:45 +00:00
James Wu	dd3b48e85d	Fix bug with serialization after AOTAutogradCache hit (#165474 ) Fixes #165447 On AOTAutogradCache load, the serialization function we pick is just lambda: self, because the object itself is an AOTAutogradCacheEntry. However, this isn't safe, because `wrap_post_compile` will make `self` unserializable, since it needs to load triton kernels and stuff! So instead, on AOTAutogradCache load, we preserve the bytes that were used to load the object to begin with, and return that object on a call to serialize(). This effectively makes it so that we save a copy of the pre-hydrated artifact, without needing to do an eager copy until someone actually calls `serialize`. Test Plan: Run
```py
import torch


class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(2, 4)
        self.relu = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(4, 8)

    def forward(self, x):
        return self.linear2(self.relu(self.linear1(x)))


device = "cuda"
m = M().to(device)
sample_inputs = (torch.randn(2, 2, device=device),)
# Unpack the tuple: the module takes a single tensor argument.
eager_out = m(*sample_inputs)

with torch._dynamo.config.patch("enable_aot_compile", True):
    compiled_fn_path = "./m.pt"
    compiled_fn = torch.compile(
        m, fullgraph=True
    ).forward.aot_compile((sample_inputs, {}))
    compiled_fn.save_compiled_function(compiled_fn_path)
    torch._dynamo.reset()
    # Loading and calling the saved function must not trigger a recompile.
    with torch.compiler.set_stance("fail_on_recompile"):
        with open(compiled_fn_path, "rb") as f:
            loaded_fn = torch.compiler.load_compiled_function(f)
        assert loaded_fn is not None
        compiled_out = loaded_fn(m, *sample_inputs)
        assert torch.allclose(eager_out, compiled_out)
```
twice, see that it succeeds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165474 Approved by: https://github.com/yiming0416, https://github.com/zhxchen17	2025-10-17 17:47:24 +00:00
rzou	723c27ed78	[standalone_compile] binary format write should be atomic (#162432 ) We update it to call write_atomic instead of file.write Pull Request resolved: https://github.com/pytorch/pytorch/pull/162432 Approved by: https://github.com/oulgen	2025-09-09 18:43:13 +00:00
Xu Han	0e3e377bd5	[inductor] fix CompiledArtifact.load path on Windows. (#160268 ) fix CompiledArtifact.load path on Windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160268 Approved by: https://github.com/ezyang	2025-08-10 14:22:52 +00:00
Lucas Kabela	2b1ae29960	[Dynamo][Better Engineering] Add typing annotations to guard and source (#158397 ) (#159491 ) Summary: X-link: https://github.com/pytorch/executorch/pull/12986 As part of better engineering week, we would like to improve our type support to improve dev experience in dynamo. This PR adds strict typing support to a critical set of files for dynamo, `source.py` and the base `_guards.py`. Running
```
mypy torch/_dynamo/source.py torch/_guards.py --linecount-report /tmp/coverage_log
```
| -------- | Lines Annotated | Lines Total | % lines covered | Funcs Annotated | Funcs Total | % funcs covered |
| -------- | ------- | -------- | ------- | ------- | ------- | ------- |
| Main | 1227 | 2208 | 55.57% | 207 | 362 | 57.18% |
| This PR | 2217 | 2217 | 100.00% | 362 | 362 | 100.00% |
| Delta | +990 | +9 | +44.43% | +155 | 0 | +42.82% |
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 jerryzh168 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben Test Plan: Imported from GitHub, without a `Test Plan:` line. Rollback Plan: Reviewed By: JacobSzwejbka, yangw-dev Differential Revision: D79199389 Pulled By: Lucaskabela Pull Request resolved: https://github.com/pytorch/pytorch/pull/159491 Approved by: https://github.com/anijain2305, https://github.com/yangw-dev	2025-07-30 22:57:50 +00:00
PyTorch MergeBot	d987a6f7f0	Revert "[Dynamo][Better Engineering] Add typing annotations to guard and source (#158397 )" This reverts commit `abcb24f4de`. Reverted https://github.com/pytorch/pytorch/pull/158397 on behalf of https://github.com/yangw-dev due to Suggested to fix failing internal signals on D78911890 ([comment](https://github.com/pytorch/pytorch/pull/158397#issuecomment-3133823766))	2025-07-29 19:49:40 +00:00
Lucas Kabela	abcb24f4de	[Dynamo][Better Engineering] Add typing annotations to guard and source (#158397 ) As part of better engineering week, we would like to improve our type support to improve dev experience in dynamo. This PR adds strict typing support to a critical set of files for dynamo, `source.py` and the base `_guards.py`. Running
```
mypy torch/_dynamo/source.py torch/_guards.py --linecount-report /tmp/coverage_log
```
| -------- | Lines Annotated | Lines Total | % lines covered | Funcs Annotated | Funcs Total | % funcs covered |
| -------- | ------- | -------- | ------- | ------- | ------- | ------- |
| Main | 1227 | 2208 | 55.57% | 207 | 362 | 57.18% |
| This PR | 2217 | 2217 | 100.00% | 362 | 362 | 100.00% |
| Delta | +990 | +9 | +44.43% | +155 | 0 | +42.82% |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158397 Approved by: https://github.com/anijain2305	2025-07-24 15:55:18 +00:00
rzou	a9537b626c	[standalone_compile] Fix single Tensor outputs from split_module (#157803 ) We assumed that the output in an FX graph would always just be a list[Tensor], even in the single tensor return case. It is possible for the output to be a single Tensor. This can happen by calling torch.fx.split_module on the module. Test Plan: - new test Pull Request resolved: https://github.com/pytorch/pytorch/pull/157803 Approved by: https://github.com/oulgen	2025-07-10 12:49:03 +00:00
Xuehai Pan	6ff6630375	[BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156313 Approved by: https://github.com/jingsh	2025-06-23 02:57:12 +00:00
James Wu	10fb98a004	[Precompile] Hook up backend="inductor" (#155387 ) This PR adds the necessary things to register and record backend ids from BundledAOTAutogradCacheEntry. One TODO to point out; in this diff, if there are multiple backends that would have the same AOTAutogradCache key (traditional cache key, not backend_id), we just end up serializing the same BundledAOTAutogradCache entry multiple times. This is not ideal obviously, so we'll want to deduplicate these and just track the different keys that one BundledAOTAutogradCacheEntry is associated with instead. This shouldn't be super hard to do, though, as we just need to run a deduplication step on call to `serialize()`, I think. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155387 Approved by: https://github.com/oulgen	2025-06-22 15:05:08 +00:00
PyTorch MergeBot	f1331f3f1b	Revert "[BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313 )" This reverts commit `3627270bdf`. Reverted https://github.com/pytorch/pytorch/pull/156313 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912) [HUD commit link](`c95f7fa874`) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213))	2025-06-22 12:31:57 +00:00
Xuehai Pan	3627270bdf	[BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156313 Approved by: https://github.com/jingsh	2025-06-22 08:43:09 +00:00
PyTorch MergeBot	edd45f3a02	Revert "[Precompile] Hook up backend="inductor" (#155387 )" This reverts commit `2c68c3e8d5`. Reverted https://github.com/pytorch/pytorch/pull/155387 on behalf of https://github.com/atalman due to dynamo/test_precompile_context.py::PrecompileContextTests::test_basic [GH job link](https://github.com/pytorch/pytorch/actions/runs/15772892021/job/44464141039) [HUD commit link](`2c68c3e8d5`) ([comment](https://github.com/pytorch/pytorch/pull/155387#issuecomment-2992044073))	2025-06-20 15:30:04 +00:00
James Wu	2c68c3e8d5	[Precompile] Hook up backend="inductor" (#155387 ) This PR adds the necessary things to register and record backend ids from BundledAOTAutogradCacheEntry. One TODO to point out; in this diff, if there are multiple backends that would have the same AOTAutogradCache key (traditional cache key, not backend_id), we just end up serializing the same BundledAOTAutogradCache entry multiple times. This is not ideal obviously, so we'll want to deduplicate these and just track the different keys that one BundledAOTAutogradCacheEntry is associated with instead. This shouldn't be super hard to do, though, as we just need to run a deduplication step on call to `serialize()`, I think. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155387 Approved by: https://github.com/oulgen	2025-06-20 06:38:29 +00:00
James Wu	e21ff9c3be	Add logging for guard miss failure (#153125 ) Differential Revision: [D74371381](https://our.internmc.facebook.com/intern/diff/D74371381/) This PR adds some logging for guard misses to tlparse, so that we know when AOTAutogradCache and FxGraphCache miss due to guards. Example tlparse result: https://gist.github.com/jamesjwu/afa19335c0aee85b24546b13c1cf6427 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153125 Approved by: https://github.com/oulgen, https://github.com/jingsh	2025-05-09 16:51:04 +00:00
Oguz Ulgen	e4a1a16bef	Check integrity of bytes in AppendingByteSerializer (#152139 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/152139 Approved by: https://github.com/zou3519	2025-04-26 18:10:58 +00:00
Oguz Ulgen	cc793e895e	[StandaloneCompile] Autotune at compile time (#151922 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151922 Approved by: https://github.com/jamesjwu ghstack dependencies: #151921	2025-04-23 04:32:06 +00:00
rzou	596296fb0b	[standalone_compile] Dynamic shape handling (#151788 ) standalone_compile needs to get dynamic shape information from somewhere. We add a new `dynamic_shapes` argument with three options: 1. from the passed-in graph (dynamic_shapes="from_graph"). This is the default. 2. from the example inputs, thereby specializing on them (dynamic_shapes="from_example_inputs"). 3. from the current tracing context (dynamic_shapes="from_tracing_context"). 1 and 3 are not exactly the same. 2 can also be used for more advanced things... (specialize on one input but not the other). Most of this PR is tests. Test Plan: - a lot of new tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/151788 Approved by: https://github.com/oulgen	2025-04-22 20:17:24 +00:00
James Wu	a4fdae5c84	Lift guard checking logic to AOTAutogradCache (#151563 ) This somewhat complicated PR does a few things: - It separates out a lot of the guard checking logic into its own class, GuardedCache[T] - It adds a new `check_guard_hit` lambda to FXGraphCache._lookup_graph, which allows callers to define their own guard checking logic - It then uses these two combined parts to lift guard checking to AOTAutogradCache. This means that AOTAutogradCache stores its own guard expressions and evaluates them. - FXGraphCache's guard checking logic is completely unchanged, just refactored. As part of the work, I'm able to extend a bit of the logging functionality of AOTAutogradCache into FXGraphCache, so that you can know if FXGraphCache missed due to a guard failure or a full cache miss. # Why do this? Lifting guards to AOTAutogradCache has a few benefits: - First, it fixes a long standing bug in guard checking logic. Backward passes can have different symint inputs than forward passes depending on forward output, if AOTAutograd chooses to store symints for the backward. These symint inputs have the same underlying symbols as the forward, but on AOTAutogradCache hit, we don't have access to the hints backing these exact symints (we only have hints for the symints on the forward function). By lifting guard checking logic to AOTAutogradCache, we no longer need to check the backward guards, as they'll be included in the AOTAutogradCache guard expression. I've added a unit test that failed before my diff, and now passes, as an example of this - Secondly, this is the first step necessary to bundle CompiledFxGraph into AOTAutogradCache. Doing so will simplify our cache logic significantly, and also make precompile logic simpler, as precompiles will only need to store AOTAutogradCacheEntrys, without needing to match them up with inductor FXGraphCache entries. 
- Finally, adding guard checking logic to AOTAutogradCache may allow us in the future to handle more complicated cases like a single forward with multiple backwards, as guard checks are now storable on the cache entry itself. # Guard checking logic of AOTAutogradCache When AOTAutogradCache evaluates guard expressions, it no longer needs to evaluate the forward/backward guards in the FXGraphCacheEntry (since the AOTAutogradCache guard expressions will encompass them). Because of this, we still need a way for AOTAutogradCache to distinguish between multiple FXGraphCache local entries. To do so, AOTAutogradCache stores the guard string from FXGraphCache, which it uses as a second "cache key". It doesn't need to evaluate these guards, it just needs to find the cache entry from FXGraphCache that had the same guards as when it was stored. After this, I will work on putting the FXGraphCache entries directly into AOTAutogradCache. If I can put CompiledFxGraphs in the cache directly, I no longer need this complicated `check_guard_hit` overriding logic. ## Test Plan Added a new unit test. There are comprehensive guard checking unit tests in `test_aot_autograd_cache` already, and those pass. Pull Request resolved: https://github.com/pytorch/pytorch/pull/151563 Approved by: https://github.com/oulgen	2025-04-22 03:01:08 +00:00
Oguz Ulgen	67c2869a38	Unpack the output code in the standalone_compile (#151609 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151609 Approved by: https://github.com/zou3519 ghstack dependencies: #151768	2025-04-21 17:37:38 +00:00
rzou	29317f8585	[standalone_compile] Some misc fixes (#151502 ) This PR fixes two things. The first problem is that in the vLLM style standalone_compile is called from within a custom torch.compile backend. If there already is a FakeTensorMode (which there is), we shouldn't create a new FakeTensorMode with the same shape_env, instead we should just reuse the same FakeTensorMode. The second thing is that compile_fx can mutate the passed in gm, so we deepcopy (since standalone_compile should be standalone) Test Plan: - new test - updated old tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/151502 Approved by: https://github.com/oulgen ghstack dependencies: #151501, #151551	2025-04-18 12:34:13 +00:00
rzou	58310a0043	[standalone_compile] support multiple returns (#151551 ) We were only returning the first one. There's an edge case on what to do if the original function returns a single Tensor. capture(f) returns a function that returns a tuple of one Tensor in this case and we were originally converting this back to one single Tensor. I think it's fine to return a tuple of one Tensor (that is what the graph passed to standalone_compile asked for!) but we can revisit. Test Plan: - modified one test to use multiple outputs Pull Request resolved: https://github.com/pytorch/pytorch/pull/151551 Approved by: https://github.com/Skylion007, https://github.com/oulgen ghstack dependencies: #151501	2025-04-18 12:34:13 +00:00
rzou	ac715e96b4	[standalone_compile] Don't check if path is directory if it doesn't exist (#151501 ) os.path.isdir(path) will return False if the path doesn't exist. Test Plan: - new test Pull Request resolved: https://github.com/pytorch/pytorch/pull/151501 Approved by: https://github.com/Skylion007, https://github.com/oulgen	2025-04-18 12:34:13 +00:00
Oguz Ulgen	3cf0e2d8ec	Add inductor standalone_compile API (#150670 ) This PR adds standalone_compile API that does precompilation via caching to support vLLM use case in the short term while we work on the longer term precompilation solution.
```
standalone_compile(gm, example_inputs, options) -> CompiledArtifact
CompiledArtifact.save(path, format: binary|unpacked = binary)
CompiledArtifact.load(path, format: binary|unpacked = binary)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150670 Approved by: https://github.com/jamesjwu, https://github.com/zou3519	2025-04-15 23:38:15 +00:00
PyTorch MergeBot	74f6bc28a7	Revert "Add inductor standalone_compile API (#150670 )" This reverts commit `c9aef50898`. Reverted https://github.com/pytorch/pytorch/pull/150670 on behalf of https://github.com/Camyll due to breaking internal builds with torch module not found error ([comment](https://github.com/pytorch/pytorch/pull/150670#issuecomment-2806975267))	2025-04-15 17:35:59 +00:00
Oguz Ulgen	c9aef50898	Add inductor standalone_compile API (#150670 ) This PR adds standalone_compile API that does precompilation via caching to support vLLM use case in the short term while we work on the longer term precompilation solution.
```
standalone_compile(gm, example_inputs, options) -> CompiledArtifact
CompiledArtifact.save(path, format: binary|unpacked = binary)
CompiledArtifact.load(path, format: binary|unpacked = binary)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150670 Approved by: https://github.com/jamesjwu, https://github.com/zou3519	2025-04-14 22:00:09 +00:00
PyTorch MergeBot	24b3ab9255	Revert "Add inductor standalone_compile API (#150670 )" This reverts commit `bbc5fe8504`. Reverted https://github.com/pytorch/pytorch/pull/150670 on behalf of https://github.com/albanD due to Broke profiler test ([comment](https://github.com/pytorch/pytorch/pull/150670#issuecomment-2802067144))	2025-04-14 15:22:33 +00:00
Oguz Ulgen	bbc5fe8504	Add inductor standalone_compile API (#150670 ) This PR adds standalone_compile API that does precompilation via caching to support vLLM use case in the short term while we work on the longer term precompilation solution.
```
standalone_compile(gm, example_inputs, options) -> CompiledArtifact
CompiledArtifact.save(path, format: binary|unpacked = binary)
CompiledArtifact.load(path, format: binary|unpacked = binary)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150670 Approved by: https://github.com/jamesjwu, https://github.com/zou3519	2025-04-14 07:07:10 +00:00

28 Commits
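
Note on a4fdae5c84 (#151563): the message describes entries stored under one cache key being told apart by guard expressions, with the caller supplying a `check_guard_hit` callback, and the logging distinguishing a guard failure from a full cache miss. The following is a minimal sketch of that idea only, using the standard library. `ToyGuardedCache`, `GuardedEntry`, `make_checker`, and the status strings are hypothetical names chosen for illustration; this is not PyTorch's `GuardedCache[T]` and does not reproduce its interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Generic, List, Optional, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class GuardedEntry(Generic[T]):
    """One cached value plus the guard expression it was stored with."""

    guard_expr: str  # e.g. "s0 >= 2 and s0 <= 16"
    value: T


@dataclass
class ToyGuardedCache(Generic[T]):
    """Hypothetical cache: several entries may share a key and differ by guards."""

    _entries: Dict[str, List[GuardedEntry[T]]] = field(default_factory=dict)

    def save(self, key: str, guard_expr: str, value: T) -> None:
        self._entries.setdefault(key, []).append(GuardedEntry(guard_expr, value))

    def lookup(
        self, key: str, check_guard_hit: Callable[[str], bool]
    ) -> Tuple[Optional[T], str]:
        """Return (value, status), where status is 'hit', 'guard_miss' or 'miss'."""
        candidates = self._entries.get(key)
        if not candidates:
            return None, "miss"  # nothing stored under this key at all
        for entry in candidates:
            # The caller decides how a guard expression is checked.
            if check_guard_hit(entry.guard_expr):
                return entry.value, "hit"
        return None, "guard_miss"  # key exists, but no guard accepted the inputs


def make_checker(hints: Dict[str, int]) -> Callable[[str], bool]:
    """Build a checker that evaluates a guard against concrete symbol hints."""

    def check(guard_expr: str) -> bool:
        # Builtins are disabled; the only names visible are the symbol hints.
        return bool(eval(guard_expr, {"__builtins__": {}}, dict(hints)))

    return check


cache: ToyGuardedCache[str] = ToyGuardedCache()
cache.save("graph_key", "s0 >= 2 and s0 <= 16", "small-shape artifact")
cache.save("graph_key", "s0 > 16", "large-shape artifact")

print(cache.lookup("graph_key", make_checker({"s0": 8})))
print(cache.lookup("graph_key", make_checker({"s0": 1})))
print(cache.lookup("other_key", make_checker({"s0": 8})))
```

Returning a status next to the value is one simple way to tell the two miss kinds apart; passing the checker in as a callback mirrors the message's point that callers can define their own guard checking logic.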