Summary:
The existing RemoteCacheBackend classes were a bit haphazard: some of them accepted only bytes, some accepted objects, and some returned a different type of object than was passed in.
Update them to be more consistent:
1. RemoteCacheBackend is an implementation of a backend: Redis, Memcache, Manifold, LocalFile.
2. RemoteCacheSerde is an implementation of a serde protocol that turns structured objects (dict, list, etc.) into bytes: RemoteCacheJsonSerde (JSON encoding), RemoteCachePassthroughSerde (strictly bytes only).
3. RemoteCache is the cache implementation itself, combining a RemoteCacheBackend with a RemoteCacheSerde to provide structured caching (see the sketch below).
Besides reorganizing the existing cache code, this also fixes Redis autotune caching for OSS.
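As a rough illustration of the layering, here is a minimal sketch of how a backend and a serde compose into a structured cache; the class and method names are simplified stand-ins, not the exact PyTorch API:
```python
from abc import ABC, abstractmethod
from typing import Any, Optional
import json


class RemoteCacheBackend(ABC):
    """Stores and retrieves raw bytes (e.g. Redis, Memcache, Manifold, a local file)."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...


class RemoteCacheSerde(ABC):
    """Turns structured objects (dict, list, etc.) into bytes and back."""

    @abstractmethod
    def encode(self, obj: Any) -> bytes: ...

    @abstractmethod
    def decode(self, data: bytes) -> Any: ...


class RemoteCacheJsonSerde(RemoteCacheSerde):
    def encode(self, obj: Any) -> bytes:
        return json.dumps(obj).encode("utf-8")

    def decode(self, data: bytes) -> Any:
        return json.loads(data)


class RemoteCache:
    """The cache itself: a backend for storage plus a serde for encoding."""

    def __init__(self, backend: RemoteCacheBackend, serde: RemoteCacheSerde) -> None:
        self.backend = backend
        self.serde = serde

    def get(self, key: str) -> Optional[Any]:
        data = self.backend.get(key)
        return None if data is None else self.serde.decode(data)

    def put(self, key: str, obj: Any) -> None:
        self.backend.put(key, self.serde.encode(obj))
```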
Test Plan: unit tests
Reviewed By: oulgen
Differential Revision: D61178859
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134032
Approved by: https://github.com/oulgen, https://github.com/bhack
Summary:
This diff implements a bunch of views for internal scuba viewing.
TODOS that I might punt to another diff:
- Saving cache stats via a counter is definitely suspect here, but there's not really a good way to track "fx graph cache hit for this compile phase" right now. Will think about this more.
- We should definitely log frame id, compile id, etc
- We should definitely be logging configs. That way, we can A/B test based on whether a config is turned on.
- I'm not sure yet what to do with compile_uuid, but it's useful when you want to look at samples from a single run. If we had MAST job info this field might not be needed, but it's nice to be able to drill down to a single run and get its chrome trace view or icicle view.
Test Plan:
All of the above views are exercised with the nanogpt benchmark:
```
buck run mode/opt caffe2/benchmarks/dynamo:torchbench -- --training --backend=inductor --only nanogpt --performance
```
Differential Revision: D61603243
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134118
Approved by: https://github.com/oulgen
This is a parallel PR to https://github.com/pytorch/pytorch/pull/133819, and it appends changes for @jansel's comments.
1. For `torch/_inductor/codegen/cpp_wrapper_cpu.py`, revert to the original code that appends `LL` on MacOS and Windows: bdc14ad89a
2. For `torch/_inductor/codegen/cpp_utils.py`, append `LL` on MacOS and Windows for large constants, and fix its UTs: 3a56b76ce0
------------------------------
Another solution for https://github.com/pytorch/pytorch/pull/133615: use `int64_t` as the index type on all platforms.
### Development notes:
The mentioned PR (https://github.com/pytorch/pytorch/pull/133615) fixes the index type not matching the `parse_arg` argument types. As reviewed with @jansel, Jason thinks we need to unify `INDEX_TYPE` across all platforms.
The current code is cumbersome:
```python
INDEX_TYPE = "int64_t" if _IS_WINDOWS else "long"
```
So I made some attempts to unify `INDEX_TYPE` as either `long` or `int64_t`.
Using `long` as the index type: https://github.com/pytorch/pytorch/pull/133768
Using `int64_t` as the index type: https://github.com/pytorch/pytorch/pull/133782
After that, we discussed which type to select as the final solution.

The `long` type has a different definition and size across OSes and compilers, so @jansel decided that we should select `int64_t` for all platforms, and I continued my work based on https://github.com/pytorch/pytorch/pull/133782.
https://github.com/pytorch/pytorch/pull/133782 still had two issues:
1. `std::min`/`std::max` could not match function instances by argument types. This was fixed and validated in PR: https://github.com/pytorch/pytorch/pull/133812
2. A CUDA `TestMemoryPlanning::test_cpp_wrapper` failure caused by the wrong index type. It is fixed in this PR.
So we arrive at the final solution in this PR.
### Changes:
**1. Use the `int64_t` type as the index type on all OSes: `Windows`, `Linux` and `MacOS`.**
**2. Use `static_cast<int64_t>(constant)` to convert constants passed to `div_floor_integer` to its argument type (`int64_t`).**
**3. Update the `parse_arg` function signature to `int64_t`, following the index type.**
**4. Append a double L (`LL`) to constants on Windows and MacOS, because their `int64_t` is `long long`.**
**5. Fix `std::min`/`std::max` type mismatches by `static_cast` to `INDEX_TYPE` (see the sketch after this list).**
**6. Fix UTs, including the CUDA `TestMemoryPlanning::test_cpp_wrapper` and `test_indexing.py`.**
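To make the intent of changes 1, 4 and 5 concrete, here is a minimal Python sketch; the helper names and platform checks are illustrative only, not the actual inductor codegen code:
```python
import sys

# Change 1: a single index type on Windows, Linux and MacOS.
INDEX_TYPE = "int64_t"

# Change 4: on Windows and MacOS int64_t is `long long`, so large constants
# emitted into the generated C++ get an `LL` suffix.
_NEEDS_LL_SUFFIX = sys.platform in ("win32", "darwin")


def render_index_constant(value: int) -> str:
    return f"{value}LL" if _NEEDS_LL_SUFFIX else str(value)


def render_min(lhs: str, rhs: str) -> str:
    # Change 5: cast both operands so std::min/std::max resolve to a single
    # instantiation even when the operand types differ.
    return (
        f"std::min(static_cast<{INDEX_TYPE}>({lhs}), "
        f"static_cast<{INDEX_TYPE}>({rhs}))"
    )
```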
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133892
Approved by: https://github.com/jansel
Summary: Previously we were mocking out FbRemoteFxGraphCacheBackend, which meant we were missing testing a whole bunch of the cache code. Mock at a lower level instead (CacheClient, LocalAutotuneCacheBackend, ManifoldClient, Redis) so we cover a larger amount of the caching code.
Test Plan: unit tests
Reviewed By: oulgen
Differential Revision: D60937966
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133579
Approved by: https://github.com/oulgen
During distributed training, if all ranks except one hit the cache, the rank that did not hit the cache will cause a NCCL timeout, since the rest of the ranks will enter the collective and start the timer. This PR uses the new PTD API to increase the timeout for the ranks that hit the cache by the amount of time the cache would save.
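A minimal sketch of the idea, assuming the hook below runs on ranks that hit the cache; `_extend_pg_timeout` is a hypothetical stand-in for the new PTD API, which is not named in this summary:
```python
from datetime import timedelta

import torch.distributed as dist


def _extend_pg_timeout(extra: timedelta) -> None:
    # Hypothetical stand-in for the new PTD API that bumps the collective
    # timeout of the already-initialized process group.
    pass


def on_fx_graph_cache_hit(estimated_compile_time_s: float) -> None:
    # A rank that hits the cache enters the next collective (and starts its
    # timer) much earlier than a rank that has to compile from scratch, so
    # give it extra headroom equal to the compile time the cache saved.
    if dist.is_available() and dist.is_initialized():
        _extend_pg_timeout(timedelta(seconds=estimated_compile_time_s))
```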
Differential Revision: [D61363722](https://our.internmc.facebook.com/intern/diff/D61363722)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133374
Approved by: https://github.com/ezyang
During distributed training, if all ranks except one hit the cache, the rank that did not hit the cache will cause a NCCL timeout, since the rest of the ranks will enter the collective and start the timer. This PR uses the new PTD API to increase the timeout for the ranks that hit the cache by the amount of time the cache would save.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133374
Approved by: https://github.com/ezyang
ghstack dependencies: #133362, #133363
I have worked with @henrylhtsang to switch cpp_builder to the new implementation, and we have removed the dependency on the old one.
So it is time to remove the old implementation now; this PR makes that change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133161
Approved by: https://github.com/ezyang
Summary:
When a user sets config.profiler_mark_wrapper_call, RECORD_FUNCTION annotations are added to the code. This requires importing the header <ATen/record_function.h>, but the conditional for doing so didn't check
config.profiler_mark_wrapper_call.
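A rough sketch of the shape of the fix; the codegen site and helper are illustrative, and only `config.profiler_mark_wrapper_call` and the header name come from the summary above:
```python
from torch._inductor import config


def maybe_emit_profiling_include(header_lines: list) -> None:
    # RECORD_FUNCTION annotations are only generated when the user enabled
    # profiler_mark_wrapper_call, so the include should be gated on the same
    # config instead of being emitted (or skipped) unconditionally.
    if config.profiler_mark_wrapper_call:
        header_lines.append("#include <ATen/record_function.h>")
```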
Test Plan:
This case is already covered in test_profiler_mark_wrapper_call.
```
(pytorch-3.10) [gabeferns@devvm2252.cco0 ~/pytorch (missing-profile-include)]$ TORCHINDUCTOR_ABI_COMPATIBLE=1 TORCHINDUCTOR_CPP_WRAPPER=1 python test/inductor/test_torchinductor.py -k CpuTests.test_profiler_mark_wrapper_call_cpu
stats [('calls_captured', 1), ('unique_graphs', 1)]
inductor [('fxgraph_cache_miss', 1)]
aot_autograd [('total', 1), ('ok', 1)]
.
----------------------------------------------------------------------
Ran 1 test in 8.080s
OK
```
Fixes https://github.com/pytorch/pytorch/issues/131339
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132419
Approved by: https://github.com/jgong5, https://github.com/desertfire
Summary:
It looks like there are several places in AotCodeCompiler that write files in a way that aren't safe for concurrency. There's a filelock to cope with that, but it seems like the lock path isn't quite robust enough to prevent races. We have an internal stress test failing when executing multiple concurrent versions of the test. It seems as though there's some variability in the content we write to the cpp file, which means we can get a different 'key' across different runs. The lock path includes that key in the lock path name, but the path for the "consts_path" is computed separately. Therefore, I see things like this:
- The computed 'key' is `cp5tgbuxuegvg5g2j7oi6u74nkf3v7mx5w3qzl6qbedtmw5tq77z`
- The lock_path (based on the key) is: `/tmp/torchinductor_slarsen/locks/cp5tgbuxuegvg5g2j7oi6u74nkf3v7mx5w3qzl6qbedtmw5tq77z.lock`
- The cpp path (which also includes the key) is: `/tmp/torchinductor_slarsen/cenzkqfnhu53mrhrdhzjtnblzyma2hgmeo7hai5yqsxzirdavurh/cp5tgbuxuegvg5g2j7oi6u74nkf3v7mx5w3qzl6qbedtmw5tq77z.cpp`
- The consts_path (not based on the key) is: `/tmp/torchinductor_slarsen/cenzkqfnhu53mrhrdhzjtnblzyma2hgmeo7hai5yqsxzirdavurh/cifbshkqkbsurzldsyi2vl5bsnhvejmavys4kktpwrzmpo4ysuoy.bin`
So we have different test instances using different lock paths, but touching the same consts_path and therefore stomping on each other's consts_path. To fix, include the key in the consts_path.
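A simplified sketch of the path scheme; the hashing and directory layout are illustrative, not the actual codecache implementation:
```python
import hashlib
import os
import tempfile

cache_dir = os.path.join(tempfile.gettempdir(), "torchinductor_example")


def lock_path_for(key: str) -> str:
    # The file lock already incorporates the per-compile key.
    return os.path.join(cache_dir, "locks", f"{key}.lock")


def consts_path_for(key: str, consts_bytes: bytes) -> str:
    # Before the fix, the name depended only on the constant bytes, so two
    # runs holding *different* locks could still write the same .bin file.
    # Folding the key in gives each compile its own consts file.
    digest = hashlib.sha256(key.encode() + consts_bytes).hexdigest()
    return os.path.join(cache_dir, f"{digest}.bin")
```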
Test Plan: Ran internal stress test. Repro'd failure and verified this change fixes it.
Differential Revision: D60552021
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132343
Approved by: https://github.com/desertfire
Python's set iteration order is non-deterministic. There is an internal failure that we recently ran into which did not fail consistently.
See the repro here: P1453035092.
Now, with these changes, it does fail consistently. In follow-ups we could also consider adding a lint rule for uses of either set() or set literals.
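A small illustration of the nondeterminism and the kind of substitution meant here; the buffer names are made up, and the actual call sites are in the compiler internals:
```python
# Iteration order of a set of strings varies across interpreter runs because
# of hash randomization, which leaks nondeterminism into generated code.
names = {"buf0", "buf2", "buf1"}

# A dict preserves insertion order, so dict.fromkeys (or an ordered-set
# wrapper) acts as a deterministic drop-in for ordered iteration.
ordered_names = dict.fromkeys(["buf0", "buf2", "buf1"])
assert list(ordered_names) == ["buf0", "buf2", "buf1"]
```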
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130004
Approved by: https://github.com/oulgen
This PR mostly refactors by putting code into utils files so that they can be shared between codecache.py and compile_fx.py. Afterwards, it then changes compile_fx so that:
- When saving to FXGraphCache, we save onto the CompiledFXGraph all the necessary metadata for running post compile steps (realigning inputs, cudagraphification).
- When loading from FXGraphCache, we use the saved information directly, instead of recalculating it from scratch.
What this does is make it so that `FXGraphCache.load()` is a perfect cache on compile_fx_inner, in that it **returns exactly what compile_fx_inner returns**. This also makes it possible for AOTAutogradCache, given a key to the fx graph cache and example inputs, to get back the full return value of compile_fx_inner.
## What's a post compile step?
We define a **post-compile** to be the set of actions that need to run after FXGraphCache either loads from the cache or misses and runs compilation. These steps include:
- Setting the tracing context's output strides
- Running cudagraphs if enabled
- Maybe realign inputs if cudagraphs didn't run
To run these steps, we save all the necessary metadata in CompiledFxGraph, and use them on a cache hit to reconstruct the object.
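Roughly, the saved metadata and the post-compile step look like the sketch below; the field and helper names are hypothetical, not the real CompiledFxGraph attributes:
```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class PostCompileMetadata:
    output_strides: List[Optional[Tuple[int, ...]]]
    cudagraph_fail_reasons: List[str]
    fx_kwargs: Dict[str, Any] = field(default_factory=dict)


def run_post_compile(
    meta: PostCompileMetadata,
    compiled_fn: Callable,
    set_output_strides: Callable[[List[Optional[Tuple[int, ...]]]], None],
    cudagraphify: Callable[[Callable], Callable],
    align_inputs: Callable[[Callable], Callable],
) -> Callable:
    # Runs identically on a cache hit or after a fresh compile, so that
    # FXGraphCache.load() returns exactly what compile_fx_inner returns.
    set_output_strides(meta.output_strides)
    if not meta.cudagraph_fail_reasons:
        return cudagraphify(compiled_fn)
    # Cudagraphs can't run: fall back to realigning inputs.
    return align_inputs(compiled_fn)
```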
## Splitting cudagraphs work into pre/post compile
Cudagraphs does a lot of work on the input graph module to determine if cudagraphs can be enabled. This is the code that involves cudagraph_tests and stack traces. This will work in a world where we have access to the input graph module, but with AOTAutograd warm start, we won't have access to that information anymore. Therefore we can split cudagraphs work into two parts: on a cache miss (and therefore a full compile), we do the cudagraphs testing work, and save cudagraph_fail_reasons into the cache. Then on a cache hit, we know whether or not we can run cudagraphs, and if we can't, we can emit the correct error messages.
Implementation notes:
- We save `fx_kwargs` directly onto the CompiledFXGraph. `fx_kwargs` is already, by definition, part of the cache key, so this is safe to do when it comes to cache correctness.
- ^ Why do we do above even though FXGraphCache.load takes fx_kwargs as an argument? Because AOTAutogradCache **doesn't** have access to fx_kwargs: they're annoyingly encoded in the functools.partial() of the fw_compiler, so *only* inductor knows about these options. They're fully captured by the AOTAutogradCache key (since every key to fx_kwargs is either a global config, or a field that's deterministic based on an input graph module), but their values are still needed to run cudagraphs/postprocessing. Therefore, it's easier/safer to store it on the cached result.
- Willing to hear other approaches here if we think saving these extra fields is not reasonable, though I can't think of another way to do this that's less complicated to explain.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130572
Approved by: https://github.com/eellison
Changes:
1. Switch `AotCodeCompiler` to the new cpp_builder.
2. Only use `deprecated_cpp_compile_command` for `fb_code`, since I can no longer debug it without Meta-internal environment access.
3. Add `TODO` comments asking for help from a Meta employee to continue this work.
4. Due to item 3, only the `deprecated_cpp_compile_command` path for `fb_code` remains to be fixed, so remove `validate_new_cpp_commands`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130127
Approved by: https://github.com/jgong5, https://github.com/jansel
Summary: Found this "cannot find -ltorch: No such file or directory" issue when collecting AOTI CPU perf for the dashboard. Debugging on the CI machine revealed two problems: 1) no valid VEC_ISA was picked; 2) when 1 happens, the libtorch path is not specified in the linker path.
This PR fixes the second problem. A later PR will fix the first problem, but somehow finding the right VEC_ISA causes a performance regression, which needs more investigation.
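Illustratively, the second problem amounts to always putting libtorch's library directory on the linker search path, regardless of whether a valid VEC_ISA was found; the flags below are an assumption about the general shape of the link command, not the exact fix:
```python
import os

import torch

# Always include the directory containing libtorch when linking the AOTI
# shared object, even if VEC_ISA detection failed.
torch_lib_dir = os.path.join(os.path.dirname(torch.__file__), "lib")
link_flags = [f"-L{torch_lib_dir}", "-ltorch", "-ltorch_cpu", "-lc10"]
print(" ".join(link_flags))
```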
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131791
Approved by: https://github.com/zou3519, https://github.com/chenyang78