pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Bin Bao	687c15c0b3	[AOTI][BE] Change test_aoti_inference to one-pass build (#164277 ) Summary: To fix https://github.com/pytorch/pytorch/issues/159400. Currently, test_aoti_abi_check and test_aoti_inference need to be built in two passes, first build pytorch using the regular `python setup.py develop` and then build with `CMAKE_FRESH=1 BUILD_AOT_INDUCTOR_TEST=1 python setup.py develop`. This is cumbersome. Fix by rewriting CMakeLists.txt for test_aoti_inference to one-pass build which runs AOTI to compile models at the test time. Also update CI test script to get rid of two-pass build. For test_aoti_abi_check, it is not AOTI specific, so we make it not guarded by BUILD_AOT_INDUCTOR_TEST. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164277 Approved by: https://github.com/janeyx99	2025-10-28 17:43:22 +00:00
Mu-Chu Lee	2291199e9b	[AOTInductor] Use CudaCachingAllocator for memory allocation (#162893 ) Summary: Use c10::CudaCachingAllocator for AOTInductor's initial constant buffer allocation. Test Plan: Activate test under test/cpp/aoti_inference/test.cpp Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/162893 Approved by: https://github.com/desertfire	2025-09-17 17:08:20 +00:00
Mu-Chu Lee	19ce1beb05	[AOTInductor] Add test for enabling CUDACachingAllocator for AOTInductor's Weight (#159279 ) Summary: Add test for enabling CUDACachingAllocator for AOTInductor's Weight. Implementation TBD Test Plan: N/A, commit is adding a test. Rollback Plan: Differential Revision: D79107507 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159279 Approved by: https://github.com/desertfire, https://github.com/jingsh	2025-07-29 02:52:10 +00:00
Benjamin Glass	4060f30042	[AOTI] Convert C-struct zip handling to RAII container (#158687 ) Attempts to fix a memory leak reported in #158614 by wrapping manually managed MiniZ C-structs in an RAII container. I have been unable to reproduce the reported leak, but this seems like the most likely candidate. Fixes #158614 (hopefully) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158687 Approved by: https://github.com/desertfire	2025-07-22 16:01:51 +00:00
PyTorch MergeBot	97d7dc197f	Revert "[AOTI] Convert C-struct zip handling to RAII container (#158687 )" This reverts commit `8ed5e1844c`. Reverted https://github.com/pytorch/pytorch/pull/158687 on behalf of https://github.com/ZainRizvi due to Sorry but I had to revert this PR in order to revert https://github.com/pytorch/pytorch/pull/158671 ([comment](https://github.com/pytorch/pytorch/pull/158687#issuecomment-3099515618))	2025-07-21 22:13:26 +00:00
Benjamin Glass	8ed5e1844c	[AOTI] Convert C-struct zip handling to RAII container (#158687 ) Attempts to fix a memory leak reported in #158614 by wrapping manually managed MiniZ C-structs in an RAII container. I have been unable to reproduce the reported leak, but this seems like the most likely candidate. Fixes #158614 (hopefully) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158687 Approved by: https://github.com/desertfire	2025-07-21 18:53:14 +00:00
Julius Herb	8f54e56e62	Add optional device index to AOTIModelPackageLoader (#152093 ) This is my suggestion for resolving #152087 This PR extends the constructor of `AOTIModelPackageLoader` with an (optional) device index. The device type is still determined by `metadata_["AOTI_DEVICE_KEY"]`, but the `device_index` argument can be used to move an AOTI model package to different devices like `cuda:0`, `cuda:1`, ... in a convenient way. AFAIK, this is not possible so far using `AOTIModelPackageLoader` alone. The default case (no device index specified) with `metadata_["AOTI_DEVICE_KEY"] == "cuda"` would lead to the current behavior, i.e., the model is loaded to device `cuda`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152093 Approved by: https://github.com/desertfire	2025-05-04 11:40:12 +00:00
Mu-Chu Lee	c3a18f6126	[AOTInductor] Add states for constant folding process (#151273 ) Summary: We add states in the constant folding process for AOTInductor. Basically, there are 3 states: (1) None: The state when no constants are loaded and uninitialized. (2) Initialized: The state when constants are loaded, but not yet folded. (3) Folded: The state where the model is fully ready with folded constants. Note that even if constant folding is not enabled, we still only run when state is FOLDED, this is okay because without constant folding, the transition from INITIALIZED to FOLDED is just a pass-through. Test Plan: python test/inductor/test_aot_inductor.py -k test_constant_folding_with_update Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D73002538](https://our.internmc.facebook.com/intern/diff/D73002538) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151273 Approved by: https://github.com/jingsh, https://github.com/desertfire	2025-04-17 16:41:38 +00:00
Mu-Chu Lee	f3cf3ec591	[AOTInductor] Add User Managed buffer for AOTI constant buffer. (#150276 ) Summary: We add the functionality to allow users to directly pass an at::Tensor into AOTInductor, which would be used as the constant. This user managed buffer skips the copying step in AOTInductor, and lets users directly manage the memory usage themselves. Test Plan: LD_LIBRARY_PATH=/data/users/$USER/pytorch/build/lib /data/users/$USER/pytorch/build/bin/test_aoti_inference Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D72589514](https://our.internmc.facebook.com/intern/diff/D72589514) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150276 Approved by: https://github.com/chenyang78, https://github.com/desertfire	2025-04-10 00:15:44 +00:00
Mu-Chu Lee	063ea5d669	[AOTInductor] Modify test for Memory tracking for memory-related (#150269 ) operations Summary: Fix the test for memory tracking. This PR does: (1) Add tracking before and after for all memory-related operations. Make sure the operations do indeed capture consumed memory both in CUDA and torch's CUDACachingAllocator. (2) Keep track of memory being reserved by CUDACachingAllocator in torch and its relationship with global CUDA memory consumption. Test Plan: This PR is adding tests. Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/150269 Approved by: https://github.com/jingsh, https://github.com/chenyang78, https://github.com/desertfire	2025-04-02 04:18:18 +00:00
Mu-Chu Lee	a2070e2fd5	[AOTInductor] Free tensors in test (#150274 ) Summary: This PR frees tensors that were new-ed within the test itself to prevent memory leak. Test Plan: Fixing tests itself. Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/150274 Approved by: https://github.com/chenyang78	2025-03-31 23:28:13 +00:00
Mu-Chu Lee	03313c6619	[AOTInductor] Add function for users to extract constants in container (#150163 ) Summary: Add extract_constant_map that allows users to inspect the constants being used by AOTInductor Test Plan: `python test/inductor/test_aot_inductor.py -k extract_constants_map` `LD_LIBRARY_PATH=/data/users/$USER/pytorch/build/lib /data/users/$USER/pytorch/build/bin/test_aoti_inference` Differential Revision: D72020400 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150163 Approved by: https://github.com/chenyang78	2025-03-29 03:36:12 +00:00
Mu-Chu Lee	e6afb51805	[AOTInductor] Free folded constants that's managed by AOTInductor (#149825 ) internally. Summary: This diff allows freeing the folded constants that are created by AOTInductor through CUDACachingAllocator instead of the constant blob from cudaMalloc directly. Test Plan: LD_LIBRARY_PATH=/data/users/$USER/pytorch/build/lib /home/$USER/local/pytorch/build/bin/test_aoti_inference Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/149825 Approved by: https://github.com/chenyang78, https://github.com/desertfire, https://github.com/jingsh	2025-03-27 06:05:50 +00:00
Mu-Chu Lee	12628ba24d	[AOTInductor] Bug fix for freeing buffers when freeing multiple times (#149810 ) Summary: We might free the active buffer if we free the buffer twice. Test Plan: ``` LD_LIBRARY_PATH=/data/users/$USER/pytorch/build/lib /home/$USER/local/pytorch/build/bin/test_aoti_inference ``` Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/149810 Approved by: https://github.com/chenyang78	2025-03-25 20:26:36 +00:00
Bin Bao	04e251a7dd	[AOTI] Add num_runners to AOTIModelPackageLoader (#149364 ) Summary: AOTIModelContainerRunner takes a num_runners argument for multi-threaded inference, but AOTIModelPackageLoader forgot to take the same parameter, although its run() API already expects to take an optional cudaStream_t parameter for multi-threaded inference. Differential Revision: [D71357418](https://our.internmc.facebook.com/intern/diff/D71357418) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149364 Approved by: https://github.com/angelayi	2025-03-19 02:28:06 +00:00
Mu-Chu Lee	bb42e4d137	[AOTInductor] Add function to free buffer (#149161 ) Summary: We add a function that allows users to free the unused buffer. Test Plan: Testing correctness: python test/inductor/test_aot_inductor.py -k free_inactive Testing memory consumption: LD_LIBRARY_PATH=/data/users/$USER/pytorch/build/lib /home/$USER/local/pytorch/build/bin/test_aoti_inference Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/149161 Approved by: https://github.com/chenyang78, https://github.com/desertfire ghstack dependencies: #149249	2025-03-18 02:43:14 +00:00
Joel Schlosser	5e1b715dda	BC fix for AOTIModelPackageLoader() constructor defaults (#149082 ) The default value for `run_single_threaded` was wrongly specified in the .cpp file instead of the header, breaking C++-side instantiation of `AOTIModelPackageLoader` with no arguments. This PR fixes this and adds a test for the use case of running with `AOTIModelPackageLoader` instead of `AOTIModelContainerRunner` on the C++ side. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149082 Approved by: https://github.com/desertfire	2025-03-13 18:40:53 +00:00
Bin Bao	b9803a5c81	[AOTI] Re-enable AOTI cpp unit test (#149085 ) Summary: test_inductor_aoti was removed by accident previously. Add it back. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149085 Approved by: https://github.com/jbschlosser	2025-03-13 16:00:38 +00:00
Bin Bao	1868fc63d8	[AOTI] Update C++ runner API to take a const vector (#139955 ) Summary: Tighten the AOTIModelContainerRunner::run interface to take a const vector of at::Tensor, which 1) makes it clear that the runner will not modify the input tensor vector; 2) runner will be able to take a temp vector of tensors as the input. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139955 Approved by: https://github.com/chenyang78	2024-11-08 16:59:10 +00:00
Bin Bao	310eb6d8c6	[AOTI] Fix test_aoti_inference CPU build issue (#134675 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/130311. We need to guard CUDA-only code in test_aoti_inference with macros so that it won't fail for CPU-only platform. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134675 Approved by: https://github.com/atalman, https://github.com/chunyuan-w	2024-08-28 17:42:19 +00:00
Prachi Gupta	c326533999	[ROCm][Inductor] Enable AOT Inductor CPP UTs for ROCm (#131521 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/131521 Approved by: https://github.com/jataylo, https://github.com/pruthvistony, https://github.com/malfet	2024-08-08 19:49:56 +00:00
Bin Bao	4946638f06	[AOTI] Add ABI-compatibility tests (#123848 ) Summary: In AOTInductor generated CPU model code, there can be direct references to some aten/c10 utility functions and data structures, e.g. at::vec and c10::Half. These are performance critical and thus it doesn't make sense to create C shims for them. Instead, we make sure they are implemented in a header-only way, and use this set of tests to guard future changes. There are more header files to be updated, but we will do it in other followup PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123848 Approved by: https://github.com/jansel ghstack dependencies: #123847	2024-04-19 00:51:24 +00:00

22 Commits
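
The three constant-folding states described in commit c3a18f6126 (None, Initialized, Folded) can be illustrated with a small toy model. This is only a sketch of the state transitions as the commit message words them, not AOTInductor's actual implementation: `ConstantState`, `ToyContainer`, and all of its methods are hypothetical names, and "folding" here is a stand-in that simply precomputes a sum.

```python
from enum import Enum, auto


class ConstantState(Enum):
    """Hypothetical mirror of the three states named in the commit message."""
    NONE = auto()         # no constants loaded, uninitialized
    INITIALIZED = auto()  # constants loaded, but not yet folded
    FOLDED = auto()       # ready to run with folded constants


class ToyContainer:
    """Toy stand-in for a model container; not the real AOTInductor class."""

    def __init__(self, fold_enabled=True):
        self.state = ConstantState.NONE
        self.fold_enabled = fold_enabled
        self.constants = {}
        self.folded_sum = None

    def load_constants(self, constants):
        # Loading (or updating) constants always returns to INITIALIZED.
        self.constants = dict(constants)
        self.folded_sum = None
        self.state = ConstantState.INITIALIZED

    def fold_constants(self):
        if self.state is ConstantState.NONE:
            raise RuntimeError("constants must be loaded before folding")
        if self.fold_enabled:
            # Stand-in for real folding: precompute a derived value once.
            self.folded_sum = sum(self.constants.values())
        # Without folding, this transition is just a pass-through.
        self.state = ConstantState.FOLDED

    def run(self, x):
        # Running is only allowed in the FOLDED state, folding enabled or not.
        if self.state is not ConstantState.FOLDED:
            raise RuntimeError("model can only run in the FOLDED state")
        if self.folded_sum is not None:
            return x + self.folded_sum
        return x + sum(self.constants.values())


container = ToyContainer()
container.load_constants({"w": 2, "b": 3})
container.fold_constants()
result = container.run(1)
```

Gating `run` on a single state keeps the check uniform: callers never need to know whether folding is enabled, only whether the container has reached `FOLDED`.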