Commit Graph

75 Commits

Author SHA1 Message Date
Tom Ritchford
b5475d334e [inductor] Fix an unused variable in cpu_vec_isa.py (#138473)
----

* Extracted from https://github.com/pytorch/pytorch/pull/133492

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138473
Approved by: https://github.com/EikanWang, https://github.com/albanD, https://github.com/xuhancn
2024-12-20 18:50:19 +00:00
Huamin Li
f5af87c23c Make Inductor cpp backend enable_floating_point_contract_flag to take string (#143450)
Differential Revision: D66269001

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143450
Approved by: https://github.com/desertfire
2024-12-20 16:28:54 +00:00
PyTorch MergeBot
71479a9b9c Revert "[AOTI] Emit a CMakeLists.txt when package_cpp_only (#143352)"
This reverts commit 429f4cd140.

Reverted https://github.com/pytorch/pytorch/pull/143352 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the new test is failing on ROCm ([comment](https://github.com/pytorch/pytorch/pull/143352#issuecomment-2556365140))
2024-12-20 06:21:31 +00:00
Bin Bao
429f4cd140 [AOTI] Emit a CMakeLists.txt when package_cpp_only (#143352)
Summary: Emit a CMakeLists.txt with compile and link options when package_cpp_only is specified. After unzipping the AOTI-generated .pt2 package file, users can manually build the generated model code in their local environment.

Differential Revision: [D67458526](https://our.internmc.facebook.com/intern/diff/D67458526)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143352
Approved by: https://github.com/malfet
2024-12-19 22:01:05 +00:00
Bin Bao
0e8013fc1c [AOTI] Fix a typo in cpp_builder.py (#143351)
Summary: passthough -> passthrough

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143351
Approved by: https://github.com/yushangdi, https://github.com/chenyang78
ghstack dependencies: #143350
2024-12-18 16:28:37 +00:00
Benjamin Glass
bb06fc79fb cpp_builder: handle CUDA lib paths involving "stubs" in more circumstances (#142175)
conda packages for `cuda-driver-dev=12.4.127` place `libcuda.so` in a "stubs" subdirectory. cpp_builder previously handled this only in some cases, but it now needs to be handled more generally.
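A hypothetical sketch of the fallback (not the actual cpp_builder code; paths and the helper name are illustrative): if `libcuda.so` is absent from the usual lib directory, also search the "stubs" subdirectory that conda's cuda-driver-dev packages use.

```python
import os

def cuda_lib_dirs(cuda_home: str) -> list:
    """Return link search dirs, falling back to lib64/stubs for libcuda.so."""
    lib64 = os.path.join(cuda_home, "lib64")
    dirs = [lib64]
    if not os.path.exists(os.path.join(lib64, "libcuda.so")):
        stubs = os.path.join(lib64, "stubs")
        if os.path.exists(os.path.join(stubs, "libcuda.so")):
            dirs.append(stubs)  # link against the stub library as a fallback
    return dirs
```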

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142175
Approved by: https://github.com/desertfire
2024-12-17 17:21:27 +00:00
Tom Ritchford
dc23f1944a Remove unused Python variables in torch/[_-a]* (#133492)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492
Approved by: https://github.com/albanD
2024-12-12 17:39:14 +00:00
Colin L. Rice
d68403df3b filelock: Make waitcounter variant to use (#139816)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139816
Approved by: https://github.com/ezyang
2024-12-12 01:18:34 +00:00
PyTorch MergeBot
5c97ac9721 Revert "Remove unused Python variables in torch/[_-a]* (#133492)"
This reverts commit fda975a7b3.

Reverted https://github.com/pytorch/pytorch/pull/133492 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else.  The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/133492#issuecomment-2536635516))
2024-12-11 17:29:12 +00:00
PyTorch MergeBot
2374d460d0 Revert "filelock: Make waitcounter variant to use (#139816)"
This reverts commit 237c4b559c.

Reverted https://github.com/pytorch/pytorch/pull/139816 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else.  The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/139816#issuecomment-2536616808))
2024-12-11 17:26:46 +00:00
Colin L. Rice
237c4b559c filelock: Make waitcounter variant to use (#139816)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139816
Approved by: https://github.com/ezyang
2024-12-10 23:02:59 +00:00
Tom Ritchford
fda975a7b3 Remove unused Python variables in torch/[_-a]* (#133492)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492
Approved by: https://github.com/albanD
2024-12-10 21:48:44 +00:00
Colin Peppler
0602676c8d [CUTLASS][AOTI] Fixes undefined symbol: cudaLaunchKernelExC (#142094)
Summary:
### Context
* When compiling the object file for a CUTLASS kernel, CUDA RT symbols are left undefined.
* When compiling the final shared object file, we statically link with `libcudart_static.a`.
* One important thing is that ordering matters when specifying the lib search paths (-L).
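The ordering point above can be sketched as follows (a hypothetical illustration, not the actual build code; paths and library names are assumptions): object files with undefined CUDA RT symbols must precede the `-L` search paths and `-l` libraries so the linker can resolve symbols such as `cudaLaunchKernelExC` from `libcudart_static.a`.

```python
def link_command(output, objs, lib_dirs, libs):
    """Assemble a link command where argument order matters."""
    cmd = ["g++", "-shared", "-o", output]
    cmd += objs                            # objects with undefined symbols first
    cmd += [f"-L{d}" for d in lib_dirs]    # then the library search paths
    cmd += [f"-l{name}" for name in libs]  # then the static libs that satisfy them
    return cmd

cmd = link_command("model.so", ["kernel.o"],
                   ["/usr/local/cuda/lib64"], ["cudart_static"])
```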

Test Plan:
```
// before diff
RuntimeError: Failure loading .so: /tmp/tmpqhz_dnza/model.so: undefined symbol: cudaLaunchKernelExC
```

Differential Revision: D66793974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142094
Approved by: https://github.com/chenyang78, https://github.com/hl475
2024-12-06 02:18:54 +00:00
xinan.lin
4742080ed9 [AOTI XPU] Enable cpp wrapper for Intel GPU. (#135318)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135318
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/guangyey, https://github.com/desertfire
2024-11-26 11:51:32 +00:00
Joseph Kleinhenz
7b2138b864 [inductor] fix uncaught exception when checking for openmp on macos (#141208)
Based on #133776

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141208
Approved by: https://github.com/Skylion007
2024-11-21 22:17:52 +00:00
Aaron Gokaslan
12e95aa4ee [BE]: Apply PERF401 autofixes from ruff (#140980)
* Automatically applies ruff rule PERF401, turning loops into equivalent list comprehensions, which are faster and do not leak loop variables into the enclosing scope.
* List comprehensions often have better typing and are 50+% faster than for loops in overhead. They also preserve length information and are easier for the interpreter to optimize.
* Manually went back and made mypy happy after the change.
* Also fixed style lints in files covered by flake8 but not by pyfmt.
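The PERF401 rewrite can be illustrated with a small example (illustrative, not taken from the PR):

```python
# Before: appends in a loop; in module scope, `n` leaks past the loop.
squares = []
for n in range(10):
    squares.append(n * n)

# After: equivalent list comprehension; faster (no repeated .append
# attribute lookup) and `n` stays local to the comprehension.
squares = [n * n for n in range(10)]
```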

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-11-20 17:52:07 +00:00
Valentine233
263a5bf95e [cpu] Modify inductor opt flag --- ftree-loop-vectorize (#136827)
Reopen https://github.com/pytorch/pytorch/pull/121782, as more optimizations have landed.

Fixes https://github.com/pytorch/pytorch/issues/115261, https://github.com/pytorch/pytorch/issues/113017.
For the CPU inductor path, remove `-ftree-loop-vectorize` from the optimization flags to fix functional issues.
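The change amounts to dropping one flag from the compiler flag list; a hypothetical sketch (the surrounding flags are illustrative, not the actual inductor flag set):

```python
# Illustrative CPU inductor optimization flags before the change.
cflags = ["-O3", "-funroll-loops", "-ftree-loop-vectorize", "-ffast-math"]

# After: filter out the flag that caused functional issues.
cflags = [f for f in cflags if f != "-ftree-loop-vectorize"]
```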

### Validation on 3 benchmark suites

#### FP32
![image](https://github.com/user-attachments/assets/ec920928-fa36-467f-ba07-d2c05c51b92e)

Outlier models (speedup<0.8, single socket): None.

#### BF16
![image](https://github.com/user-attachments/assets/4a301e5e-147d-4b74-beb1-40290969ed80)

Outlier models (speedup<0.8, single socket multi threads):

- functorch_dp_cifar10 0.58
- opacus_cifar10 0.57

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136827
Approved by: https://github.com/jansel, https://github.com/jgong5
2024-11-12 01:26:18 +00:00
PyTorch MergeBot
347f96061f Revert "[cpu] Modify inductor opt flag --- ftree-loop-vectorize (#136827)"
This reverts commit cf0bb6c435.

Reverted https://github.com/pytorch/pytorch/pull/136827 on behalf of https://github.com/ZainRizvi due to Sorry but this breaks internally. See D65605094 for more details ([comment](https://github.com/pytorch/pytorch/pull/136827#issuecomment-2465805271))
2024-11-08 21:52:33 +00:00
Valentine233
cf0bb6c435 [cpu] Modify inductor opt flag --- ftree-loop-vectorize (#136827)
Reopen https://github.com/pytorch/pytorch/pull/121782, as more optimizations have landed.

Fixes https://github.com/pytorch/pytorch/issues/115261, https://github.com/pytorch/pytorch/issues/113017.
For the CPU inductor path, remove `-ftree-loop-vectorize` from the optimization flags to fix functional issues.

### Validation on 3 benchmark suites

#### FP32
![image](https://github.com/user-attachments/assets/ec920928-fa36-467f-ba07-d2c05c51b92e)

Outlier models (speedup<0.8, single socket): None.

#### BF16
![image](https://github.com/user-attachments/assets/4a301e5e-147d-4b74-beb1-40290969ed80)

Outlier models (speedup<0.8, single socket multi threads):

- functorch_dp_cifar10 0.58
- opacus_cifar10 0.57

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136827
Approved by: https://github.com/jansel, https://github.com/jgong5
2024-11-07 02:49:52 +00:00
Irem Yuksel
b021486405 Enable Windows Arm64 (#133088)
This PR enables PyTorch for Windows on Arm64, CPU only.
Currently, there aren't any checks in place to build and test for Windows on Arm64, but we're working to implement those as soon as possible.
We recommend using [Arm Performance Libraries (APL)](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries) as a BLAS option, which is introduced in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133088
Approved by: https://github.com/malfet

Co-authored-by: cristian panaite <panaite.cristian2000@gmail.com>
Co-authored-by: Stefan-Alin Pahontu <56953855+alinpahontu2912@users.noreply.github.com>
Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com>
2024-10-24 16:10:44 +00:00
Zhuoran Zhao
2414c3f534 AOTI fixes for MI300 lowering (#137939)
Summary:
1) Add sleef back to enable SIMD on AMD
2) Add kpack to the triton compute_meta for AMD triton, since user-defined triton kernels will use it for k-dim packing

Test Plan:
```
HIP_VISIBLE_DEVICES=0 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 TORCH_LOGS="output_code,graph_code" buck run mode/{opt,amd-gpu} -c fbcode.triton_backend=amd -c fbcode.enable_gpu_sections=true //hpc/new/models/feed/benchmark:feed_lower_benchmark --  --skip-flop-estimation --skip-trt --skip-ait --enable-aot-inductor --sync-mode=0 --gpu-trace --sample-input-tile-factor=1  --load="manifold://ads_storage_fblearner/tree/user/facebook/fblearner/predictor/925729118/0/gpu_lowering/input.merge" --lowering-input-str='{"serialized_inference_model_input_path":"ads_storage_fblearner/tree/user/facebook/fblearner/predictor/925729118/0/gpu_lowering/input.merge","serialized_inference_model_output_path":"ads_storage_fblearner/tree/user/facebook/fblearner/predictor/925729118/0/gpu_lowering/mi300_output.merge","submodule_names_to_lower":["merge"],"inductor_lowering_context":{"aot_inductor_lowering_settings":{"use_scripting":true,"preset_lowerer":"ifu_cint;disable_new_lowering_weights;disable_dper_passes:passes=fuse_parallel_linear_no_weight_change","precision":3,"output_precision":3, "remove_unexpected_type_cast":false, "sample_input_tile_factor":32}},"model_entity_id":925729118,"model_snapshot_id":0,"add_sample_inputs":false,"hardware_type":0,"platform_arch":1,"dense_in_place_format":2}' --precision=bf16 2>&1 | tee local_benchmark_log.txt

```

Differential Revision: D64262924

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137939
Approved by: https://github.com/frank-wei
2024-10-17 16:09:04 +00:00
Bin Bao
fe43f72be7 [AOTI] Remove the non-ABI-compatible mode (part 2) (#138047)
Summary: Continue to clean up non-ABI-compatible mode related code.

Differential Revision: [D64444327](https://our.internmc.facebook.com/intern/diff/D64444327)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138047
Approved by: https://github.com/chenyang78
ghstack dependencies: #137982, #138016, #138009
2024-10-17 02:54:24 +00:00
Henry Tsang
a0a978ce23 [aoti config] add raise_error_on_ignored_optimization (#138035)
Summary: Unfortunately this means adding another config.

Test Plan: ci

Differential Revision: D64437699

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138035
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2024-10-16 18:38:47 +00:00
Bin Bao
c04b35a5ae [AOTI] Add standalone version of TORCH_CHECK (#136873)
Summary: In standalone mode, TORCH_CHECK throws std::runtime_error instead of c10::Error. The goal is to cut the dependency on libtorch. Specifically, AOTI generates CPU code which may call ATen vectorization ops, and we need to make sure those ops are self-contained.

Differential Revision: [D63911928](https://our.internmc.facebook.com/intern/diff/D63911928)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136873
Approved by: https://github.com/albanD, https://github.com/chenyang78
2024-10-08 15:30:01 +00:00
Dan Zimmerman
b3972ee19a [triton] Unify build_paths.py for NV & AMD, fix typing (#136952)
Summary: Some build improvements.

Test Plan: CI

Differential Revision: D63583959

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136952
Approved by: https://github.com/bertmaher
2024-09-30 21:51:45 +00:00
Isuru Fernando
2a178a6982 Avoid changing FTZ/DAZ flags in CPP builder (#136466)
Fixes https://github.com/pytorch/pytorch/issues/136273
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136466
Approved by: https://github.com/ezyang
2024-09-24 14:39:17 +00:00
xinan.lin
67735d1ee8 [Inductor] Generalize is_cuda to specific device_type to make cpp_wrapper mode be extensible (#134693)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134693
Approved by: https://github.com/ezyang, https://github.com/EikanWang, https://github.com/jansel
2024-09-10 10:11:13 +00:00
Xu Han
29d72c1100 [inductor] check intel compiler minimal version (#135209)
On Windows, early versions of icx have a `-print-file-name` issue and cannot preload correctly for inductor. Add a minimal version check for the Intel compiler.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135209
Approved by: https://github.com/ezyang
2024-09-06 03:21:07 +00:00
Xu Han
6448d351db [inductor] clean up cpp_builder code. (#134909)
Clean up duplicated code in cpp_builder.

Hi @henrylhtsang, could you please help land this internally?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134909
Approved by: https://github.com/henrylhtsang
2024-09-04 05:29:08 +00:00
Xu Han
c40e622966 [inductor] add openmp config for intel compiler on Linux. (#134973)
Configure `openmp` for the Intel compiler on Linux.

Based on this PR, we can confirm that the Intel optimized libraries are built and working well.
<img width="1039" alt="image" src="https://github.com/user-attachments/assets/838d5114-c778-4961-9cfe-39a814647089">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134973
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-09-03 20:10:21 +00:00
Xu Han
136badae64 [inductor] preload icx built-in math libs (#134870)
The Intel compiler implements more math libraries than clang, for better performance.
We need to preload them like the openmp library.
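Such preloading could be sketched like this (hypothetical; the library names and helper are illustrative, not the actual inductor code): load each runtime library with global symbol visibility before the compiled extension is loaded, skipping any that are absent.

```python
import ctypes

def preload(lib_names):
    """Preload shared libraries with RTLD_GLOBAL; skip missing ones."""
    loaded = []
    for name in lib_names:
        try:
            ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
            loaded.append(name)
        except OSError:
            pass  # library not available in this environment
    return loaded

# e.g. preload(["libiomp5.so", "libimf.so", "libsvml.so"])  # illustrative names
```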

reproduce UT:
```cmd
pytest test/inductor/test_cpu_cpp_wrapper.py -v -k test_silu_cpu_dynamic_shapes_cpp_wrapper
```

Dependent modules:
<img width="804" alt="Image" src="https://github.com/user-attachments/assets/9a672e03-ebf5-4ebb-b182-09180e6f7841">

Local test pass:
<img width="857" alt="image" src="https://github.com/user-attachments/assets/afbb8c1c-8fcc-4d64-a3ad-c8521b137d2d">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134870
Approved by: https://github.com/jansel
2024-08-31 04:50:31 +00:00
Xu Han
15f5a4858b [inductor] enable Intel Compiler(icx-cl) for inductor windows (#134772)
This PR enables the Intel compiler (`icx-cl`) for Windows inductor, like the previous PR https://github.com/pytorch/pytorch/pull/134444 which enabled clang.

Changes:
1. Fix an icx-cl crash caused by wrong decode args; the correct decoding is "utf-8".
2. Add an Intel compiler check and an Intel compiler Windows driver check (icx-cl).
3. Add an openmp args config for the Intel compiler.
4. Preload the Intel compiler openmp binary.
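The decode fix in change 1 can be sketched as follows (hypothetical; the helper name is illustrative): decode compiler subprocess output explicitly as "utf-8" instead of relying on the platform-default codec, which crashed the icx-cl probe on some Windows setups.

```python
def decode_compiler_output(raw: bytes) -> str:
    """Decode captured compiler output explicitly as utf-8."""
    # errors="replace" keeps the probe from crashing on stray bytes
    return raw.decode("utf-8", errors="replace")

# e.g. raw bytes captured from an `icx-cl --version` subprocess
banner = decode_compiler_output("Intel\u00ae oneAPI".encode("utf-8"))
```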

For intel compiler openmp binary path:
<img width="788" alt="image" src="https://github.com/user-attachments/assets/54c76356-018d-4bef-a9b7-0ea150fd7aba">

In terms of performance, the Intel compiler (`icx-cl`) is much faster than MSVC (`cl`):
<img width="875" alt="image" src="https://github.com/user-attachments/assets/67865faf-b1de-4535-917a-486b72527204">

Appended `clang-cl` performance data:
<img width="821" alt="image" src="https://github.com/user-attachments/assets/476f4568-bf58-457f-b73d-4e57f49be384">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134772
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-08-30 17:51:46 +00:00
Zhuoran Zhao
8b4c487581 Fix AOTInductor compilation on ROCm (#134522)
Summary:
The original PR (https://github.com/pytorch/pytorch/pull/124123) was broken by the cpp_builder refactoring, so it is resubmitted here as a fix.

Test Plan: Test with command here: https://www.internalfb.com/phabricator/paste/view/P1549765548

Differential Revision: D61827208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134522
Approved by: https://github.com/frank-wei
2024-08-29 21:59:04 +00:00
Xu Han
1dd4b9221b [inductor] enable clang for Windows inductor (#134444)
Changes:
1. Add Windows clang-cl compiler check.
2. Add openmp config for clang-cl.
3. Preload libomp.dll when using clang.
4. Add compiler flags syntax check for `clang` and `clang++`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134444
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/malfet
2024-08-26 18:19:59 +00:00
Xu Han
98d6a6eb7d [inductor] clean up TODO comments. (#133718)
clean up TODO comments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133718
Approved by: https://github.com/henrylhtsang
2024-08-16 22:12:01 +00:00
Xu Han
89795da5e3 [inductor] process compile_only case in all build options class. (#129975)
Optimize the `compile_only` logic. The original code only applied to `CppTorchCudaOptions`; this PR makes it apply to all build option classes.
Changes:
1. Remove the `libraries_dirs` and `libraries` settings when `compile_only` is set.
2. Remove `compile_only` from `CppTorchCudaOptions`.
3. Make `compile_only` apply to all classes.
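The generalized handling can be sketched like this (a hypothetical illustration; the function name is an assumption, not the PR's API): when only compiling, there is no link step, so library dirs and libraries are irrelevant and every build-option class drops them the same way.

```python
def finalize_link_options(libraries_dirs, libraries, compile_only):
    """Drop link-time settings when only compiling an object file."""
    if compile_only:
        return [], []  # no link step, so nothing to link against
    return libraries_dirs, libraries
```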

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129975
Approved by: https://github.com/henrylhtsang
2024-08-13 16:45:27 +00:00
Xu Han
9f0d90655d [inductor] cpp_builder add dynamo time trace for compile_file (#133103)
trace `compile_file` time for cpp_builder.
Ref: https://github.com/pytorch/pytorch/pull/132328/files#diff-c9b517f8db609ffa866804dfa2689188a4fee20abacaa0b0dca91625c1b5cb8dR2224

<img width="994" alt="image" src="https://github.com/user-attachments/assets/862c7943-79dc-4d06-b398-a09595ad1295">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133103
Approved by: https://github.com/ezyang
2024-08-10 04:55:02 +00:00
Henry Tsang
e98eac76b3 [inductor] switch AotCodeCompiler to new cpp_builder. (take 3) (#132766)
Summary: This is basically https://github.com/pytorch/pytorch/pull/131304 together with https://github.com/pytorch/pytorch/pull/132594 and absolute path fix for fbcode.

Test Plan: ci

Differential Revision: D60773405

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132766
Approved by: https://github.com/xuhancn, https://github.com/chenyang78, https://github.com/desertfire
2024-08-06 23:56:34 +00:00
Xu Han
a672f6c84e [inductor] unify SUBPROCESS_DECODE_ARGS variable in cpp_builder.py (#132615)
Unify the SUBPROCESS_DECODE_ARGS variable in cpp_builder.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132615
Approved by: https://github.com/jgong5, https://github.com/desertfire
2024-08-05 16:00:35 +00:00
Xu Han
7f8a384a8f [inductor] add msvc_cl compiler check (#132571)
add `msvc_cl` compiler check.
Local test:
<img width="880" alt="image" src="https://github.com/user-attachments/assets/fe4da5e0-dd52-4dbc-831e-c32479e27a29">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132571
Approved by: https://github.com/ezyang
2024-08-04 03:48:25 +00:00
Xu Han
36ec0fdf10 [inductor] check compiler exist on Windows. (#132533)
In the current Windows environment, if the MSVC environment is not activated, no clear error points to the compiler:
<img width="904" alt="image" src="https://github.com/user-attachments/assets/725ea608-d181-40b1-8930-42fe2b32643a">

With this PR, we help users see that the issue comes from the compiler.
<img width="1034" alt="image" src="https://github.com/user-attachments/assets/8515a796-e3e9-4909-a68f-8a14d4864951">
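An early existence check along these lines could be sketched as follows (hypothetical; the compiler name default and message wording are illustrative, not the actual inductor code):

```python
import shutil

def check_compiler_exists(compiler="cl"):
    """Fail early, with an actionable message, if the compiler is not on PATH."""
    if shutil.which(compiler) is None:
        raise RuntimeError(
            f"Compiler '{compiler}' not found on PATH; activate the "
            "MSVC environment (e.g. run vcvarsall.bat) and retry."
        )
```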

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132533
Approved by: https://github.com/jansel
2024-08-03 07:47:11 +00:00
Xu Han
475da800c7 [inductor] optimize cflags for Windows. (#131980)
Changes:
1. Optimize cflags for Windows. Ref: https://github.com/pytorch/pytorch/blob/v2.4.0/torch/utils/cpp_extension.py#L215

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131980
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-30 02:59:51 +00:00
Xu Han
28fd2e905d [inductor] enhance cpp_builder lint check. (#131752)
enhance cpp_builder `mypy` check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131752
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-27 02:46:27 +00:00
Xu Han
72d17d95d7 [inductor] Enable dynamo for Windows. RC1 (#131286)
Changes:
1. Enable Windows in `check_if_inductor_supported`.
2. Disable Windows in `AotCodeCompiler`.
3. Force Windows inductor to `c++20` to support `std::enable_if_t`.
4. Temporarily disable the `test_x86inductor_quantizer` UT on `Windows`; it still has some issues that need to be fixed: https://github.com/pytorch/pytorch/pull/131308 .

Based on this PR, I have successfully run the first model, `resnet18`, on Windows inductor.
<img width="1036" alt="image" src="https://github.com/user-attachments/assets/2642bda1-1845-417a-aaba-39bdf22e65d6">

TODO:
1. Upgrade pytorch Windows build to `c++20`.
2. Fix and re-enable `test_x86inductor_quantizer` UT on `Windows`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131286
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-07-24 15:26:55 +00:00
Xuehai Pan
b6d477fd56 [BE][Easy][16/19] enforce style for empty lines in import segments in torch/_i*/ (#129768)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129768
Approved by: https://github.com/jansel
2024-07-20 16:20:58 +00:00
Xu Han
6e7b9ee8a0 [inductor] adapte windows file path (#130713)
This PR depends on https://github.com/pytorch/pytorch/pull/130132 being landed successfully.
The detailed log: https://github.com/pytorch/pytorch/issues/124245#issuecomment-2211889758

After the file path was adapted for Windows, the first Windows inductor case ran successfully.

```python
import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(x)
    return a + b
opt_foo1 = torch.compile(foo)
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))
```

Result:
![image](https://github.com/user-attachments/assets/4944df47-e74d-476b-8eb5-1d1fd5abeb41)

Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130713
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/desertfire
2024-07-18 23:19:38 +00:00
PyTorch MergeBot
41f5d5dcaf Revert "[inductor] adapte windows file path (#130713)"
This reverts commit e51e971a86.

Reverted https://github.com/pytorch/pytorch/pull/130713 on behalf of https://github.com/clee2000 due to sorry but I think its still failing, this time on windows CUDA https://github.com/pytorch/pytorch/actions/runs/9971126834/job/27552761451 bb62e9d7c3.  It was not run on PR due to being on the periodic workflow, which isnt usually run on PRs due to capacity issues for windows CUDA machines.  I will add ciflow/periodic to the PR to ensure the test gets run ([comment](https://github.com/pytorch/pytorch/pull/130713#issuecomment-2234092078))
2024-07-17 19:37:16 +00:00
angelayi
cbf274d4a7 [aoti] Add packaging solution (#129895)
In this PR, I added support for packaging the AOTI-generated files into a zipfile and loading it in Python.

`compile_so` takes the path to the package, a device, and a desired so_path location; it compiles the package into a .so and saves it to the specified location.
`load_package` takes a path to the package and a device, calls _extract_so, and then creates a callable to run the compiled model.

The zipfile generated looks like the following:
```
|- version
|- archive_format
|- data
   |- aotinductor
      |- cbtnafqaqrhvwztv7xudlal4xs6sofxa5oxccyuaqtrt6aozaklx.cubin  # AOTI cuda generated cubin files
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe.cpp  # AOTI generated cpp file
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe_compile_flags  # Flags for compiling the .o
      |- c6qqtnpgwfi3dv5nb76ai773kt45ezoxfwdmd7q37lvq6fs2tnoi.o  # AOTI saved const.o
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe_linker_flags  # Flags for linking the files to form the .so
   |- constants
      |- constants.pt  # Constants saved using torch.save, can be loaded using mmap
```

The workflow is something like:
```
with torch.no_grad():
    ep = torch.export.export(
        model,
        example_inputs,
        dynamic_shapes=dynamic_shapes,
        strict=False,
    )
    gm = ep.module()
    package_path = torch._inductor.aot_compile(
        gm,
        example_inputs,
        options= {
              "aot_inductor.output_path": "my_path.pt2",  # or a directory
              "aot_inductor.package": True,
        }
    )
compiled_model = torch._inductor.package.load_package(package_path, device)
return compiled_model
```

I tried turning on loading the weights using mmap by default, but ran into some trouble, so that is left as a TODO.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129895
Approved by: https://github.com/malfet
2024-07-17 13:56:58 +00:00
Xu Han
e51e971a86 [inductor] adapte windows file path (#130713)
This PR depends on https://github.com/pytorch/pytorch/pull/130132 being landed successfully.
The detailed log: https://github.com/pytorch/pytorch/issues/124245#issuecomment-2211889758

After the file path was adapted for Windows, the first Windows inductor case ran successfully.

```python
import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(x)
    return a + b
opt_foo1 = torch.compile(foo)
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))
```

Result:
![image](https://github.com/user-attachments/assets/4944df47-e74d-476b-8eb5-1d1fd5abeb41)

Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130713
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/desertfire
2024-07-17 06:36:11 +00:00
PyTorch MergeBot
5f3c356a56 Revert "[inductor] adapte windows file path (#130713)"
This reverts commit 69e9917245.

Reverted https://github.com/pytorch/pytorch/pull/130713 on behalf of https://github.com/clee2000 due to broke functorch\test_eager_transforms.py on windows https://github.com/pytorch/pytorch/actions/runs/9958208725/job/27530132704 69e9917245.  Test failure on PR is real, possibly force merged to get around lint error? ([comment](https://github.com/pytorch/pytorch/pull/130713#issuecomment-2231901793))
2024-07-16 22:07:55 +00:00