pytorch/torch/csrc/inductor
Sheng Qin d25617255c Fix AOTI update_constant_buffer issue. (#149243)
Summary:
In D69553929 we changed the logic for updating constants and buffers in AOTI. However, the new logic is incompatible with the current Sigmoid runtime, which passes in buffers differently, resulting in errors like:
```
I0310 17:29:24.456960 3679102 AOTIDelegateExecutor.cpp:89] AOTIDelegateExecutor processing weights
*** Aborted at 1741652964 (Unix time, try 'date -d 1741652964') ***
*** Signal 11 (SIGSEGV) (0x30) received by PID 3679102 (pthread TID 0x7f9933e49000) (linux TID 3679102) (code: address not mapped to object), stack trace: ***
    @ 00000000000040b9 folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
                       ./fbcode/folly/debugging/symbolizer/SignalHandler.cpp:453
    @ 0000000000006c45 folly::fibers::(anonymous namespace)::sigsegvSignalHandler(int, siginfo_t*, void*)
                       ./fbcode/folly/fibers/GuardPageAllocator.cpp:237
    @ 000000000004455f (unknown)
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:8
                       -> /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c
    @ 00000000001e8164 torch::aot_inductor::AOTInductorModelContainer::update_constant_buffer(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, AtenTensorOpaque*, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, AtenTensorOpaque*> > > const&, bool, bool)
```
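
The trace above faults at a small address (0x30) inside `update_constant_buffer`, which takes a map from constant name to `AtenTensorOpaque*`. One plausible reading, and it is only an assumption here, is that the caller's map did not contain every entry the container expected and a missing or null handle was dereferenced. The sketch below illustrates that failure mode and a defensive lookup. It is not the code from this PR: `FakeTensor`, `ConstantsMap`, and `update_constants_checked` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an opaque tensor handle; NOT the AOTI type.
struct FakeTensor {
  int id;
};

// Same shape as the map in the stack trace: constant name -> raw handle.
using ConstantsMap = std::unordered_map<std::string, FakeTensor*>;

// Copies handles from `incoming` into `slots`, which is ordered by `names`.
// Names that are absent from the map, or mapped to a null handle, are
// reported back to the caller instead of being dereferenced.
std::vector<std::string> update_constants_checked(
    const std::vector<std::string>& names,
    const ConstantsMap& incoming,
    std::vector<FakeTensor*>& slots) {
  std::vector<std::string> skipped;
  slots.resize(names.size(), nullptr);
  for (std::size_t i = 0; i < names.size(); ++i) {
    auto it = incoming.find(names[i]);
    if (it == incoming.end() || it->second == nullptr) {
      skipped.push_back(names[i]);  // leave the existing slot untouched
      continue;
    }
    slots[i] = it->second;
  }
  return skipped;
}
```

A caller that supplies only a subset of the constants gets the missing names back and can decide whether to fail loudly or keep the previous values, rather than crashing on a null handle.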

Test Plan:
1) Generate lowered merge net
```
CUDA_VISIBLE_DEVICES=0 ../buck-out/v2/gen/fbcode/b5b13003c82cbdec/caffe2/torch/fb/model_transform/fx2trt/packaging/__generate_merge_net_file__/generate_merge_net_file.par  --action=generate --input-file=/home/shengqin/models/aoti_sigmoid_test/cmf_interformer_with_custom_triton_kernels_691990503_0_input --output-file=/home/shengqin/models/aoti_sigmoid_test/cmf_interformer_with_custom_triton_kernels_691990503_0_output.aoti_sigmoid --lower-backend=aot_inductor  --use_sigmoid=true --aot_inductor_config="{'max_autotune': True, 'comprehensive_padding': False}" --add_passes=use_matmul_lce_replace_normal_LCE,use_triton_dot_compress,use_matmul_fuse_lce_replace_first_LCE,use_contiguous_linear_reduction_replace_linear_reduction --disable_acc_tracer=false
```

2) Load net predictor
```
CUDA_VISIBLE_DEVICES=1 ../buck-out/v2/gen/fbcode/103717df3cc2b97a/caffe2/torch/fb/model_transform/fx2trt/packaging/__load_net_predictor__/load_net_predictor --loadMode=AccuracyAB --inputNetFile=/home/shengqin/models/aoti_sigmoid_test/cmf_interformer_with_custom_triton_kernels_691990503_0_output.aoti_ts --otherNetFile=/home/shengqin/models/aoti_sigmoid_test/cmf_interformer_with_custom_triton_kernels_691990503_0_output.aoti_sigmoid --moduleName=merge --benchmarkEnableProfiling=false --predictor_hardware_type=1 --disableStaticRuntime=true
```

Reviewed By: hl475

Differential Revision: D71236710

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149243
Approved by: https://github.com/hl475, https://github.com/jingsh
2025-03-17 22:10:57 +00:00

| Name | Last commit | Date |
| --- | --- | --- |
| aoti_eager | Fix for AOTI + CUDAGraphs when calling from Python (#148601) | 2025-03-08 02:44:14 +00:00 |
| aoti_include | cpp_wrapper: reduce memory usage by removing unneeded temporaries (#147403) | 2025-03-06 16:08:16 +00:00 |
| aoti_package | BC fix for AOTIModelPackageLoader() constructor defaults (#149082) | 2025-03-13 18:40:53 +00:00 |
| aoti_runner | Revert "[AOTInductor] [BE] Add swap_constant_buffer into pybind for tests. (#149167)" | 2025-03-14 15:16:21 +00:00 |
| aoti_runtime | Fix AOTI update_constant_buffer issue. (#149243) | 2025-03-17 22:10:57 +00:00 |
| aoti_torch | op should NOT be static in aoti_torch_call_dispatcher (#149208) | 2025-03-15 01:47:11 +00:00 |
| cpp_wrapper | cpp_wrapper: reduce memory usage by removing unneeded temporaries (#147403) | 2025-03-06 16:08:16 +00:00 |
| array_ref_impl.h | cpp_wrapper: Move #includes to per-device header files (#145932) | 2025-01-29 21:08:45 +00:00 |
| inductor_ops.cpp | [Reland] [1/N] Fix clang-tidy warnings in inductor (#134544) | 2024-08-28 04:05:06 +00:00 |
| inductor_ops.h | [2/N] Fix clang-tidy warnings in inductor (#132040) | 2024-07-29 18:41:24 +00:00 |
| resize_storage_bytes.cpp | [Reland] [1/N] Fix clang-tidy warnings in inductor (#134544) | 2024-08-28 04:05:06 +00:00 |
| static_cuda_launcher.cpp | [Reland] First version of statically compiled launcher for triton compiled CUDA kernels (#149238) | 2025-03-15 15:06:46 +00:00 |
| static_cuda_launcher.h | [Reland] First version of statically compiled launcher for triton compiled CUDA kernels (#149238) | 2025-03-15 15:06:46 +00:00 |