Commit Graph

82517 Commits

Author SHA1 Message Date
Mikayla Gawarecki
372b023eb1 Fix test_serialization_zipfile_actually_jit when weights_only is not default (#143668)
Fails in fbcode, where `weights_only` is not the default.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143668
Approved by: https://github.com/awgu
ghstack dependencies: #143326, #143403
2024-12-20 21:25:10 +00:00
Darshan Sanghani
33dd4f187d [pytorch/et] Allow ET to save additional resources for completing a trace like generated kernels and index tensor data (#143430)
The resources directory lets the ET observer dump additional data, such as Triton kernels, while capturing the ET.

This allows us to use the ET trace to replay PT2 workloads and get visibility into data such as generated kernels and their usage in a model, index tensor data, etc.

We also added a few ways to enable ET and ET Resources through OS environment variables.

Setting `ENABLE_PYTORCH_EXECUTION_TRACE` will enable default Execution Tracing in PyTorch.

Additionally, setting `ENABLE_PYTORCH_EXECUTION_TRACE_EXTRAS` will enable ET to collect extra resources from the ET run, such as Triton kernels.
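
For example, a minimal sketch of turning both on from Python before profiling (the variable names come from this PR; the value `"1"` and the assumption that they are read when the observer starts are mine):

```python
import os

# Enable default Execution Trace collection (value "1" is an assumption;
# the PR names the variables but does not show how they are parsed).
os.environ["ENABLE_PYTORCH_EXECUTION_TRACE"] = "1"
# Also collect extra resources, such as generated Triton kernels.
os.environ["ENABLE_PYTORCH_EXECUTION_TRACE_EXTRAS"] = "1"
```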

Differential Revision: [D58707846](https://our.internmc.facebook.com/intern/diff/D58707846/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143430
Approved by: https://github.com/shengfukevin, https://github.com/sraikund16
2024-12-20 21:20:32 +00:00
zeshengzong
cee06e74ee Apply clang-format for ATen/core/dispatch headers (#143620)
Code changed by adding a path config in the `.lintrunner.toml` file and running:

```bash
 $ lintrunner -a --take CLANGFORMAT --all-files
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143620
Approved by: https://github.com/malfet
2024-12-20 21:16:23 +00:00
Mikayla Gawarecki
8e483654cb Add config.save.use_pinned_memory_for_d2h to serialization config (#143342)
This was benchmarked with two separate scripts on my A100
(A) Save state_dict of llama3-style model on CUDA to disk with ``torch.save``
(B) Save `ModuleList` of 10 `nn.Linear(10,000, 10,000)` on CUDA to disk with `torch.save`
Timings are an average of 5 runs and benchmark scripts + results are attached

Under both scenarios, we see **~2x speedup in ``torch.save`` time with (``compute_crc32=False`` and ``use_pinned_memory_for_d2h=True``)** compared to the baseline of the current defaults (``compute_crc32=True`` and ``use_pinned_memory_for_d2h=False``).

(A)  Save state_dict of llama3-style model on CUDA to disk with ``torch.save`` [[script](https://gist.github.com/mikaylagawarecki/d3a86ea1bb08045d1a839976808d7432)][[results](https://gist.github.com/mikaylagawarecki/f61a4714e5cff703146a1fcb7e0c755c)]

|                                                                                 |  use_pinned_memory_for_d2h=False (Default) |  use_pinned_memory_for_d2h=True |
|-|-|-|
| `compute_crc32=True` (Default) | 28.54s | 20.76s |
| `compute_crc32=False` | 22.57s | **14.51s** |

(B) Save `ModuleList` of 10 `nn.Linear(10,000, 10,000)` on CUDA to disk with `torch.save` [[script](https://gist.github.com/mikaylagawarecki/ecbc505436bdd4b5190ef1b3430c12b6)][[results](https://gist.github.com/mikaylagawarecki/4e686bcf030b57de8c3ca74d8f5a88f7)]

|                                                                                 |  use_pinned_memory_for_d2h=False (Default) |  use_pinned_memory_for_d2h=True |
|-|-|-|
| `compute_crc32=True` (Default) | 8.38s | 5.53s |
| `compute_crc32=False` | 6.94s | **3.99s** |

Trace of (A) with `use_pinned_memory_for_d2h=True`, `compute_crc32=False`
<img width="1745" alt="Screenshot 2024-12-16 at 7 32 33 PM" src="https://github.com/user-attachments/assets/80b87a8c-5a70-4eb9-ad66-7abc4aa7cc25" />

Baseline trace of (A) with `use_pinned_memory_for_d2h=False`, `compute_crc32=True`
<img width="1799" alt="Screenshot 2024-12-16 at 7 38 20 PM" src="https://github.com/user-attachments/assets/13fa12d1-8f5f-424c-adc4-275b67012927" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143342
Approved by: https://github.com/albanD
ghstack dependencies: #143324
2024-12-20 21:01:18 +00:00
Mikayla Gawarecki
3f63b742e6 Refactor serialization getter/setters into torch.utils.serialization.config (#143324)
Consolidate
- get/set_default_load_endianness
- get/set_default_mmap_options
- get/set_crc32_options

into one global, dynamo-style config and allow setting mmap globally. The existing APIs are not removed and will get/set from the config (they can't be removed, for backward compatibility).

In #143459, I add the local (argument-style) config.
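
A sketch of the intended backward compatibility, assuming the consolidated config exposes a `load.endianness` field mirroring the old getter/setter:

```python
import torch
from torch.utils.serialization import config

# The legacy setter still works...
torch.serialization.set_default_load_endianness(
    torch.serialization.LoadEndianness.LITTLE
)
# ...and is expected to read/write the same underlying config value
# (field name is an assumption based on this PR's description).
assert config.load.endianness == torch.serialization.LoadEndianness.LITTLE
assert torch.serialization.get_default_load_endianness() == config.load.endianness
```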

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143324
Approved by: https://github.com/albanD
2024-12-20 21:01:17 +00:00
Scott Wolchok
629de988df Fix old-compiler-unfriendly zero init of bfloat16_t array (#143504)
clang versions before 17 refuse to assign 0 to a `bfloat16_t`, and gcc versions before 13 likewise won't assign 0.0 (citation: https://godbolt.org/z/Gzs5ebdej).

Differential Revision: [D67396740](https://our.internmc.facebook.com/intern/diff/D67396740/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143504
Approved by: https://github.com/malfet
2024-12-20 20:49:51 +00:00
Chirag Pandya
485497e727 [c10d][fr] flight recorder improvements (#143446)
Summary:
1. Flight recorder dumps are now automatically dumped by default upon
   timeout or exception. Users don't need to opt in.
2. Change the default dump location to the `.cache` folder in the running
   user's home directory.

Test Plan:
1. Tested locally by running the crash program from flight recorder
   tutorial page.
   https://pytorch.org/tutorials/prototype/flight_recorder_tutorial.html#an-end-to-end-example
2. Noted that flight recorder files were correctly created.
```
❯ pwd
/home/cpio/.cache/fr_trace
❯ ls
nccl_trace_rank_0  nccl_trace_rank_1
```

Differential Revision: [D67363720](https://our.internmc.facebook.com/intern/diff/D67363720)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143446
Approved by: https://github.com/d4l3k
2024-12-20 20:41:30 +00:00
Colin L. Rice
a94f259a69 pgo: Log feature use (#142819)
This will cause dynamo_compile to populate the feature column if we have
a hit for PGO.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142819
Approved by: https://github.com/ezyang
2024-12-20 20:22:20 +00:00
Aaron Orenstein
8ce0bc282a dynamo tracing perf: bytecode_transform improvements: 34.86 -> 33.9 (#143068)
See #143056 for overall docs.

This PR: Use `__slots__` on `InstructionExnTabEntry` and `Instruction`. Stop doing Python
version checks in the middle of `convert_instruction()` and
`inst_has_op_bits()`.
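
A generic sketch of both techniques (the field names are illustrative, not the actual dynamo classes):

```python
import sys

# Hoist the version check out of the per-instruction hot path: compute it
# once at import time instead of branching inside every call.
IS_PY_312_PLUS = sys.version_info >= (3, 12)

class Instruction:
    # __slots__ removes the per-instance __dict__, shrinking each object
    # and speeding up attribute access for objects created in large numbers
    # during tracing.
    __slots__ = ("opcode", "opname", "arg", "argval")

    def __init__(self, opcode, opname, arg, argval):
        self.opcode = opcode
        self.opname = opname
        self.arg = arg
        self.argval = argval
```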

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143068
Approved by: https://github.com/jansel
ghstack dependencies: #143065, #143067
2024-12-20 20:06:42 +00:00
Aaron Orenstein
5feb2d7b41 dynamo tracing perf: don't call expensive _set_guard_export_info if it's a duplicate guard: 37.66 -> 34.86 (#143067)
See #143056 for overall docs.

This PR: Move the call to `_set_guard_export_info()` after the duplicate-guard
check in `GuardBuilder.DUPLICATE_INPUT()`.
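
The general pattern, as a sketch with hypothetical names:

```python
def install_guard(guard, seen_guards, set_guard_export_info):
    """Check for duplicates before doing expensive bookkeeping
    (hypothetical helper mirroring the reordering described above)."""
    if guard in seen_guards:
        return  # duplicate: skip the expensive export-info call entirely
    set_guard_export_info(guard)  # expensive, now only paid for new guards
    seen_guards.add(guard)
```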

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143067
Approved by: https://github.com/jansel
ghstack dependencies: #143065
2024-12-20 20:06:42 +00:00
Aaron Orenstein
7d4e7fbfc1 dynamo tracing perf: no import on hot path: 47.62 -> 47.26 (#143065)
See #143056 for overall docs.

This PR: Removed another `import` from the body of the hot path.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143065
Approved by: https://github.com/jansel
2024-12-20 20:06:42 +00:00
Yanbo Liang
792e6184c5 [GPT-fast] Support running a specific model or micro-benchmark (#143607)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143607
Approved by: https://github.com/BoyuanFeng, https://github.com/jerryzh168, https://github.com/huydhn
2024-12-20 19:58:07 +00:00
Nikhil Gupta
94737e8a2a [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)
Description:
1. Quantize Linear Layer Weights to 4-bits:
Quantize the weights of the Linear layer to 4 bits, using symmetric quantization.
Pack two 4-bit weights into one uint8 container.
Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32.

2. Prepare Quantized Weights, Scales, and Optional Bias:
After quantizing, obtain the quantized_weights, scales, and groupsize.
If the original Linear layer has a bias, prepare it as well.

3. Pack the Weights Efficiently:
Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias.
```python
packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features)
```
Input parameters should include:
in_features and out_features (the same as the Linear layer’s corresponding parameters).

4. Perform Dynamic Quantized Matrix Multiplication:
Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights.
```python
output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features)
```
Inputs required include:
The input tensor, packed_weights, groupsize, and the in_features and out_features.

API Usage: https://github.com/pytorch/pytorch/issues/143289

Model perf:
- 7B Transformer model: Prefill 340 t/s, Decode 40 t/s
- 2B Transformer model: Prefill 747 t/s, Decode 80 t/s

Tests:
```
python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight
Ran 1 test in 0.016s
OK

python test/test_linalg.py -k test__dyn_quant_matmul_4bit
Ran 8 tests in 0.077s
OK

python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit
Ran 8 tests in 11.454s
```

Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124
Approved by: https://github.com/digantdesai, https://github.com/malfet
2024-12-20 19:32:03 +00:00
Tom Ritchford
b5475d334e [inductor] Fix an unused variable in cpu_vec_isa.py (#138473)
----

* Extracted from https://github.com/pytorch/pytorch/pull/133492

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138473
Approved by: https://github.com/EikanWang, https://github.com/albanD, https://github.com/xuhancn
2024-12-20 18:50:19 +00:00
Nikita Shulga
5a69c2a649 [BE][Sparse] Get rid of gcc-5 workaround (#143653)
Discovered these comments while looking at https://github.com/pytorch/pytorch/pull/143620

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143653
Approved by: https://github.com/albanD
2024-12-20 18:40:45 +00:00
Joy Dong
a5ed499f6a FlexAttention Benchmark (#139665)
1. Add alibi, sliding window, tanh softcap, prefixLM, and document_mask from attn_gym to the benchmark.

2. Add comparison to different SDPA backends & FAv2, FAv3, FAKV.

Dependent on https://github.com/pytorch/pytorch/pull/139639

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139665
Approved by: https://github.com/drisspg
2024-12-20 17:52:24 +00:00
Hyunho Yeo
c7d9f29807 (MTIA) Move "empty_cache" API (#143402)
Summary: This diff moves one of the memory-related APIs to the consolidated location, `mtia/memory.py`.
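
A sketch of the call site after the move (assuming the public Python API mirrors the new file location; requires MTIA hardware):

```python
import torch

# After this change, the API lives under the consolidated memory module.
if torch.mtia.is_available():  # MTIA devices only
    torch.mtia.memory.empty_cache()
```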

Test Plan:
```
buck2 test //mtia/host_runtime/torch_mtia/tests:test_torch_mtia_api
```

https://www.internalfb.com/intern/testinfra/testrun/13510798943184259

Reviewed By: nautsimon

Differential Revision: D67148738

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143402
Approved by: https://github.com/nautsimon
2024-12-20 17:39:06 +00:00
Colin L. Rice
d79fbf6b6d test/dynamo/test_utils: logging - Stop testing for impossible things. (#143535)
We don't support assigning to objects or numeric constants at the top level in
config modules, so there is no need to test for them.

(This specifically breaks a later sorting refactor, since sorting requires `<`
to be implemented.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143535
Approved by: https://github.com/ppanchalia
2024-12-20 17:21:49 +00:00
Huamin Li
f5af87c23c Make Inductor cpp backend enable_floating_point_contract_flag take a string (#143450)
Differential Revision: D66269001

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143450
Approved by: https://github.com/desertfire
2024-12-20 16:28:54 +00:00
William Wen
7ab880bc5e fix typo in autocast header (#143625)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143625
Approved by: https://github.com/mlazos
ghstack dependencies: #143592
2024-12-20 16:17:15 +00:00
bobrenjc93
4f8b7c4272 Revert "refactor tensorify restart logic to use sources (#141517)" (#143623)
This reverts commit 30d8b30db7.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143623
Approved by: https://github.com/mlazos
2024-12-20 15:38:34 +00:00
leslie-fang-intel
607884c9af [Inductor][CPP] Fix bitwise shift with corner inputs (#143635)
**Summary**
Fixes https://github.com/pytorch/pytorch/issues/143555 and https://github.com/pytorch/pytorch/issues/143566 by aligning the implementation with Eager (29b586bbad/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp (L501)) at these corner inputs.

**Test Plan**
```
python test/inductor/test_cpu_repro.py -k test_bitwise_shift_corner_inputs
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143635
Approved by: https://github.com/jgong5
2024-12-20 13:47:40 +00:00
Guilherme Leobas
7bf3b7cdc5 Rewrite _reparametrize_module to use contextmanager (#138203)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138203
Approved by: https://github.com/zou3519
ghstack dependencies: #136033, #140604
2024-12-20 12:02:27 +00:00
Guilherme Leobas
1c817fe671 Set enable_trace_contextlib_contextmanager flag to True (#140604)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140604
Approved by: https://github.com/zou3519
ghstack dependencies: #136033
2024-12-20 12:02:27 +00:00
Guilherme Leobas
673cc88fd6 Add support for contextmanager in Dynamo (#136033)
Fixes #130559

* Intro

This PR adds support for `@contextmanager` in Dynamo. We chose to limit the
scope of this work to only `@contextmanager` and plan to handle generators fully
in #141055 (still in draft).

* Motivation

Dynamo lacks support for generator functions. When it encounters one, it traces
it as if it were a regular function. This is problematic because it can lead to
incorrect behavior. To illustrate, consider the test case below:

```python
import torch
import contextlib

@contextlib.contextmanager
def set_default_dtype(dtype):
    old_dtype = torch.get_default_dtype()
    try:
        torch.set_default_dtype(dtype)
        yield
    finally:
        torch.set_default_dtype(old_dtype)

@torch.compile(backend="eager", fullgraph=True)
def fn():
    with set_default_dtype(torch.float64):
        x = torch.tensor([3.0, 3.0 + 5.0j])
    return x
```

Before this work, Dynamo would not stop at the `yield`, and the graph produced
would contain both calls to `set_default_dtype` executed one after the other.
This is incorrect because the context manager should execute code before and
after the `yield`.

* List of changes

`YIELD_VALUE` now raises an exception (`YieldValueOp`) to signal that control
flow must be suspended and returned to the caller. Additionally, `RETURN_VALUE`
behaves differently in a generator function. Unlike regular functions, where
`RETURN_VALUE` indicates the final result, in generators it signifies that the
generator is exhausted and implicitly raises `StopIteration`.

A new `VariableTracker` named `FunctionDecoratedByContextlibContextManagerVariable`
was introduced to handle `@contextmanager`. This variable tracker acts not just
as a wrapper for the original function but also maintains an internal `tx`
(InstructionTranslator) object to suspend and return control flow to the parent
tracer when a `yield` is encountered.

* Corner cases

Returning a context manager from a compiled function is not supported. This
would require PyTorch to synchronize the generator state between Dynamo and the
interpreter. Any attempt to return it will result in an `IncorrectUsage`
exception.

Graph breaks require special handling as well. In the event of a graph break,
the frame associated with the context manager is skipped, and the context
manager runs in eager mode.

* This PR is breaking my code

There is a configuration flag (`enable_trace_contextlib`) that can be set to
`False` to disable tracing of context managers. If this still causes crashes,
please revert this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136033
Approved by: https://github.com/zou3519
2024-12-20 12:02:20 +00:00
Jason Ansel
04b26ee1e8 Fix false positive from f-strings in set_linter (#143628)
This linter was going crazy in Python 3.12, where f-strings are broken into individual tokens (PEP 701), so the braces inside them were misread as `set` literals. Example:
```py
$ python3 tools/linter/adapters/set_linter.py torch/_inductor/runtime/triton_heuristics.py
torch/_inductor/runtime/triton_heuristics.py:192:25: Builtin `set` is deprecated
  190 |     args_str += ", ".join(call_args)
  191 |     for k, v in call_kwargs.items():
  192 |         args_str += f", {k}={v}"
                                ^
  193 |
  194 |     abs_path = os.path.abspath(sys.argv[0])

[... many more similar false positives on f-string braces elided ...]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143628
Approved by: https://github.com/yanboliang, https://github.com/rec
2024-12-20 11:45:26 +00:00
Xu Han
6733045a4a export AOTI_TORCH_EXPORT on Windows. (#140030)
Fixes #139954

reproduce UT:
```cmd
pytest test/inductor/test_torchinductor_codegen_dynamic_shapes.py -k test_device_assert_dynamic_shapes_cpu
```
Issue:
<img width="856" alt="image" src="https://github.com/user-attachments/assets/5fc501a9-54e5-45ac-9fb3-509ec11a7abe">

After fixing:
![Image](https://github.com/user-attachments/assets/883846fb-8e92-4b9c-9400-daab32382a3a)

Reland:
1. Declare export on Windows explicitly.
2. Support cpu, cuda and xpu devices.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140030
Approved by: https://github.com/jgong5, https://github.com/desertfire
2024-12-20 11:42:09 +00:00
Michael Lazos
b539c61631 [Hierarchical Compile] Update NoneAsConstantBuffer to support graph d… (#143531)
Fixes issues I hit while running graph deduplication with torchtune.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143531
Approved by: https://github.com/eellison
2024-12-20 09:23:12 +00:00
Pian Pawakapan
f9f82ca48f [ts converter] use Dim.AUTO for ts -> export converter (#138273)
Switches the TS converter to use `Dim.AUTO` by default, exporting models with maximum dynamism. Adds runtime input tests to `test_converter.py`.
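
A minimal sketch (module and shapes hypothetical) of what exporting with `Dim.AUTO` looks like: each marked dimension is inferred as dynamic or specialized based on how it is used, with no hand-written `Dim(...)` constraints.

```python
import torch
from torch.export import Dim, export

class M(torch.nn.Module):
    def forward(self, x):
        return x * 2

# Dim.AUTO asks export to infer per-dimension dynamism automatically
ep = export(M(), (torch.randn(4, 8),), dynamic_shapes={"x": (Dim.AUTO, Dim.AUTO)})
```
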
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138273
Approved by: https://github.com/avikchaudhuri
2024-12-20 07:48:24 +00:00
Michael Lazos
270ad513c8 [Dynamo] only import einops if version is lower than 0.7.0 (#142847)
Fixes internal xref (https://fb.workplace.com/groups/257735836456307/posts/804793021750583/?comment_id=805229281706957&reply_comment_id=805232695039949)
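
A minimal sketch of the version gate, assuming a `packaging`-based comparison (the helper name is hypothetical; the real check lives in Dynamo's trace-rules setup):

```python
import importlib.metadata

from packaging.version import Version

def should_import_einops() -> bool:
    # eagerly import einops only when the installed version predates 0.7.0;
    # newer versions no longer need the import-time workarounds
    try:
        return Version(importlib.metadata.version("einops")) < Version("0.7.0")
    except importlib.metadata.PackageNotFoundError:
        return False
```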

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142847
Approved by: https://github.com/zou3519
2024-12-20 07:46:49 +00:00
Avik Chaudhuri
29b586bbad fix formatting in programming model doc (#143587)
Test Plan: Some of the formatting in https://docs-preview.pytorch.org/pytorch/pytorch/143546/export.programming_model.html is broken.

Differential Revision: D67458972

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143587
Approved by: https://github.com/yushangdi
2024-12-20 07:09:19 +00:00
Huy Do
fe0f20615c [DynamoBench] Handle accuracy results in benchmark records (#143611)
I discovered this issue when trying to search for the accuracy results in the database and couldn't find any. It turns out that the results are there in the JSON file, for example `"metric": {"name": "accuracy", "benchmark_values": ["pass_due_to_skip"]}`, but inserting them into the database fails because `benchmark_values` is a list of strings here while the expectation is that it's a list of numbers.

ClickHouse doesn't support mixed types at the moment. It has a Variant type (https://clickhouse.com/docs/en/sql-reference/data-types/variant), but this isn't recommended by the ClickHouse team itself. So, the remaining option is to store these values in the `extra_info` field. This field is a dictionary, so they can go there.
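
A hedged sketch of the rerouting this implies (the record layout follows the example above; the helper is hypothetical):

```python
def normalize_metric(metric: dict) -> dict:
    # move non-numeric benchmark values into extra_info so that
    # benchmark_values stays a list of numbers for ClickHouse
    values = metric.get("benchmark_values", [])
    if any(not isinstance(v, (int, float)) for v in values):
        metric.setdefault("extra_info", {})["benchmark_values"] = values
        metric["benchmark_values"] = []
    return metric

normalize_metric({"name": "accuracy", "benchmark_values": ["pass_due_to_skip"]})
```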

### Testing

https://github.com/pytorch/pytorch/actions/runs/12421747715

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143611
Approved by: https://github.com/kit1980
2024-12-20 06:43:38 +00:00
Sam Ginzburg
132fcf4e0d [user triton] Raise an exception when encountering nested @triton.autotune decorators or @triton.heuristics (#143519)
We support running a single Autotuner for each Triton kernel. Currently,
if there are multiple autotuning decorators, the subsequent ones will be
silently ignored.

Instead, we should raise an error here to avoid silent incorrectness.
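
For illustration, the now-rejected pattern looks like this (kernel body elided; the configs are placeholders):

```python
import triton
import triton.language as tl

@triton.autotune(configs=[triton.Config({"BLOCK": 64})], key=["n"])
@triton.autotune(configs=[triton.Config({"BLOCK": 128})], key=["n"])  # before this PR, one of the stacked autotuners was silently ignored
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    ...
```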

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143519
Approved by: https://github.com/aakhundov
2024-12-20 06:38:45 +00:00
PyTorch MergeBot
71479a9b9c Revert "[AOTI] Emit a CMakeLists.txt when package_cpp_only (#143352)"
This reverts commit 429f4cd140.

Reverted https://github.com/pytorch/pytorch/pull/143352 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the new test is failing on ROCm ([comment](https://github.com/pytorch/pytorch/pull/143352#issuecomment-2556365140))
2024-12-20 06:21:31 +00:00
Jane Xu
4e29e4aa63 [BE] Add a test to ensure grads are never inplaced into accidentally (#143612)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143612
Approved by: https://github.com/soulitzer
2024-12-20 06:15:08 +00:00
Xu Han
2daa666591 update kineto to XPU Windows fixed PR. [submodule kineto] (#143445)
Include the XPU Windows fix PR: https://github.com/pytorch/kineto/pull/1012

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143445
Approved by: https://github.com/sraikund16
2024-12-20 05:57:30 +00:00
zeshengzong
217a4ddb04 Add range check embedding_bag on input index >= 0 of cuda device (#140791)
Fixes #89362

**Test Result**

**Before**

```python
>>> import torch
>>> input = torch.randint(-5, 1, [1, 2], dtype=torch.int64).cuda()
>>> weight = torch.rand([2, 3], dtype=torch.float32).cuda()
>>> print(torch.nn.functional.embedding_bag(input, weight))
tensor([[0., 0., 0.]], device='cuda:0')
```

**After**

```python
>>> import torch
>>> input = torch.randint(-5, 1, [1, 2], dtype=torch.int64).cuda()
>>> weight = torch.rand([2, 3], dtype=torch.float32).cuda()
>>> print(torch.nn.functional.embedding_bag(input, weight))
/home/zong/code/pytorch/aten/src/ATen/native/cuda/EmbeddingBag.cu:141: EmbeddingBag_updateOutputKernel_sum_mean: block: [0,0,0], thread: [0,0,0] Assertion `0 <= input_idx && input_idx < numRows` failed.
/home/zong/code/pytorch/aten/src/ATen/native/cuda/EmbeddingBag.cu:141: EmbeddingBag_updateOutputKernel_sum_mean: block: [0,0,0], thread: [1,0,0] Assertion `0 <= input_idx && input_idx < numRows` failed.
/home/zong/code/pytorch/aten/src/ATen/native/cuda/EmbeddingBag.cu:141: EmbeddingBag_updateOutputKernel_sum_mean: block: [0,0,0], thread: [2,0,0] Assertion `0 <= input_idx && input_idx < numRows` failed.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/zong/code/pytorch/torch/_tensor.py", line 568, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zong/code/pytorch/torch/_tensor_str.py", line 708, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zong/code/pytorch/torch/_tensor_str.py", line 625, in _str_intern
    tensor_str = _tensor_str(self, indent)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zong/code/pytorch/torch/_tensor_str.py", line 357, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zong/code/pytorch/torch/_tensor_str.py", line 146, in __init__
    tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

```

```bash
$ pytest test/nn/test_embedding.py
```
![image](https://github.com/user-attachments/assets/6a5ec759-a3dc-4d51-9e5e-ec79c0aac526)

```bash
$ lintrunner
```
![image](https://github.com/user-attachments/assets/2ce4ac24-74fb-4181-9510-18b96a2c2acb)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140791
Approved by: https://github.com/eqy
2024-12-20 05:47:26 +00:00
bobrenjc93
9713a6eeca remove allow-untyped-defs from torch/fx/experimental/refinement_types.py (#143602)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143602
Approved by: https://github.com/aorenste
2024-12-20 05:40:52 +00:00
bobrenjc93
78d294379a remove allow-untyped-defs from torch/_lazy/config.py (#143603)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143603
Approved by: https://github.com/aorenste
2024-12-20 05:34:19 +00:00
bobrenjc93
cb4e9888df remove allow-untyped-defs from torch/ao/quantization/experimental/APoT_tensor.py (#143601)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143601
Approved by: https://github.com/aorenste
2024-12-20 05:26:09 +00:00
bobrenjc93
dd346dbeab remove allow-untyped-defs from torch/distributed/elastic/multiprocessing/errors/handlers.py (#143605)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143605
Approved by: https://github.com/aorenste
2024-12-20 05:25:01 +00:00
Michael Lazos
fd23cf5848 [Dynamo] check node class first for graph dedup (#143609)
as title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143609
Approved by: https://github.com/williamwen42
2024-12-20 04:09:46 +00:00
William Wen
1c2593f035 [dynamo] guard global autocast state (#143592)
Fixes https://github.com/pytorch/pytorch/issues/112260.
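
A minimal sketch of the behavior now guarded (shapes arbitrary): flipping the global autocast state between calls must trigger a recompile instead of reusing a graph compiled under different autocast settings.

```python
import torch

@torch.compile
def f(x):
    return x @ x

x = torch.randn(8, 8)
f(x)  # compiled with autocast disabled
with torch.autocast("cpu", dtype=torch.bfloat16):
    f(x)  # autocast guard fails -> recompile, not a stale graph
```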

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143592
Approved by: https://github.com/jansel
2024-12-20 03:30:54 +00:00
drisspg
d339f1506b Add cutlass version guard in prep for upgrade (#143551)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143551
Approved by: https://github.com/eqy
2024-12-20 02:40:02 +00:00
Mayank Mishra
75661f2036 try root fix for FP8 tensor (#143248)
Fixes #143194

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143248
Approved by: https://github.com/fegin
2024-12-20 01:57:17 +00:00
PyTorch MergeBot
4462cc6375 Revert "[Inductor] inplace padding (#140249)"
This reverts commit 297ce77636.

Reverted https://github.com/pytorch/pytorch/pull/140249 on behalf of https://github.com/huydhn due to This breaks an internal test https://fburl.com/test/ppl2we5l ([comment](https://github.com/pytorch/pytorch/pull/140249#issuecomment-2556079406))
2024-12-20 01:30:27 +00:00
bobrenjc93
e1b4635504 remove allow-untyped-defs from torch/distributed/pipelining/_debug.py (#143606)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143606
Approved by: https://github.com/aorenste
2024-12-20 01:26:51 +00:00
Jane Xu
a0cff096bc Improve cond error messaging (#143595)
Discovered by @drisspg and me while trying out a simple toy example and being way too confused :')
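
For context, the kind of toy example involved (the exact snippet isn't in the PR description; this is a generic `torch.cond` call where unclear errors were easy to hit):

```python
import torch

def true_fn(x):
    return x.sin()

def false_fn(x):
    return x.cos()

x = torch.randn(4)
# data-dependent predicate: supported by torch.cond, but easy to get wrong
out = torch.cond(x.sum() > 0, true_fn, false_fn, (x,))
```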

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143595
Approved by: https://github.com/zou3519, https://github.com/ydwu4
2024-12-20 01:19:20 +00:00
Yanan Cao (PyTorch)
d547fae5b0 [Codemod][AddExplicitStrictExportArg] caffe2/torch/onnx/_internal/exporter (#143542)
Reviewed By: avikchaudhuri

Differential Revision: D67381244

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143542
Approved by: https://github.com/ydwu4, https://github.com/titaiwangms
2024-12-20 00:54:52 +00:00
Sun, Jiayi
544de4008e [Inductor] Constrain the shape of other tensor for Conv/Linear + broadcast add fusion. (#141759)
Fix https://github.com/pytorch/pytorch/issues/141671.

Summary:
The performance regression in these two timm_models is caused by the Conv/Linear + broadcast add fusion falling into the oneDNN reference path. This PR constrains the shape of the other tensor in the Conv/Linear + broadcast add fusion so that the fused op stays on the optimized path; see the sketch below.
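
A hedged sketch of the pattern in question (sizes hypothetical): the fusion candidate is a Linear/Conv output plus a tensor that broadcasts over it, and the fix only admits other-tensor shapes that oneDNN's fused kernel handles efficiently.

```python
import torch

lin = torch.nn.Linear(64, 64)
x = torch.randn(32, 64)
other = torch.randn(64)   # broadcasts over the batch dimension
out = lin(x) + other      # Linear + broadcast add: the fusion candidate
```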

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141759
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jansel
2024-12-20 00:35:58 +00:00