pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Hyunho Yeo	d70b7029c8	[MTIA] Support torch.mtia.empty_cache() (#141533 ) Summary: As title Test Plan: Passed a local unit test: `buck2 test //mtia/host_runtime/torch_mtia/tests:test_torch_mtia_api` https://www.internalfb.com/intern/testinfra/testrun/4785074861101240 Reviewed By: nautsimon Differential Revision: D66481778 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141533 Approved by: https://github.com/nautsimon	2024-11-28 02:24:19 +00:00
Yu, Guangye	ac0b0d11ab	[Reland] Fix tensor.data_ptr() representation overflow (#135567 ) # Motivation fix https://github.com/pytorch/pytorch/issues/135550 In PyTorch, [`tensor.data_ptr()`](`e889252493/tools/autograd/templates/python_variable_methods.cpp (L204)`) is reinterpreted by a [signed int64](`e889252493/torch/csrc/autograd/utils/wrap_outputs.h (L50)`) data type, which could result in an overflow issue, like below: ```python import torch a = torch.randn(2).to('xpu') a.data_ptr() # one possible output is -23453392437248 # this is inconsistent with storage.data_ptr() a.untyped_storage().data_ptr() # one possible output is 18446720620317114368 ``` This PR aims to fix this representation overflow issue to make `tensor.data_ptr()` consistent with [`tensor.untyped_storage().data_ptr()`](`c0d2f991b1/torch/csrc/StorageMethods.cpp (L62)`). With this PR, the output will become: ```python import torch a = torch.randn(2).to('xpu') a.data_ptr() # one possible output is 18446720620317114368 # this is consistent with storage.data_ptr() a.untyped_storage().data_ptr() # one possible output is 18446720620317114368 ``` # Solution Use `PyLong_FromVoidPtr` to prevent the overflow issue and fit the semantic of `wrap`. # Additional Context This PR has been reverted (in place, no more change, and revert commit `2e8d431a8f`) due to the change of `tensor.data_ptr()`, which needs to sync up to intel xpu triton side, see [#2192](https://github.com/intel/intel-xpu-backend-for-triton/pull/2192). So we have to update xpu triton commit pin with this PR together. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135567 Approved by: https://github.com/dvrogozh, https://github.com/EikanWang, https://github.com/albanD	2024-11-28 02:01:52 +00:00
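The two values quoted in the `tensor.data_ptr()` commit above are the same 64-bit pattern read as signed and as unsigned. A minimal pure-Python sketch, assuming only the example address printed in that commit message; the helper names are hypothetical and this is not PyTorch's wrapping code:

```python
import ctypes

# Unsigned address quoted in the commit message (one possible output).
ADDRESS = 18446720620317114368


def as_signed_int64(value: int) -> int:
    """Reinterpret the low 64 bits of `value` as a signed integer."""
    return ctypes.c_int64(value).value


def as_unsigned_int64(value: int) -> int:
    """Reinterpret the low 64 bits of `value` as an unsigned integer."""
    return ctypes.c_uint64(value).value


signed_view = as_signed_int64(ADDRESS)
print(signed_view)                     # the negative value seen before the fix
print(as_unsigned_int64(signed_view))  # round-trips to the original address
```

Any address with the top bit set shows up as negative when read as a signed int64, which is why the pointer is better represented as an unsigned value.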
Jason Ansel	ca9bfa1a38	[inductor] Fix 3d tiling (#141709 ) Fixes #141121 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141709 Approved by: https://github.com/eellison	2024-11-28 01:34:28 +00:00
Zhou, Lingzhi	ad3986498a	[Partitioner] Speed up the update of partition map (#136616 ) We can update the partition map by iterating over only the users of a node rather than all of its downstream users. The former is faster than the latter, which performs many duplicate insertions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136616 Approved by: https://github.com/jgong5, https://github.com/tarun292	2024-11-28 01:11:44 +00:00
cyy	45ed7c13fa	Remove unneeded std::make_optional (#141567 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141567 Approved by: https://github.com/albanD	2024-11-28 00:05:21 +00:00
Mark Saroufim	e24190709f	[BE] Remove Model Dump utility (#141540 ) I found this utility by accident while trying to find how many html files we have in the repo so I could convert them to markdown. It turns out we package some html and js files in pytorch to visualize torchscript models. This seems kind of strange and probably shouldn't be in core. I removed the tests I could find. Maybe some internal tests will break, but considering torchscript is being superseded, it might make sense to do this. The last time there was a meaningful update to the test for this file was about 2 years ago by @digantdesai; since then it's been a bunch of routine upgrades. It seems like this package is unused: https://github.com/search?type=code&auto_enroll=true&q=torch.utils.model_dump&p=1 I skimmed through 5 pages of these and the only time this shows up in code search is when someone is either cloning pytorch or checking their venv into github. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141540 Approved by: https://github.com/malfet	2024-11-27 22:52:55 +00:00
Ryan Guo	533798ef46	[dynamo] Enforce some invariants on `ConstantVariable.create` (#140984 ) This addresses https://github.com/pytorch/pytorch/pull/140745#issuecomment-2480854259. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140984 Approved by: https://github.com/jansel ghstack dependencies: #141504	2024-11-27 21:58:35 +00:00
Ryan Guo	3141e038f0	[dynamo] Fix `VariableBuilder._wrap` on frozenset and enforce invariants on `ConstantVariable` (#141504 ) Prior to this patch, we used `ConstantVariable.create` to create a VT for frozenset objects, and intended, yet failed, to predicate that on all items being literals (see https://github.com/pytorch/pytorch/pull/140984#discussion_r1847393736). The code was from https://github.com/pytorch/torchdynamo/commit/7c03434 and the original goal was to help DBR quantization, but as the new test in this patch shows, it could lead to silent incorrectness. Upon a closer look, this exposes some subtleties in how Dynamo handles `ConstantVariable` and `LOAD_CONST`, so this patch both fixes the aforementioned issue and documents, enforces, and makes explicit the invariants around `ConstantVariable` and `LOAD_CONST` -- only immutable objects are supported. Specifically, this patch: 1. Refines the checks for wrapping a `frozenset` object, documents why we can't just wrap its items directly due to lack of `Source` for set items, and uses a safe workaround (`SourcelessBuilder`) to ensure soundness while keeping the DBR quantization support. 2. Adds more types to `common_constant_types`, thereby making `ConstantVariable.is_base_literal` more lenient, and strictly checks this property in the constructor of `ConstantVariable`. 3. Changes relevant uses of `create_instruction("LOAD_CONST", ...)` to `create_load_const` which checks `is_safe_constant`, and makes developer overrides explicit by using `create_load_const_unchecked` when needed. 4. In a few places, uses more specific `VariableTracker`, e.g., `TypingVariable` rather than `ConstantVariable`, and `FrozensetVariable` rather than `SetVariable`. (2) and (3) are mainly to future-proof Dynamo against bugs like (1). Pull Request resolved: https://github.com/pytorch/pytorch/pull/141504 Approved by: https://github.com/jansel	2024-11-27 21:58:35 +00:00
Yanbo Liang	5f004f455a	[Dynamo][Distributed] Fix ProcessGroup getattr (#141638 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141638 Approved by: https://github.com/williamwen42, https://github.com/jansel	2024-11-27 21:42:33 +00:00
Edward Z. Yang	dbbebee9d7	Code motion CompiledFxGraph to a dedicated file (#141654 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/141654 Approved by: https://github.com/aorenste, https://github.com/jansel ghstack dependencies: #141491, #141492, #141574	2024-11-27 20:42:21 +00:00
James Wu	a7ca6a9113	Enable autograd cache on inductor tests (#140890 ) This turns on AOTAutogradCache for all inductor tests. It clears AOTAutogradCache on each test as well, by virtue of the local cache using the same directory to store cache entries. I've also tested with INDUCTOR_TEST_DISABLE_FRESH_CACHE=1, running all the tests. AOTAutogradCache successfully caches 99% of these. There are a few tests that use view_replay and therefore save functional tensors, which cause AOTAutogradCache to fail to pickle its result. Will look into next steps there, but for now, it seems okay if the cache just misses on those cases where it can't serialize the result. It would be better to check before pickling, though. I've made the following small bugfixes to get this working: - Inductor is sometimes used in a standalone mode without dynamo, which leads to attribute errors in check_can_cache. In general, we should never crash in cache checking, only bypass. So I change a try catch to check Exception instead of just a specific exception. - Add extra structured logging for metadata on cache hits Pull Request resolved: https://github.com/pytorch/pytorch/pull/140890 Approved by: https://github.com/bdhirsh	2024-11-27 20:41:43 +00:00
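The "never crash in cache checking, only bypass" rule from the autograd cache commit above can be sketched as follows. This is a minimal illustration under stated assumptions: `safe_cache_lookup` and both callables are hypothetical names, not the actual AOTAutogradCache API.

```python
from typing import Any, Callable, Optional


def safe_cache_lookup(
    check_can_cache: Callable[[Any], None],
    lookup: Callable[[Any], Any],
    graph: Any,
) -> Optional[Any]:
    """Return a cached result, or None to signal a cache bypass.

    Catching the broad `Exception` (rather than one specific exception
    type) means an unexpected error in the check, such as an
    AttributeError in standalone mode, becomes a cache miss, not a crash.
    """
    try:
        check_can_cache(graph)  # hypothetical check; may raise anything
        return lookup(graph)
    except Exception:
        return None  # bypass the cache; the caller compiles as usual


def failing_check(graph: Any) -> None:
    # Simulates an attribute error raised while checking cacheability.
    raise AttributeError("missing dynamo state")


print(safe_cache_lookup(failing_check, lambda g: "cached", "graph"))
print(safe_cache_lookup(lambda g: None, lambda g: "cached", "graph"))
```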
FindHao	ab63b679e9	Save indexing for getitem nodes when doing custom replacements (#140193 ) Fixes #137280 When we have multiple indexings for the same array as returned items in pattern replacement, we shouldn't ignore its indexing numbers. Otherwise, we may create a wrong pattern_to_node mapping. A unit test is added in this PR. In this unit test, the function `rms_pattern_static` is replaced with `rms_replacement_static` when called. The function `rms_pattern_static` calls two functionalized custom operators, `torch.ops.vllm.rms_norm.default` and `torch.ops.vllm.static_scaled_int8_quant.default`, and it returns at2[1] and at2[2] as outputs. The function `rms_replacement_static` calls one functionalized custom operator `torch.ops.vllm.fused_rms_norm_quant_static.default`, which returns two corresponding items. Run `python test/inductor/test_pattern_matcher.py -k test_multioutput_register_replacement` to test. After setting `TORCH_COMPILE_DEBUG` to 1, the final part of `fx_graph_readable.py` looks like the following. 
```python
# File: /home/yhao/p9/pytorch/test/inductor/test_pattern_matcher.py:1673 in rms_pattern_static, code: at1 = auto_functionalized(
auto_functionalized = torch.ops.higher_order.auto_functionalized(torch.ops.vllm.rms_norm.default, result = permute_1, input = convert_element_type, weight = convert_element_type_1, epsilon = 1e-06); permute_1 = convert_element_type = convert_element_type_1 = None
getitem_1: "bf16[5, 4]" = auto_functionalized[1]; auto_functionalized = None

# File: /home/yhao/p9/pytorch/test/inductor/test_pattern_matcher.py:1680 in rms_pattern_static, code: at2 = auto_functionalized(
auto_functionalized_1 = torch.ops.higher_order.auto_functionalized(torch.ops.vllm.static_scaled_int8_quant.default, result = permute, input = getitem_1, scale = full_default, azp = None); permute = getitem_1 = full_default = None
getitem_3: "i8[5, 4]" = auto_functionalized_1[1]
getitem_4: "f32[1, 1]" = auto_functionalized_1[2]; auto_functionalized_1 = None
return (getitem_3, getitem_4)
```
This happens before pattern matching, so it is expected to call `static_scaled_int8_quant` and `rms_norm` and return auto_functionalized_1 as outputs. However, for pytorch before this PR, the `fx_graph_transformed.py`, which is after pattern matching, has the following code. 
```python
# File: /home/yhao/p9/pytorch/test/inductor/test_pattern_matcher.py:1748 in my_func_static, code: scale = torch.ones((1, 1))
full_default: "f32[1, 1]" = torch.ops.aten.full.default([1, 1], 1, dtype = torch.float32, layout = torch.strided, device = device(type='cpu'), pin_memory = False)

# No stacktrace found for following nodes
as_strided_default: "i8[20]" = torch.ops.aten.as_strided.default(permute, [20], [1], 0)
clone_default: "i8[20]" = torch.ops.aten.clone.default(as_strided_default); as_strided_default = None
as_strided_default_1: "i8[5, 4]" = torch.ops.aten.as_strided.default(clone_default, [5, 4], [4, 1], 0); clone_default = None
as_strided_default_2: "f32[1]" = torch.ops.aten.as_strided.default(full_default, [1], [1], 0)
clone_default_1: "f32[1]" = torch.ops.aten.clone.default(as_strided_default_2); as_strided_default_2 = None
as_strided_default_3: "f32[1, 1]" = torch.ops.aten.as_strided.default(clone_default_1, [1, 1], [1, 1], 0); clone_default_1 = None
static_scaled_int8_quant_default = torch.ops.vllm.static_scaled_int8_quant.default(as_strided_default_1, permute_1, as_strided_default_3); as_strided_default_1 = permute_1 = static_scaled_int8_quant_default = None
fused_rms_norm_quant_static_default = torch.ops.vllm.fused_rms_norm_quant_static.default(permute, convert_element_type, convert_element_type_1, full_default, None, 1e-06); convert_element_type = convert_element_type_1 = full_default = fused_rms_norm_quant_static_default = None
return (permute, as_strided_default_3)
```
Here, it returns `(permute, as_strided_default_3)` while `permute` is written by fused_rms_norm_quant_static and `as_strided_default_3` is written by `static_scaled_int8_quant`. This is wrong because in our expectation, the `static_scaled_int8_quant` should be removed since it is replaced with `fused_rms_norm_quant_static`. It is supposed to return `(permute, full_default)`. The root cause is the following part. 
When we [generate patterns](`5f4a21dc58/torch/_inductor/pattern_matcher.py (L1580)`) with the traced fx graph and call the following function, the indexing numbers, which have type `int` in the traced graph, are ignored via `ignore_types`. So, the final arguments of patterns for those two output items are like `(CallFunction(auto_functionalized,XXX)), )`. `5f4a21dc58/torch/_inductor/pattern_matcher.py (L1839-L1847)` When we do pattern matching after generating patterns in the following part, the `sorted(itertools.chain.from_iterable(nodes), reverse=True)` is `[getitem_4, getitem_3, getitem_1]`. The iteration for getitem_4 always yields FailedMatch because we always use the first element to do the pattern match here (it fails on different match functions before and after this PR, but the reason is always the indexing numbers issue) `d4cdc09881/torch/_inductor/pattern_matcher.py (L848)`. However, when we do pattern matching for getitem_3, the child_match returns a match for getitem_3 again, because the `` pattern can match anything. Then the pattern matching for getitem_3 returns `[getitem_3, getitem_3]` as outputs, which is wrong. `d4cdc09881/torch/_inductor/pattern_matcher.py (L856)` `d4cdc09881/torch/_inductor/pattern_matcher.py (L1750-L1774)` This PR doesn't ignore the `int` type when we generate patterns for getitem functions because integer indexing numbers are important to them. Thus, the indexing information is kept in patterns, ensuring correct matches. With this PR, the above `child_match` returns a match for getitem_4, and the final pattern matching for getitem_3 returns the correct `[getitem_3, getitem_4]`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140193 Approved by: https://github.com/eellison	2024-11-27 20:19:13 +00:00
Isuru Fernando	b37cfddeb3	Refactor ShapeGuardPrinter for future C++ addition (#140968 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140968 Approved by: https://github.com/anijain2305 ghstack dependencies: #140597	2024-11-27 20:09:58 +00:00
Francisco Massa	e5d02e0cfb	Fix non-determinism in the partitioner (#141682 ) When multiple nodes have similar sizes and are part of the `banned_nodes` (which is a `set` and not a `list`), there is non-determinism present in the partitioner due to sorting only by node-size. This PR fixes this by also sorting by node name. It would be good to add some tests, but I'm not sure about the best way to do it here. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141682 Approved by: https://github.com/Chillee, https://github.com/yf225	2024-11-27 19:33:15 +00:00
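The tie-breaking idea in the partitioner commit above can be shown in isolation. A minimal sketch, assuming a hypothetical `Node` record with `name` and `size` fields; it is not the partitioner's actual data structure:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical stand-in for a graph node with a memory size.
    name: str
    size: int


def sort_by_size_only(nodes):
    # Python's sort is stable, so equal-size nodes keep their input order,
    # and the input order of a set is not something to rely on.
    return sorted(nodes, key=lambda n: n.size)


def sort_by_size_then_name(nodes):
    # The name acts as a tie-breaker, so the result no longer depends on
    # the order in which the nodes were supplied.
    return sorted(nodes, key=lambda n: (n.size, n.name))


a, b, c = Node("a", 8), Node("b", 8), Node("c", 4)
print([n.name for n in sort_by_size_only([b, a, c])])
print([n.name for n in sort_by_size_then_name([b, a, c])])
```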
PyTorch MergeBot	8c90a9a030	Revert "fix non termination in unflatten + state (#141494 )" This reverts commit `5d7c3701e4`. Reverted https://github.com/pytorch/pytorch/pull/141494 on behalf of https://github.com/jovianjaison due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/141494#issuecomment-2504639230))	2024-11-27 19:30:55 +00:00
Boyuan Feng	17fd53d8e5	[Inductor] Inplacing with Donated Buffer (#140113 ) Currently, inductor does not update a buffer in place if it is an input buffer, because we don't know whether an input will be used by other functions. A donated buffer provides the additional information that an input buffer will not be used by other functions, so we can update a donated buffer in place when possible. [Dashboard](https://hud.pytorch.org/benchmark/torchbench/inductor_dynamic?dashboard=torchinductor&startTime=Mon,%2011%20Nov%202024%2018:14:36%20GMT&stopTime=Mon,%2018%20Nov%202024%2018:14:36%20GMT&granularity=hour&mode=training&dtype=amp&deviceName=cuda%20(a100)&lBranch=bf/donated-buffer-inplace&lCommit=5df0769c00e6f9000caeb10fd5cbf0b165f69c2a&rBranch=main&rCommit=2b39a8db7741b816b03677a9c6fec1af05640dee) ![image](https://github.com/user-attachments/assets/f19d961f-7973-418e-9de8-5c2a97950478) ![image](https://github.com/user-attachments/assets/df3bd6a9-58b8-4e8a-8397-9e3b1de9adfe) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140113 Approved by: https://github.com/eellison	2024-11-27 18:51:52 +00:00
Ke Wen	ad39a2fc46	[1/N] Decouple Flight Recorder from NCCL utils (#141648 ) Part of the effort to make Flight Recorder device agnostic. Step 1: Move it out of NCCLUtils. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141648 Approved by: https://github.com/fduwjj	2024-11-27 18:29:42 +00:00
eellison	fd553b9817	Add remaining method and tests for dtype propagation (#140057 ) Adds the remaining unimplemented ops as well as an assertion failure if someone adds a new op without a dtype rule. We test all unique pointwise operators registered as lowerings which have an opinfo. There will be some follow ups for this to work well with both `codegen_upcast_to_fp32` as True and False. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140057 Approved by: https://github.com/arui-meta, https://github.com/blaine-rister, https://github.com/ezyang ghstack dependencies: #139945	2024-11-27 17:06:44 +00:00
eellison	566ceb3e7e	Refactor dtype propagation (#139945 ) A couple of changes. - Tries to reuse dtype propagation rules that were already registered in inductor. These were present both with `pointwise_overrides_data` and the `boolean_ops` list. Additionally, the registration of pointwise ops already specified dtype propagation rules. Saves those registrations and reuses them later. - Factors out `get_promoted_dtype`, which uses functools.lru_cache and takes non-CSEVariable args, because CSEVariable args will not work with the functools cache. Tests get added later in the stack when everything is implemented. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139945 Approved by: https://github.com/blaine-rister, https://github.com/arui-meta, https://github.com/ezyang	2024-11-27 16:57:02 +00:00
Edward Z. Yang	7ea0da2d57	Modest code motion in compile_fx (#141574 ) Do code review with whitespace changes off. Check comments for what I changed. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/141574 Approved by: https://github.com/bobrenjc93, https://github.com/jansel ghstack dependencies: #141491, #141492	2024-11-27 13:38:14 +00:00
leslie-fang-intel	aa827e319e	[Inductor][CPP] Extract common functions to be reused in other CPP Template (#141554 ) Summary Extract common internal functions from GEMM Template into public function, so these functions can be reused by the subsequent group GEMM template. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141554 Approved by: https://github.com/jgong5	2024-11-27 09:52:18 +00:00
axel	763038db66	Clarify torch.arange floating-point rounding behavior (#141655 ) Added documentation note clarifying the rounding behavior of `torch.arange` when using floating-point dtypes, particularly for reduced precision types like `bfloat16`. This helps users understand potential issues like repeated values and provides guidance on using integer dtypes for precise sequences. ## Changes - Added explanatory note about floating-point rounding behavior and its effects - Included specific mention of `bfloat16` dtype issues - Added recommendation to use integer dtypes for precise sequences Fixes [#137774](https://github.com/pytorch/pytorch/issues/137774) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141655 Approved by: https://github.com/cpuhrsch	2024-11-27 09:31:39 +00:00
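The repeated values mentioned in the `torch.arange` note above follow from how few significand bits `bfloat16` keeps. A minimal pure-Python sketch that simulates rounding a float32 value to bfloat16 with round-to-nearest-even; `round_to_bfloat16` is a hypothetical helper, NaN and overflow are not handled, and this is not how `torch.arange` is implemented:

```python
import struct


def round_to_bfloat16(x: float) -> float:
    """Round a finite float32 value to the nearest bfloat16 value."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 bits that get dropped.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Above 256, bfloat16 can no longer represent every integer, so
# consecutive integers start collapsing onto the same value.
values = [round_to_bfloat16(float(i)) for i in range(254, 262)]
print(values)
```

An integer dtype avoids this collapse, which matches the recommendation in the commit message.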
Jaewoo Song	43a2a231d3	Support linear/BN fusion and follow the API guideline (#141585 ) Current `fuse` function supports conv/BN fusions only. This commit is to support linear/BN fusion as well. Changes to follow the API guidelines are also applied. (This will close the PR #141352 which I created for the same topic and got approval but had lint and API guideline problems.) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141585 Approved by: https://github.com/ezyang	2024-11-27 06:52:00 +00:00
Jesse Cai	5accae4197	[sparse] add extra options to _cslt_sparse_mm (#137427 ) Summary: Splitting this PR into two, one for the cuSPARSELt improvements, and one for the inductor lowering. This PR adds the additional cuSPARSELt bindings to pytorch. * `torch._cslt_sparse_mm_search` will be deprecated in a future PR, so a warning has been added * Added a header file for cuSPARSELtOps.cpp * max_id is now available in `torch.backends.cusparselt` via `torch.backends.cusparselt.get_max_alg_id()` * fixed meta registrations for float8 Test Plan: python test/test_sparse_semi_structured.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/137427 Approved by: https://github.com/cpuhrsch, https://github.com/eqy	2024-11-27 05:32:45 +00:00
vasiliy	3d5fe0ce78	torch._scaled_mm: support dims of size 0 for tensorwise scaling (#140967 ) Summary: Ensures we support dims of size 0 properly in `torch._scaled_mm`. Follows the behavior from `torch.mm`. For now this only enables support for tensorwise scaling; we can tackle rowwise in a future PR. Test Plan: ``` python test/test_matmul_cuda.py -k test_zero_dim ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140967 Approved by: https://github.com/eqy, https://github.com/drisspg	2024-11-27 04:07:52 +00:00
PyTorch MergeBot	6e61ff4fd3	Revert "Add `truediv` support in export serializer (#136364 )" This reverts commit `1df440dc4e`. Reverted https://github.com/pytorch/pytorch/pull/136364 on behalf of https://github.com/huydhn due to Sorry for reverting your change but its doc build failure is legit ([comment](https://github.com/pytorch/pytorch/pull/136364#issuecomment-2502620732))	2024-11-27 03:24:31 +00:00
Joel Schlosser	c9e2b3fefe	NJT: Return correct number of outputs for chunk() on the batch dim (#141604 ) Old logic was completely wrong, returning `chunk_size` chunks instead of the intended number. The original test didn't catch this because `chunk_size == num_chunks` :p New OpInfo-based testing covers it though. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141604 Approved by: https://github.com/soulitzer ghstack dependencies: #141500, #140736, #140161, #141392, #141506	2024-11-27 02:31:23 +00:00
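The `chunk_size` versus number-of-chunks mix-up described in the NJT commit above is easy to reproduce on a plain list. A minimal sketch, assuming the dense `torch.chunk` convention that each chunk holds `ceil(n / chunks)` items; `chunk_batch` is a hypothetical helper, not the NJT implementation:

```python
import math


def chunk_batch(items, chunks):
    """Split `items` into at most `chunks` pieces of equal size
    (the last piece may be shorter)."""
    if not items:
        return []
    chunk_size = math.ceil(len(items) / chunks)
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


batch = list(range(6))
pieces = chunk_batch(batch, 2)
print(pieces)       # two pieces of three items each
print(len(pieces))  # the number of pieces, not the chunk size
```

With six items and `chunks=2`, the chunk size is 3 while the number of pieces is 2; a test where the two numbers happen to be equal cannot tell them apart.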
Joel Schlosser	43121b6f0d	Adjust output NJT ragged_idx for reductions and select() (#141506 ) This fixes some bugs when performing reductions / select() on dims before the ragged dim. In this case, the output NJT has a smaller number of dims, and its ragged_idx should reflect that correctly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141506 Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer ghstack dependencies: #141500, #140736, #140161, #141392	2024-11-27 02:25:53 +00:00
Arthur Feeney	0c587c324d	DOC: Correct torch.trapezoid docstring (#141459 ) This is super duper minor, but I believe this corrects a typo in the documentation of `torch.trapezoid`. The documentation says the input is a 1-dimensional tensor $y_0, \dots, y_n$, but it uses summations going from 1 to n-1. Since it's summing over terms $y_i - y_{i-1}$, stopping at n-1 excludes the last partition $y_n - y_{n-1}$, which doesn't match the implementation...
```python
import torch

# (just showing it does include $y_n - y_{n-1}$)
torch.trapezoid(torch.tensor([0.0, 0.0, 9999.0])) == 9999 / 2
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141459 Approved by: https://github.com/colesbury	2024-11-27 01:54:14 +00:00
Richard Barnes	fca0f34b83	Switch c10::string_view to std::string_view (#139635 ) Shortens `string_view_starts_with` to `starts_with`. Adds some missing headers. Isolates `c10_string_view` to use with `get_fully_qualified_name`. Test Plan: Sandcastle Reviewed By: ezyang Differential Revision: D64833558 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139635 Approved by: https://github.com/Skylion007, https://github.com/ezyang	2024-11-27 01:41:18 +00:00
Sypherd	d6276c2fbd	Remove double space from warning (#141566 ) Removes a double space from a warning in a way consistent with prior lines. (Sorry, I saw this a few times when running vllm and the double space was killing me) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141566 Approved by: https://github.com/colesbury	2024-11-27 01:32:00 +00:00
Yoni Chechik	3e90c00a87	Missing space in torch.autograd.Function deprecation warning (#141562 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141562 Approved by: https://github.com/colesbury	2024-11-27 01:31:26 +00:00
zeshengzong	136ff97095	[dynamo][log] Remove print torch inner stacktrace to let users focus on their code error (#141553 ) Fixes #140394 Test Result ```bash TORCH_LOGS="graph_breaks" python test.py ``` ```python # test.py from typing import List import torch def fn002(x): x = x + 1 torch._dynamo.graph_break() x = x + 1 return x def fn001(x): return fn002(x) torch.compile(fn001, backend="eager")(torch.randn(1)) ``` Before log ``` V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] Graph break in user code at /home/zong/code/pytorch/../scripts/dynamo.py:6 V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] Reason: Unsupported: 'skip function graph_break in file /home/zong/code/pytorch/torch/_dynamo/decorators.py' V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] User code traceback: V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/../scripts/dynamo.py", line 11, in fn001 V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] return fn002(x) V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/../scripts/dynamo.py", line 6, in fn002 V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] torch._dynamo.graph_break() V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] Traceback (most recent call last): V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 641, in wrapper V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] return inner_fn(self, inst) V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] 
^^^^^^^^^^^^^^^^^^^^ V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 2314, in CALL V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] self._call(inst) V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 2308, in _call V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] self.call_function(fn, args, kwargs) V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 879, in call_function V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] self.push(fn.call_function(self, args, kwargs)) # type: ignore[arg-type] V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/torch/_dynamo/variables/functions.py", line 328, in call_function V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] return super().call_function(tx, args, kwargs) V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/torch/_dynamo/variables/functions.py", line 129, in call_function V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] return tx.inline_user_function_return(self, [self.self_args(), args], kwargs) V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] 
[__graph_breaks] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 885, in inline_user_function_return V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] return InliningInstructionTranslator.inline_call(self, fn, args, kwargs) V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 3045, in inline_call V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] return cls.inline_call_(parent, func, args, kwargs) V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 3171, in inline_call_ V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] tracer.run() V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 1032, in run V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] while self.step(): V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] ^^^^^^^^^^^ V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 944, in step V1126 16:01:41.701000 1303718 
torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] self.dispatch_table[inst.opcode](self, inst) V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 641, in wrapper V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] return inner_fn(self, inst) V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] ^^^^^^^^^^^^^^^^^^^^ V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 2314, in CALL V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] self._call(inst) V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 2308, in _call V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] self.call_function(fn, args, kwargs) V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 879, in call_function V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] self.push(fn.call_function(self, args, kwargs)) # type: ignore[arg-type] V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/torch/_dynamo/variables/functions.py", line 708, in call_function V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] unimplemented(msg) V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File 
"/home/zong/code/pytorch/torch/_dynamo/exc.py", line 313, in unimplemented V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] raise Unsupported(msg, case_name=case_name) V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] torch._dynamo.exc.Unsupported: 'skip function graph_break in file /home/zong/code/pytorch/torch/_dynamo/decorators.py' V1126 16:01:41.722000 1303718 torch/_dynamo/symbolic_convert.py:424] [1/0] [__graph_breaks] Graph break (details suppressed) in user code at /home/zong/code/pytorch/../scripts/dynamo.py:6 V1126 16:01:41.722000 1303718 torch/_dynamo/symbolic_convert.py:424] [1/0] [__graph_breaks] Reason: Unsupported: 'skip function graph_break in file /home/zong/code/pytorch/torch/_dynamo/decorators.py ``` After log ``` V1126 16:01:19.900000 1303438 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] Graph break in user code at /home/zong/code/pytorch/../scripts/dynamo.py:6 V1126 16:01:19.900000 1303438 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] Reason: Unsupported: 'skip function graph_break in file /home/zong/code/pytorch/torch/_dynamo/decorators.py' V1126 16:01:19.900000 1303438 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] User code traceback: V1126 16:01:19.900000 1303438 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/../scripts/dynamo.py", line 11, in fn001 V1126 16:01:19.900000 1303438 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] return fn002(x) V1126 16:01:19.900000 1303438 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] File "/home/zong/code/pytorch/../scripts/dynamo.py", line 6, in fn002 V1126 16:01:19.900000 1303438 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] torch._dynamo.graph_break() V1126 16:01:19.900000 1303438 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] V1126 16:01:19.918000 1303438 
torch/_dynamo/symbolic_convert.py:423] [1/0] [__graph_breaks] Graph break (details suppressed) in user code at /home/zong/code/pytorch/../scripts/dynamo.py:6 V1126 16:01:19.918000 1303438 torch/_dynamo/symbolic_convert.py:423] [1/0] [__graph_breaks] Reason: Unsupported: 'skip function graph_break in file /home/zong/code/pytorch/torch/_dynamo/decorators.py' ``` Using tlparse to get the stack trace The trace log for graph breaks is implemented in `5318bf8baf/torch/_dynamo/symbolic_convert.py (L417-L424)` Get the trace log by running ```bash TORCH_TRACE=/tmp/my_traced_log python test.py ``` Use tlparse to generate the report ``` tlparse dedicated_log_torch_trace_9unwqrxn.log -o out1 ``` Result ![image](https://github.com/user-attachments/assets/01d2ff25-90ec-4b9f-bcb6-5ae59ba65b35) Stack info in `0_0_0/dynamo_graph_break_reason_0.txt` ![image](https://github.com/user-attachments/assets/c4a04bd0-496a-4862-8230-c01f85e6f3c3) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141553 Approved by: https://github.com/shink, https://github.com/ezyang	2024-11-27 01:26:11 +00:00
Edward Z. Yang	8c8a484d72	Add some symbolic shapes guard logs to tlparse by default (#140867 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/140867 Approved by: https://github.com/bdhirsh	2024-11-27 01:00:14 +00:00
cyy	2f082e1e56	[13/N] Fix extra warnings brought by clang-tidy-17 (#140897 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140897 Approved by: https://github.com/ezyang	2024-11-27 00:35:19 +00:00
bhack	1df440dc4e	Add `truediv` support in export serializer (#136364 ) Fixes #136113 - [x] Initial `truediv` coverage - [ ] Expand/reduce coverage? - [x] Add tests - [x] Re-check docstrings - [ ] Linting Pull Request resolved: https://github.com/pytorch/pytorch/pull/136364 Approved by: https://github.com/pianpwk Co-authored-by: Angela Yi <angelayi@meta.com> Co-authored-by: Pian Pawakapan <pianpwk@meta.com>	2024-11-27 00:31:47 +00:00
Xuehai Pan	07850bb2c1	[dynamo][pytree][1/N] make CXX pytree traceable: `tree_iter` / `tree_leaves` (#137397 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137397 Approved by: https://github.com/jansel ghstack dependencies: #141360	2024-11-27 00:21:58 +00:00
Xuehai Pan	cdde73033e	[dynamo] fix generic namedtuple support when the class is created via `class MyTuple(NamedTuple, Generic[T]): ...` (#141360 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141360 Approved by: https://github.com/jansel	2024-11-27 00:21:58 +00:00
vasiliy	605392bd06	add float8 types to LoggingTensor (#141385 ) Summary: float8 dtypes were missing from this map, adding Test Plan: CI, and unbreaks debugging in torchao If there is an existing test I can add this to - lmk Reviewers: Subscribers: Tasks: Tags: Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/141385 Approved by: https://github.com/soulitzer	2024-11-26 23:39:57 +00:00
Kiuk Chung	5b0b16ca62	[torch/distributed] Make _SymmetricMemory.has_multicast_support() ret… (#141598 ) `SymmetricMemory.has_multicast_support()` throws an exception rather than returning `False` when called with a `DeviceType` that does not support multicast. For example: ``` from torch._C._distributed_c10d import _SymmetricMemory from torch._C._autograd import DeviceType try: supports_multicast = _SymmetricMemory.has_multicast_support(DeviceType.CPU, 0) except RuntimeError as exc: assert str(exc) == "SymmetricMemory does not support device type cpu" ``` This is problematic when building PyTorch from source without `CUDASymmetricMemory.cu`, since the [`@requires_multicast_support`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_distributed.py#L353) test decorator will throw an exception rather than skipping the test (as intended). This PR makes `_SymmetricMemory.has_multicast_support()` properly return `False` when multicast is not supported on the passed device. cc) @malfet , @atalman Pull Request resolved: https://github.com/pytorch/pytorch/pull/141598 Approved by: https://github.com/yifuwang	2024-11-26 23:36:32 +00:00
Joel Schlosser	23793cf93d	NJT unsqueeze() fixes (#141392 ) This PR contains three `unsqueeze()`-related fixes for NJT: 1. Adjusts the output's `_ragged_idx` when `unsqueeze()` inserts a dim before the ragged dim 2. Corrects the unbind reference for `unsqueeze()` after the last input dim. For this case, the dim kwarg canonicalization logic needs to be applied wrt `inp.dim() + 1` to account for `dim=-1` properly 3. Adds ragged dim support to `unsqueeze()`, allowing for e.g. `(B, j1, D) -> (B, 1, j1, D)`. This is okay now after #137125 Note that `unsqueeze()` still doesn't support batch dim operation, and arguably should never support this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141392 Approved by: https://github.com/cpuhrsch ghstack dependencies: #141500, #140736, #140161	2024-11-26 22:38:35 +00:00
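The dim arithmetic behind fixes 1 and 2 above can be illustrated with a small pure-Python sketch. This is not the NJT implementation: the helper names `canonicalize_unsqueeze_dim` and `ragged_idx_after_unsqueeze` are hypothetical, and the sketch only models index bookkeeping, assuming `unsqueeze()` accepts `dim` in the range `[-(ndim + 1), ndim]` as it does for dense tensors.

```python
def canonicalize_unsqueeze_dim(ndim: int, dim: int) -> int:
    """Map a possibly negative unsqueeze dim onto the range [0, ndim].

    unsqueeze() can insert at any of ndim + 1 positions, so a negative
    dim wraps against ndim + 1 rather than ndim.
    """
    if not -(ndim + 1) <= dim <= ndim:
        raise IndexError(f"dim {dim} out of range for unsqueeze on {ndim} dims")
    return dim + ndim + 1 if dim < 0 else dim


def ragged_idx_after_unsqueeze(ragged_idx: int, ndim: int, dim: int) -> int:
    """Return the ragged dim's index in the output (hypothetical helper).

    Inserting the new dim at or before the ragged dim's position pushes
    the ragged dim one slot to the right; inserting after it leaves the
    index unchanged.
    """
    pos = canonicalize_unsqueeze_dim(ndim, dim)
    return ragged_idx + 1 if pos <= ragged_idx else ragged_idx


# (B, j1, D) with the ragged dim at index 1:
print(canonicalize_unsqueeze_dim(3, -1))      # dim=-1 appends after the last dim
print(ragged_idx_after_unsqueeze(1, 3, 1))    # (B, 1, j1, D): ragged dim shifts
print(ragged_idx_after_unsqueeze(1, 3, -1))   # (B, j1, D, 1): ragged dim stays
```

The sketch does not model the batch-dim restriction noted in the commit message; it treats every insertion position as pure index arithmetic.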
Joel Schlosser	9ee5d6f83c	Initial NJT testing over dim type / views (#140161 ) This PR introduces `ExtraOpData`, a structure that contains op metadata regarding whether the op is a view and the dim-related args it accepts. It also populates a huge database for dim-wise / view ops with this info. Test logic (sample input generation, references) has been updated to utilize this data. It allows for a fairly generic set of sample inputs & a reference for the class of ops that accept a single NJT and operate dim-wise (AKA "unary dimwise ops"). Testing is added over the following ops: * `chunk()` * `narrow()` * `select()` * `split()` * `split_with_sizes()` * `squeeze()` * `unflatten()` * `unsqueeze()` Most of the above do not operate on the ragged / batch dims or on non-contiguous NJTs, so the proper xfails are added as needed. I also slipped in a couple minor fixes (sorry): 1. The `_wrap_jagged_dim()` helper now avoids assuming `nt._ragged_idx == 1` and allows for a batch dim to be a valid input, disambiguating the converted inner dim as necessary through an additional `operating_on_batch` return value (i.e. both dim=0 and dim=1 map to dim=0 on the inner values tensor, since that dim represents a packed ragged dim for all batch items) 2. Padded dense -> NJT conversion requires shape gymnastics to operate with the restrictive FBGEMM kernel. The gymnastics were slightly wrong for the transposed NJT case, and this PR fixes that Pull Request resolved: https://github.com/pytorch/pytorch/pull/140161 Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch ghstack dependencies: #141500, #140736	2024-11-26 22:08:08 +00:00
PyTorch MergeBot	65dbd5cc2d	Revert "[Inductor] Inplacing with Donated Buffer (#140113 )" This reverts commit `eecc8e362c`. Reverted https://github.com/pytorch/pytorch/pull/140113 on behalf of https://github.com/BoyuanFeng due to break test_donated_buffer_inplace internally since donated_buffer = False if is_fbcode() else True ([comment](https://github.com/pytorch/pytorch/pull/140113#issuecomment-2501954300))	2024-11-26 21:20:59 +00:00
Joel Schlosser	869d629c0f	Forward / backward NJT support for several activation functions (#140736 ) Several activation functions were unimplemented due to missing `pointwise` tags. This PR adds them and corresponding backwards implementations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140736 Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch ghstack dependencies: #141500	2024-11-26 21:19:58 +00:00
Tristan Rice	9f4f061f89	PyProcessGroup: support rank, world size, group name/desc overrides (#141529 ) This improves `PyProcessGroup` so you can override rank, world size and group name/desc methods from Python. These will be needed to support resizable process groups in torchft. This also has some small fixes in test_c10d_pypg.py to use threads instead of processes which speeds up the test execution by ~10x. Test plan: ``` pytest test/distributed/test_c10d_pypg.py ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/141529 Approved by: https://github.com/fegin	2024-11-26 20:56:57 +00:00
Joel Schlosser	8ba555ec8a	Fix where() for NJT (#141500 ) Background: It's common to use `scalar_tensor()` in the input to `where()` to convert any scalars present to compatible tensors with matching options, including layout. This shows up in various places, notably including derivative formulas ([example](`78491d6afc/tools/autograd/derivatives.yaml (L432-L434)`)). It causes problems for NJTs because they have `layout=torch.jagged` and it never makes sense to create a scalar tensor with this layout. Some of the breakage only seems to happen in CI for reasons I don't fully understand (see the revert of #140736 due to softshrink's derivative formula). This PR: * Allows non-contiguous NJT inputs to `where()` + adds tests for this * Handles scalar tensor / dense tensor inputs for `condition` / `other` + adds tests for this * Uses limited `broadcast_tensors()` / `broadcast_to()` support * Improves `expand()` to work on non-contig NJTs * Changes `scalar_tensor()` to use `torch.strided` instead of `torch.jagged` in both eager and torch.compile (i.e. meta registration) * Changes backward formulas for `sinc`, `pow`, `special.i1`, and `special.i1e` to use `scalar_tensor()` instead of e.g. `zeros({})` Alternative approach: Update all problematic usages of `scalar_tensor()` to avoid ever passing `layout=torch.jagged`. This is an extensive change and includes `torch.where()` logic, a bunch of derivative formulas, and likely other places not yet discovered. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141500 Approved by: https://github.com/malfet, https://github.com/cpuhrsch, https://github.com/soulitzer	2024-11-26 20:13:27 +00:00
Zhengxu Chen	011650adc5	[sigmoid] Refactor out a helper function to insert const graph into top level graph. (#140854 ) Summary: Add a helper function to put a const graph back into the top-level graph, which can be useful when we're taking const graphs from delegates. Test Plan: CI Reviewed By: trieuat Differential Revision: D63031982 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140854 Approved by: https://github.com/SherlockNoMad	2024-11-26 20:07:46 +00:00
William Wen	6fa4356451	handle sympy.oo in bitwise_and/or value_ranges (#141522 ) An internal test is failing due to not handling `sympy.oo` properly in bitwise_and/or value_ranges: [T208684142](https://www.internalfb.com/intern/tasks/?t=208684142). I don't know how to repro this - seems like this requires inductor to trigger as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141522 Approved by: https://github.com/ezyang ghstack dependencies: #138777	2024-11-26 20:01:31 +00:00
Tsung-Hsien Lee	84f818f359	[DTensorTestbase] Fix `TestFunc` typing issue (#141513 ) Summary: `TestFunc` is annotated as `Callable[[object], object]`, which represents a callable that takes a single argument of any type (`object`) and returns a value of any type (`object`). In reality, however, a `TestFunc` may take any number of arguments, so the correct typing is `Callable[..., object]`, which represents a callable that takes any number of arguments (including zero) and returns a value of any type (`object`). Test Plan: Contbuild & OSS CI Differential Revision: D66463705 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141513 Approved by: https://github.com/wz337, https://github.com/Skylion007	2024-11-26 19:48:34 +00:00
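The difference between the two annotations discussed in this commit can be shown with the standard `typing` module alone. The sketch below is illustrative, not the DTensor test code: the names `NarrowTestFunc`, `run`, `no_args`, and `two_args` are hypothetical. It uses the ellipsis form `Callable[..., object]`, which is the standard `typing` spelling for "any number of arguments".

```python
from typing import Callable, get_args

# Accepts exactly one positional argument of any type.
NarrowTestFunc = Callable[[object], object]

# Accepts any number of arguments, including zero.
TestFunc = Callable[..., object]


def run(fn: TestFunc, *args: object, **kwargs: object) -> object:
    """Call fn with whatever arguments the caller supplies."""
    return fn(*args, **kwargs)


def no_args() -> str:
    return "ok"


def two_args(a: int, b: int) -> int:
    return a + b


# A static type checker accepts both calls under TestFunc, whereas
# annotating `fn` as NarrowTestFunc would reject both functions.
result_a = run(no_args)
result_b = run(two_args, 1, 2)

# At runtime, the ellipsis form is recorded as Ellipsis in the alias args.
uses_ellipsis = get_args(TestFunc)[0] is Ellipsis
```

The annotations have no effect at runtime; the distinction matters only to static checkers such as mypy or pyright.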
Nichols A. Romero	a99332eb25	[ROCM] Support Multi-GPU offline tuning in TunableOp (#139673 ) This PR enhances offline tuning to support multi-GPUs. High-level description of algorithm: - Duplicate GEMMs are first eliminated - GEMMs are distributed to multi-GPUs for tuning - Results are gathered into a file with `_full` in the filename Also adding support for GemmAndBias and ScaledGemm Pull Request resolved: https://github.com/pytorch/pytorch/pull/139673 Approved by: https://github.com/jeffdaily, https://github.com/hongxiayang	2024-11-26 19:07:41 +00:00
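The three-step algorithm described in this commit (deduplicate, distribute, gather) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the TunableOp implementation: the round-robin split is an assumption (the commit message does not say how GEMMs are assigned to GPUs), and `tune_stub`, the GEMM signature strings, and the result format are all hypothetical placeholders.

```python
from typing import Dict, List


def dedupe(gemms: List[str]) -> List[str]:
    """Drop duplicate GEMM signatures while preserving first-seen order."""
    return list(dict.fromkeys(gemms))


def distribute(gemms: List[str], num_gpus: int) -> List[List[str]]:
    """Split work across GPUs round-robin (an assumed policy)."""
    return [gemms[i::num_gpus] for i in range(num_gpus)]


def tune_stub(gemm: str) -> str:
    """Hypothetical stand-in for tuning a single GEMM on one device."""
    return f"solution_for_{gemm}"


def gather(per_gpu: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge per-GPU results into one mapping (the `_full` result set)."""
    merged: Dict[str, str] = {}
    for results in per_gpu:
        merged.update(results)
    return merged


# Hypothetical GEMM signatures; the real entries look different.
recorded = ["gemm_A", "gemm_B", "gemm_A", "gemm_C", "gemm_B"]
unique = dedupe(recorded)
shards = distribute(unique, num_gpus=2)
full = gather([{g: tune_stub(g) for g in shard} for shard in shards])
```

In the real workflow each shard would be tuned in a separate process bound to one GPU; here the per-shard loop runs sequentially to keep the sketch self-contained.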

44191 Commits