Commit Graph

44191 Commits

Author SHA1 Message Date
Hyunho Yeo
d70b7029c8 [MTIA] Support torch.mtia.empty_cache() (#141533)
Summary: As title

Test Plan:
Passed a local unit test: `buck2 test //mtia/host_runtime/torch_mtia/tests:test_torch_mtia_api`

https://www.internalfb.com/intern/testinfra/testrun/4785074861101240

Reviewed By: nautsimon

Differential Revision: D66481778

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141533
Approved by: https://github.com/nautsimon
2024-11-28 02:24:19 +00:00
Yu, Guangye
ac0b0d11ab [Reland] Fix tensor.data_ptr() representation overflow (#135567)
# Motivation
fix https://github.com/pytorch/pytorch/issues/135550
In PyTorch, [`tensor.data_ptr()`](e889252493/tools/autograd/templates/python_variable_methods.cpp (L204)) is reinterpreted as a [signed int64](e889252493/torch/csrc/autograd/utils/wrap_outputs.h (L50)), which can result in an **overflow issue**, like below:
```python
import torch
a = torch.randn(2).to('xpu')
a.data_ptr()
# one possible output is
-23453392437248
# this is inconsistent with storage.data_ptr()
a.untyped_storage().data_ptr()
# one possible output is
18446720620317114368
```
This PR aims to fix this representation overflow issue to make `tensor.data_ptr()` consistent with [`tensor.untyped_storage().data_ptr()`](c0d2f991b1/torch/csrc/StorageMethods.cpp (L62)). With this PR, the output will become:
```python
import torch
a = torch.randn(2).to('xpu')
a.data_ptr()
# one possible output is
18446720620317114368
# this is consistent with storage.data_ptr()
a.untyped_storage().data_ptr()
# one possible output is
18446720620317114368
```

# Solution
Use `PyLong_FromVoidPtr` to prevent the overflow issue and fit the semantics of `wrap`.
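
For reference, a minimal pure-Python sketch of the wraparound itself, using the pointer value from the example above:
```python
# Pure-Python illustration of the two's-complement reinterpretation described above.
ptr = 18446720620317114368                       # unsigned 64-bit pointer value
as_int64 = ptr - 2**64 if ptr >= 2**63 else ptr  # reinterpret as a signed int64
print(as_int64)                                  # -23453392437248
```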

# Additional Context
This PR was previously reverted (relanded in place with no further changes; the revert commit is 2e8d431a8f) due to the change in `tensor.data_ptr()`, which needs to be synced up with the Intel XPU Triton side, see [#2192](https://github.com/intel/intel-xpu-backend-for-triton/pull/2192). So we have to update the XPU Triton commit pin together with this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135567
Approved by: https://github.com/dvrogozh, https://github.com/EikanWang, https://github.com/albanD
2024-11-28 02:01:52 +00:00
Jason Ansel
ca9bfa1a38 [inductor] Fix 3d tiling (#141709)
Fixes #141121

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141709
Approved by: https://github.com/eellison
2024-11-28 01:34:28 +00:00
Zhou, Lingzhi
ad3986498a [Partitioner] Speed up the update of partition map (#136616)
We can update the partition map by iterating over a node's direct users rather than all of its downstream users. The former is faster than the latter, which performs many duplicate insertions.
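A rough sketch of the idea, with illustrative names rather than the actual partitioner code:
```python
# Hedged sketch: update the mapping from each node's direct users instead of
# walking every downstream node, which revisits (and re-inserts) nodes many times.
def update_partition_map(node, assignment, partition_map):
    partition_id = assignment[node]
    for user in node.users:  # direct users only, not the full downstream set
        partition_map.setdefault(user, set()).add(partition_id)
```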
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136616
Approved by: https://github.com/jgong5, https://github.com/tarun292
2024-11-28 01:11:44 +00:00
cyy
45ed7c13fa Remove unneeded std::make_optional (#141567)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141567
Approved by: https://github.com/albanD
2024-11-28 00:05:21 +00:00
Mark Saroufim
e24190709f [BE] Remove Model Dump utility (#141540)
So I found this utility by accident while trying to figure out how many HTML files we have in the repo so I could convert them to Markdown.

It turns out we package some HTML and JS files in PyTorch to visualize TorchScript models. This seems kind of strange and probably shouldn't be in core, so I removed the tests I could find. Maybe some internal tests will break, but considering TorchScript is being superseded, it makes sense to do this.

The last meaningful update to the test for this file was about 2 years ago by @digantdesai; since then it's been a bunch of routine upgrades.

It seems like this package is unused: https://github.com/search?type=code&auto_enroll=true&q=torch.utils.model_dump&p=1. I skimmed through 5 pages of results, and the only time this shows up in code search is when someone is either cloning pytorch or checking their venv into GitHub.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141540
Approved by: https://github.com/malfet
2024-11-27 22:52:55 +00:00
Ryan Guo
533798ef46 [dynamo] Enforce some invariants on ConstantVariable.create (#140984)
This addresses https://github.com/pytorch/pytorch/pull/140745#issuecomment-2480854259.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140984
Approved by: https://github.com/jansel
ghstack dependencies: #141504
2024-11-27 21:58:35 +00:00
Ryan Guo
3141e038f0 [dynamo] Fix VariableBuilder._wrap on frozenset and enforce invariants on ConstantVariable (#141504)
Prior to this patch, we were using `ConstantVariable.create` to create a VT
for frozenset objects, and intended (yet failed) to predicate that on all
items being literals (see https://github.com/pytorch/pytorch/pull/140984#discussion_r1847393736).

The code was from https://github.com/pytorch/torchdynamo/commit/7c03434 and
the original goal was to help DBR quantization, but as the new test in
this patch shows, it could lead to silent incorrectness.

Upon a closer look, this exposes some subtleties in how Dynamo handles
`ConstantVariable` and `LOAD_CONST`, so this patch both fixes the
aforementioned issue and documents, enforces, and makes explicit the
invariants around `ConstantVariable` and `LOAD_CONST` -- only immutable
objects are supported.
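
An illustrative example of the invariant (not Dynamo internals): a frozenset qualifies as a compile-time constant only when all of its items are literals.
```python
# Illustrative only: which frozensets can soundly be modeled as constants.
import torch

ok = frozenset({1, 2, 3})            # immutable container, literal items -> constant
not_ok = frozenset({torch.ones(2)})  # tensor item is not a literal -> must not be a constant
```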

Specifically, this patch:
1. Refines the checks for wrapping a `frozenset` object, documents why we
   can't just wrap its items directly due to the lack of `Source` for set
   items, and uses a safe workaround (`SourcelessBuilder`) to ensure
   soundness while keeping the DBR quantization support.
2. Adds more types to `common_constant_types`, thereby making
   `ConstantVariable.is_base_literal` more lenient, and strictly checks
   this property in the constructor of `ConstantVariable`.
3. Changes relevant uses of `create_instruction("LOAD_CONST", ...)` to
   `create_load_const`, which checks `is_safe_constant`, and makes
   developer overrides explicit by using `create_load_const_unchecked`
   when needed.
4. Uses more specific `VariableTracker` subclasses in a few places, e.g.,
   `TypingVariable` rather than `ConstantVariable`, and
   `FrozensetVariable` rather than `SetVariable`.

(2) and (3) are mainly to future-proof Dynamo against bugs like (1).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141504
Approved by: https://github.com/jansel
2024-11-27 21:58:35 +00:00
Yanbo Liang
5f004f455a [Dynamo][Distributed] Fix ProcessGroup getattr (#141638)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141638
Approved by: https://github.com/williamwen42, https://github.com/jansel
2024-11-27 21:42:33 +00:00
Edward Z. Yang
dbbebee9d7 Code motion CompiledFxGraph to a dedicated file (#141654)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141654
Approved by: https://github.com/aorenste, https://github.com/jansel
ghstack dependencies: #141491, #141492, #141574
2024-11-27 20:42:21 +00:00
James Wu
a7ca6a9113 Enable autograd cache on inductor tests (#140890)
This turns on AOTAutogradCache for all inductor tests. It clears AOTAutogradCache on each test as well, by virtue of the local cache using the same directory to store cache entries.

I've also tested with INDUCTOR_TEST_DISABLE_FRESH_CACHE=1, running all the tests. AOTAutogradCache successfully caches 99% of these. There are a few tests that use view_replay and therefore save functional tensors, which cause AOTAutogradCache to fail to pickle its result. Will look into next steps there, but for now, it seems okay if the cache just misses on those cases where it can't serialize the result. It would be better to check before pickling, though.

I've made the following small bugfixes to get this working:
- Inductor is sometimes used in a standalone mode without Dynamo, which leads to attribute errors in `check_can_cache`. In general, we should *never* crash in cache checking, only bypass, so I changed a try/except to catch `Exception` instead of just a specific exception (a minimal sketch follows below).
- Add extra structured logging for metadata on cache hits
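
A minimal sketch of that pattern (hypothetical names, not the actual AOTAutogradCache code):
```python
# Cache-eligibility checks should never crash compilation; any unexpected
# error just bypasses caching.
def check_can_cache(gm, checks=()):
    try:
        for check in checks:
            check(gm)      # may raise e.g. AttributeError in standalone Inductor use
    except Exception:      # broadened from a specific exception type
        return False       # bypass the cache instead of crashing
    return True
```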

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140890
Approved by: https://github.com/bdhirsh
2024-11-27 20:41:43 +00:00
FindHao
ab63b679e9 Save indexing for getitem nodes when do custom replacements (#140193)
Fixes #137280

When we have multiple indexings of the same array as returned items in a pattern replacement, we shouldn't ignore their indexing numbers; otherwise, we may create a wrong pattern-to-node mapping.

A unit test is added in this PR. In this unit test, the function `rms_pattern_static` is replaced with `rms_replacement_static` when called. The function `rms_pattern_static` calls two functionalized custom operators, `torch.ops.vllm.rms_norm.default` and `torch.ops.vllm.static_scaled_int8_quant.default`, and it returns at2[1] and at2[2] as outputs. The function `rms_replacement_static` calls one functionalized custom operator `torch.ops.vllm.fused_rms_norm_quant_static.default`, which returns two corresponding items.

Run `python test/inductor/test_pattern_matcher.py -k test_multioutput_register_replacement` to test. After set `TORCH_COMPILE_DEBUG` to 1, the final part of the `fx_graph_readable.py` is like the following.
```python
# File: /home/yhao/p9/pytorch/test/inductor/test_pattern_matcher.py:1673 in rms_pattern_static, code: at1 = auto_functionalized(
auto_functionalized = torch.ops.higher_order.auto_functionalized(torch.ops.vllm.rms_norm.default, result = permute_1, input = convert_element_type, weight = convert_element_type_1, epsilon = 1e-06);  permute_1 = convert_element_type = convert_element_type_1 = None
getitem_1: "bf16[5, 4]" = auto_functionalized[1];  auto_functionalized = None

# File: /home/yhao/p9/pytorch/test/inductor/test_pattern_matcher.py:1680 in rms_pattern_static, code: at2 = auto_functionalized(
auto_functionalized_1 = torch.ops.higher_order.auto_functionalized(torch.ops.vllm.static_scaled_int8_quant.default, result = permute, input = getitem_1, scale = full_default, azp = None);  permute = getitem_1 = full_default = None
getitem_3: "i8[5, 4]" = auto_functionalized_1[1]
getitem_4: "f32[1, 1]" = auto_functionalized_1[2];  auto_functionalized_1 = None
return (getitem_3, getitem_4)
```
This happens before pattern matching, so it is expected to call `static_scaled_int8_quant` and `rms_norm` and return `auto_functionalized_1`'s outputs.

However, for pytorch before this PR, the `fx_graph_transformed.py`, which is after pattern matching, has the following code.
```python
 # File: /home/yhao/p9/pytorch/test/inductor/test_pattern_matcher.py:1748 in my_func_static, code: scale = torch.ones((1, 1))
full_default: "f32[1, 1]" = torch.ops.aten.full.default([1, 1], 1, dtype = torch.float32, layout = torch.strided, device = device(type='cpu'), pin_memory = False)

# No stacktrace found for following nodes
as_strided_default: "i8[20]" = torch.ops.aten.as_strided.default(permute, [20], [1], 0)
clone_default: "i8[20]" = torch.ops.aten.clone.default(as_strided_default);  as_strided_default = None
as_strided_default_1: "i8[5, 4]" = torch.ops.aten.as_strided.default(clone_default, [5, 4], [4, 1], 0);  clone_default = None
as_strided_default_2: "f32[1]" = torch.ops.aten.as_strided.default(full_default, [1], [1], 0)
clone_default_1: "f32[1]" = torch.ops.aten.clone.default(as_strided_default_2);  as_strided_default_2 = None
as_strided_default_3: "f32[1, 1]" = torch.ops.aten.as_strided.default(clone_default_1, [1, 1], [1, 1], 0);  clone_default_1 = None
static_scaled_int8_quant_default = torch.ops.vllm.static_scaled_int8_quant.default(as_strided_default_1, permute_1, as_strided_default_3);  as_strided_default_1 = permute_1 = static_scaled_int8_quant_default = None
fused_rms_norm_quant_static_default = torch.ops.vllm.fused_rms_norm_quant_static.default(permute, convert_element_type, convert_element_type_1, full_default, None, 1e-06);  convert_element_type = convert_element_type_1 = full_default = fused_rms_norm_quant_static_default = None
return (permute, as_strided_default_3)
```
Here, it returns `(permute, as_strided_default_3)`, where `permute` is written by `fused_rms_norm_quant_static` and `as_strided_default_3` is written by `static_scaled_int8_quant`. This is wrong: we expect `static_scaled_int8_quant` to be removed, since it is replaced with `fused_rms_norm_quant_static`. It is supposed to return `(permute, full_default)`.

The root cause is the following part. When we [generate patterns](5f4a21dc58/torch/_inductor/pattern_matcher.py (L1580)) from the traced fx graph and call the following function, the integer indexing numbers in the traced graph are ignored because `int` is in `ignore_types`. So the final arguments of the patterns for those two output items look like `(CallFunction(auto_functionalized, XXX), *)`.

5f4a21dc58/torch/_inductor/pattern_matcher.py (L1839-L1847)

When we do pattern matching after generating the patterns in the following part, `sorted(itertools.chain.from_iterable(nodes), reverse=True)` is `[getitem_4, getitem_3, getitem_1]`. The getitem_4 iteration is always a FailedMatch because we always use the first element to do the pattern match here (it fails in different match functions before and after this PR, but the reason is always the indexing-numbers issue): d4cdc09881/torch/_inductor/pattern_matcher.py (L848). However, when we do pattern matching for getitem_3, the `child_match` returns a match for getitem_3 again, because the `*` pattern can match anything. The getitem_3 pattern matching then returns `[getitem_3, getitem_3]` as outputs, which is wrong.
d4cdc09881/torch/_inductor/pattern_matcher.py (L856)

d4cdc09881/torch/_inductor/pattern_matcher.py (L1750-L1774)

This PR stops ignoring the `int` type when generating patterns for getitem functions, because integer indexing numbers are important to them. Thus the indexing information is kept in the patterns, ensuring correct matching. With this PR, the above `child_match` returns a match for getitem_4, and the final getitem_3 pattern matching returns the correct `[getitem_3, getitem_4]`.
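
A toy illustration of the failure mode (not the real pattern-matcher data structures): once the integer index of a getitem is erased, the patterns for item 1 and item 2 become indistinguishable.
```python
import operator

ignore_types = (int,)  # hypothetical stand-in for the real ignore list

def make_pattern(fn, args):
    # Arguments whose type is ignored collapse to a wildcard.
    return (fn, tuple("*" if isinstance(a, ignore_types) else a for a in args))

p1 = make_pattern(operator.getitem, ("auto_functionalized_1", 1))
p2 = make_pattern(operator.getitem, ("auto_functionalized_1", 2))
print(p1 == p2)  # True -- both indices became "*", so either pattern can match either node
```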

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140193
Approved by: https://github.com/eellison
2024-11-27 20:19:13 +00:00
Isuru Fernando
b37cfddeb3 Refactor ShapeGuardPrinter for future C++ addition (#140968)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140968
Approved by: https://github.com/anijain2305
ghstack dependencies: #140597
2024-11-27 20:09:58 +00:00
Francisco Massa
e5d02e0cfb Fix non-determinism in the partitioner (#141682)
When multiple nodes have similar sizes and are part of `banned_nodes` (which is a `set` and not a `list`), there is non-determinism in the partitioner because it sorts only by node size.

This PR fixes this by also sorting by node name.
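
A hedged sketch of the fix (illustrative node objects, not the real partitioner code):
```python
from collections import namedtuple

Node = namedtuple("Node", ["name", "size"])
banned_nodes = {Node("add_2", 128), Node("mul_1", 128), Node("relu_3", 64)}

# Sorting only by size leaves the two 128-byte nodes in set-iteration order;
# adding the name as a secondary key makes the order reproducible across runs.
ordered = sorted(banned_nodes, key=lambda n: (n.size, n.name), reverse=True)
print([n.name for n in ordered])  # ['mul_1', 'add_2', 'relu_3'] every run
```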

It would be good to add some tests, but I'm not sure about the best way to do it here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141682
Approved by: https://github.com/Chillee, https://github.com/yf225
2024-11-27 19:33:15 +00:00
PyTorch MergeBot
8c90a9a030 Revert "fix non termination in unflatten + state (#141494)"
This reverts commit 5d7c3701e4.

Reverted https://github.com/pytorch/pytorch/pull/141494 on behalf of https://github.com/jovianjaison due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/141494#issuecomment-2504639230))
2024-11-27 19:30:55 +00:00
Boyuan Feng
17fd53d8e5 [Inductor] Inplacing with Donated Buffer (#140113)
Currently, Inductor does not in-place update a buffer if it is an input buffer, because we don't know whether an input will be used by other functions.

Donated buffers provide the additional information that an input buffer will not be used by other functions, so we can in-place update a donated buffer when possible.
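
A conceptual illustration of what "donated" buys (plain eager code, not Inductor's codegen):
```python
import torch

def scale(x: torch.Tensor) -> torch.Tensor:
    return x * 2      # must allocate a new output: the caller may still use x

def scale_donated(x: torch.Tensor) -> torch.Tensor:
    return x.mul_(2)  # reusing x's storage is safe only because x is donated
```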

[Dashboard](https://hud.pytorch.org/benchmark/torchbench/inductor_dynamic?dashboard=torchinductor&startTime=Mon,%2011%20Nov%202024%2018:14:36%20GMT&stopTime=Mon,%2018%20Nov%202024%2018:14:36%20GMT&granularity=hour&mode=training&dtype=amp&deviceName=cuda%20(a100)&lBranch=bf/donated-buffer-inplace&lCommit=5df0769c00e6f9000caeb10fd5cbf0b165f69c2a&rBranch=main&rCommit=2b39a8db7741b816b03677a9c6fec1af05640dee)

![image](https://github.com/user-attachments/assets/f19d961f-7973-418e-9de8-5c2a97950478)
![image](https://github.com/user-attachments/assets/df3bd6a9-58b8-4e8a-8397-9e3b1de9adfe)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140113
Approved by: https://github.com/eellison
2024-11-27 18:51:52 +00:00
Ke Wen
ad39a2fc46 [1/N] Decouple Flight Recorder from NCCL utils (#141648)
Part of the effort to make Flight Recorder device agnostic.

Step 1: Move it out of NCCLUtils.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141648
Approved by: https://github.com/fduwjj
2024-11-27 18:29:42 +00:00
eellison
fd553b9817 Add remaining method and tests for dtype propagation (#140057)
Adds the remaining unimplemented ops as well as an assertion failure if someone adds a new op without a dtype rule.

We test all unique pointwise operators registered as lowerings that have an OpInfo. There will be some follow-ups for this to work well with `codegen_upcast_to_fp32` as both True and False.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140057
Approved by: https://github.com/arui-meta, https://github.com/blaine-rister, https://github.com/ezyang
ghstack dependencies: #139945
2024-11-27 17:06:44 +00:00
eellison
566ceb3e7e Refactor dtype propagation (#139945)
A couple of changes.

- Tries to reuse dtype propagation rules that were already registered in Inductor. These were present both in `pointwise_overrides_data` and in the `boolean_ops` list. Additionally, the registration of pointwise ops already specified dtype propagation rules; this PR saves those registrations and reuses them later.

- Factors out `get_promoted_dtype`, which uses `functools.lru_cache`; it takes non-`CSEVariable` args because `CSEVariable` args will not work with the functools cache (see the sketch below).
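
A hedged sketch of the caching pattern described above (illustrative, not the actual Inductor helper): the cached function only sees hashable `torch.dtype` arguments, so `CSEVariable` objects have to be reduced to dtypes before the call.
```python
import functools
import torch

@functools.lru_cache(None)
def get_promoted_dtype(*dtypes: torch.dtype) -> torch.dtype:
    # Hashable dtype args make lru_cache safe to use here.
    return functools.reduce(torch.promote_types, dtypes)

print(get_promoted_dtype(torch.float16, torch.float32))  # torch.float32
```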

Tests get added later in the stack when everything is implemented.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139945
Approved by: https://github.com/blaine-rister, https://github.com/arui-meta, https://github.com/ezyang
2024-11-27 16:57:02 +00:00
Edward Z. Yang
7ea0da2d57 Modest code motion in compile_fx (#141574)
Do code review with whitespace changes off. Check comments for what I changed.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141574
Approved by: https://github.com/bobrenjc93, https://github.com/jansel
ghstack dependencies: #141491, #141492
2024-11-27 13:38:14 +00:00
leslie-fang-intel
aa827e319e [Inductor][CPP] Extract common functions to be reused in other CPP Template (#141554)
**Summary**
Extract common internal functions from the GEMM template into public functions, so they can be reused by the subsequent group GEMM template.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141554
Approved by: https://github.com/jgong5
2024-11-27 09:52:18 +00:00
axel
763038db66 Clarify torch.arange floating-point rounding behavior (#141655)
Added a documentation note clarifying the rounding behavior of `torch.arange` when using floating-point dtypes, particularly for reduced-precision types like `bfloat16`. This helps users understand potential issues like repeated values and provides guidance on using integer dtypes for precise sequences.
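
For example (illustrative; the exact values depend on the dtype's precision):
```python
import torch

# Integer steps beyond bfloat16's precision round to the same representable
# value, so the sequence may contain repeats.
print(torch.arange(770, 780, dtype=torch.bfloat16))
# An integer dtype gives the precise sequence.
print(torch.arange(770, 780, dtype=torch.int64))
```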

## Changes
- Added explanatory note about floating-point rounding behavior and its effects
- Included specific mention of `bfloat16` dtype issues
- Added recommendation to use integer dtypes for precise sequences

Fixes [#137774](https://github.com/pytorch/pytorch/issues/137774)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141655
Approved by: https://github.com/cpuhrsch
2024-11-27 09:31:39 +00:00
Jaewoo Song
43a2a231d3 Support linear/BN fusion and follow the API guideline (#141585)
The current `fuse` function supports conv/BN fusion only. This commit adds support for linear/BN fusion as well. Changes to follow the API guidelines are also applied.
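
For reference, the standard eval-mode linear/BN folding math looks roughly like the sketch below (this is the textbook formulation, not necessarily the exact helper added here):
```python
import torch
from torch import nn

@torch.no_grad()
def fold_linear_bn(linear: nn.Linear, bn: nn.BatchNorm1d) -> nn.Linear:
    # y = gamma * (Wx + b - mean) / sqrt(var + eps) + beta
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    bias = linear.bias if linear.bias is not None else torch.zeros_like(bn.running_mean)
    fused = nn.Linear(linear.in_features, linear.out_features)
    fused.weight.copy_(linear.weight * scale[:, None])
    fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused
```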

(This will close the PR #141352 which I created for the same topic and got approval but had lint and API guideline problems.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141585
Approved by: https://github.com/ezyang
2024-11-27 06:52:00 +00:00
Jesse Cai
5accae4197 [sparse] add extra options to _cslt_sparse_mm (#137427)
Summary:

Splitting this PR into two, one for the cuSPARSELt improvements, and one
for the inductor lowering.

This PR adds in the additional cuSPARSELt bindings into pytorch.

* `torch._cslt_sparse_mm_search` will be deprecated in a future PR,
  so a warning has been added

* Added a header file for cuSPARSELtOps.cpp

* max_id is now available in `torch.backends.cusparselt` via
  `torch.backends.cusparselt.get_max_alg_id()`

* fixed meta registrations for float8

Test Plan:

python test/test_sparse_semi_structured.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137427
Approved by: https://github.com/cpuhrsch, https://github.com/eqy
2024-11-27 05:32:45 +00:00
vasiliy
3d5fe0ce78 torch._scaled_mm: support dims of size 0 for tensorwise scaling (#140967)
Summary:

Ensures we support dims of size 0 properly in `torch._scaled_mm`. Follows the behavior from `torch.mm`.

For now we only enable support for tensorwise scaling; we can tackle rowwise in a future PR.
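
A sketch of the zero-dim case (assumes a GPU with float8 support; shapes and scales are illustrative):
```python
import torch

a = torch.randn(0, 16, device="cuda").to(torch.float8_e4m3fn)       # m = 0
b = torch.randn(32, 16, device="cuda").to(torch.float8_e4m3fn).t()  # column-major (16, 32)
scale = torch.tensor(1.0, device="cuda")                            # tensorwise scales
out = torch._scaled_mm(a, b, scale_a=scale, scale_b=scale, out_dtype=torch.bfloat16)
print(out.shape)  # torch.Size([0, 32])
```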

Test Plan:

```
python test/test_matmul_cuda.py -k test_zero_dim
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140967
Approved by: https://github.com/eqy, https://github.com/drisspg
2024-11-27 04:07:52 +00:00
PyTorch MergeBot
6e61ff4fd3 Revert "Add truediv support in export serializer (#136364)"
This reverts commit 1df440dc4e.

Reverted https://github.com/pytorch/pytorch/pull/136364 on behalf of https://github.com/huydhn due to Sorry for reverting your change but its doc build failure is legit ([comment](https://github.com/pytorch/pytorch/pull/136364#issuecomment-2502620732))
2024-11-27 03:24:31 +00:00
Joel Schlosser
c9e2b3fefe NJT: Return correct number of outputs for chunk() on the batch dim (#141604)
Old logic was completely wrong, returning `chunk_size` chunks instead of the intended number. The original test didn't catch this because `chunk_size == num_chunks` :p. New OpInfo-based testing covers it, though.
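
A quick illustration of the intended behavior (jagged layout; after this fix the number of returned chunks matches the request):
```python
import torch

nt = torch.nested.nested_tensor(
    [torch.randn(n, 4) for n in (2, 3, 5, 1, 4, 2)], layout=torch.jagged
)
chunks = nt.chunk(2, dim=0)  # 6 batch items -> 2 chunks of 3 (chunk_size != num_chunks here)
print(len(chunks))           # 2 after this fix (the old logic would have returned 3)
```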
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141604
Approved by: https://github.com/soulitzer
ghstack dependencies: #141500, #140736, #140161, #141392, #141506
2024-11-27 02:31:23 +00:00
Joel Schlosser
43121b6f0d Adjust output NJT ragged_idx for reductions and select() (#141506)
This fixes some bugs when performing reductions / select() on dims before the ragged dim. In this case, the output NJT has a smaller number of dims, and its ragged_idx should reflect that correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141506
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
ghstack dependencies: #141500, #140736, #140161, #141392
2024-11-27 02:25:53 +00:00
Arthur Feeney
0c587c324d DOC: Correct torch.trapezoid docstring (#141459)
This is super duper minor, but I believe this corrects a typo in the documentation of `torch.trapezoid`.

The documentation says the input is a 1-dimensional tensor $y_0, \dots, y_n$, but it uses summations going from 1 to n-1. Since it's summing over terms $y_i - y_{i-1}$, stopping at n-1 excludes the last partition $y_n - y_{n-1}$, which doesn't match the implementation...

```python
# (just showing it does include $y_n - y_{n-1}$)
import torch
torch.trapezoid(torch.tensor([0, 0, 9999])) == 9999 / 2  # tensor(True)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141459
Approved by: https://github.com/colesbury
2024-11-27 01:54:14 +00:00
Richard Barnes
fca0f34b83 Switch c10::string_view to std::string_view (#139635)
Shortens `string_view_starts_with` to `starts_with`. Adds some missing headers. Isolates `c10_string_view` to use with `get_fully_qualified_name`.

Test Plan: Sandcastle

Reviewed By: ezyang

Differential Revision: D64833558

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139635
Approved by: https://github.com/Skylion007, https://github.com/ezyang
2024-11-27 01:41:18 +00:00
Sypherd
d6276c2fbd Remove double space from warning (#141566)
Removes a double space from a warning in a way consistent with prior lines.

(Sorry, I saw this a few times when running vllm and the double space was killing me)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141566
Approved by: https://github.com/colesbury
2024-11-27 01:32:00 +00:00
Yoni Chechik
3e90c00a87 Missing space in torch.autograd.Function deprecation warning (#141562)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141562
Approved by: https://github.com/colesbury
2024-11-27 01:31:26 +00:00
zeshengzong
136ff97095 [dynamo][log] Stop printing torch-internal stack traces so users can focus on their own code errors (#141553)
Fixes #140394

**Test Result**

```bash
TORCH_LOGS="graph_breaks" python test.py
```

```python
# test.py
from typing import List
import torch

def fn002(x):
    x = x + 1
    torch._dynamo.graph_break()
    x = x + 1
    return x

def fn001(x):
    return fn002(x)

torch.compile(fn001, backend="eager")(torch.randn(1))

```
**Before log**
```
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] Graph break in user code at /home/zong/code/pytorch/../scripts/dynamo.py:6
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] Reason: Unsupported: 'skip function graph_break in file /home/zong/code/pytorch/torch/_dynamo/decorators.py'
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] User code traceback:
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/../scripts/dynamo.py", line 11, in fn001
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     return fn002(x)
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/../scripts/dynamo.py", line 6, in fn002
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     torch._dynamo.graph_break()
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] Traceback (most recent call last):
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 641, in wrapper
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     return inner_fn(self, inst)
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]            ^^^^^^^^^^^^^^^^^^^^
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 2314, in CALL
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     self._call(inst)
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 2308, in _call
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     self.call_function(fn, args, kwargs)
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 879, in call_function
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/variables/functions.py", line 328, in call_function
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     return super().call_function(tx, args, kwargs)
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/variables/functions.py", line 129, in call_function
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 885, in inline_user_function_return
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 3045, in inline_call
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     return cls.inline_call_(parent, func, args, kwargs)
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 3171, in inline_call_
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     tracer.run()
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 1032, in run
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     while self.step():
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]           ^^^^^^^^^^^
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 944, in step
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     self.dispatch_table[inst.opcode](self, inst)
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 641, in wrapper
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     return inner_fn(self, inst)
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]            ^^^^^^^^^^^^^^^^^^^^
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 2314, in CALL
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     self._call(inst)
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 2308, in _call
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     self.call_function(fn, args, kwargs)
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/symbolic_convert.py", line 879, in call_function
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/variables/functions.py", line 708, in call_function
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     unimplemented(msg)
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/torch/_dynamo/exc.py", line 313, in unimplemented
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     raise Unsupported(msg, case_name=case_name)
V1126 16:01:41.701000 1303718 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] torch._dynamo.exc.Unsupported: 'skip function graph_break in file /home/zong/code/pytorch/torch/_dynamo/decorators.py'
V1126 16:01:41.722000 1303718 torch/_dynamo/symbolic_convert.py:424] [1/0] [__graph_breaks] Graph break (details suppressed) in user code at /home/zong/code/pytorch/../scripts/dynamo.py:6
V1126 16:01:41.722000 1303718 torch/_dynamo/symbolic_convert.py:424] [1/0] [__graph_breaks] Reason: Unsupported: 'skip function graph_break in file /home/zong/code/pytorch/torch/_dynamo/decorators.py
```

**After log**
```
V1126 16:01:19.900000 1303438 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] Graph break in user code at /home/zong/code/pytorch/../scripts/dynamo.py:6
V1126 16:01:19.900000 1303438 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] Reason: Unsupported: 'skip function graph_break in file /home/zong/code/pytorch/torch/_dynamo/decorators.py'
V1126 16:01:19.900000 1303438 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks] User code traceback:
V1126 16:01:19.900000 1303438 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/../scripts/dynamo.py", line 11, in fn001
V1126 16:01:19.900000 1303438 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     return fn002(x)
V1126 16:01:19.900000 1303438 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]   File "/home/zong/code/pytorch/../scripts/dynamo.py", line 6, in fn002
V1126 16:01:19.900000 1303438 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]     torch._dynamo.graph_break()
V1126 16:01:19.900000 1303438 torch/_dynamo/symbolic_convert.py:416] [0/0] [__graph_breaks]
V1126 16:01:19.918000 1303438 torch/_dynamo/symbolic_convert.py:423] [1/0] [__graph_breaks] Graph break (details suppressed) in user code at /home/zong/code/pytorch/../scripts/dynamo.py:6
V1126 16:01:19.918000 1303438 torch/_dynamo/symbolic_convert.py:423] [1/0] [__graph_breaks] Reason: Unsupported: 'skip function graph_break in file /home/zong/code/pytorch/torch/_dynamo/decorators.py'
```

**Using tlparse to get the stack trace**

The trace log implementation for graph breaks is in
5318bf8baf/torch/_dynamo/symbolic_convert.py (L417-L424)

**Get trace log by running**

```bash
TORCH_TRACE=/tmp/my_traced_log python test.py
```

**Using tlparse to get report**

```
tlparse dedicated_log_torch_trace_9unwqrxn.log  -o out1
```

**Result**

![image](https://github.com/user-attachments/assets/01d2ff25-90ec-4b9f-bcb6-5ae59ba65b35)

Stack info in `0_0_0/dynamo_graph_break_reason_0.txt`
![image](https://github.com/user-attachments/assets/c4a04bd0-496a-4862-8230-c01f85e6f3c3)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141553
Approved by: https://github.com/shink, https://github.com/ezyang
2024-11-27 01:26:11 +00:00
Edward Z. Yang
8c8a484d72 Add some symbolic shapes guard logs to tlparse by default (#140867)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140867
Approved by: https://github.com/bdhirsh
2024-11-27 01:00:14 +00:00
cyy
2f082e1e56 [13/N] Fix extra warnings brought by clang-tidy-17 (#140897)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140897
Approved by: https://github.com/ezyang
2024-11-27 00:35:19 +00:00
bhack
1df440dc4e Add truediv support in export serializer (#136364)
Fixes #136113

- [x] Inital `truediv` coverage
- [ ] Expand/reduce coverage?
- [x] Add tests
- [x] Re-check docstrings
- [ ] Linting

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136364
Approved by: https://github.com/pianpwk

Co-authored-by: Angela Yi <angelayi@meta.com>
Co-authored-by: Pian Pawakapan <pianpwk@meta.com>
2024-11-27 00:31:47 +00:00
Xuehai Pan
07850bb2c1 [dynamo][pytree][1/N] make CXX pytree traceable: tree_iter / tree_leaves (#137397)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137397
Approved by: https://github.com/jansel
ghstack dependencies: #141360
2024-11-27 00:21:58 +00:00
Xuehai Pan
cdde73033e [dynamo] fix generic namedtuple support when the class is created via class MyTuple(NamedTuple, Generic[T]): ... (#141360)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141360
Approved by: https://github.com/jansel
2024-11-27 00:21:58 +00:00
vasiliy
605392bd06 add float8 types to LoggingTensor (#141385)
Summary:

float8 dtypes were missing from this map; this adds them.

Test Plan:

CI, and unbreaks debugging in torchao

If there is an existing test I can add this to - lmk

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141385
Approved by: https://github.com/soulitzer
2024-11-26 23:39:57 +00:00
Kiuk Chung
5b0b16ca62 [torch/distributed] Make _SymmetricMemory.has_multicast_support() ret… (#141598)
`SymmetricMemory.has_multicast_support()` throws an exception rather than returning `False` when called with an unsupported `DeviceType`. For example:

```
from torch._C._distributed_c10d import _SymmetricMemory
from torch._C._autograd import DeviceType

try:
    supports_multicast = _SymmetricMemory.has_multicast_support(DeviceType.CPU, 0)
except RuntimeError as exc:
    assert str(exc) == "SymmetricMemory does not support device type cpu"
```

This is problematic when building PyTorch from source without `CUDASymmetricMemory.cu`, since the [`@requires_multicast_support`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_distributed.py#L353) test decorator will throw an exception rather than skipping the test (as intended).

This PR makes `_SymmetricMemory.has_multicast_support()` properly return `False` when multicast is not supported on the passed device.
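
With this change, the call from the snippet above is expected to simply return `False`:
```python
from torch._C._distributed_c10d import _SymmetricMemory
from torch._C._autograd import DeviceType

assert not _SymmetricMemory.has_multicast_support(DeviceType.CPU, 0)  # no exception raised
```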

cc) @malfet , @atalman

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141598
Approved by: https://github.com/yifuwang
2024-11-26 23:36:32 +00:00
Joel Schlosser
23793cf93d NJT unsqueeze() fixes (#141392)
This PR contains three `unsqueeze()`-related fixes for NJT:
1. Adjusts the output's `_ragged_idx` when `unsqueeze()` inserts a dim before the ragged dim
2. Corrects the unbind reference for `unsqueeze()` after the last input dim. For this case, the dim kwarg canonicalization logic needs to be applied wrt `inp.dim() + 1` to account for `dim=-1` properly
3. Adds ragged dim support to `unsqueeze()`, allowing for e.g. `(B, j1, D) -> (B, 1, j1, D)`. This is okay now after #137125

Note that `unsqueeze()` still doesn't support operating on the batch dim, and arguably never should.
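
A small illustration of item 3 (shapes are illustrative):
```python
import torch

nt = torch.nested.nested_tensor(
    [torch.randn(2, 8), torch.randn(5, 8)], layout=torch.jagged
)                      # shape (B, j1, D)
out = nt.unsqueeze(1)  # shape (B, 1, j1, D) -- now allowed for the ragged dim
```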
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141392
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #141500, #140736, #140161
2024-11-26 22:38:35 +00:00
Joel Schlosser
9ee5d6f83c Initial NJT testing over dim type / views (#140161)
This PR introduces `ExtraOpData`, a structure that contains op metadata regarding whether the op is a view and the dim-related args it accepts. It also populates a huge database for dim-wise / view ops with this info.

Test logic (sample input generation, references) has been updated to utilize this data. It allows for a fairly generic set of sample inputs and a reference for the class of ops that accept a single NJT and operate dim-wise (AKA "unary dimwise ops").

Testing is added over the following ops:
* `chunk()`
* `narrow()`
* `select()`
* `split()`
* `split_with_sizes()`
* `squeeze()`
* `unflatten()`
* `unsqueeze()`

Most of the above do not operate on the ragged / batch dims or on non-contiguous NJTs, so the proper xfails are added as needed.

I also slipped in a couple minor fixes (sorry):
1. The `_wrap_jagged_dim()` helper now avoids assuming that `nt._ragged_idx == 1` and allows a batch dim to be a valid input, disambiguating the converted inner dim as necessary through an additional `operating_on_batch` return value (i.e. both dim=0 and dim=1 map to dim=0 on the inner values tensor, since that dim represents a packed ragged dim for all batch items).
2. Padded dense -> NJT conversion requires shape gymnastics to operate with the restrictive FBGEMM kernel. The gymnastics were slightly wrong for the transposed NJT case, and this PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140161
Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch
ghstack dependencies: #141500, #140736
2024-11-26 22:08:08 +00:00
PyTorch MergeBot
65dbd5cc2d Revert "[Inductor] Inplacing with Donated Buffer (#140113)"
This reverts commit eecc8e362c.

Reverted https://github.com/pytorch/pytorch/pull/140113 on behalf of https://github.com/BoyuanFeng due to break test_donated_buffer_inplace internally since donated_buffer = False if is_fbcode() else True ([comment](https://github.com/pytorch/pytorch/pull/140113#issuecomment-2501954300))
2024-11-26 21:20:59 +00:00
Joel Schlosser
869d629c0f Forward / backward NJT support for several activation functions (#140736)
Several activation functions were unimplemented due to missing `pointwise` tags. This PR adds them and corresponding backwards implementations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140736
Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch
ghstack dependencies: #141500
2024-11-26 21:19:58 +00:00
Tristan Rice
9f4f061f89 PyProcessGroup: support rank, world size, group name/desc overrides (#141529)
This improves `PyProcessGroup` so you can override rank, world size and group name/desc methods from Python. These will be needed to support resizable process groups in torchft.

This also has some small fixes in test_c10d_pypg.py to use threads instead of processes, which speeds up test execution by ~10x.

Test plan:

```
pytest test/distributed/test_c10d_pypg.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141529
Approved by: https://github.com/fegin
2024-11-26 20:56:57 +00:00
Joel Schlosser
8ba555ec8a Fix where() for NJT (#141500)
**Background:** It's common to use `scalar_tensor()` in the input to `where()` to convert any scalars present to compatible tensors with matching options, *including layout*. This shows up in various places, notably including derivative formulas ([example](78491d6afc/tools/autograd/derivatives.yaml (L432-L434))). It causes problems for NJTs because they have `layout=torch.jagged` and it never makes sense to create a scalar tensor with this layout. Some of the breakage only seems to happen in CI for reasons I don't fully understand (see the revert of #140736 due to softshrink's derivative formula).
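
An illustrative sketch of the pattern in question (not the actual derivative code): the scalar `other` argument must stay strided; giving it the NJT's jagged layout never makes sense.
```python
import torch

nt = torch.nested.nested_tensor([torch.randn(2), torch.randn(3)], layout=torch.jagged)
zero = torch.scalar_tensor(0.0, dtype=nt.dtype)  # strided scalar, not jagged
out = torch.where(nt > 0, nt, zero)              # scalar-tensor `other` handled by this PR
```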

**This PR:**
* Allows non-contiguous NJT inputs to `where()` + adds tests for this
* Handles scalar tensor / dense tensor inputs for `condition` / `other` + adds tests for this
    * Uses limited `broadcast_tensors()` / `broadcast_to()` support
    * Improves `expand()` to work on non-contig NJTs
* Changes `scalar_tensor()` to use `torch.strided` instead of `torch.jagged` in both eager and torch.compile (i.e. meta registration)
* Changes backward formulas for `sinc`, `pow`, `special.i1`, and `special.i1e` to use `scalar_tensor()` instead of e.g. `zeros({})`

**Alternative approach:** Update all problematic usages of `scalar_tensor()` to avoid ever passing `layout=torch.jagged`. This is an extensive change and includes `torch.where()` logic, a bunch of derivative formulas, and likely other places not yet discovered.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141500
Approved by: https://github.com/malfet, https://github.com/cpuhrsch, https://github.com/soulitzer
2024-11-26 20:13:27 +00:00
Zhengxu Chen
011650adc5 [sigmoid] Refactor out a helper function to insert const graph into top level graph. (#140854)
Summary: Add a helper function to put a const graph back into the top-level graph, which can be useful when we're taking const graphs from delegates.

Test Plan: CI

Reviewed By: trieuat

Differential Revision: D63031982

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140854
Approved by: https://github.com/SherlockNoMad
2024-11-26 20:07:46 +00:00
William Wen
6fa4356451 handle sympy.oo in bitwise_and/or value_ranges (#141522)
An internal test is failing due to not handling `sympy.oo` properly in bitwise_and/or value_ranges: [T208684142](https://www.internalfb.com/intern/tasks/?t=208684142). I don't know how to repro this; it seems like it requires Inductor to trigger as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141522
Approved by: https://github.com/ezyang
ghstack dependencies: #138777
2024-11-26 20:01:31 +00:00
Tsung-Hsien Lee
84f818f359 [DTensorTestbase] Fix TestFunc typing issue (#141513)
Summary: `TestFunc` is annotated as `Callable[[object], object]`, which represents a callable that takes a single argument of any type (`object`) and returns a value of any type (`object`). However, in reality, a `TestFunc` can take any number of arguments; as a result, the correct typing is `Callable[..., object]`, which represents a callable that takes any number of arguments (including zero) and returns a value of any type (`object`).

Test Plan: Contbuild & OSS CI

Differential Revision: D66463705

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141513
Approved by: https://github.com/wz337, https://github.com/Skylion007
2024-11-26 19:48:34 +00:00
Nichols A. Romero
a99332eb25 [ROCM] Support Multi-GPU offline tuning in TunableOp (#139673)
This PR enhances offline tuning to support multiple GPUs.

High-level description of algorithm:
- Duplicate GEMMs are first eliminated
- GEMMs are distributed to multi-GPUs for tuning
- Results are gathered into a file with `_full` in the filename

Also adds support for GemmAndBias and ScaledGemm.
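
A minimal, self-contained sketch of that flow (all names are illustrative; the real TunableOp tooling differs):
```python
from itertools import chain

def tune_offline(gemm_signatures, num_gpus, tune_fn):
    unique = sorted(set(gemm_signatures))                    # 1. drop duplicate GEMMs
    shards = [unique[i::num_gpus] for i in range(num_gpus)]  # 2. distribute across GPUs
    results = (tune_fn(gpu, shard) for gpu, shard in enumerate(shards))
    return list(chain.from_iterable(results))                # 3. gather into one "_full" result set
```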

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139673
Approved by: https://github.com/jeffdaily, https://github.com/hongxiayang
2024-11-26 19:07:41 +00:00