Commit Graph

77739 Commits

Animesh Jain
c566f2465f [dynamo][dicts] Support hasattr on dicts (#134590)
Fixes - https://github.com/pytorch/pytorch/issues/134577
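A minimal sketch of the pattern this enables (hypothetical function; `fullgraph=True` assumes `hasattr` on a dict no longer graph-breaks):

```python
import torch

@torch.compile(fullgraph=True)
def f(d):
    # hasattr on a dict is now handled by dynamo instead of graph-breaking
    if hasattr(d, "items"):
        return sum(d.values())
    return torch.zeros(())

print(f({"a": torch.ones(3), "b": torch.ones(3)}))
```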

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134590
Approved by: https://github.com/Skylion007
ghstack dependencies: #134610
2024-08-28 07:35:18 +00:00
Animesh Jain
880e3d18a4 [dynamo][exceptions] Use exception subclass whenever possible (#134610)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134610
Approved by: https://github.com/drisspg, https://github.com/jansel
2024-08-28 07:35:12 +00:00
xingyuan li
bf7db4e4f9 [Inductor UT] Generalize inductor UT for intel GPU (#133309)
[Inductor UT] Generalize Inductor test case for Intel GPU.

- Reuse `test/inductor/test_decompose_mem_bound_mm.py`
- Reuse `test/inductor/test_inplacing_pass.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133309
Approved by: https://github.com/EikanWang, https://github.com/jansel, https://github.com/etaf
2024-08-28 06:17:43 +00:00
haozhe.zhu
2ba60a1618 fix torch.prod vectorized path for bool (#128009)
Fix https://github.com/pytorch/pytorch/issues/127866.
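A minimal repro sketch of the fixed behavior (sizes are illustrative):

```python
import torch

# prod over a bool tensor is a logical-AND reduction; the vectorized CPU
# path previously returned a wrong result for inputs like this
x = torch.ones(4096, dtype=torch.bool)
x[1000] = False
print(torch.prod(x))  # expected: tensor(0), i.e. False after type promotion
```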

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128009
Approved by: https://github.com/jgong5, https://github.com/albanD
2024-08-28 05:27:50 +00:00
Rachel Guo
89929d9abc [AOTI][Tooling][4/n] Add torch.save() for individual intermediate tensor (#133871)
Differential Revision: D61415304

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133871
Approved by: https://github.com/ColinPeppler
2024-08-28 04:48:00 +00:00
PyTorch UpdateBot
ca77f0a986 [executorch hash update] update the pinned executorch hash (#133386)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned executorch hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133386
Approved by: https://github.com/pytorchbot
2024-08-28 04:16:42 +00:00
PyTorch UpdateBot
e3308d835d [audio hash update] update the pinned audio hash (#134632)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned audio hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134632
Approved by: https://github.com/pytorchbot
2024-08-28 04:16:25 +00:00
cyy
bb4dfe90b8 [Reland] [1/N] Fix clang-tidy warnings in inductor (#134544)
Reland #131979 and exclude aoti_torch_index_put_out changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134544
Approved by: https://github.com/ColinPeppler
2024-08-28 04:05:06 +00:00
Yiming Zhou
71d0eff6e7 Back out "[pytorch][PR] [export] Schematize nn_module_stack serialization" (#134628)
Summary: The reverted change breaks backward compatibility for serialization and deserialization.

Differential Revision: D61888223

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134628
Approved by: https://github.com/angelayi
2024-08-28 03:45:46 +00:00
cyy
ec3f52dd27 [21/N] Fix clang-tidy warnings in jit (#134537)
Follows #133399

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134537
Approved by: https://github.com/Skylion007
2024-08-28 03:22:01 +00:00
Ke Wen
5beb859e74 [BE] no need to print stream in comm abort (#134362)
Strictly speaking, a NCCL communicator has nothing to do with CUDA streams, so we don't need to print the stream in comm abort's message.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134362
Approved by: https://github.com/fduwjj, https://github.com/wconstab
2024-08-28 02:14:18 +00:00
Tristan Rice
f33bcbe5fd c10d/logging: add C10D_LOCK_GUARD (#134131)
This adds logs if we can't acquire locks in NCCLUtils and ProcessGroupNCCL for 30s.

This is motivated by some deadlocks we're seeing, and it's unclear whether they're in NCCL or on the PyTorch side of things.

This required replacing most `std::mutex` with `std::timed_mutex` and `std::condition_variable_any` as appropriate.

Test plan:

- existing CI for regressions
- will add unit tests for `C10D_LOCK_GUARD`
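A conceptual sketch of the pattern in Python (the actual change is a C++ macro over `std::timed_mutex`; the helper and names below are illustrative):

```python
import logging
import threading

def lock_guard(lock: threading.Lock, name: str, timeout_s: float = 30.0):
    # try to acquire with a timeout; log if it takes too long, then keep
    # waiting, mirroring the blocking behavior of a plain lock guard
    if not lock.acquire(timeout=timeout_s):
        logging.warning("waited %.0fs to acquire %s; possible deadlock", timeout_s, name)
        lock.acquire()
    return lock
```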

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134131
Approved by: https://github.com/c-p-i-o, https://github.com/fduwjj
2024-08-28 01:40:42 +00:00
Yu, Guangye
c45ca8092d Refactor caching device allocator utils (#130923)
# Motivation
Following [[RFC] Intel GPU Runtime Upstreaming for Allocator](https://github.com/pytorch/pytorch/issues/116322), this PR refactors the caching device allocator utils to improve code reuse.
This is the first PR; follow-up PRs will continue refactoring the device caching allocator.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD, https://github.com/eqy
2024-08-28 01:35:23 +00:00
atalman
d96254631e [CD] Fix docker builds by installing setuptools after python build (#134631)
Follow up after https://github.com/pytorch/pytorch/pull/134595

The same error happens silently before the error addressed in the above PR (the build continues and produces an invalid Docker image):
```
#47 457.5 Traceback (most recent call last):
#47 457.5   File "<string>", line 1, in <module>
#47 457.5   File "/opt/_internal/cpython-3.12.0/lib/python3.12/site-packages/wheel/pep425tags.py", line 3, in <module>
#47 457.5     import distutils.util
#47 457.5 ModuleNotFoundError: No module named 'distutils'
#47 457.5 + local abi_tag=
#47 457.5 + ln -s /opt/_internal/cpython-3.12.0 /opt/python/
#47 457.5 + rm -f Python-3.12.0.tgz
```

The fix in https://github.com/pytorch/pytorch/pull/134595 is no longer needed since we now install setuptools right after the Python installation.

Link: https://github.com/pytorch/pytorch/actions/runs/10584642913/job/29329366729#step:6:6041
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134631
Approved by: https://github.com/kit1980
2024-08-28 01:17:41 +00:00
Sun, Jiayi
2b95da7ef4 allow conv_bn mixed dtype folding in post-grad (#133968)
This PR relaxes the condition to allow conv_bn mixed dtype folding in post-grad.
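For context, a sketch of a mixed-dtype conv→bn pair of the kind this now covers (hypothetical module; whether the fold actually fires also depends on inductor settings such as freezing):

```python
import torch
import torch._inductor.config as inductor_config

inductor_config.freezing = True  # conv-bn folding runs during freezing

class ConvBN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # conv weights in bfloat16 while bn stays in float32: a mixed-dtype
        # pair that the relaxed condition allows to be folded
        self.conv = torch.nn.Conv2d(3, 8, 3, bias=False).to(torch.bfloat16)
        self.bn = torch.nn.BatchNorm2d(8)

    def forward(self, x):
        return self.bn(self.conv(x.to(torch.bfloat16)).to(torch.float32))

with torch.no_grad():
    out = torch.compile(ConvBN().eval())(torch.randn(1, 3, 16, 16))
```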

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133968
Approved by: https://github.com/leslie-fang-intel, https://github.com/jansel
2024-08-28 01:02:09 +00:00
FFFrog
f7467c3b95 using new device-agnostic api instead of old api like torch.cpu or torch.cuda (#134448)
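One device-agnostic pattern of this kind, sketched below (assuming `torch.get_device_module`, available in recent PyTorch):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
# look up the backend module from the device type instead of hard-coding
# torch.cuda / torch.cpu at the call site
backend = torch.get_device_module(device)
backend.synchronize()
```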
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134448
Approved by: https://github.com/guangyey, https://github.com/shink, https://github.com/albanD
2024-08-28 01:01:49 +00:00
Pian Pawakapan
0c7856973b [export] enumerate unsupported sympy.Functions (#134271) (#134598)
Summary:
There are two notions of unsupported sympy.Functions in symbolic_shapes:
1) unsupported by the export solver, meaning the solver doesn't know how to provide useful fixes for those functions
2) unsupported by the sympy interpreter, meaning we can't reify them into FX nodes because the functions aren't present in PythonReferenceAnalysis

This splits the current call into one call per notion, with the export solver being the only user of (1). For (1), we enumerate the functions in _sympy/functions.py and subtract the functions we know we can support. For (2), there are only 3 functions we've seen pop up in test cases.
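A sketch of the enumeration idea in (1) (the module path is real; the supported set is a hypothetical placeholder):

```python
import inspect
import sympy
import torch.utils._sympy.functions as sympy_fns

# collect every sympy.Function subclass defined in _sympy/functions.py
ALL_FNS = {
    obj
    for _, obj in inspect.getmembers(sympy_fns, inspect.isclass)
    if issubclass(obj, sympy.Function) and obj.__module__ == sympy_fns.__name__
}
SOLVER_SUPPORTED = set()  # hypothetical: functions the export solver can fix
UNSUPPORTED_FOR_SOLVER = ALL_FNS - SOLVER_SUPPORTED
```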

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

Differential Revision: D61863394

Pulled By: pianpwk

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134598
Approved by: https://github.com/angelayi
2024-08-28 00:34:38 +00:00
albanD
3b33f26513 Add device daemon (#131814)
Base implementation aiming towards https://github.com/pytorch/rfcs/pull/64

Details of the implementation and next steps in https://github.com/pytorch/pytorch/blob/gh/albanD/3/head/test/cpp_extensions/open_registration_extension/README.md

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131814
Approved by: https://github.com/ezyang
2024-08-27 23:32:07 +00:00
Laith Sakka
d6091c8726 Add compile time instruction count metric (#133834)
```
PYTHONPATH=$(pwd) python benchmarks/update_hint_benchmark.py out
```
As of this diff, `compile_time_instruction_count` counts the number of instructions executed within `convert_frame.compile_inner`:
```
update_hint_regression,compile_time_instruction_count,10522459165
```
Will add results from CI once populated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133834
Approved by: https://github.com/aorenste
2024-08-27 23:29:02 +00:00
Max Podkorytov
ef0f5919c7 [ROCm][Inductor][CK] Fix codegen after ck signature change (#134483)
The MakeArgument signature was changed in https://github.com/ROCm/composable_kernel/pull/1453, adding a splitK argument to the universal GEMM templates that are used to codegen addmm and matmul.

(part of the series started at #125453 )

# Testing
`pytest test/inductor/test_ck_backend.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134483
Approved by: https://github.com/ColinPeppler
2024-08-27 23:25:42 +00:00
Pian Pawakapan
5ead965026 [export] don't duck size for DIM.AUTO (#134486)
Summary: Apparently DIM.AUTO leads to duck sizing, which I didn't catch. This applies the least intrusive fix possible by using `torch._dynamo.maybe_mark_dynamic()` under the hood.
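For reference, a minimal sketch of the underlying call (tensor and dim are illustrative):

```python
import torch

x = torch.randn(4, 8)
# mark dim 0 as dynamic as a hint only: unlike mark_dynamic, this does not
# error out if the compiler later decides to specialize that dimension
torch._dynamo.maybe_mark_dynamic(x, 0)
```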

Test Plan: added test

Differential Revision: D61809344

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134486
Approved by: https://github.com/avikchaudhuri
2024-08-27 23:00:26 +00:00
PyTorch MergeBot
30094bedbc Revert "[dynamo][dicts] Support hasattr on dicts (#134590)"
This reverts commit d23c0150f3.

Reverted https://github.com/pytorch/pytorch/pull/134590 on behalf of https://github.com/anijain2305 due to causing trunk CI failures ([comment](https://github.com/pytorch/pytorch/pull/134590#issuecomment-2313705582))
2024-08-27 22:52:52 +00:00
drisspg
d966d91e37 [FlexAttention] Fix Sparse block multiple to ceildiv instead for floor div (#134538)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134538
Approved by: https://github.com/yanboliang
ghstack dependencies: #134507, #134511
2024-08-27 22:04:57 +00:00
drisspg
f5c67917d3 [FlexAttention] Remove unused code (#134511)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134511
Approved by: https://github.com/yanboliang
ghstack dependencies: #134507
2024-08-27 22:04:57 +00:00
drisspg
856a8410f2 [FlexAttention] Create new variables for the subgraphs (#134507)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134507
Approved by: https://github.com/yanboliang, https://github.com/BoyuanFeng
2024-08-27 22:04:57 +00:00
Nikita Shulga
41e512a4cd [EZ] Restore test_unicode_comments (#134589)
This reverts the changes introduced to test_jit.py by 43737bd78a and adds a lint suppression for it.

As the test name suggests, it should have a unicode comment to make sure our parser can handle it.

Part of the fix for https://github.com/pytorch/pytorch/issues/134422
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134589
Approved by: https://github.com/aorenste, https://github.com/Skylion007
2024-08-27 21:51:06 +00:00
Bob Ren
1ba39ec1d0 Add test case test_arange_length_with_float32_dtype (#134415)
Adding a test as a followup from https://github.com/pytorch/pytorch/pull/134296

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134415
Approved by: https://github.com/ezyang
2024-08-27 21:36:23 +00:00
PaliC
b58a0c3c4d [split build] fix distributed problems (#134502)
Should fix the issue where USE_C10D_NCCL was not getting propagated to libtorch_python.so
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134502
Approved by: https://github.com/yifuwang
2024-08-27 21:12:58 +00:00
David Berard
289486d007 Move attention kernels back from fake_impls to meta_registrations (#134288)
See #121528 for additional context.

In #120682, we moved the attention kernels from meta_registrations to fake_impls with the intent of fixing the device handling for seed/offset: these are typically on CPU. We needed to put the registrations in fake_impls to do this because meta_registrations doesn't have a way to specify device, whereas fake_impls does. But when we tried to actually fix the device types (#120839), we had to revert the PR because it broke cudagraph handling (during which seed/offset _are_ on CUDA).

Now, we want to put the registrations back in meta_registrations so that we can call these kernels with meta tensors. The use case is later in this stack - we want to be able to use the flop counter with these kernels.

Also - I specifically skip the `compare_tensor_meta()` check in test_fake / test_fake_autocast tests for the `_efficient_attention_forward` and `_flash_attention_forward` kernels, which fails because of the device mismatch from the seed/offset tensors. Then we can un-skip these opinfos. I verified that the efficient_attention_forward bug (#120842) is now caught by these opinfos if I revert the fix from this PR.
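A sketch of the flop-counter use case this works toward (shapes are illustrative; backend selection on meta tensors may fall back to the math path):

```python
import torch
import torch.nn.functional as F
from torch.utils.flop_counter import FlopCounterMode

# meta tensors carry only shape/dtype, so this counts flops without running
# the kernel on a real device
q = torch.randn(1, 8, 128, 64, device="meta", dtype=torch.float16)
with FlopCounterMode(display=False) as counter:
    F.scaled_dot_product_attention(q, q, q)
print(counter.get_total_flops())
```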

Differential Revision: [D61687369](https://our.internmc.facebook.com/intern/diff/D61687369)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134288
Approved by: https://github.com/drisspg
2024-08-27 21:10:36 +00:00
rzou
39ca96398b Update label_to_label with oncall: pt2 hierarchy. (#134582)
Test Plan:
- None
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134582
Approved by: https://github.com/clee2000
2024-08-27 21:05:40 +00:00
cyy
b567ca0f51 Remove unused imported names in python files (#134438)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134438
Approved by: https://github.com/zou3519
2024-08-27 20:44:04 +00:00
Animesh Jain
d23c0150f3 [dynamo][dicts] Support hasattr on dicts (#134590)
Fixes - https://github.com/pytorch/pytorch/issues/134577

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134590
Approved by: https://github.com/Skylion007
ghstack dependencies: #134039
2024-08-27 20:43:40 +00:00
Bo Li
16b8146c9e Exclude test_transformers and unit tests which require recent GPU arch (#132895)
This PR temporarily excludes test_transformers on ROCm and skips some unit tests that require a recent GPU architecture.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132895
Approved by: https://github.com/jithunnair-amd, https://github.com/pruthvistony, https://github.com/malfet
2024-08-27 20:40:53 +00:00
Yuanhao Ji
44dadf2506 [Fix] Check name when registering privateuse1 backend (#134071)
Do some checks when registering the privateuse1 backend to avoid using in-tree device names.
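A sketch of the guarded call (the backend name is illustrative; note the rename can only happen once per process):

```python
import torch

# custom, out-of-tree names are accepted
torch.utils.rename_privateuse1_backend("my_accelerator")

# an in-tree device name such as "cuda" would now be rejected by the check:
# torch.utils.rename_privateuse1_backend("cuda")  # raises
```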

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134071
Approved by: https://github.com/albanD
2024-08-27 20:28:30 +00:00
Colin Peppler
f754c0ae1b [easy] rm duplicate definition for inductor in TORCH_LOGS documentation (#134480)
already defined in
2eb9339b71/torch/_logging/_internal.py (L286-L287)

Test Plan: Sandcastle run

Differential Revision: D61806088

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134480
Approved by: https://github.com/eellison, https://github.com/mlazos
2024-08-27 20:15:10 +00:00
Moritz Hennen
fe6d0e3a04 Do not compute unnecessary tensor!=0 for bool tensors in count_nonzero (#134254)
Updated aten/src/ATen/native/TensorAdvancedIndexing.cpp to only reduce non-bool tensors before computing the sum.

Since I have no MPS expertise, I left the MPS backend untouched. Also, in `count_nonzero_impl` for CPU, I assume the comparison can be optimized away by the compiler for boolean values: 90c821814e/aten/src/ATen/native/TensorAdvancedIndexing.cpp (L2262-L2264)

Fixes #133983
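The affected call, for reference (size is illustrative):

```python
import torch

mask = torch.rand(1 << 20) > 0.5
# for a bool tensor the elements can be summed directly; the intermediate
# `mask != 0` comparison is what this change skips
print(torch.count_nonzero(mask))
```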

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134254
Approved by: https://github.com/albanD
2024-08-27 20:09:29 +00:00
xpfjmj
b744ed6816 Add a cpu_dispatch_key parameter to the cpu_fallback function (#134321)
Fixes #134322
Add a cpu_dispatch_key parameter to the cpu_fallback function to support falling back to other CPU dispatch keys, for example SparseCPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134321
Approved by: https://github.com/albanD
2024-08-27 19:57:57 +00:00
Ivan Duka
adf401f822 Links to contributors' GitHub accounts (#133787)
Maintainers have links to their GitHub profiles, but the major contributors do not.
I added links to the contributors' GitHub accounts in case anyone wants to follow them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133787
Approved by: https://github.com/albanD
2024-08-27 19:56:08 +00:00
Nikita Shulga
534f43ddce [Doc] Fix rendering of the unicode characters (#134597)
https://github.com/pytorch/pytorch/pull/124771 introduced unicode escape sequences inside raw strings, which were not rendered correctly. Also fixes a typo in the `\uue0` escape sequence (it should have been `\u00e0`).
Fix it by relying on [string literal concatenation](https://docs.python.org/3/reference/lexical_analysis.html#string-literal-concatenation) to join raw and regular strings together during the lexical analysis stage.
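A small illustration of the two behaviors (strings are illustrative):

```python
# raw strings do not process escapes, so "\u00e0" stays as six characters
print(r"voil\u00e0")     # voil\u00e0
# adjacent literals are concatenated at lexing time, so the escape can live
# in a regular string while the rest stays raw
print(r"voil" "\u00e0")  # voilà
```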

Fixes https://github.com/pytorch/pytorch/issues/134422

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134597
Approved by: https://github.com/aorenste, https://github.com/Skylion007
2024-08-27 19:52:46 +00:00
Jerry Zhang
3ef4c27ab3 Update pt2e numeric debugger to use node.meta["custom"] field (#134040)
Summary:
With https://github.com/pytorch/pytorch/pull/131912 we now have a "custom" field in node.meta that can be preserved in:

* copy/deepcopy
* run_decompositions()
* serialization
* re-exporting

So we refactored the numeric debugger to use this field.
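A sketch of stashing debug info in the preserved field (the module and handle scheme are illustrative):

```python
import torch
from torch.export import export

class M(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x + 1)

ep = export(M(), (torch.randn(4),))
for i, node in enumerate(ep.graph.nodes):
    # "custom" survives copy/deepcopy, run_decompositions(), serialization,
    # and re-export, so debug handles placed here persist across those steps
    node.meta.setdefault("custom", {})["numeric_debug_handle"] = i
```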

Test Plan:
python test/test_quantization.py TestNumericDebugger

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134040
Approved by: https://github.com/tarun292
2024-08-27 19:51:03 +00:00
Xu Han
ed494603c7 [inductor] calibration inductor windows uts (16/N) (#134587)
skip UT for `test/inductor/test_compiled_autograd.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134587
Approved by: https://github.com/jansel
2024-08-27 19:45:02 +00:00
Xu Han
b094972051 [inductor] calibration inductor windows uts (17/N) (#134588)
skip UTs for `test/inductor/test_minifier_isolate.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134588
Approved by: https://github.com/jansel
2024-08-27 19:41:17 +00:00
Xu Han
9d0e0e6f1d [inductor] calibration inductor windows uts (14/N) (#134585)
skip UT for `test/dynamo/test_exc.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134585
Approved by: https://github.com/jansel
2024-08-27 19:40:56 +00:00
Roy Hvaara
05ac7cd760 [MPS] Remove superfluous label/link (#134090)
This was probably intended to be a comment. I removed it since the issue is already linked in the warning below.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134090
Approved by: https://github.com/albanD
2024-08-27 19:37:33 +00:00
atalman
d5aefadb17 [CD] Fix docker builds by installing setuptools (#134595)
Seeing failures like this:
```
#49 844.6 //build_scripts/manylinux1-check.py:6: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
.....
[python 3/3] RUN bash build_scripts/build.sh && rm -r build_scripts:
846.9 ...it did, yay.
846.9 + for PYTHON in '/opt/python/*/bin/python'
846.9 + /opt/python/cpython-3.12.0/bin/python build_scripts/manylinux1-check.py
847.0 Traceback (most recent call last):
847.0   File "//build_scripts/manylinux1-check.py", line 55, in <module>
847.0     if is_manylinux1_compatible():
847.0        ^^^^^^^^^^^^^^^^^^^^^^^^^^
847.0   File "//build_scripts/manylinux1-check.py", line 6, in is_manylinux1_compatible
847.0     from distutils.util import get_platform
847.0 ModuleNotFoundError: No module named 'distutils'
------
```
PR: https://github.com/pytorch/pytorch/pull/134455

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134595
Approved by: https://github.com/kit1980, https://github.com/seemethere, https://github.com/malfet
2024-08-27 19:31:44 +00:00
Bin Bao
a4b44dd2ef [AOTI] Introduce DeferredCudaGridLine for cuda cpp wrapper (#129268)
Summary: Similar to https://github.com/pytorch/pytorch/pull/129135, use DeferredCudaGridLine to create a deferred grid computation line when generating cpp wrapper.

Differential Revision: [D61800622](https://our.internmc.facebook.com/intern/diff/D61800622)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129268
Approved by: https://github.com/angelayi
2024-08-27 19:23:25 +00:00
Xinya Zhang
5fd670e0ef [ROCM] Properly disable Flash Attention/Efficient Attention with environment variables (#133866)
Now `USE_FLASH_ATTENTION=0 USE_MEM_EFF_ATTENTION=0 python setup.py` can compile correctly

Fixes #125230

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133866
Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily, https://github.com/malfet
2024-08-27 18:24:29 +00:00
PyTorch MergeBot
5b392d22c6 Revert "fix stuck floordiv (#134150)"
This reverts commit 92c4771853.

Reverted https://github.com/pytorch/pytorch/pull/134150 on behalf of https://github.com/anijain2305 due to compile time regression internal ([comment](https://github.com/pytorch/pytorch/pull/134150#issuecomment-2313230404))
2024-08-27 18:23:44 +00:00
Xilun Wu
0159ebb654 [dtensor] add test for local_map decorator (#127752)
**Summary**
This PR is a follow-up of #126924 to address reviewers' comments:
1) add a test case showing the use of `local_map` as a function decorator (a sketch follows this list);
2) simplify the logic for handling different data types of `out_placements`;
3) correct variable naming in test cases to match the math formulas.
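A decorator-style sketch (assuming the func-first `local_map` signature of the time, hence `functools.partial`; the mesh and placements are illustrative, and it must run under an initialized process group, e.g. via torchrun):

```python
import functools
import torch
from torch.distributed._tensor import Replicate, Shard
from torch.distributed._tensor.experimental import local_map
from torch.distributed.device_mesh import init_device_mesh

mesh = init_device_mesh("cuda", (2,))

@functools.partial(
    local_map,
    out_placements=[Shard(0)],                  # placements of the DTensor output
    in_placements=([Shard(0)], [Replicate()]),  # per-input placements
    device_mesh=mesh,
)
def mm(w, x):
    # the body sees plain local torch.Tensor shards, not DTensors
    return torch.mm(w, x)
```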

**Test**
see #126924

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127752
Approved by: https://github.com/wanchaol
2024-08-27 18:22:23 +00:00
Nikita Shulga
8de0d7690c Use newer toAccumulateType signature in Normalization.cpp (#134540)
This fixes BatchNorm behavior when called with empty tensors on the MPS backend. Removed `expectedFailureMPS` in test_nn.py, deleted the expected failure in `test_mps.py`, and adjusted `skipIfMPS` to `expectedFailureMPS` in the BatchNorm2d OpInfo decorator, restricting it to the memory-format tests only.

Test Plan: CI + `python3 -c "import torch; print(torch.nn.BatchNorm2d(3, device='mps')(torch.rand(0, 3, 2, 2, device='mps')))"`

Fixes https://github.com/pytorch/pytorch/issues/134423

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134540
Approved by: https://github.com/Skylion007, https://github.com/albanD
2024-08-27 18:09:20 +00:00