This adds logs if we can't acquire locks in NCCLUtils and ProcessGroupNCCL for 30s.
This is motivated by some deadlocks were seeing and it's unclear if it's in NCCL or on the PyTorch side of things.
This required replacing most `std::mutex` with `std::timed_mutex` and `std::condition_variable_any` as appropriate.
Test plan:
existing CI for regressions
will add unit tests on `C10D_LOCK_GUARD`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134131
Approved by: https://github.com/c-p-i-o, https://github.com/fduwjj
Summary:
There's 2 concepts of unsupported sympy.Functions in symbolic_shapes:
1) unsupported by the export solver, meaning the solver doesn't know how to provide useful fixes for those functions
2) unsupported by the sympy interpreter - meaning we can't reify them into FX nodes because the functions aren't present in PythonReferenceAnalysis
This splits the current call into a call for each version, with the Export solver the only user of 1). For 1), we enumerate the functions in _sympy/functions.py, and subtract the functions we know we can support. For 2) there's only 3 functions we've seen pop up in test cases.
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10
Differential Revision: D61863394
Pulled By: pianpwk
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134598
Approved by: https://github.com/angelayi
PYTHONPATH=$(pwd) python benchmarks/update_hint_benchmark.py out
as of this diff, compile_time_instruction_count counts the number of instruction from within
convert_frame.compile_inner
```
update_hint_regression,compile_time_instruction_count,10522459165
```
will add result from CI once populated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133834
Approved by: https://github.com/aorenste
Summary: apparently DIM.AUTO leads to duck sizing, I didn't catch this. Doing the least intrusive fix possible by using `torch._dynamo.maybe_mark_dynamic()` under the hood.
Test Plan: added test
Differential Revision: D61809344
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134486
Approved by: https://github.com/avikchaudhuri
See #121528 for additional context.
In #120682, we moved the attention kernels from meta_registrations to fake_impls with the intent of fixing the device handling for seed/offset: these are typically on CPU. We needed to put the registrations in fake_impls to do this because meta_registrations doesn't have a way to specify device, whereas fake_impls does. But when we tried to actually fix the device types (#120839), we had to revert the PR because it broke cudagraph handling (during which seed/offset _are_ on CUDA).
Now, we want to put the registrations back in meta_registrations so that we can call these kernels with meta tensors. The use case is later in this stack - we want to be able to use the flop counter with these kernels.
Also - I specifically skip the `compare_tensor_meta()` check in test_fake / test_fake_autocast tests for the `_efficient_attention_forward` and `_flash_attention_forward` kernels, which fails because of the device mismatch from the seed/offset tensors. Then we can un-skip these opinfos. I verified that the efficient_attention_forward bug (#120842) is now caught by these opinfos if I revert the fix from this PR.
Differential Revision: [D61687369](https://our.internmc.facebook.com/intern/diff/D61687369)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134288
Approved by: https://github.com/drisspg
Maintainers have the links to their GitHub profiles, but the major contributors do not have them.
I added the links to the contributors' GitHub accounts in case anyone wants to follow them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133787
Approved by: https://github.com/albanD
Seeing failures like this:
```
#49 844.6 //build_scripts/manylinux1-check.py:6: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
.....
[python 3/3] RUN bash build_scripts/build.sh && rm -r build_scripts:
846.9 ...it did, yay.
846.9 + for PYTHON in '/opt/python/*/bin/python'
846.9 + /opt/python/cpython-3.12.0/bin/python build_scripts/manylinux1-check.py
847.0 Traceback (most recent call last):
847.0 File "//build_scripts/manylinux1-check.py", line 55, in <module>
847.0 if is_manylinux1_compatible():
847.0 ^^^^^^^^^^^^^^^^^^^^^^^^^^
847.0 File "//build_scripts/manylinux1-check.py", line 6, in is_manylinux1_compatible
847.0 from distutils.util import get_platform
847.0 ModuleNotFoundError: No module named 'distutils'
------
```
PR: https://github.com/pytorch/pytorch/pull/134455
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134595
Approved by: https://github.com/kit1980, https://github.com/seemethere, https://github.com/malfet
**Summary**
This PR is a follow-up of #126924 to address reviewer's comments:
1) add a test case to show the use of `local_map` as a function decorator.
2) simplify the logic of handling different data types of `out_placements`.
3) correct variable naming in test cases to match math formulas.
**Test**
see #126924
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127752
Approved by: https://github.com/wanchaol
Which fixes BatchNorm behavior for if called with empty tensors on MPS backed. Removed `expectedFailureMPS` in test_nn.py, deleted expected failure in `test_mps.py` and adjusted `skipIfMPS` to `expectedFailureMPS` in BatchNorm2d OpInfo decorator, but restrict it only to the memory format tests
Test Plan: CI + `python3 -c "import torch; print(torch.nn.BatchNorm2d(3, device='mps')(torch.rand(0, 3, 2, 2, device='mps')))"`
Fixes https://github.com/pytorch/pytorch/issues/134423
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134540
Approved by: https://github.com/Skylion007, https://github.com/albanD