Commit Graph

72362 Commits

Author SHA1 Message Date
cyy
b3fd94d15e [Distributed] [7/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d (#124987)
This PR continues to clean up clang-tidy warnings in torch/csrc/distributed/c10d, following #124701. In addition, a libfmt dependency is added in the CMake code to enable using it in the headers. libfmt has to be added as a private dependency of torch_cuda and torch_hip because they include torch/csrc/distributed/c10d/Utils.hpp, which uses libfmt.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124987
Approved by: https://github.com/malfet
2024-04-27 07:22:27 +00:00
Yanbo Liang
ce503c1b40 Dynamo x autograd.Function supports setup_context (#124802)
Fixes part of #118397

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124802
Approved by: https://github.com/zou3519
2024-04-27 04:57:13 +00:00
eqy
a866bfff45 [cuDNN] cuDNN SDPA (Flash Attention) Backward (#122510)
#113713
Currently passing trivial smoke tests, but I just pattern-matched bits and pieces of the autograd defs.

Will also collect benchmark data.

CC @drisspg

Co-authored-by: Eli Uriegas <1700823+seemethere@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122510
Approved by: https://github.com/drisspg
2024-04-27 04:15:49 +00:00
Nikita Shulga
5944a53555 [MPS] Fix nextafter for negative values (#125029)
By changing the logic on older macOS to:
```cpp
bits += ((input > 0) ^ (input > other)) ? 1 : -1;
```
and by using the native `nextafter` on macOS Sonoma (i.e., if Metal 3.1 is available).
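For illustration, a minimal Python model of the bit-increment rule above (a sketch, not the shipped Metal kernel; zeros, NaNs, and x == y are deliberately not handled):
```python
import struct

def nextafter_f32(x: float, y: float) -> float:
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Same rule as the kernel: step the pattern toward y, with the
    # direction flipped for negative inputs.
    bits += 1 if (x > 0) ^ (x > y) else -1
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

assert nextafter_f32(-1.0, 0.0) > -1.0  # negative input steps toward zero
assert nextafter_f32(1.0, 2.0) > 1.0    # positive input steps toward +inf
```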

TODO:
  - Add tests for infs and denorms

Fixes https://github.com/pytorch/pytorch/issues/124985

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125029
Approved by: https://github.com/Skylion007
2024-04-27 02:58:05 +00:00
Xia, Weiwen
35b332882b [Quant][PT2E] Enable linear-binary(-unary) post-op recipe for X86Inductor quantizer (#122387)
As the title says.
**Test plan**
python test/test_quantization.py -k test_linear_binary

Differential Revision: [D56288440](https://our.internmc.facebook.com/intern/diff/D56288440)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122387
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
ghstack dependencies: #123240
2024-04-27 02:40:57 +00:00
Tristan Rice
dc4c75ba72 elastic/rendezvous: make barrier and rank assignment operations O(n) instead of O(n^2) (#124982)
Summary:
This makes barrier and rank-assignment operations linear instead of quadratic in the number of workers. This drastically improves rendezvous performance when running with over 1000 hosts.

This uses 2 approaches for different areas:

* local rank assignment: each worker does 1 set and 1 get; local ranks are assigned on the rank 0 host in an O(n) operation, which keeps total store operations linear in the number of workers.
* exit_barrier: use a counter and a final flag so each worker does at most 1 set, 1 get, and 1 add.

At 4000 hosts, torchelastic can now run in as little as 10 seconds, down from 373 seconds.
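A minimal sketch of the counter-plus-flag barrier idea, written against the c10d Store API (the helper and key names are hypothetical; the real implementation lives in torchelastic's rendezvous utilities):
```python
import torch.distributed as dist

def exit_barrier(store: dist.Store, world_size: int, key: str = "exit_barrier") -> None:
    # Each worker performs at most one add, one set, and one wait: O(n) total.
    arrived = store.add(f"{key}/count", 1)  # atomic increment, returns new count
    if arrived == world_size:               # the last arrival flips the flag
        store.set(f"{key}/done", "ok")
    store.wait([f"{key}/done"])             # everyone blocks on a single key
```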

Test Plan:
Tested using many small test jobs running on a remote cluster.

{D56549942}

```
torchx run --scheduler mast -- --image=torchelastic_benchmark --j=4000x1
```

Differential Revision: D56605193

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124982
Approved by: https://github.com/kiukchung, https://github.com/kurman
2024-04-27 02:21:44 +00:00
Simon Fan
1a6fef15ef [compiled autograd] verbose logs for debugging cache misses (#124980)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124980
Approved by: https://github.com/jansel
ghstack dependencies: #124954
2024-04-27 01:10:37 +00:00
Simon Fan
43a7ab2a21 [compiled autograd] introduce verbose logs, add autograd node info to graph (#124954)
- Sets the node info as a fake stack trace, since we don't have a generic comment feature.
- When verbose logging is disabled, this still adds a context manager and flag checks; the alternative is to use macros, but those wouldn't be usable with TORCH_LOGS.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124954
Approved by: https://github.com/jansel
2024-04-27 01:10:37 +00:00
Xia, Weiwen
e592a609fd [Quant][ONEDNN] improve performance of qconv by reducing integration overhead (#123240)
## Description
Framework overhead is found to be significant for the onednn qconv op (used for quantization with the PT2E X86Inductor backend). This PR reduces that integration overhead by modifying the implementation of qconv.

## Performance results
Running quantized ResNet50 on an Intel(R) Xeon(R) Platinum 8490H machine.
Before:
```
Average latency: 8.378 ms.
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                     Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
onednn::qconv2d_pointwise        86.54%       6.954ms        87.42%       7.025ms     132.547us            53
```
After:
```
Average latency: 6.255 ms.
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                     Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
onednn::qconv2d_pointwise        85.05%       6.381ms        85.98%       6.451ms     121.717us            53
```
Test script:
```python
import torch
import torchvision
import time
import copy
import numpy as np
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import (
    prepare_pt2e,
    convert_pt2e,
)
import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq
from torch.ao.quantization.quantizer.x86_inductor_quantizer import X86InductorQuantizer

torch._inductor.config.cpp.enable_kernel_profile=True
torch._inductor.config.profiler_mark_wrapper_call = True
torch._inductor.config.freezing = True
torch._inductor.config.cpp_wrapper = True

def bench_model(model, inputs):
    times = []
    with torch.no_grad():
        for _ in range(5): # warm-up
            output = model(inputs)
        for _ in range(20):
            start_time = time.time()
            output = model(inputs)
            end_time = time.time()
            times.append(end_time - start_time)
        print('Average latency: %0.3f ms.' % (np.median(times) * 1000.0))

        with torch.profiler.profile(activities=[torch.profiler.ProfilerActivity.CPU]) as p:
            out_ipex = model(inputs)
        print(p.key_averages().table(sort_by="self_cpu_time_total", row_limit=-1))

def pt2e_ptq(m, example_inputs):

    m = m.eval()

    exported_model = capture_pre_autograd_graph(m, example_inputs)
    quantizer = X86InductorQuantizer()
    quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())
    prepared_model = prepare_pt2e(exported_model, quantizer)

    _ = prepared_model(*example_inputs)

    converted_model = convert_pt2e(prepared_model)
    torch.ao.quantization.move_exported_model_to_eval(converted_model)
    with torch.no_grad():
        optimized_model = torch.compile(converted_model)
        _ = optimized_model(*example_inputs)
        _ = optimized_model(*example_inputs)

    bench_model(optimized_model, *example_inputs)

    return optimized_model

if __name__ == "__main__":

    data = torch.randn(16, 3, 224, 224)
    model_fp = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.DEFAULT)
    pt2e_ptq(copy.deepcopy(model_fp), (data,))
```

Differential Revision: [D56288440](https://our.internmc.facebook.com/intern/diff/D56288440)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123240
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/jerryzh168
2024-04-27 00:52:45 +00:00
Valentine233
368f5212fa [cpu] [inductor] decompose bmm for memory bound in lowering (#124826)
Fixes #124697. Resolves the large regression of GPT-FAST MoE when `coordinate_descent_tuning` is disabled.

To get better performance in the memory-bound case, we decompose bmm during lowering.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124826
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-04-27 00:19:10 +00:00
Valentine233
ebb8905e0c [cpu] add VecConvert between 8bits and 16bits (#124828)
The perf benefit was found in https://github.com/pytorch/pytorch/issues/124697#issuecomment-2071658300.

The PR adds intrinsic specializations between int8/uint8 and bf16/fp16.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124828
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-04-27 00:17:44 +00:00
Animesh Jain
fd24d8c05a [dynamo][nn module] Use correct sources for _call_impl (#124970)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124970
Approved by: https://github.com/jansel
ghstack dependencies: #124779, #124627
2024-04-26 23:18:30 +00:00
James Pang
43069c460e Correct check for Boolean list input type (#124899)
Summary:
This diff fixes a bug where PyTorch threw an error when creating a tensor from a list of booleans.

All credit goes to swolchok for identifying the root cause of the issue and suggesting this fix.
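A minimal illustration of the now-working pattern (the exact failing input from the report isn't given, so this shape is assumed):
```python
import torch

t = torch.tensor([True, False, True])  # list of bools -> torch.bool tensor
assert t.dtype == torch.bool
```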

Test Plan: Running our model end-to-end works as expected, and no error occurs.

Differential Revision: D55990810

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124899
Approved by: https://github.com/zhxchen17
2024-04-26 22:25:43 +00:00
Xilun Wu
be2c09725a [dtensor][experimental] local_map (#123676)
**Summary**
This PR is an attempt to land an experimental feature designed in #103686. `local_map` is designed to let users apply a function written for `torch.Tensor` to `DTensor` objects.

As a function, `local_map` takes in 2 required arguments (`func` and `out_placements`) and 3 optional arguments (`device_mesh`, `in_placements`, `redistribute_inputs`). `func` is the function to be applied to each local shard of the input `DTensor`s. `out_placements` is the sharding specification of the output `DTensor`s.

`local_map` returns a new function that does the following:

1. Infers `device_mesh` and `in_placements` from the `DTensor` inputs if they're not provided. If `device_mesh` is provided, it must be identical to the device mesh of every `DTensor` input. If `in_placements` is provided, it serves as the required sharding specification of the corresponding `DTensor` input before its local shard is fed into `func`; if it differs from the input's actual sharding specification, an exception is raised when `redistribute_inputs=False`, otherwise the input is resharded to the required specification.
2. Calls `func` with the arguments passed in, substituting each `DTensor` with its local shard; `func` itself may include collectives.
3. For each output of `func` that has a valid (i.e., not `None`) sharding specification in `out_placements`, constructs a new `DTensor` from that output and the specification, and uses this `DTensor` as the output. A usage sketch follows.
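A usage sketch based on the description above (import paths and exact placement-argument types are assumptions about this experimental API; run under `torchrun` with 4 ranks):
```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed._tensor import Replicate, Shard, distribute_tensor
from torch.distributed._tensor.experimental import local_map

def local_mm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return torch.mm(a, b)  # written against plain torch.Tensor

mesh = init_device_mesh("cuda", (4,))
a = distribute_tensor(torch.randn(8, 8), mesh, [Shard(0)])     # row-sharded
b = distribute_tensor(torch.randn(8, 8), mesh, [Replicate()])  # replicated

sharded_mm = local_map(
    local_mm,
    out_placements=[Shard(0)],                  # sharding spec of the output
    in_placements=([Shard(0)], [Replicate()]),  # required input shardings
    redistribute_inputs=True,                   # reshard inputs on mismatch
)
out = sharded_mm(a, b)  # a DTensor, sharded on dim 0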

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123676
Approved by: https://github.com/wanchaol
2024-04-26 22:23:59 +00:00
Luca Wehrstedt
83e7b9d25f [Inductor] Support fusion of chained reductions even if keepdims=True (#124843)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124843
Approved by: https://github.com/shunting314
2024-04-26 21:50:52 +00:00
Catherine Lee
a68a8c0f6b Disable test_binary_op_list_error_cases in test_foreach (#125046)
It's really flaky, e.g.:

* https://github.com/pytorch/pytorch/issues/124636
* https://github.com/pytorch/pytorch/issues/124529

and there are more.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125046
Approved by: https://github.com/huydhn
2024-04-26 21:25:38 +00:00
rzou
c6b7504d47 Fix torch.library.register_fake's module reporting (#125037)
torch.library.register_fake reports the Python module the fake impl is
located in. This is used to check against
`m.set_python_module("foo.bar")` calls in C++.

The module reporting logic was wrong in most cases. This PR fixes it.

Test Plan:
- exhaustive tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125037
Approved by: https://github.com/williamwen42
2024-04-26 20:53:33 +00:00
Kai Londenberg
cd06c73cbd [Inductor Cutlass backend] Improved GEMM template (#124577)
Improves the Cutlass backend GEMM template:

 * Adds code for creating stand-alone test runners for Cutlass GEMM kernels, which enables (manual) debugging of, for example, CUDA IMA errors or similar problems that occur in practice. Includes utility code and tests to actually compile and run these standalone tests.
 * Cleans up the GEMM template code through various refactorings.
 * Eliminates code sections and options that are unnecessary now that epilogue fusions are being removed.
 * Limits the scope of a workaround for (flaky) Cutlass issues with bias broadcasting to the necessary cases.
 * Puts some CPU runtime checks into #if / #endif blocks, so that it's possible to compile CUTLASS kernels with lower CPU overhead.
 * Adds documentation comments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124577
Approved by: https://github.com/jansel
ghstack dependencies: #124576
2024-04-26 20:03:20 +00:00
Catherine Lee
4a6dfbe480 Add label to label config to auto apply labels based on other labels (#125042)
* Implemented in https://github.com/pytorch/test-infra/pull/5127
* Tested in malfet/deleteme: https://github.com/malfet/deleteme/issues/85
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125042
Approved by: https://github.com/huydhn
2024-04-26 19:58:56 +00:00
Aaron Orenstein
4e2b4c6ed6 Fix broken docs (#124940)
These were causing doctest to be unhappy.

In particular, the doc from #124496 caused the "trunk / win-vs2019-cpu-py3 / test" job on #124771 to fail when pushing. Not sure why it wasn't a problem on the original PR.

Testing:

`./test/run_doctests.sh`:
  before:
```
=== 4 warnings in 11.21 seconds ===
```
  after:
```
===  in 11.11 seconds ===
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124940
Approved by: https://github.com/zou3519, https://github.com/atalman, https://github.com/huydhn
2024-04-26 19:24:52 +00:00
Ashwin Hari
9266e472e2 rename ort to maia in dynamo's ort backend. (#124967)
Fixes #124966

Co-authored-by: Thiago Crepaldi <thiagofc@microsoft.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124967
Approved by: https://github.com/thiagocrepaldi
2024-04-26 19:09:29 +00:00
Kurt Mohler
abcb42cdd2 Avoid COW materialize in various places (1) (#124984)
Most, not all, of these cases were found automatically with `git grep -n '^\s*\<const\>.*\*.*=.*\<data_ptr\>'`

Part of #97856

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124984
Approved by: https://github.com/Skylion007
2024-04-26 19:06:28 +00:00
Daohang Shi
2ea1e84d40 log pt2 config dict to signpost from inductor post grad (#124593)
Summary:
Previous attempts didn't work out in the end. D49720297 caused an online-training SEV due to extra imports. D56299408 mitigated a tricky bug in the Distributed Shampoo constructor but unfortunately didn't fix the Scuba logging either.

see f552546983

Test Plan: {F1491621504}

Differential Revision: D56378270

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124593
Approved by: https://github.com/anijain2305
2024-04-26 18:57:11 +00:00
YangQun1
91d565da0c [dynamo] Add support for tensor's is_complex method (#124927)
This PR adds support for the tensor `is_complex` method in dynamo. Take the following code as an example:
```python
def test_tensor_is_complex(x):
    if x.is_complex():
        return x + 1
    else:
        return x - 1
```
Before this fix, the `is_complex()` call caused a graph break ("torch.* op returned non-Tensor bool call_method is_complex"). After this fix, the graph break is avoided, as the sketch below shows.
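A hedged end-to-end check of the described behavior (the compile options are illustrative):
```python
import torch

@torch.compile(fullgraph=True)  # fullgraph=True errors out on any graph break
def f(x):
    return x + 1 if x.is_complex() else x - 1

f(torch.randn(4))                      # real dtype -> takes the x - 1 branch
f(torch.randn(4, dtype=torch.cfloat))  # complex dtype -> takes the x + 1 branch
```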

Fixes #122692

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124927
Approved by: https://github.com/ezyang
2024-04-26 18:28:14 +00:00
Catherine Lee
781ea00c90 [TD] Query Github API for base (#122214)
A better query for the base commit of a PR.
Some ghstack PRs are not connected to main, so git merge-base doesn't work. Instead, use the GitHub API to query for the base of the PR, which should be more accurate.

Sanity checked on one of Ed's ghstack PRs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122214
Approved by: https://github.com/seemethere
2024-04-26 18:21:24 +00:00
Huy Do
858fdd8c40 Remove cppwrapper option on inductor benchmark workflow (#124971)
I'm restoring the `training` and `inference` options after github.com/pytorch/pytorch/pull/124795 and removing the lesser-known `cppwrapper` option instead, per @desertfire's suggestion. The total number of parameters remains at 10.

Also, the default choices for training and inference are now explicitly spelled out when dispatching the workflow manually, to catch dev attention.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124971
Approved by: https://github.com/ezyang
2024-04-26 17:41:24 +00:00
chilli
392dc45597 Made FlexAttention rewrite getitem calls to use aten.index in score_mod (#124799)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124799
Approved by: https://github.com/drisspg
ghstack dependencies: #124444
2024-04-26 17:22:13 +00:00
PyTorch MergeBot
b4d39a5de9 Revert "[TD] Query Github API for base (#122214)"
This reverts commit b003e0f29e.

Reverted https://github.com/pytorch/pytorch/pull/122214 on behalf of https://github.com/clee2000 due to failing on main due to mistake ([comment](https://github.com/pytorch/pytorch/pull/122214#issuecomment-2079732105))
2024-04-26 16:42:51 +00:00
egienvalue
8461e7ed9e Add test_cpp_extensions tests for stream_and_event and mtia_backend (#123614)
Tests the generic torch.Stream/Event with a fake device guard and hooks. Since we added a fake device backend, it is mutually exclusive with the other backends; tests are skipped if TEST_CUDA or TEST_ROCM is true.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123614
Approved by: https://github.com/albanD
ghstack dependencies: #123611, #123612
2024-04-26 16:17:54 +00:00
egienvalue
73744a2c00 torch.mtia module for MTIA device backend (#123612)
The MTIA device now has its own module in PyTorch.
torch.mtia has the following APIs, similar to other backends; lazy_init is also supported.
```
__all__ = [
    "init",
    "is_available",
    "synchronize",
    "device_count",
    "current_device",
    "current_stream",
    "default_stream",
    "set_stream",
    "stream",
    "device",
]

```
------------
For device management, we expand AcceleratorHooksInterface to support generic device management; it can be used in both C++ and Python.
```
def _accelerator_hooks_device_count() -> _int: ...
def _accelerator_hooks_set_current_device(device_index: _int) -> None: ...
def _accelerator_hooks_get_current_device() -> _int : ...
def _accelerator_hooks_exchange_device(device_index: _int) -> _int : ...
def _accelerator_hooks_maybe_exchange_device(device_index: _int) -> _int : ...
```

---------
Adds a get_device_module API to retrieve the device module for different device types.
```
def get_device_module(device: Optional[Union[torch.device, str]] = None)
```
---------
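A hypothetical usage sketch exercising only the APIs listed above (treating `device` and `stream` as context managers is an assumption based on the CUDA analogue, as is `get_device_module` landing on the top-level torch namespace):
```python
import torch

if torch.mtia.is_available():
    torch.mtia.init()
    print(torch.mtia.device_count(), torch.mtia.current_device())
    with torch.mtia.device(0):        # assumed context-manager semantics
        s = torch.mtia.current_stream()
        with torch.mtia.stream(s):    # enqueue subsequent work on stream s
            pass
        torch.mtia.synchronize()

mod = torch.get_device_module("mtia")  # assumed top-level location
```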

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123612
Approved by: https://github.com/albanD
ghstack dependencies: #123611
2024-04-26 16:17:54 +00:00
xinan.lin
36af9c0d7d [Aten] Fix XPU convolution_overrideable input memory format. (#124841)
[Aten] Fix convolution_overrideable input memory format.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124841
Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/albanD
2024-04-26 15:55:01 +00:00
Aaron Orenstein
a8574a9719 Fix global flake8 issues (#124771)
Prior to this `lintrunner --all-files --take FLAKE8` failed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124771
Approved by: https://github.com/Skylion007
ghstack dependencies: #124428
2024-04-26 15:35:53 +00:00
Aaron Orenstein
609c958281 Fix mypy issues in fake_tensor.py (#124428)
fake_tensor.py had mypy errors ignored. That seems less than desirable.

Also added SafePyObjectT<T>, a tagged wrapper around SafePyObject that provides static type checking (with no other guarantees).

Used `SafePyObjectT<TorchDispatchModeKey>` on some of the TorchDispatchModeTLS API to ensure that we don't accidentally inject a different type than expected into the stack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124428
Approved by: https://github.com/malfet
2024-04-26 15:35:53 +00:00
Shan19900305
8d12ba9acf add methods for open device in PackedSequence module. (#124923)
1) Add is_{custom_device_name}() and {custom_device_name}() methods for open device registration;
2) fix failing open-device test cases.

@ezyang  @bdhirsh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124923
Approved by: https://github.com/ezyang
2024-04-26 15:26:20 +00:00
Catherine Lee
b003e0f29e [TD] Query Github API for base (#122214)
A better query for the base commit of a PR.
Some ghstack PRs are not connected to main, so git merge-base doesn't work. Instead, use the GitHub API to query for the base of the PR, which should be more accurate.

Sanity checked on one of Ed's ghstack PRs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122214
Approved by: https://github.com/seemethere
2024-04-26 15:16:36 +00:00
PyTorch MergeBot
6b54f9d3e1 Revert "fix Invalid call to aoti_torch_tensor_copy_ #123039 (#124037)"
This reverts commit f9379ebbbf.

Reverted https://github.com/pytorch/pytorch/pull/124037 on behalf of https://github.com/jeanschmidt due to introducing regressions in benchmark, see D56623194 for more details ([comment](https://github.com/pytorch/pytorch/pull/124037#issuecomment-2079574308))
2024-04-26 15:07:09 +00:00
DanilBaibak
6bef5e9f67 [CI] Add retry mechanism to check if the Docker daemon is running (#124728)
What is done:
* Skipped the 'Kill existing containers' step - ARC runners are always ephemeral.
* Added a retry mechanism to check if the Docker daemon is running.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124728
Approved by: https://github.com/seemethere, https://github.com/zxiiro, https://github.com/ZainRizvi
2024-04-26 14:36:32 +00:00
Aaron Gokaslan
2f3b0befed [BE]: Apply ruff FURB 118. (#124743)
Replaces various lambdas with operator.itemgetter, which is more efficient (as it's a builtin function). Particularly useful when lambdas are used as 'key' functions.
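An example of the kind of rewrite FURB118 suggests:
```python
import operator

pairs = [("a", 3), ("b", 1), ("c", 2)]
# Before: sorted(pairs, key=lambda p: p[1])
by_value = sorted(pairs, key=operator.itemgetter(1))  # builtin callable as key
```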

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124743
Approved by: https://github.com/albanD, https://github.com/malfet
2024-04-26 14:34:52 +00:00
Brian Hirsh
fc2aa23c1e Test reland "AOTAutograd: gate view-replay behind config, not the def… (#124948)
A parallel attempt at landing https://github.com/pytorch/pytorch/pull/124945, but attempting to land through fbcode first

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124948
Approved by: https://github.com/albanD
2024-04-26 13:16:26 +00:00
Prachi Gupta
fc13c1c850 [aot_inductor] Enable test_aot_inductor tests for ROCm (#123393)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123393
Approved by: https://github.com/jithunnair-amd, https://github.com/malfet
2024-04-26 13:15:35 +00:00
Stonepia
3d8585e501 [XPU] Add manual_seed and synchronize method (#124709)
This PR sets the following device-specific settings for XPU (Intel GPU):
1. Set the manual seed for XPU.
2. Set the synchronization method for XPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124709
Approved by: https://github.com/EikanWang, https://github.com/desertfire
2024-04-26 12:32:12 +00:00
Jerry Zhang
74afccdd80 [parametrization] fix requires_grad propagation (#124888)
Summary:
Previously, `requires_grad` was not propagated from the original Tensor to the decomposed tensors.

Test Plan:
python test/test_parametrization.py -k test_register_parametrization_no_grad
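A small sketch of the scenario the test name suggests (the parametrization and the assertions are assumptions about the intended behavior, not taken from the diff):
```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class SplitInTwo(nn.Module):
    # A parametrization whose right_inverse decomposes the tensor in two.
    def forward(self, x1, x2):
        return x1 + x2

    def right_inverse(self, x):
        return [x / 2, x / 2]

layer = nn.Linear(3, 3)
layer.weight.requires_grad_(False)
parametrize.register_parametrization(layer, "weight", SplitInTwo())
# With the fix, the decomposed tensors inherit requires_grad=False.
assert not layer.parametrizations.weight.original0.requires_grad
assert not layer.parametrizations.weight.original1.requires_grad
```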

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124888
Approved by: https://github.com/lezcano
2024-04-26 10:19:31 +00:00
PyTorch MergeBot
d1b25596d5 Revert "Add common used score_mod functions for templated attention (#124670)"
This reverts commit ed120b08c4.

Reverted https://github.com/pytorch/pytorch/pull/124670 on behalf of https://github.com/jeanschmidt due to Breaking internal CI, more info can be found in D56571389 ([comment](https://github.com/pytorch/pytorch/pull/124670#issuecomment-2079084881))
2024-04-26 10:18:18 +00:00
lezcano
bba59b718b Teach ShapeEnv that a <= b => a < b + 1 (#123436)
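The fact the title encodes, stated once: over the reals the forward direction always holds, and for integer symbols the two sides are equivalent:
```latex
a \le b \implies a < b + 1 \quad (a, b \in \mathbb{R}), \qquad
a \le b \iff a < b + 1 \quad (a, b \in \mathbb{Z})
```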
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123436
Approved by: https://github.com/ezyang
ghstack dependencies: #123342
2024-04-26 10:18:01 +00:00
lezcano
fa5ea29863 Apply guard knowledge to all simplifications (#123342)
This was an oversight in a previous PR. We were only applying this knowledge when the expression had an unbacked int.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123342
Approved by: https://github.com/ezyang
2024-04-26 10:18:00 +00:00
PyTorch MergeBot
359ff49bf4 Revert "[dtensor] move pad/unpad_tensor to separate utils (#124871)"
This reverts commit 0b0eea2229.

Reverted https://github.com/pytorch/pytorch/pull/124871 on behalf of https://github.com/jeanschmidt due to Broke internal tests, see D56587991 for more details ([comment](https://github.com/pytorch/pytorch/pull/124871#issuecomment-2079001103))
2024-04-26 09:30:34 +00:00
PyTorch MergeBot
35a82d4a4a Revert "Refresh OpOverloadPacket if a new OpOverload gets added (#124654)"
This reverts commit 872eeb0d7d.

Reverted https://github.com/pytorch/pytorch/pull/124654 on behalf of https://github.com/jeanschmidt due to Broken lots of internal signals, check D56571345 for more details ([comment](https://github.com/pytorch/pytorch/pull/124654#issuecomment-2078940680))
2024-04-26 08:56:03 +00:00
PyTorch MergeBot
7324ddd80c Revert "Delete erroneous print (#124972)"
This reverts commit 333f095d07.

Reverted https://github.com/pytorch/pytorch/pull/124972 on behalf of https://github.com/jeanschmidt due to Need to revert #124654 but this PR depends on it :( ([comment](https://github.com/pytorch/pytorch/pull/124972#issuecomment-2078936303))
2024-04-26 08:52:27 +00:00
Yu, Guangye
19a83eacb5 add new API torch.amp.is_autocast_available (#124938)
# Motivation
Expose `torch._is_autocast_available` as `torch.amp.is_autocast_available`, making it a public API.
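A quick usage example (the positional `device_type` argument mirrors the private function being exposed and is an assumption):
```python
import torch

print(torch.amp.is_autocast_available("cuda"))  # True if CUDA autocast is available
print(torch.amp.is_autocast_available("cpu"))   # CPU autocast
```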

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124938
Approved by: https://github.com/albanD
2024-04-26 08:45:20 +00:00
PyTorch MergeBot
a46c27d961 Revert "Verify types in custom op schemas (#124520)"
This reverts commit 141888765b.

Reverted https://github.com/pytorch/pytorch/pull/124520 on behalf of https://github.com/jeanschmidt due to Breaking internal tests check D56588015 for more details ([comment](https://github.com/pytorch/pytorch/pull/124520#issuecomment-2078917978))
2024-04-26 08:42:11 +00:00