Commit Graph

258 Commits

Author SHA1 Message Date
Jane Xu
cfe970260a Clarify opt-einsum usage, fix #127109 (#137596)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137596
Approved by: https://github.com/albanD
2024-10-09 20:31:24 +00:00
Jesse Cai
bc21689136 [sparse][semi-structured] Add float8 dtype support to 24 sparsity (#136397)
Summary:

This PR adds `torch.float8_e4m3fn` support to cuSPARSELt and `to_sparse_semi_structured`.

This lets users run fp8 + 2:4 sparse matmuls on Hopper GPUs with
cuSPARSELt >= 0.6.2, via the `scaled_mm` API.

```
A = rand_sparse_semi_structured_mask(256, 128, dtype=torch.float16)
B = torch.rand(dense_input_shape, device=device).to(torch.float16).t()

A_fp8, A_scale = to_float8(A)
B_fp8, B_scale = to_float8(B)

dense_result = torch._scaled_mm(
    A_fp8, B_fp8,
    scale_a=A_scale, scale_b=B_scale,
    out_dtype=out_dtype
)
A_fp8_sparse = to_sparse_semi_structured(A_fp8)
sparse_result = torch._scaled_mm(
    A_fp8_sparse, B_fp8,
    scale_a=A_scale, scale_b=B_scale,
    out_dtype=out_dtype
)
```

Note that to keep this consistent with normal torch behavior, calling
`torch.mm(A_fp8_sparse, B_fp8)` will raise a NotImplementedError.
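
The snippet above calls a `to_float8` helper that the message does not define. A minimal sketch of what such a helper might look like follows, assuming simple per-tensor scaling; the function body, the `demo` wrapper, and the choice to return the inverse scale are assumptions for illustration, not code taken from this PR.

```python
try:
    import torch
except ImportError:  # keep the sketch importable without PyTorch
    torch = None


def to_float8(x, dtype=None):
    """Hypothetical per-tensor scaling helper; returns (fp8 tensor, inverse scale)."""
    if dtype is None:
        dtype = torch.float8_e4m3fn
    finfo = torch.finfo(dtype)
    # Scale so the largest magnitude maps to the largest representable fp8 value.
    scale = finfo.max / x.abs().max().clamp(min=1e-12)
    x_fp8 = (x * scale).clamp(min=finfo.min, max=finfo.max).to(dtype)
    # Assumption: the caller wants the factor that maps fp8 values back
    # to the original range, so the inverse of the scale is returned.
    return x_fp8, scale.float().reciprocal()


def demo():
    """Return (dtype name, inverse scale), or None without PyTorch fp8 support."""
    if torch is None or not hasattr(torch, "float8_e4m3fn"):
        return None
    x = torch.tensor([[0.5, -2.0], [1.0, 4.0]])
    x_fp8, inv_scale = to_float8(x)
    return str(x_fp8.dtype), float(inv_scale)
```

Returning the inverse scale keeps the helper's output directly usable as a dequantization factor; a real helper may differ in clamping and rounding details.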

I also turned on cuSPARSELt by default and added CUSPARSELT_MAX_ID to the
backend to make the tests a bit cleaner

Test Plan:
```
python test/test_sparse_semi_structured.py -k scaled_mm
python test/test_sparse_semi_structured.py -k fp8
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136397
Approved by: https://github.com/drisspg
2024-09-27 21:37:34 +00:00
Jianyu Huang
0a35986cdb Add option to configure reduced precision math backend for SDPA (#135964)
Summary: Address https://github.com/pytorch/pytorch/issues/135778 by adding a global flag that configures whether the math backend of SDPA uses high or low precision.

Test Plan: buck2 run mode/opt //scripts/feikou/llm:run_attn_kernels

Differential Revision: D62625515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135964
Approved by: https://github.com/jbschlosser
2024-09-24 07:11:38 +00:00
maajidkhann
5a6ddbcc3b Extending the Pytorch vec backend for SVE (ARM) (#119571)
**Motivation:**
In PyTorch, ATen vectorization supports multiple platforms, including x86 and Arm, as well as multiple data types. It provides a generic implementation of a Vector (Vec) type that lets the programmer write code that packs various primitives (such as floats) into 256-bit and 512-bit registers. It can easily be extended to support other ISAs by adding more VecISA sub-classes.

**Reference Link:** https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/cpu/vec

**This PR:**

* Our goal with this contribution is to add an SVE backend for Vec in the ATen vectorization for the CPU backend, which can benefit any Arm CPU that supports SVE.

* More about SVE ISA for ARM: [https://developer.arm.com/Architectures/Scalable Vector Extensions](https://developer.arm.com/Architectures/Scalable%20Vector%20Extensions)

* We are using the ARM C Language Extensions for SVE (https://developer.arm.com/documentation/102699/0100/Optimizing-with-intrinsics) to accelerate performance for various operators in the SVE backend for Vec.

* Currently we are adding support only for the SVE ISA with a vector length of 256 bits (SVE 256). In the future, we plan to extend this SVE support to other vector lengths as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119571
Approved by: https://github.com/malfet, https://github.com/snadampal

Co-authored-by: Divya Kotadiya <divya.kotadiya@fujitsu.com>
2024-09-18 18:59:10 +00:00
Jesse Cai
255cd75a97 [sparse] Add cuSPARSELt as a backend (#128534)
Summary:

This PR adds in cuSPARSELt as a backend to PyTorch.

It is now possible to check whether cuSPARSELt is available, and which
version is in use, with
```
torch.backends.cusparselt.is_available()
torch.backends.cusparselt.version()
```
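
As a hedged usage sketch, the two calls above can be wrapped so that the check degrades gracefully when PyTorch, or this backend module, is not present. The `cusparselt_status` function name is hypothetical; only `is_available()` and `version()` come from the message.

```python
import importlib


def cusparselt_status():
    """Return (available, version), or None if the backend module is missing."""
    try:
        backend = importlib.import_module("torch.backends.cusparselt")
    except ImportError:
        # PyTorch is not installed, or this build predates the backend.
        return None
    available = backend.is_available()
    # Only query the version when the library is actually usable.
    return available, (backend.version() if available else None)


status = cusparselt_status()
```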

Test Plan:
```
python test/test_sparse_semi_structured.py -k test_cusparselt_backend
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128534
Approved by: https://github.com/cpuhrsch, https://github.com/eqy, https://github.com/syed-ahmed
2024-08-21 22:06:07 +00:00
Xuehai Pan
758a0a88a2 [BE][Easy] enable ruff rule PIE790: unnecessary pass statement (#133200)
This PR removes unnecessary `pass` statements. This is semantically safe because the bytecode for the Python code does not change.

Note that if there is a docstring in the function, an empty function does not need a `pass` statement as a placeholder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133200
Approved by: https://github.com/malfet, https://github.com/eqy, https://github.com/kit1980
2024-08-15 15:50:19 +00:00
Xuehai Pan
f3fce597e9 [BE][Easy][17/19] enforce style for empty lines in import segments in torch/[a-c]*/ and torch/[e-n]*/ (#129769)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129769
Approved by: https://github.com/ezyang
2024-08-04 10:24:09 +00:00
Oguz Ulgen
72d2dba992 Add None return type to init (#132335)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132335
Approved by: https://github.com/albanD
2024-08-01 15:26:45 +00:00
Luca Wehrstedt
f4f7aba75d Expose function to probe whether PyTorch was built with FlashAttention (#131894)
This is needed by downstream projects (e.g., xFormers) to determine whether they can count on FlashAttention in PyTorch or whether they need to build it themselves.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131894
Approved by: https://github.com/drisspg, https://github.com/eqy
2024-07-31 11:33:09 +00:00
PyTorch MergeBot
609447a626 Revert "[BE] typing for decorators - _jit_internal (#131573)"
This reverts commit f0f20f7e97.

Reverted https://github.com/pytorch/pytorch/pull/131573 on behalf of https://github.com/clee2000 due to breaking lint internally D60265575 ([comment](https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359))
2024-07-28 03:29:32 +00:00
Aaron Orenstein
f0f20f7e97 [BE] typing for decorators - _jit_internal (#131573)
See #131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131573
Approved by: https://github.com/oulgen, https://github.com/zou3519
ghstack dependencies: #131568, #131569, #131570, #131571, #131572
2024-07-25 22:24:19 +00:00
Aaron Orenstein
5a0068cc69 [BE] mypy: disallow untyped decorators (#131428)
Untyped decorators strip the types from their decorated function so even if the underlying function is fully typed then callers to it don't get any benefit from type annotations.

Step 1 - Enable the error and override in all the offending files.

#131429
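
To illustrate the motivation (this is a sketch, not code from the PR), a decorator can be annotated with a `TypeVar` so that the decorated function keeps its signature for type checkers. The names `passthrough` and `add` are hypothetical.

```python
import functools
from typing import Any, Callable, TypeVar, cast

# F stands for "whatever callable type was passed in".
F = TypeVar("F", bound=Callable[..., Any])


def passthrough(fn: F) -> F:
    """Typed decorator: callers still see the original signature of fn."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return fn(*args, **kwargs)

    # cast() tells the type checker the wrapper has the same type as fn.
    return cast(F, wrapper)


@passthrough
def add(a: int, b: int) -> int:
    return a + b
```

Without the `F` annotations, mypy would treat `add` as untyped after decoration, which is the situation the new error is meant to flag.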

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428
Approved by: https://github.com/justinchuby, https://github.com/oulgen
2024-07-23 21:50:55 +00:00
eqy
f845a7a91a [cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343)
Looks like one of the first failures seen is `test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` when `test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` passes.

What seems interesting here is that the `torch.compile` version fails while the eager version passes. Not sure what the difference would be here...

Nevertheless, is there a recommended mechanism to skip cuDNN SDPA as a backend for this test? CC @drisspg

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125343
Approved by: https://github.com/Skylion007
2024-06-30 19:22:16 +00:00
PyTorch MergeBot
999eec8dea Revert "[cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343)"
This reverts commit b7e7a4cb01.

Reverted https://github.com/pytorch/pytorch/pull/125343 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break some test_transformer running on internal A100 and V100 ([comment](https://github.com/pytorch/pytorch/pull/125343#issuecomment-2196202003))
2024-06-28 06:03:54 +00:00
Eddie Yan
b7e7a4cb01 [cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343)
Looks like one of the first failures seen is `test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` when `test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` passes.

What seems interesting here is that the `torch.compile` version fails while the eager version passes. Not sure what the difference would be here...

Nevertheless, is there a recommended mechanism to skip cuDNN SDPA as a backend for this test? CC @drisspg

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125343
Approved by: https://github.com/Skylion007
2024-06-26 00:49:18 +00:00
Deng Weishi
b542825066 Enable deterministic support for oneDNN (#127277)
This PR is a part of RFC https://github.com/pytorch/pytorch/issues/114848.
As requested for the Torchbenchmark models, this PR enables the deterministic attribute for the oneDNN operators on XPU backends, such as convolution, deconvolution, and matmul.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127277
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/desertfire, https://github.com/gujinghui
2024-06-21 05:21:24 +00:00
PyTorch MergeBot
817ce6835b Revert "[cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343)"
This reverts commit 4c971932e8.

Reverted https://github.com/pytorch/pytorch/pull/125343 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/125343#issuecomment-2163690162))
2024-06-12 18:47:52 +00:00
eqy
4c971932e8 [cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343)
Looks like one of the first failures seen is `test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` when `test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` passes.

What seems interesting here is that the `torch.compile` version fails while the eager version passes. Not sure what the difference would be here...

Nevertheless, is there a recommended mechanism to skip cuDNN SDPA as a backend for this test? CC @drisspg
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125343
Approved by: https://github.com/Skylion007
2024-06-09 06:53:34 +00:00
Aaron Orenstein
62bcdc0ac9 Flip default value for mypy disallow_untyped_defs [4/11] (#127841)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127841
Approved by: https://github.com/oulgen
2024-06-08 18:36:48 +00:00
Xuehai Pan
67ef2683d9 [BE] wrap deprecated function/class with typing_extensions.deprecated (#127689)
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.

Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
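
A small sketch of the fallback path, using only the standard library: passing `category=FutureWarning` explicitly instead of relying on the default category (`UserWarning`). The `old_api` function is hypothetical and stands in for any deprecated entry point.

```python
import warnings


def old_api() -> int:
    """Hypothetical deprecated function used only for illustration."""
    warnings.warn(
        "old_api() is deprecated, use new_api() instead",
        category=FutureWarning,
        stacklevel=2,  # point the warning at the caller, not at old_api itself
    )
    return 42


# Record the warning so it can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_api()
```

Python shows `FutureWarning` to end users by default, whereas `DeprecationWarning` is hidden unless it is triggered from `__main__`.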

Resolves #126888

- #126888

This PR is split from PR #126898.

- #126898

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689
Approved by: https://github.com/Skylion007
2024-06-02 12:30:43 +00:00
PyTorch MergeBot
033e733021 Revert "[BE] wrap deprecated function/class with typing_extensions.deprecated (#126898)"
This reverts commit 749a132fb0.

Reverted https://github.com/pytorch/pytorch/pull/126898 on behalf of https://github.com/fbgheith due to switching typing-extensions=4.3.0 to 4.9.0 causes internal failure ([comment](https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456))
2024-05-31 19:47:24 +00:00
Xuehai Pan
749a132fb0 [BE] wrap deprecated function/class with typing_extensions.deprecated (#126898)
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.

Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.

UPDATE: Use `FutureWarning` instead of `DeprecationWarning`.

Resolves #126888

- #126888

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898
Approved by: https://github.com/albanD
2024-05-29 12:09:27 +00:00
albanD
af9acc4168 Fix public binding to actually traverse modules (#126103)
The current call passes `['/actual/path']` to `os.walk` as a string, which points to no path and thus silently leads to an empty traversal.
There is an unused function just above that handles that, so I guess this is what was supposed to be called.
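
The failure mode can be reproduced with the standard library alone. The sketch below is an illustration under the assumption that the bug amounts to walking the string form of a list; it is not the code from the test in question.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "pkg"))

    # Correct call: walk the directory itself.
    good = [dirpath for dirpath, _, _ in os.walk(root)]

    # Buggy call: the string form of a list names a path that does not exist,
    # and os.walk ignores the resulting OSError by default.
    bad_top = str([root])
    bad = list(os.walk(bad_top))

    # Passing onerror makes the failure visible.
    errors = []
    list(os.walk(bad_top, onerror=errors.append))
```

Here `good` contains the root and its `pkg` subdirectory, `bad` is empty, and `errors` holds the `OSError` that the default call hides.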

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126103
Approved by: https://github.com/suo
2024-05-15 19:36:03 +00:00
Aaron Gokaslan
34910f87f0 [BE]: Update ruff to v0.4.4 (#125031)
Update ruff version to 0.4.2. This version mostly has bugfixes for the new parser and also updates the f-string rule to be able to apply more fixes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125031
Approved by: https://github.com/albanD, https://github.com/malfet
2024-05-12 20:02:37 +00:00
Guokai Ma
4dd33a1c2b Better core binding in torch.backends.xeon.run_cpu when launched from torchrun with --nproc-per-node (#123711)
This PR fixes `torch.backends.xeon.run_cpu` behavior when it is launched from `torchrun` with the `--nproc-per-node` parameter.

As a CPU launcher, `run_cpu` would bind cores to each instance it launches using `numactl`, and assign cores to each instance evenly.

However, if we use `torchrun` to start `run_cpu` and use `--nproc-per-node` to create multiple `run_cpu` processes, each `run_cpu` process assumes it can use all the CPU cores, so the processes compete for CPU cores. This results in poor performance.

This PR recognizes the environment variables `LOCAL_WORLD_SIZE` and `LOCAL_RANK` set by `torchrun`, then uses this information to further shard the cores bound to each instance. With this PR, when launched by `torchrun --nproc-per-node ...`, different CPU cores are bound to different workers, which maximizes CPU utilization and application performance.
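
The sharding idea can be sketched as follows. This is a simplified illustration, not the implementation in `run_cpu`: the helper names `shard_cores` and `cores_from_env` are hypothetical, and the even contiguous split is an assumption.

```python
import os
from typing import List, Mapping


def shard_cores(cores: List[int], local_world_size: int, local_rank: int) -> List[int]:
    """Return the contiguous slice of cores owned by one local rank."""
    if not 0 <= local_rank < local_world_size:
        raise ValueError("local_rank must be in [0, local_world_size)")
    per_rank = len(cores) // local_world_size
    start = local_rank * per_rank
    return cores[start:start + per_rank]


def cores_from_env(cores: List[int], env: Mapping[str, str] = os.environ) -> List[int]:
    """Shard using the variables torchrun exports; default to a single worker."""
    world = int(env.get("LOCAL_WORLD_SIZE", "1"))
    rank = int(env.get("LOCAL_RANK", "0"))
    return shard_cores(cores, world, rank)
```

For example, with 8 cores and `--nproc-per-node 2`, rank 0 would get cores 0 to 3 and rank 1 would get cores 4 to 7, so the two `run_cpu` processes no longer overlap.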

The specific use case this PR enables is using TorchServe with DeepSpeed tensor parallel. In this case, TorchServe runs `torchrun --nproc-per-node <tp_size>` to start the tensor parallel workers it needs. When running TorchServe on a multi-socket CPU server with DeepSpeed tensor parallel, we need this PR to achieve the best performance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123711
Approved by: https://github.com/jingxu10, https://github.com/ezyang
2024-05-09 00:32:11 +00:00
Aaron Gokaslan
8cad88e1f3 [BE]: Improve exception typing. Remove NOQAs (#125535)
Improve some exception typing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125535
Approved by: https://github.com/albanD
2024-05-08 14:07:13 +00:00
Jeff Daily
6ede882c0b preferred blas library; cublaslt gemm implementation (#122106)
Following the example of PyTorch supporting a preferred Linalg library (cusolver or magma), this PR introduces a preferred blas library selector of either cublas or cublaslt for CUDA and hipblas or hipblaslt for ROCm via normal hipification of sources.

The default blas implementation remains cublas or hipblas.  cublaslt or hipblaslt can be enabled using environment variable TORCH_BLAS_PREFER_CUBLASLT=1 (or TORCH_BLAS_PREFER_HIPBLASLT=1 as an alias) or by calling `torch.backends.cuda.preferred_blas_library(backend="cublaslt")` or as an alias `backend="hipblaslt"`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122106
Approved by: https://github.com/lezcano
2024-04-22 15:38:22 +00:00
Aaron Gokaslan
c5fafe9f48 [BE]: TRY002 - Ban raising vanilla exceptions (#124570)
Adds a ruff lint rule to ban raising raw exceptions. Most of these should at the very least be runtime errors, value errors, type errors, or some other more specific error. There are hundreds of instances of these bad exception types already in the codebase, so I have noqa'd most of them. Hopefully this error code will get committers to rethink what exception type they should raise when they submit a PR.

I also encourage people to gradually go and fix all the existing noqas that have been added so they can be removed over time and our exception typing can be improved.
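
A minimal before/after sketch of what the rule asks for; `parse_port` is a hypothetical helper, not code from the PyTorch codebase.

```python
def parse_port(value: str) -> int:
    """Hypothetical helper: raise a specific exception type, not bare Exception."""
    if not value.isdigit():
        # Before: raise Exception(f"bad port: {value}")  # flagged by TRY002
        raise ValueError(f"port must be a non-negative integer, got {value!r}")
    port = int(value)
    if port > 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```

Callers can then catch `ValueError` specifically instead of a blanket `except Exception`.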

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124570
Approved by: https://github.com/ezyang
2024-04-21 22:26:40 +00:00
Aaron Gokaslan
1d6c5972c1 [BE]: Optimize min/max/sum comprehensions C419 (#123960)
Automatic fixes that replace certain list comprehensions with generator expressions where appropriate, so that they are consumed immediately. This is preview functionality in ruff for rule C419 and it was automatically applied.
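
The kind of rewrite involved looks roughly like this (an illustrative sketch with made-up data, not a diff from the PR):

```python
values = [3, 1, 4, 1, 5, 9, 2, 6]

# Before: builds a temporary list that is discarded right away.
total_list = sum([v * v for v in values])

# After: the generator feeds values to sum() one at a time.
total_gen = sum(v * v for v in values)

# The same pattern applies to min() and max().
largest_even = max(v for v in values if v % 2 == 0)
```

Both forms compute the same result; the generator version only avoids materializing the intermediate list.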

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123960
Approved by: https://github.com/malfet
2024-04-12 23:54:15 +00:00
Kurman Karabukaev
67d3e4f2a2 [TorchElastic] Refactoring to support non-default logging strategy (#120691)
Summary:
Pull the logging parameters out into a logs spec that can be overridden (follow-up changes will cover a possible mechanism).

Why?
Right now the logging approach is quite rigid:
- Requires the log directory to exist and not be empty
- Creates a tempdir otherwise
- Creates a subdir for a run
- Creates a subdir for each attempt
- Creates files named stdout.log, stderr.log, error.json

In some instances, users would like to customize the behavior, including file names, based on context. We do currently have a mechanism to template the multiplexed teed output prefix.

With the current changes, users can create a custom log spec that uses env variables to change the behavior.

Notes:
Made `LaunchConf.logs_specs` an optional field that will be bound to a `DefaultLogsSpecs` instance. A large number of clients (code) use the API directly without using the torchrun API. For those cases, we have to explicitly pass a LogSpecs implementation if we would like to override the implementation. For regular torchrun users, we can use the pluggable approach proposed in the follow-up change.

Test Plan: CI + unit tests

Differential Revision: D54176265

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120691
Approved by: https://github.com/ezyang
2024-02-29 20:59:17 +00:00
Eddie Yan
cd380c794f [CUDNN][SDPA] Experimental cuDNN Flash Attention v2 Inference (#115663)
#113713

Going to clean up some of the checks and will remove draft status after.
Can be tested on SM80+ with `TORCH_CUDNN_MHA_ENABLED=1`.

CC @drisspg @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115663
Approved by: https://github.com/drisspg
2024-02-14 22:02:06 +00:00
Nikita Shulga
eb9a3383c2 [MPS] Add naive std_mean implementation (#119777)
By just calling `std_mps` and `mean` in sequence

Move `var_mean` decomp to `ReduceOps.mm`, as it should be faster to skip dispatching to Python, which one can validate by running the following script:
```python
from timeit import default_timer

import torch
from torch.utils.benchmark import Measurement, Timer

def bench_var_mean(
    m, n, k,
    dtype = torch.float32,
    device:str = "cpu",
) -> Measurement:
    setup = f"""
     x = torch.rand({m}, {n}, {k}, dtype={dtype}, device="{device}")
    """

    t = Timer(
        stmt="torch.var_mean(x, dim=1)", setup=setup, language="python", timer=default_timer
    )
    return t.blocked_autorange()

for x in [100, 1000]:
    rc = bench_var_mean(1000, x, 100, device="mps")
    print(f"{x:5} : {rc.mean*1e6:.2f} usec")
```
which before the change reports 681 and 1268 usec and after 668 and 684 (which probably means that the GPU is not saturated, but the overhead from switching between native and interpreted runtimes is shorter).

Fixes https://github.com/pytorch/pytorch/issues/119663

TODOs:
 - Refactor the codebase and implement proper composite function (that must be faster)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119777
Approved by: https://github.com/albanD
2024-02-13 21:51:29 +00:00
Catherine Lee
4f5785b6b3 Enable possibly-undefined error code (#118533)
Fixes https://github.com/pytorch/pytorch/issues/118129

Suppressions automatically added with

```
import re

with open("error_file.txt", "r") as f:
    errors = f.readlines()

error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f"  # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Co-authored-by: Catherine Lee <csl@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
2024-01-30 21:07:01 +00:00
PyTorch MergeBot
40ece2e579 Revert "Enable possibly-undefined error code (#118533)"
This reverts commit 4f13f69a45.

Reverted https://github.com/pytorch/pytorch/pull/118533 on behalf of https://github.com/clee2000 due to sorry i'm trying to figure out a codev merge conflict, if this works i'll be back to rebase and merge ([comment](https://github.com/pytorch/pytorch/pull/118533#issuecomment-1917695185))
2024-01-30 19:00:34 +00:00
Edward Z. Yang
4f13f69a45 Enable possibly-undefined error code (#118533)
Fixes https://github.com/pytorch/pytorch/issues/118129

Suppressions automatically added with

```
import re

with open("error_file.txt", "r") as f:
    errors = f.readlines()

error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f"  # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
2024-01-30 05:08:10 +00:00
Driss Guessous
b1f8b6b8fc Forward Fix accidental removal of import (#118572)
Summary:
This Diff is a forward fix for this PR: https://github.com/pytorch/pytorch/pull/114689

Where I accidentally removed the old import from backends/cuda.

Test Plan: Verified on failing revert diff and it did indeed fix the issue

Reviewed By: DanilBaibak

Differential Revision: D53193454

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118572
Approved by: https://github.com/DanilBaibak
2024-01-30 02:07:19 +00:00
Edward Z. Yang
46712b019d Enable local_partial_types (#118467)
When using dmypy, this setting is enabled and cannot be turned off. Force it for regular mypy too.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118467
Approved by: https://github.com/Skylion007
ghstack dependencies: #118414, #118418, #118432
2024-01-28 13:38:22 +00:00
Colin Peppler
1f6aa4b336 [mypy] Enable follow_imports = normal for mypy-torch.backends.* (#116311)
Test Plan:

```
lintrunner --take MYPYINDUCTOR --all-files
ok No lint issues.

lintrunner -a
ok No lint issues.
Successfully applied all patches.
```


Pull Request resolved: https://github.com/pytorch/pytorch/pull/116311
Approved by: https://github.com/int3
2024-01-25 20:17:22 +00:00
drisspg
4e29f01bf2 Remove sdp_kernel and replace with sdpa_kernel in attention namespace (#114689)
# Summary
Simplification of Backend Selection

This PR deprecates the `torch.backends.cuda.sdp_kernel` context manager and replaces it with a new context manager, `torch.nn.attention.sdpa_kernel`. The new context manager also changes the API.

For `sdp_kernel`, one would specify the backend choice by taking the negation of the kernel they would like to run. The purpose of this context manager was only to be a debugging tool: "turn off the math backend" and see if you can run one of the fused implementations.

Problems:
- This pattern makes sense if the majority of users don't care to know anything about the backends that can be run. However, if users are seeking to use this context manager, then they are explicitly trying to run a specific backend.
- This is not scalable. We are working on adding the cudnn backend, and this API means that more implementations will need to be turned off if a user wants to explicitly run a given backend.
- Discoverability of the current context manager. It is somewhat unintuitive that this context manager is in backends/cuda/init when it now also controls the CPU fused kernel behavior. I think centralizing to the attention namespace will be helpful.

Other concerns:
- Typically, backends (kernels) for operators are entirely hidden from users and are implementation details of the framework. We have exposed this to users already, albeit not by default and with beta warnings. Does making backend choices even more explicit lead to problems when we potentially want to remove existing backends (perhaps input shapes will get covered by newer backends)?

A nice side effect is that, now that we aren't using the `BACKEND_MAP` in test_transformers, many dynamo failures are passing for CPU tests.
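
A hedged sketch of the difference in usage: with the new context manager the caller names the backend to run, rather than switching the others off. The tensor shapes and the `math_backend_attention` wrapper are illustrative assumptions; the import guard only keeps the example loadable where PyTorch is missing or too old.

```python
try:
    import torch
    import torch.nn.functional as F
    from torch.nn.attention import SDPBackend, sdpa_kernel
except ImportError:  # torch missing, or too old for torch.nn.attention
    torch = None


def math_backend_attention(seq_len=4, head_dim=8):
    """Run SDPA with only the math backend enabled; return the output shape."""
    if torch is None:
        return None
    q = torch.randn(1, 2, seq_len, head_dim)
    k = torch.randn(1, 2, seq_len, head_dim)
    v = torch.randn(1, 2, seq_len, head_dim)
    # Old style, by negation:
    #   torch.backends.cuda.sdp_kernel(enable_flash=False, enable_mem_efficient=False)
    # New style: state the backend you want.
    with sdpa_kernel(SDPBackend.MATH):
        out = F.scaled_dot_product_attention(q, k, v)
    return tuple(out.shape)
```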

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114689
Approved by: https://github.com/cpuhrsch
2024-01-24 22:28:04 +00:00
Michael Lazos
c51a4e64c0 Add support for compiling SDPAParams (#117207)
Allows us to `allow_in_graph` this `torch._C` struct for supporting scaled dot product attention.
Helps unblock https://github.com/pytorch/pytorch/pull/116071

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117207
Approved by: https://github.com/voznesenskym
2024-01-19 05:51:15 +00:00
PyTorch MergeBot
2f84a9d37c Revert "[CUDNN][SDPA] Experimental cuDNN Flash Attention v2 Inference (#115663)"
This reverts commit 5aa92b5090.

Reverted https://github.com/pytorch/pytorch/pull/115663 on behalf of https://github.com/PaliC due to Unfortunately, this pr breaks cuda builds internally ([comment](https://github.com/pytorch/pytorch/pull/115663#issuecomment-1899388813))
2024-01-18 23:40:30 +00:00
Eddie Yan
5aa92b5090 [CUDNN][SDPA] Experimental cuDNN Flash Attention v2 Inference (#115663)
#113713

Going to clean up some of the checks and will remove draft status after.
Can be tested on SM80+ with `TORCH_CUDNN_MHA_ENABLED=1`.

CC @drisspg @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115663
Approved by: https://github.com/drisspg
2024-01-18 01:20:36 +00:00
Mikayla Gawarecki
0f6f582c0d Add config to disable TransformerEncoder/MHA fastpath (#112212)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112212
Approved by: https://github.com/jbschlosser
2024-01-02 23:59:30 +00:00
angelayi
6b91e6907e Add setUserEnabledNNPACK config (#116152)
When exporting a model with a convolution kernel on CPU, if mkldnn is disabled and nnpack is enabled, export will go down the nnpack-optimized convolution kernel for certain shapes (code pointer: `aten/src/ATen/native/Convolution.cpp`, lines 542-552, at commit cd449e260c). This means that we will automatically create a guard on that shape. If users want to export without any restrictions, one option is to disable nnpack. However, no config function exists for this, so this PR adds one, similar to the `set_mkldnn_enabled` function.

Original context is in https://fb.workplace.com/groups/1075192433118967/posts/1349589822345892/?comment_id=1349597102345164&reply_comment_id=1349677642337110.

To test the flag, the following script runs successfully:
```
import os

import torch
from torchvision.models import ResNet18_Weights, resnet18

torch.set_float32_matmul_precision("high")

model = resnet18(weights=ResNet18_Weights.DEFAULT)
model.eval()

with torch.no_grad():
    # device = "cuda" if torch.cuda.is_available() else "cpu"
    torch.backends.mkldnn.set_flags(False)
    torch.backends.nnpack.set_flags(False)   # <--- Added config
    device = "cpu"
    model = model.to(device=device)
    example_inputs = (torch.randn(2, 3, 224, 224, device=device),)
    batch_dim = torch.export.Dim("batch", min=2, max=32)
    so_path = torch._export.aot_compile(
        model,
        example_inputs,
        # Specify the first dimension of the input x as dynamic
        dynamic_shapes={"x": {0: batch_dim}},
        # Specify the generated shared library path
        options={
            "aot_inductor.output_path": os.path.join(os.getcwd(), "resnet18_pt2.so"),
            "max_autotune": True,
        },
    )

```

I'm not sure who to add as reviewer, so please feel free to add whoever is relevant!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116152
Approved by: https://github.com/malfet
2023-12-27 06:00:16 +00:00
Aaron Gokaslan
6de28e92d2 [BE]: Apply FURB118 (prev): replaces unnecessary lambdas with operator. (#116027)
This replaces a bunch of unnecessary lambdas with the operator package. This is semantically equivalent, but the operator package is faster, and arguably more readable. When the FURB rules are taken out of preview, I will enable it as a ruff check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116027
Approved by: https://github.com/malfet
2023-12-20 19:35:08 +00:00
Aaron Gokaslan
ee5d981249 [BE]: Enable RUFF PERF402 and apply fixes (#115505)
* Enable PERF402. Makes code more efficient and succinct by removing useless list copies that could be accomplished either via a list constructor or extend call. All test cases have noqa added since performance is not as sensitive in that folder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115505
Approved by: https://github.com/malfet
2023-12-20 18:01:24 +00:00
Nikita Shulga
b706c4116d [MPS] Add MacOS 14 runtime check (#115512)
Prerequisite for adding more complex type support and FFT operation

Check using `conjugateWithTensor:name:` selector defined as follows
```objc
/// Returns the complex conjugate of the input tensor elements.
///
/// - Parameters:
///   - tensor: The input tensor.
///   - name: An optional string which serves as an identifier for the operation..
/// - Returns: A valid `MPSGraphTensor` object containing the elementwise result of the applied operation.
-(MPSGraphTensor *) conjugateWithTensor:(MPSGraphTensor *) tensor
                                   name:(NSString * _Nullable) name
MPS_AVAILABLE_STARTING(macos(14.0), ios(17.0), tvos(17.0))
MPS_SWIFT_NAME( conjugate(tensor:name:) );
```

- Rename `isOnMacOS13orNewer(unsigned minor)` hook to `isOnMacOSorNewer(major, minor)`
- Replace `torch._C.__mps_is_on_macos_13_or_newer` with `torch._C._mps_is_on_macos_or_newer`
- Add `torch.backends.mps.is_macos_or_newer` public API
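
The move from a macOS 13-only check to a `(major, minor)` check can be pictured with a small pure-Python sketch. This is not how PyTorch implements it (the commit describes a check based on selector availability); `version_at_least` is a hypothetical helper that only illustrates the comparison semantics the new signature implies:

```python
from typing import Tuple


def version_at_least(current: Tuple[int, int], major: int, minor: int = 0) -> bool:
    """Return True if `current` is the same as or newer than (major, minor).

    Hypothetical helper for illustration; tuples compare element by element,
    so (14, 2) >= (14, 0) and (14, 0) >= (13, 5) both hold.
    """
    return current >= (major, minor)


# Example: a machine reporting macOS 14.2 passes a 14.0 check and a 13.3 check,
# while a machine on 13.6 fails the 14.0 check.
on_14_2_meets_14_0 = version_at_least((14, 2), 14, 0)
on_14_2_meets_13_3 = version_at_least((14, 2), 13, 3)
on_13_6_meets_14_0 = version_at_least((13, 6), 14, 0)
```
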
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115512
Approved by: https://github.com/albanD
2023-12-11 21:11:42 +00:00
zabboud
7f9fafed53 Resolve docstring errors in throughput_benchmark.py, weak.py, _traceback.py, file_baton.py, _contextlib.py, _device.py, cpp_backtrace.py, bundled_inputs.py, run_cpu.py, hooks.py, mobile_optimizer.py, _freeze.py, __init__.py, mkldnn.py, dlpack.py (#113311)
Fixes #112633

Fixed pydocstyle errors in the following files. The remaining errors are not covered by this issue. `torch/utils/dlpack.py` was not modified, because its errors relate to the function signature on the first line of the docstring, which must be kept as is for Sphinx to interpret it correctly.

```python
def from_dlpack(ext_tensor: Any) -> 'torch.Tensor':
    """from_dlpack(ext_tensor) -> Tensor
         .....
    """
```

pydocstyle torch/utils/_contextlib.py --count
before: 4
after: 0

pydocstyle torch/backends/mps/__init__.py --count
before: 8
after: 1

**remaining errors**
```
torch/backends/mps/__init__.py:1 at module level:
        D104: Missing docstring in public package
```

pydocstyle torch/backends/xeon/run_cpu.py --count
before: 13
after: 1

**remaining errors**
```
torch/backends/xeon/run_cpu.py:864 in public function `main`:
        D103: Missing docstring in public function
```

pydocstyle torch/backends/cpu/__init__.py --count
before: 2
after: 1

**remaining errors**
```
torch/backends/cpu/__init__.py:1 at module level:
        D104: Missing docstring in public package
```

pydocstyle torch/utils/cpp_backtrace.py --count
before: 4
after: 1

**remaining errors**
```
torch/utils/cpp_backtrace.py:1 at module level:
        D100: Missing docstring in public module
```

pydocstyle torch/utils/bundled_inputs.py --count
before: 8
after: 1

**remaining errors**
```
torch/utils/bundled_inputs.py:1 at module level:
        D100: Missing docstring in public module
```

pydocstyle torch/utils/file_baton.py --count
before: 8
after: 1

**remaining errors**
```
torch/utils/file_baton.py:1 at module level:
        D100: Missing docstring in public module
```

pydocstyle torch/utils/mobile_optimizer.py --count
before: 6
after: 1

**remaining errors**
```
torch/utils/mobile_optimizer.py:8 in public class `LintCode`:
        D101: Missing docstring in public class
```

pydocstyle torch/backends/opt_einsum/__init__.py --count
before: 7
after: 5

**remaining errors**
```
torch/backends/opt_einsum/__init__.py:1 at module level:
        D104: Missing docstring in public package
torch/backends/opt_einsum/__init__.py:67 in public function `set_flags`:
        D103: Missing docstring in public function
torch/backends/opt_einsum/__init__.py:77 in public function `flags`:
        D103: Missing docstring in public function
torch/backends/opt_einsum/__init__.py:93 in public class `OptEinsumModule`:
        D101: Missing docstring in public class
torch/backends/opt_einsum/__init__.py:94 in public method `__init__`:
        D107: Missing docstring in __init__
```

pydocstyle torch/utils/_device.py --count
before: 9
after: 6

**remaining errors**
```
torch/utils/_device.py:58 in public class `DeviceContext`:
        D101: Missing docstring in public class
torch/utils/_device.py:59 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/_device.py:62 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/utils/_device.py:68 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/utils/_device.py:73 in public method `__torch_function__`:
        D105: Missing docstring in magic method
torch/utils/_device.py:80 in public function `device_decorator`:
        D103: Missing docstring in public function

```

pydocstyle torch/utils/_freeze.py --count
before: 15
after: 7

**remaining errors**
```
torch/utils/_freeze.py:77 in public function `indent_msg`:
        D103: Missing docstring in public function
torch/utils/_freeze.py:89 in public class `FrozenModule`:
        D101: Missing docstring in public class
torch/utils/_freeze.py:100 in public class `Freezer`:
        D101: Missing docstring in public class
torch/utils/_freeze.py:101 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/_freeze.py:106 in public method `msg`:
        D102: Missing docstring in public method
torch/utils/_freeze.py:185 in public method `get_module_qualname`:
        D102: Missing docstring in public method
torch/utils/_freeze.py:206 in public method `compile_string`:
        D102: Missing docstring in public method

```

pydocstyle torch/utils/throughput_benchmark.py --count
before: 25
after: 8

**remaining errors**
```
torch/utils/throughput_benchmark.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/throughput_benchmark.py:27 in public class `ExecutionStats`:
        D101: Missing docstring in public class
torch/utils/throughput_benchmark.py:28 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/throughput_benchmark.py:33 in public method `latency_avg_ms`:
        D102: Missing docstring in public method
torch/utils/throughput_benchmark.py:37 in public method `num_iters`:
        D102: Missing docstring in public method
torch/utils/throughput_benchmark.py:46 in public method `total_time_seconds`:
        D102: Missing docstring in public method
torch/utils/throughput_benchmark.py:50 in public method `__str__`:
        D105: Missing docstring in magic method
torch/utils/throughput_benchmark.py:94 in public method `__init__`:
        D107: Missing docstring in __init__

```

pydocstyle torch/utils/hooks.py --count
before: 14
after: 11
after: 11

**remaining errors**
```
torch/utils/hooks.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/hooks.py:23 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/hooks.py:34 in public method `remove`:
        D102: Missing docstring in public method
torch/utils/hooks.py:44 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/hooks.py:50 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/hooks.py:64 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/utils/hooks.py:67 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/utils/hooks.py:82 in public function `warn_if_has_hooks`:
        D103: Missing docstring in public function
torch/utils/hooks.py:103 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/hooks.py:188 in public method `setup_input_hook`:
        D102: Missing docstring in public method
torch/utils/hooks.py:197 in public method `setup_output_hook`:
        D102: Missing docstring in public method
```

pydocstyle torch/utils/_traceback.py --count
before: 19
after: 14

**remaining errors**
```
torch/utils/_traceback.py:47 in public function `report_compile_source_on_error`:
        D103: Missing docstring in public function
torch/utils/_traceback.py:160 in public class `CapturedTraceback`:
        D101: Missing docstring in public class
torch/utils/_traceback.py:163 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/_traceback.py:167 in public method `cleanup`:
        D102: Missing docstring in public method
torch/utils/_traceback.py:170 in public method `summary`:
        D102: Missing docstring in public method
torch/utils/_traceback.py:182 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/_traceback.py:190 in public method `extract`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/_traceback.py:190 in public method `extract`:
        D400: First line should end with a period (not 't')
torch/utils/_traceback.py:213 in public method `format`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/_traceback.py:213 in public method `format`:
        D400: First line should end with a period (not 'f')
torch/utils/_traceback.py:213 in public method `format`:
        D401: First line should be in imperative mood (perhaps 'Format', not 'Formats')
torch/utils/_traceback.py:224 in public method `format_all`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/utils/_traceback.py:247 in private function `_extract_symbolized_tb`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/_traceback.py:247 in private function `_extract_symbolized_tb`:
        D400: First line should end with a period (not 'f')
```
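
For reference, a docstring shaped as follows would satisfy the D205, D400 and D401 checks listed above. The function is a made-up example, not code from `torch/utils/_traceback.py`:

```python
def format_frames(frames):
    """Format captured frames as a list of strings.

    The summary line is in the imperative mood (D401), ends with a
    period (D400), and is followed by one blank line before this
    longer description (D205).
    """
    # Each frame is assumed to be a (filename, line number) pair.
    return [f"{name}:{lineno}" for name, lineno in frames]


doc_lines = format_frames.__doc__.splitlines()
```
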

pydocstyle torch/utils/mkldnn.py --count
before: 28
after: 26

**remaining errors**
```
torch/utils/mkldnn.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/mkldnn.py:4 in public class `MkldnnLinear`:
        D101: Missing docstring in public class
torch/utils/mkldnn.py:5 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/mkldnn.py:19 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:23 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:29 in public method `forward`:
        D102: Missing docstring in public method
torch/utils/mkldnn.py:75 in public class `MkldnnConv1d`:
        D101: Missing docstring in public class
torch/utils/mkldnn.py:76 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/mkldnn.py:82 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:88 in public class `MkldnnConv2d`:
        D101: Missing docstring in public class
torch/utils/mkldnn.py:89 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/mkldnn.py:100 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:110 in public class `MkldnnConv3d`:
        D101: Missing docstring in public class
torch/utils/mkldnn.py:111 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/mkldnn.py:122 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:133 in public class `MkldnnBatchNorm`:
        D101: Missing docstring in public class
torch/utils/mkldnn.py:136 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/mkldnn.py:155 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:163 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:171 in public method `forward`:
        D102: Missing docstring in public method
torch/utils/mkldnn.py:184 in public class `MkldnnPrelu`:
        D101: Missing docstring in public class
torch/utils/mkldnn.py:185 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/mkldnn.py:190 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:194 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:199 in public method `forward`:
        D102: Missing docstring in public method
torch/utils/mkldnn.py:205 in public function `to_mkldnn`:
        D103: Missing docstring in public function
```

pydocstyle torch/utils/weak.py --count
before: 32
after: 30

**remaining errors**
```
torch/utils/weak.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/weak.py:42 in public class `WeakIdRef`:
        D101: Missing docstring in public class
torch/utils/weak.py:45 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/weak.py:54 in public method `__call__`:
        D102: Missing docstring in public method
torch/utils/weak.py:61 in public method `__hash__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:64 in public method `__eq__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:84 in public class `WeakIdKeyDictionary`:
        D101: Missing docstring in public class
torch/utils/weak.py:87 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/weak.py:131 in public method `__delitem__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:135 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:138 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:145 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:148 in public method `__setitem__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:151 in public method `copy`:
        D102: Missing docstring in public method
torch/utils/weak.py:162 in public method `__deepcopy__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:172 in public method `get`:
        D102: Missing docstring in public method
torch/utils/weak.py:175 in public method `__contains__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:182 in public method `items`:
        D102: Missing docstring in public method
torch/utils/weak.py:189 in public method `keys`:
        D102: Missing docstring in public method
torch/utils/weak.py:198 in public method `values`:
        D102: Missing docstring in public method
torch/utils/weak.py:216 in public method `popitem`:
        D102: Missing docstring in public method
torch/utils/weak.py:224 in public method `pop`:
        D102: Missing docstring in public method
torch/utils/weak.py:228 in public method `setdefault`:
        D102: Missing docstring in public method
torch/utils/weak.py:231 in public method `update`:
        D102: Missing docstring in public method
torch/utils/weak.py:241 in public method `__ior__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:245 in public method `__or__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:252 in public method `__ror__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:262 in public method `__eq__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:276 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/weak.py:280 in public method `__call__`:
        D102: Missing docstring in public method

```

@mikaylagawarecki @jbschlosser @svekars
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113311
Approved by: https://github.com/ezyang
2023-11-15 17:40:04 +00:00
drisspg
9b0f2f8d94 expose sdpa helpers to python (#110496)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110496
Approved by: https://github.com/jbschlosser
2023-11-15 07:34:34 +00:00
Aaron Gokaslan
18d7b8e4f7 [BE]: ruff apply rule PLW1510 to find silent subprocess errors (#113644)
Reopens #111682, which I messed up with a bad rebase that also triggered some CLA issues. This explicitly adds `check=True` or `check=False` to subprocess calls where appropriate.
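
A small sketch of why the explicit argument matters, using a throwaway child process that exits with status 3 (the command is purely illustrative):

```python
import subprocess
import sys

# A child process that always fails with exit status 3.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]

# With check=False the failure is only visible through returncode,
# so it is easy to miss if the caller never inspects it.
result = subprocess.run(failing_cmd, check=False)
silent_returncode = result.returncode

# With check=True a non-zero exit status raises CalledProcessError.
try:
    subprocess.run(failing_cmd, check=True)
    raised = False
except subprocess.CalledProcessError as exc:
    raised = True
    raised_returncode = exc.returncode
```

Writing `check=` out in every call makes the intended failure handling visible at the call site, which is what the rule enforces.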

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113644
Approved by: https://github.com/ezyang, https://github.com/kit1980
2023-11-14 20:59:40 +00:00