pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Jane Xu	cfe970260a	Clarify opt-einsum usage, fix #127109 (#137596 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137596 Approved by: https://github.com/albanD	2024-10-09 20:31:24 +00:00
Jesse Cai	bc21689136	[sparse][semi-structured] Add float8 dtype support to 24 sparsity (#136397 ) Summary: This PR adds `torch.float8e4m3fn` support to cuSPARSELt and `to_sparse_semi_structured`. This will let users to run fp8 + 2:4 sparse matmuls on Hopper GPUs with cusparselt >= 0.6.2, via to `scaled_mm` API. ``` A = rand_sparse_semi_structured_mask(256, 128, dtype=torch.float16) B = torch.rand(dense_input_shape, device=device).to(torch.float16).t() A_fp8, A_scale = to_float8(A) B_fp8, B_scale = to_float8(B) dense_result = torch._scaled_mm( A_fp8, B_fp8, scale_a=A_scale, scale_b=B_scale, out_dtype=out_dtype ) A_fp8_sparse = to_sparse_semi_structured(A_fp8) sparse_result = torch._scaled_mm( A_fp8_sparse, B_fp8, scale_a=A_scale, scale_b=B_scale, out_dtype=out_dtype ) ``` Note that to keep this consistent with normal torch behavior, calling `torch.mm(A_fp8_sparse, B_fp8)` will raise a NotImplementedError. I also turned on cuSPARSELt by default and added CUSPARSELT_MAX_ID to the backend to make the tests a bit cleaner Test Plan: ``` python test/test_sparse_semi_structured -k scaled_mm python test/test_sparse_semi_structured -k fp8 ``` Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/136397 Approved by: https://github.com/drisspg	2024-09-27 21:37:34 +00:00
Jianyu Huang	0a35986cdb	Add option to configure reduced precision math backend for SDPA (#135964 ) Summary: Address https://github.com/pytorch/pytorch/issues/135778 by adding a global flag to configure whether using high precision or low precision for math backend of SDPA. Test Plan: buck2 run mode/opt //scripts/feikou/llm:run_attn_kernels Differential Revision: D62625515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135964 Approved by: https://github.com/jbschlosser	2024-09-24 07:11:38 +00:00
maajidkhann	5a6ddbcc3b	Extending the Pytorch vec backend for SVE (ARM) (#119571 ) Motivation: In Pytorch, Aten vectorization supports multiple platforms, including x86 and Arm, as well as multiple data types. It provides a generic implementation of Vector (Vec) type that allows the programmer to write code packing various primitives (such as floats) within 256bit & 512bits registers. It can be extended to support other ISAs easily by adding more VecISA sub-classes. Reference Link: https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/cpu/vec This PR: * Our goal with this contribution is to add support for SVE backend for Vec in the Aten vectorization for CPU backend which can be benefitted by any ARM architecture supported CPU's that supports SVE. * More about SVE ISA for ARM: [https://developer.arm.com/Architectures/Scalable Vector Extensions](https://developer.arm.com/Architectures/Scalable%20Vector%20Extensions) * We are using the ARM C Language Extensions for SVE (https://developer.arm.com/documentation/102699/0100/Optimizing-with-intrinsics ) to accelerate performance for various operators in the SVE backend for Vec. * Currently we are adding support only for SVE ISA with the vector length of 256 bits (SVE 256). In future, we plan to extend this SVE support for other vector lengths as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119571 Approved by: https://github.com/malfet, https://github.com/snadampal Co-authored-by: Divya Kotadiya <divya.kotadiya@fujitsu.com>	2024-09-18 18:59:10 +00:00
Jesse Cai	255cd75a97	[sparse] Add cuSPARSELt as a backend (#128534 ) Summary: This PR adds in cuSPARSELt as a backend to PyTorch. It is now possible to see if cuSPARSELt is available and the version if it is with ``` torch.backends.cusparselt.is_available() torch.backends.cusparselt.version() ``` Test Plan: ``` python test/test_sparse_semi_structured.py -k test_cusparselt_backend ``` Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/128534 Approved by: https://github.com/cpuhrsch, https://github.com/eqy, https://github.com/syed-ahmed	2024-08-21 22:06:07 +00:00
Xuehai Pan	758a0a88a2	[BE][Easy] enable `ruff` rule `PIE790`: unnecessary `pass` statement (#133200 ) This PR removes unnecessary `pass` statement. This is semanticly safe because the bytecode for the Python code does not change. Note that if there is a docstring in the function, a empty function does not need a `pass` statement as placeholder. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133200 Approved by: https://github.com/malfet, https://github.com/eqy, https://github.com/kit1980	2024-08-15 15:50:19 +00:00
Xuehai Pan	f3fce597e9	[BE][Easy][17/19] enforce style for empty lines in import segments in `torch/[a-c]/` and `torch/[e-n]/` (#129769 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129769 Approved by: https://github.com/ezyang	2024-08-04 10:24:09 +00:00
Oguz Ulgen	72d2dba992	Add None return type to init (#132335 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132335 Approved by: https://github.com/albanD	2024-08-01 15:26:45 +00:00
Luca Wehrstedt	f4f7aba75d	Expose function to probe whether PyTorch was built with FlashAttention (#131894 ) This is needed by downstream projects (e.g., xFormers) to determine whether they can count on FlashAttention in PyTorch or whether they need to build it themselves. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131894 Approved by: https://github.com/drisspg, https://github.com/eqy	2024-07-31 11:33:09 +00:00
PyTorch MergeBot	609447a626	Revert "[BE] typing for decorators - _jit_internal (#131573 )" This reverts commit `f0f20f7e97`. Reverted https://github.com/pytorch/pytorch/pull/131573 on behalf of https://github.com/clee2000 due to breaking lint internally D60265575 ([comment](https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359))	2024-07-28 03:29:32 +00:00
Aaron Orenstein	f0f20f7e97	[BE] typing for decorators - _jit_internal (#131573 ) See #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131573 Approved by: https://github.com/oulgen, https://github.com/zou3519 ghstack dependencies: #131568, #131569, #131570, #131571, #131572	2024-07-25 22:24:19 +00:00
Aaron Orenstein	5a0068cc69	[BE] mypy: disallow untyped decorators (#131428 ) Untyped decorators strip the types from their decorated function so even if the underlying function is fully typed then callers to it don't get any benefit from type annotations. Step 1 - Enable the error and override in all the offending files. #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428 Approved by: https://github.com/justinchuby, https://github.com/oulgen	2024-07-23 21:50:55 +00:00
eqy	f845a7a91a	[cuDNN][SDPA] Remove `TORCH_CUDNN_SDPA_ENABLED=1`, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343 ) Looks like one of the first failures seen is `test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` when `test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` passes. What seems interesting here is that the `torch.compile` version fails while the eager version passes. Not sure what the difference would be here... Nevertheless, is there a recommended mechanism to skip cuDNN SDPA as a backend for this test? CC @drisspg Pull Request resolved: https://github.com/pytorch/pytorch/pull/125343 Approved by: https://github.com/Skylion007	2024-06-30 19:22:16 +00:00
PyTorch MergeBot	999eec8dea	Revert "[cuDNN][SDPA] Remove `TORCH_CUDNN_SDPA_ENABLED=1`, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343 )" This reverts commit `b7e7a4cb01`. Reverted https://github.com/pytorch/pytorch/pull/125343 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break some test_transformer running on internal A100 and V100 ([comment](https://github.com/pytorch/pytorch/pull/125343#issuecomment-2196202003))	2024-06-28 06:03:54 +00:00
Eddie Yan	b7e7a4cb01	[cuDNN][SDPA] Remove `TORCH_CUDNN_SDPA_ENABLED=1`, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343 ) Looks like one of the first failures seen is `test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` when `test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` passes. What seems interesting here is that the `torch.compile` version fails while the eager version passes. Not sure what the difference would be here... Nevertheless, is there a recommended mechanism to skip cuDNN SDPA as a backend for this test? CC @drisspg Pull Request resolved: https://github.com/pytorch/pytorch/pull/125343 Approved by: https://github.com/Skylion007	2024-06-26 00:49:18 +00:00
Deng Weishi	b542825066	Enable deterministic support for oneDNN (#127277 ) This PR is a part of RFC https://github.com/pytorch/pytorch/issues/114848. For the request for Torchbenchmark models, this PR enables the deterministic attribute for the oneDNN operators for XPU backends, like convolution, deconvolution and matmult. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127277 Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/desertfire, https://github.com/gujinghui	2024-06-21 05:21:24 +00:00
PyTorch MergeBot	817ce6835b	Revert "[cuDNN][SDPA] Remove `TORCH_CUDNN_SDPA_ENABLED=1`, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343 )" This reverts commit `4c971932e8`. Reverted https://github.com/pytorch/pytorch/pull/125343 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/125343#issuecomment-2163690162))	2024-06-12 18:47:52 +00:00
eqy	4c971932e8	[cuDNN][SDPA] Remove `TORCH_CUDNN_SDPA_ENABLED=1`, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343 ) Looks like one of the first failures seen is `test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` when `test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` passes. What seems interesting here is that the `torch.compile` version fails while the eager version passes. Not sure what the difference would be here... Nevertheless, is there a recommended mechanism to skip cuDNN SDPA as a backend for this test? CC @drisspg Pull Request resolved: https://github.com/pytorch/pytorch/pull/125343 Approved by: https://github.com/Skylion007	2024-06-09 06:53:34 +00:00
Aaron Orenstein	62bcdc0ac9	Flip default value for mypy disallow_untyped_defs [4/11] (#127841 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127841 Approved by: https://github.com/oulgen	2024-06-08 18:36:48 +00:00
Xuehai Pan	67ef2683d9	[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#127689 ) Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing. Note that only warnings that their messages contain `[Dd]eprecat(ed\|ion)` are updated in this PR. Resolves #126888 - #126888 This PR is split from PR #126898. - #126898 ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689 Approved by: https://github.com/Skylion007	2024-06-02 12:30:43 +00:00
PyTorch MergeBot	033e733021	Revert "[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#126898 )" This reverts commit `749a132fb0`. Reverted https://github.com/pytorch/pytorch/pull/126898 on behalf of https://github.com/fbgheith due to switching typing-extensions=4.3.0 to 4.9.0 causes internal failure ([comment](https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456))	2024-05-31 19:47:24 +00:00
Xuehai Pan	749a132fb0	[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#126898 ) Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing. Note that only warnings that their messages contain `[Dd]eprecat(ed\|ion)` are updated in this PR. UPDATE: Use `FutureWarning` instead of `DeprecationWarning`. Resolves #126888 - #126888 Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898 Approved by: https://github.com/albanD	2024-05-29 12:09:27 +00:00
albanD	af9acc4168	Fix public binding to actually traverse modules (#126103 ) The current call passes in `['/actual/path']` to os.walk which is a string pointing to no path and thus silently leads to and empty traversal. There is an unused function just above that handles that, so I guess this is what was supposed to be called. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126103 Approved by: https://github.com/suo	2024-05-15 19:36:03 +00:00
Aaron Gokaslan	34910f87f0	[BE]: Update ruff to v0.4.4 (#125031 ) Update ruff version to 0.4.2. This version mostly has bugfixes for the new parser and also updates the f-string rule to be able to apply more fixes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125031 Approved by: https://github.com/albanD, https://github.com/malfet	2024-05-12 20:02:37 +00:00
Guokai Ma	4dd33a1c2b	Better core binding in torch.backends.xeon.run_cpu when launced from torchrun with --nproc-per-node (#123711 ) This PR fix `torch.backends.xeon.run_cpu` behavior when it is launched from `torchrun` with `--nproc-per-node` parameter. As a CPU launcher, `run_cpu` would bind cores to each instance it launches using `numactl`, and assign cores to each instance evenly. However, if we use `torchrun` to start `run_cpu` and use `--nproc-per-node` to create multiple `run_cpu` processes. In this case, each `run_cpu` process would assume it can use all the CPU cores, which causes each `run_cpu` process compete for CPU cores. This results in poor performance. This PR recognize environment variable `LOCAL_WORLD_SIZE` and `LOCAL_RANK` set by `torchrun`, then use this information to further shard the cores bind to each instance. With this PR, when launched by `torchrun --nproc-per-node ...`, different CPU cores will be bind to different workers, which maximize CPU utilization and application performance. The specific use case this PR enabled is using TorchServe with DeepSpeed tensor parallel. In this case, TorchServe would run `torchrun --nproc-per-node <tp_size>` to start tensor parallel workers it needed. When run TorchServe on multisocket CPU server with DeepSpeed tensor parallel, we need this PR to achieve best performance. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123711 Approved by: https://github.com/jingxu10, https://github.com/ezyang	2024-05-09 00:32:11 +00:00
Aaron Gokaslan	8cad88e1f3	[BE]: Improve exception typing. Remove NOQAs (#125535 ) Improve some exception typing Pull Request resolved: https://github.com/pytorch/pytorch/pull/125535 Approved by: https://github.com/albanD	2024-05-08 14:07:13 +00:00
Jeff Daily	6ede882c0b	preferred blas library; cublaslt gemm implementation (#122106 ) Following the example of PyTorch supporting a preferred Linalg library (cusolver or magma), this PR introduces a preferred blas library selector of either cublas or cublaslt for CUDA and hipblas or hipblaslt for ROCm via normal hipification of sources. The default blas implementation remains cublas or hipblas. cublaslt or hipblaslt can be enabled using environment variable TORCH_BLAS_PREFER_CUBLASLT=1 (or TORCH_BLAS_PREFER_HIPBLASLT=1 as an alias) or by calling `torch.backends.cuda.preferred_blas_library(backend="cublaslt")` or as an alias `backend="hipblaslt"`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122106 Approved by: https://github.com/lezcano	2024-04-22 15:38:22 +00:00
Aaron Gokaslan	c5fafe9f48	[BE]: TRY002 - Ban raising vanilla exceptions (#124570 ) Adds a ruff lint rule to ban raising raw exceptions. Most of these should at the very least be runtime exception, value errors, type errors or some other errors. There are hundreds of instance of these bad exception types already in the codebase, so I have noqa'd most of them. Hopefully this error code will get commiters to rethink what exception type they should raise when they submit a PR. I also encourage people to gradually go and fix all the existing noqas that have been added so they can be removed overtime and our exception typing can be improved. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124570 Approved by: https://github.com/ezyang	2024-04-21 22:26:40 +00:00
Aaron Gokaslan	1d6c5972c1	[BE]: Optimize min/max/sum comprehensions C419 (#123960 ) Automatic fixes that replaces certain list comprehensions with generator ones where appropriate so that they are immediately consumed. This is preview functionality in ruff for rule C419 and it was automatically applied. Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/123960 Approved by: https://github.com/malfet	2024-04-12 23:54:15 +00:00
Kurman Karabukaev	67d3e4f2a2	[TorchElastic] Refactoring to support non-default logging strategy (#120691 ) Summary: Pulling out logging parameters into a logging specs that can be overridden (follow-up changes on possible mechanism) Why? Right now the logging approach is quite rigid: - Requires for log directory to exist and not be empty - Will create tempdir otherwise, - Creates subdir for a run - creates subdir for each attempt - creates files named as stdout.log, stderr.log, error.json In some instances some of the users would like to customize the behavior including file names based on context. And we do have right now a mechanism to template multiplexed teed output prefix. With current changes, users can create custom log spec that can use env variables to change the behavior. Notes: Made `LaunchConf.logs_specs` as an optional field that will be bound to `DefaultLogsSpecs` instance. There are large number of clients (code) that use the API directly without using torchrun API. For those cases, we have to explicitly pass LogSpecs implementation if we would like to override the implementation. For the regular torchrun users, we can use pluggable approach proposed in the follow up change. Test Plan: CI + unit tests Differential Revision: D54176265 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120691 Approved by: https://github.com/ezyang	2024-02-29 20:59:17 +00:00
Eddie Yan	cd380c794f	[CUDNN][SDPA] Experimental cuDNN Flash Attention v2 Inference (#115663 ) #113713 Going to clean up some of the checks and will remove draft status after. Can be tested on SM80+ with `TORCH_CUDNN_MHA_ENABLED=1`. CC @drisspg @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/115663 Approved by: https://github.com/drisspg	2024-02-14 22:02:06 +00:00
Nikita Shulga	eb9a3383c2	[MPS] Add naive std_mean implementation (#119777 ) By just calling `std_mps` and `mean` in sequence Move `var_mean` decomp to `ReduceOps.mm`, as it should be faster to skip dispatching to a Python, which one can validate by running the following script: ```python from timeit import default_timer import torch from torch.utils.benchmark import Measurement, Timer def bench_var_mean( m, n, k, dtype = torch.float32, device:str = "cpu", ) -> Measurement: setup = f""" x = torch.rand({m}, {n}, {k}, dtype={dtype}, device="{device}") """ t = Timer( stmt="torch.var_mean(x, dim=1)", setup=setup, language="python", timer=default_timer ) return t.blocked_autorange() for x in [100, 1000]: rc = bench_var_mean(1000, x, 100, device="mps") print(f"{x:5} : {rc.mean*1e6:.2f} usec") ``` which before the change reports 681 and 1268 usec and after 668 and 684 (which probably means that GPU is not saturated, but overhead from switching between native and interpretable runtimes are shorter. Fixes https://github.com/pytorch/pytorch/issues/119663 TODOs: - Refactor the codebase and implement proper composite function (that must be faster) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119777 Approved by: https://github.com/albanD	2024-02-13 21:51:29 +00:00
Catherine Lee	4f5785b6b3	Enable possibly-undefined error code (#118533 ) Fixes https://github.com/pytorch/pytorch/issues/118129 Suppressions automatically added with ``` import re with open("error_file.txt", "r") as f: errors = f.readlines() error_lines = {} for error in errors: match = re.match(r"(.):(\d+):\d+: error:.\[(.*)\]", error) if match: file_path, line_number, error_type = match.groups() if file_path not in error_lines: error_lines[file_path] = {} error_lines[file_path][int(line_number)] = error_type for file_path, lines in error_lines.items(): with open(file_path, "r") as f: code = f.readlines() for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True): code[line_number - 1] = code[line_number - 1].rstrip() + f" # type: ignore[{error_type}]\n" with open(file_path, "w") as f: f.writelines(code) ``` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Co-authored-by: Catherine Lee <csl@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533 Approved by: https://github.com/Skylion007, https://github.com/zou3519	2024-01-30 21:07:01 +00:00
PyTorch MergeBot	40ece2e579	Revert "Enable possibly-undefined error code (#118533 )" This reverts commit `4f13f69a45`. Reverted https://github.com/pytorch/pytorch/pull/118533 on behalf of https://github.com/clee2000 due to sorry i'm trying to figure out a codev merge conflict, if this works i'll be back to rebase and merge ([comment](https://github.com/pytorch/pytorch/pull/118533#issuecomment-1917695185))	2024-01-30 19:00:34 +00:00
Edward Z. Yang	4f13f69a45	Enable possibly-undefined error code (#118533 ) Fixes https://github.com/pytorch/pytorch/issues/118129 Suppressions automatically added with ``` import re with open("error_file.txt", "r") as f: errors = f.readlines() error_lines = {} for error in errors: match = re.match(r"(.):(\d+):\d+: error:.\[(.*)\]", error) if match: file_path, line_number, error_type = match.groups() if file_path not in error_lines: error_lines[file_path] = {} error_lines[file_path][int(line_number)] = error_type for file_path, lines in error_lines.items(): with open(file_path, "r") as f: code = f.readlines() for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True): code[line_number - 1] = code[line_number - 1].rstrip() + f" # type: ignore[{error_type}]\n" with open(file_path, "w") as f: f.writelines(code) ``` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533 Approved by: https://github.com/Skylion007, https://github.com/zou3519	2024-01-30 05:08:10 +00:00
Driss Guessous	b1f8b6b8fc	Forward Fix accidental removal of import (#118572 ) Summary: This Diff is a forward fix for this PR: https://github.com/pytorch/pytorch/pull/114689 Where I accidentally removed the old import from backends/cuda. Test Plan: Verrified on failing revert diff and it did indeed fix the issue Reviewed By: DanilBaibak Differential Revision: D53193454 Pull Request resolved: https://github.com/pytorch/pytorch/pull/118572 Approved by: https://github.com/DanilBaibak	2024-01-30 02:07:19 +00:00
Edward Z. Yang	46712b019d	Enable local_partial_types (#118467 ) When using dmypy, this setting is enabled and cannot be turned off. Force it for regular mypy too. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118467 Approved by: https://github.com/Skylion007 ghstack dependencies: #118414, #118418, #118432	2024-01-28 13:38:22 +00:00
Colin Peppler	1f6aa4b336	[mypy] Enable follow_imports = normal for mypy-torch.backends.* (#116311 ) Summary: Test Plan: ``` lintrunner --take MYPYINDUCTOR --all-files ok No lint issues. lintrunner -a ok No lint issues. Successfully applied all patches. ``` Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/116311 Approved by: https://github.com/int3	2024-01-25 20:17:22 +00:00
drisspg	4e29f01bf2	Remove sdp_kernel and replace with sdpa_kernel in attention namespace (#114689 ) # Summary Simplification of Backend Selection This PR deprecates the `torch.backends/cuda/sdp_kernel` context manager and replaces it with a new context manager `torch.nn.attention.sdpa_kernel`. This context manager also changes the api for this context manager. For `sdp_kernel` one would specify the backend choice by taking the negation of what kernel they would like to run. The purpose of this backend manager was to only to be a debugging tool, "turn off the math backend" and see if you can run one of the fused implementations. Problems: - This pattern makes sense if majority of users don't care to know anything about the backends that can be run. However, if users are seeking to use this context manager then they are explicitly trying to run a specific backend. - This is not scalable. We are working on adding the cudnn backend and this API makes it so so that more implementations will need to be turned off if user wants to explicitly run a given backend. - Discoverability of the current context manager. It is somewhat un-intutive that this backend manager is in backends/cuda/init when this now also controls the CPU fused kernel behavior. I think centralizing to attention namespace will be helpful. Other concerns: - Typically backends (kernels) for operators are entirely hidden from users and implementation details of the framework. We have exposed this to users already, albeit not by default and with beta warnings. Does making backends choices even more explicit lead to problems when we potentially want to remove existing backends, (perhaps inputs shapes will get covered by newer backends). A nice side effect is now that we aren't using the `BACKEND_MAP` in test_transformers many, many dynamo failures are passing for CPU tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/114689 Approved by: https://github.com/cpuhrsch	2024-01-24 22:28:04 +00:00
Michael Lazos	c51a4e64c0	Add support for compiling SDPAParams (#117207 ) Allows us to `allow_in_graph` this `torch._C` struct for supporting scaled dot product attention. helps unblock https://github.com/pytorch/pytorch/pull/116071 Pull Request resolved: https://github.com/pytorch/pytorch/pull/117207 Approved by: https://github.com/voznesenskym	2024-01-19 05:51:15 +00:00
PyTorch MergeBot	2f84a9d37c	Revert "[CUDNN][SDPA] Experimental cuDNN Flash Attention v2 Inference (#115663 )" This reverts commit `5aa92b5090`. Reverted https://github.com/pytorch/pytorch/pull/115663 on behalf of https://github.com/PaliC due to Unfortunately, this pr breaks cuda builds internally ([comment](https://github.com/pytorch/pytorch/pull/115663#issuecomment-1899388813))	2024-01-18 23:40:30 +00:00
Eddie Yan	5aa92b5090	[CUDNN][SDPA] Experimental cuDNN Flash Attention v2 Inference (#115663 ) #113713 Going to clean up some of the checks and will remove draft status after. Can be tested on SM80+ with `TORCH_CUDNN_MHA_ENABLED=1`. CC @drisspg @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/115663 Approved by: https://github.com/drisspg	2024-01-18 01:20:36 +00:00
Mikayla Gawarecki	0f6f582c0d	Add config to disable TransformerEncoder/MHA fastpath (#112212 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/112212 Approved by: https://github.com/jbschlosser	2024-01-02 23:59:30 +00:00
angelayi	6b91e6907e	Add setUserEnabledNNPACK config (#116152 ) When exporting a model with a convolution kernel on cpu, if mkldnn is disabled and nnpack is enabled, export will go down the nnpack optimized convolution kernel for certain shapes ((code pointer)[`cd449e260c/aten/src/ATen/native/Convolution.cpp (L542-L552)`]). This means that we will automatically create a guard on that certain shape. If users want to export without any restrictions, one option is to disable nnpack. However, no config function exists for this, so this PR is adding a config function, similar to the `set_mkldnn_enabled` function. Original context is in https://fb.workplace.com/groups/1075192433118967/posts/1349589822345892/?comment_id=1349597102345164&reply_comment_id=1349677642337110. To test the flag, the following script runs successfully: ``` import os import torch from torchvision.models import ResNet18_Weights, resnet18 torch.set_float32_matmul_precision("high") model = resnet18(weights=ResNet18_Weights.DEFAULT) model.eval() with torch.no_grad(): # device = "cuda" if torch.cuda.is_available() else "cpu" torch.backends.mkldnn.set_flags(False) torch.backends.nnpack.set_flags(False) # <--- Added config device = "cpu" model = model.to(device=device) example_inputs = (torch.randn(2, 3, 224, 224, device=device),) batch_dim = torch.export.Dim("batch", min=2, max=32) so_path = torch._export.aot_compile( model, example_inputs, # Specify the first dimension of the input x as dynamic dynamic_shapes={"x": {0: batch_dim}}, # Specify the generated shared library path options={ "aot_inductor.output_path": os.path.join(os.getcwd(), "resnet18_pt2.so"), "max_autotune": True, }, ) ``` I'm not sure who to add as reviewer, so please feel free to add whoever is relevant! Pull Request resolved: https://github.com/pytorch/pytorch/pull/116152 Approved by: https://github.com/malfet	2023-12-27 06:00:16 +00:00
Aaron Gokaslan	6de28e92d2	[BE]: Apply FURB118 (prev): replaces unnecessary lambdas with operator. (#116027 ) This replaces a bunch of unnecessary lambdas with the operator package. This is semantically equivalent, but the operator package is faster, and arguably more readable. When the FURB rules are taken out of preview, I will enable it as a ruff check. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116027 Approved by: https://github.com/malfet	2023-12-20 19:35:08 +00:00
Aaron Gokaslan	ee5d981249	[BE]: Enable RUFF PERF402 and apply fixes (#115505 ) * Enable PERF402. Makes code more efficient and succinct by removing useless list copies that could be accomplished either via a list constructor or extend call. All test cases have noqa added since performance is not as sensitive in that folder. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115505 Approved by: https://github.com/malfet	2023-12-20 18:01:24 +00:00
Nikita Shulga	b706c4116d	[MPS] Add MacOS 14 runtime check (#115512 ) Prerequisite for adding more complex type support and FFT operation Check using `conjugateWithTensor:name:` selector defined as follows ```objc /// Returns the complex conjugate of the input tensor elements. /// /// - Parameters: /// - tensor: The input tensor. /// - name: An optional string which serves as an identifier for the operation.. /// - Returns: A valid `MPSGraphTensor` object containing the elementwise result of the applied operation. -(MPSGraphTensor ) conjugateWithTensor:(MPSGraphTensor ) tensor name:(NSString * _Nullable) name MPS_AVAILABLE_STARTING(macos(14.0), ios(17.0), tvos(17.0)) MPS_SWIFT_NAME( conjugate(tensor:name:) ); ``` - Rename `isOnMacOS13orNewer(unsigned minor)` hook to `isOnMacOSorNewer(major, minor)` - Replace `torch._C.__mps_is_on_macos_13_or_newer` with `torch._C._mps_is_on_macos_or_newer` - Add `torch.backends.mps.is_macos_or_newer` public API Pull Request resolved: https://github.com/pytorch/pytorch/pull/115512 Approved by: https://github.com/albanD	2023-12-11 21:11:42 +00:00
zabboud	7f9fafed53	Resolve docstring errors in throughput_benchmark.py, weak.py, _traceback.py, file_baton.py, _contextlib.py, _device.py, cpp_backtrace.py, bundled_inputs.py, run_cpu.py, hooks.py, mobile_optimizer.py, _freeze.py, __init__.py, mkldnn.py, dlpack.py (#113311 ) Fixes #112633 Fixed errors relating to pydocstyle in the following files. The remaining errors are not covered in this issue. `torch/utils/dlpack.py` was not modified as the errors are relating to the function signature in the first line in the docstring which must be maintained as is for proper Sphinx interpretation. ```python def from_dlpack(ext_tensor: Any) -> 'torch.Tensor': """from_dlpack(ext_tensor) -> Tensor ..... """ ``` pydocstyle torch/utils/_contextlib.py --count before: 4 after: 0 pydocstyle torch/backends/mps/__init__.py --count before: 8 after: 1 remaining errors ``` torch/backends/mps/__init__.py:1 at module level: D104: Missing docstring in public package ``` pydocstyle torch/backends/xeon/run_cpu.py --count before: 13 after: 1 remaining errors ``` torch/backends/xeon/run_cpu.py:864 in public function `main`: D103: Missing docstring in public function ``` pydocstyle torch/backends/cpu/__init__.py --count before: 2 after: 1 remaining errors ``` torch/backends/cpu/__init__.py:1 at module level: D104: Missing docstring in public package ``` pydocstyle torch/utils/cpp_backtrace.py --count before: 4 after: 1 remaining errors ``` torch/utils/cpp_backtrace.py:1 at module level: D100: Missing docstring in public module ``` pydocstyle torch/utils/bundled_inputs.py --count before: 8 after: 1 remaining errors ``` torch/utils/bundled_inputs.py:1 at module level: D100: Missing docstring in public module ``` pydocstyle torch/utils/file_baton.py --count before: 8 after: 1 remaining errors ``` torch/utils/file_baton.py:1 at module level: D100: Missing docstring in public module ``` pydocstyle torch/utils/mobile_optimizer.py --count before: 6 after: 1 remaining errors ``` torch/utils/mobile_optimizer.py:8 in public class `LintCode`: D101: Missing docstring in public class ``` pydocstyle torch/backends/opt_einsum/__init__.py --count before: 7 after: 5 remaining errors ``` torch/backends/opt_einsum/__init__.py:1 at module level: D104: Missing docstring in public package torch/backends/opt_einsum/__init__.py:67 in public function `set_flags`: D103: Missing docstring in public function torch/backends/opt_einsum/__init__.py:77 in public function `flags`: D103: Missing docstring in public function torch/backends/opt_einsum/__init__.py:93 in public class `OptEinsumModule`: D101: Missing docstring in public class torch/backends/opt_einsum/__init__.py:94 in public method `__init__`: D107: Missing docstring in __init__ ``` pydocstyle torch/utils/_device.py --count before: 9 after: 6 remaining errors ``` torch/utils/_device.py:58 in public class `DeviceContext`: D101: Missing docstring in public class torch/utils/_device.py:59 in public method `__init__`: D107: Missing docstring in __init__ torch/utils/_device.py:62 in public method `__enter__`: D105: Missing docstring in magic method torch/utils/_device.py:68 in public method `__exit__`: D105: Missing docstring in magic method torch/utils/_device.py:73 in public method `__torch_function__`: D105: Missing docstring in magic method torch/utils/_device.py:80 in public function `device_decorator`: D103: Missing docstring in public function ``` pydocstyle torch/utils/_freeze.py --count before: 15 after: 7 remaining errors ``` torch/utils/_freeze.py:77 in public function `indent_msg`: D103: Missing docstring in public function torch/utils/_freeze.py:89 in public class `FrozenModule`: D101: Missing docstring in public class torch/utils/_freeze.py:100 in public class `Freezer`: D101: Missing docstring in public class torch/utils/_freeze.py:101 in public method `__init__`: D107: Missing docstring in __init__ torch/utils/_freeze.py:106 in public method `msg`: D102: Missing docstring in public method torch/utils/_freeze.py:185 in public method `get_module_qualname`: D102: Missing docstring in public method torch/utils/_freeze.py:206 in public method `compile_string`: D102: Missing docstring in public method ``` pydocstyle torch/utils/throughput_benchmark.py --count before: 25 after: 8 remaining errors ``` torch/utils/throughput_benchmark.py:1 at module level: D100: Missing docstring in public module torch/utils/throughput_benchmark.py:27 in public class `ExecutionStats`: D101: Missing docstring in public class torch/utils/throughput_benchmark.py:28 in public method `__init__`: D107: Missing docstring in __init__ torch/utils/throughput_benchmark.py:33 in public method `latency_avg_ms`: D102: Missing docstring in public method torch/utils/throughput_benchmark.py:37 in public method `num_iters`: D102: Missing docstring in public method torch/utils/throughput_benchmark.py:46 in public method `total_time_seconds`: D102: Missing docstring in public method torch/utils/throughput_benchmark.py:50 in public method `__str__`: D105: Missing docstring in magic method torch/utils/throughput_benchmark.py:94 in public method `__init__`: D107: Missing docstring in __init__ ``` pydocstyle torch/utils/hooks.py --count before: 14 after: 11 remaining errors ``` torch/utils/hooks.py:1 at module level: D100: Missing docstring in public module torch/utils/hooks.py:23 in public method `__init__`: D107: Missing docstring in __init__ torch/utils/hooks.py:34 in public method `remove`: D102: Missing docstring in public method torch/utils/hooks.py:44 in public method `__getstate__`: D105: Missing docstring in magic method torch/utils/hooks.py:50 in public method `__setstate__`: D105: Missing docstring in magic method torch/utils/hooks.py:64 in public method `__enter__`: D105: Missing docstring in magic method torch/utils/hooks.py:67 in public method `__exit__`: D105: Missing docstring in magic method torch/utils/hooks.py:82 in public function `warn_if_has_hooks`: D103: Missing docstring in public function torch/utils/hooks.py:103 in public method `__init__`: D107: Missing docstring in __init__ torch/utils/hooks.py:188 in public method `setup_input_hook`: D102: Missing docstring in public method torch/utils/hooks.py:197 in public method `setup_output_hook`: D102: Missing docstring in public method ``` pydocstyle torch/utils/_traceback.py --count before: 19 after: 14 remaining errors ``` torch/utils/_traceback.py:47 in public function `report_compile_source_on_error`: D103: Missing docstring in public function torch/utils/_traceback.py:160 in public class `CapturedTraceback`: D101: Missing docstring in public class torch/utils/_traceback.py:163 in public method `__init__`: D107: Missing docstring in __init__ torch/utils/_traceback.py:167 in public method `cleanup`: D102: Missing docstring in public method torch/utils/_traceback.py:170 in public method `summary`: D102: Missing docstring in public method torch/utils/_traceback.py:182 in public method `__getstate__`: D105: Missing docstring in magic method torch/utils/_traceback.py:190 in public method `extract`: D205: 1 blank line required between summary line and description (found 0) torch/utils/_traceback.py:190 in public method `extract`: D400: First line should end with a period (not 't') torch/utils/_traceback.py:213 in public method `format`: D205: 1 blank line required between summary line and description (found 0) torch/utils/_traceback.py:213 in public method `format`: D400: First line should end with a period (not 'f') torch/utils/_traceback.py:213 in public method `format`: D401: First line should be in imperative mood (perhaps 'Format', not 'Formats') torch/utils/_traceback.py:224 in public method `format_all`: D200: One-line docstring should fit on one line with quotes (found 3) torch/utils/_traceback.py:247 in private function `_extract_symbolized_tb`: D205: 1 blank line required between summary line and description (found 0) torch/utils/_traceback.py:247 in private function `_extract_symbolized_tb`: D400: First line should end with a period (not 'f') ``` pydocstyle torch/utils/mkldnn.py --count before: 28 after: 26 remaining errors ``` torch/utils/mkldnn.py:1 at module level: D100: Missing docstring in public module torch/utils/mkldnn.py:4 in public class `MkldnnLinear`: D101: Missing docstring in public class torch/utils/mkldnn.py:5 in public method `__init__`: D107: Missing docstring in __init__ torch/utils/mkldnn.py:19 in public method `__getstate__`: D105: Missing docstring in magic method torch/utils/mkldnn.py:23 in public method `__setstate__`: D105: Missing docstring in magic method torch/utils/mkldnn.py:29 in public method `forward`: D102: Missing docstring in public method torch/utils/mkldnn.py:75 in public class `MkldnnConv1d`: D101: Missing docstring in public class torch/utils/mkldnn.py:76 in public method `__init__`: D107: Missing docstring in __init__ torch/utils/mkldnn.py:82 in public method `__setstate__`: D105: Missing docstring in magic method torch/utils/mkldnn.py:88 in public class `MkldnnConv2d`: D101: Missing docstring in public class torch/utils/mkldnn.py:89 in public method `__init__`: D107: Missing docstring in __init__ torch/utils/mkldnn.py:100 in public method `__setstate__`: D105: Missing docstring in magic method torch/utils/mkldnn.py:110 in public class `MkldnnConv3d`: D101: Missing docstring in public class torch/utils/mkldnn.py:111 in public method `__init__`: D107: Missing docstring in __init__ torch/utils/mkldnn.py:122 in public method `__setstate__`: D105: Missing docstring in magic method torch/utils/mkldnn.py:133 in public class `MkldnnBatchNorm`: D101: Missing docstring in public class torch/utils/mkldnn.py:136 in public method `__init__`: D107: Missing docstring in __init__ torch/utils/mkldnn.py:155 in public method `__getstate__`: D105: Missing docstring in magic method torch/utils/mkldnn.py:163 in public method `__setstate__`: D105: Missing docstring in magic method torch/utils/mkldnn.py:171 in public method `forward`: D102: Missing docstring in public method torch/utils/mkldnn.py:184 in public class `MkldnnPrelu`: D101: Missing docstring in public class torch/utils/mkldnn.py:185 in public method `__init__`: D107: Missing docstring in __init__ torch/utils/mkldnn.py:190 in public method `__getstate__`: D105: Missing docstring in magic method torch/utils/mkldnn.py:194 in public method `__setstate__`: D105: Missing docstring in magic method torch/utils/mkldnn.py:199 in public method `forward`: D102: Missing docstring in public method torch/utils/mkldnn.py:205 in public function `to_mkldnn`: D103: Missing docstring in public function ``` pydocstyle torch/utils/weak.py --count before: 32 after: 30 remaining errors ``` torch/utils/weak.py:1 at module level: D100: Missing docstring in public module torch/utils/weak.py:42 in public class `WeakIdRef`: D101: Missing docstring in public class torch/utils/weak.py:45 in public method `__init__`: D107: Missing docstring in __init__ torch/utils/weak.py:54 in public method `__call__`: D102: Missing docstring in public method torch/utils/weak.py:61 in public method `__hash__`: D105: Missing docstring in magic method torch/utils/weak.py:64 in public method `__eq__`: D105: Missing docstring in magic method torch/utils/weak.py:84 in public class `WeakIdKeyDictionary`: D101: Missing docstring in public class torch/utils/weak.py:87 in public method `__init__`: D107: Missing docstring in __init__ torch/utils/weak.py:131 in public method `__delitem__`: D105: Missing docstring in magic method torch/utils/weak.py:135 in public method `__getitem__`: D105: Missing docstring in magic method torch/utils/weak.py:138 in public method `__len__`: D105: Missing docstring in magic method torch/utils/weak.py:145 in public method `__repr__`: D105: Missing docstring in magic method torch/utils/weak.py:148 in public method `__setitem__`: D105: Missing docstring in magic method torch/utils/weak.py:151 in public method `copy`: D102: Missing docstring in public method torch/utils/weak.py:162 in public method `__deepcopy__`: D105: Missing docstring in magic method torch/utils/weak.py:172 in public method `get`: D102: Missing docstring in public method torch/utils/weak.py:175 in public method `__contains__`: D105: Missing docstring in magic method torch/utils/weak.py:182 in public method `items`: D102: Missing docstring in public method torch/utils/weak.py:189 in public method `keys`: D102: Missing docstring in public method torch/utils/weak.py:198 in public method `values`: D102: Missing docstring in public method torch/utils/weak.py:216 in public method `popitem`: D102: Missing docstring in public method torch/utils/weak.py:224 in public method `pop`: D102: Missing docstring in public method torch/utils/weak.py:228 in public method `setdefault`: D102: Missing docstring in public method torch/utils/weak.py:231 in public method `update`: D102: Missing docstring in public method torch/utils/weak.py:241 in public method `__ior__`: D105: Missing docstring in magic method torch/utils/weak.py:245 in public method `__or__`: D105: Missing docstring in magic method torch/utils/weak.py:252 in public method `__ror__`: D105: Missing docstring in magic method torch/utils/weak.py:262 in public method `__eq__`: D105: Missing docstring in magic method torch/utils/weak.py:276 in public method `__init__`: D107: Missing docstring in __init__ torch/utils/weak.py:280 in public method `__call__`: D102: Missing docstring in public method ``` @mikaylagawarecki @jbschlosser @svekars Pull Request resolved: https://github.com/pytorch/pytorch/pull/113311 Approved by: https://github.com/ezyang	2023-11-15 17:40:04 +00:00
drisspg	9b0f2f8d94	expose sdpa helpers to python (#110496 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110496 Approved by: https://github.com/jbschlosser	2023-11-15 07:34:34 +00:00
Aaron Gokaslan	18d7b8e4f7	[BE]: ruff apply rule PLW1510 to find silent subprocess errors (#113644 ) Reopens #111682 that I messed up due to a bad rebase and triggered some issues with CLA. This explicitly adds check=True or False to any subprocess calls where appropriate. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113644 Approved by: https://github.com/ezyang, https://github.com/kit1980	2023-11-14 20:59:40 +00:00

1 2 3 4 5 ...

258 Commits