pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Haifeng Jin	37dd924c2d	Fix test/test_linalg.py for NumPy 2 (#136800 ) Related to #107302. When built and tested with NumPy 2 the following unit tests failed. ``` =========================================================== short test summary info ============================================================ FAILED [0.0026s] test/test_linalg.py::TestLinalgCPU::test_householder_product_cpu_complex128 - TypeError: expected np.ndarray (got Tensor) FAILED [0.0024s] test/test_linalg.py::TestLinalgCPU::test_householder_product_cpu_complex64 - TypeError: expected np.ndarray (got Tensor) FAILED [0.0025s] test/test_linalg.py::TestLinalgCPU::test_householder_product_cpu_float32 - TypeError: expected np.ndarray (got Tensor) FAILED [0.0024s] test/test_linalg.py::TestLinalgCPU::test_householder_product_cpu_float64 - TypeError: expected np.ndarray (got Tensor) FAILED [0.0016s] test/test_linalg.py::TestLinalgCPU::test_nuclear_norm_axes_small_brute_force_old_cpu - ValueError: Unable to avoid copy while creating an array as requested. FAILED [0.0054s] test/test_linalg.py::TestLinalgCPU::test_solve_cpu_complex128 - AssertionError: The values for attribute 'shape' do not match: torch.Size([0, 0]) != torch.Size([0, 0, 0]). FAILED [0.0055s] test/test_linalg.py::TestLinalgCPU::test_solve_cpu_complex64 - AssertionError: The values for attribute 'shape' do not match: torch.Size([0, 0]) != torch.Size([0, 0, 0]). FAILED [0.0048s] test/test_linalg.py::TestLinalgCPU::test_solve_cpu_float32 - AssertionError: The values for attribute 'shape' do not match: torch.Size([0, 0]) != torch.Size([0, 0, 0]). FAILED [0.0054s] test/test_linalg.py::TestLinalgCPU::test_solve_cpu_float64 - AssertionError: The values for attribute 'shape' do not match: torch.Size([0, 0]) != torch.Size([0, 0, 0]). =========================================== 9 failed, 1051 passed, 118 skipped in 152.51s (0:02:32) ============================================ ``` This PR fixes them. The test is now compatible with both NumPy 1 & 2. 
Some more details: 1. `np.linalg.solve` changed its behavior, so I added an adapter function in the unit test to keep the behavior the same under both NumPy 1 and NumPy 2. 2. The failure occurs because, when a `torch.Tensor` is passed to `np.linalg.qr`, the return type is `(np.ndarray, np.ndarray)` in NumPy 1 but `(torch.Tensor, torch.Tensor)` in NumPy 2. 3. In NumPy 2, `np.array(obj, copy=False)` raises a `ValueError` when a copy cannot be avoided; `np.asarray(obj)` is the recommended replacement. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136800 Approved by: https://github.com/lezcano	2024-10-01 07:53:24 +00:00
Nikita Shulga	c3e678382b	Fix addmm silent correctness on aarch64 (#136371) Do not dispatch to fast gemmv functions when alpha is not equal to 1. Add a regression test for the problem. Fixes https://github.com/pytorch/pytorch/issues/136299 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136371 Approved by: https://github.com/swolchok	2024-09-23 17:10:34 +00:00
Jeff Daily	0eb9c870fd	[reland][ROCm] TunableOp for gemm_and_bias (#128919) Reland of #128143, with `alpha` and `bias` initialization added to `launchTunableGemmAndBias`. Thus far TunableOp was implemented for gemm, bgemm, and scaled_mm. gemm_and_bias was notably missing. This PR closes that gap. Pull Request resolved: https://github.com/pytorch/pytorch/pull/128919 Approved by: https://github.com/malfet	2024-08-22 18:27:50 +00:00
Nichols A. Romero	f25df31008	TunableOp more unit test follow-up (#130065 ) More unit tests for preventing TunableOp regressions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130065 Approved by: https://github.com/jeffdaily, https://github.com/malfet	2024-08-08 22:42:16 +00:00
Jiang, Yanbing	bceb91222c	Fix meta error in _convert_weight_to_int4pack (#130915 ) This PR is to fix meta error in _convert_weight_to_int4pack. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130915 Approved by: https://github.com/jerryzh168	2024-07-26 08:36:30 +00:00
Zixi Qi	c3fe9075a9	[ROCM] Use hipblaslt version from hipblaslt runtime instead of header for tunableops validator (#131078 ) Summary: When tunable ops load selected kernels from csv file, it will validate hipblaslt version defined in hipblaslt-version.h This PR changes the validator to fetch hipblaslt version and revision from hipblaslt runtime instead of the header file, as in our environment we might rollout a new version of the run time prior to updating the header file fleet wide. Test Plan: Verified generated tunableops kernel selection has the correct hipblaslt version from runtime: ``` Validator,PT_VERSION,2.5.0 Validator,ROCBLAS_VERSION,4.0.0-72e57364-dirty Validator,HIPBLASLT_VERSION,800-bf2c3184 Validator,ROCM_VERSION,6.0.0.0-12969-1544e39 Validator,GCN_ARCH_NAME,gfx942:sramecc+:xnack- GemmTunableOp_BFloat16_TN,tn_8192_2_3584,Gemm_Hipblaslt_TN_572,0.0240676 GemmTunableOp_BFloat16_TN,tn_7168_2_8192,Gemm_Hipblaslt_TN_482,0.0359019 GemmTunableOp_BFloat16_TN,tn_8192_2_1024,Default,0.0173723 GemmTunableOp_BFloat16_TN,tn_1280_2_8192,Gemm_Hipblaslt_TN_491,0.0191047 ``` Differential Revision: D59889043 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131078 Approved by: https://github.com/jeffdaily, https://github.com/xw285cornell	2024-07-25 00:54:07 +00:00
Jeff Daily	69b1999586	TunableOp size hotfix (#130800 ) Fixes #130727. GetSize calculation was incorrect for strided batched gemm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130800 Approved by: https://github.com/xw285cornell	2024-07-22 23:42:26 +00:00
Aaron Gokaslan	d1c4e6b55f	[BE]: Enable a few additional ruff rules (#130700) Enables a few extra ruff rules, most of which have no violations because I already cleaned them up in earlier PRs; this just turns them on to enforce them. Adds 1 noqa as we want the suboptimal lambda generation + call kept as a test. Also enables the test in flake8. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130700 Approved by: https://github.com/justinchuby, https://github.com/ezyang	2024-07-17 02:06:04 +00:00
inkcherry	f422027fce	fix torch.linalg.lstsq input check (#130612 ) Fixes [#117236 ](https://github.com/pytorch/pytorch/issues/117236) The current case does not meet the vector scenario requirements, and it lacks sufficient checks (relying solely on ```dim_diff``` is insufficient). Consequently, it triggers an internal assertion error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130612 Approved by: https://github.com/lezcano	2024-07-12 23:06:52 +00:00
Jiang, Yanbing	6f662e9575	update the input `weight` of `_convert_weight_to_int4pack` to `[n][k / 2] uint8` (#129940) This PR updates the input `weight` of `_convert_weight_to_int4pack` from `[n][k] int32` to `[n][k / 2] uint8` for CPU, CUDA and MPS, which helps decouple int4 model checkpoints from specific ISAs and platforms in `gpt-fast`. The advantage is that an int4 model checkpoint can be shared across test machines without being regenerated on one particular platform. Meanwhile, the size of the input `weight` is reduced to `1 / 8`. Before this PR, the packed weight was stored in a CUDA-specific layout: `[n/8][k/(InnerKTiles*16)][32][InnerKTiles/2]`, dtype int32, where InnerKTiles = 2, 4, 8. The CPU packed weight was viewed as the SAME shape but stored in a different layout: `[n/64][k][32]`, dtype uint8. The weight was strongly coupled with platforms (CPU/CUDA) and ISAs (AVX512/AVX2/scalar), and users could not use a generated weight on a different ISA or platform, because the compute format differs when the weight is loaded onto a device. ![image](https://github.com/pytorch/pytorch/assets/61222868/64971c4b-29b9-42cf-9aeb-ffa01cea93dd) Now, we use a common serialized layout (`[n][k/2] uint8`) across devices and ISAs as the input `weight` of `_convert_weight_to_int4pack`, and each backend chooses how to interpret it as its compute layout. ![image](https://github.com/pytorch/pytorch/assets/61222868/c7990761-c723-417b-aca2-7c60db7785c7) ### Performance Intel (R) Xeon (R) CPU Max 9480, single socket (56 cores) There is no obvious regression from this PR. ![image](https://github.com/pytorch/pytorch/assets/61222868/6046dcf4-920b-4c63-9ca3-1c8c3cafebde) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129940 Approved by: https://github.com/jgong5, https://github.com/lezcano, https://github.com/mingfeima	2024-07-11 15:26:48 +00:00
PyTorch MergeBot	637cc8d27f	Revert "update the input `weight` of `_convert_weight_to_int4pack` to `[n][k / 2] uint8` (#129940 )" This reverts commit `6367f02a0e`. Reverted https://github.com/pytorch/pytorch/pull/129940 on behalf of https://github.com/albanD due to Broke rocm tests on main `6367f02a0e` ([comment](https://github.com/pytorch/pytorch/pull/129940#issuecomment-2220554681))	2024-07-10 13:48:32 +00:00
Jiang, Yanbing	6367f02a0e	update the input `weight` of `_convert_weight_to_int4pack` to `[n][k / 2] uint8` (#129940) This PR updates the input `weight` of `_convert_weight_to_int4pack` from `[n][k] int32` to `[n][k / 2] uint8` for CPU, CUDA and MPS, which helps decouple int4 model checkpoints from specific ISAs and platforms in `gpt-fast`. The advantage is that an int4 model checkpoint can be shared across test machines without being regenerated on one particular platform. Meanwhile, the size of the input `weight` is reduced to `1 / 8`. Before this PR, the packed weight was stored in a CUDA-specific layout: `[n/8][k/(InnerKTiles*16)][32][InnerKTiles/2]`, dtype int32, where InnerKTiles = 2, 4, 8. The CPU packed weight was viewed as the SAME shape but stored in a different layout: `[n/64][k][32]`, dtype uint8. The weight was strongly coupled with platforms (CPU/CUDA) and ISAs (AVX512/AVX2/scalar), and users could not use a generated weight on a different ISA or platform, because the compute format differs when the weight is loaded onto a device. ![image](https://github.com/pytorch/pytorch/assets/61222868/64971c4b-29b9-42cf-9aeb-ffa01cea93dd) Now, we use a common serialized layout (`[n][k/2] uint8`) across devices and ISAs as the input `weight` of `_convert_weight_to_int4pack`, and each backend chooses how to interpret it as its compute layout. ![image](https://github.com/pytorch/pytorch/assets/61222868/c7990761-c723-417b-aca2-7c60db7785c7) ### Performance Intel (R) Xeon (R) CPU Max 9480, single socket (56 cores) There is no obvious regression from this PR. ![image](https://github.com/pytorch/pytorch/assets/61222868/6046dcf4-920b-4c63-9ca3-1c8c3cafebde) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129940 Approved by: https://github.com/jgong5, https://github.com/lezcano, https://github.com/mingfeima	2024-07-10 07:38:42 +00:00
Jerry Mannil	42f647219a	[ROCm] Add int4 support (#129710) - Add AMD support for int4 kernel - Only supports CDNA2 and CDNA3 gpus for now - Uses `mfma_f32_16x16x16bf16` instruction for matrix multiply - Uses `v_and_or_b32` instruction and `__hfma2` intrinsic for unpacking bf16 values - Enable hipify for `__nv_bfloat16` and `__nv_bfloat162` data types - Enable int4 unit tests for CDNA2 and CDNA3 AMD gpus - Fix torchscript issues due to hipify for `__nv_bfloat16` type - TorchScript has its own implementation for bfloat16 type - Implemented in `__nv_bfloat16` structure at [resource_strings.h](https://github.com/pytorch/pytorch/blob/main/torch/csrc/jit/codegen/fuser/cuda/resource_strings.h) - So, we shouldn't hipify any reference of `__nv_bfloat16` in the torchscript implementation - Hence moved the `__nv_bfloat16` direct references in `codegen.cpp` and `cuda_codegen.cpp` to `resource_strings.h` which is already exempted from hipify Fixes #124699 Fixes pytorch-labs/gpt-fast/issues/154 Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/129710 Approved by: https://github.com/malfet	2024-07-09 19:49:12 +00:00
PyTorch MergeBot	d7b7f8b79f	Revert "[ROCm] Add int4 support (#129710 )" This reverts commit `d0ad13fa42`. Reverted https://github.com/pytorch/pytorch/pull/129710 on behalf of https://github.com/jeffdaily due to original ROCm PR did not have ciflow/rocm, missed signal ([comment](https://github.com/pytorch/pytorch/pull/129710#issuecomment-2214558368))	2024-07-08 16:07:53 +00:00
Jerry Mannil	d0ad13fa42	[ROCm] Add int4 support (#129710 ) Add AMD support for int4 kernel using mfma_f32_16x16x16bf16 instruction. Only supports CDNA2 and CDNA3 gpus for now. Fixes #124699 Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/129710 Approved by: https://github.com/malfet	2024-07-07 23:54:22 +00:00
Edward Z. Yang	29c68df600	Stop immediately specializing common constants 0/1 for plain int (#128327 ) Fixes https://github.com/pytorch/pytorch/issues/128319 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/128327 Approved by: https://github.com/lezcano ghstack dependencies: #129983	2024-07-03 16:41:51 +00:00
Fuzzkatt	0441173ab2	Add slowTest marker to test_linalg_solve_triangular_large (#129903 ) In nvidia internal testing, for slower devices such as Orin NX, on large dtypes like complex128, test_linalg_solve_triangular_large is taking multiple hours to complete and timing out CI. This PR adds a slowTest marker so it can be skipped due to speed issues. cc @nWEIdia Pull Request resolved: https://github.com/pytorch/pytorch/pull/129903 Approved by: https://github.com/lezcano	2024-07-02 12:27:12 +00:00
awayzjj	ccc4ee7793	check boolean alpha and beta of Fake tensor impl for Tensor.addr (#129839 ) Fixes https://github.com/pytorch/pytorch/issues/127043 Pull Request resolved: https://github.com/pytorch/pytorch/pull/129839 Approved by: https://github.com/lezcano	2024-07-02 09:20:49 +00:00
Jeff Daily	04206d1898	TunableOp hotfix, unit test follow-up (#129606 ) PR #129281 was landed to fix critical issues but did not contain unit tests to exercise those issues. This is a follow-up set of unit tests that would exercise the problems seen previously. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129606 Approved by: https://github.com/atalman	2024-06-27 17:01:04 +00:00
Jeff Daily	0e7bd7fedd	[ROCm] TunableOp improvements (#124362 ) - use less memory; smaller default hipblaslt workspace size - options to avoid cache effects - icache flush option - rotating buffers during tuning - python APIs - unit tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/124362 Approved by: https://github.com/xw285cornell	2024-06-03 22:30:11 +00:00
Xuehai Pan	8b08b0f340	[BE] enable ruff rule `Q` from flake8-quotes (#127713 ) Enable [ruff rule `Q`](https://docs.astral.sh/ruff/rules/#flake8-quotes-q) from flake8-quotes. Fixes: - [avoidable-escaped-quote (Q003)](https://docs.astral.sh/ruff/rules/avoidable-escaped-quote/#avoidable-escaped-quote-q003) - [unnecessary-escaped-quote (Q004)](https://docs.astral.sh/ruff/rules/unnecessary-escaped-quote/#unnecessary-escaped-quote-q004) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127713 Approved by: https://github.com/ezyang	2024-06-02 23:25:26 +00:00
lezcano	48538d3d14	Implement svd_lowrank and pca_lowrank for complex numbers (#125580 ) We fix a number of bugs previously present in the complex implementation. We also heavily simplify the implementation, using, among other things, that we now have conjugate views. I saw there is a comment regarding how slow some checks on this function are. As such, I removed quite a few of the combinations of inputs to make the OpInfo lighter. I still left a couple relevant examples to not regress coverage though. Fixes https://github.com/pytorch/pytorch/issues/122188 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125580 Approved by: https://github.com/pearu, https://github.com/peterbell10	2024-05-30 14:45:58 +00:00
William Wen	5359af0c7e	[dynamo] wrap GraphModule exceptions in dynamo-wrapped tests (#126341 ) Better approach to https://github.com/pytorch/pytorch/pull/126197 to catch issues like https://github.com/pytorch/pytorch/issues/125568. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126341 Approved by: https://github.com/anijain2305, https://github.com/jansel	2024-05-29 05:18:04 +00:00
Nikita Shulga	4ff9113e3d	[MPS] Add `_weight_int8pack_mm` tests (#127041 ) As well as extend the test to cover MV cases (where A matrix is 1xM) Limit int8 op testing to 32x32 matrix sizes for now Pull Request resolved: https://github.com/pytorch/pytorch/pull/127041 Approved by: https://github.com/larryliu0820, https://github.com/manuelcandales	2024-05-24 16:08:06 +00:00
Scott Wolchok	85fd76f76d	Add test coverage for fp16 matrix-vector specialized kernel (#126700 ) Summary: This kernel is special-cased on ARM because it's important for LLMs, so let's have test coverage. Test Plan: Ran locally and it passes. Intentionally broke fp16_gemv_trans and saw it fail, confirming it provides coverage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126700 Approved by: https://github.com/malfet	2024-05-21 17:23:16 +00:00
Catherine Lee	6f619cc727	[ez] functorch/test_vmap and test_dataloader to run in parallel (#125597 ) Also mark test_svd serial in linalg to see if it helps with the flakiness Pull Request resolved: https://github.com/pytorch/pytorch/pull/125597 Approved by: https://github.com/huydhn, https://github.com/ZainRizvi	2024-05-08 15:37:29 +00:00
Nikita Shulga	30610251ec	[MPS] Add naive quantized intmm and `.gputrace` capture hooks (#125163) - Implement a very straightforward Metal copy of CPU int4mm kernel - Implement int8mm kernel by constructing a graph consisting of upcast, transpose and mm - Add `isCapturing`, `isCaptureEnabled`, `startCapture` and `stopCapture` methods to `MPSProfile` which can be used to help one debug/profile Metal kernels by wrapping the calls with the following ```cpp if (getMPSProfiler().profiler.isCaptureEnabled()) { getMPSProfiler().startCapture(__func__, mpsStream); } ... if (getMPSProfiler().isCapturing()) { getMPSProfiler().stopCapture(mpsStream); } ``` that, if invoked with `MTL_CAPTURE_ENABLED` environment variable set to one, will produce .gputrace files, in the current working directory, which can later be loaded and used to debug or profile the kernel <img width="1093" alt="image" src="https://github.com/pytorch/pytorch/assets/2453524/a2bf27e8-df8a-442c-a525-1df67b8a376a"> - Added `test_int4mm` to TestLinalgMPS, which is mostly copy-n-paste of the test from `test_linalg` TODOs: - Add weight pack - Perf-tune both kernels Pull Request resolved: https://github.com/pytorch/pytorch/pull/125163 Approved by: https://github.com/mikekgfb	2024-05-03 15:20:39 +00:00
Nikita Shulga	acac7aa70f	[CI] Unskip Linalg tests on ARM (#125377 ) Removes obscure "Issue with numpy version on arm" added by https://github.com/pytorch/pytorch/pull/82213 And replaces it with 4 targeted skips: - test_addmv for `float16` - test_vector_norm for `float16`, `bfloat16` and `float32` Followups to fix them are tracked in https://github.com/pytorch/pytorch/issues/125438 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125377 Approved by: https://github.com/kit1980	2024-05-03 01:18:52 +00:00
aaitzhan	47ba7a76e2	[ATen][CUDA][AMP] Fix dtype mismatch in linalg_vector_norm (#125175 ) Fixes #125174 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125175 Approved by: https://github.com/eqy, https://github.com/lezcano	2024-05-01 10:57:12 +00:00
Aaron Orenstein	a8574a9719	Fix global flake8 issues (#124771 ) Prior to this `lintrunner --all-files --take FLAKE8` failed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124771 Approved by: https://github.com/Skylion007 ghstack dependencies: #124428	2024-04-26 15:35:53 +00:00
PyTorch MergeBot	1ac60484c1	Revert "Fix global flake8 issues (#124771 )" This reverts commit `f01275934b`. Reverted https://github.com/pytorch/pytorch/pull/124771 on behalf of https://github.com/jeanschmidt due to Unfortunately, I needed to revert #123735 and this one depends on it. So please check if there are no merge conflicts or breakages and feel free to merge this PR again ([comment](https://github.com/pytorch/pytorch/pull/124428#issuecomment-2078699836))	2024-04-26 06:15:17 +00:00
Aaron Orenstein	f01275934b	Fix global flake8 issues (#124771 ) Prior to this `lintrunner --all-files --take FLAKE8` failed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124771 Approved by: https://github.com/Skylion007 ghstack dependencies: #124428	2024-04-25 14:25:00 +00:00
Peter Y Yeh	2e7b8ff116	[ROCm] Fix Int_mm() Integration with hipblasLT (#122431 ) The PR - fixes int_mm() /int8_gemm() integration with hipblasLT backend (require ROCm 6.0). - enables/fixes the following tests on Rocm - test__int_mm_k_16_n_16_use_transpose_a_False_use_transpose_b_False_cuda - test__int_mm_k_16_n_16_use_transpose_a_False_use_transpose_b_True_cuda - test__int_mm_k_16_n_16_use_transpose_a_True_use_transpose_b_False_cuda - test__int_mm_k_16_n_16_use_transpose_a_True_use_transpose_b_True_cuda - test__int_mm_k_16_n_32_use_transpose_a_False_use_transpose_b_False_cuda - test__int_mm_k_16_n_32_use_transpose_a_False_use_transpose_b_True_cuda - test__int_mm_k_16_n_32_use_transpose_a_True_use_transpose_b_False_cuda - test__int_mm_k_16_n_32_use_transpose_a_True_use_transpose_b_True_cuda - test__int_mm_k_32_n_16_use_transpose_a_False_use_transpose_b_False_cuda - test__int_mm_k_32_n_16_use_transpose_a_False_use_transpose_b_True_cuda - test__int_mm_k_32_n_16_use_transpose_a_True_use_transpose_b_False_cuda - test__int_mm_k_32_n_16_use_transpose_a_True_use_transpose_b_True_cuda - test__int_mm_k_32_n_32_use_transpose_a_False_use_transpose_b_False_cuda - test__int_mm_k_32_n_32_use_transpose_a_False_use_transpose_b_True_cuda - test__int_mm_k_32_n_32_use_transpose_a_True_use_transpose_b_False_cuda - test__int_mm_k_32_n_32_use_transpose_a_True_use_transpose_b_True_cuda Pull Request resolved: https://github.com/pytorch/pytorch/pull/122431 Approved by: https://github.com/pruthvistony, https://github.com/jithunnair-amd, https://github.com/malfet, https://github.com/atalman	2024-04-24 02:29:33 +00:00
Jeff Daily	6ede882c0b	preferred blas library; cublaslt gemm implementation (#122106 ) Following the example of PyTorch supporting a preferred Linalg library (cusolver or magma), this PR introduces a preferred blas library selector of either cublas or cublaslt for CUDA and hipblas or hipblaslt for ROCm via normal hipification of sources. The default blas implementation remains cublas or hipblas. cublaslt or hipblaslt can be enabled using environment variable TORCH_BLAS_PREFER_CUBLASLT=1 (or TORCH_BLAS_PREFER_HIPBLASLT=1 as an alias) or by calling `torch.backends.cuda.preferred_blas_library(backend="cublaslt")` or as an alias `backend="hipblaslt"`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122106 Approved by: https://github.com/lezcano	2024-04-22 15:38:22 +00:00
Nikita Shulga	c74dfca5e7	Int4MM: Unswizzle for different dtypes (#124448) If dtype is not the one this platform is optimized for, it might need different unswizzling patterns. Implement ones for the non-vectorized flavor of the kernel, so that int4mm can be used with float32 and float16 dtypes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124448 Approved by: https://github.com/jgong5, https://github.com/mikekgfb	2024-04-19 21:17:15 +00:00
William Wen	cbde0f048b	[dynamo, 3.12] enable tests disabled due to missing dynamo 3.12 support (#123300 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123300 Approved by: https://github.com/jansel, https://github.com/malfet, https://github.com/zou3519	2024-04-05 20:13:17 +00:00
AyaseNana	0a7162f898	Fix svd_lowrank parameter `M` (#122681 ) ISSUE: #122699 Pull Request resolved: https://github.com/pytorch/pytorch/pull/122681 Approved by: https://github.com/lezcano	2024-03-29 18:06:38 +00:00
min-jean-cho	057892f4be	[CPU] optimize Lp norm for 1-dimensional vector (#122143) Fixes https://github.com/pytorch/pytorch/issues/120229 - Optimize vector norm by simplifying vector norm formula for 1-dimensional vector. - Vector norm formula for 1-dimensional vector simplifies to `abs(x)`. See below for proof. - Next step, we can similarly optimize matrix norm (`torch.linalg.matrix_norm`) for 1 x 1 matrix. - Additionally, avoids overflow in power, `abs(x) ** p` for large `p` or `x`, for 1-dimensional vector. ### Performance Avg Latency (ms) of `torch.norm` and `torch.linalg.vector_norm` for `torch.norm(torch.randn(2**18, 1), ord, -1)` and `torch.linalg.vector_norm(torch.randn(2**18, 1), ord, -1)`. Tested on 28 physical cores/socket, 1 socket on Skylake.

| op | input shape | dim | ord | baseline (master), avg latency (ms) | optimized (7102f1ef372b248414d36cbd0c51a546b6b6a41a), avg latency (ms) | speedup ratio (baseline/optimized) |
|---|---|---|---|---|---|---|
| torch.norm | (2**18, 1) | -1 | fro | 34.3755531 | 0.0125408 | 2741.094 |
| | | | inf | 34.0952635 | 0.0122237 | 2789.271 |
| | | | -inf | 34.3674493 | 0.0120759 | 2845.953 |
| | | | 0 | 34.1004515 | 0.0175261 | 1945.69 |
| | | | 1 | 34.1688442 | 0.0121593 | 2810.089 |
| | | | -1 | 33.949492 | 0.0120282 | 2822.487 |
| | | | 2 | 34.3669581 | 0.0120401 | 2854.366 |
| | | | -2 | 33.9252067 | 0.0121069 | 2802.139 |
| torch.linalg.vector_norm | (2**18, 1) | -1 | inf | 34.090879 | 0.0095105 | 3584.545 |
| | | | -inf | 34.3708754 | 0.0099111 | 3467.931 |
| | | | 0 | 34.0880775 | 0.0141716 | 2405.38 |
| | | | 1 | 34.1392851 | 0.0093174 | 3664.036 |
| | | | -1 | 33.925395 | 0.0092483 | 3668.302 |
| | | | 2 | 34.3854165 | 0.0092459 | 3719.002 |
| | | | -2 | 33.932972 | 0.0093007 | 3648.429 |

### Proof <details> <summary>For those interested :)</summary> <img width="382" alt="1_dim_vector_norm_proof1" src="https://github.com/pytorch/pytorch/assets/93151422/59b1e00b-8fcd-47cb-877d-d31403b5195b"> <img width="432" alt="1_dim_vector_norm_proof2" src="https://github.com/pytorch/pytorch/assets/93151422/236bea15-2dd5-480b-9871-58b2e3b24322"> </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/122143 Approved by: https://github.com/lezcano	2024-03-20 23:20:25 +00:00
Xia, Weiwen	8168338063	Add CPU implementation for `torch._int_mm` (s8s8->s32) (#121792) Fixes #121647 Description: Currently, the op `torch._int_mm` only supports CUDA devices. This PR adds a CPU implementation for it. Besides the request from the issue, this op may also be useful for planned CPU implementations of [LLM.int8()](https://arxiv.org/abs/2208.07339) in [Bitsandbytes](https://github.com/TimDettmers/bitsandbytes). The implementation prefers mkldnn (oneDNN) kernels. If mkldnn is not available, a reference implementation with nested for loops is used. Test plan: `python test/test_linalg.py -k test__int_mm_cpu` Pull Request resolved: https://github.com/pytorch/pytorch/pull/121792 Approved by: https://github.com/jgong5, https://github.com/lezcano	2024-03-19 08:44:33 +00:00
mingfeima	b3065f6899	add int8 packed gemm support on CPU device (#118056 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118056 Approved by: https://github.com/mikekgfb	2024-03-07 08:41:43 +00:00
mingfeima	a427d90411	add int4 packed gemm support on CPU device (#117475) This patch adds int4 packed gemm support on CPU; both `avx512` and `avx2` are supported. It is used to speed up https://github.com/pytorch-labs/gpt-fast The default perf measured on Intel (R) Xeon (R) CPU Max 9480, single socket (56 cores) is `16.13 sec total, 12.40 tokens/sec` * WOQ int4 on avx512: `5.92 sec total, 33.79 tokens/sec` * WOQ int4 on avx2: `6.90 sec total, 29.00 tokens/sec` WOQ int4 is measured with method: https://github.com/pytorch-labs/gpt-fast?tab=readme-ov-file#int4-weight-only-quantization Pull Request resolved: https://github.com/pytorch/pytorch/pull/117475 Approved by: https://github.com/jgong5, https://github.com/malfet	2024-03-06 16:25:53 +00:00
eqy	8dafc81ba9	[cuBLAS][cuBLASLt] Fix expected failures for `int_mm` on `sm75` (turing) (#121277 ) CC @malfet @atalman @ptrblck @tinglvv Pull Request resolved: https://github.com/pytorch/pytorch/pull/121277 Approved by: https://github.com/malfet	2024-03-06 01:51:01 +00:00
PyTorch MergeBot	0c07c0c15f	Revert "add int4 packed gemm support on CPU device (#117475 )" This reverts commit `30befa592e`. Reverted https://github.com/pytorch/pytorch/pull/117475 on behalf of https://github.com/izaitsevfb due to fails meta-internal tests ([comment](https://github.com/pytorch/pytorch/pull/117475#issuecomment-1977474686))	2024-03-04 21:20:57 +00:00
PyTorch MergeBot	a98c17edc7	Revert "add int8 packed gemm support on CPU device (#118056 )" This reverts commit `f84375ca5d`. Reverted https://github.com/pytorch/pytorch/pull/118056 on behalf of https://github.com/izaitsevfb due to breaks internal builds ([comment](https://github.com/pytorch/pytorch/pull/118056#issuecomment-1977368720))	2024-03-04 20:09:40 +00:00
mingfeima	f84375ca5d	add int8 packed gemm support on CPU device (#118056 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118056 Approved by: https://github.com/mikekgfb ghstack dependencies: #117475	2024-03-02 04:35:49 +00:00
mingfeima	30befa592e	add int4 packed gemm support on CPU device (#117475) This patch adds int4 packed gemm support on CPU; both `avx512` and `avx2` are supported. It is used to speed up https://github.com/pytorch-labs/gpt-fast The default perf measured on Intel (R) Xeon (R) CPU Max 9480, single socket (56 cores) is `16.13 sec total, 12.40 tokens/sec` * WOQ int4 on avx512: `5.92 sec total, 33.79 tokens/sec` * WOQ int4 on avx2: `6.90 sec total, 29.00 tokens/sec` WOQ int4 is measured with method: https://github.com/pytorch-labs/gpt-fast?tab=readme-ov-file#int4-weight-only-quantization Pull Request resolved: https://github.com/pytorch/pytorch/pull/117475 Approved by: https://github.com/jgong5, https://github.com/malfet	2024-03-02 00:17:34 +00:00
Jeff Daily	e98dbae0a0	[ROCm] enable hipsolver backend for linalg.eigh (#115177 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115177 Approved by: https://github.com/lezcano	2024-02-08 22:03:27 +00:00
haozhe.zhu@intel.com	0ae952db76	enable mkldnn bf32 matmul (#116015) ### Testing FP32 matmul vs. mkldnn BF32 matmul on SPR

single core:

| Input | BF32 / ms | FP32 / ms | Speed up |
| -- | -- | -- | -- |
| M: 128, N: 128, K: 128, trans_a: False, trans_b: False | 32.842 | 38.279 | 1.165 |
| M: 128, N: 256, K: 128, trans_a: False, trans_b: False | 38.590 | 73.967 | 1.917 |
| M: 8192, N: 768, K: 768, trans_a: False, trans_b: False | 18456.267 | 74588.002 | 4.041 |

56 cores:

| Input | BF32 / ms | FP32 / ms | Speed up |
| -- | -- | -- | -- |
| M: 8192, N: 768, K: 768, trans_a: False, trans_b: False | 1199.400 | 1715.548 | 1.430 |
| M: 8192, N: 768, K: 768, trans_a: False, trans_b: True | 1129.204 | 1708.912 | 1.513 |
| M: 8192, N: 768, K: 3072, trans_a: False, trans_b: False | 3655.915 | 7992.877 | 2.186 |
| M: 8192, N: 768, K: 3072, trans_a: False, trans_b: True | 3707.993 | 8026.191 | 2.165 |
| Batch: 768, M: 128, N: 64, K: 128 | 1296.419 | 1308.411 | 1.009 |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116015 Approved by: https://github.com/jgong5, https://github.com/ezyang	2024-01-20 09:30:23 +00:00
lezcano	f2d6e99f8d	Workaround a cusolver bug on CUDA < 12.1 in triangular_solve (#117636 ) Fix https://github.com/pytorch/pytorch/issues/79191 Pull Request resolved: https://github.com/pytorch/pytorch/pull/117636 Approved by: https://github.com/malfet	2024-01-19 12:42:37 +00:00
Animesh Jain	6e4e81a9ef	[dynamo] Extend LazyVariableTracker to tuples (#117426 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/117426 Approved by: https://github.com/lezcano, https://github.com/jansel	2024-01-18 15:51:28 +00:00
Sun, Jiayi	7a851fedc8	support torch.mm with conjugate transposed inputs (#117238 ) Fix https://github.com/pytorch/pytorch/issues/116855. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117238 Approved by: https://github.com/lezcano	2024-01-15 12:36:01 +00:00
soulitzer	e10cfdd895	Update matmul requires_grad checks (#117067 ) Fixes https://github.com/pytorch/pytorch/issues/116099 Pull Request resolved: https://github.com/pytorch/pytorch/pull/117067 Approved by: https://github.com/lezcano, https://github.com/albanD ghstack dependencies: #116523, #116710	2024-01-10 23:16:42 +00:00
lcskrishna	b9293e74a2	[ROCm] Fixes for hipblasLt for mm use case. (#116537 ) This PR fixes the accuracy issues for hipblasLT for mm case on ROCm. This PR is a follow up to the integration PR https://github.com/pytorch/pytorch/pull/114329 and https://github.com/pytorch/pytorch/pull/114890 The accuracy issue arises for mm usecase for ROCm where hipblasLT is enabled, and a bias has been passed which is not required. This PR addresses that issue. Added a unit-test case for this issue (bias=None) case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116537 Approved by: https://github.com/jeffdaily, https://github.com/malfet	2024-01-10 22:13:18 +00:00
Aaron Gokaslan	bbe3261dd3	[BE]: Use `itertools.chain.from_iterable` where possible (#116376 ) This is more readable and more efficient when dealing with lots of sequences to chain together. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116376 Approved by: https://github.com/albanD	2023-12-27 19:20:07 +00:00
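The kind of rewrite the commit above describes can be sketched with the standard-library `itertools.chain.from_iterable`. The sample data and variable names below are illustrative and do not come from the PR.

```python
from itertools import chain

nested = [[1, 2], [3], [], [4, 5]]

# Nested comprehension: works, but the double loop is harder to read.
flat_comprehension = [item for group in nested for item in group]

# chain.from_iterable walks each inner sequence lazily, in order.
flat_chained = list(chain.from_iterable(nested))
```

Both forms produce the same flattened list; the chained form avoids unpacking the outer sequence into arguments, as `chain(*nested)` would.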
Aaron Gokaslan	6de28e92d2	[BE]: Apply FURB118 (prev): replaces unnecessary lambdas with operator. (#116027 ) This replaces a bunch of unnecessary lambdas with the operator package. This is semantically equivalent, but the operator package is faster, and arguably more readable. When the FURB rules are taken out of preview, I will enable it as a ruff check. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116027 Approved by: https://github.com/malfet	2023-12-20 19:35:08 +00:00
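A small sketch of the lambda-to-`operator` substitution the commit above refers to. The data is made up for illustration, and the specific call sites changed by the PR are not shown in this log.

```python
import operator
from functools import reduce

numbers = [1, 2, 3, 4]

# Before: a lambda that only wraps a built-in operator.
product_lambda = reduce(lambda a, b: a * b, numbers)

# After: the equivalent function from the operator module.
product_operator = reduce(operator.mul, numbers)

pairs = [("b", 2), ("a", 1)]

# itemgetter replaces `lambda pair: pair[1]` as a sort key.
sorted_pairs = sorted(pairs, key=operator.itemgetter(1))
```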
PyTorch MergeBot	24af118e55	Revert "markDynamoStrictTest more tests (#115871 )" This reverts commit `478f0e96dc`. Reverted https://github.com/pytorch/pytorch/pull/115871 on behalf of https://github.com/jeanschmidt due to Breaking internal tests and builds, please check diff, this is required to revert #115870 ([comment](https://github.com/pytorch/pytorch/pull/115871#issuecomment-1862992931))	2023-12-19 15:36:27 +00:00
rzou	49af19cd8e	Skip some flaky Dynamo tests in test_linalg.py (#115925 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115925 Approved by: https://github.com/lezcano	2023-12-16 02:38:56 +00:00
rzou	478f0e96dc	markDynamoStrictTest more tests (#115871 ) For: test_dispatch.py test_fake_tensor.py test_indexing.py test_linalg.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/115871 Approved by: https://github.com/voznesenskym ghstack dependencies: #115845, #115855, #115856, #115857, #115858, #115870	2023-12-15 05:26:54 +00:00
eqy	353f2dbd9c	[CUDA] Fix V100 expected failures in `test_mm_decomp` and `test_linalg` (#115666 ) BFloat16 isn't supported on sm70 and we get an unexpected cuBLAS success in 12.3+ Pull Request resolved: https://github.com/pytorch/pytorch/pull/115666 Approved by: https://github.com/malfet	2023-12-14 19:17:53 +00:00
atalman	43e3242490	[BE] Remove test corner cases for CUDA older than supported 11.8 (#114989 ) Remove deprecated CUDA use cases from tests. Similar to: https://github.com/pytorch/pytorch/pull/112873 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114989 Approved by: https://github.com/malfet	2023-12-04 21:41:03 +00:00
Miles Lubin	070b2d3cff	cholesky_solve_backward: speed up using output_mask (#112981 ) Introduces a faster path for `cholesky_solve_backward` when the gradient with respect to the cholesky factor isn't required. Adds test coverage in `test_linalg.py`.

# Example

## Setup

```py
import torch

torch.set_num_threads(1)
# Build a symmetric positive-definite matrix and factor it.
mat = torch.randn(500, 1000)
mat = mat @ mat.T
L = torch.linalg.cholesky(mat, upper=False)
# Only the right-hand side requires a gradient, not the factor L.
rhs = torch.randn(500, 1)
rhs.requires_grad = True
sol = torch.cholesky_solve(rhs, L, upper=False).sum(dim=0)
```

## Before

```
%timeit torch.autograd.grad(sol, rhs, retain_graph=True)
2.61 ms ± 18.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

## After

```
%timeit torch.autograd.grad(sol, rhs, retain_graph=True)
109 µs ± 3.42 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112981 Approved by: https://github.com/lezcano	2023-11-16 18:30:57 +00:00
Ting Lu	b3a7d9208b	disable test int_mm for sm90 or later (#113327 ) disable test int_mm for sm90 or later ``` python test/test_linalg.py -k test__int_mm_k_32_n_32_use_transpose_a_False_use_transpose_b_False_cuda _ TestLinalgCUDA.test__int_mm_k_32_n_32_use_transpose_a_False_use_transpose_b_False_cuda _ Traceback (most recent call last): File "/usr/lib/python3.10/unittest/case.py", line 59, in testPartExecutor yield File "/usr/lib/python3.10/unittest/case.py", line 591, in run self._callTestMethod(testMethod) File "/usr/lib/python3.10/unittest/case.py", line 549, in _callTestMethod method() File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2410, in wrapper method(args, kwargs) File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2410, in wrapper method(args, kwargs) File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_device_type.py", line 428, in instantiated_test raise rte File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_device_type.py", line 415, in instantiated_test result = test(self, param_kwargs) File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_device_type.py", line 1084, in only_fn return fn(slf, args, *kwargs) File "/opt/pytorch/pytorch/test/test_linalg.py", line 5719, in test__int_mm _test(17, k, n, use_transpose_a, use_transpose_b) File "/opt/pytorch/pytorch/test/test_linalg.py", line 5680, in _test c_int32 = torch._int_mm(a_int8, b_int8) RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with transpose_mat1 0 transpose_mat2 0 m 32 n 17 k 32 mat1_ld 32 mat2_ld 32 result_ld 32 abType 3 cType 10 computeType 72 scaleType 10 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/113327 Approved by: https://github.com/malfet	2023-11-13 22:13:44 +00:00
Oleg Bulatov	192477b5ba	Enable flake8-bugbear B020 lint (#110823 ) Fixes part of https://github.com/pytorch/pytorch/issues/106571 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110823 Approved by: https://github.com/Skylion007	2023-10-24 22:43:47 +00:00
Scruel Tao	503f44fbb8	Fix: preserve input's NaN values to prevent undefined behavior for `matrix_exp` function (#111539 ) Currently, if the input matrices of the `matrix_exp` function contain NaN values (small batches), it keeps outputting a "normal" result without any NaN value in it, which can cause problems that go unnoticed. This PR prevents such undefined behavior by "bringing back" those NaN values. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111539 Approved by: https://github.com/lezcano	2023-10-19 09:07:36 +00:00
Yanbo Liang	29048be41c	[Reland] Add int4mm kernel (#111403 ) This is a reland for #110914, #111327 and #111390 Pull Request resolved: https://github.com/pytorch/pytorch/pull/111403 Approved by: https://github.com/Chillee	2023-10-17 06:33:18 +00:00
PyTorch MergeBot	408e991dfe	Revert "Quant: add weight int4pack mm kernel (#110914 )" This reverts commit `9980876cab`. Reverted https://github.com/pytorch/pytorch/pull/110914 on behalf of https://github.com/jeanschmidt due to Breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110914#issuecomment-1765302621))	2023-10-16 21:27:26 +00:00
Evgeni Burovski	48989bc820	trace frames with np.ndarray (#110512 ) Fixes #109604 Resubmit gh-109715 + several skips and small fixes to make tests pass. The main fix here is by @ysiraichi : previously, dynamo did not resume tracing numpy ndarrays after a graph break. While at it, fix several small issues Yukio's fix uncovers: - graph break gracefully on numpy dtypes which do not map to torch.dtypes (uint16 etc) - recognize array scalars in dynamo, treat them as 0D ndarrays - make sure that iterating over torch.ndarray generates arrays not bare tensors Pull Request resolved: https://github.com/pytorch/pytorch/pull/110512 Approved by: https://github.com/lezcano	2023-10-15 00:56:10 +00:00
Yanbo Liang	9980876cab	Quant: add weight int4pack mm kernel (#110914 ) Adding the weight int4pack mm CUDA kernel. The kernel comes from the tinygemm project, which was developed by Jeff Johnson. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110914 Approved by: https://github.com/Chillee	2023-10-13 01:21:18 +00:00
Liao, Xuan	8820dda943	Revise def of contiguity in bmm (#110811 ) Fixes #108754. `hf_T5_generate` would encounter a regression when calling `extern_kernels.bmm`, if one input is `reinterpret_tensor(buf2, (8, 1, 64), (64, 0, 1))` rather than `reinterpret_tensor(buf2, (8, 1, 64), (64, 512, 1), 0)`. As @jgong5 mentioned in a comment, the two tensors are in fact equivalent: the stride doesn't matter when the corresponding size is 1. We revise the definition of contiguity in `bmm` to treat this situation as a contiguous case. Thus, when the stride equals 0, `extern_kernels.bmm` can still use the MKL `gemm` to gain performance. Speedup of `hf_T5_generate` is 1.343x now and 1.138x before, with script `bash inductor_single_test.sh multiple inference performance torchbench hf_T5_generate float32 first dynamic default 0`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110811 Approved by: https://github.com/jgong5, https://github.com/lezcano, https://github.com/Chillee	2023-10-10 06:48:08 +00:00
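The observation in the commit above, that a stride is irrelevant for a dimension of size 1, can be sketched in plain Python. The function `is_contiguous_ignoring_unit_dims` is a hypothetical stand-in for a row-major contiguity check; it is not the ATen code the PR changed.

```python
def is_contiguous_ignoring_unit_dims(sizes, strides):
    """Row-major contiguity check that skips size-1 dimensions.

    Hypothetical helper for illustration; not the actual `bmm` check.
    """
    expected = 1
    # Walk from the innermost dimension outwards.
    for size, stride in zip(reversed(sizes), reversed(strides)):
        if size == 1:
            # No step is ever taken along a size-1 dimension,
            # so any stride value describes the same memory layout.
            continue
        if stride != expected:
            return False
        expected *= size
    return True


# The two layouts from the commit message describe the same tensor.
zero_stride = is_contiguous_ignoring_unit_dims((8, 1, 64), (64, 0, 1))
regular_stride = is_contiguous_ignoring_unit_dims((8, 1, 64), (64, 512, 1))
# A zero stride on a dimension larger than 1 is still non-contiguous.
broadcast_like = is_contiguous_ignoring_unit_dims((8, 2, 64), (64, 0, 1))
```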
CaoE	9399e0b1ff	add fp16 support for gemm (#99498 )

### Testing

Native matmul vs. mkldnn matmul on SPR (with avx512_fp16 support)

single core:

Input | Naïve impl / ms | oneDNN / ms | Speed up
-- | -- | -- | --
M: 128, N: 128, K: 128, trans_a: False, trans_b: False | 2010.387 | 64.700 | 31.072
M: 128, N: 256, K: 128, trans_a: False, trans_b: False | 4027.116 | 107.780 | 37.364
M: 8192, N: 768, K: 768, trans_a: False, trans_b: False | 28685868.488 | 90663.008 | 316.401

56 cores:

Input | Naïve impl / ms | oneDNN / ms | Speed up
-- | -- | -- | --
M: 128, N: 128, K: 128, trans_a: False, trans_b: False | 5.091 | 0.24 | 211.30
M: 128, N: 128, K: 128, trans_a: False, trans_b: True | 5.224 | 0.23 | 220.09
M: 128, N: 256, K: 128, trans_a: False, trans_b: False | 10.006 | 0.30 | 330.31
M: 8192, N: 768, K: 768, trans_a: False, trans_b: False | 29435.372 | 1.770 | 1662.80
M: 8192, N: 768, K: 768, trans_a: False, trans_b: True | 31464.961 | 1.728 | 18204.76
M: 8192, N: 768, K: 3072, trans_a: False, trans_b: False | 115035.849 | 7.990 | 14396.90
M: 8192, N: 768, K: 3072, trans_a: False, trans_b: True | 122981.023 | 7.725 | 15918.34
Batch: 768, M: 128, N: 64, K: 128 | 2032.523 | 0.705 | 2882.23

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99498 Approved by: https://github.com/jgong5, https://github.com/malfet	2023-09-28 01:03:50 +00:00
Kurt Mohler	3f88e3105f	Reland: Remove remaining global `set_default_dtype` calls from tests (#108088 ) Fixes #68972 Relands #107246 To avoid causing Meta-internal CI failures, this PR avoids always asserting that the default dtype is float in the `TestCase.setUp/tearDown` methods. Instead, the assert is only done if `TestCase._default_dtype_check_enabled == True`. `_default_dtype_check_enabled` is set to True in the `if __name__ == "__main__":` blocks of all the relevant test files that have required changes for this issue Pull Request resolved: https://github.com/pytorch/pytorch/pull/108088 Approved by: https://github.com/ezyang	2023-09-07 03:04:34 +00:00
PyTorch MergeBot	161ea463e6	Revert "Remove remaining global `set_default_dtype` calls from tests (#107246 )" This reverts commit `aa8ea1d787`. Reverted https://github.com/pytorch/pytorch/pull/107246 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/107246#issuecomment-1693838522))	2023-08-25 19:34:55 +00:00
Kurt Mohler	aa8ea1d787	Remove remaining global `set_default_dtype` calls from tests (#107246 ) Fixes #68972 Pull Request resolved: https://github.com/pytorch/pytorch/pull/107246 Approved by: https://github.com/ezyang	2023-08-24 16:10:48 +00:00
Xiao Wang	6bfb4f7c4b	[CUDA][Linalg] Patch crash of `linalg.eigh` when input matrix is ill-conditioned, in some cusolver versions (#107082 ) Related: https://github.com/pytorch/pytorch/issues/94772, https://github.com/pytorch/pytorch/issues/105359 I can locally reproduce this crash with pytorch 2.0.1 stable pip binary. The test already passes with the latest cuda 12.2 release. Re: https://github.com/pytorch/pytorch/issues/94772#issuecomment-1658909998 > From discussion in triage review: - [x] we should add a test to prevent regressions - [x] properly document support wrt different CUDA versions - [x] possibly add support using MAGMA Pull Request resolved: https://github.com/pytorch/pytorch/pull/107082 Approved by: https://github.com/lezcano	2023-08-16 21:15:15 +00:00
Andres Lugo	7fb543e36d	[ROCm] Enable hipsolver unit tests for batched linalg drivers (#106620 ) This is a follow up to https://github.com/pytorch/pytorch/pull/105881 and replaces https://github.com/pytorch/pytorch/pull/103203 The batched linalg drivers from 103203 were brought in as part of the first PR. This change enables the ROCm unit tests that were enabled as a result of that change. Along with a fix to prioritize hipsolver over magma when the preferred linalg backend is set to `default` The following 16 unit tests will be enabled for rocm in this change: - test_inverse_many_batches_cuda* - test_inverse_errors_large_cuda* - test_linalg_solve_triangular_large_cuda* - test_lu_solve_batched_many_batches_cuda* Pull Request resolved: https://github.com/pytorch/pytorch/pull/106620 Approved by: https://github.com/lezcano	2023-08-15 15:54:27 +00:00
Nikita Shulga	387e3b04fa	Reenable `torch._int_mm` testing on newer CUDAs (#106840 ) Looks like "it just works" on SM80+ on CUDA-12 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106840 Approved by: https://github.com/vkuzo	2023-08-09 16:23:30 +00:00
nikitaved	a61a0fe490	test_linalg: triangular_solve - make well_conditioned well conditioned (#105919 ) `well_conditioned=True` does not guarantee that the samples for `triangular_solve` are actually well-conditioned. This PR fixes that. This issue was discovered in https://github.com/pytorch/pytorch/pull/104425. Pull Request resolved: https://github.com/pytorch/pytorch/pull/105919 Approved by: https://github.com/lezcano	2023-07-26 09:21:12 +00:00
Andres Lugo-Reyes	c89aec207a	[ROCm] reduce tolerance for triangular solve with well_conditioned set to True (#104425 ) The current test case produces an edge-case input: a single generated tensor fails the tolerance assertion, on ROCm only and only for float32. We have reviewed the logic with our libraries team and have discovered the discrepancy is due to a difference in order of operations on AMD GPUs. They came back with "working as intended" and found no perceivable bug. Interestingly, if we change the values in ks, ns, or bs, the test passes on ROCm. These particular sizes in this particular order generate a single problematic input that causes the assertion to fail the tolerance check by ~0.07. Again, this is not a bug, just differences in implementation. This PR loosens the tolerance for ROCm only. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104425 Approved by: https://github.com/jeffdaily, https://github.com/nikitaved, https://github.com/lezcano	2023-07-25 05:03:09 +00:00
Andres Lugo-Reyes	b78341dda9	Use hipsolver for default svd case on ROCm (#103540 ) Fixes #102678 Fixes #102629 Fixes #102558 HipSOLVER performance on ROCm5.4.2 and later no longer serves as a massive bottleneck. Additionally, using magma on rocm in this case caused test_compare_cpu_lialg_pinv_singular_cuda_float32 to fail. Using hipSOLVER, the test now passes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103540 Approved by: https://github.com/lezcano	2023-07-24 20:50:56 +00:00
Aaron Gokaslan	6d43c89f37	[BE]: Update Ruff to 0.0.280 (#105724 ) Removes unused loop values in python dictionary iteration. Automated fix from Ruff master Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724 Approved by: https://github.com/ezyang, https://github.com/janeyx99	2023-07-22 23:03:34 +00:00
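The cleanup named in the commit above, dropping loop values that are never used while iterating over a dictionary, might look like the following. The commit does not name the Ruff rule involved, and the dictionary here is invented for illustration.

```python
counts = {"a": 1, "b": 2}

# Before: .items() unpacks a value that the loop body never reads.
keys_before = []
for key, _value in counts.items():
    keys_before.append(key)

# After: iterate over the keys directly.
keys_after = []
for key in counts:
    keys_after.append(key)
```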
Justin Chu	4cc1745b13	[BE] f-stringify torch/ and scripts (#105538 ) This PR is a follow up on the pyupgrade series to convert more strings to use f-strings using `flynt`. - https://docs.python.org/3/reference/lexical_analysis.html#f-strings - https://pypi.org/project/flynt/ Command used: ``` flynt torch/ -ll 120 flynt scripts/ -ll 120 flynt tools/ -ll 120 ``` and excluded `collect_env.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538 Approved by: https://github.com/ezyang, https://github.com/malfet	2023-07-21 19:35:24 +00:00
Justin Chu	73e1455327	[BE] Enable ruff's UP rules and autoformat test/ (#105434 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434 Approved by: https://github.com/albanD	2023-07-19 20:36:06 +00:00
PyTorch MergeBot	f353d17755	Revert "[ROCm] reduce tolerance for triangular solve with well_conditioned set to True (#104425 )" This reverts commit `ef7bc3e23d`. Reverted https://github.com/pytorch/pytorch/pull/104425 on behalf of https://github.com/huydhn due to Sorry for reverting your PR. It is failing CUDA test in trunk built in debug mode https://github.com/pytorch/pytorch/actions/runs/5429187622/jobs/9874360641 ([comment](https://github.com/pytorch/pytorch/pull/104425#issuecomment-1617247699))	2023-07-03 04:18:04 +00:00
Andres Lugo-Reyes	ef7bc3e23d	[ROCm] reduce tolerance for triangular solve with well_conditioned set to True (#104425 ) The current test case produces an edge-case input: a single generated tensor fails the tolerance assertion, on ROCm only and only for float32. We have reviewed the logic with our libraries team and have discovered the discrepancy is due to a difference in order of operations on AMD GPUs. They came back with "working as intended" and found no perceivable bug. Interestingly, if we change the values in ks, ns, or bs, the test passes on ROCm. These particular sizes in this particular order generate a single problematic input that causes the assertion to fail the tolerance check by ~0.07. Again, this is not a bug, just differences in implementation. This PR loosens the tolerance for ROCm only. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104425 Approved by: https://github.com/jeffdaily, https://github.com/nikitaved, https://github.com/lezcano	2023-06-30 21:43:42 +00:00
cyy	54cb61f7d9	enable ASAN on some tests (#103647 ) Enabling more tests on ASAN, meanwhile we disable float-divide-by-zero and float-cast-overflow, both are disabled because they are also disabled by default in latest clang. The following cited doc explains the reasons. ``` -fsanitize=float-cast-overflow: Conversion to, from, or between floating-point types which would overflow the destination. Because the range of representable values for all floating-point types supported by Clang is [-inf, +inf], the only cases detected are conversions from floating point to integer types. -fsanitize=float-divide-by-zero: Floating point division by zero. This is undefined per the C and C++ standards, but is defined by Clang (and by ISO/IEC/IEEE 60559 / IEEE 754) as producing either an infinity or NaN value, so is not included in -fsanitize=undefined. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/103647 Approved by: https://github.com/kit1980	2023-06-28 02:17:14 +00:00
Adnan Akhundov	e4d8504ebc	Unify GELU tanh approximation in _addmm_activation GPU back-end (#104061 ) Summary: Currently, cuBLASLt-based fused GELU epilogue in the GPU back-end of the `_addmm_activation` operator uses tanh approximation, whereas other code paths on GPU don't. With this PR, the GELU tanh approximation is switched on in all back-end code paths of `_addmm_activation` on GPU for better consistency. Test Plan: ``` $ python test/test_linalg.py -k test_addmm_relu -v test_addmm_relu_cpu_bfloat16 (__main__.TestLinalgCPU.test_addmm_relu_cpu_bfloat16) ... ok test_addmm_relu_cpu_float32 (__main__.TestLinalgCPU.test_addmm_relu_cpu_float32) ... ok test_addmm_relu_cpu_float64 (__main__.TestLinalgCPU.test_addmm_relu_cpu_float64) ... ok test_addmm_relu_cuda_bfloat16 (__main__.TestLinalgCUDA.test_addmm_relu_cuda_bfloat16) ... ok test_addmm_relu_cuda_float32 (__main__.TestLinalgCUDA.test_addmm_relu_cuda_float32) ... ok test_addmm_relu_cuda_float64 (__main__.TestLinalgCUDA.test_addmm_relu_cuda_float64) ... ok ---------------------------------------------------------------------- Ran 6 tests in 1.896s OK $ python test/test_linalg.py -k test_addmm_gelu -v test_addmm_gelu_cpu_bfloat16 (__main__.TestLinalgCPU.test_addmm_gelu_cpu_bfloat16) ... ok test_addmm_gelu_cpu_float32 (__main__.TestLinalgCPU.test_addmm_gelu_cpu_float32) ... ok test_addmm_gelu_cpu_float64 (__main__.TestLinalgCPU.test_addmm_gelu_cpu_float64) ... ok test_addmm_gelu_cuda_bfloat16 (__main__.TestLinalgCUDA.test_addmm_gelu_cuda_bfloat16) ... ok test_addmm_gelu_cuda_float32 (__main__.TestLinalgCUDA.test_addmm_gelu_cuda_float32) ... ok test_addmm_gelu_cuda_float64 (__main__.TestLinalgCUDA.test_addmm_gelu_cuda_float64) ... ok ---------------------------------------------------------------------- Ran 6 tests in 2.050s OK ``` Reviewers: @eellison Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/104061 Approved by: https://github.com/eellison	2023-06-24 18:36:45 +00:00
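For context on the "tanh approximation" mentioned in the commit above, the sketch below compares the exact GELU (defined through the error function) with the commonly used tanh-based approximation. It uses plain Python scalars and says nothing about how the cuBLASLt epilogue is implemented; the function names and sample grid are illustrative.

```python
import math


def gelu_exact(x):
    """GELU defined through the Gaussian CDF: x * Phi(x)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Widely used tanh-based approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Largest gap between the two variants on a grid over [-4, 4].
xs = [i / 10 for i in range(-40, 41)]
max_gap = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)
```

The two variants agree closely but not bit-for-bit, which is why a test comparing a fused kernel against a reference needs to use the same variant on both sides.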
Adnan Akhundov	f818036f85	Fix test_addmm_gelu assertion on Windows CUDA (#104031 ) Summary: This PR fixes the wrong assertion in the `test_addmm_gelu` happening in the Windows CUDA CI job caused by #103811. The addmm + GELU fusion is likely not happening (or not using the tanh approximation) on Windows. See [this comment](https://github.com/pytorch/pytorch/pull/103811#issuecomment-1601936203) in the #103811 for the details of the error. Test Plan: ``` $ python test/test_linalg.py -k test_addmm_relu -v test_addmm_relu_cpu_bfloat16 (__main__.TestLinalgCPU.test_addmm_relu_cpu_bfloat16) ... ok test_addmm_relu_cpu_float32 (__main__.TestLinalgCPU.test_addmm_relu_cpu_float32) ... ok test_addmm_relu_cpu_float64 (__main__.TestLinalgCPU.test_addmm_relu_cpu_float64) ... ok test_addmm_relu_cuda_bfloat16 (__main__.TestLinalgCUDA.test_addmm_relu_cuda_bfloat16) ... ok test_addmm_relu_cuda_float32 (__main__.TestLinalgCUDA.test_addmm_relu_cuda_float32) ... ok test_addmm_relu_cuda_float64 (__main__.TestLinalgCUDA.test_addmm_relu_cuda_float64) ... ok ---------------------------------------------------------------------- Ran 6 tests in 2.131s OK $ python test/test_linalg.py -k test_addmm_gelu -v test_addmm_gelu_cpu_bfloat16 (__main__.TestLinalgCPU.test_addmm_gelu_cpu_bfloat16) ... ok test_addmm_gelu_cpu_float32 (__main__.TestLinalgCPU.test_addmm_gelu_cpu_float32) ... ok test_addmm_gelu_cpu_float64 (__main__.TestLinalgCPU.test_addmm_gelu_cpu_float64) ... ok test_addmm_gelu_cuda_bfloat16 (__main__.TestLinalgCUDA.test_addmm_gelu_cuda_bfloat16) ... ok test_addmm_gelu_cuda_float32 (__main__.TestLinalgCUDA.test_addmm_gelu_cuda_float32) ... ok test_addmm_gelu_cuda_float64 (__main__.TestLinalgCUDA.test_addmm_gelu_cuda_float64) ... ok ---------------------------------------------------------------------- Ran 6 tests in 2.194s OK ``` Reviewers: @eellison @huydhn Subscribers: Tasks: Tags: Differential Revision: [D46931688](https://our.internmc.facebook.com/intern/diff/D46931688) Pull Request resolved: https://github.com/pytorch/pytorch/pull/104031 Approved by: https://github.com/huydhn, https://github.com/malfet	2023-06-22 17:42:33 +00:00
Adnan Akhundov	1c79003b3c	Enable addmm + GELU epilogue fusion via cuBLASLt (#103811 ) Summary: Previously, addmm + GELU epilogue fusion was unconditionally disabled in `ATen/native/cuda/Blas.cpp` due to compilation and numerical issues in CUDA <= 11.4. This PR: 1. Enables addmm + GELU epilogue fusion for CUDA >= 11.8. 2. Restricts the usage of fused addmm epilogue to contiguous output (bugfix). 3. Extends unit tests with addmm epilogue fusion and GELU activation paths. Test Plan: $ python test/test_linalg.py -k test_addmm_relu -v test_addmm_relu_cpu_bfloat16 (__main__.TestLinalgCPU.test_addmm_relu_cpu_bfloat16) ... ok test_addmm_relu_cpu_float32 (__main__.TestLinalgCPU.test_addmm_relu_cpu_float32) ... ok test_addmm_relu_cpu_float64 (__main__.TestLinalgCPU.test_addmm_relu_cpu_float64) ... ok test_addmm_relu_cuda_bfloat16 (__main__.TestLinalgCUDA.test_addmm_relu_cuda_bfloat16) ... ok test_addmm_relu_cuda_float32 (__main__.TestLinalgCUDA.test_addmm_relu_cuda_float32) ... ok test_addmm_relu_cuda_float64 (__main__.TestLinalgCUDA.test_addmm_relu_cuda_float64) ... ok $ python test/test_linalg.py -k test_addmm_gelu -v test_addmm_gelu_cpu_bfloat16 (__main__.TestLinalgCPU.test_addmm_gelu_cpu_bfloat16) ... ok test_addmm_gelu_cpu_float32 (__main__.TestLinalgCPU.test_addmm_gelu_cpu_float32) ... ok test_addmm_gelu_cpu_float64 (__main__.TestLinalgCPU.test_addmm_gelu_cpu_float64) ... ok test_addmm_gelu_cuda_bfloat16 (__main__.TestLinalgCUDA.test_addmm_gelu_cuda_bfloat16) ... ok test_addmm_gelu_cuda_float32 (__main__.TestLinalgCUDA.test_addmm_gelu_cuda_float32) ... ok test_addmm_gelu_cuda_float64 (__main__.TestLinalgCUDA.test_addmm_gelu_cuda_float64) ... ok Reviewers: @eellison Differential Revision: [D46829884](https://our.internmc.facebook.com/intern/diff/D46829884) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103811 Approved by: https://github.com/IvanYashchuk, https://github.com/eellison	2023-06-21 19:59:40 +00:00
PyTorch MergeBot	b34ac35b77	Revert "Use hipsolver for default svd case on ROCm (#103540 )" This reverts commit `0a4a7d4b26`. Reverted https://github.com/pytorch/pytorch/pull/103540 on behalf of https://github.com/huydhn due to Turn out that the failure discussed in https://github.com/pytorch/pytorch/issues/102629 is not a fluke and ROCm signal in trunk is red atm ([comment](https://github.com/pytorch/pytorch/pull/103540#issuecomment-1595309297))	2023-06-16 20:59:40 +00:00
Andres Lugo-Reyes	0a4a7d4b26	Use hipsolver for default svd case on ROCm (#103540 ) Fixes #102678 Fixes #102629 Fixes #102558 HipSOLVER performance on ROCm5.4.2 and later no longer serves as a massive bottleneck. Additionally, using magma on rocm in this case caused test_compare_cpu_lialg_pinv_singular_cuda_float32 to fail. Using hipSOLVER, the test now passes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103540 Approved by: https://github.com/lezcano	2023-06-16 14:57:34 +00:00
Bearnardd	2abad0c184	Add dtype check baddbmm (#102659 ) Fixes part of the #100838 related to disabling support for non matching dtypes for input/batches for `baddbmm` operator. * [x] added dtype checks * [x] added test case Pull Request resolved: https://github.com/pytorch/pytorch/pull/102659 Approved by: https://github.com/ngimel	2023-06-13 00:31:06 +00:00
Xinya Zhang	1d857586f1	[ROCM] enable hipSOLVER backend for linalg.ldl_factor (#102665 ) * Add complex dtype support for linalg.ldl_factor * Fixes SWDEV-360139 * Enable the following 19 tests for ROCM + test_decomp.py TestDecompCUDA.test_comprehensive_linalg_ldl_factor_cuda_complex128 + test_decomp.py TestDecompCUDA.test_comprehensive_linalg_ldl_factor_cuda_complex64 + test_decomp.py TestDecompCUDA.test_comprehensive_linalg_ldl_factor_ex_cuda_complex128 + test_decomp.py TestDecompCUDA.test_comprehensive_linalg_ldl_factor_ex_cuda_complex64 + test_meta.py TestMetaCUDA.test_dispatch_meta_linalg_ldl_factor_cuda_complex128 + test_meta.py TestMetaCUDA.test_dispatch_meta_linalg_ldl_factor_cuda_complex64 + test_meta.py TestMetaCUDA.test_dispatch_meta_linalg_ldl_factor_ex_cuda_complex128 + test_meta.py TestMetaCUDA.test_dispatch_meta_linalg_ldl_factor_ex_cuda_complex64 + test_meta.py TestMetaCUDA.test_meta_linalg_ldl_factor_cuda_complex128 + test_ops.py TestCommonCUDA.test_noncontiguous_samples_linalg_ldl_factor_cuda_complex64 + test_ops.py TestCommonCUDA.test_noncontiguous_samples_linalg_ldl_factor_ex_cuda_complex64 + test_ops.py TestCommonCUDA.test_variant_consistency_eager_linalg_ldl_factor_cuda_complex64 + test_ops.py TestCommonCUDA.test_variant_consistency_eager_linalg_ldl_factor_ex_cuda_complex64 + test_ops.py TestMathBitsCUDA.test_conj_view_linalg_ldl_factor_cuda_complex64 + test_ops.py TestMathBitsCUDA.test_conj_view_linalg_ldl_factor_ex_cuda_complex64 + test_ops.py TestMathBitsCUDA.test_neg_conj_view_linalg_ldl_factor_cuda_complex128 + test_ops.py TestMathBitsCUDA.test_neg_conj_view_linalg_ldl_factor_ex_cuda_complex128 + test_ops_jit.py TestJitCUDA.test_variant_consistency_jit_linalg_ldl_factor_cuda_complex64 + test_ops_jit.py TestJitCUDA.test_variant_consistency_jit_linalg_ldl_factor_ex_cuda_complex64 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102665 Approved by: https://github.com/lezcano	2023-06-08 20:05:01 +00:00
Andres Lugo-Reyes	eaffd98880	Enable hipSOLVER in ROCm builds (#97370 ) Enables the hipSolver backend for ROCm builds -------------------------------------------------------------------------- - Minimum ROCm version requirement - 5.3 - Introduces new macro USE_LINALG_SOLVER that controls enablement of both cuSOLVER and hipSOLVER - Adds hipSOLVER API to hipification process - combines hipSOLVER and hipSPARSE mappings into single SPECIAL map that takes priority among normal mappings - Torch APIs to be moved to the hipsolver backend (as opposed to magma) include: torch.svd(), torch.geqrf(), torch.orgqr(), torch.ormqr() - Will enable 100+ linalg unit tests for ROCm Pull Request resolved: https://github.com/pytorch/pytorch/pull/97370 Approved by: https://github.com/malfet	2023-05-31 16:53:23 +00:00
PyTorch MergeBot	a64e97b62c	Revert "[dynamo 3.11] enable other torch 3.11 dynamo-related tests (#99180 )" This reverts commit `aa8dcab1ce`. Reverted https://github.com/pytorch/pytorch/pull/99180 on behalf of https://github.com/huydhn due to Sorry for reverting this, but linux-bionic-py3.11-clang9 test starts to timeout after this taking more than 3h30m. This is probably a landrace ([comment](https://github.com/pytorch/pytorch/pull/99180#issuecomment-1545982256))	2023-05-12 16:18:22 +00:00
William Wen	aa8dcab1ce	[dynamo 3.11] enable other torch 3.11 dynamo-related tests (#99180 ) Notes: - No segfaults observed in any CI tests: dynamo unittests, inductor unittests, dynamo-wrapped pytorch tests. So we remove the warning that using dynamo 3.11 may result in segfaults. - Some dynamo-wrapped pytorch tests hang. They will be skipped in the dynamo-wrapped test suite and will be addressed in a future PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/99180 Approved by: https://github.com/malfet	2023-05-12 07:03:09 +00:00
soulitzer	6a02342131	Check inputs have same dtype in addmm_impl_cpu_ even if input has zero numel (#100274 ) Fixes #99226 When an input has zero numel, addmm_impl_cpu_'s check that the inputs have the same dtype is bypassed. This PR adds a check before the early return. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100274 Approved by: https://github.com/ngimel	2023-04-29 00:07:54 +00:00
Irem Yuksel	2504089329	Enable test_linalg_solve_triangular_large (#96182 ) PR to see if test fails after removing skip line Fixes #70111 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96182 Approved by: https://github.com/lezcano	2023-04-28 12:54:27 +00:00
Larry Liu	687afeb686	[dynamo][numpy] Add NumpyTensorVariable to translate ndarray attribute calls to tensor attributes (#95849 ) Issue: #93684 # Problem Reduce graph breaks when dynamo compiles python functions containing numpy functions and ndarray operations. # Design (as I know it) * Use torch_np.ndarray(a wrapper of tensor) to back a `VariableTracker`: `NumpyTensorVariable`. * Translate all attributes and methods calls, on ndarray, to torch_np.ndarray equivalent. This PR adds `NumpyTensorVariable` and supports: 1. tensor to ndarray, ndarray to tensor 2. numpy functions such as numpy.meshgrid() 3. ndarray attributes such as `itemsize`, `stride` Next PR will handle returning `np.ndarray` and add support for ndarray methods Pull Request resolved: https://github.com/pytorch/pytorch/pull/95849 Approved by: https://github.com/ezyang	2023-04-27 16:18:35 +00:00
eqy	2fddcf0fc0	[CUDA][CUDA 11] Remove more CUDA 11 version checks (#92934 ) Working on removing stragglers missed in previous CUDA version < 11.0 cleanup PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92934 Approved by: https://github.com/ngimel	2023-03-30 19:49:52 +00:00
Aaron Gokaslan	47dca20d80	[BE] Enable flake8-comprehension rule C417 (#97880 ) Enables flake8-comprehension rule C417. Ruff autogenerated these fixes to the codebase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97880 Approved by: https://github.com/ezyang, https://github.com/kit1980, https://github.com/albanD	2023-03-30 14:34:24 +00:00
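flake8-comprehensions rule C417 flags `map()` calls whose function argument is a lambda, since a comprehension or generator expression says the same thing more directly. A minimal before/after sketch, with made-up data rather than code from the PR:

```python
values = [1, 2, 3]

# Before: map() wrapped around a lambda (what C417 reports).
doubled_map = list(map(lambda x: x * 2, values))

# After: the equivalent list comprehension.
doubled_comprehension = [x * 2 for x in values]
```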

470 Commits