pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
eqy	5dbee5691c	[cuDNN][Convolution][TF32][64bit] Add `tf32_on_and_off` decorator to conv3d 64bit test (#161004 ) cuDNN has new generated kernels that can use TF32. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161004 Approved by: https://github.com/janeyx99, https://github.com/Skylion007	2025-09-10 21:39:35 +00:00
Jeff Daily	99f356fa58	[ROCm] revamp miopen integration (#161687 ) Update sources under ATen/miopen and ATen/native/miopen to align with best practices. Avoid reshape_ calls inside backward operations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161687 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-09-03 22:28:09 +00:00
Eddie Yan	f391afe9bf	[cuDNN][convolution] remove redundant conv3d 64bit test (#161177 ) turns out it's the same as ``` @onlyCUDA @largeTensorTest("40GB") @largeTensorTest("24GB", "cpu") @tf32_on_and_off(0.005) def test_conv3d_64bit_indexing(self, device): x = torch.rand(1, 32, 512, 512, 256) m = torch.nn.Conv3d(32, 1, kernel_size=1, padding=0, stride=1, bias=False) yref = m(x) y = m.to(device=device)(x.to(device=device)) self.assertEqual(yref, y) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/161177 Approved by: https://github.com/Skylion007	2025-08-25 15:01:05 +00:00
eqy	9903ca4f70	[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel (#156140 ) The native kernel doesn't support batch splitting so the previous check wasn't aggressive enough in dispatching to cuDNN https://github.com/pytorch/pytorch/issues/155225 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156140 Approved by: https://github.com/ngimel, https://github.com/atalman	2025-08-12 18:07:41 +00:00
Nikita Shulga	e06b110f73	[Testing] Add MPS to NATIVE_DEVICES (#153835 ) This would allow me to enable more opinfo tests against MPS device eventually and was supposed to be a very simple test, but actually required minor adjustments to lots of test files, namely: - Introduce `all_mps_types_and` that is very similar to `all_types_and`, but skips `float64` - Decorate lots of tests with `@dtypesIfMPS(*all_mps_types())` - Skip `test_from_dlpack_noncontinguous` as it currently crashes (needs to be fixed) - Add lots of `expectedFailureIfMPS` - Delete all `@onlyNativeDeviceTypesAnd("mps")` <sarcasm> I love how well documented this variable is </sarcasm> Pull Request resolved: https://github.com/pytorch/pytorch/pull/153835 Approved by: https://github.com/Skylion007	2025-08-05 18:57:35 +00:00
eqy	c89fa88acb	[conv][cuDNN][64-bit indexing] reduce memory usage of depthwise conv 64-bit indexing test (#158981 ) Use half instead for reduced memory usage Pull Request resolved: https://github.com/pytorch/pytorch/pull/158981 Approved by: https://github.com/soulitzer, https://github.com/Skylion007	2025-07-25 23:58:45 +00:00
PyTorch MergeBot	317af4c87b	Revert "[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel (#156140 )" This reverts commit `a5f59cc2ea`. Reverted https://github.com/pytorch/pytorch/pull/156140 on behalf of https://github.com/atalman due to breaks internal builds ([comment](https://github.com/pytorch/pytorch/pull/156140#issuecomment-2988441548))	2025-06-19 15:09:29 +00:00
eqy	a5f59cc2ea	[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel (#156140 ) The native kernel doesn't support batch splitting so the previous check wasn't aggressive enough in dispatching to cuDNN https://github.com/pytorch/pytorch/issues/155225 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156140 Approved by: https://github.com/ngimel	2025-06-18 17:32:36 +00:00
eqy	bd3c32916c	[cuDNN] Enabled dilation for deterministic convolutions in cuDNN (#154292 ) Provides order-of-magnitude speedup over fallback impl. https://github.com/pytorch/pytorch/issues/28777 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154292 Approved by: https://github.com/Skylion007 Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2025-06-11 23:35:52 +00:00
Joona Havukainen	981bdb39ca	Enable ConvTranspose3D for FP32 and Complex64 (#154696 ) Fixes #154615 Enables using ConvTranspose3D since it seems support exists both on MacOS 14 and 15. For the half dtypes the discrepancy of CPU and GPU implementations is too large to conclude whether there is a bug in the implementation or not without a more rigorous study on what bounds are there to the expected error. So they are left unsupported for now and an assert is added to notify the user if the op is called with fp16 or bf16 inputs. Tests for ConvTranspose3D were enabled for the supported data types. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154696 Approved by: https://github.com/malfet	2025-06-02 16:24:03 +00:00
Aaron Gokaslan	dbad6d71c7	[BE][Ez]: Unskip conv1d MPS test (#154795 ) Fixes issue I noticed where conv1d test is skipped for complex types unconditionally Pull Request resolved: https://github.com/pytorch/pytorch/pull/154795 Approved by: https://github.com/jansel	2025-05-31 23:01:19 +00:00
eqy	823a35807c	[CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions (#153101 ) For #152816 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153101 Approved by: https://github.com/Skylion007	2025-05-20 20:19:03 +00:00
PyTorch MergeBot	bf0fe4f828	Revert "[CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions (#153101 )" This reverts commit `ced90d23d3`. Reverted https://github.com/pytorch/pytorch/pull/153101 on behalf of https://github.com/jeanschmidt due to Seems to have introduced breakages on main, tentative revert: https://github.com/pytorch/pytorch/actions/runs/15024667248/job/42224521705 ([comment](https://github.com/pytorch/pytorch/pull/153101#issuecomment-2881208171))	2025-05-14 18:52:07 +00:00
eqy	ced90d23d3	[CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions (#153101 ) For #152816 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153101 Approved by: https://github.com/Skylion007	2025-05-14 15:22:47 +00:00
Eddie Yan	ec68d082a1	[CUDA][TF32] Account for TF32 in `test_conv2d_same_padding` (#152618 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/152618 Approved by: https://github.com/msaroufim, https://github.com/Skylion007	2025-05-02 20:19:00 +00:00
Jagadish Krishnamoorthy	0d99b4e9e2	ROCm: Enable tf32 testing on test_nn (#148945 ) Add tf32 support for ROCm tests. test command: python test/test_nn.py -v Pull Request resolved: https://github.com/pytorch/pytorch/pull/148945 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-04-28 23:01:04 +00:00
Alvaro-Kothe	8ce3d4a541	test(Conv3d): use correct class for `test_Conv3d_module_same_padding` (#152187 ) The test for the class `Conv3d` is calling `Conv2d`. This PR just ensures that we are testing the correct module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152187 Approved by: https://github.com/Skylion007	2025-04-28 16:59:12 +00:00
cyy	970fefcc53	Remove outdated skipCUDAIfCudnnVersionLessThan decoration (#148940 ) Test conditions for CUDNN 7 and 8 were removed because we have moved to CUDNN 9. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148940 Approved by: https://github.com/mikaylagawarecki	2025-03-13 18:02:50 +00:00
cyy	a5f6b24d87	Remove outdated skipIfRocmVersionLessThan decorations (#148941 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/148941 Approved by: https://github.com/jeffdaily	2025-03-11 18:37:40 +00:00
Jeff Daily	44248c44eb	[ROCm] miopen benchmark behavior now better aligns with cudnn (#145294 ) The default benchmark setting is now false. The new miopen behavior means when benchmarking is disabled, for any shape that doesn't have a find hit, then it will do a quick search (same behavior as the prior default), and use that result. Now when benchmark is enabled, it will perform an exhaustive search and update any DBs. miopen immediate mode is still available and is used when deterministic is true and benchmark is false. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145294 Approved by: https://github.com/BrianHarrisonAMD, https://github.com/malfet	2025-02-05 17:19:53 +00:00
Benjamin Glass	5aa5a5763e	[inductor triton] Disable incorrect TF32 usage on CUDA capability < 8 (#145684 ) Triton 2.2 and greater have a bug where allowing TF32 generation for a GPU that does not support TF32 will cause code generation errors. Patch around this problem by: 1. Adding a function to `torch.cuda` that determines whether CUDA hardware is capable of using the TF32 format. 2. Using that function to explicitly disable TF32 generation when calling Triton, where needed. To demonstrate that this fix works, try running `test/inductor/test_max_autotune.py` on a GPU with CUDA compute capability < 8 (e.g. any NVIDIA consumer GPU) without this fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145684 Approved by: https://github.com/eqy	2025-01-28 22:01:08 +00:00
PyTorch MergeBot	6a4fb4b615	Revert "Align CPU behavior with CUDA for `ConvTranspose` when `out_channels=0` (#142859 )" This reverts commit `cb814c0b96`. Reverted https://github.com/pytorch/pytorch/pull/142859 on behalf of https://github.com/malfet due to It broke ROCM tests again, see `5cd2b34e82/1` ([comment](https://github.com/pytorch/pytorch/pull/142859#issuecomment-2614523822))	2025-01-26 17:49:05 +00:00
Wu, Chunyuan	cb814c0b96	Align CPU behavior with CUDA for `ConvTranspose` when `out_channels=0` (#142859 ) Fixes https://github.com/pytorch/pytorch/issues/142466. Remove the `weight.numel() != 0` check to align the behavior with CUDA for `ConvTranspose` when `out_channels=0`. After removing this check, the existing code is already able to give an empty output in such case. Test plan: ``` python -u test/nn/test_convolution.py -k test_ConvTranspose_output_channels_0_cpu_float32 python -u test/nn/test_convolution.py -k test_ConvTranspose_output_channels_0_cuda_float32 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/142859 Approved by: https://github.com/mingfeima, https://github.com/malfet	2025-01-26 01:56:40 +00:00
PyTorch MergeBot	d95a6babcc	Revert "Align CPU behavior with CUDA for `ConvTranspose` when `out_channels=0` (#142859 )" This reverts commit `0bff377880`. Reverted https://github.com/pytorch/pytorch/pull/142859 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the XLA failures look legit ([comment](https://github.com/pytorch/pytorch/pull/142859#issuecomment-2608631019))	2025-01-23 01:10:31 +00:00
Wu, Chunyuan	0bff377880	Align CPU behavior with CUDA for `ConvTranspose` when `out_channels=0` (#142859 ) Fixes https://github.com/pytorch/pytorch/issues/142466. Remove the `weight.numel() != 0` check to align the behavior with CUDA for `ConvTranspose` when `out_channels=0`. After removing this check, the existing code is already able to give an empty output in such case. Test plan: ``` python -u test/nn/test_convolution.py -k test_ConvTranspose_output_channels_0_cpu_float32 python -u test/nn/test_convolution.py -k test_ConvTranspose_output_channels_0_cuda_float32 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/142859 Approved by: https://github.com/mingfeima, https://github.com/malfet	2025-01-22 17:52:53 +00:00
Tom Ritchford	eaef613688	Fix issue with test/nn/test_convolution:TestConvolutionNNDeviceTypeCUDA.test_conv_large_batch_1_cuda (#145067 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145067 Approved by: https://github.com/Skylion007, https://github.com/nWEIdia Co-authored-by: Wei Wang <143543872+nWEIdia@users.noreply.github.com>	2025-01-17 20:31:25 +00:00
Tom Ritchford	c947a7d38e	Fix unused Python variables in test/nn (#143396 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143396 Approved by: https://github.com/mikaylagawarecki	2024-12-18 03:30:54 +00:00
Nikita Shulga	9c88b08ac9	[BE] Replace `skipIfMPS` with `expectedFailureMPS` (#139940 ) Functionally the two decorators are very similar, but one should rely on expectedFailure as much as possible to get signal when something is fixed. - Move `product_version` variable from `test_mps` to common_utils, but call it `MACOS_VERSION` - Introduce `skipIfMPSOnMacOS13` to decorate the hard crashes that happen only on MacOS13 (which at this point will not get any fixes and will be deprecated soon) - Add `device_type='mps'` to all `skipIfMPS` per https://github.com/pytorch/pytorch/issues/140560 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139940 Approved by: https://github.com/janeyx99, https://github.com/huydhn	2024-11-15 03:48:37 +00:00
Eddie Yan	846b4e614b	[TF32][cuDNN][Convolution] Add some missing TF32 decorators (#138768 ) Newer cuDNN versions seem to be able to dispatch to cuDNN kernels Pull Request resolved: https://github.com/pytorch/pytorch/pull/138768 Approved by: https://github.com/Skylion007	2024-10-25 19:03:42 +00:00
Siddharth Kotapati	e27c0048db	Enable additional tests for MPS CI runs (#134356 ) As part of the follow up for https://github.com/pytorch/pytorch/issues/133520, adapting existing unused tests for use in MPS CI runs. Focusing on nhwc & other memory formatting tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/134356 Approved by: https://github.com/malfet, https://github.com/eqy, https://github.com/huydhn	2024-10-04 21:52:38 +00:00
Mikayla Gawarecki	d9576c9440	Fix failures when default is flipped for weights_only (#127627 ) Tests on XLA shard not fixed yet but there is an issue here https://github.com/pytorch/xla/issues/7799 Pull Request resolved: https://github.com/pytorch/pytorch/pull/127627 Approved by: https://github.com/albanD ghstack dependencies: #132349	2024-08-16 00:22:43 +00:00
Xuehai Pan	fbe6f42dcf	[BE][Easy][8/19] enforce style for empty lines in import segments in `test/[k-p]*/` (#129759 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129759 Approved by: https://github.com/justinchuby, https://github.com/ezyang	2024-07-31 02:09:20 +00:00
eellison	28f29e074b	Dont mutate tensor stride in place in cudnn conv (#126786 ) Fix for https://github.com/pytorch/pytorch/issues/126241. Within the cudnn convolution, we were in-place updating the strides of the tensor to disambiguate for size-1 dims and contiguous and channels last tensors. Instead of mutating the tensor's stride, just use a temporary. Inside cudnn it is then copied: `d7ccb5b3c4/include/cudnn_frontend_Tensor.h (L201-L203)`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126786 Approved by: https://github.com/ezyang, https://github.com/shunting314, https://github.com/eqy	2024-05-22 01:53:44 +00:00
eqy	973d724e21	[CUDA] Fix 64-bit indexing in `vol2col` in conv3d (#124650 ) Similar to #118005, fixes sometimes silent IMAs that occur CC @atalman @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/124650 Approved by: https://github.com/soulitzer	2024-04-25 23:21:43 +00:00
PyTorch MergeBot	24ed909934	Revert "[CUDA] Fix 64-bit indexing in `vol2col` in conv3d (#124650 )" This reverts commit `71d92bace2`. Reverted https://github.com/pytorch/pytorch/pull/124650 on behalf of https://github.com/jeanschmidt due to Reverting to check if it introduced regressions for linux-focal-rocm6.0-py3.8 tests ([comment](https://github.com/pytorch/pytorch/pull/124650#issuecomment-2076786795))	2024-04-25 09:46:21 +00:00
Eddie Yan	71d92bace2	[CUDA] Fix 64-bit indexing in `vol2col` in conv3d (#124650 ) Similar to #118005, fixes sometimes silent IMAs that occur CC @atalman @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/124650 Approved by: https://github.com/soulitzer	2024-04-24 19:47:18 +00:00
Yuanhao Ji	a625705290	Enable UFMT on all of `test/nn` (#123809 ) Part of: #123062 Ran lintrunner on: - `test/nn` with command: ```bash lintrunner -a --take UFMT --all-files ``` Co-authored-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/123809 Approved by: https://github.com/mikaylagawarecki	2024-04-12 18:32:25 +00:00
eqy	624e58f2c6	[CUDA] Update `size_1` conv tests with TF32 thresholds (#118022 ) Seeing some numerical mismatches on A100 Pull Request resolved: https://github.com/pytorch/pytorch/pull/118022 Approved by: https://github.com/atalman	2024-04-09 23:49:40 +00:00
Eddie Yan	3db618d656	[CUDA] Use 64-bit indexing in `CUDA_KERNEL_LOOP` in `im2col` (#118005 ) #117736 Pull Request resolved: https://github.com/pytorch/pytorch/pull/118005 Approved by: https://github.com/atalman	2024-04-09 21:04:20 +00:00
Xia Weiwen	d1510e01fa	Upgrade submodule onednn to v3.3.5 (#120767 ) This upgrade contains the fixes to the known issues brought by oneDNN v3.3.2, including issues https://github.com/pytorch/pytorch/issues/115346, https://github.com/pytorch/pytorch/issues/120211 and https://github.com/pytorch/pytorch/issues/120406 and those listed in PR #112700. Issue https://github.com/pytorch/pytorch/issues/115346 (perf regression) was fixed by oneDNN v3.3.4. No new regression was found with v3.3.5. The detailed results of v3.3.4 are given below and compared with v3.1.1 (the oneDNN version in PyTorch before it was updated to v3.3.2). 1. A performance regression with 5.8% perf drop from `pytorch_stargan-train` (see https://github.com/pytorch/benchmark/issues/2076#issuecomment-1847545843) Validation results with this patch: Latency increased by 0.60% ``` Tested on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz instance (IceLake) oneDNN v3.1.1 metrics-1484287.json { "name": "cpu", "environ": { "pytorch_git_version": "6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0" }, "metrics": { "latency": 418.851717 } } oneDNN v3.3.4 { "name": "cpu", "environ": { "pytorch_git_version": "6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0" }, "metrics": { "latency": 421.381313 } } ``` 2. 
Performance regression of FP32 rexnet_100 with Inductor, dynamic shape, multi-threads (see https://github.com/pytorch/pytorch/issues/115346#issue-2030859592) Validation results with this patch: Latency reduced by 3.23% ``` Tested on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz instance (IceLake) oneDNN v3.1.1 (inductor speedup over eager mode) 2.876x dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks cpu,rexnet_100,128,2.875904,113.314765,18.455283,0.990437,1302.636134,1315.212902,351,1,0,0 oneDNN v3.3.4 (inductor speedup over eager mode) 3.003x dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks cpu,rexnet_100,128,3.003012,109.653012,91.547260,0.990048,1302.532506,1315.625370,351,1,0,0 ``` 3. Performance regression of AMP hf_T5_generate and tinynet_a with Inductor, static shape, multi-threads (see https://github.com/pytorch/pytorch/issues/115346#issuecomment-1856029962) Validation results with this patch: Latency reduced by 0.85% ``` Tested on an AWS spr metal instance oneDNN v3.1.1 (inductor speedup over eager mode) 1.120x dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks cpu,hf_T5_generate,1,1.120018,1197.807729,205.905466,0.442803,125.179904,282.698957,10550,48,8,4 oneDNN v3.3.4 (inductor speedup over eager mode) 1.134x dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks cpu,hf_T5_generate,1,1.133594,1187.701514,205.855527,0.422012,128.405094,304.268493,10550,48,8,4 ``` The following issues about functionality are fixed by this upgrade. Test cases are also added for these issues. 
- https://github.com/pytorch/pytorch/issues/120211 - https://github.com/pytorch/pytorch/issues/120406 - https://github.com/pytorch/pytorch/issues/120547 ----- Below are detailed data of torchbench CPU userbenchmark test and Inductor FP32/AMP inference tests. No regression of perf or functionality was found. I. torchbench CPU userbenchmark test Suite \| Speedup -- \| -- eager_throughtput_bf16_infer \| 1.001848 eager_throughtput_fp32_infer \| 1.000257 eager_throughtput_fx_int8 \| 1.003069 jit_llga_throughtput_amp_bf16 \| 1.000682 jit_llga_throughtput_fp32 \| 1.000313 eager_throughtput_bf16_train \| 0.998222 eager_throughtput_fp32_train \| 1.003384 II. Inductor FP32/AMP inference tests i. FP32 static default suite \| name \| thread \| batch size \| Ratio Speedup(New/old) -- \| -- \| -- \| -- \| -- torchbench \| timm_efficientnet \| multiple \| 64 \| 1.09 timm_models \| tinynet_a \| multiple \| 128 \| 1.14 ii. FP32 dynamic default suite \| name \| thread \| batch size \| Ratio Speedup(New/old) -- \| -- \| -- \| -- \| -- torchbench \| alexnet \| multiple \| 128 \| 1.08 torchbench \| basic_gnn_edgecnn \| multiple \| 1 \| 0.98 torchbench \| timm_efficientnet \| multiple \| 64 \| 1.08 iii. AMP static default suite \| name \| thread \| batch size \| Ratio Speedup(New/old) -- \| -- \| -- \| -- \| -- torchbench \| hf_distil_whisper \| multiple \| 1 \| 1.18 torchbench \| timm_efficientnet \| multiple \| 64 \| 1.32 huggingface \| BartForConditionalGeneration \| multiple \| 2 \| 1.19 timm_models \| eca_halonext26ts \| multiple \| 128 \| 1.13 timm_models \| nfnet_l0 \| multiple \| 128 \| 1.13 timm_models \| rexnet_100 \| multiple \| 128 \| 1.45 timm_models \| spnasnet_100 \| multiple \| 128 \| 1.15 timm_models \| tf_efficientnet_b0 \| multiple \| 128 \| 1.22 timm_models \| tinynet_a \| multiple \| 128 \| 1.49 torchbench \| hf_Bert_large \| single \| 1 \| 1.16 huggingface \| XLNetLMHeadModel \| single \| 1 \| 1.07 iv. 
AMP dynamic default suite \| name \| thread \| batch size \| Ratio Speedup(New/old) -- \| -- \| -- \| -- \| -- torchbench \| timm_efficientnet \| multiple \| 64 \| 1.32 huggingface \| PLBartForConditionalGeneration \| multiple \| 4 \| 1.14 timm_models \| nfnet_l0 \| multiple \| 128 \| 1.15 timm_models \| rexnet_100 \| multiple \| 128 \| 1.45 timm_models \| tinynet_a \| multiple \| 128 \| 1.34 huggingface \| XLNetLMHeadModel \| single \| 1 \| 1.09 ----- Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120767 Approved by: https://github.com/chuanqi129, https://github.com/jgong5, https://github.com/atalman	2024-03-11 12:56:59 +00:00
Eddie Yan	d790c1dca6	[CUDA][cuDNN][TF32] Misc TF32 updates (#118781 ) Twiddle some thresholds that don't seem to play nice with sm90. CC @tinglvv @nWEIdia @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/118781 Approved by: https://github.com/ezyang	2024-02-01 15:32:50 +00:00
Damien	2d2016fdf8	WIP Add compatibility with channels_last_3d for conv3d (#114790 ) Part of a multi-PR work to fix #59168 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114790 Approved by: https://github.com/albanD	2023-12-20 19:28:25 +00:00
PyTorch MergeBot	a7bfa04da6	Revert "More markDynamoStrictTest (#115870 )" This reverts commit `7f686c8fe1`. Reverted https://github.com/pytorch/pytorch/pull/115870 on behalf of https://github.com/jeanschmidt due to Breaking internal tests and builds, please check diff ([comment](https://github.com/pytorch/pytorch/pull/115870#issuecomment-1862997125))	2023-12-19 15:40:57 +00:00
rzou	7f686c8fe1	More markDynamoStrictTest (#115870 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115870 Approved by: https://github.com/voznesenskym ghstack dependencies: #115845, #115855, #115856, #115857, #115858	2023-12-15 05:26:54 +00:00
Jithun Nair	2ea2421b44	Skip unit tests that fail on MI210 runners (#114613 ) Taken from https://github.com/pytorch/pytorch/pull/105980 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114613 Approved by: https://github.com/malfet	2023-11-27 22:25:35 +00:00
rraminen	44367c59b2	Update skip reason for failing unit tests on ROCm 5.7 (#113286 ) Follow up to https://github.com/pytorch/pytorch/pull/110465. Updated skip reason for failing unit tests on ROCm 5.7 Pull Request resolved: https://github.com/pytorch/pytorch/pull/113286 Approved by: https://github.com/malfet	2023-11-13 19:29:04 +00:00
rraminen	3a429423fc	Upgrade CI to ROCm5.7 (#110465 ) This PR is to upgrade CI to ROCm5.7 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110465 Approved by: https://github.com/pruthvistony, https://github.com/malfet	2023-11-08 06:11:10 +00:00
Pruthvi Madugundu	9ce2e02fd6	Revert "[ROCm] Remove PYTORCH_MIOPEN_SUGGEST_NHWC flag (#90725 )" (#110319 ) This reverts commit `66bfcd32fd`. NHWC has a perf regression on MIOpen, so reverting until the performance issue is fixed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110319 Approved by: https://github.com/jeffdaily, https://github.com/jithunnair-amd, https://github.com/kit1980	2023-10-03 19:14:47 +00:00
CaoE	7c9052165a	add fp16 support for native conv and deconv on CPU (#99497 ) ### Testing Native conv vs. mkldnn conv on SPR (with avx512_fp16 support) Single core: Input \| Naïve impl / us \| oneDNN / us \| Speed up -- \| -- \| -- \| -- IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 \| 34676789 \| 524199.8 \| 66.15185 IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 \| 33454125 \| 349844.4 \| 95.62573 IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 \| 317650.1 \| 2317.677 \| 137.0554 IC: 128, OC: 256, kernel: 3, stride: 1, N: 1, L: 64 \| 15334.68 \| 167.264 \| 91.67952 56 cores: Input \| Naïve impl / us \| oneDNN / us \| Speed up -- \| -- \| -- \| -- IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 \| 1032064 \| 11073.58 \| 93.20061 IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 \| 1000097 \| 16371.19 \| 61.08883 IC: 256, OC: 1024, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 \| 981813.4 \| 9008.908 \| 108.9825 IC: 1024, OC: 256, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 \| 1082606 \| 10150.47 \| 106.6558 IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 \| 319980.6 \| 181.598 \| 1762.027 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99497 Approved by: https://github.com/jgong5, https://github.com/cpuhrsch	2023-09-25 01:31:26 +00:00
Justin Chu	79c5e33349	[BE] Enable ruff's UP rules and autoformat nn/ mps/ and torch/ (#105436 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105436 Approved by: https://github.com/malfet, https://github.com/albanD	2023-07-21 07:38:46 +00:00

64 Commits
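
Several commits in this log concern "64-bit indexing" in convolutions. As a rough illustration of why the input shape quoted in commit f391afe9bf falls into that category, the sketch below counts the elements of that shape and compares the count with the largest signed 32-bit value. The helper name `needs_64bit_indexing` and the simple element-count comparison are assumptions made for illustration only; they are not PyTorch's actual dispatch condition, which the commits above describe as also depending on layout and batch splitting.

```python
import math

# Largest value representable in a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def needs_64bit_indexing(shape):
    """Hypothetical helper: True if the element count exceeds INT32_MAX.

    This is a simplified heuristic for illustration, not the check
    PyTorch uses when choosing a convolution backend.
    """
    return math.prod(shape) > INT32_MAX


# Input shape taken from the test quoted in commit f391afe9bf.
conv3d_input = (1, 32, 512, 512, 256)

print(math.prod(conv3d_input))             # 2147483648, i.e. 2**31
print(needs_64bit_indexing(conv3d_input))  # True
```

The element count is exactly 2**31, one more than INT32_MAX, which suggests the test shape was chosen to sit just past the 32-bit boundary.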