Mikayla Gawarecki
f138b08d2e
Migrated loss functions to ModuleInfos ( #115584 )
...
Migrates most tests in `common_nn.py:criterion_tests` to ModuleInfos.
**I can split this up if it is too large to review**
What this PR does not include:
- [`no_batch_dim` tests](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_nn.py#L3995-L4112 )
- [tests that use the functional variant of the loss function and `wrap_functional`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_nn.py#L1079-L1128 )
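For reference, a minimal sketch (not the exact entries added here) of how a loss module is described as a ModuleInfo in `common_modules.py`; the names follow the existing infrastructure but are not copied from this PR:
```python
import torch
from torch.testing._internal.common_modules import ModuleInput, FunctionInput

def module_inputs_torch_nn_MSELoss(module_info, device, dtype, requires_grad, training, **kwargs):
    # each ModuleInput pairs constructor args with forward args
    def make(shape):
        return torch.randn(shape, device=device, dtype=dtype, requires_grad=requires_grad)
    return [
        ModuleInput(constructor_input=FunctionInput(reduction="mean"),
                    forward_input=FunctionInput(make((2, 3)), make((2, 3)))),
    ]
```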
#### On test times
This PR increases test time by ~58s locally
Before this PR:
```
>>> python test/test_nn.py -k Loss
Ran 1003 tests in 28.977s
```
After this PR
```
>>> python test/test_nn.py -k Loss
Ran 368 tests in 23.073s
```
```
>>> python test/test_modules.py -k Loss
Ran 836 tests in 63.900s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115584
Approved by: https://github.com/janeyx99
ghstack dependencies: #115617
2023-12-12 22:20:20 +00:00
Wongboo
68f74dd162
Add python and C++ support for LPPool3d ( #114199 )
...
Add Python and C++ support for LPPool3d. Fixes #114114
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114199
Approved by: https://github.com/mikaylagawarecki
2023-12-08 18:18:44 +00:00
Aaron Gokaslan
b7b2178204
[BE]: Remove useless lambdas ( #113602 )
...
Applies PLW0108, which removes useless lambdas in Python. The rule is in preview, so it is not ready to be enabled by default just yet; these are the autofixes from the rule.
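An illustrative before/after (not taken from the diff) of what PLW0108 rewrites:
```python
# before: a lambda that only forwards its argument to another callable
key_fn = lambda x: str(x)

# after: use the wrapped callable directly
key_fn = str
```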
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602
Approved by: https://github.com/albanD
2023-11-14 20:06:48 +00:00
CaoE
7c9052165a
add fp16 support for native conv and deconv on CPU ( #99497 )
...
### Testing
Native conv vs. mkldnn conv on SPR (with avx512_fp16 support)
Single core:
Input | Naïve impl / us | oneDNN / us | Speed up
-- | -- | -- | --
IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 | 34676789 | 524199.8 | 66.15185
IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 | 33454125 | 349844.4 | 95.62573
IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 | 317650.1 | 2317.677 | 137.0554
IC: 128, OC: 256, kernel: 3, stride: 1, N: 1, L: 64 | 15334.68 | 167.264 | 91.67952
56 cores:
Input | Naïve impl / us | oneDNN / us | Speed up
-- | -- | -- | --
IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 | 1032064 | 11073.58 | 93.20061
IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 | 1000097 | 16371.19 | 61.08883
IC: 256, OC: 1024, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 | 981813.4 | 9008.908 | 108.9825
IC: 1024, OC: 256, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 | 1082606 | 10150.47 | 106.6558
IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 | 319980.6 | 181.598 | 1762.027
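A minimal sketch of the newly supported path; whether the native kernel or oneDNN is actually used depends on the build and input configuration:
```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56, dtype=torch.float16)        # fp16 input on CPU
conv = nn.Conv2d(64, 256, kernel_size=1, stride=1).half()  # fp16 weights
deconv = nn.ConvTranspose2d(64, 64, kernel_size=3).half()
print(conv(x).dtype, deconv(x).dtype)                       # torch.float16 torch.float16
```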
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99497
Approved by: https://github.com/jgong5 , https://github.com/cpuhrsch
2023-09-25 01:31:26 +00:00
FFFrog
003c5bb156
Add checks to num_layers for RNN, LSTM, GRU ( #108853 )
...
Fixes #108223
As the title states, this adds validation of `num_layers` for RNN, LSTM, and GRU.
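A sketch of the kind of input this now rejects (the exact error type and message are assumptions, not quoted from the PR):
```python
import torch.nn as nn

try:
    nn.LSTM(input_size=10, hidden_size=20, num_layers=0)  # non-positive num_layers
except (ValueError, RuntimeError) as e:
    print(e)
```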
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108853
Approved by: https://github.com/mikaylagawarecki
2023-09-09 19:33:52 +00:00
CaoE
8f02884569
add Half support for GroupNorm on CPU ( #100234 )
...
### Testing
Single socket (28cores):
* Contiguous:
shape | forward, fp32 / s | forward, mixed fp32 fp16 / s | backward, fp32 / s | backward, mixed fp32 fp16 / s
-- | -- | -- | -- | --
[10, 128, 10, 10] | 2.45E-05 | 3.26E-05 | 6.87E-05 | 7.40E-05
[10, 128, 80, 80] | 0.000726 | 0.000606 | 0.002183 | 0.001112
* Channels Last:
shape | forward, fp32 / s | forward, mixed fp32 fp16 / s | backward, fp32 / s | backward, mixed fp32 fp16 / s
-- | -- | -- | -- | --
[10, 128, 10, 10] | 2.88E-05 | 2.72E-05 | 6.56E-05 | 6.63E-05
[10, 128, 80, 80] | 0.00076 | 0.000256 | 0.002385 | 0.000735
Single core:
* Contiguous:
shape | forward, fp32 / s | forward, mixed fp32 fp16 / s | backward, fp32 / s | backward, mixed fp32 fp16 / s
-- | -- | -- | -- | --
[10, 128, 10, 10] | 9.47E-05 | 1.90E-04 | 2.03E-04 | 3.10E-04
[10, 128, 80, 80] | 6.25E-03 | 8.98E-03 | 0.016485 | 0.01369
* Channels Last:
shape | forward, fp32 / s | forward, mixed fp32 fp16 / s | backward, fp32 / s | backward, mixed fp32 fp16 / s
-- | -- | -- | -- | --
[10, 128, 10, 10] | 8.66E-05 | 7.89E-05 | 1.95E-04 | 1.43E-04
[10, 128, 80, 80] | 5.97E-03 | 3.13E-03 | 0.01626 | 8.70E-03
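A sketch of the mixed fp32-parameter / fp16-input case measured above (assuming the mixed-dtype path keeps fp32 weights, as with the existing bf16 support):
```python
import torch
import torch.nn as nn

x = torch.randn(10, 128, 80, 80, dtype=torch.half, requires_grad=True)
gn = nn.GroupNorm(num_groups=32, num_channels=128)  # fp32 weight/bias
y = gn(x)
y.sum().backward()
print(y.dtype, x.grad.dtype)  # torch.float16 torch.float16
```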
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100234
Approved by: https://github.com/jgong5 , https://github.com/mikaylagawarecki
2023-09-01 21:25:24 +00:00
Mikayla Gawarecki
584a01b650
Fix LayerNorm(bias=False) error ( #108060 )
...
Fixes #108048
- [ ] Cherry pick this [here](https://github.com/pytorch/pytorch/issues/108055 )
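A minimal check of the fixed path:
```python
import torch
import torch.nn as nn

ln = nn.LayerNorm(8, bias=False)   # the case that errored, per #108048
out = ln(torch.randn(2, 4, 8))
print(ln.bias, out.shape)          # None torch.Size([2, 4, 8])
```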
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108060
Approved by: https://github.com/jbschlosser , https://github.com/albanD , https://github.com/malfet
2023-08-28 18:23:13 +00:00
CaoE
3267996372
add channel last 3d support for maxpool3d on CPU ( #97775 )
...
### Testing
Single socket (28 cores):
shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms
-- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 3.959584 | 5.493402 | 0.557232 | 0.568485
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 0.815511 | 1.351261 | 5.710506 | 10.57506
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 10.63426 | 15.28637 | 2.67656 | 1.71365
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 2.63570 | 2.05532 | 2.55452 | 2.33923
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 0.375469 | 0.479748 | 0.066364 | 0.065155
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 0.112197 | 0.112326 | 0.111697 | 0.145364
Single core:
shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms
-- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 92.16582 | 128.6513 | 6.684325 | 12.21541
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 10.14318 | 29.80297 | 7.350142 | 11.25323
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 238.55453 | 331.89967 | 19.694657 | 32.78853
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 30.17079 | 32.75628 | 22.44543 | 30.17796
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 7.474389 | 9.937217 | 0.236015 | 0.434229
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 2.318954 | 2.469444 | 0.262125 | 0.401361
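A sketch of the channels-last-3d path exercised above:
```python
import torch
import torch.nn as nn

x = torch.randn(4, 19, 10, 16, 16).to(memory_format=torch.channels_last_3d)
pool = nn.MaxPool3d(kernel_size=3, stride=1)
y = pool(x)
# the output is expected to preserve the channels-last-3d layout
print(y.is_contiguous(memory_format=torch.channels_last_3d))
```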
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97775
Approved by: https://github.com/jgong5 , https://github.com/mikaylagawarecki
2023-08-26 00:21:27 +00:00
CaoE
3992450e8d
Add backward check for test_memory_format ( #106104 )
...
Add backward check for test_memory_format.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106104
Approved by: https://github.com/mikaylagawarecki
2023-08-25 18:11:54 +00:00
Prachi Gupta
3022a395f3
test_memory_format test now passes on rocm ( #107696 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107696
Approved by: https://github.com/pruthvistony , https://github.com/albanD
2023-08-23 16:39:19 +00:00
Liao, Xuan
71632d4d24
[cpu] add sdpa choice and UT ( #105131 )
...
Feature RFC: https://github.com/pytorch/rfcs/pull/56 .
Write an SDPA selection function for CPU that automatically chooses one SDPA implementation among the available ones. There are two CPU implementations to choose from: the unfused SDPA and flash attention. In general, flash attention has higher priority than the unfused SDPA. For cases where flash attention is not applicable, such as when flash attention is manually disabled or the inputs are not 4-dimensional, the unfused SDPA is chosen.
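A sketch of an eligible call; which implementation is actually picked is decided internally by the new selection function:
```python
import torch
import torch.nn.functional as F

# 4-D (batch, heads, seq_len, head_dim) fp32 inputs on CPU
q, k, v = (torch.randn(1, 25, 1024, 64) for _ in range(3))
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 25, 1024, 64])
```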
## Performance of the stack
### NanoGPT's SDPA kernel
Using benchmark [repo](https://github.com/mingfeima/bench_sdpa/blob/main/README.md ), with one socket.
Shape: Batch size 1, Sequence length 1024, Head number 25, Head size 64.
Machine: SPR.
| Dtype | Causal | Mode | SDPA | Time (ms per iter) | Speedup |
| -------- | -------- | ------- | ------- | ------- | ------- |
| float32 | FALSE | Inference | Unfused | 3.081 | |
| | | | Flash attention | 1.665 | **1.85045** |
| float32 | TRUE | Inference | Unfused | 3.463 | |
| | | | Flash attention | 1.662 | **2.083634**|
| bfloat16 | FALSE | Inference | Unfused | 1.203 | |
| | | | Flash attention | 1.154 | **1.042461**|
| bfloat16 | TRUE | Inference | Unfused | 1.543 | |
| | | | Flash attention | 1.154 | **1.337088**|
| float32 | FALSE | Training | Unfused | 54.938 | |
| | | | Flash attention | 23.029 | **2.385601**|
| float32 | TRUE | Training | Unfused | 58.266 | |
| | | | Flash attention | 17.835 | **3.266947**|
| bfloat16 | FALSE | Training | Unfused | 18.924 | |
| | | | Flash attention | 18.886 | **1.002012**|
| bfloat16 | TRUE | Training | Unfused | 21.08 | |
| | | | Flash attention | 14.172 | **1.48744** |
### Stable Diffusion
Following model's [BKM](https://github.com/intel-innersource/frameworks.ai.models.intel-models/blob/develop/quickstart/diffusion/pytorch/stable_diffusion/inference/cpu/README.md ).
Mode: Inference; Machine: SPR.
| Dtype | SDPA | Throughput (fps) | Speedup SDPA | Total Time (ms) | Speedup |
| -------- | -------- | ------- | ------- | ------- | ------- |
| float32 | Unfused | 1.63 | | 1139 | |
| | Flash attention | 1.983 | 1.216564 | 547.488 | **2.080411**|
| bfloat16 | Flash attention in IPEX | 4.784 | | 429.051 | |
| | Flash attention | 4.857 | 1.015259 | 408.823 | **1.049479**|
### LLM models of Torchbench
Dtype: float32; Mode: Inference, single socket; Machine: CPX.
Model name | SDPA | Inductor_new | Inductor_old | Inductor Ratio(old/new)
-- | -- | -- | -- | --
hf_Albert | Unfused -> Flash attention | 0.048629309 | 0.05591545 | **1.14983024**
hf_Bert | Unfused -> Flash attention | 0.053156243 | 0.060732115 | **1.142520841**
hf_Bert_large | Unfused -> Flash attention | 0.141089502 | 0.155190077 | **1.099940636**
llama | Unfused -> Flash attention | 0.033250106 | 0.033720745 | **1.01415451**
Dtype: bfloat16; Mode: Inference, single socket; Machine: SPR.
Model name | SDPA | Inductor_new | Inductor_old | Inductor Ratio(old/new)
-- | -- | -- | -- | --
hf_Albert | Unfused -> Flash attention | 0.020681298 | 0.020718282 | **1.001788324**
hf_Bert | Unfused -> Flash attention | 0.019932816 | 0.019935424 | **1.000130842**
hf_Bert_large | Unfused -> Flash attention | 0.047949174 | 0.048312502 | **1.007577355**
llama | Unfused -> Flash attention | 0.018528057 | 0.01861126 | **1.0044907**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105131
Approved by: https://github.com/drisspg
ghstack dependencies: #104583 , #104584 , #103826 , #104693 , #104863 , #107128
2023-08-20 08:56:21 +00:00
FFFrog
2d2d43d9fb
add more check on LSTMCell ( #107380 )
...
Just like #107223 , the ``LSTMCell`` operator has the same problems as ``GRUCell``; this adds the corresponding checks and related tests to fix it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107380
Approved by: https://github.com/ezyang
2023-08-18 20:44:17 +00:00
PyTorch MergeBot
02bcaf45f6
Revert "Add backward check for test_memory_format ( #106104 )"
...
This reverts commit 2e44adb066 .
Reverted https://github.com/pytorch/pytorch/pull/106104 on behalf of https://github.com/huydhn due to Sorry for reverting this but it is failing the inductor job in trunk 2e44adb066 . I will add the ciflow/inductor label to the PR to make sure that the test runs there ([comment](https://github.com/pytorch/pytorch/pull/106104#issuecomment-1683119990 ))
2023-08-17 23:45:31 +00:00
CaoE
2e44adb066
Add backward check for test_memory_format ( #106104 )
...
Add backward check for test_memory_format.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106104
Approved by: https://github.com/mikaylagawarecki
2023-08-17 21:19:34 +00:00
FFFrog
a4229690e3
Add Some Checks about dim ( #107223 )
...
Fixes #106769
As mentioned in [GRUCell](https://pytorch.org/docs/stable/generated/torch.nn.GRUCell.html#grucell ), `hidden` should have the same dimensionality as `input`, and both should be either `1D` or `2D`.
Other aspects are already verified in the `C++` code, such as `input` and `hidden` having the same batch size, `input`'s dim 1 matching `input_size`, and `hidden`'s dim 1 matching `hidden_size`.
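A sketch of a mismatch that is now rejected (the exact error type is an assumption):
```python
import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=10, hidden_size=20)
x = torch.randn(3, 10)   # 2D input: (batch, input_size)
h = torch.randn(20)      # 1D hidden while the input is 2D
try:
    cell(x, h)
except (ValueError, RuntimeError) as e:
    print(e)
```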
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107223
Approved by: https://github.com/albanD
2023-08-16 22:03:31 +00:00
Mikayla Gawarecki
1317dbf176
Reland "Add nn.CircularPad{*}d for consistency + fix no_batch_dim support ( #106148 )" ( #106632 )
...
The previous PR was reverted because the PR stacked under it, which added error checking to the Pad variants (https://github.com/pytorch/pytorch/pull/106147), was reverted: internally some callers pass 2D inputs to ZeroPad2d (which should actually take 3d or 4d inputs :)). To my understanding, nothing in this PR itself was breaking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106632
Approved by: https://github.com/albanD
2023-08-07 20:10:25 +00:00
PyTorch MergeBot
dfcfd5cedb
Revert "Add nn.CircularPad{*}d for consistency + fix no_batch_dim support ( #106148 )"
...
This reverts commit 87d2536971 .
Reverted https://github.com/pytorch/pytorch/pull/106148 on behalf of https://github.com/malfet due to Reverting as dependent PR https://github.com/pytorch/pytorch/pull/106147 was reverted as well ([comment](https://github.com/pytorch/pytorch/pull/106148#issuecomment-1662344543 ))
2023-08-02 14:46:00 +00:00
PyTorch MergeBot
d83b887f2a
Revert "Add error checking for padding modules ( #106147 )"
...
This reverts commit 0547b6279d .
Reverted https://github.com/pytorch/pytorch/pull/106147 on behalf of https://github.com/jeanschmidt due to sadly it is breaking internal builds, and I can't coordinate a FF due to timezone differences ([comment](https://github.com/pytorch/pytorch/pull/106147#issuecomment-1661870970 ))
2023-08-02 09:37:40 +00:00
Mikayla Gawarecki
87d2536971
Add nn.CircularPad{*}d for consistency + fix no_batch_dim support ( #106148 )
...
Fixes #105749 and https://github.com/pytorch/pytorch/issues/95320
(tl;dr: input should always be `[N, C, H(, W, D)]`, where only the H, W, and D dimensions get circular padding, so in the 2D case where the user wants both dimensions padded they should `.unsqueeze(0)` first (as is the case for `Reflection/ReplicationPad`), but we never documented this for circular padding. [This seems to be the old docstring](277b05014a/torch/nn/functional.py (L4689) ) that was somehow lost.)
Fixes no_batch_dim support https://github.com/pytorch/pytorch/issues/104860
- Adds missing documentation for circular padding
- Adds missing CircularPad modules
- Migrates legacy test_nn tests from circular padding to ModuleInfo
- Adds no_batch_dim support + sample inputs that test this
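A sketch of the added modules and the no_batch_dim behavior:
```python
import torch
import torch.nn as nn

pad = nn.CircularPad2d(2)
batched = pad(torch.randn(1, 3, 8, 8))   # (N, C, H, W): H and W are padded circularly
unbatched = pad(torch.randn(3, 8, 8))    # no batch dim: treated as (C, H, W)
print(batched.shape, unbatched.shape)    # (1, 3, 12, 12) (3, 12, 12)
```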
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106148
Approved by: https://github.com/albanD
ghstack dependencies: #106325 , #106147
2023-08-01 12:49:58 +00:00
Mikayla Gawarecki
0547b6279d
Add error checking for padding modules ( #106147 )
...
Fixes https://github.com/pytorch/pytorch/issues/105627
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106147
Approved by: https://github.com/albanD
ghstack dependencies: #106325
2023-08-01 12:49:58 +00:00
Mikayla Gawarecki
c9be60cd0e
Add error inputs to ModuleInfo (mirroring OpInfo) ( #106325 )
...
Add infra for error inputs to ModuleInfos and migrate the first few error input tests from test_nn.py (more to come!)
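A rough sketch of the shape of such an entry; the class and field names here are assumptions that mirror the OpInfo error-inputs pattern rather than verbatim code from this PR:
```python
from functools import partial
from torch.testing import make_tensor
from torch.testing._internal.common_modules import (
    ModuleInput, FunctionInput, ErrorModuleInput, ModuleErrorEnum)

def module_error_inputs_torch_nn_L1Loss(module_info, device, dtype, requires_grad, training, **kwargs):
    make_input = partial(make_tensor, device=device, dtype=dtype, requires_grad=requires_grad)
    return [
        ErrorModuleInput(
            ModuleInput(constructor_input=FunctionInput(reduction="bad"),
                        forward_input=FunctionInput(make_input(5), make_input(5))),
            error_on=ModuleErrorEnum.FORWARD_ERROR,
            error_type=ValueError,
            error_regex="is not a valid value",
        ),
    ]
```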
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106325
Approved by: https://github.com/albanD
2023-08-01 12:49:56 +00:00
Mikayla Gawarecki
e18d53e2df
Added ModuleInfo test for meta device ctx init ( #105871 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105871
Approved by: https://github.com/albanD
2023-07-26 01:57:54 +00:00
Justin Chu
be03a56955
[BE] Enable ruff's UP rules and autoformat testing/ ( #105425 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105425
Approved by: https://github.com/malfet
2023-07-18 21:04:39 +00:00
mingfeima
a66f08d626
enable channels last for replication padding on CPU ( #102597 )
...
Enable channels last support for replication padding on CPU. This patch adds channels last support for ReplicationPad2d/3d on the CPU backend. The following test cases pass with this patch:
```
python test_modules.py TestModuleCPU.test_memory_format_nn_ReplicationPad2d_cpu_float32
python test_modules.py TestModuleCPU.test_memory_format_nn_ReplicationPad3d_cpu_float32
```
The following benchmark results were gathered on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket.
### single core inference
```
(before)
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.339 ms
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 82.935 ms
(after)
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.324 ms
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 16.717 ms
```
### single socket inference
```
(before)
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.135 ms
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 7.203 ms
(after)
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.029 ms
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 3.174 ms
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102597
Approved by: https://github.com/CaoE , https://github.com/cpuhrsch
2023-07-14 03:44:55 +00:00
mingfeima
f73757d551
enable channels last for reflection padding on CPU ( #102518 )
...
Add channels last support for reflection padding on CPU. The following test cases will pass with this patch:
```
python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad2d_cpu_float32
python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad3d_cpu_float32
```
The following benchmark results were gathered on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket.
### single core inference
```
(before)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.356 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 86.821 ms
(after)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.328 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 16.806 ms
```
### single socket inference
```
(before)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.142 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 7.367 ms
(after)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.027 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 3.181 ms
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102518
Approved by: https://github.com/CaoE , https://github.com/cpuhrsch
2023-07-13 16:22:31 +00:00
Jens Glaser
86e0eda18d
Add partial derivative unit tests ( #103809 )
...
Adds the unit tests requested in #95810
This PR also addresses a gap in unit testing of gradients, as `gradcheck` always performs total derivatives w.r.t. all arguments and module parameters. Some modules have different code paths for partial derivatives, e.g. `LayerNorm`, and those should be tested separately.
The PR has the following limitations:
- it does not test partial derivatives w.r.t. every combination of arguments, which would exponentially increase CI time.
- it does not implement the same logic for Hessians, where the increase in CI time would be quadratic in the number of arguments.
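A sketch of the idea: check the derivative w.r.t. the input alone, with the module parameters frozen:
```python
import torch
import torch.nn as nn
from torch.autograd import gradcheck

m = nn.LayerNorm(4).double()
for p in m.parameters():
    p.requires_grad_(False)   # partial derivative: input only
x = torch.randn(2, 4, dtype=torch.double, requires_grad=True)
print(gradcheck(m, (x,)))     # True if the input-only path is correct
```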
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103809
Approved by: https://github.com/kit1980
2023-06-25 00:36:10 +00:00
Ramin Azarmehr
cecfcf1e17
[MPS] Handle MPS failures of test_modules.py in common_modules.py ( #95334 )
...
- Also cleaned up the skipMPS code in `test_modules.py`.
- Added `skipMPS` for unsupported or failing tests on MPS backend in common_modules.py.
(We'll remove `skipMPS` from those tests once a fix is available for them.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95334
Approved by: https://github.com/kulinseth , https://github.com/albanD
2023-05-09 03:55:16 +00:00
Mikayla Gawarecki
2c6c7deeb3
Added ModuleInfos for Pooling ops ( #98358 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98358
Approved by: https://github.com/albanD
2023-04-05 19:39:07 +00:00
Mikayla Gawarecki
3a0ad3c194
[easy] Remove large LayerNorm sample input causing OOM from ModuleInfo ( #98424 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98424
Approved by: https://github.com/huydhn , https://github.com/albanD
2023-04-05 19:38:15 +00:00
Mikayla Gawarecki
96ad739ddc
Added ModuleInfos for {*}Norm modules ( #97919 )
...
Not adding Lazy variants yet pending investigation of #97915
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97919
Approved by: https://github.com/albanD
2023-04-04 01:15:25 +00:00
lezcano
6871665a97
Avoid copies in matmul (no ghstack) ( #97355 )
...
Resubmit of https://github.com/pytorch/pytorch/pull/76828 without using ghstack so that @ngimel can import it and help me debug why it was reverted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97355
Approved by: https://github.com/ngimel , https://github.com/malfet
2023-03-29 06:54:09 +00:00
Mikayla Gawarecki
1a2dcff127
Added ModuleInfos for remaining activation functions ( #97704 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97704
Approved by: https://github.com/albanD
2023-03-28 17:11:41 +00:00
Mikayla Gawarecki
a283c15e34
Added ModuleInfos for {*}LU modules ( #97375 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97375
Approved by: https://github.com/albanD , https://github.com/jbschlosser
2023-03-28 00:36:31 +00:00
Mikayla Gawarecki
236bac811a
Add ModuleInfos for Adaptive{Max/Avg}Pool ops ( #97291 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97291
Approved by: https://github.com/albanD
2023-03-27 19:45:37 +00:00
Mikayla Gawarecki
0b094ca37f
Add gradcheck_nondet_tol to a few padding moduleinfos ( #97265 )
...
Fixes #96739 , see https://github.com/pytorch/pytorch/issues/96739#issuecomment-1478327704
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97265
Approved by: https://github.com/albanD
2023-03-21 23:46:28 +00:00
Rishub Tamirisa
152c1529ca
Add tests for all padding layers to module_db in common_modules.py ( #96641 )
...
Adding the PR discussed in #96295 .
- Adds tests for all current padding layers to `module_db` in `torch/testing/_internal/common_modules.py` ( `nn.ReflectionPad`, `nn.ReplicationPad`, `nn.ZeroPad`, `nn.ConstantPad` ) for 1D, 2D, and 3D variants.
- Removes tests for the same padding layers from `torch/testing/_internal/common_nn.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96641
Approved by: https://github.com/albanD
2023-03-14 17:42:10 +00:00
Eli Uriegas
8c8148c887
Revert D43643526: Multisect successfully blamed D43643526 for test or build failures ( #96126 )
...
Summary:
This diff is reverting D43643526
Depends on D43693521
D43643526: Avoid copies in matmul (#76828 ) by generatedunixname499836121 has been identified as causing the following test or build failures:
Tests affected:
- [mle/favour:tests - favour_test.py::TestLinears::test_psd](https://www.internalfb.com/intern/test/562950027104300/ )
Here's the Multisect link:
https://www.internalfb.com/intern/testinfra/multisect/1611690
Here are the tasks that are relevant to this breakage:
T146911536: 5 tests started failing for oncall prob in the last 2 weeks
We're generating a revert to back out the changes in this diff; please note the backout may land if someone accepts it.
Test Plan: NA
Differential Revision: D43693526
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96126
Approved by: https://github.com/weiwangmeta
2023-03-06 22:30:07 +00:00
lezcano
b3175ae95f
Avoid copies in matmul ( #76828 )
...
With this PR, matmul folds a bmm into an mm or mv if and only if it
can do so without copying. We add tests to make sure that
our algorithm for detecting this is accurate.
For the cases where it was copying before see https://github.com/pytorch/pytorch/pull/75197#discussion_r843413208 https://github.com/pytorch/pytorch/pull/75197#discussion_r863489479 https://github.com/pytorch/pytorch/pull/75197#discussion_r863489805
Fixes https://github.com/pytorch/pytorch/issues/76702
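One of the foldable cases, as a sketch: a 3-D operand times a 2-D operand can be computed as a single mm over a reshaped view, without copying either input:
```python
import torch

a = torch.randn(4, 5, 6)
b = torch.randn(6, 7)
out = a @ b          # conceptually (a.reshape(20, 6) @ b).view(4, 5, 7)
print(out.shape)     # torch.Size([4, 5, 7])
```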
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76828
Approved by: https://github.com/ngimel
2023-02-27 15:24:59 +00:00
Jeff Daily
66bfcd32fd
[ROCm] Remove PYTORCH_MIOPEN_SUGGEST_NHWC flag ( #90725 )
...
Fixes #64427 . MIOpen supports ChannelsLast, so there is no longer a need to opt in with the env var.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90725
Approved by: https://github.com/malfet
2023-02-09 22:26:24 +00:00
lezcano
5a7c1b7894
[decompositions] LSTM with packed input ( #91465 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91465
Approved by: https://github.com/zou3519
2023-02-08 14:16:30 +00:00
lezcano
bef61225c3
[decompositions] add decomposition for RNN with packed sequence ( #91281 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91281
Approved by: https://github.com/zou3519
2023-02-08 14:16:30 +00:00
lezcano
20d01d2dc9
[expanded weights] add RNN support via decomp ( #91807 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91807
Approved by: https://github.com/albanD
2023-02-08 14:16:30 +00:00
lezcano
c2a92687e0
[decompositions] add RNN decomp and testing ( #91123 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91123
Approved by: https://github.com/zou3519
2023-02-08 14:16:30 +00:00
Aaron Gokaslan
8fce9a09cd
[BE]: pyupgrade Python to 3.8 - imports and object inheritance only ( #94308 )
...
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.
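Illustrative before/after of the two changes:
```python
# before
from __future__ import unicode_literals

class Foo(object):
    pass

# after
class Foo:
    pass
```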
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang , https://github.com/albanD
2023-02-07 21:10:56 +00:00
Vasiliy Kuznetsov
f15ab8a7f2
AO migration: replace torch internal callsites ( #94170 )
...
Summary:
Do the following renames:
`torch.quantization` -> `torch.ao.quantization`
`torch.nn.quantized` -> `torch.ao.nn.quantized`
`torch.nn.quantizable` -> `torch.ao.nn.quantizable`
`torch.nn.qat` -> `torch.ao.nn.qat`
`torch.nn.intrinsic` -> `torch.ao.nn.intrinsic`
And then, do
`torch.ao.nn.quantized._reference` -> `torch.ao.nn.quantized.reference` to clean up the aftermath of https://github.com/pytorch/pytorch/pull/84974
Then, manually update `test/test_module_init.py` to fix hanging whitespace due to the replace.
Run this script to do the replacements: https://gist.github.com/vkuzo/7f7afebf8c31b9ba48306223e68a1c82
This is for https://github.com/pytorch/pytorch/issues/81667
Test plan: CI
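What the rename looks like at a call site (illustrative):
```python
# before
from torch.quantization import quantize_dynamic
import torch.nn.quantized as nnq

# after
from torch.ao.quantization import quantize_dynamic
import torch.ao.nn.quantized as nnq
```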
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94170
Approved by: https://github.com/jerryzh168
2023-02-07 02:32:23 +00:00
mingfeima
26cba842ad
Optimize ConvTransposed2D with mkldnn float32 and bfloat16 on CPU ( #92530 )
...
This PR optimizes `ConvTranspose2d` with oneDNN and adds channels last support for it. The fallback path `slow_conv_transpose2d` also gets channels last support, so the memory format propagation behavior stays the same with or without oneDNN.
Replacement of https://github.com/pytorch/pytorch/pull/77060 , https://github.com/pytorch/pytorch/pull/70897 and https://github.com/pytorch/pytorch/pull/74023 which enables oneDNN for `ConvTranspose2d` and `ConvTranspose3d`
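A sketch of the channels last usage this enables; whether oneDNN is actually picked depends on the build:
```python
import torch
import torch.nn as nn

x = torch.randn(32, 32, 100, 100).to(memory_format=torch.channels_last)
deconv = nn.ConvTranspose2d(32, 32, kernel_size=3)
y = deconv(x)
# the output is expected to stay channels last, matching the input's memory format
print(y.is_contiguous(memory_format=torch.channels_last))
```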
The following results were collected on Skylake Xeon 8180, dual socket, 28 cores per socket.
### single core channels last
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 181.36 | 91.16 | 1.99 | 531.38 | 124.08 | 4.28
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 324.35 | 153.50 | 2.11 | 973.16 | 185.97 | 5.23
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 1086.82 | 671.52 | 1.62 | 3008.94 | 1453.33 | 2.07
### single core channels first
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 138.10 | 5.94 | 23.23 | 37.97 | 11.25 | 3.38
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 236.43 | 8.75 | 27.03 | 87.77 | 18.58 | 4.72
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 484.39 | 37.69 | 12.85 | 185.40 | 90.57 | 2.05
### single socket channels last
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 138.10 | 5.94 | 23.23 | 37.97 | 11.25 | 3.38
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 236.43 | 8.75 | 27.03 | 87.77 | 18.58 | 4.72
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 484.39 | 37.69 | 12.85 | 185.40 | 90.57 | 2.0
### single socket channels first
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 132.56 | 7.19 | 18.43 | 31.43 | 11.20 | 2.81
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 227.94 | 13.33 | 17.11 | 63.00 | 23.41 | 2.69
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 473.68 | 52.79 | 8.97 | 150.40 | 87.33 | 1.72
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92530
Approved by: https://github.com/jgong5 , https://github.com/ezyang
2023-02-06 10:11:25 +00:00
Joel Schlosser
1effabe257
Support per-parameter test decoration ( #91658 )
...
Continuation of #79979 .
Fixes #79161
This PR does the following:
* Expands the `parametrize_fn()` signature from returning a 3-tuple of `(test, test_name, param_kwargs)` to returning a 4-tuple of `(test, test_name, param_kwargs, decorator_fn)`. Expected signature for the addition is `decorator_fn(param_kwargs) -> List[decorator]` i.e. given the full set of test params, return a list of decorators to apply.
* `modules`, `ops`, and `parametrize` now fit the new signature, returning `decorator_fn`s instead of applying decorators themselves.
* `instantiate_parametrized_tests()` and `instantiate_device_type_tests()` now call the returned `decorator_fn`, passing in the full set of `param_kwargs` (after composition + `device` / `dtype` additions) and applying the returned decorators.
* Composing multiple `parametrize_fn`s also composes the corresponding `decorator_fn`s; the composed `decorator_fn` simply concatenates the decorator lists returned by the constituents.
* Expands `DecorateInfo.is_active` to support callables:
```python
DecorateInfo(
unittest.expectedFailure, "TestOps", "test_python_ref_executor",
device_type='cuda', active_if=lambda params: params['executor'] == 'nvfuser'
),
```
* Adds several tests to `test/test_testing.py` ensuring proper decoration using `@parametrize`, `@modules`, and `@ops`.
* (minor) Fixes a couple `ModuleInfo` naming oddities uncovered during testing.
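A hypothetical `decorator_fn` matching the signature described above (the predicate here is made up for illustration):
```python
import unittest

def decorator_fn(params):
    # given the full set of test params, return the decorators to apply
    if params.get("device") == "cuda" and params.get("executor") == "nvfuser":
        return [unittest.expectedFailure]
    return []
```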
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91658
Approved by: https://github.com/malfet
2023-01-04 21:08:32 +00:00
PyTorch MergeBot
0a6053e9b5
Revert "Avoid copies in matmul ( #76828 )"
...
This reverts commit 8c2e82b487 .
Reverted https://github.com/pytorch/pytorch/pull/76828 on behalf of https://github.com/mehtanirav due to Internal breakages
2023-01-03 23:36:58 +00:00
lezcano
8c2e82b487
Avoid copies in matmul ( #76828 )
...
With this PR, matmul folds a bmm into an mm or mv if and only if it
can do so without copying. We add tests to make sure that
our algorithm for detecting this is accurate.
For the cases where it was copying before see https://github.com/pytorch/pytorch/pull/75197#discussion_r843413208 https://github.com/pytorch/pytorch/pull/75197#discussion_r863489479 https://github.com/pytorch/pytorch/pull/75197#discussion_r863489805
Fixes https://github.com/pytorch/pytorch/issues/76702
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76828
Approved by: https://github.com/ngimel
2023-01-03 14:18:38 +00:00
PyTorch MergeBot
db2a237763
Revert "Avoid copies in matmul ( #76828 )"
...
This reverts commit 0c3659586d .
Reverted https://github.com/pytorch/pytorch/pull/76828 on behalf of https://github.com/lezcano due to Makes functorch tests fail
2023-01-03 12:26:29 +00:00