Mikayla Gawarecki
f138b08d2e
Migrated loss functions to ModuleInfos ( #115584 )
...
Migrates most tests in `common_nn.py:criterion_tests` to ModuleInfos.
**I can split this up if it is too large to review**
What this PR does not include:
- [`no_batch_dim` tests](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_nn.py#L3995-L4112 )
- [tests that use the functional variant of the loss function and `wrap_functional`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_nn.py#L1079-L1128 )
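For reference, a minimal sketch (not the exact entries added here) of how a loss module is described as a ModuleInfo in `common_modules.py`; the names follow the existing infrastructure but are not copied from this PR:
```python
import torch
from torch.testing._internal.common_modules import ModuleInput, FunctionInput

def module_inputs_torch_nn_MSELoss(module_info, device, dtype, requires_grad, training, **kwargs):
    # each ModuleInput pairs constructor args with forward args
    def make(shape):
        return torch.randn(shape, device=device, dtype=dtype, requires_grad=requires_grad)
    return [
        ModuleInput(constructor_input=FunctionInput(reduction="mean"),
                    forward_input=FunctionInput(make((2, 3)), make((2, 3)))),
    ]
```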
#### On test times
This PR increases test time by ~58s locally
Before this PR:
```
>>> python test/test_nn.py -k Loss
Ran 1003 tests in 28.977s
```
After this PR
```
>>> python test/test_nn.py -k Loss
Ran 368 tests in 23.073s
```
```
>>> python test/test_modules.py -k Loss
Ran 836 tests in 63.900s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115584
Approved by: https://github.com/janeyx99
ghstack dependencies: #115617
2023-12-12 22:20:20 +00:00
Wongboo
68f74dd162
Add python and C++ support for LPPool3d ( #114199 )
...
Add Python and C++ support for LPPool3d. Fixes #114114
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114199
Approved by: https://github.com/mikaylagawarecki
2023-12-08 18:18:44 +00:00
Aaron Gokaslan
b7b2178204
[BE]: Remove useless lambdas ( #113602 )
...
Applies PLW0108, which removes useless lambdas in Python. The rule is in preview, so it is not ready to be enabled by default just yet; these are the autofixes from the rule.
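An illustrative before/after (not taken from the diff) of what PLW0108 rewrites:
```python
# before: a lambda that only forwards its argument to another callable
key_fn = lambda x: str(x)

# after: use the wrapped callable directly
key_fn = str
```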
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602
Approved by: https://github.com/albanD
2023-11-14 20:06:48 +00:00
CaoE
7c9052165a
add fp16 support for native conv and deconv on CPU ( #99497 )
...
### Testing
Native conv vs. mkldnn conv on SPR (with avx512_fp16 support)
Single core:
Input | Naïve impl / us | oneDNN / us | Speed up
-- | -- | -- | --
IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 | 34676789 | 524199.8 | 66.15185
IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 | 33454125 | 349844.4 | 95.62573
IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 | 317650.1 | 2317.677 | 137.0554
IC: 128, OC: 256, kernel: 3, stride: 1, N: 1, L: 64 | 15334.68 | 167.264 | 91.67952
56 cores:
Input | Naïve impl / us | oneDNN / us | Speed up
-- | -- | -- | --
IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 | 1032064 | 11073.58 | 93.20061
IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 | 1000097 | 16371.19 | 61.08883
IC: 256, OC: 1024, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 | 981813.4 | 9008.908 | 108.9825
IC: 1024, OC: 256, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 | 1082606 | 10150.47 | 106.6558
IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 | 319980.6 | 181.598 | 1762.027
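A minimal sketch of the newly supported path; whether the native kernel or oneDNN is actually used depends on the build and input configuration:
```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56, dtype=torch.float16)        # fp16 input on CPU
conv = nn.Conv2d(64, 256, kernel_size=1, stride=1).half()  # fp16 weights
deconv = nn.ConvTranspose2d(64, 64, kernel_size=3).half()
print(conv(x).dtype, deconv(x).dtype)                       # torch.float16 torch.float16
```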
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99497
Approved by: https://github.com/jgong5 , https://github.com/cpuhrsch
2023-09-25 01:31:26 +00:00
FFFrog
003c5bb156
Add checks to num_layers for RNN, LSTM, GRU ( #108853 )
...
Fixes #108223
As the title states, this adds validation of `num_layers` for RNN, LSTM, and GRU.
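A sketch of the kind of input this now rejects (the exact error type and message are assumptions, not quoted from the PR):
```python
import torch.nn as nn

try:
    nn.LSTM(input_size=10, hidden_size=20, num_layers=0)  # non-positive num_layers
except (ValueError, RuntimeError) as e:
    print(e)
```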
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108853
Approved by: https://github.com/mikaylagawarecki
2023-09-09 19:33:52 +00:00
CaoE
8f02884569
add Half support for GroupNorm on CPU ( #100234 )
...
### Testing
Single socket (28cores):
* Contiguous:
shape | forward, fp32 / s | forward, mixed fp32 fp16 / s | backward, fp32 / s | backward, mixed fp32 fp16 / s
-- | -- | -- | -- | --
[10, 128, 10, 10] | 2.45E-05 | 3.26E-05 | 6.87E-05 | 7.40E-05
[10, 128, 80, 80] | 0.000726 | 0.000606 | 0.002183 | 0.001112
* Channels Last:
shape | forward, fp32 / s | forward, mixed fp32 fp16 / s | backward, fp32 / s | backward, mixed fp32 fp16 / s
-- | -- | -- | -- | --
[10, 128, 10, 10] | 2.88E-05 | 2.72E-05 | 6.56E-05 | 6.63E-05
[10, 128, 80, 80] | 0.00076 | 0.000256 | 0.002385 | 0.000735
Single core:
* Contiguous:
shape | forward, fp32 / s | forward, mixed fp32 fp16 / s | backward, fp32 / s | backward, mixed fp32 fp16 / s
-- | -- | -- | -- | --
[10, 128, 10, 10] | 9.47E-05 | 1.90E-04 | 2.03E-04 | 3.10E-04
[10, 128, 80, 80] | 6.25E-03 | 8.98E-03 | 0.016485 | 0.01369
* Channels Last:
shape | forward, fp32 / s | forward, mixed fp32 fp16 / s | backward, fp32 / s | backward, mixed fp32 fp16 / s
-- | -- | -- | -- | --
[10, 128, 10, 10] | 8.66E-05 | 7.89E-05 | 1.95E-04 | 1.43E-04
[10, 128, 80, 80] | 5.97E-03 | 3.13E-03 | 0.01626 | 8.70E-03
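A sketch of the mixed fp32-parameter / fp16-input case measured above (assuming the mixed-dtype path keeps fp32 weights, as with the existing bf16 support):
```python
import torch
import torch.nn as nn

x = torch.randn(10, 128, 80, 80, dtype=torch.half, requires_grad=True)
gn = nn.GroupNorm(num_groups=32, num_channels=128)  # fp32 weight/bias
y = gn(x)
y.sum().backward()
print(y.dtype, x.grad.dtype)  # torch.float16 torch.float16
```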
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100234
Approved by: https://github.com/jgong5 , https://github.com/mikaylagawarecki
2023-09-01 21:25:24 +00:00
Mikayla Gawarecki
584a01b650
Fix LayerNorm(bias=False) error ( #108060 )
...
Fixes #108048
- [ ] Cherry pick this [here](https://github.com/pytorch/pytorch/issues/108055 )
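A minimal check of the fixed path:
```python
import torch
import torch.nn as nn

ln = nn.LayerNorm(8, bias=False)   # the case that errored, per #108048
out = ln(torch.randn(2, 4, 8))
print(ln.bias, out.shape)          # None torch.Size([2, 4, 8])
```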
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108060
Approved by: https://github.com/jbschlosser , https://github.com/albanD , https://github.com/malfet
2023-08-28 18:23:13 +00:00
CaoE
3267996372
add channel last 3d support for maxpool3d on CPU ( #97775 )
...
### Testing
Single socket (28 cores):
shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms
-- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 3.959584 | 5.493402 | 0.557232 | 0.568485
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 0.815511 | 1.351261 | 5.710506 | 10.57506
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 10.63426 | 15.28637 | 2.67656 | 1.71365
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 2.63570 | 2.05532 | 2.55452 | 2.33923
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 0.375469 | 0.479748 | 0.066364 | 0.065155
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 0.112197 | 0.112326 | 0.111697 | 0.145364
Single core:
shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms
-- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 92.16582 | 128.6513 | 6.684325 | 12.21541
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 10.14318 | 29.80297 | 7.350142 | 11.25323
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 238.55453 | 331.89967 | 19.694657 | 32.78853
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 30.17079 | 32.75628 | 22.44543 | 30.17796
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 7.474389 | 9.937217 | 0.236015 | 0.434229
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 2.318954 | 2.469444 | 0.262125 | 0.401361
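A sketch of the channels-last-3d path exercised above:
```python
import torch
import torch.nn as nn

x = torch.randn(4, 19, 10, 16, 16).to(memory_format=torch.channels_last_3d)
pool = nn.MaxPool3d(kernel_size=3, stride=1)
y = pool(x)
# the output is expected to preserve the channels-last-3d layout
print(y.is_contiguous(memory_format=torch.channels_last_3d))
```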
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97775
Approved by: https://github.com/jgong5 , https://github.com/mikaylagawarecki
2023-08-26 00:21:27 +00:00
CaoE
3992450e8d
Add backward check for test_memory_format ( #106104 )
...
Add backward check for test_memory_format.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106104
Approved by: https://github.com/mikaylagawarecki
2023-08-25 18:11:54 +00:00
Prachi Gupta
3022a395f3
test_memory_format test now passes on rocm ( #107696 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107696
Approved by: https://github.com/pruthvistony , https://github.com/albanD
2023-08-23 16:39:19 +00:00
Liao, Xuan
71632d4d24
[cpu] add sdpa choice and UT ( #105131 )
...
Feature RFC: https://github.com/pytorch/rfcs/pull/56 .
Write an SDPA selection function for CPU that automatically chooses one SDPA implementation among the available ones. There are two CPU implementations to choose from: the unfused SDPA and flash attention. In general, flash attention has higher priority than the unfused SDPA. For cases where flash attention is not applicable, such as when flash attention is manually disabled or the inputs are not 4-dimensional, the unfused SDPA is chosen.
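A sketch of an eligible call; which implementation is actually picked is decided internally by the new selection function:
```python
import torch
import torch.nn.functional as F

# 4-D (batch, heads, seq_len, head_dim) fp32 inputs on CPU
q, k, v = (torch.randn(1, 25, 1024, 64) for _ in range(3))
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 25, 1024, 64])
```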
## Performance of the stack
### NanoGPT's SDPA kernel
Using benchmark [repo](https://github.com/mingfeima/bench_sdpa/blob/main/README.md ), with one socket.
Shape: Batch size 1, Sequence length 1024, Head number 25, Head size 64.
Machine: SPR.
| Dtype | Causal | Mode | SDPA | Time (ms per iter) | Speedup |
| -------- | -------- | ------- | ------- | ------- | ------- |
| float32 | FALSE | Inference | Unfused | 3.081 | |
| | | | Flash attention | 1.665 | **1.85045** |
| float32 | TRUE | Inference | Unfused | 3.463 | |
| | | | Flash attention | 1.662 | **2.083634**|
| bfloat16 | FALSE | Inference | Unfused | 1.203 | |
| | | | Flash attention | 1.154 | **1.042461**|
| bfloat16 | TRUE | Inference | Unfused | 1.543 | |
| | | | Flash attention | 1.154 | **1.337088**|
| float32 | FALSE | Training | Unfused | 54.938 | |
| | | | Flash attention | 23.029 | **2.385601**|
| float32 | TRUE | Training | Unfused | 58.266 | |
| | | | Flash attention | 17.835 | **3.266947**|
| bfloat16 | FALSE | Training | Unfused | 18.924 | |
| | | | Flash attention | 18.886 | **1.002012**|
| bfloat16 | TRUE | Training | Unfused | 21.08 | |
| | | | Flash attention | 14.172 | **1.48744** |
### Stable Diffusion
Following model's [BKM](https://github.com/intel-innersource/frameworks.ai.models.intel-models/blob/develop/quickstart/diffusion/pytorch/stable_diffusion/inference/cpu/README.md ).
Mode: Inference; Machine: SPR.
| Dtype | SDPA | Throughput (fps) | Speedup SDPA | Total Time (ms) | Speedup |
| -------- | -------- | ------- | ------- | ------- | ------- |
| float32 | Unfused | 1.63 | | 1139 | |
| | Flash attention | 1.983 | 1.216564 | 547.488 | **2.080411**|
| bfloat16 | Flash attention in IPEX | 4.784 | | 429.051 | |
| | Flash attention | 4.857 | 1.015259 | 408.823 | **1.049479**|
### LLM models of Torchbench
Dtype: float32; Mode: Inference, single socket; Machine: CPX.
Model name | SDPA | Inductor_new | Inductor_old | Inductor Ratio(old/new)
-- | -- | -- | -- | --
hf_Albert | Unfused -> Flash attention | 0.048629309 | 0.05591545 | **1.14983024**
hf_Bert | Unfused -> Flash attention | 0.053156243 | 0.060732115 | **1.142520841**
hf_Bert_large | Unfused -> Flash attention | 0.141089502 | 0.155190077 | **1.099940636**
llama | Unfused -> Flash attention | 0.033250106 | 0.033720745 | **1.01415451**
Dtype: bfloat16; Mode: Inference, single socket; Machine: SPR.
Model name | SDPA | Inductor_new | Inductor_old | Inductor Ratio(old/new)
-- | -- | -- | -- | --
hf_Albert | Unfused -> Flash attention | 0.020681298 | 0.020718282 | **1.001788324**
hf_Bert | Unfused -> Flash attention | 0.019932816 | 0.019935424 | **1.000130842**
hf_Bert_large | Unfused -> Flash attention | 0.047949174 | 0.048312502 | **1.007577355**
llama | Unfused -> Flash attention | 0.018528057 | 0.01861126 | **1.0044907**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105131
Approved by: https://github.com/drisspg
ghstack dependencies: #104583 , #104584 , #103826 , #104693 , #104863 , #107128
2023-08-20 08:56:21 +00:00
FFFrog
2d2d43d9fb
add more check on LSTMCell ( #107380 )
...
Just like #107223 , the ``LSTMCell`` operator has the same problems as ``GRUCell``; this adds the corresponding checks and related tests to fix it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107380
Approved by: https://github.com/ezyang
2023-08-18 20:44:17 +00:00
PyTorch MergeBot
02bcaf45f6
Revert "Add backward check for test_memory_format ( #106104 )"
...
This reverts commit 2e44adb066 .
Reverted https://github.com/pytorch/pytorch/pull/106104 on behalf of https://github.com/huydhn due to Sorry for reverting this but it is failing the inductor job in trunk 2e44adb066 . I will add the ciflow/inductor label to the PR to make sure that the test runs there ([comment](https://github.com/pytorch/pytorch/pull/106104#issuecomment-1683119990 ))
2023-08-17 23:45:31 +00:00
CaoE
2e44adb066
Add backward check for test_memory_format ( #106104 )
...
Add backward check for test_memory_format.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106104
Approved by: https://github.com/mikaylagawarecki
2023-08-17 21:19:34 +00:00
FFFrog
a4229690e3
Add Some Checks about dim ( #107223 )
...
Fixes #106769
As mentioned in [GRUCell](https://pytorch.org/docs/stable/generated/torch.nn.GRUCell.html#grucell ), `hidden` should have the same dimensionality as `input`, and both should be either `1D` or `2D`.
Other aspects are already verified in the `C++` code, such as `input` and `hidden` having the same batch size, `input`'s dim 1 matching `input_size`, and `hidden`'s dim 1 matching `hidden_size`.
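A sketch of a mismatch that is now rejected (the exact error type is an assumption):
```python
import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=10, hidden_size=20)
x = torch.randn(3, 10)   # 2D input: (batch, input_size)
h = torch.randn(20)      # 1D hidden while the input is 2D
try:
    cell(x, h)
except (ValueError, RuntimeError) as e:
    print(e)
```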
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107223
Approved by: https://github.com/albanD
2023-08-16 22:03:31 +00:00
Mikayla Gawarecki
1317dbf176
Reland "Add nn.CircularPad{*}d for consistency + fix no_batch_dim support ( #106148 )" ( #106632 )
...
The previous PR was reverted because the PR stacked under it, which added error checking to the Pad variants (https://github.com/pytorch/pytorch/pull/106147), was reverted: internally some callers pass 2D inputs to ZeroPad2d (which should actually take 3d or 4d inputs :)). To my understanding, nothing in this PR itself was breaking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106632
Approved by: https://github.com/albanD
2023-08-07 20:10:25 +00:00
PyTorch MergeBot
dfcfd5cedb
Revert "Add nn.CircularPad{*}d for consistency + fix no_batch_dim support ( #106148 )"
...
This reverts commit 87d2536971 .
Reverted https://github.com/pytorch/pytorch/pull/106148 on behalf of https://github.com/malfet due to Reverting as dependent PR https://github.com/pytorch/pytorch/pull/106147 was reverted as well ([comment](https://github.com/pytorch/pytorch/pull/106148#issuecomment-1662344543 ))
2023-08-02 14:46:00 +00:00
PyTorch MergeBot
d83b887f2a
Revert "Add error checking for padding modules ( #106147 )"
...
This reverts commit 0547b6279d .
Reverted https://github.com/pytorch/pytorch/pull/106147 on behalf of https://github.com/jeanschmidt due to sadly it is breaking internal builds, and I can't coordinate a FF due to timezone differences ([comment](https://github.com/pytorch/pytorch/pull/106147#issuecomment-1661870970 ))
2023-08-02 09:37:40 +00:00
Mikayla Gawarecki
87d2536971
Add nn.CircularPad{*}d for consistency + fix no_batch_dim support ( #106148 )
...
Fixes #105749 and https://github.com/pytorch/pytorch/issues/95320
(tl;dr: input should always be `[N, C, H(, W, D)]`, where only the H, W, and D dimensions get circular padding, so in the 2D case where the user wants both dimensions padded they should `.unsqueeze(0)` first (as is the case for `Reflection/ReplicationPad`), but we never documented this for circular padding. [This seems to be the old docstring](277b05014a/torch/nn/functional.py (L4689) ) that was somehow lost.)
Fixes no_batch_dim support https://github.com/pytorch/pytorch/issues/104860
- Adds missing documentation for circular padding
- Adds missing CircularPad modules
- Migrates legacy test_nn tests from circular padding to ModuleInfo
- Adds no_batch_dim support + sample inputs that test this
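A sketch of the added modules and the no_batch_dim behavior:
```python
import torch
import torch.nn as nn

pad = nn.CircularPad2d(2)
batched = pad(torch.randn(1, 3, 8, 8))   # (N, C, H, W): H and W are padded circularly
unbatched = pad(torch.randn(3, 8, 8))    # no batch dim: treated as (C, H, W)
print(batched.shape, unbatched.shape)    # (1, 3, 12, 12) (3, 12, 12)
```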
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106148
Approved by: https://github.com/albanD
ghstack dependencies: #106325 , #106147
2023-08-01 12:49:58 +00:00
Mikayla Gawarecki
0547b6279d
Add error checking for padding modules ( #106147 )
...
Fixes https://github.com/pytorch/pytorch/issues/105627
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106147
Approved by: https://github.com/albanD
ghstack dependencies: #106325
2023-08-01 12:49:58 +00:00
Mikayla Gawarecki
c9be60cd0e
Add error inputs to ModuleInfo (mirroring OpInfo) ( #106325 )
...
Add infra for error inputs to ModuleInfos and migrate the first few error input tests from test_nn.py (more to come!)
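A rough sketch of the shape of such an entry; the class and field names here are assumptions that mirror the OpInfo error-inputs pattern rather than verbatim code from this PR:
```python
from functools import partial
from torch.testing import make_tensor
from torch.testing._internal.common_modules import (
    ModuleInput, FunctionInput, ErrorModuleInput, ModuleErrorEnum)

def module_error_inputs_torch_nn_L1Loss(module_info, device, dtype, requires_grad, training, **kwargs):
    make_input = partial(make_tensor, device=device, dtype=dtype, requires_grad=requires_grad)
    return [
        ErrorModuleInput(
            ModuleInput(constructor_input=FunctionInput(reduction="bad"),
                        forward_input=FunctionInput(make_input(5), make_input(5))),
            error_on=ModuleErrorEnum.FORWARD_ERROR,
            error_type=ValueError,
            error_regex="is not a valid value",
        ),
    ]
```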
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106325
Approved by: https://github.com/albanD
2023-08-01 12:49:56 +00:00
Mikayla Gawarecki
e18d53e2df
Added ModuleInfo test for meta device ctx init ( #105871 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105871
Approved by: https://github.com/albanD
2023-07-26 01:57:54 +00:00
Justin Chu
be03a56955
[BE] Enable ruff's UP rules and autoformat testing/ ( #105425 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105425
Approved by: https://github.com/malfet
2023-07-18 21:04:39 +00:00
mingfeima
a66f08d626
enable channels last for replication padding on CPU ( #102597 )
...
Enable channels last support for replication padding on CPU. This patch adds channels last support for ReplicationPad2d/3d on the CPU backend. The following test cases pass with this patch:
```
python test_modules.py TestModuleCPU.test_memory_format_nn_ReplicationPad2d_cpu_float32
python test_modules.py TestModuleCPU.test_memory_format_nn_ReplicationPad3d_cpu_float32
```
The following benchmark results were gathered on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket.
### single core inference
```
(before)
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.339 ms
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 82.935 ms
(after)
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.324 ms
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 16.717 ms
```
### single socket inference
```
(before)
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.135 ms
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 7.203 ms
(after)
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.029 ms
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 3.174 ms
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102597
Approved by: https://github.com/CaoE , https://github.com/cpuhrsch
2023-07-14 03:44:55 +00:00
mingfeima
f73757d551
enable channels last for reflection padding on CPU ( #102518 )
...
Add channels last support for reflection padding on CPU. The following test cases will pass with this patch:
```
python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad2d_cpu_float32
python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad3d_cpu_float32
```
The following benchmark results were gathered on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket.
### single core inference
```
(before)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.356 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 86.821 ms
(after)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.328 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 16.806 ms
```
### single socket inference
```
(before)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.142 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 7.367 ms
(after)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.027 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 3.181 ms
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102518
Approved by: https://github.com/CaoE , https://github.com/cpuhrsch
2023-07-13 16:22:31 +00:00
Jens Glaser
86e0eda18d
Add partial derivative unit tests ( #103809 )
...
Adds the unit tests requested in #95810
This PR also addresses a gap in unit testing of gradients, as `gradcheck` always performs total derivatives w.r.t. all arguments and module parameters. Some modules have different code paths for partial derivatives, e.g. `LayerNorm`, and those should be tested separately.
The PR has the following limitations:
- it does not test partial derivatives w.r.t. every combination of arguments, which would exponentially increase CI time.
- it does not implement the same logic for Hessians, where the increase in CI time would be quadratic in the number of arguments.
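A sketch of the idea: check the derivative w.r.t. the input alone, with the module parameters frozen:
```python
import torch
import torch.nn as nn
from torch.autograd import gradcheck

m = nn.LayerNorm(4).double()
for p in m.parameters():
    p.requires_grad_(False)   # partial derivative: input only
x = torch.randn(2, 4, dtype=torch.double, requires_grad=True)
print(gradcheck(m, (x,)))     # True if the input-only path is correct
```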
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103809
Approved by: https://github.com/kit1980
2023-06-25 00:36:10 +00:00
Ramin Azarmehr
cecfcf1e17
[MPS] Handle MPS failures of test_modules.py in common_modules.py ( #95334 )
...
- Also cleaned up the skipMPS code in `test_modules.py`.
- Added `skipMPS` for unsupported or failing tests on MPS backend in common_modules.py.
(We'll remove `skipMPS` from those tests once a fix is available for them.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95334
Approved by: https://github.com/kulinseth , https://github.com/albanD
2023-05-09 03:55:16 +00:00
Mikayla Gawarecki
2c6c7deeb3
Added ModuleInfos for Pooling ops ( #98358 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98358
Approved by: https://github.com/albanD
2023-04-05 19:39:07 +00:00
Mikayla Gawarecki
3a0ad3c194
[easy] Remove large LayerNorm sample input causing OOM from ModuleInfo ( #98424 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98424
Approved by: https://github.com/huydhn , https://github.com/albanD
2023-04-05 19:38:15 +00:00
Mikayla Gawarecki
96ad739ddc
Added ModuleInfos for {*}Norm modules ( #97919 )
...
Not adding Lazy variants yet pending investigation of #97915
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97919
Approved by: https://github.com/albanD
2023-04-04 01:15:25 +00:00
lezcano
6871665a97
Avoid copies in matmul (no ghstack) ( #97355 )
...
Resubmit of https://github.com/pytorch/pytorch/pull/76828 without using ghstack so that @ngimel can import it and help me debug why it was reverted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97355
Approved by: https://github.com/ngimel , https://github.com/malfet
2023-03-29 06:54:09 +00:00
Mikayla Gawarecki
1a2dcff127
Added ModuleInfos for remaining activation functions ( #97704 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97704
Approved by: https://github.com/albanD
2023-03-28 17:11:41 +00:00
Mikayla Gawarecki
a283c15e34
Added ModuleInfos for {*}LU modules ( #97375 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97375
Approved by: https://github.com/albanD , https://github.com/jbschlosser
2023-03-28 00:36:31 +00:00
Mikayla Gawarecki
236bac811a
Add ModuleInfos for Adaptive{Max/Avg}Pool ops ( #97291 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97291
Approved by: https://github.com/albanD
2023-03-27 19:45:37 +00:00
Mikayla Gawarecki
0b094ca37f
Add gradcheck_nondet_tol to a few padding moduleinfos ( #97265 )
...
Fixes #96739 , see https://github.com/pytorch/pytorch/issues/96739#issuecomment-1478327704
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97265
Approved by: https://github.com/albanD
2023-03-21 23:46:28 +00:00
Rishub Tamirisa
152c1529ca
Add tests for all padding layers to module_db in common_modules.py ( #96641 )
...
Adding the PR discussed in #96295 .
- Adds tests for all current padding layers to `module_db` in `torch/testing/_internal/common_modules.py` ( `nn.ReflectionPad`, `nn.ReplicationPad`, `nn.ZeroPad`, `nn.ConstantPad` ) for 1D, 2D, and 3D variants.
- Removes tests for the same padding layers from `torch/testing/_internal/common_nn.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96641
Approved by: https://github.com/albanD
2023-03-14 17:42:10 +00:00
Eli Uriegas
8c8148c887
Revert D43643526: Multisect successfully blamed D43643526 for test or build failures ( #96126 )
...
Summary:
This diff is reverting D43643526
Depends on D43693521
D43643526: Avoid copies in matmul (#76828 ) by generatedunixname499836121 has been identified as causing the following test or build failures:
Tests affected:
- [mle/favour:tests - favour_test.py::TestLinears::test_psd](https://www.internalfb.com/intern/test/562950027104300/ )
Here's the Multisect link:
https://www.internalfb.com/intern/testinfra/multisect/1611690
Here are the tasks that are relevant to this breakage:
T146911536: 5 tests started failing for oncall prob in the last 2 weeks
We're generating a revert to back out the changes in this diff; please note the backout may land if someone accepts it.
Test Plan: NA
Differential Revision: D43693526
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96126
Approved by: https://github.com/weiwangmeta
2023-03-06 22:30:07 +00:00
lezcano
b3175ae95f
Avoid copies in matmul ( #76828 )
...
With this PR, matmul folds a bmm into an mm or mv if and only if it
can do so without copying. We add tests to make sure that
our algorithm for detecting this is accurate.
For the cases where it was copying before see https://github.com/pytorch/pytorch/pull/75197#discussion_r843413208 https://github.com/pytorch/pytorch/pull/75197#discussion_r863489479 https://github.com/pytorch/pytorch/pull/75197#discussion_r863489805
Fixes https://github.com/pytorch/pytorch/issues/76702
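One of the foldable cases, as a sketch: a 3-D operand times a 2-D operand can be computed as a single mm over a reshaped view, without copying either input:
```python
import torch

a = torch.randn(4, 5, 6)
b = torch.randn(6, 7)
out = a @ b          # conceptually (a.reshape(20, 6) @ b).view(4, 5, 7)
print(out.shape)     # torch.Size([4, 5, 7])
```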
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76828
Approved by: https://github.com/ngimel
2023-02-27 15:24:59 +00:00
Jeff Daily
66bfcd32fd
[ROCm] Remove PYTORCH_MIOPEN_SUGGEST_NHWC flag ( #90725 )
...
Fixes #64427 . MIOpen supports ChannelsLast, so there is no longer a need to opt in with the env var.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90725
Approved by: https://github.com/malfet
2023-02-09 22:26:24 +00:00
lezcano
5a7c1b7894
[decompositions] LSTM with packed input ( #91465 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91465
Approved by: https://github.com/zou3519
2023-02-08 14:16:30 +00:00
lezcano
bef61225c3
[decompositions] add decomposition for RNN with packed sequence ( #91281 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91281
Approved by: https://github.com/zou3519
2023-02-08 14:16:30 +00:00
lezcano
20d01d2dc9
[expanded weights] add RNN support via decomp ( #91807 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91807
Approved by: https://github.com/albanD
2023-02-08 14:16:30 +00:00
lezcano
c2a92687e0
[decompositions] add RNN decomp and testing ( #91123 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91123
Approved by: https://github.com/zou3519
2023-02-08 14:16:30 +00:00
Aaron Gokaslan
8fce9a09cd
[BE]: pyupgrade Python to 3.8 - imports and object inheritance only ( #94308 )
...
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.
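Illustrative before/after of the two changes:
```python
# before
from __future__ import unicode_literals

class Foo(object):
    pass

# after
class Foo:
    pass
```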
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang , https://github.com/albanD
2023-02-07 21:10:56 +00:00
Vasiliy Kuznetsov
f15ab8a7f2
AO migration: replace torch internal callsites ( #94170 )
...
Summary:
Do the following renames:
`torch.quantization` -> `torch.ao.quantization`
`torch.nn.quantized` -> `torch.ao.nn.quantized`
`torch.nn.quantizable` -> `torch.ao.nn.quantizable`
`torch.nn.qat` -> `torch.ao.nn.qat`
`torch.nn.intrinsic` -> `torch.ao.nn.intrinsic`
And then, do
`torch.ao.nn.quantized._reference` -> `torch.ao.nn.quantized.reference` to clean up the aftermath of https://github.com/pytorch/pytorch/pull/84974
Then, manually update `test/test_module_init.py` to fix hanging whitespace due to the replace.
Run this script to do the replacements: https://gist.github.com/vkuzo/7f7afebf8c31b9ba48306223e68a1c82
This is for https://github.com/pytorch/pytorch/issues/81667
Test plan: CI
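What the rename looks like at a call site (illustrative):
```python
# before
from torch.quantization import quantize_dynamic
import torch.nn.quantized as nnq

# after
from torch.ao.quantization import quantize_dynamic
import torch.ao.nn.quantized as nnq
```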
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94170
Approved by: https://github.com/jerryzh168
2023-02-07 02:32:23 +00:00
mingfeima
26cba842ad
Optimize ConvTransposed2D with mkldnn float32 and bfloat16 on CPU ( #92530 )
...
This PR optimizes `ConvTranspose2d` with oneDNN and adds channels last support for it. The fallback path `slow_conv_transpose2d` also gets channels last support, so the memory format propagation behavior stays the same with or without oneDNN.
Replacement of https://github.com/pytorch/pytorch/pull/77060 , https://github.com/pytorch/pytorch/pull/70897 and https://github.com/pytorch/pytorch/pull/74023 which enables oneDNN for `ConvTranspose2d` and `ConvTranspose3d`
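A sketch of the channels last usage this enables; whether oneDNN is actually picked depends on the build:
```python
import torch
import torch.nn as nn

x = torch.randn(32, 32, 100, 100).to(memory_format=torch.channels_last)
deconv = nn.ConvTranspose2d(32, 32, kernel_size=3)
y = deconv(x)
# the output is expected to stay channels last, matching the input's memory format
print(y.is_contiguous(memory_format=torch.channels_last))
```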
The following results were collected on Skylake Xeon 8180, dual socket, 28 cores per socket.
### single core channels last
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 181.36 | 91.16 | 1.99 | 531.38 | 124.08 | 4.28
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 324.35 | 153.50 | 2.11 | 973.16 | 185.97 | 5.23
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 1086.82 | 671.52 | 1.62 | 3008.94 | 1453.33 | 2.07
### single core channels first
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 138.10 | 5.94 | 23.23 | 37.97 | 11.25 | 3.38
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 236.43 | 8.75 | 27.03 | 87.77 | 18.58 | 4.72
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 484.39 | 37.69 | 12.85 | 185.40 | 90.57 | 2.05
### single socket channels last
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 138.10 | 5.94 | 23.23 | 37.97 | 11.25 | 3.38
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 236.43 | 8.75 | 27.03 | 87.77 | 18.58 | 4.72
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 484.39 | 37.69 | 12.85 | 185.40 | 90.57 | 2.0
### single socket channels first
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 132.56 | 7.19 | 18.43 | 31.43 | 11.20 | 2.81
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 227.94 | 13.33 | 17.11 | 63.00 | 23.41 | 2.69
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 473.68 | 52.79 | 8.97 | 150.40 | 87.33 | 1.72
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92530
Approved by: https://github.com/jgong5 , https://github.com/ezyang
2023-02-06 10:11:25 +00:00
Joel Schlosser
1effabe257
Support per-parameter test decoration ( #91658 )
...
Continuation of #79979 .
Fixes #79161
This PR does the following:
* Expands the `parametrize_fn()` signature from returning a 3-tuple of `(test, test_name, param_kwargs)` to returning a 4-tuple of `(test, test_name, param_kwargs, decorator_fn)`. Expected signature for the addition is `decorator_fn(param_kwargs) -> List[decorator]` i.e. given the full set of test params, return a list of decorators to apply.
* `modules`, `ops`, and `parametrize` now fit the new signature, returning `decorator_fn`s instead of applying decorators themselves.
* `instantiate_parametrized_tests()` and `instantiate_device_type_tests()` now call the returned `decorator_fn`, passing in the full set of `param_kwargs` (after composition + `device` / `dtype` additions) and applying the returned decorators.
* Composing multiple `parametrize_fn`s also composes the corresponding `decorator_fn`s; the composed `decorator_fn` simply concatenates the decorator lists returned by the constituents.
* Expands `DecorateInfo.is_active` to support callables:
```python
DecorateInfo(
unittest.expectedFailure, "TestOps", "test_python_ref_executor",
device_type='cuda', active_if=lambda params: params['executor'] == 'nvfuser'
),
```
* Adds several tests to `test/test_testing.py` ensuring proper decoration using `@parametrize`, `@modules`, and `@ops`.
* (minor) Fixes a couple `ModuleInfo` naming oddities uncovered during testing.
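A hypothetical `decorator_fn` matching the signature described above (the predicate here is made up for illustration):
```python
import unittest

def decorator_fn(params):
    # given the full set of test params, return the decorators to apply
    if params.get("device") == "cuda" and params.get("executor") == "nvfuser":
        return [unittest.expectedFailure]
    return []
```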
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91658
Approved by: https://github.com/malfet
2023-01-04 21:08:32 +00:00
PyTorch MergeBot
0a6053e9b5
Revert "Avoid copies in matmul ( #76828 )"
...
This reverts commit 8c2e82b487 .
Reverted https://github.com/pytorch/pytorch/pull/76828 on behalf of https://github.com/mehtanirav due to Internal breakages
2023-01-03 23:36:58 +00:00
lezcano
8c2e82b487
Avoid copies in matmul ( #76828 )
...
With this PR, matmul folds a bmm into an mm or mv if and only if it
can do so without copying. We add tests to make sure that
our algorithm for detecting this is accurate.
For the cases where it was copying before see https://github.com/pytorch/pytorch/pull/75197#discussion_r843413208 https://github.com/pytorch/pytorch/pull/75197#discussion_r863489479 https://github.com/pytorch/pytorch/pull/75197#discussion_r863489805
Fixes https://github.com/pytorch/pytorch/issues/76702
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76828
Approved by: https://github.com/ngimel
2023-01-03 14:18:38 +00:00
PyTorch MergeBot
db2a237763
Revert "Avoid copies in matmul ( #76828 )"
...
This reverts commit 0c3659586d .
Reverted https://github.com/pytorch/pytorch/pull/76828 on behalf of https://github.com/lezcano due to Makes functorch tests fail
2023-01-03 12:26:29 +00:00