Pruthvi Madugundu
9ce2e02fd6
Revert "[ROCm] Remove PYTORCH_MIOPEN_SUGGEST_NHWC flag ( #90725 )" ( #110319 )
...
This reverts commit 66bfcd32fd .
NHWC has a perf regression on MIOpen, so reverting until the performance issue is fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110319
Approved by: https://github.com/jeffdaily , https://github.com/jithunnair-amd , https://github.com/kit1980
2023-10-03 19:14:47 +00:00
Jerry Zhang
f2a1b93549
Back out "[quant] Support integer implementations for adaptive_avg_pool2d ( #104226 )" ( #110316 )
...
Summary:
Original commit changeset: acdb5b34e3aa
Original Phabricator Diff: D47321689
Test Plan: opinfo tests in CI
Differential Revision: D49789403
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110316
Approved by: https://github.com/kimishpatel
2023-10-03 16:59:23 +00:00
Driss Guessous
818f2297e6
Ensure fill_ works when value is a view of self ( #109835 )
...
# Summary
Introduced a BC-breaking change in #109533 when self is a view of the value. By using the copy_() op inside fill_ we were hitting `assert_no_partial_overlap` in the tensor iterator.
Ideally we would avoid this check when value.numel() == 1, but rather than monkeying around with the tensor iterator I just clone the input instead.
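For illustration, a minimal sketch (mine, not from the PR) of the pattern this protects: filling a tensor with a 0-d view of itself.
```python
import torch

# Hypothetical repro of the overlap case: `value` is a 0-d view of `self`.
x = torch.arange(4.0)
v = x[0]            # a view into x, numel() == 1

# fill_ now clones `value` internally, so this works even though
# `value` aliases `self`.
x.fill_(v)
print(x)            # tensor([0., 0., 0., 0.])
```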
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109835
Approved by: https://github.com/mikaylagawarecki
2023-09-26 17:12:48 +00:00
CaoE
7c9052165a
add fp16 support for native conv and deconv on CPU ( #99497 )
...
### Testing
Native conv vs. mkldnn conv on SPR (with avx512_fp16 support)
Single core:
Input | Naïve impl / us | oneDNN / us | Speed up
-- | -- | -- | --
IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 | 34676789 | 524199.8 | 66.15185
IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 | 33454125 | 349844.4 | 95.62573
IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 | 317650.1 | 2317.677 | 137.0554
IC: 128, OC: 256, kernel: 3, stride: 1, N: 1, L: 64 | 15334.68 | 167.264 | 91.67952
56 cores:
Input | Naïve impl / us | oneDNN / us | Speed up
-- | -- | -- | --
IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 | 1032064 | 11073.58 | 93.20061
IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 | 1000097 | 16371.19 | 61.08883
IC: 256, OC: 1024, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 | 981813.4 | 9008.908 | 108.9825
IC: 1024, OC: 256, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 | 1082606 | 10150.47 | 106.6558
IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 | 319980.6 | 181.598 | 1762.027
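Usage-wise, a minimal sketch (mine, not from the PR) of the fp16 path that the native CPU conv now accepts:
```python
import torch
import torch.nn.functional as F

# Illustrative fp16 convolution on CPU; shapes chosen arbitrarily.
x = torch.randn(1, 64, 56, 56, dtype=torch.float16)
w = torch.randn(256, 64, 1, 1, dtype=torch.float16)
out = F.conv2d(x, w, stride=1, padding=0)
print(out.dtype, out.shape)  # torch.float16 torch.Size([1, 256, 56, 56])
```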
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99497
Approved by: https://github.com/jgong5 , https://github.com/cpuhrsch
2023-09-25 01:31:26 +00:00
drisspg
deea268e43
Update aten_fill to avoid d2h sync ( #109533 )
...
Fixes #109115
### Before:
<img width="1526" alt="Screenshot 2023-09-18 at 11 57 32 AM" src="https://github.com/pytorch/pytorch/assets/32754868/394a4c51-7cae-4d05-b9ad-b17d02beaf72 ">
### After:
<img width="1550" alt="Screenshot 2023-09-18 at 11 57 25 AM" src="https://github.com/pytorch/pytorch/assets/32754868/e2f774f5-5374-49c3-95ec-dd3a85f74a2e ">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109533
Approved by: https://github.com/mikaylagawarecki
2023-09-19 13:34:49 +00:00
FFFrog
bc3f0d341a
LazyBatchNorm{1-3}d support dict&set ( #109015 )
...
Fixes #105292
As the title shows, LazyBatchNorm{1-3}d didn't support dict & set; this adds support to keep it consistent with BatchNorm{1-3}d.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109015
Approved by: https://github.com/mikaylagawarecki
2023-09-12 09:09:59 +00:00
CaoE
42f94d7e9f
add Half support for maxpool on CPU ( #98819 )
...
### Testing
Single socket (28 cores):
shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 4.12895 | 6.9669 | 5.30297 | 0.55775 | 1.98917 | 0.72233
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 0.85093 | 1.88813 | 1.38063 | 5.5742 | 36.5086 | 10.58552
size: (32, 16, 200, 200), kernel: 3, stride: 1, mem_format: contig | 22.37212 | 37.90383 | 30.94482 | 6.85868 | 10.6116 | 3.9993
size: (32, 16, 200, 200), kernel: 3, stride: 1, mem_format: CL | 5.41658 | 4.71098 | 4.66578 | 6.69875 | 14.7171 | 5.1167
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 10.69831 | 18.0468 | 13.71657 | 2.61192 | 4.96172 | 1.68635
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 2.52637 | 2.0096 | 2.0055 | 2.60314 | 7.2093 | 2.49843
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 0.47605 | 0.88398 | 0.65326 | 0.06525 | 0.115489 | 0.0674
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 0.10902 | 0.25293 | 0.157475 | 0.11386 | 0.53319 | 0.17836
Single core:
shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 90.9809 | 163.473 | 126.1276 | 6.57721 | 41.40833 | 11.82505
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 9.88405 | 38.39137 | 29.62069 | 7.10636 | 36.97535 | 11.0525
size: (32, 16, 200, 200), kernel: 3, stride: 1, mem_format: contig | 476.782 | 855.4769 | 648.2248 | 46.6488 | 219.2586 | 67.10599
size: (32, 16, 200, 200), kernel: 3, stride: 1, mem_format: CL | 80.29271 | 91.33854 | 87.80345 | 48.81692 | 203.9974 | 63.39004
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 235.2113 | 419.0799 | 315.4284 | 20.6049 | 107.1524 | 32.39169
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 29.47653 | 33.54905 | 32.82823 | 22.59674 | 98.5586 | 30.05763
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 7.90684 | 13.9208 | 10.03272 | 0.23725 | 1.35269 | 0.41728
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 2.33638 | 3.36894 | 2.64635 | 0.26535 | 1.244 | 0.38895
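Usage-wise, a minimal sketch (mine, not from the PR) of the new fp16 max-pooling path on CPU, in both contiguous and channels-last layouts:
```python
import torch
import torch.nn.functional as F

# Illustrative: max_pool2d on a CPU float16 tensor.
x = torch.randn(1, 56, 264, 264, dtype=torch.half)
y = F.max_pool2d(x, kernel_size=3, stride=1)

x_cl = x.to(memory_format=torch.channels_last)
y_cl = F.max_pool2d(x_cl, kernel_size=3, stride=1)
print(y.dtype, y_cl.is_contiguous(memory_format=torch.channels_last))
```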
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98819
Approved by: https://github.com/mingfeima , https://github.com/mikaylagawarecki
2023-09-05 18:23:41 +00:00
CaoE
3267996372
add channel last 3d support for maxpool3d on CPU ( #97775 )
...
### Testing
Single socket (28 cores):
shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms
-- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 3.959584 | 5.493402 | 0.557232 | 0.568485
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 0.815511 | 1.351261 | 5.710506 | 10.57506
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 10.63426 | 15.28637 | 2.67656 | 1.71365
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 2.63570 | 2.05532 | 2.55452 | 2.33923
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 0.375469 | 0.479748 | 0.066364 | 0.065155
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 0.112197 | 0.112326 | 0.111697 | 0.145364
Single core:
shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms
-- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 92.16582 | 128.6513 | 6.684325 | 12.21541
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 10.14318 | 29.80297 | 7.350142 | 11.25323
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 238.55453 | 331.89967 | 19.694657 | 32.78853
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 30.17079 | 32.75628 | 22.44543 | 30.17796
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 7.474389 | 9.937217 | 0.236015 | 0.434229
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 2.318954 | 2.469444 | 0.262125 | 0.401361
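Usage-wise, a minimal sketch (mine, not from the PR) of running max_pool3d on a channels-last-3d input:
```python
import torch
import torch.nn.functional as F

# Illustrative: a 5-D input in channels_last_3d memory format (NDHWC layout).
x = torch.randn(4, 19, 10, 16, 16).to(memory_format=torch.channels_last_3d)
y = F.max_pool3d(x, kernel_size=3, stride=1)
# Expected to stay channels_last_3d after this change.
print(y.is_contiguous(memory_format=torch.channels_last_3d))
```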
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97775
Approved by: https://github.com/jgong5 , https://github.com/mikaylagawarecki
2023-08-26 00:21:27 +00:00
Aaron Gokaslan
660e8060ad
[BE]: Update ruff to 0.285 ( #107519 )
...
This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.
I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, it seems there are no instances of it in our codebase, so I'm enabling it so that it stays that way. :)
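For reference, my illustration of the pattern RUF017 flags: summing lists with `sum(...)` builds a new list per addition, which is quadratic; `itertools.chain` is the linear alternative.
```python
import itertools

lists = [[1, 2], [3, 4], [5, 6]]

# Flagged by RUF017: each `+` allocates a fresh list, so this is O(n^2).
flat_slow = sum(lists, [])

# Linear-time alternative.
flat_fast = list(itertools.chain.from_iterable(lists))
assert flat_slow == flat_fast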
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-22 23:16:38 +00:00
Huy Do
d9460bb8f8
Update test_MaxUnpool_index_errors XFAIL after #107483 ( #107658 )
...
After https://github.com/pytorch/pytorch/pull/107483, which reverted https://github.com/pytorch/pytorch/pull/95300, these tests no longer XFAIL, so now we know the root cause of https://github.com/pytorch/pytorch/issues/103854 .
As this is failing slow jobs in trunk at the moment, e.g. 6981bcbc35 , I'm moving these tests back.
### Testing
Run locally and all tests pass.
```
PYTORCH_TEST_WITH_SLOW=1 PYTORCH_TEST_SKIP_FAST=1 python test/nn/test_pooling.py -k test_MaxUnpool_index_errors
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107658
Approved by: https://github.com/PaliC
2023-08-22 22:36:35 +00:00
PyTorch MergeBot
d59a6864fb
Revert "[BE]: Update ruff to 0.285 ( #107519 )"
...
This reverts commit 88ab3e4322 .
Reverted https://github.com/pytorch/pytorch/pull/107519 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR breaks internal tests. @ezyang, can you please help them get unblocked? It seems like one of the strings was probably accidentally modified ([comment](https://github.com/pytorch/pytorch/pull/107519#issuecomment-1688833480 ))
2023-08-22 19:53:32 +00:00
Aaron Gokaslan
88ab3e4322
[BE]: Update ruff to 0.285 ( #107519 )
...
This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.
I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, it seems there are no instances of it in our codebase, so I'm enabling it so that it stays that way. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-20 01:36:18 +00:00
summerdo
7db6eb7156
[test_nn] add custom device support for dropout tests, lazy_modules te… ( #106609 )
...
Add custom device support for dropout tests, lazy_modules tests, and multihead_attention tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106609
Approved by: https://github.com/mikaylagawarecki
2023-08-11 09:14:34 +00:00
Jason Lu
bc88028e8e
Back out "Reland "Make adding buffers more like adding parameters ( #104069 )" ( #106224 )" ( #106743 )
...
Summary:
Original commit changeset: 81319beb97f3
Original Phabricator Diff: D47961182
Test Plan: revert to maintain backward compat with legacy ads_dper3 production package. Read details in: S357822
Reviewed By: atuljangra
Differential Revision: D48131623
@diff-train-skip-merge
(D48131623 landed internally)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106743
Approved by: https://github.com/malfet
2023-08-08 15:27:34 +00:00
Michael Gschwind
3200f63ee6
Make mocked functions return the proper result structure (tuple of attn result and attn weights for native MHA) ( #106526 )
...
Summary: Make mocked functions return the proper result structure (tuple of attn result and attn weights for native MHA)
Test Plan: sandcastle
Differential Revision: D48021277
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106526
Approved by: https://github.com/davidberard98
2023-08-03 19:27:31 +00:00
Mikayla Gawarecki
d8e5f2aa6d
Reland "Make adding buffers more like adding parameters ( #104069 )" ( #106224 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106224
Approved by: https://github.com/atalman , https://github.com/albanD
2023-07-31 17:18:56 +00:00
Justin Chu
de8bd108b4
[BE] Enable ruff's UP rules in pyproject.toml ( #105437 )
...
Signed-off-by: Justin Chu <justinchu@microsoft.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105437
Approved by: https://github.com/huydhn , https://github.com/malfet , https://github.com/Skylion007
2023-07-21 19:14:52 +00:00
Justin Chu
79c5e33349
[BE] Enable ruff's UP rules and autoformat nn/ mps/ and torch/ ( #105436 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105436
Approved by: https://github.com/malfet , https://github.com/albanD
2023-07-21 07:38:46 +00:00
Andrey Talman
c6653b65d8
Back out "Make adding buffers more like adding parameters ( #104069 )" ( #105581 )
...
Summary:
D47537831 is breaking pyper tests: https://fb.workplace.com/groups/802176577445480/posts/1018902842439518/
with `TypeError: register_buffer() takes 3 positional arguments but 4 were given`
Original commit changeset: d4b4069fbd38
Original Phabricator Diff: D47537831
Test Plan:
```
buck2 run //caffe2/torch/fb/training_toolkit/integration_tests/training_lifecycle/cogwheel_tests/pyper_release_v2:cogwheel_smallworld_inline_cvr_infer_pyper_pyper__canary_offline_training-launcher -- --run-harness-in-tupperware --build-fbpkg ads_dper3 --build-fbpkg training_platform
```
Reviewed By: atalman
Differential Revision: D47600140
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105581
Approved by: https://github.com/mikaylagawarecki
2023-07-20 03:39:53 +00:00
ekamiti
32d422f335
Make adding buffers more like adding parameters ( #104069 )
...
Add semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same, as the `register_buffer` method has not been changed. The `persistent` parameter on the `Buffer` type indicates whether the buffer should be persistent or not. Other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. The remaining changes are test changes to make sure that the `Buffer` type can be used as a drop-in replacement for `register_buffer`, as it just leads to `register_buffer` being called. Normal tensors can still be used as buffers, so these changes are intended to be backwards compatible.
Fixes #35735
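A rough usage sketch of what this describes, assuming the new class is exposed as `torch.nn.Buffer` (mirroring `nn.Parameter`):
```python
import torch
import torch.nn as nn


class Counter(nn.Module):
    def __init__(self):
        super().__init__()
        # Old style: explicit registration.
        self.register_buffer("old_count", torch.zeros(1))
        # New style described in the PR: assigning a Buffer object, which
        # routes through register_buffer under the hood. persistent=False
        # keeps it out of the state_dict but still in named_buffers().
        self.new_count = nn.Buffer(torch.zeros(1), persistent=False)


m = Counter()
print(dict(m.named_buffers()).keys())  # old_count and new_count
```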
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
2023-07-17 17:59:05 +00:00
Aaron Gokaslan
2f95a3d0fc
[BE]: Apply ruff PERF fixes to torch ( #104917 )
...
Applies automated ruff fixes for the PERF rules and enables all of the automatic ones. I also updated ruff, which applied some additional fixes.
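A representative PERF fix (my illustration): PERF401 rewrites a manual-append loop as a list comprehension.
```python
# Before (flagged by PERF401): building a list with repeated .append calls.
squares = []
for i in range(10):
    squares.append(i * i)

# After the automated fix: a list comprehension.
squares = [i * i for i in range(10)]
```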
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104917
Approved by: https://github.com/ezyang , https://github.com/albanD
2023-07-11 20:45:21 +00:00
Mikayla Gawarecki
1ad435772b
Added option to always call nn.Module global/non-global forward hooks ( #104278 )
...
Fix #103997
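A sketch of the feature (assuming the option is the `always_call` keyword on forward-hook registration): the hook fires even when `forward` raises.
```python
import torch
import torch.nn as nn


class Boom(nn.Module):
    def forward(self, x):
        raise RuntimeError("forward failed")


def cleanup_hook(module, args, output):
    # output may be None here, since forward never produced one.
    print("hook ran even though forward raised")


m = Boom()
# `always_call=True` is the option this PR describes (assumed keyword name).
m.register_forward_hook(cleanup_hook, always_call=True)

try:
    m(torch.randn(1))
except RuntimeError:
    pass
```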
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104278
Approved by: https://github.com/albanD
2023-07-10 18:58:07 +00:00
Jerry Zhang
1a661639f7
[quant] Support integer implementations for adaptive_avg_pool2d ( #104226 )
...
Summary:
This is needed for representing quantized models in the pt2 export quantization flow
Test Plan:
tested by opinfo, python test/test_ops.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104226
Approved by: https://github.com/jgong5 , https://github.com/andrewor14
2023-07-07 19:36:31 +00:00
Huy Do
f27a9129e7
XFAIL test_MaxUnpool_index_errors CUDA slow tests ( #103905 )
...
This has been failing in trunk for a while. Let's XFAIL it while continuing the investigation https://github.com/pytorch/pytorch/issues/103854 . We might not need this PR if the fix is on the way.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103905
Approved by: https://github.com/mikaylagawarecki
2023-06-22 18:05:10 +00:00
Michael Voznesensky
e5e9d563c2
Lift user defined attributes into inputs for certain cases (user defined types and tensors) ( #103386 )
...
(1) Lazy (converts to dynamo variable on access only)
(2) Uses existing side effect/reconstruct tech
(3) not tensor opinionated
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103386
Approved by: https://github.com/jansel
2023-06-20 23:45:19 +00:00
Fuzzkatt
6d570ccd59
tf32 context fixes for various tests ( #103137 )
...
Addresses tf32 context related failures from NVIDIA internal testing for following unit tests:
H100:
- functorch/test_vmap.py: test_op_has_batch_rule
A100:
- test_expanded_weights.py: test_cnn_model_sum
- nn/test_convolution.py: test_conv2d_same_padding_backward
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103137
Approved by: https://github.com/zou3519
2023-06-15 02:33:12 +00:00
Bearnardd
2abad0c184
Add dtype check baddbmm ( #102659 )
...
Fixes part of #100838 by disabling support for non-matching dtypes between input and batches for the `baddbmm` operator (see the sketch after the checklist).
* [x] added dtype checks
* [x] added test case
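My illustration of the new check: mismatched input/batch dtypes now raise instead of silently proceeding.
```python
import torch

inp = torch.randn(2, 3, 5)                       # float32
b1 = torch.randn(2, 3, 4, dtype=torch.float64)   # float64 batches
b2 = torch.randn(2, 4, 5, dtype=torch.float64)

try:
    torch.baddbmm(inp, b1, b2)
except RuntimeError as e:
    print("dtype mismatch rejected:", e)
```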
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102659
Approved by: https://github.com/ngimel
2023-06-13 00:31:06 +00:00
Edward Z. Yang
ba962fefea
Add parametrization version of weight_norm ( #103001 )
...
This is done in the ordinary way, but also:
* Deprecation warning for the old API, and a migration guide
* Backwards compatibility for state_dict loading the old weight_norm
* Test for pickling and deepcopy, which was the motivating reason
weight_norm is still used by HuggingFace Wav2Vec2.
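A usage sketch (mine, not from the PR) of the parametrization-based API next to the deprecated one:
```python
import copy
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import weight_norm  # new, parametrization-based

layer = weight_norm(nn.Linear(20, 40), name="weight")
# The parametrized module deepcopies/pickles cleanly, which was the
# motivating reason for the rewrite.
layer_copy = copy.deepcopy(layer)

# The old torch.nn.utils.weight_norm API is deprecated in favor of this one.
```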
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103001
Approved by: https://github.com/albanD
2023-06-06 13:14:43 +00:00
Fuzzkatt
f8896b7b0e
update tf32 thresholds in nn/test_convolution.py ( #102015 )
...
updated tf32 thresholds for test_cudnn_convolution_relu, test_cudnn_convolution_add_relu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102015
Approved by: https://github.com/ngimel
2023-05-24 22:42:25 +00:00
Fuzzkatt
47e9dba765
move tf32_on_and_off fix for test_convolution.py ( #102007 )
...
Move tf32_on_and_off after @torch.backends.cudnn.flags(enabled=True, benchmark=False), because the cudnn.flags decorator overwrites tf32_on_and_off when it comes after it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102007
Approved by: https://github.com/ngimel
2023-05-24 02:23:06 +00:00
ts
74dc2a53f6
Thread generator through trunc_normal_ ( #100810 )
...
This will solve @albertz's issue as described in #98200 , threading the generator argument through the trunc_normal_ function. I'm still working on #99796 (and won't let it stall out), but this fix doesn't trigger any JIT issues, so I think it might be helpful to get it merged now.
Would be happy to iterate on this if there are any issues.
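A small usage sketch (mine, not from the PR) of the threaded-through generator argument:
```python
import torch
import torch.nn as nn

g = torch.Generator().manual_seed(0)
w = torch.empty(3, 5)
# The generator now threads through to the underlying sampling, so
# initialization is reproducible.
nn.init.trunc_normal_(w, mean=0.0, std=0.02, a=-0.04, b=0.04, generator=g)
```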
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100810
Approved by: https://github.com/Skylion007 , https://github.com/albanD
2023-05-12 01:04:08 +00:00
eqy
3f656ad7bb
[CUDA] Do accumulation for Adaptive Average Pooling in opmath_t ( #99378 )
...
Fix for an issue surfaced from the discuss forum: https://discuss.pytorch.org/t/adaptiveavgpool2d-causes-some-data-to-contain-inf/177420
CC @ptrblck @ngimel
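Roughly the failure mode being fixed (my sketch; needs a CUDA device): accumulating a large half-precision pooling window directly in fp16 can overflow to inf even though the mean itself is representable.
```python
import torch
import torch.nn.functional as F

if torch.cuda.is_available():
    # 1024 values of ~300 each: the running sum (~300k) would overflow fp16
    # (max ~65504) if accumulated in half, while the mean (~300) is fine.
    x = torch.full((1, 1, 32, 32), 300.0, device="cuda", dtype=torch.half)
    out = F.adaptive_avg_pool2d(x, output_size=1)
    print(torch.isinf(out).any())  # False once accumulation happens in opmath (fp32)
```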
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99378
Approved by: https://github.com/ngimel
2023-04-28 20:43:12 +00:00
Michael Gschwind
36e1ae6778
De-select odd numbered heads from nn.MHA fastpath ( #99672 )
...
Summary:
https://github.com/pytorch/pytorch/issues/97128
* Add test for mha num_heads %2 != 0
* Fix test
* Add test for bias false
* show test passes
Test Plan: sandcastle
Differential Revision: D45161767
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99672
Approved by: https://github.com/ngimel
2023-04-25 00:27:18 +00:00
soulitzer
ee1c539ecf
Fix module backward pre-hooks to actually update gradient ( #97983 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97983
Approved by: https://github.com/albanD
2023-03-30 20:33:44 +00:00
ecao
b72bddabe9
Move empty check to the start of _pack_padded_sequence ( #94885 )
...
Fixes #94122 .
Move empty check to the start of `_pack_padded_sequence`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94885
Approved by: https://github.com/kshitij12345 , https://github.com/jgong5 , https://github.com/malfet
2023-03-22 04:16:58 +00:00
Will Constable
2f6a371ae9
Revert "Optimize nn.Module __call__ fast path for dynamo ( #95931 )" ( #96242 )
...
Reverting due to concerns over silent unsoundness (skipped hooks) if users have directly added hooks dicts without using official torch APIs.
This reverts commit 26045336ca .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96242
Approved by: https://github.com/albanD
2023-03-10 01:05:01 +00:00
Will Constable
26045336ca
Optimize nn.Module __call__ fast path for dynamo ( #95931 )
...
This PR optimizes the guards overhead introduced by dynamo tracing module forward hooks.
It can and maybe should be followed by a wider change proposed by @voznesenskym to optimize specialized nnmodules by 'observing' any user mutations and directly invalidating the root guard, obviating the need to install other nnmodule guards. (But this observer change seems more involved...)
Idea: maintain a flag, and keep it up to date whenever adding or removing hooks. Use the flag rather than dict checks to enter the call fast path (sketched below).
- need to extend RemovableHandle to keep a ref to nnModule so it can update the flag on removal.
- also need to handle the flag in ScriptModule which still uses the python call impl when called from python.
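A stripped-down sketch of the flag idea (not PyTorch's actual implementation):
```python
# Simplified illustration of the bookkeeping described above.
class FakeModule:
    def __init__(self):
        self._forward_hooks = {}
        self._has_hooks = False            # the cached flag

    def register_hook(self, fn):
        handle_id = len(self._forward_hooks)
        self._forward_hooks[handle_id] = fn
        self._has_hooks = True             # keep the flag in sync on add
        return handle_id

    def remove_hook(self, handle_id):
        self._forward_hooks.pop(handle_id, None)
        self._has_hooks = bool(self._forward_hooks)  # ... and on removal

    def __call__(self, x):
        if not self._has_hooks:            # cheap flag check instead of dict checks
            return self.forward(x)
        for hook in self._forward_hooks.values():
            hook(self, x)
        return self.forward(x)

    def forward(self, x):
        return x + 1
```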
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95931
Approved by: https://github.com/ezyang , https://github.com/voznesenskym
2023-03-04 15:09:40 +00:00
Edward Z. Yang
d303665d33
Make int unspecialization actually work ( #95621 )
...
OK, so this PR used to be about reducing the number of constants we specialize on, but it turns out that unspecialization was ~essentially never used (because we still constant specialized way too aggressively) and I ended up having to fix a bunch of issues to actually get tests to pass. So this PR is now "make int unspecialization actually work". As part of this, I have to turn off unspecialization by default, as there are still latent bugs in inductor.
The general strategy is that an unspecialized int is represented as a SymInt. Representing it as a 0d tensor (which is what the code used to do) is untenable: (1) we often need unspecialized ints to participate in size computations, but we have no way of propagating sympy expressions through tensor compute, and (2) a lot of APIs work when passed SymInt, but not when passed a Tensor. However, I continue to represent Numpy scalars as Tensors, as they are rarely used for size computation and they have an explicit dtype, so they are more accurately modeled as 0d tensors.
* I folded in the changes from https://github.com/pytorch/pytorch/pull/95099 as I cannot represent unspecialized ints as SymInts without also turning on dynamic shapes. This also eliminates the necessity for test_unspec.py, as toggling specialization without dynamic shapes doesn't do anything. As dynamic shapes defaults to unspecializing, I just deleted this entirely; for the specialization case, I rely on regular static shape tests to catch it. (Hypothetically, we could also rerun all the tests with dynamic shapes, but WITH int/float specialization, but this seems... not that useful? I mean, I guess export wants it, but I'd kind of like our Source heuristic to improve enough that export doesn't have to toggle this either.)
* Only 0/1 integers get specialized by default now
* A hodgepodge of fixes. I'll comment on the PR about them.
Fixes https://github.com/pytorch/pytorch/issues/95469
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95621
Approved by: https://github.com/jansel , https://github.com/Chillee
2023-03-04 01:22:08 +00:00
kshitij12345
3b966a6ce3
[autograd] disable backward/grad for complex scalar output ( #92753 )
...
Fixes https://github.com/pytorch/pytorch/issues/92750
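My illustration of the new behavior: calling `.backward()` on a complex scalar now errors; backprop through the real part (or another real reduction) instead.
```python
import torch

z = torch.randn(3, dtype=torch.cfloat, requires_grad=True)
out = z.sum()                 # complex scalar

try:
    out.backward()            # now rejected: implicit grads need a real scalar output
except RuntimeError as e:
    print("rejected:", e)

out.real.backward()           # reduce to a real scalar first
print(z.grad)
```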
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92753
Approved by: https://github.com/ezyang
2023-02-23 11:38:27 +00:00
puririshi98
8aa34602f7
Jetson Update for CI Redo ( #94549 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94549
Approved by: https://github.com/ezyang , https://github.com/malfet
2023-02-21 17:13:38 +00:00
soulitzer
e5c2a35d83
Add check that embedding_bag's weight is 2D ( #94931 )
...
Fixes https://github.com/pytorch/pytorch/issues/94445
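My illustration of the added check: a 1-D weight is now rejected up front instead of crashing later.
```python
import torch
import torch.nn.functional as F

indices = torch.tensor([0, 1, 2])
offsets = torch.tensor([0])
bad_weight = torch.randn(10)   # 1-D; embedding_bag expects 2-D (num_embeddings, dim)

try:
    F.embedding_bag(indices, bad_weight, offsets)
except RuntimeError as e:
    print("rejected:", e)
```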
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94931
Approved by: https://github.com/albanD
2023-02-16 02:37:47 +00:00
Xuehai Pan
b005ec62b9
[BE] Remove dependency on six and future ( #94709 )
...
Remove the Python 2 and 3 compatibility libraries [six](https://pypi.org/project/six ) and [future](https://pypi.org/project/future ), as well as `torch._six`. We only support Python 3.8+ now; it's time to retire them.
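Typical migrations this implies for downstream code (my examples; the exact symbols varied):
```python
# Before: Python 2/3 compatibility shims.
# from torch._six import inf, string_classes
# isinstance(x, string_classes)

# After: plain Python 3 / torch equivalents.
from math import inf          # or torch.inf
x = "hello"
assert isinstance(x, str)
assert inf > 1e308
```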
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94709
Approved by: https://github.com/malfet , https://github.com/Skylion007
2023-02-14 09:14:14 +00:00
Xuehai Pan
046e88a291
[BE] [3/3] Rewrite super() calls in test ( #94592 )
...
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.
- #94587
- #94588
- #94592
Also, methods with only a `super()` call are removed:
```diff
class MyModule(nn.Module):
- def __init__(self):
- super().__init__()
-
def forward(self, ...):
...
```
Some cases that change the semantics should be kept unchanged. E.g.:
f152a79be9/caffe2/python/net_printer.py (L184-L190)
f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592
Approved by: https://github.com/ezyang , https://github.com/seemethere
2023-02-12 22:20:53 +00:00
haozhe.zhu
ed54a5d06b
enable bf16 emb ( #94163 )
...
Merge https://github.com/pytorch/pytorch/pull/89199 and https://github.com/pytorch/pytorch/pull/91949 into one PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94163
Approved by: https://github.com/jianyuh , https://github.com/malfet , https://github.com/jgong5
2023-02-12 00:05:09 +00:00
Jeff Daily
66bfcd32fd
[ROCm] Remove PYTORCH_MIOPEN_SUGGEST_NHWC flag ( #90725 )
...
Fixes #64427 . MIOpen supports ChannelsLast. No longer need to opt-in with env var.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90725
Approved by: https://github.com/malfet
2023-02-09 22:26:24 +00:00
Yuyao Wang
0bf78b57c0
fix: max_unpool3d buffer overflow ( #94372 )
...
Fixes #88032
Previously `output_size` was accessed before the shape length check, which led to a buffer overflow.
The fix simply performs the check first.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94372
Approved by: https://github.com/albanD
2023-02-08 19:48:25 +00:00
PyTorch MergeBot
53e4fe076a
Revert "enable bf16 emb ( #94163 )"
...
This reverts commit f3bf46e801 .
Reverted https://github.com/pytorch/pytorch/pull/94163 on behalf of https://github.com/huydhn due to Sorry for reverting your PR. But I suspect that it causes flaky SIGSEGV failure for linux-bionic-py3.8-clang9 / test (crossref) job in trunk. For example, 05397b1250
2023-02-07 00:32:22 +00:00
mingfeima
26cba842ad
Optimize ConvTransposed2D with mkldnn float32 and bfloat16 on CPU ( #92530 )
...
This PR optimizes `ConvTranspose2d` with oneDNN and adds channels-last support for it. The fallback path `slow_conv_transpose2d` also gets channels-last support, so the memory format propagation behavior stays the same with or without oneDNN.
Replacement of https://github.com/pytorch/pytorch/pull/77060 , https://github.com/pytorch/pytorch/pull/70897 and https://github.com/pytorch/pytorch/pull/74023 which enables oneDNN for `ConvTranspose2d` and `ConvTranspose3d`
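Usage-wise, a minimal sketch (mine, not from the PR) of channels-last propagating through `ConvTranspose2d`:
```python
import torch
import torch.nn as nn

m = nn.ConvTranspose2d(32, 32, kernel_size=3).to(memory_format=torch.channels_last)
x = torch.randn(32, 32, 100, 100).to(memory_format=torch.channels_last)
out = m(x)
# Layout is expected to be preserved with or without oneDNN.
print(out.is_contiguous(memory_format=torch.channels_last))
```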
The following results were collected on a Skylake Xeon 8180 (dual socket, 28 cores per socket).
### single core channels last
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 181.36 | 91.16 | 1.99 | 531.38 | 124.08 | 4.28
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 324.35 | 153.50 | 2.11 | 973.16 | 185.97 | 5.23
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 1086.82 | 671.52 | 1.62 | 3008.94 | 1453.33 | 2.07
### single core channels first
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 138.10 | 5.94 | 23.23 | 37.97 | 11.25 | 3.38
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 236.43 | 8.75 | 27.03 | 87.77 | 18.58 | 4.72
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 484.39 | 37.69 | 12.85 | 185.40 | 90.57 | 2.05
### single socket channels last
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 138.10 | 5.94 | 23.23 | 37.97 | 11.25 | 3.38
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 236.43 | 8.75 | 27.03 | 87.77 | 18.58 | 4.72
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 484.39 | 37.69 | 12.85 | 185.40 | 90.57 | 2.0
### single socket channels first
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 132.56 | 7.19 | 18.43 | 31.43 | 11.20 | 2.81
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 227.94 | 13.33 | 17.11 | 63.00 | 23.41 | 2.69
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 473.68 | 52.79 | 8.97 | 150.40 | 87.33 | 1.72
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92530
Approved by: https://github.com/jgong5 , https://github.com/ezyang
2023-02-06 10:11:25 +00:00
haozhe.zhu
f3bf46e801
enable bf16 emb ( #94163 )
...
Merge https://github.com/pytorch/pytorch/pull/89199 and https://github.com/pytorch/pytorch/pull/91949 into one PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94163
Approved by: https://github.com/jianyuh , https://github.com/malfet , https://github.com/jgong5
2023-02-06 07:11:40 +00:00
Jeff Daily
72502b94f3
correct use of torch.backends.cudnn.flags() ( #93182 )
...
Fixes #77467 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93182
Approved by: https://github.com/ngimel
2023-01-28 06:50:06 +00:00