Pruthvi Madugundu
9ce2e02fd6
Revert "[ROCm] Remove PYTORCH_MIOPEN_SUGGEST_NHWC flag ( #90725 )" ( #110319 )
...
This reverts commit 66bfcd32fd .
NHWC has a perf regression on MIOpen, so reverting until the performance issue is fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110319
Approved by: https://github.com/jeffdaily , https://github.com/jithunnair-amd , https://github.com/kit1980
2023-10-03 19:14:47 +00:00
Jerry Zhang
f2a1b93549
Back out "[quant] Support integer implementations for adaptive_avg_pool2d ( #104226 )" ( #110316 )
...
Summary:
Original commit changeset: acdb5b34e3aa
Original Phabricator Diff: D47321689
Test Plan: opinfo tests in CI
Differential Revision: D49789403
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110316
Approved by: https://github.com/kimishpatel
2023-10-03 16:59:23 +00:00
Driss Guessous
818f2297e6
Ensure fill_ works when value is a view of self ( #109835 )
...
# Summary
Introduced a BC-breaking change in #109533 when self is a view of the value. By using the copy_() op inside fill_ we were hitting `assert_no_partial_overlap` in the tensor iterator.
Ideally we would avoid this check when value.numel() == 1, but rather than monkeying around with the tensor iterator I just clone the input instead.
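For illustration, a minimal sketch (mine, not from the PR) of the pattern this protects: filling a tensor with a 0-d view of itself.
```python
import torch

# Hypothetical repro of the overlap case: `value` is a 0-d view of `self`.
x = torch.arange(4.0)
v = x[0]            # a view into x, numel() == 1

# fill_ now clones `value` internally, so this works even though
# `value` aliases `self`.
x.fill_(v)
print(x)            # tensor([0., 0., 0., 0.])
```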
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109835
Approved by: https://github.com/mikaylagawarecki
2023-09-26 17:12:48 +00:00
CaoE
7c9052165a
add fp16 support for native conv and deconv on CPU ( #99497 )
...
### Testing
Native conv vs. mkldnn conv on SPR (with avx512_fp16 support)
Single core:
Input | Naïve impl / us | oneDNN / us | Speed up
-- | -- | -- | --
IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 | 34676789 | 524199.8 | 66.15185
IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 | 33454125 | 349844.4 | 95.62573
IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 | 317650.1 | 2317.677 | 137.0554
IC: 128, OC: 256, kernel: 3, stride: 1, N: 1, L: 64 | 15334.68 | 167.264 | 91.67952
56 cores:
Input | Naïve impl / us | oneDNN / us | Speed up
-- | -- | -- | --
IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 | 1032064 | 11073.58 | 93.20061
IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 | 1000097 | 16371.19 | 61.08883
IC: 256, OC: 1024, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 | 981813.4 | 9008.908 | 108.9825
IC: 1024, OC: 256, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 | 1082606 | 10150.47 | 106.6558
IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 | 319980.6 | 181.598 | 1762.027
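Usage-wise, a minimal sketch (mine, not from the PR) of the fp16 path that the native CPU conv now accepts:
```python
import torch
import torch.nn.functional as F

# Illustrative fp16 convolution on CPU; shapes chosen arbitrarily.
x = torch.randn(1, 64, 56, 56, dtype=torch.float16)
w = torch.randn(256, 64, 1, 1, dtype=torch.float16)
out = F.conv2d(x, w, stride=1, padding=0)
print(out.dtype, out.shape)  # torch.float16 torch.Size([1, 256, 56, 56])
```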
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99497
Approved by: https://github.com/jgong5 , https://github.com/cpuhrsch
2023-09-25 01:31:26 +00:00
drisspg
deea268e43
Update aten_fill to avoid d2h sync ( #109533 )
...
Fixes #109115
### Before:
<img width="1526" alt="Screenshot 2023-09-18 at 11 57 32 AM" src="https://github.com/pytorch/pytorch/assets/32754868/394a4c51-7cae-4d05-b9ad-b17d02beaf72 ">
### After:
<img width="1550" alt="Screenshot 2023-09-18 at 11 57 25 AM" src="https://github.com/pytorch/pytorch/assets/32754868/e2f774f5-5374-49c3-95ec-dd3a85f74a2e ">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109533
Approved by: https://github.com/mikaylagawarecki
2023-09-19 13:34:49 +00:00
FFFrog
bc3f0d341a
LazyBatchNorm{1-3}d support dict&set ( #109015 )
...
Fixes #105292
As the title shows, LazyBatchNorm{1-3}d didn't support dict & set; this adds support to keep it consistent with BatchNorm{1-3}d.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109015
Approved by: https://github.com/mikaylagawarecki
2023-09-12 09:09:59 +00:00
CaoE
42f94d7e9f
add Half support for maxpool on CPU ( #98819 )
...
### Testing
Single socket (28 cores):
shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 4.12895 | 6.9669 | 5.30297 | 0.55775 | 1.98917 | 0.72233
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 0.85093 | 1.88813 | 1.38063 | 5.5742 | 36.5086 | 10.58552
size: (32, 16, 200, 200), kernel: 3, stride: 1, mem_format: contig | 22.37212 | 37.90383 | 30.94482 | 6.85868 | 10.6116 | 3.9993
size: (32, 16, 200, 200), kernel: 3, stride: 1, mem_format: CL | 5.41658 | 4.71098 | 4.66578 | 6.69875 | 14.7171 | 5.1167
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 10.69831 | 18.0468 | 13.71657 | 2.61192 | 4.96172 | 1.68635
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 2.52637 | 2.0096 | 2.0055 | 2.60314 | 7.2093 | 2.49843
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 0.47605 | 0.88398 | 0.65326 | 0.06525 | 0.115489 | 0.0674
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 0.10902 | 0.25293 | 0.157475 | 0.11386 | 0.53319 | 0.17836
Single core:
shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 90.9809 | 163.473 | 126.1276 | 6.57721 | 41.40833 | 11.82505
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 9.88405 | 38.39137 | 29.62069 | 7.10636 | 36.97535 | 11.0525
size: (32, 16, 200, 200), kernel: 3, stride: 1, mem_format: contig | 476.782 | 855.4769 | 648.2248 | 46.6488 | 219.2586 | 67.10599
size: (32, 16, 200, 200), kernel: 3, stride: 1, mem_format: CL | 80.29271 | 91.33854 | 87.80345 | 48.81692 | 203.9974 | 63.39004
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 235.2113 | 419.0799 | 315.4284 | 20.6049 | 107.1524 | 32.39169
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 29.47653 | 33.54905 | 32.82823 | 22.59674 | 98.5586 | 30.05763
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 7.90684 | 13.9208 | 10.03272 | 0.23725 | 1.35269 | 0.41728
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 2.33638 | 3.36894 | 2.64635 | 0.26535 | 1.244 | 0.38895
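Usage-wise, a minimal sketch (mine, not from the PR) of the new fp16 max-pooling path on CPU, in both contiguous and channels-last layouts:
```python
import torch
import torch.nn.functional as F

# Illustrative: max_pool2d on a CPU float16 tensor.
x = torch.randn(1, 56, 264, 264, dtype=torch.half)
y = F.max_pool2d(x, kernel_size=3, stride=1)

x_cl = x.to(memory_format=torch.channels_last)
y_cl = F.max_pool2d(x_cl, kernel_size=3, stride=1)
print(y.dtype, y_cl.is_contiguous(memory_format=torch.channels_last))
```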
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98819
Approved by: https://github.com/mingfeima , https://github.com/mikaylagawarecki
2023-09-05 18:23:41 +00:00
CaoE
3267996372
add channel last 3d support for maxpool3d on CPU ( #97775 )
...
### Testing
Single socket (28 cores):
shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms
-- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 3.959584 | 5.493402 | 0.557232 | 0.568485
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 0.815511 | 1.351261 | 5.710506 | 10.57506
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 10.63426 | 15.28637 | 2.67656 | 1.71365
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 2.63570 | 2.05532 | 2.55452 | 2.33923
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 0.375469 | 0.479748 | 0.066364 | 0.065155
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 0.112197 | 0.112326 | 0.111697 | 0.145364
Single core:
shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms
-- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 92.16582 | 128.6513 | 6.684325 | 12.21541
size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 10.14318 | 29.80297 | 7.350142 | 11.25323
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 238.55453 | 331.89967 | 19.694657 | 32.78853
size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 30.17079 | 32.75628 | 22.44543 | 30.17796
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 7.474389 | 9.937217 | 0.236015 | 0.434229
size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 2.318954 | 2.469444 | 0.262125 | 0.401361
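Usage-wise, a minimal sketch (mine, not from the PR) of running max_pool3d on a channels-last-3d input:
```python
import torch
import torch.nn.functional as F

# Illustrative: a 5-D input in channels_last_3d memory format (NDHWC layout).
x = torch.randn(4, 19, 10, 16, 16).to(memory_format=torch.channels_last_3d)
y = F.max_pool3d(x, kernel_size=3, stride=1)
# Expected to stay channels_last_3d after this change.
print(y.is_contiguous(memory_format=torch.channels_last_3d))
```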
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97775
Approved by: https://github.com/jgong5 , https://github.com/mikaylagawarecki
2023-08-26 00:21:27 +00:00
Aaron Gokaslan
660e8060ad
[BE]: Update ruff to 0.285 ( #107519 )
...
This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.
I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, it seems there are no instances of it in our codebase, so I'm enabling it so that it stays that way. :)
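For reference, my illustration of the pattern RUF017 flags: summing lists with `sum(...)` builds a new list per addition, which is quadratic; `itertools.chain` is the linear alternative.
```python
import itertools

lists = [[1, 2], [3, 4], [5, 6]]

# Flagged by RUF017: each `+` allocates a fresh list, so this is O(n^2).
flat_slow = sum(lists, [])

# Linear-time alternative.
flat_fast = list(itertools.chain.from_iterable(lists))
assert flat_slow == flat_fast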
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-22 23:16:38 +00:00
Huy Do
d9460bb8f8
Update test_MaxUnpool_index_errors XFAIL after #107483 ( #107658 )
...
After https://github.com/pytorch/pytorch/pull/107483, which reverted https://github.com/pytorch/pytorch/pull/95300, these tests no longer XFAIL, so now we know the root cause of https://github.com/pytorch/pytorch/issues/103854 .
As this is failing slow jobs in trunk at the moment, e.g. 6981bcbc35 , I'm moving these tests back.
### Testing
Run locally and all tests pass.
```
PYTORCH_TEST_WITH_SLOW=1 PYTORCH_TEST_SKIP_FAST=1 python test/nn/test_pooling.py -k test_MaxUnpool_index_errors
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107658
Approved by: https://github.com/PaliC
2023-08-22 22:36:35 +00:00
PyTorch MergeBot
d59a6864fb
Revert "[BE]: Update ruff to 0.285 ( #107519 )"
...
This reverts commit 88ab3e4322 .
Reverted https://github.com/pytorch/pytorch/pull/107519 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR breaks internal tests. @ezyang, can you please help them get unblocked? It seems like one of the strings was probably accidentally modified ([comment](https://github.com/pytorch/pytorch/pull/107519#issuecomment-1688833480 ))
2023-08-22 19:53:32 +00:00
Aaron Gokaslan
88ab3e4322
[BE]: Update ruff to 0.285 ( #107519 )
...
This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.
I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, it seems there are no instances of it in our codebase, so I'm enabling it so that it stays that way. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-20 01:36:18 +00:00
summerdo
7db6eb7156
[test_nn] add custom device support for dropout tests, lazy_modules te… ( #106609 )
...
Add custom device support for dropout tests, lazy_modules tests, and multihead_attention tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106609
Approved by: https://github.com/mikaylagawarecki
2023-08-11 09:14:34 +00:00
Jason Lu
bc88028e8e
Back out "Reland "Make adding buffers more like adding parameters ( #104069 )" ( #106224 )" ( #106743 )
...
Summary:
Original commit changeset: 81319beb97f3
Original Phabricator Diff: D47961182
Test Plan: revert to maintain backward compat with legacy ads_dper3 production package. Read details in: S357822
Reviewed By: atuljangra
Differential Revision: D48131623
@diff-train-skip-merge
(D48131623 landed internally)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106743
Approved by: https://github.com/malfet
2023-08-08 15:27:34 +00:00
Michael Gschwind
3200f63ee6
Make mocked functions return the proper result structure (tuple of attn result and attn weights for native MHA) ( #106526 )
...
Summary: Make mocked functions return the proper result structure (tuple of attn result and attn weights for native MHA)
Test Plan: sandcastle
Differential Revision: D48021277
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106526
Approved by: https://github.com/davidberard98
2023-08-03 19:27:31 +00:00
Mikayla Gawarecki
d8e5f2aa6d
Reland "Make adding buffers more like adding parameters ( #104069 )" ( #106224 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106224
Approved by: https://github.com/atalman , https://github.com/albanD
2023-07-31 17:18:56 +00:00
Justin Chu
de8bd108b4
[BE] Enable ruff's UP rules in pyproject.toml ( #105437 )
...
Signed-off-by: Justin Chu <justinchu@microsoft.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105437
Approved by: https://github.com/huydhn , https://github.com/malfet , https://github.com/Skylion007
2023-07-21 19:14:52 +00:00
Justin Chu
79c5e33349
[BE] Enable ruff's UP rules and autoformat nn/ mps/ and torch/ ( #105436 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105436
Approved by: https://github.com/malfet , https://github.com/albanD
2023-07-21 07:38:46 +00:00
Andrey Talman
c6653b65d8
Back out "Make adding buffers more like adding parameters ( #104069 )" ( #105581 )
...
Summary:
D47537831 is breaking pyper tests: https://fb.workplace.com/groups/802176577445480/posts/1018902842439518/
with `TypeError: register_buffer() takes 3 positional arguments but 4 were given`
Original commit changeset: d4b4069fbd38
Original Phabricator Diff: D47537831
Test Plan:
```
buck2 run //caffe2/torch/fb/training_toolkit/integration_tests/training_lifecycle/cogwheel_tests/pyper_release_v2:cogwheel_smallworld_inline_cvr_infer_pyper_pyper__canary_offline_training-launcher -- --run-harness-in-tupperware --build-fbpkg ads_dper3 --build-fbpkg training_platform
```
Reviewed By: atalman
Differential Revision: D47600140
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105581
Approved by: https://github.com/mikaylagawarecki
2023-07-20 03:39:53 +00:00
ekamiti
32d422f335
Make adding buffers more like adding parameters ( #104069 )
...
Add semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same, as the `register_buffer` method has not been changed. The `persistent` parameter on the `Buffer` type indicates whether the buffer should be persistent or not. Other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. The remaining changes are test changes to make sure that the `Buffer` type can be used as a drop-in replacement for `register_buffer`, as it just leads to `register_buffer` being called. Normal tensors can still be used as buffers, so these changes are intended to be backwards compatible.
Fixes #35735
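A rough usage sketch of what this describes, assuming the new class is exposed as `torch.nn.Buffer` (mirroring `nn.Parameter`):
```python
import torch
import torch.nn as nn


class Counter(nn.Module):
    def __init__(self):
        super().__init__()
        # Old style: explicit registration.
        self.register_buffer("old_count", torch.zeros(1))
        # New style described in the PR: assigning a Buffer object, which
        # routes through register_buffer under the hood. persistent=False
        # keeps it out of the state_dict but still in named_buffers().
        self.new_count = nn.Buffer(torch.zeros(1), persistent=False)


m = Counter()
print(dict(m.named_buffers()).keys())  # old_count and new_count
```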
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
2023-07-17 17:59:05 +00:00
Aaron Gokaslan
2f95a3d0fc
[BE]: Apply ruff PERF fixes to torch ( #104917 )
...
Applies automated ruff fixes for the PERF rules and enables all of the automatic ones. I also updated ruff, which applied some additional fixes.
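A representative PERF fix (my illustration): PERF401 rewrites a manual-append loop as a list comprehension.
```python
# Before (flagged by PERF401): building a list with repeated .append calls.
squares = []
for i in range(10):
    squares.append(i * i)

# After the automated fix: a list comprehension.
squares = [i * i for i in range(10)]
```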
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104917
Approved by: https://github.com/ezyang , https://github.com/albanD
2023-07-11 20:45:21 +00:00
Mikayla Gawarecki
1ad435772b
Added option to always call nn.Module global/non-global forward hooks ( #104278 )
...
Fix #103997
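A sketch of the feature (assuming the option is the `always_call` keyword on forward-hook registration): the hook fires even when `forward` raises.
```python
import torch
import torch.nn as nn


class Boom(nn.Module):
    def forward(self, x):
        raise RuntimeError("forward failed")


def cleanup_hook(module, args, output):
    # output may be None here, since forward never produced one.
    print("hook ran even though forward raised")


m = Boom()
# `always_call=True` is the option this PR describes (assumed keyword name).
m.register_forward_hook(cleanup_hook, always_call=True)

try:
    m(torch.randn(1))
except RuntimeError:
    pass
```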
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104278
Approved by: https://github.com/albanD
2023-07-10 18:58:07 +00:00
Jerry Zhang
1a661639f7
[quant] Support integer implementations for adaptive_avg_pool2d ( #104226 )
...
Summary:
This is needed for representing quantized models in the pt2 export quantization flow
Test Plan:
tested by opinfo, python test/test_ops.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104226
Approved by: https://github.com/jgong5 , https://github.com/andrewor14
2023-07-07 19:36:31 +00:00
Huy Do
f27a9129e7
XFAIL test_MaxUnpool_index_errors CUDA slow tests ( #103905 )
...
This has been failing in trunk for a while. Let's XFAIL it while continuing the investigation https://github.com/pytorch/pytorch/issues/103854 . We might not need this PR if the fix is on the way.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103905
Approved by: https://github.com/mikaylagawarecki
2023-06-22 18:05:10 +00:00
Michael Voznesensky
e5e9d563c2
Lift user defined attributes into inputs for certain cases (user defined types and tensors) ( #103386 )
...
(1) Lazy (converts to dynamo variable on access only)
(2) Uses existing side effect/reconstruct tech
(3) not tensor opinionated
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103386
Approved by: https://github.com/jansel
2023-06-20 23:45:19 +00:00
Fuzzkatt
6d570ccd59
tf32 context fixes for various tests ( #103137 )
...
Addresses tf32 context related failures from NVIDIA internal testing for following unit tests:
H100:
- functorch/test_vmap.py: test_op_has_batch_rule
A100:
- test_expanded_weights.py: test_cnn_model_sum
- nn/test_convolution.py: test_conv2d_same_padding_backward
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103137
Approved by: https://github.com/zou3519
2023-06-15 02:33:12 +00:00
Bearnardd
2abad0c184
Add dtype check baddbmm ( #102659 )
...
Fixes part of #100838 by disabling support for non-matching dtypes between input and batches for the `baddbmm` operator (see the sketch after the checklist).
* [x] added dtype checks
* [x] added test case
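My illustration of the new check: mismatched input/batch dtypes now raise instead of silently proceeding.
```python
import torch

inp = torch.randn(2, 3, 5)                       # float32
b1 = torch.randn(2, 3, 4, dtype=torch.float64)   # float64 batches
b2 = torch.randn(2, 4, 5, dtype=torch.float64)

try:
    torch.baddbmm(inp, b1, b2)
except RuntimeError as e:
    print("dtype mismatch rejected:", e)
```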
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102659
Approved by: https://github.com/ngimel
2023-06-13 00:31:06 +00:00
Edward Z. Yang
ba962fefea
Add parametrization version of weight_norm ( #103001 )
...
This is done in the ordinary way, but also:
* Deprecation warning for the old API, and a migration guide
* Backwards compatibility for state_dict loading the old weight_norm
* Test for pickling and deepcopy, which was the motivating reason
weight_norm is still used by HuggingFace Wav2Vec2.
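A usage sketch (mine, not from the PR) of the parametrization-based API next to the deprecated one:
```python
import copy
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import weight_norm  # new, parametrization-based

layer = weight_norm(nn.Linear(20, 40), name="weight")
# The parametrized module deepcopies/pickles cleanly, which was the
# motivating reason for the rewrite.
layer_copy = copy.deepcopy(layer)

# The old torch.nn.utils.weight_norm API is deprecated in favor of this one.
```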
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103001
Approved by: https://github.com/albanD
2023-06-06 13:14:43 +00:00
Fuzzkatt
f8896b7b0e
update tf32 thresholds in nn/test_convolution.py ( #102015 )
...
updated tf32 thresholds for test_cudnn_convolution_relu, test_cudnn_convolution_add_relu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102015
Approved by: https://github.com/ngimel
2023-05-24 22:42:25 +00:00
Fuzzkatt
47e9dba765
move tf32_on_and_off fix for test_convolution.py ( #102007 )
...
Move tf32_on_and_off after @torch.backends.cudnn.flags(enabled=True, benchmark=False), because the cudnn.flags decorator overwrites tf32_on_and_off when it comes after it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102007
Approved by: https://github.com/ngimel
2023-05-24 02:23:06 +00:00
ts
74dc2a53f6
Thread generator through trunc_normal_ ( #100810 )
...
This will solve @albertz's issue as described in #98200 , threading the generator argument through the trunc_normal_ function. I'm still working on #99796 (and won't let it stall out), but this fix doesn't trigger any JIT issues, so I think it might be helpful to get it merged now.
Would be happy to iterate on this if there are any issues.
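A small usage sketch (mine, not from the PR) of the threaded-through generator argument:
```python
import torch
import torch.nn as nn

g = torch.Generator().manual_seed(0)
w = torch.empty(3, 5)
# The generator now threads through to the underlying sampling, so
# initialization is reproducible.
nn.init.trunc_normal_(w, mean=0.0, std=0.02, a=-0.04, b=0.04, generator=g)
```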
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100810
Approved by: https://github.com/Skylion007 , https://github.com/albanD
2023-05-12 01:04:08 +00:00
eqy
3f656ad7bb
[CUDA] Do accumulation for Adaptive Average Pooling in opmath_t ( #99378 )
...
Fix for an issue surfaced from the discuss forum: https://discuss.pytorch.org/t/adaptiveavgpool2d-causes-some-data-to-contain-inf/177420
CC @ptrblck @ngimel
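Roughly the failure mode being fixed (my sketch; needs a CUDA device): accumulating a large half-precision pooling window directly in fp16 can overflow to inf even though the mean itself is representable.
```python
import torch
import torch.nn.functional as F

if torch.cuda.is_available():
    # 1024 values of ~300 each: the running sum (~300k) would overflow fp16
    # (max ~65504) if accumulated in half, while the mean (~300) is fine.
    x = torch.full((1, 1, 32, 32), 300.0, device="cuda", dtype=torch.half)
    out = F.adaptive_avg_pool2d(x, output_size=1)
    print(torch.isinf(out).any())  # False once accumulation happens in opmath (fp32)
```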
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99378
Approved by: https://github.com/ngimel
2023-04-28 20:43:12 +00:00
Michael Gschwind
36e1ae6778
De-select odd numbered heads from nn.MHA fastpath ( #99672 )
...
Summary:
https://github.com/pytorch/pytorch/issues/97128
* Add test for mha num_heads %2 != 0
* Fix test
* Add test for bias false
* show test passes
Test Plan: sandcastle
Differential Revision: D45161767
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99672
Approved by: https://github.com/ngimel
2023-04-25 00:27:18 +00:00
soulitzer
ee1c539ecf
Fix module backward pre-hooks to actually update gradient ( #97983 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97983
Approved by: https://github.com/albanD
2023-03-30 20:33:44 +00:00
ecao
b72bddabe9
Move empty check to the start of _pack_padded_sequence ( #94885 )
...
Fixes #94122 .
Move empty check to the start of `_pack_padded_sequence`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94885
Approved by: https://github.com/kshitij12345 , https://github.com/jgong5 , https://github.com/malfet
2023-03-22 04:16:58 +00:00
Will Constable
2f6a371ae9
Revert "Optimize nn.Module __call__ fast path for dynamo ( #95931 )" ( #96242 )
...
Reverting due to concerns over silent unsoundness (skipped hooks) if users have directly added hooks dicts without using official torch APIs.
This reverts commit 26045336ca .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96242
Approved by: https://github.com/albanD
2023-03-10 01:05:01 +00:00
Will Constable
26045336ca
Optimize nn.Module __call__ fast path for dynamo ( #95931 )
...
This PR optimizes the guards overhead introduced by dynamo tracing module forward hooks.
It can and maybe should be followed by a wider change proposed by @voznesenskym to optimize specialized nnmodules by 'observing' any user mutations and directly invalidating the root guard, obviating the need to install other nnmodule guards. (But this observer change seems more involved...)
Idea: maintain a flag, and keep it up to date whenever adding or removing hooks. Use the flag rather than dict checks to enter the call fast path (sketched below).
- need to extend RemovableHandle to keep a ref to nnModule so it can update the flag on removal.
- also need to handle the flag in ScriptModule which still uses the python call impl when called from python.
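A stripped-down sketch of the flag idea (not PyTorch's actual implementation):
```python
# Simplified illustration of the bookkeeping described above.
class FakeModule:
    def __init__(self):
        self._forward_hooks = {}
        self._has_hooks = False            # the cached flag

    def register_hook(self, fn):
        handle_id = len(self._forward_hooks)
        self._forward_hooks[handle_id] = fn
        self._has_hooks = True             # keep the flag in sync on add
        return handle_id

    def remove_hook(self, handle_id):
        self._forward_hooks.pop(handle_id, None)
        self._has_hooks = bool(self._forward_hooks)  # ... and on removal

    def __call__(self, x):
        if not self._has_hooks:            # cheap flag check instead of dict checks
            return self.forward(x)
        for hook in self._forward_hooks.values():
            hook(self, x)
        return self.forward(x)

    def forward(self, x):
        return x + 1
```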
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95931
Approved by: https://github.com/ezyang , https://github.com/voznesenskym
2023-03-04 15:09:40 +00:00
Edward Z. Yang
d303665d33
Make int unspecialization actually work ( #95621 )
...
OK, so this PR used to be about reducing the number of constants we specialize on, but it turns out that unspecialization was ~essentially never used (because we still constant specialized way too aggressively) and I ended up having to fix a bunch of issues to actually get tests to pass. So this PR is now "make int unspecialization actually work". As part of this, I have to turn off unspecialization by default, as there are still latent bugs in inductor.
The general strategy is that an unspecialized int is represented as a SymInt. Representing it as a 0d tensor (which is what the code used to do) is untenable: (1) we often need unspecialized ints to participate in size computations, but we have no way of propagating sympy expressions through tensor compute, and (2) a lot of APIs work when passed SymInt, but not when passed a Tensor. However, I continue to represent Numpy scalars as Tensors, as they are rarely used for size computation and they have an explicit dtype, so they are more accurately modeled as 0d tensors.
* I folded in the changes from https://github.com/pytorch/pytorch/pull/95099 as I cannot represent unspecialized ints as SymInts without also turning on dynamic shapes. This also eliminates the necessity for test_unspec.py, as toggling specialization without dynamic shapes doesn't do anything. As dynamic shapes defaults to unspecializing, I just deleted this entirely; for the specialization case, I rely on regular static shape tests to catch it. (Hypothetically, we could also rerun all the tests with dynamic shapes, but WITH int/float specialization, but this seems... not that useful? I mean, I guess export wants it, but I'd kind of like our Source heuristic to improve enough that export doesn't have to toggle this either.)
* Only 0/1 integers get specialized by default now
* A hodgepodge of fixes. I'll comment on the PR about them.
Fixes https://github.com/pytorch/pytorch/issues/95469
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95621
Approved by: https://github.com/jansel , https://github.com/Chillee
2023-03-04 01:22:08 +00:00
kshitij12345
3b966a6ce3
[autograd] disable backward/grad for complex scalar output ( #92753 )
...
Fixes https://github.com/pytorch/pytorch/issues/92750
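My illustration of the new behavior: calling `.backward()` on a complex scalar now errors; backprop through the real part (or another real reduction) instead.
```python
import torch

z = torch.randn(3, dtype=torch.cfloat, requires_grad=True)
out = z.sum()                 # complex scalar

try:
    out.backward()            # now rejected: implicit grads need a real scalar output
except RuntimeError as e:
    print("rejected:", e)

out.real.backward()           # reduce to a real scalar first
print(z.grad)
```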
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92753
Approved by: https://github.com/ezyang
2023-02-23 11:38:27 +00:00
puririshi98
8aa34602f7
Jetson Update for CI Redo ( #94549 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94549
Approved by: https://github.com/ezyang , https://github.com/malfet
2023-02-21 17:13:38 +00:00
soulitzer
e5c2a35d83
Add check that embedding_bag's weight is 2D ( #94931 )
...
Fixes https://github.com/pytorch/pytorch/issues/94445
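My illustration of the added check: a 1-D weight is now rejected up front instead of crashing later.
```python
import torch
import torch.nn.functional as F

indices = torch.tensor([0, 1, 2])
offsets = torch.tensor([0])
bad_weight = torch.randn(10)   # 1-D; embedding_bag expects 2-D (num_embeddings, dim)

try:
    F.embedding_bag(indices, bad_weight, offsets)
except RuntimeError as e:
    print("rejected:", e)
```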
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94931
Approved by: https://github.com/albanD
2023-02-16 02:37:47 +00:00
Xuehai Pan
b005ec62b9
[BE] Remove dependency on six and future ( #94709 )
...
Remove the Python 2 and 3 compatibility libraries [six](https://pypi.org/project/six ) and [future](https://pypi.org/project/future ), as well as `torch._six`. We only support Python 3.8+ now; it's time to retire them.
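Typical migrations this implies for downstream code (my examples; the exact symbols varied):
```python
# Before: Python 2/3 compatibility shims.
# from torch._six import inf, string_classes
# isinstance(x, string_classes)

# After: plain Python 3 / torch equivalents.
from math import inf          # or torch.inf
x = "hello"
assert isinstance(x, str)
assert inf > 1e308
```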
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94709
Approved by: https://github.com/malfet , https://github.com/Skylion007
2023-02-14 09:14:14 +00:00
Xuehai Pan
046e88a291
[BE] [3/3] Rewrite super() calls in test ( #94592 )
...
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.
- #94587
- #94588
- #94592
Also, methods with only a `super()` call are removed:
```diff
class MyModule(nn.Module):
- def __init__(self):
- super().__init__()
-
def forward(self, ...):
...
```
Some cases that change the semantics should be kept unchanged. E.g.:
f152a79be9/caffe2/python/net_printer.py (L184-L190)
f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592
Approved by: https://github.com/ezyang , https://github.com/seemethere
2023-02-12 22:20:53 +00:00
haozhe.zhu
ed54a5d06b
enable bf16 emb ( #94163 )
...
Merge https://github.com/pytorch/pytorch/pull/89199 and https://github.com/pytorch/pytorch/pull/91949 into one PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94163
Approved by: https://github.com/jianyuh , https://github.com/malfet , https://github.com/jgong5
2023-02-12 00:05:09 +00:00
Jeff Daily
66bfcd32fd
[ROCm] Remove PYTORCH_MIOPEN_SUGGEST_NHWC flag ( #90725 )
...
Fixes #64427 . MIOpen supports ChannelsLast. No longer need to opt-in with env var.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90725
Approved by: https://github.com/malfet
2023-02-09 22:26:24 +00:00
Yuyao Wang
0bf78b57c0
fix: max_unpool3d buffer overflow ( #94372 )
...
Fixes #88032
Previously `output_size` was accessed before the shape length check, which led to a buffer overflow.
The fix simply performs the check first.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94372
Approved by: https://github.com/albanD
2023-02-08 19:48:25 +00:00
PyTorch MergeBot
53e4fe076a
Revert "enable bf16 emb ( #94163 )"
...
This reverts commit f3bf46e801 .
Reverted https://github.com/pytorch/pytorch/pull/94163 on behalf of https://github.com/huydhn due to Sorry for reverting your PR. But I suspect that it causes flaky SIGSEGV failure for linux-bionic-py3.8-clang9 / test (crossref) job in trunk. For example, 05397b1250
2023-02-07 00:32:22 +00:00
mingfeima
26cba842ad
Optimize ConvTransposed2D with mkldnn float32 and bfloat16 on CPU ( #92530 )
...
This PR optimizes `ConvTranspose2d` with oneDNN and adds channels-last support for it. The fallback path `slow_conv_transpose2d` also gets channels-last support, so the memory format propagation behavior stays the same with or without oneDNN.
Replacement of https://github.com/pytorch/pytorch/pull/77060 , https://github.com/pytorch/pytorch/pull/70897 and https://github.com/pytorch/pytorch/pull/74023 which enables oneDNN for `ConvTranspose2d` and `ConvTranspose3d`
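Usage-wise, a minimal sketch (mine, not from the PR) of channels-last propagating through `ConvTranspose2d`:
```python
import torch
import torch.nn as nn

m = nn.ConvTranspose2d(32, 32, kernel_size=3).to(memory_format=torch.channels_last)
x = torch.randn(32, 32, 100, 100).to(memory_format=torch.channels_last)
out = m(x)
# Layout is expected to be preserved with or without oneDNN.
print(out.is_contiguous(memory_format=torch.channels_last))
```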
The following results were collected on a Skylake Xeon 8180 (dual socket, 28 cores per socket).
### single core channels last
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 181.36 | 91.16 | 1.99 | 531.38 | 124.08 | 4.28
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 324.35 | 153.50 | 2.11 | 973.16 | 185.97 | 5.23
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 1086.82 | 671.52 | 1.62 | 3008.94 | 1453.33 | 2.07
### single core channels first
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 138.10 | 5.94 | 23.23 | 37.97 | 11.25 | 3.38
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 236.43 | 8.75 | 27.03 | 87.77 | 18.58 | 4.72
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 484.39 | 37.69 | 12.85 | 185.40 | 90.57 | 2.05
### single socket channels last
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 138.10 | 5.94 | 23.23 | 37.97 | 11.25 | 3.38
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 236.43 | 8.75 | 27.03 | 87.77 | 18.58 | 4.72
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 484.39 | 37.69 | 12.85 | 185.40 | 90.57 | 2.0
### single socket channels first
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 132.56 | 7.19 | 18.43 | 31.43 | 11.20 | 2.81
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 227.94 | 13.33 | 17.11 | 63.00 | 23.41 | 2.69
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 473.68 | 52.79 | 8.97 | 150.40 | 87.33 | 1.72
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92530
Approved by: https://github.com/jgong5 , https://github.com/ezyang
2023-02-06 10:11:25 +00:00
haozhe.zhu
f3bf46e801
enable bf16 emb ( #94163 )
...
Merge https://github.com/pytorch/pytorch/pull/89199 and https://github.com/pytorch/pytorch/pull/91949 into one PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94163
Approved by: https://github.com/jianyuh , https://github.com/malfet , https://github.com/jgong5
2023-02-06 07:11:40 +00:00
Jeff Daily
72502b94f3
correct use of torch.backends.cudnn.flags() ( #93182 )
...
Fixes #77467 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93182
Approved by: https://github.com/ngimel
2023-01-28 06:50:06 +00:00