pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Mike Iovine	7d8ee38a5c	[Static Runtime] Fix prim::If tuple corner case (#85446 ) Summary: We currently assume that a tuple output implies that the prim::If node returns multiple unpacked outputs, but this is not guaranteed to be the case. Add some logic to return the wrapped tuple if necessary Test Plan: New unit test Differential Revision: D39712050 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85446 Approved by: https://github.com/tenpercent	2022-09-24 01:01:34 +00:00
jjsjann123	0e582fbfcc	[NVFuser] Upstream push 0907 (#84626 ) Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/ Codegen changes include: - codegen improvement: i. improved view support on pointwise and transpose scheduler ii. grouped grid welford added for better outer-norm grid persistence in normalization - misc: i. new composite ops added: variance_mean , arange, ii. fixes misaligned address for transpose scheduler iii. refactor on separation of compilation API from execution API to prepare us for async compilation iv. double type support on expression evaluator v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN Commits that's in this PR from the devel branch: ``` 89330aa23aa804340b2406ab58899d816e3dc3d2 Tensor factories must set the output shape as its input (#1939) b2fd01ea9346712c6d6f623ca6addbc4888d008e arange support (#1933) 56c00fd3922dad7dfc57351ad7d780f0f2f8e4ed Double support on all expression evaluators (#1937) 371f28223e57fe3f6b5e50a0a45177e6a5c0785c Improve trivial reduction merge support (#1931) 1d0c26790e5647920b40d419d26815bbe310b3a6 Test `rand` in a fusion with zero tensor input (#1932) 0dab160fb2177d178eef3148c6a529e0855009e9 Fix softmax bwd sizes. (#1890) ef98f360f6d3e3e1cc662ecb65202d88150f128d Fix a bug (#1936) 63132a0c56508c550084b07fb76a3df865102d00 Propagate permissive mapping information into indexing pass (#1929) b4ac2c88d78078ee4d8b21c4fc51645b5710a282 Map IterationDomains through view operations. (#1919) c0a187a7619d7cf9dc920294e15461791e8d6d4d do not use deprecated functions (#1935) 88de85e758c5e4afb7b6e746573c0d9a53b4cea7 Upstream cherry pick fixes 0811 (#1934) b247dcf7c57dc6ac3f7a799b0a6beb7770536a74 Separate kernel compilation API from kernel execution API (#1914) b34e3b93ee1a8030730c14af3995dd95665af07d Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924) 14a53e6707f43bf760494c238a46386d69830822 Nullary RNGOp (#1892) 3c3c89e638f5172cafb0761f22bacd1fd695eec3 Misc fixes/tuning for transpose scheduler (#1912) 20cf109c8b44d48f61977e35bae94368985144ac Grouped grid welford (#1921) 6cf7eb024c9e53c358cbe56597e117bad56efefd Transpose scheduler small dim sizes better support (#1910) 9341ea9a5bf42f9b14ccad0c94edbc79fc5bb552 Disabled ViewPersistentShmoo sizes that results in NAN (#1922) 057237f66deeea816bb943d802a97c1b7e4414ab Fix CUDA driver error: misaligned address for transpose scheduler (#1918) 3fb3d80339e4f794767a53eb8fdd61e64cf404a2 Add variance_mean function using Welford (#1907) 98febf6aa3b8c6fe4fdfb2864cda9e5d30089262 Remove DisableOption::UnrollWithRng (#1913) ee8ef33a5591b534cf587d347af11e48ba7a15d4 Minor fix for the debug interface of using PTX directly (#1917) 6e8f953351f9dabfd1f991d8431cecb6c2ce684d Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916) 5eefa9a72385f6a4b145680a9dcc52d7e8293763 dopt is only available since nvrtc 11.7 (#1915) 2ec8fc711eafc72451eebf0f5e2a98a38bf3f6ef Kill computeAtBetween (#1911) d0d106a1d9af118d71673173674e875be35d259d Improve view support on pointwise and transpose scheduler (#1906) e71e1ecefe67219846070590bbed54bbc7416b79 Fix name clash of RNG with shared memory (#1904) 3381793a253689abf224febc73fd3fe2a0dbc921 Fix mutator and sameAs for expanded IterDomain (#1902) ``` RUN_TORCHBENCH: nvfuser Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84626 Approved by: https://github.com/malfet	2022-09-23 20:29:48 +00:00
Mike Iovine	63c1f2fef9	[Static Runtime] Fold linear prepack ops (#85289 ) Summary: Split `quantized_linear_unpacked_weight_v2` into `linear_prepack` and `quantized_linear` so that the prepacking operation may be eliminated by constant folding. Test Plan: Fixes a huge regression in an internal model: ``` Before 89.6141 ms. 99.0923%. fb::quantized_linear_unpacked_weight_v2 (12 nodes) After 0.806852 ms. 53.5365%. quantized::linear (12 nodes, out variant) (prepacking eliminated) ``` Differential Revision: D39622530 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85289 Approved by: https://github.com/davidberard98	2022-09-22 20:23:07 +00:00
Mike Iovine	e4899764b2	[Static Runtime] Fix aten::index_put list conversions (#85298 ) Summary: Apparently static runtime's list construct return value is always a `GenericList`, so we cannot use the `toOptionalTensorList` method in the general case -- we must convert each item individually. Test Plan: New unit test Differential Revision: D39628979 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85298 Approved by: https://github.com/tenpercent	2022-09-22 20:21:52 +00:00
Xu Zhao	6f2b390fc1	Fix import of instruction count benchmark (#85359 ) This PR fixes the instruction count benchmark 1. Fix the updated import path 2. Allows building the benchmark with less compiler options (remove all "-W" options) Test plan: ``` BENCHMARK_USE_DEV_SHM=1 python main.py --mode ci ``` Manually tested and worked on the CI machine. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85359 Approved by: https://github.com/robieta	2022-09-21 17:17:47 +00:00
Max Podkorytov	7f90606309	[static-runtime] update generator for the modified tests; re-run autogen script (#84437 ) Test Plan: CI Reviewed By: mikeiovine Differential Revision: D39183148 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84437 Approved by: https://github.com/mikeiovine	2022-09-06 20:07:56 +00:00
Max Podkorytov	bf62ece536	[static-runtime] add schema checks to most of the ops where these checks are missing (#84163 ) Test Plan: existing unit tests; also fix some failing ones along the way Differential Revision: D39074902 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84163 Approved by: https://github.com/mikeiovine	2022-09-01 17:21:22 +00:00
Mike Iovine	db7784e722	[Static Runtime] Schema checks for index_put (#84152 ) Summary: `index_put` can take a list of tensors, but Static Runtime always tries to convert its argument to a list of optional tensors. This was causing crashes for some users. Add some schema checks to prevent this, and add a new overload for the new case. Also, I found a clear bug in the JIT interpreter (mutating the argument when its not supposed to), so I fixed that too. Test Plan: New unit test Differential Revision: D39072214 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84152 Approved by: https://github.com/tenpercent	2022-08-31 01:20:14 +00:00
zaf	2f04ba2c7c	[quant][ao_migration] `torch.nn.qat` → `torch.ao.nn.qat` (#78716 ) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. The list of the `nn.quantized` files that are being migrated: - [X] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [X] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [X] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [X] [Current PR] `torch.nn.qat` → `torch.ao.nn.qat` - [X] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [X] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - None Differential Revision: [D36861197](https://our.internmc.facebook.com/intern/diff/D36861197/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36861197/)! Differential Revision: [D36861197](https://our.internmc.facebook.com/intern/diff/D36861197) Pull Request resolved: https://github.com/pytorch/pytorch/pull/78716 Approved by: https://github.com/jerryzh168	2022-08-25 16:50:38 +00:00
zaf	d32a762147	[quant][ao_migration] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` (#78714 ) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. The list of the `nn.quantized` files that are being migrated: - [ ] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [X] [Current PR] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [ ] `torch.nn.qat` → `torch.ao.nn.qat` - [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - [Documentation](docs/source/quantization-support.rst) @vkuzo - [Public API test list](test/allowlist_for_publicAPI.json) @peterbell10 - [BC test](test/quantization/bc/test_backward_compatibility.py) @vkuzo - [IR emitter](torch/csrc/jit/frontend/ir_emitter.cpp) @jamesr66a - [JIT serialization](torch/csrc/jit/serialization/import_source.cpp) @IvanKobzarev @jamesr66a Differential Revision: [D36860660](https://our.internmc.facebook.com/intern/diff/D36860660/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36860660/)! Differential Revision: [D36860660](https://our.internmc.facebook.com/intern/diff/D36860660) Pull Request resolved: https://github.com/pytorch/pytorch/pull/78714 Approved by: https://github.com/jerryzh168	2022-08-25 16:50:34 +00:00
zaf	c92e5ac95b	[quant][ao_migration] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` (#78713 ) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. The list of the `nn.quantized` files that are being migrated: - [ ] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [X] [Current PR] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [ ] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [ ] `torch.nn.qat` → `torch.ao.nn.qat` - [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - Documentation @vkuzo - docs/source/conf.py - docs/source/quantization.rst - [quantize_fx](torch/ao/quantization/quantize_fx.py) @jerryzh168 - [common test routine](test/quantization/ao_migration/common.py) @HDCharles - JIT stuff @jamesr66a - torch/csrc/jit/passes/hoist_conv_packed_params.cpp - torch/csrc/jit/passes/quantization/helper.h - torch/csrc/jit/serialization/import_source.cpp Differential Revision: [D38926012](https://our.internmc.facebook.com/intern/diff/D38926012/) Differential Revision: [D38926012](https://our.internmc.facebook.com/intern/diff/D38926012) Pull Request resolved: https://github.com/pytorch/pytorch/pull/78713 Approved by: https://github.com/jerryzh168	2022-08-25 16:50:33 +00:00
jjsjann123	b21a6ff639	[NVFuser] Upstream push 0811 (#83239 ) Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/ Code changes includes: - codegen improvements: 1. double support in expression evaluator - bug fixes: 1. dropout fix - rework RNG to support broadcasted dropout (Fixes #82784) 2. expand fix - Patch expand+reduction, expand+view, rework view analysis and guard - scheduler: 1. manual transpose schedule example 2. WIP transpose scheduler Commits that's in this PR from the devel branch: ``` b7435afcd22c917713c2f41a7237bc26e1183f14 Transpose scheduler, step 1 (#1854) 8a45dbf72034684eb8e18b1835b533e90b68f184 Add an example on how to manually schedule transpose (#1889) 83dbf56a9554b2efbd5416461d938fff477b0b27 Patch dropout fix (#1898) 69d3519a532250719b1aa8341b50e067b181b42d Expand+Reduction, Expand+View support, rework View analysis and guards (#1883) 15091c488e96343bdc49e3990acbf238a3b3da51 Rework RNG to correctly support broadcasted dropout (#1888) aafe2d048aaac596e503596a41303423619f3954 Make ExpressionEvaluator support Double (#1885) ``` RUN_TORCHBENCH: nvfuser Differential Revision: [D38657074](https://our.internmc.facebook.com/intern/diff/D38657074) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83239 Approved by: https://github.com/davidberard98	2022-08-25 02:23:22 +00:00
Richard Barnes	b745e5f115	Check all CUDA API calls for errors in benchmarks/cpp/nvfuser (#74920 ) (#81817 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74920 Test Plan: Sandcastle Differential Revision: D35194656 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81817 Approved by: https://github.com/malfet	2022-08-24 18:59:05 +00:00
Sergii Dymchenko	591222f5d9	Fix use-dict-literal lint (#83718 ) Fix use-dict-literal pylint suggestions by changing `dict()` to `{}`. This PR should do the change for every Python file except test/jit/test_list_dict.py, where I think the intent is to test the constructor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83718 Approved by: https://github.com/albanD	2022-08-24 00:26:46 +00:00
Mike Iovine	09157c76c0	[Static Runtime] Add schema checks for aten::list (#83753 ) Summary: The previous implementation assumed that there was only one overload and unconditionally tried to convert its input into a string. Some users were running into crashes because of this. Added a new overload for the list overload and schema checks. Also, I managed to uncover another bug when writing tests for this case (yikes). Returning inputs didn't work because the input cleanup process would destroy the output. Extended `CreateOwnedRefsForSpecialIValues` to fix that. Test Plan: CI + new unit tests Differential Revision: D38870803 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83753 Approved by: https://github.com/tenpercent, https://github.com/albanD	2022-08-22 13:42:47 +00:00
PyTorch MergeBot	6a9c02339d	Revert "[quant][ao_migration] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` (#78713 )" This reverts commit `432f037498`. Reverted https://github.com/pytorch/pytorch/pull/78713 on behalf of https://github.com/janeyx99 due to Reverting for breaking (trunk-only) ios build	2022-08-22 07:32:37 +00:00
PyTorch MergeBot	b1a7b67529	Revert "[quant][ao_migration] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` (#78714 )" This reverts commit `e6fb97d8ae`. Reverted https://github.com/pytorch/pytorch/pull/78714 on behalf of https://github.com/janeyx99 due to sorry, reverting so https://github.com/pytorch/pytorch/pull/78713 could be cleanly reverted	2022-08-22 07:30:48 +00:00
PyTorch MergeBot	4cbb1986fe	Revert "[quant][ao_migration] `torch.nn.qat` → `torch.ao.nn.qat` (#78716 )" This reverts commit `7cd2fa1d38`. Reverted https://github.com/pytorch/pytorch/pull/78716 on behalf of https://github.com/janeyx99 due to sorry, reverting so https://github.com/pytorch/pytorch/pull/78713 could be cleanly reverted	2022-08-22 07:23:24 +00:00
zaf	7cd2fa1d38	[quant][ao_migration] `torch.nn.qat` → `torch.ao.nn.qat` (#78716 ) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. The list of the `nn.quantized` files that are being migrated: - [X] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [X] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [X] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [X] [Current PR] `torch.nn.qat` → `torch.ao.nn.qat` - [X] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [X] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - None Differential Revision: [D36861197](https://our.internmc.facebook.com/intern/diff/D36861197/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36861197/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/78716 Approved by: https://github.com/jerryzh168	2022-08-22 05:33:23 +00:00
zaf	e6fb97d8ae	[quant][ao_migration] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` (#78714 ) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. The list of the `nn.quantized` files that are being migrated: - [ ] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [X] [Current PR] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [ ] `torch.nn.qat` → `torch.ao.nn.qat` - [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - [Documentation](docs/source/quantization-support.rst) @vkuzo - [Public API test list](test/allowlist_for_publicAPI.json) @peterbell10 - [BC test](test/quantization/bc/test_backward_compatibility.py) @vkuzo - [IR emitter](torch/csrc/jit/frontend/ir_emitter.cpp) @jamesr66a - [JIT serialization](torch/csrc/jit/serialization/import_source.cpp) @IvanKobzarev @jamesr66a Differential Revision: [D36860660](https://our.internmc.facebook.com/intern/diff/D36860660/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36860660/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/78714 Approved by: https://github.com/jerryzh168	2022-08-22 05:22:00 +00:00
zaf	432f037498	[quant][ao_migration] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` (#78713 ) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. The list of the `nn.quantized` files that are being migrated: - [ ] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [X] [Current PR] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [ ] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [ ] `torch.nn.qat` → `torch.ao.nn.qat` - [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - Documentation @vkuzo - docs/source/conf.py - docs/source/quantization.rst - [quantize_fx](torch/ao/quantization/quantize_fx.py) @jerryzh168 - [common test routine](test/quantization/ao_migration/common.py) @HDCharles - JIT stuff @jamesr66a - torch/csrc/jit/passes/hoist_conv_packed_params.cpp - torch/csrc/jit/passes/quantization/helper.h - torch/csrc/jit/serialization/import_source.cpp Differential Revision: [D36860145](https://our.internmc.facebook.com/intern/diff/D36860145/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/78713 Approved by: https://github.com/jerryzh168	2022-08-22 01:38:55 +00:00
zaf	78c8a0d752	[quant][ao_migration] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` (#78712 ) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. The list of the `nn.quantized` files that are being migrated: - [ ] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] [Current PR] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [ ] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [ ] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [ ] `torch.nn.qat` → `torch.ao.nn.qat` - [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - [Documentation](docs/source/quantization-support.rst) @vkuzo - [Public API test list](test/allowlist_for_publicAPI.json) @peterbell10 Differential Revision: [D36792967](https://our.internmc.facebook.com/intern/diff/D36792967/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36792967/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/78712 Approved by: https://github.com/jerryzh168	2022-08-18 17:51:54 +00:00
Max Podkorytov	68d2d7866d	[static-runtime] change the backend for permute_copy (#83532 ) Summary: Testing wrappable dims Differential Revision: D38717563 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83532 Approved by: https://github.com/mikeiovine	2022-08-17 18:10:36 +00:00
jjsjann123	df741c589f	[NVFuser] Upstream push 0809 (#83067 ) Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/ Code changes includes: - codegen improvements: 1. removes un-necessary sync from redundant thread compute analysis 2. symmetric API for BestEffortReplay 3. support merge on trivial reductions 4. Ampere async copy improvements - bug fixes: 1. vectorization bug fixes 2. type inference patch : fixes upstream #81725 3. segmenter bug fix with deterministic iteration ordering - parser update 1. added leaky_relu - scheduler 1. normalization scheduler clean up. 2. simplifies matmul scheduling with new transform propagator 3. merge all dimensions in PW scheduler 4. various gemm related improvements - debuggability 1. nsight compute support 2. debug dump for InlinePropagator 3. Add `UnaryOpType::Print` Squashed commits to WAR github API Commits that's actually in this PR from the devel branch: ``` dfe02f3faed4c64477e5f5c678f21f33415d0195 Merge remote-tracking branch 'csarofeen/devel' into HEAD 16173732ecfafc4797e93c2449cfb778015a6c7a Add `TensorViewBuilder::shape(std::vector<Val*> shape)` (#1884) 7cfb7796bdcf055eb61d600b7b5c9df292950290 Merge pull request #1887 from csarofeen/upstream_merge_0803 3399f6de62061d30781de50ef1862bbfb1615173 Merge remote-tracking branch 'origin/viable/strict' into HEAD 01208f5bba3bc158d41ccbefa0ee2c5ceea7aedb Add `UnaryOpType::Print` which can be helpful for debugging (#1878) 0646522454aa715ef164c88a73fb8bdddc706805 Remove redundant TORCH_INTERNAL_ASSERT in lower_magic_zero.cpp (#1881) 7bc76aa219293a59e4166e258d76289fe13633ca Fix most inlined propagator for mismatched dims (#1875) 501f4aa270bf4dd47b0d2f4860bc6f23ebc32a38 Nonaffine swizzle formulation ep.2: Loop swizzle variant. (#1826) d863d690f923047a85b5229a787118708f810741 Ampere async copy ep.2: circular buffering extension to support pipelined matmul operand load (#1827) e0ae11a61c87cd998e88ddd79a496548171c31e0 Larger sized mma instructions to support full vectorization (#1824) 9bb4cf7a66b098f04c9d95a2d34ab2bceee151b3 fragment iteration to support fully unrolled mma ops (#1823) a48270a18dc2d3accc2626758d14d5858ae55032 Merge all dims in pointwise scheduler (#1872) 172fb3673fb4aaf4c1e889922a4fc5c06cbd59f7 Make MostInlined and BestEffort inline propagation no longer assert replayed (#1868) a64462a5ac2fcf57a177bf36b0f26c61a4e252a4 Allow trivial reduction to be merged (#1871) 440102bcda6eb1dcd42d5fa5aeab9d6b049956bc Symmetric API for BestEffortReplay (#1870) d1caf330c08ea8002f7133ca655bbd5b28c4eb98 Some misc cleanups/refactor split out from #1854 (#1867) 1013eda50be38eac96c00ba781340ac199d5a136 Remove some welford specific logic. (#1864) 51589d36be5a101d06e641fe0400b39028b7cb81 Some cleanups on tests and heuristics params (#1866) a6b3e70da5dee51dbc246347228ea21384e46ac3 Segmenter bug fix, and deterministic iteration ordering. (#1865) 1b665b9b5e562d6f0caba5e7319e83e5df64104f Add nullptr checks to IrBuilder (#1861) 1cd9451d7493f631c2837ba07c1ea93a74e83a15 Simplify matmul scheduling with the new transform propagator. (#1817) bbc1fb9b8c454f557ab9fcf5b1c3cef9b9e136d0 Add leaky_relu operation (#1852) e842a9bab5e9f7289b7ce33ee37a682b22373f49 Minor cleanup in pointwise scheduler (#1858) 9ee850ca2f7f51dd5269bffb1255e485f809282d Fix stringstream usage (#1857) 20a36c1e4f28c4ff9837e56784be2686d17435f3 Improve nsight compute support (#1855) 405910308301097297b55c34d560aab6a360e897 Remove debugging `true \|\|` from getPointwiseHeuristics (#1822) 01117bfe8fdfacdbfdcfba9a624cdf900fe044d4 Misc cleanup (#1853) 5cc64943dc381a568223140bce0f22163c01e29f Apply the magic-zero protection to each indexed domain individually for predicate indexing (#1846) 92e6f0207e3a89fe90fd5cd3ffc575dfd766ba00 Cleanup normalization scheduler (#1845) db89c6591a2f21130599a93675e0615e55564e41 Type inference patch (#1848) 102fe93a4605ca465cda26ebaee4ba1af2026901 Add debug dump for InlinePropagator (#1847) b7a4d93d375a6e2ddef483763c93ffddc62ec452 Redundant thread compute analysis to avoid un-necessary sync insertion (#1687) 942be5b256056d0e02877361b814ae6af32ca15f Upstream ci build fixes (#1842) 0b83645915029d67f9345aa4649b8c6f62b0061b Fix vectorization bug introduced in #1831 (#1840) 63630f1ae091180e541932a9d9dc598e0a9902dd Move MaxProducerPosUpdater into InlinePropagator::tearDown (#1825) 9135a963c01d97ba34b1a7d2f106e78a13fd6651 Fix transpose benchmark dtype (#1839) 2c9a6c02312d5bf4f83cde653b847b4f85849432 Add extra configurability to `parallelizeAllLike` (#1831) ``` RUN_TORCHBENCH: nvfuser Differential Revision: [D38543000](https://our.internmc.facebook.com/intern/diff/D38543000) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83067 Approved by: https://github.com/davidberard98	2022-08-10 21:02:56 +00:00
Nikita Shulga	62c8d30f9f	[BE] Add `append_cxx_flag_if_supported` macro (#82883 ) And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeListst.txt` Delete `-Wno-unknown-warning-option` to test that conditions indeed working as expected Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883 Approved by: https://github.com/seemethere	2022-08-10 14:32:26 +00:00
PyTorch MergeBot	d3a1f17fc7	Revert "[BE] Add `append_cxx_flag_if_supported` macro (#82883 )" This reverts commit `d7e6aaa59b`. Reverted https://github.com/pytorch/pytorch/pull/82883 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2022-08-10 10:27:59 +00:00
Nikita Shulga	d7e6aaa59b	[BE] Add `append_cxx_flag_if_supported` macro (#82883 ) And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeListst.txt` Delete `-Wno-unknown-warning-option` to test that conditions indeed working as expected Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883 Approved by: https://github.com/seemethere	2022-08-08 21:04:09 +00:00
Sergii Dymchenko	a7299647d3	Re-enable non-contig in qarithmetic_test.py (#82345 ) As https://github.com/pytorch/pytorch/issues/29435 is closed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82345 Approved by: https://github.com/clee2000, https://github.com/huydhn	2022-07-29 00:10:50 +00:00
Will Constable	4f34cd6d1e	Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032 ) Avoid exposing defines that conflict with google logging, since this blocks external usage of libtorch in certain cases. All the 'interesting' changes should be in these two files, and the rest should just be mechanical changes via sed. c10/util/logging_is_not_google_glog.h c10/util/logging_is_google_glog.h Fixes https://github.com/pytorch/pytorch/issues/81415 cc @miladm @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/82032 Approved by: https://github.com/soumith, https://github.com/miladm	2022-07-26 01:20:44 +00:00
jjsjann123	8d753c8062	[WIP] Upstream push 0627 (#80355 ) Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/ Code changes includes: - TransformPropagator refactor: switched to Dijkstra instead of exhaustive enumeration on all possible paths to reduce compilation time on transform propagation; - Indexing refactor: remove reference tensor creation in all tensor indexing logic (#1690) - (more) generic grouped grid reduction kernel; - Minor parser/fuser patches: 1. zero-dim tensor reduction support 3. no-op binary removal within fused graph 4. expand supported in fusion Squashed commits to WAR github API Commits that's actually in this PR from the devel branch: ``` a054b3efcf5af58ea518de283f55aaf9fe06ff5f Refactor TransormPropagator to allow specifying a position and propagating to part of the DAG (#1775) d67e1cda9b802036841a371318014a818a849b0a Indexing refactor stage 1: remove reference tensor creation in all tensor indexing logic (#1690) 1b6529956a1ace220898ad09dde0bf85e49827f7 Issue 1770 (#1774) 35b04276b648c9b55cdb6a67f3889f54e745c3d2 Avoid compilation errors like below: (#1773) 452c77326a340d2a4130b7802f4f319aec60e72a Ignore reductions of zero-dim tensors per PyTorch conventions (#1771) 31d6c56d88afba09ac53b2d5dd3493d625f8cd57 TransformPropagator refactor (#1769) 570c5a84b91a3cf67207331be9650d26a2d37e3d Merge pull request #1767 from csarofeen/upstream_merge_0621 9d6c3d84be86da643df6fd51695543938111f20d merging upstream `61305cd638` 0ed815f76b08f285bda855dd500692ff10a8abce New TransformPropagator algorithm (#1763) 6c195200c0a92fb0f38c833431a8940ed07569b9 no-op binary removal (#1764) ec7fa4187c177186527409dfc5c7b1754d30bc92 Proper propagation of IterType (#1762) b263562dbc3c865007ad7d7d42a58a20be8d7922 Fix dimensionality check (#1759) 2d6343f6cc1e47b63ef20a50d1446f6480736478 More generic grouped grid reduction kernel (#1740) 64e2b56df2c8b9fd22a362d9cc05974a8607ef3d [nvfuser] prevent spamming warning message (#77777) (#1758) 0c431624ff15b6458b9f9b674a3852373fc426b1 [nvFuser] Improving bitwise ops support (#77158) (#1757) b93a14777fde3b9b39684b9cf1715651a806b281 Parser expand (#1754) ``` RUN_TORCHBENCH: nvfuser Pull Request resolved: https://github.com/pytorch/pytorch/pull/80355 Approved by: https://github.com/davidberard98	2022-07-13 19:34:31 +00:00
zaf	55d1b376ea	[ao][sparsity] Vectorized WeightNormSparsifier (#80059 ) The previous implementation was using loops to compute the sparsity within a block in a mask, as well as across the mask blocks. This implements the vectorized version. ## Vectorization: A high level overview of the vectorization procedure falls into a two step process: ### Tensor-level masking A tensor-level masking is a mask generation routine that has a granularity of `sparse_block_shape`. That means that only patches of that shape can be considered sparse/dense. To vectorize: 1. Reshape the data such that one of the dimensions represents the patches of sparse_block_shape. 2. Create a mask of the same shape as the reshaped data 3. Find the smallest `k` elements in the the data, given the dimension of the sparse "patches". `k` represents a derived paramter specifying the sparsity level. 4. Apply the 0/1 to the patches in the mask 5. Reshape the mask back to the original dimensions Note: because the shape of the mask might not be multiple of the sparse_block_shape, we nudge the sshape of the mask, and truncate it afterwards. ## Block-level masking A block-level masking is a mask generation routine that concerns itself only with sparsity within a patch of shape `sparse_block_shape`. This is useful when block sparsity allows partial block sparsification. To vectorize: Overall the block-level masking follows the same routine as the tensor-level algorithm described above. One distinction is that when reshaping the data/mask tensors we aim for creating a dimension that captures the internals of each patch. For example, if a `sparse_block_shape` is `(2, 2)`, we want to reshape the data/mask into `(2, 2, -1)`. That allows us to sort the internal elements on the last axis, and zero-out the ones that obey the sparse logic. Differential Revision: [D37352494](https://our.internmc.facebook.com/intern/diff/D37352494/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37352494/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/80059 Approved by: https://github.com/jerryzh168	2022-07-12 19:16:44 +00:00
Akshay Parashar	38169c2287	[Static Runtime] Fix precision error in test cases (#80935 ) Summary: - Test cases related to DeepAndWideSciptModel() was crashing at random due to precision issue - test cases related for precision: DeepWide, KWargsAPI_1, KWargsAPI_2, KWargsAPI_Optional, FusionPass - test failure was not observed always due to random input to the model (via torch::randn) - Increasing the absolute tolerance for test cases Differential Revision: D37639067 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80935 Approved by: https://github.com/mikeiovine	2022-07-06 16:31:18 +00:00
Hui Guo	a622e3e14d	[static runtime] Fix linalg_solve test case (#79971 ) Summary: Added the third bool argument in inalg_solve test case to remove the runtime error. Test Plan: buck run mode/opt caffe2/benchmarks/static_runtime:static_runtime_cpptest Reviewed By: mikeiovine Differential Revision: D37324419 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79971 Approved by: https://github.com/tenpercent	2022-06-22 16:31:25 +00:00
Max Podkorytov	bf75708ce4	[static-runtime] add nnc codegen for aten::div (#76903 ) Differential Revision: D36151087 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76903 Approved by: https://github.com/mikeiovine	2022-06-22 05:47:44 +00:00
Hui Guo	0545c85f74	[static runtime] Add JIT prim ops: aten::cpu, aten::list, aten::numel, aten::__range_length (#79111 ) Summary: This adds the missing jit prim ops appear in the non ads models for c2->pt mitigation: aten::cpu, aten::list, aten::numel, aten::__range_length Test Plan: static runtime unit tests Differential Revision: D36984960 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79111 Approved by: https://github.com/davidberard98	2022-06-18 16:38:58 +00:00
jjsjann123	c9c402eae9	[nvfuser_upstream_push] Reland: nvfuser code base bump 060822 (#79406 ) Landing reverted PR #79147. Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/ Bug fixes and minor refactor Squashed commits to WAR github API Commits that's actually in this PR from the devel branch: ``` 4c60e7dff22a494632370e5df55c011007340d06 Add examples infrastructure for using nvFuser in a standalone program (#1725) 02a05d98334ffa580d73ccb28fdb8c577ad296fe Fix issue #1751 (#1753) 8a69aa320bd7629e1709fe5ceb7104d2c88ec84c Refactor NvFuser transpose API to match eager mode behavior (#1746) ffdf6b7709048170d768217fcd7083fc8387f932 Remove BroadcastWithoutStride. (#1738) 02bab16035e70734450c02124f5cdaa95cf5749d Fix flipping of a boolean flag (#1745) 465d66890c8242e811224359cbdb1c2915490741 cleanup (#1744) 26d354e68720bc7dd2d3b1338ac01b707a230b6a fixing noncontig broadcast (#1742) 856b6b2f9073662dd98ca22ba6c3540e20eb1cdd Add IterDomainBuilder (#1736) 1fd974f912cd4c1e21cbd16e2abb23598d66a02f fixing warning for gcc7 (#1732) de2740a43a869f8272c2648e091d7b8235097db9 disabling complex in python tests for #1730 (#1733) fbbbe0a2e7c7a63e0e2719b8bfccb759b714221a fixing MSVC build (#1728) b5feee5e2b28be688dbddc766f3c0220389c8175 Fix the fused reduction runtime kernel (#1729) 5247682dff5980bb66edf8d3aac25dea2ef2ced5 Re-entrant GroupedGridReduction (#1727) ``` RUN_TORCHBENCH: nvfuser Pull Request resolved: https://github.com/pytorch/pytorch/pull/79406 Approved by: https://github.com/davidberard98	2022-06-16 17:52:21 +00:00
Hui Guo	aee9762a51	[static runtime] Disable unit test for linalg_svdvals (#79574 ) Summary: The test is throwing a jit alias analysis not supporting error. Disabling it for now. Test Plan: buck run mode/opt caffe2/benchmarks/static_runtime:static_runtime_cpptest Reviewed By: mikeiovine Differential Revision: D37056032 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79574 Approved by: https://github.com/mikeiovine	2022-06-15 20:35:19 +00:00
Hui Guo	8d7fcfa8f1	[static runtime] Add native ops: aten::index_put, aten::item, aten::tensor_split (#79065 ) Summary: This adds the pytorch operators that are currently missing in non-ads models from c2->pt mitigation: aten::index_put, aten::item, aten::tensor_split Test Plan: buck run mode/opt caffe2/benchmarks/static_runtime:static_runtime_cpptest Differential Revision: D36984961 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79065 Approved by: https://github.com/davidberard98	2022-06-15 19:15:34 +00:00
PyTorch MergeBot	d28e9e145b	Revert "[nvfuser_upstream_push] nvfuser code base bump 060822 (#79147 )" This reverts commit `49c41b87a2`. Reverted https://github.com/pytorch/pytorch/pull/79147 on behalf of https://github.com/janeyx99 due to Broke 11.3 builds on trunk `49c41b87a2`	2022-06-10 20:55:10 +00:00
jjsjann123	49c41b87a2	[nvfuser_upstream_push] nvfuser code base bump 060822 (#79147 ) Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/ Bug fixes and minor refactor Squashed commits to WAR github API Commits that's actually in this PR from the devel branch: ``` 4c60e7dff22a494632370e5df55c011007340d06 Add examples infrastructure for using nvFuser in a standalone program (#1725) 02a05d98334ffa580d73ccb28fdb8c577ad296fe Fix issue #1751 (#1753) 8a69aa320bd7629e1709fe5ceb7104d2c88ec84c Refactor NvFuser transpose API to match eager mode behavior (#1746) ffdf6b7709048170d768217fcd7083fc8387f932 Remove BroadcastWithoutStride. (#1738) 02bab16035e70734450c02124f5cdaa95cf5749d Fix flipping of a boolean flag (#1745) 465d66890c8242e811224359cbdb1c2915490741 cleanup (#1744) 26d354e68720bc7dd2d3b1338ac01b707a230b6a fixing noncontig broadcast (#1742) 856b6b2f9073662dd98ca22ba6c3540e20eb1cdd Add IterDomainBuilder (#1736) 1fd974f912cd4c1e21cbd16e2abb23598d66a02f fixing warning for gcc7 (#1732) de2740a43a869f8272c2648e091d7b8235097db9 disabling complex in python tests for #1730 (#1733) fbbbe0a2e7c7a63e0e2719b8bfccb759b714221a fixing MSVC build (#1728) b5feee5e2b28be688dbddc766f3c0220389c8175 Fix the fused reduction runtime kernel (#1729) 5247682dff5980bb66edf8d3aac25dea2ef2ced5 Re-entrant GroupedGridReduction (#1727) ``` RUN_TORCHBENCH: nvfuser Pull Request resolved: https://github.com/pytorch/pytorch/pull/79147 Approved by: https://github.com/davidberard98	2022-06-10 19:37:42 +00:00
Akshay Parashar	28f87b9cf9	[Static Runtime] Fix aten::clone out variant (#78297 ) (#78322 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/78297 Clone followed by expand/expand_as due to memoryOverlap check on copy_ native method. Refer to T118519310 for more details. Crashing test case: a = tensor(3,1) // strides = (1,1) B = tensor(3,2) // strides = (2,1) Temp = a.expand_as(b). // creates temp with shape as (3,2) and strides as (1,0) temp.clone() // crashe on copy_ due to memoryOverlap Fix: Disable the out variant for the expanded tensor. - Calls native clone instead of out variant for clone dealing with expanded tensors - Added test case for both clone variants (out and native clones) - Increased the tensor size for memory planner test case to trigger dynamic allocation Test Plan: buck test caffe2/benchmarks/static_runtime/fb:test_fb_operators buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest Differential Revision: D36672180 Pull Request resolved: https://github.com/pytorch/pytorch/pull/78322 Approved by: https://github.com/mikeiovine	2022-06-02 21:06:59 +00:00
Max Podkorytov	ebfc70f37a	[static-runtime] out variant for aten::mean (#78161 ) Summary: As subject Test Plan: Added unit tests Differential Revision: D36614633 Pull Request resolved: https://github.com/pytorch/pytorch/pull/78161 Approved by: https://github.com/mikeiovine	2022-06-02 20:56:42 +00:00
Max Podkorytov	2679755bdc	[static-runtime] out variant for aten::max (#78271 ) Summary: Previously the op was auto-generated but it only covered the pointwise overload of aten::max. This adds support for reduction, overall and along a dim Test Plan: Added a unit test Differential Revision: D36656378 Pull Request resolved: https://github.com/pytorch/pytorch/pull/78271 Approved by: https://github.com/mikeiovine	2022-05-26 23:29:27 +00:00
Hui Guo	d12bf9fd75	[static_runtime] Add auto-generated view ops (#77106 ) Summary: This includes the generated view ops from D36258767. Test Plan: buck run mode/opt //caffe2/benchmarks/static_runtime:static_runtime_cpptest Differential Revision: D36258968 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77106 Approved by: https://github.com/alanwaketan, https://github.com/tenpercent	2022-05-26 03:13:59 +00:00
mikeiovine	56c23f5633	[SR] Out variant for embedding_bag_byte_unpack Pull Request resolved: https://github.com/pytorch/pytorch/pull/77661 Add an out variant and wrapper in static runtime. I just added the declaration with the others in `qembeddingbag.h` for now (rather than properly adding the out variant to the torch library). This can be fixed in a followup. Differential Revision: [D36449840](https://our.internmc.facebook.com/intern/diff/D36449840/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36449840/)! Approved by: https://github.com/tenpercent	2022-05-25 23:24:11 +00:00
mikeiovine	2ae3c59e4b	[SR] Remove linear/relu fusion Pull Request resolved: https://github.com/pytorch/pytorch/pull/77620 Apparently, this is not implemented in fbgemm, so it's strictly worse than using NNC. Differential Revision: [D36431811](https://our.internmc.facebook.com/intern/diff/D36431811/) Approved by: https://github.com/hlu1	2022-05-23 21:46:27 +00:00
Hao Lu	c60d2ef4eb	[StaticRuntime] Replace Permute with copy version only when it's followed by reshape or flatten (#77832 ) Reviewed By: mikeiovine Differential Revision: D36466622 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77832 Approved by: https://github.com/mikeiovine	2022-05-20 03:14:01 +00:00
jjsjann123	a2802ad0b9	Upstream master bump 0513 (#77471 ) Updating nvfuser code base. This should fix the indexing issue observed in https://github.com/pytorch/vision/issues/6015. Running tests locally as well. Will update the description here at a later point @bypass-github-export-checks Pull Request resolved: https://github.com/pytorch/pytorch/pull/77471 Approved by: https://github.com/seemethere, https://github.com/eellison	2022-05-18 11:48:50 -07:00
mikeiovine	02713221e3	[SR] Fuse clamp/nan_to_num Pull Request resolved: https://github.com/pytorch/pytorch/pull/77094 Fuse `clamp` and `nan_to_num` in an NNC kernel. This leads to a big speed up on many models. We can avoid comparisons since clamp potentially gets rid of all of the `inf`s in the input tensor. Differential Revision: [D36220967](https://our.internmc.facebook.com/intern/diff/D36220967/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36220967/)! Approved by: https://github.com/navahgar	2022-05-10 23:33:59 +00:00
Mike Iovine	849984a2cd	[SR] Sigmoid out variant calls fast_sigmoid (#75661 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75661 `fast_sigmoid` is a variant of sigmoid in NNC that is implemented in terms of `fast_tanh` (which is a fast rational function approximation). ghstack-source-id: 155604086 Reviewed By: navahgar, hlu1 Differential Revision: D35481390 fbshipit-source-id: 1d64b5c375539f3b2461a1f3d9b86cd696eae7a1 (cherry picked from commit 8106c2512b8d7b373cb6545a43c3e8fc04805c4b)	2022-05-06 00:14:30 +00:00

1 2 3 4 5 ...

757 Commits