jjsjann123
df741c589f
[NVFuser] Upstream push 0809 (#83067)
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/
Code changes include:
- codegen improvements:
1. removes unnecessary sync from redundant thread compute analysis
2. symmetric API for BestEffortReplay
3. support merge on trivial reductions
4. Ampere async copy improvements
- bug fixes:
1. vectorization bug fixes
2. type inference patch: fixes upstream #81725
3. segmenter bug fix with deterministic iteration ordering
- parser update
1. added leaky_relu
- scheduler
1. normalization scheduler cleanup
2. simplifies matmul scheduling with new transform propagator
3. merge all dimensions in PW scheduler
4. various gemm related improvements
- debuggability
1. nsight compute support
2. debug dump for InlinePropagator
3. Add `UnaryOpType::Print`
Squashed commits to work around the GitHub API.
Commits from the devel branch that are actually in this PR:
```
dfe02f3faed4c64477e5f5c678f21f33415d0195 Merge remote-tracking branch 'csarofeen/devel' into HEAD
16173732ecfafc4797e93c2449cfb778015a6c7a Add `TensorViewBuilder::shape(std::vector<Val*> shape)` (#1884 )
7cfb7796bdcf055eb61d600b7b5c9df292950290 Merge pull request #1887 from csarofeen/upstream_merge_0803
3399f6de62061d30781de50ef1862bbfb1615173 Merge remote-tracking branch 'origin/viable/strict' into HEAD
01208f5bba3bc158d41ccbefa0ee2c5ceea7aedb Add `UnaryOpType::Print` which can be helpful for debugging (#1878 )
0646522454aa715ef164c88a73fb8bdddc706805 Remove redundant TORCH_INTERNAL_ASSERT in lower_magic_zero.cpp (#1881 )
7bc76aa219293a59e4166e258d76289fe13633ca Fix most inlined propagator for mismatched dims (#1875 )
501f4aa270bf4dd47b0d2f4860bc6f23ebc32a38 Nonaffine swizzle formulation ep.2: Loop swizzle variant. (#1826 )
d863d690f923047a85b5229a787118708f810741 Ampere async copy ep.2: circular buffering extension to support pipelined matmul operand load (#1827 )
e0ae11a61c87cd998e88ddd79a496548171c31e0 Larger sized mma instructions to support full vectorization (#1824 )
9bb4cf7a66b098f04c9d95a2d34ab2bceee151b3 fragment iteration to support fully unrolled mma ops (#1823 )
a48270a18dc2d3accc2626758d14d5858ae55032 Merge all dims in pointwise scheduler (#1872 )
172fb3673fb4aaf4c1e889922a4fc5c06cbd59f7 Make MostInlined and BestEffort inline propagation no longer assert replayed (#1868 )
a64462a5ac2fcf57a177bf36b0f26c61a4e252a4 Allow trivial reduction to be merged (#1871 )
440102bcda6eb1dcd42d5fa5aeab9d6b049956bc Symmetric API for BestEffortReplay (#1870 )
d1caf330c08ea8002f7133ca655bbd5b28c4eb98 Some misc cleanups/refactor split out from #1854 (#1867 )
1013eda50be38eac96c00ba781340ac199d5a136 Remove some welford specific logic. (#1864 )
51589d36be5a101d06e641fe0400b39028b7cb81 Some cleanups on tests and heuristics params (#1866 )
a6b3e70da5dee51dbc246347228ea21384e46ac3 Segmenter bug fix, and deterministic iteration ordering. (#1865 )
1b665b9b5e562d6f0caba5e7319e83e5df64104f Add nullptr checks to IrBuilder (#1861 )
1cd9451d7493f631c2837ba07c1ea93a74e83a15 Simplify matmul scheduling with the new transform propagator. (#1817 )
bbc1fb9b8c454f557ab9fcf5b1c3cef9b9e136d0 Add leaky_relu operation (#1852 )
e842a9bab5e9f7289b7ce33ee37a682b22373f49 Minor cleanup in pointwise scheduler (#1858 )
9ee850ca2f7f51dd5269bffb1255e485f809282d Fix stringstream usage (#1857 )
20a36c1e4f28c4ff9837e56784be2686d17435f3 Improve nsight compute support (#1855 )
405910308301097297b55c34d560aab6a360e897 Remove debugging `true ||` from getPointwiseHeuristics (#1822 )
01117bfe8fdfacdbfdcfba9a624cdf900fe044d4 Misc cleanup (#1853 )
5cc64943dc381a568223140bce0f22163c01e29f Apply the magic-zero protection to each indexed domain individually for predicate indexing (#1846 )
92e6f0207e3a89fe90fd5cd3ffc575dfd766ba00 Cleanup normalization scheduler (#1845 )
db89c6591a2f21130599a93675e0615e55564e41 Type inference patch (#1848 )
102fe93a4605ca465cda26ebaee4ba1af2026901 Add debug dump for InlinePropagator (#1847 )
b7a4d93d375a6e2ddef483763c93ffddc62ec452 Redundant thread compute analysis to avoid un-necessary sync insertion (#1687 )
942be5b256056d0e02877361b814ae6af32ca15f Upstream ci build fixes (#1842 )
0b83645915029d67f9345aa4649b8c6f62b0061b Fix vectorization bug introduced in #1831 (#1840 )
63630f1ae091180e541932a9d9dc598e0a9902dd Move MaxProducerPosUpdater into InlinePropagator::tearDown (#1825 )
9135a963c01d97ba34b1a7d2f106e78a13fd6651 Fix transpose benchmark dtype (#1839 )
2c9a6c02312d5bf4f83cde653b847b4f85849432 Add extra configurability to `parallelizeAllLike` (#1831 )
```
RUN_TORCHBENCH: nvfuser
Differential Revision: [D38543000](https://our.internmc.facebook.com/intern/diff/D38543000)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83067
Approved by: https://github.com/davidberard98
2022-08-10 21:02:56 +00:00
jjsjann123
c9c402eae9
[nvfuser_upstream_push] Reland: nvfuser code base bump 060822 (#79406)
Landing reverted PR #79147.
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/
Bug fixes and minor refactor
Squashed commits to work around the GitHub API.
Commits from the devel branch that are actually in this PR:
```
4c60e7dff22a494632370e5df55c011007340d06 Add examples infrastructure for using nvFuser in a standalone program (#1725 )
02a05d98334ffa580d73ccb28fdb8c577ad296fe Fix issue #1751 (#1753 )
8a69aa320bd7629e1709fe5ceb7104d2c88ec84c Refactor NvFuser transpose API to match eager mode behavior (#1746 )
ffdf6b7709048170d768217fcd7083fc8387f932 Remove BroadcastWithoutStride. (#1738 )
02bab16035e70734450c02124f5cdaa95cf5749d Fix flipping of a boolean flag (#1745 )
465d66890c8242e811224359cbdb1c2915490741 cleanup (#1744 )
26d354e68720bc7dd2d3b1338ac01b707a230b6a fixing noncontig broadcast (#1742 )
856b6b2f9073662dd98ca22ba6c3540e20eb1cdd Add IterDomainBuilder (#1736 )
1fd974f912cd4c1e21cbd16e2abb23598d66a02f fixing warning for gcc7 (#1732 )
de2740a43a869f8272c2648e091d7b8235097db9 disabling complex in python tests for #1730 (#1733 )
fbbbe0a2e7c7a63e0e2719b8bfccb759b714221a fixing MSVC build (#1728 )
b5feee5e2b28be688dbddc766f3c0220389c8175 Fix the fused reduction runtime kernel (#1729 )
5247682dff5980bb66edf8d3aac25dea2ef2ced5 Re-entrant GroupedGridReduction (#1727 )
```
RUN_TORCHBENCH: nvfuser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79406
Approved by: https://github.com/davidberard98
2022-06-16 17:52:21 +00:00
PyTorch MergeBot
d28e9e145b
Revert "[nvfuser_upstream_push] nvfuser code base bump 060822 (#79147)"
This reverts commit 49c41b87a2.
Reverted https://github.com/pytorch/pytorch/pull/79147 on behalf of https://github.com/janeyx99 due to Broke 11.3 builds on trunk 49c41b87a2
2022-06-10 20:55:10 +00:00
jjsjann123
49c41b87a2
[nvfuser_upstream_push] nvfuser code base bump 060822 (#79147)
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/
Bug fixes and minor refactor
Squashed commits to work around the GitHub API.
Commits from the devel branch that are actually in this PR:
```
4c60e7dff22a494632370e5df55c011007340d06 Add examples infrastructure for using nvFuser in a standalone program (#1725 )
02a05d98334ffa580d73ccb28fdb8c577ad296fe Fix issue #1751 (#1753 )
8a69aa320bd7629e1709fe5ceb7104d2c88ec84c Refactor NvFuser transpose API to match eager mode behavior (#1746 )
ffdf6b7709048170d768217fcd7083fc8387f932 Remove BroadcastWithoutStride. (#1738 )
02bab16035e70734450c02124f5cdaa95cf5749d Fix flipping of a boolean flag (#1745 )
465d66890c8242e811224359cbdb1c2915490741 cleanup (#1744 )
26d354e68720bc7dd2d3b1338ac01b707a230b6a fixing noncontig broadcast (#1742 )
856b6b2f9073662dd98ca22ba6c3540e20eb1cdd Add IterDomainBuilder (#1736 )
1fd974f912cd4c1e21cbd16e2abb23598d66a02f fixing warning for gcc7 (#1732 )
de2740a43a869f8272c2648e091d7b8235097db9 disabling complex in python tests for #1730 (#1733 )
fbbbe0a2e7c7a63e0e2719b8bfccb759b714221a fixing MSVC build (#1728 )
b5feee5e2b28be688dbddc766f3c0220389c8175 Fix the fused reduction runtime kernel (#1729 )
5247682dff5980bb66edf8d3aac25dea2ef2ced5 Re-entrant GroupedGridReduction (#1727 )
```
RUN_TORCHBENCH: nvfuser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79147
Approved by: https://github.com/davidberard98
2022-06-10 19:37:42 +00:00
jjsjann123
a2802ad0b9
Upstream master bump 0513 (#77471)
Updating nvfuser code base.
This should fix the indexing issue observed in https://github.com/pytorch/vision/issues/6015.
Tests are also being run locally; the description here will be updated later.
@bypass-github-export-checks
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77471
Approved by: https://github.com/seemethere, https://github.com/eellison
2022-05-18 11:48:50 -07:00