Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64887
BufHandle has exactly the same functionality and should be used instead.
Differential Revision: D30889483
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 365fe8e396731b88920535a3de96bd3301aaa3f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64828
Also, make `removeUnusedSelfArgument` more consistent with other passes
by mutating the graph in-place rather than returning a copy.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D30870776
Pulled By: ZolotukhinM
fbshipit-source-id: 4873f01b013921143a5aa43746d655a2d8d620c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64717
This also exposed several bugs, which are fixed in this PR.
Differential Revision: D30826408
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: a67ec5739aceed9ffdf0d24f77eb3787cefe4560
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64627
This fixes the root cause of S242719.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D30801686
Pulled By: navahgar
fbshipit-source-id: b6d3ebdc7eb57116eaced53c2f35c7798bb17e80
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64589
Adds a softplus operator lowering for NNC and enables element-wise fusion for it as well.
Test Plan: Added a test in test_jit_fuser.py
Reviewed By: bertmaher
Differential Revision: D30736449
fbshipit-source-id: 6c5fc3bceb5cef2322ecd4449f827e4af018ea93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64516
If fuser compilation fails due to a bug (which should be highly
unlikely at this point), we want to tell the user how to unblock themselves by
disabling fusion, in addition to asking that they report a bug.
ghstack-source-id: 137398537
Test Plan: existing tests
Reviewed By: ZolotukhinM
Differential Revision: D30758051
fbshipit-source-id: 98be89f1b1d4fb3bc816f5b2634c618b9297930e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64077
We were assuming kernel dimensions fit in 32 bits (the old fuser made
this assumption too), but we should be able to support 64-bit sizes.
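To make the size check concrete, here is a minimal standalone sketch of the kind of decision involved; the helper name and enum below are illustrative, not the fuser's actual API:
```cpp
#include <cstdint>
#include <limits>
#include <vector>

enum class IndexType { Int32, Int64 };

// Illustrative helper: pick an index width based on the flattened extent.
IndexType pickIndexType(const std::vector<int64_t>& sizes) {
  int64_t numel = 1;
  for (int64_t s : sizes) {
    numel *= s;
  }
  // Fall back to 64-bit indexing when the total number of elements no
  // longer fits in a signed 32-bit integer.
  return numel <= std::numeric_limits<int32_t>::max() ? IndexType::Int32
                                                      : IndexType::Int64;
}
```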
ghstack-source-id: 136933272
Test Plan: unit tests; new IR level test with huge sizes
Reviewed By: ZolotukhinM
Differential Revision: D30596689
fbshipit-source-id: 23b7e393a2ebaecb0c391a6b1f0c4b05a98bcc94
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63923
The input graph can contain constants whose names contain special characters, so all constant names in the input graph need to be sanitized.
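For illustration, a minimal standalone sketch of that kind of sanitization; the helper name and the exact replacement rules below are hypothetical, not the pass's actual implementation:
```cpp
#include <cctype>
#include <string>

// Hypothetical helper: replace characters that are not valid in an
// identifier with underscores, and make sure the result does not start
// with a digit. The real pass may use different rules.
std::string sanitizeConstantName(const std::string& name) {
  std::string out;
  out.reserve(name.size());
  for (char c : name) {
    bool ok = std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    out.push_back(ok ? c : '_');
  }
  if (!out.empty() && std::isdigit(static_cast<unsigned char>(out[0]))) {
    out.insert(out.begin(), '_');
  }
  return out;
}
```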
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63990
Reviewed By: ZolotukhinM
Differential Revision: D30558432
Pulled By: navahgar
fbshipit-source-id: de5b0c23d50ee8997f40f2c0fc605dda3719186f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776
I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack. Let's try again.
I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847
Test Plan: CI
Reviewed By: huiguoo
Differential Revision: D30484555
fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587
Now that there are no classes using KernelArena for memory management, we
can remove it.
Differential Revision: D30429115
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586
This is another commit in the transition away from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr>, and we don't need
to dynamically allocate it at all - it's cheap to pass by value, and
that's what we're switching to in this commit.
After this change nothing uses KernelScope/KernelArena, and they can be
safely removed.
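A rough standalone sketch of the resulting value-type shape of Tensor; the stand-in Buf/Stmt types below are simplified, and shared_ptr is used only to keep the sketch self-contained:
```cpp
#include <memory>
#include <utility>

// Simplified stand-ins for the real NNC node types.
class Buf {};
class Stmt {};
using BufPtr = std::shared_ptr<Buf>;
using StmtPtr = std::shared_ptr<Stmt>;

// Tensor is just a (buf, stmt) pair, so it can live on the stack and be
// copied freely instead of being allocated on a KernelArena.
class Tensor {
 public:
  Tensor(BufPtr buf, StmtPtr stmt)
      : buf_(std::move(buf)), stmt_(std::move(stmt)) {}
  BufPtr buf() const { return buf_; }
  StmtPtr stmt() const { return stmt_; }

 private:
  BufPtr buf_;
  StmtPtr stmt_;
};
```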
Differential Revision: D30429114
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195
This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.
The changes are mechanical and should not affect any functionality.
With this PR, we're changing the following (a small before/after sketch follows the list):
* `Add*` --> `AddPtr`
* `new Add(...)` --> `alloc<Add>(...)`
* `dynamic_cast<Add*>` --> `to<Add>`
* `static_cast<Add*>` --> `static_to<Add>`
Due to some complications with args forwarding, some places became more
verbose, e.g.:
* `new Block({})` --> `new Block(std::vector<ExprPtr>())`
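A hedged, self-contained sketch of what these mechanical renames look like at a use site; the shared_ptr-based aliases and helpers below are illustrative (in this PR the aliases may still wrap raw pointers), and the real node classes live in torch/csrc/jit/tensorexpr:
```cpp
#include <memory>
#include <utility>

// Simplified stand-ins for NNC expression nodes.
class Expr {
 public:
  virtual ~Expr() = default;
};
using ExprPtr = std::shared_ptr<Expr>;

class Add : public Expr {
 public:
  Add(ExprPtr lhs, ExprPtr rhs) : lhs_(std::move(lhs)), rhs_(std::move(rhs)) {}

 private:
  ExprPtr lhs_, rhs_;
};
using AddPtr = std::shared_ptr<Add>;

// Allocation and cast helpers following the mapping above.
template <typename T, typename... Args>
std::shared_ptr<T> alloc(Args&&... args) {
  return std::make_shared<T>(std::forward<Args>(args)...);
}

template <typename T>
std::shared_ptr<T> to(const ExprPtr& e) {  // replaces dynamic_cast<T*>
  return std::dynamic_pointer_cast<T>(e);
}

template <typename T>
std::shared_ptr<T> static_to(const ExprPtr& e) {  // replaces static_cast<T*>
  return std::static_pointer_cast<T>(e);
}

// Usage, mirroring the list above:
//   AddPtr node = alloc<Add>(lhs, rhs);   // was: new Add(lhs, rhs)
//   AddPtr a    = to<Add>(expr);          // was: dynamic_cast<Add*>(expr)
```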
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30292779
Pulled By: ZolotukhinM
fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62336
This PR was generated by removing `const` for all types of nodes in NNC IR, and fixing compilation errors that were the result of this change.
This is the first step in making all NNC mutations in-place.
Test Plan: Imported from OSS
Reviewed By: iramazanli
Differential Revision: D30049829
Pulled By: navahgar
fbshipit-source-id: ed14e2d2ca0559ffc0b92ac371f405579c85dd63
Summary:
The GoogleTest `TEST` macro is non-compliant with the `cppcoreguidelines-avoid-non-const-global-variables` check, as is `DEFINE_DISPATCH`.
All changes except the ones to `.clang-tidy` were generated using the following script:
```bash
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.
I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop
# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
-j \
-s \
-k \
-v \
--paths torch/csrc/ \
-g"-torch/csrc/jit/passes/onnx/helper.cpp" \
-g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
-g"-torch/csrc/jit/serialization/onnx.cpp" \
-g"-torch/csrc/jit/serialization/export.cpp" \
-g"-torch/csrc/jit/serialization/import.cpp" \
-g"-torch/csrc/jit/serialization/import_legacy.cpp" \
-g"-torch/csrc/onnx/init.cpp" \
-g"-torch/csrc/cuda/nccl.*" \
-g"-torch/csrc/cuda/python_nccl.cpp" \
-g"-torch/csrc/autograd/FunctionsManual.cpp" \
-g"-torch/csrc/generic/*.cpp" \
-g"-torch/csrc/jit/codegen/cuda/runtime/*" \
-g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
-g"-torch/csrc/deploy/interpreter/interpreter.h" \
-g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
-g"-torch/csrc/deploy/interpreter/test_main.cpp"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649
Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.
Reviewed By: walterddr, janeyx99
Differential Revision: D29504258
Pulled By: 1ntEgr8
fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60804
The lowerings are stored as a map from c10::Symbol to std::function, and the
signatures of those functions match the signature of
`computeOperandValue`. Custom lowerings take priority over the
standard ones, i.e. we can redefine how a given op is lowered.
In general this feature is aimed at unblocking users whose models
contain ops that are not yet supported by NNC - it makes it possible to quickly
add a custom lowering for a given op.
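A minimal sketch of the shape of such a registry; the lowering-function signature, key type, and registration call below are assumptions for illustration, not the PR's actual interface:
```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-ins: in NNC the key is c10::Symbol and the value type matches the
// signature of computeOperandValue; both are simplified here.
struct Tensor {};
struct ArgValue {};

using CustomLowering =
    std::function<Tensor(const std::vector<ArgValue>& inputs,
                         const std::vector<int64_t>& outputShape)>;

// Hypothetical registry: custom entries are consulted before the standard
// lowerings, so an existing op's lowering can be overridden.
std::unordered_map<std::string, CustomLowering>& customLowerings() {
  static std::unordered_map<std::string, CustomLowering> registry;
  return registry;
}

// Example registration for a (dummy) custom op:
//   customLowerings()["my::op"] =
//       [](const std::vector<ArgValue>&, const std::vector<int64_t>&) {
//         return Tensor{};
//       };
```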
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D29409580
Pulled By: ZolotukhinM
fbshipit-source-id: e8e8dc9d3cb9155cfbf5c08a4216ba1b5b791a60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59347
We had external-call wrappers for them, but they were not used in NNC.
This PR adds lowerings using these external calls and fixes some bugs in
them.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D28853832
Pulled By: ZolotukhinM
fbshipit-source-id: 1718400368e1a9cf3f19180ee2290a4ed9c99d41
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59346
Currently the JIT has a pass to propagate shapes, but it doesn't have the
capability to fill in strides and dtypes. This PR works around that by
assuming the default dtype to be Float and strides corresponding to a
contiguous layout, unless otherwise specified. Ideally, we won't need
this; it is done simply as a workaround until the corresponding
features are implemented on the JIT side.
This is required for AOT compilation of mobilenet v3 with NNC.
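As a concrete illustration of the contiguous-layout assumption, a small standalone sketch of how such default strides can be derived from sizes (not the pass's actual code):
```cpp
#include <cstdint>
#include <vector>

// For a contiguous (row-major) layout, the stride of the last dimension is 1
// and each earlier stride is the product of the sizes that follow it.
std::vector<int64_t> contiguousStrides(const std::vector<int64_t>& sizes) {
  std::vector<int64_t> strides(sizes.size(), 1);
  for (int64_t i = static_cast<int64_t>(sizes.size()) - 2; i >= 0; --i) {
    strides[i] = strides[i + 1] * sizes[i + 1];
  }
  return strides;
}

// e.g. sizes {2, 3, 4} -> strides {12, 4, 1}
```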
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D28853831
Pulled By: ZolotukhinM
fbshipit-source-id: 81adb59409684f39b444909ab8ec58ee4a39d496
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59988
As we broaden operator support, putting all the implementations into
kernel.cpp is getting unwieldy. Let's factor them out into the "operators"
subdirectory.
This diff is big but it's entirely code movement; I didn't change anything,
other than to expose a few utilities in kernel.h.
ghstack-source-id: 131405139
Test Plan: CI
Reviewed By: ZolotukhinM
Differential Revision: D29115916
fbshipit-source-id: ba0df1d8dd4a108b584da3baf168407e966b2c78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59754
Also, if inputs are contiguous, use their Placeholders
directly rather than generating contiguous Tensors from them.
The rationale for this change is that aten::matmul and aten::conv2d
support transposed inputs; if NNC generates a physical transpose to
perform an external call, performance will be strictly worse than not
fusing (sometimes dramatically so, as in the attached benchmark).
Test Plan: benchmark
Reviewed By: ZolotukhinM
Differential Revision: D29010209
fbshipit-source-id: da6d71b155c83e8d6e306089042b6b0af8f80900
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59508
An assert that was triggering in a previous version is now relaxed to
take 0-dim tensors into account.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28918342
Pulled By: ZolotukhinM
fbshipit-source-id: c09b62c9725d1603b0ec11fcc051e7c932af06ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59430
With constant support added, we can now have fusion groups with only
scalar inputs. So, we need to get the device type from the nodes in the graph
rather than just the inputs.
ghstack-source-id: 130613871
Test Plan: new unit test; also see test_tracer test_trace_of_script
Reviewed By: navahgar
Differential Revision: D28891989
fbshipit-source-id: f9e824acbd4856216b85a135c8cb60a2eac3c628
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59279
There were some issues with how we handle 0-dim cases in lowerings and
also in how we generate reductions in that special case. This PR fixes
those issues and reenables a bunch of tests.
Differential Revision: D28819780
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: f3feff35a1ce11821ada2f8d04ae9d4be10dc736
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59157
Currently view is represented as a copy since we don't support in-place
operations in NNC (similar to `aten::reshape`). The lowering for
`aten::expand_as` is exactly the same as for `aten::expand`, since
we're building the TE expression based on the output shape anyway.
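A hedged sketch of the output-shape-driven indexing idea behind an expand-style lowering; names and signatures below are illustrative, not NNC's API:
```cpp
#include <cstdint>
#include <vector>

// The output is defined over the broadcast (output) shape; any input
// dimension of size 1 is read at index 0, and leading output dimensions
// with no matching input dimension are simply dropped.
std::vector<int64_t> expandInputIndices(const std::vector<int64_t>& outIdx,
                                        const std::vector<int64_t>& inSizes) {
  std::vector<int64_t> inIdx(inSizes.size());
  size_t offset = outIdx.size() - inSizes.size();  // assumes outIdx is longer
  for (size_t i = 0; i < inSizes.size(); ++i) {
    inIdx[i] = (inSizes[i] == 1) ? 0 : outIdx[offset + i];
  }
  return inIdx;
}
```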
Differential Revision: D28774224
Test Plan: Imported from OSS
Reviewed By: Chillee
Pulled By: ZolotukhinM
fbshipit-source-id: 0a1593c4c6500dcc5a374213adb734180ae1f72e
Summary:
The triangular_solve lowering only returns the first output, since the second output is just a copy of the input. Why does that exist?
Also, I fixed the permute lowering - previously it was applying the inverse of the permutation.
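A small standalone sketch of the index mapping a permute lowering needs, contrasting the correct mapping with the inverse application mentioned above; names are illustrative, not NNC's API:
```cpp
#include <cstdint>
#include <vector>

// For aten::permute(dims), the j-th output axis reads from input axis
// dims[j], so the input index is built by scattering the output index
// through dims. Gathering instead applies the inverse permutation.
std::vector<int64_t> permuteInputIndex(const std::vector<int64_t>& outIdx,
                                       const std::vector<int64_t>& dims) {
  std::vector<int64_t> inIdx(outIdx.size());
  for (size_t j = 0; j < dims.size(); ++j) {
    inIdx[dims[j]] = outIdx[j];  // correct: scatter
    // buggy inverse application would be: inIdx[j] = outIdx[dims[j]];
  }
  return inIdx;
}
```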
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59131
Reviewed By: ansley
Differential Revision: D28768169
Pulled By: Chillee
fbshipit-source-id: 8e78611c6145fb2257cb409ba98c14ac55cdbccf