Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62336
This PR was generated by removing `const` from all types of nodes in the NNC IR and fixing the compilation errors that resulted from that change.
This is the first step in making all NNC mutations in-place.
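To illustrate the point (this is a standalone toy, not the real NNC classes), dropping `const` from node handles is what makes in-place mutation possible at all: with const pointers a pass can only build a rewritten copy, while non-const pointers let it update the node directly.
```cpp
// Toy sketch only; `Node` and the helper names below are made up.
#include <vector>

struct Node {
  std::vector<Node*> operands;
};

// Old style: const handles force "mutation" to mean "allocate a new node".
// (Assumes the node has at least one operand; the copy is leaked here for brevity.)
Node* replaceFirstOperandByCopy(const Node* node, Node* newOperand) {
  Node* copy = new Node(*node);
  copy->operands[0] = newOperand;
  return copy;
}

// New style: non-const handles let a pass mutate the node in place.
void replaceFirstOperandInPlace(Node* node, Node* newOperand) {
  node->operands[0] = newOperand;
}
```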
Test Plan: Imported from OSS
Reviewed By: iramazanli
Differential Revision: D30049829
Pulled By: navahgar
fbshipit-source-id: ed14e2d2ca0559ffc0b92ac371f405579c85dd63
Summary:
Disables the `cppcoreguidelines-avoid-non-const-global-variables` check, as the GoogleTest `TEST` macro is non-compliant with it, as is `DEFINE_DISPATCH`.
All changes except the ones to `.clang-tidy` were generated using the following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.
I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop
# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
-j \
-s \
-k \
-v \
--paths torch/csrc/ \
-g"-torch/csrc/jit/passes/onnx/helper.cpp" \
-g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
-g"-torch/csrc/jit/serialization/onnx.cpp" \
-g"-torch/csrc/jit/serialization/export.cpp" \
-g"-torch/csrc/jit/serialization/import.cpp" \
-g"-torch/csrc/jit/serialization/import_legacy.cpp" \
-g"-torch/csrc/onnx/init.cpp" \
-g"-torch/csrc/cuda/nccl.*" \
-g"-torch/csrc/cuda/python_nccl.cpp" \
-g"-torch/csrc/autograd/FunctionsManual.cpp" \
-g"-torch/csrc/generic/*.cpp" \
-g"-torch/csrc/jit/codegen/cuda/runtime/*" \
-g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
-g"-torch/csrc/deploy/interpreter/interpreter.h" \
-g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
-g"-torch/csrc/deploy/interpreter/test_main.cpp"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649
Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.
Reviewed By: walterddr, janeyx99
Differential Revision: D29504258
Pulled By: 1ntEgr8
fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60804
The lowerings are stored as a map from `c10::Symbol` to `std::function`, and the signature of those functions matches the signature of `computeOperandValue`. Custom lowerings have higher priority than the standard ones, i.e. we can redefine how a given op is lowered.
In general, this feature is aimed at unblocking users whose models contain ops that are not yet supported by NNC: it lets them quickly add a custom lowering for a given op.
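A rough, standalone sketch of the mechanism described above; `LoweringFunc`, `customLowerings`, and `lowerNode` are hypothetical stand-ins (the real NNC map is keyed by `c10::Symbol` and mirrors `computeOperandValue`'s signature).
```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

struct Value {};  // placeholder for an NNC expression/tensor handle

using LoweringFunc =
    std::function<Value(const std::vector<Value>& inputs,
                        const std::vector<int64_t>& outputShape)>;

// Ops map to lowering functions; custom entries are consulted first.
std::unordered_map<std::string, LoweringFunc>& customLowerings() {
  static std::unordered_map<std::string, LoweringFunc> map;
  return map;
}

Value lowerWithStandardLowering(const std::string& op,
                                const std::vector<Value>& inputs,
                                const std::vector<int64_t>& outputShape) {
  return Value{};  // stand-in for the built-in lowerings
}

Value lowerNode(const std::string& op,
                const std::vector<Value>& inputs,
                const std::vector<int64_t>& outputShape) {
  auto it = customLowerings().find(op);
  if (it != customLowerings().end()) {
    // Custom lowerings take priority, so an existing op can be re-lowered.
    return it->second(inputs, outputShape);
  }
  return lowerWithStandardLowering(op, inputs, outputShape);
}
```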
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D29409580
Pulled By: ZolotukhinM
fbshipit-source-id: e8e8dc9d3cb9155cfbf5c08a4216ba1b5b791a60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59347
We had external call wrappers for them, but they were not used in NNC.
This PR adds lowerings using these ext calls and fixes some bugs in
them.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D28853832
Pulled By: ZolotukhinM
fbshipit-source-id: 1718400368e1a9cf3f19180ee2290a4ed9c99d41
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59346
Currently JIT has a pass to propagate shapes, but it doesn't have the capability to fill in strides and dtypes. This PR works around that by assuming the default dtype to be Float and strides corresponding to a contiguous layout, unless otherwise specified. Ideally we won't need this; it is done simply as a workaround until the corresponding features are implemented on the JIT side.
This is required for AOT compilation of mobilenet v3 with NNC.
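For reference, a small standalone sketch of the stride default the pass assumes when only sizes are known: a contiguous, row-major layout (helper name is made up).
```cpp
#include <cstdint>
#include <vector>

std::vector<int64_t> contiguousStrides(const std::vector<int64_t>& sizes) {
  std::vector<int64_t> strides(sizes.size(), 1);
  // Innermost dimension has stride 1; each outer stride is the product of
  // all inner sizes, e.g. sizes {2, 3, 4} -> strides {12, 4, 1}.
  for (int64_t i = static_cast<int64_t>(sizes.size()) - 2; i >= 0; --i) {
    strides[i] = strides[i + 1] * sizes[i + 1];
  }
  return strides;
}
```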
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D28853831
Pulled By: ZolotukhinM
fbshipit-source-id: 81adb59409684f39b444909ab8ec58ee4a39d496
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59988
As we broaden operator support, putting all the implementations into
kernel.cpp is getting unwieldy. Let's factor them out into the "operators"
subdirectory.
This diff is big but it's entirely code movement; I didn't change anything,
other than to expose a few utilities in kernel.h.
ghstack-source-id: 131405139
Test Plan: CI
Reviewed By: ZolotukhinM
Differential Revision: D29115916
fbshipit-source-id: ba0df1d8dd4a108b584da3baf168407e966b2c78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59754
Also, if inputs are contiguous, use their Placeholders
directly rather than generating contiguous Tensors from them.
The rationale for this change is that aten::matmul and aten::conv2d
support transposed inputs; if NNC generates a physical transpose to
perform an external call, performance will be strictly worse than not
fusing (sometimes dramatically so, as in the attached benchmark).
Test Plan: benchmark
Reviewed By: ZolotukhinM
Differential Revision: D29010209
fbshipit-source-id: da6d71b155c83e8d6e306089042b6b0af8f80900
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59508
An assert that was triggering in a previous version is now relaxed to
take 0-dim tensors into account.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28918342
Pulled By: ZolotukhinM
fbshipit-source-id: c09b62c9725d1603b0ec11fcc051e7c932af06ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59430
With constant support added, we can now have fusion groups with only
scalar inputs. So, we need to get the device type from the nodes in the graph
rather than just the inputs.
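A hedged sketch of the idea using the TorchScript IR API; the helper name is made up and the actual code in the PR may be structured differently.
```cpp
#include <torch/csrc/jit/ir/ir.h>

// Scan the fusion group's nodes (not just its inputs) for a value with a
// known tensor device.
c10::optional<at::Device> inferDeviceFromNodes(
    const std::shared_ptr<torch::jit::Graph>& graph) {
  for (torch::jit::Node* node : graph->nodes()) {
    for (torch::jit::Value* output : node->outputs()) {
      if (auto tensorType = output->type()->cast<c10::TensorType>()) {
        if (auto device = tensorType->device()) {
          return device;  // first node output with a concrete device wins
        }
      }
    }
  }
  return c10::nullopt;  // e.g. a graph with only scalar values
}
```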
ghstack-source-id: 130613871
Test Plan: new unit test; also see test_tracer test_trace_of_script
Reviewed By: navahgar
Differential Revision: D28891989
fbshipit-source-id: f9e824acbd4856216b85a135c8cb60a2eac3c628
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59279
There were some issues with how we handle 0-dim cases in lowerings and
also in how we generate reductions in that special case. This PR fixes
those issues and reenables a bunch of tests.
Differential Revision: D28819780
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: f3feff35a1ce11821ada2f8d04ae9d4be10dc736
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59157
Currently view is represented as a copy, since we don't support in-place operations in NNC (similar to `aten::reshape`). The lowering for `aten::expand_as` is exactly the same as for `aten::expand`, since we're building the TE expression based on the output shape anyway.
Differential Revision: D28774224
Test Plan: Imported from OSS
Reviewed By: Chillee
Pulled By: ZolotukhinM
fbshipit-source-id: 0a1593c4c6500dcc5a374213adb734180ae1f72e
Summary:
The `triangular_solve` lowering only returns the first output, since the second output is just a copy of one of the inputs. Why does that output exist?
Also, I fixed the permute lowering: it was previously applying the inverse of the permutation.
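For context, a standalone sketch of the distinction between applying the permutation and applying its inverse (the bug described above); the helper name is illustrative.
```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// For out = in.permute(dims): out.shape[i] == in.shape[dims[i]], and the
// element at output index `outIdx` is read from input index `inIdx` with
// inIdx[dims[i]] = outIdx[i].
std::vector<int64_t> inputIndexForPermute(const std::vector<int64_t>& outIdx,
                                          const std::vector<int64_t>& dims) {
  std::vector<int64_t> inIdx(outIdx.size());
  for (size_t i = 0; i < dims.size(); ++i) {
    inIdx[dims[i]] = outIdx[i];  // correct: scatter output indices
    // The inverse application would be inIdx[i] = outIdx[dims[i]], which
    // permutes the indices in the wrong direction.
  }
  return inIdx;
}
```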
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59131
Reviewed By: ansley
Differential Revision: D28768169
Pulled By: Chillee
fbshipit-source-id: 8e78611c6145fb2257cb409ba98c14ac55cdbccf
Summary:
Finds a few bugs:
1. permute needs to wrap dimensions
2. slice needs to wrap dimensions
3. frac doesn't work correctly for negative values
4. Permute has some other failures.
This PR also fixes bugs 1 and 2.
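A standalone sketch of fixes 1 and 2 (dimension wrapping) and of one plausible source of bug 3; the helper names are made up and the real NNC code differs.
```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Fixes 1 and 2: ops like permute/slice accept negative dimensions, which
// must be wrapped into [0, ndim) before indexing sizes or strides.
int64_t wrapDim(int64_t dim, int64_t ndim) {
  assert(dim >= -ndim && dim < ndim);
  return dim < 0 ? dim + ndim : dim;
}

// Bug 3: frac must truncate toward zero; using floor gives the wrong answer
// for negative inputs (one plausible way to hit the reported bug).
double fracCorrect(double x) { return x - std::trunc(x); }  // frac(-1.5) == -0.5
double fracBuggy(double x)   { return x - std::floor(x); }  // gives 0.5
```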
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58719
Reviewed By: SplitInfinity
Differential Revision: D28590457
Pulled By: Chillee
fbshipit-source-id: a67fce67799602f9396bfeef615e652364918fbd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58676
We only generate asm for small matmuls, but we were computing the # of
flops using an int32, which is too small.
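A minimal illustration of the overflow (the actual counting code in NNC differs; the helper name is made up).
```cpp
#include <cstdint>

// The flop count of an [M, K] x [K, N] matmul is roughly 2 * M * N * K,
// which easily exceeds INT32_MAX for moderately sized matrices:
// M = N = K = 2048 gives ~1.7e10, while int32 tops out at ~2.1e9.
int64_t matmulFlops(int64_t M, int64_t N, int64_t K) {
  return 2 * M * N * K;  // 64-bit arithmetic avoids the overflow
}
```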
Test Plan:
```
buck test mode/dev //caffe2/test:static_runtime -- --exact 'caffe2/test:static_runtime - test_mlp (test_static_runtime.TestStaticModule)'
```
Reviewed By: navahgar
Differential Revision: D28562157
fbshipit-source-id: a07ceba5209ef6022ead09140380c116994755cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58256
Size-1 dims mess up our output restriding logic, because they're
technically "dense" no matter what stride the dimension has. In this example a
size-1 dim has stride 1, which causes all the indices to be taken mod 1 (i.e.,
all indices become 0). We work around this peculiar case by skipping size-1 in
our layout logic, since it has no impact on the rest of the tensor's indexing.
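A standalone sketch of the workaround: when deciding whether an output layout is densely packed, size-1 dimensions are ignored, since any stride is valid for them. The helper name is made up.
```cpp
#include <cstdint>
#include <vector>

bool isDenseRowMajor(const std::vector<int64_t>& sizes,
                     const std::vector<int64_t>& strides) {
  int64_t expected = 1;
  for (int64_t i = static_cast<int64_t>(sizes.size()) - 1; i >= 0; --i) {
    if (sizes[i] == 1) {
      continue;  // a size-1 dim is "dense" regardless of its stride
    }
    if (strides[i] != expected) {
      return false;
    }
    expected *= sizes[i];
  }
  return true;
}
```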
ghstack-source-id: 128932739
Test Plan:
new unit test, plus
```
buck test mode/dev //langtech/mobile/audio_stream_processor:audio_stream_processor_test -- --exact 'langtech/mobile/audio_stream_processor:audio_stream_processor_test - AudioStreamProcessorTest.DemucsReadWriteFloat'
```
Reviewed By: eellison
Differential Revision: D28424388
fbshipit-source-id: e33e39eef2a5bf2797bee78a5987558308b6d110
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57749
Also added this to an FX test.
Test Plan: Imported from OSS
Reviewed By: huiguoo
Differential Revision: D28425974
fbshipit-source-id: 195c7a1944decb7a2a99c2831cab38485f32be17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58028
We were trying to translate the device argument and thus throwing an unsupported-dtype error.
ghstack-source-id: 128748658
Test Plan: predictor models
Reviewed By: navahgar
Differential Revision: D28347704
fbshipit-source-id: 331a5786339e01f9df1b1878970b0c5983a92980
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58026
Cat-without-conditionals is a valuable optimization on CPU, but on GPU it can generate invalid code, since it may introduce allocations (i.e. extra kernel launches).
ghstack-source-id: 128748630
Test Plan: predictor
Reviewed By: navahgar
Differential Revision: D28347703
fbshipit-source-id: f9e68cd7bcf5d316082ce8378ddf99f2d33fcc07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57906
I think it was accidentally flipped in #56875.
Test Plan: Imported from OSS
Reviewed By: Chillee
Differential Revision: D28312947
Pulled By: ZolotukhinM
fbshipit-source-id: 8d0f45e540f47daefbc270f5a2ade87f2171b958
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57552
This method uses `CodeGen::call_raw` instead of `CodeGen::call`.
Relanding #57328 (the entire stack), which was reverted because I forgot to guard a new test with `ifdef LLVM`.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28195047
Pulled By: ZolotukhinM
fbshipit-source-id: bcfd3cb5b4f33a149b7549515ffd705e2c4f208f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57383
Notes: I picked up an activation from https://github.com/pytorch/pytorch/issues/56969. You can look at the [activations.cpp](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/Activation.cpp#L429) file, which has both the forward and backward kernel code, to help write the NNC lowering and the symbolic gradient.
I added a test in test_jit_fuser_te for the fusion, and I added an OpInfo asserting that we expect to see autodiffable nodes, to test the symbolic gradient.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D28197820
Pulled By: eellison
fbshipit-source-id: 05305d85c5bb0847c8f911b95ba47b137dca7e90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57169
The pass is planned to be used in the AOT pipeline, where we expect input graphs to be functional. As such, these graphs should not use the 'self' argument even if it is present, and thus it can be removed safely.
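A hedged sketch of the idea using the TorchScript graph API; the function name is made up, and the actual pass presumably checks more invariants (e.g. that the first input really is the module's 'self').
```cpp
#include <torch/csrc/jit/ir/ir.h>

// Drop the leading 'self' argument of a functional graph when it is unused.
void removeUnusedSelfArgument(const std::shared_ptr<torch::jit::Graph>& graph) {
  if (graph->inputs().empty()) {
    return;
  }
  torch::jit::Value* self = graph->inputs()[0];
  // Functional graphs are expected not to use 'self'; only then is it safe
  // to erase the input.
  if (!self->hasUses()) {
    graph->eraseInput(0);
  }
}
```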
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28128328
Pulled By: ZolotukhinM
fbshipit-source-id: a7dfbf7776682826100c8eb0fef982a2e81c2554
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57076
This pass is intended to be used in conjunction with the shape propagation pass: first we use sample inputs to specify shape info for the graph inputs, and then we run shape-prop to infer the shapes of intermediate values in the graph.
Differential Revision: D28048290
Test Plan: Imported from OSS
Reviewed By: astaff
Pulled By: ZolotukhinM
fbshipit-source-id: 778d772e873d59d77af9f669f45dc44b9ee5e443