Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64879
This change makes the output of `prim::TupleConstruct` alias only with its inputs *when* the created tuple is directly returned from the graph.
The same treatment could be made to any tuples newly constructed by `prim::TupleConstruct` if they do not let their elements escape. However, this change only focuses on only one simplest, but frequently used usecase: tuples constructed only to be returned from a graph. This usecase turns out to be very often used.
Test Plan:
Added
- `AliasMoveForTupleConstructWithSingleUseAsGraphOutput`
- `WildcardAliasForTupleConstructWithUses`
to cover the newly added code.
Reviewed By: eellison
Differential Revision: D30437737
fbshipit-source-id: 417fbc6bc348062e60e7acdddd340d4754d090eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64824
See comment in function_schema.h for explanation. I claim that this is a good tradeoff because the aliasing information seems to be used only in compiler-ish code paths, where performance isn't as critical as actual execution. If performance is important there too, perhaps we should hoist isWrite into the Argument itself since there are several paths that only care about isWrite.
ghstack-source-id: 138958896
Test Plan: CI, profile schema parsing on startup and see much fewer page faults in createArgumentVector.
Reviewed By: suo
Differential Revision: D30860719
fbshipit-source-id: 1d4d2328f2b8e34f5ddf9d82083fd4dd7b7f738f
Summary:
1. Enable support for operators with default args and out args. For `torch.add(x, h, out=x)`, the number of specified arguments will be 3 instead of 4.
2. Bump bytecode version from 6 to 7
3. Implement backport_v7_to_v6 function. Also slightly refactor the local_thread to allow re-emit operators.
4. unittest to cover backport function
5. Update expect result from 4 to 3 in unit test DefaultArgsWithOutArg to cover the number of specified arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63651
ghstack-source-id: 138539912
Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions
```
Reviewed By: raziel, tugsbayasgalan
Differential Revision: D30454080
fbshipit-source-id: 357c50b96682430675142d20d688d1f64e1de307
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65179
This is following up this PR: https://github.com/pytorch/pytorch/pull/61862. The purpose is to modularize operator parsing so that it can be used as needed without pulling the whole `import.cpp` into build.
Test Plan: Added a unit test in `test_lite_predictor.cpp` called `ParseOperators`, similar to `ParseBytecode`.
Reviewed By: iseeyuan
Differential Revision: D31006555
fbshipit-source-id: c38e221800af4cf72963a353c452c5437f56a0ac
Summary:
Syncing nvfuser code base from devel branch, Listing a few of our development since last sync:
- Extends support to normalization and reduction kernels.
- Multiple kernel launch for single `CudaFusionGroup`. Hierarchical caching system has been updated to cache graph segmentation.
- profile_ivalue is enabled to convert dynamic scalar into compile time constants, which are required by the codegen. (e.g. reduction axes).
To keep this PR simple and relatively review-free. We stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle.
internal updates are files located in:
1. updates in nvfuser codegen `torch/csrc/jit/coddgen/cuda`
2. added nvfuser specific benchmarks `benchmarks/cpp/nvfuser`
3. nvfuser jit cpp tests `test/cpp/jit/test_gpu.cpp` `test/cpp/jit/test_gpu_shift.cpp` `test/cpp/jit/test_gpu_validator.h`
updates affecting integration:
1. profile_ivalue enabled for nvfuser. related changes are in `torch/csrc/jit/runtime/*`,
2. exposed a few more symbols `aten/src/ATen/core/*` used by codegen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745
Reviewed By: saketh-are
Differential Revision: D30752939
Pulled By: malfet
fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61862
Modularize functions of parsing bytecode tables so that they can be used as needed in situations other than mobile lite interpreter.
* The decoupled functions are re-used by current lite interpreter loader.
* The bytecode can be serialized/deserialized from other formats.
* The decoupled functions have minimum dependencies on other PyTorch components.
Next:
Build a driver binary to include the parser and interpreter, but only has necessary dependency on other PyTorch components.
ghstack-source-id: 137867287
Test Plan:
As an example, a simple bytecode is parsed to a mobile function, and directly run in the added unit test, `RunTimeTest:ParseBytecode`. It contains basic control flow (if, else) and basic data orchestration (list construction).
CI
Reviewed By: larryliu0820
Differential Revision: D29798382
Pulled By: iseeyuan
fbshipit-source-id: 1c173a5f5d37097e3a97baec3f3e48e1eea1400f
Summary:
This PR is created to replace https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. Reason for needing a replacement is due to a messy Sandcastle issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234
Reviewed By: gmagogsfm
Differential Revision: D30656444
Pulled By: ansley
fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a
Summary:
1. Allow consuming operators with defaults arguments and out arguments. Flag is off to keep the same behavior as v6, in pr 63651, turn on the flag.
2. Add two unittests to cover this type of operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63540
ghstack-source-id: 137211562
Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
```
Reviewed By: raziel, iseeyuan, tugsbayasgalan
Differential Revision: D30414156
fbshipit-source-id: 0f3a219a22aee10ac53184cbd95940726c459d1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64307
Original commit changeset: 0b2aa7c57d08
Restores original changes.
This diff changes the way operator profiling is done in lite predictor
benchmarking binary.
Instead of using custom callbacks it uses KinetoEdgeCPUProfiler to profile
events and then generate operator level metric from it.
Since KinetoEvents do not contain cpu clock time, now we report only wallclock
time.
This unifies various profiling effort that we have for benchmarking purpose. In
production we will still use observer based mechanism, but the advantage of
using kineto profiler is that we get few other things for free, such as:
chrome trace generation.
operator level memory profiling (to be added)
flop counts (to be added)
Furthermore possible we can use python post processing script to parse chrome
trace and generate output similar to torch.profiler. (To be done)
Furthermore removes some tests from test_lite_interpreter.cpp which were testing module hierarchy in debug info. They should be covered by test_mobile_profiler.cpp.
Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and --print_module_info true (see Operator summary has now module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985
Reviewed By: raziel
Differential Revision: D30680354
fbshipit-source-id: b6ba0d59c510c13d13d9935b1d8051cc82ffa4e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63367
This diff changes the way operator profiling is done in lite predictor
benchmarking binary.
Instead of using custom callbacks it uses KinetoEdgeCPUProfiler to profile
events and then generate operator level metric from it.
Since KinetoEvents do not contain cpu clock time, now we report only wallclock
time.
This unifies various profiling effort that we have for benchmarking purpose. In
production we will still use observer based mechanism, but the advantage of
using kineto profiler is that we get few other things for free, such as:
- chrome trace generation.
- operator level memory profiling (to be added)
- flop counts (to be added)
Furthermore possible we can use python post processing script to parse chrome
trace and generate output similar to torch.profiler. (To be done)
Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and `--print_module_info true` (see Operator summary has now module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985
Reviewed By: raziel
Differential Revision: D30327514
fbshipit-source-id: 3bb2f2daaaedfb04bd6f5d9c91292783f9c4344f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414
Misuse of raw pointer in here where stack is never nullable.
ghstack-source-id: 136938318
Test Plan:
compiles.
Imported from OSS
Reviewed By: ejguan
Differential Revision: D30375410
fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62755
After this change, out argument can be checked by calling is_out()
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D30415256
Pulled By: tugsbayasgalan
fbshipit-source-id: b2e1fa46bab7c813aaede1f44149081ef2df566d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63578
Added a new op `prim::VarStack` and a pass that transforms instances of `aten::stack(list, dim)` into `prim::VarStack(list[0], ..., list[n], dim)`. Also provided a JIT interpreter implementation.
Most of the implementation/tests are the same as `prim::VarConcat`.
Test Plan: `buck test caffe2/test/cpp/jit:jit -- TestStackOpt`
Reviewed By: navahgar
Differential Revision: D30426232
fbshipit-source-id: 9829a7db6e0a5038c9b7528c43c25b0c221aa2ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63577
Since other variadic ops will have an almost identical implementation, we can generalize the `UseVariadicCat` implementation and put it in a common folder.
Also moved some test utilities that other variadic op tests will likely need.
Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOptTest`
Reviewed By: navahgar
Differential Revision: D30409937
fbshipit-source-id: 925c11c27b58ce98cb8368d2a205e26ba66d3db9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63348
This change addresses singlaiiit's comment on D30241792 (61b49c8e41), which makes the JIT interpreter's behavior consistent between `future` is set and not.
Test Plan: Enhanced `EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException` to cover the modified code path.
Reviewed By: singlaiiit
Differential Revision: D30347782
fbshipit-source-id: 79ce57283154ca4372e5341217d942398db21ac8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419
This diff adds support for cpu only kineto profiler on mobile. Thus
enabling chrome trace generation on mobile. This bring cpp API for
mobile profiling on part with Torchscript.
This is done via:
1. Utilizating debug handle annotations in KinetoEvent.
2. Adding post processing capability, via callbacks, to
KinetoThreadLocalState
3. Creating new RAII stype profiler, KinetoEdgeCPUProfiler, which can be
used in surrounding scope of model execution. This will write chrome
trace to the location specified in profiler constructor.
Test Plan:
MobileProfiler.ModuleHierarchy
Imported from OSS
Reviewed By: raziel
Differential Revision: D29993660
fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62417
This diff adds an option to make enableProfiler enable callbacks only
for certain RecordScopes.
Why?
Profiling has some overhead when we repeatedly execute callbacks for
alls copes. On mobile side when we often have small quantized models
this overhead can be large. We observed that by only profiling top level
op and skipping profiling of other atend ops called within we can limit
this overhead. For example, instead of profling at::conv2d -> at::convolution ->
at::convolution_ and further more if ops like transpose etc. are called,
skipping profiling of those. Of course this limits the visibility, but
at the least this way we get a choice.
Test Plan: Imported from OSS
Reviewed By: ilia-cher
Differential Revision: D29993659
fbshipit-source-id: 852d3ae7822f0d94dc6e507bd4019b60d488ef69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62228
This diff adds debug handles to events and provides a way to use
RECORD_FUNCTIONs that will pass debug_handles down to profiler, which
will record it in the events.
Why add debug_handles?
For pytorch mobile, with lite interpreter, we generate debug handles
that can be used for lazily symbolicate exception traces to model level
stack trace. Similar to the model level stack trace you get in
TorchScript models. The debug_handles also enable getting module
hierarchy for lite interpreter model, support for which was added to
KinetoProfiler in previous diffs.
Followup plan:
1. Enabled scope callbacks such that lite interpreter can use it to
profiler only top level ops.
2. Enable post processing callbacks that take KinetoEvents and populate
module hierarchy using debug handles.
This will let us use KinetoProfiler for lite interpter use cases on
mobile. Aim is to use RAII guard to similarly generate chrome trace for
mobile usecases as well, although only for top level ops.
Test Plan:
test_misc : RecordDebugHandles.Basic
Imported from OSS
Reviewed By: ilia-cher
Differential Revision: D29935899
fbshipit-source-id: 4f06dc411b6b5fe0ffaebdd26d3274c96f8f389b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63073
It turned out that it's less than ideal to print out verbose stacktrace in exception messages in high-QPS services (see the related task) with a non-significant failure rate due to the truncation of long stacktrace which results in losing the original exception message thrown from native code. It is actually desirable to retain only the message of the original exception directly thrown from native code in such a usecase.
This change adds a new flag `torch_jit_disable_exception_stacktrace` to the pytorch jit interpreter to suppress stacktrace in the messages of exception thrown from the interpreter.
Reviewed By: Krovatkin
Differential Revision: D30241792
fbshipit-source-id: c340225c69286663cbd857bd31ba6f1736b1ac4c
Summary:
Add `-Wno-writable-strings`(which is clang's flavor of `-Wwrite-strings`) to list of warnings ignored while compiling torch_python.
Avoid unnecessary copies in range loop
Fix number of signed-unsigned comparisons
Found while building locally on M1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62930
Reviewed By: albanD
Differential Revision: D30171981
Pulled By: malfet
fbshipit-source-id: 25bd43dab5675f927ca707e32737ed178b04651e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61895
* Add FLOP count for addmm, should be `2*m*n*k`.
Share the same code path for `addmm` and `mm`.
Test Plan:
Imported from OSS
`python test/test_profiler.py`
Run a sample profile and check that FLOPS for `aten::addmm` is correct.
`[chowar@devbig053.frc2 ~/local/pytorch/build] ninja bin/test_jit`
`[chowar@devbig053.frc2 ~/local/pytorch/build] ./bin/test_jit --gtest_filter='ComputeFlopsTest*'`
Reviewed By: dskhudia
Differential Revision: D29785671
fbshipit-source-id: d1512036202d7234a981bda897af1f75808ccbfe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63005
Realized I forgot to move the Runtime half of these functions be within the struct.
Test Plan: ci
Reviewed By: pavithranrao
Differential Revision: D30205521
fbshipit-source-id: ccd87d7d78450dd0dd23ba493bbb9d87be4640a5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61477
It would be nice if the compatibility api was just kinda plug and play with no care about the internals of the api at all. Thats what this diff aims to provide.
The general usage would be something like
< On the Client >
RuntimeCompatibilityInfo runtime_info = get_runtime_compatibility_info();
.
.
.
< On the Server >
ModelCompatibilityInfo model_info = get_model_compatibility_info(<model_path>);
bool compatible = is_compatible(runtime_info, model_info);
Currently RuntimeCompatibilityInfo and ModelCompatibilityInfo are exactly the same, but it seemed feasible to me that they may end up diverging as more information is added to the api (such as a min supported bytecode version being exposed from the runtime).
Test Plan: unit test and ci
Reviewed By: dhruvbird, raziel
Differential Revision: D29624080
fbshipit-source-id: 43c1ce15531f6f1a92f357f9cde4e6634e561700
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62306
Test to see if caching of operators works as expected. When caching operators during model load we look up using the operator name. This test ensures that even if there are multiple operators with the same name (in the same model), the caching distinguishes between the ones that have a different number of arguments specified during the call in the serialized bytecode.
In this specific test, there's a model with 3 methods, 2 of which return a `float32` tensor and one which return an `int64` dtype. Please see the comments in the diff for details.
ghstack-source-id: 134634613
Test Plan:
Test command:
```
cd fbsource/fbcode/
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.OperatorCacheDifferentiatesDefaultArgs'
```
```
cd fbsource/
buck test xplat/caffe2:test_lite_interpreter
```
Reviewed By: raziel
Differential Revision: D29929116
fbshipit-source-id: 1d42bd3e6d33128631e970c477344564b0337325
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62305
Currently, it's super time consuming to run a lite interpreter test from fbcode since it takes > 10 minutes to build. Recently, I haven't been able to do that either due to low disk space.
Having this test available in fbsource/xplat/ is a great win for productivity since I can re-run it in ~2 minutes even after significant changes!
I've had to disarm some tests that can only run in OSS of fbcode builds (since they need functionality that we don't include for on-device FB builds). They are disarmed using the macro `FB_XPLAT_BUILD`.
ghstack-source-id: 134634611
Test Plan: New test!
Reviewed By: raziel, JacobSzwejbka, cccclai
Differential Revision: D29954943
fbshipit-source-id: e55eab14309472ef6bc9b0afe0af126c561dbdb1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61983
Trial #2. The previous PR (https://github.com/pytorch/pytorch/pull/61498) was reverted because this caused a failure in `pytorch_linux_backward_compatibility_check_test`. Fixed that now by adding to the exception list in `check_backward_compatibility.py`.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D29828830
Pulled By: navahgar
fbshipit-source-id: 947a7b1622ff6e3e575c051b8f34a789e105bcee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61933
### Issue:
SubModules with same name are not serialized correctly in bytecode format while using `_save_for_mobile`. These submodules are not distinguished as different modules even though they have different foward, setstate etc if they have the same name.
### Fix:
Mangler creates unique names so that modules and submodules that have same names can be uniquely identified while saving the module. iseeyuan rightly pointed out the underlying issue that mangler is not used in the process of saving bytecode and hence unique references for the submodules are not created. Please refer to the notebook to repro the issue: N777224
### Diff:
The above idea of fix is implemented. The mangled names are used in bytecode thereby the files in `code/` directory now have right reference to the `bytecode.pkl`
Will this have backward compatibility?
iseeyuan please feel free to correct or update this.
Yes. This fix impacts only modules with same name sub modules which were not serialized correctly before. Existing modules should have correct references and `_load_for_mobile` must not see any change. To confirm this the existing test cases need to pass for the diff to be approved and shipped.
ghstack-source-id: 134242696
Test Plan:
```
~/fbsource/fbcode > buck test caffe2/test/cpp/jit:jit -- BackendTest.TestCompositeWithSetStates
Downloaded 0/5 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 19.2 sec (100%) 17619/17619 jobs, 3/17619 updated
Total time: 19.5 sec
More details at https://www.internalfb.com/intern/buck/build/91542d50-25f2-434d-9e1a-b93117f4efe1
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: de9e27cf-4c6c-4980-8bc5-b830b7c9c534
Trace available for this run at /tmp/tpx-20210719-161607.659665/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425127206388
✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (8.140)
✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestCompositeWithSetStates (0.528)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425127206388
```
```
~/fbsource/fbcode > buck test caffe2/test/cpp/jit:jit -- BackendTest.TestConsistencyOfCompositeWithSetStates
Building: finished in 4.7 sec (100%) 6787/6787 jobs, 0/6787 updated
Total time: 5.0 sec
More details at https://www.internalfb.com/intern/buck/build/63d6d871-1dd9-4c72-a63b-ed91900c4dc9
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 81023cd2-c1a2-498b-81b8-86383d73d23b
Trace available for this run at /tmp/tpx-20210722-160818.436635/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8725724325952153
✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (7.867)
✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestConsistencyOfCompositeWithSetStates (0.607)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8725724325952153
```
To check the `bytecode.pkl` using module inspector please check:
N1007089
Reviewed By: iseeyuan
Differential Revision: D29669831
fbshipit-source-id: 504dfcb5f7446be5e1c9bd31f0bd9c986ce1a647
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61791
methods from forward
During inlining we attached InlinedCallstack to nodes being inlined. In
the process we attach moodule information as well, such that if
CallMethod is being inlined we know which class instance and class type
the method belongs to. However, CallMethod can be calling a method of
the same object to which the graph belongs. e.g.:
```
def forward(self, input):
x = input + 10
return forward_impl_(x, input)
```
Here forward_impl is method defined on the same class in which forward
is defined. Existing module hierarchy annotation will mislabel this as
unknown instance since the method is not associated with output of
GetAttr node (it would be we had called self.conv.forward_impl_ for
example).
Change in this PR reconciles this by creating a placeholder name "SELF"
for module instance indicating that you can traverse InlinedCallStack
backwards to find first node with name != SELF, which would be the name
of the object.
e.g.:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward
Test Plan:
Add test
Imported from OSS
Reviewed By: larryliu0820
Differential Revision: D29745443
fbshipit-source-id: 1525e41df53913341c4c36a56772454782a0ba93
Summary:
As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH`
All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61594
### Summary:
Added a unit test for the Nnapi delegate's preprocess() function. The
function was previously tested locally, but now a basic test is
added for OSS.
See https://github.com/pytorch/pytorch/pull/61499 for preprocess
implementation. See D29647123 for local testing.
**TODO:**
Add more comprehensive tests.
Add tests for model execution, after the Nnapi delegate's initialization
and execution is implemented T91991928.
**CMakeLists.txt:**
Added a library for the Nnapi delegate
- Explicit linking of torch_python is necessary for the Nnapi delegate's use of pybind
**test_backends.py:**
Added a test for lowering to Nnapi
- Based off https://github.com/pytorch/pytorch/blob/master/test/test_nnapi.py
- Only differences are the loading of the nnapi backend library and the need to change dtype from float64 to float32
### Test Plan:
Running `python test/test_jit.py TestBackendsWithCompiler -v` succeeds. Also saved and examined the model file locally.
Test Plan: Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D29687143
fbshipit-source-id: 9ba9e57f7f856e5ac15e13527f6178d613b32802
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59881
This pass is not included in the JIT flow or anywhere else at this point. The idea is, once this lands, everyone can use this to test their workflow with this transformation and once we are convinced this is useful and/or improves performance, we can include it in the appropriate workflow.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D29277876
Pulled By: navahgar
fbshipit-source-id: b5be7bdcc98dced59295bd7b8f6627619cb58d41