Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64887
BufHandle has exactly the same functionality and should be used instead.
Differential Revision: D30889483
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 365fe8e396731b88920535a3de96bd3301aaa3f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61862
Modularize the functions that parse bytecode tables so that they can be used as needed in situations other than the mobile lite interpreter.
* The decoupled functions are re-used by the current lite interpreter loader.
* The bytecode can be serialized to and deserialized from other formats.
* The decoupled functions have minimal dependencies on other PyTorch components.
Next:
Build a driver binary that includes the parser and interpreter but has only the necessary dependencies on other PyTorch components.
ghstack-source-id: 137867287
Test Plan:
As an example, a simple bytecode is parsed to a mobile function, and directly run in the added unit test, `RunTimeTest:ParseBytecode`. It contains basic control flow (if, else) and basic data orchestration (list construction).
CI
Reviewed By: larryliu0820
Differential Revision: D29798382
Pulled By: iseeyuan
fbshipit-source-id: 1c173a5f5d37097e3a97baec3f3e48e1eea1400f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64862
Previously we were erroneously looking at dst signedness. This was
discovered when we tried to implement quantize/dequantize ops.
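The fix presumably hinges on the fact that it is the source type's signedness that determines sign- vs. zero-extension; a standalone C++ illustration of that fact (ordinary integer conversions, not the actual NNC lowering code):
```
// Widening the same 0xFF bit pattern gives different results depending on
// the *source* type's signedness: int8_t is sign-extended, uint8_t is
// zero-extended. The destination type (int32_t in both cases) alone does
// not tell you which extension to perform.
#include <cstdint>
#include <iostream>

int main() {
  int8_t  s = -1;    // bit pattern 0xFF, signed source
  uint8_t u = 255;   // bit pattern 0xFF, unsigned source
  int32_t from_signed   = s; // sign-extended  -> -1
  int32_t from_unsigned = u; // zero-extended  -> 255
  std::cout << from_signed << " " << from_unsigned << "\n";
  return 0;
}
```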
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30881696
Pulled By: ZolotukhinM
fbshipit-source-id: 34af842e5e52a3b6b5d2e70c4ef32f910a20341f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64763
Simplification pattern:
x/N -> 0; N is a constant positive integer and x is a for-loop index whose range is a subset of [0, N).
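A minimal standalone illustration of the arithmetic fact the pattern relies on (not the simplifier code itself):
```
// For integer division, x / N == 0 whenever 0 <= x < N, so inside a loop
// whose index provably stays in [0, N) the whole division folds to 0.
#include <cassert>

int main() {
  constexpr int N = 16;
  for (int x = 0; x < N; ++x) {
    assert(x / N == 0); // the expression the simplifier rewrites to 0
  }
  return 0;
}
```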
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D30845854
Pulled By: huiguoo
fbshipit-source-id: 814d69ed4be05e57405c222183cc1c6c526721cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64717
This also exposed several bugs, which are fixed in this PR.
Differential Revision: D30826408
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: a67ec5739aceed9ffdf0d24f77eb3787cefe4560
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64627
This fixes the root cause of S242719
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D30801686
Pulled By: navahgar
fbshipit-source-id: b6d3ebdc7eb57116eaced53c2f35c7798bb17e80
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63612
This makes Tensor inherit from a new class, TensorBase, which provides a subset of Tensor that doesn't
directly depend on native_functions.yaml. Code that only includes TensorBase.h will thus not need to
be rebuilt every time someone changes an operator signature.
Making `Tensor` inherit from this class means that `const TensorBase&` parameters will be callable
with an ordinary `Tensor`. I've also made `Tensor` constructible and assignable from `TensorBase` to
minimize friction in code mixing the two types.
To help enforce that `Tensor.h` and `Functions.h` aren't accidentally included, I've added an error
into `Operators.h` if `TORCH_ASSERT_NO_OPERATORS` is defined. We can either set this in the build
system for certain folders, or just define it at the top of any file.
I've also included an example of manually special-casing the commonly used `contiguous` operator.
The inline function's slow path defers to `TensorBase::__dispatch_contiguous` which is defined in
`Tensor.cpp`. I've made it so `OptionalTensorRef` is constructible from `TensorBase`, so I can
materialize a `Tensor` for use in dispatch without actually increasing its refcount.
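For illustration, a minimal sketch of the inline fast-path / out-of-line slow-path split described above; the names below (`MyTensorBase`, `dispatch_contiguous`) are placeholders, not the actual ATen declarations:
```
// The inline wrapper handles the common case cheaply; only the slow path is
// defined in a separate translation unit (the analogue of Tensor.cpp), so
// headers that declare it do not pull in the full operator machinery.
#include <iostream>

struct MyTensorBase {
  bool is_contiguous_ = true;

  // Declared here, defined out of line below.
  MyTensorBase dispatch_contiguous() const;

  // Inline fast path: no dispatch when the tensor is already contiguous.
  MyTensorBase contiguous() const {
    if (is_contiguous_) {
      return *this;
    }
    return dispatch_contiguous();
  }
};

// Out-of-line slow path; in the real code this would go through the dispatcher.
MyTensorBase MyTensorBase::dispatch_contiguous() const {
  MyTensorBase out;
  out.is_contiguous_ = true;
  return out;
}

int main() {
  MyTensorBase t;
  std::cout << t.contiguous().is_contiguous_ << "\n"; // 1
  return 0;
}
```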
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D30728580
Pulled By: ezyang
fbshipit-source-id: 2cbc8eee08043382ee6904ea8e743b1286921c03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64609
We've been using exceptions to indicate whether vectorization succeeded
or not, but that posed some problems (e.g. we spent too much time
symbolicating these exceptions). This change converts this mechanism to
a standard error return code.
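A minimal sketch of the pattern this change moves to (the function below is a hypothetical stand-in, not the actual NNC vectorizer):
```
// Instead of throwing when vectorization cannot be applied (exceptions that
// were expensive to symbolicate when they happened often), the routine
// reports failure through its return value and the caller falls back.
#include <iostream>

bool try_vectorize(int trip_count, int lane_width) {
  if (trip_count % lane_width != 0) {
    return false; // previously this would have thrown
  }
  // ... perform the vectorizing transformation ...
  return true;
}

int main() {
  if (!try_vectorize(10, 8)) {
    std::cout << "vectorization skipped, keeping the scalar loop\n";
  }
  return 0;
}
```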
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D30795342
Pulled By: ZolotukhinM
fbshipit-source-id: 16e38b37bcdd78ceb438ac814cc377f35b058e17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61801
Resubmitting because the last one was unrecoverable due to changes made incorrectly in the stack.
Test Plan: Imported from OSS
Reviewed By: desertfire
Differential Revision: D29812510
Pulled By: makslevental
fbshipit-source-id: ba9685dc81b6699724104d5ff3211db5852370a6
Summary:
This PR replaces the https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. The replacement is needed due to a messy Sandcastle issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234
Reviewed By: gmagogsfm
Differential Revision: D30656444
Pulled By: ansley
fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a
Summary:
1. Allow consuming operators with default arguments and out arguments. The flag is off to keep the same behavior as v6; it is turned on in PR 63651.
2. Add two unit tests to cover this type of operator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63540
ghstack-source-id: 137211562
Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
```
Reviewed By: raziel, iseeyuan, tugsbayasgalan
Differential Revision: D30414156
fbshipit-source-id: 0f3a219a22aee10ac53184cbd95940726c459d1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64307
Original commit changeset: 0b2aa7c57d08
Restores original changes.
This diff changes the way operator profiling is done in the lite predictor
benchmarking binary.
Instead of using custom callbacks, it uses KinetoEdgeCPUProfiler to profile
events and then generates operator-level metrics from them.
Since KinetoEvents do not contain CPU clock time, we now report only wallclock
time.
This unifies the various profiling efforts we have for benchmarking purposes. In
production we will still use the observer-based mechanism, but the advantage of
using the Kineto profiler is that we get a few other things for free, such as:
- chrome trace generation.
- operator level memory profiling (to be added)
- flop counts (to be added)
Furthermore, we can possibly use a Python post-processing script to parse the Chrome
trace and generate output similar to torch.profiler. (To be done)
This also removes some tests from test_lite_interpreter.cpp that were testing module hierarchy in debug info; they should be covered by test_mobile_profiler.cpp.
Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and --print_module_info true (note that the operator summary now has module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985
Reviewed By: raziel
Differential Revision: D30680354
fbshipit-source-id: b6ba0d59c510c13d13d9935b1d8051cc82ffa4e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63367
This diff changes the way operator profiling is done in the lite predictor
benchmarking binary.
Instead of using custom callbacks, it uses KinetoEdgeCPUProfiler to profile
events and then generates operator-level metrics from them.
Since KinetoEvents do not contain CPU clock time, we now report only wallclock
time.
This unifies the various profiling efforts we have for benchmarking purposes. In
production we will still use the observer-based mechanism, but the advantage of
using the Kineto profiler is that we get a few other things for free, such as:
- chrome trace generation.
- operator level memory profiling (to be added)
- flop counts (to be added)
Furthermore, we can possibly use a Python post-processing script to parse the Chrome
trace and generate output similar to torch.profiler. (To be done)
Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and `--print_module_info true` (note that the operator summary now has module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985
Reviewed By: raziel
Differential Revision: D30327514
fbshipit-source-id: 3bb2f2daaaedfb04bd6f5d9c91292783f9c4344f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63995
JIT methods already have name() in their interface, and Py methods have names in their implementation. I'm adding this for a particular case where someone tried to use name() on a JIT method that we're replacing with an IMethod.
Test Plan: add case to imethod API test
Reviewed By: suo
Differential Revision: D30559401
fbshipit-source-id: 76236721f5cd9a9d9d488ddba12bfdd01d679a2c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414
Misuse of a raw pointer here, where stack is never nullable.
ghstack-source-id: 136938318
Test Plan:
compiles.
Imported from OSS
Reviewed By: ejguan
Differential Revision: D30375410
fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64077
We were assuming kernel dimensions fit in 32 bits (the old fuser made
this assumption too), but we should be able to support 64.
ghstack-source-id: 136933272
Test Plan: unit tests; new IR level test with huge sizes
Reviewed By: ZolotukhinM
Differential Revision: D30596689
fbshipit-source-id: 23b7e393a2ebaecb0c391a6b1f0c4b05a98bcc94
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62755
After this change, whether an argument is an out argument can be checked by calling is_out().
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D30415256
Pulled By: tugsbayasgalan
fbshipit-source-id: b2e1fa46bab7c813aaede1f44149081ef2df566d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63345
This diff does the following to enable the tests:
1. Exposed IMethod as TORCH_API.
2. Linked torch_deploy to test_api if USE_DEPLOY == 1.
3. Generated torch::deploy examples when building torch_deploy library.
Test Plan: ./build/bin/test_api --gtest_filter=IMethodTest.*
Reviewed By: ngimel
Differential Revision: D30346257
Pulled By: alanwaketan
fbshipit-source-id: 932ae7d45790dfb6e00c51893933a054a0fad86d
Summary:
Adds a C++ codegen backend to NNC to generate C++ for CPU instead of generating LLVM IR.
Tensors are represented as blobs of float. Vector operations are devectorized/unrolled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62869
Test Plan:
The branch https://github.com/pytorch/pytorch/tree/mvz-nnc-aot-prototype makes it possible to AOT-compile the whole MobileNetV3 model into binary code through LLVM codegen in NNC.
I forked that branch to https://github.com/cheng-chang/pytorch/tree/cc-aot-cpp, merged this PR into it, and modified `fancy_compile` to compile MobileNetV3 into C++ through
```
import torch
m = torch.jit.load('mobnet.pt')
m.eval()
f = torch.jit.freeze(m)
torch._C._fancy_compile(f.graph, [1, 3, 224, 224])
```
The generated C++ file `mobnet.cc` can be found at https://gist.github.com/cheng-chang/e2830cc6920b39204ebf368035b2bcec.
I manually compiled the generated C++ through `g++ -o mobnet -std=c++14 -L./build/lib -ltorch_cpu -ltorch mobnet.cc`, and it succeeded.
Reviewed By: ZolotukhinM
Differential Revision: D30149482
Pulled By: cheng-chang
fbshipit-source-id: e77b189f0353e37cd309423a48a513e668d07675
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63923
The input graph can contain constants whose names include special characters, so all names of constants in the input graph need to be sanitized.
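For illustration, a hedged sketch of the kind of sanitization meant here (the helper name and exact rules are placeholders, not the actual NNC implementation):
```
// Replace every character that is not valid in a C identifier with '_',
// and make sure the result does not start with a digit.
#include <cctype>
#include <iostream>
#include <string>

std::string sanitize_name(const std::string& name) {
  std::string out;
  for (char c : name) {
    const bool ok = std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    out += ok ? c : '_';
  }
  if (out.empty() || std::isdigit(static_cast<unsigned char>(out[0]))) {
    out = "v" + out;
  }
  return out;
}

int main() {
  std::cout << sanitize_name("model.layer1/weight:0") << "\n"; // model_layer1_weight_0
  return 0;
}
```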
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63990
Reviewed By: ZolotukhinM
Differential Revision: D30558432
Pulled By: navahgar
fbshipit-source-id: de5b0c23d50ee8997f40f2c0fc605dda3719186f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776
I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack. Let's try again.
I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847
Test Plan: CI
Reviewed By: huiguoo
Differential Revision: D30484555
fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63578
Added a new op `prim::VarStack` and a pass that transforms instances of `aten::stack(list, dim)` into `prim::VarStack(list[0], ..., list[n], dim)`. Also provided a JIT interpreter implementation.
Most of the implementation/tests are the same as `prim::VarConcat`.
Test Plan: `buck test caffe2/test/cpp/jit:jit -- TestStackOpt`
Reviewed By: navahgar
Differential Revision: D30426232
fbshipit-source-id: 9829a7db6e0a5038c9b7528c43c25b0c221aa2ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587
Now that there are no classes using KernelArena for memory management, we
can remove it.
Differential Revision: D30429115
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586
This is another commit in the transition away from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr>, and we don't need
to dynamically allocate it at all - it's cheap to pass by value, and
that's what we're switching to in this commit.
After this change nothing uses KernelScope/KernelArena, and they can be
safely removed.
Differential Revision: D30429114
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63778
This is a preparation for a switch from raw pointers to shared pointers
as a memory model for TE expressions and statements.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30487425
Pulled By: ZolotukhinM
fbshipit-source-id: 9cbe817b7d4e5fc2f150b29bb9b3bf578868f20c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63577
Since other variadic ops will have an almost identical implementation, we can generalize the `UseVariadicCat` implementation and put it in a common folder.
Also moved some test utilities that other variadic op tests will likely need.
Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOptTest`
Reviewed By: navahgar
Differential Revision: D30409937
fbshipit-source-id: 925c11c27b58ce98cb8368d2a205e26ba66d3db9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63197
This eliminates the non-determinism that came from using hash values in sort methods.
Changes in tests are mostly mechanical.
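For intuition, a standalone sketch (not the actual NNC code) of why ordering by pointer or hash values is non-deterministic across runs and how a stable key fixes it:
```
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Var {
  std::string name;
};

int main() {
  Var a{"a"}, b{"b"}, c{"c"};
  std::vector<Var*> vars{&c, &a, &b};
  // Non-deterministic: sorting the raw pointers (or hashes of them) can give
  // a different order on every run, because addresses change between runs.
  // Deterministic: sort by a stable key instead.
  std::sort(vars.begin(), vars.end(),
            [](const Var* x, const Var* y) { return x->name < y->name; });
  for (const Var* v : vars) {
    std::cout << v->name << " "; // always: a b c
  }
  std::cout << "\n";
  return 0;
}
```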
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30292776
Pulled By: ZolotukhinM
fbshipit-source-id: 74f57b53c3afc9d4be45715fd74781271373e055
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195
This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.
The changes are mechanical and should not affect any functionality.
With this PR, we're changing the following:
* `Add*` --> `AddPtr`
* `new Add(...)` --> `alloc<Add>(...)`
* `dynamic_cast<Add*>` --> `to<Add>`
* `static_cast<Add*>` --> `static_to<Add>`
Due to some complications with args forwarding, some places became more
verbose, e.g.:
* `new Block({})` --> `new Block(std::vector<ExprPtr>())`
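The renames above all follow one pattern; here is a minimal self-contained sketch (illustrative `Expr`/`Add` types only, not the real IR classes) of how the `Ptr` aliases plus `alloc<>`/`to<>` helpers let the underlying pointer model change later without touching call sites:
```
#include <iostream>
#include <memory>

struct Expr { virtual ~Expr() = default; };
struct Add : Expr {
  int lhs, rhs;
  Add(int l, int r) : lhs(l), rhs(r) {}
};

// Today these aliases could be raw pointers; switching them to shared_ptr
// (as here) does not require editing the call sites below.
using ExprPtr = std::shared_ptr<Expr>;
using AddPtr = std::shared_ptr<Add>;

// alloc<Add>(...) stands in for `new Add(...)`.
template <typename T, typename... Args>
std::shared_ptr<T> alloc(Args&&... args) {
  return std::make_shared<T>(std::forward<Args>(args)...);
}

// to<Add>(e) stands in for `dynamic_cast<Add*>(e)`.
template <typename T>
std::shared_ptr<T> to(const ExprPtr& e) {
  return std::dynamic_pointer_cast<T>(e);
}

int main() {
  ExprPtr e = alloc<Add>(1, 2);
  if (AddPtr a = to<Add>(e)) {
    std::cout << a->lhs + a->rhs << "\n"; // 3
  }
  return 0;
}
```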
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30292779
Pulled By: ZolotukhinM
fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63194
This test implements functionality that is used nowhere, and the author no
longer works on it. This PR also adds test_approx to CMakeLists, where
it had been missing.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D30292777
Pulled By: ZolotukhinM
fbshipit-source-id: ab6d98e729320a16f1b02ea0c69734f5e7fb2554
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63348
This change addresses singlaiiit's comment on D30241792 (61b49c8e41) by making the JIT interpreter's behavior consistent whether `future` is set or not.
Test Plan: Enhanced `EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException` to cover the modified code path.
Reviewed By: singlaiiit
Differential Revision: D30347782
fbshipit-source-id: 79ce57283154ca4372e5341217d942398db21ac8