Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31222
- When constructing torch::from_blob() in the case where the deleter is a nop, switch to using a nullptr context in the DataPtr (with a nop deleter)
- No real extra memory/CPU requirements here; it actually saves a minor alloc.
Why? We want a signal that a Tensor might contain non-owned memory from
torch::from_blob(), by detecting the nullptr context.
ghstack-source-id: 96336078
Test Plan:
buck test mode/dev caffe2/test/cpp/api/...
buck test mode/dev-nosan caffe2/test/...
Differential Revision: D18992119
fbshipit-source-id: 4eea642f82d0858b57fdfc6995364a760c10567d
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28430
The unpythonic signatures for functions such as `torch.addcdiv` are already separated out in [`deprecated.yaml`] and marked as deprecated in `PythonArgParser`. However, nothing was previously done with this information. This change now emits a warning when the deprecated signatures are used.
One minor complication is that if all arguments are passed as keyword arguments, there is nothing to differentiate the deprecated overload, which can lead to false warnings being emitted. So, I've also modified `PythonArgParser` to prefer non-deprecated signatures; an example is sketched below.
[`deprecated.yaml`]: https://github.com/pytorch/pytorch/blob/master/tools/autograd/deprecated.yaml
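For illustration, a minimal sketch assuming the classic value-before-tensors ordering of the deprecated `addcdiv` overload (the entries in `deprecated.yaml` are authoritative):
```
import torch

a = torch.randn(3)
t1 = torch.randn(3)
t2 = torch.rand(3) + 1.0

# Current signature: value is a keyword-only argument.
out = torch.addcdiv(a, t1, t2, value=0.5)

# Deprecated signature: value passed positionally before the tensors.
# With this change, PythonArgParser emits a deprecation warning for such calls.
out_deprecated = torch.addcdiv(a, 0.5, t1, t2)
```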
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31514
Differential Revision: D19298735
Pulled By: ezyang
fbshipit-source-id: 03cb78af17658eaab9d577cd2497c6f413f07647
Summary:
Compared to cuDNN bias, PyTorch add has the following advantages:
- faster, especially for backward (see: https://github.com/zasdfgbnm/things/blob/master/2019/conv-backward-profile.md)
- handles 64bit indexing automatically
- requires less code and less maintenance effort
ngimel I am submitting this PR early so CI can start building it, but I have not tested it locally yet (still waiting for it to compile).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31524
Differential Revision: D19264244
Pulled By: ngimel
fbshipit-source-id: cb483d378a6d8bce0a05c3643a796e544bd8e8f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30612
This is the first version of moving prim ops to c10 registration. Once the reviewers are fine with the initial changes, more operators will be moved in the same style.
Test Plan: Imported from OSS
Differential Revision: D19237648
Pulled By: iseeyuan
fbshipit-source-id: c5a519604efffb80564a556536f17d829f71d9f9
Summary:
Currently `cumsum` crashes for tensors with non-empty dimensions but with zero elements, which can happen when some dimension is zero. This commit fixes the error by checking both `dim()` and `numel()` in the cumsum backward.
Fixes https://github.com/pytorch/pytorch/issues/31515
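A minimal reproduction sketch based on the description above (shapes chosen for illustration):
```
import torch

# Non-empty dimensions, but zero elements because one dimension is zero.
x = torch.randn(0, 5, requires_grad=True)
y = x.cumsum(dim=1)
# Before this fix, the cumsum backward crashed on inputs like this.
y.sum().backward()
print(x.grad.shape)  # torch.Size([0, 5])
```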
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31694
Reviewed By: mrshenli
Differential Revision: D19266613
Pulled By: leedtan
fbshipit-source-id: 9407e0aa55440fed911c01a3580bb6c5eab62a16
Summary:
Add support for printing op dependencies as Python code so that both the custom
build script and BUCK can import it without a YAML parser.
Test Plan:
- generate the file:
```
ANALYZE_TORCH=1 FORMAT=py DEPLOY=1 tools/code_analyzer/build.sh -closure=false
```
- load the file in python:
```
python
>>> from tools.code_analyzer.generated.torch import TORCH_DEPS
>>> print(TORCH_DEPS)
```
Differential Revision: D18894639
Pulled By: ljk53
fbshipit-source-id: e304d0525a07a13cf6e8a9317cd22637200d044c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28942
The new abstract RRef class contains only user-facing RRef APIs.
It will later be moved to a common folder so that it can be shared
by jit and distributed packages to provide TorchScript support.
Test Plan: Imported from OSS
Differential Revision: D18240590
Pulled By: mrshenli
fbshipit-source-id: ac28cfc2c8039ab7131b537b2971ed4738710acb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31517
This is going to be used by upsample (which currently uses magic values to represent optionals).
For now, we just introduce a fake function for testing (torch._test_optional_float(x)).
Test Plan: Imported from OSS
Differential Revision: D19198721
Pulled By: gchanan
fbshipit-source-id: 0a1382fde0927c5d277d02d62bfb31fb574b8c74
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31393
The pytorch build was set up with include paths (-I) relative to fbcode/. This works well for fbcode builds, but doesn't work for the new fbcode_deps args for xplat build targets that work across xplat and fbcode. When these targets are built, the include paths need to be relative to fbsource, so the fbcode/ prefix needs to be added to those paths.
Longer term, to properly fix this, we need to use raw_headers with public_include_directories specified for all of these targets.
Test Plan: buck test mode/dev //papaya/integration/service/local/test:mnist_federated_system_test -- 'MnistFederatedSystemTest\.test' --run-disabled
Reviewed By: mzlee
Differential Revision: D19148465
fbshipit-source-id: a610e84bf4cad5838e54e94bae71b957c4b6d4b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31214
This sets up the basic infrastructure for distributed autograd and rpc to
bind their operators to TorchScript. Since the whole distributed package
is gated behind the `USE_DISTRIBUTED` flag, we separate the
registration and build it only when the flag is on.
Test Plan: Imported from OSS
Differential Revision: D19137160
fbshipit-source-id: ff47dc4c380ebe273fe0eea9e5e3fccfbd6466d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30978
This particular approach queries our issue tracker for test titles that
match the following format:
```
DISABLED test_async_grad_guard_with_grad (jit.test_async.TestAsync)
```
It then skips the Python tests for them. There is a 1 second timeout so that
if the network flakes we still run the test suite without disabling any
tests.
This is intended as a quick fix, similar to a ninja unland, to get to a green
master. Long-term test disables should go into the code.
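A rough sketch of what such a skip mechanism could look like (the helper name, query URL, and skip decorator below are hypothetical, not the code in this PR):
```
import json
import unittest
import urllib.request

def fetch_disabled_tests(timeout=1.0):
    # Hypothetical helper: look for open issues titled
    # "DISABLED test_name (suite.TestCase)" and collect the test names.
    url = ("https://api.github.com/search/issues"
           "?q=is%3Aissue+is%3Aopen+DISABLED+in%3Atitle+repo%3Apytorch%2Fpytorch")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            items = json.load(resp)["items"]
    except Exception:
        # If the network flakes, run the full suite without disabling anything.
        return set()
    return {item["title"].split()[1]
            for item in items if item["title"].startswith("DISABLED ")}

DISABLED_TESTS = fetch_disabled_tests()

class TestAsync(unittest.TestCase):
    @unittest.skipIf("test_async_grad_guard_with_grad" in DISABLED_TESTS,
                     "disabled via issue tracker")
    def test_async_grad_guard_with_grad(self):
        pass
```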
Test Plan: Imported from OSS
Pulled By: zdevito
Differential Revision: D18890532
fbshipit-source-id: fe9447e59a6d5c9ad345f7c3ff15d63b6d2a09e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30391
A type parser that parses the Python string representation of a Type, for example,
"Tuple[str, Optional[float], Dict[str, List[Tensor]], int]".
Please refer to test_type_parser.cpp for usage.
One of the use cases is the lite interpreter, where types need to be serialized (by directly calling python_str() on the Type) and deserialized (by calling parseType(str)).
Test Plan: Imported from OSS
Differential Revision: D18924268
Pulled By: iseeyuan
fbshipit-source-id: 830d411563abfbeec023f01e7f8f4a1796f9a59a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29579
Per #28923, this diff moves Future<Message> to torch::utils and extends it to Future<T>; most of the implementation is copied from FutureMessage and ivalue::Future. Merging ivalue::Future with Future<T> will be done separately.
The main difference between Future<T> and FutureMessage is the error handling: instead of checking the message type inside the Future to handle errors, Future<T> owns has_error_ and error_ states.
This future also passes the value_, has_error_, and error_ states to callbacks so that they can easily read the future's state.
In the next diff, a TorchScript rpc async API will be created. Before the API returns, it will create an ivalue::Future and pass it to Future<T>'s callback, where the state of the ivalue::Future will be set. In this way, the TorchScript rpc async API can still return an ivalue::Future, and wait() can be called on it afterwards to get its state appropriately.
ghstack-source-id: 95479525
Test Plan: unit tests
Differential Revision: D18263023
fbshipit-source-id: 48a65712656a72c2feb0bb3ec8b308c0528986a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31117
After this diff, we will have completely removed the named tensor
feature flagging. This means that named tensors are always on and that
there is no mechanism to turn them off. There should be no more follow-up
diffs.
I performed the deletion of the header with
```
find . -type f -print0 | xargs -0 sed -i '/#include <ATen\/core\/EnableNamedTensor.h>/d'
```
Test Plan: - wait for CI
Differential Revision: D18934952
Pulled By: zou3519
fbshipit-source-id: 253d059074b910fef15bdf885ebf71e0edf5bea5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31116
Changelist:
- remove BUILD_NAMEDTENSOR macro
- remove torch._C._BUILD_NAMEDTENSOR
- remove all python behavior that relies on torch._C._BUILD_NAMEDTENSOR
Future:
- In the next diff, I will remove all usages of
ATen/core/EnableNamedTensor.h since that header doesn't do anything
anymore
- After that, we'll be done with the BUILD_NAMEDTENSOR removal.
Test Plan: - run CI
Differential Revision: D18934951
Pulled By: zou3519
fbshipit-source-id: 0a0df0f1f0470d0a01c495579333a2835aac9f5d
Summary:
After several discussions, we agreed not to add any extra safety checks to recordStream, as either the check would cause failures in certain scenarios or there is no need to throw for user errors.
In summary, it simply does what is described in https://github.com/pytorch/pytorch/issues/27405: check whether a tensor was indeed allocated by a CUDACachingAllocator instance, and if it was, throw an internal error if a block cannot be retrieved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30870
Differential Revision: D18851669
Pulled By: yxia11
fbshipit-source-id: c2f01798cd24f1fd0f35db8764057d5d333dab95
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30894
This PR begins the process of removing BUILD_NAMEDTENSOR macros. There
will be followups.
Reasons for removing the macros:
- BUILD_NAMEDTENSOR is always on and has been on since pytorch 1.3.0.
- Since we don't test building without it, it is useless to keep around.
- Code becomes nicer to read without the macros
Reasons for not removing the macros:
- potential for feature flagging
Now, I argue against needing to feature flag. The main reason why we
might want to feature flag is if we need to disable the feature.
We'd need a fast switch to disable the feature if someone discovers
in the future that named tensors caused some regression in some existing workflows.
In https://github.com/pytorch/pytorch/pull/25798, I did a variety of
macro- and micro- benchmarks to determine the performance impact of named
tensors on regular tensors.
[The
microbenchmarks](https://github.com/pytorch/pytorch/pull/25798#issuecomment-529014810)
were not very stable, and running the
microbenchmarks for more iterations doesn't actually help because the
noise is not distributed in a nice way. Instead of microbenchmarks I ran
a [profiler
(perf)](https://github.com/pytorch/pytorch/pull/25798#issuecomment-555707645)
to estimate how much overhead named tensors add to unnamed code. I
estimated the overhead to be less than 100ns for `add` and even smaller
for `mm`; there are ways to optimize even further if we find this to be a
problem.
[Initial
macrobenchmarks](https://github.com/pytorch/pytorch/pull/25798#issuecomment-530539104)
were also not very stable. I ran imagenet for some number of epochs. To
make them more stable, I got rid of the data loading (which seemed to
vary between runs). [In some benchmarks without data
loading](https://github.com/pytorch/pytorch/pull/25798#issuecomment-562214053),
we can see that the results are less noisy now. These results support
no noticeable regressions in speed.
Test Plan: - wait for CI
Differential Revision: D18858543
Pulled By: zou3519
fbshipit-source-id: 08bf3853a9f506c6b084808dc9ddd1e835f48c13
Summary:
- [x] Add more comments and refactor the logic of `ReshapeToAdvancedIndexingFormat`
- [x] Add more description here. Cases that are/aren't supported, and how they are supported.
- [x] Need to merge this PR https://github.com/pytorch/pytorch/issues/27186 to enable testing inplace operators.
We now support exporting aten::copy_ and aten::index_put to ONNX.
Here's a breakdown of the different cases in PyTorch code.
```
# Case 1: Scalar Indices
x[0, 1, 2] = data
# Case 2: Slice Indices
x[1:3, :, ::2] = data
# Case 3: Ellipsis Indices
x[..., 0] = data
# Case 4: Tensor Indices
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
x[ind1, ind2] = data
# Case 5: Mixing all the above cases
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
x[1:3, ind1, ind2, ..., 3] = data
```
Limitations:
Tensor indices must be consecutive and 1-dimensional.
```
# Supported
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
x[ind1, ind2] = data
# Not supported
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
ind3 = torch.tensor([[0], [1]])
x[ind1, :, ind2] = data
x[ind3] = data
```
Negative indices are not supported.
```
# Not supported
x[-1] = data
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26941
Differential Revision: D17951030
Pulled By: houseroad
fbshipit-source-id: 4357777072f53aa0bc4b297aa1ee53457a7f8dec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30649
Operators in VariableTypeManual are now no longer registered against the VariableTypeId key, but they are registered as compound ops. See https://github.com/pytorch/pytorch/issues/30102 for background.
This also requires the non-variable codegen to ignore them and requires removal of VariableMethodStubs.cpp.
Because function_wrapper.py now also needs to know which ops are manual, instead of having a hard-coded list in gen_variable_type.cpp for ops with a manual implementation, we now have a `manual_kernel_registration` flag in native_functions.yaml that disables the registration of kernels for the operator (the schema is still registered). We then manually register the right kernels for the operator.
ghstack-source-id: 95082204
Test Plan: unit tests
Differential Revision: D18778191
fbshipit-source-id: 0af6f9e43ff4fb9800ce19b286dfccd0fd22cc41
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30790
The index_select documentation reads:
"The returned tensor has the same number of dimensions as the original tensor (input)."
But the implementation would return a 0-dimensional tensor iff both the input and index were 0-dimensional.
This change makes it so we return a 0-dimensional tensor iff the input is 0-dimensional.
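A hedged illustration of the documented invariant, assuming the 0-dimensional corner case behaves as described above:
```
import torch

x = torch.tensor([10., 20., 30.])
print(x.index_select(0, torch.tensor([2, 0])).dim())     # 1 - matches x.dim()

scalar = torch.tensor(5.)                                 # 0-dimensional input
# After this change the result is also 0-dimensional, matching scalar.dim().
print(scalar.index_select(0, torch.tensor([0])).dim())    # 0
```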
Restacked version of: https://github.com/pytorch/pytorch/pull/30502
Test Plan: Imported from OSS
Differential Revision: D18825717
Pulled By: gchanan
fbshipit-source-id: aeb10c5107e748af3e264fbdc81fff5dd4833cc4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29944
This particular approach queries our issue tracker for test titles that
match the following format:
```
DISABLED test_async_grad_guard_with_grad (jit.test_async.TestAsync)
```
It then skips the Python tests for them. There is a 1 second timeout so that
if the network flakes we still run the test suite without disabling any
tests.
This is intended as a quick fix, similar to a ninja unland, to get to a green
master. Long-term test disables should go into the code.
Test Plan: Imported from OSS
Differential Revision: D18621773
Pulled By: zdevito
fbshipit-source-id: 5532f1d5fa3f83f77fc3597126cbb7dba09a3c33
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29161.
I looked a bit at the code changes related to this and think I have all of the use cases of `DeprecatedTypeProperties` covered in the message, but suggestions from someone with more context on this would be very much appreciated :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30281
Differential Revision: D18830818
Pulled By: ezyang
fbshipit-source-id: 1a7fcee15354ae09e6644577e7fa33bd26acfe20
Summary:
This is a re-do of https://github.com/pytorch/pytorch/issues/27064, which was reverted (b8792c0438). It landed at the same time as other work that added new operators to the `torch` namespace, so the check that the `torch` namespace is exhaustively covered for overridability was triggering test failures.
I've temporarily disabled that check and added an explanatory comment that the check will be re-enabled in a future PR that will be merged during a time when the commit velocity on PyTorch is lower.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30730
Differential Revision: D18813270
Pulled By: ezyang
fbshipit-source-id: 70477c4656dca8fea6e7bc59259555041fcfbf68
Summary:
[Why static dispatch]
Static dispatch was introduced to allow stripping out unused ops at link
time (with “gc-sections” linker flag) for mobile build.
The alternative approaches to do "non-static" dispatch are:
* virtual methods - old ATen dispatcher, which has already been deprecated;
* registry pattern - used by caffe2, c10 and JIT;
However, none of them are “gc-sections” friendly. Global registrations are
root symbols - the linker cannot strip out any op if we use the registry pattern
for mobile.
[Why static dispatch isn’t great]
* One more code path to maintain;
* Need to recompile the framework to add new backends/ops;
* Doesn't support autograd yet, thus blocking on-device training;
[Static Code Analysis]
This PR introduces an LLVM analysis pass. It takes LLVM bitcode /
assembly as input and generates a dependency graph among aten ops. From a
set of root ops used by a model, we can calculate the transitive closure of
all dependent ops and then ask codegen to register only these ops.
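For illustration only, a minimal sketch of how a transitive closure over a precomputed op dependency graph could be taken (the graph and op names below are made up, not analyzer output):
```
from collections import deque

def transitive_closure(deps, root_ops):
    """deps maps an op to the ops it may call; root_ops are the ops a model uses."""
    seen = set(root_ops)
    queue = deque(root_ops)
    while queue:
        op = queue.popleft()
        for dep in deps.get(op, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

deps = {
    "quantized::add": ["aten::empty"],   # illustrative edges only
    "aten::relu": ["aten::clamp_min"],
}
print(sorted(transitive_closure(deps, {"quantized::add"})))
# ['aten::empty', 'quantized::add']
```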
[Approach]
To generate the dependency graph it searches for 3 types of connections in
LLVM bitcode / assembly:
1) op registration: op name (schema string literal) -> registered function;
2) regular function call: function -> function;
3) op invocation: function -> op name (schema string literal)
For 2) it uses a similar algorithm to llvm::LazyCallGraph - it not only looks into
call/invoke instructions but also recursively searches for function pointers
in each instruction's operands.
For 1) and 3) it searches for connections between operator name string
literals / function pointers and c10 op registration/invocation API calls in
LLVM IR graph via "use" edges (bi-directional):
1. llvm::Value has a "users()" method to get other llvm::Value nodes that use
the value;
2. most types derive from llvm::User, which has an "operands()" method to get
other llvm::Value nodes being used by the value;
[Limitation]
For now the search doesn't go beyond the function boundary because the
references to op name string literals and c10 op registration/invocation
APIs are almost always in the same function.
The script uses regular expression to identify c10 API calls:
* op_schema_pattern="^(aten|quantized|profiler|_test)::[^ ]+"
* op_register_pattern="c10::RegisterOperators::(op|checkSchemaAndRegisterOp_)"
* op_invoke_pattern="c10::Dispatcher::findSchema|callOp"
If we create helper functions around the c10 API (e.g. the "callOp" method
defined in aten/native), we can simply add them to the regular expressions
used to identify c10 API calls.
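For illustration, a small sketch of how the schema-matching pattern above behaves on a few example strings (not the analyzer's actual matching code):
```
import re

op_schema_pattern = re.compile(r"^(aten|quantized|profiler|_test)::[^ ]+")

for s in ["aten::empty", "quantized::add", "my_namespace::foo", "aten:: broken"]:
    print(s, "->", bool(op_schema_pattern.match(s)))
# aten::empty -> True, quantized::add -> True,
# my_namespace::foo -> False, "aten:: broken" -> False (space after '::')
```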
[Example]
In the following example, it finds out:
1) the registered function for "quantized:add" operator;
2) one possible call path to at::empty() function;
3) the called operator name "aten::empty":
- "quantized::add"
- c10::detail::wrap_kernel_functor_unboxed_<at::native::(anonymous namespace)::QAdd<false>, at::Tensor (at::Tensor, at::Tensor, double, long)>::call(c10::OperatorKernel*, at::Tensor, at::Tensor, double, long)
- at::native::(anonymous namespace)::QAdd<false>::operator()(at::Tensor, at::Tensor, double, long)
- void at::native::DispatchStub<void (*)(at::Tensor&, at::Tensor const&, at::Tensor const&), at::native::qadd_stub>::operator()<at::Tensor&, at::Tensor const&, at::Tensor const&>(c10::DeviceType, at::Tensor&, at::Tensor const&, at::Tensor const&)
- at::native::DispatchStub<void (*)(at::Tensor&, at::Tensor const&, at::Tensor const&), at::native::qadd_stub>::choose_cpu_impl()
- void at::native::(anonymous namespace)::qadd_kernel<false>(at::Tensor&, at::Tensor const&, at::Tensor const&)
- at::TensorIterator::binary_op(at::Tensor&, at::Tensor const&, at::Tensor const&, bool)
- at::TensorIterator::build()
- at::TensorIterator::fast_set_up()
- at::empty(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>)
- "aten::empty"
[How do we know it’s correct?]
* Built a test project that contains different op registration/invocation
patterns found in the pytorch codebase, including both codegen and non-codegen
cases.
* Tried different optimization flags “-O0”, “-O3” - the result seems to
be stable.
* Filtered by common patterns: “aten::”, “at::”, “at::native”,
“at::CPUType”, “at::TypeDefault” - manually checked the relationship
between function schema strings and corresponding implementations were
captured.
* It can print instruction-level data flow and show a warning message if it
encounters unexpected cases (e.g. finding 0 or multiple op names per
registration/invocation API call, finding 0 registered functions, etc.).
* Verified consistent results on different Linux / macOS hosts. It can
handle different STL library ABIs reliably, including rare corner cases
for short string literals.
[Known issues]
* Doesn’t handle C code yet;
* Doesn’t handle overload name yet (all variants are collapsed into the
main op name);
Test Plan:
```
LLVM_DIR=... ANALYZE_TEST=1 CHECK_RESULT=1 scripts/build_code_analyzer.sh
```
Differential Revision: D18428118
Pulled By: ljk53
fbshipit-source-id: d505363fa0cbbcdae87492c1f2c29464f6df2fed
Summary:
Move the shell script into this separate PR to make the original PR
smaller and less scary.
Test Plan:
- With stacked PRs:
1. analyze test project and compare with expected results:
```
ANALYZE_TEST=1 CHECK_RESULT=1 tools/code_analyzer/build.sh
```
2. analyze LibTorch:
```
ANALYZE_TORCH=1 tools/code_analyzer/build.sh
```
Differential Revision: D18474749
Pulled By: ljk53
fbshipit-source-id: 55c5cae3636cf2b1c4928fd2dc615d01f287076a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30659
I could only find one usage of TupleParser and it doesn't seem worth maintaining just for that one usage.
Test Plan: Imported from OSS
Differential Revision: D18795979
Pulled By: nairbv
fbshipit-source-id: 6e50d65fc8fade0944f36ab20d00f1539a3d4cb8
Summary:
Given that pybind11 implements these GIL functions, I don't think it makes sense for PyTorch to have its own bespoke versions.
Fixes https://github.com/pytorch/pytorch/issues/29065
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29095
Differential Revision: D18301806
Pulled By: ezyang
fbshipit-source-id: 03da6a26c41ee65aaadf7b67b9f0b14d2def2a5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28287
This PR eliminates the static distinction between
Tensor and Variable. Every Variable is a Tensor, no need to static_cast
or call the Variable constructor.
To do this, I need Tensor to have API parity with Variable. I have already
moved most of the methods I don't want in Tensor off Variable.
These implementations are all placed in Tensor.cpp.
One API difference is that all Variable methods are now const, so we no longer
have faux const-correctness (see https://github.com/zdevito/ATen/issues/27 for
the back story).
This diff is BC breaking in a few ways:
- Because torch::autograd::Variable is now just an alias of at::Tensor, ADL for
`torch::autograd` functions no longer works, you have to explicitly qualify
them with `torch::autograd` (examples: `torch/nn/parallel/data_parallel.h`)
- Because Variable and Tensor are now the same type, code which assumes that
they are different types (e.g., for the purposes of templating, or enable_if checks)
will not work until you delete the (now) redundant overload/specialization.
(examples: `torch/nn/modules/container/any.h`, `torch/csrc/utils/pybind.h`)
Some other notes:
- I'm not sure what was going on with the old template implementation of `extract_vars`,
but I couldn't get the SFINAE version to work. Replacing it with an overload-based version
made it work.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18571426
Pulled By: ezyang
fbshipit-source-id: 2ea8151e5f1d8512cdebf1345399642e68b707b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30144
Create a script to produce a libtorch that only contains the ops needed by specific
models. Developers can use this workflow to further optimize mobile build size.
We need to keep a dummy stub for unused (stripped) ops because some JIT-side
logic requires certain function schemas to exist in the JIT op
registry.
Test Steps:
1. Build "dump_operator_names" binary and use it to dump root ops needed
by a specific model:
```
build/bin/dump_operator_names --model=mobilenetv2.pk --output=mobilenetv2.yaml
```
2. The MobileNetV2 model should use the following ops:
```
- aten::t
- aten::dropout
- aten::mean.dim
- aten::add.Tensor
- prim::ListConstruct
- aten::addmm
- aten::_convolution
- aten::batch_norm
- aten::hardtanh_
- aten::mm
```
NOTE that for some reason it outputs "aten::addmm" but the model actually uses "aten::mm".
You need to fix it manually for now.
3. Run custom build script locally (use Android as an example):
```
SELECTED_OP_LIST=mobilenetv2.yaml scripts/build_pytorch_android.sh armeabi-v7a
```
4. Check out the demo app that uses the locally built library instead of
downloading from the jcenter repo:
```
git clone --single-branch --branch custom_build git@github.com:ljk53/android-demo-app.git
```
5. Copy locally built libraries to demo app folder:
```
find ${HOME}/src/pytorch/android -name '*.aar' -exec cp {} ${HOME}/src/android-demo-app/HelloWorldApp/app/libs/ \;
```
6. Build demo app with locally built libtorch:
```
cd ${HOME}/src/android-demo-app/HelloWorldApp
./gradlew clean && ./gradlew assembleDebug
```
7. Install and run the demo app.
In-APK arm-v7 libpytorch_jni.so build size reduced from 5.5M to 2.9M.
Test Plan: Imported from OSS
Differential Revision: D18612127
Pulled By: ljk53
fbshipit-source-id: fa8d5e1d3259143c7346abd1c862773be8c7e29a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29881
Breaking these into separate files allows us to have three different builds:
- Mobile inference-only.
- Mobile with module saving.
- Server with module saving and other export functions like ONNX.
And this can be accomplished just by selecting which cpp files to compile,
without setting any preprocessor flags.
Test Plan: CI. Local mobile+saving build.
Reviewed By: smessmer
Differential Revision: D18509296
fbshipit-source-id: 9438273bac4624df5c7f035b2bacb901cce43053