Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73109
This change updates the Vulkan model runner in `speed_benchmark_torch` so that it can generate inputs for models whose input/output types are not just a single tensor. Input elements are processed according to their type.
Test Plan: Imported from OSS
Reviewed By: mikaylagawarecki
Differential Revision: D34354839
Pulled By: SS-JIA
fbshipit-source-id: 993e55372d2664fa7eddb16146deba264727f399
(cherry picked from commit 4a140202ac)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66123
Some models may take a list of tensors as input, so the bundled inputs will contain `IValue`s of type `c10::List`. For Vulkan models, every tensor in the `IValue` list has to be converted to a Vulkan tensor first, and this case is not currently handled by the Vulkan model wrapper in the benchmark binary.
This diff introduces `IValue` type checking in the input processor of the Vulkan model wrapper and adds support for Tensor and List types.
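For illustration, a minimal sketch of that kind of type dispatch, assuming a `.vulkan()` conversion method on Tensor (the actual wrapper in the benchmark binary may be structured differently):
```
// Sketch only: dispatch on the IValue type and convert tensor contents to Vulkan.
#include <torch/script.h>

c10::IValue toVulkanInput(const c10::IValue& input) {
  if (input.isTensor()) {
    // Single tensor input: convert it directly.
    return input.toTensor().vulkan();
  }
  // Otherwise expect a list of tensors and convert every element.
  TORCH_CHECK(input.isTensorList(), "Unsupported IValue type for the Vulkan model wrapper");
  c10::List<at::Tensor> converted;
  for (at::Tensor t : input.toTensorList()) {
    converted.push_back(t.vulkan());
  }
  return converted;
}
```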
Test Plan:
```
# Build the binary
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:ptmobile_compareAndroid\#android-arm64 --show-output
# Push it to the device
adb push buck-out/gen/xplat/caffe2/ptmobile_compareAndroid\#android-arm64 /data/local/tmp/compare_models
# Run the benchmark binary
BENCH_CMD="/data/local/tmp/compare_models"
BENCH_CMD+=" --model=$PATH_TO_MODEL"
BENCH_CMD+=" --refmodel=$PATH_TO_REFERENCE_MODEL"
BENCH_CMD+=" --input_type=float --input_dims=$MODEL_INPUT_SIZE"
BENCH_CMD+=" --iter=100"
BENCH_CMD+=" --tolerance 1e-5"
```
Reviewed By: beback4u
Differential Revision: D31276862
fbshipit-source-id: 1d9abf958963da6ecad641202f0458402bee5ced
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45364
Also adds more comments about the usage, limitations, and drawbacks.
Test Plan: Build and run benchmark binary.
Reviewed By: gchanan
Differential Revision: D23944193
fbshipit-source-id: 30d4f4991d2185a0ab768d94c846d73730fc0835
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42006
This PR introduces a simple CPU caching allocator. It is specifically
intended for mobile use cases and for inference. Nothing in the
implementation prevents it from being used elsewhere, but its simplicity
may not be suitable everywhere.
It simply tracks allocations by size and relies on deterministic,
repeatable behavior where allocations of the same sizes are made on every
inference.
Thus, after the first allocation, when a pointer is freed it is cached by
the allocator for subsequent use instead of being returned to the system.
Memory is freed automatically at the end of the process, or it can be
freed explicitly.
At the moment this is enabled only in DefaultMobileCPUAllocator.
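For intuition, here is a simplified sketch of the size-keyed caching idea described above; the real allocator in PyTorch differs in details (thread safety, profiler integration, etc.):
```
// Sketch only: cache freed blocks keyed by their allocation size so that the
// repeated same-size allocations of the next inference can reuse them.
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

class SimpleCachingAllocator {
 public:
  void* allocate(std::size_t size) {
    auto& bucket = free_blocks_[size];
    if (!bucket.empty()) {
      void* ptr = bucket.back();  // reuse a cached block of the same size
      bucket.pop_back();
      return ptr;
    }
    void* ptr = std::malloc(size);
    allocation_sizes_[ptr] = size;
    return ptr;
  }

  // "free" caches the block instead of returning it to the system.
  void free(void* ptr) {
    free_blocks_[allocation_sizes_.at(ptr)].push_back(ptr);
  }

  // Explicitly release all cached blocks back to the system.
  void release_cached() {
    for (auto& kv : free_blocks_) {
      for (void* ptr : kv.second) {
        allocation_sizes_.erase(ptr);
        std::free(ptr);
      }
      kv.second.clear();
    }
  }

 private:
  std::unordered_map<std::size_t, std::vector<void*>> free_blocks_;  // size -> cached blocks
  std::unordered_map<void*, std::size_t> allocation_sizes_;          // ptr -> its size
};
```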
Test Plan:
android test: cpu_caching_allocator_test
Imported from OSS
Reviewed By: dreiss
Differential Revision: D22726976
fbshipit-source-id: 9a38b1ce34059d5653040a1c3d035bfc97609e6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39076
Adds a `--vulkan` argument to run the torch benchmark on the Vulkan backend.
If it is true, inputs are converted to the Vulkan backend before `module.forward` is called.
Usage for mobilenetv2 fp32:
```
/build/bin/speed_benchmark_torch --model=mn-fp32.pt --input_type=float --input_dims=1,3,224,224 --warmup=1 --iter=5 --vulkan=true
```
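Conceptually, the flag makes the runner move its tensor inputs to the Vulkan backend before invoking the model; a rough sketch, assuming a `.vulkan()` conversion method on Tensor (not the exact benchmark code):
```
// Sketch only: convert the generated input to a Vulkan tensor, then run forward().
#include <torch/script.h>

c10::IValue runOnVulkan(torch::jit::script::Module& module, const at::Tensor& cpu_input) {
  std::vector<c10::IValue> inputs;
  inputs.emplace_back(cpu_input.vulkan());  // move the input to the Vulkan backend
  return module.forward(inputs);
}
```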
Test Plan: Imported from OSS
Differential Revision: D21962428
Pulled By: IvanKobzarev
fbshipit-source-id: 3136af5386b6bce9ea53ba4a9019af2d312544b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36765
We recently added support for bundling inputs with models. Now add
support to the benchmarker to use those inputs. This frees users from
having to look up the proper input format for each model.
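For context, bundled inputs are attached to the model itself (e.g. as a `get_all_bundled_inputs()` method added by `torch.utils.bundled_inputs`); a hedged sketch of how a runner might fetch the first one (the real flag handling and error messages differ):
```
// Sketch only: retrieve the first bundled input from the module.
#include <torch/script.h>

c10::IValue firstBundledInput(torch::jit::script::Module& module) {
  // get_all_bundled_inputs() returns a list; each element is a tuple of the
  // arguments that should be passed to forward().
  auto all_inputs = module.get_method("get_all_bundled_inputs")({}).toList();
  TORCH_CHECK(all_inputs.size() > 0, "Model has no bundled inputs");
  return all_inputs.get(0);
}
```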
Test Plan:
- Ran on a model without bundled inputs. Saw a clear error.
- Ran on a model with too few bundled inputs. Saw a clear error.
- Ran on a proper bundled input. Model executed.
Differential Revision: D21142659
Pulled By: dreiss
fbshipit-source-id: d23c1eb9d1de882345b007bf2bfbbbd6f964f6fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35279
Adds support for benchmarking self-contained PyTorch JIT models.
* By specifying the flag `--no_inputs=True`, the binary supports benchmarking a self-contained TorchScript model (the model runs without inputs, i.e. `model.forward()`); see the sketch below.
* This allows moving the data preparation step outside of this binary.
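A minimal sketch of what that mode amounts to (illustrative only, not the exact benchmark code):
```
// Sketch only: a self-contained model is run by calling forward() with no arguments.
#include <torch/script.h>

c10::IValue runSelfContained(torch::jit::script::Module& module) {
  std::vector<c10::IValue> no_inputs;  // empty: the model needs no inputs
  return module.forward(no_inputs);
}
```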
Reviewed By: kimishpatel
Differential Revision: D20585639
fbshipit-source-id: c28e50503534c90023c1430479d26f1c1ce740b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34598
as above
Test Plan:
test.txt
```
what time is it now
could you set a reminder at 7 am
waht is the weather today
```
example json
```
{
  "model": {
    "category": "CNN",
    "description": "Assistant Mobile Inference",
    "files": {
      "model": {
        "filename": "model.pt1",
        "location": "//everstore/GICWmAB2Znbi_mAAAB0P51IPW8UrbllgAAAP/model.pt1",
        "md5": "c0f4b29c442bbaeb0007fb0ce513ccb3"
      },
      "data": {
        "filename": "input.txt",
        "location": "/home/pengxia/test/input.txt",
        "md5": "c0f4b29c442bbaeb0007fb0ce513ccb3"
      }
    },
    "format": "pytorch",
    "framework": "pytorch",
    "kind": "deployment",
    "name": "Assistant Mobile Inference"
  },
  "tests": [
    {
      "command": "{program} --model {files.model} --input_dims \"1\" --input_type NLUType --warmup {warmup} --iter 5 --input_file {files.data} --report_pep true",
      "identifier": "{ID}",
      "metric": "delay",
      "iter": 15,
      "warmup": 2,
      "log_output": true
    }
  ]
}
```
iter = 5 (`--iter 5`) * 3 (3 lines in test.txt) = 15
arbabu123 I will provide a wrapper to compute the iter count in the future.
run following command
```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/fbnet/assistant_mobile_inference.json --platform android/full_jit --framework pytorch --remote --devices SM-G960U-8.0.0-26
```
results
https://our.intern.facebook.com/intern/aibench/details/275259559594003
**Note: this is compatible with the existing examples.**
Reviewed By: kimishpatel, ljk53
Differential Revision: D20389285
fbshipit-source-id: 80165ef394439a307ac7986cf540a80fdf3d85d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34556
According to
https://github.com/pytorch/pytorch/pull/34012#discussion_r388581548,
this `at::globalContext().setQEngine(at::QEngine::QNNPACK);` call isn't
really necessary for mobile.
In Context.cpp, the last available QEngine is selected if the engine isn't
set explicitly. The OSS mobile prebuild should only include the QNNPACK
engine, so the default behavior should already be the desired behavior.
It makes a difference only when USE_FBGEMM is set, but that should be off
for both the OSS mobile build and the internal mobile build.
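Roughly, the default-selection behavior can be pictured as follows (a simplified sketch, not the exact Context.cpp code):
```
// Sketch only: if no engine was set explicitly, fall back to the last
// supported engine, which is QNNPACK in an OSS mobile prebuild.
#include <ATen/Context.h>
#include <c10/util/Optional.h>

at::QEngine effectiveQEngine(c10::optional<at::QEngine> explicitly_set) {
  if (explicitly_set.has_value()) {
    return *explicitly_set;
  }
  const auto& supported = at::globalContext().supportedQEngines();
  return supported.back();
}
```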
Test Plan: Imported from OSS
Differential Revision: D20374522
Pulled By: ljk53
fbshipit-source-id: d4e437a03c6d4f939edccb5c84f02609633a0698
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30285
PR #30144 introduced a custom build script to tailor the build to specific
models. It requires a list of all potentially used ops at build time.
Some JIT optimization passes can transform the IR by replacing
operators, e.g. the decompose pass can replace aten::addmm with aten::mm if
the coefficients are 1.
Disabling the optimization passes ensures that the list of ops we dump from
the model is the list of ops that are actually needed.
Test Plan: - rerun the test on PR #30144 to verify the raw list without aten::mm works.
Differential Revision: D18652777
Pulled By: ljk53
fbshipit-source-id: 084751cb9a9ee16d8df7e743e9e5782ffd8bc4e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28399
This is also to address issue #26764
It turns out it's incorrect to wrap the entire forward() call with the
NonVariableTypeMode guard, as some JIT passes have is_variable() checks and
can be triggered within the forward() call, e.g.:
jit/passes/constant_propagation.cpp
Since we now toggle NonVariableTypeMode per method/op call, we can
remove the guard around forward().
Test Plan: - With stacked PRs, verified it can load and run previously failed models.
Differential Revision: D18055850
Pulled By: ljk53
fbshipit-source-id: 3074d0ed3c6e05dbfceef6959874e5916aea316c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26911
Check if QNNPACK is present as a backend (it should always be present on mobile).
If it is present, set the backend to QNNPACK.
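A minimal sketch of that check (the benchmark binary's actual code may differ slightly):
```
// Sketch only: select QNNPACK if it is among the supported quantization engines.
#include <algorithm>
#include <ATen/Context.h>

void selectQnnpackIfAvailable() {
  const auto& engines = at::globalContext().supportedQEngines();
  if (std::find(engines.begin(), engines.end(), at::QEngine::QNNPACK) != engines.end()) {
    at::globalContext().setQEngine(at::QEngine::QNNPACK);
  }
}
```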
Test Plan:
Test on mobile
./speed_benchmark_torch --model mobilenet_quantized_scripted.pt --input_dims="1,3,224,224" --input_type=float --warmup=5 --iter 20 --print_output True
Imported from OSS
Differential Revision: D17613908
fbshipit-source-id: af96722570a0111f13d69c38ccca52416ea5e460
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25449
Currently Variable and Tensor are still not 100% merged. There are
various places in the ATen/TH codebase that assert the input type to be
Variable/Tensor.
Usually, when the input type is Variable, function calls are dispatched to the
corresponding generated VariableType methods, which convert the input
Variable type to Tensor type with "unpack()" before calling into LegacyTHFunctions
and then convert the result from Tensor type back to Variable type with "as_variable()".
However, when USE_STATIC_DISPATCH mode is enabled, function calls are no longer
dispatched to VariableType methods. This way, Variable inputs remain
Variable instances when they reach LegacyTHFunctions and fail the "checked_tensor_unwrap"
asserts. A couple of other asserts fail for similar reasons.
There are several options to address this problem with USE_STATIC_DISPATCH:
1. Wait until Variable and Tensor are fully merged, as planned in https://github.com/pytorch/pytorch/issues/23032;
2. Create Tensors instead of Variables upfront on the caller side (JIT);
3. Fix downstream asserts in ATen/TH to tolerate Variable inputs when AutoGrad is disabled.
Option 1 will still take some time; Option 2 was tried before and caused
a lot of problems; Option 3 needs to be handled case by case, as it can be
dangerous to remove asserts before the 100% merge happens.
After digging into it a bit more, it turns out NonVariableTypeMode not only controls how
calls are dispatched, but also controls the TensorImpl.is_variable() result. So the
problem can be addressed by:
1. Setting AutoNonVariableTypeMode right before calling forward();
2. Making sure all inputs/params are created as Variables, e.g.:
A. use torch::ones() to create the test input tensor instead of at::ones();
B. do not set AutoNonVariableTypeMode before the torch::jit::load() call.
This diff applies these changes to the speed benchmark to show how it works.
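Putting the two points together, the resulting pattern looks roughly like this (a sketch under the assumptions above, not the exact benchmark code):
```
// Sketch only: load the model and build inputs as Variables, then enable
// NonVariableTypeMode just around the forward() call.
#include <torch/script.h>

int main() {
  // Do NOT set AutoNonVariableTypeMode before torch::jit::load(): the module
  // and its parameters should be created as Variables.
  auto module = torch::jit::load("resnet.pb");

  // Use torch::ones() (creates a Variable) instead of at::ones().
  std::vector<c10::IValue> inputs;
  inputs.push_back(torch::ones({1, 3, 224, 224}));

  // Enable NonVariableTypeMode only for the forward() call itself.
  at::AutoNonVariableTypeMode non_var_type_mode(true);
  auto output = module.forward(inputs);
  (void)output;
  return 0;
}
```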
Test Plan:
- Build speed benchmark binary for Android:
```
./scripts/build_android.sh \
-DBUILD_BINARY=ON \
-DBUILD_CAFFE2_MOBILE=OFF \
-DUSE_STATIC_DISPATCH=ON \
-DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
-DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')
```
- Push binaries and model to Android device:
```
adb push build_android/bin/speed_benchmark_torch /data/local/tmp
adb push resnet.pb /data/local/tmp
```
- Run inference on device:
```
/data/local/tmp # ./speed_benchmark_torch --model=resnet.pb \
--input_dims="1,3,224,224" --input_type=float --print_output=true
```
Differential Revision: D17128567
Pulled By: ljk53
fbshipit-source-id: 58cc49ff35d21fefc906172cc3271f984eeb29f0