Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72995
Add the ability to specify which input dimensions need to be dynamic.
Example: if dimension 115 can be dynamic in the input sizes "1,115;1", then specify dynamic_dims as "115" (see the sketch below).
Also recompile and update the CI models and some assembly code, since the old ones no longer compile after the compiler changes in context.cpp.
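A minimal sketch of the matching implied by the example above, assuming the dynamic_dims entries are compared against the dimension values; the helper below is hypothetical, not the actual aot_model_compiler code:
```
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Parse a comma-separated dynamic_dims flag like "115" into a set.
std::unordered_set<int64_t> parseDynamicDims(const std::string& flag) {
  std::unordered_set<int64_t> dims;
  std::stringstream ss(flag);
  for (std::string item; std::getline(ss, item, ',');) {
    dims.insert(std::stoll(item));
  }
  return dims;
}

int main() {
  const auto dynamic = parseDynamicDims("115");
  // Input sizes "1,115;1" parsed into two inputs.
  const std::vector<std::vector<int64_t>> input_sizes = {{1, 115}, {1}};
  for (const auto& sizes : input_sizes) {
    for (int64_t d : sizes) {
      std::cout << d << (dynamic.count(d) ? " (dynamic)" : "") << " ";
    }
    std::cout << "\n";
  }
  return 0;
}
```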
Test Plan: Compiles and runs the BI Bytedoc model with and without dynamic inputs.
Reviewed By: ZolotukhinM
Differential Revision: D34233121
fbshipit-source-id: 35095e549ebd6d3bec98b9abb3f0764366a0ff6f
(cherry picked from commit 33166a9f9ac9194b5df0a35280b57708df255ebd)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72758
Bug: The FLAGS_output_llvm option was recently introduced to specify the file for the LLVM assembly code. Since it no longer has the previous default value, the LLVM code is not saved to a file when the asmfile input is not specified, which makes the compiled output unusable.
Fix: Use a default value when the output_llvm/asmfile input is not specified.
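A minimal sketch of the fix, assuming the default path is derived from the model file (the helper name is hypothetical; the real logic lives in aot_model_compiler.cc):
```
#include <iostream>
#include <string>

// Fall back to a default .ll path derived from the model path when the
// output_llvm/asmfile flag is empty.
std::string getLlvmOutputPath(
    const std::string& output_llvm,
    const std::string& model_path) {
  if (!output_llvm.empty()) {
    return output_llvm; // user-specified path wins
  }
  const auto dot = model_path.rfind('.');
  const std::string stem =
      dot == std::string::npos ? model_path : model_path.substr(0, dot);
  return stem + ".compiled.ll";
}

int main() {
  std::cout << getLlvmOutputPath("", "bi.pt") << "\n"; // bi.compiled.ll
  return 0;
}
```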
Test Plan: Verified that the output is saved to the default .ll file path
Reviewed By: IvanKobzarev
Differential Revision: D34189107
fbshipit-source-id: ee51e8c17de92d3045690ca871fb9569fc3164d6
(cherry picked from commit 46352d446b)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68751
Add an option to get input dtypes from the user for AOT compilation
Test Plan:
BI model compiles and runs fine
```
(pytorch) ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ buck run //caffe2/binaries:aot_model_compiler -- --model=bi.pt --model_name=pytorch_dev_bytedoc --model_version=v1 '--input_dims=1,115;1' --input_types='int64;int64'
Building... 8.3 sec (99%) 7673/7674 jobs, 0/7674 updated
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1116 14:32:44.632536 1332111 TensorImpl.h:1418] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
E1116 14:32:44.673710 1332111 huge_pages_allocator.cc:287] Not using huge pages because not linked with jemalloc
The compiled llvm assembly code was saved to bi.compiled.ll
The compiled model was saved to bi.compiled.pt
```
> An error is thrown when the numbers of input_dims and input_types entries don't match
```
(pytorch) ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ buck run //caffe2/binaries:aot_model_compiler -- --model=bi.pt --model_name=pytorch_dev_bytedoc --model_version=v1 '--input_dims=1,115;1' --input_types='int64;int64;int64'
.
.
terminate called after throwing an instance of 'c10::Error'
what(): [enforce fail at aot_model_compiler.cc:208] split(';', FLAGS_input_dims).size() == split(';', FLAGS_input_types).size(). Number of input_dims and input_types should be the same
.
.
.
```
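The check itself is a one-liner; below is a self-contained approximation (the real binary uses an enforce macro, as the error text above shows, so this sketch mirrors it with an uncaught exception):
```
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Same shape as the helper in the enforce message above:
// split(';', FLAGS_input_dims).size() == split(';', FLAGS_input_types).size()
std::vector<std::string> split(char separator, const std::string& s) {
  std::vector<std::string> pieces;
  std::stringstream ss(s);
  for (std::string piece; std::getline(ss, piece, separator);) {
    pieces.push_back(piece);
  }
  return pieces;
}

int main() {
  const std::string input_dims = "1,115;1";            // two inputs
  const std::string input_types = "int64;int64;int64"; // three dtypes -> error
  if (split(';', input_dims).size() != split(';', input_types).size()) {
    // Uncaught, so the process terminates, like the c10::Error above.
    throw std::runtime_error(
        "Number of input_dims and input_types should be the same");
  }
  return 0;
}
```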
Reviewed By: ljk53
Differential Revision: D32477001
fbshipit-source-id: 8977b0b59cf78b3a2fec0c8428f83a16ad8685c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67229
Right now, the assembly code generated for a given method from the model is named `wrapper` or `func` by default. The function name is then replaced with the proper kernel_func_name after the target-specific assembly is generated.
This PR propagates the desired kernel_func_name from the aotCompile API so that the generated function already carries the required name and does not need to be renamed later.
Note: Most of this change was landed in https://github.com/pytorch/pytorch/pull/66337, which had to be reverted because it broke `test_profiler` in `test_jit_fuser_te` by replacing the name generated for the graph with the default kernel_func_name value. This PR fixes that as well.
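A contrived, self-contained sketch of the difference (none of these names are real NNC APIs, and the kernel name below is hypothetical): before, the emitted function got a default name that was string-replaced afterwards; now the desired name is threaded through from the start.
```
#include <iostream>
#include <string>

// Emit the function header with the caller-chosen kernel name directly,
// rather than a default like "wrapper"/"func" that is patched afterwards.
std::string emitKernelHeader(const std::string& kernel_func_name) {
  return "define void @" + kernel_func_name + "(i8** %args)";
}

int main() {
  // Old behavior: emit with a default name, then string-replace it.
  std::string header = emitKernelHeader("func");
  header.replace(header.find("func"), 4, "pytorch_dev_bytedoc_forward");
  // New behavior: the desired name is passed in from the start.
  std::cout << emitKernelHeader("pytorch_dev_bytedoc_forward") << "\n";
  std::cout << header << "\n"; // both headers are identical
  return 0;
}
```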
```
(pytorch) ~/local/pytorch kname
└─ $ python3 test/test_jit_fuser_te.py
CUDA not available, skipping tests
monkeytype is not installed. Skipping tests for Profile-Directed Typing
........................................<string>:3: UserWarning: torch.cholesky is deprecated in favor of torch.linalg.cholesky and will be removed in a future PyTorch release.
L = torch.cholesky(A)
should be replaced with
L = torch.linalg.cholesky(A)
and
.
.
.
......................<string>:3: UserWarning: torch.symeig is deprecated in favor of torch.linalg.eigh and will be removed in a future PyTorch release.
The default behavior has changed from using the upper triangular portion of the matrix by default to using the lower triangular portion.
L, _ = torch.symeig(A, upper=upper)
should be replaced with
L = torch.linalg.eigvalsh(A, UPLO='U' if upper else 'L')
and
L, V = torch.symeig(A, eigenvectors=True)
should be replaced with
L, V = torch.linalg.eigh(A, UPLO='U' if upper else 'L') (Triggered internally at ../aten/src/ATen/native/BatchLinearAlgebra.cpp:2492.)
......[W pybind_utils.cpp:35] Warning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (function operator())
/data/users/priyaramani/pytorch/torch/testing/_internal/common_utils.py:403: UserWarning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (Triggered internally at ../torch/csrc/jit/python/pybind_utils.h:691.)
return callable(*args, **kwargs)
.....................................................................[W Resize.cpp:23] Warning: An output with one or more elements was resized since it had shape [1], which does not match the required output shape [].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function resize_output_check)
[W Resize.cpp:23] Warning: An output with one or more elements was resized since it had shape [1, 5], which does not match the required output shape [5].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function resize_output_check)
........................................................................s.......s...s.s....s......s..sss............................
----------------------------------------------------------------------
Ran 503 tests in 37.536s
OK (skipped=10)
```
Test Plan: Imported from OSS
Reviewed By: navahgar, pbelevich
Differential Revision: D31945713
Pulled By: priyaramani
fbshipit-source-id: f2246946f0fd51afba5cb6186d9743051e3b096b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65967
Graph is an implementation detail. If a user wants access to the
underlying graph, they should explicitly dynamic_cast instead.
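A self-contained sketch of the resulting access pattern, with stand-in types in place of the real torch::jit classes:
```
#include <iostream>
#include <string>

// The base Function no longer exposes the graph, so callers that
// genuinely need it cast to the graph-backed subclass explicitly.
struct Function {
  virtual ~Function() = default;
};

struct GraphFunction : Function {
  std::string graph() const { return "graph(%self, %x): ..."; }
};

void inspect(const Function& fn) {
  if (const auto* gf = dynamic_cast<const GraphFunction*>(&fn)) {
    std::cout << gf->graph() << "\n"; // explicit opt-in to the detail
  }
}

int main() {
  GraphFunction gf;
  inspect(gf);
  return 0;
}
```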
ghstack-source-id: 141659819
Test Plan: no behavior change.
Reviewed By: gmagogsfm
Differential Revision: D31326153
fbshipit-source-id: a0e984f57c6013494b92a7095bf5bb660035eb84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67209
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67198
Fix a couple of instances where parameters were named method_compile_spec when they were actually compile_specs, which can contain multiple method_compile_specs.
Also use the output dtype from the buffer.
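A contrived sketch of the naming distinction (the types here are illustrative stand-ins, not the real spec dictionaries):
```
#include <map>
#include <string>

// e.g. {"sizes": "1,3,224,224", "types": "float"} for one method.
using MethodCompileSpec = std::map<std::string, std::string>;
// A compile_spec holds one MethodCompileSpec per method name.
using CompileSpec = std::map<std::string, MethodCompileSpec>;

int main() {
  CompileSpec compile_spec;
  compile_spec["forward"] = {{"sizes", "1,3,224,224"}, {"types", "float"}};
  // Only a single entry of compile_spec is a method_compile_spec, which
  // is the distinction the renamed parameters now reflect.
  const MethodCompileSpec& method_compile_spec = compile_spec.at("forward");
  (void)method_compile_spec;
  return 0;
}
```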
Test Plan:
Mobilenetv3 compiles and runs fine
```
(pytorch) ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ PYTORCH_JIT_LOG_LEVEL="aot_compiler" buck run //caffe2/binaries:aot_model_compiler -- --model mobilenetv3.pt --model_name=pytorch_dev_mobilenetv3 --model_version=v1 --input_dims="1,3,224,224"
Downloaded 4501/6195 artifacts, 433.89 Mbytes, 14.3% cache miss (for updated rules)
Building: finished in 06:34.6 min (100%) 20233/20233 jobs, 5467/20233 updated
Total time: 06:35.0 min
BUILD SUCCEEDED
The compiled llvm assembly code was saved to mobilenetv3.compiled.ll
The compiled model was saved to mobilenetv3.compiled.pt
└─ $ ./compile_model.sh -m pytorch_dev_mobilenetv3 -p /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/mobilenetv3.pt -v v1 -i "1,3,224,224"
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL=pytorch_dev_mobilenetv3
.
.
Columns 961 to 970
1e-11 *
-4.2304 -3.9674 2.4473 -0.8664 -0.7513 1.2140 0.0010 3.8675 1.2714 2.2989
Columns 971 to 980
1e-11 *
-2.7203 1.6772 -0.7460 -0.6936 4.4421 -0.9865 -0.5186 -1.4441 1.3047 -1.6112
Columns 981 to 990
1e-11 *
0.1275 -1.8815 2.5105 -0.4871 -2.2342 0.8520 0.8658 1.6180 3.8901 -0.2454
Columns 991 to 1000
1e-11 *
-1.4896 4.1337 -2.6640 0.8226 0.2441 -1.4830 -1.7430 1.8758 0.5481 0.5093
[ CPUFloatType{1,1000} ]
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Milliseconds per iter: 276.255. Iters per second: 3.61984
Memory usage before main runs: 104366080 bytes
Memory usage after main runs: 343441408 bytes
Average memory increase per iter: 2.39075e+07 bytes
0 value means "not available" in above
```
Reviewed By: ljk53
Differential Revision: D31698338
fbshipit-source-id: da6c74c1321ec02e0652f3afe6f97bf789d3361b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66337
Right now, the assembly code generated for a given method from the model is named `wrapper` or `func` by default. The function name is then replaced with the proper kernel_func_name after the target-specific assembly is generated.
This PR propagates the desired kernel_func_name from the aotCompile API so that the generated function already carries the required name and does not need to be renamed later.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31514095
Pulled By: priyaramani
fbshipit-source-id: b70c8e2c733600a435cd4e8b32092d37b7bf7de5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65707
Refactor aotCompile to return a pair of the compiled function and the LLVM assembly, instead of updating an incoming string with the assembly code.
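A hedged sketch of the shape of the refactor (the signature below is an assumption, not the exact aotCompile declaration):
```
#include <iostream>
#include <string>
#include <utility>

struct CompiledFunction {}; // stand-in for the compiled kernel handle

// Before: the assembly came back through an out-parameter, e.g.
//   CompiledFunction aotCompile(..., std::string& llvm_asm_out);
// After: both results come back together as a pair.
std::pair<CompiledFunction, std::string> aotCompile() {
  CompiledFunction fn;
  std::string llvm_asm = "; ModuleID = 'pytorch_dev_mobilenetv3'";
  return {std::move(fn), std::move(llvm_asm)};
}

int main() {
  auto [fn, llvm_asm] = aotCompile();
  (void)fn;
  std::cout << llvm_asm << "\n";
  return 0;
}
```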
Testing: Gives expected results when compiled and run
```
(pytorch) ~/local/pytorch refactor_aot
└─ $ build/bin/aot_model_compiler --model mobilenetv3.pt --model_name=pytorch_dev_mobilenetv3 --model_version=v1 --input_dims="2,2,2"
The compiled model was saved to mobilenetv3.compiled.pt
```
Test Plan: Imported from OSS
Reviewed By: qihqi
Differential Revision: D31220452
Pulled By: priyaramani
fbshipit-source-id: f957c53ba83f876a2e7dbdd4b4571a760b3b6a9a