Commit Graph

15 Commits

Author SHA1 Message Date
Priya Ramani
ac97e953b4 Add dynamic shape support to AOT driver & compiler (#72995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72995

Add ability to specify input dimensions that need to be dynamic.
Example: if the dimension of size 115 in the input sizes "1,115;1" can be dynamic, specify dynamic_dims as "115".

Also recompiles and updates the CI models and some asm code, since the old ones no longer compile with the compiler changes in context.cpp.
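
A minimal sketch of the flag semantics described above, with all names illustrative rather than taken from the PyTorch source: a dimension is treated as dynamic when its size matches a value listed in dynamic_dims.

```
// Sketch only: mark input dims as dynamic by matching their size against
// the values given in dynamic_dims (e.g. sizes {1, 115} and dynamic 115).
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<bool> markDynamicDims(const std::vector<int64_t>& input_sizes,
                                  const std::vector<int64_t>& dynamic_sizes) {
  std::vector<bool> is_dynamic(input_sizes.size(), false);
  for (std::size_t i = 0; i < input_sizes.size(); ++i) {
    for (int64_t dyn : dynamic_sizes) {
      if (input_sizes[i] == dyn) {
        is_dynamic[i] = true;  // this dimension may vary at run time
      }
    }
  }
  return is_dynamic;
}
```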

Test Plan: Compiles and runs the BI Bytedoc model with and without dynamic inputs.

Reviewed By: ZolotukhinM

Differential Revision: D34233121

fbshipit-source-id: 35095e549ebd6d3bec98b9abb3f0764366a0ff6f
(cherry picked from commit 33166a9f9ac9194b5df0a35280b57708df255ebd)
2022-02-24 04:30:48 +00:00
Ivan Kobzarev
c32b74cecb [nnc][aot_compiler] Memory formats args to aot_compiler (#72873)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72873

Test Plan: Imported from OSS

Reviewed By: priyaramani

Differential Revision: D34250984

Pulled By: IvanKobzarev

fbshipit-source-id: e723ee64b024883eef78853e1b185b7040cafb09
(cherry picked from commit e9908df045)
2022-02-16 18:39:31 +00:00
Priya Ramani
444191de56 Use default value on empty llvm_code_path (#72758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72758

Bug: The FLAGS_output_llvm option was recently introduced to specify the LLVM assembly code file. Without the previous default value, the LLVM code is no longer saved to a file when the asmfile input is not specified, which makes the compiled output unusable.

Fix: Fall back to a default value when the output_llvm/asmfile input is not specified.
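
A hedged sketch of the fallback, with illustrative names (the actual flag handling lives in the compiler driver): derive a default ".compiled.ll" path from the model path when the flag is empty, matching the default paths seen in the transcripts elsewhere in this log.

```
// Sketch only: fall back to "<model>.compiled.ll" when no output path is
// given, mirroring paths like bi.compiled.ll / mobilenetv3.compiled.ll.
#include <cstddef>
#include <string>

std::string resolveLlvmCodePath(const std::string& output_llvm_flag,
                                const std::string& model_path) {
  if (!output_llvm_flag.empty()) {
    return output_llvm_flag;  // user-specified path wins
  }
  const std::size_t ext = model_path.rfind(".pt");
  const std::string prefix =
      ext == std::string::npos ? model_path : model_path.substr(0, ext);
  return prefix + ".compiled.ll";
}
```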

Test Plan: Verified that the output is saved to the default .ll file path.

Reviewed By: IvanKobzarev

Differential Revision: D34189107

fbshipit-source-id: ee51e8c17de92d3045690ca871fb9569fc3164d6
(cherry picked from commit 46352d446b)
2022-02-12 00:35:24 +00:00
Mikhail Zolotukhin
a60e2ae037 [TensorExpr] Move AOT compilation logic from aot_compiler.cpp to NNC's to_backend (#70375)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70375

Differential Revision: D33303645

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin, priyaramani

Pulled By: ZolotukhinM

fbshipit-source-id: 01ab9fab9bb0d63f89b06a146d3c5fb6ed7fe52d
(cherry picked from commit aac8e0ed90)
2022-02-02 02:34:55 +00:00
Priya Ramani
8cc9ec2f6b Add option to get input dtype from user (#68751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68751

Adds an option to get the input dtype from the user for AOT compilation.

Test Plan:
BI model compiles and runs fine
```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ buck run //caffe2/binaries:aot_model_compiler -- --model=bi.pt --model_name=pytorch_dev_bytedoc --model_version=v1 '--input_dims=1,115;1' --input_types='int64;int64'
Building... 8.3 sec (99%) 7673/7674 jobs, 0/7674 updated
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1116 14:32:44.632536 1332111 TensorImpl.h:1418] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
E1116 14:32:44.673710 1332111 huge_pages_allocator.cc:287] Not using huge pages because not linked with jemalloc
The compiled llvm assembly code was saved to bi.compiled.ll
The compiled model was saved to bi.compiled.pt
```

> An error is thrown when the numbers of input_dims and input_types don't match:

```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ buck run //caffe2/binaries:aot_model_compiler -- --model=bi.pt --model_name=pytorch_dev_bytedoc --model_version=v1 '--input_dims=1,115;1' --input_types='int64;int64;int64'
.
.
terminate called after throwing an instance of 'c10::Error'
  what():  [enforce fail at aot_model_compiler.cc:208] split(';', FLAGS_input_dims).size() == split(';', FLAGS_input_types).size(). Number of input_dims and input_types should be the same
.
.
.
```
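
A standalone rendering of the check behind that error, as a sketch: the real code uses an enforce macro in aot_model_compiler.cc, and the split() helper below is a stand-in for its own.

```
// Sketch: both semicolon-separated flags must describe the same number of
// inputs; otherwise compilation is aborted with the message shown above.
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

static std::vector<std::string> split(char sep, const std::string& s) {
  std::vector<std::string> out;
  std::stringstream ss(s);
  std::string tok;
  while (std::getline(ss, tok, sep)) out.push_back(tok);
  return out;
}

void checkInputFlags(const std::string& input_dims,
                     const std::string& input_types) {
  if (split(';', input_dims).size() != split(';', input_types).size()) {
    throw std::runtime_error(
        "Number of input_dims and input_types should be the same");
  }
}
```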

Reviewed By: ljk53

Differential Revision: D32477001

fbshipit-source-id: 8977b0b59cf78b3a2fec0c8428f83a16ad8685c5
2021-11-29 21:39:49 -08:00
Ivan Kobzarev
7fbcf79684 [tensorexpr][nnc] Support quantization (#66676)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66676

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31676329

Pulled By: IvanKobzarev

fbshipit-source-id: 288b41ff4ed603dfaacb465f296997f14bb23c22
2021-10-31 22:49:30 -07:00
Priya Ramani
fa70d72e95 Set kernel func name from AOT Compiler (#67229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67229

Right now, the assembly code generated for a given method of the model is named wrapper or func by default. The function name is then replaced with a proper kernel_func_name after the target-specific assembly is generated.
This PR propagates the desired kernel_func_name from the aotCompiler API so that the generated function gets the needed name up front and doesn't need to be renamed later.

Note: Most of this change landed in https://github.com/pytorch/pytorch/pull/66337, which had to be reverted because it broke `test_profiler` in `test_jit_fuser_te` by replacing the name generated for the graph with the default kernel_func_name value. This PR fixes that as well.
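
An illustrative sketch of the idea, with hypothetical names standing in for the NNC internals: codegen receives the final kernel name up front, so the symbol never needs a post-hoc rename from a default like "func" or "wrapper".

```
// Sketch only: emit the kernel under its final name directly.
#include <iostream>
#include <string>

std::string emitKernel(const std::string& kernel_func_name) {
  // The generated function is born with the requested symbol name...
  return "define void @" + kernel_func_name + "(...) { ... }";
}

int main() {
  // ...so no rename pass over the generated assembly is needed.
  std::cout << emitKernel("nnc_pytorch_dev_mobilenetv3_v1_forward") << "\n";
  return 0;
}
```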

```
(pytorch)  ~/local/pytorch kname
└─ $ python3 test/test_jit_fuser_te.py
CUDA not available, skipping tests
monkeytype is not installed. Skipping tests for Profile-Directed Typing
........................................<string>:3: UserWarning: torch.cholesky is deprecated in favor of torch.linalg.cholesky and will be removed in a future PyTorch release.
L = torch.cholesky(A)
should be replaced with
L = torch.linalg.cholesky(A)
and
.
.
.
......................<string>:3: UserWarning: torch.symeig is deprecated in favor of torch.linalg.eigh and will be removed in a future PyTorch release.
The default behavior has changed from using the upper triangular portion of the matrix by default to using the lower triangular portion.
L, _ = torch.symeig(A, upper=upper)
should be replaced with
L = torch.linalg.eigvalsh(A, UPLO='U' if upper else 'L')
and
L, V = torch.symeig(A, eigenvectors=True)
should be replaced with
L, V = torch.linalg.eigh(A, UPLO='U' if upper else 'L') (Triggered internally at  ../aten/src/ATen/native/BatchLinearAlgebra.cpp:2492.)
......[W pybind_utils.cpp:35] Warning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (function operator())
/data/users/priyaramani/pytorch/torch/testing/_internal/common_utils.py:403: UserWarning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (Triggered internally at  ../torch/csrc/jit/python/pybind_utils.h:691.)
  return callable(*args, **kwargs)
.....................................................................[W Resize.cpp:23] Warning: An output with one or more elements was resized since it had shape [1], which does not match the required output shape [].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function resize_output_check)
[W Resize.cpp:23] Warning: An output with one or more elements was resized since it had shape [1, 5], which does not match the required output shape [5].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function resize_output_check)
........................................................................s.......s...s.s....s......s..sss............................
----------------------------------------------------------------------
Ran 503 tests in 37.536s

OK (skipped=10)
```

Test Plan: Imported from OSS

Reviewed By: navahgar, pbelevich

Differential Revision: D31945713

Pulled By: priyaramani

fbshipit-source-id: f2246946f0fd51afba5cb6186d9743051e3b096b
2021-10-27 13:10:49 -07:00
Zhengxu Chen
b55a2500d2 [jit] Remove graph() call from abstract Function interface. (#65967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65967

Graph is an implementation detail. If a user wants access to the
underlying graph, they should explicitly dynamic cast instead.
ghstack-source-id: 141659819
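
A hedged sketch of the resulting caller-side pattern (GraphFunction is the concrete graph-backed subclass; the helper name here is illustrative):

```
// Sketch: downcast the abstract Function to GraphFunction explicitly
// instead of relying on a graph() method on the interface.
#include <memory>

#include <torch/csrc/jit/api/function_impl.h>

std::shared_ptr<torch::jit::Graph> tryGetGraph(torch::jit::Function& fn) {
  if (auto* gf = dynamic_cast<torch::jit::GraphFunction*>(&fn)) {
    return gf->graph();
  }
  return nullptr;  // not backed by a graph
}
```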

Test Plan: no behavior change.

Reviewed By: gmagogsfm

Differential Revision: D31326153

fbshipit-source-id: a0e984f57c6013494b92a7095bf5bb660035eb84
2021-10-27 11:54:26 -07:00
Priya Ramani
ecf7e96969 [Light] Remove ambiguity from compile_spec names, use actual output type (#67209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67209

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67198

Fixes a couple of instances where parameters were named method_compile_spec when they were actually compile_specs, which can contain multiple method_compile_specs.
Also uses the output dtype from the buffer.

Test Plan:
Mobilenetv3 compiles and runs fine
```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ PYTORCH_JIT_LOG_LEVEL="aot_compiler" buck run //caffe2/binaries:aot_model_compiler -- --model mobilenetv3.pt --model_name=pytorch_dev_mobilenetv3 --model_version=v1 --input_dims="1,3,224,224"
Downloaded 4501/6195 artifacts, 433.89 Mbytes, 14.3% cache miss (for updated rules)
Building: finished in 06:34.6 min (100%) 20233/20233 jobs, 5467/20233 updated
  Total time: 06:35.0 min
BUILD SUCCEEDED
The compiled llvm assembly code was saved to mobilenetv3.compiled.ll
The compiled model was saved to mobilenetv3.compiled.pt

└─ $ ./compile_model.sh -m pytorch_dev_mobilenetv3 -p /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/mobilenetv3.pt -v v1 -i "1,3,224,224"
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL=pytorch_dev_mobilenetv3
.
.
Columns 961 to 970
1e-11 *
-4.2304 -3.9674  2.4473 -0.8664 -0.7513  1.2140  0.0010  3.8675  1.2714  2.2989

Columns 971 to 980
1e-11 *
-2.7203  1.6772 -0.7460 -0.6936  4.4421 -0.9865 -0.5186 -1.4441  1.3047 -1.6112

Columns 981 to 990
1e-11 *
 0.1275 -1.8815  2.5105 -0.4871 -2.2342  0.8520  0.8658  1.6180  3.8901 -0.2454

Columns 991 to 1000
1e-11 *
-1.4896  4.1337 -2.6640  0.8226  0.2441 -1.4830 -1.7430  1.8758  0.5481  0.5093
[ CPUFloatType{1,1000} ]
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Milliseconds per iter: 276.255. Iters per second: 3.61984
Memory usage before main runs: 104366080 bytes
Memory usage after main runs: 343441408 bytes
Average memory increase per iter: 2.39075e+07 bytes
0 value means "not available" in above
```

Reviewed By: ljk53

Differential Revision: D31698338

fbshipit-source-id: da6c74c1321ec02e0652f3afe6f97bf789d3361b
2021-10-25 17:44:05 -07:00
Natalia Gimelshein
b6fa998892 Revert D31514095: Use kernel_func_name from aotCompiler
Test Plan: revert-hammer

Differential Revision:
D31514095 (7b55dc8340)

Original commit changeset: b70c8e2c7336

fbshipit-source-id: ad4d828f33506e612b51c276149fa0e12b0565d5
2021-10-23 17:17:53 -07:00
Priya Ramani
7b55dc8340 Use kernel_func_name from aotCompiler (#66337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66337

Right now, the assembly code generated for a given method of the model is named wrapper or func by default. The function name is then replaced with a proper kernel_func_name after the target-specific assembly is generated.
This PR propagates the desired kernel_func_name from the aotCompiler API so that the generated function gets the needed name up front and doesn't need to be renamed later.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31514095

Pulled By: priyaramani

fbshipit-source-id: b70c8e2c733600a435cd4e8b32092d37b7bf7de5
2021-10-23 02:20:45 -07:00
Priya Ramani
9e3a2babfa Make aotCompile support multiple input sizes (#66727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66727

Make aotCompile support multiple input sizes
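
As a sketch of the flag parsing this implies (assumed semantics, names illustrative): ';' separates inputs and ',' separates dimensions, so "2,2,2;2,2,2" describes two 2x2x2 inputs.

```
// Sketch only: parse an --input_dims string such as "2,2,2;2,2,2" into
// one size vector per input.
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::vector<int64_t>> parseInputDims(const std::string& flag) {
  std::vector<std::vector<int64_t>> all_sizes;
  std::stringstream per_input(flag);
  std::string one_input;
  while (std::getline(per_input, one_input, ';')) {
    std::vector<int64_t> sizes;
    std::stringstream per_dim(one_input);
    std::string dim;
    while (std::getline(per_dim, dim, ',')) {
      sizes.push_back(std::stoll(dim));
    }
    all_sizes.push_back(std::move(sizes));
  }
  return all_sizes;
}
```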

Test Plan:
Able to compile and run a model with multiple inputs
```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ PYTORCH_JIT_LOG_LEVEL=aot_compiler buck run //caffe2/binaries:aot_model_compiler -- --model aot_test_model.pt --model_name=aot_test_model --model_version=v1 --input_dims="2,2,2;2,2,2"
Building: finished in 3.2 sec (100%) 7461/7461 jobs, 0/7461 updated
  Total time: 3.4 sec
BUILD SUCCEEDED
[DUMP aot_compiler.cpp:097] graph before shape propagation
[DUMP aot_compiler.cpp:097] graph(%x.1 : Tensor,
[DUMP aot_compiler.cpp:097]       %y.1 : Tensor):
[DUMP aot_compiler.cpp:097]   %3 : int = prim::Constant[value=1]() # :0:0
[DUMP aot_compiler.cpp:097]   %4 : Tensor = aten::add(%x.1, %y.1, %3) # /data/users/priyaramani/fbsource/fbcode/caffe2/test/mobile/nnc/aot_test_model.py:10:15
[DUMP aot_compiler.cpp:097]   return (%4)
(1,.,.) =
  0.3357  0.6137
  0.8472  0.0858

(2,.,.) =
  0.8406  0.2959
  0.6012  0.7184
[ CPUFloatType{2,2,2} ]
(1,.,.) =
  0.7086  0.6398
  0.0579  0.1913

(2,.,.) =
  0.8598  0.3641
  0.5925  0.0200
[ CPUFloatType{2,2,2} ]
here
2
2
graph 0x6130001ee2d0
[DUMP aot_compiler.cpp:118] graph after shape propagation
[DUMP aot_compiler.cpp:118] graph(%x.1 : Float(2, 2, 2, strides=[4, 2, 1], requires_grad=0, device=cpu),
[DUMP aot_compiler.cpp:118]       %y.1 : Float(2, 2, 2, strides=[4, 2, 1], requires_grad=0, device=cpu)):
[DUMP aot_compiler.cpp:118]   %3 : int = prim::Constant[value=1]() # :0:0
[DUMP aot_compiler.cpp:118]   %4 : Tensor(2, 2, 2) = aten::add(%x.1, %y.1, %3) # /data/users/priyaramani/fbsource/fbcode/caffe2/test/mobile/nnc/aot_test_model.py:10:15
[DUMP aot_compiler.cpp:118]   return (%4)
The compiled llvm assembly code was saved to aot_test_model.compiled.ll
The compiled model was saved to aot_test_model.compiled.pt

└─ $ ./compile_model.sh -m aot_test_model -p /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt -v v1 -i "2,2,2;2,2,2"
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL=aot_test_model
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt
+ getopts m:p:v:i:h opt
+ case $opt in
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ INPUT_DIMS='2,2,2;2,2,2'
+ getopts m:p:v:i:h opt
+ require_arg m aot_test_model
+ '[' -n aot_test_model ']'
+ require_arg p /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt
+ '[' -n /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt ']'
+ require_arg i '2,2,2;2,2,2'
+ '[' -n '2,2,2;2,2,2' ']'
+ '[' '!' -f /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt ']'
+++ dirname ./compile_model.sh
++ cd .
++ pwd -P
+ SRC_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc
+ FBCODE_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../..
+ FBSOURCE_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../..
+ KERNEL_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../../xplat/pytorch_models/build/aot_test_model/v1/nnc
++ echo /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt
++ sed 's/.pt.*//'
+ MODEL_PATH_PREFIX=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model
+ LLVM_CODE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.ll
+ ASSEMBLY_CODE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.s
+ COMPILED_MODEL_FILE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.pt
+ KERNEL_FUNC_NAME=nnc_aot_test_model_v1_forward
+ cd /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../..
+ buck run //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc -- --model /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.pt --print_output true --input_dims '2,2,2;2,2,2' --input_type 'float;float' --input_memory_format 'contiguous_format;contiguous_format'
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]

Downloaded 1/4 artifacts, 2.11 Kbytes, 50.0% cache miss (for updated rules)
Building: finished in 12.2 sec (100%) 4572/4572 jobs, 3/4572 updated
  Total time: 12.2 sec
BUILD SUCCEEDED
Run with 56 threads
Run with 56 threads
Loading model...
Model loaded: /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.pt
Running forward ...
(1,.,.) =
 -0.7451 -0.7451
 -0.7451 -0.7451

(2,.,.) =
 -0.7451 -0.7451
 -0.7451 -0.7451
[ CPUFloatType{2,2,2} ]
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Milliseconds per iter: 0.0887. Iters per second: 11274
Memory usage before main runs: 71262208 bytes
Memory usage after main runs: 71573504 bytes
Average memory increase per iter: 31129.6 bytes
0 value means "not available" in above
```

Reviewed By: ljk53

Differential Revision: D31631975

fbshipit-source-id: 7956787b3e121f9c14f4733398a64c2f7ae84373
2021-10-16 20:04:52 -07:00
Priya Ramani
962c6476da Refactor: move method to func compilation work to compileMethod, add option to specify method name (#66726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66726

Moves the method-to-function compilation work into compileMethod.

Test Plan:
Mobilenetv3 compiles and runs successfully
```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ buck run //caffe2/binaries:aot_model_compiler -- --model mobilenetv3.pt --model_name=pytorch_dev_mobilenetv3 --model_version=v1 --input_dims="1,3,224,224"
Downloaded 0/4 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 13.2 sec (100%) 18719/18719 jobs, 2/18719 updated
  Total time: 13.5 sec
BUILD SUCCEEDED
The compiled llvm assembly code was saved to mobilenetv3.compiled.ll
The compiled model was saved to mobilenetv3.compiled.pt
```

Reviewed By: ljk53, IvanKobzarev

Differential Revision: D31624342

fbshipit-source-id: 233a6e94ea05ba8d6fc166d2414034c9e58cb076
2021-10-16 20:03:24 -07:00
Priya Ramani
63bb7c6dba Refactor AotCompile to return a pair (#65707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65707

Refactors aotCompile to return a pair of the compiled function and the LLVM assembly, instead of filling in an incoming string with the assembly code.
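
A minimal sketch of the refactored shape, with stand-in types (the real return type pairs the NNC compiled function with its assembly text):

```
// Sketch only: return the artifact and its LLVM assembly together,
// rather than writing into a caller-provided out-parameter.
#include <memory>
#include <string>
#include <utility>

struct CompiledFunction {};  // stand-in for the NNC compiled function

// Before (roughly): void aotCompile(..., std::string& llvm_asm_out);
// After (sketch):
std::pair<std::unique_ptr<CompiledFunction>, std::string> aotCompile() {
  auto fn = std::make_unique<CompiledFunction>();
  std::string llvm_asm = "; LLVM assembly for the compiled kernel";
  return {std::move(fn), std::move(llvm_asm)};
}
```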

Testing: Gives expected results when compiled and run
```
(pytorch)  ~/local/pytorch refactor_aot
└─ $ build/bin/aot_model_compiler --model mobilenetv3.pt --model_name=pytorch_dev_mobilenetv3 --model_version=v1 --input_dims="2,2,2"
The compiled model was saved to mobilenetv3.compiled.pt
```

Test Plan: Imported from OSS

Reviewed By: qihqi

Differential Revision: D31220452

Pulled By: priyaramani

fbshipit-source-id: f957c53ba83f876a2e7dbdd4b4571a760b3b6a9a
2021-09-27 18:56:04 -07:00
Priya Ramani
206646d6ed Add NNC AOT Compiler executable (#63994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63994

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30582149

Pulled By: priyaramani

fbshipit-source-id: 3bbf085428824c3cb308e006c18bb0a57f50fef6
2021-09-15 19:18:24 -07:00