Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65968
tryToGraphFunction() should cover all cases and more composable than
adhoc virtual methods.
ghstack-source-id: 141759214
Test Plan: no behavior change.
Reviewed By: gmagogsfm
Differential Revision: D31326154
fbshipit-source-id: 692a35df424f7d4f777a96489c4cbb24b3ae7807
Summary:
Adds mixed precision autocasting support between fp32/fp16 to torchscript/JIT. More in depth descriptoin can be found at [torch/csrc/jit/JIT-AUTOCAST.md](https://github.com/pytorch/pytorch/pull/63939/files#diff-1f1772aaa508841c5bb58b74ab98f49a1e577612cd9ea5c386c8714a75db830b)
This PR implemented an autocast optimization pass that inserts casting ops per AMP rule (torch/csrc/jit/passes/autocast.cpp), that mimics the behavior of eager autocast. The pass also takes into consideration the context of `torch.cuda.amp.autocast` and only inserts casting ops within the enabled context manager, giving feature parity as with eager amp autocast.
We currently provide JIT AMP autocast as a prototyping feature, so it is default off and could be turned on via `torch._C._jit_set_autocast_mode(True)`
The JIT support for autocast is subject to different constraints compared to the eager mode implementation (mostly related to the fact that TorchScript is statically typed), restriction on the user facing python code is described in doc torch/csrc/jit/JIT-AUTOCAST.md
This is a prototype, there are also implementation limitation that's necessary to keep this PR small and get something functioning quickly on upstream, so we can iterate on designs.
Few limitation/challenge that is not properly resolved in this PR:
1. Autocast inserts cast operation, which would have impact on scalar type of output tensor feeding downstream operations. We are not currently propagating the updated scalar types, this would give issues/wrong results on operations in promotion rules.
2. Backward for autodiff in JIT misses the casting of dgrad to input scalar type, as what autograd does in eager. This forces us to explicitly mark the casting operation for certain operations (e.g. binary ops), otherwise, we might be feeding dgrad with mismatch scalar type to input. This could potentially break gradient function consuming dgrad. (e.g. gemm backwards, which assumes grad_output to be of same scalar type as input')
3. `torch.autocast` api has an optional argument `dtype` which is not currently supported in the JIT autocast and we require a static value.
Credit goes mostly to:
tlemo
kevinstephano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63939
Reviewed By: navahgar
Differential Revision: D31093381
Pulled By: eellison
fbshipit-source-id: da6e26c668c38b01e296f304507048d6c1794314
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65967
Graph is an implementation detail. If user wants to get access to the
underlying graph, they should be able to explicitly dynamic cast instead.
ghstack-source-id: 141659819
Test Plan: no behavior change.
Reviewed By: gmagogsfm
Differential Revision: D31326153
fbshipit-source-id: a0e984f57c6013494b92a7095bf5bb660035eb84
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.
I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop
# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
-j \
-s \
-k \
-v \
--paths torch/csrc/ \
-g"-torch/csrc/jit/passes/onnx/helper.cpp" \
-g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
-g"-torch/csrc/jit/serialization/onnx.cpp" \
-g"-torch/csrc/jit/serialization/export.cpp" \
-g"-torch/csrc/jit/serialization/import.cpp" \
-g"-torch/csrc/jit/serialization/import_legacy.cpp" \
-g"-torch/csrc/onnx/init.cpp" \
-g"-torch/csrc/cuda/nccl.*" \
-g"-torch/csrc/cuda/python_nccl.cpp" \
-g"-torch/csrc/autograd/FunctionsManual.cpp" \
-g"-torch/csrc/generic/*.cpp" \
-g"-torch/csrc/jit/codegen/cuda/runtime/*" \
-g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
-g"-torch/csrc/deploy/interpreter/interpreter.h" \
-g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
-g"-torch/csrc/deploy/interpreter/test_main.cpp"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649
Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.
Reviewed By: walterddr, janeyx99
Differential Revision: D29504258
Pulled By: 1ntEgr8
fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e
Summary:
By default, TorchScript execution is single threaded and uses the caller's thread pool. For the use case of distributed inference, we hope there is a way to customize the behavior where the interpreter in torch script can be executed in other places. This diff allows an explicit taskLauncher for torchscript interpreter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46865
Test Plan:
unit test is passed.
fbshipit-source-id: 1d7b003926c0d1f8facc53206efb960cff8897ac
Fixes #{issue number}
Reviewed By: houseroad
Differential Revision: D24616102
Pulled By: garroud
fbshipit-source-id: 79202b62f92d0b0baf72e4bf7aa3f05e0da91d59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36093
Unwrap any tuples (including NamedTuples) in the module forward
function input list to be arglist.
1. Supports multiple tuple inputs, and traces their use through CallMethods and
TupleIndex
2. Does not unwrap inner use of other tuples that did not show up in the
original toplevel graph inputs
We work from the ScriptModule level instead of the Graph level because:
1. If the ScriptModule was previously called with the original set of inputs, the GraphExecutor caches the ExecutionPlan (specifically, ArgumentSpecCreator is derived from the Graph and type check the inputs passed in)
2. Since we are changing this graph's inputs, we clone the module and clear the GraphExecutor.
Since we work from ScriptModule level, we cannot take advantage of jit level syntactic sugar like run_pass(), so I jit exposed this as a cpp extension. Let me know if there are other ideas about this.
Test Plan:
buck test caffe2/torch/fb/model_transform:signature_translation_test
Todo: Verify use in bento
Untranslated graph:
```
> graph(%self : __torch__.test_jit.SparseNNWrapper,
> %inputs.1 : NamedTuple(dense : Tensor, sparse : Dict(int, Tensor))):
> %2 : __torch__.test_jit.SparseNN = prim::GetAttr[name="main_module"](%self)
> %4 : Tensor = prim::CallMethod[name="forward"](%2, %inputs.1) # /data/users/ansha/fbsource/fbcode/buck-out/dev/gen/caffe2/test/jit#binary,link-tree/test_jit.py:12141:23
> return (%4)
```
Translated graph:
```
> graph(%self : __torch__.test_jit.___torch_mangle_1.SparseNNWrapper,
> %inputs.1_0 : Tensor,
> %inputs.1_1 : Dict(int, Tensor)):
> %2 : __torch__.test_jit.___torch_mangle_2.SparseNN = prim::GetAttr[name="main_module"](%self)
> %3 : Tensor = prim::CallMethod[name="forward"](%2, %inputs.1_0, %inputs.1_1) # /data/users/ansha/fbsource/fbcode/buck-out/dev/gen/caffe2/test/jit#binary,link-tree/test_jit.py:12141:23
> return (%3)
```
Reviewed By: houseroad
Differential Revision: D20313673
fbshipit-source-id: fddd07c9537dc8b6f480a14d697bea10ecc74470
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34109
This change adds glue to GraphExecutor to give the RPC server
access to the future-based Interpreter::runAsync() api.
Previously, if a server encounted a TorchScript continuation-based block
with fork/wait, it would simply block in the server thread until the handler
completed, since it uses the synchronous Interpreter::run() api.
With the ivalue::Future returned by the Interpreter, we can run the
TorchScript code asynchronously from c++ simply by connecting its
callback to the server callback.
We add test cases to cover the new logic, both rpc_async and remote.
ghstack-source-id: 101245438
Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc/...
Differential Revision: D20194321
fbshipit-source-id: 16785ec5d9ed0b16cb1ffab0a9771a77de30fcb0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34710
Extending RecordFunction API to support new recording scopes (such as TorchScript functions), as well as giving more flexibility to set sampling rate.
Test Plan: unit test (test_misc.cpp/testRecordFunction)
Reviewed By: gdankel, dzhulgakov
Differential Revision: D20158523
fbshipit-source-id: a9e0819d21cc06f4952d92d43246587c36137582
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34623
The bandaid of "AT_WARN" keeps introducing new warnings. Let's get rid
of it entirely.
Close#34502
Test Plan: Imported from OSS
Differential Revision: D20420112
Pulled By: albanD
fbshipit-source-id: 7160c113cb4deb2d2f50a375356f423fe5e86f50
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33921
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.intern.facebook.com/intern/diff/D20153092/)!
Test Plan: Imported from OSS
Differential Revision: D20177227
Pulled By: jamesr66a
fbshipit-source-id: 87f3e484c4f873d60f76f50f6789c1b4a73bdfde