Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65861
First in a series. This PR changes the identifiers in deploy.h/cpp and
interpreter_impl.h/cpp from snake_case to camelCase. Starting with these
files since they have the most impact on downstream users.
Test Plan: Imported from OSS
Reviewed By: shannonzhu
Differential Revision: D31291183
Pulled By: suo
fbshipit-source-id: ba6f74042947c9a08fb9cb3ad7276d8dbb5b2934
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64620
The `autograd` extension module's shutdown logic destructs `PyThreadState` via the RAII pattern of `pybind11::gil_scoped_acquire`.
The problem is that torch.deploy also destructs `PyThreadState` as part of its shutdown process (https://www.internalfb.com/phabricator/paste/view/P456363738), causing a double destruction, i.e. a use-after-free.
This change extends the existing special treatment for `IS_PYTHON_3_9_PLUS` with a `defined(USE_DEPLOY)` case, so that `PyThreadState` is not destructed here under torch.deploy.
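A minimal sketch of the resulting guard (macro spellings and structure approximate, not the exact PyTorch source):
```cpp
#include <pybind11/pybind11.h>

// Sketch only: the real logic lives in the autograd shutdown path.
// Under torch.deploy, PyThreadState teardown is skipped here because
// the deploy runtime destroys the thread state itself when it shuts
// the interpreter down; destroying it in both places is a double free.
void shutdown_autograd_python_state() {
#if IS_PYTHON_3_9_PLUS || defined(USE_DEPLOY)
  // No-op: CPython 3.9+ finalization, or torch.deploy's interpreter
  // teardown, destructs the thread state itself.
#else
  // RAII: gil_scoped_acquire's destructor releases the thread state it
  // created, which is exactly the destruction the guard above avoids.
  pybind11::gil_scoped_acquire gil;
#endif
}
```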
Test Plan: Added `TorchpyTest.Autograd` unittest to ensure that torch.deploy can create multiple instances that use autograd without causing a crash.
Reviewed By: albanD
Differential Revision: D30779080
fbshipit-source-id: 4de3283cc2d394acc9b8141c17cacbfab5eea052
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63918
Previously we were building with `USE_DISTRIBUTED` off, because c10d was built as a separate library for historical reasons. Since then, lw has merged the c10d build into libtorch, so this is fairly easy to turn on.
Differential Revision: D30492442
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.intern.facebook.com/intern/diff/D30492442/)!
Test Plan: added a unit test
Reviewed By: wconstab
Pulled By: suo
fbshipit-source-id: 843b8fcf349a72a7f6fcbd1fcc8961268690fb8c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63380
Crosses off some more of #62011; see the test in the stacked PR #63381.
Test Plan: Imported from OSS
Reviewed By: malfet, seemethere
Differential Revision: D30455843
Pulled By: driazati
fbshipit-source-id: d473545d05ffa0b2476968f0b1c55f3a16a2c755
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62669
Useful to avoid having to implement null checking on the application side.
Test Plan: Add unit tests
Reviewed By: suo, houseroad
Differential Revision: D30074406
fbshipit-source-id: 881aec735953b43cb24786c1a2d79e8e724928b8
Summary:
Some refactoring of the custom loader logic:
* Make sure we unregister frames when they are deleted, so that future exceptions do not attempt to read unallocated memory (see the sketch below)
* rename linker -> loader, which better describes what it does
* move the build of the loader into the deploy library, since it can be shared across interpreters
* unify the logic for finding the library symbol across OSS and fbcode
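A minimal sketch of the unregister-on-delete idea, using the standard libgcc/libunwind hooks (the real loader's bookkeeping is more involved):
```cpp
// Unwinder hooks provided by libgcc/libunwind for dynamically loaded code.
extern "C" void __register_frame(void* eh_frame);
extern "C" void __deregister_frame(void* eh_frame);

// RAII owner: registering the .eh_frame section on load and
// deregistering it on delete guarantees a later exception unwind never
// reads frame tables that point into unmapped library memory.
struct FrameRegistration {
  void* eh_frame_;
  explicit FrameRegistration(void* eh_frame) : eh_frame_(eh_frame) {
    __register_frame(eh_frame_);
  }
  ~FrameRegistration() {
    __deregister_frame(eh_frame_);
  }
  FrameRegistration(const FrameRegistration&) = delete;
  FrameRegistration& operator=(const FrameRegistration&) = delete;
};
```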
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62223
Test Plan: Imported from OSS
Reviewed By: wconstab
Differential Revision: D29922002
Pulled By: zdevito
fbshipit-source-id: b7f8ee5812e29a5d098fcf1bd9f4cea7d30ecb4c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58117
Previously it was not possible to load C extension modules with deploy, because extension
modules need to link against the Python.h API functions. Since
each libtorchdeploy_interpreter.so has its own copy of these functions, there is no way
to tell dlopen to resolve symbols in a loaded SO against one of these libraries without exposing
its symbols globally.
This patch adds a custom ELF loader that attaches C extension libraries
to the Python API of the interpreter that loaded the shared library. Simple use of the numpy and regex modules appears to work.
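For intuition, a heavily simplified sketch of the resolution order (conceptual only; the real loader processes ELF relocations itself rather than calling dlsym):
```cpp
#include <dlfcn.h>
#include <string>

// Resolve an extension module's undefined symbol: prefer the private
// Python C-API copy inside the interpreter .so that is doing the
// loading, then fall back to globally visible symbols (libc, etc.).
void* resolve_symbol(void* interpreter_handle, const std::string& name) {
  if (void* sym = dlsym(interpreter_handle, name.c_str())) {
    return sym;
  }
  return dlsym(RTLD_DEFAULT, name.c_str());
}
```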
This diff has some limitations:
* 64-bit Linux only. macOS and Windows use different formats for shared libraries. 32-bit ELF files are not supported.
* Debug info is not immediately available to debuggers. A script for lldb is provided which can be loaded
so that lldb knows about the libraries as they are loaded.
* Shared libraries can directly use the Python API, but libraries they depend on
(via DT_NEEDED entries in their dynamic segment) may not use Python. In the future, we can
try to detect whether a sub-library uses the Python API and load it with our custom loader.
* TLS initialization and library initialization may occur in a different order than they would with dlopen,
potentially leading to issues running destructors in TLS segments. Use of this C++ feature is relatively rare.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D28435305
Pulled By: zdevito
fbshipit-source-id: 10f046053dd1d250e3c73f2cce8eb945eeba31b6
Summary:
The GoogleTest `TEST` macro is non-compliant with the `cppcoreguidelines-avoid-non-const-global-variables` check, as is `DEFINE_DISPATCH`, so the corresponding suppressions are dropped.
All changes but the ones to `.clang-tidy` were generated using the following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h" \
    | xargs grep cppcoreguidelines-avoid-non-const-global-variables \
    | cut -f1 -d: | sort | uniq`; do
  sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i
done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61680
This diff enables torch deploy for fx.graph_module with non-torch dependencies. The following issues currently prevent this and are fixed in this change:
- Pickle is used as the internal format to transmit objects between interpreters. It needs to serialize Python code, but to get the source code for imports from python_code.globals it needs access to the PackageImporter. Currently a regular `__reduce__` function is used, which has no notion of a custom importer.
- When deserializing pickled objects on an interpreter, empty globals are passed to exec, so it cannot resolve non-torch imports located in the package. We need to be able to point exec at our custom PackageImporter.
- Subclasses extending fx.graph_module should be able to optionally provide their own Tracer (extending fx.Tracer).
As a solution, a new reducer (`__reduce_deploy__`) is introduced for the torch deploy workflow. The reducer is registered in `_deploy.py` (the entry point for the C++ torch deploy API) when saving an object to transmit between interpreters. It lets us pass a proper PackageImporter to each interpreter for pickling/unpickling fx.graph_module, and it also defines an API for passing a custom fx.Tracer when needed.
Test Plan:
Added UT to cover changes.
```
buck test //caffe2/torch/csrc/deploy:test_deploy
```
```
buck test caffe2/test:fx
```
Reviewed By: suo
Differential Revision: D29690088
fbshipit-source-id: 3a8dbe02d5d7e085534aa61b7773c86f0f8c19b0
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.
I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop
# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
-j \
-s \
-k \
-v \
--paths torch/csrc/ \
-g"-torch/csrc/jit/passes/onnx/helper.cpp" \
-g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
-g"-torch/csrc/jit/serialization/onnx.cpp" \
-g"-torch/csrc/jit/serialization/export.cpp" \
-g"-torch/csrc/jit/serialization/import.cpp" \
-g"-torch/csrc/jit/serialization/import_legacy.cpp" \
-g"-torch/csrc/onnx/init.cpp" \
-g"-torch/csrc/cuda/nccl.*" \
-g"-torch/csrc/cuda/python_nccl.cpp" \
-g"-torch/csrc/autograd/FunctionsManual.cpp" \
-g"-torch/csrc/generic/*.cpp" \
-g"-torch/csrc/jit/codegen/cuda/runtime/*" \
-g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
-g"-torch/csrc/deploy/interpreter/interpreter.h" \
-g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
-g"-torch/csrc/deploy/interpreter/test_main.cpp"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649
Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.
Reviewed By: walterddr, janeyx99
Differential Revision: D29504258
Pulled By: 1ntEgr8
fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58290
This adds a helper function to get some Python source code loaded
on each interpreter without having to use the standard import system
or packages. Useful for debugging or for writing wrapper classes that
handle loaded modules.
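A hypothetical usage sketch (the function name and signature here are illustrative, not necessarily what the PR adds):
```cpp
#include <torch/csrc/deploy/deploy.h>

int main() {
  torch::deploy::InterpreterManager manager(2);
  // Hypothetical call: push a small helper module, as source text, into
  // every interpreter, bypassing the import system and torch.package.
  manager.register_module_source(
      "debug_helpers",
      "def describe(x):\n"
      "    return type(x).__name__\n");
  return 0;
}
```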
Test Plan: Imported from OSS
Reviewed By: wconstab
Differential Revision: D28435306
Pulled By: zdevito
fbshipit-source-id: b85c16346b9001cd7350d65879cb990098060813
Summary:
Switches most of the simple for loops outside of `jit` directories to use `c10::irange`.
Generated with D28874212.
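For reference, the transformation looks like this (illustrative only):
```cpp
#include <c10/util/irange.h>
#include <cstdint>

void example(int64_t n) {
  // Before: for (int64_t i = 0; i < n; ++i) { ... }
  for (const auto i : c10::irange(n)) {
    (void)i; // loop body unchanged by the conversion
  }
}
```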
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59481
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D28909681
fbshipit-source-id: ec9ab1bd602933238d9d0f73d4d8d027b75d9d85
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58933
**Summary**
This commit makes load_library calls no-ops inside packages run with
deploy. Libraries containing custom C++ operators and classes are statically linked in C++
and don't need to be loaded. This commit takes advantage of the fact that sys.executable is
set to torch_deploy under deploy, and uses that to exit early from load_library when
the program is running inside deploy.
**Test Plan**
This commit adds a test to `generate_examples`/`test_deploy` that
packages and runs a function that calls `load_library`. The library
doesn't exist, but that's okay because the function should be a no-op
anyway.
Test Plan: Imported from OSS
Reviewed By: Lilyjjo
Differential Revision: D28687159
Pulled By: SplitInfinity
fbshipit-source-id: 4a61fc636698e44f204334e338c5ce35257e7ae2
Summary:
Fix is simple; alias inputs before feeding them to distinct
torchdeploy interpreters.
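A minimal sketch of the idea, assuming the aliasing is done with `at::alias` (same storage, fresh TensorImpl):
```cpp
#include <ATen/ATen.h>

// Each torchdeploy interpreter must bind its own PyObject to a tensor,
// so hand every interpreter its own alias: the storage is shared, but
// the TensorImpl is distinct and free for that interpreter to tag.
at::Tensor input_for_interpreter(const at::Tensor& input) {
  return at::alias(input);
}
```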
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Fixes https://github.com/pytorch/pytorch/issues/58832
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58871
Reviewed By: wconstab, zou3519
Differential Revision: D28646784
Pulled By: ezyang
fbshipit-source-id: 6d2850f3226b5b99468d1465723b421ce4d7ab89
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57985
Fixes https://github.com/pytorch/pytorch/issues/57756
This PR introduces a new `pyobj_interpreter_` field on TensorImpl which tracks what Python interpreter (if any) owns the TensorImpl. This makes it illegal to bind a TensorImpl from multiple Python interpreters, and means that we can now directly store a PyObject pointer on TensorImpl even in the presence of multiple Python interpreters, as is the case in torchdeploy. This is a necessary step for PyObject preservation, which cannot be easily implemented when there are multiple Python interpreters.
Although the PR is not that long, there is a very subtle portion of the implementation devoted to ensuring that the tagging process is thread safe, since multiple threads can concurrently try to tag a PyObject. Check Note [Python interpreter tag] and Note [Memory ordering on Python interpreter tag] for detailed discussion of how this is handled. You will have to check this code carefully in code review; I did not torture test the multithreaded paths in any meaningful way.
In a follow-up PR, I will pack the interpreter and PyObject fields into a single atomic word on 64-bit.
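A simplified sketch of the tagging scheme (the authoritative version, including the memory-ordering rationale, is in the Notes referenced above):
```cpp
#include <atomic>
#include <stdexcept>

struct PyInterpreter; // opaque: one Python interpreter in the process

struct TaggedImpl {
  std::atomic<PyInterpreter*> pyobj_interpreter_{nullptr};

  // First interpreter to touch the tensor claims it; concurrent claims
  // are resolved by compare-and-swap, and a claim by a *different*
  // interpreter on an already-tagged tensor is an error.
  void tag(PyInterpreter* self) {
    PyInterpreter* expected = nullptr;
    if (pyobj_interpreter_.compare_exchange_strong(
            expected, self, std::memory_order_acq_rel) ||
        expected == self) {
      return; // we claimed it, or it was already ours
    }
    throw std::runtime_error(
        "TensorImpl is already bound to another Python interpreter");
  }
};
```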
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: wconstab
Differential Revision: D28390242
Pulled By: ezyang
fbshipit-source-id: a6d9b244ee6b9c7209e1ed185e336297848e3017
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58412
Second try: avoid ctor/dtor handling this time, as it is pointless
if the rethrow will still terminate(), and it upsets -Werror=terminate.
Original commit changeset: 1775bed18269
Test Plan: existing unit tests and CI
Reviewed By: suo
Differential Revision: D28478588
fbshipit-source-id: 84191cecc3ef52e23f11bfea07bbb9773ebc5df4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58192
Exceptions thrown by deploy internals need to be sanitized
for application safety.
See the comment in deploy.h for a detailed explanation.
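A sketch of the pattern (the comment in deploy.h is authoritative; the names here are illustrative):
```cpp
#include <stdexcept>
#include <string>

// Wrap internal calls at the deploy API boundary: whatever the embedded
// interpreter throws, the application only ever sees a plain
// std::runtime_error whose type it can safely catch and destroy.
template <typename Fn>
auto run_sanitized(Fn&& fn) -> decltype(fn()) {
  try {
    return fn();
  } catch (const std::exception& e) {
    throw std::runtime_error(std::string("torch::deploy: ") + e.what());
  } catch (...) {
    throw std::runtime_error("torch::deploy: unknown internal error");
  }
}
```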
Test Plan: Added unit test
Reviewed By: suo
Differential Revision: D28371127
fbshipit-source-id: c0ced2f194424a394c5852bd4ab5cb41b0f4e87b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57748
To be used by PyTorchPredictor integration for deploy.
Original commit changeset: 4d41efc733b2
Test Plan: tested via new unit tests
Reviewed By: suo
Differential Revision: D28258525
fbshipit-source-id: 8b9436e47501d7c1c16e79909e668100f825711e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57484
To be used by PyTorchPredictor integration for deploy.
Test Plan: tested via new unit tests
Reviewed By: suo
Differential Revision: D28154522
fbshipit-source-id: 5ba57a8d7f01686180e6fd47663635ec3ab2120d
Summary:
In my last PR I missed the CUDA and distributed folders; fixing this now.
This change is autogenerated by `python tools/clang_tidy.py -s`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57235
Reviewed By: janeyx99
Differential Revision: D28084444
Pulled By: malfet
fbshipit-source-id: bf222f69ee90c7872c3cb0931e8cdb84f0cb3cda
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53670
This puts deploy into the torch::deploy namespace. It also renames some
objects to better match their behavior:
* PythonObject -> Obj: in the future it will refer to either a Python object or a handle to a script object, so the generic name torch::deploy::Obj fits.
* MovableObject -> ReplicatedObj: this prevents confusion with the unrelated std::move, and notes that we are replicating this object across interpreters.
Test Plan: Imported from OSS
Reviewed By: wconstab
Differential Revision: D26932131
Pulled By: zdevito
fbshipit-source-id: 8041d6c5b2041a7c3192c1a17d2edb38112a89f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51754
This API allows you to manage multiple Python interpreters in a single
process to deploy PyTorch models packaged with torch.package.
torch/csrc/deploy/deploy.h contains the API definition
torch/csrc/deploy/test_deploy.cpp has some examples.
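A hedged usage sketch distilled from the description above (test_deploy.cpp is authoritative; method names and signatures here are approximate, and the API was later renamed to camelCase):
```cpp
#include <torch/csrc/deploy/deploy.h>
#include <torch/torch.h>
#include <vector>

int main() {
  // A pool of independent Python interpreters living in one process.
  torch::deploy::InterpreterManager manager(4);

  // Load a model that was packaged with torch.package.
  auto package = manager.load_package("my_model.pt");
  auto model = package.load_pickle("model", "model.pkl");

  // Run the model; concurrent callers execute on whichever
  // interpreter is free.
  std::vector<torch::jit::IValue> inputs{torch::ones({2, 2})};
  auto output = model(inputs).toTensor();
  return 0;
}
```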
Notes:
* A mutex is added to PyTorchStreamReader to make it safe to use from multiple threads at once.
* USE_DEPLOY is only true for the special libtorch_deployinterpreter.so library. When enabled,
we use a hash table to maintain the PyObject <-> at::Tensor mapping rather than the internal pointer
in Tensor, since more than one interpreter may have a reference to the tensor.
* serialization.py gains some additional functions for creating pickle objects
while keeping storages in memory, for use in transferring tensors between interpreters.
Test Plan: Imported from OSS
Reviewed By: wconstab
Differential Revision: D26329468
Pulled By: zdevito
fbshipit-source-id: d75f4ebb9a27f1d911179d9996041bcb3ca04a07