Summary: `IValue::toString()` creates a new `c10::intrusive_ptr` (similar to `std::shared_ptr`), and `->string()` immediately accesses it, incurring an atomic reference-count increment and decrement. We can skip both operations by calling `IValue::toStringRef()`.
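A minimal before/after sketch of the pattern (the call sites are illustrative, not the actual ones touched by this change):
```
#include <ATen/core/ivalue.h>
#include <string>

void before(const c10::IValue& iv) {
  // toString() materializes an intrusive_ptr<ConstantString>: one atomic
  // refcount increment on creation, one decrement when it goes out of scope.
  std::string s = iv.toString()->string();
  (void)s;
}

void after(const c10::IValue& iv) {
  // toStringRef() hands back a reference to the underlying string directly,
  // skipping the intrusive_ptr round trip entirely.
  const std::string& s = iv.toStringRef();
  (void)s;
}
```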
Test Plan: CI
Reviewed By: jaybean-dev
Differential Revision: D39605242
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85437
Approved by: https://github.com/jfix71
Summary:
Addresses the following build failure we hit in some of our internal build environments:
caffe2/torch/csrc/deploy/environment.h:60:5: error: ignoring return value of function declared with 'warn_unused_result' attribute [-Werror,-Wunused-result]
system(rmCmd.c_str());
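A minimal sketch of the kind of fix (the surrounding function is hypothetical; the point is that consuming the return value satisfies `-Wunused-result`):
```
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical call site: check the return value of system() instead of
// discarding it, so the 'warn_unused_result' attribute is satisfied.
void removeTempDirectory(const std::string& rmCmd) {
  int status = std::system(rmCmd.c_str());
  if (status != 0) {
    std::fprintf(stderr, "cleanup command exited with status %d\n", status);
  }
}
```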
Test Plan: buck build //caffe2/torch/...
Differential Revision: D39364411
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84862
Approved by: https://github.com/PaliC
We define specializations for pybind11-defined templates
(in particular, PYBIND11_DECLARE_HOLDER_TYPE), and consequently
it is important that these specializations *always* be #include'd
when making use of pybind11 templates whose behavior depends on
them; otherwise we can cause an ODR violation.
The easiest way to ensure that all the specializations are always
loaded is to designate a header (in this case, torch/csrc/util/pybind.h)
that ensures the specializations are defined, and then add a lint
to ensure this header is included whenever pybind11 headers are
included.
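The idea, as a minimal sketch (the real torch header declares more than this; the contents here are condensed for illustration):
```
// torch/csrc/util/pybind.h (condensed sketch, not the full header)
#pragma once
#include <pybind11/pybind11.h>
#include <c10/util/intrusive_ptr.h>

// The specialization every translation unit must agree on: teach pybind11
// to treat c10::intrusive_ptr<T> as a holder type. If one TU instantiates
// a pybind11 template with this visible and another TU without it, the two
// instantiations differ and we have an ODR violation.
PYBIND11_DECLARE_HOLDER_TYPE(T, c10::intrusive_ptr<T>, true);
```
The lint then just has to check that any file including pybind11 headers also includes this header.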
The existing grep linter didn't have enough knobs to do this
conveniently, so I added some features. I'm open to suggestions
for how to structure the features better. The main changes:
- Added an --allowlist-pattern flag, which turns off the grep lint
if some other line exists. This is used to stop the grep
lint from complaining about pybind11 includes if the util
include already exists.
- Added --match-first-only flag, which lets grep only match against
the first matching line. This is because, even if there are multiple
includes that are problematic, I only need to fix one of them.
We don't /really/ need this, but when I was running lintrunner -a
to fixup the preexisting codebase it was annoying without this,
as the lintrunner overall driver fails if there are multiple edits
on the same file.
I excluded any files that didn't otherwise have a dependency on
torch/ATen; this was mostly caffe2 and the valgrind wrapper compat
bindings.
Note the grep replacement is kind of crappy, but the clang-tidy lint
cleaned it up in most cases.
See also https://github.com/pybind/pybind11/issues/4099
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552
Approved by: https://github.com/albanD
Summary: This adds logs for usage of deploy and package. These can be used to track where it's being used in production so we can support it better.
Test Plan: no functional changes - existing tests
Reviewed By: PaliC
Differential Revision: D36258876
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77097
Approved by: https://github.com/PaliC
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76538
When running the example from the docs, I found that these steps were not working.
These are the updates necessary to get the example working.
Test Plan: n/a
Reviewed By: PaliC
Differential Revision: D35998155
fbshipit-source-id: d78bb2886f94889abae5a3af5239fcd306cd5e09
(cherry picked from commit 6893812efe7443b437ccafb7b1ff6bc7bd2e6670)
Summary:
This adds dummy metadata for frozen builtin packages when using `torch::deploy`. This is a bit hacky, but it allows the Hugging Face transformers library to be used within `torch::deploy`; transformers depends on `importlib.metadata.version` to detect whether torch is installed or not.
https://github.com/huggingface/transformers/blob/main/src/transformers/utils/import_utils.py#L49
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76211
Test Plan: Added `importlib.metadata.version("torch")` unit test
Reviewed By: kiukchung, PaliC
Differential Revision: D35834831
Pulled By: d4l3k
fbshipit-source-id: e58365e1ada69299adea96f0ca1fe211e092dd97
(cherry picked from commit c4b4152a24dcdf359503db2112a10a88633e67b6)
Summary:
**context**
GLIBCXX_USE_CXX11_ABI dictates whether the compiler uses the new (1) or old (0) ABI.
The new ABI defines strings as `std::__cxx11::string`.
The old ABI defines strings as `std::string`.
`GLIBCXX_USE_CXX11_ABI` defaults to 1 on most systems.
**the problem**
When I build pytorch, grpc, glog, and my server example from source, they are all built with an ABI of 1 (they look for `std::__cxx11::string`), since my system's default `GLIBCXX_USE_CXX11_ABI` is 1.
`pytorch` uses the `TORCH_CXX_FLAGS` variable to dictate the value of `GLIBCXX_USE_CXX11_ABI`. (https://www.internalfb.com/code/aros/[64134af5d4c7]/xros/third-party/caffe2/caffe2/cmake/TorchConfig.cmake.in?lines=167).
**Although `TORCH_CXX_FLAGS` would get set to `GLIBCXX_USE_CXX11_ABI=1`, when building torch deploy the variable would be unset, so torch deploy would be built with `GLIBCXX_USE_CXX11_ABI=0` no matter what.**
This leads to `undefined symbol` errors, because the server example (built with ABI 1) looks for strings in torch::deploy as `std::__cxx11::string`, and those symbols are nonexistent since all strings in torch::deploy are defined as `std::string`.
**solution**
Re-define the `CMAKE_CXX_FLAGS` and `TORCH_CXX_FLAGS` variables inside torch deploy's build to match pytorch's.
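A quick way to confirm which ABI a given translation unit sees is to test the libstdc++ macro; a minimal sketch (the program itself is illustrative):
```
#include <cstdio>
#include <string>

int main() {
#if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
  // std::string mangles as std::__cxx11::basic_string<...>
  std::printf("new ABI (_GLIBCXX_USE_CXX11_ABI=1)\n");
#else
  // std::string mangles as the pre-C++11 std::basic_string<...>
  std::printf("old ABI (_GLIBCXX_USE_CXX11_ABI=0)\n");
#endif
  return 0;
}
```
Compiling this with and without `-D_GLIBCXX_USE_CXX11_ABI=0` reproduces the mismatch: two objects built with different values cannot resolve each other's string symbols.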
Test Plan: Tested build in OSS and it works. No more undefined symbol errors due to `std::__cxx11::string`.
Reviewed By: PaliC
Differential Revision: D35694220
fbshipit-source-id: 678a9487a65dbc06b8b5b308d0e3714a85d84547
(cherry picked from commit 7f53b34b3cd479a209161e47187d4bf0507e6747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75461
This flag is needed in the OSS example and it wasn't clear that it was needed because it wasn't explicitly used when linking to torch deploy for the tests.
Since torch deploy builds the tests using `python setup.py develop`, Pytorch actually sets this flag in the `CMAKE_EXE_LINKER_FLAGS` variable somewhere along the build. I had to print out all the variables used in the pytorch build to realize that I did not have this flag set in my OSS torch deploy example.
I think having it explicit for the tests makes it clear which flags are actually necessary in an open source environment.
**What is -rdynamic?**
This flag (also known as `--export-dynamic` at the linker level) signals that the binary it is applied to should export its symbols to the dynamic symbol table. In doing so, shared libraries that are opened using `dlopen` (which is what torch deploy uses to launch subinterpreters: https://www.internalfb.com/code/fbsource/[ff6d5cfcc2b3]/xplat/caffe2/torch/csrc/deploy/deploy.cpp?lines=254) will be able to reference symbols defined in the modules that launched them.
Without this flag, the symbols from the `torch::deploy` library that the subinterpreters need remain `undefined`.
This leads to runtime errors like:
`/tmp/torch_XYZ: undefined symbol - torch::deploy::Etc.`
`/tmp/torch_XYZ` is a subinterpreter.
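A minimal sketch of the mechanism (the file and symbol names are hypothetical):
```
// main.cpp (build with: g++ -rdynamic main.cpp -ldl)
#include <dlfcn.h>
#include <cstdio>

// Defined in the executable. A library opened with dlopen() can only
// resolve this symbol if the executable exported it to the dynamic
// symbol table, which is exactly what -rdynamic does.
extern "C" void host_log(const char* msg) {
  std::printf("[host] %s\n", msg);
}

int main() {
  // plugin.so references host_log; without -rdynamic, loading it fails
  // with an "undefined symbol: host_log" error, just like the
  // subinterpreters failing to find torch::deploy symbols.
  void* handle = dlopen("./plugin.so", RTLD_LAZY);
  if (handle == nullptr) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return 1;
  }
  dlclose(handle);
  return 0;
}
```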
Test Plan:
Ran the tests in an OSS environment
`python torch/csrc/deploy/example/generate_examples.py`
`./build/bin/test_deploy`
Reviewed By: PaliC
Differential Revision: D35477135
fbshipit-source-id: 30bd2b9fadd36b2a32066a52cda5b746d597e99f
(cherry picked from commit efb8030d41c4f657820d0121c5a2de2fa2e0b240)
Summary:
As it should never be negative, should it?
Also, add `torch/csrc/deploy` to the list of clang-format checked folders (as they are internally)
Last but not least: clang-tidy correctly identifies `totNumModules <= SIZE_MAX / sizeof(struct _frozen) - 1` as an unneeded, always-true check (`totNumModules` is int32, while `SIZE_MAX` is uint64 and `sizeof(struct _frozen)` is less than 4Gb ;) )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74978
Reviewed By: suo, tugsbayasgalan
Differential Revision: D35261476
Pulled By: malfet
fbshipit-source-id: 8a3432d2d9e96ded3f08baee14ccb43d2635a67d
(cherry picked from commit 21f6c33166c8e4e16dcac0248cb9006f69e222a1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74705
As the comment says, libdl might not be separate because it may be subsumed into libc.
Test Plan:
1) existing tests
2) this is being sent out on top of platform010 migration for caffe2
Reviewed By: d4l3k, r-barnes
Differential Revision: D35117159
fbshipit-source-id: c4a6de7c3412db695509bd25d529658cdf785e3d
(cherry picked from commit 563919d4c5fd7a9cbdc03d24b1afc5b6a2c09cc8)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73676
For some reason https://github.com/pytorch/pytorch/pull/72637 ended up getting messed up during rebasing, so please refer to that PR for review history.
This PR creates a new workflow called `deploy-linux-xenial-cuda11.3-py3.7-gcc7` for torch::deploy tests.
For testing, go to https://www.torch-ci.com/pytorch/pytorch/pull/73676 and check that a build and test job occur with `deploy-linux-xenial-cuda11.3-py3.7-gcc7`.
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D34586702
Pulled By: PaliC
fbshipit-source-id: 5627cf4ff411a4a04030f8b7726f84af979da213
(cherry picked from commit df6dddebb9fe078a6053a31033b5a40cc742fcf3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74286
This diff replaces `c10::optional` with the implementation it's based on in https://github.com/akrzemi1/Optional
in order to help torch::deploy function without a dependency on `torch`
https://github.com/pytorch/pytorch/pull/74002
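A sketch of the decoupling idea (the namespace, alias, and include path are assumptions for illustration, not the actual change):
```
// akrzemi1/Optional is a single-header library that provides
// std::experimental::optional; vendoring it lets deploy code drop c10.
#include "optional.hpp"  // from https://github.com/akrzemi1/Optional

namespace multipy {
// Hypothetical alias: call sites migrate from c10::optional<T> to this.
template <typename T>
using optional = std::experimental::optional<T>;
} // namespace multipy
```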
Test Plan: buck test //caffe2/torch/csrc/deploy:test_deploy
Reviewed By: d4l3k
Differential Revision: D34907002
fbshipit-source-id: 93a3386f43d1c426f23c6dab5f898ed63b547a5c
(cherry picked from commit 182d9f70459f761aaa80a03cc68dd6cb9c07bfae)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74283
Remove `c10::errors` from torch::deploy and replace them with `multipy::errors`, which is effectively a wrapper around `std::runtime_error`.
Review history can be found at https://github.com/pytorch/pytorch/pull/73456
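In spirit, the replacement looks like this minimal sketch (the class and namespace names here are assumptions, not the actual multipy API):
```
#include <stdexcept>
#include <string>

namespace multipy {

// A thin std::runtime_error wrapper standing in for c10::Error, so
// torch::deploy can throw/catch without depending on c10.
class Error : public std::runtime_error {
 public:
  explicit Error(const std::string& msg) : std::runtime_error(msg) {}
};

} // namespace multipy

// Usage sketch:
//   throw multipy::Error("interpreter failed to initialize");
```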
Test Plan: buck test //caffe2/torch/csrc/deploy:test_deploy
Reviewed By: aivanou
Differential Revision: D34905174
fbshipit-source-id: 8883fc77dce66c489fa3fa9d14a71d1de1e0cc5f
(cherry picked from commit 7fffcdf93648e8141159fe7b1669644db4281bf4)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73456
Replaces `at::Error` with a simpler implementation of exceptions in order to reduce the dependency of torch::deploy on torch.
Note: Internal Testing / changes are still needed
Test Plan: Imported from OSS
Reviewed By: samdow
Differential Revision: D34868005
Pulled By: PaliC
fbshipit-source-id: c8bb1f7a2b169b5a8e3b63a697e0ced748a0524c
(cherry picked from commit 51b3763d16e74458a5cfb8e4d660806dea897617)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73085
I'm wrapping up the conversion of type comments to type annotations
in caffe2. The last remaining "bulk" codemod has test failures that
are hard for me to understand, so I'm going to submit PRs for each
module individually which makes it easier to see what's causing
problems.
All the codemods were produced via LibCST and then manually cleaned up.
Test Plan: Wait for github CI
Reviewed By: shannonzhu
Differential Revision: D34344276
fbshipit-source-id: f64edc13533a6f62fb278dd16fe68f74d89442a7
(cherry picked from commit 061c60e918169ac0006f73f27c4f2a7a83a76249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72836
Replacing increment-iterator loops with ranged loops. It allows loops such as `for(int i=0;i<10;i++)` to be expressed as `for(const auto i : c10::irange(10))`. This auto-types the loops and adds const-safety to the iteration variable.
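For concreteness, the conversion looks like this (a self-contained sketch; `c10/util/irange.h` provides `c10::irange`):
```
#include <c10/util/irange.h>
#include <iostream>

void before() {
  for (int i = 0; i < 10; i++) {
    std::cout << i << ' ';
  }
}

void after() {
  // i is deduced and const: the loop variable's type tracks the bound,
  // and the body cannot accidentally mutate it.
  for (const auto i : c10::irange(10)) {
    std::cout << i << ' ';
  }
}
```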
Reviewed By: albanD
Differential Revision: D34136539
fbshipit-source-id: 760a70ad43ce6f05630ba8fea261d4dbb699e62e
(cherry picked from commit 0428408d88)
Summary:
`include_directories` is old-style CMake which adds the include path to every file being compiled. This instead makes `python`, `numpy` and `pybind11` into targets that only `torch_python` and `caffe2_pybind_state` are linked to. So, python libraries can't be accidentally included elsewhere.
Resubmit of https://github.com/pytorch/pytorch/issues/65654, Closes https://github.com/pytorch/pytorch/issues/65828
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69085
Reviewed By: anjali411
Differential Revision: D33776456
Pulled By: malfet
fbshipit-source-id: 018b0f6cd5a4f8c9e36df961deff832bc4afd479
(cherry picked from commit 57063107d6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68945
This PR enables the Python conversion functions for `Storage` (specifically `UntypedStorage`) and also cleans up some remnants of the deprecated typed storages from `DynamicTypes.cpp`.
ghstack-source-id: 147245110
Test Plan: Run the existing unit and integration tests.
Reviewed By: albanD
Differential Revision: D32676505
fbshipit-source-id: 3a3f6db4fb0da5c78dd406c96ab70bdc37015521
(cherry picked from commit d6427b94cf)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71072
This PR replaces the old logic of loading frozen torch through CPython with directly loading zipped torch modules into the deploy interpreter. We embed the zip file as a section of the ELF file and load it back in the interpreter executable. Then we insert the zip file's path into sys.path of each initialized interpreter. Python's built-in zipimport machinery can load modules from a zip file as long as it is on sys.path.
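A condensed sketch of the sys.path insertion step using the CPython C API (the helper is illustrative and error handling is omitted; the actual interpreter wiring is more involved):
```
#include <Python.h>

// Prepend the embedded zip archive's path to sys.path so CPython's
// built-in zipimport machinery can import modules straight from it.
void addZipToSysPath(const char* zipPath) {
  PyObject* sysPath = PySys_GetObject("path");  // borrowed reference
  PyObject* entry = PyUnicode_FromString(zipPath);
  PyList_Insert(sysPath, 0, entry);
  Py_DECREF(entry);
}
```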
Test Plan: buck test //caffe2/torch/csrc/deploy:test_deploy
Reviewed By: shunting314
Differential Revision: D32442552
fbshipit-source-id: 627f0e91e40e72217f3ceac79002e1d8308735d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71197
Adds back legacy support for the embedded interpreter to use the .data section in internal use cases. Specifically, this allows for dynamic loading of python extension files.
Test Plan: buck test mode/opt //caffe2/torch/csrc/deploy:test_deploy_gpu_legacy
Reviewed By: shunting314
Differential Revision: D33542636
fbshipit-source-id: b49f94163c91619934bc35595304b9e84d0098fc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70326
See D24145988 for context: it allows loops such as `for(int i=0;i<10;i++)` to be expressed as `for(const auto i : c10::irange(10))`. This is nice because it auto-types the loops and adds const-safety to the iteration variable.
Test Plan: buck run //caffe2/torch/fb/sparsenn:test
Reviewed By: r-barnes
Differential Revision: D33243400
fbshipit-source-id: b1f1b4163f4bf662031baea9e5268459b40c69a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69251
This adds some actual documentation for deploy, which is probably useful
since we told everyone it was experimentally available so they will
probably be looking at what the heck it is.
It also wires up various components of the OSS build to actually work
when used from an external project.
Differential Revision: D32783312
Test Plan: Imported from OSS
Reviewed By: wconstab
Pulled By: suo
fbshipit-source-id: c5c0a1e3f80fa273b5a70c13ba81733cb8d2c8f8
Summary:
Previously I needed to back out D32220626 and then apply D31841609 to run the textray unity demo, which made it hard for other people to take a look at it.
I copied the textray demo (a single file) from the pytext folder to the unity folder and applied the changes needed. This way, other people can also run the textray demo. This also makes my dev environment cleaner.
Test Plan: buck run mode/opt :textray_demo
Reviewed By: mleshen
Differential Revision: D32537190
fbshipit-source-id: 5df6347c4bec583c225aea9f98fbc9f37b5d3153
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67814
Previously there was a limitation on the size of the xar file we can embed into the binary. The payload (the xar file here) is added to the .data section by default using the 'ld -b binary -r' command (which section the payload goes into is hardcoded in ld, BTW; see the code pointer [here](https://github.com/bminor/binutils-gdb/blob/binutils-2_32/bfd/binary.c#L80)). When we link the object file containing the payload into the rest of the executable, we get a relocation-out-of-range error if the overall size of the .text, .data, .bss, etc. sections exceeds 2G. Some relocation entries use a 32-bit signed integer, hence the 2G limit.
To solve the issue and mitigate the risk, we designed a mechanism to put the payload in a customized payload section (.torch_deploy_payload.unity here). The payload section does not take part in relocation and symbol resolution, so in theory it can be as large as the disk allows... Since we don't do relocation for the payload section, the start/end/size symbols are no longer available/valid, so we have to parse the ELF file ourselves to figure those out.
The mechanism can be used to embed interpreter.so as well. interpreter.so is currently 0.5G, which would limit the other .text/.data/.bss sections of the executable to at most 1.5G. Using this mechanism in this diff keeps interpreter.so from consuming any of that budget. We could also use this mechanism to ship python scripts with our binary rather than freezing them beforehand. These use cases are not handled in this diff.
This diff also improves the experience for simple use cases that do not depend on extra shared libraries in the XAR file (except the shared libraries for the python extensions themselves). This is mainly to fix the stress test right now, but it also makes other simple cases easier.
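A condensed sketch of the section lookup (assumptions: a 64-bit ELF already mapped into memory at `base`; error handling omitted):
```
#include <elf.h>
#include <cstdint>
#include <cstring>

// Walk the section headers to find a named section, e.g.
// ".torch_deploy_payload.unity". Because the payload section is not
// relocated, no start/end/size symbols exist; the ELF headers are the
// only source of truth for where the payload lives.
const Elf64_Shdr* findSection(const uint8_t* base, const char* name) {
  auto* ehdr = reinterpret_cast<const Elf64_Ehdr*>(base);
  auto* shdrs = reinterpret_cast<const Elf64_Shdr*>(base + ehdr->e_shoff);
  const char* shstrtab = reinterpret_cast<const char*>(
      base + shdrs[ehdr->e_shstrndx].sh_offset);
  for (int i = 0; i < ehdr->e_shnum; ++i) {
    if (std::strcmp(shstrtab + shdrs[i].sh_name, name) == 0) {
      return &shdrs[i];  // sh_offset/sh_size give the payload's location
    }
  }
  return nullptr;
}
```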
ghstack-source-id: 142483327
Test Plan:
# Verify the relocation out of range issue is fixed
Add //caffe2:torch as a dependency to the macro build_unity(name="example", …) in torch/csrc/deploy/unity/TARGETS and run 'buck run mode/opt :unity_demo'; this is expected to produce relocation errors like:
```
ld.lld: error:
caffe2/c10/util/intrusive_ptr.h:325:(.text._ZN11ska_ordered8detailv317sherwood_v3_tableISt4pairIN3c106IValueES4_ES4_NS3_6detail11DictKeyHashENS0_16KeyOrValueHasherIS4_S5_S7_EENS6_14DictKeyEqualToENS0_18KeyOrValueEqualityIS4_S5_SA_EESaIS5_ESaINS0_17sherwood_v3_entryIS5_EEEE15emplace_new_keyIS5_JEEES2_INSH_18templated_iteratorIS5_EEbEaPSF_OT_DpOT0_+0x4E9): relocation R_X86_64_32S out of range: 2345984168 is not in [-2147483648, 2147483647]; references c10::UndefinedTensorImpl::_singleton
>>> defined in /data/sandcastle/boxes/fbsource/fbcode/buck-out/opt/gen/caffe2/c10/c10#platform009-clang,static/libc10.a(../c10#compile-UndefinedTensorImpl.cpp.o44c44c4c,platform009-clang/core/UndefinedTensorImpl.cpp.o)
```
With the diff, the error above is resolved.
# Pass Stress Test
Also pass existing unit tests for unity.
buck test mode/opt //caffe2/torch/csrc/deploy/unity/tests:test_unity_sum -- --exact 'caffe2/torch/csrc/deploy/unity/tests:test_unity_sum - UnityTest.TestUnitySum' --run-disabled --jobs 18 --stress-runs 10 --record-results
buck test mode/opt //caffe2/torch/csrc/deploy/unity/tests:test_unity_simple_model -- --exact 'caffe2/torch/csrc/deploy/unity/tests:test_unity_simple_model - UnityTest.TestUnitySimpleModel' --run-disabled --jobs 18 --stress-runs 10 --record-results
# Verify debug sections are not messed up
Verified that debug sections are not messed up and GDB still works:
`gdb ~/fbcode/buck-out/gen/caffe2/torch/csrc/deploy/unity/unity_demo`
```
b main
run
l
c
```
Reviewed By: suo
Differential Revision: D32159644
fbshipit-source-id: a133513261b73551a71acc257f4019f7b5af34a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67134
This diff demos torch::deploy unity which builds the model, the dependencies and the runtime as a unity!
The end user only needs to replace the python_binary rule with the build_unity rule to define the python application. Under the hood, we build the python application (an xar file), build the torch deploy runtime, and then embed the python application (the xar file) into the torch deploy runtime.
When the torch::deploy runtime starts, the xar is written to the filesystem and extracted. We put the extracted path on python sys.path so all the model files and all the python dependencies can be found!
As a demo, the model here is just a simple python program using numpy and scipy. But theoretically, it can be as complex as we want.
I'll check how bento_kernel works. Maybe we can learn from bento_kernel to simplify things a bit.
ghstack-source-id: 142085742
Test Plan:
```
#build
buck build mode/opt unity:unity
# make sure the path exists before we start torch::deploy runtime
# Otherwise the dynamic loader will just skip this non-existing path
# even though we create it after the runtime starts.
mkdir -p /tmp/torch_deploy_python_app/python_app_root
#run
LD_LIBRARY_PATH=/tmp/torch_deploy_python_app/python_app_root ~/fbcode/buck-out/gen/caffe2/torch/csrc/deploy/unity/unity
```
Reviewed By: suo
Differential Revision: D31816526
fbshipit-source-id: 8eba97952aad10dcf1c86779fb3f7e500773d7ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67499
Since https://github.com/pytorch/pytorch/pull/62030 landed, storages produced when loading from a pickle are of type TypedStorage. We weren't catching this in our deploy serialization, leading tensors to actually get pickled instead of their storages being shared across interpreters.
Since this is still technically correct, it wasn't caught by any of our tests until someone tried to pass a really big tensor and started OOMing.
ghstack-source-id: 141869521
Test Plan: added unit test
Reviewed By: shunting314
Differential Revision: D32004075
fbshipit-source-id: ef5a80cd3cb1dff0b6b4c1b6c95923e4faab7d50