Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73750
My intuition is that the delayed release of input and intermediate tensors may cause memory to accumulate. Especially in camera-based, memory-intensive situations, the run loop is full of all sorts of events, so the default `autoreleasepool` may not be efficient. The fix here is to manually wrap the prediction call inside an `autoreleasepool` to force intermediate objects to be released. Apple's doc - https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/MemoryMgmt/Articles/mmAutoreleasePools.html
ghstack-source-id: 150411705
Test Plan:
- CI
- Check the OOM data in QE
Reviewed By: dreiss
Differential Revision: D34605399
fbshipit-source-id: 413564d7ec560082a6572c5542e2b4da433ee62f
(cherry picked from commit ea2b613c16ffad329ccd463a44d6635db0681def)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71597
Problem: _jit_to_backend overrides get/set state. This means any attributes added to the module after lowering will not be preserved after serialization. For edge workflows, the biggest problem here is that it breaks bundled_inputs.
Solution?:
A quick and easy way to handle the issues with to_backend overriding get/set state: wrap the lowered module in another module and generate forwarding functions for the API specified in 'method_compile_spec'.
The tradeoff with this approach is that the actual workhorse of the module is now one layer deep, which might make debugging slightly messier/more difficult/confusing. The other approach Martin David and I talked about would be to only lower the portions that require custom get/set state logic. This leaves the top level the same, and only specific backend internals are changed. Personally, I'm not sure how well that really addresses the debugging concern. It seems like if you cracked the model open, you'd still run into a similar amount of confusion, with a lot of the referenced variables and logic coming from another module.
The other concern with this approach is whether or not 'compile_spec' specifies the public API of the module (since that's our source of truth for this wrapper). While it may not be enforced, it certainly seems to be true by convention, and the to_backend API already uses it as the source of truth for all functions that get generated in the resulting module. I say we formally commit to this (compile spec keys being functions) being the contract of the API instead of just assuming it to be the case and then having weird behavior when it's not.
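A rough C++ sketch of the wrapping idea, assuming a hypothetical helper and single-argument methods; the actual codegen in this diff may differ:
```cpp
#include <torch/csrc/jit/api/module.h>

#include <string>
#include <vector>

// Hypothetical helper: put the lowered module one level down and generate a
// forwarding method for each key of method_compile_spec, so attributes added
// to the outer module (e.g. bundled inputs) survive get/set state.
torch::jit::Module wrap_lowered_module(
    const torch::jit::Module& lowered,
    const std::vector<std::string>& compile_spec_keys) {
  torch::jit::Module wrapper("LoweredWrapper");
  wrapper.register_module("lowered", lowered);
  for (const auto& name : compile_spec_keys) {
    // Assumes single-input methods purely for illustration.
    wrapper.define(
        "def " + name + "(self, x):\n"
        "    return self.lowered." + name + "(x)\n");
  }
  return wrapper;
}
```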
Test Plan:
New Unit Test
CI to check for existing behavior and contracts.
manually tested in a notebook with bundled inputs.
{P475790313}
Reviewed By: raziel
Differential Revision: D33694257
fbshipit-source-id: 9ff27db421eba41bac083dff11a22e9e40a36970
(cherry picked from commit 91ef49977e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69438
We don't need to recompile the model if the OS version hasn't changed. This could save hundreds of milliseconds when loading the model.
{F683788183}
ghstack-source-id: 144784720
ghstack-source-id: 144821734
Test Plan:
1. Test in the playground app
2. Test in the ig
Reviewed By: hanton
Differential Revision: D32866326
fbshipit-source-id: ae2174f68dda4d2ab89ee328cb710c08d45c4d9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69234
We don't need to recompile the model if the OS version hasn't changed. This could save hundreds of milliseconds when loading the model.
{F683788183}
ghstack-source-id: 144784720
Test Plan:
1. Test in the playground app
2. Test in the ig
Reviewed By: hanton
Differential Revision: D32743881
fbshipit-source-id: 2e94c6035520de3eeaf0b61f7cf9082228c8a955
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67412
For inputs, we'll be using the shape from PyTorch tensors. For outputs, we'll be using the shape from MLMultiArray. Thus, we can decouple from the symbolic shapes defined in the compile spec.
ghstack-source-id: 141746346
Test Plan:
- Sandcastle
- buck test pp-ios
Reviewed By: hanton
Differential Revision: D31299408
fbshipit-source-id: 337d5bb9efc2ff51409586c288d607399b937212
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65967
Graph is an implementation detail. If a user wants access to the underlying graph, they should explicitly dynamic_cast instead.
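A hedged sketch of the explicit cast this enables; it assumes the concrete subclass exposing the graph is `torch::jit::GraphFunction`:
```cpp
#include <torch/csrc/jit/api/function_impl.h>

// Instead of Function exposing graph() directly, callers that really need the
// graph opt in with an explicit dynamic_cast.
std::shared_ptr<torch::jit::Graph> try_get_graph(torch::jit::Function& fn) {
  if (auto* graph_fn = dynamic_cast<torch::jit::GraphFunction*>(&fn)) {
    return graph_fn->graph();
  }
  return nullptr; // e.g. a builtin or mobile function with no JIT graph
}
```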
ghstack-source-id: 141659819
Test Plan: no behavior change.
Reviewed By: gmagogsfm
Differential Revision: D31326153
fbshipit-source-id: a0e984f57c6013494b92a7095bf5bb660035eb84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66584
This will help avoid "-1"s scattered across different places in our codebase and backend codebases when the debug handle is not known.
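A trivial illustration of the idea; the constant name is hypothetical, not necessarily the one this diff introduces:
```cpp
#include <cstdint>

// One named sentinel instead of scattering literal -1s across call sites.
constexpr int64_t kUnknownDebugHandle = -1; // hypothetical name

bool has_debug_handle(int64_t handle) {
  return handle != kUnknownDebugHandle;
}
```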
Test Plan: CI
Reviewed By: sxu
Differential Revision: D31614478
fbshipit-source-id: 97fceb04e3e78f52feda7b1ba1da08fa4480dd77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65355
I've been seeing heap corruptions in the CMake builds due to the NSError* not being initialized with `nil`. However, I haven't seen this issue in the Buck builds.
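A plain C++ analogue of the failure pattern, with hypothetical names; the real call sites are Objective-C++ Core ML APIs taking an `NSError**` out-parameter:
```cpp
#include <cstdio>

struct Error {
  const char* message;
};

// Hypothetical API in the NSError** style: it writes to *out_error only on
// failure and leaves it untouched on success.
void may_fail(bool fail, Error** out_error) {
  if (fail && out_error != nullptr) {
    static Error e{"compilation failed"};
    *out_error = &e;
  }
}

int main() {
  Error* error = nullptr; // the fix: initialize instead of leaving it indeterminate
  may_fail(/*fail=*/false, &error);
  if (error != nullptr) { // without the nullptr/nil init this check reads garbage
    std::printf("error: %s\n", error->message);
  }
  return 0;
}
```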
ghstack-source-id: 138502708
Test Plan:
1. Test the OSS builds to make sure the heap corruption has gone.
2. Test the Buck build in the playground app
3. Circle CI
Reviewed By: hanton
Differential Revision: D31048010
fbshipit-source-id: cfd8d614f3f91f09caee4aab61237007ec080481
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64522
The `PTMCoreMLExecutor` serves as a bridge between the delegate APIs and Core ML runtime.
ghstack-source-id: 138324217
allow-large-files
Test Plan:
iOS:
Run the CoreML tests in the playground app
MacOS:
```
buck test pp-macos
PASS 633ms 1 Passed 0 Skipped 0 Failed CoreMLTests
```
{F657776101}
Reviewed By: raziel, iseeyuan
Differential Revision: D30594042
fbshipit-source-id: a42a5307a24c2f364333829f3a84f7b9a51e1b3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63489
A few quick improvements to the Android NNAPI delegate, some of which were discussed in https://github.com/pytorch/pytorch/pull/62272:
1) `throw std::exception` replaced with `TORCH_CHECK` to reduce runtime size (nnapi_backend_lib.cpp); see the sketch after this list
2) weights processing moved from the compile to the preprocess step, since it can be done AOT (nnapi_backend_lib.cpp & nnapi_backend_preprocess.cpp)
3) `ser_model_` and `shape_compute_module_` member variables removed, since they are never used after `init()` (nnapi_backend_lib.cpp)
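A minimal sketch of change (1), with an illustrative condition and message rather than the exact call sites:
```cpp
#include <c10/util/Exception.h>

void check_forward_handle(bool found_forward) {
  // Before: if (!found_forward) { throw std::exception(); }
  // After (per this diff): TORCH_CHECK, which keeps runtime size down and
  // still produces a readable error message.
  TORCH_CHECK(found_forward, "Cannot find the handle for 'forward'");
}
```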
Test Plan:
Unit tests: `python test/test_jit.py TestNnapiBackend`
Run SparkAR segmentation with delegated NNAPI as done here D30259033 (can use `jf download GAekdAwsyGKXhggFALN4LnSBTzcubsIXAAAz --file "v303-nnd-mod.ptl"` to get a preprocessed model from these changes)
Imported from OSS
Reviewed By: raziel, iseeyuan
Differential Revision: D30398880
fbshipit-source-id: b6872e1e9ccd583622b80659da00c83fdd82580e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63092
Bug discovered while testing NNAPI Delegate on SparkAR.
Using
```
c10::IntArrayRef order = {0, 2, 3, 1};
fixed_inputs.push_back(tensorInp.get(i).permute(order).contiguous());
```
results in a garbage value for `order` in `permute()`, because the `IntArrayRef` only references a temporary initializer list that is destroyed at the end of the declaration.
Moving `order` inside the call to `permute()` fixes this issue (see the sketch below). The problem is seemingly related to https://github.com/pytorch/pytorch/issues/44409, but luckily the solution in this case is simple.
The bug wasn't caught earlier, since regular unit tests weren't affected by the dangling pointer, and the address sanitizer NNAPI tests are turned off due to a different failure (T95764916).
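A hedged sketch of the fix; the function and variable names are illustrative, not the actual delegate code:
```cpp
#include <ATen/ATen.h>

#include <vector>

std::vector<at::Tensor> to_nhwc(const std::vector<at::Tensor>& inputs) {
  std::vector<at::Tensor> fixed;
  fixed.reserve(inputs.size());
  for (const auto& t : inputs) {
    // The braced list lives until the end of this full expression, so the
    // IntArrayRef that permute() receives never outlives its backing storage.
    fixed.push_back(t.permute({0, 2, 3, 1}).contiguous());
  }
  return fixed;
}
```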
ghstack-source-id: 135526129
Test Plan:
Run Unit tests: `python test/test_jit.py`
Build and run SparkAR on an Android phone at the top of this diff stack (D30173959): `buck build --show-output arstudioplayer_arm64_debug -c pt.enable_nnapi=1`
Reviewed By: raziel, iseeyuan
Differential Revision: D30237504
fbshipit-source-id: c946d81feefc453b43d9295d8d6f509cafdcec03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62272
Added Android NNAPI delegate implementation of runtime initialization (compilation) and execution.
The delegate's preprocess step was [previously implemented](https://github.com/pytorch/pytorch/pull/62225). Now the rest of the delegate, which implements client-side execution, is added.
**nnapi_backend_lib.cpp**:
Implementation of delegate's compile and execute.
`execute()` is essentially a C++ implementation of [`NnapiModule`](https://github.com/pytorch/pytorch/blob/master/torch/backends/_nnapi/prepare.py), which wraps an NNAPI Compilation and handles preparation of weights, inputs, and outputs.
- Any steps that can be done before execution are moved to `compile()`.
- `init()` cannot be moved to `compile()` because it requires real inputs for dynamic shaping.
- `shape_compute_module` cannot currently be deserialized in `compile()`, since mobile::Module has no IValue conversion.
- Processed arguments that are modified by `init()` must be kept as member variables. Any other processed arguments are passed through a dictionary, `handles`.
**nnapi_bind.cpp & nnapi_bind.h**:
Created a header file for `nnapi_bind.cpp`, so that its NnapiCompilation class can be used by `nnapi_backend_lib.cpp`.
**test_backend_nnapi.py**:
Enabled execution testing.
ghstack-source-id: 135432844
Test Plan:
Imported from OSS
Tested on devserver.
1. Load and unpack a special devserver build of NNAPI: `jf download GICWmAAzUR0eo20TAPasVts8ObhobsIXAAAz --file "nnapi-host-linux.tar.xz"`
2. `export LIBNEURALNETWORKS_PATH=/path/to/libneuralnetworks.so`
3. Run unittests: `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py`
TODO: test with lite interpreter runtime
Reviewed By: raziel, iseeyuan
Differential Revision: D29944873
fbshipit-source-id: 48967d873e79ef2cce9bcba2aeea3c52f7a18c07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62433
Without type information, the default type is Tensor, which may conflict at runtime.
Test Plan: CI
Reviewed By: raziel
Differential Revision: D29990902
fbshipit-source-id: 0a38843d7d0612a458bb38fad7c86bad08c7197b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62225
Rewrote the preprocess function for Android NNAPI delegate.
Previously, `preprocess()` called `convert_model_to_nnapi()` using Pybind and returned a NnapiModule that is serialized for mobile. Now, `preprocess()` calls a sub-function of `convert_model_to_nnapi()` and returns several preprocessed items (that were previously components of NnapiModule).
Dictionary returned contains:
"shape_compute_module": torch::jit::Module,
"ser_model": torch::Tensor,
"weights": List[torch.Tensor],
"inp_mem_fmts": List[int],
"out_mem_fmts": List[int]
**Purpose and Future:**
The purpose of these changes is to move more of the implementation from bytecode and TorchScript to the delegate API, since bytecode is less efficient.
Now only the shape computation uses bytecode. In the future, shape computation will be moved out of TorchScript as well.
**nnapi_backend_preprocess.cpp:** preprocess implementation
**prepare.py**: refactored a portion of `convert_model_to_nnapi()` to `process_for_nnapi()`, so preprocess can get components of NnapiModule
**Test:**
Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully
ghstack-source-id: 134444190
Test Plan: Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully
Reviewed By: raziel
Differential Revision: D29922279
fbshipit-source-id: cadcf8908d8a745dc7abbe286e97d6ead937d4ab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62213
Added sanity checks in the preprocess function for the Android NNAPI delegate.
`preprocess()` requires some input metadata passed through its `method_compile_spec` function argument.
`preprocess()` now throws specific error messages if it cannot find the correct input arguments.
Example error message:
```
RuntimeError: method_compile_spec does not contain the "forward" key.
method_compile_spec should contain a Tensor or Tensor List which bundles input parameters: shape, dtype, quantization, and dimorder.
For input shapes, use 0 for run/load time flexible input.
method_compile_spec must use the following format: {"forward": {"inputs": at::Tensor}} OR {"forward": {"inputs": c10::List<at::Tensor>}}
```
nnapi_backend_preprocess.cpp: contains sanity check implementation
test_backend_nnapi.py: sanity check unit tests
Test: Ran `python test/test_jit.py TestNnapiBackend` in OSS successfully.
TODO: Using Tensors to pass input parameters is a temporary hack. When a dedicated object is implemented, update the sanity check error message.
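A hedged sketch of the kind of check described above; the exact types and wording in nnapi_backend_preprocess.cpp may differ:
```cpp
#include <ATen/core/Dict.h>
#include <ATen/core/ivalue.h>
#include <c10/util/Exception.h>

void check_method_compile_spec(
    const c10::Dict<c10::IValue, c10::IValue>& method_compile_spec) {
  TORCH_CHECK(
      method_compile_spec.contains("forward"),
      "method_compile_spec does not contain the \"forward\" key.");
  const c10::IValue forward_spec = method_compile_spec.at("forward");
  TORCH_CHECK(
      forward_spec.isGenericDict() &&
          forward_spec.toGenericDict().contains("inputs"),
      "method_compile_spec must use the following format: ",
      "{\"forward\": {\"inputs\": at::Tensor}} OR ",
      "{\"forward\": {\"inputs\": c10::List<at::Tensor>}}");
}
```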
ghstack-source-id: 134339282
Test Plan: Ran `python test/test_jit.py TestNnapiBackend` in OSS successfully.
Reviewed By: raziel, iseeyuan
Differential Revision: D29917004
fbshipit-source-id: 0d5c6b35889c556cda905ffc29c25c5422ae9ee4
Summary:
As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH`
All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61752
Updated Android NNAPI preprocess, so that it can accept both single Tensor inputs and Tensor List inputs.
- The inputs are not real data, but input parameters for shape, dtype, quantization, and dimorder that are bundled as a Tensor. Comments were updated to make this clearer.
- In the future, preprocess will also accept a dedicated NnapiArg object.
Compile_spec should have the following format:
{"forward": {"inputs": at::Tensor}} OR {"forward": {"inputs": c10::List< at::Tensor >}}
Example input Tensor:
torch.tensor([[1.0, -1.0, 2.0, -2.0]]).unsqueeze(-1).unsqueeze(-1)
### Testing
OSS testing is blocked by https://github.com/pytorch/pytorch/pull/61594. Testing was done locally in D29726948
TODO: Add OSS tests for single Tensor and Tensor List inputs.
ghstack-source-id: 133683735
Test Plan:
OSS testing is blocked by https://github.com/pytorch/pytorch/pull/61594. Testing was done locally in D29726948.
TODO: Add OSS tests for single Tensor and Tensor List inputs.
Reviewed By: iseeyuan
Differential Revision: D29726432
fbshipit-source-id: 08de70578f37681bda365f9776a1c96030257e7a
Summary: For methods returning complex (i.e. container) types, the existing code attempted to pass type designators with unsupported syntax (e.g. `Tensor[]`) into `isinstance`. It will now use the correct syntax supported by TorchScript (i.e. `List[Tensor]`).
Test Plan:
Unfortunately, a backend supporting methods returning container types has not yet been identified, so the functionality cannot be tested end-to-end.
Adding a printout of `method_ct.format(method_te)` before https://fburl.com/code/4619d12g lets us inspect the difference in the generated method body, e.g.:
```
assert isinstance(_0, List[Tensor])
```
vs
```
assert isinstance(_0, Tensor[])
```
Reviewed By: allwu
Differential Revision: D29537358
fbshipit-source-id: 3356f3c1477aa9304e1f070711f480441579414d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61499
Added a preprocess function for the delegate to Nnapi backend (internal and external files).
In the past we had functions and classes for converting to the Nnapi backend. Now, these functions and classes will be wrapped by the delegate API.
### nnapi_backend_preprocess.cpp:
Contains the preprocess function, which uses Pybind to call an existing python function, `convert_model_to_nnapi()`.
- The model is wrapped by a `RecursiveScriptModule`, so that `convert_model_to_nnapi()` can run correctly, since when jumping from Python to C++ to Python, the model loses its original wrapper.
- A tensor, which includes shape, data type, and quantization information, is passed through preprocess's compile_spec to `convert_model_to_nnapi()`.
- Finally, the Nnapi model is serialized for mobile and returned as a string.
### nnapi_backend_lib.cpp:
Contains stub functions for compile and execute, and is necessary for the Nnapi backend to be registered correctly. These will be implemented in a future PR.
**TODO:** implement execute and compile for the delegate API; throw exceptions for an incorrect compile_spec; add OSS tests
**Testing:** Tests were done locally (see D29647123). A simple module was lowered to Nnapi, saved locally, and examined.
ghstack-source-id: 133415234
Test Plan:
Tests were done locally (see D29647123).
TODO: add test in OSS in test_backends.py after CMake is ready.
I ran buck run caffe2:nnapi_backend_example. The model files are saved as nnapi_model.ptl and mobile_model.ptl. I checked that both zip files have expected contents.
Reviewed By: iseeyuan
Differential Revision: D29563351
fbshipit-source-id: 642e349356e38aecc1b9973c285569650c02668c
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.
I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop
# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
-j \
-s \
-k \
-v \
--paths torch/csrc/ \
-g"-torch/csrc/jit/passes/onnx/helper.cpp" \
-g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
-g"-torch/csrc/jit/serialization/onnx.cpp" \
-g"-torch/csrc/jit/serialization/export.cpp" \
-g"-torch/csrc/jit/serialization/import.cpp" \
-g"-torch/csrc/jit/serialization/import_legacy.cpp" \
-g"-torch/csrc/onnx/init.cpp" \
-g"-torch/csrc/cuda/nccl.*" \
-g"-torch/csrc/cuda/python_nccl.cpp" \
-g"-torch/csrc/autograd/FunctionsManual.cpp" \
-g"-torch/csrc/generic/*.cpp" \
-g"-torch/csrc/jit/codegen/cuda/runtime/*" \
-g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
-g"-torch/csrc/deploy/interpreter/interpreter.h" \
-g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
-g"-torch/csrc/deploy/interpreter/test_main.cpp"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649
Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.
Reviewed By: walterddr, janeyx99
Differential Revision: D29504258
Pulled By: 1ntEgr8
fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58873
Change the preprocess signature to take a BackendDebugInfoRecorder.
Prior to this PR:
In order to generate debug handles corresponding to the graph being lowered, the backend's preprocess will call generate_debug_handles and will get a map of Node*-to-debug_handles.
To facilitate this, to_backend owns a BackendDebugInfoRecorder and initializes a thread-local pointer to it.
The generate_debug_handles function queries the thread-local pointer to see if there is a valid BackendDebugInfoRecorder for the context. If there is, it generates debug handles.
After this PR:
The signature of preprocess is changed such that backends have to register a preprocess that accepts an instance of BackendDebugInfoRecorder by reference. generate_debug_handles is no longer a free function but becomes part of the API of BackendDebugInfoRecorder. Now the backend's preprocess function will call generate_debug_handles on the BackendDebugInfoRecorder instead of the free function.
Reason for this change:
The RAII approach that initializes the thread-local pointer results in a loose contract with backends, which may result in backends not storing debug information. Making it part of the API means backends have to be aware of BackendDebugInfoRecorder and must explicitly choose not to generate/store debug information if they want to skip it.
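A hedged sketch of the contract described above; BackendDebugInfoRecorder lives under torch/csrc/jit/backends, but the method name and return types shown here follow this diff's description and are otherwise illustrative:
```cpp
#include <torch/csrc/jit/api/module.h>
#include <torch/csrc/jit/backends/backend_debug_handler.h>

c10::IValue preprocess(
    const torch::jit::Module& mod,
    const c10::Dict<c10::IValue, c10::IValue>& method_compile_spec,
    torch::jit::BackendDebugInfoRecorder& debug_info_recorder) {
  (void)method_compile_spec;
  auto graph = mod.get_method("forward").graph();
  // Node* -> debug handle map, obtained from the recorder's own API rather
  // than from a thread-local free function.
  auto node_to_debug_handle = debug_info_recorder.generate_debug_handles(graph);
  (void)node_to_debug_handle; // the backend stores this next to its payload
  return mod._ivalue();
}
```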
Test Plan:
backend tests
Imported from OSS
Reviewed By: jbschlosser, raziel
Differential Revision: D28648613
fbshipit-source-id: c9b7e7bf0f78e87023ea7bc08612cf893b08cb98
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57441
Save operator name and function name in delegate debug info.
Previous diffs did not save the operator name in debug info. For delegated backends that only identify an op for profiling via its debug handle, the operator name should be stored as well.
Furthermore, to complete the debug information, the function name is also serialized.
Test Plan:
Existing lite interpreter and backend tests
Imported from OSS
Differential Revision: D28144581
Reviewed By: raziel
Pulled By: kimishpatel
fbshipit-source-id: 415210f147530a53b444b07f1d6ee699a3570d99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57156
Expose generate_debug_handles so that debug handles can be generated from preprocess written in Python.
Test Plan:
CI
Imported from OSS
Differential Revision: D28062328
Reviewed By: raziel
Pulled By: kimishpatel
fbshipit-source-id: 8795d089edc00a292a2221cfe80bbc671468055c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55462
Record debug handles and symbolicate exception callstacks thrown from backends.
The objective of this diff is to improve error reporting when exceptions are raised from a lowered backend. We would effectively like to get the same model-level stack trace that you would get without having lowered some module to a backend.
For example:
```
class AA(nn.Module):
def forward(self, x, y):
return x + y
class A(nn.Module):
def __init__(...):
self.AA0 = AA()
def forward(self, x, y):
return self.AA0.forward(x, y) + 3
class B(nn.Module):
def forward(self, x):
return x + 2
class C(nn.Module):
def __init__(...):
self.A0 = A()
self.B0 = B()
def forward(self, x, y):
return self.A0.forward(x, y) + self.B0.forward(x)
```
If we then do C().forward(torch.rand((2,3)), torch.rand((14,2))), we will likely see an error stack like:
```
C++ exception with description "The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
File "<string>", line 3, in forward
def forward(self, x, y):
return self.A0.forward(x, y) + self.B0.forward(x)
~~~~~~~~~~~~~~~ <--- HERE
File "<string>", line 3, in forward
def forward(self, x, y):
return self.AA0.forward(x, y) + 3
~~~~~~~~~~~~~~~~ <--- HERE
File "<string>", line 3, in forward
def forward(self, x, y):
return x + y
~~~~~ <--- HERE
```
We would like to see the same error stack if we lowered C.A0 to some
backend.
With this diff we get something like:
```
Module hierarchy:top(C).A0(backend_with_compiler_demoLoweredModule).AA0(AA)
Traceback of TorchScript (most recent call last):
File "<string>", line 3, in FunctionName_UNKNOWN
def forward(self, x, y):
return self.A0.forward(x, y) + self.B0.forward(x)
~~~~~~~~~~~~~~~ <--- HERE
File "<string>", line 5, in FunctionName_UNKNOWN
typed_inputs: List[Any] = [x, y, ]
if self.__backend.is_available() :
_0, = self.__backend.execute(self.__handles["forward"], typed_inputs)
~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
assert isinstance(_0, Tensor)
return _0
File "<string>", line 3, in FunctionName_UNKNOWN
def forward(self, x, y):
return self.AA0.forward(x, y) + 3
~~~~~~~~~~~~~~~~ <--- HERE
File "<string>", line 3, in FunctionName_UNKNOWN
def forward(self, x, y):
return x + y
~~~~~ <--- HERE
```
This is achieved in 3 parts:
Part 1:
A. BackendDebugInfoRecorder:
During backend lowering, in `to_backend`, a BackendDebugInfoRecorder is created before calling the preprocess function corresponding to the backend. This facilitates recording of debug info (such as source range + inlined callstack) for the lowered module.
B. Instantiate WithBackendDebugInfoRecorder with BackendDebugInfoRecorder.
This initializes thread local pointer to BackendDebugInfoRecorder.
C. generate_debug_handles:
In the preprocess function, the backend will call generate_debug_handles for each method being lowered separately. generate_debug_handles takes the `Graph` of the method being lowered and returns a map of Node*-to-debug_handles. The backend is responsible for storing the debug handles appropriately, so that it can raise exceptions (and later do profiling) using debug handles when the exception being raised corresponds to a particular Node that was lowered.
Inside generate_debug_handles, we query the current BackendDebugInfoRecorder, which is issuing debug handles. This debug handle manager will issue debug handles as well as record the debug_handles-to-<source range, inlined callstack> map.
D. Back in `to_backend`, once the preprocess function has finished lowering the module, we call `stopRecord` on the BackendDebugInfoRecorder. This returns the debug info map, which is then stored inside the lowered module.
Part 2:
Serialization:
During serialization for bytecode (lite interpreter), we will do two
things:
1. Extract all the source ranges that are contained inside the debug_handles-to-<source range, inlined callstack> map for the lowered module. These are the source ranges corresponding to the debug handles, including whatever is in the inlined callstack. Since we replaced the original module with the lowered module, we won't be serializing code for the original module, and thus there is no source range for it. That is why the source ranges have to be stored separately. We lump all the source ranges for all the lowered modules into one single debug_pkl file.
2. Then we serialize the debug_handles-to-<source range, inlined callstack> map.
During deserialization we will then be able to reconstruct the debug_handles-to-<source range, inlined callstack> map. Given that all debug handles are unique, we do not need any module information.
Test Plan:
Tests are added in test_backend.cpp
Imported from OSS
Differential Revision: D27621330
Reviewed By: raziel
Pulled By: kimishpatel
fbshipit-source-id: 0650ec68cda0df0a945864658cab226a97ba1890
Summary:
- dump the final graph in Glow
- print operator stats via the to_glow API:
  1) node stats for the final Glow graph
  2) operator stats in TorchGlowBackend for the torch::jit::graph to lower
Reviewed By: khabinov
Differential Revision: D28444501
fbshipit-source-id: 743755c320071edc4c045ad004adeb16b4a9c323
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55062
This diff introduces the following changes:
1. InlinedCallStack pickler/serializer is introduced. It is serialized
as a tuple of {module_instance_info, source range tag, callee:InlinedCallStack}
Module instance info is serialized as tuple of {class_type_name,
instance_name}.
Note that the callee of the serialized inlined callstack points to the tuple of the already-serialized callstack. This means that serializing the first callstack ptr serializes the entire path of the tree, where some callee nodes might be shared with callstack pointers that are serialized subsequently. Pickler supports memoization of pickled objects: if a tuple has already been serialized, its object id is emitted instead of serializing the object again. Thus we still serialize the tree, and not every path from the root separately. Furthermore, InlinedCallStackSerializer also uses a cache to look up the pointer and return the serialized IValue.
Furthermore, note that we must also serialize the source range of the InlinedCallStack. To do this, the serializer requires a map of source-range-tags-to-source-ranges. This was done in the previous diff, where, as part of source range serialization, we also generate unique tags. These are the tags that are serialized in the InlinedCallStack. Thus, during deserialization, we have to deserialize source ranges before deserializing InlinedCallStacks.
2. Furthermore, each serialized InlinedCallStack is serialized with a
unique debug_handle and source range tag.
BackendDebugHandleManager manages generation of
unique debug handles and saves the map of
debug-handles-to-{source_range_tag, inlined-callstack-ptr}.
This map is then serialized as callstack_debug_map.pkl. Note that the inlined callstack is not sufficient to get all the source information, since it only contains source information about the nodes which are inlined. The top-of-the-stack (or bottom) node, which is the actual op node, is not part of the inlined callstack pointer, and thus the source range of this node is serialized separately using source_range_tag. This is similar to how JIT creates the callstack in torch/csrc/jit/runtime/interpreter.cpp.
Unique debug handles facilitate exception throwing or profiling using just the debug handle, without any further qualification, such as which function or module the inlined callstack belongs to.
Furthermore, this diff refactors the old mobile code for tracking module hierarchy information per op. Now bytecode serialization mainly serializes debug handles corresponding to ops/nodes in the graph, and callstack_debug_map.pkl helps generate:
1. Entire callstack and
2. Module hierarchy information.
Test Plan:
python test/mobile/test_lite_script_module.py TestLiteScriptModule
./build/bin/test_jit --gtest_filter=*ModuleInfo
Imported from OSS
Reviewed By: raziel
Differential Revision: D27468709
fbshipit-source-id: 53e2413e7703ead01c77718b7c333c7c6ff50a23
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os
def get_compiled_files_list():
import json
with open("build/compile_commands.json") as f:
data = json.load(f)
files = [os.path.relpath(node['file']) for node in data]
for idx, fname in enumerate(files):
if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
return files
def run_clang_tidy(fname):
check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
changes = check_output(["git", "ls-files", "-m"])
if len(changes) == 0:
return
check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])
def main():
git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
compiled_files = get_compiled_files_list()
for idx, fname in enumerate(git_files):
if fname not in compiled_files:
continue
if fname.startswith("caffe2/contrib/aten/"):
continue
print(f"[{idx}/{len(git_files)}] Processing {fname}")
run_clang_tidy(fname)
if __name__ == "__main__":
main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53068
Adds a ```bool is_available()``` method to the backend contract: it returns ```true``` if ```compile()``` and ```execute()``` can be called; ```false``` otherwise.
It is used to implement the following changes in the ```LoweredModule```:
* ```compile()``` in ```__setstate__``` will run if ```is_available()```, else ```__setstate__``` throws an exception (“Backend not available.”).
* ```compile()``` at ```LoweredModule``` creation will run if ```is_available()```, else a WARNING will be thrown.
* ```execute()``` will only be executed if ```is_available()``` returns true; else throws an exception (“Backend not available.”).
The goal of these changes is to ensure we have well-defined behaviour for the different combinations of backend availability on-host and on-target.
More specifically, backends may have different capabilities to compile and/or execute the Module, depending on whether this happens on-host (i.e. where the program is being written) or on-target (where the program is being executed).
First of all, we know that "preprocess" always takes place, and that it only happens on-host at creation time. So we can assume that if any compilation is needed/possible on-host, then all of it could be pushed there.
Overall, we want to ensure the following:
**On host**
| compile | execute | Outcome |
| -- | -- | -- |
| No | No | On module creation, LoweredModule is generated, with a warning (since compilation and execution can still take place on-target). On module load, throws an exception (since execution is not possible). |
| No | Yes | This configuration should not be possible. This assumes the full compiler is not available, even if some work was done in preprocess the program cannot be finalized for execution. |
| Yes | No | In this case, the expectation would be for is_available() to return false, and compilation logic to move into preprocess. |
| Yes | Yes | All good. This is the only case that is_available() should return true. |
**On target**
| compile | execute | Outcome |
| -- | -- | -- |
| No | No | Loading the LoweredModule throws an exception. Since execution is not possible. |
| No | Yes | Basically this is another instance of Yes/Yes: compilation per se may not be possible on device, which means compile() can be called without issue but it is a no-op, and thus is_available should return true. Consequently, loading the LoweredModule: Succeeds, if the preprocessed module is ready for execution. Fails with exception otherwise. |
| Yes | No | This configuration should not be possible. Just putting here for completeness. |
| Yes | Yes | All good. This, along with No/Yes case (because compilation is assumed to have happened on-host, so it's just another instance of Yes/Yes), are the cases where is_available() should return true. |
**Refactoring existing code**
This change also updates other backends (Glow) code, to implement the is_available() method to have the same behaviour as before this change (i.e. always available).
This should not cause backward incompatibilities with already saved models since we're adding a new method to the PyTorchBackendInterface.
Models saved with the old interface that didn't have is_available() will still find the other 2 methods in the bound object (i.e. compile and execute), and the saved LoweredModule logic will be the old one.
**Future**
We plan to use is_available() to implement support for fallback to the PyTorch interpreter.
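A hedged sketch of a backend honoring this contract; the registration helper and virtual signatures follow the public backend API of roughly this era and may differ in detail:
```cpp
#include <torch/csrc/jit/backends/backend.h>
#include <torch/csrc/jit/backends/backend_interface.h>

#include <c10/util/Exception.h>

namespace {

class StubBackend : public torch::jit::PyTorchBackendInterface {
 public:
  // Return true only when compile() and execute() can actually run here.
  bool is_available() override {
    return false;
  }

  c10::impl::GenericDict compile(
      c10::IValue processed,
      c10::impl::GenericDict method_compile_spec) override {
    TORCH_CHECK(is_available(), "Backend not available.");
    return method_compile_spec; // placeholder: per-method handles
  }

  c10::impl::GenericList execute(
      c10::IValue handle,
      c10::impl::GenericList inputs) override {
    TORCH_CHECK(is_available(), "Backend not available.");
    return inputs; // placeholder outputs
  }
};

// Hypothetical backend name used only for this sketch.
static const auto registration = torch::jit::backend<StubBackend>("stub_backend");

} // namespace
```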
ghstack-source-id: 123498571
Test Plan: Added C++ (test_backend.cpp) and Python (test_backends.py) tests to validate the exceptions.
Reviewed By: jackm321, spaugh, iseeyuan
Differential Revision: D26615833
fbshipit-source-id: 562e8b11db25784348b5f86bbc4179aedf15e0d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52258
Removes deprecated preprocess method from the backend interface.
Preprocessing logic should be now registered along with the backend interface (i.e. PyTorchBackendInterface) via the BackendPreprocessFunction.
Also refactored internal dependencies.
ghstack-source-id: 121704837
Test Plan:
Validates all related tests pass:
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - BackendTest.ToBackend'
python test/test_jit.py TestBackends
===== Glow
buck test mode/dev //glow/fb/torch_glow/tests:TorchGlowBackendTests
buck test mode/dev //glow/fb/torch_glow/tests:torch_glow_backend_tests
Reviewed By: jackm321
Differential Revision: D26443479
fbshipit-source-id: afdc51ae619ced293d10c7a6a12f3530e4c4e53c
Summary:
This is a re-land of https://github.com/pytorch/pytorch/pull/51797 with a fix for the spurious libcuda dependency.
The fix limits the scope of the `no-as-needed` linker flag to just `jitbackend_test`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52340
Reviewed By: agolynski, iseeyuan
Differential Revision: D26476168
Pulled By: malfet
fbshipit-source-id: f909428af82182b3bffd020ca18cca7a9b5846b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51797
The C++ API, ```codegen_backend_module```, is added to ```to_<backend>```. Python-related code is decoupled from this function, so it can be used from both C++ and Python.
* Tests
Python: the existing ```test_backends.py```, which calls the C++ API under the hood.
C++: an end-to-end test, ```jit.BackendTest.ToBackend```, is added in ```test_backend.cpp```. The original class definitions in this file are moved to ```test_backend_lib.cpp```.
ghstack-source-id: 121687464
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: raziel
Differential Revision: D26280518
fbshipit-source-id: fd466e4b448847ce64010a3297fff0b5760c5280
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51757
Enables backend preprocessing to take place outside of the backend interface.
What's new:
* A new definition for backend preprocessing (i.e. BackendPreprocessFunction).
* Registration of the backend's PyTorchBackendInterface implementation is augmented to take the BackendPreprocessFunction.
* A new registry is created to handle the BackendPreprocessFunction functions, using the backend's name as key.
* When a BackendPreprocessFunction is used, the PyTorchBackendInterface's "preprocess" method is not added to the LoweredModule. Instead, the BackendPreprocessFunction is called and its output used to set the LoweredModule's __processed_module.
Why?:
These changes are needed to avoid forcing backend preprocessing to be part of the LoweredModule, and in the future be able to eliminate "preprocess" from the PyTorchBackendInterface.
This is important for Mobile use cases where "preprocess" can take the bulk of the compilation process, and thus contain code dependencies that we do not want to bring (or cannot bring) to the Mobile binary.
What didn't change:
* Everything is backwards compatible:
** The existing "preprocess" method in PyTorchBackendInterface is still there.
** When backend registration is done without the BackendPreprocessFunction, as before, things work the same way: "preprocess" is added to LoweredModule, and invoked through the module's instance of the backend interface.
Longer term, the plan is to refactor existing users to move to the new backend registration.
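A hedged sketch of the separated registration this enables; the two-argument preprocess signature matches this diff (the BackendDebugInfoRecorder argument described in a newer diff above is not part of the contract here), and the names are illustrative:
```cpp
#include <ATen/core/ivalue.h>
#include <torch/csrc/jit/api/module.h>
#include <torch/csrc/jit/backends/backend_preprocess.h>

static c10::IValue my_preprocess(
    const torch::jit::Module& mod,
    const c10::Dict<c10::IValue, c10::IValue>& method_compile_spec) {
  (void)method_compile_spec;
  // AOT compilation work happens here, on host; the returned value becomes
  // the LoweredModule's __processed_module, so none of this code needs to be
  // shipped in the mobile binary.
  return mod._ivalue();
}

// Keyed by the backend name; pairs with the runtime-side interface
// registration done separately under the same name.
static const auto preprocess_registration =
    torch::jit::backend_preprocess_register("my_backend", my_preprocess);
```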
ghstack-source-id: 121190883
Test Plan:
Updated existing tests (test_backend.py) to use the new registration mechanism.
Verified test ran and passed (in my OSS build).
Reviewed By: iseeyuan
Differential Revision: D26261042
fbshipit-source-id: 0dc378acd5f2ab60fcdc01f7373616d1db961e61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49138
See for details: https://fb.quip.com/QRtJAin66lPN
We need to model optional types explicitly, mostly for schema inference. So we cannot pass a `Tensor?[]` as `ArrayRef<Tensor>`, instead we need to pass it as an optional type. This PR changes it to `torch::List<c10::optional<Tensor>>`. It also makes the ops c10-full that were blocked by this.
## Backwards Compatibility
- This should not break the Python API because the representation in Python is the same and python_arg_parser just transforms the python list into a `List<optional<Tensor>>` instead of into a `List<Tensor>`.
- This should not break serialized models because there's some logic that allows loading a serialized `List<Tensor>` as `List<optional<Tensor>>`, see https://github.com/pytorch/pytorch/pull/49138/files#diff-9315f5dd045f47114c677174dcaa2f982721233eee1aa19068a42ff3ef775315R57
- This will break backwards compatibility for the C++ API. There is no implicit conversion from `ArrayRef<Tensor>` (which was the old argument type) to `List<optional<Tensor>>`. One common call pattern is `tensor.index({indices_tensor})`, where indices_tensor is another `Tensor`; that will continue working because the `{}` initializer_list constructor for `List<optional<Tensor>>` can take `Tensor` elements that are implicitly converted to `optional<Tensor>`. But another common call pattern was `tensor.index(indices_tensor)`, where previously the `Tensor` got implicitly converted to an `ArrayRef<Tensor>`; to implicitly convert `Tensor -> optional<Tensor> -> List<optional<Tensor>>` would be two implicit conversions, and C++ doesn't allow chaining two implicit conversions. So those call sites have to be rewritten to `tensor.index({indices_tensor})` (see the sketch after this list).
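A small sketch of the C++ call-site change described in the last bullet, reusing the indexing example from this summary:
```cpp
#include <torch/torch.h>

void index_call_sites() {
  const auto x = torch::ones({4, 4, 4});
  const auto indices_tensor = torch::tensor({0, 2});

  // Old pattern: relied on the Tensor -> ArrayRef<Tensor> implicit conversion,
  // which does not exist for List<optional<Tensor>>.
  // auto y = x.index(indices_tensor);

  // New pattern: the braces build the list directly, with each Tensor element
  // implicitly converted to optional<Tensor>.
  const auto y = x.index({indices_tensor});
  (void)y;
}
```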
ghstack-source-id: 119269131
Test Plan:
## Benchmarks (C++ instruction counts):
### Forward
#### Script
```py
from torch.utils.benchmark import Timer

counts = Timer(
    stmt="""
        auto t = {{op call to measure}};
    """,
    setup="""
        using namespace torch::indexing;
        auto x = torch::ones({4, 4, 4});
    """,
    language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```
#### Results
| Op call | before | after | delta | delta % |
|---|---|---|---|---|
|x[0] = 1 |11566015 |11566015|0 |0.00% |
|x.index({0}) |6807019 |6801019 |-6000 |-0.09%|
|x.index({0, 0}) |13529019 |13557019|28000 |0.21% |
|x.index({0, 0, 0}) |10677004 |10692004|15000 |0.14% |
|x.index({"..."}) |5512015 |5506015 |-6000 |-0.11%|
|x.index({Slice(None, None, None)}) |6866016 |6936016 |70000 |1.02% |
|x.index({None}) |8554015 |8548015 |-6000 |-0.07%|
|x.index({false}) |22400000 |22744000|344000 |1.54% |
|x.index({true}) |27624088 |27264393|-359695|-1.30%|
|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})})|123472000|123463306|-8694|-0.01%|
### Autograd
#### Script
```py
from torch.utils.benchmark import Timer

counts = Timer(
    stmt="""
        auto t = {{op call to measure}};
    """,
    setup="""
        using namespace torch::indexing;
        auto x = torch::ones({4, 4, 4}, torch::requires_grad());
    """,
    language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```
Note: the script measures the **forward** path of an op call with autograd enabled (i.e. calls into VariableType). It does not measure the backward path.
#### Results
| Op call | before | after | delta | delta % |
|---|---|---|---|---|
|x.index({0}) |14839019|14833019|-6000| 0.00% |
|x.index({0, 0}) |28342019|28370019|28000| 0.00% |
|x.index({0, 0, 0}) |24434004|24449004|15000| 0.00% |
|x.index({"..."}) |12773015|12767015|-6000| 0.00% |
|x.index({Slice(None, None, None)}) |14837016|14907016|70000| 0.47% |
|x.index({None}) |15926015|15920015|-6000| 0.00% |
|x.index({false}) |36958000|37477000|519000| 1.40% |
|x.index({true}) |41971408|42426094|454686| 1.08% |
|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})}) |168184392|164545682|-3638710| -2.16% |
Reviewed By: bhosmer
Differential Revision: D25454632
fbshipit-source-id: 28ab0cffbbdbdff1c40b4130ca62ee72f981b76d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43613
**Summary**
This commit adds a helper/utility to facilitate the selective lowering of
specific submodules within a module hierarchy to a JIT backend. The reason
that this is needed is that lowering a submodule of a scripted
module to a backend after the module has been scripted requires
adjusting its JIT type.
**Test Plan**
This commit refactors `NestedModuleTest` in `jit/test_backends.py` to
use this new selective lowering API.
**Fixes**
This commit fixes #41432.
Test Plan: Imported from OSS
Reviewed By: mortzur
Differential Revision: D23339855
Pulled By: SplitInfinity
fbshipit-source-id: d9e69aa502febbe04fd41558c70d219729252be9
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/5029
Support single element tuples in to_backend
Test Plan: new unit test for to_glow
Reviewed By: andrewmillspaugh
Differential Revision: D24539869
fbshipit-source-id: fb385a7448167b2b948e70f6af081bcf78f338dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43612
**Summary**
This commit modifies the `torch._C._jit_to_backend` function so that it
accepts `ScriptModules` as inputs. It already returns `ScriptModules`
(as opposed to C++ modules), so this makes sense and makes the API more
intuitive.
**Test Plan**
Continuous integration, which includes unit tests and out-of-tree tests
for custom backends.
**Fixes**
This commit fixes #41432.
Test Plan: Imported from OSS
Reviewed By: suo, jamesr66a
Differential Revision: D23339854
Pulled By: SplitInfinity
fbshipit-source-id: 08ecef729c4e1e6bddf3f483276947fc3559ea88
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41146
**Summary**
This commit adds support for using `Modules` that have been lowered as
submodules in `ScriptModules`.
**Test Plan**
This commit adds execution and save/load tests to test_backends.py for
backend-lowered submodules.
**Fixes**
This commit fixes #40069.
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D22459543
Pulled By: SplitInfinity
fbshipit-source-id: 02e0c0ccdce26c671ade30a34aca3e99bcdc5ba7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40841
**Summary**
This commit adds support for using `Modules` that have been lowered as
submodules in `ScriptModules`.
**Test Plan**
This commit adds execution and save/load tests to test_backends.py for
backend-lowered submodules.
**Fixes**
This commit fixes #40069.
Test Plan: Imported from OSS
Differential Revision: D22418716
Pulled By: SplitInfinity
fbshipit-source-id: d2b2c6d5d2cf3042a620b3bde7d494f1abe28dc1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40840
**Summary**
This commit moves the TestBackend used for the JIT backend
extension to the tests directory. It was temporarily placed
in the source directory while figuring out some details of
the user experience for this feature.
**Test Plan**
`python test/test_jit.py TestBackends`
**Fixes**
This commit fixes#40067.
Test Plan: Imported from OSS
Differential Revision: D22418682
Pulled By: SplitInfinity
fbshipit-source-id: 9356af1341ec4d552a41c2a8929b327bc8b56057
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40839
**Summary**
This commit splits the to_backend API properly into
`libtorch` and `libtorch_python`. The backend interface and all
of the code needed to run a graph on a backend is in
libtorch, and all of the code related to creating a Python binding
for the lowering process is in `libtorch_python`.
**Test Plan**
`python test/test_jit.py TestBackends`
**Fixes**
This commit fixes#40072.
Test Plan: Imported from OSS
Differential Revision: D22418664
Pulled By: SplitInfinity
fbshipit-source-id: b96e0c34ab84e45dff0df68b8409ded57a55ab25
Summary:
**Summary**
This commit adds a registry for storing lowering functions for backends.
Instead of backends registering these lowering functions in separate C
extension modules, these will be registered in the Torch extension.
Backends are registered statically, so a registry is needed to hold
these lowering functions until Python bindings are created.
**Test Plan**
`python test/test_jit.py TestBackends`
```
Couldn't download test skip set, leaving all tests enabled...
..
----------------------------------------------------------------------
Ran 2 tests in 0.104s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39552
Reviewed By: mortzur
Differential Revision: D22033855
Pulled By: SplitInfinity
fbshipit-source-id: 05abf152274e5e51c37b6004886ea25bd4d33b80
Summary:
**Summary**
This commit adds support for seralization and deserialization of
`ScriptModules` that have been lowered to a specific backend. Nothing
special was required to accomplish this, other than removing some code
in `unpickler.cpp` that guarded against the deserialization of `Any`
type objects. Now that lists and dicts are tagged with their types
during serialization, this check is no longer necessary.
**Test Plan**
This commit adds a unit test for testing that a lowered module still
produces the same results as Python and regular JIT after saving and
loading.
**Fixes**
This pull request fixes part of https://github.com/pytorch/pytorch/issues/37841.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38893
Differential Revision: D21825813
Pulled By: SplitInfinity
fbshipit-source-id: 77a7b84504e0dddf14c89b3ed5dd6b438c086f66
Summary:
**Summary**
This commit gets rid of the separate compilation unit that is currently
being created for every backend-specific module generated by
`jit::backend::generateToBackendFn` and mangles the name properly to
allow multiple backend-specific modules to coexist in the same
compilation unit.
**Test Plan**
`python test/test_jit.py TestBackends`
**Fixes**
This pull request fixes part of https://github.com/pytorch/pytorch/issues/37841.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38679
Differential Revision: D21744620
Pulled By: SplitInfinity
fbshipit-source-id: ac85b8ce0d179c057991e9299fd53a4e13ba02a9
Summary:
**Summary**
This commit adjusts the `pybind` includes in `backend.h` so
that we can avoid exporting some unrelated headers during install (which
probably shouldn't be exposed anyway). In addition, the headers that this commit
removes are not used.
**Test Plan**
Continuous integration (includes tests for JIT backends).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38562
Differential Revision: D21601694
Pulled By: SplitInfinity
fbshipit-source-id: c8f8103d24cb4f10d9eb6b3657eed75878078945
Summary:
**Summary**
This commit adds `torch::jit::RegisterBackend`, an API that allows
external backends to be registered for the execution of JIT subgraphs
outside the JIT interpreter. In order to register an external backend,
one must extend the provided abstract class `PyTorchBackendInterface` and provide
two additional functions: one that creates an instance of the aforementioned subclass
of `PyTorchBackendInterface`, and another that preprocesses a `ScriptModule` so that
it can run on the backend. Then, a `ScriptModule` that can compile and execute a given
JIT subgraph using the functions provided at registration time is generated
for each registered backend.
**Testing**
This commit adds a unit test that uses a minimal test backend
to make sure that the registration endpoint and generated
`ScriptModule` work.
```
$ python test/test_jit.py TestBackends
Fail to import hypothesis in common_utils, tests are not derandomized
.
----------------------------------------------------------------------
Ran 1 test in 0.183s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35833
Differential Revision: D21231955
Pulled By: SplitInfinity
fbshipit-source-id: 452db1123d0e5d83f97fe5da8a00fdfdb50dbef9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33806
as title
Test Plan: Imported from OSS
Differential Revision: D20122117
Pulled By: suo
fbshipit-source-id: 209d29ed2c873181140c9fb5cdc305c200ce4008