Summary:
This is to move us along the path to removing Type from the public API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12355
Reviewed By: ezyang
Differential Revision: D10212616
Pulled By: gchanan
fbshipit-source-id: c9cd128d1111ab219cb0b2f3bf5b632502ab97c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12103
This defers lookup of defaults to the site where we read
out of TensorOptions. THIS IS A BC-BREAKING BEHAVIOR CHANGE,
but we expect the bulk of uses of OptionsGuard don't allocate TensorOptions
inside the OptionsGuard region, and then use it outside of the region
(the situation where behavior could change.)
I also optimize the size of TensorOptions by rearranging fields, so that we
always fit in two 64-bit words.
Reviewed By: goldsborough
Differential Revision: D10052523
fbshipit-source-id: f454a15b4dbf8cd17bc902ab7d2016f2f689ed13
Summary:
* Replaces `prim::PythonOp` with the name of the function being called
* Delays printing values used in `prim::Return` nodes until the return
node itself if that is the only place the value is used to remove some
useless assigns
zdevito apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12179
Differential Revision: D10132661
Pulled By: driazati
fbshipit-source-id: cbc4ac34137ed5872049082e25d19eb1ebc71208
Summary:
At long last, we will have clang-tidy enabled in CI. For a while I thought I could clean up the project enough to enable clang-tidy with all checks enabled, but I figure it's smarter to set up the minimal checks and at least have those in CI. We can fix more going forward.
ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12213
Differential Revision: D10183069
Pulled By: goldsborough
fbshipit-source-id: 7ecd2d368258f46efe23a2449c0a206d10f3a769
Summary:
If a PythonOp throws an error it raises an exception to the interpreter and also releases the GIL which causes [pybind to segfault](https://github.com/potassco/clingo/issues/42)
This fix catches pybind errors while the GIL is still held and throws a `python_error` to re-capture the GIL
Fixes#12118
apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12243
Differential Revision: D10182787
Pulled By: driazati
fbshipit-source-id: 719d4a7c3294af201e061cf7141bec3ca0fb1f04
Summary:
This PR adds a bool type to `IValue` and puts it into place.
* changes conds for `prim::If` and `prim::Loop` to use `bool` type
* changes operators that take `bool`s to match their native ops
* fixes ambiguous `aten` ops `aten::std` and `aten::var`
* fixes tests in `test_jit.py TestJitGenerated`
```
'test_std_dim',
'test_std_dim_1d',
'test_std_dim_1d_neg0',
'test_std_dim_neg0',
'test_var_dim',
'test_var_dim_1d',
'test_var_dim_1d_neg0',
'test_var_dim_neg0'
```
* adds `prim::BoolToTensor` and `prim::TensorToBool`
apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11834
Differential Revision: D9928570
Pulled By: driazati
fbshipit-source-id: 373c53df2f1a8ffa9e33d9a517002fbeef25f3eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12122
We are deprecating support for c extensions. Please use cpp extension in the future.
Reviewed By: Yangqing
Differential Revision: D10060541
fbshipit-source-id: 4f7149e06a254bd7af463fd7aa9740f65369963a
Summary:
Allow trailing commas in assign statements or tuples, which also allows single element tuples.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11723
Differential Revision: D10052162
Pulled By: eellison
fbshipit-source-id: 344d908a3ad942a23ebd9f341794bc9734226aa8
Summary:
- fix https://github.com/pytorch/pytorch/issues/12120
- add `torch.argsort`, `torch.pdist`, `broadcast_tensors` to *.rst files
- add parameter dim to `torch.unique` doc
- fix table and args for `torch.norm`
- test plan: make html and check docs in browser
gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12126
Differential Revision: D10087006
Pulled By: weiyangfb
fbshipit-source-id: 25f65c43d14e02140d0da988d8742c7ade3d8cc9
Summary:
Currently when tracing optimizations are performed twice. This means that optimizing passes, like the fusion pass, are also called twice. This is unnecessary and this PR turns off optimizations when checking the trace (since the trace is independent of optimizations). This should improve performance and debugging.
apaszke who proposed this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12172
Reviewed By: ezyang
Differential Revision: D10109250
Pulled By: apaszke
fbshipit-source-id: 8b3385eae143446820f1b61ca7576d7c07f9b248
Summary:
In current upstream float scalars are always written into kernels with:
`out << std::scientific << v << "f";`
When the floats are special values like NaN, +inf, or -inf this produces nonsense that causes compilation to fail. This fix updates the conversion of float scalars to device-specific special values. The appropriate macros are added to the CPU and CUDA resource strings. Note that a NAN macro was not necessary on the CPU since math.h defines NAN.
To verify this fix I updated the test_clamp_fusion test in test_jit.py. I wanted to test -inf, too, but -inf is not currently accepted by the interpreter.
Edit:
Forgot to mention, this partially addresses issue #12067.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12070
Reviewed By: ezyang
Differential Revision: D10044704
Pulled By: soumith
fbshipit-source-id: 8f4a930862d66a7d37d985e3f6a6fb724579e74c
Summary:
This functionality replaces the Scalar-Tensor builtin operators,
with builtin functions.
Builtin functions are used in place of operators where one operator
can be defined using a composition of another. This simplifies later
optimization passes by allowing us to have fewer operator.
In the future, builtin functions can be used for other purposes.
For example, we can define derivative functions as code rather than
building graphs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12141
Reviewed By: ezyang
Differential Revision: D10088065
Pulled By: zdevito
fbshipit-source-id: a2acb06346e649c4c8a2fe423b420871161c21cf
Summary:
This fixes a broadcasting error with the `StudentT` distribution
- [x] added a regression test
- [x] strengthened parameter broadcasting tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12148
Differential Revision: D10099226
Pulled By: soumith
fbshipit-source-id: 0c5eb14180d158f8fff28ceb9e7cd3471c2bb803
Summary:
This PR replaces the use of `std::FILE` with `istream`/`ostream` for JIT serialization.
It uses this mechanism to add the possibility to serialize to/from binary buffers, in addition to files, both in `libtorch` and from Python.
`getExportImportCopy` in `test_jit.py` has been updated so that both file and buffer codepaths are exercised during tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11932
Differential Revision: D10084303
Pulled By: apaszke
fbshipit-source-id: b850801b3932922fa1dbac6fdaed5063d58bc20d
Summary:
This PR implements the design that we discussed. Changes:
- Added a World token IValue and type. The IValue is basically a dummy struct for now, in the future we may extend it (say, add thread-local state).
- Effectful ops explicitly declare they are mutable by having World tokens as inputs and outputs in their schema.
- Purely functional ops that use mutable values will get "fenced" and the world token will be threaded through the fences
- AnnotateEffects pass which wires up all the world tokens together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10700
Reviewed By: eellison
Differential Revision: D9547881
Pulled By: michaelsuo
fbshipit-source-id: ebbd786c31f15bf45e2ddb0c188438ff2f5f3c88
Summary:
Previously, doRead/doWrite were functions that could return partial reads/writes,
and we checked for this case inconsistently in the call sites of serialization.cpp.
Now, these functions do NOT return the amount of bytes read/written, and instead
handle the necessary checking loop themselves.
Fixes#12042. Maybe.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12143
Differential Revision: D10097027
Pulled By: ezyang
fbshipit-source-id: fd222ab8a825bed352153648ad396acfe124a3e1
Summary:
reduce sum negative indices turn to positive as caffe2 not supporting it. GE/LE symbolic operand order is wrong..
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12123
Reviewed By: houseroad
Differential Revision: D10095467
Pulled By: wanchaol
fbshipit-source-id: eb20248de5531c25040ee68b89bd18743498138d
Summary:
As reported in Issue #10726, the jit compiler, when running on ppc64le, may produce an isomorphic output but fail a diff test against the expected output file. The expected output file is created from a test that was ran on x86_64. This ensures that if ppc64le test output is different, the output is instead compared to an expected output file created when the test is run on a ppc64le system.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11579
Differential Revision: D10080890
Pulled By: soumith
fbshipit-source-id: 7249bf6b5dfa7c853368a3688a982bc9ed642bc9
Summary:
Apply weight decay for Adam in-place instead of via copy.
Synced offline with soumith , who mentioned that it should be OK. This is also consistent with other optimizers, e.g. eee01731a5/torch/optim/sgd.py (L93)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12107
Reviewed By: soumith
Differential Revision: D10071787
Pulled By: jma127
fbshipit-source-id: 5fd7939c79039693b225c44c4c80450923b8d673
Summary:
We generate specialized list operations for int, float, and Tensor lists so that small lists of integers like the arguments to conv do not involve tons of boxing code.
This PR adds a fallback GenericList for List types that contain any other type. It does so by adding type variables to `jit::Type`, and machinery for matching/replacing the type variables during `tryMatchSchema` and operator lookup.
It also modifies the builtin list ops to include a fallback that works on a GenericList object that simply holds IValues. This is distinguished from IValue's tuple type so that conversion to/from Python still happens losslessly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12040
Differential Revision: D10037098
Pulled By: zdevito
fbshipit-source-id: 0c5f2864d12e7d33554bf34cc29e5fb700dde150
Summary:
Couple questions:
1) I used the log1p implementation in #8969 as a guide especially for testing. I'm not sure what the ```skipIfROCM``` annotation is for, so unsure if i need it for my test.
2) I implemented the branching logic in the narrow function itself; is this the right place to do so? I noticed that there a number of places where sparse-specific logic is handled with just an if statement in this file. Or should I implement a separate dispatch in native_functions.yml as in the log1p?
And of course, happy to make any any other updates/changes that I may have missed as well. This is my first PR to the project.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11342
Differential Revision: D9978430
Pulled By: weiyangfb
fbshipit-source-id: e73dc20302ab58925afb19e609e31f4a38c634ad
Summary:
The earlier tests had around 80 warnings, and now there are 6 warnings: these are due to JIT
The changes remove the wrapping of a Tensor by a Tensor constructor, which emits warnings due to the changes in https://github.com/pytorch/pytorch/pull/11061 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12038
Differential Revision: D10033392
Pulled By: apaszke
fbshipit-source-id: b1faf368e650d062d7983f9932511bee4702a893
Summary:
This unifies our versions across setup.py, libtorch, and libcaffe2. CMake has a default version (bumped to 1.0.0) that can be overridden by setup.py. The versions are also printed as a part of cmake/Summary.cmake to make sure they are correct.
cc Yangqing ezyang soumith goldsborough pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12053
Differential Revision: D10041878
Pulled By: orionr
fbshipit-source-id: a98a01771f6c008d1016ab63ab785c3a88c3ddb0
Summary:
This PR does a few things:
Previously test_jit.py only tested autograd on backward graphs.
This is because we borrow from test_autograd and construct graphs with a small
number of nodes. Because the number of nodes is small (typically 1-2), those graph
do not end up containing autodiff subgraphs, so autodiff never gets tested.
This PR enables autodiff testing by doing the following:
- added disableDebugAutodiffSubgraphInlining fn to graph_executor to disable
autodiff subgraph inlining.
- (implementation) added autodiffSubgraphNodeThreshold and autodiffSubgraphInlineThreshold.
These are set to their default values (2, 5) but disableDebugAutodiffSubgraphInlining()
sets both to 1, disabling subgraph inlining and allowing 1-node autodiff subgraphs.
- The relevant backward jit tests disable autodiff subgraph inlining so they
will test the autodiff versions of the operators instead of autograd whenever
an autodiff variant exists.
- We don't run the tests that do inline autodiff subgraphs anymore.
This has no impact on testing correctness because the assumption is
that autograd functions are correct and are tested in test_autograd.py
This allows the graph fuser to work better because a lot of these ops were previously not autodiff-compatible but fusible. On a more concrete example, lstm backward contains a lot of tensor-scalar operations; these autodiff formulas help its double backward pass.
Included:
- arithmetic overloads
- abs, acos, asin, atan, ceil, cos, cosh, exp, expm1, floor, fmod, frac, log, log10, log1p, log2 reciprocal, remainder, round, sin, sinh, tan, trunc, rsqrt
TestJitGenerated tests autodiff for all of the added operations.
cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11832
Differential Revision: D10031256
Pulled By: zou3519
fbshipit-source-id: 9daf9900a5ad187743609cd0fbbd10b15411ad93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12034
We need ATen and Caffe2 to line up, and the rule is
that if you have any private/protected members, you
should declare it as a class. Class we go.
(There are some other obvious candidates for this treatment,
but I've kept this patch just to Tensor)
Reviewed By: gchanan, mingzhe09088
Differential Revision: D10024467
fbshipit-source-id: 17cfe2741ba9c3f56cb87d6f5d1afd3c61a8e4fe
Summary:
This makes a few changes wrt Type, with the ultimate goal of removing Type from the public Methods/Functions. In particular:
1) Removes factory functions from Type, into TypeExtendedInterface.
2) sparse_coo_tensor is now a first class at:: namespace function, with TensorOptions overloads.
3) We move from Type-based sparse_coo_tensor dispatch to function-based.
Note we still require a number of changes to get rid of tType in the public interface, in particular TensorOptions needs to support CUDA vs non-CUDA dispatch. That is coming in a future patch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12025
Reviewed By: ezyang
Differential Revision: D10017205
Pulled By: gchanan
fbshipit-source-id: 00807a37b09ed33f0656aaa165bb925abb026320
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12035
This brings it in line with Caffe2's naming
Reviewed By: mingzhe09088
Differential Revision: D10024485
fbshipit-source-id: a6feef82a56b5eb3043b0821ea802ba746e542a0
Summary:
As pytuple should be a constant type (since obj is constant), potential errors would occur without
this const decorator, e.g., when compiling against PyPy. Although PyPy is not supported yet, it
would still be useful if we remove this compilation issue (out of very few numbers of compilation
issues) to allow hackers playing with them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11857
Differential Revision: D10024149
Pulled By: soumith
fbshipit-source-id: aa7e08e58f6369233a11477113351dccd3854ba8
Summary:
The MPI async work class returned a temporary as reference, which is
invalid (hat tip to colesbury for noticing it). This change fixes that and
uses a std::exception_ptr to hold on to the exception if applicable, and
then returns the reference by throwing it and returning it, like the
existing code path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11947
Differential Revision: D10019928
Pulled By: pietern
fbshipit-source-id: 5a8ed0e894615a09224ca5e48c8b3104275a3019
Summary:
Because we emit a lot of them in our symbolic AD. This brings down the backward time of an LSTM I'm testing from 14.2ms to 12.5ms (a 15% improvement).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11801
Differential Revision: D9916815
Pulled By: apaszke
fbshipit-source-id: 2d9cb886c424ccd43b9f996aad89950d3bddf494
Summary:
Currently the C++ API and C++ extensions are effectively two different, entirely orthogonal code paths. This PR unifies the C++ API with the C++ extension API by adding an element of Python binding support to the C++ API. This means the `torch/torch.h` included by C++ extensions, which currently routes to `torch/csrc/torch.h`, can now be rerouted to `torch/csrc/api/include/torch/torch.h` -- i.e. the main C++ API header. This header then includes Python binding support conditioned on a define (`TORCH_WITH_PYTHON_BINDINGS`), *which is only passed when building a C++ extension*.
Currently stacked on top of https://github.com/pytorch/pytorch/pull/11498
Why is this useful?
1. One less codepath. In particular, there has been trouble again and again due to the two `torch/torch.h` header files and ambiguity when both ended up in the include path. This is now fixed.
2. I have found that it is quite common to want to bind a C++ API module back into Python. This could be for simple experimentation, or to have your training loop in Python but your models in C++. This PR makes this easier by adding pybind11 support to the C++ API.
3. The C++ extension API simply becomes richer by gaining access to the C++ API headers.
soumith ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11510
Reviewed By: ezyang
Differential Revision: D9998835
Pulled By: goldsborough
fbshipit-source-id: 7a94b44a9d7e0377b7f1cfc99ba2060874d51535
Summary:
Or even taking them as inputs. This prevents optimizations to happen
either inside the differentiable subgraphs, or in the surrounding graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11809
Differential Revision: D10009680
Pulled By: apaszke
fbshipit-source-id: face638566228e470a6deec48dc2aa3a1cce26d4
Summary:
This PR is a large codemod to rewrite all C++ API tests with GoogleTest (gtest) instead of Catch.
You can largely trust me to have correctly code-modded the tests, so it's not required to review every of the 2000+ changed lines. However, additional things I changed were:
1. Moved the cmake parts for these tests into their own `CMakeLists.txt` under `test/cpp/api` and calling `add_subdirectory` from `torch/CMakeLists.txt`
2. Fixing DataParallel tests which weren't being compiled because `USE_CUDA` wasn't correctly being set at all.
3. Updated README
ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11953
Differential Revision: D9998883
Pulled By: goldsborough
fbshipit-source-id: affe3f320b0ca63e7e0019926a59076bb943db80
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10723
- migrate PReLU to ATen and deprecate legacy PReLU
- performance:
CPU with weight.numel() = 1
```
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
100 loops, best of 100: 9.43 ms per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
10 loops, best of 100: 24.4 ms per loop
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 695 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.47 ms per loop
```
CPU with weight.numel() = channels
```
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 603 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 13.3 ms per loop
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 655 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.45 ms per loop
```
CUDA with weight.numel() = 1
```
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 187 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.01 ms per loop
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 195 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.28 ms per loop
```
CUDA with weight.numel() = channel
```
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 174 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.27 ms per loop
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 181 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.26 ms per loop
```
The huge performance regression in CPU when weight.numel() = 1 is addressed by replacing at::CPU_tensor_apply* with parallelized kernels.
ezyang SsnL zou3519 soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11758
Differential Revision: D9995799
Pulled By: weiyangfb
fbshipit-source-id: d289937c78075f46a54dafbde92fab0cc4b5b86e
Summary:
This eliminates the need for any heuristics regarding stack size limits.
This is a re-do #11534 with a fix to properly handle cases where
multiple edges exist between a pair of functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11611
Differential Revision: D9991198
Pulled By: resistor
fbshipit-source-id: fecd2c5cac7e78f82a0f20cf33268bb1617bb4a0
Summary:
Previously, aten::view returned a Dynamic type when attr::size is a prim::ListConstruct.
See [this for a repro](https://gist.github.com/zou3519/cbd610472ba3369f556fa612a7d93b28).
This prevented a pre-multipled lstm input graph from being fusible (aten::view is necessary
to do premultiplication).
If aten::view is passed an output of a prim::ListConstruct node, then shape prop should
be able to figure out its TensorType because we statically know the number of inputs to
prim::ListConstruct. This PR implements that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11877
Differential Revision: D9972356
Pulled By: zou3519
fbshipit-source-id: cb87786f6e7f222d4b8f07d8f2a9de34859cb6a5
Summary:
This is pretty important because a common situation of passing LSTM hidden states as a tuple completely trashes performance of a network.
Cleans up all our propagation/undef specialization passes, at a cost of increased complexity of `ArgumentSpec` and `GraphExecutor`. An alternative would be to simply flatten all tuple inputs to a graph ahead of time, but that might just end up being confusing in the future (you never know if you're working with a graph that can have tuple or not).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11863
Differential Revision: D9992814
Pulled By: apaszke
fbshipit-source-id: 0a565a3b23e32f8fa72c0534e07c1ce6187739fc