Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66972
Add an API to view how many custom classes we have and what their names are.
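For illustration only, a hypothetical usage sketch; the summary doesn't name the new API, so `_get_custom_class_names` below is a placeholder, not the actual binding:
```
import torch

# Placeholder name -- the summary does not say what the real binding is
# called, so this call is illustrative only.
names = torch._C._get_custom_class_names()
print(f"{len(names)} custom classes registered")
for name in sorted(names):
    print(" ", name)
```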
Test Plan: unit test
Reviewed By: cccclai
Differential Revision: D31811337
fbshipit-source-id: 9f8ca1fc578a0a5360c9cd8f95475acc33f250e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67085
Leverages BuiltinRegistry to register the CPython standard C modules. The standard C modules moved are listed in the FOR_EACH macro.
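For illustration, a minimal sketch of the X-macro pattern that a FOR_EACH-style macro uses, with made-up module names; this is not the actual torch::deploy BuiltinRegistry code:
```
#include <cstdio>

// One central list of builtin modules; each X(...) row is a module.
#define FOR_EACH_BUILTIN_MODULE(X) \
  X(_json)                         \
  X(math)                          \
  X(array)

// Expand the list to register every module in one place; adding a module
// later means adding one row above and nothing else.
#define REGISTER_BUILTIN(name) std::printf("registering builtin: %s\n", #name);

void registerAllBuiltins() {
  FOR_EACH_BUILTIN_MODULE(REGISTER_BUILTIN)
}

int main() {
  registerAllBuiltins();
}
```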
Test Plan:
buck test mode/opt //caffe2/torch/csrc/deploy/interpreter:test_builtin_registry
buck test mode/opt //caffe2/torch/csrc/deploy:test_deploy
Reviewed By: shunting314
Differential Revision: D31848547
fbshipit-source-id: 7eb49d222eaaccb2b8ca5c984b05bf54cc233f25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66659
Original message: We added and registered a new operator, static_runtime::fused_sigrid_transforms, and modified the original sigrid_transforms to handle the non-fused case only.
Note: this diff was commandeered from a bootcamper. Some final touches were needed.
Test Plan: `buck test caffe2/benchmarks/static_runtime/...`
Reviewed By: swolchok
Differential Revision: D31550307
fbshipit-source-id: 287380be0cca20ee6e145bcc7217547bd58cf6d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67097
All delegated models have `is_nonzero` ops by default; making the op native and consumable without dispatch eases the portability of such models.
ghstack-source-id: 141375082
Test Plan:
`buck test caffe2/test/cpp/jit:jit -- BackendTest.TestComposite`
```
~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test caffe2/test:jit -- test_trace_arange
Parsing buck files: finished in 0.5 sec
Building: finished in 9.4 sec (100%) 16035/16035 jobs, 0/16035 updated
Total time: 10.0 sec
More details at https://www.internalfb.com/intern/buck/build/1e55eea5-2adb-41d1-96ae-cbf4b446d6c6
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 46eedba2-ae17-4e88-b205-93bd1332665d
Trace available for this run at /tmp/tpx-20211015-113905.235421/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/1970324912349177
✓ ListingSuccess: caffe2/test:jit - main (12.372)
✓ Pass: caffe2/test:jit - test_trace_arange (jit.test_tracer.TestTracer) (13.748)
✓ Pass: caffe2/test:jit - test_trace_arange_with_grad (jit.test_tracer.TestTracer) (13.892)
Summary
Pass: 2
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1970324912349177
```
Reviewed By: iseeyuan
Differential Revision: D31656842
fbshipit-source-id: c0e6c798478a2783c0e17e6e9100ba5ce044da78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66671
Made changes in the step function of the vectorized and non-vectorized Adagrad optimizers to handle complex numbers as two real numbers, per #65711 on GitHub.
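A minimal sketch of the complex-as-two-reals idea, assuming an Adagrad-style update; it mirrors the approach but is not the actual torch.optim code:
```
import torch

p = torch.randn(3, dtype=torch.complex64)
g = torch.randn(3, dtype=torch.complex64)

# view_as_real reinterprets each complex entry as two reals: float32 views
# of shape (3, 2) that share storage with p and g.
p_r = torch.view_as_real(p)
g_r = torch.view_as_real(g)

state_sum = torch.zeros_like(p_r)
lr, eps = 0.1, 1e-10

state_sum.addcmul_(g_r, g_r, value=1.0)                   # accumulate g^2
p_r.addcdiv_(g_r, state_sum.sqrt().add_(eps), value=-lr)  # updates p in place
```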
ghstack-source-id: 141442350
Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adagrad_complex'
https://pxl.cl/1Rd44
Reviewed By: albanD
Differential Revision: D31673503
fbshipit-source-id: 90a0d0c69b556716e2d17c59ce80f09c750fc464
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67001
The overload of `operator()` taking `std::vector<at::Tensor>` was only used for testing. In a diff following this one, I will add a new overload that takes `std::vector<c10::IValue> args` and no `kwargs` so we can avoid default-constructing `kwargs` everywhere.
This new overload will probably take a forwarding reference, so to avoid problems with overloading on a forwarding reference and to simplify the interface, it's best to remove this unused one.
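For illustration, a standalone sketch of that pitfall with stand-in types: a forwarding-reference template beats a concrete overload for non-const lvalue arguments, so calls can silently bind to the wrong function.
```
#include <iostream>
#include <vector>

void run(const std::vector<int>&) {  // concrete overload
  std::cout << "concrete overload\n";
}

template <class ArgList>
void run(ArgList&&) {  // forwarding reference
  std::cout << "template overload\n";
}

int main() {
  std::vector<int> v{1, 2, 3};
  run(v);   // ArgList deduces to std::vector<int>&, an exact match, so the
            // template silently wins over the concrete overload
  const std::vector<int> cv{4};
  run(cv);  // both overloads match exactly; the non-template is preferred
}
```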
Test Plan:
`buck test caffe2/benchmarks/static_runtime/...`
`buck test caffe2/test:static_runtime`
Reviewed By: hlu1
Differential Revision: D31821990
fbshipit-source-id: 6d2e4a75ca4abe6e262651532eb96c3b274c6f4a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67125
Using explicit template instantiations in D31659973 (f2582a59d0) was a bad idea. The problem is that the lvalue instantiation was for a `const` vector of `IValue`, meaning that if you tried to pass SR a non-const vector of arguments, the linker would fail to find the symbol.
The reason we didn't catch this in D31659973 (f2582a59d0) was that predictor always passes a `const` reference anyway. But we should fix this to prevent unexpected problems in the future.
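A sketch of the failure mode with a stand-in `IValue` (an assumed simplification, shown as one listing; the failure only occurs when the definition and the caller live in separate translation units):
```
#include <vector>

struct IValue {};  // stand-in for c10::IValue

// lib.h -- declaration only:
template <class IValueList>
void run(IValueList&& args);

// lib.cpp -- definition plus the single explicit instantiation:
template <class IValueList>
void run(IValueList&& /*args*/) { /* ... */ }

template void run(const std::vector<IValue>&);  // const lvalue only

// caller.cpp:
int main() {
  std::vector<IValue> args;  // NOT const
  run(args);                 // deduces IValueList = std::vector<IValue>&, an
                             // instantiation that was never emitted, so the
                             // linker reports an undefined reference
}
```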
Test Plan: `buck test caffe2/benchmarks/static_runtime/...`
Reviewed By: hlu1
Differential Revision: D31873406
fbshipit-source-id: 5ab5a03334bed925cec11facadcedf9bec9b90ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67169
Looks like the doc error only appears after it's landed
Test Plan: Imported from OSS
Reviewed By: seemethere
Differential Revision: D31890431
fbshipit-source-id: d40cba082712c4b35704ea15d82fbc4749f85aec
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67152
Test Plan:
```
cd docs
make html
```
Imported from OSS
Reviewed By: supriyar
Differential Revision: D31884570
fbshipit-source-id: 2b521f617c93f6fa08da3387df2d25497293eee6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66957
`chunk` appears to return a tuple, which is enough given that we just
index into the right chunk and discard the rest.
ghstack-source-id: 141391149
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D31780799
fbshipit-source-id: fdb1b77fffa916328e14a4cd692b5241ae46a514
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66956
Adds some comments I found helpful while ramping up on FSDP code.
ghstack-source-id: 141391150
Test Plan: n/a
Reviewed By: mrshenli
Differential Revision: D31780798
fbshipit-source-id: e2d38a9801b4548b202a73615774d5f0f7f5e3ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66337
Right now, the assembly code generated for a given method from the model is named `wrapper` or `func` by default. The function name is then replaced with a proper kernel_func_name after target-specific assembly is generated.
This PR propagates the desired kernel_func_name right from the aotCompiler API so that the generated function gets the needed name and doesn't need to be renamed later.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31514095
Pulled By: priyaramani
fbshipit-source-id: b70c8e2c733600a435cd4e8b32092d37b7bf7de5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67066
We'll add it later when the API is ready.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31849079
fbshipit-source-id: 0c00d08510166b2d897cf1562c7276527319b05c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66878
Currently, convert_fx quantizes all layers that were prepared, as determined by the prepare qconfig_dict.
This PR adds support for a variation of qconfig_dict in convert_fx that can be used to skip quantizing certain layers.
This makes it possible to prepare/observe all operators but quantize only a subset of them (based on quantization error), avoiding the need to prepare multiple times.
The qconfig_dict passed to convert_fx can only have the values set to `None`, with the keys being the same as what is allowed in the prepare qconfig_dict
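A hedged sketch of the intended flow; the exact keyword name and qconfig_dict shape accepted by convert_fx are assumptions here:
```
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import convert_fx, prepare_fx

model = torch.nn.Sequential(
    torch.nn.Linear(4, 4),
    torch.nn.ReLU(),
    torch.nn.Linear(4, 4),
).eval()

# Prepare/observe everything.
prepared = prepare_fx(model, {"": get_default_qconfig("fbgemm")})
prepared(torch.randn(2, 4))  # calibration

# Quantize only a subset: entries in the convert-time qconfig_dict may only
# map to None, marking layers to skip (keyword name assumed).
quantized = convert_fx(prepared, qconfig_dict={"module_name": [("2", None)]})
```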
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_convert_qconfig_dict
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D31808247
fbshipit-source-id: a4f5dca1090f0083fc3fea14aff56924033eb24f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66767
Make observer FQNs in the prepare step independent of the input_node/observed_node name.
This change names the observers as `{input/output}_activation_post_process_{idx}` where idx will be incremented for each new observer instance and is guaranteed to be unique.
Test Plan:
python test/test_quantization.py test_observer_fqn
Imported from OSS
Reviewed By: anjali411
Differential Revision: D31752052
fbshipit-source-id: e0995b1ef33a99d5b012133fe92d303d55a73b7d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67050
This PR moves init_multi_gpu_helper to common_distributed so that it can be shared by different distributed tests.
ghstack-source-id: 141370119
Test Plan: wait for ci.
Reviewed By: mrshenli
Differential Revision: D31842644
fbshipit-source-id: c7bad25d6cef9bdce7ad1fb6c60c1cad4b765702
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66904
Add more FSDP unit tests to cover core logic, freezing weights, and the flatten-parameter wrapper. These unit tests are refactored to align with PyTorch's commonly used test classes.
ghstack-source-id: 141335614
Test Plan: unit tests
Reviewed By: mrshenli
Differential Revision: D31779565
fbshipit-source-id: c727110d1d7570c0ec49e42cadfc9e9a5e440073
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66649
Some minor changes to dist quantization: mainly changing the namespace and adding some notes for future code dedup.
ghstack-source-id: 141336191
Test Plan: wait for ci
Reviewed By: cbalioglu
Differential Revision: D31663043
fbshipit-source-id: 2f96b7346e9c90df5ab2536767f8301eb86a9c79
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66149
The updated logic can infer the rank of a slice output when only the rank of the slice input is known. This enables cases where `ConstantValueMap::HasRank(input)` is `True` while `ConstantValueMap::HasShape(input)` is `False`.
Test Plan: Imported from OSS
Reviewed By: jansel
Differential Revision: D31423840
Pulled By: malfet
fbshipit-source-id: 17b2b24aa63435d5212ebe6bdf66ae3c348c4e3b
Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66143
Delete test_list_remove. There's no point in testing conversion of
this model since TorchScript doesn't support it.
Add a link to an issue tracking test_embedding_bag_dynamic_input.
[ONNX] fix docs (#65379)
Mainly fix the Sphinx build by inserting empty lines before
bulleted lists.
Also some minor improvements:
Remove superfluous descriptions of deprecated and ignored args.
The user doesn't need to know anything other than that they are
deprecated and ignored.
Fix custom_opsets description.
Make indentation of Raises section consistent with Args section.
[ONNX] publicize func for discovering unconvertible ops (#65285)
* [ONNX] Provide public function to discover all unconvertible ATen ops
This can be more productive than finding and fixing a single issue at a
time.
* [ONNX] Reorganize test_utility_funs
Move common functionality into a base class that doesn't define any
tests.
Add a new test for opset-independent tests. This lets us avoid running
the tests repeatedly for each opset.
Use simple inheritance rather than the `type()` built-in. It's more
readable.
* [ONNX] Use TestCase assertions rather than `assert`
This provides better error messages.
* [ONNX] Use double quotes consistently.
[ONNX] Fix code block formatting in doc (#65421)
Test Plan: Imported from OSS
Reviewed By: jansel
Differential Revision: D31424093
fbshipit-source-id: 4ced841cc546db8548dede60b54b07df9bb4e36e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66140
* Add a new argument to the export API to let users specify `nn.Module` classes that they wish to export as local functions in the ONNX model (see the sketch after this list).
* Refactor `torch/csrc/jit/serialization/export.cpp`, and remove redundant `EncoderBase` class.
* ~~Contains changes from #63268~~
* Depends on #63716 to update onnx submodule.
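A hedged sketch of the new argument; `export_modules_as_functions` is an assumed name, and the exact signature may differ by release:
```
import torch

class Gelu(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.gelu(x)

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.act = Gelu()

    def forward(self, x):
        return self.act(x) + 1

torch.onnx.export(
    Model(),
    torch.randn(2, 3),
    "model.onnx",
    opset_version=15,  # ONNX local functions need a recent opset
    # Export every instance of Gelu as one local ONNX function
    # (argument name assumed):
    export_modules_as_functions={Gelu},
)
```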
Test Plan: Imported from OSS
Reviewed By: jansel
Differential Revision: D31424098
fbshipit-source-id: c949d0b01c206c30b4182c2dd1a5b90e32b7a0d3
Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66581
c10d/frontend.cpp was originally proposed to introduce a pure C++ API and use TorchBind to share the Python-level API with TorchScript. This is no longer needed, so delete it to reduce code redundancy.
ghstack-source-id: 141336190
Test Plan: wait for ci
Reviewed By: rohan-varma
Differential Revision: D31627107
fbshipit-source-id: 07d30d280c25502a222a74c2c65dfa4069ed8713
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66648
Currently, SR shallow-copies its `IValue` inputs when running inferences. We can avoid refcount bumps by `std::move`-ing the inputs into their slots. To achieve this, I've made the following changes:
1. Add an overload for `set_inputs` that takes a `std::vector<IValue>&&`.
2. Change the signatures of `StaticModule::operator()` and `StaticRuntime::operator()`.
Old:
```
operator()(const std::vector<IValue>& args, const std::unordered_map<std::string, IValue>& kwargs)
```
New:
```
template <class IValueList>
operator()(IValueList&& args, const std::unordered_map<std::string, IValue>& kwargs)
```
The implementations use perfect forwarding to invoke the correct overload of `set_inputs`.
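For illustration, a minimal sketch of that dispatch with a stand-in `IValue` (not the actual SR code):
```
#include <cstdio>
#include <utility>
#include <vector>

struct IValue {};  // stand-in for c10::IValue

struct Runtime {
  void set_inputs(const std::vector<IValue>&) { std::puts("copied inputs"); }
  void set_inputs(std::vector<IValue>&&) { std::puts("moved inputs"); }

  template <class IValueList>
  void operator()(IValueList&& args) {
    // std::forward preserves the caller's value category, so rvalues pick
    // the && overload (no refcount bumps) and lvalues pick the const& one.
    set_inputs(std::forward<IValueList>(args));
  }
};

int main() {
  Runtime rt;
  std::vector<IValue> args(3);
  rt(args);             // lvalue  -> copied
  rt(std::move(args));  // rvalue  -> moved
}
```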
Test Plan: Added a short new unit test to exercise the new code path. All other unit tests still pass.
Reviewed By: hlu1
Differential Revision: D31659973
fbshipit-source-id: b8c194405b54a5af1b418f8edaa1dd29a061deed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66288
This change makes it so `UseVariadicOp` can transform ops with many Tensor list inputs.
Input pattern:
```
%output : Type = op(%list_1, %arg_1, %list_2, %list_3)
```
Output pattern:
```
%output : Type = variadic_op(%list_11, ..., %list_1N, %arg_1, %list_21, ..., %list_2M, %list_31, ..., %list_3K, N, M, K)
```
The length of each list is passed at the end of the variadic op so that the op implementation can process the inputs appropriately. This also frees us from needing to update `hasVarArgs` in static runtime each time we add a variadic op.
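For illustration, a hedged sketch of how an implementation might use the trailing lengths, with stand-in types and the simplifying assumption that all list elements are laid out contiguously (the real op interleaves the non-list args per the schema):
```
#include <cstddef>
#include <vector>

struct Tensor {};  // stand-in

// `flat` holds the flattened list elements; `lengths` holds N, M, K, ...
// taken from the trailing integer inputs of the variadic op.
std::vector<std::vector<Tensor>> splitLists(
    const std::vector<Tensor>& flat,
    const std::vector<std::size_t>& lengths) {
  std::vector<std::vector<Tensor>> lists;
  std::size_t offset = 0;
  for (std::size_t len : lengths) {
    lists.emplace_back(flat.begin() + offset, flat.begin() + offset + len);
    offset += len;
  }
  return lists;  // one reconstructed Tensor list per original list input
}
```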
This diff also makes `UseVariadicOp` more robust. Before, `list_idx` was passed as an argument. Now, `VariadicUpdater` determines `list_idx` from the node's schema.
Test Plan:
Existing variadic ops do not break:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: d1jang
Differential Revision: D31450811
fbshipit-source-id: 808fcc3ae8940b9e602586f38f8cf9154c9a6462
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65860
Re-enable peepholes like `x + 0 == x`. These were at one point enabled, then disabled because they did not properly account for aliasing, and then re-enabled by reconstructing the alias db every time, which is slow: O(n^2). I've added correctness conditions, and I've also made it so that we avoid using stale aliasing properties for either the input or output of nodes we optimize.
Some of the other code we have written to avoid re-instantiating the alias db involves internally mutating it; however, this is tricky to reason about and we probably have to add some extra invariants...
cc navahgar relevant to graph opts and d1jang alias analysis relevant here
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D31352382
Pulled By: eellison
fbshipit-source-id: 441a27f17dc623d6c24538d1d43cba0412c3c482