Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67803
* Addresses comments from #63589
[ONNX] remove torch::onnx::PRODUCER_VERSION (#67107)
Use constants from version.h instead.
This simplifies things since we no longer have to update
PRODUCER_VERSION for each release.
Also add TORCH_VERSION to version.h so that a string is available for
this purpose.
[ONNX] Set `ir_version` based on opset_version. (#67128)
This increases the odds that the exported ONNX model will be usable.
Before this change, we were setting the IR version to a value which may
be higher than what the model consumer supports.
Also some minor clean-up in the test code:
* Fix string replacement.
* Use a temporary file so as to not leave files around in the test
current working directory.
Test Plan: Imported from OSS
Reviewed By: msaroufim
Differential Revision: D32181306
Pulled By: malfet
fbshipit-source-id: 02f136d34ef8f664ade0bc1985a584f0e8c2b663
Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Summary:
Some of the "no-ops" are not actually no-ops because they can change the dtype
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67688
Reviewed By: davidberard98
Differential Revision: D32104601
Pulled By: eellison
fbshipit-source-id: ccb99179a4b30fd20b5a9228374584f2cdc8ec21
Summary:
Adds mixed precision autocasting support between fp32/fp16 to torchscript/JIT. More in depth descriptoin can be found at [torch/csrc/jit/JIT-AUTOCAST.md](https://github.com/pytorch/pytorch/pull/63939/files#diff-1f1772aaa508841c5bb58b74ab98f49a1e577612cd9ea5c386c8714a75db830b)
This PR implemented an autocast optimization pass that inserts casting ops per AMP rule (torch/csrc/jit/passes/autocast.cpp), that mimics the behavior of eager autocast. The pass also takes into consideration the context of `torch.cuda.amp.autocast` and only inserts casting ops within the enabled context manager, giving feature parity as with eager amp autocast.
We currently provide JIT AMP autocast as a prototyping feature, so it is default off and could be turned on via `torch._C._jit_set_autocast_mode(True)`
The JIT support for autocast is subject to different constraints compared to the eager mode implementation (mostly related to the fact that TorchScript is statically typed), restriction on the user facing python code is described in doc torch/csrc/jit/JIT-AUTOCAST.md
This is a prototype, there are also implementation limitation that's necessary to keep this PR small and get something functioning quickly on upstream, so we can iterate on designs.
Few limitation/challenge that is not properly resolved in this PR:
1. Autocast inserts cast operation, which would have impact on scalar type of output tensor feeding downstream operations. We are not currently propagating the updated scalar types, this would give issues/wrong results on operations in promotion rules.
2. Backward for autodiff in JIT misses the casting of dgrad to input scalar type, as what autograd does in eager. This forces us to explicitly mark the casting operation for certain operations (e.g. binary ops), otherwise, we might be feeding dgrad with mismatch scalar type to input. This could potentially break gradient function consuming dgrad. (e.g. gemm backwards, which assumes grad_output to be of same scalar type as input')
3. `torch.autocast` api has an optional argument `dtype` which is not currently supported in the JIT autocast and we require a static value.
Credit goes mostly to:
tlemo
kevinstephano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63939
Reviewed By: navahgar
Differential Revision: D31093381
Pulled By: eellison
fbshipit-source-id: da6e26c668c38b01e296f304507048d6c1794314
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65967
Graph is an implementation detail. If user wants to get access to the
underlying graph, they should be able to explicitly dynamic cast instead.
ghstack-source-id: 141659819
Test Plan: no behavior change.
Reviewed By: gmagogsfm
Differential Revision: D31326153
fbshipit-source-id: a0e984f57c6013494b92a7095bf5bb660035eb84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65966
ghstack-source-id: 141594521
Support exportation of "interface methods" from submodule to a mobile module. "Interface methods" are defined as methods which might be dynamically called in a module therefore need to be exported anyway, like virtual functions in C++.
Before this change the algorithm of exportation is a simple iteration through all toplevel methods. Now since we have indirect calls, we need to recursively walkthrough the call graph to find all potentially used methods, which means the order we export methods might break in old runtimes, to guarantee forward compatibility we need to export toplevel methods first, then extra methods, in this order toplevel methods will always be found first.
NOTE that interface methods exportations are disabled by default in this diff. We need to call torch._C._enable_mobile_interface_call_export to actaully enable it.
Test Plan: buck test mode/dev //caffe2/test:jit -- --exact 'caffe2/test:jit - test_export_opnames_interface (jit.test_misc.TestMisc)'
Reviewed By: qihqi, iseeyuan
Differential Revision: D31326155
fbshipit-source-id: 5be7234cca07691f62648a85133b6db65e427b53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66140
* Add new argument to export api to enable users specifying `nn.Module` classes that they wish to be exported as local function in ONNX model.
* Refactor `torch/csrc/jit/serialization/export.cpp`, and remove redundant `EncoderBase` class.
* ~~Contains changes from #63268~~
* Depends on #63716 to update onnx submodule.
Test Plan: Imported from OSS
Reviewed By: jansel
Differential Revision: D31424098
fbshipit-source-id: c949d0b01c206c30b4182c2dd1a5b90e32b7a0d3
Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary:
This would save the cost copying text from stack to heap in some cases (like
parsing function schema during loading phase of libtorch.so)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65309
Reviewed By: swolchok
Differential Revision: D31060315
Pulled By: gmagogsfm
fbshipit-source-id: 0caf7a688b40df52bb4388c5191d1a42351d6f1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64066
I noticed a bunch of time being spent heap-allocating Tuples
in the unpickler. 1-, 2-, and 3-element Tuples are apparently common
enough that they get their own bytecode instructions, so I decided to
try also giving them their own representation. We store up to 3
IValues inline in `Tuple` rather than doing a second heap allocation
for a `std::vector<IValue>`.
ghstack-source-id: 140695395
Test Plan:
Added automated tests for TupleElements.
Pixel 3 before: https://www.internalfb.com/intern/aibench/details/761596366576284
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/591414145082422
We went from 347 ms to 302 ms.
Reviewed By: dhruvbird
Differential Revision: D30592622
fbshipit-source-id: 93625c54c9dca5f765ef6d5c191944179cb281a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63198
Linear layers using the same input tensor can be concatted together
as long as the weights and biases are compatible.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31240642
fbshipit-source-id: 1e78daa6b89822412ba2513d326ee0e072ceff1e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65345
FooType::get() can return a const reference. Inconveniently, converting shared_ptr<FooType> to shared_ptr<Type> requires a copy & refcount bump, so to properly take advantage of this in unshapedType() we need to take a const Type& in isSubtypeOf(), which is good practice anyway -- don't require a shared_ptr if you don't need to take ownership.
ghstack-source-id: 140044165
Test Plan:
CI
perf says c10::unshapedType time decreased from 2.8% to 2.2% during static runtime startup, though I expect this to be generally beneficial.
Reviewed By: hlu1
Differential Revision: D31027361
fbshipit-source-id: 676feb81db9f74ad7b8651d8774f4ecb4cfa6ab8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63129
1. Add an api to get `supported_types` from runtime, expose in c++ only.
2. Add an api to get `contained_types` from model, expose in both c++ and PyThon.
3. Add a field `contained_types_` in `type_parser.cpp` to track the contained types when parsing python string.
4. Expand `is_compatible` api to check type. When checking type, it will check the contained type list from the model with the support type list from runtime.
5. Expand the unittest for compatibility to cover type
6. Add unit test in python to check type list
ghstack-source-id: 139826944
Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.GetContainTypes'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleSuccess'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleFail'
buck test //caffe2/test:mobile
```
Reviewed By: iseeyuan
Differential Revision: D30231419
fbshipit-source-id: 8427f423ec28cc5de56411f15fd960d8595d6947
Summary:
Description:
- Have only added `stdout` and `stderr` as possible options from python
API for now. We can do file path passing later maybe.
- Put the class `JitLoggingConfig` in the cpp file as none of its methods were being used outside of this file.
Python API:
`torch._C._jit_set_logging_stream('stdout|stderr')`
C++ API:
`::torch::jit::set_jit_logging_output_stream(ostream);`
Testing:
- Tested python API locally.
- Unit test for the C++ API is written
Fixes https://github.com/pytorch/pytorch/issues/54182
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65768
Reviewed By: mrshenli
Differential Revision: D31291739
Pulled By: ZolotukhinM
fbshipit-source-id: eee72edc20488efad78a01c5b0ed8a132886a08d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610
- Replace HIP_PLATFORM_HCC with USE_ROCM
- Dont rely on CUDA_VERSION or HIP_VERSION and use USE_ROCM and ROCM_VERSION.
- In the next PR
- Will be removing the mapping from CUDA_VERSION to HIP_VERSION and CUDA to HIP in hipify.
- HIP_PLATFORM_HCC is deprecated, so will add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.
cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd
Reviewed By: jbschlosser
Differential Revision: D30909053
Pulled By: ezyang
fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64382
* This `use_external_data_format` parameter is used for large models cannot be exported because of the 2GB protobuf limit.
* When `use_external_data_format` set to True, the model is exported in ONNX external data format, in which case some of the model parameters are stored in external binary files and not in the ONNX model file itself.
* This PR will set this paramter to DEPRECATED and check the model proto sizes by code instead of by user, if the sizes lager than 2GB, then `use_external_data_format = True` automatically.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D30905265
Pulled By: malfet
fbshipit-source-id: 82b4e17bfa6a8de2bfd700a5282c12f6835603cb
Co-authored-by: hwangdeyu <dejack953@outlook.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65097
Previously, BatchMM would skip any block containing any mutable
operators. Now it will avoid batching any operation whose inputs or
outputs are ever mutated. Specifically: consider a tree of ADD, T,
and MM nodes rooted at an ADD node. If any input or output to any
node in the tree is ever mutated, then the entire tree will be ignored
by BatchMM.
Test Plan: python test/test_jit.py TestBatchMM
Reviewed By: eellison
Differential Revision: D30973515
Pulled By: davidberard98
fbshipit-source-id: 9d836faa1ef0c9e3fefe0ffc0bd265f275471f48
Summary:
This PR is created to replace https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. Reason for needing a replacement is due to a messy Sandcastle issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234
Reviewed By: gmagogsfm
Differential Revision: D30656444
Pulled By: ansley
fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414
Misuse of raw pointer in here where stack is never nullable.
ghstack-source-id: 136938318
Test Plan:
compiles.
Imported from OSS
Reviewed By: ejguan
Differential Revision: D30375410
fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776
I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack. Let's try again.
I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847
Test Plan: CI
Reviewed By: huiguoo
Differential Revision: D30484555
fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62763
This PR is to fix the issue that the graph inputs might be updated when we export the model in inference mode.
When a model is export in inference mode, some optimizations will be made. One side effect of these optimizations is: the inputs of graph might be adjusted. Such optimizatiosn include:
1. Conv and BatchNorm op fusion.
2. Do constant folding.
If the user sets export_params=False, or set keep_initializers_as_inputs=True, it's highly possible that the user wants to provide the corresponding parameters or initiliazers as the inputs of the graph.
In such situation, no matter the model is export in inference mode or training mode, exporter needs to prevent above optimizations from adjusting the graph inputs. By this, the inputs of graph could match inputs that users provided.
The changes in this PR, add an additional common judgement to see if the above optimizations needs to be done or not. From the value of export_params and keep_initializers_as_inputs arguments, infer if the graph inputs are allowed to be adjusted.
If no, these optimizations will be ignored, even other requirements are matched.
Besides these code changes, the comments of some parameters below have been updated so that users have more thoughts when they consider how to leverage these parameters for different purposes:
1. export_params
2. training
3. do_constant_folding
4. keep_initializers_as_inputs
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D30375183
Pulled By: msaroufim
fbshipit-source-id: 4db8b9695649eb32a3a0fefa950ee2e5651bdba0
Co-authored-by: fatcat-z <jiz@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59492
Adding code to find common expressions from the two subblocks of an if
operation and hoist them before the if block.
This also allows Dead Code Elimination to
then eliminate some if blocks.
Also eliminated some dead code in the codebase.
Test Plan:
python test_jit.py TestIfHoisting
Imported from OSS
Reviewed By: ngimel
Differential Revision: D29399533
fbshipit-source-id: 9336b9dc48c02c38862f98f98cd72fc1767a1802
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59814
Using opinfos to test shape analysis. By default, we just check that we don't give incorrect answers, and then if `assert_jit_shape_analysis` is true, tests that we correctly propagates the full shape. and it found a couple bugs {emoji:1f603}
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D30200058
Pulled By: eellison
fbshipit-source-id: 6226be87f5390277cfa5a1fffaa1b072d4bc8803
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62200
This commit brings back the `RemoveInplaceOps` pass removed in D29523283 (dec5aa2260) that apparently had a bunch of internal users.
Test Plan: danthe3rd
Reviewed By: danthe3rd
Differential Revision: D29833316
fbshipit-source-id: 6cf13d463ab0a5e50ba3eb3243f79a9c51623809
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61791
methods from forward
During inlining we attached InlinedCallstack to nodes being inlined. In
the process we attach moodule information as well, such that if
CallMethod is being inlined we know which class instance and class type
the method belongs to. However, CallMethod can be calling a method of
the same object to which the graph belongs. e.g.:
```
def forward(self, input):
x = input + 10
return forward_impl_(x, input)
```
Here forward_impl is method defined on the same class in which forward
is defined. Existing module hierarchy annotation will mislabel this as
unknown instance since the method is not associated with output of
GetAttr node (it would be we had called self.conv.forward_impl_ for
example).
Change in this PR reconciles this by creating a placeholder name "SELF"
for module instance indicating that you can traverse InlinedCallStack
backwards to find first node with name != SELF, which would be the name
of the object.
e.g.:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward
Test Plan:
Add test
Imported from OSS
Reviewed By: larryliu0820
Differential Revision: D29745443
fbshipit-source-id: 1525e41df53913341c4c36a56772454782a0ba93
Summary:
As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH`
All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61806
Currently, if you do `save_pickle` on a ScriptModule, then `save_pickle`
on a tensor, this would result in a `0.storage` tensor being written
*twice* to the zip archive. This would cause weird bugs on the
serializing side (this presented as a ASAN-detected heap buffer overflow
because we tried to read more memory from a tensor than we actually
had).
Turns out this was because when we did:
```
self.storage_context = self.script_module_serializer.storage_context()
```
it returned a new copy of the storage context, so we weren't actually
assigning unique names to tensors!!
This PR fixes the issue by making `(De)SerializationStorageContext`
non-copyable and fixing up the parts of the bindings that returned by
copy.
Differential Revision:
D29748969
D29748969
Test Plan: Imported from OSS
Reviewed By: Lilyjjo
Pulled By: suo
fbshipit-source-id: c2f89ab270e07e7a111fb35c545b5e07b804dc3c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61520
This commit widens the exception caught by the try-catch block that checks if
an object passed to a scripted function is a `ScriptList`. It turns out that
there are internal tests that do not throw a `py::cast_error` so catching only
that is not sufficient.
Test Plan: Ran the failing tests in T94889011.
Reviewed By: Chillee
Differential Revision: D29560815
fbshipit-source-id: 442258f8997146d833a9d5db923e1f6359f2bfdd
Summary:
* Minor: spelling, grammar.
* Add calls to `GRAPH_DUMP()` where they were missing.
* Add or expand a few comments.
* Move a few comments to seemingly more appropriate spots.
* In canonicalize_graph_fuser_ops.cpp inline `runnableInputs()` since it
was only called in one place and had a misleading comment and
confusing name.
* In `PeepholeOptimizeImpl::optimizeBlock()`, set `changed = true;` when
removing `aten::is_complex`. Pretty sure its absence was a bug.
* Delete unused `_jit_pass_remove_inplace_ops` and and its
implementation `RemoveInplaceOps()`.
* In `preprocessCaffe2Ops()`, remove redundant check for nested optional
types. It was already checked in `checkONNXCompatibility()`.
* In `EncoderBase::AddAttribute`, log the unexpected attribute kind.
I don't remember the repro case now but I did hit this error at some
point and this additional logging made it easier to understand.
* In `fuseConvBatchNorm()` in eval_peephole.cpp, consistently use
camelCase instead of snake_case for local variables.
* Add curly braces around the bodies of if and loops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60390
Reviewed By: Krovatkin
Differential Revision: D29523283
Pulled By: SplitInfinity
fbshipit-source-id: 4e16c5648616f53da07d68dab7fdf252e06a0752
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60244
Do 2GB size check for protocol buffer serialization at a later time, to avoid false alarming for cases like shape inference where no serialization actually happens.
Test Plan: Imported from OSS
Reviewed By: zou3519, ZolotukhinM
Differential Revision: D29494910
Pulled By: SplitInfinity
fbshipit-source-id: 4c36d26de9a94e5d6cf78f332d4dffc46588ebf0
Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52832
**Summary**
This commit adds `torch._C.ScriptList`, a list type that has reference
semantics across the Python/TorchScript boundary. That is, modifications
made in TorchScript to instances of `torch._C.ScriptList`
are visible in Python even when it is not returned from the function.
`torch._C.ScriptList` is implemented using a modified version of pybind's
`stl_bind.h`-style bindings attached to `ScriptList` and `ScriptListIterator`,
wrapper classes around `c10::impl::GenericList` and
`c10::impl::GenericList::iterator`. These bindings allow instances of
`torch._C.ScriptList` to be used as if it were a
regular `list` in Python. Reference semantics are achieved by simply
retrieving the `IValue` contained in `ScriptList` in `toIValue` (invoked
when converting Python arguments to `IValues` before calling TorchScript
code).
**Test Plan**
This commit adds `TestScriptList` to `test_list_dict.py`, a set of tests
that check that all of the common list operations are supported
and that instances have reference semantics across the
Python/TorchScript boundary.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D29478121
Pulled By: SplitInfinity
fbshipit-source-id: 652cc25cfa37debe28db9527504846f22abd8b54
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.
I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop
# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
-j \
-s \
-k \
-v \
--paths torch/csrc/ \
-g"-torch/csrc/jit/passes/onnx/helper.cpp" \
-g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
-g"-torch/csrc/jit/serialization/onnx.cpp" \
-g"-torch/csrc/jit/serialization/export.cpp" \
-g"-torch/csrc/jit/serialization/import.cpp" \
-g"-torch/csrc/jit/serialization/import_legacy.cpp" \
-g"-torch/csrc/onnx/init.cpp" \
-g"-torch/csrc/cuda/nccl.*" \
-g"-torch/csrc/cuda/python_nccl.cpp" \
-g"-torch/csrc/autograd/FunctionsManual.cpp" \
-g"-torch/csrc/generic/*.cpp" \
-g"-torch/csrc/jit/codegen/cuda/runtime/*" \
-g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
-g"-torch/csrc/deploy/interpreter/interpreter.h" \
-g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
-g"-torch/csrc/deploy/interpreter/test_main.cpp"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649
Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.
Reviewed By: walterddr, janeyx99
Differential Revision: D29504258
Pulled By: 1ntEgr8
fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44324
**Summary**
This commit adds reference semantics to TorchScript class types;
modifications made to them within TorchScript will be visible in Python.
**Test Plan**
This commit adds a unit test to `TestClassType` that checks that
modifications made to a class type instance passed into TorchScript are
visible in Python after executing the scripted function or module.
**Fixes**
This commit closes#41421.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D24912807
Pulled By: SplitInfinity
fbshipit-source-id: d64ac6211012425b040b987e3358253016e84ca0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60384
Currently inlining module graph will drop module hierarchy info on Python side. Here we retrieve the module hierarchy from cpp side and expose it to a new Python API on Node called `moduleHierarchy()`.
Test Plan:
Usage:
```
torch._C._jit_pass_inline(module.graph)
torch._C._jit_pass_propagate_shapes_on_graph(module.graph)
node = module.graph.findNode("quantized::conv2d_relu")
'top(' + module.original_name + ').' + node.moduleHierarchy() + '.' + node.kind()
```
Output:
```
'top(QuantWrapper).module(FBNetHR).0(Sequential).xif0_0(ConvBNRelu).conv(ConvReLU2d).quantized::conv2d_relu'
```
Reviewed By: kimishpatel
Differential Revision: D29252169
fbshipit-source-id: 74163a87f919e061e5e75dfebc4c5cdbe8489d93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57334
Here's a possibly controversial PR. These counters got in the way of
generalizing the fuser tests to handle arbitrary devices, and I guess I'm just
generally skeptical that they provide much value. While true that they let us
observe whether fusion groups were created, we already have assertions based on
the shape of the graph, and I'm not sure that I trust those any less than these
counters.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D29471484
Pulled By: bertmaher
fbshipit-source-id: f6d76f6e72dbfb581acff1d834b0c74500941b57
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59735
1. Fixes ABA storage identity problem during serialization for `torch.package` by keeping reference of serialized storages through lifetime of `PackageExporter` to prevent reuse of memory address. Achieved by extending logic used in solution to mobile's same issue.
2. Adds determinism to naming scheme of serialized storages in export code paths which utilize `tensor_cdata_naming_scheme`(introduced 2nd mapping in `StorageContext`, now maps `storage cdata ptr` -> `unique id`, `unique id` -> `c10::Storage`)
3. Additionally uses presence of a storage in the `StorageContext` instance as marker for if a storage has been serialized or not, removing the need to scan the `PythonStreamWriter` for presence of the storage's serialization file
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D29075276
Pulled By: Lilyjjo
fbshipit-source-id: 15a5c30b1de99c5bd7079388f2db9b6ece2eca12
Summary:
Description:
- Before this, logging level could only be changed by changing the env
variable "PYTORCH_JIT_LOG_LEVEL"
- Can change the level from python now
- Have not added stream configuration for now
- Configuration is stored in a singleton class managing the options
Issue Link: https://github.com/pytorch/pytorch/issues/54188
Gotchas:
- Created separate functions
`::torch::jit::get_jit_logging_levels/set_jit_logging_levels` instead of
using the singleton class's method directly
- This is because when running test cases, two different instances
of the singleton are created for the test suite and the actual code
(`jit_log.cpp`)
- On using these methods directly, `is_enabled` calls the singleton
in `jit_log.cpp` while we are setting the config using another
singleton
- See: https://stackoverflow.com/questions/55467246/my-singleton-can-be-called-multiple-times
API:
- To set the level: `torch._C._jit_set_logging_option("level")`
- To get the level: `torch._C._jit_get_logging_option()`
Testing:
- UTs were added for C++
- A very simple UT was added for python to just check if the API is
being called correctly
- The API was checked by running trace in a sample python file
- Set env variable to "" and used `_jit_set_logging_option` in python to set the variable to `>dead_code_elimination`
- The error output had logs of form [DUMP..] [UPDATE...] etc
Fixes https://github.com/pytorch/pytorch/issues/54188
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58821
Reviewed By: soulitzer
Differential Revision: D29116712
Pulled By: ZolotukhinM
fbshipit-source-id: 8f2861ee2bd567fb63b405953d035ca657a3200f
Summary:
This commit removes the warning that suggests that users script their
dictionaries before passing them into TorchScript code. The ScriptDict feature
is not fully ready, so it does not make sense to recommend this yet.
Test Plan:
Sandcastle.
In addition, the PyPER test broken by the original diff passes:
```
buck test mode/opt //caffe2/torch/fb/training_toolkit/backend/tests:test_model_materializer_full_sync_lwt -- --exact 'caffe2/torch/fb/training_toolkit/backend/tests:test_model_materializer_full_sync_lwt - caffe2.torch.fb.training_toolkit.backend.tests.test_model_materializer_full_sync_lwt.ModelMaterializerFullSyncLwtTest: test_materialization_determinism_cpu' --run-disabled
```
Differential Revision: D28891351
fbshipit-source-id: 2a3a00cde935d670fb1dc7fd8c709ae9c2ad8cdc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59287
D27211605 added a warning in `toIValue` that warns users to script their
dictionaries before passing them to TorchScript functions in order to get some
performance benefits and reference semantics. However, this warning is emitted
every time `toIValue` is called (e.g. when a dictionary is passed to
TorchScript function), which can lead to noisy log output. This diff changes
this changes to use `TORCH_WARN_ONCE` instead.
Test Plan: Sandcastle, OSS CI.
Reviewed By: hyuen
Differential Revision: D28824468
fbshipit-source-id: e651eade4380abaf77c6c8a81ec4e565b0c2c714
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56966
This PR adds a toggle to shape analysis which won't inline complete tensor shapes as constants into the shape compute graph, which is a good stress test on the partial evaluation pipeline.
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D28444664
Pulled By: eellison
fbshipit-source-id: a62e424515a8837a4b596546efa93af5e8e61f10
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52659
**Summary**
This commit adds `torch._C.ScriptDict`, a dictionary type that has reference
semantics across the Python/TorchScript boundary. That is, modifications
made to instances of `torch._C.ScriptDict` in TorchScript are visible in
Python even when it is not returned from the function. Instances can be
constructed by passing an instance of a Python dictionary to
`torch.jit.script`. In the case of an empty dictionary, its type is
assumed to be `Dict[str, Tensor]` to be consistent with the handling of
empty dictionaries in TorchScript source code.
`torch._C.ScriptDict` is implemented using a modified version of pybind's `stl_bind.h`-style bindings attached to `ScriptDict`, `ScriptDictIterator` and `ScriptDictKeyIterator`, wrapper classes around `c10::impl::GenericDict` and `c10::impl::GenericDict::iterator`. These bindings allow instances of `torch._C.ScriptDict` to be used as if it were a regular `dict` Python. Reference semantics are achieved by simply retrieving the `IValue` contained in `ScriptDict` in `toIValue` (invoked when converting Python arguments to `IValues` before calling TorchScript code).
**Test Plan**
This commit adds `TestScriptDict` to `test_list_dict.py`, a set of tests
that check that all of the common dictionary operations are supported
and that instances have reference semantics across the
Python/TorchScript boundary.
Differential Revision:
D27211605
D27211605
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Pulled By: SplitInfinity
fbshipit-source-id: 446d4e5328375791aa73eb9e8b04dfe3465af960
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58300
Current state: During graph rewriting that can fuse nodes or add nodes
result in new nodes without debug information that was available in
original node. Thus we lose this information during graph rewrite.
This PR changes graph rewriting API to let user specify how the values
in the replacement pattern map to values in the pattern to be matched.
Then the graph rewriting will copy source range and inlined callstack
from the matched nodes onto the nodes being inserted.
(Note: this ignores all push blocking failures!)
Test Plan:
python test/test_jit.py
TestJit.test_pattern_based_rewrite_with_source_range_preserved
Imported from OSS
Reviewed By: malfet
Differential Revision: D28512465
fbshipit-source-id: 863173c29de726be85b3acbd3ddf3257eea36d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58887
There are some callsites of `torch.distributed.rpc.XXX` APIs that are compiled
or not based on `USE_RPC`. However, `torch::deploy`, at least for now,
is compiled with `USE_RPC=1`, but the `torch.distributed.rpc.XXX` APIs used by
the aforementioned pieces of code are not available (i.e.
`torch.distributed.rpc.is_available()` returns `False`). This can cause
Torchscript compilation to fail, even if the code being compiled doesn't use
RPC.
This commit fixes this problem (at least temporarily) by predicating the use
all thse `torch.distributed.rpc` APIs on the value of
`torch.distributed.rpc.is_available()`.
Test Plan: Ran packaged XLM-R model with C++ benchmark.
Reviewed By: suo
Differential Revision: D28660925
fbshipit-source-id: fbff7c7ef9596549105e79f702987a53b04ba6f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57501
Add an api _get_model_ops_and_info to get root operators and versioning info of a model in both cxx and python, and the input can be from a file path or buffer.
ghstack-source-id: 129620112
Test Plan: unit test.
Reviewed By: xcheng16, raziel
Differential Revision: D28162765
fbshipit-source-id: 4413c1e906b8a872e4a717d849da37347adbbea4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55926
This is necessary for code like conv2d where we wish to share a generic convolution shape function logic with that of conv2d but for conv2d always infer the output is dimension 4. I'm also hoping the refinement algorithm here could be refactored out and used to support refining tensor types from user annotations. i have a length comment explaining how this works, and the logic outside of data structures is pretty small and contained. Additionally, you might check out https://fb.quip.com/X7EVAdQ99Zzm for a very similar description of how to refine values based on comparison operators.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D27750997
Pulled By: eellison
fbshipit-source-id: d962415af519ac37ebc9de88f2e1ea60a1374f7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55925
This sets up the initial handling of symbolic shapes. As in the test, it doesn't work perfectly yet because it needs a couple other optimization passes. The basic description is pretty simple: we resolve tensor dimension indices to the same Value *, and before extracting out the output Tensor shape we substitute in symbolic shapes. We don't substitute during optimization because they are represented as negative numbers so we don't want them inadvertently used in Constant prop or something else.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D27750996
Pulled By: eellison
fbshipit-source-id: 6984e7276b578f96b00fc2025cef0e13f594b6e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54809
I'm going to post on dev-discuss soon with a more thorough explanation of the design and advantages of this shape analysis, so I'm leaving out that for now.
There is still a ton left to do, I'm posting this initial version so we can get something on master multiple can work on. List of many remaining steps to do:
- [ ] Add symbolic shapes support
- [ ] Bind shape functions for operators in C++
- [ ] Make classes of operators share the same shape function (e.g. pointwise, broadcast two inputs)
- [ ] Refactor APIs
- [ ] Only iteratively optimize shape function while a change has been made
- [ ] Expand coverage of coverage to common ops
- [ ] Add shape analysis pass on Graph that handles Ifs and Loops
- [ ] Allow concurrent reads to the operator map
- [ ] Successive applications of same inputs to same shape function (e.g. series of pointwise ops)
For this review, I am mostly looking for comments related to the implementation of symolic_shape_analysis.cpp, with the caveats listed above. I am not really looking for comments related to api/registration/graph level analysis as those are all planned to be changed. I am fine landing this as is or waiting until necessary components of the TODOs above are finished.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D27750998
Pulled By: eellison
fbshipit-source-id: 4338b99e8651df076291c6b781c0e36a1bcbec03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57397
Introduces two main classes in C++ runtime:
ScriptProfile is the implementation for enalbing and disabling interpreter
profiling in C++. This should be only used from Python, and we will add
corresponding Python API in the next diff.
InstructionSpan is a utility class to instrument execution of each single
instruction. A start timestamp is recorded in the consturctor, and an end
timestamp is recorded in the destructor. During destruction, this will send
runtime data to all enabled ScriptProfile instances.
Test Plan:
build/bin/test_jit --gtest_filter='ScriptProfileTest.Basic'
Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D28133579
fbshipit-source-id: e7e30e96151367022793ab3ad323f01c51ad4a3b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58382
Calling markCompleted on a Future now first acquires the Future's mutex (as usual) but then sometimes tries to acquire the GIL during the DataPtr extraction while still holding the Future's mutex. (This happens when the value passed to markCompleted is a Python object). This can cause a deadlock if someone else calls any of the other methods of Future while holding the GIL.
There are two solutions to this: avoid holding the Future's mutex when extracting DataPtrs, and avoid holding the GIL while invoking the Future's method. In this PR I'm going for the latter, because it's a very simple immediate fix, but I believe this is brittle and that we should probably also consider the former fix.
ghstack-source-id: 129105358
Test Plan: The repro in https://github.com/pytorch/pytorch/issues/58239 now doesn't deadlock.
Reviewed By: mrshenli
Differential Revision: D28472816
fbshipit-source-id: 1bc9bca426dd004f9eb2568db1ffd38f014450e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56152
Currently, the Bundled Inputs API mutates the module in-place. It adds class methods and not instance methods. This results in a small problem that one can't re-run an already executed cell in Bento if the class has already been subject to bundled inputs.
In addition, there is no way to add bundled inputs to a module that has bundled inputs added already. This API provides a way to solve this problem as well by adding an `ignored_methods` to the call to `clone()` by allowing the implementation of bundled inputs to pass in the methods that it will add as `ignored_methods` so that when it does try to add those methods, it will be able to do so successfully.
We'll have to be careful when ignoring those methods during the call to `torch.jit._clone_module_with_class` since any bundled input that relies on a user-provided method will need to be preserved and not ignored during the clone.
Looking for feedback on whether this is an acceptable direction.
ghstack-source-id: 128908360
Test Plan:
Added unit test and ran it as `buck test //caffe2/test:mobile`
Also see this Bento Notebook: https://www.internalfb.com/intern/anp/view/?id=550829
Reviewed By: gmagogsfm
Differential Revision: D27788394
fbshipit-source-id: 48109cd4583506d4efdb345e4ba31385db23a273
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57079
Testing onnx 1.9 release, we see that the old bug is triggered for the caffe2 test:
`pytest test/onnx/test_pytorch_onnx_caffe2_quantized.py::TestQuantizedOps::test_small_model`
This is because the graph inputs
```python
graph(%x.1 : Tensor,
%conv1._packed_params : __torch__.torch.classes.quantized.Conv2dPackedParamsBase,
%conv2._packed_params : __torch__.torch.classes.quantized.Conv2dPackedParamsBase,
%fc.bias : Float(10, strides=[1], requires_grad=0, device=cpu),
%fc.weight : Float(10, 72, strides=[72, 1], requires_grad=0, device=cpu)):
```
contains `Conv2dPackedParamsBase` which is a PackedParams.
When we do flatten, we will flatten to several tensors, then the shape inference for input misaligned.
This PR record how may tensors got flattened in PackeParams, and skip by these number rather than 1, then the UT passed.
Note that tuple case should still follow the original logic.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D28393949
Pulled By: malfet
fbshipit-source-id: 98d48aad27e5ca03fb10d260f8e625478d996ee2
Co-authored-by: David <jiafa@microsoft.com>
Summary:
Add an api to backport a model vn to model vi. It accept an input model (file or buffer) and output a model (file or buffer) with an expected bytecode version.
In this change, the input is a model and it can come from a file or buffer. The output is a model and can be either file path or buffer.
When backport fails, function return false with a warning message :
```
/Users/chenlai/pytorch/cmake-build-debug/bin/test_jit --gtest_filter=LiteInterpreterTest.BackPortByteCodeModelV4:LiteInterpreterTest/*.BackPortByteCodeModelV4:*/LiteInterpreterTest.BackPortByteCodeModelV4/*:*/LiteInterpreterTest/*.BackPortByteCodeModelV4 --gtest_color=no
Testing started at 2:32 PM ...
CUDA not available. Disabling CUDA and MultiCUDA tests
[W backport.cpp:419] Warning: Backport doesn't support backport to version3 (function _backport_for_mobile_impl)
Process finished with exit code 0
```
## Test
1. Run both `caffe2/test/cpp/jit/test_lite_interpreter.cpp` and `caffe2/test/mobile/test_bytecode.py`.
2. Run all prod models with backport api.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56802
ghstack-source-id: 128425510
Test Plan: CI
Reviewed By: raziel, iseeyuan
Differential Revision: D27844651
fbshipit-source-id: 8a803cf6c76433ee0a3049b1a5570585d569f8d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57635
Note: this PR looks massive, but it's just one simple change, codemodded many times.
In many cases, a callback needs to access the value/error produced by the parent future. In Python this was easy because the callback was invoked with the parent future as argument, and could thus inspect it. In C++ the callbacks didn't take any arguments, thus in many cases we worked around this by capturing the future in its own callback. This is risky (leads to reference cycle and thus memory leak) and must be done carefully (spoiler: sometimes we weren't).
ghstack-source-id: 128296580
Test Plan: CI
Reviewed By: wanchaol
Differential Revision: D28178783
fbshipit-source-id: 6de02c4568be42123372edc008f630d5ddae0081
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56918
Re-importing a Python module each time is a bit expensive, and it's unnecessary because this is a private module which won't change and thus we can cache the value once we first extract it.
ghstack-source-id: 128184666
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D27985910
fbshipit-source-id: be40ae9b67ab8ea6c07bc2cb9a78d2c2c30b35d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57688
P412982836 says that `torch::jit::toIValue()` will also touch GIL through `torch::jit::createGenericDict()` (P412848640)
So we have to move `torch::jit::toIValue()` out of multithreading execution
Reviewed By: hyuen
Differential Revision: D28236527
fbshipit-source-id: 43a33dbcfc828cc42c5e1230c8f5cb415bf7bde4
Summary:
Add an api `_get_bytecode_version` to get version number given a bytecode model in both cxx and python, and the input can be both from file path and buffer.
## Test
CI (new added unit test will run as part of `pytorch_core-buck`)
1. run test_lite_interpreter.cpp
2. `python test/mobile/test_bytecode.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56801
ghstack-source-id: 128169647
Test Plan:
CI (new added unit test will run as part of `pytorch_core-buck`)
1. run test_lite_interpreter.cpp
2. `python test/mobile/test_bytecode.py`
Reviewed By: iseeyuan
Differential Revision: D27961417
fbshipit-source-id: f786cc9573d855feecff0b4fe8e5363e25f5728c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57292
In Future (and soon in other places too) we need to receive a list of devices from Python-land. We don't want to just take their indices because we need full devices in order to infer the type from them. torch.device is not defined through pybind, it's defined through a plain `PyModule_AddObject` call with CPython, thus pybind isn't naturally able to understand and convert it. However we can provide a custom type caster which fixes that. We have this already for at::Tensor, at::Generator, ...
ghstack-source-id: 127916268
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28092732
fbshipit-source-id: 1c31d0b85a4d5c9e7bde8161efbb7574d505157c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56807
If I understand correctly, there's no reason to create your own instance of these global singleton types.
ghstack-source-id: 127312270
Test Plan: CI
Reviewed By: SplitInfinity
Differential Revision: D27973447
fbshipit-source-id: f12df69d185f1baaa45f2ac6eac70570a7a65912
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57052
This PR caps a stack whose goal was to merge CUDAFuture into ivalue::Future. CUDAFuture used to be a subclass of ivalue::Future, which was already pretty good, but it meant that in several places we needed `#ifdef`s or registries in order to create the right type of class, which was annoying. We've made CUDAFuture device-agnostic, by using generic helpers, so that it doesn't depend on CUDA. Now all its code can be inserted into ivalue::Future.
This PR does this very naively, by copy-pasting CUDAFuture's code into the (previously empty) virtual methods of ivalue::Future. This helps ensure the correctness of this PR, as it's straightforward to see it behaves exactly like before. However we probably want to polish it a bit later to iron out so wrinkles.
ghstack-source-id: 127713138
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28036829
fbshipit-source-id: 3e5b16402f5dc245c1fcb9d7bf06db64dcb0d2a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57051
Make CUDAFuture autodetect the devicetype from its arguments (which thus change from DeviceIndices to full Devices). This in fact transforms CUDAFuture into a AnythingFuture, since it's not tied to CUDA in any way anymore. Having made it fully device-agnostic, we'll merge it into ivalue::Future in the next PR.
ghstack-source-id: 127713134
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28032711
fbshipit-source-id: 8ba23b1b0d97f61db8693cd5f3c7bae7989a9bcd
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os
def get_compiled_files_list():
import json
with open("build/compile_commands.json") as f:
data = json.load(f)
files = [os.path.relpath(node['file']) for node in data]
for idx, fname in enumerate(files):
if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
return files
def run_clang_tidy(fname):
check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
changes = check_output(["git", "ls-files", "-m"])
if len(changes) == 0:
return
check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])
def main():
git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
compiled_files = get_compiled_files_list()
for idx, fname in enumerate(git_files):
if fname not in compiled_files:
continue
if fname.startswith("caffe2/contrib/aten/"):
continue
print(f"[{idx}/{len(git_files)}] Processing {fname}")
run_clang_tidy(fname)
if __name__ == "__main__":
main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57045
Went back and adjusted the previous optimizations to just be applied to every function.
Cleaned up api to match.
ghstack-source-id: 127214412
ghstack-source-id: 127536155
Test Plan: unit test
Reviewed By: kimishpatel
Differential Revision: D27950859
fbshipit-source-id: 214e83d5a19b452747fe223615815c10fa4aee58
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56717
The signal_handler was under the caffe2 namespacee but was being used
by PyTorch as well.
I've fixed this my moving it to the c10 namespace where now both C2 and PyTorch
can use it.
The signal_handler interface in caffe2/utils/signal_handler.h is kept the same
for backward compatiblity for C2, but most of the commmon code is moved to c10.
ghstack-source-id: 127446929
Test Plan: waitforbuildbot
Reviewed By: ezyang
Differential Revision: D27946738
fbshipit-source-id: d6228d1a0108f4c807d405e7a0bb799c5375388f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56516
One problem with CUDAFuture's extraction of DataPtrs from IValues is that it only supported Python objects that could be converted to "regular" IValues (e.g., lists/dicts/tuples of ints/strings/tensors/...). One notable exception are custom Python classes, which are in fact a very common data type transferred over RPC. The only solution we found for those is to use the Python pickler to extract the tensors contained in them.
We can't insert a Python dependency directly into CUDAFuture, so instead I'm proposing to use the same indirection technique used to support `getSubValues` on Python objects: define some methods on the abstract class `PyObjectHolder` (which can be used by CUDAFuture) but only implement them in the concrete subclass `ConcretePyObjectHolder` (which is only built when Python support is enabled).
I am a bit worried about the performance toll of this (pickling isn't exactly known to be cheap) but I think we should start by providing a functionally complete API. We already have ideas on how to make this faster if needed, for example by having users provide a custom DataPtr extractor tailored to their class via a decorator. (Or just use TorchScript).
ghstack-source-id: 127295014
Test Plan: Added a test later in the stack
Reviewed By: mrshenli
Differential Revision: D27887189
fbshipit-source-id: 9d27e4e62390b836e5bb4f06f401cc002f0cf95b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56517
Currently a torch.futures.Future could wrap a CUDAFuture, but it could not create one from scratch. This prevented users from using CUDAFutures in some occasions, for example when using `rpc.functions.async_execution`, or in their own code. I don't see any reason for such a limitation, hence here I add support for this.
ghstack-source-id: 127261554
Test Plan: Added a test later in the stack
Reviewed By: mrshenli
Differential Revision: D27887190
fbshipit-source-id: ecbb39c1ad7cd189d478ded9c361448f05a270ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56172
Enable the standardOps include **Add\Sub\Mul\Div\Gemm\Pow\Mod** with low precision input in ORT
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D27866136
Pulled By: SplitInfinity
fbshipit-source-id: f2cf5649fffefd68c0cc7b6dce94198751636727
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56120
This reverts commit ad17fadbfc (D27786457).
The big annoyance here is that depending on the threading mode you may not be
able to toggle num_threads at will, so the fusion tests won't fail.
I hate this solution, but I'm adding a secondary override for the TE fuser.
Now you need to both turn on fusion (_jit_override_can_fuse_on_cpu), and you're
OK if you're running with 1 thread, or you can add
`_jit_set_texpr_parallel_cpu_enabled` to enable it anyways.
This is (a) mainly for tests, since a real user probably won't fiddle aimlessly
with the thread count, and (b) will go away once NNC's threading support is
fully baked.
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D27788199
Pulled By: bertmaher
fbshipit-source-id: 070d04474f15e9689dbdf8cc1fde43050c6506b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55799
I'm going to change the implementation of cdata soon so I need to
abstract over cdata access with a function. Additionally, many
users are casting manually casting to THPVariable to access
the member so I can remove these unsafe casts in the client code
(the implementation, of course, is still doing an unsafe cast.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27712130
Pulled By: ezyang
fbshipit-source-id: 95fcc013bf3913d67f2c634068eb5b3aab144cb3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54619
Minor refactor to conv batchnorm folding to work on other functions besides forward
ghstack-source-id: 125767010
Test Plan: unit test and {P339453712}
Reviewed By: kimishpatel
Differential Revision: D27301452
fbshipit-source-id: 4e0cc544a171a970583979a496b2908935124497
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55124
**Summary**
This commit modifies type inference (used by the module scripting code)
so that it tries to script the type of any class instances that it
encounters. This enables recursive, automatic scripting of class type
module attributes.
**Test Plan**
This commit adds a test case for this to `TestClassType`.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D23971883
Pulled By: SplitInfinity
fbshipit-source-id: 7a5a2e7c12ee68cbdeb0a07e6aaf98734a79cb06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54476
Per title. For `add_done_callback`, we log but swallow exceptions in order to keep consistent with what concurrent.futures python library does, see discussion in https://github.com/pytorch/pytorch/pull/45675.
Although, it would be good to improve the verbosity here as this can be a source of confusion if users are setting a different future via `add_done_callback`, and an error is hit resulting in an unexpected hang (see https://github.com/pytorch/pytorch/issues/52132 for more details on how this can happen).
ghstack-source-id: 125300389
Test Plan: CI
Reviewed By: lw
Differential Revision: D27253004
fbshipit-source-id: 72ed21c8fb6d27de5797c17fc46b762f893e6fea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54915
TorchScript and torch.package have different mangling schemes. To avoid
them interfering with each other, we should undo the torch.package
mangling before processing anything with TorchScript (since TS
independently makes sure that no names collide).
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D27410472
Pulled By: suo
fbshipit-source-id: d1cc013c532d9abb7fb9615122bc465ded4785bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52881
**This PR adds:**
1. logic to parse complex constants (complex literals of the form `bj`)
2. logic to parse complex lists
3. support for complex constructors: `complex(tensor/int/float/bool, tensor/int/float/bool)`
4. Limited operator support
- `add`, `sub`, `mul`, `torch.tensor`, `torch.as_tensor`
**Follow-up work:**
1. Add complex support for unary and other registered ops.
2. support complex constructor with string as input (this is supported in Python eager mode).
3. Test all emitXYZ for all XYZ in `ir_emitter.cpp` (currently only emitConst, emitValueToTensor are tested). e.g., test loops etc.
4. onnx doesn't support complex tensors, so we should error out with a clear and descriptive error message.
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D27245059
Pulled By: anjali411
fbshipit-source-id: af043b5159ae99a9cc8691b5a8401503fa8d6f05
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53314
Introduction of api for optimizing non forward functions for mobile. As of this diff, all functions that you say to optimize will be preserved, and those functions will be run through canonical optimization. The intention is to stack each further optimization onto separate diffs since they touch multiple files, and it seems like it'd be a nightmare to review.
ghstack-source-id: 123909414
Test Plan:
torch.utils.mobile_optimizer.optimize_for_mobile(net, methods_to_optimize=["forward", "foo"]) runs fine
torch.utils.mobile_optimizer.optimize_for_mobile(net, methods_to_optimize={"foo"}) optimizes just foo if the model doesnt define forward otherwise optimizes foo and forward
torch.utils.mobile_optimizer.optimize_for_mobile(net, methods_to_optimize=["forward"]) runs fine
torch.utils.mobile_optimizer.optimize_for_mobile(net) runs fine if the model defines forward, Throws otherwise
Reviewed By: kimishpatel
Differential Revision: D26618689
fbshipit-source-id: 5bff1fb3f3f6085c4a649a8128af9c10f0fa9400
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53306
* [ONNX] Fix for sequence of mutations in blocks (#51577)
Fixes consecutive mutations in a tensor inside blocks.
Also, support append and pop in blocks.
* Support inplace operations + indexing
* Clean up old pass for remove mutations
* Add loop test
* Fixes for set attr in loops
* Removing the new jit API flag
* [ONNX] Redesign onnx pass to enable shape type dependent pattern conversion - cont (#51795)
With the introduction of ONNX shape inference, shape and type are inferred on the fly as operators get converted from ATen to ONNX when running symbolic function. This resolves the shape/type requirement for the symbolic functions. The pre-onnx passes however, can not be supported by shape inference, since at that stage the operators in the graph are still ATen operators.
This PR is to update the design of ONNX pass, to enable a mechanism of capturing subgraphs of ATen operators of certain patterns, and convert them later, when shape/type information of upstream operators are available.
The new design will require pre-onnx passes that need shape/type to be written in two parts, encapsulation and conversion.
The encapsulation part will find the nodes of patterns, like how pre-onnx passes were written previously. But instead of converting the nodes, it will encapsulate them into a sub-block of a new placeholder node. This part is called before onnx pass, so it runs before calling symbolic functions.
The conversion part will be called inside the onnx pass. In onnx pass, run_symbolic_func will be called for each node in topological order. When it reaches the placeholder node, the conversion part will be invoked. It will convert the nodes inside the sub-block based on pattern. By that time, it will have shape/type of upstream operators available. After the conversion is complete, the placeholder node will be removed, and nodes inside its sub-block converted. Run_symbolic_func will be called for these nodes, and they will be converted from ATen operator to ONNX operator.
This PR includes several other fixes, listed below.
* ~~replace helper.cpp with onnx_utils.cpp for holding utility functions.~~
* fix EraseNumberTypes on Bool type, the code was outdated that back then Bool type doesn't exist.
* ~~enable onnx shape inference in export with parameter/initializer data.~~
* other code clean ups.
* fix insertion of identity nodes for loop opset 13 sequence output.
~~PR depends on #51603~~
* Fix after merge
* clang
* Fix clang
* Fix clang
* Fix warning message.
* Fixes for non-model param attributes
* Fix for caffe2
* Additional test
* clang
* Skip test for lower opsets
* fix clang-tidy
* Update init.cpp
* Update remove_inplace_ops_for_onnx.cpp
* Update remove_inplace_ops_for_onnx.cpp
* Update remove_inplace_ops_for_onnx.cpp
* Fix for clang formatting
Test Plan: Imported from OSS
Reviewed By: pbelevich, malfet
Differential Revision: D26922416
Pulled By: SplitInfinity
fbshipit-source-id: e7108620b39b6404c594910786c4d275fee59d84
Co-authored-by: Bowen Bao <bowbao@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53304
With the introduction of ONNX shape inference, shape and type are inferred on the fly as operators get converted from ATen to ONNX when running symbolic function. This resolves the shape/type requirement for the symbolic functions. The pre-onnx passes however, can not be supported by shape inference, since at that stage the operators in the graph are still ATen operators.
This PR is to update the design of ONNX pass, to enable a mechanism of capturing subgraphs of ATen operators of certain patterns, and convert them later, when shape/type information of upstream operators are available.
The new design will require pre-onnx passes that need shape/type to be written in two parts, encapsulation and conversion.
The encapsulation part will find the nodes of patterns, like how pre-onnx passes were written previously. But instead of converting the nodes, it will encapsulate them into a sub-block of a new placeholder node. This part is called before onnx pass, so it runs before calling symbolic functions.
The conversion part will be called inside the onnx pass. In onnx pass, run_symbolic_func will be called for each node in topological order. When it reaches the placeholder node, the conversion part will be invoked. It will convert the nodes inside the sub-block based on pattern. By that time, it will have shape/type of upstream operators available. After the conversion is complete, the placeholder node will be removed, and nodes inside its sub-block converted. Run_symbolic_func will be called for these nodes, and they will be converted from ATen operator to ONNX operator.
This PR includes several other fixes, listed below.
* ~~replace helper.cpp with onnx_utils.cpp for holding utility functions.~~
* fix EraseNumberTypes on Bool type, the code was outdated that back then Bool type doesn't exist.
* ~~enable onnx shape inference in export with parameter/initializer data.~~
* other code clean ups.
* fix insertion of identity nodes for loop opset 13 sequence output.
~~PR depends on #51603~~
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D26922417
Pulled By: malfet
fbshipit-source-id: 14ed06158d539e2451c2e5e63ba1b32fb0f75095
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53410
**Summary**
This commit enables indexing into `ModuleList` using a non-literal
index if the LHS of the assignment statement of which the indexing is
the RHS is annotated with an interface type.
This feature already exists for `ModuleDict`, and this commit builds on
top of that implementation. A `prim::ModuleContainerIndex` operator is
emitted for any statement of the form `lhs: InterfaceType =
module_container[idx]`. The same operator has to be used for both
`ModuleDict` and `ModuleList` because serialization does not preserve
the metadata that indicates whether a `Module` is a `ModuleDict` or
`ModuleList`.
**Testing**
This commit extends the existing unit tests for non-literal `ModuleDict`
indexing to test non-literal `ModuleList` indexing.
**Fixes**
This commit fixes#47496.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D26857597
Pulled By: SplitInfinity
fbshipit-source-id: d56678700a264d79aae3de37ad6b08b080175f7c
Summary:
The code uses `torch::jit::jit_log_prefix` for handling recursive
indenting in most places in this function. There was one place that was
using "level", but it was buggy -- it would result in a compounding
superlinear indent. Note that changing it to "level+1" doesn't fix the
bug.
Before/after:
https://gist.github.com/silvasean/8ee3ef115a48de6c9c54fbc40838d8d7
The new code establishes a recursive invariant for
`Module::dump_to_str`: the function returns the module printed at the
base indent level (i.e. no indent). `torch::jit:log_prefix` is used
to prefix recursive calls. The code was already nearly there, except for
this spurious use of "level".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52539
Reviewed By: navahgar
Differential Revision: D26773657
Pulled By: gmagogsfm
fbshipit-source-id: ab476f0738bf07de9f40d168dd038dbf62a9a79e
Summary:
This PR adds an implementation for `aten::cat` in NNC without any conditionals. This version is not enabled by default.
Here is the performance of some micro benchmarks with and without conditionals. There is up to 50% improvement in performance without conditionals for some of the shapes.
aten::cat implementation in NNC **with** conditionals
```
$ python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion concat
pt: concat2d2input_fwd_cpu_1_160_1_14_1: 5.44 us, SOL 0.26 GB/s, algorithmic 0.51 GB/s
pt: concat2d2input_fwd_cpu_1_580_1_174_1: 5.75 us, SOL 1.05 GB/s, algorithmic 2.10 GB/s
pt: concat2d2input_fwd_cpu_20_160_20_14_1: 6.87 us, SOL 4.05 GB/s, algorithmic 8.11 GB/s
pt: concat2d2input_fwd_cpu_20_580_20_174_1: 14.52 us, SOL 8.31 GB/s, algorithmic 16.62 GB/s
pt: concat2d2input_fwd_cpu_8_512_8_512_1: 9.58 us, SOL 6.84 GB/s, algorithmic 13.68 GB/s
```
aten::cat implementation in NNC **without** conditionals
```
$ python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion --cat_wo_conditionals concat
pt: concat2d2input_fwd_cpu_1_160_1_14_1: 4.67 us, SOL 0.30 GB/s, algorithmic 0.60 GB/s
pt: concat2d2input_fwd_cpu_1_580_1_174_1: 5.65 us, SOL 1.07 GB/s, algorithmic 2.14 GB/s
pt: concat2d2input_fwd_cpu_20_160_20_14_1: 6.10 us, SOL 4.56 GB/s, algorithmic 9.12 GB/s
pt: concat2d2input_fwd_cpu_20_580_20_174_1: 7.44 us, SOL 16.22 GB/s, algorithmic 32.44 GB/s
pt: concat2d2input_fwd_cpu_8_512_8_512_1: 6.46 us, SOL 10.14 GB/s, algorithmic 20.29 GB/s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53128
Reviewed By: bertmaher
Differential Revision: D26758613
Pulled By: navahgar
fbshipit-source-id: 00f56b7da630b42bc6e7ddd4444bae0cf3a5780a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51564
Constructor logic was spread throughout InferenceModule and StaticRuntime. This diff unifies the two. After a lot of discussion on this diff D25961626 it became apparent that `clone` is uglier than a cheap StaticRuntime.
This means StaticRuntime is effectively StaticModule and the only code in the new StaticRuntime is the `run` functions.
```
graph, schema = PrepareForStaticModule(torchscript_module)
sm = StaticModule(graph, schema, options)
sm(inputs)
// or create many cheap runtimes with the module
sr = StaticRuntime(sm)
sr(inputs)
```
Changelist:
- Rename InferenceModule StaticModule
- Move all logic for construction into StaticModule
- Create a new StaticRuntime that only has a unique memory planner (everything else is in StaticModule)
- Update comments with explanation
- Propagate all changes to predictor integration
- Propagate all changes to python integration
- Change semantics to be a bit more PyTorch-standard (no "run" calls, no "get_" getters).
Test Plan:
buck test //caffe2/test:static_runtime
buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest
Reviewed By: hlu1
Differential Revision: D25592967
fbshipit-source-id: 8233bed03137ce129137af2d44bce0095033ef0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50670
This PR adds property support to Torchbind. There are two cases that it needs to work:
**Torchscript**
Inside Torchscript, we don't go through pybind so there is no issue with accessing properties through ClassType.
**Eager Mode**
In Eager Mode, Torchbind creates ScriptObject which we cannot dynamically add (aka access) properties after initializing it. (https://stackoverflow.com/questions/1325673/how-to-add-property-to-a-class-dynamically
) Therefore we created a Python wrapper (ScriptObjectWrapper) around ScriptObject where we can use property method to set properties. By doing so, we can look up wrapped object's property through __getattr__ method of the ScriptObjectWrapper. This logic is inspired from https://github.com/pytorch/pytorch/pull/44324
Test Plan:
test cases in test_torchbind.py
Imported from OSS
Reviewed By: pbelevich
Differential Revision: D26632781
fbshipit-source-id: dd690887cfda0c48ff0d104aa240ce0ab09055bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51483
This PR moves the conv weights of a frozen model to MKLDNN, and AOT reorders the weights. When the weights are already in MKLDNN, just computing a single conv by converting the input and output from/to mkldnn provides large speedups. I benchmark'd the results of the top 200 shapes in predictor [here](https://www.internalfb.com/phabricator/paste/view/P171537938), as well as verified that it sped up popular models in torchvision.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696703
Pulled By: eellison
fbshipit-source-id: 0b4441bee4f6e0890a4540fbca3bb5e58b8c5adf
Summary:
This is a second attempt to use graph executor to run forward on a gradient. This allows a secondary chance to profile intermediate tensor introduced by autodiff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52136
Reviewed By: pbelevich
Differential Revision: D26693978
Pulled By: Krovatkin
fbshipit-source-id: 91dde8009a210950af8e5173668ada241e16dd52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52910
**Summary**
PR #52158 tried to move all JIT bindings from `torch._C` to a new
submodule `torch._C._jit`, but that...did not go well. This pull request
adds the new `torch._C._jit` submodule, but does not migrate the
existing bindings. Instead, it adds a unit test that fails if any new
bindings are added to `torch._C`. A comment in the test instructs
developers to add their new binding to the allowlist if it really should
be in `torch._C`, or to add it to the appropriate submodule (e.g
`torch._C._jit`, for example). The idea is to prevent the issue
described in #51691 from getting *worse* if it cannot be fixed.
**Test Plan**
Continuous integration.
**Fixes**
This commit fixes#51691.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D26698373
Pulled By: SplitInfinity
fbshipit-source-id: ec9f5426051227a513d4fd09512b624420e0100b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51754
This API allows you to manage multiple python interpreters in a single
process to deploy PyTorch models packaged with torch.package.
torch/csrc/deploy/deploy.h contains the API definition
torch/csrc/deploy/test_deploy.cpp has some examples.
Notes:
* mutex is added to PyTorchStreamReader to make it safe to use from multiple threads at once.
* USE_DEPLOY is only true for the special libtorch_deployinterpreter.so library, when enabled
we use a hash table to maintain PyObject <> at::Tensor mappping rather than the internal pointer
in Tensor since >1 interpreter may have a reference to the tensor.
* serialization.py has some additional functions for creating pickle objects
but keeping storages in memory for use transfering tensors between interpreters
Test Plan: Imported from OSS
Reviewed By: wconstab
Differential Revision: D26329468
Pulled By: zdevito
fbshipit-source-id: d75f4ebb9a27f1d911179d9996041bcb3ca04a07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51177
**Summary**
This commit adds support for static methods to TorchBind. Just like
pybind, the API for declaring a static method is `def_static(...)`. A
static method must be called on the class directly, and can be called
both in Python as well as TorchScript.
Support for static methods is implemented in a manner similar to that of
instance methods. Registered static functions are wrapped in a layer of
unboxing logic, their schemas are inferred using templates and
metaprogramming, and they are added to the `ClassType` object
corresponding to the TorchBind class on which they are registered.
ScriptClass has been extended to support a `__getattr__` function so
that static methods of TorchBind classes can be invoked in Python. The
implementation of `__getattr__` returns `ScriptClassFunctionPtr`, a
version of `StrongFunctionPtr` without a compilation unit (since the
functions of a TorchBind class live inside the TorchBind registry).
Within TorchScript, TorchBind static functions are desugared in
`PythonClassValue::attr` by looking them up on the class type of the
`PythonClassValue` instance.
**Test Plan**
This commit adds a unit test that tests a simple static method on a
TorchBind class.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D26356942
Pulled By: SplitInfinity
fbshipit-source-id: 1b6a9bc2e5f3e22071ad78e331a0201fbbf7ab30
Summary:
Previously `torch.jit.trace` relies on AutoGrad hooks to infer name of tensors in computation, including those of function/method arguments. This often doesn't work out because:
- These names often do not exist
- Tracer uses argument name of first tensor operation on each tensor as inferred argument names. These tensor operations have programmatically-generated names like `argument_1`
This PR extracts argument names directly from Python functions and pass them down to tracer, which then assigns them to correct graph inputs. This way, we always have the correct argument names captured in IR.
This is useful for both debugging and supporting using `InterfaceType` to represent traced modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51775
Reviewed By: izdeby
Differential Revision: D26273105
Pulled By: gmagogsfm
fbshipit-source-id: 934a385041137dc3731bb6fa8657b11532fed9e5
Summary:
Previously, the graph might have been delete while Python still has iterators, leading to segfaults.
This does not fully work for iterators from Nodes and Blocks as they may be invalidated when the owning graph goes out of scope. I will look into these separately.
Fixes https://github.com/pytorch/pytorch/issues/50454
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51951
Reviewed By: mrshenli
Differential Revision: D26352629
Pulled By: SplitInfinity
fbshipit-source-id: 67299b6cbf1ac7ab77f8703a0ca8f1162e03fcd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51826
Looks like this:
```
resnet.pt
├── .data # Data folder named so it can't clash with torch.package codemodules.
│ │ # Names/extensions automatically added to avoid namingconflicts.
│ ├── 94286146172688.storage # tensor data
│ ├── 94286146172784.storage
│ ├── extern_modules # torch.package metadata
│ ├── version # version metadata
│ └── ...
├── model # package pickled model created w/
│ │ # exporter.save_pickel('model','model.pkl', resnet_model)
│ └── model.pkl
└── torchvision # all code dependencies for packaged picked
└── models # models are captured as source files
├── resnet.py
└── utils.py
```
Since `version` is hardcoded in our zip reader/writer implementation,
add it as an option that defaults to "version" but accepts other
locations for putting the version metadata.
Test Plan: Imported from OSS
Reviewed By: zdevito
Differential Revision: D26295649
Pulled By: suo
fbshipit-source-id: 2d75feeb7de0f78196b4d0b6e2b814a7d58bd1dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50983
There is currently no way to handle/propagate errors with the python-based futures API (they are raised correctly if set with an error, but this is only possible from C++).
This diff allows the Future's `unwrap_func` to be set in python optionally, so users can set futures completed with an exception and the error will throw as expected. This is mostly to support the following use case in the next diff:
```
ret_fut = torch.futures.Future(unwrap_func = lambda python_result: {
# throw exception if needed
if isinstance(python_result, Exception):
throw python_result
})
rpc_fut = rpc.rpc_async(...) # RPC future that times out
# Goal is to propagate RPC error to this future
rpc_fut.add_done_callback(
res => {
# Note that ret_fut.set_result(res.wait()) won't propagate the error
try:
ret_fut.set_result(res.wait())
except Exception as e:
ret_fut.set_result(e)
}
)
```
ghstack-source-id: 121021434
Test Plan:
unittest
```
buck test mode/dev-nosan mode/no-gpu //caffe2/test:futures -- te
st_unwrap --print-passing-details
```
Reviewed By: mrshenli
Differential Revision: D25950304
fbshipit-source-id: 7ee61e98fcd783b3f515706fa141d538e6d2174d
Summary:
Previously we might have gotten segfaults and all, now it raises an exception.
Thread safety hasn't been an objective.
I have a followup to expand the Python interface for the API.
Fixes https://github.com/pytorch/pytorch/issues/49969.
wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50326
Reviewed By: pbelevich
Differential Revision: D26096234
Pulled By: gmagogsfm
fbshipit-source-id: 5425772002eb4deb3830ed51eaa3964f22505840
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51340
**Summary**
`toIValue` assumes that any value passed for an argument of type
`torch.device` is a valid device object, even when it is not. This can
lead to device type arguments of functions being assigned incorrect
values (see #51098).
This commit adds an explicit check that the passed in object is indeed a
`torch.device` using `THPDevice_Check` and only then does is it
converted to an `IValue`. Since implicit conversion from strings to
devices is generally allowed, if `THPDevice_Check` fails, it is assumed
that the object is a string and an `IValue` containing a `c10::Device`
containing the passed in string is returned.
**Test Plan**
This commit adds a unit test to `test_jit.py` to test that invalid
strings passed as devices are not longer silently accepted.
**Fixes**
This commit fixes#51098.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D26187190
Pulled By: SplitInfinity
fbshipit-source-id: 48c990203431da30f9f09381cbec8218d763325b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51312
Follow up to D24690094 (4a870f6518) exposing the api in python. Created matching unit test.
ghstack-source-id: 120611452
Test Plan: Ran unit test
Reviewed By: dhruvbird
Differential Revision: D26112765
fbshipit-source-id: ffe3bb97de0a4f08b31719b4b47dcebd7d2fd42a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50870
**Summary**
Module attributes whose types cannot be determined based on annotations
or inference based on their values at script time are added to the
concrete type of the corresponding module as "failed attributes". Any
attempt to access them in scripted code produces an error with a message
explaining that the attribute could not be contributed to a
corresponding attribute on the TorchScript module. However, this error
is not more specific than that.
This commit modifies `infer_type` in `_recursive.py` so that it returns
`c10::InferredType` instead, which allows more information about typing
failures to be communicated to the caller through the `reason()` method
on this class. This information is appended to the hint added to the
module concrete type for failed attributes.
**Testing**
This commit adds a unit test to `test_module_containers.py` that checks
that extra information is provided about the reason for the failure
when a module attribute consisting of a list of `torch.nn.Module` fails to convert.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D26091472
Pulled By: SplitInfinity
fbshipit-source-id: fcad6588b937520f250587f3d9e005662eb9af0d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50905
Adds an additional run of onnx shape inference after constant folding, since initializer may have changed and affected shape inference.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D26050881
Pulled By: SplitInfinity
fbshipit-source-id: 9e5d69c52b647133cd3a0781988e2ad1d1a9c09d
Summary:
`ResolutionCallback` returns `py::object` (i.e. `Any`) rather than `py::function` (i.e. `Callable`)
Discovered while debugging test failures after updating pybind11
This also makes resolution code slightly faster, as it eliminates casts from object to function and back for every `py::object obj = rcb_(name);` statement.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51089
Reviewed By: jamesr66a
Differential Revision: D26069295
Pulled By: malfet
fbshipit-source-id: 6876caf9b4653c8dc8e568aefb6778895decea05
Summary:
This simplifies our handling and allows passing CompilationUnits from Python to C++ defined functions via PyBind easily.
Discussed on Slack with SplitInfinity
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50614
Reviewed By: anjali411
Differential Revision: D25938005
Pulled By: SplitInfinity
fbshipit-source-id: 94aadf0c063ddfef7ca9ea17bfa998d8e7b367ad
Summary:
Handle sequence output shape and type inference.
This PR fixes value type of sequence outputs. Prior to this, all model sequence type outputs were unfolded for ONNX models.
This PR also enable shape inference for sequence outputs to represent the dynamic shape of these values.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46542
Reviewed By: ezyang
Differential Revision: D24924236
Pulled By: bzinodev
fbshipit-source-id: 506e70a38cfe31069191d7f40fc6375239c6aafe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49698
Reincarnation of #47620 by jamesr66a.
It's just an initial bunch of things that we're exposing to python, more
is expected to come in future. Some things can probably be done better,
but I'm putting this out anyway, since some other people were interested
in using and/or developing this.
Differential Revision: D25668694
Test Plan: Imported from OSS
Reviewed By: bertmaher
Pulled By: ZolotukhinM
fbshipit-source-id: fb0fd1b31e851ef9ab724686b9ac2d172fa4905a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50228
`fastmod -m 'expect(<((at|c10)::)?\w+Type>\(\)\s*)->'
'expectRef${1}.'`
Presuming it builds, this is a safe change: the result of `expect()`
wasn't being saved anywhere, so we didn't need it, so we can take a
reference instead of a new `shared_ptr`.
ghstack-source-id: 119782961
Test Plan: CI
Reviewed By: SplitInfinity
Differential Revision: D25837374
fbshipit-source-id: 86757b70b1520e3dbaa141001e7976400cdd3b08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50222
This PR adds a pass which runs a set of optimizations to be done after freezing. Currently this encompasses Conv-BN folding, Conv->Add/Sub/Mul/Div folding and i'm also planning on adding dropout removal.
I would like some feedback on the API. torch.jit.freeze is technically in \~prototype\~ phase so we have some leeway around making changes. I think in the majority of cases, the user is going to want to freeze their model, and then run in inference. I would prefer if the optimization was opt-out instead of opt-in. All internal/framework use cases of freezing all use `freeze_module`, not the python API, so this shouldn't break anything.
I have separated out the optimization pass as a separate API to make things potentially modular, even though I suspect that is an unlikely case. In a future PR i would like to add a `torch::jit::freeze` which follows the same api as `torch.jit.freeze` intended for C++ use, and runs the optimizations.
Test Plan: Imported from OSS
Reviewed By: tugsbayasgalan
Differential Revision: D25856264
Pulled By: eellison
fbshipit-source-id: 56be1f12cfc459b4c4421d4dfdedff8b9ac77112
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50075
Adds Conv - Add/Sub/Mul/Div fusion for frozen models. This helps cover models like torchvision maskrcnn, which use a hand-rolled batchnorm implementation: 90645ccd0e/torchvision/ops/misc.py (L45).
I haven't tested results yet but I would expect a somewhat similar speed up as conv-bn fusion (maybe a little less).
Test Plan: Imported from OSS
Reviewed By: tugsbayasgalan
Differential Revision: D25856265
Pulled By: eellison
fbshipit-source-id: 2c36fb831a841936fe4446ed440185f59110bf68
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50074
Adds Conv-BN fusion for models that have been frozen. I haven't explicitly tested perf yet but it should be equivalent to the results from Chillee's PR [here](https://github.com/pytorch/pytorch/pull/476570) and [here](https://github.com/pytorch/pytorch/pull/47657#issuecomment-725752765). Click on the PR for details but it's a good speed up.
In a later PR in the stack I plan on making this optimization on by default as part of `torch.jit.freeze`. I will also in a later PR add a peephole so that there is not conv->batchnorm2d doesn't generate a conditional checking # dims.
Zino was working on freezing and left the team, so not really sure who should be reviewing this, but I dont care too much so long as I get a review �
Test Plan: Imported from OSS
Reviewed By: tugsbayasgalan
Differential Revision: D25856261
Pulled By: eellison
fbshipit-source-id: da58c4ad97506a09a5c3a15e41aa92bdd7e9a197
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50255
**Summary**
TorchScript classes are copied attribute-by-attribute from a py::object into
a `jit::Object` in `toIValue`, which is called when copying objects from
Python into TorchScript. However, if an attribute of the class cannot be
converted, the error thrown is a standard pybind error that is hard to
act on.
This commit adds code to `toIValue` to convert each attribute to an
`IValue` inside a try-catch block, throwing a `cast_error` containing
the name of the attribute and the target type if the conversion fails.
**Test Plan**
This commit adds a unit test to `test_class_type.py`
based on the code in the issue that commit fixes.
**Fixes**
This commit fixes#46341.
Test Plan: Imported from OSS
Reviewed By: pbelevich, tugsbayasgalan
Differential Revision: D25854183
Pulled By: SplitInfinity
fbshipit-source-id: 69d6e49cce9144af4236b8639d8010a20b7030c0
Summary:
=======
This PR addresses the following:
* Adds JIT support for CUDA Streams
* Adds JIT support for CUDA Events
* Adds JIT support for CUDA Stream context manager
Testing:
======
python test/test_jit.py -v TestCUDA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48020
Reviewed By: navahgar
Differential Revision: D25725749
Pulled By: nikithamalgifb
fbshipit-source-id: b0addeb49630f8f0c430ed7badeca43bb9d2535c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48840
The CUDAFuture class needs to inspect the values it contains in order to extract its tensors (in fact, the DataPtrs backing those). These are needed first to determine what CUDA devices back those tensors, so that an event for each such device can be recorded; and later to record these DataPtrs with the CUDA caching allocator if they are used in other streams.
This became complicated when Python was added to the mix, because to inspect a Python object we need to acquire the GIL, but we couldn't do so from code that was supposed to also work in C++-only mode. The solution was for users to provide a custom way to extract DataPtrs, so that the PythonFutureWrapper could install such a custom Python-aware one. This was the DataPtr extractor.
In https://github.com/pytorch/pytorch/pull/48502 a different suggestion was proposed. At its root, it consists in adding support for IValues of type PyObject to the visit() and getSubValues() methods. In order to deal with the GIL, we do this through a virtual method: PyObjectHolder, which is the base class, is available also in C++-only mode, and thus defines this method but leaves it unimplemented; ConcretePyObjectHolder, which is the subclass, is only included in Python mode, and thus it can implement that method, acquire the GIL, and do what it's supposed to.
In my opinion, this approach is just brilliant! Thank wanchaol for proposing it! It hides the complexity of dealing with Python inside getSubValues(), where it can be done properly, thus simplifying enormously the CUDAFuture and the PythonFutureWrapper classes.
ghstack-source-id: 118704935
Test Plan: Unit tests
Reviewed By: wanchaol
Differential Revision: D25334355
fbshipit-source-id: 3f1d3bf6e6e8505a114c877fb9a6fcc3f68d91d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49009
As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
https://github.com/pytorch/pytorch/pull/41287
Added a unittest that validates the error that comes from `jit/pybind_utils.h`.
ghstack-source-id: 118794661
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D25392905
fbshipit-source-id: 6f93251635740bcf902824548b2bc6f9249be5f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49146
Add support for Storage arguments to IValue and the JIT typing system, and make ops that were blocked on that c10-full.
ghstack-source-id: 118710665
(Note: this ignores all push blocking failures!)
Test Plan: waitforsandcastle
Reviewed By: ezyang
Differential Revision: D25456799
fbshipit-source-id: da14f125af352de5fcf05a83a69ad5a69d5a3b45
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48502
This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).
---
FutureNCCL restricted the values to be tensors, or (singleton) lists of tensors, or Python object that could be converted to either of those types. We need a CUDA future that can handle more generic types though.
The main challenge is extracting all DataPtrs from an arbitrary object. I think I found some ways of doing so, but I'd like some JIT experts to look into this and tell me if there are better ways. I'll add inline comments for where their input would be appreciated.
ghstack-source-id: 118180026
Test Plan: Unit tests (I should probably add new ones)
Reviewed By: wanchaol
Differential Revision: D25177562
fbshipit-source-id: 1ef18e67bf44543c70abb4ca152f1610dea4e533
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48495
This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).
---
PythonFutureWrapper needs to provide a GIL-aware way to extract tensors from an IValue of type PyObject. Since this was only used by FutureNCCL it was guarded by #ifdef USE_C10D_NCCL. However, we will need to use it with CUDA-aware futures other than the NCCL one. This might have been achieved simply by replacing USE_C10D_NCCL with USE_CUDA, but I wanted to clean this up better.
We're dealing with two independent dimensions: C++-vs-Python and CPU-vs-CUDA. To make the code more modular, the two dimensions should be dealt with by orthogonal solutions: the user setting a custom callback to handle Python, and the subclass being CUDA-aware. Mixing these two axes makes it more complicated.
Another reason for changing how this works is that later on, when we'll introduce multi-device support, we'll need to extract dataptrs for other reasons too (rather than just recording streams with the caching allocator), namely to inspect the value to determine which devices it resides on.
ghstack-source-id: 118180038
Test Plan: Unit tests
Reviewed By: mrshenli
Differential Revision: D25177560
fbshipit-source-id: 3a424610c1ea191e8371ffee0a26d62639895884
Summary:
`isCompleteTensor()` only returns true when both scalar type and shape is present. All dimensions in the shape must be static. This high requirement is unnecessary for many use cases such as when only rank or scalar type needs to be known.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48162
Reviewed By: malfet
Differential Revision: D25340823
Pulled By: bzinodev
fbshipit-source-id: 1fef61f44918f4339dd6654fb725b18cd58d99cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48876
**Summary**
This commit adds support for `del` statements with multiple targets.
Targets are deleted left-to-right just like Python.
**Test Plan**
This commit updates the `TestBuiltins.test_del_multiple_operands` unit
test to actually test that multiple deletion works instead of asserting
that an error is thrown.
**Fixes**
This commit fixes#48635.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D25386285
Pulled By: SplitInfinity
fbshipit-source-id: c0fbd8206cf98b2bd1b695d0b778589d58965a74
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48732
add support for ScriptObject as attributes in symbolic trace.
Test Plan: OSS CI
Reviewed By: jamesr66a
Differential Revision: D25116185
fbshipit-source-id: c61993c84279fcb3c91f1d44fb952a8d80d0e552
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/5062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45556
User defined classes can be used as constants. This is useful when freezing and removing the module from the graph.
Test Plan: waitforsadcastle
Reviewed By: eellison
Differential Revision: D23994974
fbshipit-source-id: 5b4a5c91158aa7f22df39d71f2658afce1d29317
Summary:
Fix for https://github.com/pytorch/pytorch/issues/46122
For `Any`, we infer the type of the ivalue to set the ivalue's type tag. When we saw a Tensor, we would use a specialized Tensor type, so when `Dict[str, Tensor]` was passed in as any `Any` arg it would be inferred as `Dict[str, Float(2, 2, 2, 2)]` which breaks runtime `isinstance` checking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46130
Reviewed By: glaringlee
Differential Revision: D24261447
Pulled By: eellison
fbshipit-source-id: 8a2bb26ce5b6c56c8dcd8db79e420f4b5ed83ed5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47586
relanding PR of https://github.com/pytorch/pytorch/pull/44492, and add
additional Capsule related wrapping to ensure we still have the correct
type in pybind11 to resolve Capsule as torch._C.CapsuleType
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D24822519
Pulled By: wanchaol
fbshipit-source-id: eaaea446fb54b56ed3b0d04c31481c64096e9459
Summary:
Original commit changeset: b9796e15074d
have weird issue happening with custom class + recursive scripting, unland this first to figure out more details
Test Plan: wait for sandcastle
Reviewed By: zhangguanheng66
Differential Revision: D24780498
fbshipit-source-id: 99a937a26908897556d3bd9f1b2b39f494836fe6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47300
**Summary**
This commit augments the module interface subtyping check that is done
before the emission of the `prim::ModuleDictIndex` operator so that the
error message that is printed if the subtyping check fails provides more
information on which methods do not match.
**Test Plan**
Existing unit tests for `prim::ModuleDictIndex`. Compilation of `ModWithWrongAnnotation` now produces this error:
```
Attribute module is not of annotated type __torch__.jit.test_module_containers.ModuleInterface: Method on class '__torch__.jit.test_module_containers.DoesNotImplementInterface' (1) is not compatible with interface '__torch__.jit.test_module_containers.ModuleInterface' (2)
(1) forward(__torch__.jit.test_module_containers.DoesNotImplementInterface self, Tensor inp) -> ((Tensor, Tensor))
(2) forward(InterfaceType<ModuleInterface> self, Any inp) -> (Any)
:
```
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D24709538
Pulled By: SplitInfinity
fbshipit-source-id: 6b6cb75e4b2b12b08576a5530b4b90cbcad9b6e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45716
**Summary**
This commit enables indexing into `ModuleDict` using a non-literal
index if the `ModuleDict` is annotated with `Dict[str, X]`, where `X` is
a module interface type. These annotations must be expressed using a
class attribute named `__annotations__`, which is a `Dict[str, Type]`
where the keys are the names of module attributes and the values are
their types.
The approach taken by this commit is that these annotations are stored
as "hints" along with the corresponding module attributes in the
`ConcreteSubmoduleTypeBuilder` instance for each module (which might be
a `ModuleDict`). These hints are passed into the `ModuleValue` that is
created for desugaring operations on submodules so that indexing into a
`ModuleDict` can be emitted as a getitem op into a dict emitted into the
graph that represents the `ModuleDict`.
**Test Plan**
This commit adds unit tests to `TestModuleContainers` to test this
feature (`test_typed_module_dict`).
Differential Revision: D24070606
Test Plan: Imported from OSS
Reviewed By: ansley
Pulled By: SplitInfinity
fbshipit-source-id: 6019a7242d53d68fbfc1aa5a49df6cfc0507b992
Summary:
Enables the use of NoneType arguments to inputs tuple in the export API
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45792
Reviewed By: heitorschueroff
Differential Revision: D24312784
Pulled By: bzinodev
fbshipit-source-id: 1717e856b56062add371af7dc09cdd9c7b5646da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46686
I was trying to page this code back in after a while and some things
stuck out as unnecessarily confusing.
1. Improve documentation of closures and fork stuff to be more accurate
to how we use them today.
2. Change `prim::LocalVariableScope` to `prim::ListComprehension`. It is
only ever used for a list comprehensions, and in general the nodes
emitted by `ir_emitter` should correspond to concrete operations or
language features rather than semantic constraints.
3. Change the somewhat mysterious "inputs" and "attributes" argument
names throughout the codebase to be the more obvious "args" and "kwargs"
that they generally represent (I think "inputs" and "attributes" come
from the AST naming).
Test Plan: Imported from OSS
Reviewed By: navahgar, jamesr66a
Differential Revision: D24464197
Pulled By: suo
fbshipit-source-id: 1f4b1475b58b5690a0b204e705caceff969533b4
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46779
Test Plan: Used it in some notebooks.
Reviewed By: suo
Differential Revision: D24574005
Pulled By: dreiss
fbshipit-source-id: 78ba7a2bdb859fef5633212b73c7a3eb2cfbc380
Summary:
With this PR, users can optionally provide a "doc_string" to describe a class or its method. doc_string for TorchBind classes and methods are stored as `doc_string` properties in `Function` and `ScriptClass`. These `dos_string` properties are then exposed in Python layer via PyBind for doc generation.
Fixes https://github.com/pytorch/pytorch/issues/46047
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46576
Reviewed By: wanchaol
Differential Revision: D24440636
Pulled By: gmagogsfm
fbshipit-source-id: bfa9b270a6c2d8bc769a88fad6be939cc6310412
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45318
When calling `then()` from WorkNCCL, record the input data pointers in futureNCCLCallbackStream_ before the execution of the input callback.
Note that the recording cannot be directly added to the lambda used by addCallback in ProcessGroupNCCL.hpp. This is because the type of future value in that context is pyobject rather than TensorList, but a type casting will require pybind and introduce Python dependency, which should not be allowed in c10d library.
I have considered creating a util function in a separate file to support this type casting, and then placing it under torch/csrc directory where python dependency is allowed. However, torch/csrc has a dependency on c10d, so this will create a circular dependency.
Finally, a `record_stream_cb_` member is added to FutureNCCL, and the default value is nullptr. A default `record_stream_cb_` implementation is added to `PythonFutureWrapper,` where Python dependency is allowed.
In addition, a few lines are reformatted by lint.
caffe2/torch/csrc/distributed/c10d/init.cpp is only reformatted.
#Closes: https://github.com/pytorch/pytorch/issues/44203
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- ProcessGroupNCCLTest
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_accumulate_gradients_no_sync_allreduce_with_then_hook
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_with_then_hook_nccl
Reviewed By: pritamdamania87
Differential Revision: D23910257
fbshipit-source-id: 66920746c41f3a27a3689f22e2a2d9709d0faa15
Summary:
1. Added CudaFusionGuard as the custom TypeCheck for nvfuser; enabled dynamic shape support with profiling executor;
2. dropped support for legacy fuser;
3. re-enabled nvfuser tests;
4. added registration for profiling record to allow profiling on user specified nodes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46452
Reviewed By: zou3519, anjali411
Differential Revision: D24364642
Pulled By: ngimel
fbshipit-source-id: daf53a9a6b6636e1ede420a3a6d0397d4a8b450b
Summary:
The record_stream method was hard coded for CUDA device. Define the record_stream in the native_functions.yaml to enable the dynamic dispatch to different end device.
Fixes https://github.com/pytorch/pytorch/issues/36556
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44301
Reviewed By: glaringlee
Differential Revision: D23763954
Pulled By: ezyang
fbshipit-source-id: e6d24f5e7892b56101fa858a6cad2abc5cdc4293
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45789
Making sure that more tests invoke a run with a Fusion Group.
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D24169535
Pulled By: eellison
fbshipit-source-id: 54d7af434772ba52144b12d15d32ae30460c0c3c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45788
We were only running the traced graph once, which would not yet have been fused at that point. We should run for num_profiled_runs + 1, and also assert that all nodes in the graph were fused.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D24169537
Pulled By: eellison
fbshipit-source-id: 8499bb1a5bd9d2221b1f1c54d6352558cf07ba9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45262
**Summary**
This commit adds an API for ignoring arbitrary module attributes during
scripting. A class attribute named `ignored_attributes` containing names
of attributes to ignore can be added to the class of the instance being
scripted. Attributes ignored in this fashion cannot be used in
`forward`, methods used by `forward` or by `exported` methods. They
are, however, copied to the `RecursiveScriptModule` wrapper and can be
used by `ignored` methods and regular Python code.
**Test Plan**
This commit adds unit tests to `TestScriptPy3` to test this new API.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D23971882
Pulled By: SplitInfinity
fbshipit-source-id: 8c81fb415fde7b78aa2f87e5d83a477e876a7cc3
Summary:
* Support propagating `dim_param` in ONNX by encoding as `ShapeSymbol` in `SymbolicShape` of outputs. If export is called with `dynamic_axes` provided, shape inference will start with these axes set as dynamic.
* Add new test file `test_pytorch_onnx_shape_inference.py`, reusing all test cases from `test_pytorch_onnx_onnxruntime.py`, but focus on validating shape for all nodes in graph. Currently this is not enabled in the CI, since there are still quite some existing issues and corner cases to fix. The test is default to run only at opset 12.
* Bug fixes, such as div, _len, and peephole.cpp passes for PackPadded, and LogSoftmaxCrossEntropy.
* This PR depends on existing PR such as 44332.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44920
Reviewed By: eellison
Differential Revision: D23958398
Pulled By: bzinodev
fbshipit-source-id: 00479d9bd19c867d526769a15ba97ec16d56e51d
Summary:
Export of embedding bag with dynamic list of offsets.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44693
Reviewed By: malfet
Differential Revision: D23831980
Pulled By: bzinodev
fbshipit-source-id: 3eaff1a0f20d1bcfb8039e518d78c491be381e1a
Summary:
Optimize export_onnx api to reduce string and model proto exchange in export.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44332
Reviewed By: bwasti, eellison
Differential Revision: D23880129
Pulled By: bzinodev
fbshipit-source-id: 1d216d8f710f356cbba2334fb21ea15a89dd16fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45098
**Summary**
This commit adds support for default arguments in methods of class
types. Similar to how default arguments are supported for regular
script functions and methods on scripted modules, default values are
retrieved from the definition of a TorchScript class in Python as Python
objects, converted to IValues, and then attached to the schemas of
already compiled class methods.
**Test Plan**
This commit adds a set of new tests to TestClassType to test default
arguments.
**Fixes**
This commit fixes#42562.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D23844769
Pulled By: SplitInfinity
fbshipit-source-id: ceedff7703bf9ede8bd07b3abcb44a0f654936bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44795
Today, we build our cpp tests twice, once as a standalone gtest binary,
and once linked in `libtorch_python` so we can call them from
`test_jit.py`.
This is convenient (it means that `test_jit.py` is a single entry point
for all our tests), but has a few drawbacks:
1. We can't actually use the gtest APIs, since we don't link gtest into
`libtorch_python`. We're stuck with the subset that we want to write
polyfills for, and an awkward registration scheme where you have to
write a test then include it in `tests.h`).
2. More seriously, we register custom operators and classes in these
tests. In a world where we may be linking many `libtorch_python`s, this
has a tendency to cause errors with `libtorch`.
So now, only tests that explicitly require cooperation with Python are
built into `libtorch_python`. The rest are built into
`build/bin/test_jit`.
There are tests which require that we define custom classes and
operators. In these cases, I've built thm into separate `.so`s that we
call `torch.ops.load_library()` on.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity, ZolotukhinM
Differential Revision: D23735520
Pulled By: suo
fbshipit-source-id: d146bf4e7eb908afa6f96b394e4d395d63ad72ff
Summary:
PyObject_IsSubclass may set python live exception bit if given object is not a class. `IsNamedTuple` is currently using it incorrectly, which may trip all following python operations in debug-build python. Normal release-build python is not affected because `assert` is no-op in release-build.
Fixes https://github.com/pytorch/pytorch/issues/43577
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44769
Reviewed By: jamesr66a
Differential Revision: D23725584
Pulled By: gmagogsfm
fbshipit-source-id: 2dabd4f8667a045d5bf75813500876c6fd81542b
Summary:
There's an annoying O(N^2) in module export logic that makes saving some of the models (if they have many classes) take eternity.
I'm not super familiar with this code to properly untangle the deps and make it a pure hash lookup. So I just added a side lookup table for raw pointers. It's still quadratic, but it's O(num_classes^2) instead of O(num_classes * num_references) which already gives huge savings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44589
Test Plan:
Tested with one of the offending models - just loading a saving a Torchscript file:
```
Before:
load 1.9239683151245117
save 165.74712467193604
After:
load 1.9409027099609375
save 1.4711427688598633
```
Reviewed By: suo
Differential Revision: D23675278
Pulled By: dzhulgakov
fbshipit-source-id: 8f3fa7730941085ea20d9255b49a149ac1bf64fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42390
**Summary**
This commit extends support for properties to include
ScriptModules.
**Test Plan**
This commit adds a unit test that has a ScriptModule with
a user-defined property.
`python test/test_jit_py3.py TestScriptPy3.test_module_properties`
Test Plan: Imported from OSS
Reviewed By: eellison, mannatsingh
Differential Revision: D22880298
Pulled By: SplitInfinity
fbshipit-source-id: 74f6cb80f716084339e2151ca25092b6341a1560
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43043
This add the support for rpc_sync in TorchScript in a way similar to
rpc_async
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D23252039
Pulled By: wanchaol
fbshipit-source-id: 8a05329cb8a24079b2863178b73087d47273914c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44202
In preparation for changing mobile run_method() to be variadic, this diff:
* Implements get_method() for mobile Module, which is similar to find_method but expects the method to exist.
* Replaces calls to the current nonvariadic implementation of run_method() by calling get_method() and then invoking the operator() overload on Method objects.
ghstack-source-id: 111848222
Test Plan: CI, and all the unit tests which currently contain run_method that are being changed.
Reviewed By: iseeyuan
Differential Revision: D23436351
fbshipit-source-id: 4655ed7182d8b6f111645d69798465879b67a577