Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/5062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45556
User-defined classes can be used as constants. This is useful when freezing and removing the module from the graph.
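A minimal sketch of the kind of pattern this enables (class and module names are illustrative, not from the PR):
```
import torch

@torch.jit.script
class Config(object):
    def __init__(self):
        self.scale = 2.0

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.cfg = Config()

    def forward(self, x):
        return x * self.cfg.scale

scripted = torch.jit.script(M()).eval()
# With class constants supported, freezing can fold the class-typed
# attribute into the graph and drop it from the module.
frozen = torch._C._freeze_module(scripted._c)
```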
Test Plan: waitforsadcastle
Reviewed By: eellison
Differential Revision: D23994974
fbshipit-source-id: 5b4a5c91158aa7f22df39d71f2658afce1d29317
Summary:
Fix for https://github.com/pytorch/pytorch/issues/46122
For `Any`, we infer the type of the ivalue to set the ivalue's type tag. When we saw a Tensor, we would use a specialized Tensor type, so when a `Dict[str, Tensor]` was passed in as an `Any` arg it would be inferred as `Dict[str, Float(2, 2, 2, 2)]`, which breaks runtime `isinstance` checking.
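A small sketch of the failure mode described above (function name is illustrative):
```
import torch
from typing import Any, Dict

@torch.jit.script
def is_tensor_dict(x: Any) -> bool:
    return isinstance(x, Dict[str, torch.Tensor])

# Before the fix this could return False for a dict of tensors, because
# the ivalue was tagged with a specialized type like Dict[str, Float(2, 2)].
print(is_tensor_dict({"a": torch.rand(2, 2)}))
```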
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46130
Reviewed By: glaringlee
Differential Revision: D24261447
Pulled By: eellison
fbshipit-source-id: 8a2bb26ce5b6c56c8dcd8db79e420f4b5ed83ed5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47586
Relanding PR https://github.com/pytorch/pytorch/pull/44492, and adding
additional Capsule-related wrapping to ensure we still have the correct
type in pybind11 to resolve Capsule as torch._C.CapsuleType.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D24822519
Pulled By: wanchaol
fbshipit-source-id: eaaea446fb54b56ed3b0d04c31481c64096e9459
Summary:
Original commit changeset: b9796e15074d
We have a weird issue happening with custom class + recursive scripting; unlanding this first to figure out more details.
Test Plan: wait for sandcastle
Reviewed By: zhangguanheng66
Differential Revision: D24780498
fbshipit-source-id: 99a937a26908897556d3bd9f1b2b39f494836fe6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47300
**Summary**
This commit augments the module interface subtyping check that is done
before the emission of the `prim::ModuleDictIndex` operator so that the
error message that is printed if the subtyping check fails provides more
information on which methods do not match.
**Test Plan**
Existing unit tests for `prim::ModuleDictIndex`. Compilation of `ModWithWrongAnnotation` now produces this error:
```
Attribute module is not of annotated type __torch__.jit.test_module_containers.ModuleInterface: Method on class '__torch__.jit.test_module_containers.DoesNotImplementInterface' (1) is not compatible with interface '__torch__.jit.test_module_containers.ModuleInterface' (2)
(1) forward(__torch__.jit.test_module_containers.DoesNotImplementInterface self, Tensor inp) -> ((Tensor, Tensor))
(2) forward(InterfaceType<ModuleInterface> self, Any inp) -> (Any)
:
```
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D24709538
Pulled By: SplitInfinity
fbshipit-source-id: 6b6cb75e4b2b12b08576a5530b4b90cbcad9b6e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45716
**Summary**
This commit enables indexing into `ModuleDict` using a non-literal
index if the `ModuleDict` is annotated with `Dict[str, X]`, where `X` is
a module interface type. These annotations must be expressed using a
class attribute named `__annotations__`, which is a `Dict[str, Type]`
where the keys are the names of module attributes and the values are
their types.
The approach taken by this commit is that these annotations are stored
as "hints" along with the corresponding module attributes in the
`ConcreteSubmoduleTypeBuilder` instance for each module (which might be
a `ModuleDict`). These hints are passed into the `ModuleValue` that is
created for desugaring operations on submodules so that indexing into a
`ModuleDict` can be emitted as a getitem op on a dict emitted into the
graph that represents the `ModuleDict`.
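A rough sketch of the usage this enables, following the description above (interface and module names are illustrative):
```
import torch
from typing import Dict

@torch.jit.interface
class ModuleInterface(torch.nn.Module):
    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        pass

class Impl(torch.nn.Module):
    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        return inp + 1

class M(torch.nn.Module):
    __annotations__ = {"mods": Dict[str, ModuleInterface]}

    def __init__(self):
        super().__init__()
        self.mods = torch.nn.ModuleDict({"a": Impl(), "b": Impl()})

    def forward(self, key: str, x: torch.Tensor) -> torch.Tensor:
        # Non-literal index, allowed because of the interface annotation.
        return self.mods[key](x)

torch.jit.script(M())
```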
**Test Plan**
This commit adds unit tests to `TestModuleContainers` to test this
feature (`test_typed_module_dict`).
Differential Revision: D24070606
Test Plan: Imported from OSS
Reviewed By: ansley
Pulled By: SplitInfinity
fbshipit-source-id: 6019a7242d53d68fbfc1aa5a49df6cfc0507b992
Summary:
Enables the use of NoneType arguments in the inputs tuple in the export API.
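A minimal sketch of what this allows (model is illustrative):
```
import torch

class M(torch.nn.Module):
    def forward(self, x, y=None):
        if y is not None:
            return x + y
        return x

# None entries are now accepted in the inputs tuple:
torch.onnx.export(M(), (torch.rand(2, 3), None), "m.onnx")
```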
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45792
Reviewed By: heitorschueroff
Differential Revision: D24312784
Pulled By: bzinodev
fbshipit-source-id: 1717e856b56062add371af7dc09cdd9c7b5646da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46686
I was trying to page this code back in after a while and some things
stuck out as unnecessarily confusing.
1. Improve documentation of closures and fork stuff to be more accurate
to how we use them today.
2. Change `prim::LocalVariableScope` to `prim::ListComprehension`. It is
only ever used for list comprehensions, and in general the nodes
emitted by `ir_emitter` should correspond to concrete operations or
language features rather than semantic constraints.
3. Change the somewhat mysterious "inputs" and "attributes" argument
names throughout the codebase to be the more obvious "args" and "kwargs"
that they generally represent (I think "inputs" and "attributes" come
from the AST naming).
Test Plan: Imported from OSS
Reviewed By: navahgar, jamesr66a
Differential Revision: D24464197
Pulled By: suo
fbshipit-source-id: 1f4b1475b58b5690a0b204e705caceff969533b4
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46779
Test Plan: Used it in some notebooks.
Reviewed By: suo
Differential Revision: D24574005
Pulled By: dreiss
fbshipit-source-id: 78ba7a2bdb859fef5633212b73c7a3eb2cfbc380
Summary:
With this PR, users can optionally provide a "doc_string" to describe a class or its method. doc_strings for TorchBind classes and methods are stored as `doc_string` properties in `Function` and `ScriptClass`. These `doc_string` properties are then exposed in the Python layer via PyBind for doc generation.
Fixes https://github.com/pytorch/pytorch/issues/46047
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46576
Reviewed By: wanchaol
Differential Revision: D24440636
Pulled By: gmagogsfm
fbshipit-source-id: bfa9b270a6c2d8bc769a88fad6be939cc6310412
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45318
When calling `then()` from WorkNCCL, record the input data pointers in futureNCCLCallbackStream_ before the execution of the input callback.
Note that the recording cannot be directly added to the lambda used by addCallback in ProcessGroupNCCL.hpp. This is because the type of the future value in that context is a pyobject rather than a TensorList, and the type casting would require pybind and introduce a Python dependency, which should not be allowed in the c10d library.
I have considered creating a util function in a separate file to support this type casting, and then placing it under torch/csrc directory where python dependency is allowed. However, torch/csrc has a dependency on c10d, so this will create a circular dependency.
Finally, a `record_stream_cb_` member is added to FutureNCCL, and the default value is nullptr. A default `record_stream_cb_` implementation is added to `PythonFutureWrapper,` where Python dependency is allowed.
In addition, a few lines are reformatted by lint.
caffe2/torch/csrc/distributed/c10d/init.cpp is only reformatted.
Closes: https://github.com/pytorch/pytorch/issues/44203
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- ProcessGroupNCCLTest
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_accumulate_gradients_no_sync_allreduce_with_then_hook
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_with_then_hook_nccl
Reviewed By: pritamdamania87
Differential Revision: D23910257
fbshipit-source-id: 66920746c41f3a27a3689f22e2a2d9709d0faa15
Summary:
1. Added CudaFusionGuard as the custom TypeCheck for nvfuser; enabled dynamic shape support with profiling executor;
2. dropped support for legacy fuser;
3. re-enabled nvfuser tests;
4. added registration for profiling record to allow profiling on user specified nodes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46452
Reviewed By: zou3519, anjali411
Differential Revision: D24364642
Pulled By: ngimel
fbshipit-source-id: daf53a9a6b6636e1ede420a3a6d0397d4a8b450b
Summary:
The record_stream method was hard-coded for CUDA devices. Defining record_stream in native_functions.yaml enables dynamic dispatch to different backend devices.
Fixes https://github.com/pytorch/pytorch/issues/36556
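For reference, the public usage of the op being made dispatchable (a sketch; requires a CUDA build):
```
import torch

x = torch.empty(256, device="cuda")
side = torch.cuda.Stream()
with torch.cuda.stream(side):
    y = x * 2
# Marks x as used on `side` so the caching allocator will not reuse its
# memory until work queued on that stream finishes. With this change the
# call dispatches through native_functions.yaml instead of being
# hard-coded for CUDA.
x.record_stream(side)
```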
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44301
Reviewed By: glaringlee
Differential Revision: D23763954
Pulled By: ezyang
fbshipit-source-id: e6d24f5e7892b56101fa858a6cad2abc5cdc4293
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45789
Making sure that more tests invoke a run with a Fusion Group.
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D24169535
Pulled By: eellison
fbshipit-source-id: 54d7af434772ba52144b12d15d32ae30460c0c3c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45788
We were only running the traced graph once, which would not yet have been fused at that point. We should run for num_profiled_runs + 1, and also assert that all nodes in the graph were fused.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D24169537
Pulled By: eellison
fbshipit-source-id: 8499bb1a5bd9d2221b1f1c54d6352558cf07ba9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45262
**Summary**
This commit adds an API for ignoring arbitrary module attributes during
scripting. A class attribute named `ignored_attributes` containing names
of attributes to ignore can be added to the class of the instance being
scripted. Attributes ignored in this fashion cannot be used in
`forward`, methods used by `forward` or by `exported` methods. They
are, however, copied to the `RecursiveScriptModule` wrapper and can be
used by `ignored` methods and regular Python code.
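A minimal sketch of the API as described above (attribute and module names are illustrative):
```
import torch

class M(torch.nn.Module):
    ignored_attributes = ["notes"]  # name taken from the description above

    def __init__(self):
        super().__init__()
        self.notes = {"anything": object()}  # unscriptable type is fine

    def forward(self, x):
        return x + 1

sm = torch.jit.script(M())
print(sm.notes)  # still available on the Python wrapper
```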
**Test Plan**
This commit adds unit tests to `TestScriptPy3` to test this new API.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D23971882
Pulled By: SplitInfinity
fbshipit-source-id: 8c81fb415fde7b78aa2f87e5d83a477e876a7cc3
Summary:
* Support propagating `dim_param` in ONNX by encoding as `ShapeSymbol` in `SymbolicShape` of outputs. If export is called with `dynamic_axes` provided, shape inference will start with these axes set as dynamic.
* Add new test file `test_pytorch_onnx_shape_inference.py`, reusing all test cases from `test_pytorch_onnx_onnxruntime.py`, but focusing on validating shapes for all nodes in the graph. Currently this is not enabled in the CI, since there are still quite a few existing issues and corner cases to fix. The test defaults to running only at opset 12.
* Bug fixes, such as div, _len, and peephole.cpp passes for PackPadded, and LogSoftmaxCrossEntropy.
* This PR depends on existing PRs such as #44332.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44920
Reviewed By: eellison
Differential Revision: D23958398
Pulled By: bzinodev
fbshipit-source-id: 00479d9bd19c867d526769a15ba97ec16d56e51d
Summary:
Export of embedding bag with dynamic list of offsets.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44693
Reviewed By: malfet
Differential Revision: D23831980
Pulled By: bzinodev
fbshipit-source-id: 3eaff1a0f20d1bcfb8039e518d78c491be381e1a
Summary:
Optimize export_onnx api to reduce string and model proto exchange in export.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44332
Reviewed By: bwasti, eellison
Differential Revision: D23880129
Pulled By: bzinodev
fbshipit-source-id: 1d216d8f710f356cbba2334fb21ea15a89dd16fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45098
**Summary**
This commit adds support for default arguments in methods of class
types. Similar to how default arguments are supported for regular
script functions and methods on scripted modules, default values are
retrieved from the definition of a TorchScript class in Python as Python
objects, converted to IValues, and then attached to the schemas of
already compiled class methods.
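A short sketch of what becomes possible (class name is illustrative):
```
import torch

@torch.jit.script
class Accumulator(object):
    def __init__(self, start: int = 0):
        self.total = start

    def add(self, x: int, times: int = 1) -> int:
        self.total += x * times
        return self.total

acc = Accumulator()   # start defaults to 0
print(acc.add(5))     # times defaults to 1 -> 5
print(acc.add(5, 3))  # -> 20
```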
**Test Plan**
This commit adds a set of new tests to TestClassType to test default
arguments.
**Fixes**
This commit fixes #42562.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D23844769
Pulled By: SplitInfinity
fbshipit-source-id: ceedff7703bf9ede8bd07b3abcb44a0f654936bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44795
Today, we build our cpp tests twice, once as a standalone gtest binary,
and once linked in `libtorch_python` so we can call them from
`test_jit.py`.
This is convenient (it means that `test_jit.py` is a single entry point
for all our tests), but has a few drawbacks:
1. We can't actually use the gtest APIs, since we don't link gtest into
`libtorch_python`. We're stuck with the subset that we want to write
polyfills for, and an awkward registration scheme where you have to
write a test and then include it in `tests.h`.
2. More seriously, we register custom operators and classes in these
tests. In a world where we may be linking many `libtorch_python`s, this
has a tendency to cause errors with `libtorch`.
So now, only tests that explicitly require cooperation with Python are
built into `libtorch_python`. The rest are built into
`build/bin/test_jit`.
There are tests which require that we define custom classes and
operators. In these cases, I've built thm into separate `.so`s that we
call `torch.ops.load_library()` on.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity, ZolotukhinM
Differential Revision: D23735520
Pulled By: suo
fbshipit-source-id: d146bf4e7eb908afa6f96b394e4d395d63ad72ff
Summary:
PyObject_IsSubclass may set the Python live exception bit if the given object is not a class. `IsNamedTuple` currently uses it incorrectly, which may trip up all subsequent python operations in a debug-build python. Normal release-build python is not affected because `assert` is a no-op in release builds.
Fixes https://github.com/pytorch/pytorch/issues/43577
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44769
Reviewed By: jamesr66a
Differential Revision: D23725584
Pulled By: gmagogsfm
fbshipit-source-id: 2dabd4f8667a045d5bf75813500876c6fd81542b
Summary:
There's an annoying O(N^2) in the module export logic that makes saving some models (if they have many classes) take an eternity.
I'm not familiar enough with this code to properly untangle the deps and make it a pure hash lookup. So I just added a side lookup table for raw pointers. It's still quadratic, but it's O(num_classes^2) instead of O(num_classes * num_references), which already gives huge savings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44589
Test Plan:
Tested with one of the offending models - just loading and saving a TorchScript file:
```
Before:
load 1.9239683151245117
save 165.74712467193604
After:
load 1.9409027099609375
save 1.4711427688598633
```
Reviewed By: suo
Differential Revision: D23675278
Pulled By: dzhulgakov
fbshipit-source-id: 8f3fa7730941085ea20d9255b49a149ac1bf64fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42390
**Summary**
This commit extends support for properties to include
ScriptModules.
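A minimal sketch of a scripted module property (names are illustrative):
```
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self._scale = 2.0

    @property
    def scale(self) -> float:
        return self._scale

    def forward(self, x):
        return x * self.scale  # property access now compiles

sm = torch.jit.script(M())
```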
**Test Plan**
This commit adds a unit test that has a ScriptModule with
a user-defined property.
`python test/test_jit_py3.py TestScriptPy3.test_module_properties`
Test Plan: Imported from OSS
Reviewed By: eellison, mannatsingh
Differential Revision: D22880298
Pulled By: SplitInfinity
fbshipit-source-id: 74f6cb80f716084339e2151ca25092b6341a1560
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43043
This adds support for rpc_sync in TorchScript, in a way similar to
rpc_async.
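A rough sketch of the usage, assuming an RPC agent has been initialized and `to` names a valid worker:
```
import torch
import torch.distributed.rpc as rpc

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return x + y

@torch.jit.script
def remote_add(to: str, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Blocks until the remote result arrives, unlike rpc_async.
    return rpc.rpc_sync(to, add, args=(x, y))
```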
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D23252039
Pulled By: wanchaol
fbshipit-source-id: 8a05329cb8a24079b2863178b73087d47273914c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44202
In preparation for changing mobile run_method() to be variadic, this diff:
* Implements get_method() for mobile Module, which is similar to find_method but expects the method to exist.
* Replaces calls to the current nonvariadic implementation of run_method() by calling get_method() and then invoking the operator() overload on Method objects.
ghstack-source-id: 111848222
Test Plan: CI, and all the unit tests which currently contain run_method that are being changed.
Reviewed By: iseeyuan
Differential Revision: D23436351
fbshipit-source-id: 4655ed7182d8b6f111645d69798465879b67a577
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44493
This function allows executing a graph exactly as it is, without going
through a graph executor, which would run passes on the graph before
interpreting it. I found this feature extremely helpful when I worked on
a stress-testing script to shake out bugs from the TE fuser: I needed to
run a very specific set of passes on a graph and nothing else, and
then execute exactly that graph.
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D23632505
Pulled By: ZolotukhinM
fbshipit-source-id: ea81fc838933743e2057312d3156b77284d832ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44162
This diff exports Node::isBefore/isAfter method to PythonAPI.
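A small sketch of the new bindings (function is illustrative):
```
import torch

@torch.jit.script
def f(x):
    y = x + 1
    z = y * 2
    return z

nodes = list(f.graph.nodes())
# Node ordering queries are now available from Python:
print(nodes[0].isBefore(nodes[1]))  # True
print(nodes[1].isAfter(nodes[0]))   # True
```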
Test Plan: Tested locally. Please let me know if there is a set of unit tests to be passed.
Reviewed By: soumith
Differential Revision: D23514448
fbshipit-source-id: 7ef709b036370217ffebef52fd93fbd68c464e89
Summary:
Duplicate of https://github.com/pytorch/pytorch/issues/41413
This PR initiates the process of updating the TorchScript backend interface used by the ONNX exporter.
Replace the jit lower_graph pass with the freeze_module pass.
Enable ScriptModule tests for ONNX operator tests (ORT backend) and model tests by default.
Replace the jit remove_inplace_ops pass with remove_mutation, and consolidate all passes for handling inplace ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43791
Reviewed By: houseroad
Differential Revision: D23421872
Pulled By: bzinodev
fbshipit-source-id: a98710c45ee905748ec58385e2a232de2486331b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43298
IR emitter uses `ModuleValue` to represent ScriptModules and emit IR for
attribute access, submodule access, etc.
`ModuleValue` relies on two pieces of information, the JIT type of the
module, and the `ConcreteModuleType`, which encapsulates Python-only
information about the module.
ScriptModules loaded from a package used to create a dummy
ConcreteModuleType without any info in it. This led to divergences in
behavior during compilation.
This PR makes the two ways of constructing a ConcreteModuleType equivalent,
modulo any py-only information (which, by definition, is never present in
packaged files anyway).
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23228738
Pulled By: suo
fbshipit-source-id: f6a660f42272640ca1a1bb8c4ee7edfa2d1b07cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43284
The IR emitter looks for attributes on modules like:
1. Check the JIT type for the attribute
2. Check the originating Python class, in order to fulfill requests for, e.g. static methods or ignored methods.
In the case where you do:
```
inner_module = torch.jit.load("inner.pt")
wrapped = Wrapper(inner_module) # wrap the loaded ScriptModule in an nn.Module
torch.jit.script(wrapped)
```
The IR emitter may check for attributes on `inner_module`. There is no
originating Python class for `inner_module`, since it was directly
compiled from the serialized format.
Due to a bug in the code, we don't guard for this case, and a segfault
results if the wrapper asks for an undefined attribute. The lookup in
this case looks like:
1. Check the JIT type for the attribute (not there!)
2. Check the originating Python class (this is a nullptr! segfault!)
This PR guards this case and properly just raises an attribute missing
compiler error instead of segfaulting.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23224337
Pulled By: suo
fbshipit-source-id: 0cf3060c427f2253286f76f646765ec37b9c4c49
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43965
As part of a larger effort to unify the API between the lite interpreter and full JIT:
- implement torch::jit::mobile::Method, a proxy for torch::jit::mobile::Function
- add support for overloaded operator() to mobile Method and Function
- mobile find_method now returns a c10::optional<Method> (so signature matches full jit)
- moves some implementation of Function from module.cpp to function.cpp
ghstack-source-id: 111161942
Test Plan: CI
Reviewed By: iseeyuan
Differential Revision: D23330762
fbshipit-source-id: bf0ba0d711d9566c92af31772057ecd35983ee6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43248
We add support for __torch_function__ overrides for C++ custom ops. The logic is the same as for the other components, like torch.nn.Module.
Refactored some code a little bit to make it reusable.
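A sketch of the override protocol this extends to custom ops; `torch.ops.my_lib.my_op` below is hypothetical:
```
import torch

class Wrapper:
    def __init__(self, t):
        self.t = t

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Unwrap our arguments and forward to the original function.
        unwrapped = [a.t if isinstance(a, Wrapper) else a for a in args]
        return func(*unwrapped, **kwargs)

out = torch.add(Wrapper(torch.rand(2)), torch.rand(2))
# With this change, torch.ops.my_lib.my_op(Wrapper(x)) would take the
# same __torch_function__ path as torch.add does above.
```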
Test Plan: buck test //caffe2/test:fx -- test_torch_custom_ops
Reviewed By: bradleyhd
Differential Revision: D23203204
fbshipit-source-id: c462a86e407e46c777171da32d7a40860acf061e
Summary:
It is often the case that the conversion from a torch operator to an onnx operator requires the input rank/dtype/shape to be known. Previously, the conversion depended on the tracer to provide this info, leaving a gap in the conversion of scripted modules.
We are extending the export with support from onnx shape inference. If enabled, onnx shape inference will be called whenever an onnx node is created. This is the first PR introducing the initial look of the feature. More and more cases will be supported following this PR.
* Added pass to run onnx shape inference on a given node. The node has to have namespace `onnx`.
* Moved helper functions from `export.cpp` to a common place for re-use.
* This feature is currently experimental, and can be turned on through the flag `onnx_shape_inference` in the internal api `torch.onnx._export` (see the sketch after this list).
* Currently skipping ONNX Sequence ops, If/Loop and ConstantOfShape due to limitations. Support will be added in the future.
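A sketch of enabling the experimental flag named above (internal API; model is illustrative):
```
import torch

model = torch.nn.Linear(3, 3)
x = torch.rand(1, 3)
# Flag name taken from the description above.
torch.onnx._export(model, (x,), "model.onnx", onnx_shape_inference=True)
```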
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40628
Reviewed By: mrshenli
Differential Revision: D22709746
Pulled By: bzinodev
fbshipit-source-id: b52aeeae00667e66e0b0c1144022f7af9a8b2948
Summary:
This is useful in case we want to store binary files using the `ScriptModule.save(..., _extra_files=...)` functionality. With python3 we can just use bytes and not bother about it.
I had to do a copy-pasta from pybind sources; maybe we should upstream it, but it'd mean adding a bunch of template arguments to `bind_map`, which is a bit untidy.
Let me know if there's a better place to park this function (it seems to be the only invocation of `bind_map`, so I put it in the same file).
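A minimal sketch of the round-trip this enables (file names are illustrative):
```
import io
import torch

m = torch.jit.script(torch.nn.Linear(2, 2))
buf = io.BytesIO()
# bytes values can now round-trip through _extra_files
torch.jit.save(m, buf, _extra_files={"meta.bin": b"\x00\x01\x02"})

buf.seek(0)
extra = {"meta.bin": ""}
torch.jit.load(buf, _extra_files=extra)
print(extra["meta.bin"])
```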
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43241
Reviewed By: zdevito
Differential Revision: D23205244
Pulled By: dzhulgakov
fbshipit-source-id: 8f291eb4294945fe1c581c620d48ba2e81b3dd9c
Summary:
[Re-review tips: nothing changed other than a type in python_ir.cpp to fix a windows build failure]
Adds code printing for enum type
Enhance enum type to include all contained enum names and values
Adds code parsing for enum type in deserialization
Enabled serialization/deserialization test in most TestCases. (With a few dangling issues to be addressed in later PRs to avoid this PR grows too large)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43460
Reviewed By: albanD
Differential Revision: D23284929
Pulled By: gmagogsfm
fbshipit-source-id: e3e81d6106f18b7337ac3ff5cd1eeaff854904f3
Summary:
This patch allows freezing models that utilize interfaces. Freezing works
under the user's assumption that the interface module does not alias
any value used in the model.
To enable freezing of such modules, an extra parameter was added:
torch._C._freeze_module(module, ignoreInterfaces=True)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41860
Reviewed By: eellison
Differential Revision: D22670566
Pulled By: bzinodev
fbshipit-source-id: 41197a724bc2dca2e8495a0924c224dc569f62a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42963
* Adds code printing for enum type
* Enhance enum type to include all contained enum names and values
* Adds code parsing for enum type in deserialization
* Enabled serialization/deserialization test in most TestCases. (With a few dangling issues to be addressed in later PRs to avoid this PR grows too large)
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D23223281
Pulled By: gmagogsfm
fbshipit-source-id: 716d1866b7770dfb7bd8515548cfe7dc4c4585f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42869
We realized that when we invoke a simple callback that divides the tensors by `world_size` after `allreduce`, the performance was almost 50% lower in terms of QPS compared to the case where a simple `allreduce` hook is used with no `then` callback.
The main problem was that, since we call `work.wait()` before invoking the `then` callback, we were synchronizing `work`'s stream with the default PyTorch stream inside [`runHook`](https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/c10d/reducer.cpp#L609) and stalling the backward computation.
In this PR, we ensure that FutureNCCL's `then` callback does not stall the backward computation. Assuming single-process single-device, `FutureNCCL` gets a new stream from the device's pool using `at::cuda::getStreamFromPool` to run `callback`, and before invoking the `callback` inline it synchronizes `WorkNCCL`'s stream with the callback's stream rather than the default stream.
ghstack-source-id: 110208431
Test Plan: Run performance benchmark tests to validate performance issue is resolved. Also, `python test/distributed/test_c10d.py` to avoid any odd issues.
Reviewed By: pritamdamania87
Differential Revision: D23055807
fbshipit-source-id: 60e50993f1ed97497514eac5cb1018579ed2a4c5
Summary:
The ONNX spec for the Squeeze operator:
> Remove single-dimensional entries from the shape of a tensor. Takes a parameter axes with a list of axes to squeeze. If axes is not provided, all the single dimensions will be removed from the shape. If an axis is selected with shape entry not equal to one, an error is raised.
Currently, as explained in issue https://github.com/pytorch/pytorch/issues/36796, it is possible to export such a model to ONNX, and this results in an exception from ONNX runtime.
Fixes https://github.com/pytorch/pytorch/issues/36796.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38476
Reviewed By: hl475
Differential Revision: D22158024
Pulled By: houseroad
fbshipit-source-id: bed625f3c626eabcbfb2ea83ec2f992963defa19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42389
**Summary**
This commit adds support for properties to TorchScript classes,
specifically for getters and setters. They are implemented essentially
as pointers to the methods that the corresponding decorators decorate,
which are treated like regular class methods. Deleters for properties
are considered to be out of scope (and probably useless for TorchScript
anyway).
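A minimal sketch of a class with a getter and setter (class name is illustrative):
```
import torch

@torch.jit.script
class Box(object):
    def __init__(self):
        self._size = 0

    @property
    def size(self) -> int:
        return self._size

    @size.setter
    def size(self, v: int):
        self._size = v

@torch.jit.script
def use_box() -> int:
    b = Box()
    b.size = 5      # setter
    return b.size   # getter

print(use_box())
```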
**Test Plan**
This commit adds a unit test for a class with a property that has both
getter and setter and one that has only a getter.
`python test/test_jit.py TestClassType.test_properties`
Test Plan: Imported from OSS
Reviewed By: eellison, ppwwyyxx
Differential Revision: D22880232
Pulled By: SplitInfinity
fbshipit-source-id: 4828640f4234cb3b0d4f3da4872a75fbf519e5b0
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42133
Test Plan:
We save a module with module debugging information as follows.
```
import torch
m = torch.jit.load('./detect.pt')
# Save module without debug info
m._save_for_lite_interpreter('./detect.bc')
# Save module with debug info
m._save_for_lite_interpreter('./detect.bc', _save_debug_info_in_bytecode=True)
```
Size of the file without module debugging information: 4.508 MB
Size of the file with module debugging information: 4.512 MB
Reviewed By: kimishpatel
Differential Revision: D22803740
Pulled By: taivu1998
fbshipit-source-id: c82ea62498fde36a1cfc5b073e2cea510d3b7edb
Summary:
The premise of this approach is that a small subset of neural networks are well represented by a data flow graph. The README contains more information.
The name is subject to change, but I thought it was a cute reference to fire.
suo let me know if you'd prefer this in a different spot. Since it lowers a JIT'd module directly, I assumed the JIT folder would be appropriate. There is no exposed Python interface yet (but it is mocked up in `test_accelerant.py`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42753
Reviewed By: zou3519
Differential Revision: D23043771
Pulled By: bwasti
fbshipit-source-id: 5353731e3aae31c08b5b49820815da98113eb551
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42740
Adds a pass to hoist conv packed params to root module.
The benefit is that if there is nothing else in the conv module,
subsequent passes will delete it, which will reduce module size.
For context, freezing does not handle this because conv packed
params is a custom object.
Test Plan:
```
PYTORCH_JIT_LOG_LEVEL=">hoist_conv_packed_params.cpp" python test/test_mobile_optimizer.py TestOptimizer.test_hoist_conv_packed_params
```
Imported from OSS
Reviewed By: kimishpatel
Differential Revision: D23005961
fbshipit-source-id: 31ab1f5c42a627cb74629566483cdc91f3770a94
Summary:
[5/N] Implement Enum JIT support
* Implement Enum class iteration
* Add aten.ne for EnumType
Supported:
* Enum-typed function arguments
* Using Enum type and comparing them
* Getting name/value attrs of enums
* Using Enum value as constant
* Enum-typed return values
* Iterating through an Enum class (enum value list) - see the sketch below
TODO:
* Serialization and deserialization
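A small sketch of the iteration support described above (enum and function names are illustrative):
```
import torch
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2

@torch.jit.script
def count_others(x: Color) -> int:
    n = 0
    for c in Color:    # iterating the Enum class
        if c != x:     # aten::ne on EnumType
            n += 1
    return n

print(count_others(Color.RED))  # 1
```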
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42661
Reviewed By: SplitInfinity
Differential Revision: D22977364
Pulled By: gmagogsfm
fbshipit-source-id: 1a0216f91d296119e34cc292791f9aef1095b5a8
Summary:
in `_jit_pass_onnx`, symbolic functions are called for each node for conversion. However, there are nodes that cannot be converted without additional context. For example, the number of outputs from split (and whether it is static or dynamic) is unknown until the point where it is unpacked by a listUnpack node. This pass does preprocessing, and prepares the nodes such that enough context can be received by the symbolic function.
* After preprocessing, `_jit_pass_onnx` should have enough context to produce valid ONNX nodes, instead of half-baked nodes that rely on fixes from later postpasses.
* `_jit_pass_onnx_peephole` should be a pass that does ONNX specific optimizations instead of ONNX specific fixes.
* Producing more valid ONNX nodes in `_jit_pass_onnx` enables better utilization of the ONNX shape inference https://github.com/pytorch/pytorch/issues/40628.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41832
Reviewed By: ZolotukhinM
Differential Revision: D22968334
Pulled By: bzinodev
fbshipit-source-id: 8226f03c5b29968e8197d242ca8e620c6e1d42a5
Summary:
* move both under new file `fixup_onnx_controlflow`
* move the fixup to where the ONNX loop/if node is created, as opposed to running the fixup as a postpass. This will help with enabling onnx shape inference later.
* move `fuseSequenceSplitConcat` to `Peephole`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40943
Reviewed By: mrshenli
Differential Revision: D22709999
Pulled By: bzinodev
fbshipit-source-id: 51d316991d25dc4bb4047a6bb46ad1e2401d3d2d
Summary:
Raise and assert used to have a hard-coded error message "Exception"; the user-provided error message was ignored. This PR adds support for representing the user's error message in TorchScript.
This breaks backward compatibility because now we actually need to script the user's error message, which can potentially contain unscriptable expressions. Such programs can break when scripting, but saved models can still continue to work.
Increased an op count in test_mobile_optimizer.py because now we need aten::format to form the actual exception message.
This is built upon a WIP PR: https://github.com/pytorch/pytorch/pull/34112 by driazati
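A minimal sketch of the behavior this enables (function name is illustrative):
```
import torch

@torch.jit.script
def check(x: torch.Tensor, limit: int):
    if x.size(0) > limit:
        # The message below is now scripted (via aten::format) instead of
        # being replaced by a generic "Exception".
        raise ValueError("expected at most {} rows, got {}".format(limit, x.size(0)))
```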
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41907
Reviewed By: ngimel
Differential Revision: D22778301
Pulled By: gmagogsfm
fbshipit-source-id: 2b94f0db4ae9fe70c4cd03f4048e519ea96323ad
Summary:
[3/N] Implement Enum JIT support
* Add enum value as constant support
* Add sugared value for EnumClass
Supported:
Enum-typed function arguments
using Enum type and comparing them
Support getting name/value attrs of enums
Using Enum value as constant
TODO:
Add PyThon sugared value for Enum
Support Enum-typed return values
Support serialization and deserialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42085
Reviewed By: eellison
Differential Revision: D22758042
Pulled By: gmagogsfm
fbshipit-source-id: 5c6e571686c0b60d7fbad59503f5f94b3b3cd125
Summary:
Currently constant pooling runs before constant propagation, which can create more constants that need pooling. This gets in the way of serialization/deserialization stability, because each time a user serializes and deserializes a module, runCleanUpPasses is called on it; doing so multiple times would lead to a different saved module.
This PR moves constant pooling after constant propagation, which may slow down constant propagation a little bit, but side-steps the aforementioned problem.
test_constant_insertion in test_jit.py is also updated because after fixing the pass ordering, the number of constants is no longer fixed, and it is extremely difficult to get the exact number with the current convoluted test structure. So for now, I changed the test to check only that CSE doesn't change the number of "prim::constant" nodes rather than comparing against a known number. Also left a TODO to improve this test.
The ConstantPropagation pass is replaced by ConstantPropagationImmutableTypes because the latter is used in runCleanUpPasses. If not replaced, the former would create new CSE opportunities by folding more constants, which would defeat the purpose of the test case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41891
Reviewed By: colesbury
Differential Revision: D22701540
Pulled By: gmagogsfm
fbshipit-source-id: 8e60dbdcc54a93dac111d81b8d88fb39387224f5
Summary:
Add a pass that fuses Conv and BatchNormalization nodes into a single Conv node.
This pass is only applied in inference mode (training is None or TrainingMode.Eval).
Since this pass needs access to param_dict, it is written outside the peephole file where these kinds of passes (fusing multiple nodes into one) are usually placed.
This PR also adds a wrapper skipIfNoEmbed to skip the debug_embed_params test:
the pass that fuses Conv and Batchnorm changes the params of the resnet model, so the parameters of the onnx and pytorch models won't match. Since the parameters don't match, the debug_embed_params test for test_resnet will fail, and that is expected; therefore it should be skipped.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40547
Reviewed By: gchanan
Differential Revision: D22631687
Pulled By: bzinodev
fbshipit-source-id: fe45812400398a32541e797f727fd8697eb6d8c0
Summary:
* Add EnumType and AnyEnumType as first-class jit types
* Add Enum-typed IValue
* Enhance aten::eq to support Enum
Supported:
* Enum-typed function arguments
* Using Enum type and comparing them
TODO:
* Add Python sugared value for Enum
* Support getting name/value attrs of enums
* Support Enum-typed return values
* Support enum values of different types in the same Enum class
* Support serialization and deserialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41390
Reviewed By: eellison
Differential Revision: D22524388
Pulled By: gmagogsfm
fbshipit-source-id: 1627154a64e752d8457cd53270f3d14aea4b1150
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41414
This diff exports replaceAllUsesAfterNodeWith to PythonAPI.
Test Plan: Tested locally. Please let me know if there is a set of unit tests to be passed outside of the default ones triggered by Sandcastle.
Reviewed By: soumith
Differential Revision: D22523211
fbshipit-source-id: 3f075bafa6208ada462abc57d495c15179a6e53d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41146
**Summary**
This commit adds support for using `Modules` that have been lowered as
submodules in `ScriptModules`.
**Test Plan**
This commit adds execution and save/load tests to test_backends.py for
backend-lowered submodules.
**Fixes**
This commit fixes #40069.
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D22459543
Pulled By: SplitInfinity
fbshipit-source-id: 02e0c0ccdce26c671ade30a34aca3e99bcdc5ba7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40252
As title says.
Test Plan:
python test/test_mobile_optimizer.py
Imported from OSS
Differential Revision: D22126825
fbshipit-source-id: a1880587ba8db9dee0fa450bc463734e4a8693d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39343
Building on top of previous PR that adds fused add_relu op, this PR adds
a JIT pass to transform input graph to find all fusable instancs of add
+ relu and fuses them.
Test Plan:
python test/test_jit.py TestJit.test_add_relu_fusion
Imported from OSS
Differential Revision: D21822396
fbshipit-source-id: 12c7e8db54c6d70a2402b32cc06c7e305ffbb1be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40718
Currently every constant except tensors must be inlined during serialization;
tensors are stored in the constant table. This patch generalizes this capability
to any IValue. This is particularly useful for non-ASCII string literals that
cannot be inlined.
Test Plan: Imported from OSS
Differential Revision: D22298169
Pulled By: bzinodev
fbshipit-source-id: 88cc59af9cc45e426ca8002175593b9e431f4bac
Summary:
This should be in its own file...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41137
Reviewed By: jamesr66a
Differential Revision: D22437922
Pulled By: eellison
fbshipit-source-id: 1b62dde1a4ebac673b5c60aea4f398f734d62501
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40902
See the bottom of this stack for context.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D22360210
Pulled By: suo
fbshipit-source-id: 4275127173a36982ce9ad357aa344435b98e1faf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40841
**Summary**
This commit adds support for using `Modules` that have been lowered as
submodules in `ScriptModules`.
**Test Plan**
This commit adds execution and save/load tests to test_backends.py for
backend-lowered submodules.
**Fixes**
This commit fixes #40069.
Test Plan: Imported from OSS
Differential Revision: D22418716
Pulled By: SplitInfinity
fbshipit-source-id: d2b2c6d5d2cf3042a620b3bde7d494f1abe28dc1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40939
Previously, when we would do shape analysis by running the op with representative inputs, we would always set the grad property to false. This led to a wrong static analysis when we would create differentiable subgraphs, and propagate shapes without also propagating requires_grad, and then uninline them.
Test Plan: Imported from OSS
Differential Revision: D22394676
Pulled By: eellison
fbshipit-source-id: 254e6e9f964b40d160befe0e125abe1b7aa2bd5e
Summary:
Define static script implementation of __len__ and __contains__ on any subclass derived from a type such as ModuleList, Sequential, or ModuleDict. Implement getitem for classes derived from ModuleDict.
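A small sketch of the subclass support described above (names are illustrative):
```
import torch

class MySeq(torch.nn.Sequential):
    pass

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = MySeq(torch.nn.ReLU(), torch.nn.Tanh())

    def forward(self, x):
        # __len__ now works on the subclass, not just on Sequential itself
        if len(self.layers) == 2:
            for layer in self.layers:
                x = layer(x)
        return x

torch.jit.script(M())
```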
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40789
Reviewed By: eellison
Differential Revision: D22325159
Pulled By: wconstab
fbshipit-source-id: fc1562c29640fe800e13b5a1dd48e595c2c7239b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37034
c10 takes a Stack* in boxed functions while JIT took Stack&.
c10 doesn't return anything while JIT returns an int which is always zero.
This changes JIT to follow the c10 behavior.
ghstack-source-id: 106834069
Test Plan: unit tests
Differential Revision: D20567950
fbshipit-source-id: 1a7aea291023afc52ae706957e9a5ca576fbb53b
Summary:
By default the freeze_module pass, invoked from optimize_for_mobile,
preserves only the forward method. There is an option to specify a list of
methods that can be preserved during freeze_module. This PR exposes that
option to the optimize_for_mobile pass.
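A short sketch of the exposed option (module is illustrative):
```
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

    @torch.jit.export
    def version(self) -> int:
        return 1

scripted = torch.jit.script(M())
# Without preserved_methods, freezing keeps only `forward`.
opt = optimize_for_mobile(scripted, preserved_methods=["version"])
```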
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40629
Test Plan: python test/test_mobile_optimizer.py
Reviewed By: dreiss
Differential Revision: D22260972
Pulled By: kimishpatel
fbshipit-source-id: 452c653269da8bb865acfb58da2d28c23c66e326
Summary:
Partial support for slicing of Sequential containers (see the sketch after this list).
- works around missing Sequential slice functionality
by converting to tuple
- only supports iteration of resulting tuple values,
not direct call() on the sliced sequential
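A minimal sketch of the supported pattern (module is illustrative):
```
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.seq = torch.nn.Sequential(
            torch.nn.ReLU(), torch.nn.Tanh(), torch.nn.ReLU())

    def forward(self, x):
        # Iterating over the slice works (it becomes a tuple internally);
        # calling self.seq[1:](x) directly does not.
        for layer in self.seq[1:]:
            x = layer(x)
        return x

torch.jit.script(M())
```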
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40445
Differential Revision: D22192469
Pulled By: wconstab
fbshipit-source-id: 61c85deda2d58f6e3bea2f1fa1d5d5dde568b9b5
Summary:
**Summary**
This commit adds an instance method `_reconstruct` that permits users
to reconstruct a `ScriptModule` from a given C++ `Module` instance.
**Testing**
This commit adds a unit test for `_reconstruct`.
**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33912.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39979
Differential Revision: D22172323
Pulled By: SplitInfinity
fbshipit-source-id: 9aa6551c422a5a324b822a09cd8d7c660f99ca5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40270
Original commit changeset: 1227e243ab94
D22082806 (1e03d603c6) broke the model generation of pyper models. We trace the namedtuple as input. To unblock the development of PyPer project, let's revert the diff first.
Sorry about the inconvenience, SplitInfinity
ghstack-source-id: 106217609
Test Plan: buck run dper3/dper3_models/experimental/pytorch/feed:feed_generation_script -- --model_files_dir=/tmp/
Reviewed By: alyssawangqq
Differential Revision: D22132960
fbshipit-source-id: ce9278c8462602a341e231ea890e46f74e743ddf
Summary:
**Summary**
This commit adds support for with statements to PyTorch JIT. Each
of the with items in a with statement is represented in the JIT IR
as a pair of `prim::Enter` and `prim::Exit` nodes that call the
`__enter__` and `__exit__` methods defined on the context manager objects
returned by the expressions in the with item.
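A minimal sketch of a scripted with statement, assuming a TorchScript class as the context manager (names are illustrative):
```
import torch
from typing import Any

@torch.jit.script
class Doubler(object):
    def __init__(self):
        self.factor = 1

    def __enter__(self):
        self.factor = 2
        return self

    def __exit__(self, type: Any, value: Any, tb: Any):
        self.factor = 1

@torch.jit.script
def f(x: torch.Tensor) -> torch.Tensor:
    d = Doubler()
    with d as ctx:  # emits prim::Enter / prim::Exit
        x = x * ctx.factor
    return x
```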
**Testing**
This commit adds unit tests for with statements with named with items,
nameless with items, and with statements that encounter exceptions.
```
$ python test/test_jit.py TestWith.test_with_as
Fail to import hypothesis in common_utils, tests are not derandomized
.
----------------------------------------------------------------------
Ran 1 test in 0.430s
OK
```
```
$ python test/test_jit.py TestWith.test_with_no_as
Fail to import hypothesis in common_utils, tests are not derandomized
.
----------------------------------------------------------------------
Ran 1 test in 0.264s
OK
```
```
$ python test/test_jit.py TestWith.test_with_exceptions
Fail to import hypothesis in common_utils, tests are not derandomized
Couldn't download test skip set, leaving all tests enabled...
.
----------------------------------------------------------------------
Ran 1 test in 1.053s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34705
Differential Revision: D22095945
Pulled By: SplitInfinity
fbshipit-source-id: f661565a834786725259b8ea014b4d7532f9419d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37462
Instead of running all the optimization passes in the optimizeForMobile method,
introduce a whitelist optimizer dictionary as a second param of the method.
When it is not passed in the call, the method will run all the optimization
passes; otherwise, the method will read the dict and only run the passes
whose value is True.
ghstack-source-id: 106104503
Test Plan:
python test/test_mobile_optimizer.py
Imported from OSS
Differential Revision: D22096029
fbshipit-source-id: daa9370c0510930f4c032328b225df0bcf97880f
Summary:
**Summary**
This commit modifies type inference for `nn.Module` instance attributes
such that the type of a `NamedTuple` attribute is inferred correctly and
such that the field names of this `NamedTuple` instance can be used in
scripted methods. At present, the type of this attribute is inferred to be
`Tuple[T, U, ..., V]`, so the field must be referred to by index and
cannot be referred to by name.
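A small sketch of the access-by-name this enables (names are illustrative):
```
import torch
from typing import NamedTuple

class Pair(NamedTuple):
    first: torch.Tensor
    second: torch.Tensor

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.pair = Pair(torch.rand(1), torch.rand(1))

    def forward(self):
        # Field access by name now works in scripted methods.
        return self.pair.first + self.pair.second

torch.jit.script(M())
```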
**Test Plan**
This commit adds a unit test to test that a field of a `NamedTuple`
attribute can be referred to by name in a scripted method.
**Fixes**
This commit fixes https://github.com/pytorch/pytorch/issues/37668.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39116
Differential Revision: D22082806
Pulled By: SplitInfinity
fbshipit-source-id: 1227e243ab941376cd5e382fb093751e88dc8846
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39915
Some of the usage, e.g. add_scalar, will not support the debug option;
that is, we will not have a numerically exact representation of the final quantized model
before finalize if people use add_scalar.
A warning will be added in a later PR.
Test Plan: Imported from OSS
Differential Revision: D22013026
fbshipit-source-id: 714b938f25c10fad3dfc79f095356b9803ef4b47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39964
The "[fut.wait() for fut in futs]" idiom can introduce up to
O(len(futs)) thread switches, which may be excessive for large N.
This plumbs through the new c++ c10::collectAll() to Python space
so that we only employ a single jit-side wait.
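A small sketch of the Python-side API this plumbs through:
```
import torch

futs = [torch.futures.Future() for _ in range(3)]
merged = torch.futures.collect_all(futs)  # one jit-side wait for all

for i, f in enumerate(futs):
    f.set_result(i)

print([f.value() for f in merged.wait()])  # [0, 1, 2]
```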
Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc:rpc_spawn
Differential Revision: D22027412
fbshipit-source-id: 4e344a19a09638ee46e7fc478df80a41941b84ce
Summary: Add 'find_method' to the 'LiteScriptModule' python binding methods, so that we can use it to check for the existence of methods, e.g. "get_all_bundled_inputs".
Reviewed By: linbinyu, houseroad
Differential Revision: D22029002
fbshipit-source-id: 9acf76880fc989e825dc3a9186dab6928caee75e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39790
The "[fut.wait() for fut in futs]" idiom can introduce up to
O(len(futs)) thread switches, which may be excessive for large N.
This plumbs through the new c++ c10::collectAll() to Python space
so that we only employ a single jit-side wait.
ghstack-source-id: 105779443
Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc:rpc_spawn
Reviewed By: kiukchung
Differential Revision: D21976891
fbshipit-source-id: 253c61f503f4ffb9be784e6c49a0656cede139fb
Summary:
Enhance FileCheck util to check for highlighted source ranges. This is useful when writing tests regarding generated error messages that require source code highlighting.
Here is how the error looks in different cases:
- In case of needed source code token not found at all in input string:
```
RuntimeError: Expected to find "invalid_token" but did not find it
Searched string:
... <--- HERE
def to_list_missing_type_annotation(x):
# type: (torch.Tensor) -> List[float]
From CHECK-SOURCE-HIGHLIGHTED: invalid_token
```
- In case of source code token not highlighted:
```
Traceback (most recent call last):
File "test_range.py", line 11, in <module>
FileCheck().check_source_highlighted("x.tolist()").run(s)
RuntimeError: Expected to find "~~~~~~~~~~" but did not find it
Searched string:
# type: (torch.Tensor) -> List[float]
li = x.tolist()
~~~~~~~~~ <--- HERE
~~~~~~~~~~~~~~~~~~~... <--- HERE
return li
```
It is a bit confusing since both the input text (usually an error message) and the generated error messages have their own highlighted portions, but this is consistent with previous behavior. Another option is to generate plain error messages without additional range highlighting on the input text.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39692
Test Plan:
Added unit test.
Closes https://github.com/pytorch/pytorch/issues/38698
Differential Revision: D22001765
Pulled By: gmagogsfm
fbshipit-source-id: 6681441eee5853ab061d198ccfe55ebffddca202
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39795
Replaces the `is_dynamic` bool by enums in Python and c++
graph quantization code. This makes the code more readable
and will make it easier to modify for adding QAT logic in the future.
Test Plan:
CI, as well as
```
python test/test_quantization.py TestQuantizeDynamicScript
python test/test_quantization.py TestQuantizeScriptJitPasses
```
Imported from OSS
Differential Revision: D21981643
fbshipit-source-id: d475760407bcc794aeae92a2c696bac4acda843d
Summary:
Clearly expressing that a type is inferred by PyTorch, instead of explicitly annotated by the user, makes many error messages more user-friendly.
Currently Type has two string conversion methods: str() for IR printing and python_str() for serialization and error message generation. If we want to include more information in type printing while maintaining serialization/deserialization correctness, we need to split python_str() into annotation_str() and repr_str().
annotation_str() is solely responsible for serialization; it strictly matches the format of python type annotations. repr_str() is responsible for generating a human-readable error message that includes information like "this type is inferred, not explicitly annotated".
Closes https://github.com/pytorch/pytorch/issues/39449
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39544
Differential Revision: D21978759
Pulled By: gmagogsfm
fbshipit-source-id: 733566f5a62e748b5ca4bb3c5943ebb6d5b664d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38830
This patch makes it possible to preserve user-specified attributes or non-forward
methods. The API:
_freeze_module(Module, ["a", "version"])
Test Plan: Imported from OSS
Differential Revision: D21957316
Pulled By: bzinodev
fbshipit-source-id: 5c9146ae679791070a9de868c45785725b48a9e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39219
We don't model clamp ops correctly right now; this PR fixes that.
Reason is quantized clamp op quantizes the scalar arguments in the op implementation: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp#L614-L617
So we'll need to model this explicitly in the IR.
When we see an `aten::dequantize - aten::clamp(%x, %min, %max)` pattern,
we first make a scalar tensor with `aten::scalar_tensor(%scalar, ...)`, then we quantize the tensor with the same quantization parameters as the input tensor of the `aten::clamp`, dequantize the tensor, and finally convert the dequantized tensor back to a scalar using `aten::item`.
Test Plan: Imported from OSS
Differential Revision: D21831350
fbshipit-source-id: d60731459a0465d64946aabc62065d25d92faefc
Summary:
Instead of copying to a buffer, then setting a tensor's storage with that buffer, create a storage directly from the file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36362
Pulled By: driazati
Differential Revision: D21889537
fbshipit-source-id: edbd430073c2bbf52332fe7b3b2590e7d936dedf
Summary:
Previously, on conversion from python -> c++ it was cast to a double list through bad copy-pasta. It's pretty unusual for someone to script a broadcasting list function directly since it's an internal api, so it was unlikely to affect anyone.
Fix for https://github.com/pytorch/pytorch/issues/39450
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39481
Reviewed By: jamesr66a
Differential Revision: D21870557
Pulled By: eellison
fbshipit-source-id: e704e5e87d2702a270b7d65c4df444246a134480
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39181
Create a python binding class torch._C.LiteScriptModule for mobile::module; a python class called LiteScriptModule is created which wraps torch._C.LiteScriptModule.
The python class LiteScriptModule contains preliminary functions including forward, run_method and __call__.
Create a python api "load_for_lite_interpreter" under torch.jit.mobile which takes a pre-saved mobile module in a file-like object as input and returns the python class LiteScriptModule.
Add a python binding method "_save_to_buffer_for_mobile" under ScriptModule, and a python method "_save_to_buffer_for_lite_interpreter" under RecursiveScriptModule, which saves the mobile module into a buffer instead of a file.
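A rough sketch of the round-trip, using the API names given above (they may differ slightly in the released API):
```
import io
import torch
from torch.jit.mobile import load_for_lite_interpreter  # name per description above

scripted = torch.jit.script(torch.nn.Linear(2, 2))
buf = scripted._save_to_buffer_for_lite_interpreter()
lite = load_for_lite_interpreter(io.BytesIO(buf))
out = lite.forward(torch.rand(1, 2))
```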
ghstack-source-id: 105215736
Test Plan: buck test caffe2/test:mobile
Differential Revision: D21757474
fbshipit-source-id: 758b87497d65c4686459a567d41887c7a577aa4c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39008
This commit adds a `torch.futures.Future` type and exposes its ctor,
`wait`, `then`, and `set_result` APIs. This type is currently a
wrapper of `c10::ivalue::Future` and mainly used by RPC for now. Later,
we could revamp c10d APIs to return this `Future` type as well. More
utils will be added into `torch.futures` package in followup PRs.
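A minimal sketch of the exposed APIs:
```
import torch

fut = torch.futures.Future()
chained = fut.then(lambda f: f.wait() + 1)  # callback receives the completed future
fut.set_result(41)
print(chained.wait())  # 42
```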
Test Plan: Imported from OSS
Differential Revision: D21723022
Pulled By: mrshenli
fbshipit-source-id: 92e56160544e9bf00d11db3e8347a1b9707882c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38735
Follow up to my comment https://github.com/pytorch/pytorch/pull/36597/#issuecomment-613674329
This adds a pass to convert op aliases into a normalized form. Having two ops in our IR that do the same thing makes the IR harder to consume downstream, for TorchScript passes but also ONNX, glow, etc.
Another solution would have been to fix our code generation to only emit `aten::abs` from the start. This seems trickier, and doesn't really buy us much if we still have to expose `aten::absolute` in C++, as glaringlee of the C++ API thinks we should.
Bike shedding: maybe this should be `CanonicalizeOps` instead
Test Plan: Imported from OSS
Differential Revision: D21673108
Pulled By: eellison
fbshipit-source-id: c328618907de1af22e07f57fd27fa619978c2817
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38734
As far as I can tell, this pass only exists to canonicalize ops that are generated in the graph fuser, so it's kind of a misnomer.
Test Plan: Imported from OSS
Differential Revision: D21673109
Pulled By: eellison
fbshipit-source-id: b7bedf34ccaf1fcd442bfb2bbb990e64915f51d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35614
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well.
Test Plan: CI
Differential Revision: D20842876
Pulled By: dreiss
fbshipit-source-id: 18abf0d324ed2185ec6d27c864e935d856dcc6ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37474
Previously we would segfault
Test Plan: Imported from OSS
Differential Revision: D21297542
Pulled By: suo
fbshipit-source-id: c7e2f828a250c490ec23fb51c6a4a642d3370e52
Summary:
Make it so that non-nn Module classes do not need to be annotated with `torch.jit.script`
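A small sketch of the new behavior (class name is illustrative):
```
import torch

class Scaler:  # no @torch.jit.script annotation needed anymore
    def __init__(self, s: int):
        self.s = s

    def apply(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.s

@torch.jit.script
def f(x: torch.Tensor) -> torch.Tensor:
    return Scaler(3).apply(x)
```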
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38050
Differential Revision: D21482654
Pulled By: eellison
fbshipit-source-id: 22689e4d7a33f6e1574b9495cff29a1fe6abb910
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38253
This pass removes dropout and dropout_ nodes when training is false. It
requires the freeze_module pass to have run first, which does both inlining
and constant propagation; without it, the training variable remains an
attribute instead of a constant.
ghstack-source-id: 103939141
Test Plan: python test/test_jit.py TestScript.test_remove_dropout
Reviewed By: dreiss
Differential Revision: D21505863
fbshipit-source-id: 42ea45804e4653b625b6a254c8d8480757264aa8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35154
This is for issue https://github.com/pytorch/pytorch/issues/34999.
Closes https://github.com/pytorch/pytorch/issues/34999.
https://github.com/pytorch/pytorch/issues/34997 needs more work.
This will make a few work items easier, like 1) Dist autograd profiler, 2) JIT annotation for Future.
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_rref_forward_chain --stress-runs 100
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par \
-r test_call_method_on_rref
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- 'test_rref_proxy_class \(fb\.test_rpc_fork\.RpcTestWithFork\)' --stress-runs 100
test_rref_proxy_reuse
test_handle_send_exceptions
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \
-r test_script_call_python_return_future
```
Differential Revision: D7722184
fbshipit-source-id: bd92b855bfea4913d6672700590c57622fa86e0e
Summary:
**Summary**
This commit adds `torch::jit::RegisterBackend`, an API that allows
external backends to be registered for the execution of JIT subgraphs
outside the JIT interpreter. In order to register an external backend,
one must extend the provided abstract class `PyTorchBackendInterface` and provide
two additional functions: one that creates an instance of the aforementioned subclass
of `PyTorchBackendInterface`, and another that preprocesses a `ScriptModule` so that
it can run on the backend. Then, a `ScriptModule` that can compile and execute a given
JIT subgraph using the functions provided at registration time is generated
for each registered backend.
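A hedged Python-side sketch of lowering a module to a registered backend; both the backend name `"test_backend"` and the binding `torch._C._jit_to_backend` are assumptions for illustration:
```
import torch

class MyModule(torch.nn.Module):
    def forward(self, x, h):
        return x + h, x - h

scripted = torch.jit.script(MyModule())
# Backend name, binding name, and compile-spec shape are all assumed here.
lowered = torch._C._jit_to_backend(
    "test_backend",
    scripted,
    {"forward": {"": ""}},  # per-method, backend-specific compile spec
)
```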
**Testing**
This commit adds a unit test that uses a minimal test backend
to make sure that the registration endpoint and generated
`ScriptModule` work.
```
$ python test/test_jit.py TestBackends
Fail to import hypothesis in common_utils, tests are not derandomized
.
----------------------------------------------------------------------
Ran 1 test in 0.183s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35833
Differential Revision: D21231955
Pulled By: SplitInfinity
fbshipit-source-id: 452db1123d0e5d83f97fe5da8a00fdfdb50dbef9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37472
Our convention is for `findX` to return an optional version and `getX`
to assert that the X is there. Fix up `getMethod` to be consistent with
this convention.
Test Plan: Imported from OSS
Differential Revision: D21297543
Pulled By: suo
fbshipit-source-id: b40f56231cc8183e61bbb01fe5c0c113bcb6464d
Summary:
We were previously only looking at class attributes, which didn't include methods, etc., and would silently give the wrong semantics. This makes `hasattr` go through the same resolution as our other attribute lookups.
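A hedged sketch of the corrected behavior: `hasattr` in a scripted module should now see methods, not just attributes set in `__init__`:
```
import torch

class M(torch.nn.Module):
    def helper(self, x):
        return x * 2

    def forward(self, x):
        # Previously this was False because `helper` is a method, not an attribute.
        if hasattr(self, "helper"):
            return self.helper(x)
        return x

m = torch.jit.script(M())
print(m(torch.ones(2)))  # tensor([2., 2.])
```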
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37424
Differential Revision: D21282633
Pulled By: eellison
fbshipit-source-id: 8e970f365c2740d137a02331739c2ed93747b918
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37655
Add an override name for `aten::tensor` and `aten::as_tensor`.
These two ops are used in an NLU model, and they will be included in the lite interpreter.
Test Plan: verified model can be loaded correctly
Reviewed By: iseeyuan
Differential Revision: D21346142
fbshipit-source-id: 05ff4d9e0bcf7f4f9a30d95ca81aef9c3f6b0990
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36953
Add support for generic lists as constants; generic dicts & tuples are already implemented. This is a pretty common pattern and cuts down on the number of non-tensor nodes executed in interpolate tests.
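A hedged sketch of the pattern this speeds up: a container marked constant gets baked into the graph as a single constant node instead of being rebuilt from element nodes on every run (exactly how the container surfaces in the IR is an implementation detail):
```
import torch

class Rescale(torch.nn.Module):
    __constants__ = ["scales"]  # mark the list as a compile-time constant

    def __init__(self):
        super().__init__()
        self.scales = [1.0, 2.0, 4.0]

    def forward(self, x):
        out = x
        for s in self.scales:
            out = out * s
        return out

m = torch.jit.script(Rescale())
print(m.graph)  # the container should appear as a single constant
```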
Test Plan: Imported from OSS
Differential Revision: D21160761
Pulled By: eellison
fbshipit-source-id: 1e6b7b25b7580f09067794772d44e615601c60c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32684
Previously we had `clone` and `clone_instance`: `clone` clones both the type
and the value, while `clone_instance` clones only the value; both are shallow copies.
We need to re-evaluate whether we should expose them as user-facing APIs.
I think we should hide `clone`, but `clone_instance` might be useful, especially
when copying a model with very large weights, where people might want only a shallow copy.
This PR adds a `deepcopy` that might be useful as a user API; it deep copies the values,
including `Tensor`, but does not deep copy `Blob`, `Capsule`, `Future`, or `PyObject`.
For more discussion, please see the following issue.
fixes: https://github.com/pytorch/pytorch/issues/32519
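A hedged sketch of what the deep copy buys users, assuming it is reachable through Python's `copy.deepcopy` on a scripted module:
```
import copy
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(2, 2))

    def forward(self, x):
        return x @ self.weight

m = torch.jit.script(M())
m2 = copy.deepcopy(m)  # tensors are deep-copied, not aliased
with torch.no_grad():
    m2.weight.zero_()    # mutate the copy...
print(m.weight.sum())    # ...the original is untouched (sum stays 4.0)
```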
Test Plan: Imported from OSS
Differential Revision: D21220756
fbshipit-source-id: 476bf11fe82c08fac36e7457879a09f545ffdc5e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36727
Looks like this was renamed by accident in 0cbd7fa46f
Test Plan:
Unit test.
Lint.
Differential Revision: D21076697
Pulled By: dreiss
fbshipit-source-id: dbd18cb41c7b26479984a7a7b12ad41a4c5b7658
Summary:
With https://github.com/pytorch/pytorch/pull/35562, we are running peephole optimization on inlining to reduce the number of nodes that are copied.
The tracer encodes the sizes in the graph like:
```
graph(%0 : Double(7)):
%1 : Function = prim::Constant[name="tensor_size"]()
%2 : Tensor = prim::CallFunction(%1, %0)
return (%2)
```
However, people would like to reuse the graph with different shapes, so running size optimizations would invalidate that use case. Long term it might be better for the tracer to not include shape information, but there are downstream users of it.
This separates FuseAddMM out of peephole so that there is now a single `disable_size_optimizations` parameter, and ONNX explicitly invokes FuseAddMM.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36404
Differential Revision: D20968974
Pulled By: eellison
fbshipit-source-id: 56f8f1699e3b0adeeccdfd5a67bb975fd41a2913
Summary:
Since `aten::__interpolate` was removed in https://github.com/pytorch/pytorch/pull/34514, we need a pass to replace the interpolate function with `aten::__interpolate` for ONNX export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35744
Reviewed By: hl475
Differential Revision: D20907041
Pulled By: houseroad
fbshipit-source-id: f2d2cdfec47389245c50f538267124eedf682adf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36277
This PR introduces a flag to the tracer that guards risky behaviors
like returning a list/dict as the output of the tracer. Currently, to avoid
breaking BC for users, we throw a warning if the tracer output is a list, and
will throw an error when the tracer output is a dict to enforce use of this
flag (next PR).
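A hedged sketch of the opt-in, assuming the flag ends up as `strict=False` on `torch.jit.trace` (its present-day public name):
```
import torch

def f(x):
    return {"doubled": x * 2}  # dict output: rejected unless opted in

traced = torch.jit.trace(f, torch.ones(2), strict=False)
print(traced(torch.ones(2)))  # {'doubled': tensor([2., 2.])}
```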
Test Plan: Imported from OSS
Differential Revision: D20998157
Pulled By: wanchaol
fbshipit-source-id: 0d2c55f1a263a48b1b92dd6ad54407815e0a6f72
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35170
Looks like this was renamed by accident in 0cbd7fa46f
Test Plan:
Unit test.
Imported from OSS
Differential Revision: D20783298
fbshipit-source-id: 8fcc146284af022ec1afe8d651baf6721b190ad3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35630
The name is underscore-prefixed for now because the semantics of this method can be
confusing: it adds a new attribute to the *type*, which can be shared
by several objects.
Test Plan:
Next diff in stack uses it, and has unit tests.
Imported from OSS
Differential Revision: D20904253
fbshipit-source-id: dcbf60eacf0e0e075c19238165aa33954aa73b5f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35913
The pass itself is still disabled by default, but with this change we
don't need to register it as a custom pass anymore. It allows us to
control its behavior with env variables more easily.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D20827189
Pulled By: ZolotukhinM
fbshipit-source-id: e74d90b5e46422e7ab7bc40974a805220da50fbc
Summary:
Someone messaged me abt this when a better error msg would have solved their problem
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35888
Differential Revision: D20819538
Pulled By: eellison
fbshipit-source-id: 95d124bfd162e1747dcdf7a981703a279a5dfaa6
Summary:
**Summary:** This PR contains the infrastructure of a new CUDA fuser. This CUDA fuser is based on many of the same principles as TensorExpressions and Halide, but the implementation is built from the ground up. The fusion pass itself is similar to the default CUDA fuser; however, it has undergone some refactoring and uses the new code generation infrastructure.
For those interested in how the code generation in this PR works, I would recommend reviewing _test/cpp/jit/test_gpu_fusion.cpp_ as well as the long comment section at the beginning of _torch/csrc/jit/codegen/cuda/transform_replay.h_.
One of the largest differences between our approach and that of TVM/Halide is the concept of a "TensorView". At a high level, a TensorView should be thought of similarly to how we think of working with Tensors in PyTorch: it is an N-D object which can undergo transformations that change its dimensionality. Dimensionality changes are done through the operations split/merge/reorder/computeAt. These transformations are similar to split/fuse/reorder/compute_at in TVM; they modify how a tensor is iterated over to generate GPU code. Interestingly, in our scheme these transformations are applied to tensors and only impact how that tensor is generated.
**Warning:** This PR is purposefully not feature complete with the current fuser. We wanted to separate out the infrastructure from the fusion capabilities. Once in, smaller incremental PRs will be submitted to expand capabilities of the fuser.
**Short term goals:**
Parity with current CUDA fuser (including performance):
- Dynamic shapes (no recompilation)
- Implicit handling of broadcast (broadcast tensors are treated as tensors of the broadcasted size in the generated code)
- Dropout
**Mid-term goals:**
- Transposes fused with pointwise operations where transpose involves only 2 axes (across the fused operation).
- 1-D reductions fused with pointwise operations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34785
Reviewed By: ZolotukhinM
Differential Revision: D20650977
Pulled By: soumith
fbshipit-source-id: ee39c95a880e1b9822e874ed4cc180971572bf63
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35720
When modules are saved, all relevant types are serialized according to
their qualified name with a compilation unit. Since qualified names are
guaranteed to be unique within a compilation unit, this normally works
fine.
On load, all types are registered in a compilation unit owned by the
script::Module. Type names are not unique across compilation units, so
if you load two modules with colliding type names, make them submodules
of yet another module, and save that module, there is the potential of a
name collision. See the added tests for examples if that description is
confusing.
The solution is to unique type names when serializing code by mangling
them if we detect a name collision.
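A hedged sketch of the collision scenario (hypothetical helper and class names); on re-save, the serializer is expected to mangle one of the duplicate type names:
```
import io
import torch

class Inner(torch.nn.Module):  # hypothetical; stands in for two
    def forward(self, x):      # independently loaded types that share
        return x + 1           # a qualified name

def load_roundtrip(mod):
    # Serialize and reload so the type lives in its own compilation unit.
    buf = io.BytesIO()
    torch.jit.save(torch.jit.script(mod), buf)
    buf.seek(0)
    return torch.jit.load(buf)

class Parent(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.a = load_roundtrip(Inner())  # same type name,
        self.b = load_roundtrip(Inner())  # separate compilation units

    def forward(self, x):
        return self.a(x) + self.b(x)

buf = io.BytesIO()
torch.jit.save(torch.jit.script(Parent()), buf)  # mangles on collision
```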
Test Plan: Imported from OSS
Differential Revision: D20749423
Pulled By: suo
fbshipit-source-id: a8827ff1d4a89f3e7964dbbb49b4381863da3e6a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35586
This pass fuses the choose_qparams-quant-dequant sequence.
Fusion for the weight tensor is the same as in static quant.
Test Plan:
python test/test_quantize_script.py
Imported from OSS
Differential Revision: D20755680
fbshipit-source-id: b7443770642b6e6fa0fa9da8a44637e9b2d4df70
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35448
Add `_choose_qparams_per_tensor`, which returns the scale and zero_point similarly to how the dynamic quantization operator computes them.
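A hedged sketch of the intended call pattern, assuming the op surfaces as `torch._choose_qparams_per_tensor`:
```
import torch

x = torch.randn(4, 4)
# Compute dynamic-quantization-style qparams for x (binding name assumed).
scale, zero_point = torch._choose_qparams_per_tensor(x, reduce_range=False)
q = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8)
```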
Test Plan:
python test/test_quantize_script.py
Imported from OSS
Differential Revision: D20755679
fbshipit-source-id: c9066d8f1bb3e331809be26c4be806faafc9b981