Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63198
Linear layers that share the same input tensor can be concatenated together
as long as their weights and biases are compatible.
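As a rough sketch of the equivalence the pass relies on (illustrative only, not the pass itself):
```
import torch

# Two linears over the same input can run as one matmul over concatenated
# weights, then be split back apart. Shapes here are arbitrary examples.
x = torch.randn(4, 16)
a = torch.nn.Linear(16, 8)
b = torch.nn.Linear(16, 32)

w = torch.cat([a.weight, b.weight])    # (8 + 32, 16)
bias = torch.cat([a.bias, b.bias])     # (8 + 32,)
out = torch.nn.functional.linear(x, w, bias)
ya, yb = out.split([8, 32], dim=1)

assert torch.allclose(ya, a(x)) and torch.allclose(yb, b(x))
```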
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31240642
fbshipit-source-id: 1e78daa6b89822412ba2513d326ee0e072ceff1e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63436
* use MKLDNN layernorm
* use mkldnn version 2
* address Elias's feedback
* fix CI build errors
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D30388825
Pulled By: Krovatkin
fbshipit-source-id: fb909bfbf53cb8567a43aac40f51c491daeec908
Summary:
Freezing exists as a pass which partially evaluates your model and applies generic optimizations that should speed it up. Optimize-for-inference is a counterpart to these optimizations which runs build- and server-specific optimizations. The interaction with the existing `optimize_frozen_module` is not great; I guess we could just deprecate that API entirely? It was never officially released and existed only to document the `optimize_numerics` keyword.
Eventually I would like to add a way of providing example inputs, but I didn't add that here because they are not being used at all yet. I also have not yet included a way to blacklist individual optimizations; I would like to wait until we move this to Beta and have a little more clarity on how everything will fit together. I also think blacklisting will be an uncommon use case for the current optimizations.
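A minimal usage sketch of the intended flow (assuming the API lands as `torch.jit.optimize_for_inference`):
```
import torch

model = torch.jit.script(torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8), torch.nn.ReLU()).eval())
frozen = torch.jit.freeze(model)                # generic optimizations
opt = torch.jit.optimize_for_inference(frozen)  # build/server-specific passes
out = opt(torch.randn(1, 3, 32, 32))
```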
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58193
Reviewed By: bertmaher, navahgar
Differential Revision: D28443714
Pulled By: eellison
fbshipit-source-id: b032355bb2585720a6d2f00c89d0d9a7ef60e649
Summary:
After adding new ops to the set of fusible ops, mobilenetv3 slows down to **9000ms from 1200ms** without this fix.
This happens when one of the inputs is expanded (broadcast) and converted to
nchw/nhwc: we end up in a very bad spot if the second argument is in a
blocked format. In that case, MKLDNN falls back to its reference
implementation for the binary operation that follows these broadcasts,
which can be up to ~100x slower.
We use a very simple heuristic: convert the arg in nchw
to the blocked format of the other argument.
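A rough Python-level illustration of the layout mismatch (the actual heuristic lives in the JIT's MKLDNN fuser; this sketch assumes an MKLDNN-enabled CPU build):
```
import torch

x = torch.randn(1, 64, 56, 56).to_mkldnn()          # may carry a blocked layout
y = torch.randn(1, 64, 1, 1).expand(1, 64, 56, 56)  # broadcast arg, plain nchw
# reordering y into an MKLDNN layout first keeps the binary op off the slow
# reference path
z = x + y.contiguous().to_mkldnn()
```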
* MKLDNN_VERBOSE without the issue:
[test_mobilenet_nopool.txt](https://github.com/pytorch/pytorch/files/6319528/test_mobilenet_nopool.txt)
* MKLDNN_VERBOSE with the issue (note the times for `ref` operations):
[test_mobilenet_pool.txt](https://github.com/pytorch/pytorch/files/6319529/test_mobilenet_pool.txt)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56089
Reviewed By: eellison
Differential Revision: D27796688
Pulled By: Krovatkin
fbshipit-source-id: fc34d76358ce899e3b1f2b69efb9b5c38f5af1ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54772
conv3d-add-relu fusion does not work on some platforms when TF32 is enabled, so we set `allow_tf32` to `False`.
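For reference, the flag in question (shown here outside the test for illustration):
```
import torch

# TF32 lowers fp32 precision on Ampere GPUs; disabling it keeps the fused
# conv3d-add-relu numerically comparable to the unfused reference.
torch.backends.cudnn.allow_tf32 = False
```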
Test Plan:
```
python test/test_jit.py -k test_freeze_conv_relu_fusion
```
Imported from OSS
Reviewed By: bertmaher
Differential Revision: D27435560
fbshipit-source-id: e35e2297dce85acfbe988deea97c3f5e68f1e1c7
Summary:
We were accessing their storage, which will throw.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54632
Reviewed By: ezyang
Differential Revision: D27372192
Pulled By: eellison
fbshipit-source-id: 9985e85af7a35a3d6bf1c0be0185699c34877b94
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53908
This adds reinplacing to MKLDNN subgraphs so that we replace `aten::add` with `aten::add_`. Normally you would have to prove device and dtype match, but we know that already, and because we have explicit broadcast nodes for other reasons, we don't have to prove that the output shape of the add is the same as its inputs'.
I've tested correctness on ResNet, and I'm going to do more extensive testing as well. When I benchmarked the "unsafe" version (always inplace) I saw average speedups of ~16% for both single-threaded and multi-threaded runs. I don't think the "safe" version will be far behind; when I looked at ResNet, for example, every `add` and `relu` was reinplaced.
There's some question of reusing the other alias/liveness/inplacing passes in SR. I thought about it; however, I didn't want to add a cross-dependency between very different parts of the code base with a bunch of different assumptions. The logic here also covers a simpler case and does not add much complexity IMO.
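The rewrite at the tensor level (the pass itself operates on nodes inside the MKLDNN subgraph):
```
import torch

x, z = torch.ones(3), torch.ones(3)
y = x.add(z)  # aten::add allocates a fresh output tensor
x.add_(z)     # aten::add_ writes into x's buffer; safe once x has no later uses
```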
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D27132969
Pulled By: eellison
fbshipit-source-id: 121a38daaedf01363f6b66a814beaaa72a0ab0dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52614
This can speed up models by ~5% (~0.5-1% from the base, but ~5% after they've been sped up with MKLDNN).
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696693
Pulled By: eellison
fbshipit-source-id: bfed55242524a4c2f1ae5d63e76d6803016d986d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53410
**Summary**
This commit enables indexing into `ModuleList` using a non-literal
index, provided the LHS of the assignment whose RHS is the indexing
expression is annotated with an interface type.
This feature already exists for `ModuleDict`, and this commit builds on
top of that implementation. A `prim::ModuleContainerIndex` operator is
emitted for any statement of the form `lhs: InterfaceType =
module_container[idx]`. The same operator has to be used for both
`ModuleDict` and `ModuleList` because serialization does not preserve
the metadata that indicates whether a `Module` is a `ModuleDict` or
`ModuleList`.
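A sketch of the feature as described (a module interface plus an annotated LHS):
```
import torch

@torch.jit.interface
class ModuleInterface(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pass

class Submod(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + 1

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.mods = torch.nn.ModuleList([Submod(), Submod()])

    def forward(self, x: torch.Tensor, idx: int) -> torch.Tensor:
        # non-literal index; legal because the LHS is annotated with an interface
        value: ModuleInterface = self.mods[idx]
        return value.forward(x)

scripted = torch.jit.script(M())
```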
**Testing**
This commit extends the existing unit tests for non-literal `ModuleDict`
indexing to test non-literal `ModuleList` indexing.
**Fixes**
This commit fixes #47496.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D26857597
Pulled By: SplitInfinity
fbshipit-source-id: d56678700a264d79aae3de37ad6b08b080175f7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52918
`freeze_module` seems to operate under the assumption that `forward` always exists. This isn't true, so the change first checks for existence and then retrieves the function.
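A sketch of the failing scenario (hypothetical module; uses the internal `_freeze_module` binding, as in the Test Plan):
```
import torch

class M(torch.nn.Module):
    @torch.jit.export
    def infer(self, x: torch.Tensor) -> torch.Tensor:
        return x + 1

scripted = torch.jit.script(M().eval())  # scripts fine with no forward
# previously the pass fetched forward unconditionally and crashed here
frozen = torch._C._freeze_module(scripted._c, ["infer"])
```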
ghstack-source-id: 123215242
Test Plan: Try freezing something with and without forward.
Reviewed By: dhruvbird
Differential Revision: D26671815
fbshipit-source-id: d4140dad3c59d3d20012143175f9b9268bf23050
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52613
Including MaxPool in the MKLDNN fusion group sped up resnet18 by ~20%, and it was a win on the other models I tested as well. I will post more complete benchmarks.
As mentioned in the diff, in some cases MaxPool can be slower than the ATen implementation; ideally we'd only include MaxPool if it decreased the number of layout transformations that occur. That hasn't actually mattered for any of the torchvision models, so I don't think it's necessary for this PR.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696704
Pulled By: eellison
fbshipit-source-id: 61a025dbf5e7591c0a0f75def3beb439a138a21e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51600
Looking for notes on the implementation first; I will post more notes on benchmarks and overall thoughts soon and solicit more input then.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696702
Pulled By: eellison
fbshipit-source-id: cd612f093fe3859e42fb0b77560ebd1b44fccff7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51484
This PR moves the linear weights of a frozen model to MKLDNN. When the weights are already in MKLDNN, computing even a single linear by converting the input and output from/to MKLDNN provides large speedups. I benchmarked the results of the top 200 shapes in predictor [here](https://www.internalfb.com/phabricator/paste/view/P171537854) (taken from aten::matmul), as well as verified that it sped up popular models.
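The effect can be approximated eagerly with the existing `torch.utils.mkldnn` helper (illustrative only; the PR does this inside the frozen graph):
```
import torch
import torch.utils.mkldnn

lin = torch.nn.Linear(1024, 1024).eval()
mk = torch.utils.mkldnn.to_mkldnn(lin)  # weights converted to MKLDNN layout once
y = mk(torch.randn(64, 1024))           # input/output converted per call
```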
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696698
Pulled By: eellison
fbshipit-source-id: 53d03b9e6956e11b700ee58214e2266e2aa4106a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51483
This PR moves the conv weights of a frozen model to MKLDNN and reorders the weights ahead of time. When the weights are already in MKLDNN, computing even a single conv by converting the input and output from/to MKLDNN provides large speedups. I benchmarked the results of the top 200 shapes in predictor [here](https://www.internalfb.com/phabricator/paste/view/P171537938), as well as verified that it sped up popular models in torchvision.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696703
Pulled By: eellison
fbshipit-source-id: 0b4441bee4f6e0890a4540fbca3bb5e58b8c5adf
Summary:
Update the freezing API for 1.8 and add a corresponding C++ API. The `optimize` flag hasn't been publicly released yet, so we are able to change it without breaking BC. I will submit a PR to the release branch as well; there are a few more days to do that.
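Usage of the Python API, as a minimal sketch:
```
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.version = 1
        self.lin = torch.nn.Linear(4, 4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.lin(x)

frozen = torch.jit.freeze(torch.jit.script(M().eval()),
                          preserved_attrs=["version"])
print(frozen.version)  # preserved instead of being folded into the graph
```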
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52337
Reviewed By: ejguan
Differential Revision: D26491833
Pulled By: eellison
fbshipit-source-id: 6dcd74eb8f76db64ac53183d03dabdd0f101f4b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51589
Dropout operators are only needed in training. Remove them for frozen models.
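A quick way to observe the effect (sketch):
```
import torch

model = torch.jit.script(torch.nn.Sequential(
    torch.nn.Linear(8, 8), torch.nn.Dropout(0.5)).eval())
frozen = torch.jit.freeze(model)
# in eval mode dropout is the identity, so it is removed from the frozen graph
print(frozen.graph)
```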
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D26214259
fbshipit-source-id: 3ab05869e1e1f6c57498ba62bf40944f7c2189aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50194
**Summary**
`ClassType::repr_str()` prints out only the name of a `ClassType`, which
is not always enough to disambiguate it. In some situations, two
`ClassTypes` are compared and do not match despite having identical
names because they are in separate compilation units. In such cases, the
error message can seem nonsensical (e.g. `expected type T but found type
T`). This commit modifies `ClassType::repr_str()` so that it prints out
the address of the type's compilation unit to make these messages less
puzzling (e.g. `expected type T (0x239023) but found type T (0x230223)`).
**Test Plan**
This commit adds a unit test, `ClassTypeTest.IdenticalTypesDifferentCus`
that reproduces this situation.
**Fixes**
This commit fixes #46212.
Test Plan: Imported from OSS
Reviewed By: tugsbayasgalan
Differential Revision: D25933082
Pulled By: SplitInfinity
fbshipit-source-id: ec71b6728be816edd6a9c2b2d5075ead98d8bc88
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50222
This PR adds a pass which runs a set of optimizations to be done after freezing. Currently this encompasses Conv-BN folding and Conv-Add/Sub/Mul/Div folding, and I'm also planning to add dropout removal.
I would like some feedback on the API. `torch.jit.freeze` is technically in the ~prototype~ phase, so we have some leeway around making changes. I think in the majority of cases the user is going to want to freeze their model and then run inference, so I would prefer the optimization be opt-out instead of opt-in. All internal/framework use cases of freezing use `freeze_module`, not the Python API, so this shouldn't break anything.
I have separated out the optimization pass as a separate API to keep things potentially modular, even though I suspect that is an unlikely case. In a future PR I would like to add a `torch::jit::freeze` which follows the same API as `torch.jit.freeze`, intended for C++ use, and runs the optimizations.
Test Plan: Imported from OSS
Reviewed By: tugsbayasgalan
Differential Revision: D25856264
Pulled By: eellison
fbshipit-source-id: 56be1f12cfc459b4c4421d4dfdedff8b9ac77112
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50075
Adds Conv-Add/Sub/Mul/Div fusion for frozen models. This helps cover models like torchvision's Mask R-CNN, which uses a hand-rolled batchnorm implementation: 90645ccd0e/torchvision/ops/misc.py (L45).
I haven't tested results yet, but I would expect a somewhat similar speedup to Conv-BN fusion (maybe a little less).
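For intuition, a minimal sketch of the algebra with a hypothetical helper (not the actual pass): a per-channel scale/shift after a conv folds into the conv's own parameters, which is exactly the hand-rolled batchnorm pattern.
```
import torch

def fold_conv_mul_add(conv: torch.nn.Conv2d, scale: torch.Tensor,
                      shift: torch.Tensor) -> torch.nn.Conv2d:
    # conv(x) * scale + shift == conv'(x) for per-output-channel scale/shift
    conv.weight.data *= scale.reshape(-1, 1, 1, 1)
    if conv.bias is not None:
        conv.bias.data = conv.bias.data * scale + shift
    else:
        conv.bias = torch.nn.Parameter(shift.clone())
    return conv
```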
Test Plan: Imported from OSS
Reviewed By: tugsbayasgalan
Differential Revision: D25856265
Pulled By: eellison
fbshipit-source-id: 2c36fb831a841936fe4446ed440185f59110bf68
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50074
Adds Conv-BN fusion for models that have been frozen. I haven't explicitly tested perf yet, but it should be equivalent to the results from Chillee's PR [here](https://github.com/pytorch/pytorch/pull/47657) and [here](https://github.com/pytorch/pytorch/pull/47657#issuecomment-725752765). Click on the PR for details, but it's a good speedup.
In a later PR in the stack I plan on making this optimization on by default as part of `torch.jit.freeze`. I will also, in a later PR, add a peephole so that conv -> batchnorm2d doesn't generate a conditional checking the number of dims.
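The same algebra as the Conv-Add/Mul fold, specialized to BatchNorm statistics (a sketch with a hypothetical helper, not the pass itself):
```
import torch

def fold_conv_bn(conv: torch.nn.Conv2d,
                 bn: torch.nn.BatchNorm2d) -> torch.nn.Conv2d:
    # bn(conv(x)) == conv'(x): scale weights by gamma / sqrt(var + eps),
    # then fold the running statistics into the bias
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    conv.weight.data *= scale.reshape(-1, 1, 1, 1)
    bias = conv.bias.data if conv.bias is not None else torch.zeros_like(scale)
    conv.bias = torch.nn.Parameter((bias - bn.running_mean) * scale + bn.bias)
    return conv
```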
Zino was working on freezing and left the team, so I'm not really sure who should be reviewing this, but I don't care too much so long as I get a review.
Test Plan: Imported from OSS
Reviewed By: tugsbayasgalan
Differential Revision: D25856261
Pulled By: eellison
fbshipit-source-id: da58c4ad97506a09a5c3a15e41aa92bdd7e9a197
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45716
**Summary**
This commit enables indexing into `ModuleDict` using a non-literal
index if the `ModuleDict` is annotated with `Dict[str, X]`, where `X` is
a module interface type. These annotations must be expressed using a
class attribute named `__annotations__`, which is a `Dict[str, Type]`
where the keys are the names of module attributes and the values are
their types.
The approach taken by this commit is that these annotations are stored
as "hints" along with the corresponding module attributes in the
`ConcreteSubmoduleTypeBuilder` instance for each module (which might be
a `ModuleDict`). These hints are passed into the `ModuleValue` that is
created for desugaring operations on submodules so that indexing into a
`ModuleDict` can be emitted as a getitem op into a dict emitted into the
graph that represents the `ModuleDict`.
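A sketch following the description above (the `__annotations__` class attribute carries the hint):
```
import torch
from typing import Dict

@torch.jit.interface
class ModuleInterface(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pass

class Double(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2

class M(torch.nn.Module):
    __annotations__ = {"mods": Dict[str, ModuleInterface]}

    def __init__(self):
        super().__init__()
        self.mods = torch.nn.ModuleDict({"a": Double(), "b": Double()})

    def forward(self, x: torch.Tensor, key: str) -> torch.Tensor:
        return self.mods[key].forward(x)

scripted = torch.jit.script(M())
```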
**Test Plan**
This commit adds unit tests to `TestModuleContainers` to test this
feature (`test_typed_module_dict`).
Differential Revision: D24070606
Test Plan: Imported from OSS
Reviewed By: ansley
Pulled By: SplitInfinity
fbshipit-source-id: 6019a7242d53d68fbfc1aa5a49df6cfc0507b992
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45143
This PR prevents freezing from cleaning up a submodule when the user
requests that the submodule be preserved.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D23844969
Pulled By: bzinodev
fbshipit-source-id: 80e6db3fc12460d62e634ea0336ae2a3551c2151
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43790
Interface calls were not handled properly when used in a fork
subgraph. This PR fixes that issue.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D23402039
Pulled By: bzinodev
fbshipit-source-id: 41adc5ee7d942250e732e243ab30e356d78d9bf7
Summary:
This patch allows freezing a model that utilizes interfaces. Freezing works
under the user's assumption that the interface module does not alias
any value used in the model.
To enable freezing of such modules, an extra parameter was added:
torch._C._freeze_module(module, ignoreInterfaces=True)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41860
Reviewed By: eellison
Differential Revision: D22670566
Pulled By: bzinodev
fbshipit-source-id: 41197a724bc2dca2e8495a0924c224dc569f62a4
Summary:
During the cleanup phase, calling recordReferencedAttrs records the
attributes which are referenced and hence kept.
However, if you have two instances of the same type which are preserved
through the freezing process, as the added test case shows, then while
recording the referenced attributes, we iterate through the
type INSTANCES we have seen so far and record those.
Thus if we have another instance of the same type, we will just look at
the first instance in the list and record that instance.
This PR fixes that by traversing the getattr chains and getting the
actual instance of the getattr output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42457
Test Plan:
python test/test_jit.py TestFreezing
Fixes #{issue number}
Reviewed By: gchanan
Differential Revision: D23106921
Pulled By: kimishpatel
fbshipit-source-id: ffff52876938f8a1fedc69b8b24a3872ea66103b
Summary:
During the cleanup phase, calling recordReferencedAttrs records the
attributes which are referenced and hence kept.
However, if you have two instances of the same type which are preserved
through the freezing process, as the added test case shows, then while
recording the referenced attributes, we iterate through the
type INSTANCES we have seen so far and record those.
Thus if we have another instance of the same type, we will just look at
the first instance in the list and record that instance.
This PR fixes that by traversing the getattr chains and getting the
actual instance of the getattr output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42457
Test Plan:
python test/test_jit.py TestFreezing
Fixes #{issue number}
Reviewed By: zou3519
Differential Revision: D22898051
Pulled By: kimishpatel
fbshipit-source-id: 8b1d80f0eb40ab99244f931d4a1fdb28290a4683
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40718
Currently, all constants except tensors must be inlined during serialization;
tensors are stored in the constant table. This patch generalizes that
capability to any IValue, which is particularly useful for non-ASCII string
literals that cannot be inlined.
Test Plan: Imported from OSS
Differential Revision: D22298169
Pulled By: bzinodev
fbshipit-source-id: 88cc59af9cc45e426ca8002175593b9e431f4bac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38830
This patch enables preserving user-specified attributes and non-forward
methods. The API:
_freeze_module(Module, ["a", "version"])
Test Plan: Imported from OSS
Differential Revision: D21957316
Pulled By: bzinodev
fbshipit-source-id: 5c9146ae679791070a9de868c45785725b48a9e6
Summary:
This patch removes the call to run optimizations within the freezing API.
Only dead code elimination is invoked to clean up the frozen module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38499
Reviewed By: eellison
Differential Revision: D21579607
Pulled By: bzinodev
fbshipit-source-id: a6231754fea89296a3dcf07b5e37a1c43cb8d5dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36953
Add support for generic lists as constants; generic dicts and tuples are already implemented. This is a pretty common pattern, and it cuts down on the number of non-tensor nodes executed in the interpolate tests.
Test Plan: Imported from OSS
Differential Revision: D21160761
Pulled By: eellison
fbshipit-source-id: 1e6b7b25b7580f09067794772d44e615601c60c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34787
This is a follow-up patch to freezing of TorchScript modules. It enables
removal of constant attributes and unused methods in submodules.
The cleanup logic is generalized to handle attributes that share their class
type.
Test Plan: Imported from OSS
Differential Revision: D21004990
Pulled By: bzinodev
fbshipit-source-id: 84778aa9ae1a96d23db29c051031f9995ed3ac90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34786
1) Rename 'HashIValue' to 'HashAliasedIValue'
2) Add an Object case to the getSubValues function
3) Hash tensors to their storage
4) Add a Dict case to overrideGradient
5) Nit cleanup
Test Plan: Imported from OSS
Differential Revision: D20585270
Pulled By: bzinodev
fbshipit-source-id: f580f3cb80dd5623088a014efd5f0f5ccc1659c0
Summary:
This patch enables folding GetAttr nodes with their corresponding
values. The _jit_pass_freeze_module API returns a new TorchScript module
where all function calls and attribute accesses are inlined.
Usage:
frozen_model = torch._C._freeze_module(scripted_model._c)
frozen_model.forward(...)
This API currently optimizes the forward method. We will follow up
to preserve and optimize methods and attributes that are annotated as
torch.jit.interface.
Several future improvements to JIT optimizations are required to maximally
clean up/de-sugar the graph and eliminate redundancies.
Ideally, we want to produce a graph that can easily be lowered to
GLOW and other low-level backends.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32178
Differential Revision: D19419640
Pulled By: bzinodev
fbshipit-source-id: 52baffaba9bca2cd60a8e747baa68d57711ad42b