Fixes #68972
Relands #107246
To avoid causing Meta-internal CI failures, this PR does not unconditionally assert that the default dtype is float in the `TestCase.setUp/tearDown` methods. Instead, the assert is only performed if `TestCase._default_dtype_check_enabled == True`. `_default_dtype_check_enabled` is set to True in the `if __name__ == "__main__":` blocks of all the relevant test files that required changes for this issue.
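A minimal sketch of the opt-in pattern described above, assuming the test file uses `TestCase` and `run_tests` from PyTorch's common test utilities:
```python
from torch.testing._internal.common_utils import TestCase, run_tests

if __name__ == "__main__":
    # Opt in to the float default-dtype assertion in setUp/tearDown.
    TestCase._default_dtype_check_enabled = True
    run_tests()
```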
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108088
Approved by: https://github.com/ezyang
This PR allows freezing modules like the one below:
```python
# Ex. 1
@torch.jit.interface
class ModuleInterface(torch.nn.Module):
    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        pass


class ImplementsInterface(torch.nn.Module):
    def __init__(self):
        super(ImplementsInterface, self).__init__()
        self.sum = torch.zeros((2, 2))

    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        self.sum += inp.relu()  # this makes the interface-implementing module mutable
        # and previously this would prevent freezing
        return self.sum


class WrapperModule(torch.nn.Module):
    impl: ModuleInterface

    def __init__(self):
        super().__init__()
        self.impl = ImplementsInterface()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.impl.forward(x)
```
Previously during freezing, we handled interfaces as shown below:
1. we inline interfaces in any preserved method graphs
2. during `cleanupFrozenModule`, we try to simplify the module data structure (this part is unrelated to freezing so far). During this step, if we found that an interface type was mutable, we'd error out because of the possibility of a module that _swaps out the value of an interface-typed attribute at runtime_.
Below is an example of a module that swaps out the value of an interface-typed attribute at runtime:
```python
# Ex. 2
class MyBadModule(torch.nn.Module):
    impl: MyInterface
    option1: IfaceImpl1
    option2: IfaceImpl2
    ....

    def forward(self, x):
        if x > 0:
            self.impl = self.option1
        else:
            self.impl = self.option2
        ....
```
^ this type of situation cannot be supported by freezing (or at least would be difficult to do correctly) because it greatly complicates the details of handling types and simplifying the module data structure.
But we can still support the first example without _too_ much work:
1. inline the interface code as before
2. check to see if we have any setattrs on interface types; if so, error out
3. otherwise, replace the interface type with the concrete implementing type
4. continue simplifying the module data structure as if we never had any interfaces.
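A hedged usage sketch of the behavior this enables, assuming the definitions from Ex. 1 above are in scope:
```python
scripted = torch.jit.script(WrapperModule())
scripted.eval()
# With this change, freezing should succeed: the interface-typed attribute
# `impl` is retyped to the concrete ImplementsInterface during cleanup, even
# though ImplementsInterface mutates self.sum.
frozen = torch.jit.freeze(scripted)
print(frozen(torch.ones(2, 2)))
```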
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86039
Approved by: https://github.com/eellison
Not sure why, but this started throwing a lot of warnings while I was
adding tests to test_freezing.py, so I'm removing the deprecated escape
sequences to get rid of the warnings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85987
Approved by: https://github.com/eellison
Test was marked as `skip` due to a memory leak. Turns out the memory leak is expected - it can be fixed by clearing the compilation unit (with `torch.jit._state._python_cu.drop_all_functions()` at the end of the test function) or by disabling the leak detector on this test.
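A minimal sketch of the first workaround described above (the real tests live in the JIT test suite; this standalone example only illustrates where the call goes):
```python
import torch
from torch.testing._internal.common_utils import TestCase

class TestExample(TestCase):
    def test_scripts_something(self):
        # Test body that scripts a module and leaves compiled functions
        # cached in the global compilation unit.
        scripted = torch.jit.script(torch.nn.Linear(2, 2))
        self.assertEqual(scripted(torch.ones(1, 2)).shape, torch.Size([1, 2]))
        # Clear the compilation unit so the leak detector does not flag the
        # intentionally cached compiled functions as a leak.
        torch.jit._state._python_cu.drop_all_functions()
```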
Fixes #77618
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78566
Approved by: https://github.com/eellison
Original PR: #77295
Original commit message:
On GPU, conv errors if not all its inputs have the same dtype.
In the case of autocasting during freezing, what we see is:
1) inputs to conv are cast to half
2) inputs to batchnorm are not cast, so many are still float
3) we try to fold conv + batchnorm by finding a different weight and bias such that conv(input, new_weight, new_bias) is equivalent to the original conv -> batchnorm.
If conv previously had an optional bias, then during freezing we will temporarily create a zero-valued bias as a placeholder for conv_bias. We want to construct it to have the same dtype as the weight input to conv, to avoid errors on GPU.
Reland changes:
There's a memory leak from the CUDA caching allocator that is a side effect of this fix. The memory leak causes the test to fail, though for some reason it didn't fail on CI in the last PR. This skips the tests for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77617
Approved by: https://github.com/eellison
On GPU, conv errors if not all its inputs have the same dtype.
In the case of autocasting during freezing, what we see is:
1) inputs to conv are cast to half
2) inputs to batchnorm are not cast, so many are still float
3) we try to fold conv + batchnorm by finding a different weight and bias such that conv(input, new_weight, new_bias) is equivalent to the original conv -> batchnorm.
If conv previously had an optional bias, then during freezing we will temporarily create a zero-valued bias as a placeholder for conv_bias. We want to construct it to have the same dtype as the weight input to conv, to avoid errors on GPU.
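A minimal sketch (illustrative helper, not the actual pass code) of the placeholder-bias construction described above: the zero bias is built with the conv weight's dtype and device so the GPU conv does not see mixed dtypes after autocasting.
```python
import torch

def make_placeholder_conv_bias(conv_weight: torch.Tensor) -> torch.Tensor:
    # One bias value per output channel, matching the (possibly half-precision)
    # weight dtype and device produced by autocasting during freezing.
    out_channels = conv_weight.shape[0]
    return torch.zeros(out_channels, dtype=conv_weight.dtype, device=conv_weight.device)
```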
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77295
Approved by: https://github.com/eellison
Relax the check condition for Conv -> Add/Sub/Mul/Div folding to accept cases where the other input tensor of the Add/Sub/Mul/Div is a floating-point type, or where the promoted type (`promoteTypes`) of that input and the conv weight equals the conv weight's type.
Relaxing this condition is mainly to handle a common situation in models:
the conv output is added to, subtracted by, multiplied by, or divided by an integer tensor or an integer scalar tensor.
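A hedged sketch of the model pattern this relaxation targets (module and values are illustrative):
```python
import torch
import torch.nn as nn

class ConvAddInt(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)

    def forward(self, x):
        # Adding an integer scalar: type promotion keeps the result at the
        # conv weight's float dtype, so folding into the conv is still valid.
        return self.conv(x) + 2

frozen = torch.jit.freeze(torch.jit.script(ConvAddInt().eval()))
```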
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73278
Approved by: https://github.com/eellison
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68668
This updates `run_frozen_optimizations` so that it will run on additional methods other than `forward`.
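A hedged sketch of what the added test exercises (module and method names are illustrative): conv-bn folding applying to a preserved method other than forward.
```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.bn = nn.BatchNorm2d(8)

    def forward(self, x):
        return x

    @torch.jit.export
    def make_prediction(self, x):
        # With this change, conv-bn folding should also run on this method.
        return self.bn(self.conv(x))

frozen = torch.jit.freeze(torch.jit.script(M().eval()),
                          preserved_attrs=["make_prediction"])
```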
ghstack-source-id: 143871758
Test Plan:
Added test in test_freezing.py
```
python3 test/test_jit.py -- test_conv_bn_folding_not_forward
```
Reviewed By: eellison
Differential Revision: D32567857
fbshipit-source-id: 75e56efad576404dc8d6897861d249573f5ccd7a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68316
Consider the following:
```
class Mod(nn.Module):
    def __init__(self, val):
        super().__init__()
        self.param = nn.Parameter(val)

    def forward(self, x):
        # this method will change during freezing
        return x + self.param

    @torch.jit.export
    def make_prediction(self, x):
        y = x + x
        return self.forward(y)

param = torch.rand([2, 2])
unscripted_mod = Mod(param)
mod = torch.jit.script(unscripted_mod)
mod.eval()
mod = torch.jit.freeze(mod, preserved_attrs=["make_prediction"])
```
During freezing the following will occur:
1. do some pre-freezing, including inlining; in particular, forward will be inlined into make_prediction. During inlining, forward.optimized_graph() is called, and the result is cached
2. freeze some methods. While freezing forward, the graph associated with the function will get updated. The cached optimized_graphs_ are not updated.
Previously, a call to `mod.forward(x)` would return an executor that would run on the old cached optimized_graph(). This would mean that the freezing optimizations would not apply, and potentially that the execution would fail because of parameters removed from the module.
This change clears the optimized_graphs_ cache after running freezing to prevent executing an old version of the graph.
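A short usage sketch continuing the example above: with the cache cleared, calls after freezing execute the refreshed graphs.
```python
x = torch.rand([2, 2])
# Both calls now run the frozen (inlined, optimized) graphs instead of a
# stale cached optimized_graph that may still reference removed parameters.
out_forward = mod.forward(x)
out_pred = mod.make_prediction(x)
```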
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D32410862
Pulled By: davidberard98
fbshipit-source-id: dd8bfe86ec2898b7c72813ab32c08f25c38e4cea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68367
- bmm_test.py was using syntax not allowed in 3.6
- Some suppressions were not placed on the correct line.
With this file,
```
lintrunner --paths-cmd='git grep -Il .'
```
passes successfully.
Test Plan: Imported from OSS
Reviewed By: janeyx99, mrshenli
Differential Revision: D32436644
Pulled By: suo
fbshipit-source-id: ae9300c6593d8564fb326822de157d00f4aaa3c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67437
Certain ops do nothing on the forward pass and can be discarded after training: `aten::detach` and `fb::scale_gradient` are examples of this.
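A hedged sketch of the kind of graph this cleans up (illustrative module; the print at the end shows the expected outcome rather than asserting it):
```python
import torch

class DetachMod(torch.nn.Module):
    def forward(self, x):
        # detach does nothing on the forward pass, so the frozen graph can drop it.
        return x.detach() + 1

frozen = torch.jit.freeze(torch.jit.script(DetachMod().eval()))
print("aten::detach" in str(frozen.graph))  # expected: False
```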
Test Plan: `buck test caffe2/test:jit -- test_freezing`
Reviewed By: hlu1
Differential Revision: D31980843
fbshipit-source-id: 0045b6babcfae786a2ce801b2f5997a078205bc0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63198
Linear layers using the same input tensor can be concatenated, as long as the weights and biases are compatible.
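A hedged sketch of the pattern this optimization targets (module shape and sizes are illustrative):
```python
import torch
import torch.nn as nn

class TwoHeads(nn.Module):
    def __init__(self):
        super().__init__()
        # The same input feeds both layers; weights/biases have compatible shapes.
        self.a = nn.Linear(16, 8)
        self.b = nn.Linear(16, 8)

    def forward(self, x):
        return self.a(x), self.b(x)

# Freezing may merge the two linears into one wider linear followed by a split.
frozen = torch.jit.freeze(torch.jit.script(TwoHeads().eval()))
```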
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31240642
fbshipit-source-id: 1e78daa6b89822412ba2513d326ee0e072ceff1e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63436
use MKLDNN layernorm
use mkldnn version 2
address Elias feedback
fix build CI errors
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D30388825
Pulled By: Krovatkin
fbshipit-source-id: fb909bfbf53cb8567a43aac40f51c491daeec908
Summary:
Freezing exists as a pass which partially evaluates your model and applies generic optimizations which should speed it up. Optimize for inference is a counterpart to these optimizations that runs build- and server-specific optimizations. The interaction with the existing `optimize_frozen_module` is not great; I guess we could just deprecate that API entirely? It was never officially released but just existed to document the `optimize_numerics` keyword.
Eventually, I would like to add a way of providing example inputs, but I didn't add that here because they are not being used at all yet. I also have not yet included a way to blacklist individual optimizations, and would like to wait until we move this to Beta and have a little more clarity on how everything will fit together. I also think blacklisting will be an uncommon use case for the current optimizations.
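A hedged usage sketch of the new entry point alongside plain freezing (the module is illustrative):
```python
import torch
import torch.nn as nn

mod = torch.jit.script(nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)).eval()

# Generic, build-independent optimizations:
frozen = torch.jit.freeze(mod)
# Freezing plus build/server-specific optimizations:
optimized = torch.jit.optimize_for_inference(mod)
```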
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58193
Reviewed By: bertmaher, navahgar
Differential Revision: D28443714
Pulled By: eellison
fbshipit-source-id: b032355bb2585720a6d2f00c89d0d9a7ef60e649