Commit Graph

33 Commits

Author SHA1 Message Date
Oguz Ulgen
f8465df9f0 Use graph.find_nodes in inductor (#122256)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122256
Approved by: https://github.com/jansel
ghstack dependencies: #121565, #122255
2024-04-07 18:51:14 +00:00
Tugsbayasgalan Manlaibaatar
a9e9590934 FF inductor failure (#114980)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114980
Approved by: https://github.com/eellison, https://github.com/bdhirsh
2023-12-04 18:26:34 +00:00
Tugsbayasgalan Manlaibaatar
a378ae33e9 [BE][aot_autograd] Remove mutated_inp_indices (#114421)
We should use mutated_inp_runtime_indices moving forward

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114421
Approved by: https://github.com/zhxchen17
2023-11-23 22:41:38 +00:00
Jez Ng
df9acc61fb [inductor] Make {freezing,ir}.py pass follow-imports typechecking (#113534)
I used a couple of type-ignore comments in ir.py because it constructs
short-lived instances of FixedLayout and GraphModuleSerializer, just to
call a single method on them that doesn't use all their members. Making
those unused members optional would make the rest of the code a lot
messier with sprinkled `assert` statements.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113534
Approved by: https://github.com/albanD
2023-11-16 01:53:52 +00:00
Jez Ng
5b95715bc0 Make {Tracing,Compile}Context.get() return non-optional type (#113535)
They are used in many contexts that don't actually check if the returned
type is `None`. I have also created `try_get()` for the cases where we
do actually want an Optional type returned.

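A minimal sketch of the get()/try_get() split described above, assuming a module-level current-context slot; the real TracingContext/CompileContext classes live in torch/_guards.py and differ in detail:
```
from typing import Optional

class CompileContext:
    _current: Optional["CompileContext"] = None

    @classmethod
    def try_get(cls) -> Optional["CompileContext"]:
        # May return None when no compilation is in flight.
        return cls._current

    @classmethod
    def get(cls) -> "CompileContext":
        # Callers that never check for None use this; it asserts instead,
        # so the annotated return type stays non-optional.
        ctx = cls._current
        assert ctx is not None, "CompileContext is not set"
        return ctx
```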
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113535
Approved by: https://github.com/ezyang
ghstack dependencies: #113412
2023-11-14 04:31:12 +00:00
sanchitintel
40ab6409da [Trivial change] Remove duplicate line in freezing.py (#112538)
## Description

`aten = torch.ops.aten` was assigned twice.
This PR removes the duplicate assignment.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112538
Approved by: https://github.com/jgong5, https://github.com/Skylion007, https://github.com/eellison
2023-11-02 03:20:18 +00:00
Peter Bell
66c32d099a Use pytree.arg_tree_leaves everywhere (#112394)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112394
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393
2023-10-31 15:57:06 +00:00
Peter Bell
bbd5b935e4 Use pytree.tree_leaves everywhere (#112324)
This changes all the instances I could find of `tree_flatten(...)[0]` or
`x, _ = tree_flatten` to use `tree_leaves`.

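For illustration, a small sketch of the refactor pattern, assuming `torch.utils._pytree` exposes `tree_leaves` (as it does in recent PyTorch):
```
import torch.utils._pytree as pytree

nested = {"a": [1, 2], "b": (3, {"c": 4})}

# Before: flatten and throw away the TreeSpec
leaves, _ = pytree.tree_flatten(nested)

# After: tree_leaves returns only the leaves
leaves = pytree.tree_leaves(nested)
assert leaves == [1, 2, 3, 4]
```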
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112324
Approved by: https://github.com/lezcano
ghstack dependencies: #112327, #112323
2023-10-30 03:39:04 +00:00
Jez Ng
9172c9f03f Fix spelling / capitalization in freezing.py error message (#109347)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109347
Approved by: https://github.com/eellison
ghstack dependencies: #109269
2023-09-18 18:12:20 +00:00
Jez Ng
bab627073a Enable typechecking for _inductor/freezing.py (#109269)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109269
Approved by: https://github.com/eellison
2023-09-18 18:12:18 +00:00
Mu-Chu Lee
30a33b76b9 [AOTInductor] Include constants in AOTInductor .so file. (#108473)
Summary:
Include constants in AOTInductor .so file.
Notable differences:
1) Serialize with ctypes instead of the native torch.storage serialization.
2) Use the underlying for_blob instead of from_blob to construct the Tensor.

Test Plan:
Unit tests:
```
test/inductor/test_aot_inductor.py
```

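As a rough illustration of point 1 above, a hedged sketch of pulling a tensor's raw storage bytes via ctypes; the helper name and approach are illustrative, not the actual AOTInductor code:
```
import ctypes
import torch

def tensor_raw_bytes(t: torch.Tensor) -> bytes:
    # Illustrative only: read the contiguous CPU storage through ctypes
    # rather than going through torch.storage serialization.
    t = t.contiguous().cpu()
    nbytes = t.untyped_storage().nbytes()
    return bytes((ctypes.c_char * nbytes).from_address(t.data_ptr()))
```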
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108473
Approved by: https://github.com/angelayi
2023-09-08 03:49:53 +00:00
eellison
ed92d9345e Refactorings for constant folding (#108450)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108450
Approved by: https://github.com/jansel
2023-09-02 03:49:05 +00:00
Bin Bao
06d74e6b24 Revert "[AOTInductor] Include constants in AOTInductor .so file. (#10… (#108349)
This reverts commit c3239442a3 due to internal test failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108349
Approved by: https://github.com/aakhundov, https://github.com/zhxchen17
2023-08-31 16:26:02 +00:00
Mu-Chu Lee
c3239442a3 [AOTInductor] Include constants in AOTInductor .so file. (#107718)
Summary:
Include the constants in the AOTInductor .so file.
We do not modify existing API signatures; instead, we create the necessary format with the weights lifted out.

Test Plan:
test/inductor/test_aot_inductor.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107718
Approved by: https://github.com/angelayi, https://github.com/eellison
2023-08-29 22:37:30 +00:00
leslie-fang-intel
25678e31dc [Quant][Inductor] Enable quantized conv weight prepack inside inductor constant folding (#104581)
**Summary**
Enable quantized conv weight prepack inside inductor constant folding.

**Test Plan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d_unary
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104581
Approved by: https://github.com/jgong5, https://github.com/eellison
ghstack dependencies: #104580
2023-08-25 17:37:41 +00:00
eellison
606e3c297b conv-bn folding in low precision (#106576)
Batchnorm inference is done in fp32 if the inputs are fp16/bf16, and the output is cast back down to its original precision. This causes the batchnorm weights to get constant folded to fp32, which prevented Conv-BN folding from firing.
```
 def forward(self, arg0_1: bf16[32, 3, 3, 3], arg1_1: bf16[32], arg2_1: bf16[32], ...)
     convolution: bf16[3, 32, 15, 15] = torch.ops.aten.convolution.default(arg6_1, arg0_1, None, [2, 2], [0, 0], [1, 1], False, [0, 0], 1);  arg6_1 = arg0_1 = None
     # weight upcasting
     convert_element_type: f32[32] = torch.ops.prims.convert_element_type.default(arg3_1, torch.float32);  arg3_1 = None
     convert_element_type_1: f32[32] = torch.ops.prims.convert_element_type.default(arg4_1, torch.float32);  arg4_1 = None
     ...
     # end of batch norm
     add_1: f32[3, 32, 15, 15] = torch.ops.aten.add.Tensor(mul_2, unsqueeze_7);  mul_2 = unsqueeze_7 = None
     # output downcast
     convert_element_type_2: bf16[3, 32, 15, 15] = torch.ops.prims.convert_element_type.default(add_1, torch.bfloat16);  add_1 = None
```

I mark the convolutions that are followed by binary foldable ops in a higher precision which are then converted back down to the original conv dtype. We fold the weights in fp32 because it gives slightly better accuracy, then at the end of the pass convert the weights back to their original dtype.

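A hedged sketch of the folding math, assuming a conv with no bias; this is not the inductor pass itself, just the fold-in-fp32-then-downcast idea:
```
import torch

def fold_bn_into_conv(conv_w, bn_w, bn_b, bn_mean, bn_var, eps=1e-5):
    # Fold in fp32 for accuracy, then cast back to the conv's original dtype.
    orig_dtype = conv_w.dtype
    w = conv_w.to(torch.float32)
    scale = bn_w.to(torch.float32) / torch.sqrt(bn_var.to(torch.float32) + eps)
    folded_w = w * scale.reshape(-1, 1, 1, 1)
    folded_b = bn_b.to(torch.float32) - bn_mean.to(torch.float32) * scale
    return folded_w.to(orig_dtype), folded_b.to(orig_dtype)
```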
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106576
Approved by: https://github.com/XiaobingSuper, https://github.com/yanboliang
2023-08-10 05:12:04 +00:00
PyTorch MergeBot
dc7ec4c843 Revert "conv-bn folding in low precision (#106576)"
This reverts commit c21df02ec0.

Reverted https://github.com/pytorch/pytorch/pull/106576 on behalf of https://github.com/kit1980 due to breaking internal builds, see D48144191 ([comment](https://github.com/pytorch/pytorch/pull/106576#issuecomment-1670768310))
2023-08-09 06:51:54 +00:00
Elias Ellison
c21df02ec0 conv-bn folding in low precision (#106576)
Batchnorm inference is done in fp32 if the inputs are fp16/bf16, and the output is cast back down to its original precision. This causes the batchnorm weights to get constant folded to fp32, which prevented Conv-BN folding from firing.
```
 def forward(self, arg0_1: bf16[32, 3, 3, 3], arg1_1: bf16[32], arg2_1: bf16[32], ...)
     convolution: bf16[3, 32, 15, 15] = torch.ops.aten.convolution.default(arg6_1, arg0_1, None, [2, 2], [0, 0], [1, 1], False, [0, 0], 1);  arg6_1 = arg0_1 = None
     # weight upcasting
     convert_element_type: f32[32] = torch.ops.prims.convert_element_type.default(arg3_1, torch.float32);  arg3_1 = None
     convert_element_type_1: f32[32] = torch.ops.prims.convert_element_type.default(arg4_1, torch.float32);  arg4_1 = None
     ...
     # end of batch norm
     add_1: f32[3, 32, 15, 15] = torch.ops.aten.add.Tensor(mul_2, unsqueeze_7);  mul_2 = unsqueeze_7 = None
     # output downcast
     convert_element_type_2: bf16[3, 32, 15, 15] = torch.ops.prims.convert_element_type.default(add_1, torch.bfloat16);  add_1 = None
```

I mark the convolutions that are followed by binary foldable ops in a higher precision which are then converted back down to the original conv dtype. We fold the weights in fp32 because it gives slightly better accuracy, then at the end of the pass convert the weights back to their original dtype.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106576
Approved by: https://github.com/XiaobingSuper, https://github.com/yanboliang
ghstack dependencies: #106471, #106575
2023-08-07 01:30:47 +00:00
Elias Ellison
0010a8f753 Deallocate constant when it is no longer needed in constant folding (#106216)
Differential Revision: [D47881214](https://our.internmc.facebook.com/intern/diff/D47881214)

Tested locally with:
```
@torch.compile()
def foo():
    size_gb = 1
    size_bytes = size_gb * 1024 * 1024 * 1024 * 20

    # Allocate the tensor on the GPU
    tensor = torch.empty(size_bytes // 4, device='cuda')  # Divide by 4 to allocate float32 elements

    for _ in range(10):
        tensor = tensor + 1

    return tensor

foo()
```
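A minimal sketch (not the actual inductor pass) of the deallocate-when-done idea, assuming fx-style nodes that expose `all_input_nodes` and `users`:
```
def fold_with_eager_dealloc(nodes, run_node):
    env = {}
    for node in nodes:
        env[node] = run_node(node, env)
        # Free inputs once every one of their users has been evaluated,
        # so large folded constants don't pile up during the pass.
        for inp in node.all_input_nodes:
            if inp in env and all(user in env for user in inp.users):
                del env[inp]
    return env
```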
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106216
Approved by: https://github.com/Skylion007
2023-07-31 18:20:22 +00:00
Elias Ellison
27ece5fad4 [Easy] remove unneeded sort (#106090)
This isn't needed now that we call stable_topological_sort in `freezing_passes`. The non-stable sort can also hurt perf.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106090
Approved by: https://github.com/Chillee, https://github.com/Skylion007
2023-07-27 19:09:48 +00:00
XiaobingSuper
854fe470cd fix check issue for replace_params_with_constants (#105909)
Fix a check issue in replace_params_with_constants to make llama model const folding work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105909
Approved by: https://github.com/jgong5, https://github.com/eellison
2023-07-26 12:04:01 +00:00
XiaobingSuper
9c1802f8e3 inductor: using binary folding path to do conv+bn folding (#105650)
This path uses binary folding to do conv+bn folding, avoiding `make_fx`, which hits tracing errors in some models' dynamic shape paths.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105650
Approved by: https://github.com/eellison
2023-07-26 07:37:47 +00:00
XiaobingSuper
d09195ce82 inductor: fix fx tracing error for freezing pass (#105133)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105133
Approved by: https://github.com/eellison
2023-07-18 10:40:22 +00:00
XiaobingSuper
38c1e86ee2 inductor: make sure as_strided ops' input layout is not changed after converting conv's weight format (#105122)
For the freezing path, if we convert conv's weight to channels last, we need to make sure as_strided ops' input layout is not changed.

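A small illustration of why the layout matters, sketched under the assumption that the as_strided call keeps its original size/stride arguments:
```
import torch

x = torch.randn(1, 4, 4, 4)
y = x.to(memory_format=torch.channels_last)

# as_strided reads raw storage, so the same size/stride arguments give
# different values once the underlying layout has changed.
a = x.as_strided((1, 4, 4, 4), x.stride())
b = y.as_strided((1, 4, 4, 4), x.stride())
print(torch.equal(a, b))  # generally False
```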
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105122
Approved by: https://github.com/jgong5, https://github.com/shunting314
2023-07-18 09:26:54 +00:00
Jerry Zhang
7b4d080496 [quant][pt2e] Rename _pt2e to pt2e (#104668)
Summary:
X-link: https://github.com/pytorch/executorch/pull/3

att

Test Plan: Imported from OSS

Differential Revision: D47202807

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104668
Approved by: https://github.com/andrewor14
2023-07-15 06:34:17 +00:00
XiaobingSuper
22520964ae inductor: convert view to reshape before doing fake_tensor_prop at freezing step (#104612)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104612
Approved by: https://github.com/jgong5, https://github.com/eellison, https://github.com/shunting314
2023-07-06 04:27:50 +00:00
XiaobingSuper
e802900bdc inductor: move the CPU weight packing path after AOTAutograd (#103851)
Next steps:
1. Support the amp path for applying more fusion.
2. Support the dynamic shape path for applying more fusion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103851
Approved by: https://github.com/jgong5, https://github.com/eellison
2023-07-06 00:48:35 +00:00
Shunting Zhang
98f00f881f [inductor] convert layout of conv weight ahead of time for inference (#103642)
This PR handles inference. Will do similar thing for training later.

Some manual testing results show this can improve inference perf by 2-3% (absolute improvement, not relative).
- convmixer: 4.285x -> 4.309x
- resnet50: 2.170x -> 2.203x

The PR is built upon freezing: without freezing, the weight input of a conv node may not be a parameter directly but the output of precision-converting ops, so it is much easier to implement this after freezing.

Commands
```
TORCHINDUCTOR_FREEZING=1 python benchmarks/dynamo/timm_models.py --backend inductor --amp --performance --only convmixer_768_32 --inference
```

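A hedged sketch of the underlying idea in eager terms: once weights are frozen constants, their memory format can be converted once, ahead of time, instead of at every call:
```
import torch

conv = torch.nn.Conv2d(3, 32, 3).eval()
with torch.no_grad():
    # Convert the weight to channels_last once, up front.
    conv.weight.data = conv.weight.data.to(memory_format=torch.channels_last)

x = torch.randn(8, 3, 224, 224).to(memory_format=torch.channels_last)
out = conv(x)
```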
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103642
Approved by: https://github.com/eellison
2023-06-28 17:42:32 +00:00
Elias Ellison
05ebd538d4 Inference Horizontal Fuse Addmm (#100746)
Gives 1.5% improvement on PegasusForCausalLM

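A rough sketch of what horizontal addmm fusion means, purely for illustration (not the inductor pattern itself): several addmms reading the same input can be computed as one addmm over concatenated weights, then split:
```
import torch

x = torch.randn(16, 64)
w1, b1 = torch.randn(32, 64), torch.randn(32)
w2, b2 = torch.randn(48, 64), torch.randn(48)

# Unfused: two addmm calls over the same input.
y1 = torch.addmm(b1, x, w1.t())
y2 = torch.addmm(b2, x, w2.t())

# Fused horizontally: one addmm, then split the result.
y = torch.addmm(torch.cat([b1, b2]), x, torch.cat([w1, w2]).t())
y1f, y2f = y.split([32, 48], dim=1)
assert torch.allclose(y1, y1f, atol=1e-4) and torch.allclose(y2, y2f, atol=1e-4)
```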
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100746
Approved by: https://github.com/jansel
2023-06-28 01:08:37 +00:00
Elias Ellison
edc9c0df7e Fold Conv-Bn (#100653)
Adds Conv-BN folding to inductor freezing. One thing that's a little awkward now is that we'll want different decompositions to run depending on whether we are in the inference compiler. For now, I require that you run with torch.no_grad() so we can detect that no gradients are required before calling aot_autograd.

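A hedged usage sketch of the no_grad requirement; the `config.freezing` knob is the one mentioned in the Inductor Freezing PR below, though the exact spelling may differ by version:
```
import torch
import torch._inductor.config as inductor_config

inductor_config.freezing = True  # enable the freezing pass

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8), torch.nn.ReLU()
).eval()
compiled = torch.compile(model)

# Conv-BN folding requires an inference graph, hence torch.no_grad().
with torch.no_grad():
    out = compiled(torch.randn(1, 3, 32, 32))
```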
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100653
Approved by: https://github.com/jansel
2023-06-26 16:04:34 +00:00
Elias Ellison
1b16ac7481 Add A Pass to Fold Tensors With a Uniform Value, match sdpa on a few models (#103600)
Adds a Constant Folding pass to the joint graph only targeting tensors which can be replaced with a single value, and then removes no-ops from the graph. This allows us to match sdpa in BertForMaskedLM, AlbertForMaskedLM, and LayoutLMForMaskedLM.

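A minimal sketch of the single-value check, not the actual pass (which works on fx graph constants):
```
import torch

def try_fold_uniform(t: torch.Tensor):
    # If every element equals the first one, the tensor can be represented
    # by a single value; downstream no-op patterns (x + 0, x * 1, ...) can
    # then be removed from the graph.
    if t.numel() == 0:
        return None
    first = t.flatten()[0]
    if bool(torch.all(t == first)):
        return torch.full_like(t, first.item())
    return None
```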
BertForMaskedLM
Perf: 1.6853 -> 1.933, Memory: 0.9462 -> 1.41

AlbertForMaskedLM
Perf: 1.6620 -> 1.761, Memory: 1.257 -> 1.94

LayoutLMForMaskedLM
Perf: (non cudagraphs) 1.6991 -> 1.939x, Memory: 0.9624 -> 1.50

MobileBertForMaskedLM
Perf: 1.864x -> 1.941x, Memory: 0.94 -> 1.03

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103600
Approved by: https://github.com/jansel
2023-06-17 16:50:51 +00:00
Elias Ellison
25b6b95b2e Fix freezing tests (#103531)
Workaround for https://github.com/pytorch/pytorch/issues/103532

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103531
Approved by: https://github.com/desertfire
2023-06-13 22:51:48 +00:00
Elias Ellison
d083d444ff Inductor Freezing (#100652)
Adds a freezing pass, gated by inductor `config.freezing`, that constant folds parameters. This runs after functionalization in aot_autograd so that dispatching has been captured and passes can operate on the functionalized graph. A few notes:

- There is an option to discard parameters, `config.freezing_discard_parameters`, which takes the current eager modules and wraps their parameters in a Tensor subclass that errors if used (a hedged sketch follows below).
- I needed to expose flat_params in aot_autograd in order to discard old references when we constant fold away parameters, like with amp. I also exposed `fw_metadata` to avoid constant folding mutated parameters.
- Caching parameter transformations/constant folding across different inferences is not yet implemented.
- Checking the version_counter of constant folded params is not yet implemented.

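A hypothetical sketch of the discard-parameters idea from the first bullet; the class name and mechanism here are illustrative, not the actual implementation:
```
import torch

class DiscardedParameter(torch.Tensor):
    # Once freezing has folded a parameter into the compiled graph, the
    # eager copy can be swapped for a subclass that errors on any use.
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        raise RuntimeError(
            "This parameter was discarded by freezing "
            "(config.freezing_discard_parameters)."
        )
```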
I'm not really sure what the actual naming should be. In jit there was both "freezing", which was platform agnostic, and "optimize for inference", which made device specific optimizations. We're doing the latter here but maybe freezing is a better name.

Differential Revision: [D46244033](https://our.internmc.facebook.com/intern/diff/D46244033)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100652
Approved by: https://github.com/jansel
2023-06-12 20:56:03 +00:00