I used a couple of type-ignore comments in ir.py because it constructs
short-lived instances of FixedLayout and GraphModuleSerializer, just to
call a single method on them that doesn't use all their members. Making
those unused members optional would make the rest of the code a lot
messier with sprinkled `assert` statements.
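To illustrate the pattern (hypothetical class, not the actual FixedLayout/GraphModuleSerializer signatures): a short-lived instance is built just to call one method, and the members that method never reads get placeholders under a targeted type-ignore.
```
from typing import Any

class Serializer:
    """Hypothetical stand-in; not the real GraphModuleSerializer."""
    def __init__(self, graph_signature: Any, module_call_graph: list) -> None:
        self.graph_signature = graph_signature
        self.module_call_graph = module_call_graph

    def serialize_operator(self, op: Any) -> str:
        # Only uses its argument; never touches the members above.
        return str(op)

# Short-lived instance: the unused members get placeholders, silenced with a
# targeted ignore instead of making them Optional everywhere else.
name = Serializer(None, None).serialize_operator("aten.add.Tensor")  # type: ignore[arg-type]
```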
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113534
Approved by: https://github.com/albanD
They are used in many contexts that don't actually check if the returned
type is `None`. I have also created `try_get()` for the cases where we
do actually want an Optional type returned.
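A minimal sketch of the get()/try_get() split, using a hypothetical holder class rather than the actual ir.py code: get() asserts and returns a non-Optional value for the many call sites that never check for None, while try_get() returns an Optional for the callers that do.
```
from typing import Optional

class Holder:
    def __init__(self) -> None:
        self._value: Optional[str] = None

    def set(self, value: str) -> None:
        self._value = value

    def get(self) -> str:
        # Callers that assume the value is present get a non-Optional type
        # and a loud failure instead of a silent None.
        assert self._value is not None, "value has not been set"
        return self._value

    def try_get(self) -> Optional[str]:
        # For the few call sites that genuinely handle the missing case.
        return self._value
```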
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113535
Approved by: https://github.com/ezyang
ghstack dependencies: #113412
Summary:
Include constants in the AOTInductor .so file.
Notable differences (see the sketch below):
1) Serialize with ctypes instead of torch.storage's native serialization.
2) Use the underlying for_blob instead of from_blob to construct the Tensor.
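A rough sketch of the ctypes-based serialization (illustrative only, not the exact AOTInductor code; the for_blob construction happens on the C++ side and is not shown):
```
import ctypes
import torch

def tensor_to_bytes(t: torch.Tensor) -> bytes:
    # Read the raw bytes straight out of the tensor's storage via ctypes,
    # instead of going through torch.storage's own serialization.
    t = t.contiguous().cpu()
    nbytes = t.untyped_storage().nbytes()
    raw = (ctypes.c_ubyte * nbytes).from_address(t.untyped_storage().data_ptr())
    return bytes(raw)

payload = tensor_to_bytes(torch.randn(4, 4))  # 16 fp32 elements -> 64 bytes
```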
Test Plan:
Unit tests:
```
test/inductor/test_aot_inductor.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108473
Approved by: https://github.com/angelayi
Summary:
Include the constants in the AOTInductor .so file.
We do not modify existing API signatures; instead, we create the necessary format with the weights lifted out.
Test Plan:
test/inductor/test_aot_inductor.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107718
Approved by: https://github.com/angelayi, https://github.com/eellison
Batchnorm inference is done in fp32 if the inputs are in fp16/bf16, and the output is cast back down to its original precision. This causes the batchnorm weights to get constant folded to fp32, which prevents Conv-BN folding from firing.
```
def forward(self, arg0_1: bf16[32, 3, 3, 3], arg1_1: bf16[32], arg2_1: bf16[32], ...)
convolution: bf16[3, 32, 15, 15] = aten.convolution.default(arg6_1, arg0_1, None, [2, 2], [0, 0], [1, 1], False, [0, 0], 1); arg6_1 = arg0_1 = None
# weight upcasting
convert_element_type: f32[32] = torch.ops.prims.convert_element_type.default(arg3_1, torch.float32); arg3_1 = None
convert_element_type_1: f32[32] = torch.ops.prims.convert_element_type.default(arg4_1, torch.float32); arg4_1 = None
...
# end of batch norm
add_1: f32[3, 32, 15, 15] = aten.add.Tensor(mul_2, unsqueeze_7); mul_2 = unsqueeze_7 = None
# output downcast
convert_element_type_2: bf16[3, 32, 15, 15] = torch.ops.prims.convert_element_type.default(add_1, torch.bfloat16); add_1 = None
```
I mark the convolutions that are followed by binary foldable ops done in a higher precision whose results are then converted back down to the original conv dtype. We fold the weights in fp32 because it gives slightly better accuracy, then at the end of the pass we convert the weights back to their original dtype.
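For reference, a sketch of the folding arithmetic, assuming the standard Conv-BN fusion formula and a bias-free conv (this is not the literal pass code): fold in fp32, then cast back to the conv's original dtype.
```
import torch

def fold_conv_bn_fp32(w_bf16, bn_weight, bn_bias, bn_mean, bn_var, eps=1e-5):
    # Do the folding math in fp32 for slightly better accuracy.
    scale = bn_weight.float() / torch.sqrt(bn_var.float() + eps)
    fused_w = w_bf16.float() * scale.reshape(-1, 1, 1, 1)
    fused_b = bn_bias.float() - bn_mean.float() * scale
    # At the end of the pass, convert the weights back to the conv's dtype.
    return fused_w.to(w_bf16.dtype), fused_b.to(w_bf16.dtype)
```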
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106576
Approved by: https://github.com/XiaobingSuper, https://github.com/yanboliang
ghstack dependencies: #106471, #106575
This PR handles inference. Will do the same for training later.
Some manual testing results show this can improve inference perf by 2-3% (absolute improvement, not relative):
- convmixer: 4.285x -> 4.309x
- resnet50: 2.170x -> 2.203x
The PR is built on top of freezing: without freezing, the weight input for a conv node may not be a parameter directly but the output of precision-converting ops, so it is much easier to implement this PR after freezing.
Commands:
```
TORCHINDUCTOR_FREEZING=1 python benchmarks/dynamo/timm_models.py --backend inductor --amp --performance --only convmixer_768_32 --inference
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103642
Approved by: https://github.com/eellison
Adds Conv-BN folding to inductor freezing. One thing that's a little awkward now is that we'll want different decompositions to run depending on whether we are in the inference compiler. For now, I require that you run with torch.no_grad() so we can detect that no gradients are required before calling aot_autograd.
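A minimal usage sketch under that constraint (model, shapes, and the freezing env var are illustrative): run the compiled model under torch.no_grad() so aot_autograd can tell that no gradients are required.
```
# Run with freezing enabled, e.g. TORCHINDUCTOR_FREEZING=1 python this_script.py
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU()).eval()
compiled = torch.compile(model)

with torch.no_grad():  # required so we can detect no gradients are needed
    out = compiled(torch.randn(1, 3, 32, 32))
```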
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100653
Approved by: https://github.com/jansel
Adds a constant-folding pass to the joint graph that only targets tensors which can be replaced with a single value, then removes no-ops from the graph. This allows us to match sdpa in BertForMaskedLM, AlbertForMaskedLM, and LayoutLMForMaskedLM.
- BertForMaskedLM: Perf 1.6853x -> 1.933x, Memory 0.9462 -> 1.41
- AlbertForMaskedLM: Perf 1.6620x -> 1.761x, Memory 1.257 -> 1.94
- LayoutLMForMaskedLM: Perf (non-cudagraphs) 1.6991x -> 1.939x, Memory 0.9624 -> 1.50
- MobileBertForMaskedLM: Perf 1.864x -> 1.941x, Memory 0.94 -> 1.03
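An illustrative example of the kind of pattern the pass targets (not the pass implementation itself): a subgraph whose result is a tensor with a single repeated value gets folded to that scalar, after which the surrounding op becomes a removable no-op.
```
import torch

def f(x):
    mask = torch.ones(x.shape[-1])  # every element is the same value (1.0)
    return x * mask                 # after folding mask -> 1.0, the multiply is a no-op

x = torch.randn(4, 8)
assert torch.equal(f(x), x)         # the graph is equivalent to just returning x
```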
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103600
Approved by: https://github.com/jansel
Adds a freezing pass, gated by `config.freezing`, that will constant fold parameters in inductor. It runs post-functionalization in aot_autograd so that dispatching has already been captured and the passes operate on the functionalized graph. A few notes:
- There is an option to discard parameters, `config.freezing_discard_parameters`, which will take the current eager modules and wrap their parameters in a Tensor subclass that errors if used (see the sketch after this list).
- I needed to expose flat_params in aot_autograd in order to discard old references when we constant fold away parameters, like with amp. I also exposed `fw_metadata` to avoid constant folding mutated parameters.
- Caching parameter transformations/constant folding across different inference runs is not yet implemented.
- Checking the version_counter of constant-folded params is not yet implemented.
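A minimal sketch of the "error if used" wrapper mentioned in the notes above (hypothetical; not the actual subclass behind `config.freezing_discard_parameters`):
```
import torch

class ErroringParameter(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Any torch op on a discarded parameter fails loudly.
        raise RuntimeError(f"{func} called on a parameter discarded by freezing")

p = torch.randn(4).as_subclass(ErroringParameter)
# p + 1  # would raise RuntimeError: the eager parameter was discarded
```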
I'm not really sure what the actual naming should be. In jit there was both "freezing", which was platform-agnostic, and "optimize for inference", which made device-specific optimizations. We're doing the latter here, but maybe freezing is a better name.
Differential Revision: [D46244033](https://our.internmc.facebook.com/intern/diff/D46244033)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100652
Approved by: https://github.com/jansel