Make it easier to serialize patterns by adding `pattern_matcher.gen_register_replacement()`, which is like `pattern_matcher.register_replacement()` but also requires the replacement to be precompiled.
To precompile patterns (and save to disk) run:
```
torchgen/fuse_attention_patterns/gen_attention_patterns.py
```
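A minimal sketch of how a pattern might be registered with the new API, assuming a call shape that mirrors `register_replacement()` with a unique pattern name prepended; the argument order and the helpers used here (`fwd_only`, `PatternMatcherPass`) are assumptions based on the description above, not a verified reference:
```
import torch
from torch._inductor.pattern_matcher import (
    PatternMatcherPass,
    fwd_only,
    gen_register_replacement,
)

def mm_search(a, b):
    return torch.mm(a, b)

def mm_replace(a, b):
    return torch.matmul(a, b)

example_inputs = [torch.empty(2, 2), torch.empty(2, 2)]
patterns = PatternMatcherPass()

# Unlike register_replacement, this call is keyed by a unique name and expects
# a precompiled/serialized copy of the pattern (produced by the script above)
# to already exist on disk.
gen_register_replacement(
    "mm_pattern",      # unique key for the serialized pattern (assumed name)
    mm_search,
    mm_replace,
    example_inputs,
    fwd_only,
    patterns,
)
```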
- Updated the sfdp patterns to use `gen_register_replacement`.
- Added serialized patterns for mm_pattern and bmm_pattern (the 'misc' patterns don't serialize cleanly, so they can't be added).
- Updated the testing so it checks that round-tripped patterns match, not just that they serialize to the same text (toy illustration below).
- Checking that the patterns round-trip properly revealed that the `users` field wasn't being serialized correctly.
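A toy illustration in plain Python (not the pattern-matcher serialization code) of why the round-trip check is stronger: a serializer that silently drops a field, like the `users` field mentioned above, still "serializes the same way" but fails the round-trip comparison.
```
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    users: int = 1   # the kind of field that was being lost

def serialize(n: Node) -> str:
    # Buggy toy serializer that forgets `users`.
    return f"Node(name={n.name!r})"

def deserialize(text: str) -> Node:
    return eval(text)  # acceptable only in this toy example

original = Node("mm", users=2)
text = serialize(original)
print(serialize(deserialize(text)) == text)  # True: same serialized text
print(deserialize(text) == original)         # False: `users` was dropped
```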
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121313
Approved by: https://github.com/eellison
There are some remaining issues in the previous PR (https://github.com/pytorch/pytorch/pull/120985), which added support for the int8 WOQ mm pattern matcher. This PR further optimizes it.
1. New patterns are added to match int8 WOQ mm in the gpt-fast model, which uses different input layouts (see the sketch after this list).
2. In constant folding, the `int8_weight -> dq -> bf16_weight` chain should be kept so the pattern can still match.
3. Currently, GPT-Fast enables `coordinate_descent_tuning` for CPU. This flag is only useful for CUDA, but it can change the graph from the non-decomposed fallback path to the decomposed one. We will disable the flag in the GPT-Fast script for CPU in order to keep the patterns clean. @yanbing-j
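A minimal sketch of the computation these patterns target, plus the config flag from item 3. The shapes and scale handling are illustrative assumptions, not the exact gpt-fast layouts; `torch._inductor.config.coordinate_descent_tuning` is an existing Inductor flag.
```
import torch
import torch._inductor.config

def woq_int8_mm(x: torch.Tensor, w_int8: torch.Tensor, scales: torch.Tensor):
    # int8_weight -> dq -> bf16_weight: this chain must survive constant
    # folding so the pattern matcher can still see it.
    w_bf16 = w_int8.to(torch.bfloat16) * scales
    return torch.nn.functional.linear(x, w_bf16)

x = torch.randn(8, 64, dtype=torch.bfloat16)
w = torch.randint(-128, 127, (32, 64), dtype=torch.int8)
scales = torch.randn(32, 1, dtype=torch.bfloat16)
print(woq_int8_mm(x, w, scales).shape)  # torch.Size([8, 32])

# GPT-Fast currently enables this flag; it only helps CUDA and changes the CPU
# graph, so the GPT-Fast script will disable it for CPU to keep patterns stable.
torch._inductor.config.coordinate_descent_tuning = False
```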
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122955
Approved by: https://github.com/jgong5, https://github.com/jansel
Summary:
Add runtime constant folding for AOTInductor.
This also includes invoking constant folding at load time.
The constant-folding lowering is a two-step process.
First, we split the graph into two modules. One of them is the constant module, which doesn't depend on any input, so the whole module can be inferred (constant-folded) once and reused. The constant module is lowered and codegen-ed as usual and cached (let's call this the constant code). The constant code reuses the whole lowering/profiling/etc. process; the only difference is that we do not generate any headers or initialization for it.
Second, after handling the constant module, we take care of the main module (the part that depends on the user input). Compared with a normal lowering, the main module takes in one additional component: the constant code. The additional step here is that we inject the constant code into the codegen-ed main module and create a caller for the main module to consume the results of the constant module.
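An illustrative sketch of the first step in plain torch.fx, not the AOTInductor implementation: walk the graph and mark the nodes that do not depend on any placeholder input; those nodes form the constant module that can be evaluated once and reused.
```
import torch
import torch.fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(4, 4))

    def forward(self, x):
        folded = self.w + 1.0   # input-independent: constant-foldable
        return x @ folded       # depends on user input: main module

gm = torch.fx.symbolic_trace(M())

depends_on_input = set()
for node in gm.graph.nodes:
    if node.op == "placeholder":
        depends_on_input.add(node)
    elif any(arg in depends_on_input for arg in node.all_input_nodes):
        depends_on_input.add(node)

constant_nodes = [
    n for n in gm.graph.nodes
    if n not in depends_on_input and n.op not in ("placeholder", "output")
]
print([n.name for n in constant_nodes])  # e.g. ['w', 'add'] for this toy graph
```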
Test Plan: Unit tests included in commit.
Differential Revision: D53274382
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118765
Approved by: https://github.com/chenyang78
This PR fixes two bugs:
1) Constant folding a Triton kernel results in the kernel's inputs being returned back without any modification, so constant folding is disabled for Triton kernels. This needs more investigation.
2) NoneLayout buffers should not be deleted, as they do not exist.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115908
Approved by: https://github.com/aakhundov, https://github.com/jansel
Summary:
Two nodes can point to the same attribute via node.target.
This makes sure that (see the sketch below):
- we don't try to delete an already-deleted attribute, i.e. we delete each attribute only once
- we do delete all the nodes pointing to the attribute
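An illustrative torch.fx sketch of the situation, not the ExecuTorch pass itself: build a graph where two get_attr nodes share the target "w", then delete the attribute exactly once while erasing every node that points to it. The replacement target used to detach users is just a stand-in; the real pass substitutes the fused value.
```
import torch
import torch.fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.ones(3))

    def forward(self, x):
        return x

# Build a graph with two get_attr nodes that point at the same attribute.
graph = torch.fx.Graph()
x = graph.placeholder("x")
w1 = graph.get_attr("w")
w2 = graph.get_attr("w")
inner = graph.call_function(torch.add, (w1, w2))
out = graph.call_function(torch.add, (x, inner))
graph.output(out)
gm = torch.fx.GraphModule(M(), graph)

deleted_attrs = set()
for node in list(gm.graph.nodes):
    if node.op == "get_attr" and node.target == "w":
        if node.target not in deleted_attrs:
            delattr(gm, node.target)       # delete the attribute only once
            deleted_attrs.add(node.target)
        node.replace_all_uses_with(x)      # stand-in so erase_node is legal
        gm.graph.erase_node(node)          # erase every node with this target
gm.recompile()
print(gm.code)  # no references to "w" remain
```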
Test Plan:
```
buck run fbcode//mode/dev-nosan fbcode//executorch/backends/xnnpack/test:test_xnnpack_passes -- executorch.backends.xnnpack.test.passes.test_batch_norm_fusion.TestBatchNormFusion.test_q8_batch_norm_fusion
```
Differential Revision: D51419442
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113957
Approved by: https://github.com/Skylion007
A couple of changes to make constant folding more efficient.
- Because we are replacing nodes that only have a single value, store just that single value instead of the whole tensor for node replacement (toy sketch below).
- torch.fx.Interpreter preserves a Tensor in the env as long as it has remaining uses. That also applies to output uses, but we are not going to constant fold that use. Instead of using the last use for garbage collection, use the last non-output use.
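A toy sketch of the first change, not the actual constant-folding code: when a folded tensor holds a single repeated value, keep only that scalar plus the metadata needed to rebuild it, rather than holding the whole tensor.
```
import torch

def try_compress(t: torch.Tensor):
    # Return (scalar, shape, dtype) if every element of t equals the first one.
    first = t.flatten()[0].item()
    if torch.equal(t, torch.full_like(t, first)):
        return first, tuple(t.shape), t.dtype
    return None

def materialize(scalar, shape, dtype):
    return torch.full(shape, scalar, dtype=dtype)

big = torch.zeros(1024, 1024)
compressed = try_compress(big)   # (0.0, (1024, 1024), torch.float32)
assert torch.equal(materialize(*compressed), big)
```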
If reviewers would prefer I ghstack this because of the code movement, let me know.
Fix for https://github.com/pytorch/pytorch/issues/108388
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108421
Approved by: https://github.com/jansel