Commit Graph

1228 Commits

Michael Lazos
e72ef4f22a Fix capturable enablement conditions (#125826)
Only enable capturable if state hasn't been initialized and all parameters are on CUDA.
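A minimal sketch of the condition described above (the helper name `_should_enable_capturable` is hypothetical, not the PR's code; `optimizer.state` and `param_groups` are the standard `torch.optim.Optimizer` attributes):
```python
import torch

# Hedged sketch of the enablement rule: capturable only when optimizer
# state is uninitialized and every parameter lives on CUDA.
def _should_enable_capturable(optimizer: torch.optim.Optimizer) -> bool:
    state_uninitialized = len(optimizer.state) == 0
    all_params_on_cuda = all(
        p.is_cuda
        for group in optimizer.param_groups
        for p in group["params"]
    )
    return state_uninitialized and all_params_on_cuda
```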

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125826
Approved by: https://github.com/anijain2305
ghstack dependencies: #125825
2024-05-11 06:29:59 +00:00
Animesh Jain
a7575e8bd5 [dynamo] Use correct source for custom getattr (#125828)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125828
Approved by: https://github.com/williamwen42
2024-05-09 20:37:23 +00:00
William Wen
ae20f15941 [dynamo] trace through nn parametrize (#125771)
Fix https://github.com/pytorch/pytorch/issues/120914

Example dynamo output graph (from test_nn_parametrize):
```
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code] TRACED GRAPH
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]  ===== __compiled_fn_1 =====
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]  /data/users/williamwen/pytorch2/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.Module):
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]     def forward(self, L_x_: "f32[10, 10]"):
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]         l_x_ = L_x_
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]         # File: /data/users/williamwen/pytorch2/torch/nn/utils/parametrize.py:275 in forward, code: x = self[0](self.original)
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]         l__self___parametrizations__param___original: "f32[10, 10]" = self.L__self___parametrizations__param___original
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]         # File: /data/users/williamwen/pytorch2/test/dynamo/test_repros.py:4759 in forward, code: return torch.sin(x)
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]         x: "f32[10, 10]" = torch.sin(l__self___parametrizations__param___original);  l__self___parametrizations__param___original = None
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]         # File: /data/users/williamwen/pytorch2/test/dynamo/test_repros.py:4755 in forward, code: return self.param @ x
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]         matmul: "f32[10, 10]" = x @ l_x_;  x = l_x_ = None
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]         return (matmul,)
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125771
Approved by: https://github.com/jbschlosser
ghstack dependencies: #125710, #125724
2024-05-09 17:43:48 +00:00
William Wen
ff090c6937 [dynamo] support tracing nn.Module @property that accesses closure cells (#125724)
Fix https://github.com/pytorch/pytorch/issues/125702

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125724
Approved by: https://github.com/jansel, https://github.com/jbschlosser
ghstack dependencies: #125710
2024-05-08 23:25:39 +00:00
William Wen
93f3d561f9 [dynamo] don't make nn parametrized Modules unspecialized (#125710)
Workaround for https://github.com/pytorch/pytorch/issues/125314 and https://github.com/pytorch/pytorch/issues/125478.

We no longer make parametrized nn.Modules unspecialized. Instead, when we are about to call a function from the `torch.nn.utils.parametrize` module, we skip the frame.
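For context, a minimal sketch of what a parametrized module looks like (standard `torch.nn.utils.parametrize` usage; the `Symmetric` parametrization is illustrative):
```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class Symmetric(nn.Module):
    def forward(self, X):
        # Build a symmetric matrix from the unconstrained original weight.
        return X.triu() + X.triu(1).transpose(-1, -2)

m = nn.Linear(4, 4)
parametrize.register_parametrization(m, "weight", Symmetric())

# Accessing m.weight now calls into torch.nn.utils.parametrize; with this
# PR, dynamo skips those frames instead of unspecializing the module.
out = torch.compile(m)(torch.randn(2, 4))
```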

The script from https://github.com/pytorch/pytorch/issues/125314 now outputs
```
parametrize=True: 6587ms
parametrize=False: 1729ms
parametrize=True: 4497ms
parametrize=False: 1539ms
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125710
Approved by: https://github.com/jansel, https://github.com/jbschlosser
2024-05-08 23:25:39 +00:00
Simon Fan
7e0edafe86 [compiled autograd][dynamo] improve lifted autograd.Function.backward handling and fallback to pseudo-eager (#125661)
- `FakeContext` hides all fields other than `ctx.saved_tensors`; Dynamo errors when `autograd.Function.backward` uses other attributes on `ctx`, and it also doesn't allow fallback to eager (see the sketch after this list).
- If we remove it, we still can't fall back to eager: node variables are already freed (`ctx.saved_tensors` throws).
- However, we can fall back to "pseudo-eager" by using a duck-typed ctx and routing `ctx.saved_tensors` to the lifted tensors.
- Dynamo tries to inline `external_utils.call_backward` and treats `BackwardCFunction` as an `AutogradFunctionContextVariable` (only used up until we create the fake context: `FakeBackwardCFunction`).
- We `call_function` backward from the forward class `AutogradFunctionVariable`, and we still pass in the fake context as a `UserDefinedObjectVariable` (can later use `AutogradFunctionContextVariable` + HOO graph speculation).
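A minimal sketch (illustrative, not the PR's code) of the kind of `backward` this improves handling for: it reads `ctx.factor`, an attribute beyond `saved_tensors` that the old `FakeContext` would hide.
```python
import torch

class ScaledSquare(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, factor):
        ctx.save_for_backward(x)
        ctx.factor = factor          # extra attribute beyond saved_tensors
        return x.pow(2) * factor

    @staticmethod
    def backward(ctx, grad):
        (x,) = ctx.saved_tensors     # routed to the lifted tensors under compile
        return grad * 2 * x * ctx.factor, None  # reading ctx.factor used to error
```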

Fixes #125489 and #124827

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125661
Approved by: https://github.com/jansel
2024-05-08 21:00:37 +00:00
ydwu4
461ffaaaf3 [dynamo] support torchbind object input (#124978)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124978
Approved by: https://github.com/jansel
2024-05-07 03:02:00 +00:00
Peter Bell
24b64fc482 [HOP][inductor] Support pytrees as associative_scan input (#122137)
This allows `associative_scan` to take an arbitrary pytree of tensors, which is flattened to its leaves before calling the `associative_scan` higher order operator.

I also add support in inductor to generate code for scanning over sequences
of tensors.
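A minimal sketch of the flattening step (using the `torch.utils._pytree` helpers; the `cumsum` stand-in is illustrative, not the operator's implementation):
```python
import torch
from torch.utils._pytree import tree_flatten, tree_unflatten

xs = {"a": torch.randn(4), "b": (torch.randn(4), torch.randn(4))}
leaves, spec = tree_flatten(xs)            # three leaf tensors
scanned = [t.cumsum(0) for t in leaves]    # stand-in for the actual scan
result = tree_unflatten(scanned, spec)     # same dict/tuple structure as xs
```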

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122137
Approved by: https://github.com/lezcano, https://github.com/Chillee
ghstack dependencies: #119430
2024-05-06 11:29:28 +00:00
Aaron Gokaslan
1dd42e42c4 [BE]: Try TCH autofixes on torch/ (#125536)
Tries ruff's TCH (flake8-type-checking) autofixes and sees what breaks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125536
Approved by: https://github.com/ezyang
2024-05-05 23:13:59 +00:00
Edward Z. Yang
650a248d3e Rename is_unspecialized to pass_arg_as_tensor, add comment (#125496)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125496
Approved by: https://github.com/lezcano
ghstack dependencies: #125395, #125419, #125483, #125494
2024-05-05 16:57:50 +00:00
Edward Z. Yang
12da7ee58f Don't use wrap_fx_proxy_cls for wrap_symint (#125494)
We use very little of the code in wrap_fx_proxy_cls, so dupe it out.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125494
Approved by: https://github.com/lezcano
ghstack dependencies: #125395, #125419, #125483
2024-05-05 16:57:50 +00:00
Edward Z. Yang
617e473da5 Split wrap_symint out of wrap_unspecialized_primitive (#125483)
While there are some similarities, they are also quite different (one
handles NumPy numbers while the other handles ints). I am also going to
add a wrap_symfloat soon, which will do even more different behavior.
So split these out for clarity.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125483
Approved by: https://github.com/lezcano
ghstack dependencies: #125395, #125419
2024-05-05 16:57:50 +00:00
Animesh Jain
5ba777f46e [guards][cpp-guards] Optimize NN module getattr guards (#124522)
Improves the guard overhead of the MobileBert model with nn module guards from 92000 units to 20000 units.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124522
Approved by: https://github.com/jansel
ghstack dependencies: #125439, #125421
2024-05-04 22:08:56 +00:00
William Wen
f2ab96a57e [dynamo] fix crash when context manager is passed to a function (#125321)
Fix https://github.com/pytorch/pytorch/issues/125274. The main change was to reconstruct `ContextWrappingVariables` as objects in general, but we can replace them with the class on the caller side when generating the resume function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125321
Approved by: https://github.com/jansel
2024-05-03 23:01:30 +00:00
Edward Z. Yang
e93b57a570 Add propagate_real_tensors mode for unbacked (#125115)
A common complaint when working with data-dependent code in PyTorch is that it's hard to tell how far you are from the finish line: every time a GuardOnDataDependentSymNode error is hit, you have to somehow fix or workaround it to see the next one.

This PR adds a new mode `torch._functorch.config.fake_tensor_propagate_real_tensors` which modifies fake tensors to also propagate real tensors. This means that when we try to guard on a data-dependent SymNode, we can actually produce a real result. We also produce a warning which you should consult to figure out what the crux points are.

I ran this on vision_maskrcnn. In the baseline (without this mode), the model has 27 graph breaks, resulting in 40 graphs. With this mode on, the model has only 11 graph breaks, resulting in 15 graphs (the remaining graph breaks are due to missing functionality for item() on float tensors and some other missing Dynamo features). You get a list of things that would have errored like this:

```
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u1) < 2) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u1), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u0) < 2) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u0), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u1) < 2) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u1), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u0) < 2) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u0), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u1) < 2) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u1), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u0) < 2) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> False
```
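A minimal sketch of turning the mode on (the config name is taken from this PR; the toy data-dependent function is illustrative):
```python
import torch
import torch._dynamo.config
import torch._functorch.config

torch._dynamo.config.capture_scalar_outputs = True
# The mode this PR adds: fake tensors also carry real values, so
# data-dependent guards evaluate (with a warning) instead of raising.
torch._functorch.config.fake_tensor_propagate_real_tensors = True

@torch.compile(fullgraph=True)
def f(x):
    n = x.sum().item()   # unbacked SymInt for an integer tensor
    if n > 0:            # would otherwise hit GuardOnDataDependentSymNode
        return x + n
    return x - n

print(f(torch.ones(4, dtype=torch.int64)))
```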

Potential later follow ups:

* Improve the warning messages (in particular, they should provide user frames)
* GC real tensors when they are no longer needed by tracing. Right now, this will use A LOT of memory, as if your GC were broken and every intermediate tensor were kept live

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125115
Approved by: https://github.com/IvanKobzarev
2024-05-02 15:28:26 +00:00
laithsakka
0b70026d3b Do not pass none to has_pending_mutation (#125359)
Fixes https://github.com/pytorch/pytorch/issues/125315

Several failures when inlining of nn modules is enabled are due to passing None to has_pending_mutation. From the previous code, it sounds like the variable is expected to be None when not found; in that case we should skip it and not call has_pending_mutation.
This is tested in https://github.com/pytorch/pytorch/pull/125354

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125359
Approved by: https://github.com/mlazos
2024-05-02 09:08:22 +00:00
Brian Hirsh
7058563078 support as_python_constant on PlacementClassVariable (#124398)
Fixes an error for torchtitan + internal

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124398
Approved by: https://github.com/ezyang, https://github.com/wanchaol, https://github.com/yoyoyocmu
2024-05-01 21:56:01 +00:00
Avik Chaudhuri
746da8755c switch tests from constrain_as* to torch._check* (#125253)
To fix data-dependent errors we want to recommend that people use `torch._check*` APIs. The `constrain_as*` APIs should be fully subsumed by them, and in the future we should kill them entirely.
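A minimal before/after sketch of the switch (`torch._check` and `torch._check_is_size` are the recommended APIs; the toy function is illustrative):
```python
import torch

def make_buffer(x):
    u = x.max().item()
    torch._check_is_size(u)   # replaces a constrain_as_size call
    torch._check(u < 100)     # replaces a constrain_as_value bound
    return torch.zeros(u)

print(make_buffer(torch.tensor([2, 5, 3])).shape)  # torch.Size([5])
```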

Differential Revision: D56774333

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125253
Approved by: https://github.com/ezyang
2024-05-01 21:01:27 +00:00
Edward Z. Yang
b4ccc615cd Do exact type match on int so we don't pick up bool here too (#125305)
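The underlying Python fact, for reference: `bool` is a subclass of `int`, so an `isinstance` check is too loose.
```python
# isinstance picks up bools because bool subclasses int:
assert isinstance(True, int)
# an exact type match excludes them:
assert type(True) is not int and type(1) is int
```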
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125305
Approved by: https://github.com/Skylion007
2024-05-01 19:46:36 +00:00
William Wen
0506e95433 [dynamo] support inactive context managers across graph breaks (#125203)
Fix https://github.com/pytorch/pytorch/issues/124900.

When we reconstruct `ContextWrappingVariables`s, we only reconstruct the context class, not the object. Normally, contexts are active (via `with ctx:`) and we initialize the context object in the resume function. But for the case of inactive contexts (contexts declared ahead of time before the `with` block), we do not reconstruct them properly in the optimized bytecode or resume function. So this PR adds initialization for inactive contexts in the resume function.
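A minimal sketch of the pattern, in the spirit of the issue (illustrative, not the exact repro): the context object exists before the `with` block, and a graph break separates its creation from its use.
```python
import torch
import torch._dynamo

def fn(x):
    ctx = torch.no_grad()        # inactive context, declared ahead of time
    x = x + 1
    torch._dynamo.graph_break()  # forces a resume function
    with ctx:                    # must be re-initialized in the resume function
        return x.sin()

print(torch.compile(fn)(torch.randn(3)))
```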

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125203
Approved by: https://github.com/jansel
2024-05-01 01:49:09 +00:00
drisspg
25691558d9 Change templated_attention -> flex_attention (#125251)
# Summary

Change all the names

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125251
Approved by: https://github.com/Chillee, https://github.com/yanboliang
2024-05-01 01:08:48 +00:00
Animesh Jain
5e5f890273 [dynamo][source] Remove inspect getattr_static from AttrSource (#125200)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125200
Approved by: https://github.com/jansel
2024-04-30 06:44:25 +00:00
drisspg
8c219251c5 Add backwards support to FlexAttention (#123902)
# Summary
This is part one of adding backwards support to FlexAttention.

This PR focuses on the eager implementation and wiring up enough of templated_attention_backward (name change soon 😉) to get through aot_eager.

Notably, this does not actually wire up the triton template just yet, in order to make this PR easier to review. That will be the next follow-up PR.

#### Structure
We pass both the forward and backward graphs to the backwards HOP, since both need to be inlined into the backwards calculation:
- the forward graph is needed in order to recompute the scores
- the joint graph is needed in order to construct the correct gradients after the softmax_grad calculation

### Attached AOT Graph
https://gist.github.com/drisspg/ce4c041f8df8a5a7983c5174705cf2b5

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123902
Approved by: https://github.com/Chillee
2024-04-29 22:34:22 +00:00
Yanbo Liang
ce503c1b40 Dynamo x autograd.Function supports setup_context (#124802)
Fixes part of #118397
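For reference, a minimal `autograd.Function` written in the `setup_context` style this adds support for (`MySin` is illustrative):
```python
import torch

class MySin(torch.autograd.Function):
    @staticmethod
    def forward(x):                       # no ctx argument in this style
        return torch.sin(x)

    @staticmethod
    def setup_context(ctx, inputs, output):
        (x,) = inputs
        ctx.save_for_backward(x)

    @staticmethod
    def backward(ctx, grad):
        (x,) = ctx.saved_tensors
        return grad * torch.cos(x)

@torch.compile
def f(x):
    return MySin.apply(x)
```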

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124802
Approved by: https://github.com/zou3519
2024-04-27 04:57:13 +00:00
Animesh Jain
fd24d8c05a [dynamo][nn module] Use correct sources for _call_impl (#124970)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124970
Approved by: https://github.com/jansel
ghstack dependencies: #124779, #124627
2024-04-26 23:18:30 +00:00
YangQun1
91d565da0c [dynamo] Add support for tensor's is_complex method (#124927)
This PR adds support for the tensor `is_complex()` method in dynamo. Take the following code as an example:
```python
def test_tensor_is_complex(x):
    if x.is_complex():
        return x + 1
    else:
        return x - 1
```
Before this fix, the `is_complex()` call caused a graph break ("torch.* op returned non-Tensor bool call_method is_complex"). After this fix, the graph break is avoided.
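A quick way to check the behavior (assuming the test function above; `fullgraph=True` would raise if a graph break remained):
```python
import torch

compiled = torch.compile(test_tensor_is_complex, fullgraph=True)
print(compiled(torch.randn(3)))                      # real input: x - 1
print(compiled(torch.randn(3, dtype=torch.cfloat)))  # complex input: x + 1
```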

Fixes #122692

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124927
Approved by: https://github.com/ezyang
2024-04-26 18:28:14 +00:00
chilli
392dc45597 Made FlexAttention rewrite getitem calls to use aten.index in score_mod (#124799)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124799
Approved by: https://github.com/drisspg
ghstack dependencies: #124444
2024-04-26 17:22:13 +00:00
PyTorch MergeBot
e913f77c60 Revert "Made FlexAttention rewrite getitem calls to use aten.index in score_mod (#124799)"
This reverts commit 9bccafc31c.

Reverted https://github.com/pytorch/pytorch/pull/124799 on behalf of https://github.com/clee2000 due to broke tests but only on crossref https://github.com/pytorch/pytorch/actions/runs/8841521519/job/24279075171, added no td label so itll actually run this time ([comment](https://github.com/pytorch/pytorch/pull/124799#issuecomment-2078530797))
2024-04-26 02:35:14 +00:00
chilli
9bccafc31c Made FlexAttention rewrite getitem calls to use aten.index in score_mod (#124799)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124799
Approved by: https://github.com/drisspg
ghstack dependencies: #124444
2024-04-26 01:02:28 +00:00
chilli
7321005dd8 Add support for capturing tensors with score_mod (#124444)
```
import torch
import torch.nn.functional as F
from torch.nn.attention._templated_attention import _templated_attention as templated_attention
from triton.testing import do_bench

torch.manual_seed(0)

B = 16
H = 16
S = 2048
D = 64

# Captured tensor: per-head slopes read from inside score_mod.
head_scale = torch.randn(H, device='cuda')
def alibi(score, batch, head, token_q, token_kv):
    return score + torch.ops.aten.index(head_scale, [head]) * (token_q - token_kv)
bias = torch.randn(H, S, S, dtype=torch.float16, device='cuda')

query = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
key = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
value = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)

compiled = torch.compile(templated_attention)
out = compiled(query, key, value, score_mod=alibi)
out2 = templated_attention(query, key, value, score_mod=alibi)
print((out - out2).abs().mean())
assert (out - out2).abs().mean() < 1e-3
print("Flash (no mask): ", do_bench(lambda: F.scaled_dot_product_attention(query, key, value)))
print("Flash (mask): ", do_bench(lambda: F.scaled_dot_product_attention(query, key, value, attn_mask=bias)))
print("flexattention: ", do_bench(lambda: compiled(query, key, value, score_mod=alibi)))
```
<img width="324" alt="image" src="https://github.com/pytorch/pytorch/assets/6355099/18c175d0-2720-4dfd-8747-85b8a8f609f5">

Differential Revision: [D56583900](https://our.internmc.facebook.com/intern/diff/D56583900)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124444
Approved by: https://github.com/jansel, https://github.com/drisspg
2024-04-26 01:02:28 +00:00
Arun Pa
00c5859aeb [dynamo] Add support for DELETE_SUBSCR (#123526)
Fixes #123317
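`DELETE_SUBSCR` is the CPython bytecode emitted for `del obj[key]`; a minimal sketch of code that now traces (illustrative):
```python
import torch

@torch.compile(fullgraph=True)
def f(x):
    d = {"a": x + 1, "b": x - 1}
    del d["b"]        # compiles via the new DELETE_SUBSCR handling
    return d["a"]

print(f(torch.randn(3)))
```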

Co-authored-by: Jason Ansel <jansel@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123526
Approved by: https://github.com/jansel
2024-04-25 22:07:24 +00:00
PyTorch MergeBot
0ca1ff3dce Revert "Add support for capturing tensors with score_mod (#124444)"
This reverts commit 7c253a7776.

Reverted https://github.com/pytorch/pytorch/pull/124444 on behalf of https://github.com/jeanschmidt due to Breaking internal tests, check D56522566 ([comment](https://github.com/pytorch/pytorch/pull/124444#issuecomment-2076908582))
2024-04-25 10:56:38 +00:00
PyTorch MergeBot
678662a557 Revert "Made FlexAttention rewrite getitem calls to use aten.index in score_mod (#124799)"
This reverts commit acc4cbea39.

Reverted https://github.com/pytorch/pytorch/pull/124799 on behalf of https://github.com/jeanschmidt due to checking if this diff introduced regressions on linux-focal-py3.11-clang10 and linux-focal-py3.8-clang10 ([comment](https://github.com/pytorch/pytorch/pull/124799#issuecomment-2076756876))
2024-04-25 09:29:57 +00:00
Animesh Jain
e68d65dae2 [dynamo][cpp-guards] Differentiate dict guards w.r.t. guarding on key order (#124779)
We guard on key order
1) when a key is a non-constant object
2) when we actually need key order, e.g. for `.values()`, `.items()`, etc.

For dicts/OrderedDicts that do not require key-order guarding, we just rely on the usual `GuardManager + DictGetItemGuardAccessor`. This is faster than going through the `list(d.keys())`-based design for OrderedDicts.
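A minimal sketch of the two cases (illustrative): the `.items()` iteration needs key order, while the plain lookup can use the per-key accessor.
```python
import torch

def f(d, x):
    for _, v in d.items():   # iteration order matters -> key-order guard
        x = x + v
    return x * d["scale"]    # plain lookup -> DictGetItemGuardAccessor path

opt = torch.compile(f)
print(opt({"w": torch.ones(3), "scale": torch.full((3,), 2.0)}, torch.randn(3)))
```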

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124779
Approved by: https://github.com/jansel
2024-04-25 08:20:35 +00:00
Animesh Jain
59a1f1f308 [dynamo][inline inbuilt nn modules] Do not inline for export (#124814)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124814
Approved by: https://github.com/jansel
2024-04-25 06:35:31 +00:00
chilli
acc4cbea39 Made FlexAttention rewrite getitem calls to use aten.index in score_mod (#124799)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124799
Approved by: https://github.com/drisspg
2024-04-25 06:19:55 +00:00
Guilherme Leobas
763dc26e59 [Dynamo] Add dynamo support to torch.func.linearize (#123118)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123118
Approved by: https://github.com/zou3519
2024-04-23 21:31:49 +00:00
chilli
7c253a7776 Add support for capturing tensors with score_mod (#124444)
(Benchmark script and results image identical to the reland of #124444 above.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124444
Approved by: https://github.com/jansel, https://github.com/drisspg
2024-04-23 17:54:08 +00:00
Pian Pawakapan
cf98cab1b6 [export] Forward fix XNNPackQuantizer.module_type_config to detect str nn_module_stack (#123662)
https://github.com/pytorch/pytorch/pull/123308 previously changed the nn_module_stack format (module type -> module str). This modifies XNNPackQuantizer's module_type_config to detect module class strings instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123662
Approved by: https://github.com/williamwen42
2024-04-23 15:21:37 +00:00
Peter Bell
7ecbbc40c3 [HOP][inductor] Add higher order associative scan operator (#119430)
Currently only supports single-tensor scans, e.g. `cumsum`, `cumprod`, `logcumsumexp`.
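For reference, a plain-Python sketch of the semantics (not the inductor lowering): an inclusive scan where `combine_fn` must be associative.
```python
import torch

def reference_associative_scan(combine_fn, xs):
    # Inclusive scan along dim 0; combine_fn must be associative so the
    # real operator can evaluate it in a parallel (tree) order.
    acc, out = xs[0], [xs[0]]
    for x in xs[1:]:
        acc = combine_fn(acc, x)
        out.append(acc)
    return torch.stack(out)

xs = torch.arange(1.0, 6.0)
print(reference_associative_scan(torch.add, xs))  # cumsum: 1, 3, 6, 10, 15
print(reference_associative_scan(torch.mul, xs))  # cumprod: 1, 2, 6, 24, 120
```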

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119430
Approved by: https://github.com/Chillee
2024-04-23 14:40:13 +00:00
PyTorch MergeBot
4f3e1f1c93 Revert "Add support for capturing tensors with score_mod (#124444)"
This reverts commit e0c5113dec.

Reverted https://github.com/pytorch/pytorch/pull/124444 on behalf of https://github.com/malfet due to This is weird, but somehow profile test started to timeout after this PR, see https://hud.pytorch.org/hud/pytorch/pytorch/main/1?per_page=50&name_filter=noGPU_AVX512 ([comment](https://github.com/pytorch/pytorch/pull/124444#issuecomment-2072506731))
2024-04-23 14:39:37 +00:00
chilli
e0c5113dec Add support for capturing tensors with score_mod (#124444)
(Benchmark script and results image identical to the reland of #124444 above.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124444
Approved by: https://github.com/jansel, https://github.com/drisspg
2024-04-23 06:20:13 +00:00
Yanbo Liang
72a34eeb99 Dynamo x autograd.Function supports non-{Tensor, symnode, constant} inputs (#124360)
Fixes #118395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124360
Approved by: https://github.com/zou3519
2024-04-22 23:32:54 +00:00
Aaron Gokaslan
5a1216bb2e [BE]: Update ruff to 0.4.1 (#124549)
Update ruff to 0.4.1. This version fixes a lot of false negatives and false positives, is 20-40% faster, and has various other bug fixes.

Below is a before-and-after table showing the execution time of ruff lint and ruff format in milliseconds, courtesy of https://astral.sh/blog/ruff-v0.4.0:

| Repository                                         | Linter (v0.3) | Linter (v0.4) | Formatter (v0.3) | Formatter (v0.4) |
|----------------------------------------------------|---------------|---------------|------------------|------------------|
| [pytorch/pytorch](https://github.com/pytorch/pytorch) | 328.7         | 251.8         | 351.1            | 274.9            |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549
Approved by: https://github.com/ezyang
2024-04-21 14:06:23 +00:00
Yanbo Liang
0d90d4d613 [Dynamo] Fix NamedTuple hasattr bug (#124531)
Fixes #124402

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124531
Approved by: https://github.com/jansel
2024-04-21 04:36:22 +00:00
Animesh Jain
febc4d8759 [dynamo][easy] forbid_in_graph check to use getattr_static (#124445)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124445
Approved by: https://github.com/yanboliang, https://github.com/jansel
2024-04-20 14:11:05 +00:00
Yanbo Liang
a3e3693afc [Dynamo] Fix missing bracket in ListVariable (#124532)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124532
Approved by: https://github.com/williamwen42
2024-04-20 08:26:30 +00:00
drisspg
f1cbaf1764 Adds LSE output for templated-attention-hop if inputs require grad (#124308)
Adds LSE output for templated-attention-hop if inputs require grad

Prep PR for adding autograd support to templated-attention-hop. The kernel needs to output the LSE during the forward which will be used during backwards.
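For reference, a sketch of the saved quantity (plain math, not the kernel): with the row-wise log-sum-exp of the scores, the backward can recover the softmax without re-reducing.
```python
import torch

q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
scores = q @ k.transpose(-1, -2) * (q.shape[-1] ** -0.5)
lse = torch.logsumexp(scores, dim=-1)          # what the forward now outputs
p = torch.exp(scores - lse.unsqueeze(-1))      # softmax rebuilt from scores + LSE
assert torch.allclose(p, torch.softmax(scores, dim=-1), atol=1e-6)
```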

### Output code
https://gist.github.com/drisspg/2aea3ce5db75811e7e143eeecb774d8a

## Before
| Type    |   Speedup |   batch_size |   num_heads |   q_seq_len |   k_seq_len |   head_dim | score_mod     | dtype          |
|---------|-----------|--------------|-------------|-------------|-------------|------------|---------------|----------------|
| Average |     1.159 |              |             |             |             |            |               |                |
| Max     |     1.342 |           16 |          16 |         512 |         512 |         64 | noop          | torch.bfloat16 |
| Min     |     1.016 |            1 |          16 |         512 |         512 |         64 | relative_bias | torch.bfloat16 |

## After
| Type    |   Speedup |   batch_size |   num_heads |   q_seq_len |   k_seq_len |   head_dim | score_mod   | dtype          |
|---------|-----------|--------------|-------------|-------------|-------------|------------|-------------|----------------|
| Average |     1.155 |              |             |             |             |            |             |                |
| Max     |     1.339 |           16 |          16 |         512 |         512 |         64 | noop        | torch.bfloat16 |
| Min     |     1.009 |            1 |          16 |         512 |         512 |         64 | head_bias   | torch.bfloat16 |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124308
Approved by: https://github.com/Chillee
2024-04-20 05:45:56 +00:00
JackCaoG
7ae835eee4 Enable SourcelessBuilder to build GraphModule generated by make_fx (#123673)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123673
Approved by: https://github.com/ezyang, https://github.com/anijain2305
ghstack dependencies: #123680
2024-04-19 17:23:51 +00:00
Michael Lazos
5050e627dc Defer marking_static_address (#124309)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124309
Approved by: https://github.com/anijain2305
ghstack dependencies: #123324, #123404, #123405
2024-04-19 17:20:57 +00:00