Example usage
* `TORCH_COMPILE_DEBUG=1 INDUCTOR_ORIG_FX_SVG=1 INDUCTOR_POST_FUSION_SVG=1 python trig.py`: show original fx node name, file, and code. see snapshot 2 where we have origin_0, 1, 2
* trig.py can be found in P816304818
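A minimal stand-in for trig.py (hypothetical; the original script lives in the internal paste above) that exercises the same debug path:

```
# Hypothetical stand-in for trig.py: any small compiled function will emit
# the original-fx and post-fusion SVGs when the env vars above are set
# (graphviz must be installed for the SVG rendering).
import torch

def trig(x):
    return torch.sin(x) + torch.cos(x)

compiled = torch.compile(trig)
compiled(torch.randn(1024))
```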
Implementation
* keep the original fx graph in GraphLowering: ```self.orig_gm: torch.fx.GraphModule = gm.__copy__()```
* draw the original fx graph with origins in `ir_post_fusion`: ```V.debug.draw_orig_fx_graph(self.orig_gm, self.scheduler.nodes)```. node.meta["buff_meta"] tracks buf_name
<img width="350" alt="Screenshot 2023-08-29 at 12 40 24 PM" src="https://github.com/pytorch/pytorch/assets/134637289/c4e197cb-ab3b-4a09-a584-c1356376accb">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107752
Approved by: https://github.com/mlazos
default_partitioner is kind of broken when it comes to memory footprint. Moving aot_eager to use the min-cut partitioner gives a better debugging experience.
One downside, though, is that we will see much lower speedup numbers, because the min-cut partitioner will try to recompute ops.
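For reference, a rough sketch (mine, not from this PR) of how the partitioner can be chosen explicitly when building an aot_autograd-based backend:

```
# Sketch: explicitly selecting the min-cut partitioner for an aot_autograd
# compiled function (functorch.compile also exposes default_partition).
import torch
from functorch.compile import aot_function, min_cut_rematerialization_partition

def print_compiler(gm, example_inputs):
    # Print the partitioned graph and run it as-is.
    gm.graph.print_tabular()
    return gm

def f(x):
    return torch.sin(x).cos().sum()

compiled = aot_function(
    f,
    fw_compiler=print_compiler,
    bw_compiler=print_compiler,
    partition_fn=min_cut_rematerialization_partition,
)
compiled(torch.randn(8, requires_grad=True)).backward()
```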
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103555
Approved by: https://github.com/eellison, https://github.com/jansel
Key change - seed, offset are the last 2 args in both the fwd and bwd graphs
Reason - The cudagraphs implementation in inductor currently relies on very simple ordering guarantees i.e. first n inputs are static for both fwd and bwd graphs. In the current implementation of functionalization of rng ops, this assumption is broken because the first 2 inputs are seed, offset.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102344
Approved by: https://github.com/eellison
Previously, due to the use of the Python set data structure, the ordering of saved values (and how they would appear in the graph) was unstable and changed across runs, making it hard to debug downstream applications. Here we use a dict (with insertion-ordering semantics) to deduplicate values in a way that preserves ordering.
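A minimal sketch of the idea (not the partitioner's actual code):

```
# set() iteration order for strings varies across interpreter runs (hash
# randomization), while dict.fromkeys deduplicates in insertion order.
saved_values = ["primals_1", "cos", "primals_1", "sin", "cos"]

unstable = list(set(saved_values))          # order may differ run to run
stable = list(dict.fromkeys(saved_values))  # ['primals_1', 'cos', 'sin']
```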
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100111
Approved by: https://github.com/Skylion007
Previously, we had a problem when partitioning forward-backward dynamic graphs, which is that we could end up with a backward graph that mentions a symbol in an input tensor (e.g., `f32[s0 + s1]`), but without this symbol being otherwise bound elsewhere. When this happens, we have no way of actually deriving the values of `s0` and `s1`. Our fix for this in https://github.com/pytorch/pytorch/pull/93059 was to just retrace the graph, so that s0 + s1 got allocated a new symbol s2 and everything was happy. However, this strategy had other problems, namely (1) we lost all information from the previous ShapeEnv, including guards and (2) we end up allocating a LOT of fresh new symbols in backwards.
With this change, we preserve the same ShapeEnv between forward and backwards. How do we do this? We simply require that every symbol which may be present inside tensors, ALSO be a plain SymInt input to the graph. This invariant is enforced by Dynamo. Once we have done this, we can straightforwardly modify the partitioner to preserve these SymInt as saved for backwards, if they are needed in the backwards graph to preserve the invariant as well.
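A toy example (mine, not the PR's repro) of the kind of graph where a backward input has a composite symbolic size like `s0 + s1`:

```
# The compiled region's output has symbolic size s0 + s1, so the tangent fed
# to the backward graph is an f32[s0 + s1] tensor; with this change, s0 and
# s1 are also guaranteed to be available as plain SymInt inputs.
import torch

def f(a, b):
    return torch.cat([a, b]) * 2

a = torch.randn(3, requires_grad=True)
b = torch.randn(5, requires_grad=True)
torch._dynamo.mark_dynamic(a, 0)
torch._dynamo.mark_dynamic(b, 0)

out = torch.compile(f, backend="aot_eager")(a, b)
out.sum().backward()
```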
This apparently breaks yolov3, but since everything else is OK I'm merging this as obviously good and investigating later.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99089
Approved by: https://github.com/voznesenskym
This PR does a few things all at once, as I needed to fix several bugs on the way here. The main goal of the PR is to fix the `'float' object has no attribute '_has_symbolic_sizes_strides'` error. The general idea is to heavily penalize non-SymInt (but still SymNode) cuts in the graph. This doesn't work for the default partitioner, so, essentially, dynamic shapes with the default partitioner are not supported.
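Roughly, the penalty amounts to something like the sketch below (my paraphrase, not the actual partitioner code; `size_in_bytes` is a hypothetical helper):

```
import torch

NEVER_SAVE = float("inf")  # infinite cut cost: always recompute instead

def size_in_bytes(t):
    # Hypothetical helper: cost of saving a tensor for backward.
    return t.numel() * t.element_size()

def cut_weight(node):
    # SymInts are essentially free to save; other SymNode values
    # (SymFloat/SymBool) are penalized so the min-cut solver recomputes
    # them; tensors cost their storage size.
    val = node.meta.get("val", None)
    if isinstance(val, torch.SymInt):
        return 1
    if isinstance(val, (torch.SymFloat, torch.SymBool)):
        return NEVER_SAVE
    return size_in_bytes(val)
```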
While doing this, I had to fix a few other bugs in the partitioner:
* SymNode operations weren't considered recomputable. But they are very cheap, go wild.
* zeros_like wasn't considered recomputable, and this prevented some gradient formulas (e.g., for angle with real inputs) from successfully finding a cut at all
* AOTAutograd tests use the default partitioner. I switch them to use min-cut partitioner...
* ...but this reveals a bug where if we have nodes in backward outputs that don't depend on tangents, they never get assigned to the backward graph. I fix this by making the backward outputs mandatory to be in backwards. I have to be careful to filter out None backward outputs; those never participate in flow analysis!
This causes some wobbling for the min-cut tests, but these seem legitimate: since we're now willing to recompute, the partitioner can reduce the number of SymInts it transmits by just doing some recompute in the backend.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96653
Approved by: https://github.com/ngimel
Applies the remaining flake8-comprehension fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the set call.
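For illustration, the two kinds of rewrites look like:

```
b = [1, 2, 2, 3]
pairs = [("a", 1), ("b", 2)]

# Unnecessary generator expressions become comprehensions:
d = dict((k, v * 2) for k, v in pairs)   # before
d = {k: v * 2 for k, v in pairs}         # after

# Useless generators collapse into a plain constructor call:
s = set(a for a in b)                    # before
s = set(b)                               # after
```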
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR.
This is a follow-up to #94323 where I enable the flake8 checkers for the fixes I made and fix a few more of them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601
Approved by: https://github.com/ezyang
Historically, we work out `size_hint` on the fly by doing a substitution on the sympy expression with the `var_to_val` mapping. With this change, we also maintain the hint directly on SymNode (in `expr._hint`) and use it in lieu of Sympy substitution when it is available (mostly guards on SymInt, etc.); in particular, in idiomatic Inductor code, we typically manipulate Sympy expressions directly and so do not have a way to conveniently maintain hints.
While it's possible this will give us modest performance improvements, this is not the point of this PR; the goal is to make it easier to carefully handle unbacked SymInts, where hints are expected not to be available. You can now easily test if a SymInt is backed or not by checking `symint.node.hint is None`.
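For example, a (hypothetical) helper built on that check:

```
import torch

def is_backed(i):
    # Backed SymInts carry a hint (the concrete value seen at trace time);
    # unbacked SymInts (e.g. from .item() or .nonzero()) do not.
    return isinstance(i, torch.SymInt) and i.node.hint is not None
```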
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94201
Approved by: https://github.com/voznesenskym
This should fix hf_Longformer, AllenaiLongformerBase, and tacotron2 with dynamic shapes. Example repro:
```
TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 python benchmarks/dynamo/torchbench.py --accuracy --backend aot_eager --training --only hf_Longformer
```
used to fail with:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4, 1024, 12, 513]], which is output 0
of AsStridedBackward0, is at version 6; expected version 4 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient,
with torch.autograd.set_detect_anomaly(True).
```
The problem is that:
(1) when we have a tensor from the forward whose sizes are needed in the backward, we were saving the actual tensor for backward and directly grabbing the sizes off of it inside of the backward graph (bad for perf)
(2) If that tensor happens to be a graph input that gets mutated, we end up with the above error. Autograd yells at you if you try to save a tensor for backward, and later mutate it.
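A standalone sketch of the autograd behavior behind (2) (not the actual repro):

```
import torch

class SaveInput(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # save the input tensor itself
        return x * 2

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors  # version counter is checked here
        return grad_out * 2

a = torch.randn(4, requires_grad=True)
x = a.clone()
out = SaveInput.apply(x)
x.add_(1)             # mutate the tensor that was saved for backward
out.sum().backward()  # RuntimeError: ... modified by an inplace operation
```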
I confirmed that this problem doesn't happen for the min cut partitioner.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91012
Approved by: https://github.com/ezyang
This will be the last disruptive functorch internals change.
Why are we moving these files?
- As a part of rationalizing functorch we are moving the code in
functorch/_src to torch/_functorch
- This is so that we can offer the functorch APIs as native PyTorch APIs
(coming soon) and resolve some internal build issues.
Why are we moving all of these files at once?
- It's better to break developers all at once rather than many times
Test Plan:
- wait for tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90091
Approved by: https://github.com/anijain2305, https://github.com/ezyang
This will be the last disruptive functorch internals change.
Why are we moving these files?
- As a part of rationalizing functorch we are moving the code in
functorch/_src to torch/_functorch
- This is so that we can offer the functorch APIs as native PyTorch APIs
(coming soon) and resolve some internal build issues.
Why are we moving all of these files at once?
- It's better to break developers all at once rather than many times
Test Plan:
- wait for tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88756
Approved by: https://github.com/ezyang