pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Maggie Moss	eb83c3ca23	Clean up unused Pyrefly suppressions (#166178 ) Cleaning up ignores that are no longer needed in the repo and adding select suppressions so the main branch is clean. test plan: `lintrunner -a` Pull Request resolved: https://github.com/pytorch/pytorch/pull/166178 Approved by: https://github.com/oulgen	2025-10-25 05:32:21 +00:00
mansiag05	850ba8c96d	[Code Clean] Clean asserts in torch/autograd. (#165627 ) Replaces 78 assert statements across 10 files in torch.autograd with explicit if-checks raising AssertionError, so that the checks cannot be disabled by the Python -O flag. This ensures error checking remains active in optimized runs. Partially fixes #164878 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165627 Approved by: https://github.com/albanD	2025-10-20 23:03:47 +00:00
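A minimal sketch of why this rewrite matters, assuming a CPython interpreter reachable as `sys.executable` and no `PYTHONOPTIMIZE` set in the environment. The same illustrative snippet runs with and without `-O`, and only the explicit `if`/`raise AssertionError` check survives optimization. The snippet and helper names are hypothetical, not code from `torch.autograd`.

```python
import subprocess
import sys

# Illustrative snippet: one plain `assert` and one explicit check.
SNIPPET = """
def check_positive(x):
    # Explicit check: ordinary control flow, so -O cannot strip it.
    if not x > 0:
        raise AssertionError("x must be positive")

try:
    assert False, "plain assert"  # removed entirely under -O
    print("assert skipped")
except AssertionError:
    print("assert ran")

try:
    check_positive(-1)
except AssertionError:
    print("explicit check ran")
"""


def run_snippet(*flags: str) -> list[str]:
    """Run SNIPPET in a fresh interpreter with the given flags; return stdout lines."""
    result = subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


print(run_snippet())      # ['assert ran', 'explicit check ran']
print(run_snippet("-O"))  # ['assert skipped', 'explicit check ran']
```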
Maggie Moss	f414aa8e0d	Add pyrefly suppressions (3/n) (#164588 ) Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 Test plan: dmypy restart && python3 scripts/lintrunner.py -a pyrefly check. Step 1: uncomment lines in the pyrefly.toml file. Step 2: run pyrefly check. Step 3: add suppressions and clean up unused suppressions. Before: https://gist.github.com/maggiemoss/bb31574ac8a59893c9cf52189e67bb2d After: 0 errors (1,970 ignored) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164588 Approved by: https://github.com/oulgen	2025-10-03 22:03:03 +00:00
Aaron Orenstein	e95e8eed0a	mypy 1.16.0 (#155821 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155821 Approved by: https://github.com/ezyang, https://github.com/zou3519	2025-06-14 18:18:43 +00:00
Valérian Rey	f5851efed9	Fix `torch.autograd.backward` `inputs` validation (#150975 ) - Fixes #150883 - Fixes #70504 This is my first PR to pytorch, so please tell me if I'm forgetting anything. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150975 Approved by: https://github.com/soulitzer	2025-04-17 02:11:13 +00:00
Aaron Orenstein	db4ce78d46	PEP585: More UP006 fixes (#146392 ) This should be the final PR before we can enable RUFF UP006. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146392 Approved by: https://github.com/justinchuby, https://github.com/albanD, https://github.com/Skylion007	2025-02-20 06:18:13 +00:00
Aaron Orenstein	f2cfe8b59f	PEP585 update - mostly toplevels (#145178 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145178 Approved by: https://github.com/bobrenjc93	2025-01-22 02:21:14 +00:00
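For readers unfamiliar with the rule, here is a small before/after sketch of what a PEP 585 (Ruff UP006) rewrite does to annotations, assuming Python 3.9 or newer. The `group_*` functions are made up for illustration; only the annotation spelling differs between them.

```python
from typing import Dict, List, Tuple, get_origin


# Before: capitalized aliases imported from `typing` (flagged by Ruff UP006).
def group_old(pairs: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    out: Dict[str, List[int]] = {}
    for key, value in pairs:
        out.setdefault(key, []).append(value)
    return out


# After: PEP 585 builtin generics; no `typing` import is needed for these.
def group_new(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    out: dict[str, list[int]] = {}
    for key, value in pairs:
        out.setdefault(key, []).append(value)
    return out


# Both spellings resolve to the same runtime origin type.
print(get_origin(List[int]) is get_origin(list[int]) is list)  # True
```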
cyy	d87aad6877	[5/N] Apply Ruff fixes and pyupgrade to Python 3.9 (#144205 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144205 Approved by: https://github.com/albanD	2025-01-15 04:00:47 +00:00
soulitzer	c000214826	Allow GradientEdge as torch.autograd.backward outputs (#144744 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144744 Approved by: https://github.com/albanD	2025-01-14 21:31:44 +00:00
Shivam Raikundalia	d2ecdcb2f7	[Profiler] Add API for Dynamic Activity Toggling [2/n] (#133035 ) Summary: During PT2 there are many GPU/CPU events that are unnecessary to profile in between a given step. To remedy this, we can add an API that takes in a list of activities and an arg to toggle those activities on or off. For this diff we are adding the profiler API to propagate down to kineto (and in the future the collection.cpp logic). Subsequent diffs will be added for CPU toggling and e2e testing. Test Plan: Tested by toggling backward gpu traces off and got the following trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Jul_31_13_40_55.3251726.pt.trace.json.gz&bucket=gpu_traces Reviewed By: aaronenyeshi Differential Revision: D60541767 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133035 Approved by: https://github.com/aaronenyeshi	2024-08-09 21:54:54 +00:00
Xuehai Pan	f3fce597e9	[BE][Easy][17/19] enforce style for empty lines in import segments in `torch/[a-c]/` and `torch/[e-n]/` (#129769 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via:
```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129769 Approved by: https://github.com/ezyang	2024-08-04 10:24:09 +00:00
soulitzer	2eec02523b	[autograd] Support GradientEdge as output for torch.autograd.grad (#127766 ) This is useful for splitting grad to run in two parts while preserving intermediates:
<details>
<summary> Click to see code </summary>

```python
import collections
import weakref
from typing import Deque

import torch
from torch.autograd.graph import GradientEdge


def _get_grad_fn_or_grad_acc(t):
    if t.requires_grad and t.grad_fn is None:
        return t.view_as(t).grad_fn.next_functions[0][0]
    else:
        return t.grad_fn


def reverse_closure(roots, target_nodes):
    # Recurse until we reach a target node
    closure = set()
    actual_target_nodes = set()
    q: Deque = collections.deque()
    for node in roots:
        if node is not None and node not in closure:
            closure.add(node)
            q.append(node)
    while q:
        node = q.popleft()
        reverse_edges = node.metadata.get("reverse_edges", [])
        for holder_ref, idx in reverse_edges:
            ref = holder_ref()
            if ref is None:
                raise RuntimeError("Reverse graph is no longer alive")
            fn = ref.node
            if fn in closure or fn is None:
                continue
            if fn in target_nodes:
                actual_target_nodes.add(fn)
                continue
            closure.add(fn)
            q.append(fn)
    return closure, actual_target_nodes


# Enable weak pointer
class Holder():
    def __init__(self, node):
        self.node = node


# TODO: use weak references to avoid reference cycle
def construct_reverse_graph(roots):
    q: Deque = collections.deque()
    root_seen = set()
    reverse_graph_refs = []
    for node in roots:
        if node is not None and node not in root_seen:
            q.append(node)
            root_seen.add(node)
    while q:
        node = q.popleft()
        for fn, idx in node.next_functions:
            if fn is not None:
                # Don't necessarily need to store on the graph
                reverse_edges = fn.metadata.get("reverse_edges", [])
                if len(reverse_edges) == 0:
                    q.append(fn)
                holder = Holder(node)
                holder_ref = weakref.ref(holder)
                reverse_graph_refs.append(holder)
                reverse_edges.append((holder_ref, idx))
                fn.metadata["reverse_edges"] = reverse_edges
    return reverse_graph_refs


def get_param_groups(inputs, params):
    inputs_closure, _ = reverse_closure(inputs, set())
    param_groups = dict()  # keyed on intermediates
    for i, param in enumerate(params):
        closure, intersected = reverse_closure([param], inputs_closure)
        param_group = {
            "params": set([param]),
            "intermediates": set(intersected),
        }
        for input_node in intersected:
            existing = param_groups.get(input_node, None)
            if existing is not None:
                existing["params"] = existing["params"].union(param_group["params"])
                existing["intermediates"] = existing["intermediates"].union(param_group["intermediates"])
                param_group = existing
            else:
                param_groups[input_node] = param_group

    # Sanity check: union of all param_groups params should be equal to all params
    union_params = set()
    seen_ids = set()
    unique_param_groups = []
    for param_group in param_groups.values():
        if id(param_group) not in seen_ids:
            seen_ids.add(id(param_group))
            unique_param_groups.append(param_group)
            union_params = union_params.union(param_group["params"])
    assert union_params == set(params)

    return unique_param_groups


def compute_grads_only_inputs2(roots, inps, weights):
    root_grad_fns = list(map(_get_grad_fn_or_grad_acc, roots))
    inp_grad_fns = list(map(_get_grad_fn_or_grad_acc, inps))
    weight_grad_fns = list(map(_get_grad_fn_or_grad_acc, weights))

    reverse_graph_refs = construct_reverse_graph(root_grad_fns)
    param_groups = get_param_groups(inp_grad_fns, weight_grad_fns)
    del reverse_graph_refs

    for param_group in param_groups:
        for i, intermediate in enumerate(param_group["intermediates"]):
            def get_hook(param_group, i):
                def hook(grad_inputs):
                    if param_group.get("grads", None) is None:
                        param_group["grads"] = [None] * len(param_group["intermediates"])
                    param_group["grads"][i] = grad_inputs
                return hook
            # These are always "split" nodes that we need to recompute, so
            # save their inputs.
            intermediate.register_prehook(get_hook(param_group, i))

    dinputs = torch.autograd.grad(
        roots,
        inputs=tuple(inps),
        grad_outputs=tuple(torch.ones_like(r) for r in roots),
        retain_graph=True,
    )
    return dinputs, param_groups


def compute_grads_only_weights2(user_weights, param_groups):
    all_dweights = dict()
    for param_group in param_groups:
        # TODO: Handle case where intermediate can have multiple outputs
        intermediate_edges = tuple(GradientEdge(i, 0) for i in param_group["intermediates"])
        weights_edges = tuple(GradientEdge(w, 0) for w in param_group["params"])

        assert all(len(g) == 1 for g in param_group["grads"])
        # [NEW!] Able to pass a GradientEdge to autograd.grad as output
        # We do not need to retain_graph because... guarantee no overlap?
        print("trying to execute: ", intermediate_edges, weights_edges)
        dweights = torch.autograd.grad(intermediate_edges, weights_edges, grad_outputs=sum(param_group["grads"], tuple()))
        for w, dw in zip(param_group["params"], dweights):
            all_dweights[w] = dw
    # return grads in the original order weights were provided in
    out = []
    for w in user_weights:
        grad_acc = _get_grad_fn_or_grad_acc(w)
        out.append(all_dweights[grad_acc])
    return tuple(out)
```
</details>

```python
import torch
import torch.nn as nn
from torch.utils._python_dispatch import TorchDispatchMode

# Setup (uses the helpers defined above)
mod1 = nn.Linear(10, 10)
mod2 = nn.Linear(10, 10)

a = torch.rand(10, requires_grad=True)

weights = tuple(mod1.parameters()) + tuple(mod2.parameters())
inps = (a,)

out = mod2(mod1(a))


class LoggingTensorMode(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        rs = func(*args, **kwargs)
        print(f"{func.__module__}.{func.__name__}")
        return rs


print(" -- SPLIT -- ")
# Compute gradients in two parts
with LoggingTensorMode():
    print("PART 1")
    dinputs, state = compute_grads_only_inputs2((out,), inps, weights)
    print("PART 2")
    dweights = compute_grads_only_weights2(weights, state)

out = mod2(mod1(a))

print(" -- REF -- ")

# Compare with reference
with LoggingTensorMode():
    ref_all_gradients = torch.autograd.grad(out, inputs=tuple(inps) + weights, grad_outputs=(torch.ones_like(out),))

for actual, ref in zip(dinputs + dweights, ref_all_gradients):
    print(torch.allclose(actual, ref))
```
<img width="598" alt="image" src="https://github.com/pytorch/pytorch/assets/13428986/3681b8a7-3ab4-4d1d-a836-abef6913e671">

```
PART 1
torch._ops.aten.view.default
torch._ops.aten.view.default
torch._ops.aten.view.default
torch._ops.aten.view.default
torch._ops.aten.view.default
torch._ops.aten.ones_like.default
V0603 10:17:21.590878 8300067520 torch/autograd/graph.py:751] Executing: <ViewBackward0 object at 0x12a1ee160> with grad_outputs: [f32[10]]
torch._ops.aten.view.default
V0603 10:17:21.591204 8300067520 torch/autograd/graph.py:751] Executing: <AddmmBackward0 object at 0x12a1ee0d0> with grad_outputs: [f32[1, 10]]
torch._ops.aten.t.default
torch._ops.aten.mm.default
V0603 10:17:21.591578 8300067520 torch/autograd/graph.py:751] Executing: <ViewBackward0 object at 0x100d7ae50> with grad_outputs: [f32[1, 10]]
torch._ops.aten.view.default
V0603 10:17:21.591747 8300067520 torch/autograd/graph.py:751] Executing: <ViewBackward0 object at 0x12a1e4a60> with grad_outputs: [f32[10]]
torch._ops.aten.view.default
V0603 10:17:21.591834 8300067520 torch/autograd/graph.py:751] Executing: <AddmmBackward0 object at 0x12a1e4bb0> with grad_outputs: [f32[1, 10]]
torch._ops.aten.t.default
torch._ops.aten.mm.default
V0603 10:17:21.591922 8300067520 torch/autograd/graph.py:751] Executing: <ViewBackward0 object at 0x12a1e4a90> with grad_outputs: [f32[1, 10]]
torch._ops.aten.view.default
PART 2
trying to execute: (GradientEdge(node=<AddmmBackward0 object at 0x12a1e4bb0>, output_nr=0),) (GradientEdge(node=<AccumulateGrad object at 0x12a21b130>, output_nr=0), GradientEdge(node=<AccumulateGrad object at 0x12a21b7c0>, output_nr=0))
V0603 10:17:21.592223 8300067520 torch/autograd/graph.py:751] Executing: <AddmmBackward0 object at 0x12a1e4bb0> with grad_outputs: [f32[1, 10]]
torch._ops.aten.t.default
torch._ops.aten.mm.default
torch._ops.aten.t.default
torch._ops.aten.sum.dim_IntList
torch._ops.aten.view.default
V0603 10:17:21.592421 8300067520 torch/autograd/graph.py:751] Executing: <TBackward0 object at 0x12a1cad60> with grad_outputs: [f32[10, 10]]
torch._ops.aten.t.default
trying to execute: (GradientEdge(node=<AddmmBackward0 object at 0x12a1ee0d0>, output_nr=0),) (GradientEdge(node=<AccumulateGrad object at 0x12a1e41c0>, output_nr=0), GradientEdge(node=<AccumulateGrad object at 0x12a21b670>, output_nr=0))
V0603 10:17:21.593481 8300067520 torch/autograd/graph.py:751] Executing: <AddmmBackward0 object at 0x12a1ee0d0> with grad_outputs: [f32[1, 10]]
torch._ops.aten.t.default
torch._ops.aten.mm.default
torch._ops.aten.t.default
torch._ops.aten.sum.dim_IntList
torch._ops.aten.view.default
V0603 10:17:21.593750 8300067520 torch/autograd/graph.py:751] Executing: <TBackward0 object at 0x12a21b2b0> with grad_outputs: [f32[10, 10]]
torch._ops.aten.t.default
torch._ops.aten.view.default
torch._ops.aten.view.default
torch._ops.aten.view.default
torch._ops.aten.view.default
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127766 Approved by: https://github.com/albanD	2024-07-16 21:46:19 +00:00
Xuehai Pan	973037be6a	[BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): `list()` / `tuple()` / `dict()` (#130199 ) This PR changes the empty collection factory call to Python literals:
- `list()` -> `[]`
- `tuple()` -> `()`
- `dict()` -> `{}`

The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary:
```bash
$ python3 -m dis - <<EOS
import collections

d1 = {}
d2 = dict()

dict = collections.OrderedDict
d3 = dict()
EOS
```
```text
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (0)
              4 LOAD_CONST               1 (None)
              6 IMPORT_NAME              0 (collections)
              8 STORE_NAME               0 (collections)

  3          10 BUILD_MAP                0
             12 STORE_NAME               1 (d1)

  4          14 PUSH_NULL
             16 LOAD_NAME                2 (dict)
             18 CALL                     0
             26 STORE_NAME               3 (d2)

  6          28 LOAD_NAME                0 (collections)
             30 LOAD_ATTR                8 (OrderedDict)
             50 STORE_NAME               2 (dict)

  7          52 PUSH_NULL
             54 LOAD_NAME                2 (dict)
             56 CALL                     0
             64 STORE_NAME               5 (d3)
             66 RETURN_CONST             1 (None)
```
The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above). Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199 Approved by: https://github.com/malfet	2024-07-11 17:30:28 +00:00
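Both claims can be checked with the standard `dis` module. This is a small sketch, not part of the PR, assuming CPython. Opcode names vary across CPython versions (for example, older releases emit `CALL_FUNCTION` instead of `CALL`), so the sketch prints the opcode lists instead of hard-coding one exact sequence.

```python
import collections
import dis


def opnames(source: str) -> list[str]:
    """Return the opcode names CPython emits for a module-level snippet."""
    code = compile(source, "<demo>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


literal_ops = opnames("d = {}")
factory_ops = opnames("d = dict()")

# The literal builds the dict directly; the factory call has to look up the
# name `dict` at runtime and then call it.
print(literal_ops)
print(factory_ops)

# Because `dict` is looked up by name, shadowing it changes what `dict()`
# builds, while the `{}` literal is unaffected.
namespace = {"collections": collections}
exec("dict = collections.OrderedDict\nd_factory = dict()\nd_literal = {}", namespace)
print(type(namespace["d_factory"]).__name__)  # OrderedDict
print(type(namespace["d_literal"]).__name__)  # dict
```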
Sahdev Zala	685fcfb40d	Fix docstring in autograd (#128657 ) Fix docstrings in autograd files. The fix can be verified by running `pydocstyle path-to-file --count`. Related #112593
BEFORE the PR:
pydocstyle torch/autograd/anomaly_mode.py --count
8
pydocstyle torch/autograd/__init__.py --count
9
AFTER the PR:
pydocstyle torch/autograd/anomaly_mode.py --count
0
pydocstyle torch/autograd/__init__.py --count
0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128657 Approved by: https://github.com/soulitzer	2024-06-14 02:18:59 +00:00
Aaron Orenstein	62bcdc0ac9	Flip default value for mypy disallow_untyped_defs [4/11] (#127841 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127841 Approved by: https://github.com/oulgen	2024-06-08 18:36:48 +00:00
Xuehai Pan	67ef2683d9	[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#127689 ) Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing. Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR. Resolves #126888. This PR is split from PR #126898. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689 Approved by: https://github.com/Skylion007	2024-06-02 12:30:43 +00:00
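The fallback path described above (`warnings.warn` with `category=FutureWarning`) can be sketched with the standard library alone. The `deprecated` decorator here is a hypothetical stand-in, not `typing_extensions.deprecated`. It only illustrates why the category matters: Python's default filters display `FutureWarning` to end users, whereas `DeprecationWarning` is hidden unless it is triggered from `__main__`.

```python
import functools
import warnings


def deprecated(message: str):
    """Hypothetical stand-in decorator: emit a FutureWarning on every call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not this wrapper.
            warnings.warn(message, category=FutureWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated("`old_api` is deprecated, use `new_api` instead")
def old_api(x: int) -> int:
    return x + 1


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(old_api(1))                    # 2
    print(caught[0].category.__name__)   # FutureWarning
```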
PyTorch MergeBot	033e733021	Revert "[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#126898 )" This reverts commit `749a132fb0`. Reverted https://github.com/pytorch/pytorch/pull/126898 on behalf of https://github.com/fbgheith due to switching typing-extensions=4.3.0 to 4.9.0 causes internal failure ([comment](https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456))	2024-05-31 19:47:24 +00:00
Xuehai Pan	749a132fb0	[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#126898 ) Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing. Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR. UPDATE: Use `FutureWarning` instead of `DeprecationWarning`. Resolves #126888. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898 Approved by: https://github.com/albanD	2024-05-29 12:09:27 +00:00
albanD	af9acc4168	Fix public binding to actually traverse modules (#126103 ) The current call passes in `['/actual/path']` to `os.walk`; that is a string pointing to no existing path, so the traversal is silently empty. There is an unused function just above that handles this, so I guess this is what was supposed to be called. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126103 Approved by: https://github.com/suo	2024-05-15 19:36:03 +00:00
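The failure mode is easy to reproduce with the standard library: `os.walk` swallows the error for a missing root unless an `onerror` callback is supplied. This is a minimal sketch using a temporary directory. The stringified list is a hypothetical stand-in for the buggy argument, not the actual code touched by the PR.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "pkg", "sub"))

    # The intended call: visits root, root/pkg and root/pkg/sub.
    walked_real = [dirpath for dirpath, _, _ in os.walk(root)]

    # Hypothetical version of the bug: the list is turned into its string
    # form "['<root>']", which is not an existing path.
    bogus = str([root])
    walked_bogus = list(os.walk(bogus))  # silently empty, no exception

    # Passing `onerror` makes the failure visible instead of silent.
    errors: list[OSError] = []
    list(os.walk(bogus, onerror=errors.append))

print(len(walked_real), walked_bogus, [type(e).__name__ for e in errors])
```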
Andrew Gu	a5230e6019	[ez][docs] Fixed render of `tensors` in `backward` (#117994 ) Before: <img width="851" alt="Screenshot 2024-01-22 at 2 03 49 PM" src="https://github.com/pytorch/pytorch/assets/31054793/a71111ab-c7c4-4af5-a996-cbd42bcc8326"> After: ![Screenshot 2024-01-23 at 7 13 40 PM](https://github.com/pytorch/pytorch/assets/31054793/36db28a0-a96f-434c-a93f-fe78aff1e035) Pull Request resolved: https://github.com/pytorch/pytorch/pull/117994 Approved by: https://github.com/soulitzer, https://github.com/weifengpy	2024-01-25 15:49:32 +00:00
soulitzer	cfb3cd11c1	Add basic autograd TORCH_LOGS support (#115438 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115438 Approved by: https://github.com/albanD	2023-12-20 15:23:44 +00:00
Edward Z. Yang	1f3fa13f0a	Handle unbacked SymInt sized outputs in AOTAutograd (#113159 ) Thanks aakhundov for constructing the test case. This PR was constructed by running the failing test case, and then fixing problems until we got all the way to the end. There are a few distinct fixes: * AOTAutograd performs equality tests on tensor metadata to determine if a metadata mutation had occurred. If we test i0 vs i1, we should report these are NOT equal, since obviously we have somehow resized the tensor from i0 to i1 (even if, on a particular run, it is possible i0 == i1). * There's a sketchy fix for `test_aot_autograd_exhaustive_matmul_cpu_float32` where we check if the output shape equals the tangent shape. Unfortunately, the same `definitely_true` treatment does not work here, it still fails on the example. I piled an extra sketchy fix on top of it, where I just try my best to avoid doing the view. Maybe we should have some sort of logging here. * Partitioner needs to get out a size for unbacked SymInt when partitioning. I just feed it a random heuristic value in this case, similar to how we've been dealing with this in Inductor. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/113159 Approved by: https://github.com/aakhundov, https://github.com/bdhirsh	2023-11-08 04:28:38 +00:00
soulitzer	2dc1726ab7	Compile NestedTensor with AOTAutograd (#110529 ) This PR has a number of changes that improve subclass support for AOTAutograd/Inductor in general:
- Previously, if a subclass did extra aliasing between graph outputs/inputs, the partitioner would complain because grad_outputs are the outputs reused as-is. Now we do a view_as(self) to work around this.
- Use dense -> dense metadata when working with fwd_output_strides during backward. This is important since the stride information comes from inductor, which sees the dense to dense graph.
- Inductor requires the inputs to the compiled backward to match some expected strides computed during compilation. We make sure to make the inner tensors of the subclass contiguous (previously, we only made the subclass itself contiguous).

Changes specific to NestedTensor relevant to compilation:
- Properly handle the case where `__tensor_unflatten__` is passed non-symbolic dense tensors together with meta extracted from fake subclasses.
- Skip var_to_range logic for singleton int.
- Skip size hint logic in inductor for singleton int.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110529 Approved by: https://github.com/bdhirsh	2023-10-17 21:17:10 +00:00
albanD	5e8be63e99	Allow specifying inputs as GradientEdge in autograd APIs (#110867 ) This can be useful for advanced users (like AOTAutograd) who don't want to keep the corresponding Tensor alive (for memory reasons, for example) or when an in-place op will change the Tensor's grad_fn (but gradients with respect to the original value are needed). I went with a minimal API change but am open to suggestions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110867 Approved by: https://github.com/soulitzer	2023-10-12 04:08:44 +00:00
poseljacob	a25eee1d77	_force_original_view_tracking to work as both context manager and function (#106706 ) Fix _force_original_view_tracking to work as a function as well as a context manager, as stated by the documentation. Applies fixes similar to those in PR: https://github.com/pytorch/pytorch/pull/105291 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106706 Approved by: https://github.com/albanD	2023-08-07 23:29:22 +00:00
Alex Settle	9ba0558d48	Add sequence_nr to aot_autograd to map forward ops to their corresponding backward ops (#103129 ) Fixes #102375 Sequence_nr increments in the forward pass and decrements in the backward pass. Backward ops with the same sequence_nr as a forward op represent the backward implementation for the op. The long term goal is to make this information available to the profiler so users can observe which ops are fused by the inductor openai triton kernels. Added a test for this feature test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_aot_sequence_nr. The test case uses aot_export_module() to create a joint fwd/bwd fx graph. Then it walks all the nodes in fx graph using fx_graph.graph.nodes. The seq_nr of each node is recorded in node.meta. During the fwd pass the seq_nr increments and it decrements during the bwd pass. This allows the user to map forward ops to their corresponding bwd ops which is useful for performance analysis. Expected output from the test case. 
SeqNr|OrigAten|SrcFn
0|aten.convolution.default|l__self___conv1
0|aten.add.Tensor|l__self___bn1
1|aten._native_batch_norm_legit_functional.default|l__self___bn1
2|aten.relu.default|l__self___relu1
3|aten.add.Tensor|add
4|aten.view.default|flatten
5|aten.t.default|l__self___fc1
6|aten.unsqueeze.default|l__self___fc1
7|aten.mm.default|l__self___fc1
8|aten.squeeze.dim|l__self___fc1
9|aten.add.Tensor|l__self___fc1
10|aten.sub.Tensor|l__self___loss_fn
11|aten.abs.default|l__self___loss_fn
12|aten.mean.default|l__self___loss_fn
12|aten.ones_like.default|
12|aten.expand.default|
12|aten.div.Scalar|
11|aten.sgn.default|
11|aten.mul.Tensor|
8|aten.unsqueeze.default|
7|aten.t.default|
7|aten.mm.default|
7|aten.t.default|
7|aten.t.default|
7|aten.mm.default|
6|aten.squeeze.dim|
5|aten.t.default|
4|aten.view.default|
2|aten.threshold_backward.default|
1|aten.native_batch_norm_backward.default|
0|aten.convolution_backward.default|
0|aten.add.Tensor|
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103129 Approved by: https://github.com/soulitzer	2023-08-02 00:52:52 +00:00
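As a rough illustration of how the SeqNr table in this commit message can be read, this sketch groups a few of its rows by sequence number. It assumes, based only on the sample output, that forward ops are exactly the rows with a non-empty `SrcFn` column. The helper name is hypothetical, and this is not how the test in the PR inspects `node.meta`.

```python
from collections import defaultdict

# A few rows copied from the expected output in this commit (SeqNr|OrigAten|SrcFn).
ROWS = """\
11|aten.abs.default|l__self___loss_fn
12|aten.mean.default|l__self___loss_fn
12|aten.ones_like.default|
12|aten.expand.default|
12|aten.div.Scalar|
11|aten.sgn.default|
11|aten.mul.Tensor|
"""


def pair_by_seq_nr(rows: str) -> dict[int, dict[str, list[str]]]:
    """Hypothetical helper: map each sequence number to its forward and backward ops."""
    pairs = defaultdict(lambda: {"fwd": [], "bwd": []})
    for line in rows.splitlines():
        seq_nr, aten_op, src_fn = line.split("|")
        # Assumption from the sample output: only forward rows carry a source fn.
        side = "fwd" if src_fn else "bwd"
        pairs[int(seq_nr)][side].append(aten_op)
    return dict(pairs)


for seq_nr, ops in sorted(pair_by_seq_nr(ROWS).items()):
    print(seq_nr, ops["fwd"], "->", ops["bwd"])
```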
Edward Z. Yang	3bf922a6ce	Apply UFMT to low traffic torch modules (#106249 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106249 Approved by: https://github.com/Skylion007	2023-07-29 23:37:30 +00:00
poseljacob	1aba399138	allow set_multithreading_enabled to act as function and context manager (#105291 ) Fixes #104985 Implemented `set_multithreading_enabled` C++ function to directly alter state rather than using `MultithreadingEnabled` class, which was automatically resetting the state when the object was destroyed. This behavior more closely aligns with set_grad_enabled which does work as expected. This allows us to change python class `set_multithreading_enabled` to act as both a function and context manager. I also added a getter: `torch._C.is_multithreading_enabled` Pull Request resolved: https://github.com/pytorch/pytorch/pull/105291 Approved by: https://github.com/albanD	2023-07-18 16:55:40 +00:00
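The pattern this commit adopts (mutate the state immediately, restore it only in `__exit__`) can be sketched in pure Python. Everything here is a hypothetical stand-in: a thread-local flag plays the role of the C++ state, and `set_feature_enabled` mirrors the shape of `set_multithreading_enabled`, not its actual implementation. Because nothing restores the flag when the object is garbage-collected, a bare call behaves like a plain setter, unlike the old RAII-style class that reset the state on destruction.

```python
import threading

# Hypothetical stand-in for the C++ thread-local flag.
_state = threading.local()


def is_feature_enabled() -> bool:
    return getattr(_state, "enabled", True)


class set_feature_enabled:
    """Usable both as a plain call and as a context manager (like torch.set_grad_enabled)."""

    def __init__(self, mode: bool) -> None:
        # Change the state right away, so calling it as a function has an effect.
        self.prev = is_feature_enabled()
        _state.enabled = mode

    def __enter__(self) -> "set_feature_enabled":
        return self

    def __exit__(self, *exc_info) -> None:
        # Restore only when used in a `with` block; a bare call never gets here.
        _state.enabled = self.prev


with set_feature_enabled(False):
    print(is_feature_enabled())  # False
print(is_feature_enabled())      # True

set_feature_enabled(False)       # function-style call: the change persists
print(is_feature_enabled())      # False
set_feature_enabled(True)
```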
Qi Zhu	086ce765a5	Add new parameter `materialize_grads` to torch.autograd.grad() (#97015 ) Fixes #44189 Adds a new parameter, `materialize_grads`, to the torch.autograd.grad() function. This parameter allows the gradient to be set to 0 instead of None when a variable is unused, which can be helpful for higher-order partial derivatives. Here is an example of using this new parameter to solve d^3y/dx^3 given y = a * x:
```python
import torch

x = torch.tensor(0.5, dtype=torch.float32, requires_grad=True)
a = torch.tensor(1, dtype=torch.float32, requires_grad=True)
y = x * a
dydx = torch.autograd.grad(y, x, create_graph=True, allow_unused=True)
d2ydx2 = torch.autograd.grad(dydx, x, allow_unused=True, materialize_grads=True)
try:
    d3ydx3 = torch.autograd.grad(d2ydx2, x, allow_unused=True, materialize_grads=True)
except RuntimeError as e:
    assert False, "Should not raise error"
```
With `materialize_grads`, d2ydx2 is 0 instead of None, enabling d3ydx3 to be calculated as defined in math without throwing an error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97015 Approved by: https://github.com/soulitzer	2023-03-18 03:11:12 +00:00
Luke Confait	46eaf4be7d	Fix Typo in pytorch/torch/autograd/__init__.py (#97024 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97024 Approved by: https://github.com/Skylion007, https://github.com/soulitzer	2023-03-17 16:24:18 +00:00
kshitij12345	3b966a6ce3	[autograd] disable backward/grad for complex scalar output (#92753 ) Fixes https://github.com/pytorch/pytorch/issues/92750 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92753 Approved by: https://github.com/ezyang	2023-02-23 11:38:27 +00:00
Brian Hirsh	2b36d35b9c	add torch.autograd._unsafe_set_version_counter API (#92924 ) better description coming soon (but this is meant to fix https://github.com/pytorch/pytorch/issues/91093) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92924 Approved by: https://github.com/ezyang, https://github.com/alanwaketan, https://github.com/albanD	2023-02-11 21:07:08 +00:00
Brian Hirsh	83275d8cdf	add torch.autograd._set_view_replay_enabled, use in aot autograd (#92588 ) tldr; this should fix some minor perf regressions that were caused by adding more as_strided() calls in aot autograd. This PR adds a new context manager, `torch.autograd._set_view_replay_enabled()`. Context: AOT Autograd has special handling for "outputs that alias graph intermediates". E.g. given this function:
```
def f(x):
    y = torch.mul(x, 2)
    out = y.view(-1)
    return out
```
AOT Autograd will do the following:
```
def fn_to_compile(x):
    y = torch.mul(x, 2)
    out = y.view(-1)
    # return the graph intermediate
    return y, out

compiled_fn = compile(fn_to_compile)

def wrapper(x):
    y, out = compiled_fn(x)
    # regenerate the alias of the graph intermediate
    return out._view_func(y)
```
What's annoying is that `out._view_func()` will result in a `.as_strided` call, because `out` is an ordinary runtime tensor. This (likely?) caused a perf regression, because when running the backward, our `as_strided_backward()` is slower than our `view_backward()`. In this PR, I added some TLS for instructing autograd to do view replay instead of as_strided, even when given a normal tensor. I'm definitely interested in thoughts from autograd folks (cc @albanD @soulitzer). A few points that I want to bring up: (1) One reason that this API seems generally useful to me is because of the case where you `torch.compile()` a function, and you pass in two inputs that alias each other, and mutate one of the inputs. Autograd is forced to add a bunch of as_strided() calls into the graph when this happens, but this would give users an escape hatch for better compiled perf in this situation. (2) To be fair, AOT Autograd probably won't need this TLS in the long term. There's a better (more complicated) solution, where AOT Autograd manually precomputes the view chain off of graph intermediates during tracing, and re-applies them at runtime.
This is kind of complicated though and feels lower priority to implement immediately. (3) Given all of that I made the API private, but lmk what you all think. This is a followup of https://github.com/pytorch/pytorch/pull/92255. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92588 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-02-08 01:48:32 +00:00
Elias Ellison	d04889323e	Add Context Manager for Disabling Multithreading in Backwards, use in aot autograd (#86245 ) We were running into a few issues with running multithreaded backwards in aot_autograd: such as https://github.com/pytorch/pytorch/issues/86136, and `FakeTensorMode` getting into a weird state as a result of not executing functions completely sequentially. The multithreaded backwards is lost in translation when we trace out the backwards anyway, and adds a lot of additional complexity. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86245 Approved by: https://github.com/albanD, https://github.com/yf225	2022-10-06 03:27:42 +00:00
Richard Zou	cd32a86bf2	Stop monkeypatching Tensor.backward() on `import functorch` (#85152 ) Monkeypatching is bad, we should never be doing it. This PR removes functorch's monkeypatching on Tensor.backward() by adding it directly to the implementation of Tensor.backward(). As an alternative, we could have done an `import functorch` and used `functorch._C.are_transforms_active` directly in `torch/autograd/__init__.py`. The problem with that is that it runs into a bunch of circular imports. NB: https://github.com/pytorch/pytorch/issues/72179 is still on my mind. I didn't choose to do it right now because: - This PR doesn't make the situation worse than it already is (no monkeypatching is better than having the monkeypatch) - We don't have a design for #72179 yet. Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/85152 Approved by: https://github.com/soulitzer	2022-09-19 17:06:15 +00:00
Taylor Robie	1fa9a377d0	[Profiler] Start moving python bindings out of autograd (#82584 ) A lot of profiler code still lives in autograd for historic reasons. However as we formalize and clean up profiler internals it makes sense to pull more and more into the profiler folders/namespace. For now I'm just moving some of the core config data structures and those related to `torch::profiler::impl::Result` to keep the scope manageable. Differential Revision: [D37961462](https://our.internmc.facebook.com/intern/diff/D37961462/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37961462/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/82584 Approved by: https://github.com/albanD, https://github.com/Gamrix	2022-08-19 17:15:18 +00:00
drisspg	b9f83cb737	use is_same_size in autograd init (#79553 ) Broke #79446 into a smaller commit that just adds is_same_size to the autograd `__init__` file. For regular tensors, is_same_size dispatches to the original behavior. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79553 Approved by: https://github.com/soulitzer	2022-06-15 19:49:42 +00:00
Animesh Jain	0bbcac58e3	Monkey patch Variable module to fix FX codegen Fixes https://github.com/facebookresearch/torchdynamo/issues/82 There is a `torch.autograd.variable` function which conflicts with the module class `torch.autograd.variable.Variable`. @jansel Pull Request resolved: https://github.com/pytorch/pytorch/pull/76079 Approved by: https://github.com/jansel, https://github.com/albanD	2022-04-21 23:07:51 +00:00
Edward Z. Yang	ee955b8bb9	Cannibalize noarch CI job into crossref CI job crossref is a new strategy for performing tests when you want to run a normal PyTorch API call, separately run some variation of the API call (e.g., same thing but all the arguments are meta tensors) and then cross-reference the results to see that they are consistent. Any logic you add to CrossRefMode will get run on every PyTorch API call that is called in the course of PyTorch's test suite. This can be a good choice for correctness testing if OpInfo testing is not exhaustive enough. For now, the crossref test doesn't do anything except verify that we can validly push a mode onto the torch function mode stack for all functions. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/75988 Approved by: https://github.com/seemethere	2022-04-20 11:56:25 +00:00
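A no-op torch function mode of the kind the crossref job pushes for every call can be sketched as below. This assumes `torch.overrides.TorchFunctionMode` as the base class; `CountingMode` and its `seen` list are hypothetical names for illustration, not the CrossRefMode from this commit. Every torch API call made inside the `with` block is routed through `__torch_function__`, which here records the call and forwards it unchanged.

```python
import torch
from torch.overrides import TorchFunctionMode

class CountingMode(TorchFunctionMode):
    """Hypothetical no-op mode that records which functions pass through it."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        self.seen.append(getattr(func, "__name__", repr(func)))
        # Forward the call unchanged; the mode is not re-entered for this call.
        return func(*args, **kwargs)

x = torch.ones(3)
with CountingMode() as mode:
    y = torch.add(x, 1)

print(y)                   # tensor([2., 2., 2.])
print("add" in mode.seen)  # True
```

A real cross-reference mode would, in addition, rerun `func` on a variant of the arguments (for example meta tensors) and compare the two results before returning.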
Haobo Yuan	45ac08b319	torch.autograd.grad needs an extra tuple when handling single outputs and is_grads_batched=True Fixes #75735 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75779 Approved by: https://github.com/soulitzer	2022-04-19 13:19:58 +00:00
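The single-output case this commit fixes can be illustrated by computing a full Jacobian in one call. This is a minimal sketch assuming the released keyword name `is_grads_batched` (introduced as `batched_grad` in #65564 further down this log). Each row of `grad_outputs` is one vector for a vector-Jacobian product, and the backward pass is vmapped over that leading dimension.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2  # a single (non-tuple) output

# One one-hot row per output element; batched VJPs give the rows of the Jacobian.
basis = torch.eye(3)
(jac,) = torch.autograd.grad(y, x, grad_outputs=basis, is_grads_batched=True)

# Expected: diag(2 * x) = diag([2., 4., 6.])
print(torch.allclose(jac, torch.diag(2 * x.detach())))  # True
```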
Edward Z. Yang	0a1bc5f501	Miscellaneous __torch_function__ fixes I figured these out by unconditionally turning on a no-op torch function mode on the test suite and then fixing errors as they showed up. Here's what I found: - _parse_to failed internal assert when __torch_function__'ed because it claims its name is "to" to the argument parser; added a name override so we know how to find the correct name - Infix operator magic methods on Tensor did not uniformly handle __torch_function__ and TypeError to NotImplemented. Now, we always do the __torch_function__ handling in _wrap_type_error_to_not_implemented and your implementation of __torch_function__ gets its TypeErrors converted to NotImplemented (for better or for worse; see https://github.com/pytorch/pytorch/issues/75462 ) - A few places tested whether a Tensor was Tensor-like in the wrong way; they now use is_tensor_like (in grad and in distributions). Also update docs for has_torch_function to push people to use is_tensor_like. - is_grads_batched was dropped from grad in handle_torch_function, now fixed - Report that you have a torch function even if torch function is disabled if a mode is enabled. This makes it possible for a mode to return NotImplemented, pass to a subclass which does some processing and then pass back to the mode even after the subclass disables __torch_function__ (so the tensors are treated "as if" they are regular Tensors). This brings the C++ handling behavior in line with the Python behavior. - Make the Python implementation of overloaded types computation match the C++ version: when torch function is disabled, there are no overloaded types (because they all report they are not overloaded). Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/75484 Approved by: https://github.com/zou3519	2022-04-11 16:52:16 +00:00
Brian Coutinho	02ba0fa8e8	[pytorch profiler] enable iteration tracking for kineto (#72292 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72292 Integrates the libkineto step() method into the PyTorch profiler's step() invocation. This enables Kineto to track the iteration count and trigger trace collection on iteration boundaries from outside the process. Test Plan: ## Test using pytorch profiler step() method Modified the resnet integration test to use the PyTorch profiler. Configured it to capture 3 iterations:
```
ACTIVITIES_COMPRESSION_ALGORITHM=GZIP
ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1643063194/127.0.0.1/
PROFILE_START_ITERATION=200
ACTIVITIES_WARMUP_ITERATIONS=1
ACTIVITIES_ITERATIONS=3
```
Ran `dyno gputrace -gpuconf /tmp/kineto_pytorch.conf`. The output trace has iterations 202, 203, 204 :) One iteration is skipped due to warmup. (It is also off by one due to 0- vs 1-based indexing.) [Trace link](https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree%2Ftraces%2Fdynocli%2F0%2F1643063194%2F127.0.0.1%2Flibkineto_activities_501743.json.gz&bucket=gpu_traces) {F695716262} Reviewed By: robieta Differential Revision: D33825241 fbshipit-source-id: 70983420cf47ebbac7b44bfb6494d314506302c5 (cherry picked from commit 96c06ecc9a80512d85b8941f195360f41d74103f)	2022-03-23 02:31:45 +00:00
Victor Quach	a3b7dd7b78	Enable nested default hooks (#70932 ) Summary: When default hooks are set, they are pushed onto a stack. When context managers are nested, only the innermost hooks are applied. Special care is needed when updating the TLS code. See also https://github.com/pytorch/pytorch/issues/70940 (i.e. do we need to be storing the enabled flag as well?) Fixes https://github.com/pytorch/pytorch/issues/70134 Pull Request resolved: https://github.com/pytorch/pytorch/pull/70932 Reviewed By: mruberry Differential Revision: D33530370 Pulled By: albanD fbshipit-source-id: 3197d585d77563f36c175d3949115a0776b309f4	2022-01-11 15:03:49 -08:00
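The "innermost hooks win" rule can be observed directly. This is a minimal sketch assuming the public `torch.autograd.graph.saved_tensors_hooks` context manager; the `make_hooks` factory and the `calls` log are hypothetical helpers. `x * x` saves its inputs for backward, so the pack hook fires during the forward pass and the unpack hook fires during `backward()`.

```python
import torch
from torch.autograd.graph import saved_tensors_hooks

calls = []

def make_hooks(tag):
    # Hypothetical factory: identity hooks that log which pair was used.
    def pack(t):
        calls.append(f"pack:{tag}")
        return t

    def unpack(t):
        calls.append(f"unpack:{tag}")
        return t

    return pack, unpack

x = torch.randn(3, requires_grad=True)

with saved_tensors_hooks(*make_hooks("outer")):
    with saved_tensors_hooks(*make_hooks("inner")):
        y = (x * x).sum()  # mul saves x (twice) for backward

y.backward()

# Only the innermost pair ran, both when saving and when unpacking.
print(sorted(set(calls)))  # ['pack:inner', 'unpack:inner']
```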
Nikolay Korovaiko	9e175400ac	Moving python binding to _C and its decl to the right pyi file (#67365 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/67365 Reviewed By: malfet, albanD Differential Revision: D31972163 Pulled By: Krovatkin fbshipit-source-id: e5313c2c8cb810b57b7fe16af8ba26edbe486488	2021-10-27 17:33:45 -07:00
Nikita Shulga	d691bc1207	Revert D31937065: [pytorch][PR] fix binding to the wrong python module Test Plan: revert-hammer Differential Revision: D31937065 (`7ac8ed741d`) Original commit changeset: 5c10b2870bcc fbshipit-source-id: 9b21ffea8054b8a3a0b96e1b78e933f8654e7f2f	2021-10-26 17:40:59 -07:00
Nikolay Korovaiko	7ac8ed741d	fix binding to the wrong python module (#67246 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/67246 Reviewed By: zhxchen17 Differential Revision: D31937065 Pulled By: Krovatkin fbshipit-source-id: 5c10b2870bccece50ba52dde26127da79bccbba6	2021-10-26 17:19:02 -07:00
Nikolay Korovaiko	1f55dd83ac	[WIP] wrap XLATensors into Python XLA wrapper class (#65841 ) Summary: Improbably fixes https://github.com/pytorch/pytorch/issues/65130 ezyang I'm super n00b in Python extensions, is this what we want to do? Pull Request resolved: https://github.com/pytorch/pytorch/pull/65841 Reviewed By: navahgar Differential Revision: D31889790 Pulled By: Krovatkin fbshipit-source-id: c7f077b89f6f02df1962ab83d9e13fcc348a227d	2021-10-25 16:11:03 -07:00
Louis Feng	ecb7b38c00	[PyTorch] Support additional arguments in Python record function (#65736 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65736 We ran into some limitations when extracting PyTorch operator parameters through hooks or the execution graph. Some of these limitations are not due to the operator failing to expose them; rather, in some cases (like an embedding table) the inputs to these operators have already been fused or processed. We want to be able to attach metadata to user-scope record functions so that profilers can later extract this information. The record function C++ API already supports taking input and output information, but the corresponding Python interface does not and only accepts a string name as the record function parameter. This diff lets users optionally add arguments to the record function in two ways. 1. To remain backward compatible with `record_function_op`, we have added an optional string arg to the interface: `with record_function(name, arg_str)`. 2. To support a data dependency graph, we also have the new `torch.autograd._record_function_with_args_enter` and `torch.autograd._record_function_with_args_exit` functions, which provide an interface for passing additional tensor arguments. For now, we imagine this can be used for debugging or analysis purposes. In this form, we currently support some basic data types as inputs: scalars, strings, lists, and tensors. Example usage:
```
# record_function operator with a name and, optionally, a string for arguments.
with record_function("## TEST 1 ##", "[1, 2, 3]"):
    <actual module or operator>

# more general form of record_function
a = _record_function_with_args_enter("## TEST 2 ##", 1, False, 2.5, [u, u], "hello", u)
<actual module or operator>
_record_function_with_args_exit(a)
```
Corresponding outputs in execution graph: ``` { "name": "## TEST 2 ##", "id": 7, "parent": 3, "fw_parent": 0, "scope": 5, "tid": 1, "fw_tid": 0, "inputs": [1,false,2.5,[6,6],"hello",6], "input_shapes": [[],[],[],[[3,4,5],[3,4,5]],[],[3,4,5]], "input_types": ["Int","Bool","Double","GenericList[Tensor(float),Tensor(float)]","String","Tensor(float)"], "outputs": [], "output_shapes": [], "output_types": [] }, { "name": "## TEST 1 ##", "id": 3, "parent": 2, "fw_parent": 0, "scope": 5, "tid": 1, "fw_tid": 0, "inputs": ["1, 2, 3"], "input_shapes": [[]], "input_types": ["String"], "outputs": [], "output_shapes": [], "output_types": [] }, ``` Test Plan: ``` => buck build caffe2/test:profiler --show-output => buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction test_record_function (test_profiler.TestRecordFunction) ... 
Log file: /tmp/libkineto_activities_1651304.json Net filter: Target net for iteration count: Net Iterations: 3 INFO:2021-09-27 01:10:15 1651304:1651304 Config.cpp:424] Trace start time: 2021-09-27 01:10:30 Trace duration: 500ms Warmup duration: 5s Net size threshold: 0 GPU op count threshold: 0 Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event Manifold bucket: gpu_traces Manifold object: tree/traces/clientAPI/0/1632730215/devvm2060.ftw0/libkineto_activities_1651304.json Trace compression enabled: 1 INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:536] Tracing starting in 14s INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:48] Target net for iterations not specified - picking first encountered that passes net filter INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:57] Tracking net PyTorch Profiler for 3 iterations INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:126] Processing 1 CPU buffers INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:686] Recorded nets: INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:689] PyTorch Profiler: 1 iterations ok ---------------------------------------------------------------------- Ran 1 test in 0.021s OK ``` Reviewed By: gdankel Differential Revision: D31165259 fbshipit-source-id: 15920aaef7138c666e5eca2a71c3bf33073eadc4	2021-10-13 01:49:15 -07:00
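The backward-compatible string form described in this commit can be tried with the public profiler. This is a minimal sketch assuming `torch.autograd.profiler.record_function(name, args=None)` and `torch.profiler.profile`; the private `_record_function_with_args_enter`/`_exit` pair is deliberately left out. The optional second argument is a free-form string attached to the range, and the range itself should then show up among the recorded events under its name.

```python
import torch
from torch.autograd.profiler import record_function
from torch.profiler import ProfilerActivity, profile

x = torch.randn(8, 8)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    # Name plus an optional string of arguments, as in the commit's TEST 1 case.
    with record_function("## TEST 1 ##", "[1, 2, 3]"):
        y = x @ x

# The user-scope range is reported alongside the operator events it wraps.
names = {evt.name for evt in prof.events()}
print("## TEST 1 ##" in names)  # True
```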
soulitzer	73901b099d	Add batched_grad parameter to `autograd.grad` (#65564 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65564 - wrap the call into engine with vmap if `batched_grad` is `True` - improves the comment on the call to engine (somewhat addressing https://github.com/pytorch/pytorch/issues/41659) - borrows the message from functional.jacobian's vectorized argument concerning usage of the vmap feature - adds basic test (further testing is done when we replace the usage in vectorized jacobian computation) TODO: - create an issue tracking this Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D31236259 Pulled By: soulitzer fbshipit-source-id: b33e6b26ea98fa9f70c44da08458fc54ba4df0f7	2021-10-03 19:55:06 -07:00
北海若	efe01c59e3	[Doc] Deprecation notice for only_inputs argument (#63631 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/63544. Changed docstring accordingly. I'm new here, not sure if the style is okay. Please check. Pull Request resolved: https://github.com/pytorch/pytorch/pull/63631 Reviewed By: ejguan Differential Revision: D30459439 Pulled By: soulitzer fbshipit-source-id: 8df3c509d1dd39764815b099ab47229550126cbe	2021-08-20 15:49:49 -07:00


141 Commits