pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Aaron Gokaslan	29cc293725	[BE]: FURB142 - Remove set mutations. Use set update (#124551 ) Uses set mutation methods instead of manually reimplementing (update, set_difference etc). Pull Request resolved: https://github.com/pytorch/pytorch/pull/124551 Approved by: https://github.com/ezyang	2024-04-21 14:12:33 +00:00
Animesh Jain	febc4d8759	[dynamo][easy] forbid_in_graph check to use getattr_static (#124445 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124445 Approved by: https://github.com/yanboliang, https://github.com/jansel	2024-04-20 14:11:05 +00:00
soulitzer	cf5ca58e7f	[NJT] Inline through torch.nested.nested_tensor_from_jagged instead of graph break (#124343 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124343 Approved by: https://github.com/jbschlosser	2024-04-19 23:13:59 +00:00
PyTorch MergeBot	4a0900d04b	Revert "[NJT] Inline through torch.nested.nested_tensor_from_jagged instead of graph break (#124343 )" This reverts commit `ef93402f61`. Reverted https://github.com/pytorch/pytorch/pull/124343 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/124343#issuecomment-2064937192))	2024-04-18 18:55:48 +00:00
Jason Ansel	7a6edb0b66	Possible fix for einops warning (#124084 ) See https://github.com/arogozhnikov/einops/issues/315 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124084 Approved by: https://github.com/peterbell10	2024-04-18 17:09:50 +00:00
soulitzer	ef93402f61	[NJT] Inline through torch.nested.nested_tensor_from_jagged instead of graph break (#124343 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124343 Approved by: https://github.com/jbschlosser	2024-04-18 14:42:54 +00:00
Aleksandar Samardžić	f5331aade5	Simplify ATen sparse semi-structured operators based on CUTLASS (#123473 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123473 Approved by: https://github.com/cpuhrsch	2024-04-14 06:57:41 +00:00
PyTorch MergeBot	97261be0a8	Revert "Simplify ATen sparse semi-structured operators based on CUTLASS (#123473 )" This reverts commit `b2a0b8c446`. Reverted https://github.com/pytorch/pytorch/pull/123473 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/123473#issuecomment-2053561077))	2024-04-13 07:47:32 +00:00
rzou	5d1f9bd2bc	Move the trace_rules.py docs up (#123873 ) I always remember that the docs exist but cannot actually find it in the file because it is on line 3000. Moving it to the top of the file for visibility. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123873 Approved by: https://github.com/yanboliang	2024-04-12 20:18:38 +00:00
Aleksandar Samardžić	b2a0b8c446	Simplify ATen sparse semi-structured operators based on CUTLASS (#123473 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123473 Approved by: https://github.com/cpuhrsch	2024-04-11 11:56:27 +00:00
Guilherme Leobas	2a37793249	[Dynamo] Ensure that Higher Order Ops can be composed in dynamo (#123357 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123357 Approved by: https://github.com/zou3519 ghstack dependencies: #122211	2024-04-09 18:50:17 +00:00
Guilherme Leobas	dbe0c474a9	Ensure all `torch.func.*` functions capture can be disabled (#122212 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122212 Approved by: https://github.com/zou3519 ghstack dependencies: #122211	2024-04-05 03:29:11 +00:00
Joel Schlosser	721dcaff94	Revert usage of NJT views in SDPA (#123215 ) For internal purposes, this PR reverts the use of real views in SDPA -> autograd.Function "views" (i.e. `ViewBufferFromNested` and `ViewNestedFromBuffer`). This is a temporary fix to get the FIRST model launched and working. Note: this breaks some other Dynamo tests related to SDPA that rely on real views, but the breakage there isn't expected to be likely in a real-world scenario. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123215 Approved by: https://github.com/YuqingJ	2024-04-04 18:45:47 +00:00
PyTorch MergeBot	63d17d3c90	Revert "Revert usage of NJT views in SDPA (#123215 )" This reverts commit `0fcddb5625`. Reverted https://github.com/pytorch/pytorch/pull/123215 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but I think it needs to be skipped on ROCm `0fcddb5625` ([comment](https://github.com/pytorch/pytorch/pull/123215#issuecomment-2036080570))	2024-04-04 02:57:09 +00:00
Joel Schlosser	0fcddb5625	Revert usage of NJT views in SDPA (#123215 ) For internal purposes, this PR reverts the use of real views in SDPA -> autograd.Function "views" (i.e. `ViewBufferFromNested` and `ViewNestedFromBuffer`). This is a temporary fix to get the FIRST model launched and working. Note: this breaks some other Dynamo tests related to SDPA that rely on real views, but the breakage there isn't expected to be likely in a real-world scenario. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123215 Approved by: https://github.com/YuqingJ	2024-04-03 23:25:31 +00:00
rzou	44c0c0fc0f	Add torch.library.custom_op (#122344 ) This is the entrypoint for defining an opaque/blackbox (e.g. PyTorch will never peek into it) custom op. In this PR, you can specify backend impls and the abstract impl for this op. NB: most of this PR is docstrings, please don't be intimidated by the line count. There are a number of interesting features: - we infer the schema from type hints. In a followup I add the ability to manually specify a schema. - name inference. The user needs to manually specify an op name for now. In a followup we add the ability to automatically infer a name (this is a little tricky). - custom_op registrations can override each other. This makes them more pleasant to work with in environments like colab. - we require that the outputs of the custom_op do not alias any inputs or each other. We enforce this via a runtime check, but can relax this into an opcheck test if it really matters in the future. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/122344 Approved by: https://github.com/ezyang, https://github.com/albanD	2024-04-03 18:36:17 +00:00
willfengg	f1c4d0fb2c	[dynamo] show inlining reasons from trace_rules (#123014 ) show specific inlining reasons with ``TORCH_LOGS="+dynamo" TORCHDYNAMO_VERBOSE=1`` * before, ``INLINING <code...>, inlined according trace_rules.lookup`` * after, ``INLINING <code...> inlined according trace_rules.lookup MOD_INLINELIST`` this can distanguish between inlining by default or by MOD_INLINELIST (specific rule) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123014 Approved by: https://github.com/jansel ghstack dependencies: #123013	2024-04-02 03:04:22 +00:00
willfengg	d765e223ac	[dynamo][PT2D] avoid skipping dynamo_resume_* in torch/testing/_internal (#123013 ) this PR ensures ``dynamo_resume_`` survives ``trace_rules.py``. As a ground truth, modules defined outside of ``pytorch/torch`` folders can survive ``trace_rules.py`` Pull Request resolved: https://github.com/pytorch/pytorch/pull/123013 Approved by: https://github.com/jansel	2024-04-01 21:12:48 +00:00
Yu, Guangye	eb7adc3ae0	Refactor gpu trace to be device-agnostic (#121794 ) # Motivation Refactor gpu trace to be device-agnostic. gpu trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator. It should be device-agnostic and can be shared among each device backend. # Solution move `_cuda_trace.py` to `_gpu_trace.py`, which makes each device backend owns their callback, respectively. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121794 Approved by: https://github.com/jgong5, https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui	2024-03-30 13:04:38 +00:00
Mikayla Gawarecki	487b6d40ec	Add RMSNorm module (#121364 ) Similar to `dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L51)` The implementation here is not optimized and we welcome pull requests to improve this - Use `normalized_shape` instead of singular integer `dim` to be aligned with the `nn.LayerNorm` implementation - Remove the [upcast to float and downcast ](`dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L73)`) Differential Revision: [](https://our.internmc.facebook.com/intern/diff/) Differential Revision: [D55485840](https://our.internmc.facebook.com/intern/diff/D55485840) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121364 Approved by: https://github.com/albanD	2024-03-29 18:05:28 +00:00
willfengg	35c56f85fd	[dynamo][pt2d] avoid skipping modules from torch/testing/_internal (#122851 ) Dynamo skips user defined modules from `torch/testing/_internal` (eg MLP, Transformer). This PR adds `torch/testing/_internal/...` to `manual_torch_name_rule_map`. It ensures FSDP CI + torch.compile are meaningfully tested unit test shows frame count = 0 before and frame count > 0 after ```pytest test/dynamo/test_trace_rules.py -k test_module_survive_skip_files``` some FSDP unit tests actually start to compile modules with this change. add trition availability check or disable tests for now Pull Request resolved: https://github.com/pytorch/pytorch/pull/122851 Approved by: https://github.com/jansel	2024-03-29 06:42:06 +00:00
Guilherme Leobas	71b5b7e081	Let dynamo trace some functions in functorch.deprecated.* namespace (#121665 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121665 Approved by: https://github.com/zou3519 ghstack dependencies: #121410	2024-03-28 18:50:43 +00:00
PyTorch MergeBot	8698121636	Revert "Add RMSNorm module (#121364 )" This reverts commit `a7306de0dc`. Reverted https://github.com/pytorch/pytorch/pull/121364 on behalf of https://github.com/atalman due to Broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/121364#issuecomment-2025502007))	2024-03-28 15:31:10 +00:00
PyTorch MergeBot	6e1c81c687	Revert "Let dynamo trace some functions in functorch.deprecated.* namespace (#121665 )" This reverts commit `f9eab9ca92`. Reverted https://github.com/pytorch/pytorch/pull/121665 on behalf of https://github.com/guilhermeleobas due to revert PR ([comment](https://github.com/pytorch/pytorch/pull/121665#issuecomment-2025460500))	2024-03-28 15:11:51 +00:00
Guilherme Leobas	f9eab9ca92	Let dynamo trace some functions in functorch.deprecated.* namespace (#121665 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121665 Approved by: https://github.com/zou3519 ghstack dependencies: #121410	2024-03-28 15:07:18 +00:00
Guilherme Leobas	933d3a7829	Allow dynamo to inline through "hessian" (#121410 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121410 Approved by: https://github.com/zou3519	2024-03-27 21:39:37 +00:00
Mikayla Gawarecki	a7306de0dc	Add RMSNorm module (#121364 ) Similar to `dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L51)` The implementation here is not optimized and we welcome pull requests to improve this - Use `normalized_shape` instead of singular integer `dim` to be aligned with the `nn.LayerNorm` implementation - Remove the [upcast to float and downcast ](`dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L73)`) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121364 Approved by: https://github.com/albanD	2024-03-27 21:39:30 +00:00
Animesh Jain	7281c5afdc	[dynamo][fbcode][torchrec] Selectively inline torchrec/distributed/types.py (#122716 ) Manually verified for the internal model. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122716 Approved by: https://github.com/jansel ghstack dependencies: #122646, #122647	2024-03-27 19:40:46 +00:00
rzou	c81c9ba472	Disallow {FakeTensor,FunctionalTensor}.data_ptr (#122514 ) This PR: - disallows FakeTensor.data_ptr when it is called inside PT2 or fx tracing. - disallows FunctionalTensor.data_ptr (python FunctionalTensor is only used in PT2) The motivation behind this is that the leading cause of segfaults when using custom ops with PT2 is calling .data_ptr on FunctionalTensor or FakeTensor. This change is BC-breaking. If your code broke as a result of this, it's because there was a bug in it (these .data_ptr should never be accessed!). You can either fix the bug (recommended) or get the previous behavior back with: ``` from torch._subclasses.fake_tensor import FakeTensor from torch._subclasses.functional_tensor import FunctionalTensor data_ptr = 0 if isinstance(tensor, (FakeTensor, FunctionalTensor)) else tensor.data_ptr() ``` Test Plan: - existing tests Differential Revision: [D55366199](https://our.internmc.facebook.com/intern/diff/D55366199) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122514 Approved by: https://github.com/ezyang, https://github.com/albanD, https://github.com/yifuwang, https://github.com/kurtamohler	2024-03-26 23:55:42 +00:00
Guilherme Leobas	99c822c0ba	Let dynamo inline through jacfwd (#121254 ) Similar to #121146, changes are simple and don't require any fancy modification to the codebase. Moved a few entries on trace_rules.py and added tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121254 Approved by: https://github.com/zou3519 ghstack dependencies: #120338	2024-03-26 12:43:30 +00:00
Peter Y Yeh	46a76cfef5	[ROCm] Fix test_trace_rule_update.py (#121524 ) -Add missing torch API to trace rules and ignore API with manual trace rule. The PR fix test/dynamo/test_trace_rule_update maybe related to https://github.com/pytorch/pytorch/pull/121142 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121524 Approved by: https://github.com/jansel, https://github.com/pruthvistony	2024-03-25 20:53:24 +00:00
Guilherme Leobas	4eaa000acc	Teach dynamo about torch.func.jvp (#119926 ) List of changes: - Replace JVP_NESTING by torch._C._functorch.maybe_current_level() - Remove all increment nesting functions from wrap_fx_proxy_cls - fwAD.make_dual receives the dual_level as keyword argument - Add jvp_increment_nesting, set_fwd_grad_enabled and dual_level context managers to dynamo Pull Request resolved: https://github.com/pytorch/pytorch/pull/119926 Approved by: https://github.com/zou3519	2024-03-22 20:25:47 +00:00
Joel Schlosser	cd6bfc7965	Proper view support for jagged layout NestedTensor (#113279 ) This PR: * Introduces an ATen op for creating true jagged views from a dense values buffer * `_nested_view_from_jagged(values, offsets, lengths, ragged_idx, dummy)` * This ops is implemented on the Python side using torch.library so we can return a subclass instance * `jagged_from_list()` now uses this instead of the old autograd.Function `NestedViewFromBuffer` * The latter op is used for non-contiguous JTs returned via `torch.nested.narrow()` * `dummy` is an awful hack to ensure that `NestedTensor.__torch_dispatch__()` is invoked for our view * Introduces an ATen op for accessing the `values` component of an NT via a view * `_nested_get_values(nt)` * Removes the autograd.Functions `ViewNestedFromBuffer` and `ViewBufferFromNested` in favor of `nested_from_values_offsets()` / `nested_from_values_offsets_lengths()` and `nt.values()`, respectively. * Changes test code to prefer `as_nested_tensor()` over `jagged_from_list()` directly * Similarly, avoid `buffer_from_jagged()`, preferring `values()` * Depends on general subclass view fake-ification on the PT2 side (handled solely in previous PRs in the stack) With these changes, the semantics of jagged layout NTs are such that they are considered a true view of the underlying `values` buffer. This means views of jagged NTs are views of the underlying buffer as well, simplifying some handling. Differential Revision: [D54269922](https://our.internmc.facebook.com/intern/diff/D54269922) Co-authored-by: voznesenskym <voznesenskym@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/113279 Approved by: https://github.com/ezyang	2024-03-22 02:12:36 +00:00
PyTorch MergeBot	224beecee6	Revert "Proper view support for jagged layout NestedTensor (#113279 )" This reverts commit `5855c490f0`. Reverted https://github.com/pytorch/pytorch/pull/113279 on behalf of https://github.com/jbschlosser due to Need to fix BC thing ([comment](https://github.com/pytorch/pytorch/pull/113279#issuecomment-2013899762))	2024-03-21 22:03:01 +00:00
PyTorch MergeBot	968c4c4154	Revert "Refactor gpu trace to be device-agnostic (#121794 )" This reverts commit `74deacbf31`. Reverted https://github.com/pytorch/pytorch/pull/121794 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it breaks ROCm jobs in trunk `74deacbf31`, please help take a look and reland the change ([comment](https://github.com/pytorch/pytorch/pull/121794#issuecomment-2013674083))	2024-03-21 20:33:17 +00:00
Yu, Guangye	74deacbf31	Refactor gpu trace to be device-agnostic (#121794 ) # Motivation Refactor gpu trace to be device-agnostic. gpu trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator. It should be device-agnostic and can be shared among each device backend. # Solution move `_cuda_trace.py` to `_gpu_trace.py`, which makes each device backend owns their callback, respectively. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121794 Approved by: https://github.com/jgong5, https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui	2024-03-21 01:52:58 +00:00
Joel Schlosser	5855c490f0	Proper view support for jagged layout NestedTensor (#113279 ) This PR: * Introduces an ATen op for creating true jagged views from a dense values buffer * `_nested_view_from_jagged(values, offsets, lengths, ragged_idx, dummy)` * This ops is implemented on the Python side using torch.library so we can return a subclass instance * `jagged_from_list()` now uses this instead of the old autograd.Function `NestedViewFromBuffer` * The latter op is used for non-contiguous JTs returned via `torch.nested.narrow()` * `dummy` is an awful hack to ensure that `NestedTensor.__torch_dispatch__()` is invoked for our view * Introduces an ATen op for accessing the `values` component of an NT via a view * `_nested_get_values(nt)` * Removes the autograd.Functions `ViewNestedFromBuffer` and `ViewBufferFromNested` in favor of `nested_from_values_offsets()` / `nested_from_values_offsets_lengths()` and `nt.values()`, respectively. * Changes test code to prefer `as_nested_tensor()` over `jagged_from_list()` directly * Similarly, avoid `buffer_from_jagged()`, preferring `values()` * Depends on general subclass view fake-ification on the PT2 side (handled solely in previous PRs in the stack) With these changes, the semantics of jagged layout NTs are such that they are considered a true view of the underlying `values` buffer. This means views of jagged NTs are views of the underlying buffer as well, simplifying some handling. Differential Revision: [D54269922](https://our.internmc.facebook.com/intern/diff/D54269922) Co-authored-by: voznesenskym <voznesenskym@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/113279 Approved by: https://github.com/ezyang	2024-03-20 23:45:34 +00:00
PyTorch MergeBot	0696db8202	Revert "Teach dynamo about torch.func.jvp (#119926 )" This reverts commit `17489784b6`. Reverted https://github.com/pytorch/pytorch/pull/119926 on behalf of https://github.com/peterbell10 due to broken mac jobs on main ([comment](https://github.com/pytorch/pytorch/pull/119926#issuecomment-2010327997))	2024-03-20 18:34:43 +00:00
Guilherme Leobas	17489784b6	Teach dynamo about torch.func.jvp (#119926 ) List of changes: - Replace JVP_NESTING by torch._C._functorch.maybe_current_level() - Remove all increment nesting functions from wrap_fx_proxy_cls - fwAD.make_dual receives the dual_level as keyword argument - Add jvp_increment_nesting, set_fwd_grad_enabled and dual_level context managers to dynamo Pull Request resolved: https://github.com/pytorch/pytorch/pull/119926 Approved by: https://github.com/zou3519	2024-03-20 13:09:19 +00:00
PyTorch MergeBot	36e5c1dcab	Revert "Teach dynamo about torch.func.jvp (#119926 )" This reverts commit `edd04b7c16`. Reverted https://github.com/pytorch/pytorch/pull/119926 on behalf of https://github.com/jeanschmidt due to lots of breakages in pull jobs, checking if reverting this one will help ([comment](https://github.com/pytorch/pytorch/pull/119926#issuecomment-2007915919))	2024-03-19 18:59:46 +00:00
PyTorch MergeBot	f9ed1c432d	Revert "Refactor gpu trace to be device-agnostic (#121794 )" This reverts commit `0ff1109e26`. Reverted https://github.com/pytorch/pytorch/pull/121794 on behalf of https://github.com/jeanschmidt due to Reverting to see if rocm trunk errors are related ([comment](https://github.com/pytorch/pytorch/pull/121794#issuecomment-2007519408))	2024-03-19 15:40:26 +00:00
Guilherme Leobas	edd04b7c16	Teach dynamo about torch.func.jvp (#119926 ) List of changes: - Replace JVP_NESTING by torch._C._functorch.maybe_current_level() - Remove all increment nesting functions from wrap_fx_proxy_cls - fwAD.make_dual receives the dual_level as keyword argument - Add jvp_increment_nesting, set_fwd_grad_enabled and dual_level context managers to dynamo Pull Request resolved: https://github.com/pytorch/pytorch/pull/119926 Approved by: https://github.com/zou3519	2024-03-19 13:06:42 +00:00
Yu, Guangye	0ff1109e26	Refactor gpu trace to be device-agnostic (#121794 ) # Motivation Refactor gpu trace to be device-agnostic. gpu trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator. It should be device-agnostic and can be shared among each device backend. # Solution move `_cuda_trace.py` to `_gpu_trace.py`, which makes each device backend owns their callback, respectively. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121794 Approved by: https://github.com/jgong5, https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui	2024-03-19 06:02:28 +00:00
andrewor14	773ae817f7	Batch Norm Consolidation (#116092 ) Summary: This commit simplifies the existing decomposition hierarchy of batch norm ops by adding a single, backend agnostic op: `batch_norm_with_update`. The existing hierarchy looks like: ``` aten.batch_norm -> aten._batch_norm_impl_index -> [ aten.native_batch_norm -> aten._native_batch_norm_legit (export only) -> _batch_norm_legit_cpu/cuda (kernels, export only) -> _batch_norm_cpu/cuda (kernels) ] OR [ aten.cudnn_batch_norm ] OR [ aten.miopen_batch_norm ] ``` Aside from complexity, an important problem with the above decomposition hierarchy is cuda numerics in export flows. We observed significantly worse convergence when training a mobilenetv2-like model when using the `_batch_norm_cuda` kernel instead of the `cudnn_batch_norm` kernel. This means users who export their models on CPU first then move the models to cuda later may silently see worse accuracies even when cudnn is installed, because they are using the worse kernel. This issue is summarized in https://github.com/pytorch/pytorch/issues/111384. Instead, the new hierarchy proposed by consolidating existing batch norm ops will look like: ``` aten.batch_norm -> aten.batch_norm_with_update -> [ _batch_norm_cpu (kernel) ] OR [ _batch_norm_cuda (kernel) ] OR [ cudnn_batch_norm (kernel) ] OR [ miopen_batch_norm (kernel) ] ``` The new op `batch_norm_with_update` hides backend implementation details and automatically picks the right kernel based on what is installed. This commit also adds the following variants to this op: ``` batch_norm_with_update_functional batch_norm_with_update.out batch_norm_no_update batch_norm_no_update.out batch_norm_backward ``` Note that this commit only adds this op and its variants, but does not actually change the decomps to produce these ops in the graph. This will be done after the 2 week FC window, and the ops used in the old stack is planned to be removed after the 6 month BC window. Test Plan: `OpInfo` tests for `batch_norm_with_update`. Reviewers: albanD, bdhirsh Subscribers: albanD, bdhirsh, supriyar Tasks: https://github.com/pytorch/pytorch/issues/111384 Differential Revision: [D54805279](https://our.internmc.facebook.com/intern/diff/D54805279) Co-authored-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/116092 Approved by: https://github.com/bdhirsh, https://github.com/albanD	2024-03-18 21:01:30 +00:00
PyTorch MergeBot	fd0dbcd891	Revert "Batch Norm Consolidation (#116092 )" This reverts commit `7b4f70eda5`. Reverted https://github.com/pytorch/pytorch/pull/116092 on behalf of https://github.com/osalpekar due to Causes build failure in //caffe2:aten-hip (AMD build) target. See [D54707318](https://www.internalfb.com/diff/D54707318) for more details, may require internal build system changes to resolve. ([comment](https://github.com/pytorch/pytorch/pull/116092#issuecomment-1989542965))	2024-03-11 22:22:41 +00:00
Jason Ansel	7cc476ea16	[dynamo] Fix support for nn.Parameter constructor (part 1) (#120163 ) This captures calls to `torch.nn.Parameter` by lifting them to graph inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120163 Approved by: https://github.com/albanD, https://github.com/yanboliang ghstack dependencies: #121086	2024-03-11 05:14:42 +00:00
Jason Ansel	32488b0664	[dynamo] Support _unsafe_set_version_counter (#121086 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121086 Approved by: https://github.com/yanboliang	2024-03-11 05:14:42 +00:00
Boyuan Feng	35d3adb4b0	Add ATen Op _chunk_cat and _chunk_cat.out (#121081 ) # Motivation In backward of per-parameter sharding FSDP, each rank performs reduce scatter to sync gradients across ranks. A rank chunks each gradient tensor into `world_size` slices along the 0-th dimension and concatenate all slices along the 1-th dimension. Gradient tensors will be padded before concatenation when tensor.size(0) % world_size != 0. ### Example 1 Consider `world_size=3` and tensors A (2x4), B (3x3), C (1x2): Input tensors: ``` AAAA BBB CC AAAA BBB BBB ``` Reduce-scatter-copy-in Output: ``` AAAABBBCC AAAABBB00 0000BBB00 ``` ### Example 2 Consider `world_size=2` and tensors A (2x4), B (3x3), C(1x2), D(4x2): Input tensors: ``` AAAA BBB CC DD AAAA BBB 00 DD BBB DD 000 DD ``` Reduce-scatter-copy-in first pad: ``` AAAA BBB CC DD AAAA BBB 00 DD BBB DD 000 DD ``` Then chunk and cat along dim as the output: ``` AAAABBBBBBCCDDDD AAAABBB00000DDDD ``` The performance of reduce-scatter-copy-in is critical to per-parameter sharding FSDP. However, reduce-scatter-copy-in via composing existing ATen ops involves `cat` and irregular `pad`, leading redundant data copies and unsatisfactory performance. # PR We provide aten native support for reduce-scatter-copy-in, namely `_chunk_cat()`: ``` _chunk_cat(Tensor[] tensors, int dim, int num_chunks) -> Tensor ``` This PR includes the registration of `_chunk_cat` and `_chunk_cat.out`, OpInfo tests, and basic implementation composing existing ATen ops. In the next PR, we will add the CUDA implementation. Comparing with baselines of composing existing ATen ops, `_chunk_cat()` CUDA implementation improves copy bandwidth from 498 GB/s to 966 GB/s on a production benchmark. ## Requirements on input 1. If input tensors have different ndims, dim should be non-negative and be less than the ndims of every input tensors. If all input tensors have the same ndims, we support both negative and non-negative dim. 2. For wrapped_dim, all tensors should have the same size for 0,...,wrapped_dim-1 dimensions. No requirements for (wrapped_dim, ...)-th dimension. 3. Expect positive num_chunks 4. Expect non-empty input tensor list and each input tensor should have at least 1 element Pull Request resolved: https://github.com/pytorch/pytorch/pull/121081 Approved by: https://github.com/albanD	2024-03-08 21:48:12 +00:00
andrewor14	7b4f70eda5	Batch Norm Consolidation (#116092 ) Summary: This commit simplifies the existing decomposition hierarchy of batch norm ops by adding a single, backend agnostic op: `batch_norm_with_update`. The existing hierarchy looks like: ``` aten.batch_norm -> aten._batch_norm_impl_index -> [ aten.native_batch_norm -> aten._native_batch_norm_legit (export only) -> _batch_norm_legit_cpu/cuda (kernels, export only) -> _batch_norm_cpu/cuda (kernels) ] OR [ aten.cudnn_batch_norm ] OR [ aten.miopen_batch_norm ] ``` Aside from complexity, an important problem with the above decomposition hierarchy is cuda numerics in export flows. We observed significantly worse convergence when training a mobilenetv2-like model when using the `_batch_norm_cuda` kernel instead of the `cudnn_batch_norm` kernel. This means users who export their models on CPU first then move the models to cuda later may silently see worse accuracies even when cudnn is installed, because they are using the worse kernel. This issue is summarized in https://github.com/pytorch/pytorch/issues/111384. Instead, the new hierarchy proposed by consolidating existing batch norm ops will look like: ``` aten.batch_norm -> aten.batch_norm_with_update -> [ _batch_norm_cpu (kernel) ] OR [ _batch_norm_cuda (kernel) ] OR [ cudnn_batch_norm (kernel) ] OR [ miopen_batch_norm (kernel) ] ``` The new op `batch_norm_with_update` hides backend implementation details and automatically picks the right kernel based on what is installed. This commit also adds the following variants to this op: ``` batch_norm_with_update_functional batch_norm_with_update.out batch_norm_no_update batch_norm_no_update.out batch_norm_backward ``` Note that this commit only adds this op and its variants, but does not actually change the decomps to produce these ops in the graph. This will be done after the 2 week FC window, and the ops used in the old stack is planned to be removed after the 6 month BC window. Test Plan: `OpInfo` tests for `batch_norm_with_update`. Reviewers: albanD, bdhirsh Subscribers: albanD, bdhirsh, supriyar Tasks: https://github.com/pytorch/pytorch/issues/111384 Co-authored-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/116092 Approved by: https://github.com/bdhirsh, https://github.com/albanD	2024-03-08 15:07:15 +00:00
Ivan Zaitsev	5d8e4126b6	Fixup test_trace_rules (#121351 ) Summary: Fixes https://www.internalfb.com/intern/testinfra/diagnostics/7599824578133672.281475099376195.1709732674/ (for some reason this test didn't run in OSS)? Reached out to Yanbo Liang for additional context: {F1465435684} Test Plan: Local: https://www.internalfb.com/intern/testinfra/testconsole/testrun/16325548673376150/ Differential Revision: D54605075 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121351 Approved by: https://github.com/malfet, https://github.com/yanboliang	2024-03-08 00:50:45 +00:00

1 2 3

112 Commits