pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	b0d2fe6299	Revert "Short-term fix to preserve NJT metadata cache in torch.compile (#122836 )" This reverts commit `2a41fc0390`. Reverted https://github.com/pytorch/pytorch/pull/122836 on behalf of https://github.com/jbschlosser due to internal test failures with DEBUG=1 asserts ([comment](https://github.com/pytorch/pytorch/pull/122836#issuecomment-2177298245))	2024-06-19 00:28:53 +00:00
PyTorch MergeBot	5ffb032be6	Revert "Backward support for unbind() with NJT (#128032 )" This reverts commit `5dc4f652bc`. Reverted https://github.com/pytorch/pytorch/pull/128032 on behalf of https://github.com/jbschlosser due to reverting to revert parent PR ([comment](https://github.com/pytorch/pytorch/pull/128032#issuecomment-2177296325))	2024-06-19 00:26:40 +00:00
PyTorch MergeBot	99f042d336	Revert "Forward fix to skip ROCm tests for #122836 (#128891 )" This reverts commit `4061b3b822`. Reverted https://github.com/pytorch/pytorch/pull/128891 on behalf of https://github.com/jbschlosser due to reverting to revert parent PR ([comment](https://github.com/pytorch/pytorch/pull/128891#issuecomment-2177291249))	2024-06-19 00:21:21 +00:00
Joel Schlosser	5dc4f652bc	Backward support for unbind() with NJT (#128032 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/128032 Approved by: https://github.com/soulitzer	2024-06-18 20:29:00 +00:00
Joel Schlosser	4061b3b822	Forward fix to skip ROCm tests for #122836 (#128891 ) Fixes broken ROCm tests from #122836. Pull Request resolved: https://github.com/pytorch/pytorch/pull/128891 Approved by: https://github.com/huydhn ghstack dependencies: #127007, #128057, #122836	2024-06-18 03:01:19 +00:00
Joel Schlosser	2a41fc0390	Short-term fix to preserve NJT metadata cache in torch.compile (#122836 ) Idea: close over min / max sequence length in the main NJT view func (`_nested_view_from_jagged`) so that view replay during fake-ification propagates these correctly in torch.compile. For dynamic shapes support for min / max sequence length, this PR uses a hack that stores the values in `(val, 0)` shaped tensors. NB: This PR changes SDPA to operate on real views instead of using `buffer_from_jagged()` / `ViewNestedFromBuffer`, which may impact the internal FIRST model. That is, it undoes the partial revert from #123215 alongside a fix to the problem that required the partial revert. We need to verify that there are no regressions there before landing. Differential Revision: [D55448636](https://our.internmc.facebook.com/intern/diff/D55448636) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122836 Approved by: https://github.com/soulitzer ghstack dependencies: #127007, #128057	2024-06-17 15:25:09 +00:00
Joel Schlosser	9a8917fdbd	Naive CPU kernels for jagged <-> padded dense conversions (#127007 ) This PR introduces naive CPU impls for: * `_jagged_to_padded_dense_forward()` * `_padded_dense_to_jagged_forward()` On the CUDA side, these are backed by lifted FBGEMM kernels. We may want to revisit the CPU versions with higher-performance implementations at a later time. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127007 Approved by: https://github.com/davidberard98	2024-06-13 17:13:02 +00:00
Huy Do	6206da55ef	Fix lint after #119459 (#128558 ) TSIA Pull Request resolved: https://github.com/pytorch/pytorch/pull/128558 Approved by: https://github.com/atalman, https://github.com/kit1980, https://github.com/malfet	2024-06-12 22:11:37 +00:00
Joel Schlosser	67e6c76a18	Support apply_(callable) sugar for CPU NJTs (#125416 ) Example: ```python nt.apply_(lambda x: x * 2) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/125416 Approved by: https://github.com/soulitzer	2024-06-12 20:30:57 +00:00
Joel Schlosser	ec1fdda196	Fix jagged NT softmax semantics (#119459 ) Before: `softmax` definition uses `jagged_unary_pointwise()` (wrong) After: `softmax` impl adjusts the `dim` arg to account for the difference in dimensionality between the outer NT and the NT's `_values` Pull Request resolved: https://github.com/pytorch/pytorch/pull/119459 Approved by: https://github.com/soulitzer	2024-06-12 19:12:03 +00:00
yuqingj	3e09123797	Enable UFMT on test_nestedtensor.py (#128359 ) split it into two PRs since it is more than 2k lines of change Pull Request resolved: https://github.com/pytorch/pytorch/pull/128359 Approved by: https://github.com/davidberard98	2024-06-11 19:14:04 +00:00
Janani Sriram	c1a43a69e4	[NestedTensor] Add error checks for unbind operator coverage when ragged_idx != 1 (#128058 ) Summary: Add the following error checks for the `unbind` operator on `NestedTensor`s when `ragged_idx != 1`: - The current implementation allows the creation of `NestedTensor` instances from the class definition with an `offsets` tensor that applies to a dimension other than the jagged dimension. This diff ensures that `unbind` fails when the `offsets` exceed the length of the jagged dimension. Test Plan: Added the following unit tests: `test_unbind_with_lengths_ragged_idx_equals_2_bad_dim_cpu` verifies that `unbind` fails when there is a mismatch between the offsets and the jagged dimension, for `NestedTensor`s with `lengths`. ``` test_unbind_with_lengths_ragged_idx_equals_2_bad_dim_cpu (test_nestedtensor.TestNestedTensorSubclassCPU) ... ok ``` Reviewed By: davidberard98 Differential Revision: D57989082 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128058 Approved by: https://github.com/davidberard98	2024-06-06 01:56:12 +00:00
Joel Schlosser	b42cfcabc4	Lift jagged -> padded dense forward / backward kernels from fbgemm_gpu (#125946 ) PyTorch can't depend on `fbgemm_gpu` as a dependency because `fbgemm_gpu` already has a dependency on PyTorch. So this PR copy / pastes kernels from `fbgemm_gpu`: * `dense_to_jagged_forward()` as CUDA registration for new ATen op `_padded_dense_to_jagged_forward()` * `jagged_to_padded_dense_forward()` as CUDA registration for new ATen op `_jagged_to_padded_dense_forward()` CPU impls for these new ATen ops will be added in a follow-up PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125946 Approved by: https://github.com/davidberard98	2024-06-03 23:41:54 +00:00
Janani Sriram	7c3740d388	[NestedTensor] Extend coverage for unbind when ragged_idx != 1 (#127493 ) Summary: Extend coverage for the `NestedTensor` `unbind` operator to cases in which `ragged_idx != 1`. Currently, the `unbind` operator in the `NestedTensor` class splits a tensor along the 0-th dimension, where the `ragged_idx` property, which controls the jagged dimension upon which `unbind` splits, is 1. This diff extends support for `ragged_idx != 1` in `NestedTensor`s, allowing `unbind` to split a tensor along a jagged dimension greater than 0 for `NestedTensor`s with and without the `lengths` property. Test Plan: Added the following unit tests: `test_unbind_ragged_idx_equals_2_cpu`, `test_unbind_ragged_idx_equals_3_cpu`, and `test_unbind_ragged_idx_equals_last_dim_cpu` verify that `unbind` works for all jagged dimensions greater than 1, for `NestedTensor`s without `lengths`. ``` test_unbind_ragged_idx_equals_2_cpu (test_nestedtensor.TestNestedTensorSubclassCPU) ... ok test_unbind_ragged_idx_equals_3_cpu (test_nestedtensor.TestNestedTensorSubclassCPU) ... ok test_unbind_ragged_idx_equals_last_dim_cpu (test_nestedtensor.TestNestedTensorSubclassCPU) ... ok ``` `test_unbind_with_lengths_cpu` and `test_unbind_with_lengths_ragged_idx_equals_1_cpu` verify that `unbind` works when the jagged dimension is 1, for `NestedTensor`s with `lengths`. ``` test_unbind_with_lengths_cpu (test_nestedtensor.TestNestedTensorSubclassCPU) ... ok test_unbind_with_lengths_ragged_idx_equals_1_cpu (test_nestedtensor.TestNestedTensorSubclassCPU) ... ok ``` `test_unbind_with_lengths_ragged_idx_equals_2_cpu` and `test_unbind_with_lengths_ragged_idx_equals_3_cpu` verify that `unbind` works when the jagged dimension is greater than 1, for `NestedTensor`s with `lengths`. ``` test_unbind_with_lengths_ragged_idx_equals_2_cpu (test_nestedtensor.TestNestedTensorSubclassCPU) ... ok test_unbind_with_lengths_ragged_idx_equals_3_cpu (test_nestedtensor.TestNestedTensorSubclassCPU) ... ok ``` `test_unbind_with_lengths_ragged_idx_equals_0_cpu` verifies that `unbind` fails when the jagged dimension is 0 (the batch dimension), for `NestedTensor`s with `lengths`. ``` test_unbind_with_lengths_ragged_idx_equals_0_cpu (test_nestedtensor.TestNestedTensorSubclassCPU) ... ok ``` `test_unbind_with_lengths_ragged_idx_equals_2_bad_dim_cpu` verifies that `unbind` fails when there is a mismatch between the offsets and the jagged dimension, for `NestedTensor`s with `lengths`. ``` test_unbind_with_lengths_ragged_idx_equals_2_bad_dim_cpu (test_nestedtensor.TestNestedTensorSubclassCPU) ... ok ``` `test_unbind_with_wrong_lengths_cpu` verifies that `unbind` fails when the lengths exceed the limitations set by offsets, for `NestedTensor`s with `lengths`. ``` test_unbind_with_wrong_lengths_cpu (test_nestedtensor.TestNestedTensorSubclassCPU) ... ok ``` Differential Revision: D57942686 Pull Request resolved: https://github.com/pytorch/pytorch/pull/127493 Approved by: https://github.com/davidberard98	2024-06-03 17:46:12 +00:00
David Berard	f33beb767d	[NestedTensor] Use maybe_mark_dynamic instead of mark_dynamic (#127453 ) Fixes #127097 TL;DR: dimensions marked with mark_dynamic can result in assertion failures if the marked-dynamic dimensions get specialized. In NJT, we don't care _that_ much that a dimension is marked as dynamic. So instead, mark with `maybe_mark_dynamic` which suggests that a dimension should be dynamic, but doesn't fail if the dimension gets specialized. Background: NJT marks the values tensor as dynamic: `49ad90349d/torch/nested/_internal/nested_tensor.py (L122)` It does this for two reasons: 1. Conceptual: We know that this dimension _should_ be dynamic; it's a nested tensor, so the sequence lengths will _probably_ vary between batches in the common case. Therefore, we should compile it as dynamic to prevent needing a recompile to trigger automatic dynamic shapes. 2. Implementation detail: Right now we run into issues with torch.compile / tensor_unflatten / other details when the dimensions are not marked as dynamic. We have some attempts to remove this (e.g. https://github.com/pytorch/pytorch/pull/126563) but while testing this I wasn't able to get all tests to pass, so there could be potential regressions here if we removed the mark_dynamic. Justification for this change 1. Conceptual: AFAIK, we don't care enough about the dynamism of this dimension to error out if we specialize. We'd prefer that we don't have to recompile to get automatic dynamic shapes, but it's also better to not have this issue (and not to force the user to go hunt down all the other equivalent shapes to mark them as dynamic as well). This solution allows us to suggest the dynamism but not force it. 2. Implementation detail: This still marks the dimension as symbolic at the beginning of dynamo tracing, so we will (probably) avoid a lot of the issues we run into when we completely remove the `mark_dynamic` decorators. Differential Revision: [D57933779](https://our.internmc.facebook.com/intern/diff/D57933779) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127453 Approved by: https://github.com/soulitzer, https://github.com/YuqingJ	2024-05-31 21:32:12 +00:00
PyTorch MergeBot	c34f8c7f91	Revert "Lift jagged -> padded dense forward / backward kernels from fbgemm_gpu (#125946 )" This reverts commit `5e69e11d09`. Reverted https://github.com/pytorch/pytorch/pull/125946 on behalf of https://github.com/clee2000 due to sorry the Dr CI fix hasn't been merged yet and it's still failing `5e69e11d09` https://github.com/pytorch/pytorch/actions/runs/9228887299/job/25393895252 ([comment](https://github.com/pytorch/pytorch/pull/125946#issuecomment-2130305958))	2024-05-24 20:26:07 +00:00
Joel Schlosser	5e69e11d09	Lift jagged -> padded dense forward / backward kernels from fbgemm_gpu (#125946 ) PyTorch can't depend on `fbgemm_gpu` as a dependency because `fbgemm_gpu` already has a dependency on PyTorch. So this PR copy / pastes kernels from `fbgemm_gpu`: * `dense_to_jagged_forward()` as CUDA registration for new ATen op `_padded_dense_to_jagged_forward()` * `jagged_to_padded_dense_forward()` as CUDA registration for new ATen op `_jagged_to_padded_dense_forward()` CPU impls for these new ATen ops will be added in a follow-up PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125946 Approved by: https://github.com/davidberard98	2024-05-24 19:16:29 +00:00
PyTorch MergeBot	99a11efc8a	Revert "Lift jagged -> padded dense forward / backward kernels from fbgemm_gpu (#125946 )" This reverts commit `e2f081837f`. Reverted https://github.com/pytorch/pytorch/pull/125946 on behalf of https://github.com/clee2000 due to I think dr ci is wrong and the windows build failure is real `e2f081837f` https://github.com/pytorch/pytorch/actions/runs/9216826622/job/25357819877 ([comment](https://github.com/pytorch/pytorch/pull/125946#issuecomment-2128388126))	2024-05-24 02:37:46 +00:00
Joel Schlosser	e2f081837f	Lift jagged -> padded dense forward / backward kernels from fbgemm_gpu (#125946 ) PyTorch can't depend on `fbgemm_gpu` as a dependency because `fbgemm_gpu` already has a dependency on PyTorch. So this PR copy / pastes kernels from `fbgemm_gpu`: * `dense_to_jagged_forward()` as CUDA registration for new ATen op `_padded_dense_to_jagged_forward()` * `jagged_to_padded_dense_forward()` as CUDA registration for new ATen op `_jagged_to_padded_dense_forward()` CPU impls for these new ATen ops will be added in a follow-up PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125946 Approved by: https://github.com/davidberard98	2024-05-24 00:42:59 +00:00
eqy	5f64086d08	[NT][SDPA] Bump tolerances for `test_sdpa_with_packed_in_proj_cuda_bfloat16` (#126356 ) Current tolerances fail on RTX 6000 (Ada) with `Mismatched elements: 2 / 144 (1.4%)` ``` AssertionError: Tensor-likes are not close! Mismatched elements: 2 / 144 (1.4%) Greatest absolute difference: 0.002197265625 at index (5, 0, 0) (up to 0.001 allowed) Greatest relative difference: 0.08203125 at index (3, 0, 0) (up to 0.016 allowed) To execute this test, run the following from the base repo dir: python test/test_nestedtensor.py -k test_sdpa_with_packed_in_proj_cuda_bfloat16 This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 ---------------------------------------------------------------------- ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/126356 Approved by: https://github.com/drisspg	2024-05-21 03:25:30 +00:00
David Berard	82edc8b5d5	[NT] Make NestedTensor register as having symbolic sizes/strides (#124687 ) Fixes #123698 This PR makes TensorImpl::has_symbolic_sizes_strides return true for NestedTensors. 1. It passes in the actual sizes when we call `_make_wrapper_subclass` - this is the change that makes the subclass register as `has_symbolic_sizes_strides() == True` 2. It adds a field to `_make_wrapper_subclass` where an explicit `numel` can be provided. This allows us to skip the numel computation for the storage, which previously failed due to arithmetic on NestedInts. 3. Implements `aten::numel` for NJT - this is separate from the overridden numel in `make_wrapper_subclass` for now. Note also that this means that we leave `dispatch_sizes_strides_policy="sizes"`, so that we call into the custom `numel` implementation (as well as `sizes` and `strides`), because `numel` cannot currently be computed from `sizes` for NJT. Note also that this depends on #121361, because calling TensorImpl::set_sizes_and_strides() tries to clone the sizes into the tensor, which means that we need `clone` to be implemented on NestedInt. Differential Revision: [D57225736](https://our.internmc.facebook.com/intern/diff/D57225736) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124687 Approved by: https://github.com/albanD	2024-05-13 16:50:25 +00:00
soulitzer	4440d0755a	Support custom layout call under torch dispatch mode (#125379 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/125379 Approved by: https://github.com/jbschlosser	2024-05-02 23:44:12 +00:00
yuqingj	9a70e7f58c	[Nested Tensor]Add unit test that cover the internal use cases (#124880 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124880 Approved by: https://github.com/jbschlosser	2024-04-25 05:04:27 +00:00
David Berard	868e5ced5d	[Dispatcher] Collect autograd sequence numbers on PythonTLSSnapshot dispatch keys (#123304 ) Fixes #121758 TL;DR: When profiling is turned on, the dispatcher will sometimes attach the autograd sequence number to the recorded profiler event. This PR expands the set of profiler events onto which we attach sequence numbers. Before, we'd only attach a sequence number if the current dispatch key was an Autograd dispatch key. Now, we attach a sequence number if the current dispatch key set contains Autograd. Context: The use case for this is torch.profiler for python subclasses. Autograd attaches a "sequence number" to all ops that it encounters during the forward pass. Then, the corresponding sequence number can be associated with a backward kernel when backward is executed. This is used by the profiler to associate the forward ops to the backward ops; a forward op and a backward op with the same sequence number are "linked" in some post-processing step. Prior to this PR, this profiler feature didn't work for python subclasses. The reason is that we don't collect profiler information for all the dispatches for a given kernel; we only dispatch the initial `call`, and not the subsequent `redispatch` invocations. Normally, an Autograd key (if we're running with autograd) is the highest dispatch key, so the initial `call` that we profile is an Autograd key, and we collect the sequence number. But when we're dealing with a python subclass, the first dispatch key is PythonTLSSnapshot, which eventually redispatches into Autograd. We don't record the Autograd sequence number in that case because we don't record redispatches. To fix this, this PR collects a sequence number whenever the dispatch key set contains an Autograd key. That means we might sometimes collect multiple events with the same sequence number, or possibly attach sequence numbers when we won't actually use them? (e.g. maybe if the initial dispatch key handler removes Autograd for some reason). Although this might be a bit confusing for users looking at the sequence_nr directly, I think the main use case is for the profiler to create fwd-bwd links; and those should still be generated correctly in these cases. Differential Revision: [D55724190](https://our.internmc.facebook.com/intern/diff/D55724190) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123304 Approved by: https://github.com/soulitzer	2024-04-12 02:01:15 +00:00
Grace Zhang	3e43dc086a	implement bmm to support nested tensor and normal tensor (#119762 ) implement bmm to support nested and normal tensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/119762 Approved by: https://github.com/cyyever, https://github.com/ezyang	2024-04-11 01:10:04 +00:00
William Wen	cbde0f048b	[dynamo, 3.12] enable tests disabled due to missing dynamo 3.12 support (#123300 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123300 Approved by: https://github.com/jansel, https://github.com/malfet, https://github.com/zou3519	2024-04-05 20:13:17 +00:00
Joel Schlosser	721dcaff94	Revert usage of NJT views in SDPA (#123215 ) For internal purposes, this PR reverts the use of real views in SDPA -> autograd.Function "views" (i.e. `ViewBufferFromNested` and `ViewNestedFromBuffer`). This is a temporary fix to get the FIRST model launched and working. Note: this breaks some other Dynamo tests related to SDPA that rely on real views, but the breakage there isn't expected to be likely in a real-world scenario. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123215 Approved by: https://github.com/YuqingJ	2024-04-04 18:45:47 +00:00
PyTorch MergeBot	63d17d3c90	Revert "Revert usage of NJT views in SDPA (#123215 )" This reverts commit `0fcddb5625`. Reverted https://github.com/pytorch/pytorch/pull/123215 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but I think it needs to be skipped on ROCm `0fcddb5625` ([comment](https://github.com/pytorch/pytorch/pull/123215#issuecomment-2036080570))	2024-04-04 02:57:09 +00:00
Joel Schlosser	0fcddb5625	Revert usage of NJT views in SDPA (#123215 ) For internal purposes, this PR reverts the use of real views in SDPA -> autograd.Function "views" (i.e. `ViewBufferFromNested` and `ViewNestedFromBuffer`). This is a temporary fix to get the FIRST model launched and working. Note: this breaks some other Dynamo tests related to SDPA that rely on real views, but the breakage there isn't expected to be likely in a real-world scenario. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123215 Approved by: https://github.com/YuqingJ	2024-04-03 23:25:31 +00:00
soulitzer	638b003cb7	[NJT] .to() properly updates device of offsets (#122797 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122797 Approved by: https://github.com/jbschlosser	2024-04-02 16:07:27 +00:00
PyTorch MergeBot	4290a57e9c	Revert "[NJT] .to() properly updates device of offsets (#122797 )" This reverts commit `3e7fd45b40`. Reverted https://github.com/pytorch/pytorch/pull/122797 on behalf of https://github.com/jeffdaily due to Sorry for reverting your change but it is failing CUDA and ROCm jobs in trunk. Please help take a look and reland the change ([comment](https://github.com/pytorch/pytorch/pull/122797#issuecomment-2025473181))	2024-03-28 15:17:45 +00:00
soulitzer	3e7fd45b40	[NJT] .to() properly updates device of offsets (#122797 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122797 Approved by: https://github.com/jbschlosser	2024-03-28 00:56:23 +00:00
Joel Schlosser	e6986e4317	Public API for NJT construction from jagged components (#121518 ) This PR introduces `torch.nested.nested_tensor_from_jagged(values, offsets=None, lengths=None, jagged_dim=1)` (bikeshedding welcome). This is intended to be the main entrypoint for getting an NJT from the `(values, offsets, lengths)` components. The returned NJT is a view of the `values` component. Note that `torch.nested.nested_tensor()` / `torch.nested.as_nested_tensor()` already exist for constructing an NJT from a list of tensors. TODO: * Some doc formatting; suggestions welcome there * Tests / examples using `jagged_dim != 1` Pull Request resolved: https://github.com/pytorch/pytorch/pull/121518 Approved by: https://github.com/cpuhrsch ghstack dependencies: #113279, #113280	2024-03-22 14:48:22 +00:00
Joel Schlosser	470b44c048	Support for torch.nested.as_nested_tensor(t) (#113280 ) This PR adds support for tensor inputs to `as_nested_tensor()`. The tensor is treated as a batch of consistently-sized constituents. It utilizes `_nested_view_from_values_offsets()` to return a real view that allows for propagating gradients into inputs. Co-authored-by: voznesenskym <voznesenskym@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/113280 Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer ghstack dependencies: #113279	2024-03-22 02:12:37 +00:00
Joel Schlosser	cd6bfc7965	Proper view support for jagged layout NestedTensor (#113279 ) This PR: * Introduces an ATen op for creating true jagged views from a dense values buffer * `_nested_view_from_jagged(values, offsets, lengths, ragged_idx, dummy)` * This op is implemented on the Python side using torch.library so we can return a subclass instance * `jagged_from_list()` now uses this instead of the old autograd.Function `NestedViewFromBuffer` * The latter op is used for non-contiguous JTs returned via `torch.nested.narrow()` * `dummy` is an awful hack to ensure that `NestedTensor.__torch_dispatch__()` is invoked for our view * Introduces an ATen op for accessing the `values` component of an NT via a view * `_nested_get_values(nt)` * Removes the autograd.Functions `ViewNestedFromBuffer` and `ViewBufferFromNested` in favor of `nested_from_values_offsets()` / `nested_from_values_offsets_lengths()` and `nt.values()`, respectively. * Changes test code to prefer `as_nested_tensor()` over `jagged_from_list()` directly * Similarly, avoid `buffer_from_jagged()`, preferring `values()` * Depends on general subclass view fake-ification on the PT2 side (handled solely in previous PRs in the stack) With these changes, the semantics of jagged layout NTs are such that they are considered a true view of the underlying `values` buffer. This means views of jagged NTs are views of the underlying buffer as well, simplifying some handling. Differential Revision: [D54269922](https://our.internmc.facebook.com/intern/diff/D54269922) Co-authored-by: voznesenskym <voznesenskym@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/113279 Approved by: https://github.com/ezyang	2024-03-22 02:12:36 +00:00
PyTorch MergeBot	224beecee6	Revert "Proper view support for jagged layout NestedTensor (#113279 )" This reverts commit `5855c490f0`. Reverted https://github.com/pytorch/pytorch/pull/113279 on behalf of https://github.com/jbschlosser due to Need to fix BC thing ([comment](https://github.com/pytorch/pytorch/pull/113279#issuecomment-2013899762))	2024-03-21 22:03:01 +00:00
PyTorch MergeBot	12e7602cf9	Revert "Support for torch.nested.as_nested_tensor(t) (#113280 )" This reverts commit `17c9c70265`. Reverted https://github.com/pytorch/pytorch/pull/113280 on behalf of https://github.com/jbschlosser due to Need to fix BC thing ([comment](https://github.com/pytorch/pytorch/pull/113280#issuecomment-2013893099))	2024-03-21 22:00:44 +00:00
PyTorch MergeBot	816db3bd29	Revert "Public API for NJT construction from jagged components (#121518 )" This reverts commit `d4dff9cf5e`. Reverted https://github.com/pytorch/pytorch/pull/121518 on behalf of https://github.com/jbschlosser due to Need to fix BC thing ([comment](https://github.com/pytorch/pytorch/pull/121518#issuecomment-2013879641))	2024-03-21 21:56:29 +00:00
Christian Puhrsch	16935de961	Support alias for NestedTensorCPU/CUDA (#117711 ) Fixes #ISSUE_NUMBER Co-authored-by: Vincent Moens <vmoens@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/117711 Approved by: https://github.com/albanD	2024-03-21 16:05:52 +00:00
Joel Schlosser	d4dff9cf5e	Public API for NJT construction from jagged components (#121518 ) This PR introduces `torch.nested.nested_tensor_from_jagged(values, offsets=None, lengths=None, jagged_dim=1)` (bikeshedding welcome). This is intended to be the main entrypoint for getting an NJT from the `(values, offsets, lengths)` components. The returned NJT is a view of the `values` component. Note that `torch.nested.nested_tensor()` / `torch.nested.as_nested_tensor()` already exist for constructing an NJT from a list of tensors. TODO: * Some doc formatting; suggestions welcome there * Tests / examples using `jagged_dim != 1` Pull Request resolved: https://github.com/pytorch/pytorch/pull/121518 Approved by: https://github.com/cpuhrsch ghstack dependencies: #113280	2024-03-21 04:14:17 +00:00
Joel Schlosser	17c9c70265	Support for torch.nested.as_nested_tensor(t) (#113280 ) This PR adds support for tensor inputs to `as_nested_tensor()`. The tensor is treated as a batch of consistently-sized constituents. It utilizes `_nested_view_from_values_offsets()` to return a real view that allows for propagating gradients into inputs. Co-authored-by: voznesenskym <voznesenskym@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/113280 Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer	2024-03-21 04:13:55 +00:00
Joel Schlosser	5855c490f0	Proper view support for jagged layout NestedTensor (#113279 ) This PR: * Introduces an ATen op for creating true jagged views from a dense values buffer * `_nested_view_from_jagged(values, offsets, lengths, ragged_idx, dummy)` * This op is implemented on the Python side using torch.library so we can return a subclass instance * `jagged_from_list()` now uses this instead of the old autograd.Function `NestedViewFromBuffer` * The latter op is used for non-contiguous JTs returned via `torch.nested.narrow()` * `dummy` is an awful hack to ensure that `NestedTensor.__torch_dispatch__()` is invoked for our view * Introduces an ATen op for accessing the `values` component of an NT via a view * `_nested_get_values(nt)` * Removes the autograd.Functions `ViewNestedFromBuffer` and `ViewBufferFromNested` in favor of `nested_from_values_offsets()` / `nested_from_values_offsets_lengths()` and `nt.values()`, respectively. * Changes test code to prefer `as_nested_tensor()` over `jagged_from_list()` directly * Similarly, avoid `buffer_from_jagged()`, preferring `values()` * Depends on general subclass view fake-ification on the PT2 side (handled solely in previous PRs in the stack) With these changes, the semantics of jagged layout NTs are such that they are considered a true view of the underlying `values` buffer. This means views of jagged NTs are views of the underlying buffer as well, simplifying some handling. Differential Revision: [D54269922](https://our.internmc.facebook.com/intern/diff/D54269922) Co-authored-by: voznesenskym <voznesenskym@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/113279 Approved by: https://github.com/ezyang	2024-03-20 23:45:34 +00:00
Joel Schlosser	ea8f6e2e54	Subclass view fake-ification via reified ViewFuncs (#118405 ) This PR: * Uses reified ViewFuncs to swap in fake tensors / symbolic SymInts for view replay during subclass view fake-ification * Enables automatic dynamic on view bases -> fakeifies according to the resultant symbolic context instead of the old "all-static" approach * Covers the following view types: * subclass -> dense * dense -> subclass * subclass -> subclass * Dense -> dense views are handled the old way via an `as_strided()` call, as it's likely there is no view func available Differential Revision: [D54269082](https://our.internmc.facebook.com/intern/diff/D54269082) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118405 Approved by: https://github.com/ezyang	2024-03-07 19:56:16 +00:00
Chengji Yao	0e604becc5	[NJT] support chunk on batch dim (#119713 ) - support chunk op on batch dim - support empty_like op - add tests for the like ops Pull Request resolved: https://github.com/pytorch/pytorch/pull/119713 Approved by: https://github.com/jbschlosser	2024-03-05 17:57:50 +00:00
soulitzer	312ce35c1f	Rename singleton int to nested int (#119661 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119661 Approved by: https://github.com/ezyang	2024-02-16 19:21:17 +00:00
Joel Schlosser	756cf2913d	Fix NJT stride access in SDPA dispatcher logic (#119846 ) `._stride` -> `._strides` Adds test to cover this case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119846 Approved by: https://github.com/drisspg, https://github.com/ani300, https://github.com/soulitzer ghstack dependencies: #119910	2024-02-14 22:37:52 +00:00
Joel Schlosser	0560c193a6	Fix meta registration for _flash_attention_forward() [ROCm forward fix] (#119910 ) Addresses ROCm failures from #119812 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119910 Approved by: https://github.com/drisspg	2024-02-14 22:37:52 +00:00
Joel Schlosser	31e59766e7	Fix meta registration for _flash_attention_forward() (#119812 ) Meta registration wrongly assumes 4D inputs, while the underlying op allows 3D inputs for the `mha_varlen_fwd()` case. Testing: I added `detach()`es so the NJT test `test_sdpa_compile()` won't fail for a view-related reason. It should pass now with this fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119812 Approved by: https://github.com/drisspg	2024-02-14 02:38:53 +00:00
PyTorch MergeBot	8994f2367d	Revert "Fix jagged NT softmax semantics (#119459 )" This reverts commit `6adadbaf79`. Reverted https://github.com/pytorch/pytorch/pull/119459 on behalf of https://github.com/malfet due to broke dynamo, see https://github.com/pytorch/pytorch/actions/runs/7835402753/job/21386634602 ([comment](https://github.com/pytorch/pytorch/pull/119459#issuecomment-1935246413))	2024-02-09 02:31:49 +00:00
Joel Schlosser	6adadbaf79	Fix jagged NT softmax semantics (#119459 ) Before: `softmax` definition uses `jagged_unary_pointwise()` (wrong) After: `softmax` impl adjusts the `dim` arg to account for the difference in dimensionality between the outer NT and the NT's `_values` Pull Request resolved: https://github.com/pytorch/pytorch/pull/119459 Approved by: https://github.com/soulitzer	2024-02-08 20:13:12 +00:00
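A minimal sketch of the `dim` adjustment mentioned in the row above, assuming the common case where the ragged dim is at index 1, so the inner `_values` buffer has one dimension fewer than the outer NT. The function name `outer_dim_to_values_dim` is hypothetical, and rejecting the batch and ragged dims is a simplification made for this sketch.

```python
def outer_dim_to_values_dim(outer_ndim: int, dim: int) -> int:
    """Map a dim of the outer jagged NT onto the matching dim of its values buffer."""
    if not -outer_ndim <= dim < outer_ndim:
        raise IndexError(f"dim {dim} out of range for a {outer_ndim}-D tensor")
    dim %= outer_ndim  # canonicalize negative dims against the OUTER shape
    if dim < 2:
        # Batch (0) and ragged (1) dims are collapsed into one dim of the buffer.
        raise ValueError("batch and ragged dims are not handled in this sketch")
    return dim - 1


# Outer shape (B, *, D) has 3 dims; the values buffer (sum(*), D) has 2.
inner_dim = outer_dim_to_values_dim(3, -1)
```

Forwarding `dim` unchanged, as a plain unary pointwise wrapper would, points at the wrong axis of the buffer whenever a non-negative `dim` is used.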
David Berard	278a0e1600	[NestedTensor] Support binary pointwise ops with >2 inputs (if inputs are non-tensors) (#119419 ) It should usually be safe to run pointwise binary ops with >2 inputs. e.g. threshold_backward(tensor, tensor, scalar): we just operate on the values of the nested tensors, and pass in the other args as-is. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119419 Approved by: https://github.com/soulitzer	2024-02-08 20:06:40 +00:00
Huy Do	3ed9df36a9	Clean up some obsolete TODOs in run_test and several test files (#119113 ) * The TODOs in `test/test_nestedtensor.py` have been mitigated; I keep the issue for reference. * ~~The TODOs in `test/test_ops_fwd_gradients.py` don't apply anymore~~ * The TODO in `run_test.py` to support disabling C++ tests is probably not going to happen. I have never seen a flaky C++ test that needed to be disabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119113 Approved by: https://github.com/kit1980	2024-02-03 23:54:30 +00:00
David Berard	460950d3aa	[Nested Tensor] Support ragged_idx != 1 on aten::is_same_size, aten::_to_copy (#118442 ) is_same_size is needed internally; `_to_copy` should be easy because it doesn't support new layouts. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118442 Approved by: https://github.com/cpuhrsch	2024-01-30 01:32:51 +00:00
David Berard	2842d3c9d3	[Nested Tensor] view: basic support for ragged_idx != 1 and _unsafe_view (#118317 ) Use case: `_unsafe_view` is used in aot_autograd to create a view that doesn't register as a view: `eebe7e1d37/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py (L470-L476)` If a transposed nested tensor (i.e. NT with ragged_idx != 1) encounters this code path, it previously would fail for two reasons: 1) because `_unsafe_view` isn't registered, and 2) because ragged_idx != 1 is not supported. This PR adds support for `_unsafe_view` (completely reusing the implementation of `view`; this just registers `_unsafe_view` as another op using the same implementation). It also adds support for ragged_idx != 1, but only for trivial cases where inp._size == size (the use case used by aot_autograd). Tests: verify that the result of `_unsafe_view` doesn't have a `_base`, and that simple views on transposed NTs work. Differential Revision: [D53096814](https://our.internmc.facebook.com/intern/diff/D53096814) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118317 Approved by: https://github.com/soulitzer	2024-01-26 17:29:37 +00:00
David Berard	52c5803088	[NestedTensor] Support ragged_idx != 1 in pointwise ops (#118157 ) This PR allows pointwise ops to operate on tensors with ragged_idx != 1. It does this by passing the ragged_idx metadata into the construction of the returned NestedTensor when computing pointwise ops. The assumption is that pointwise ops can operate directly on the values tensors, and the resulting tensor should have all the same metadata properties as the input tensors. For binary ops, a test is added to verify that two tensors with different ragged_idx cannot be added. Previously: * unary pointwise ops would error out when performed on nested tensors with ragged_idx != 1 * binary pointwise ops would produce tensors with nonsense shapes Differential Revision: [D53032641](https://our.internmc.facebook.com/intern/diff/D53032641) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118157 Approved by: https://github.com/jbschlosser	2024-01-25 23:34:15 +00:00
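The metadata propagation described in the row above can be sketched in pure Python. The `Jagged` dataclass and both helpers are hypothetical stand-ins for illustration, not the real `NestedTensor` class: pointwise ops run on the flat values and copy `offsets` and `ragged_idx` to the result, and binary ops reject mismatched metadata.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Jagged:
    values: List[float]   # flat buffer holding every element
    offsets: List[int]    # sequence boundaries within the buffer
    ragged_idx: int = 1   # which outer dim is the ragged one


def unary_pointwise(fn: Callable[[float], float], x: Jagged) -> Jagged:
    # Operate on the values only; the result inherits all metadata.
    return Jagged([fn(v) for v in x.values], x.offsets, x.ragged_idx)


def binary_pointwise(fn: Callable[[float, float], float], a: Jagged, b: Jagged) -> Jagged:
    # Elementwise pairing is only meaningful when the ragged structure matches.
    if a.ragged_idx != b.ragged_idx or a.offsets != b.offsets:
        raise ValueError("mismatched ragged structure")
    return Jagged([fn(p, q) for p, q in zip(a.values, b.values)], a.offsets, a.ragged_idx)


x = Jagged([1.0, -2.0, 3.0], [0, 2, 3], ragged_idx=2)
y = unary_pointwise(abs, x)
z = binary_pointwise(lambda p, q: p + q, x, y)
```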
YuqingJ	a97d00cca5	[Nested Tensor]Support SDPA math fallback for jagged layout nested tensor (#116445 ) Support this fallback by converting the jagged layout NT to strided layout NT, and then converting the result back to jagged layout NT. This fallback might not be efficient since it uses unbind, contiguous and split. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116445 Approved by: https://github.com/soulitzer	2024-01-12 17:30:40 +00:00
PyTorch MergeBot	9f87760160	Revert "[Nested Tensor]Support SDPA math fallback for jagged layout nested tensor (#116445 )" This reverts commit `e55a778cbb`. Reverted https://github.com/pytorch/pytorch/pull/116445 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but i see it fails ROCm test in trunk due to an unsupported use case `e55a778cbb` ([comment](https://github.com/pytorch/pytorch/pull/116445#issuecomment-1888060036))	2024-01-11 22:21:45 +00:00
YuqingJ	e55a778cbb	[Nested Tensor]Support SDPA math fallback for jagged layout nested tensor (#116445 ) Support this fallback by converting the jagged layout NT to strided layout NT, and then converting the result back to jagged layout NT. This fallback might not be efficient since it uses unbind, contiguous and split. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116445 Approved by: https://github.com/soulitzer	2024-01-11 20:28:40 +00:00
Joel Schlosser	f70aeb4ffd	Fix backward for reshape() on jagged layout NT (#117137 ) Provides symbolic C++-side `reshape_as()` / `reshape()` decomps for jagged layout NTs to make the backwards pass work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117137 Approved by: https://github.com/soulitzer	2024-01-10 23:35:07 +00:00
Joel Schlosser	0b0c76bace	Support squeeze.dim for jagged NT (#116891 ) As title. Needed for `rev_view_func()` of `unsqueeze()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116891 Approved by: https://github.com/soulitzer ghstack dependencies: #115894, #116512	2024-01-06 01:00:53 +00:00
Joel Schlosser	3c21264c9b	Introduce reverse view_funcs (#115894 ) Part 2 of implementation for general [subclass view fake-ification](https://docs.google.com/document/d/1C5taWiplmX7nKiURXDOAZG2W5VNJ2iV0fQFq92H0Cxw). Details: * Codegen `rev_view_func()` alongside `view_func()` * Reverse view_func gives you a "base" from a "view": `rev_view_func(new_view) -> new_base` AKA it plays the original view backwards * Utilizes the functional inverses defined in `FunctionalInverses.cpp`, passing `InverseReturnMode::AlwaysView` * Manually implements functional inverses for `narrow()` and `chunk()` * NB: Multi-output views now set view_func() / rev_view_func() for each of the output views! * Due to this, the `as_view()` overload that operates on a list of views is scrapped in favor of iteration via codegen Example codegen in `ADInplaceOrViewTypeN.cpp`: ```cpp at::Tensor narrow(c10::DispatchKeySet ks, const at::Tensor & self, int64_t dim, c10::SymInt start, c10::SymInt length) { auto _tmp = ([&]() { at::AutoDispatchBelowADInplaceOrView guard; return at::_ops::narrow::redispatch(ks & c10::after_ADInplaceOrView_keyset, self, dim, start, length); })(); std::function<at::Tensor(const at::Tensor&)> func=nullptr; std::function<at::Tensor(const at::Tensor&)> rev_func=nullptr; if (false || !self.unsafeGetTensorImpl()->support_as_strided() || c10::AutogradState::get_tls_state().get_view_replay_enabled()) { func = [=](const at::Tensor& input_base) { return at::_ops::narrow::call(input_base, dim, start, length); }; rev_func = [=](const at::Tensor& input_view) { // NB: args from narrow() signature are passed along to the inverse return at::functionalization::FunctionalInverses::narrow_copy_inverse(self, input_view, at::functionalization::InverseReturnMode::AlwaysView, dim, start, length); }; } auto result = as_view(/* base */ self, /* output */ _tmp, /* is_bw_differentiable */ true, /* is_fw_differentiable */ true, /* view_func */ func, /* rev_view_func */ rev_func, /* creation_meta */ InferenceMode::is_enabled() ? CreationMeta::INFERENCE_MODE : (at::GradMode::is_enabled() ? CreationMeta::DEFAULT : CreationMeta::NO_GRAD_MODE)); return result; } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/115894 Approved by: https://github.com/soulitzer	2024-01-05 16:48:12 +00:00
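To make the `view_func` / `rev_view_func` pairing from the row above concrete, here is a deliberately simplified pure-Python sketch that uses lists in place of tensors. It models a copying inverse for a 1-D `narrow()`: the reverse function rebuilds a base around a new view using the captured original base. It does not model the aliasing behaviour of `InverseReturnMode::AlwaysView`, and `make_narrow_funcs` is a hypothetical name.

```python
from typing import Callable, List, Tuple

ListFn = Callable[[List[int]], List[int]]


def make_narrow_funcs(base: List[int], start: int, length: int) -> Tuple[ListFn, ListFn]:
    """Return (view_func, rev_view_func) for a 1-D narrow over `base`."""

    def view_func(input_base: List[int]) -> List[int]:
        # Replay the view on a (possibly different) base.
        return input_base[start:start + length]

    def rev_view_func(input_view: List[int]) -> List[int]:
        # Play the view backwards: place the view inside a copy of the base.
        if len(input_view) != length:
            raise ValueError("view has the wrong length")
        return base[:start] + list(input_view) + base[start + length:]

    return view_func, rev_view_func


view_func, rev_view_func = make_narrow_funcs([0, 1, 2, 3, 4], start=1, length=3)
new_base = rev_view_func([10, 20, 30])
```

Replaying the forward function on the rebuilt base returns the view that was passed in, which is the round-trip property a reverse view function is meant to provide.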
Joel Schlosser	ea3a5f8ddc	Add chunk for jagged layout NT (#115842 ) Nice to have for the [SDPA tutorial](https://pytorch.org/tutorials/intermediate/scaled_dot_product_attention_tutorial.html) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115842 Approved by: https://github.com/soulitzer ghstack dependencies: #115192, #116111	2023-12-20 20:13:20 +00:00
Joel Schlosser	29b198dcf8	Add markDynamoStrictTest to NT tests (#116111 ) Decorates all NT tests with `@markDynamoStrictTest` to ensure we get the correct signal. Adds xfails where needed to get things passing. Includes a fix in meta_utils.py for a bug that was breaking several python 3.11 tests. In particular, a dense tensor graph input that is a view of a strided NT would slip past Dynamo's check and break in meta-ification. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116111 Approved by: https://github.com/soulitzer, https://github.com/zou3519 ghstack dependencies: #115192	2023-12-20 20:13:20 +00:00
Joel Schlosser	1474eb5f29	Fix jagged composite impl of flatten() (#115192 ) Need to handle this in `NestedTensor.__torch_function__()` since it's CompositeImplicit Pull Request resolved: https://github.com/pytorch/pytorch/pull/115192 Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer	2023-12-19 19:15:21 +00:00
Joel Schlosser	bf62511e07	Reshape decomposition for jagged layout NT (#115191 ) No more segfault from using `reshape()` on jagged NT :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115191 Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer	2023-12-18 22:34:41 +00:00
Joel Schlosser	6fee208064	Handle -1 in jagged layout NT view ops (#115843 ) Allows for inheriting the ragged and batch dims via -1: ```python nt.view(-1, -1, D) nt.expand(B, -1, D) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/115843 Approved by: https://github.com/soulitzer ghstack dependencies: #115636	2023-12-15 00:42:47 +00:00
Joel Schlosser	0ff155fb65	Fix SDPA for SAM (#115636 ) Addresses the regression for Segment Anything Fast in https://github.com/pytorch-labs/segment-anything-fast/issues/99 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115636 Approved by: https://github.com/soulitzer, https://github.com/ani300	2023-12-12 18:52:38 +00:00
soulitzer	8885128dcc	Fix backward for SDPA NT jagged layout (#115576 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115576 Approved by: https://github.com/jbschlosser, https://github.com/ani300	2023-12-12 18:35:40 +00:00
Yanbo Liang	da341d0d48	[Dynamo][6.1/N] Refactor out TorchInGraphFunctionVariable and improve heuristic (#113432 ) This is split from #113009; please check https://github.com/pytorch/pytorch/pull/113009#issuecomment-1804417925 for more details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113432 Approved by: https://github.com/ezyang, https://github.com/jansel	2023-12-09 05:11:44 +00:00
PyTorch MergeBot	e8e4141773	Revert "[Dynamo][6.1/N] Refactor out TorchInGraphFunctionVariable and improve heuristic (#113432 )" This reverts commit `e61d6b42f0`. Reverted https://github.com/pytorch/pytorch/pull/113432 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing dynamo tests in trunk `e61d6b42f0`, landrace? ([comment](https://github.com/pytorch/pytorch/pull/113432#issuecomment-1847787981))	2023-12-08 20:15:39 +00:00
Yanbo Liang	e61d6b42f0	[Dynamo][6.1/N] Refactor out TorchInGraphFunctionVariable and improve heuristic (#113432 ) This is split from #113009; please check https://github.com/pytorch/pytorch/pull/113009#issuecomment-1804417925 for more details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113432 Approved by: https://github.com/ezyang, https://github.com/jansel	2023-12-08 17:15:14 +00:00
Joel Schlosser	3b01f30b20	Prevent invalid pointwise ops on jagged with transposed ragged dim (#115190 ) TODO: tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/115190 Approved by: https://github.com/soulitzer, https://github.com/ani300	2023-12-08 00:54:03 +00:00
Yanbo Liang	4620170008	[Dynamo] Revert multiple PRs since they triggered compilation stuck internally (#115126 ) Revert the following PRs to mitigate the compilation hang seen internally: #113432 #114016 #114507 #114196 #114739 #114669 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115126 Approved by: https://github.com/xush6528	2023-12-05 22:35:37 +00:00
Antoni Viros	1dc4588c6a	Add an SDPA dispatcher for nested tensors with jagged layouts (#114164 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/114164 Approved by: https://github.com/jbschlosser	2023-12-05 06:33:45 +00:00
PyTorch MergeBot	5cfda9b7f8	Revert "Add an SDPA dispatcher for nested tensors with jagged layouts (#114164 )" This reverts commit `aafa8233a4`. Reverted https://github.com/pytorch/pytorch/pull/114164 on behalf of https://github.com/malfet due to Broke ROCM, see `aafa8233a4` ([comment](https://github.com/pytorch/pytorch/pull/114164#issuecomment-1839798986))	2023-12-05 00:35:20 +00:00
Antoni Viros	aafa8233a4	Add an SDPA dispatcher for nested tensors with jagged layouts (#114164 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/114164 Approved by: https://github.com/jbschlosser	2023-12-04 21:54:02 +00:00
Yanbo Liang	033d7b670a	[Dynamo][6.1/N] Refactor out TorchInGraphFunctionVariable and improve heuristic (#113432 ) This is split from #113009; please check https://github.com/pytorch/pytorch/pull/113009#issuecomment-1804417925 for more details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113432 Approved by: https://github.com/ezyang	2023-11-17 23:42:00 +00:00
Joel Schlosser	ea39cc34f9	Refactor NestedTensor subclass to remove ragged_size from constructor (#113491 ) This PR removes the need for passing `ragged_size` into the `NestedTensor` constructor. This was an artifact of fake-ification, where sometimes we needed the NT to have a symbolic singleton symint shape for the ragged dimension. The new way of achieving this is to also store mappings between fake / functional tensors -> symbolic symints in the ragged structure registry. Now the `NestedTensor` constructor can just query this registry for the `ragged_size`. Old: `NestedTensor(values, offsets, *, ragged_size=None, **kwargs)` New: `NestedTensor(values, offsets, **kwargs)` This makes it possible to have a `_nested_view_from_values_offsets(values, offsets)` without needing to pass a `ragged_size`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113491 Approved by: https://github.com/ezyang, https://github.com/soulitzer	2023-11-14 19:32:21 +00:00
Antoni Viros	1aece432ba	Implement narrow from a regular tensor to jagged tensor (#112770 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/112770 Approved by: https://github.com/cpuhrsch	2023-11-13 19:09:59 +00:00
Yuqing Jiang	9f3e378125	[nested tensor]add split and layer_norm_backward operations (#113108 ) Summary: Add split and layer_norm_backward. Note: It is non trivial to support split_with_sizes backward so adding the split operation to support the use case in the model. Test Plan: unit tests Differential Revision: D51052966 Pull Request resolved: https://github.com/pytorch/pytorch/pull/113108 Approved by: https://github.com/soulitzer	2023-11-08 07:44:35 +00:00
soulitzer	538ec4942a	Do not generate zero-numel NT by default in helper and improve to_padded_tensor msg (#113162 ) Improvements: improves to_padded_tensor error message when passed a NT with zero numel Pull Request resolved: https://github.com/pytorch/pytorch/pull/113162 Approved by: https://github.com/jbschlosser ghstack dependencies: #113031, #112519, #113091	2023-11-07 19:56:26 +00:00
soulitzer	0c991acab0	Factor out test_nestedtensor setUp tearDown and call super (#113091 ) Fixes https://github.com/pytorch/pytorch/issues/112845 Pull Request resolved: https://github.com/pytorch/pytorch/pull/113091 Approved by: https://github.com/jbschlosser ghstack dependencies: #113031, #112519	2023-11-07 19:56:26 +00:00
soulitzer	c2084da14a	[NT] Backward support for broadcasting binary ops (#112519 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/112519 Approved by: https://github.com/jbschlosser ghstack dependencies: #113031	2023-11-07 00:03:21 +00:00
soulitzer	53fff56ab8	Graph break cleanly for test_nestedtensor (#112662 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/112662 Approved by: https://github.com/jbschlosser	2023-11-03 07:20:43 +00:00
Joel Schlosser	51a38380d1	Fix torch.load(..., weights_only=True) for NT (#112516 ) Found when looking into #112509 Pull Request resolved: https://github.com/pytorch/pytorch/pull/112516 Approved by: https://github.com/soulitzer	2023-11-02 14:41:04 +00:00
Yuqing Jiang	24f217ee64	[Nested tensor] Add more ops in Python subclass nested tensor (#112302 ) Summary: Add dropout, split_with_sizes, and silu operations in python subclass nested tensor Test Plan: unit tests Differential Revision: D50676812 Pull Request resolved: https://github.com/pytorch/pytorch/pull/112302 Approved by: https://github.com/soulitzer, https://github.com/jbschlosser	2023-10-31 20:57:05 +00:00
William Wen	a380bf3297	[dynamo, test] skip flaky dynamo-wrapped tests (#112310 ) ghstack-source-id: 7a87e33e7513e7924e4513b6473284562989ed4c Pull Request resolved: https://github.com/pytorch/pytorch/pull/112309 Skip flaky tests reported by - https://github.com/pytorch/pytorch/issues/111825 - https://github.com/pytorch/pytorch/issues/111826 - https://github.com/pytorch/pytorch/issues/111909 - https://github.com/pytorch/pytorch/issues/112142 - https://github.com/pytorch/pytorch/issues/112220 Pull Request resolved: https://github.com/pytorch/pytorch/pull/112310 Approved by: https://github.com/xmfan	2023-10-28 04:14:57 +00:00
Joel Schlosser	2225e6361d	Support for as_nested_tensor() with jagged layout + fixed nested_tensor() semantics (#112304 ) This PR: * Adds support for the `layout` kwarg to `torch.nested.as_nested_tensor()` * Fixes `torch.nested.nested_tensor()` * It should accept a list of lists of scalars * It should not preserve autograd history * Adds extensive testing for these two functions Semantics for the two functions follow those of the strided layout: * `torch.nested.nested_tensor(tensor_list, layout=torch.jagged)`: Creates a new jagged layout NT with no autograd history * `tensor_list` can be a list of Tensors or list of lists of scalars * `torch.nested.as_nested_tensor(tensor_list, layout=torch.jagged)`: Creates a new jagged layout NT preserving autograd history of `tensor_list` * `tensor_list` must be a list of Tensors Pull Request resolved: https://github.com/pytorch/pytorch/pull/112304 Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer	2023-10-28 02:34:27 +00:00
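For readers unfamiliar with the jagged layout that these constructors target, the following pure-Python sketch shows how a list of variable-length lists can be packed into one flat `values` buffer plus an `offsets` list, and unpacked again. It does not call `torch.nested`; `pack_jagged` and `unbind_jagged` are hypothetical helper names used only for illustration.

```python
from itertools import accumulate
from typing import List, Tuple


def pack_jagged(rows: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate the rows and record where each one starts and ends."""
    values = [x for row in rows for x in row]
    offsets = [0, *accumulate(len(row) for row in rows)]
    return values, offsets


def unbind_jagged(values: List[int], offsets: List[int]) -> List[List[int]]:
    """Recover the original rows from the packed representation."""
    return [values[a:b] for a, b in zip(offsets, offsets[1:])]


values, offsets = pack_jagged([[1, 2], [3], [4, 5, 6]])
rows = unbind_jagged(values, offsets)
```

Row `i` occupies `values[offsets[i]:offsets[i + 1]]`, so `offsets` always has one more entry than there are rows.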
Antoni Viros	668c3b3f3b	Add embedding op to jagged NT (#112288 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/112288 Approved by: https://github.com/cpuhrsch	2023-10-28 01:29:17 +00:00
soulitzer	73170b23d4	Add compile support for NT unbind (#111531 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/111531 Approved by: https://github.com/ezyang	2023-10-23 21:16:20 +00:00
Joel Schlosser	ba2ba9621c	More NT subclass op support for SAM (#111253 ) With this PR, we have full op support for SAM without needing to unwrap subclass into jagged buffer -> run ops -> rewrap manually. Specifically, this was previously happening in the MaskDecoder. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111253 Approved by: https://github.com/soulitzer, https://github.com/cpuhrsch	2023-10-18 21:21:28 +00:00
soulitzer	2dc1726ab7	Compile NestedTensor with AOTAutograd (#110529 ) This PR has a number of changes that improve subclass support for AOTAutograd/Inductor in general: - Previously, if a subclass did extra aliasing between graph outputs/inputs, the partitioner would complain because grad_outputs are the outputs reused as-is. Now we do a view_as(self) to work around this. - Use dense -> dense metadata when working with fwd_output_strides during backward. This is important since the stride information comes from Inductor, which sees the dense-to-dense graph. - Inductor requires that the inputs to the compiled backward match some expected strides computed during compilation. We make sure to make the inner tensors of the subclass contiguous (previously, we only made the subclass itself contiguous) Changes specific to NestedTensor relevant to compilation: - Properly handle the case where `__tensor_unflatten__` is passed non-symbolic dense tensors along with meta extracted from fake subclasses. - Skip var_to_range logic for singleton int - Skip size hint logic in inductor for singleton int Pull Request resolved: https://github.com/pytorch/pytorch/pull/110529 Approved by: https://github.com/bdhirsh	2023-10-17 21:17:10 +00:00
Jesse Cai	4c01686027	Public API for constructing NT with jagged layout from tensor list (#111078 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/111078 Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer ghstack dependencies: #109123	2023-10-13 03:27:41 +00:00
soulitzer	110382bacf	Make NestedTensor compilable with eager backend (#109171 ) In this PR: - Adds support for strides for jagged tensor (design doc for this coming soon) - NestedTensor skips automatic dynamic - Make use of @bdhirsh's subclass fakification logic by adding the __tensor_{un,}flatten__ functions. - Additional logic for fakification: existing subclass fakification logic does not handle the case where the outer tensor has an additional dimension, so we insert one-off logic to (1) insert an extra SingletonSymInt onto the fakified NestedTensor. (2) make sure we call track_symint on the sizes of both the inner and outer tensor during guard creation. Remaining things that are weird: - Still need to skip some logic in meta utils for some reason (I was going to write this up more, but decided not to since we're not able to do this anyway for an immediate reason: we cannot arbitrarily compare singleton ints. For now I'm just following Brian's advice from [here](https://github.com/pytorch/pytorch/pull/109171#discussion_r1328137070) ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109171 Approved by: https://github.com/ezyang, https://github.com/bdhirsh	2023-10-11 04:47:10 +00:00
Joel Schlosser	ac01304e22	pin_memory support for NT (#110404 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110404 Approved by: https://github.com/cpuhrsch, https://github.com/albanD ghstack dependencies: #110292	2023-10-10 21:58:19 +00:00
soulitzer	fda0a965c7	[reland] Support SingletonSymNode mul with coefficient (#110673 ) reland of https://github.com/pytorch/pytorch/pull/110369 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110673 Approved by: https://github.com/ezyang	2023-10-10 19:37:17 +00:00
Joel Schlosser	17348b0f51	Implement split_with_sizes backward for NT (#110647 ) Needed internally. Note that `split_with_sizes()` for NT is currently supported only on `dim=-1`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110647 Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer ghstack dependencies: #110646	2023-10-06 18:44:22 +00:00
PyTorch MergeBot	81ce5d5725	Revert "pin_memory support for NT (#110404 )" This reverts commit `3597325bc7`. Reverted https://github.com/pytorch/pytorch/pull/110404 on behalf of https://github.com/kit1980 due to Previous PR in the stack caused CUDA memory leaks ([comment](https://github.com/pytorch/pytorch/pull/110404#issuecomment-1749850211))	2023-10-06 01:03:17 +00:00
PyTorch MergeBot	1c3fae46ee	Revert "Support SingletonSymNode mul with coefficient (#110369 )" This reverts commit `eb8feb8ff8`. Reverted https://github.com/pytorch/pytorch/pull/110369 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110369#issuecomment-1749802899))	2023-10-05 23:51:28 +00:00
Joel Schlosser	3597325bc7	pin_memory support for NT (#110404 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110404 Approved by: https://github.com/cpuhrsch, https://github.com/albanD ghstack dependencies: #110292	2023-10-05 16:33:22 +00:00
soulitzer	eb8feb8ff8	Support SingletonSymNode mul with coefficient (#110369 ) We want to be able to use SingletonSymNode to represent strides for Jagged layout tensor. The following is for 3D, but easily generalizable to higher dimensions. Constraints: - [B, x, D] (where x represents the "variable-length dim") can be strided in two ways [x, 1, sum(x)] and [dx, d, 1]. We need two different placeholder values depending on how the jagged tensor is strided. - When doing operations we need the strides of output tensors to be expressible in terms of the strides and sizes of the inner tensors. Given [B, x, D] @ [D, D'], the output strides are [x * D', D', 1] rather than some opaque [x2, D', 1]. This constraint exists because if I'm tracing, I need a symint to represent the output stride. This symint needs to come from somewhere; I get it in several ways: (1) create a constant, (2) unbacked symint, (3) create a new input using a source, (4) output of an operation on an existing symint. It is clear that (4) is what we want here, which brings us to the design below. Design: Given the two constraints, the most straightforward way to implement this is actually to update SingletonSymNode to include some scalar factor, i.e. Morally, SingletonSymNode represents `factor * [s_0, s_1, …, s_n]` This enables us to symbolically compute strides from sizes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110369 Approved by: https://github.com/ezyang ghstack dependencies: #110044	2023-10-04 22:56:15 +00:00
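The "singleton times a coefficient" design in the row above can be sketched with a tiny pure-Python class. `ScaledSingleton` and `contiguous_strides` are hypothetical names for illustration only; the sketch shows how multiplying a ragged-size placeholder by plain integers lets contiguous strides be derived from sizes, giving `[x * D', D', 1]` for sizes `[B, x, D']`.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class ScaledSingleton:
    """Placeholder for a ragged size, scaled by an integer coefficient."""
    sym_id: int      # identifies which ragged structure this stands for
    coeff: int = 1   # scalar factor applied to the ragged size

    def __mul__(self, other: object) -> "ScaledSingleton":
        if not isinstance(other, int):
            return NotImplemented  # e.g. singleton * singleton is unsupported
        return ScaledSingleton(self.sym_id, self.coeff * other)

    __rmul__ = __mul__


Size = Union[int, ScaledSingleton]


def contiguous_strides(sizes: List[Size]) -> List[Size]:
    """Compute row-major strides, keeping the ragged size symbolic."""
    strides: List[Size] = []
    running: Size = 1
    for size in reversed(sizes):
        strides.append(running)
        running = size * running
    return strides[::-1]


x = ScaledSingleton(sym_id=0)
strides = contiguous_strides([4, x, 8])  # sizes [B, x, D'] with B=4, D'=8
```

Because the coefficient travels with the placeholder, the leading stride stays tied to the same ragged structure rather than becoming a fresh opaque symbol.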
Joel Schlosser	3693777a86	Pickle support for NT (#110219 ) Fixes #104198 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110219 Approved by: https://github.com/cpuhrsch	2023-09-29 15:30:06 +00:00
soulitzer	2bcff92540	Add NestedTensor python subclass (#108314 ) Description coming soon Pull Request resolved: https://github.com/pytorch/pytorch/pull/108314 Approved by: https://github.com/jbschlosser ghstack dependencies: #108808	2023-09-11 18:29:20 +00:00
Joel Schlosser	e5548f8195	NT support for cat with dim > 0 when representable as jagged (#108428 ) Used in SAM Pull Request resolved: https://github.com/pytorch/pytorch/pull/108428 Approved by: https://github.com/cpuhrsch ghstack dependencies: #108361, #108370, #108362	2023-09-03 01:50:32 +00:00
Joel Schlosser	76ccf6c770	NT support for narrow() on dim=0 (#108362 ) Satisfies request here: https://github.com/pytorch/pytorch/issues/105913#issuecomment-1652249934 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108362 Approved by: https://github.com/cpuhrsch ghstack dependencies: #108361, #108370	2023-09-02 23:48:37 +00:00
Joel Schlosser	54dcb0ea61	NT support for matmul of (B, *, C, D) NT with dense (D, E) (#108370 ) Used in SAM Pull Request resolved: https://github.com/pytorch/pytorch/pull/108370 Approved by: https://github.com/cpuhrsch ghstack dependencies: #108361	2023-09-01 20:45:44 +00:00
Joel Schlosser	414cb26ded	NT support for cat with dim=0 (#108361 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108361 Approved by: https://github.com/cpuhrsch	2023-09-01 02:02:53 +00:00
Jirka Borovec	9178deedff	removing some redundant str splits (#106089 ) drop some redundant string splits, no factual changes, just cleaning the codebase Pull Request resolved: https://github.com/pytorch/pytorch/pull/106089 Approved by: https://github.com/albanD, https://github.com/malfet	2023-09-01 00:22:58 +00:00
Joel Schlosser	33d70be95f	Binary out-of-place ge.Scalar / eq.Scalar support for NT (#107892 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107892 Approved by: https://github.com/cpuhrsch ghstack dependencies: #107891	2023-08-28 15:18:37 +00:00
Joel Schlosser	e917d2749a	Unary out-of-place sin / cos support for NT (#107891 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107891 Approved by: https://github.com/cpuhrsch	2023-08-28 15:17:34 +00:00
Justin Chu	73e1455327	[BE] Enable ruff's UP rules and autoformat test/ (#105434 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434 Approved by: https://github.com/albanD	2023-07-19 20:36:06 +00:00
drisspg	4b9ba3fad5	Allow discontiguous NestedTensors to empty_like (#98383 ) # Summary Previously we disallowed discontiguous NTs from being passed into empty_like. This was done out of an abundance of caution. However, it should be safe to create an empty NT from a discontiguous NT: empty_like accounts for offsets, strides, and sizes when constructing the result. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98383 Approved by: https://github.com/cpuhrsch	2023-05-03 02:27:08 +00:00
Driss Guessous	5a81508bb6	Add NestedTensor ops: logical_not, logical_not_, masked_fill (#97934 ) # Summary ### 🤖 Generated by Copilot at 7954302 This pull request adds support for `logical_not` and `masked_fill` operations on nested tensors, which are tensors that can have tensors as elements. It modifies the `native_functions.yaml` file to dispatch these operations to the nested tensor backend, implements the logic for these operations in `NestedTensorBinaryOps.cpp` and `NestedTensorUnaryOps.cpp`, adds documentation in `nested.rst`, and adds tests in `test_nestedtensor.py`. ## Description ### 🤖 Generated by Copilot at 7954302 * Implement `logical_not` operation on nested tensors - Add `NestedTensor_logical_not` and `NestedTensor_logical_not_` functions to `native_functions.yaml` for CPU and CUDA dispatch ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991R1164), [link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991R1172)) - Define `NestedTensor_logical_not` and `NestedTensor_logical_not_` functions in `NestedTensorUnaryOps.cpp` using `map_nt` and `get_buffer` ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-f7c94671810b3ce652f9ad5458518cb7bbd67e8bf7e84e0a2fba641d878ba7c5R45-R56)) - Document `torch.logical_not` function for nested tensors in `nested.rst` ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-c8b131d009badb3f92031b2aaa6e7f93a793f13caee278ea78e1c57d78c0399eR203)) - Add subtest for `logical_not` function in `test_activations` method in `TestNestedTensorDeviceType` class in `test_nestedtensor.py` ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-6eef496a8ec635930b6e52507358e069c80021f3535b8737d39e14ffc38950c0L854-R867)) * Implement `masked_fill` operation on nested tensors - Add `NestedTensor_masked_fill` function to `native_functions.yaml` for CPU and CUDA dispatch ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991R7439)) - Define `NestedTensor_masked_fill` function in `NestedTensorBinaryOps.cpp` using `NestedTensor_elementwise_Tensor` ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-f847e41e3d373230df0b25574e993ec0e6b699bf16796b3df9ae9fb518048e25L210-R224)) - Document `torch.Tensor.masked_fill` function for nested tensors in `nested.rst` ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-c8b131d009badb3f92031b2aaa6e7f93a793f13caee278ea78e1c57d78c0399eR197)) - Add test case for `masked_fill` function in `TestNestedTensorDeviceType` class in `test_nestedtensor.py` ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-6eef496a8ec635930b6e52507358e069c80021f3535b8737d39e14ffc38950c0R677-R688)) - Add test case for backward pass of `masked_fill` function in `TestNestedTensorAutograd` class in `test_nestedtensor.py` ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-6eef496a8ec635930b6e52507358e069c80021f3535b8737d39e14ffc38950c0R2515-R2528)) * Improve error message for unsupported element-wise binary operations on nested dense tensors - Modify `NestedTensor_elementwise_Tensor` function in `NestedTensorBinaryOps.cpp` to include operation name in error message ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-f847e41e3d373230df0b25574e993ec0e6b699bf16796b3df9ae9fb518048e25L142-R150)) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97934 Approved by: https://github.com/cpuhrsch	2023-03-30 08:14:39 +00:00
Driss Guessous	f603873c1b	add various NT ops needed for testing (#97837 ) # Summary Add some Simple unary and binary NT ops - Sub - sgn - abs Pull Request resolved: https://github.com/pytorch/pytorch/pull/97837 Approved by: https://github.com/cpuhrsch	2023-03-29 23:43:37 +00:00
Joel Schlosser	0d66db1b2a	Implement last dim split_with_sizes for NT (forward only, non-SymInt-ified) (#97446 ) This is needed for the HSTU model. Details: * ~~NT `chunk` now calls into NT `split_with_sizes` since the latter is more general~~ (removed; they're totally separate) * Throws for backward * Only operates over the last dim (`dim=-1`) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97446 Approved by: https://github.com/cpuhrsch	2023-03-23 22:17:06 +00:00
drisspg	db15d191b6	Update NestedTensor add to support non identical striding for NT+NT (#97195 ) # Summary NestedTensors currently don't support non-identical strided addition. When accumulating grad it is possible to try to accumulate a grad with different striding than the old var, and there is no way to change this in user code. This is a solution; we should probably support strided addition for NT. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97195 Approved by: https://github.com/albanD, https://github.com/cpuhrsch	2023-03-22 03:34:47 +00:00
Joel Schlosser	77e73b9b7a	Refactor NT offsets metadata to be a Tensor (#96909 ) It's tedious work, but somebody's gotta do it. Benefits: * Enable access to offsets metadata from Python via private API (for validation, etc.) * Consistency with nested sizes / strides metadata * Needed for SymInt-ifying offsets metadata * more TBD Bonus: * Remove `_tensor` suffixes from metadata / getter names Pull Request resolved: https://github.com/pytorch/pytorch/pull/96909 Approved by: https://github.com/drisspg	2023-03-21 18:51:35 +00:00
Driss Guessous	a269e5fa04	Add forward and backward support for silu to NestedTensors (#97181 ) # Summary Add forward and backward support for silu to NestedTensors - Add forward support to silu - Add forward support to silu_ - Add backward support to silu - Add to NT docs - Add tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/97181 Approved by: https://github.com/cpuhrsch, https://github.com/jbschlosser	2023-03-20 23:46:12 +00:00
Driss Guessous	5612aa6acd	Fixes a layer_norm_nested backwards edge case. (#96788 ) # Summary Add Test and the fix for when input NT doesn't require grad to layernorm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96788 Approved by: https://github.com/cpuhrsch	2023-03-15 17:16:13 +00:00
Joel Schlosser	30d56dd8c1	Support randn_like() for NT (#96528 ) To satisfy an internal ask. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96528 Approved by: https://github.com/mikaylagawarecki, https://github.com/cpuhrsch	2023-03-13 19:39:51 +00:00
Joel Schlosser	024ea1a21e	Support zeros_like() for NT (#96527 ) This is used for the fake tensor fallbacks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96527 Approved by: https://github.com/cpuhrsch	2023-03-13 15:15:08 +00:00
Mikayla Gawarecki	6930f30ccd	Small bugfix in nested matmul bmm path head_dim acquisition (#95744 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95744 Approved by: https://github.com/drisspg	2023-03-01 03:27:08 +00:00
Joel Schlosser	68eec90cfd	Support elementwise add / mul for [B, ] nested, [B, 1] dense (CUDA only) (#95620 ) Small hack to reuse the 3D custom kernel from #88289 for [B, ] nested, [B, 1] dense elementwise add / mul. Simply treat the inputs as [B, *, 1], [B, 1, 1]. This is added to satisfy an internal ask. Future work: full general broadcasting support between mixed nested / dense. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95620 Approved by: https://github.com/cpuhrsch, https://github.com/drisspg	2023-02-27 21:07:09 +00:00
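The broadcasting behaviour described in #95620 can be sketched in pure Python. This shows only the intended semantics (each `[B, 1]` dense value is applied to every element of the matching ragged row, as if the shapes were `[B, *, 1]` and `[B, 1, 1]`); it is not the CUDA kernel, and `add_nested_dense` is a hypothetical helper name.

```python
def add_nested_dense(nested, dense):
    """Add a [B, 1] dense list to a [B, *] ragged list, row by row (hypothetical helper)."""
    if len(nested) != len(dense):
        raise ValueError("batch sizes differ")
    # dense[i] holds a single value that is broadcast across ragged row i.
    return [[x + d[0] for x in row] for row, d in zip(nested, dense)]


nested = [[1.0, 2.0, 3.0], [10.0]]  # B = 2, ragged lengths 3 and 1
dense = [[0.5], [1.0]]              # shape [B, 1]
result = add_nested_dense(nested, dense)
```

Elementwise `mul` follows the same pattern with `x * d[0]` in place of the sum.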
Huy Do	9b7abc4fac	Run slow gradcheck tests sequentially (#95494 ) Also redo https://github.com/pytorch/pytorch/pull/95246 as many more tests still run OOM Pull Request resolved: https://github.com/pytorch/pytorch/pull/95494 Approved by: https://github.com/clee2000	2023-02-26 00:44:25 +00:00
Driss Guessous	0d7913c9c1	add backwards for layer norm nested (#94781 ) Fixes #94702 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94781 Approved by: https://github.com/cpuhrsch	2023-02-16 01:42:57 +00:00
Driss Guessous	63bf7674fa	add backwards for gelu and relu on nested tensors. (#94776 ) Fixes #94701 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94776 Approved by: https://github.com/cpuhrsch	2023-02-14 18:42:06 +00:00
Mikayla Gawarecki	c7c7238976	Fix bug in unsqueeze_nested stride calculation (#88688 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88688 Approved by: https://github.com/cpuhrsch	2023-02-10 17:00:04 +00:00
Driss Guessous	bef2483ed8	[NestedTensor] Call contiguous in linear backward (#94317 ) Fixes #94303 If the upward grad for linear_backward was discontiguous, we would throw a torch check. This updates the implementation to instead call contiguous and changes the check to an internal assert. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94317 Approved by: https://github.com/mikaylagawarecki	2023-02-07 23:43:46 +00:00
Driss Guessous	df14650f0b	[SDPA] Update SDPA API and make function Public (#92189 ) # Summary In preparation for pt 2.0 launch this PR updates SDPA's API and makes the function a nn.functional public function. ## Changes ### API Previously the function signature was: `scaled_dot_product_attention(query, key, value, attn_mask=None, need_attn_weights=False, dropout_p=0.0, is_causal=False) -> (Tensor, Tensor)` Updated signature: `scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False) -> Tensor` This PR removes the need_attn_weights optional boolean variable and updates the return type to a singular tensor. #### Reasoning: The main goal of this function is to provide an easy interface for users to call into fused attention kernels e.g. (FlashAttention). The fused kernels do not currently support arbitrary attn_mask or dropout but there is a PR to mem-efficient attention to enable these. We want to have the API surface ready for when the backing kernels get updated. The fused kernels save on memory usage by not materializing the weights and it is unlikely that a fast fused implementation will enable this feature so we are removing it. Discussed with folks at FAIR/Xformers and +1 this API change. #### Make function Public In preparation for the pt 2.0 launch we make the function public to start to generate user feedback Pull Request resolved: https://github.com/pytorch/pytorch/pull/92189 Approved by: https://github.com/cpuhrsch	2023-01-23 20:50:46 +00:00
Mikayla Gawarecki	5848704ef8	Removed unnecessary check in `select_nested` (#89150 ) Implementation in #88585 should work for all dimensions. Removed unnecessary check that constrained select to dims 0 and 1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89150 Approved by: https://github.com/cpuhrsch	2022-11-16 22:11:37 +00:00
Antoni Viros i Martin	9f58e027a9	Add implementation for irregular dimension selection for nested tensors. (#88585 ) Summary: This diff modifies the implementation of the select operator so slices of the irregular dimension can be selected (e.g. nt[:,0,:]). Test Plan: Added new unit tests to test that the new functions work as intended (see them in diff). To test, `buck test mode/dev-nosan //caffe2/test:nested` Differential Revision: D41083993 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88585 Approved by: https://github.com/cpuhrsch	2022-11-09 00:19:38 +00:00
Antoni Viros i Martin	c77368d416	Implement a constructor for nested_tensor that is similar to torch.tensor() (#88213 ) Summary: This diff merges both previous implementations of constructors for nested tensors, the one from lists of tensors and the one with arbitrary python lists, and implements it in pytorch core so no extensions are needed to construct NT. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88213 Approved by: https://github.com/cpuhrsch	2022-11-08 00:03:18 +00:00
Christian Puhrsch	5e6ceebccb	Add support for neg to NestedTensor (#88131 ) Partially fixes #86889 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88131 Approved by: https://github.com/drisspg	2022-11-03 15:15:57 +00:00
Mikayla Gawarecki	d979caa87c	Added add/mul for nested dense [B, *, D], [B, 1, D] case (CUDA-only) (#88289 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88289 Approved by: https://github.com/cpuhrsch	2022-11-03 01:29:25 +00:00
Christian Puhrsch	943b20e7ae	Use tensor cores for NT bmm (#86856 ) Copy of internal diff. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86856 Approved by: https://github.com/drisspg	2022-11-02 21:51:40 +00:00
PyTorch MergeBot	99c07735e4	Revert "Add support for neg to NestedTensor (#88131 )" This reverts commit `6a75a0d1a1`. Reverted https://github.com/pytorch/pytorch/pull/88131 on behalf of https://github.com/mehtanirav due to [Internal breakages](https://www.internalfb.com/intern/sandcastle/job/13510799692239080/insights)	2022-11-02 18:43:36 +00:00
Driss Guessous	560786ac20	call contiguous on BMM inputs for NT on CUDA (#88108 ) Fixes #87713 BMM for CPU supports non-contiguous nested tensor inputs, while BMM for CUDA does not currently support non-contiguous inputs. The derivative for BMM: ``` - name: bmm(Tensor self, Tensor mat2) -> Tensor self: grad.bmm(mat2.transpose(1, 2).conj()) mat2: self.transpose(1, 2).conj().bmm(grad) result: self_t.bmm(mat2_p) + self_p.bmm(mat2_t) ``` When calling backward it was impossible for this function to succeed since the inputs were always discontiguous, regardless of the user input. This adds contiguous calls to the BMM_cuda implementation for nested tensors. This was not caught by tests because grad_check is currently only done on CPU in test_nestedtensors. This PR updates the autograd test to also be run on GPU. As a result I found one more issue with the backward for to_padded_tensor erroring instead of calling the generic version. cc @cpuhrsch @jbschlosser @bhosmer @mikaylagawarecki Pull Request resolved: https://github.com/pytorch/pytorch/pull/88108 Approved by: https://github.com/cpuhrsch	2022-11-01 03:14:27 +00:00
Christian Puhrsch	6a75a0d1a1	Add support for neg to NestedTensor (#88131 ) Partially fixes #86889 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88131 Approved by: https://github.com/drisspg	2022-11-01 02:37:42 +00:00
Christian Puhrsch	b192e7e415	Support non-contiguous NestedTensors for elementwise ops (#87888 ) Enables benchmarking of math path of sdp kernel Pull Request resolved: https://github.com/pytorch/pytorch/pull/87888 Approved by: https://github.com/drisspg	2022-10-28 11:26:17 +00:00
Antoni Viros i Martin	775fef51b7	Implement copy_, fill_, and ones_like for Nested Tensors backends (#87728 ) Summary: This diff implements copy_ in order to allow pinned memory transfers for nested tensors, as well as fill_ and ones_like, to test whether nested tensors can be created with other factory functions. Test Plan: Pass all CI and sandcastle jobs. Reviewed By: mikekgfb Differential Revision: D40689594 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87728 Approved by: https://github.com/cpuhrsch	2022-10-26 14:48:27 +00:00
Antoni Viros i Martin	d94e33f041	Add support for .to() for NestedTensor backends (#87146 ) Summary: This commit adds support for moving NestedTensors from CPU to GPU and back. The implementation requires implementing empty_like(), which is based on PR#83140. Test Plan: Added a new unit test based on the unit test for the main .to() implementation. All unit tests must pass, as well as every sandcastle job. Differential Revision: D40437585 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87146 Approved by: https://github.com/drisspg	2022-10-20 03:46:50 +00:00
Christian Puhrsch	f6c6048b10	Use CUTLASS GEMM for NT bmm (#85894 ) Copy of https://github.com/pytorch/pytorch/pull/85710 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85894 Approved by: https://github.com/drisspg	2022-10-18 23:11:47 +00:00
Mikayla Gawarecki	ab69550678	Add nested squeeze.dim and unsqueeze (#86813 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86813 Approved by: https://github.com/drisspg	2022-10-13 16:05:36 +00:00
PyTorch MergeBot	d169f950da	Revert "Use CUTLASS GEMM for NT bmm [OSS-only] (#85894 )" This reverts commit `ef58a132f2`. Reverted https://github.com/pytorch/pytorch/pull/85894 on behalf of https://github.com/DanilBaibak due to Break internal build	2022-10-13 15:28:09 +00:00
Mikayla Gawarecki	2a75152537	[easy] Add nested tanh (#86826 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86826 Approved by: https://github.com/cpuhrsch	2022-10-13 00:48:08 +00:00
Christian Puhrsch	ef58a132f2	Use CUTLASS GEMM for NT bmm [OSS-only] (#85894 ) OSS-only copy of https://github.com/pytorch/pytorch/pull/85710 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85894 Approved by: https://github.com/drisspg	2022-10-12 20:03:28 +00:00
Driss Guessous	16f65f178a	Nested tensor forward only chunk operations (#85645 ) # Summary Taking over this pr: https://github.com/pytorch/pytorch/pull/83736 Adding support for chunk without autograd support Pull Request resolved: https://github.com/pytorch/pytorch/pull/85645 Approved by: https://github.com/cpuhrsch	2022-10-11 01:21:39 +00:00
Antoni Viros i Martin	cdbffa7f66	🦊 [AI Accelerators] Consolidate native_layer_norm for nested tensor (#86295 ) Summary: In order to make the layer normalization implementation for nested tensors public, it needs to be generalized to accept a normalized_shape argument instead of assuming it to be the last dimension of the nested_tensor. This commit does that, as well as adding extra unit tests to ensure the implementation is correct. Test Plan: All unit tests designed to test different ways of using the function work: `buck test //caffe2/test:nested -- test_layer_norm` Differential Revision: D40105207 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86295 Approved by: https://github.com/drisspg	2022-10-06 13:10:25 +00:00
Mikayla Gawarecki	01add6e288	Allow only one -1 in nested view/reshape (#85691 ) ### this effectively means that we only allow reshaping/viewing of nt with ONE ragged dimension Behavior before this PR: 1. `-1` allowed for implicit batch dimension 2. multiple `-1`s allowed for pre-existing dimensions 3. for new dimensions, `-1` is not allowed it is worth noting that for the most part 3 is basically unreachable because assuming a nested tensor has at least 1 ragged dimension, you would expect at least one -1 to be in the proposed shape for the pre-existing dimensions Behavior after this PR: 1. batch dimension must be specified 2. only one `-1` allowed for pre-existing dimensions this effectively means that we only allow reshaping/viewing of nt with ONE ragged dimension 3. unchanged Pull Request resolved: https://github.com/pytorch/pytorch/pull/85691 Approved by: https://github.com/cpuhrsch	2022-09-28 22:29:40 +00:00
Mikayla Gawarecki	afaee00fec	Add python `nested_tensor` and `as_nested_tensor` constructors in `torch.nested` (#85593 ) Remove `torch.nested_tensor` which has erroneous behavior wrt gradients (could be either leaf or not leaf). Introduce `torch.nested.nested_tensor` and `torch.nested.as_nested_tensor` in the vein of `torch.tensor` and `torch.as_tensor`. Done in nested `__init__.py` for now but can move to pybind in future (when we want to load from numpy/nested lists ). Discussed offline with @cpuhrsch and pybind constructor (https://github.com/pytorch/pytorch/pull/85536) was more gnarly than expected, so we can move to that when we do need loading from numpy etc. Differential Revision: [D39806622](https://our.internmc.facebook.com/intern/diff/D39806622) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85593 Approved by: https://github.com/drisspg, https://github.com/cpuhrsch	2022-09-28 20:15:02 +00:00
PyTorch MergeBot	fc8ba3a92d	Revert "Allow only one -1 in nested view/reshape (#85691 )" This reverts commit `4c4e5f6106`. Reverted https://github.com/pytorch/pytorch/pull/85691 on behalf of https://github.com/atalman due to Causes github first merge conflict	2022-09-28 17:22:53 +00:00
Mikayla Gawarecki	4c4e5f6106	Allow only one -1 in nested view/reshape (#85691 ) ### this effectively means that we only allow reshaping/viewing of nt with ONE ragged dimension Behavior before this PR: 1. `-1` allowed for implicit batch dimension 2. multiple `-1`s allowed for pre-existing dimensions 3. for new dimensions, `-1` is not allowed it is worth noting that for the most part 3 is basically unreachable because assuming a nested tensor has at least 1 ragged dimension, you would expect at least one -1 to be in the proposed shape for the pre-existing dimensions Behavior after this PR: 1. batch dimension must be specified 2. only one `-1` allowed for pre-existing dimensions this effectively means that we only allow reshaping/viewing of nt with ONE ragged dimension 3. unchanged Pull Request resolved: https://github.com/pytorch/pytorch/pull/85691 Approved by: https://github.com/cpuhrsch	2022-09-27 17:16:54 +00:00
Mikayla Gawarecki	5e700803c2	Use fallback approach for nested matmul (#85311 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85311 Approved by: https://github.com/cpuhrsch, https://github.com/drisspg	2022-09-22 21:19:09 +00:00
PyTorch MergeBot	caa0ab557d	Revert "Use fallback approach for nested matmul (#85311 )" This reverts commit `7c31f6e672`. Reverted https://github.com/pytorch/pytorch/pull/85311 on behalf of https://github.com/clee2000 due to broke lots of builds `7c31f6e672` even though the pr was green	2022-09-21 22:55:40 +00:00
Mikayla Gawarecki	7c31f6e672	Use fallback approach for nested matmul (#85311 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85311 Approved by: https://github.com/cpuhrsch, https://github.com/drisspg	2022-09-21 22:39:52 +00:00
Mikayla Gawarecki	77f1f98479	Re-introduce `torch.Tensor.to_padded_tensor` (#85293 ) Differential Revision: [D39629004](https://our.internmc.facebook.com/intern/diff/D39629004) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85293 Approved by: https://github.com/cpuhrsch	2022-09-21 18:45:56 +00:00
drisspg	bda8a5729b	[Nested Tensor] Create differentiable nt to tensor view functions (#83371 ) This PR attempts to implement 2) "the safe way" of creating a view of nested tensor that returns a regular tensor. The rest of the break down is here: https://fb.quip.com/J8QCAx41af11 https://gist.github.com/drisspg/8622e9c97d374fa920ac647e1167cabc This is a short list of some edge cases. After some more work I was able to address two of the test cases in the above gist. There are a few complex aspects here for which I left defeated comments inline. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83371 Approved by: https://github.com/bdhirsh	2022-09-13 20:35:58 +00:00
Mikayla Gawarecki	e217b30b0f	Add `torch.nested` namespace (#84102 ) First step towards #83775 - only `to_padded_tensor` is moved to the nested namespace for now - following the schema used for `special`, `fft`, `linalg` and other namespaces, nested functions are registered in native_functions.yaml as `nested_{function_name}` and are bound to the desired Python name in `torch/nested/__init__.py`, and the desired C++ name in `torch/csrc/api/include/torch/nested.h`. ~~Question: should we keep the documentation for `Tensor.to_padded_tensor` or can this deleted since it is shared by `torch.nested.to_padded_tensor`?~~ [generated nested docs](https://docs-preview.pytorch.org/84102/nested.html?highlight=nested#module-torch.nested) Differential Revision: [D39361148](https://our.internmc.facebook.com/intern/diff/D39361148) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84102 Approved by: https://github.com/drisspg	2022-09-12 16:31:05 +00:00
Mikayla Gawarecki	1cad744694	Enable select.int when NestedTensor requires grad (#83875 ) Previously indexing a nested tensor when it requires_grad would raise an error because the backward formula for `select.int` uses `self.sizes()`. This PR fixes that by temporarily registering a _nested_select_backward function which can be removed when we start using the symint approach to register kernels. For now this functionality is needed for creating a POC that nested tensor can be an API to `segment_coo` and `segment_csr` in the torch_scatter repo ``` a = torch.arange(10).reshape(2, 5).float() b = torch.arange(12).reshape(2, 6).float() nt = torch.nested_tensor([a, b], dtype=torch.float).requires_grad_(True) nt[0] # RuntimeError: Internal error: NestedTensorImpl doesn't support sizes. Please file an issue on https://github.com/pytorch/nestedtensor ``` whereas ``` nt = torch.nested_tensor([a, b], dtype=torch.float).requires_grad_(False) nt[0] ``` would succeed Pull Request resolved: https://github.com/pytorch/pytorch/pull/83875 Approved by: https://github.com/albanD, https://github.com/drisspg	2022-09-06 22:19:32 +00:00
Driss Guessous	f803fa9fc9	[Nested Tensor] Add a NestedTensorUtils header and cpp file for organization (#84385 ) # Summary Trying to do some clean up of the code structure for nested tensors. This introduces a utility header and cpp file that implements helper functions. This is the initial PR in more clean up. The next would be separating out all the native functions that create nested tensors into their own file since they do not in fact do math on nested tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84385 Approved by: https://github.com/mikaylagawarecki	2022-09-02 16:31:55 +00:00
YifanShenSZ	673b35c847	Better reshape with autograd support (#82754 ) (#84154 ) The original author is @YifanShenSZ and the original PR is: #82754 # Summary: Previous reshape [https://github.com/pytorch/pytorch/issues/80981](https://github.com/pytorch/pytorch/pull/80981) is ok for forward, but needs improvement for backward: need to handle "sometimes view sometimes copy" behavior. This pull request fixes it by: 1. add a new alias dispatch key `CompositeImplicitAutogradNestedTensor`, which ideally would work as nested-tensor version of `CompositeImplicitAutograd` 2. register `reshape_nested` to `reshape` by `CompositeImplicitAutogradNestedTensor` Side changes: * add contiguous memory format support to `clone_nested` * add `view_nested` * add `reshape_as_nested` Fix issue [https://github.com/pytorch/pytorch/issues/83041](https://github.com/pytorch/pytorch/issues/83041) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82754 Test Plan: Imported from GitHub, without a `Test Plan:` line. Static Docs Preview: executorch \|[Full Site](https://our.intern.facebook.com/intern/staticdocs/eph/D39023822/V13/executorch/)\| \|Modified Pages\| Reviewed By: albanD Differential Revision: D39023822 Pulled By: drisspg Pull Request resolved: https://github.com/pytorch/pytorch/pull/84154 Approved by: https://github.com/bdhirsh, https://github.com/albanD	2022-09-01 20:01:39 +00:00
Driss Guessous	71369051ee	[Nested Tensor] fix from_padded bug (#84217 ) Fixes #84082 As explained in the issue, the problem arose from grad not being contiguous and the fast kernel not handling this case gracefully. The other thing I can do is add a contiguous call to `d144594512/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cpp (L45)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84217 Approved by: https://github.com/albanD	2022-08-30 03:48:11 +00:00
Driss Guessous	2436cf8aa8	[Nested Tensor] detach (#84078 ) ## Summary Add detach op for nested tensors. Nested tensors are not part of the composite explicit dispatch key set and therefore need to be added manually. The Detach test is failing only for the dtype=torch.float32, torch.float16 and device=cuda. The chain of ops called is sum.backward() -> from_padded() -> unbind(). This populates the grad for a and b. Does this potentially indicate that the CUDA implementation of one of these ops, likely from_padded(), is incorrect? Pull Request resolved: https://github.com/pytorch/pytorch/pull/84078 Approved by: https://github.com/albanD	2022-08-29 09:12:26 +00:00
PyTorch MergeBot	f4f54c7ce1	Revert "[Nested Tensor] detach (#84078 )" This reverts commit `092fe71f33`. Reverted https://github.com/pytorch/pytorch/pull/84078 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2022-08-28 15:30:21 +00:00
Driss Guessous	092fe71f33	[Nested Tensor] detach (#84078 ) ## Summary Add detach op for nested tensors. Nested tensors are not part of the composite explicit dispatch key set and therefore need to be added manually. The Detach test is failing only for the dtype=torch.float32, torch.float16 and device=cuda. The chain of ops called is sum.backward() -> from_padded() -> unbind(). This populates the grad for a and b. Does this potentially indicate that the CUDA implementation of one of these ops, likely from_padded(), is incorrect? Pull Request resolved: https://github.com/pytorch/pytorch/pull/84078 Approved by: https://github.com/albanD	2022-08-27 03:00:55 +00:00
Yifan Shen	b3c99bef0c	Support nested dropout autograd (#83338 ) When the initial version came out, `NestedTensor` was not included in the `CompositeImplicitAutograd` key set, so we had to register dropout_nested to dropout and make it forward-only. Now is the time to improve it! This pr removes dropout_nested; instead native_dropout_nested is implemented along with native_dropout_backward_nested. Side change: remove dropout__nested since @cpuhrsch suggested to leave out nested in-place ops for now Pull Request resolved: https://github.com/pytorch/pytorch/pull/83338 Approved by: https://github.com/jbschlosser	2022-08-18 00:49:29 +00:00
Mikayla Gawarecki	bd0ad7a84f	Add backward support for rudimentary NestedTensor.sum(dim) (#82625 ) Per offline discussion, this will be updated to use expand once expand semantics for nested tensor have been fleshed out. Next steps will be to add support for other features for forward sum mentioned on #82387 and likewise update the backward Pull Request resolved: https://github.com/pytorch/pytorch/pull/82625 Approved by: https://github.com/albanD	2022-08-17 18:12:00 +00:00
Driss Guessous	4b597019b7	[Nested Tensor] Created Nested Tensor to Nested Tensor Views (#82658 ) # Summary This PR pulls out all the changes from #81838 specific to properly creating nested_tensor views. I will update this comment with a design doc once that has been made. This should enable proper creation of NestedTensor views, two nested_tensors sharing the same buffer_ but with different NestedTensor metadata. The function `create_nested_tensor_view` is a helper function for creating a new nested tensor whose storage aliases the base, causing the underlying storage to be shared, and is therefore a view. This function by itself is not differentiable and therefore autograd does not track its uses. If a nested tensor function implementation uses this helper in its implementation the aten_op must meet two requirements: - The function must return a view of the input - The function must be explicit and define its backward ## Testing A bug was found when creating a base tensor out of inference mode and then creating a view in inference mode. This test has been added to this PR in order to show the effect of the change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82658 Approved by: https://github.com/albanD	2022-08-16 20:22:21 +00:00
Driss Guessous	c5c0dd9b62	Update shallow_copy_and_detach for nested tensor impls (#83002 ) # Summary This change fixes a bug that was encountered when trying to add more backward formulas for nested tensor ops. If a derivative is defined that stores the "result" for use in the backward, the output of the forward op is saved using: ``` if (grad_fn) { grad_fn->result_ = SavedVariable(result, true); } ``` SavedVariable calls a series of functions which in turn call shallow_copy_and_detach, and when `c179597753/c10/core/TensorImpl.cpp (L533)` is hit this calls sizes_custom(), which is not implemented and errors. I also noticed that, since the storage format for nested_tensor is different (not `storage_` but instead two tensors), we should actually be calling the NestedTensorImpl constructor. This PR overrides shallow_copy_and_detach from the derived class and ensures that shallow copy works correctly. ## Update - Added the softmax derivative in this PR because that is a direct use case that was blocked by not having shallow_copy_and_detach work correctly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83002 Approved by: https://github.com/albanD	2022-08-10 20:34:46 +00:00
Driss Guessous	e816644495	Add nested tensor contiguous (#82147 ) ### Description The nested_tensor impl for `contiguous` was disabled. Prior to the work on nested_tensor transpose, only contiguous nested tensors could be created from Python. However, it is now possible to create nested tensors that are non-contiguous. This PR links up the existing function used at the C++ level to the Python function. ### Tests Updated Test in `test/test_nestedtensor.py` ### Notes The inference mode had to be removed for this test. This is because the func `.contiguous` is a composite implicit function. Currently this does not work in inference mode. However: https://github.com/pytorch/pytorch/pull/81838 should fix that issue. ### Why When writing kernels in Triton for nested tensors I exposed a helper function that returned the "Buffer" tensor to Python. Now contiguity can be checked before running any Triton kernel. Also a good follow up would be making `nt.contiguous` on non-contiguous nested tensors return a contiguous nested tensor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82147 Approved by: https://github.com/jbschlosser	2022-08-09 01:51:37 +00:00
Joel Benjamin Schlosser	6ca95547ac	Initial private SDP interface and naive composite impl (#81956 ) Adds an initial private API version of the SDP interface. Signature: ``` _scaled_dot_product_attention(Tensor query, Tensor key, Tensor value, Tensor? attn_mask=None, float dropout_p=0.0, bool need_attn_weights=True, bool is_causal=False) -> (Tensor, Tensor) ``` Returns a tuple of `(output, attn_weights)`. Note the following: * `need_attn_weights`: flag indicating that attention weights should be computed. This is useful to toggle off for flash attention as it does not materialize the weights by default, making it more expensive to return them. * Boolean attention mask support only; `True` values within `attn_mask` indicate that the element should take part in attention (notably, this is the reverse of MHA, which uses `True` to mask out values). Mask is optional. * `is_causal`: Temporary flag indicating whether to use a causal attention weighting. If this is set to `True`, it takes precedence over any value passed in for `attn_mask`. Longer term, the `is_causal` flagging can be subsumed into the `attn_mask` arg via tensor subclassing (see e.g. [CausalTensor](https://github.com/facebookresearch/xformers/blob/sparse_cleanup/xformers/sparse/causal_tensor.py) in xFormers). * Testing is currently done via reference with the existing Python impl of `F._scaled_dot_product_attention`. * This PR does not yet drop-in the new SDP anywhere. A future PR can hook it up in BT or MHA. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81956 Approved by: https://github.com/drisspg, https://github.com/erichan1	2022-08-01 22:26:18 +00:00
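The mask semantics listed in #81956 can be illustrated with a naive single-head reference in pure Python. This is a sketch under stated assumptions, not the PR's composite implementation: `naive_sdp` is a hypothetical name, inputs are plain lists of row vectors, dropout and batching are omitted, and every query row is assumed to have at least one allowed key.

```python
import math


def naive_sdp(query, key, value, attn_mask=None, is_causal=False):
    """Naive scaled dot-product attention; returns (output, attn_weights).

    Hypothetical reference helper: no dropout, no batching.
    """
    scale = 1.0 / math.sqrt(len(query[0]))
    output, weights = [], []
    for i, q in enumerate(query):
        scores = []
        for j, k in enumerate(key):
            if is_causal:
                allowed = j <= i            # is_causal overrides attn_mask
            elif attn_mask is not None:
                allowed = attn_mask[i][j]   # True means "takes part in attention"
            else:
                allowed = True
            dot = sum(a * b for a, b in zip(q, k))
            scores.append(dot * scale if allowed else -math.inf)
        # Numerically stable softmax over the scores of this query row.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # Weighted sum of the value rows.
        output.append(
            [sum(w * v[d] for w, v in zip(row, value)) for d in range(len(value[0]))]
        )
    return output, weights


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
causal_out, causal_w = naive_sdp(q, k, v, is_causal=True)
masked_out, masked_w = naive_sdp(q, k, v, attn_mask=[[False, True], [True, True]])
```

With `is_causal=True` the first query can only attend to the first key, so its output row equals the first value row; the boolean mask in the second call instead routes the first query entirely to the second key.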
YifanShenSZ	4bb7e148c4	add nested tensor matmul support (#81957 ) There was a discussion on whether letting nested tensor `reshape` support collapsing and splitting dimension 0. The conclusion was to make reshape simple, so we need a tweaked `matmul`, which only supports 3+ dimension nonbroadcast case, i.e. a generalized `bmm`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81957 Approved by: https://github.com/jbschlosser	2022-07-30 22:35:09 +00:00
YifanShenSZ	5f9939f65e	Introduce discontinuity to nested tensor (#80981 ) Nested tensor used to assume the buffer memory to be contiguous. However, some operations can break that assumption: * reshape * transpose * slice To be able to access the underlying tensors from a discontinuous buffer, we need three pieces of metadata: * sizes of each tensor (`nested_size_tensor_`) * strides of each tensor (`nested_stride_tensor_`) * offset of each tensor (`offsets_`) so that we can access each tensor by `buffer.as_strided(size, stride, offset)`. This pull request introduces the offsets metadata, then adds reshape and transpose so that we can create discontinuous cases for testing. Unbind, select, dropout, softmax, bmm are refactored to provide tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80981 Approved by: https://github.com/jbschlosser	2022-07-30 04:08:30 +00:00
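The size / stride / offset access pattern from the commit above can be sketched without any library. This is a plain-Python illustration over a flat list, not the C++ implementation: `as_strided` here is a hand-written stand-in for the tensor method of the same name, and `unbind` and the example metadata are hypothetical.

```python
def as_strided(buffer, size, stride, offset):
    """Read a strided view out of a flat buffer as nested Python lists."""
    def build(dim, base):
        if dim == len(size):
            return buffer[base]
        return [build(dim + 1, base + i * stride[dim])
                for i in range(size[dim])]
    return build(0, offset)

def unbind(buffer, sizes, strides, offsets):
    # One (size, stride, offset) triple per constituent tensor.
    return [as_strided(buffer, s, st, o)
            for s, st, o in zip(sizes, strides, offsets)]

# Two constituents packed back to back: a 2x3 followed by a 1x3.
buffer = list(range(9))
offsets = [0, 6]
contiguous = unbind(buffer, [(2, 3), (1, 3)], [(3, 1), (3, 1)], offsets)

# Transposing the last two dims only swaps the size and stride entries;
# the buffer and offsets are untouched, so the view is no longer contiguous.
transposed = unbind(buffer, [(3, 2), (3, 1)], [(1, 3), (1, 3)], offsets)
```

The point of the sketch is that a transpose never moves data: only the per-tensor metadata changes, which is why sizes alone stop being enough to locate each constituent.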
Mikayla Gawarecki	89c0123ba0	Add rudimentary NestedTensor.sum(dim) (#82387 ) A first step towards adding dimension-wise reductions to NestedTensor, - Assumes tensors in the nested tensor as well as the buffer of the nested tensor are contiguous - Always enforces `keepdim=True` - Only supports reduction across the last dimension - No support for acctype (`dtype` argument) - No autograd support - CPU only Next steps would be to add support for the above. For now this basic support is for prototyping to make sure `NestedTensor` can be used as an API for segment reductions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82387 Approved by: https://github.com/jbschlosser	2022-07-28 22:45:22 +00:00
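The restrictions listed in the commit above (contiguous buffer, last dimension only, `keepdim=True`) amount to a segment reduction over the flat buffer. A minimal plain-Python sketch of that idea follows; `nested_sum_last_dim` and its list-based inputs are hypothetical and are not the kernel added by the PR.

```python
def nested_sum_last_dim(buffer, sizes):
    """Sum each constituent over its last dim, keeping that dim as size 1.

    Assumes the buffer is contiguous and constituents are packed in order.
    """
    out, out_sizes, pos = [], [], 0
    for shape in sizes:
        *lead, last = shape
        rows = 1
        for d in lead:
            rows *= d
        # Each run of `last` elements is one segment to reduce.
        for _ in range(rows):
            out.append(sum(buffer[pos:pos + last]))
            pos += last
        out_sizes.append((*lead, 1))
    return out, out_sizes

# A 2x3 constituent followed by a 2x2 constituent.
buffer = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sums, sum_sizes = nested_sum_last_dim(buffer, [(2, 3), (2, 2)])
```

Because segment lengths differ per constituent, the reduced result is itself a nested structure, here represented as a flat output buffer plus the new per-constituent sizes.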
PyTorch MergeBot	26776d628c	Revert "Initial private SDP interface and naive composite impl (#81956 )" This reverts commit `f15c5bf133`. Reverted https://github.com/pytorch/pytorch/pull/81956 on behalf of https://github.com/janeyx99 due to broke all configs on test_scaled_dot_product_attention (__main__.TestNestedTensorAutograd) `f15c5bf133`	2022-07-27 18:36:54 +00:00
Joel Benjamin Schlosser	f15c5bf133	Initial private SDP interface and naive composite impl (#81956 ) Adds an initial private API version of the SDP interface. Signature: ``` _scaled_dot_product_attention(Tensor query, Tensor key, Tensor value, Tensor? attn_mask=None, float dropout_p=0.0, bool need_attn_weights=True, bool is_causal=False) -> (Tensor, Tensor) ``` Returns a tuple of `(output, attn_weights)`. Note the following: * `need_attn_weights`: flag indicating that attention weights should be computed. This is useful to toggle off for flash attention as it does not materialize the weights by default, making it more expensive to return them. * Boolean attention mask support only; `True` values within `attn_mask` indicate that the element should take part in attention (notably, this is the reverse of MHA, which uses `True` to mask out values). Mask is optional. * `is_causal`: Temporary flag indicating whether to use a causal attention weighting. If this is set to `True`, it takes precedence over any value passed in for `attn_mask`. Longer term, the `is_causal` flagging can be subsumed into the `attn_mask` arg via tensor subclassing (see e.g. [CausalTensor](https://github.com/facebookresearch/xformers/blob/sparse_cleanup/xformers/sparse/causal_tensor.py) in xFormers). * Testing is currently done via reference with the existing Python impl of `F._scaled_dot_product_attention`. * This PR does not yet drop-in the new SDP anywhere. A future PR can hook it up in BT or MHA. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81956 Approved by: https://github.com/drisspg, https://github.com/erichan1	2022-07-27 15:41:45 +00:00
PyTorch MergeBot	500be5998d	Revert "Introduce discontinuity to nested tensor (#80981 )" This reverts commit `b492f7c485`. Reverted https://github.com/pytorch/pytorch/pull/80981 on behalf of https://github.com/osalpekar due to This was reverted internally in D38142790, due to causing TorchScript inference failures	2022-07-26 21:40:42 +00:00
PyTorch MergeBot	0b0dbc59e6	Revert "Update shallow_copy_and_detach for nested tensor impls to enable nested tensor softmax backward (#81838 )" This reverts commit `6697f1e467`. Reverted https://github.com/pytorch/pytorch/pull/81838 on behalf of https://github.com/osalpekar due to Reverting this in order to revert https://github.com/pytorch/pytorch/pull/80981 cleanly. That diff caused GPU Inference breakage internally	2022-07-26 21:34:10 +00:00
PyTorch MergeBot	6c10a598ca	Revert "add nested tensor matmul support (#81957 )" This reverts commit `7bdafed4f1`. Reverted https://github.com/pytorch/pytorch/pull/81957 on behalf of https://github.com/osalpekar due to Reverting this in order to revert https://github.com/pytorch/pytorch/pull/80981 cleanly. That diff caused GPU Inference breakage internally	2022-07-26 21:10:28 +00:00
YifanShenSZ	7bdafed4f1	add nested tensor matmul support (#81957 ) There was a discussion on whether letting nested tensor `reshape` support collapsing and splitting dimension 0. The conclusion was to make reshape simple, so we need a tweaked `matmul`, which only supports 3+ dimension nonbroadcast case, i.e. a generalized `bmm`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81957 Approved by: https://github.com/jbschlosser	2022-07-26 16:58:42 +00:00
Driss Guessous	6697f1e467	Update shallow_copy_and_detach for nested tensor impls to enable nested tensor softmax backward (#81838 ) # Summary This change fixes a bug that was encountered when trying to add more backward formulas for nested tensor ops. If a derivative is defined that stores the "result" for use in the backward, the output of the forward op is saved using: ``` if (grad_fn) { grad_fn->result_ = SavedVariable(result, true); } ``` SavedVariable calls a series of functions which in turn call shallow_copy_and_detach, and when `c179597753/c10/core/TensorImpl.cpp (L533)` is hit this calls sizes_custom(), which is not implemented and errors. I also noticed that, since the storage format for nested_tensor is different (not `storage_` but two tensors), we should actually be calling the NestedTensorImpl constructor. This PR overrides shallow_copy_and_detach from the derived class and ensures that shallow copy works correctly. ## Update - Added the softmax derivative in this PR because that is a direct use case that was blocked by not having shallow_copy_and_detach work correctly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81838 Approved by: https://github.com/soulitzer	2022-07-25 20:04:40 +00:00
Yifan Shen	b492f7c485	Introduce discontinuity to nested tensor (#80981 ) Nested tensor used to assume the buffer memory to be contiguous. However, some operations can break that assumption: * reshape * transpose * slice To be able to access the underlying tensors from a discontinuous buffer, we need three pieces of metadata: * sizes of each tensor (`nested_size_tensor_`) * strides of each tensor (`nested_stride_tensor_`) * offset of each tensor (`offsets_`) so that we can access each tensor by `buffer.as_strided(size, stride, offset)`. This pull request introduces the offsets metadata, then adds reshape and transpose so that we can create discontinuous cases for testing. Unbind, select, dropout, softmax, bmm are refactored to provide tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80981 Approved by: https://github.com/jbschlosser	2022-07-21 17:17:25 +00:00
Driss Guessous	fca1523604	implement numel and tests for nested tensor (#80424 ) Add numel implementation for Nested Tensor. Currently the construction of nested size and nested_strides assume contiguous. This implementation was based off of the safe_compute_numel(). Having a TORCH_CHECK in a for loop kinda feels bad but I don't really know how performant numel needs to be. Since nested size is stored as a tensor: `nested_size_tensor().cumprod(dim=1).sum(dim=0)[1].item() ` Would also get the job done. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80424 Approved by: https://github.com/cpuhrsch	2022-06-28 18:02:44 +00:00
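The two ways of computing `numel` mentioned in the commit above can be compared in plain Python. This is only a sketch: `numel_loop` and `numel_cumprod` are hypothetical names, the nested sizes are given as a list of shape tuples rather than a tensor, and every constituent is assumed to have the same number of dimensions (at least one). Note that the `[1]` index in the commit's one-liner picks the last column only when each constituent is 2-D; the sketch reads the last column instead, which is an interpretation rather than something the commit states.

```python
from itertools import accumulate
from operator import mul

def numel_loop(nested_sizes):
    # Per-constituent product of dimensions, summed over constituents.
    total = 0
    for shape in nested_sizes:
        n = 1
        for d in shape:
            n *= d
        total += n
    return total

def numel_cumprod(nested_sizes):
    # Row-wise cumulative product, then column sums: the last column of the
    # cumulative products holds each constituent's full element count.
    cumprods = [list(accumulate(shape, mul)) for shape in nested_sizes]
    column_sums = [sum(col) for col in zip(*cumprods)]
    return column_sums[-1]

sizes_2d = [(2, 3), (4, 3), (1, 3)]
sizes_3d = [(2, 3, 4), (1, 3, 4)]
```

For `sizes_2d` both functions agree and index `1` is the last column; for `sizes_3d` the last column is index `2`, which is why the sketch avoids a hard-coded index.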
drisspg	2a09e95169	Register nested tensor linear kernel (#80397 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80397 Approved by: https://github.com/soulitzer	2022-06-28 06:23:26 +00:00
Christian Puhrsch	2258db5da3	TensorImpl:::size_custom to support NestedTensor.size (#80236 ) This allows subclasses such as NestedTensorImpl to provide special behavior for `int64_t size(int64_t d)` that'll also be accessible by our Python frontend. It follows the same pattern as sizes_custom. Currently getting CI before asking for a review. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80236 Approved by: https://github.com/ezyang	2022-06-27 17:07:42 +00:00
Yifan Shen	09f79e94ac	support nested_tensor * scalar (#80284 ) In transformer, the scale step in attention has a `nested_tensor / scalar` operation. There are two ways to support that: 1. directly support `nested_tensor / scalar`: * pro: straightforward, good UX * con: is dispatching `mul(nested tensor, regular tensor)` a good practice? 2. let user manually convert `scalar` to `nested_scalar = torch.nested_tensor([broadcast_scalar])` * pro: dispatcher only has to deal with `mul(nested tensor, nested tensor)` * con: confusing manual conversions, bad UX Pull Request resolved: https://github.com/pytorch/pytorch/pull/80284 Approved by: https://github.com/cpuhrsch	2022-06-27 14:15:05 +00:00
Yifan Shen	fc0faa2cf6	Support nested_tensor.bmm (#80224 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80224 Approved by: https://github.com/cpuhrsch, https://github.com/jbschlosser	2022-06-25 03:19:46 +00:00
Yifan Shen	54a1cc5246	Support softmax(nested tensor) (#80179 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80179 Approved by: https://github.com/cpuhrsch, https://github.com/jbschlosser	2022-06-24 14:50:24 +00:00
Yifan Shen	f749f86fee	Add nested tensor metadata nested_stride then use it in unbind, select (#79831 ) 2 reasons to add metadata `nested_stride`: 1. it will be used later in `reshape` and `transpose` 2. it reduces the computation to get offsets and shapes necessary in `unbind`-like codes, which will be used again and again in nested tensor operations `unbind` and `select` are refactored to make use of `nested_stride` Pull Request resolved: https://github.com/pytorch/pytorch/pull/79831 Approved by: https://github.com/cpuhrsch, https://github.com/jbschlosser	2022-06-23 20:24:50 +00:00
Driss Guessous	a098937c20	Add factory function derivatives (#79872 ) Adding derivatives for factory functions, this issue is used for tracking: #79044 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79872 Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer	2022-06-21 00:53:11 +00:00
Edward Z. Yang	f7ee061638	Wconstab/reland pysymint (#79795 ) rebased https://github.com/pytorch/pytorch/pull/79617/ to see if issues are reproducible. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79795 Approved by: https://github.com/malfet	2022-06-20 22:55:06 +00:00
Yifan Shen	1b25aa6786	Support dropout(nested tensor) (#79318 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/79318 Approved by: https://github.com/jbschlosser	2022-06-17 18:41:54 +00:00
PyTorch MergeBot	8a7a5def1d	Revert "Support dropout(nested tensor) (#79318 )" This reverts commit `1211ab679c`. Reverted https://github.com/pytorch/pytorch/pull/79318 on behalf of https://github.com/janeyx99 due to Broke dropout tests on trunk, also errors on PR	2022-06-17 04:56:29 +00:00
Yifan Shen	1211ab679c	Support dropout(nested tensor) (#79318 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/79318 Approved by: https://github.com/jbschlosser	2022-06-17 00:46:07 +00:00
drisspg	f9656817df	Add nested tensor support to autograd (#79446 ) The issue that is tracking this work is: #79447 This is one in a series of PRs to add autograd support for nested tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79446 Approved by: https://github.com/soulitzer	2022-06-16 21:09:17 +00:00
PyTorch MergeBot	44436947bc	Revert "Reland PySymInt (#79617 )" This reverts commit `8ef6356f26`. Reverted https://github.com/pytorch/pytorch/pull/79617 on behalf of https://github.com/zengk95 due to this is breaking periodic jobs (and maybe pull) on trunk	2022-06-16 19:40:27 +00:00
Nikolay Korovaiko	8ef6356f26	Reland PySymInt (#79617 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/79617 Approved by: https://github.com/Chillee	2022-06-16 04:18:06 +00:00
PyTorch MergeBot	b8db0a0475	Revert "Python Bindings for SymInts (#78135 )" This reverts commit `d332724071`. Reverted https://github.com/pytorch/pytorch/pull/78135 on behalf of https://github.com/ezyang due to broke torchvision tests	2022-06-15 13:52:14 +00:00
Nikolay Korovaiko	d332724071	Python Bindings for SymInts (#78135 ) This PR adds support for `SymInt`s in python. Namely, * `THPVariable_size` now returns `sym_sizes()` * python arg parser is modified to parse PyObjects into ints and `SymbolicIntNode`s * pybind11 bindings for `SymbolicIntNode` are added, so size expressions can be traced * a large number of tests added to demonstrate how to implement python symints. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78135 Approved by: https://github.com/ezyang	2022-06-14 02:17:59 +00:00
YifanShenSZ	6ad51c9422	Support indexing of the underlying tensors for nested tensors (#78934 ) Fixes #76843 Pull Request resolved: https://github.com/pytorch/pytorch/pull/78934 Approved by: https://github.com/cpuhrsch, https://github.com/jbschlosser	2022-06-08 21:05:04 +00:00

360 Commits