pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Antoni Viros	eca0cb0fbe	Conversions between strided and jagged layouts for Nested Tensors (#115749 ) This PR does 3 things: 1. Adds a copy-free strided->jagged layout conversion for NT 2. Adds a copy-free jagged->strided layout conversion for NT 3. Modifies and expands the .to() API to support the layout argument for the specific case of NT layout conversion. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115749 Approved by: https://github.com/jbschlosser	2024-08-05 23:45:48 +00:00
Aaron Gokaslan	fd4b649e6c	[BE]: Simplify some list comps to generators C419 (#132578 ) Simplifies some list comprehensions to generator which is more efficient. Automatically applied diffs for the most part with ruff Pull Request resolved: https://github.com/pytorch/pytorch/pull/132578 Approved by: https://github.com/ezyang	2024-08-04 17:46:26 +00:00
Xuehai Pan	4226ed1585	[BE] Format uncategorized Python files with `ruff format` (#132576 ) Remove patterns `**`, `test/**`, and `torch/**` in `tools/linter/adapters/pyfmt_linter.py` and run `lintrunner`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132576 Approved by: https://github.com/ezyang, https://github.com/Skylion007 ghstack dependencies: #132574	2024-08-04 17:13:31 +00:00
Joel Schlosser	a356a03f4a	Fix DEBUG=1 asserts for mvlgamma backward with NJT (#132422 ) mvlgamma backward trips DEBUG=1 asserts when trying to construct an empty tensor with `layout=torch.jagged`. This happens due to passing `self.options()` to `arange()` in `mvlgamma_backward()`. Fix in this PR unconditionally constructs `arange()` with the strided layout. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132422 Approved by: https://github.com/albanD	2024-08-01 21:53:16 +00:00
Oguz Ulgen	221350e3a4	Add None return type to init -- tests (#132352 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132352 Approved by: https://github.com/ezyang ghstack dependencies: #132335, #132351	2024-08-01 15:44:51 +00:00
Joel Schlosser	7eb2a99585	Fix to support unary pointwise ops when an NJT is not the first arg (#131937 ) Background: NJT utilizes a `jagged_unary_pointwise()` fallback that historically has assumed blindly that the first arg is an NJT. This assumption breaks certain ops; for example `pow(scalar, Tensor)` has an NJT as the second arg. This PR expands `jagged_unary_pointwise()` and the associated schema validation logic to handle an NJT in args other than the first position. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131937 Approved by: https://github.com/soulitzer ghstack dependencies: #131898, #131704	2024-07-31 17:51:03 +00:00
Janani Sriram	46994e753b	[NestedTensor] Integrate the layer normalization operator along the jagged dimension into NestedTensor (#132172 ) Modify the existing `layer normalization` operator in PyTorch, invoked by `torch.layer_norm`, to allow for reductions along the jagged dimension of a nested tensor. The function originally had a basic implementation for reducing along 1 non-ragged dimension. This diff, which uses the `aten` padding operator, enables PyTorch users to invoke `torch.nn.functional.layer_norm` on a nested tensor when reducing along the ragged dimension, e.g. `*` in a `(B, *, M)` or `(B, *, M, N)` nested tensor. Write unit tests based on the `softmax` jagged operator to verify the accuracy of the ragged reduction implementation for `torch.nn.functional.layer_norm`. Add unit tests to verify error handling for unsupported features. Note that this implementation is limited to nested tensors with `ragged_idx == 1`, i.e. the ragged dimension is not transposed. The layer normalization operator also requires an operation on a 2-dimensional layer; for nested tensors with 4 or more dimensions, I flatten the extra dimensions, then unflatten them after performing layer normalization. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132172 Approved by: https://github.com/davidberard98 ghstack dependencies: #132170	2024-07-31 10:51:46 +00:00
Janani Sriram	89053e382a	[NestedTensor] Integrate the softmax operator along the jagged dimension into NestedTensor (#132170 ) Modify the existing `softmax` operator in PyTorch, invoked by `torch.softmax`, to allow for reductions along the jagged dimension of a nested tensor. The function originally had a basic implementation for reducing along 1 non-ragged dimension. This diff, which uses the aten padding operator, enables PyTorch users to invoke `torch.softmax` on a nested tensor when reducing along the ragged dimension, e.g. `*` in a `(B, *, M)` nested tensor. Write unit tests based on the `sum` and `mean` jagged operators to verify the accuracy of the ragged reduction implementation for `torch.softmax`. Add unit tests to verify error handling for unsupported features in `NestedTensor` `torch.softmax`. Note that this implementation is limited to nested tensors with `ragged_idx == 1`, i.e. the ragged dimension is not transposed. In addition, the `softmax` operator is required to take in as input an integer for the reduction dimension `dim`, requiring new unit tests heavily inspired by the `sum` and `mean` jagged operator unit tests. `Softmax` also allows for reducing along the batch dimension. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132170 Approved by: https://github.com/davidberard98	2024-07-31 10:51:46 +00:00
Joel Schlosser	524aac413c	Initial OpInfo-based testing for NJTs (#131704 ) This PR utilizes the info from the existing OpInfo database `op_db` to contribute to general NJT testing. * New tests in `TestNestedTensorOpInfo` * `test_forward()` - compares forward output to an unbind-based reference * `test_backward()` - compares forward output and grads to an unbind-based reference * `test_forward_compile()` - compares forward compile output (`backend="aot_eager_decomp_partition"`) to eager * `test_backward_compile()` - compares forward compile output (`backend="aot_eager_decomp_partition"`) and grads to eager * To avoid adding a bunch of NJT-specific stuff to the `OpInfo` structure, this PR translates `op_db` -> a NJT-specific `njt_op_db`. * `UnaryUfuncInfo`s utilize a new `sample_inputs_unary_njt_pointwise()` which iterates through a comprehensive list of NJTs: contiguous / non-contiguous, dims 2, 3, and 4, transposed / not, etc. * `BinaryUfuncInfo`s utilize a new `sample_inputs_binary_njt_pointwise()` which iterates through a comprehensive list of NJTs: contiguous / non-contiguous, dims 2, 3, and 4, transposed / not, etc. * `ReductionOpInfo`s utilize a new `sample_inputs_njt_reduction()` which covers full reductions, reductions over the jagged dim, and reductions over the non-jagged dim * Several xfails were added to get things passing TODO (future PRs): * Pass non-contiguous / non-contiguous with holes NJTs (maybe we should have separate tests for these? most ops don't support NJTs with holes today) * Mixed (NT, T), (T, NT) inputs for binary ops * Handle other types of OpInfos (beyond unary pointwise, binary pointwise, and reduction) by manually by writing sample_inputs_funcs * Address all xfails via fixes Pull Request resolved: https://github.com/pytorch/pytorch/pull/131704 Approved by: https://github.com/soulitzer ghstack dependencies: #131898	2024-07-30 23:02:24 +00:00
Joel Schlosser	d53b11bb6e	Strict shape checking for NJTs with TestCase.assertEqual() (#131898 ) Background: `TestCase.assertEqual()` is commonly used during test case validation. Historically, to support NSTs, the logic was written to compare two nested tensors by unbinding them and comparing their components. This logic applied to NJTs as well, which in practice meant that two NJTs with different nested ints in their shapes could compare equal if their components were equal. This PR changes the above logic so that NJTs are no longer unbound during comparison, allowing them to receive full shape validation. This makes `TestCase.assertEqual()` stricter for NJTs, requiring them to have the same nested ints in their shapes to compare equal. Note that some tests rely on the old, looser behavior. To address this, the PR introduces a base `NestedTensorTestCase` that defines a helper function `assertEqualIgnoringNestedInts()` so that these tests can explicitly opt in to the looser comparison behavior. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131898 Approved by: https://github.com/soulitzer	2024-07-30 20:05:48 +00:00
Animesh Jain	f806128619	[dynamo] Skip <frozen abc> to skip __isisintance__ check on abc objects (#131956 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131956 Approved by: https://github.com/williamwen42, https://github.com/mlazos ghstack dependencies: #131827	2024-07-30 05:49:58 +00:00
PyTorch MergeBot	7a7dd8c29e	Revert "[NestedTensor] Integrate the softmax operator along the jagged dimension into NestedTensor (#131518 )" This reverts commit `bcf5c68c18`. Reverted https://github.com/pytorch/pytorch/pull/131518 on behalf of https://github.com/ZainRizvi due to Sorry, reverting this since this is based on an internal diff that has diverged from actual internal commit (the final PR and diff must always be identical). Conflicts arise when that happens which block the diff train. Let's revert both this PR and the internal diff, and then reland them as a proper new codev diff ([comment](https://github.com/pytorch/pytorch/pull/131518#issuecomment-2257259839))	2024-07-30 00:55:10 +00:00
PyTorch MergeBot	be5e44192d	Revert "[NestedTensor] Integrate the layer normalization operator along the jagged dimension into NestedTensor (#131519 )" This reverts commit `8fe2bf212d`. Reverted https://github.com/pytorch/pytorch/pull/131519 on behalf of https://github.com/ZainRizvi due to Sorry, reverting this since this is based on an internal diff that has diverged from actual internal commit. Weird conflicts arise when that happens. Let's revert both this PR and the internal diff, and then reland them as a proper new codev diff ([comment](https://github.com/pytorch/pytorch/pull/131519#issuecomment-2257230717))	2024-07-30 00:18:22 +00:00
yuqingj	e3dc20c94b	[NJT] support cat backward (#132076 ) cat_tensors_backward use narrow_symint, so we need to support aten::narrow for NJT. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132076 Approved by: https://github.com/davidberard98	2024-07-29 23:49:26 +00:00
Janani Sriram	8fe2bf212d	[NestedTensor] Integrate the layer normalization operator along the jagged dimension into NestedTensor (#131519 ) Modify the existing `layer normalization` operator in PyTorch, invoked by `torch.layer_norm`, to allow for reductions along the jagged dimension of a nested tensor. The function originally had a basic implementation for reducing along 1 non-ragged dimension. This diff, which uses the `aten` padding operator, enables PyTorch users to invoke `torch.nn.functional.layer_norm` on a nested tensor when reducing along the ragged dimension, e.g. `*` in a `(B, *, M)` or `(B, *, M, N)` nested tensor. Write unit tests based on the `softmax` jagged operator to verify the accuracy of the ragged reduction implementation for `torch.nn.functional.layer_norm`. Add unit tests to verify error handling for unsupported features. Note that this implementation is limited to nested tensors with `ragged_idx == 1`, i.e. the ragged dimension is not transposed. The layer normalization operator also requires an operation on a 2-dimensional layer; for nested tensors with 4 or more dimensions, I flatten the extra dimensions, then unflatten them after performing layer normalization. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131519 Approved by: https://github.com/davidberard98 ghstack dependencies: #131518	2024-07-29 22:16:32 +00:00
PyTorch MergeBot	8cdfdb41bc	Revert "[NestedTensor] Integrate the layer normalization operator along the jagged dimension into NestedTensor (#131519 )" This reverts commit `f862f45730`. Reverted https://github.com/pytorch/pytorch/pull/131519 on behalf of https://github.com/atalman due to broke CI: test_nestedtensor.py::TestNestedTensorSubclassCPU::test_layer_norm_with_lengths_requires_grad_False_components_require_grad_False_cpu_float32 [GH job link](https://github.com/pytorch/pytorch/actions/runs/10121747545/job/27996722731) [HUD commit link](`f862f45730`) ([comment](https://github.com/pytorch/pytorch/pull/131519#issuecomment-2254167994))	2024-07-27 14:45:47 +00:00
Janani Sriram	f862f45730	[NestedTensor] Integrate the layer normalization operator along the jagged dimension into NestedTensor (#131519 ) Modify the existing `layer normalization` operator in PyTorch, invoked by `torch.layer_norm`, to allow for reductions along the jagged dimension of a nested tensor. The function originally had a basic implementation for reducing along 1 non-ragged dimension. This diff, which uses the `aten` padding operator, enables PyTorch users to invoke `torch.nn.functional.layer_norm` on a nested tensor when reducing along the ragged dimension, e.g. `*` in a `(B, *, M)` or `(B, *, M, N)` nested tensor. Write unit tests based on the `softmax` jagged operator to verify the accuracy of the ragged reduction implementation for `torch.nn.functional.layer_norm`. Add unit tests to verify error handling for unsupported features. Note that this implementation is limited to nested tensors with `ragged_idx == 1`, i.e. the ragged dimension is not transposed. The layer normalization operator also requires an operation on a 2-dimensional layer; for nested tensors with 4 or more dimensions, I flatten the extra dimensions, then unflatten them after performing layer normalization. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131519 Approved by: https://github.com/davidberard98 ghstack dependencies: #131518	2024-07-27 07:09:10 +00:00
Janani Sriram	bcf5c68c18	[NestedTensor] Integrate the softmax operator along the jagged dimension into NestedTensor (#131518 ) Modify the existing `softmax` operator in PyTorch, invoked by `torch.softmax`, to allow for reductions along the jagged dimension of a nested tensor. The function originally had a basic implementation for reducing along 1 non-ragged dimension. This diff, which uses the aten padding operator, enables PyTorch users to invoke `torch.softmax` on a nested tensor when reducing along the ragged dimension, e.g. `*` in a `(B, *, M)` nested tensor. Write unit tests based on the `sum` and `mean` jagged operators to verify the accuracy of the ragged reduction implementation for `torch.softmax`. Add unit tests to verify error handling for unsupported features in `NestedTensor` `torch.softmax`. Note that this implementation is limited to nested tensors with `ragged_idx == 1`, i.e. the ragged dimension is not transposed. In addition, the `softmax` operator is required to take in as input an integer for the reduction dimension `dim`, requiring new unit tests heavily inspired by the `sum` and `mean` jagged operator unit tests. `Softmax` also allows for reducing along the batch dimension. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131518 Approved by: https://github.com/davidberard98	2024-07-27 07:09:10 +00:00
Janani Sriram	13e806a591	[NestedTensor] Add support for transposed NestedTensors where ragged_idx > 1 for sum and mean operators (#131517 ) Add support for transposed, non-contiguous `NestedTensor`s, where `ragged_idx > 1`, for the aten operators `sum` and `mean`. This diff enables reducing along the jagged dimension for non-contiguous `NestedTensor`s, transposed between non-batch dimensions as well as between a ragged and a non-batch dimension. For example, users can now reduce a `NestedTensor` of shape `(B, M, *, N)` along `*` or `(B, N, M, *)` along `*`. Parametrize existing unit tests and add new unit tests verifying the accuracy of implementations on `NestedTensor`s that transpose between 2 non-batch dimensions as well as between a ragged and a non-batch dimension. Differential Revision: [D59847927](https://our.internmc.facebook.com/intern/diff/D59847927/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131517 Approved by: https://github.com/davidberard98	2024-07-26 07:21:32 +00:00
Janani Sriram	e782918b8e	[NestedTensor] Add example NestedTensor objects with inner dimension of size 1 to tests reducing along jagged dimension for NestedTensor (#131516 ) Add example `NestedTensor`s with inner dimension of size `1` to `_get_example_tensor_lists` with `include_inner_dim_size_1=True`. This diff creates `NestedTensor`s of sizes `(B, *, 1)` and `(B, *, 5, 1)`, ensuring that the current implementations of jagged reductions for `sum` and `mean` hold for tensors of effective shape `(B, *)` and `(B, *, 5)`. Differential Revision: [D59846023](https://our.internmc.facebook.com/intern/diff/D59846023/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131516 Approved by: https://github.com/davidberard98	2024-07-24 07:01:39 +00:00
jananisriram	faddb0f30c	[NestedTensor] Integrate the mean operator along the jagged dimension into NestedTensor (#131132 ) Summary: Modify the existing `mean` operator in PyTorch, invoked by `torch.mean`, to allow for reductions along the jagged dimension of a nested tensor. The function originally had a basic implementation for reducing along 1 non-ragged dimension. This diff enables PyTorch users to invoke `torch.mean` on a nested tensor when reducing along the ragged dimension, e.g. `*` in a `(B, *, M)` nested tensor. Parametrize unit tests from `sum` to verify the accuracy of the ragged reduction implementation for `torch.mean`. Add unit tests and parametrize `sum` unit tests to verify error handling for unsupported features in `NestedTensor` `torch.mean`. Test Plan: Verify that the new unit test passes via the following command: ``` buck2 run mode/{opt,inplace} //caffe2/test:nested -- --regex test_mean ``` ``` buck2 run mode/{opt,inplace} //caffe2/test:nested -- --regex test_jagged_op ``` Differential Revision: D59654668 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131132 Approved by: https://github.com/davidberard98, https://github.com/jbschlosser	2024-07-23 18:48:34 +00:00
jananisriram	28a74b9fa4	[NestedTensor] Integrate sum along the jagged dimension into NestedTensor (#130425 ) Summary: Modify the existing `sum` operator in PyTorch, invoked by `torch.sum`, to allow for reductions along the ragged dimension of a nested tensor. This diff enables PyTorch users to invoke `torch.sum` on a nested tensor with `dim=1`, where `ragged_idx=1`. Functions modified in `caffe2/torch/nested/_internal/ops.py`: - `sum_dim_IntList()`: The function assumes that `ragged_idx=1`; in the case that `dim=1` as well, where `dim` is the dimension on which we reduce, this diff invokes the PyTorch benchmark found in D58423489. Specifically, this diff pads a nested tensor, e.g. of logical shape `(B, *, M)`, using [`torch.ops.aten._jagged_to_padded_dense_forward`](https://www.internalfb.com/code/fbsource/[92c2a067ab04e3eebc999254fed4ae2fbea6def3]/fbcode/deeplearning/fbgemm/fbgemm_gpu/fb/inductor_lowerings/elementwise_ops.py?lines=26), then reduces across the `*` dimension (`dim == 1`) to a `(B, M)` output tensor. - `_wrap_jagged_dims()`: This diff adds special handling to allow for the case where `dim` contains `1` and not `0`, but to continue disallowing the case where `dim` contains `0` and not `1`. In this function's creation, I created a helper function, `_get_condition_for_invalid_jagged_reductions()`, which makes it clearer which conditions apply to which operators. Specifically, operators which are enabled with jagged reductions are specified at the top of the file in `SUPPORTED_JAGGED_REDUCTIONS` and have a different set of conditions that need to be tested, as reducing along `dim == 1` without `dim == 0` is now possible. Functions modified in `caffe2/test/test_nestedtensor.py`: - `test_sum_int_DimList()`: This diff adds special handling in the `sum` unit test to allow for the case where `dim` contains `1` and not `0`, but to continue disallowing the case where `dim` contains `0` and not `1`. 
- `test_sum_int_DimList_ragged_dim_1()`: This diff adds a new unit test which verifies the accuracy and feasibility of reducing along the jagged dimension of a nested tensor. Notes: - This diff solely adds functionality for the case in which we reduce only along the ragged dimension. Cases in which we reduce along both the ragged and another dimension, like `dim == (1, 2)`, are not permitted, as this set of diffs focuses primarily on the former. - The `sum` operator is the only operator which uses the function `_wrap_jagged_dims()`; all other operators use `_wrap_jagged_dim()`. I would like to later look into why this is the case and if we can consolidate this! - I modified some of the comments in the `sum` function as well as the unit tests for more clarity. Test Plan: Verify that existing (`test_sum_int_DimList`) and new (`test_sum_int_DimList_ragged_dim_1`) unit tests pass via the following command: ``` buck2 run mode/{opt,inplace} //caffe2/test:nested -- --regex test_sum_int_DimList ``` Differential Revision: D59571209 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130425 Approved by: https://github.com/davidberard98	2024-07-18 10:48:18 +00:00
Xuehai Pan	ba48cf6535	[BE][Easy][6/19] enforce style for empty lines in import segments in `test/` (#129757 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129757 Approved by: https://github.com/ezyang	2024-07-17 06:42:37 +00:00
David Berard	d548417d95	[NJT] throw an exception if nested_tensor_from_jagged is fx-traced without being fx.wrapped (#130702 ) The NJT constructor can't be fx-traced safely due to the dummy nt used: `774ca93fd2/torch/nested/_internal/nested_tensor.py (L501-L508)` The error doesn't appear immediately, but appears if you try to move a module with an fx-traced NJT constructor onto a different device, or try to serialize it. Let's throw an error if we try to fx-trace the NJT constructor so users know to wrap the call. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130702 Approved by: https://github.com/jbschlosser, https://github.com/soulitzer	2024-07-16 19:21:10 +00:00
Joel Schlosser	09b1b113f5	Cache min / max seq len for torch.nested.as_nested_tensor(t) (#130766 ) For the `torch.nested.as_nested_tensor(t)` constructor, computing min / max seq len is trivial since the sequence lengths are all the same. Might as well cache them during construction. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130766 Approved by: https://github.com/YuqingJ, https://github.com/soulitzer	2024-07-16 18:32:47 +00:00
yuqingj	ea4f310ff1	[Nested Tensor][easy] Add softmax backward support (#130602 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/130602 Approved by: https://github.com/davidberard98, https://github.com/jbschlosser	2024-07-16 00:07:42 +00:00
yuqingj	0e79e1f958	[NJT+SDPA]Fix flash_attention output when batch_size=1 and seq_len=1 (#130652 ) fix issue #130196 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130652 Approved by: https://github.com/Skylion007, https://github.com/drisspg, https://github.com/jbschlosser	2024-07-15 19:44:04 +00:00
Xuehai Pan	973037be6a	[BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): `list()` / `tuple()` / `dict()` (#130199 ) This PR changes the empty collection factory call to Python literals: - `list()` -> `[]` - `tuple()` -> `()` - `dict()` -> `{}` The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary: ```bash $ python3 -m dis - <<EOS import collections d1 = {} d2 = dict() dict = collections.OrderedDict d3 = dict() EOS ``` ```text 0 0 RESUME 0 1 2 LOAD_CONST 0 (0) 4 LOAD_CONST 1 (None) 6 IMPORT_NAME 0 (collections) 8 STORE_NAME 0 (collections) 3 10 BUILD_MAP 0 12 STORE_NAME 1 (d1) 4 14 PUSH_NULL 16 LOAD_NAME 2 (dict) 18 CALL 0 26 STORE_NAME 3 (d2) 6 28 LOAD_NAME 0 (collections) 30 LOAD_ATTR 8 (OrderedDict) 50 STORE_NAME 2 (dict) 7 52 PUSH_NULL 54 LOAD_NAME 2 (dict) 56 CALL 0 64 STORE_NAME 5 (d3) 66 RETURN_CONST 1 (None) ``` The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above). Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199 Approved by: https://github.com/malfet	2024-07-11 17:30:28 +00:00
Joel Schlosser	00335a27b4	Accept min / max sequence length in nested_tensor_from_jagged() constructor (#130175 ) This PR updates the public API for NJT construction `torch.nested.nested_tensor_from_jagged()` to accept values for min / max sequence length. It's useful to provide these ahead of time to avoid GPU -> CPU syncs from on-demand computation later on. NB: The test changes are extensive because I reworked the existing `_validate_nt()` helper function used throughout our NJT construction tests to verify more (specifically: expected cached min / max seq len and contiguity). API design question: should we additionally provide an option to compute these from `offsets` at construction time? I can think of three possible cases during construction: 1. Min / max seq len has already been obtained from somewhere (manual calculation, static values, etc.) and they should be used in the cache 2. Min / max seq len should be computed immediately at construction time for use in the cache (ideally, the caller wouldn't have to do this computation manually) 3. Min / max seq len are not needed at all (i.e. SDPA isn't ever called) and computation should be skipped Pull Request resolved: https://github.com/pytorch/pytorch/pull/130175 Approved by: https://github.com/davidberard98, https://github.com/soulitzer	2024-07-08 22:14:52 +00:00
Joel Schlosser	7192ee0735	Default to input tensor device for as_nested_tensor(t) (#130050 ) Fixes #129647 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130050 Approved by: https://github.com/YuqingJ	2024-07-05 17:50:08 +00:00
soulitzer	eeef68671d	[autograd] Do not detach when unpacking tensors that do not require grad (#127959 ) In this PR: - Ensure that if a tensor not requiring grad is saved for backward unpacking does not trigger a detach (unless the user installs a saved tensor pack hook that returns a tensor requiring grad). - Update non-reentrant checkpoint to also no longer detach for this case. Alternatives: - For custom autograd Function, you could directly save on ctx to work around this, but that would not work for when we switch to using custom ops. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127959 Approved by: https://github.com/YuqingJ ghstack dependencies: #125795, #128545, #129262	2024-07-01 21:57:36 +00:00
PyTorch MergeBot	fa6c0fe3e4	Revert "Conversions between strided and jagged layouts for Nested Tensors (#115749 )" This reverts commit `9450e198aa`. Reverted https://github.com/pytorch/pytorch/pull/115749 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/115749#issuecomment-2197790226))	2024-06-29 00:16:47 +00:00
Antoni Viros	9450e198aa	Conversions between strided and jagged layouts for Nested Tensors (#115749 ) This PR does 3 things: 1. Adds a copy-free strided->jagged layout conversion for NT 2. Adds a copy-free jagged->strided layout conversion for NT 3. Modifies and expands the .to() API to support the layout argument for the specific case of NT layout conversion. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115749 Approved by: https://github.com/jbschlosser	2024-06-27 03:41:28 +00:00
Joel Schlosser	e364290718	Support linear backward for NJT with dim > 3 (#129393 ) Replaces usage of `torch.mm()` with `torch.matmul()` in NJT's impl of linear_backward to support higher dims. See [here](https://github.com/pytorch/pytorch/issues/125214#issuecomment-2184968703) for more context. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129393 Approved by: https://github.com/soulitzer	2024-06-25 16:06:23 +00:00
yuqingj	00f675bb4c	[Nested Tensor]fix sdpa backward for the special case with ragged second batch dim and constant length (#128349 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/128349 Approved by: https://github.com/jbschlosser	2024-06-24 22:35:07 +00:00
Joel Schlosser	e1c1052829	Backward support for unbind() with NJT (#128032 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/128032 Approved by: https://github.com/soulitzer	2024-06-21 14:05:23 +00:00
Joel Schlosser	31d5753247	Short-term fix to preserve NJT metadata cache in torch.compile (#122836 ) Idea: close over min / max sequence length in the main NJT view func (`_nested_view_from_jagged`) so that view replay during fake-ification propagates these correctly in torch.compile. For dynamic shapes support for min / max sequence length, this PR uses a hack that stores the values in `(val, 0)` shaped tensors. NB: This PR changes SDPA to operate on real views instead of using `buffer_from_jagged()` / `ViewNestedFromBuffer`, which may impact the internal FIRST model. That is, it undoes the partial revert from #123215 alongside a fix to the problem that required the partial revert. We need to verify that there are no regressions there before landing. Differential Revision: [D55448636](https://our.internmc.facebook.com/intern/diff/D55448636) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122836 Approved by: https://github.com/soulitzer	2024-06-20 23:15:53 +00:00
PyTorch MergeBot	b0d2fe6299	Revert "Short-term fix to preserve NJT metadata cache in torch.compile (#122836 )" This reverts commit `2a41fc0390`. Reverted https://github.com/pytorch/pytorch/pull/122836 on behalf of https://github.com/jbschlosser due to internal test failures with DEBUG=1 asserts ([comment](https://github.com/pytorch/pytorch/pull/122836#issuecomment-2177298245))	2024-06-19 00:28:53 +00:00
PyTorch MergeBot	5ffb032be6	Revert "Backward support for unbind() with NJT (#128032 )" This reverts commit `5dc4f652bc`. Reverted https://github.com/pytorch/pytorch/pull/128032 on behalf of https://github.com/jbschlosser due to reverting to revert parent PR ([comment](https://github.com/pytorch/pytorch/pull/128032#issuecomment-2177296325))	2024-06-19 00:26:40 +00:00
PyTorch MergeBot	99f042d336	Revert "Forward fix to skip ROCm tests for #122836 (#128891 )" This reverts commit `4061b3b822`. Reverted https://github.com/pytorch/pytorch/pull/128891 on behalf of https://github.com/jbschlosser due to reverting to revert parent PR ([comment](https://github.com/pytorch/pytorch/pull/128891#issuecomment-2177291249))	2024-06-19 00:21:21 +00:00
Joel Schlosser	5dc4f652bc	Backward support for unbind() with NJT (#128032 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/128032 Approved by: https://github.com/soulitzer	2024-06-18 20:29:00 +00:00
Joel Schlosser	4061b3b822	Forward fix to skip ROCm tests for #122836 (#128891 ) Fixes broken ROCm tests from #122836. Pull Request resolved: https://github.com/pytorch/pytorch/pull/128891 Approved by: https://github.com/huydhn ghstack dependencies: #127007, #128057, #122836	2024-06-18 03:01:19 +00:00
Joel Schlosser	2a41fc0390	Short-term fix to preserve NJT metadata cache in torch.compile (#122836 ) Idea: close over min / max sequence length in the main NJT view func (`_nested_view_from_jagged`) so that view replay during fake-ification propagates these correctly in torch.compile. For dynamic shapes support for min / max sequence length, this PR uses a hack that stores the values in `(val, 0)` shaped tensors. NB: This PR changes SDPA to operate on real views instead of using `buffer_from_jagged()` / `ViewNestedFromBuffer`, which may impact the internal FIRST model. That is, it undoes the partial revert from #123215 alongside a fix to the problem that required the partial revert. We need to verify that there are no regressions there before landing. Differential Revision: [D55448636](https://our.internmc.facebook.com/intern/diff/D55448636) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122836 Approved by: https://github.com/soulitzer ghstack dependencies: #127007, #128057	2024-06-17 15:25:09 +00:00
Joel Schlosser	9a8917fdbd	Naive CPU kernels for jagged <-> padded dense conversions (#127007 ) This PR introduces naive CPU impls for: * `_jagged_to_padded_dense_forward()` * `_padded_dense_to_jagged_forward()` On the CUDA side, these are backed by lifted FBGEMM kernels. We may want to revisit the CPU versions with higher-performance implementations at a later time. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127007 Approved by: https://github.com/davidberard98	2024-06-13 17:13:02 +00:00
Huy Do	6206da55ef	Fix lint after #119459 (#128558 ) TSIA Pull Request resolved: https://github.com/pytorch/pytorch/pull/128558 Approved by: https://github.com/atalman, https://github.com/kit1980, https://github.com/malfet	2024-06-12 22:11:37 +00:00
Joel Schlosser	67e6c76a18	Support apply_(callable) sugar for CPU NJTs (#125416 ) Example: ```python nt.apply_(lambda x: x * 2) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/125416 Approved by: https://github.com/soulitzer	2024-06-12 20:30:57 +00:00
Joel Schlosser	ec1fdda196	Fix jagged NT softmax semantics (#119459 ) Before: `softmax` definition uses `jagged_unary_pointwise()` (wrong) After: `softmax` impl adjusts the `dim` arg to account for the difference in dimensionality between the outer NT and the NT's `_values` Pull Request resolved: https://github.com/pytorch/pytorch/pull/119459 Approved by: https://github.com/soulitzer	2024-06-12 19:12:03 +00:00
yuqingj	3e09123797	Enable UFMT on test_nestedtensor.py (#128359 ) split it into two PRs since it is more than 2k lines of change Pull Request resolved: https://github.com/pytorch/pytorch/pull/128359 Approved by: https://github.com/davidberard98	2024-06-11 19:14:04 +00:00
Janani Sriram	c1a43a69e4	[NestedTensor] Add error checks for unbind operator coverage when ragged_idx != 1 (#128058 ) Summary: Add the following error checks for the `unbind` operator on `NestedTensor`s when `ragged_idx != 1`: - The current implementation allows the creation of `NestedTensor` instances from the class definition with an `offsets` tensor that applies to a dimension other than the jagged dimension. This diff ensures that `unbind` fails when the `offsets` exceed the length of the jagged dimension. Test Plan: Added the following unit tests: `test_unbind_with_lengths_ragged_idx_equals_2_bad_dim_cpu` verifies that `unbind` fails when there is a mismatch between the offsets and the jagged dimension, for `NestedTensor`s with `lengths`. ``` test_unbind_with_lengths_ragged_idx_equals_2_bad_dim_cpu (test_nestedtensor.TestNestedTensorSubclassCPU) ... ok ``` Reviewed By: davidberard98 Differential Revision: D57989082 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128058 Approved by: https://github.com/davidberard98	2024-06-06 01:56:12 +00:00
Joel Schlosser	b42cfcabc4	Lift jagged -> padded dense forward / backward kernels from fbgemm_gpu (#125946 ) PyTorch can't depend on `fbgemm_gpu` as a dependency because `fbgemm_gpu` already has a dependency on PyTorch. So this PR copy / pastes kernels from `fbgemm_gpu`: * `dense_to_jagged_forward()` as CUDA registration for new ATen op `_padded_dense_to_jagged_forward()` * `jagged_to_padded_dense_forward()` as CUDA registration for new ATen op `_jagged_to_padded_dense_forward()` CPU impls for these new ATen ops will be added in a follow-up PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125946 Approved by: https://github.com/davidberard98	2024-06-03 23:41:54 +00:00
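
Note on the jagged <-> padded dense conversions referenced in #125946 and #127007: the following is a minimal pure-Python sketch of the idea only, not the ATen or FBGEMM kernels. The function names `jagged_to_padded_dense` and `padded_dense_to_jagged`, their parameters, and the list-based representation are hypothetical and chosen for illustration. It assumes the jagged batch is stored as a flat `values` sequence plus an `offsets` sequence of row boundaries.

```python
from typing import List, Sequence


def jagged_to_padded_dense(
    values: Sequence[float],
    offsets: Sequence[int],
    max_length: int,
    padding_value: float = 0.0,
) -> List[List[float]]:
    """Expand a flat jagged batch into rows of equal length `max_length`.

    Row i holds values[offsets[i]:offsets[i + 1]]; shorter rows are padded,
    longer rows are truncated (an assumption of this sketch).
    """
    padded = []
    for start, end in zip(offsets[:-1], offsets[1:]):
        row = list(values[start:end])[:max_length]
        row.extend([padding_value] * (max_length - len(row)))
        padded.append(row)
    return padded


def padded_dense_to_jagged(
    padded: Sequence[Sequence[float]],
    offsets: Sequence[int],
) -> List[float]:
    """Drop the padding again, keeping only the first (end - start) items per row."""
    values: List[float] = []
    for row, start, end in zip(padded, offsets[:-1], offsets[1:]):
        values.extend(row[: end - start])
    return values


# Three rows of lengths 2, 1 and 3 packed into one flat buffer.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
offsets = [0, 2, 3, 6]

padded = jagged_to_padded_dense(values, offsets, max_length=3)
restored = padded_dense_to_jagged(padded, offsets)
```

With `max_length` at least as large as the longest row, the round trip returns the original flat values; a smaller `max_length` truncates rows, so the round trip is lossy in that case.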

247 Commits