Commit Graph

66 Commits

Author SHA1 Message Date
Mikayla Gawarecki
2a75152537 [easy] Add nested tanh (#86826)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86826
Approved by: https://github.com/cpuhrsch
2022-10-13 00:48:08 +00:00
Christian Puhrsch
ef58a132f2 Use CUTLASS GEMM for NT bmm [OSS-only] (#85894)
OSS-only copy of https://github.com/pytorch/pytorch/pull/85710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85894
Approved by: https://github.com/drisspg
2022-10-12 20:03:28 +00:00
Driss Guessous
16f65f178a Nested tensor forward only chunk operations (#85645)
# Summary

Taking over this PR: https://github.com/pytorch/pytorch/pull/83736

Adds support for `chunk` on nested tensors (forward only; no autograd support).
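
A minimal forward-only usage sketch (not from the PR; shapes are illustrative, and it assumes chunking along the last, regular dimension, e.g. to split fused q/k/v projections):
```
import torch

# Hedged sketch: forward-only chunk on a nested tensor along the last dim.
nt = torch.nested.nested_tensor([torch.randn(2, 6), torch.randn(3, 6)])
q, k, v = nt.chunk(3, dim=-1)  # three nested tensors, each with last dim of size 2
```
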
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85645
Approved by: https://github.com/cpuhrsch
2022-10-11 01:21:39 +00:00
Antoni Viros i Martin
cdbffa7f66 🦊 [AI Accelerators] Consolidate native_layer_norm for nested tensor (#86295)
Summary: In order to make the layer normalization implementation for nested tensors public, it needs to be generalized to accept a normalized_shape argument instead of assuming it to be the last dimension of the nested_tensor. This commit does that, as well as adding extra unit tests to ensure the implementation is correct.
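
A hedged sketch of how the generalized call could look (not from the PR; it assumes dispatch via `F.layer_norm`, and shapes are illustrative):
```
import torch
import torch.nn.functional as F

# Layer norm over the last dimension of each constituent tensor; the
# normalized_shape argument is now passed explicitly rather than assumed.
nt = torch.nested.nested_tensor([torch.randn(3, 8), torch.randn(5, 8)])
out = F.layer_norm(nt, normalized_shape=(8,))
```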

Test Plan:
All unit tests designed to test different ways of using the function work:

`buck test //caffe2/test:nested -- test_layer_norm`

Differential Revision: D40105207

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86295
Approved by: https://github.com/drisspg
2022-10-06 13:10:25 +00:00
Mikayla Gawarecki
01add6e288 Allow only one -1 in nested view/reshape (#85691)
### This effectively means we only allow reshaping/viewing of a nested tensor with ONE ragged dimension

Behavior before this PR:

1. `-1` allowed for implicit batch dimension
2. multiple `-1`s allowed for pre-existing dimensions
3. for new dimensions, `-1` is not allowed

It is worth noting that case 3 is mostly unreachable: assuming a nested tensor has at least one ragged dimension, you would expect at least one `-1` in the proposed shape for the pre-existing dimensions.

Behavior after this PR:
1. batch dimension **must be specified**
2. **only one** `-1` allowed for pre-existing dimensions. **This effectively means we only allow reshaping/viewing of a nested tensor with ONE ragged dimension.**
3. unchanged
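
A sketch of the post-PR rules (not from the PR; constructor and shapes are illustrative):
```
import torch

# Nested tensor with logical shape (2, ragged, 5): one ragged dimension.
nt = torch.nested.nested_tensor([torch.randn(2, 5), torch.randn(3, 5)])

nt.reshape(2, -1, 5)    # OK: batch dim specified, a single -1 for the ragged dim
# nt.reshape(-1, -1, 5) # error: the batch dimension must be specified
# nt.reshape(2, -1, -1) # error: only one -1 is allowed
```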

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85691
Approved by: https://github.com/cpuhrsch
2022-09-28 22:29:40 +00:00
Mikayla Gawarecki
afaee00fec Add python nested_tensor and as_nested_tensor constructors in torch.nested (#85593)
Remove `torch.nested_tensor`, which has erroneous behavior w.r.t. gradients (the result could be either a leaf or not a leaf). Introduce `torch.nested.nested_tensor` and `torch.nested.as_nested_tensor` in the vein of `torch.tensor` and `torch.as_tensor`. Done in the nested `__init__.py` for now, but this can move to pybind in the future (when we want to load from numpy/nested lists).

Discussed offline with @cpuhrsch: the pybind constructor (https://github.com/pytorch/pytorch/pull/85536) was more gnarly than expected, so we can move to that when we do need loading from numpy etc.
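
A hedged sketch of the intended split between the two constructors (not from the PR; values are illustrative):
```
import torch

a, b = torch.randn(2, 5), torch.randn(3, 5)

# torch.nested.nested_tensor copies its inputs, analogous to torch.tensor;
# the result is a leaf that can require grad itself.
nt = torch.nested.nested_tensor([a, b], requires_grad=True)

# torch.nested.as_nested_tensor preserves autograd history, analogous to
# torch.as_tensor: gradients flow back to a and b.
a.requires_grad_(); b.requires_grad_()
nt2 = torch.nested.as_nested_tensor([a, b])
```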

Differential Revision: [D39806622](https://our.internmc.facebook.com/intern/diff/D39806622)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85593
Approved by: https://github.com/drisspg, https://github.com/cpuhrsch
2022-09-28 20:15:02 +00:00
PyTorch MergeBot
fc8ba3a92d Revert "Allow only one -1 in nested view/reshape (#85691)"
This reverts commit 4c4e5f6106.

Reverted https://github.com/pytorch/pytorch/pull/85691 on behalf of https://github.com/atalman due to causing a GitHub-first merge conflict
2022-09-28 17:22:53 +00:00
Mikayla Gawarecki
4c4e5f6106 Allow only one -1 in nested view/reshape (#85691)
### This effectively means we only allow reshaping/viewing of a nested tensor with ONE ragged dimension

Behavior before this PR:

1. `-1` allowed for implicit batch dimension
2. multiple `-1`s allowed for pre-existing dimensions
3. for new dimensions, `-1` is not allowed

It is worth noting that case 3 is mostly unreachable: assuming a nested tensor has at least one ragged dimension, you would expect at least one `-1` in the proposed shape for the pre-existing dimensions.

Behavior after this PR:
1. batch dimension **must be specified**
2. **only one** `-1` allowed for pre-existing dimensions. **This effectively means we only allow reshaping/viewing of a nested tensor with ONE ragged dimension.**
3. unchanged

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85691
Approved by: https://github.com/cpuhrsch
2022-09-27 17:16:54 +00:00
Mikayla Gawarecki
5e700803c2 Use fallback approach for nested matmul (#85311)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85311
Approved by: https://github.com/cpuhrsch, https://github.com/drisspg
2022-09-22 21:19:09 +00:00
PyTorch MergeBot
caa0ab557d Revert "Use fallback approach for nested matmul (#85311)"
This reverts commit 7c31f6e672.

Reverted https://github.com/pytorch/pytorch/pull/85311 on behalf of https://github.com/clee2000 due to breaking lots of builds (7c31f6e672) even though the PR was green
2022-09-21 22:55:40 +00:00
Mikayla Gawarecki
7c31f6e672 Use fallback approach for nested matmul (#85311)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85311
Approved by: https://github.com/cpuhrsch, https://github.com/drisspg
2022-09-21 22:39:52 +00:00
Mikayla Gawarecki
77f1f98479 Re-introduce torch.Tensor.to_padded_tensor (#85293)
Differential Revision: [D39629004](https://our.internmc.facebook.com/intern/diff/D39629004)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85293
Approved by: https://github.com/cpuhrsch
2022-09-21 18:45:56 +00:00
drisspg
bda8a5729b [Nested Tensor] Create differentiable nt to tensor view functions (#83371)
This PR attempts to implement 2), "the safe way", of creating a view of a nested tensor that returns a regular tensor. The rest of the breakdown is here: https://fb.quip.com/J8QCAx41af11

https://gist.github.com/drisspg/8622e9c97d374fa920ac647e1167cabc
This is a short list of some edge cases. After some more work I was able to address two of the test cases in the above gist. There are a few complex aspects here on which I left comments inline.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83371
Approved by: https://github.com/bdhirsh
2022-09-13 20:35:58 +00:00
Mikayla Gawarecki
e217b30b0f Add torch.nested namespace (#84102)
First step towards #83775
- only `to_padded_tensor` is moved to the nested namespace for now
- following the schema used for `special`, `fft`, `linalg` and other namespaces, nested functions are registered in native_functions.yaml as `nested_{function_name}` and are bound to the desired Python name in
`torch/nested/__init__.py`, and the desired C++ name in `torch/csrc/api/include/torch/nested.h`.

~~**Question**: should we keep the documentation for `Tensor.to_padded_tensor` or can this deleted since it is shared by `torch.nested.to_padded_tensor`?~~

[generated nested docs](https://docs-preview.pytorch.org/84102/nested.html?highlight=nested#module-torch.nested)
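
A minimal sketch of the relocated op (not from the PR; the padding value and shapes are illustrative):
```
import torch

# to_padded_tensor now lives in the torch.nested namespace.
nt = torch.nested_tensor([torch.randn(2, 4), torch.randn(3, 4)])
padded = torch.nested.to_padded_tensor(nt, padding=0.0)  # dense shape (2, 3, 4)
```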

Differential Revision: [D39361148](https://our.internmc.facebook.com/intern/diff/D39361148)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84102
Approved by: https://github.com/drisspg
2022-09-12 16:31:05 +00:00
Mikayla Gawarecki
1cad744694 Enable select.int when NestedTensor requires grad (#83875)
Previously, indexing a nested tensor when it requires_grad would raise an error because the backward formula for `select.int` uses `self.sizes()`. This PR fixes that by temporarily registering a _nested_select_backward function, which can be removed when we start using the symint approach to register kernels. For now this functionality is needed for creating a POC that nested tensor can be an API for `segment_coo` and `segment_csr` in the torch_scatter repo.

```
a = torch.arange(10).reshape(2, 5).float()
b = torch.arange(12).reshape(2, 6).float()
nt = torch.nested_tensor([a, b], dtype=torch.float).requires_grad_(True)
nt[0]
# RuntimeError: Internal error: NestedTensorImpl doesn't support sizes. Please file an issue on https://github.com/pytorch/nestedtensor
```

whereas

```
nt = torch.nested_tensor([a, b], dtype=torch.float).requires_grad_(False)
nt[0]
```
would succeed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83875
Approved by: https://github.com/albanD, https://github.com/drisspg
2022-09-06 22:19:32 +00:00
Driss Guessous
f803fa9fc9 [Nested Tensor] Add a NestedTensorUtils header and cpp file for organization (#84385)
# Summary
Trying to do some cleanup of the code structure for nested tensors. This introduces a utility header and cpp file that implement helper functions.

This is the initial PR of more cleanup to come. The next would be separating out all the native functions that create nested tensors into their own file, since they do not in fact do math on nested tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84385
Approved by: https://github.com/mikaylagawarecki
2022-09-02 16:31:55 +00:00
YifanShenSZ
673b35c847 Better reshape with autograd support (#82754) (#84154)
The original author is @YifanShenSZ and the original PR is #82754.
# Summary:
The previous reshape [https://github.com/pytorch/pytorch/issues/80981](https://github.com/pytorch/pytorch/pull/80981) is OK for forward, but needs improvement for backward: it needs to handle the "sometimes view, sometimes copy" behavior.

This pull request fixes it by:
1. adding a new alias dispatch key `CompositeImplicitAutogradNestedTensor`, which ideally works as the nested-tensor version of `CompositeImplicitAutograd`
2. registering `reshape_nested` to `reshape` via `CompositeImplicitAutogradNestedTensor`

Side changes:
* add contiguous memory format support to `clone_nested`
* add `view_nested`
* add `reshape_as_nested`

Fixes issue [https://github.com/pytorch/pytorch/issues/83041](https://github.com/pytorch/pytorch/issues/83041)
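
A forward usage sketch under the new dispatch (not from the PR; the constructor name reflects this point in the history, and the shapes and whether this particular reshape is a view or a copy are illustrative):
```
import torch

# reshape keeps the ragged dimension as -1 and may split/merge the regular
# trailing dims; with the new alias key it is also differentiable.
nt = torch.nested_tensor([torch.randn(2, 6), torch.randn(3, 6)])
out = nt.reshape(2, -1, 2, 3)  # view when the buffer layout allows it, copy otherwise
```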

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82754

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

**Static Docs Preview: executorch**
[Full Site](https://our.intern.facebook.com/intern/staticdocs/eph/D39023822/V13/executorch/)

Reviewed By: albanD

Differential Revision: D39023822

Pulled By: drisspg

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84154
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2022-09-01 20:01:39 +00:00
Driss Guessous
71369051ee [Nested Tensor] fix from_padded bug (#84217)
Fixes #84082

Explained in the issue: the problem was arising from grad being non-contiguous and the fast kernel not handling this case gracefully. The other thing I could do is add a contiguous call to d144594512/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cpp (L45)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84217
Approved by: https://github.com/albanD
2022-08-30 03:48:11 +00:00
Driss Guessous
2436cf8aa8 [Nested Tensor] detach (#84078)
## Summary
Add detach op for nested tensors. Nested tensors are not part of the composite explicit dispatch key set and therefore need to be added manually.

The Detach test is failing only for dtype=torch.float32/torch.float16 and device=cuda. The chain of ops called is sum.backward() -> from_padded() -> unbind(). This populates the grad for a and b.

Does this potentially indicate that the CUDA implementation for one of these ops, likely from_padded(), is incorrect?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84078
Approved by: https://github.com/albanD
2022-08-29 09:12:26 +00:00
PyTorch MergeBot
f4f54c7ce1 Revert "[Nested Tensor] detach (#84078)"
This reverts commit 092fe71f33.

Reverted https://github.com/pytorch/pytorch/pull/84078 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-08-28 15:30:21 +00:00
Driss Guessous
092fe71f33 [Nested Tensor] detach (#84078)
## Summary
Add detach op for nested tensors. Nested tensors are not part of the composite explicit dispatch key set and therefore need to be added manually.

The Detach test is failing only for dtype=torch.float32/torch.float16 and device=cuda. The chain of ops called is sum.backward() -> from_padded() -> unbind(). This populates the grad for a and b.

Does this potentially indicate that the CUDA implementation for one of these ops, likely from_padded(), is incorrect?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84078
Approved by: https://github.com/albanD
2022-08-27 03:00:55 +00:00
Yifan Shen
b3c99bef0c Support nested dropout autograd (#83338)
When the initial version came out, `NestedTensor` was not included in the `CompositeImplicitAutograd` key set, so we had to register dropout_nested to dropout and make it forward-only. Now is the time to improve it!

This PR removes dropout_nested; instead, native_dropout_nested is implemented along with native_dropout_backward_nested.

Side change: removes dropout__nested, since @cpuhrsch suggested leaving out nested in-place ops for now.
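
A small usage sketch (not from the PR; the constructor name reflects this point in the history, and shapes are illustrative):
```
import torch
import torch.nn.functional as F

# Dropout on a nested tensor now has a proper derivative via
# native_dropout_nested / native_dropout_backward_nested.
nt = torch.nested_tensor([torch.randn(2, 4), torch.randn(3, 4)]).requires_grad_(True)
out = F.dropout(nt, p=0.5, training=True)
```
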
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83338
Approved by: https://github.com/jbschlosser
2022-08-18 00:49:29 +00:00
Mikayla Gawarecki
bd0ad7a84f Add backward support for rudimentary NestedTensor.sum(dim) (#82625)
Per offline discussion, this will be updated to use expand once expand semantics for nested tensor have been fleshed out.

Next steps will be to add support for the other forward-sum features mentioned in #82387 and likewise update the backward.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82625
Approved by: https://github.com/albanD
2022-08-17 18:12:00 +00:00
Driss Guessous
4b597019b7 [Nested Tensor] Created Nested Tensor to Nested Tensor Views (#82658)
# Summary
This PR pulls out all the changes from #81838 specific to properly creating nested_tensor views. I will update this comment with a design doc once that has been made. This should enable proper creation of NestedTensor views: two nested_tensors sharing the same buffer_ but with different NestedTensor metadata.

The function `create_nested_tensor_view` is a helper for creating a new nested tensor whose storage aliases the base, causing the underlying storage to be shared; the result is therefore a view.

This function by itself is not differentiable, and therefore autograd does not track its uses. If a nested tensor function uses this helper in its implementation, the aten op must meet two requirements:
- The function must return a view of the input
- The function must be explicit and define its backward

## Testing
A bug was found when creating a base tensor outside of inference mode and then creating a view in inference mode. A test has been added to this PR to show the effect of the change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82658
Approved by: https://github.com/albanD
2022-08-16 20:22:21 +00:00
Driss Guessous
c5c0dd9b62 Update shallow_copy_and_detach for nested tensor impls (#83002)
# Summary
This change fixes a bug that was encountered when trying to add more backward formulas for nested tensor ops. If a derivative is defined that stores the "result" for use in the backward, the output of the forward op is saved using:
```
 if (grad_fn) {
    grad_fn->result_ = SavedVariable(result, true);
  }
```

SavedVariable calls a series of functions which in turn call shallow_copy_and_detach; when c179597753/c10/core/TensorImpl.cpp (L533) is hit, this calls sizes_custom(), which is not implemented and errors. I also noticed that, since the storage format for nested_tensor is not `storage_` but two tensors, we should actually be calling the NestedTensorImpl constructor.

This PR overrides shallow_copy_and_detach from the derived class and ensures that shallow copy works correctly.

## Update
- Added the softmax derivative in this PR because that is a direct use case that was blocked by not having shallow_copy_and_detach work correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83002
Approved by: https://github.com/albanD
2022-08-10 20:34:46 +00:00
Driss Guessous
e816644495 Add nested tensor contiguous (#82147)
### Description
The nested_tensor impl for `contiguous` was previously disabled. Prior to the work on nested_tensor transpose, only contiguous nested tensors could be created from Python. However, it is now possible to create nested tensors that are non-contiguous. This PR links up the existing function used at the C++ level to the Python function.

### Tests
Updated Test in `test/test_nestedtensor.py`

### Notes
Inference mode had to be removed for this test because `.contiguous` is a composite implicit function, which currently does not work in inference mode. However, https://github.com/pytorch/pytorch/pull/81838 should fix that issue.

### Why
When writing Triton kernels for nested tensors, I exposed a helper function that returned the "Buffer" tensor to Python. Now contiguity can be checked before running any Triton kernel. A good follow-up would be making `nt.contiguous` on non-contiguous nested tensors return a contiguous nested tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82147
Approved by: https://github.com/jbschlosser
2022-08-09 01:51:37 +00:00
Joel Benjamin Schlosser
6ca95547ac Initial private SDP interface and naive composite impl (#81956)
Adds an initial private API version of the SDP interface.

Signature:
```
_scaled_dot_product_attention(Tensor query, Tensor key, Tensor value, Tensor? attn_mask=None,
    float dropout_p=0.0, bool need_attn_weights=True, bool is_causal=False) -> (Tensor, Tensor)
```

Returns a tuple of `(output, attn_weights)`.

Note the following:
* `need_attn_weights`: flag indicating that attention weights should be computed. This is useful to toggle off for flash attention as it does not materialize the weights by default, making it more expensive to return them.
* Boolean attention mask support only; `True` values within `attn_mask` indicate that the element should take part in attention (notably, this is the reverse of MHA, which uses `True` to mask *out* values). The mask is optional.
* `is_causal`: Temporary flag indicating whether to use a causal attention weighting. If this is set to `True`, it takes precedence over any value passed in for `attn_mask`. Longer term, the `is_causal` flagging can be subsumed into the `attn_mask` arg via tensor subclassing (see e.g. [CausalTensor](https://github.com/facebookresearch/xformers/blob/sparse_cleanup/xformers/sparse/causal_tensor.py) in xFormers).
* Testing is currently done via reference with the existing Python impl of `F._scaled_dot_product_attention`.
* This PR does not yet drop in the new SDP anywhere. A future PR can hook it up in BT or MHA.
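
A hedged usage sketch against the Python reference mentioned above (not from the PR; shapes are illustrative, and the tuple return is assumed to match the signature shown):
```
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) inputs; boolean mask omitted.
q = torch.randn(2, 4, 8, 16)
k = torch.randn(2, 4, 8, 16)
v = torch.randn(2, 4, 8, 16)
out, attn_weights = F._scaled_dot_product_attention(q, k, v)
```
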
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81956
Approved by: https://github.com/drisspg, https://github.com/erichan1
2022-08-01 22:26:18 +00:00
YifanShenSZ
4bb7e148c4 add nested tensor matmul support (#81957)
There was a discussion on whether to let nested tensor `reshape` support collapsing and splitting dimension 0. The conclusion was to keep reshape simple, so we need a tweaked `matmul` that only supports the 3+ dimension non-broadcast case, i.e. a generalized `bmm`.
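
An illustrative sketch of the generalized bmm behavior (not from the PR; the constructor name reflects this point in the history, shapes are illustrative, and no broadcasting is assumed):
```
import torch

# Constituents: (2, 4) @ (4, 5) -> (2, 5) and (3, 4) @ (4, 5) -> (3, 5).
a = torch.nested_tensor([torch.randn(2, 4), torch.randn(3, 4)])
b = torch.nested_tensor([torch.randn(4, 5), torch.randn(4, 5)])
out = torch.matmul(a, b)
```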

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81957
Approved by: https://github.com/jbschlosser
2022-07-30 22:35:09 +00:00
YifanShenSZ
5f9939f65e Introduce discontinuity to nested tensor (#80981)
Nested tensor used to assume the buffer memory to be contiguous. However, some operations can break that assumption:
* reshape
* transpose
* slice

To be able to access underlying tensors from discontinuous buffer, we need 3 metadata:
* sizes of each tensor (`nested_size_tensor_`)
* strides of each tensor (`nested_stride_tensor_`)
* offset of each tensor (`offsets_`)

so we access each tensor via `buffer.as_strided(size, stride, offset)`.
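
A conceptual sketch of that access pattern (not a public API; names and values are illustrative):
```
import torch

# Each constituent is a strided view into the flat buffer, recovered from its
# per-tensor size, stride, and offset metadata.
buffer = torch.arange(16.0)
sizes, strides, offsets = [(2, 3), (2, 4)], [(3, 1), (4, 1)], [0, 6]
t0 = buffer.as_strided(sizes[0], strides[0], storage_offset=offsets[0])
t1 = buffer.as_strided(sizes[1], strides[1], storage_offset=offsets[1])
```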

This pull request introduces the offsets metadata, then adds reshape and transpose so that we can create discontinuous cases for testing. Unbind, select, dropout, softmax, and bmm are refactored to provide tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80981
Approved by: https://github.com/jbschlosser
2022-07-30 04:08:30 +00:00
Mikayla Gawarecki
89c0123ba0 Add rudimentary NestedTensor.sum(dim) (#82387)
A first step towards adding dimension-wise reductions to NestedTensor:
- Assumes tensors in the nested tensor as well as the buffer of the nested tensor are contiguous
- Always enforces `keepdim=True`
- Only supports reduction across the last dimension
- No support for acctype (`dtype` argument)
- No autograd support
- CPU only

Next steps would be to add support for the above. For now this basic support is for prototyping to make sure `NestedTensor` can be used as an API for segment reductions.
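
An illustrative sketch of the rudimentary reduction described above (not from the PR; the constructor name reflects this point in the history, and shapes are illustrative):
```
import torch

# Reduce over the last dimension only; keepdim is always enforced.
nt = torch.nested_tensor([torch.randn(2, 5), torch.randn(3, 5)])
out = nt.sum(dim=-1, keepdim=True)  # constituents of shape (2, 1) and (3, 1)
```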

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82387
Approved by: https://github.com/jbschlosser
2022-07-28 22:45:22 +00:00
PyTorch MergeBot
26776d628c Revert "Initial private SDP interface and naive composite impl (#81956)"
This reverts commit f15c5bf133.

Reverted https://github.com/pytorch/pytorch/pull/81956 on behalf of https://github.com/janeyx99 due to breaking all configs on test_scaled_dot_product_attention (__main__.TestNestedTensorAutograd) f15c5bf133
2022-07-27 18:36:54 +00:00
Joel Benjamin Schlosser
f15c5bf133 Initial private SDP interface and naive composite impl (#81956)
Adds an initial private API version of the SDP interface.

Signature:
```
_scaled_dot_product_attention(Tensor query, Tensor key, Tensor value, Tensor? attn_mask=None,
    float dropout_p=0.0, bool need_attn_weights=True, bool is_causal=False) -> (Tensor, Tensor)
```

Returns a tuple of `(output, attn_weights)`.

Note the following:
* `need_attn_weights`: flag indicating that attention weights should be computed. This is useful to toggle off for flash attention as it does not materialize the weights by default, making it more expensive to return them.
* Boolean attention mask support only; `True` values within `attn_mask` indicate that the element should take part in attention (notably, this is the reverse of MHA, which uses `True` to mask *out* values). The mask is optional.
* `is_causal`: Temporary flag indicating whether to use a causal attention weighting. If this is set to `True`, it takes precedence over any value passed in for `attn_mask`. Longer term, the `is_causal` flagging can be subsumed into the `attn_mask` arg via tensor subclassing (see e.g. [CausalTensor](https://github.com/facebookresearch/xformers/blob/sparse_cleanup/xformers/sparse/causal_tensor.py) in xFormers).
* Testing is currently done via reference with the existing Python impl of `F._scaled_dot_product_attention`.
* This PR does not yet drop in the new SDP anywhere. A future PR can hook it up in BT or MHA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81956
Approved by: https://github.com/drisspg, https://github.com/erichan1
2022-07-27 15:41:45 +00:00
PyTorch MergeBot
500be5998d Revert "Introduce discontinuity to nested tensor (#80981)"
This reverts commit b492f7c485.

Reverted https://github.com/pytorch/pytorch/pull/80981 on behalf of https://github.com/osalpekar due to This was reverted internally in D38142790, due to causing TorchScript inference failures
2022-07-26 21:40:42 +00:00
PyTorch MergeBot
0b0dbc59e6 Revert "Update shallow_copy_and_detach for nested tensor impls to enable nested tensor softmax backward (#81838)"
This reverts commit 6697f1e467.

Reverted https://github.com/pytorch/pytorch/pull/81838 on behalf of https://github.com/osalpekar due to Reverting this in order to revert https://github.com/pytorch/pytorch/pull/80981 cleanly. That diff caused GPU Inference breakage internally
2022-07-26 21:34:10 +00:00
PyTorch MergeBot
6c10a598ca Revert "add nested tensor matmul support (#81957)"
This reverts commit 7bdafed4f1.

Reverted https://github.com/pytorch/pytorch/pull/81957 on behalf of https://github.com/osalpekar due to Reverting this in order to revert https://github.com/pytorch/pytorch/pull/80981 cleanly. That diff caused GPU Inference breakage internally
2022-07-26 21:10:28 +00:00
YifanShenSZ
7bdafed4f1 add nested tensor matmul support (#81957)
There was a discussion on whether to let nested tensor `reshape` support collapsing and splitting dimension 0. The conclusion was to keep reshape simple, so we need a tweaked `matmul` that only supports the 3+ dimension non-broadcast case, i.e. a generalized `bmm`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81957
Approved by: https://github.com/jbschlosser
2022-07-26 16:58:42 +00:00
Driss Guessous
6697f1e467 Update shallow_copy_and_detach for nested tensor impls to enable nested tensor softmax backward (#81838)
# Summary
This change fixes a bug that was encountered when trying to add more backward formulas for nested tensor ops. If a derivative is defined that stores the "result" for use in the backward, the output of the forward op is saved using:
```
 if (grad_fn) {
    grad_fn->result_ = SavedVariable(result, true);
  }
```

SavedVariable calls a series of functions which in turn call shallow_copy_and_detach; when c179597753/c10/core/TensorImpl.cpp (L533) is hit, this calls sizes_custom(), which is not implemented and errors. I also noticed that, since the storage format for nested_tensor is not `storage_` but two tensors, we should actually be calling the NestedTensorImpl constructor.

This PR overrides shallow_copy_and_detach from the derived class and ensures that shallow copy works correctly.

## Update
- Added the softmax derivative in this PR because that is a direct use case that was blocked by not having shallow_copy_and_detach work correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81838
Approved by: https://github.com/soulitzer
2022-07-25 20:04:40 +00:00
Yifan Shen
b492f7c485 Introduce discontinuity to nested tensor (#80981)
Nested tensor used to assume the buffer memory to be contiguous. However, some operations can break that assumption:
* reshape
* transpose
* slice

To be able to access underlying tensors from discontinuous buffer, we need 3 metadata:
* sizes of each tensor (`nested_size_tensor_`)
* strides of each tensor (`nested_stride_tensor_`)
* offset of each tensor (`offsets_`)

so we access each tensor via `buffer.as_strided(size, stride, offset)`.

This pull request introduces the offsets metadata, then adds reshape and transpose so that we can create discontinuous cases for testing. Unbind, select, dropout, softmax, and bmm are refactored to provide tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80981
Approved by: https://github.com/jbschlosser
2022-07-21 17:17:25 +00:00
Driss Guessous
fca1523604 implement numel and tests for nested tensor (#80424)
Adds a numel implementation for NestedTensor. Currently the construction of nested sizes and nested_strides assumes contiguity. This implementation was based on safe_compute_numel(). Having a TORCH_CHECK in a for loop feels a bit bad, but I don't really know how performant numel needs to be.

Since the nested size is stored as a tensor, `nested_size_tensor().cumprod(dim=1).sum(dim=0)[1].item()` would also get the job done.
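
A small illustrative check of the intended semantics (not from the PR; the constructor name and shapes are illustrative):
```
import torch

# numel is the total element count across all constituents.
nt = torch.nested_tensor([torch.randn(2, 3), torch.randn(4, 5)])
assert nt.numel() == 2 * 3 + 4 * 5
```
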
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80424
Approved by: https://github.com/cpuhrsch
2022-06-28 18:02:44 +00:00
drisspg
2a09e95169 Register nested tensor linear kernel (#80397)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80397
Approved by: https://github.com/soulitzer
2022-06-28 06:23:26 +00:00
Christian Puhrsch
2258db5da3 TensorImpl::size_custom to support NestedTensor.size (#80236)
This allows subclasses such as NestedTensorImpl to provide special behavior for `int64_t size(int64_t d)` that'll also be accessible by our Python frontend.

It follows the same pattern as sizes_custom.

Currently getting CI before asking for a review.
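
A minimal sketch of what this enables from Python (not from the PR; the constructor name and shapes are illustrative):
```
import torch

# size(d) routes through size_custom, so NestedTensorImpl can answer for the
# batch dimension; ragged dimensions have no single well-defined size.
nt = torch.nested_tensor([torch.randn(2, 4), torch.randn(3, 4)])
nt.size(0)  # 2
```
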
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80236
Approved by: https://github.com/ezyang
2022-06-27 17:07:42 +00:00
Yifan Shen
09f79e94ac support nested_tensor * scalar (#80284)
In the transformer, the scale step in attention has a `nested_tensor / scalar` operation. There are two ways to support that:
1. directly support `nested_tensor / scalar`:
* pro: straightforward, good UX
* con: is dispatching `mul(nested tensor, regular tensor)` a good practice?
2. let the user manually convert `scalar` to `nested_scalar = torch.nested_tensor([broadcast_scalar])`
* pro: dispatcher only has to deal with `mul(nested tensor, nested tensor)`
* con: confusing manual conversions, bad UX
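
A sketch of the direct path (option 1) that the PR title describes (not from the PR; the constructor name, values, and use of multiplication rather than division are illustrative):
```
import torch

# Scale a nested tensor by a Python scalar, e.g. the attention scale step.
nt = torch.nested_tensor([torch.randn(2, 4), torch.randn(3, 4)])
scaled = nt * 0.125
```
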
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80284
Approved by: https://github.com/cpuhrsch
2022-06-27 14:15:05 +00:00
Yifan Shen
fc0faa2cf6 Support nested_tensor.bmm (#80224)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80224
Approved by: https://github.com/cpuhrsch, https://github.com/jbschlosser
2022-06-25 03:19:46 +00:00
Yifan Shen
54a1cc5246 Support softmax(nested tensor) (#80179)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80179
Approved by: https://github.com/cpuhrsch, https://github.com/jbschlosser
2022-06-24 14:50:24 +00:00
Yifan Shen
f749f86fee Add nested tensor metadata nested_stride then use it in unbind, select (#79831)
2 reasons to add metadata `nested_stride`:
1. it will be used later in `reshape` and `transpose`
2. it reduces the computation needed to get the offsets and shapes required in `unbind`-like code, which is used again and again in nested tensor operations

`unbind` and `select` are refactored to make use of `nested_stride`.
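
An illustrative sketch of the two refactored consumers (not from the PR; the constructor name and shapes are illustrative):
```
import torch

# unbind/select recover constituents using the per-tensor size/stride metadata.
nt = torch.nested_tensor([torch.randn(2, 3), torch.randn(4, 3)])
first, second = nt.unbind()
row = nt.select(0, 1)  # the second constituent, same values as `second`
```
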
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79831
Approved by: https://github.com/cpuhrsch, https://github.com/jbschlosser
2022-06-23 20:24:50 +00:00
Driss Guessous
a098937c20 Add factory function derivatives (#79872)
Adds derivatives for factory functions; this issue is used for tracking: #79044

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79872
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
2022-06-21 00:53:11 +00:00
Edward Z. Yang
f7ee061638 Wconstab/reland pysymint (#79795)
Rebased https://github.com/pytorch/pytorch/pull/79617/ to see if the issues are reproducible.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79795
Approved by: https://github.com/malfet
2022-06-20 22:55:06 +00:00
Yifan Shen
1b25aa6786 Support dropout(nested tensor) (#79318)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79318
Approved by: https://github.com/jbschlosser
2022-06-17 18:41:54 +00:00
PyTorch MergeBot
8a7a5def1d Revert "Support dropout(nested tensor) (#79318)"
This reverts commit 1211ab679c.

Reverted https://github.com/pytorch/pytorch/pull/79318 on behalf of https://github.com/janeyx99 due to breaking dropout tests on trunk; it also errors on the PR
2022-06-17 04:56:29 +00:00
Yifan Shen
1211ab679c Support dropout(nested tensor) (#79318)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79318
Approved by: https://github.com/jbschlosser
2022-06-17 00:46:07 +00:00