Commit Graph

66 Commits

Author SHA1 Message Date
Mikayla Gawarecki
2a75152537 [easy] Add nested tanh (#86826)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86826
Approved by: https://github.com/cpuhrsch
2022-10-13 00:48:08 +00:00
Christian Puhrsch
ef58a132f2 Use CUTLASS GEMM for NT bmm [OSS-only] (#85894)
OSS-only copy of https://github.com/pytorch/pytorch/pull/85710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85894
Approved by: https://github.com/drisspg
2022-10-12 20:03:28 +00:00
Driss Guessous
16f65f178a Nested tensor forward only chunk operations (#85645)
# Summary

Taking over this PR: https://github.com/pytorch/pytorch/pull/83736

Adds support for `chunk` on nested tensors (forward only; no autograd support).
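
A minimal forward-only usage sketch (not from the PR; shapes are illustrative, and it assumes chunking along the last, regular dimension, e.g. to split fused q/k/v projections):
```
import torch

# Hedged sketch: forward-only chunk on a nested tensor along the last dim.
nt = torch.nested.nested_tensor([torch.randn(2, 6), torch.randn(3, 6)])
q, k, v = nt.chunk(3, dim=-1)  # three nested tensors, each with last dim of size 2
```
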
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85645
Approved by: https://github.com/cpuhrsch
2022-10-11 01:21:39 +00:00
Antoni Viros i Martin
cdbffa7f66 🦊 [AI Accelerators] Consolidate native_layer_norm for nested tensor (#86295)
Summary: In order to make the layer normalization implementation for nested tensors public, it needs to be generalized to accept a normalized_shape argument instead of assuming it to be the last dimension of the nested_tensor. This commit does that, as well as adding extra unit tests to ensure the implementation is correct.
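
A hedged sketch of how the generalized call could look (not from the PR; it assumes dispatch via `F.layer_norm`, and shapes are illustrative):
```
import torch
import torch.nn.functional as F

# Layer norm over the last dimension of each constituent tensor; the
# normalized_shape argument is now passed explicitly rather than assumed.
nt = torch.nested.nested_tensor([torch.randn(3, 8), torch.randn(5, 8)])
out = F.layer_norm(nt, normalized_shape=(8,))
```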

Test Plan:
All unit tests designed to test different ways of using the function work:

`buck test //caffe2/test:nested -- test_layer_norm`

Differential Revision: D40105207

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86295
Approved by: https://github.com/drisspg
2022-10-06 13:10:25 +00:00
Mikayla Gawarecki
01add6e288 Allow only one -1 in nested view/reshape (#85691)
### This effectively means we only allow reshaping/viewing of a nested tensor with ONE ragged dimension

Behavior before this PR:

1. `-1` allowed for implicit batch dimension
2. multiple `-1`s allowed for pre-existing dimensions
3. for new dimensions, `-1` is not allowed

It is worth noting that case 3 is mostly unreachable: assuming a nested tensor has at least one ragged dimension, you would expect at least one `-1` in the proposed shape for the pre-existing dimensions.

Behavior after this PR:
1. batch dimension **must be specified**
2. **only one** `-1` allowed for pre-existing dimensions. **This effectively means we only allow reshaping/viewing of a nested tensor with ONE ragged dimension.**
3. unchanged
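
A sketch of the post-PR rules (not from the PR; constructor and shapes are illustrative):
```
import torch

# Nested tensor with logical shape (2, ragged, 5): one ragged dimension.
nt = torch.nested.nested_tensor([torch.randn(2, 5), torch.randn(3, 5)])

nt.reshape(2, -1, 5)    # OK: batch dim specified, a single -1 for the ragged dim
# nt.reshape(-1, -1, 5) # error: the batch dimension must be specified
# nt.reshape(2, -1, -1) # error: only one -1 is allowed
```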

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85691
Approved by: https://github.com/cpuhrsch
2022-09-28 22:29:40 +00:00
Mikayla Gawarecki
afaee00fec Add python nested_tensor and as_nested_tensor constructors in torch.nested (#85593)
Remove `torch.nested_tensor`, which has erroneous behavior w.r.t. gradients (the result could be either a leaf or not a leaf). Introduce `torch.nested.nested_tensor` and `torch.nested.as_nested_tensor` in the vein of `torch.tensor` and `torch.as_tensor`. Done in the nested `__init__.py` for now, but this can move to pybind in the future (when we want to load from numpy/nested lists).

Discussed offline with @cpuhrsch: the pybind constructor (https://github.com/pytorch/pytorch/pull/85536) was more gnarly than expected, so we can move to that when we do need loading from numpy etc.
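
A hedged sketch of the intended split between the two constructors (not from the PR; values are illustrative):
```
import torch

a, b = torch.randn(2, 5), torch.randn(3, 5)

# torch.nested.nested_tensor copies its inputs, analogous to torch.tensor;
# the result is a leaf that can require grad itself.
nt = torch.nested.nested_tensor([a, b], requires_grad=True)

# torch.nested.as_nested_tensor preserves autograd history, analogous to
# torch.as_tensor: gradients flow back to a and b.
a.requires_grad_(); b.requires_grad_()
nt2 = torch.nested.as_nested_tensor([a, b])
```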

Differential Revision: [D39806622](https://our.internmc.facebook.com/intern/diff/D39806622)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85593
Approved by: https://github.com/drisspg, https://github.com/cpuhrsch
2022-09-28 20:15:02 +00:00
PyTorch MergeBot
fc8ba3a92d Revert "Allow only one -1 in nested view/reshape (#85691)"
This reverts commit 4c4e5f6106.

Reverted https://github.com/pytorch/pytorch/pull/85691 on behalf of https://github.com/atalman due to causing a GitHub-first merge conflict
2022-09-28 17:22:53 +00:00
Mikayla Gawarecki
4c4e5f6106 Allow only one -1 in nested view/reshape (#85691)
### This effectively means we only allow reshaping/viewing of a nested tensor with ONE ragged dimension

Behavior before this PR:

1. `-1` allowed for implicit batch dimension
2. multiple `-1`s allowed for pre-existing dimensions
3. for new dimensions, `-1` is not allowed

It is worth noting that case 3 is mostly unreachable: assuming a nested tensor has at least one ragged dimension, you would expect at least one `-1` in the proposed shape for the pre-existing dimensions.

Behavior after this PR:
1. batch dimension **must be specified**
2. **only one** `-1` allowed for pre-existing dimensions. **This effectively means we only allow reshaping/viewing of a nested tensor with ONE ragged dimension.**
3. unchanged

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85691
Approved by: https://github.com/cpuhrsch
2022-09-27 17:16:54 +00:00
Mikayla Gawarecki
5e700803c2 Use fallback approach for nested matmul (#85311)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85311
Approved by: https://github.com/cpuhrsch, https://github.com/drisspg
2022-09-22 21:19:09 +00:00
PyTorch MergeBot
caa0ab557d Revert "Use fallback approach for nested matmul (#85311)"
This reverts commit 7c31f6e672.

Reverted https://github.com/pytorch/pytorch/pull/85311 on behalf of https://github.com/clee2000 due to breaking lots of builds (7c31f6e672) even though the PR was green
2022-09-21 22:55:40 +00:00
Mikayla Gawarecki
7c31f6e672 Use fallback approach for nested matmul (#85311)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85311
Approved by: https://github.com/cpuhrsch, https://github.com/drisspg
2022-09-21 22:39:52 +00:00
Mikayla Gawarecki
77f1f98479 Re-introduce torch.Tensor.to_padded_tensor (#85293)
Differential Revision: [D39629004](https://our.internmc.facebook.com/intern/diff/D39629004)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85293
Approved by: https://github.com/cpuhrsch
2022-09-21 18:45:56 +00:00
drisspg
bda8a5729b [Nested Tensor] Create differentiable nt to tensor view functions (#83371)
This PR attempts to implement 2), "the safe way", of creating a view of a nested tensor that returns a regular tensor. The rest of the breakdown is here: https://fb.quip.com/J8QCAx41af11

https://gist.github.com/drisspg/8622e9c97d374fa920ac647e1167cabc
This is a short list of some edge cases. After some more work I was able to address two of the test cases in the above gist. There are a few complex aspects here on which I left comments inline.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83371
Approved by: https://github.com/bdhirsh
2022-09-13 20:35:58 +00:00
Mikayla Gawarecki
e217b30b0f Add torch.nested namespace (#84102)
First step towards #83775
- only `to_padded_tensor` is moved to the nested namespace for now
- following the schema used for `special`, `fft`, `linalg` and other namespaces, nested functions are registered in native_functions.yaml as `nested_{function_name}` and are bound to the desired Python name in
`torch/nested/__init__.py`, and the desired C++ name in `torch/csrc/api/include/torch/nested.h`.

~~**Question**: should we keep the documentation for `Tensor.to_padded_tensor` or can this deleted since it is shared by `torch.nested.to_padded_tensor`?~~

[generated nested docs](https://docs-preview.pytorch.org/84102/nested.html?highlight=nested#module-torch.nested)
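
A minimal sketch of the relocated op (not from the PR; the padding value and shapes are illustrative):
```
import torch

# to_padded_tensor now lives in the torch.nested namespace.
nt = torch.nested_tensor([torch.randn(2, 4), torch.randn(3, 4)])
padded = torch.nested.to_padded_tensor(nt, padding=0.0)  # dense shape (2, 3, 4)
```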

Differential Revision: [D39361148](https://our.internmc.facebook.com/intern/diff/D39361148)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84102
Approved by: https://github.com/drisspg
2022-09-12 16:31:05 +00:00
Mikayla Gawarecki
1cad744694 Enable select.int when NestedTensor requires grad (#83875)
Previously, indexing a nested tensor when it requires_grad would raise an error because the backward formula for `select.int` uses `self.sizes()`. This PR fixes that by temporarily registering a _nested_select_backward function, which can be removed when we start using the symint approach to register kernels. For now this functionality is needed for creating a POC that nested tensor can be an API for `segment_coo` and `segment_csr` in the torch_scatter repo.

```
a = torch.arange(10).reshape(2, 5).float()
b = torch.arange(12).reshape(2, 6).float()
nt = torch.nested_tensor([a, b], dtype=torch.float).requires_grad_(True)
nt[0]
# RuntimeError: Internal error: NestedTensorImpl doesn't support sizes. Please file an issue on https://github.com/pytorch/nestedtensor
```

whereas

```
nt = torch.nested_tensor([a, b], dtype=torch.float).requires_grad_(False)
nt[0]
```
would succeed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83875
Approved by: https://github.com/albanD, https://github.com/drisspg
2022-09-06 22:19:32 +00:00
Driss Guessous
f803fa9fc9 [Nested Tensor] Add a NestedTensorUtils header and cpp file for organization (#84385)
# Summary
Trying to do some cleanup of the code structure for nested tensors. This introduces a utility header and cpp file that implement helper functions.

This is the initial PR of more cleanup to come. The next would be separating out all the native functions that create nested tensors into their own file, since they do not in fact do math on nested tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84385
Approved by: https://github.com/mikaylagawarecki
2022-09-02 16:31:55 +00:00
YifanShenSZ
673b35c847 Better reshape with autograd support (#82754) (#84154)
The original author is @YifanShenSZ and the original PR is #82754.
# Summary:
The previous reshape [https://github.com/pytorch/pytorch/issues/80981](https://github.com/pytorch/pytorch/pull/80981) is OK for forward, but needs improvement for backward: it needs to handle the "sometimes view, sometimes copy" behavior.

This pull request fixes it by:
1. adding a new alias dispatch key `CompositeImplicitAutogradNestedTensor`, which ideally works as the nested-tensor version of `CompositeImplicitAutograd`
2. registering `reshape_nested` to `reshape` via `CompositeImplicitAutogradNestedTensor`

Side changes:
* add contiguous memory format support to `clone_nested`
* add `view_nested`
* add `reshape_as_nested`

Fixes issue [https://github.com/pytorch/pytorch/issues/83041](https://github.com/pytorch/pytorch/issues/83041)
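
A forward usage sketch under the new dispatch (not from the PR; the constructor name reflects this point in the history, and the shapes and whether this particular reshape is a view or a copy are illustrative):
```
import torch

# reshape keeps the ragged dimension as -1 and may split/merge the regular
# trailing dims; with the new alias key it is also differentiable.
nt = torch.nested_tensor([torch.randn(2, 6), torch.randn(3, 6)])
out = nt.reshape(2, -1, 2, 3)  # view when the buffer layout allows it, copy otherwise
```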

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82754

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

**Static Docs Preview: executorch**
[Full Site](https://our.intern.facebook.com/intern/staticdocs/eph/D39023822/V13/executorch/)

Reviewed By: albanD

Differential Revision: D39023822

Pulled By: drisspg

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84154
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2022-09-01 20:01:39 +00:00
Driss Guessous
71369051ee [Nested Tensor] fix from_padded bug (#84217)
Fixes #84082

Explained in the issue: the problem was arising from grad being non-contiguous and the fast kernel not handling this case gracefully. The other thing I could do is add a contiguous call to d144594512/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cpp (L45)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84217
Approved by: https://github.com/albanD
2022-08-30 03:48:11 +00:00
Driss Guessous
2436cf8aa8 [Nested Tensor] detach (#84078)
## Summary
Add detach op for nested tensors. Nested tensors are not part of the composite explicit dispatch key set and therefore need to be added manually.

The Detach test is failing only for dtype=torch.float32/torch.float16 and device=cuda. The chain of ops called is sum.backward() -> from_padded() -> unbind(). This populates the grad for a and b.

Does this potentially indicate that the CUDA implementation for one of these ops, likely from_padded(), is incorrect?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84078
Approved by: https://github.com/albanD
2022-08-29 09:12:26 +00:00
PyTorch MergeBot
f4f54c7ce1 Revert "[Nested Tensor] detach (#84078)"
This reverts commit 092fe71f33.

Reverted https://github.com/pytorch/pytorch/pull/84078 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-08-28 15:30:21 +00:00
Driss Guessous
092fe71f33 [Nested Tensor] detach (#84078)
## Summary
Add detach op for nested tensors. Nested tensors are not part of the composite explicit dispatch key set and therefore need to be added manually.

The Detach test is failing only for dtype=torch.float32/torch.float16 and device=cuda. The chain of ops called is sum.backward() -> from_padded() -> unbind(). This populates the grad for a and b.

Does this potentially indicate that the CUDA implementation for one of these ops, likely from_padded(), is incorrect?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84078
Approved by: https://github.com/albanD
2022-08-27 03:00:55 +00:00
Yifan Shen
b3c99bef0c Support nested dropout autograd (#83338)
When the initial version came out, `NestedTensor` was not included in the `CompositeImplicitAutograd` key set, so we had to register dropout_nested to dropout and make it forward-only. Now is the time to improve it!

This PR removes dropout_nested; instead, native_dropout_nested is implemented along with native_dropout_backward_nested.

Side change: removes dropout__nested, since @cpuhrsch suggested leaving out nested in-place ops for now.
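
A small usage sketch (not from the PR; the constructor name reflects this point in the history, and shapes are illustrative):
```
import torch
import torch.nn.functional as F

# Dropout on a nested tensor now has a proper derivative via
# native_dropout_nested / native_dropout_backward_nested.
nt = torch.nested_tensor([torch.randn(2, 4), torch.randn(3, 4)]).requires_grad_(True)
out = F.dropout(nt, p=0.5, training=True)
```
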
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83338
Approved by: https://github.com/jbschlosser
2022-08-18 00:49:29 +00:00
Mikayla Gawarecki
bd0ad7a84f Add backward support for rudimentary NestedTensor.sum(dim) (#82625)
Per offline discussion, this will be updated to use expand once expand semantics for nested tensor have been fleshed out.

Next steps will be to add support for the other forward-sum features mentioned in #82387 and likewise update the backward.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82625
Approved by: https://github.com/albanD
2022-08-17 18:12:00 +00:00
Driss Guessous
4b597019b7 [Nested Tensor] Created Nested Tensor to Nested Tensor Views (#82658)
# Summary
This PR pulls out all the changes from #81838 specific to properly creating nested_tensor views. I will update this comment with a design doc once that has been made. This should enable proper creation of NestedTensor views: two nested_tensors sharing the same buffer_ but with different NestedTensor metadata.

The function `create_nested_tensor_view` is a helper for creating a new nested tensor whose storage aliases the base, causing the underlying storage to be shared; the result is therefore a view.

This function by itself is not differentiable, and therefore autograd does not track its uses. If a nested tensor function uses this helper in its implementation, the aten op must meet two requirements:
- The function must return a view of the input
- The function must be explicit and define its backward

## Testing
A bug was found when creating a base tensor outside of inference mode and then creating a view in inference mode. A test has been added to this PR to show the effect of the change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82658
Approved by: https://github.com/albanD
2022-08-16 20:22:21 +00:00
Driss Guessous
c5c0dd9b62 Update shallow_copy_and_detach for nested tensor impls (#83002)
# Summary
This change fixes a bug that was encountered when trying to add more backward formulas for nested tensor ops. If a derivative is defined that stores the "result" for use in the backward, the output of the forward op is saved using:
```
 if (grad_fn) {
    grad_fn->result_ = SavedVariable(result, true);
  }
```

SavedVariable calls a series of functions which in turn call shallow_copy_and_detach; when c179597753/c10/core/TensorImpl.cpp (L533) is hit, this calls sizes_custom(), which is not implemented and errors. I also noticed that, since the storage format for nested_tensor is not `storage_` but two tensors, we should actually be calling the NestedTensorImpl constructor.

This PR overrides shallow_copy_and_detach from the derived class and ensures that shallow copy works correctly.

## Update
- Added the softmax derivative in this PR because that is a direct use case that was blocked by not having shallow_copy_and_detach work correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83002
Approved by: https://github.com/albanD
2022-08-10 20:34:46 +00:00
Driss Guessous
e816644495 Add nested tensor contiguous (#82147)
### Description
The nested_tensor impl for `contiguous` was previously disabled. Prior to the work on nested_tensor transpose, only contiguous nested tensors could be created from Python. However, it is now possible to create nested tensors that are non-contiguous. This PR links up the existing function used at the C++ level to the Python function.

### Tests
Updated Test in `test/test_nestedtensor.py`

### Notes
Inference mode had to be removed for this test because `.contiguous` is a composite implicit function, which currently does not work in inference mode. However, https://github.com/pytorch/pytorch/pull/81838 should fix that issue.

### Why
When writing Triton kernels for nested tensors, I exposed a helper function that returned the "Buffer" tensor to Python. Now contiguity can be checked before running any Triton kernel. A good follow-up would be making `nt.contiguous` on non-contiguous nested tensors return a contiguous nested tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82147
Approved by: https://github.com/jbschlosser
2022-08-09 01:51:37 +00:00
Joel Benjamin Schlosser
6ca95547ac Initial private SDP interface and naive composite impl (#81956)
Adds an initial private API version of the SDP interface.

Signature:
```
_scaled_dot_product_attention(Tensor query, Tensor key, Tensor value, Tensor? attn_mask=None,
    float dropout_p=0.0, bool need_attn_weights=True, bool is_causal=False) -> (Tensor, Tensor)
```

Returns a tuple of `(output, attn_weights)`.

Note the following:
* `need_attn_weights`: flag indicating that attention weights should be computed. This is useful to toggle off for flash attention as it does not materialize the weights by default, making it more expensive to return them.
* Boolean attention mask support only; `True` values within `attn_mask` indicate that the element should take part in attention (notably, this is the reverse of MHA, which uses `True` to mask *out* values). The mask is optional.
* `is_causal`: Temporary flag indicating whether to use a causal attention weighting. If this is set to `True`, it takes precedence over any value passed in for `attn_mask`. Longer term, the `is_causal` flagging can be subsumed into the `attn_mask` arg via tensor subclassing (see e.g. [CausalTensor](https://github.com/facebookresearch/xformers/blob/sparse_cleanup/xformers/sparse/causal_tensor.py) in xFormers).
* Testing is currently done via reference with the existing Python impl of `F._scaled_dot_product_attention`.
* This PR does not yet drop in the new SDP anywhere. A future PR can hook it up in BT or MHA.
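
A hedged usage sketch against the Python reference mentioned above (not from the PR; shapes are illustrative, and the tuple return is assumed to match the signature shown):
```
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) inputs; boolean mask omitted.
q = torch.randn(2, 4, 8, 16)
k = torch.randn(2, 4, 8, 16)
v = torch.randn(2, 4, 8, 16)
out, attn_weights = F._scaled_dot_product_attention(q, k, v)
```
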
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81956
Approved by: https://github.com/drisspg, https://github.com/erichan1
2022-08-01 22:26:18 +00:00
YifanShenSZ
4bb7e148c4 add nested tensor matmul support (#81957)
There was a discussion on whether to let nested tensor `reshape` support collapsing and splitting dimension 0. The conclusion was to keep reshape simple, so we need a tweaked `matmul` that only supports the 3+ dimension non-broadcast case, i.e. a generalized `bmm`.
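
An illustrative sketch of the generalized bmm behavior (not from the PR; the constructor name reflects this point in the history, shapes are illustrative, and no broadcasting is assumed):
```
import torch

# Constituents: (2, 4) @ (4, 5) -> (2, 5) and (3, 4) @ (4, 5) -> (3, 5).
a = torch.nested_tensor([torch.randn(2, 4), torch.randn(3, 4)])
b = torch.nested_tensor([torch.randn(4, 5), torch.randn(4, 5)])
out = torch.matmul(a, b)
```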

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81957
Approved by: https://github.com/jbschlosser
2022-07-30 22:35:09 +00:00
YifanShenSZ
5f9939f65e Introduce discontinuity to nested tensor (#80981)
Nested tensor used to assume the buffer memory to be contiguous. However, some operations can break that assumption:
* reshape
* transpose
* slice

To be able to access underlying tensors from discontinuous buffer, we need 3 metadata:
* sizes of each tensor (`nested_size_tensor_`)
* strides of each tensor (`nested_stride_tensor_`)
* offset of each tensor (`offsets_`)

so we access each tensor via `buffer.as_strided(size, stride, offset)`.
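
A conceptual sketch of that access pattern (not a public API; names and values are illustrative):
```
import torch

# Each constituent is a strided view into the flat buffer, recovered from its
# per-tensor size, stride, and offset metadata.
buffer = torch.arange(16.0)
sizes, strides, offsets = [(2, 3), (2, 4)], [(3, 1), (4, 1)], [0, 6]
t0 = buffer.as_strided(sizes[0], strides[0], storage_offset=offsets[0])
t1 = buffer.as_strided(sizes[1], strides[1], storage_offset=offsets[1])
```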

This pull request introduces the offsets metadata, then adds reshape and transpose so that we can create discontinuous cases for testing. Unbind, select, dropout, softmax, and bmm are refactored to provide tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80981
Approved by: https://github.com/jbschlosser
2022-07-30 04:08:30 +00:00
Mikayla Gawarecki
89c0123ba0 Add rudimentary NestedTensor.sum(dim) (#82387)
A first step towards adding dimension-wise reductions to NestedTensor:
- Assumes tensors in the nested tensor as well as the buffer of the nested tensor are contiguous
- Always enforces `keepdim=True`
- Only supports reduction across the last dimension
- No support for acctype (`dtype` argument)
- No autograd support
- CPU only

Next steps would be to add support for the above. For now this basic support is for prototyping to make sure `NestedTensor` can be used as an API for segment reductions.
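
An illustrative sketch of the rudimentary reduction described above (not from the PR; the constructor name reflects this point in the history, and shapes are illustrative):
```
import torch

# Reduce over the last dimension only; keepdim is always enforced.
nt = torch.nested_tensor([torch.randn(2, 5), torch.randn(3, 5)])
out = nt.sum(dim=-1, keepdim=True)  # constituents of shape (2, 1) and (3, 1)
```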

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82387
Approved by: https://github.com/jbschlosser
2022-07-28 22:45:22 +00:00
PyTorch MergeBot
26776d628c Revert "Initial private SDP interface and naive composite impl (#81956)"
This reverts commit f15c5bf133.

Reverted https://github.com/pytorch/pytorch/pull/81956 on behalf of https://github.com/janeyx99 due to breaking all configs on test_scaled_dot_product_attention (__main__.TestNestedTensorAutograd) f15c5bf133
2022-07-27 18:36:54 +00:00
Joel Benjamin Schlosser
f15c5bf133 Initial private SDP interface and naive composite impl (#81956)
Adds an initial private API version of the SDP interface.

Signature:
```
_scaled_dot_product_attention(Tensor query, Tensor key, Tensor value, Tensor? attn_mask=None,
    float dropout_p=0.0, bool need_attn_weights=True, bool is_causal=False) -> (Tensor, Tensor)
```

Returns a tuple of `(output, attn_weights)`.

Note the following:
* `need_attn_weights`: flag indicating that attention weights should be computed. This is useful to toggle off for flash attention as it does not materialize the weights by default, making it more expensive to return them.
* Boolean attention mask support only; `True` values within `attn_mask` indicate that the element should take part in attention (notably, this is the reverse of MHA, which uses `True` to mask *out* values). The mask is optional.
* `is_causal`: Temporary flag indicating whether to use a causal attention weighting. If this is set to `True`, it takes precedence over any value passed in for `attn_mask`. Longer term, the `is_causal` flagging can be subsumed into the `attn_mask` arg via tensor subclassing (see e.g. [CausalTensor](https://github.com/facebookresearch/xformers/blob/sparse_cleanup/xformers/sparse/causal_tensor.py) in xFormers).
* Testing is currently done via reference with the existing Python impl of `F._scaled_dot_product_attention`.
* This PR does not yet drop in the new SDP anywhere. A future PR can hook it up in BT or MHA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81956
Approved by: https://github.com/drisspg, https://github.com/erichan1
2022-07-27 15:41:45 +00:00
PyTorch MergeBot
500be5998d Revert "Introduce discontinuity to nested tensor (#80981)"
This reverts commit b492f7c485.

Reverted https://github.com/pytorch/pytorch/pull/80981 on behalf of https://github.com/osalpekar due to This was reverted internally in D38142790, due to causing TorchScript inference failures
2022-07-26 21:40:42 +00:00
PyTorch MergeBot
0b0dbc59e6 Revert "Update shallow_copy_and_detach for nested tensor impls to enable nested tensor softmax backward (#81838)"
This reverts commit 6697f1e467.

Reverted https://github.com/pytorch/pytorch/pull/81838 on behalf of https://github.com/osalpekar due to Reverting this in order to revert https://github.com/pytorch/pytorch/pull/80981 cleanly. That diff caused GPU Inference breakage internally
2022-07-26 21:34:10 +00:00
PyTorch MergeBot
6c10a598ca Revert "add nested tensor matmul support (#81957)"
This reverts commit 7bdafed4f1.

Reverted https://github.com/pytorch/pytorch/pull/81957 on behalf of https://github.com/osalpekar due to Reverting this in order to revert https://github.com/pytorch/pytorch/pull/80981 cleanly. That diff caused GPU Inference breakage internally
2022-07-26 21:10:28 +00:00
YifanShenSZ
7bdafed4f1 add nested tensor matmul support (#81957)
There was a discussion on whether to let nested tensor `reshape` support collapsing and splitting dimension 0. The conclusion was to keep reshape simple, so we need a tweaked `matmul` that only supports the 3+ dimension non-broadcast case, i.e. a generalized `bmm`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81957
Approved by: https://github.com/jbschlosser
2022-07-26 16:58:42 +00:00
Driss Guessous
6697f1e467 Update shallow_copy_and_detach for nested tensor impls to enable nested tensor softmax backward (#81838)
# Summary
This change fixes a bug that was encountered when trying to add more backward formulas for nested tensor ops. If a derivative is defined that stores the "result" for use in the backward, the output of the forward op is saved using:
```
 if (grad_fn) {
    grad_fn->result_ = SavedVariable(result, true);
  }
```

SavedVariable calls a series of functions which in turn call shallow_copy_and_detach; when c179597753/c10/core/TensorImpl.cpp (L533) is hit, this calls sizes_custom(), which is not implemented and errors. I also noticed that, since the storage format for nested_tensor is not `storage_` but two tensors, we should actually be calling the NestedTensorImpl constructor.

This PR overrides shallow_copy_and_detach from the derived class and ensures that shallow copy works correctly.

## Update
- Added the softmax derivative in this PR because that is a direct use case that was blocked by not having shallow_copy_and_detach work correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81838
Approved by: https://github.com/soulitzer
2022-07-25 20:04:40 +00:00
Yifan Shen
b492f7c485 Introduce discontinuity to nested tensor (#80981)
Nested tensor used to assume the buffer memory to be contiguous. However, some operations can break that assumption:
* reshape
* transpose
* slice

To be able to access underlying tensors from discontinuous buffer, we need 3 metadata:
* sizes of each tensor (`nested_size_tensor_`)
* strides of each tensor (`nested_stride_tensor_`)
* offset of each tensor (`offsets_`)

so we access each tensor via `buffer.as_strided(size, stride, offset)`.

This pull request introduces the offsets metadata, then adds reshape and transpose so that we can create discontinuous cases for testing. Unbind, select, dropout, softmax, and bmm are refactored to provide tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80981
Approved by: https://github.com/jbschlosser
2022-07-21 17:17:25 +00:00
Driss Guessous
fca1523604 implement numel and tests for nested tensor (#80424)
Adds a numel implementation for NestedTensor. Currently the construction of nested sizes and nested_strides assumes contiguity. This implementation was based on safe_compute_numel(). Having a TORCH_CHECK in a for loop feels a bit bad, but I don't really know how performant numel needs to be.

Since the nested size is stored as a tensor, `nested_size_tensor().cumprod(dim=1).sum(dim=0)[1].item()` would also get the job done.
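
A small illustrative check of the intended semantics (not from the PR; the constructor name and shapes are illustrative):
```
import torch

# numel is the total element count across all constituents.
nt = torch.nested_tensor([torch.randn(2, 3), torch.randn(4, 5)])
assert nt.numel() == 2 * 3 + 4 * 5
```
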
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80424
Approved by: https://github.com/cpuhrsch
2022-06-28 18:02:44 +00:00
drisspg
2a09e95169 Register nested tensor linear kernel (#80397)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80397
Approved by: https://github.com/soulitzer
2022-06-28 06:23:26 +00:00
Christian Puhrsch
2258db5da3 TensorImpl::size_custom to support NestedTensor.size (#80236)
This allows subclasses such as NestedTensorImpl to provide special behavior for `int64_t size(int64_t d)` that'll also be accessible by our Python frontend.

It follows the same pattern as sizes_custom.

Currently getting CI before asking for a review.
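
A minimal sketch of what this enables from Python (not from the PR; the constructor name and shapes are illustrative):
```
import torch

# size(d) routes through size_custom, so NestedTensorImpl can answer for the
# batch dimension; ragged dimensions have no single well-defined size.
nt = torch.nested_tensor([torch.randn(2, 4), torch.randn(3, 4)])
nt.size(0)  # 2
```
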
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80236
Approved by: https://github.com/ezyang
2022-06-27 17:07:42 +00:00
Yifan Shen
09f79e94ac support nested_tensor * scalar (#80284)
In the transformer, the scale step in attention has a `nested_tensor / scalar` operation. There are two ways to support that:
1. directly support `nested_tensor / scalar`:
* pro: straightforward, good UX
* con: is dispatching `mul(nested tensor, regular tensor)` a good practice?
2. let the user manually convert `scalar` to `nested_scalar = torch.nested_tensor([broadcast_scalar])`
* pro: dispatcher only has to deal with `mul(nested tensor, nested tensor)`
* con: confusing manual conversions, bad UX
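
A sketch of the direct path (option 1) that the PR title describes (not from the PR; the constructor name, values, and use of multiplication rather than division are illustrative):
```
import torch

# Scale a nested tensor by a Python scalar, e.g. the attention scale step.
nt = torch.nested_tensor([torch.randn(2, 4), torch.randn(3, 4)])
scaled = nt * 0.125
```
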
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80284
Approved by: https://github.com/cpuhrsch
2022-06-27 14:15:05 +00:00
Yifan Shen
fc0faa2cf6 Support nested_tensor.bmm (#80224)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80224
Approved by: https://github.com/cpuhrsch, https://github.com/jbschlosser
2022-06-25 03:19:46 +00:00
Yifan Shen
54a1cc5246 Support softmax(nested tensor) (#80179)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80179
Approved by: https://github.com/cpuhrsch, https://github.com/jbschlosser
2022-06-24 14:50:24 +00:00
Yifan Shen
f749f86fee Add nested tensor metadata nested_stride then use it in unbind, select (#79831)
2 reasons to add metadata `nested_stride`:
1. it will be used later in `reshape` and `transpose`
2. it reduces the computation needed to get the offsets and shapes required in `unbind`-like code, which is used again and again in nested tensor operations

`unbind` and `select` are refactored to make use of `nested_stride`.
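
An illustrative sketch of the two refactored consumers (not from the PR; the constructor name and shapes are illustrative):
```
import torch

# unbind/select recover constituents using the per-tensor size/stride metadata.
nt = torch.nested_tensor([torch.randn(2, 3), torch.randn(4, 3)])
first, second = nt.unbind()
row = nt.select(0, 1)  # the second constituent, same values as `second`
```
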
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79831
Approved by: https://github.com/cpuhrsch, https://github.com/jbschlosser
2022-06-23 20:24:50 +00:00
Driss Guessous
a098937c20 Add factory function derivatives (#79872)
Adds derivatives for factory functions; this issue is used for tracking: #79044

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79872
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
2022-06-21 00:53:11 +00:00
Edward Z. Yang
f7ee061638 Wconstab/reland pysymint (#79795)
Rebased https://github.com/pytorch/pytorch/pull/79617/ to see if the issues are reproducible.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79795
Approved by: https://github.com/malfet
2022-06-20 22:55:06 +00:00
Yifan Shen
1b25aa6786 Support dropout(nested tensor) (#79318)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79318
Approved by: https://github.com/jbschlosser
2022-06-17 18:41:54 +00:00
PyTorch MergeBot
8a7a5def1d Revert "Support dropout(nested tensor) (#79318)"
This reverts commit 1211ab679c.

Reverted https://github.com/pytorch/pytorch/pull/79318 on behalf of https://github.com/janeyx99 due to breaking dropout tests on trunk; it also errors on the PR
2022-06-17 04:56:29 +00:00
Yifan Shen
1211ab679c Support dropout(nested tensor) (#79318)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79318
Approved by: https://github.com/jbschlosser
2022-06-17 00:46:07 +00:00