Commit Graph

643 Commits

Author SHA1 Message Date
PyTorch MergeBot
d59a6864fb Revert "[BE]: Update ruff to 0.285 (#107519)"
This reverts commit 88ab3e4322.

Reverted https://github.com/pytorch/pytorch/pull/107519 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR breaks internal tests. @ezyang, can you please help them get unblocked? It seems like one of the strings was probably accidentally modified ([comment](https://github.com/pytorch/pytorch/pull/107519#issuecomment-1688833480))
2023-08-22 19:53:32 +00:00
Aaron Gokaslan
88ab3e4322 [BE]: Update ruff to 0.285 (#107519)
This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.

I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, there seem to be no instances of it in our codebase, so I am enabling it so that it stays that way. :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-20 01:36:18 +00:00
Mikayla Gawarecki
1317dbf176 Reland "Add nn.CircularPad{*}d for consistency + fix no_batch_dim support (#106148)" (#106632)
The previous attempt was reverted because the PR stacked under it, https://github.com/pytorch/pytorch/pull/106147, which added error-checking to the Pad variants, was reverted: internally some people pass 2D inputs to ZeroPad2d (which should actually take 3D or 4D inputs :)). To my understanding, though, there wasn't actually anything this PR itself was breaking.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106632
Approved by: https://github.com/albanD
2023-08-07 20:10:25 +00:00
PyTorch MergeBot
dfcfd5cedb Revert "Add nn.CircularPad{*}d for consistency + fix no_batch_dim support (#106148)"
This reverts commit 87d2536971.

Reverted https://github.com/pytorch/pytorch/pull/106148 on behalf of https://github.com/malfet due to Reverting as dependent PR https://github.com/pytorch/pytorch/pull/106147 was reverted as well ([comment](https://github.com/pytorch/pytorch/pull/106148#issuecomment-1662344543))
2023-08-02 14:46:00 +00:00
Mikayla Gawarecki
87d2536971 Add nn.CircularPad{*}d for consistency + fix no_batch_dim support (#106148)
Fixes #105749 and https://github.com/pytorch/pytorch/issues/95320

(tl;dr: the input should always be `[N, C, H(, W, D)]`, where only the H, W, and D dimensions get circular padding. So in the 2D case where the user wants both dimensions padded, they should `.unsqueeze(0)` first (as is the case for `Reflection/ReplicationPad`), but we never documented this for circular padding. [This seems to be the old docstring](277b05014a/torch/nn/functional.py (L4689)) that was somehow lost.)
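As a hedged illustration of this shape convention (the tensor values and the double unsqueeze are my own, not taken from the PR), circular padding of a bare 2D tensor via the functional API looks like:

```
import torch
import torch.nn.functional as F

x = torch.arange(9.0).reshape(3, 3)  # a bare [H, W] tensor

# Circular padding only wraps the trailing spatial dims, so the 2D tensor
# needs leading batch/channel dims first; pad=(left, right, top, bottom).
padded = F.pad(x.unsqueeze(0).unsqueeze(0), (1, 1, 1, 1), mode="circular")
print(padded.shape)  # torch.Size([1, 1, 5, 5])
```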

Fixes no_batch_dim support https://github.com/pytorch/pytorch/issues/104860

- Adds missing documentation for circular padding
- Adds missing CircularPad modules
- Migrates legacy test_nn tests from circular padding to ModuleInfo
- Adds no_batch_dim support + sample inputs that test this

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106148
Approved by: https://github.com/albanD
ghstack dependencies: #106325, #106147
2023-08-01 12:49:58 +00:00
FFFrog
9a1cdcb8a0 Format: fixing multiple string concatenation in single line (#106013)
Fixes multiple string concatenations in a single line.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106013
Approved by: https://github.com/albanD
2023-07-26 18:39:18 +00:00
lezcano
9bde7f4e27 Fix the docs for cosine_similarity (#104772)
The behaviour of `cosine_similarity` was subtly changed in
https://github.com/pytorch/pytorch/pull/31378, but the docs were not
updated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104772
Approved by: https://github.com/albanD, https://github.com/svekars
2023-07-26 09:23:09 +00:00
Justin Chu
4cc1745b13 [BE] f-stringify torch/ and scripts (#105538)
This PR is a follow-up in the pyupgrade series, converting more strings to f-strings using `flynt`.

- https://docs.python.org/3/reference/lexical_analysis.html#f-strings
- https://pypi.org/project/flynt/

Command used:

```
flynt torch/ -ll 120
flynt scripts/ -ll 120
flynt tools/ -ll 120
```

and excluded `collect_env.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-21 19:35:24 +00:00
Justin Chu
79c5e33349 [BE] Enable ruff's UP rules and autoformat nn/ mps/ and torch/ (#105436)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105436
Approved by: https://github.com/malfet, https://github.com/albanD
2023-07-21 07:38:46 +00:00
drisspg
2ee440054b Small tweaks to SDPA docs (#104749)
Fixes #104652


Pull Request resolved: https://github.com/pytorch/pytorch/pull/104749
Approved by: https://github.com/mikaylagawarecki
2023-07-10 21:01:45 +00:00
yewentao
d3ba8901d8 Adding precision issue note docs for functional.interpolate (#104622)
Fixes #104157

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104622
Approved by: https://github.com/ezyang
2023-07-05 16:20:57 +00:00
vfdev
4ab140902b [docs] Fixed typo in grid_sample docstring (#104406)
Fixed a small typo in the grid_sample docstring:

<img width="265" alt="image" src="https://github.com/pytorch/pytorch/assets/2459423/1d2dd7a2-895a-4683-9d9f-a4d1d9d1a4a7">

- https://pytorch.org/docs/main/generated/torch.nn.functional.grid_sample.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104406
Approved by: https://github.com/mikaylagawarecki, https://github.com/svekars
2023-06-29 19:44:54 +00:00
Ryan Smith
6bda97e2c1 Raise type error message for interpolate if size contains non-integer elements (#99243)
Raise a TypeError for interpolate when the output size is a tuple containing elements that are not `int`.

Fixes #98287

The check is only performed if `size` is an instance of `list` or `tuple`.
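A hedged sketch of the resulting behavior (the exact error text may differ from what the PR emits):

```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)

F.interpolate(x, size=(16, 16))        # integer sizes are accepted
try:
    F.interpolate(x, size=(16.0, 16))  # a float element should now raise
except TypeError as e:
    print(e)
```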
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99243
Approved by: https://github.com/Skylion007, https://github.com/Neilblaze, https://github.com/MovsisyanM, https://github.com/albanD
2023-06-23 00:48:45 +00:00
MysticalMusings
f1f13a35b0 Fix GELU-related docstring formatting (#102845)
The GELU docstring seems to be formatted incorrectly. The original docstring is rendered as below:

$$ \text{GELU}(x) = 0.5 * x * (1 + \text{Tanh}(\sqrt(2 / \pi) * (x + 0.044715 * x^3))) $$

where it is unclear which part the square root applies to.

I double-checked the formula, which should be:

$$ \text{GELU}(x) = 0.5 * x * (1 + \text{Tanh}(\sqrt{2 / \pi} * (x + 0.044715 * x^3))) $$

where the round brackets in the source code should be curly braces.

> _formula in [original paper](https://arxiv.org/abs/1606.08415)_
> ![Snipaste_2023-06-03_00-43-49](https://github.com/pytorch/pytorch/assets/39690782/22511c4e-2f20-4a16-9bda-4c182a360160)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102845
Approved by: https://github.com/mikaylagawarecki
2023-06-08 20:19:03 +00:00
cviviers
81c181dc01 Update BCEWithLogitsLoss pos_weight description in documentation (#101567)
Fixes #82496 and #65702

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101567
Approved by: https://github.com/mikaylagawarecki
2023-05-19 21:23:21 +00:00
Edward Z. Yang
c567748e16 Make interpolate_bilinear deterministic using decomposition (#101115)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101115
Approved by: https://github.com/ngimel
2023-05-11 22:48:01 +00:00
Joel Schlosser
bd9d50a3fc Remove future deprecation warning from kl_div docs (#96541)
Fixes #95687
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96541
Approved by: https://github.com/albanD
2023-05-05 23:01:21 +00:00
soulitzer
6585d76f0f [docs] nn.functional.embedding: Note expected discrepancy between numerical and analytical gradients (#99181)

Fixes https://github.com/pytorch/pytorch/issues/93950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99181
Approved by: https://github.com/albanD
2023-04-22 02:30:53 +00:00
mega-optimus
06081ac8f3 Update docstring of torch.nn.functional.normalize() (#99512)
Fixes #99125

torch.nn.functional.normalize() already supports dim as a tuple of ints, but the docstring says int only.
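A minimal sketch of the behavior the updated docstring describes (shapes are illustrative):

```
import torch
import torch.nn.functional as F

x = torch.randn(4, 5, 6)

y_int = F.normalize(x, p=2, dim=1)       # single int, as documented before
y_tup = F.normalize(x, p=2, dim=(1, 2))  # tuple of ints, now documented too
print(torch.linalg.vector_norm(y_tup, dim=(1, 2)))  # ~1.0 for each batch index
```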

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99512
Approved by: https://github.com/albanD
2023-04-21 16:45:24 +00:00
ts
dbf0db958f Fix torch.nn.FractionalMaxPool2d output_size error (#99507)
Fixes #99148 by raising an error if output_ratio's size is greater than 2.

Justification for changes:

If an output size is not specified but an output ratio is, we call fractional_max_pool2d_with_indices. We then generate the value of output_size based on the first two elements of output_ratio (line ~480 of torch/nn/functional.py).

Thus, we should raise a ValueError in the case that the user passes an output_ratio (instead of an output_size) and the number of elements in output_ratio exceeds two. We must raise the error before calling torch._C._nn.fractional_max_pool2d, as the value of output_size passed into torch._C._nn.fractional_max_pool2d is guaranteed to be of size 2 (the existing code generates it from the first two indices of the passed-in ratio).
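A hedged sketch of the behavior described above (the ValueError is per the PR's description; the exact error type and message may differ in the final code):

```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 16, 16)

# A two-element ratio is valid and determines output_size internally.
F.fractional_max_pool2d(x, kernel_size=2, output_ratio=(0.5, 0.5))

# A longer ratio should now raise instead of being silently truncated
# to its first two entries.
try:
    F.fractional_max_pool2d(x, kernel_size=2, output_ratio=(0.5, 0.5, 0.5))
except ValueError as e:
    print(e)
```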

I would be happy to iterate on this if there are any issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99507
Approved by: https://github.com/mikaylagawarecki
2023-04-21 14:38:25 +00:00
Kazuaki Ishizaki
a531a464fd Fix typos under torch/nn directory (#97594)
This PR fixes typos in comments of `.py` files under the `torch/nn` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97594
Approved by: https://github.com/dagitses, https://github.com/kit1980
2023-04-10 22:07:15 +00:00
Mikayla Gawarecki
73b06a0268 Fix rendering of arguments for nn.functional ops that use boolean_dispatch (#98092)
Fix #97982

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98092
Approved by: https://github.com/albanD
2023-04-03 21:17:43 +00:00
Aaron Gokaslan
597b558c51 [BE]: Update flake8 and plugins and fix bugs (#97795)
Updates flake8 and the flake8 plugins in lintrunner to a modern version. This enables more checks and makes flake8 runs significantly faster. A few additional rule ignores were added that will need to be fixed in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97795
Approved by: https://github.com/alexsio27444, https://github.com/janeyx99, https://github.com/ezyang
2023-03-28 23:51:55 +00:00
Michael Gschwind
c757647dd8 [Better Transformer] make is_causal a hint and force attn_mask to be set on is_causal=True in F.MHA (#97214)
Summary:
This fixes an issue raised in [is_causal parameter in torch.nn.TransformerEncoderLayer.forward does not work #96941](https://github.com/pytorch/pytorch/issues/96941) where results computed with is_causal do not properly reflect causal masking.

In PyTorch 2.0, Accelerated PT Transformers added the is_causal parameter to legacy nn.Transformer* and nn.MHA APIs aligned with and intended to engage the is_causal parameter of the new scaled_dot_product_attention (SDPA) operator.

At present is_causal works differently for Transformer* modules, the nn.MHA and F.MHA:
* The nn.Transformer* modules treat is_causal as an optional indicator about the format of attn_mask. This is because some layers (such as the CLIP layer) use the attention mask in the layer, and thus attn_mask was a required feature.
* Initially, nn.MHA and F.MHA were defined to align with F.SDPA in behavior: a user may specify either the attention mask or is_causal, but not both. It seemed to make sense at the time to align SDPA and MHA, especially since there was a larger overlap of parameters, which has since changed, e.g., with the removal of need_weights from SDPA. (See below for why this makes sense.)

Unfortunately, this does not work because of how MHA was changed to handle the need_weights parameter. When need_weights is present, we no longer call SDPA, because support for need_weights was removed from SDPA before the release. The rationale is that need_weights defeats all the optimization at the foundation of SDPA's performance. Having the flag might thus mislead users into thinking they get good performance, only to be disappointed when they enable a legacy feature of MHA that massively degrades performance. (They might not think anything of enabling it, because it is on by default in MHA today, which leads to more issues.)

Since SDPA no longer supports need_weights, we need to pick a separate path that implements attention using a set of discrete operations that allocate a tensor for the weights. Alas, this code path does not support is_causal, because attention is implemented as a matmul using the attention mask. Thus, is_causal has no impact. (A substantially similar situation arises with how key_padding_mask is implemented today, because nested tensors are not supported by torch.compile() in 2.0.)

This problem was masked because all uses of legacy nn.MHA (and F.MHA) come through nn.Transformer*, which called self-attention (i.e., nn.MHA) only ever with the attention mask attn_mask and never with is_causal, a missed optimization opportunity that would have been addressed in a future performance update.

Regrettably, always calling nn.MHA with attn_mask prevented diagnosing the issue of not having a suitable attention mask when need_weights support was dropped from SDPA and a discrete implementation of attention was added for that scenario, and for the execution path with key_padding_mask.

We have two options to address this issue:

Solution 1: Whenever nn.MHA and F.MHA are executed with is_causal set, we internally create a causal mask at significant expense of allocating a tensor and filling it with a triangular causal matrix.  This increases memory usage, and runtime, for allocating a causal mask.  To add insult to injury, in all current (and likely future) execution scenarios, MHA is called by a model using the nn.Transformer API which already has that matrix and passes it from nn.module to nn.module.  Then the passing in of attn_mask has to be suppressed by nn.TransformerEncoderLayer, only for nn.MHA to immediately allocate the very same tensor again to satisfy the requirement to have an attention mask for the computation. (We expect new use cases to use SDPA directly.)

Solution 2: We align the behavior of nn.MHA and F.MHA with the rest of the existing nn.Transformer API, and require the attention mask to be passed into nn.MHA in addition to is_causal as an optional indicator about the nature of the attention mask rather than as an alternative to attn_mask.  Then, when we choose the code path for processing MHA with need_weights or a key_padding_mask, we have the attn_mask passed down through the nn.Transformer* hierarchy, without the added overhead of allocating an attention mask as in scenario 1.

This PR implements solution 2 which offers better performance and in retrospect aligns MHA better with the rest of the Transformer modules as the definition of SDPA evolved into a more streamlined high-performance operator.  It ostensibly changes how is_causal works, by requiring the attention mask to be specified.  However, as described here, and as shown in the submitted issue, is_causal is not working as intended today, so it requires a change regardless.

In that sense, a change in API does not occur per se, as the current implementation is not working, and a change has to occur either way to resolve the submitted issue, breaking any use cases that depend on the current implementation. Checks exist (and more can be added) that flag any scenario where is_causal is passed as True but no attention mask is provided, ensuring that there is no quiet change even from the faulty behavior present in 2.0.

As an upside, the present implementation will improve performance by addressing the passing of the is_causal flag from Transformer modules to MHA, speeding up training for these use cases, e.g., finetuning BERT, RoBERTa, and XLM-R models.
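A hedged usage sketch of solution 2 (the module sizes and the explicit mask construction are illustrative, not taken from the PR): with the aligned behavior, is_causal acts as a hint and the causal attn_mask is still passed explicitly.

```
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
x = torch.randn(2, 10, 64)  # (batch, seq, embed)

# Explicit float causal mask: -inf above the diagonal, 0 elsewhere.
causal_mask = torch.triu(torch.full((10, 10), float("-inf")), diagonal=1)

# is_causal=True is a hint about attn_mask, not a replacement for it.
out, _ = mha(x, x, x, attn_mask=causal_mask, is_causal=True, need_weights=False)
```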

Differential Revision: D44245725

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97214
Approved by: https://github.com/albanD
2023-03-25 01:36:30 +00:00
CedricPicron
cf0ba1b9c0 Use L1 loss for Smooth L1 loss with beta=0 (#97022)
Fixes #96813.

Comments:

1. I wasn't able to test since tools/nightly.py does not allow for a GPU build (and I don't want to build from scratch).
2. In theory, the bug (i.e., NaNs) can still occur when beta is very small (e.g. `beta=1e-50`), but it is not clear whether anybody cares.
3. Some checks within the smooth_l1_loss C++ code could be changed to check for `beta > 0` instead of `beta >= 0`.
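A quick hedged check of the intended behavior (tensor shapes are illustrative):

```
import torch
import torch.nn.functional as F

pred = torch.randn(8, requires_grad=True)
target = torch.randn(8)

# With beta=0 the quadratic region vanishes, so the loss should match plain
# L1 loss rather than producing NaNs in the backward pass.
loss_smooth = F.smooth_l1_loss(pred, target, beta=0.0)
loss_l1 = F.l1_loss(pred, target)
print(loss_smooth.item(), loss_l1.item())  # expected to agree
```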
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97022
Approved by: https://github.com/jbschlosser
2023-03-24 19:10:32 +00:00
Michael Gschwind
61cb544397 Align mask formatting of both masks more closely (#96286)
Summary: Align mask formatting of both masks more closely

Test Plan: sandcastle & github

Differential Revision: D43878634

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96286
Approved by: https://github.com/cpuhrsch
2023-03-11 02:18:05 +00:00
Driss Guessous
11aab72dc9 [SDPA] Add an optional scale kwarg (#95259)
# Summary
This PR adds an optional kwarg to torch.nn.functional.scaled_dot_product_attention().
The new kwarg is a scaling factor that is applied after the q@k.T step of the computation. The efficient kernel was updated to support it, and the flash and math backends were minimally updated to support it as well.

Will reduce the complexity of: #94729 and has been asked for by a couple of users.

# Review Highlights
- As far as I know I did this the correct way, and it is both BC and FC compliant. However, I always seem to break internal workloads, so I would love it if someone could advise whether I did this right.
- I named the optional arg 'scale'. This is probably dumb and I should name it 'scale_factor'. I will make that change if needed, but it is annoying and will require someone deciding that we should rename it.
- 'scale' is interpreted as `Q@K.T * (scale)`
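A hedged sketch of how the new kwarg could be exercised once this lands (shapes are arbitrary; reproducing the default 1/sqrt(head_dim) scaling explicitly is my own illustration):

```
import math
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
k = torch.randn(2, 8, 16, 64)
v = torch.randn(2, 8, 16, 64)

# scale is applied after Q @ K.T; passing the default 1/sqrt(head_dim)
# explicitly should reproduce the scale=None behavior.
out_default = F.scaled_dot_product_attention(q, k, v)
out_scaled = F.scaled_dot_product_attention(q, k, v, scale=1.0 / math.sqrt(64))
torch.testing.assert_close(out_default, out_scaled)
```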

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95259
Approved by: https://github.com/cpuhrsch
2023-03-08 18:07:40 +00:00
Michael Gschwind
03b6e6979c Transformers: fix src and key padding mask bool regression (#96009)
Summary: fix src and pad mask bool regression

This fixes a regression introduced previously with #92733. That PR unified mask checking to remove byte tensors as a permissible mask type, introduced a mask-compatibility check, and added conversion to a floating-point mask. The problem addressed in this PR was that after the first mask had been converted, the check for mask compatibility would fail.

Test Plan: sandcastle & github

Differential Revision: D43782858

Fixes  https://github.com/pytorch/pytorch/issues/95702

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96009
Approved by: https://github.com/malfet
2023-03-05 01:50:46 +00:00
soulitzer
e5c2a35d83 Add check that embedding_bag's weight is 2D (#94931)
Fixes https://github.com/pytorch/pytorch/issues/94445

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94931
Approved by: https://github.com/albanD
2023-02-16 02:37:47 +00:00
Driss Guessous
70026aaad6 [SDPA] update type hint for scaled_dot_product_attention and documentation (#94008)
# Summary
- Adds type hinting support for SDPA
- Updates the documentation adding warnings and notes on the context manager
- Adds scaled_dot_product_attention to the non-linear activation function section of nn.functional docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94008
Approved by: https://github.com/cpuhrsch
2023-02-10 18:02:43 +00:00
Natalia Gimelshein
a5daea69fb teach inductor to handle floor (#94341)
Per the title; this happens when there is upsampling with a non-integer scale.
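A hedged repro sketch (the function and shapes are illustrative) of the kind of graph that emits floor():

```
import torch
import torch.nn.functional as F

def upsample(x):
    # A non-integer scale_factor makes the lowering produce floor() in the
    # index arithmetic, which inductor previously could not handle.
    return F.interpolate(x, scale_factor=2.5, mode="nearest")

compiled = torch.compile(upsample)
print(compiled(torch.randn(1, 3, 8, 8)).shape)  # torch.Size([1, 3, 20, 20])
```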

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94341
Approved by: https://github.com/ezyang
2023-02-10 11:21:57 +00:00
PyTorch MergeBot
6007874bbb Revert "teach inductor to handle floor (#94341)"
This reverts commit e7df9aaec8.

Reverted https://github.com/pytorch/pytorch/pull/94341 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but the CudaTest failure looks related. It fails on both the PR and trunk e7df9aaec8
2023-02-09 19:31:08 +00:00
Natalia Gimelshein
e7df9aaec8 teach inductor to handle floor (#94341)
Per the title; this happens when there is upsampling with a non-integer scale.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94341
Approved by: https://github.com/ezyang
2023-02-09 17:09:35 +00:00
milesial
6c555b29a8 MHA optimizations (#93234)
Slight perf optimizations for regular MHA by reducing the number of kernels called

Before:
![image](https://user-images.githubusercontent.com/30204471/215349212-172c6364-9e3c-4fd1-92b6-8ddd9931613e.png)

After:
![image](https://user-images.githubusercontent.com/30204471/215349247-021dd9e6-f6ca-40a2-8de8-0805af001f69.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93234
Approved by: https://github.com/drisspg
2023-02-03 15:18:35 +00:00
Driss Guessous
3df0e26e20 [SDPA] Remove private version and only utilize public version (#94004)
# Summary
Due to internal failures, we needed to keep the private call in torch.nn.mha. This PR undoes that change, so that we call the public function and remove the private one.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94004
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
2023-02-03 08:12:09 +00:00
103yiran
d9117b93fb unsqueeze only when dim = 3 (#91052)
unsqueeze is not necessary if view is used

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91052
Approved by: https://github.com/albanD
2023-01-31 16:28:23 +00:00
Driss Guessous
ca8f5e177a Use the old aten underscored function for Predictor (#93096)
Summary:
Errors reported via https://fb.prod.workplace.com/groups/1405155842844877/permalink/6644919482201794/

The problem is that the scriptable op set between predictor and the latest build of master is different.

Test Plan: Sandcastle testing

Differential Revision: D42786069

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93096
Approved by: https://github.com/mikekgfb
2023-01-28 03:14:18 +00:00
Michael Gschwind
7265f60ad0 Regularize mask handling for attn_mask and key_padding_mask (#92733)
Summary:
Regularize mask handling for attn_mask and key_padding_mask
* Update documentation to remove reference to byte masks (which were deprecated long ago)
* Introduce a check and a deprecation warning if the attn_mask and key_padding_mask types mismatch
* Convert all masks to float before combining
* Combine by adding
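A hedged sketch (not the PR's exact code; the helper name and shapes are illustrative) of the canonicalization described above: boolean masks become additive float masks, and the two masks are then combined by adding.

```
import torch

def to_float_mask(mask, dtype=torch.float32):
    if mask is None:
        return None
    if mask.dtype == torch.bool:
        # True means "masked out": contribute -inf to the attention logits.
        return torch.zeros_like(mask, dtype=dtype).masked_fill_(mask, float("-inf"))
    return mask.to(dtype)

attn_mask = torch.triu(torch.ones(4, 4, dtype=torch.bool), diagonal=1)  # (L, S)
key_padding_mask = torch.tensor([[False, False, True, True]])           # (N, S)

merged = to_float_mask(attn_mask) + to_float_mask(key_padding_mask).view(1, 1, 4)
print(merged.shape)  # broadcast to (N, L, S) = (1, 4, 4)
```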

Test Plan: sandcastle & github CI

Differential Revision: D42653215

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92733
Approved by: https://github.com/ngimel, https://github.com/drisspg
2023-01-24 14:12:05 +00:00
Driss Guessous
df14650f0b [SDPA] Update SDPA API and make function Public (#92189)
# Summary
In preparation for the PT 2.0 launch, this PR updates SDPA's API and makes the function a public nn.functional function.

## Changes
### API
Previously the function signature was:
`scaled_dot_product_attention(query, key, value, attn_mask=None, need_attn_weights=False, dropout_p=0.0, is_causal=False) -> (Tensor, Tensor)`
Updated signature:
`scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False) -> Tensor`

This PR removes the need_attn_weights optional boolean variable and updates the return type to a singular tensor.
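An illustrative call against the updated signature (shapes are arbitrary):

```
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
k = torch.randn(2, 8, 16, 64)
v = torch.randn(2, 8, 16, 64)

# The public function now returns only the attention output; the former
# need_attn_weights flag (and second return value) is gone.
out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```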

#### Reasoning:
The main goal of this function is to provide an easy interface for users to call into fused attention kernels, e.g. FlashAttention. The fused kernels do not currently support arbitrary attn_mask or dropout, but there is a PR for mem-efficient attention to enable these. We want to have the API surface ready for when the backing kernels get updated.

The fused kernels save on memory usage by not materializing the weights, and it is unlikely that a fast fused implementation will enable this feature, so we are removing it.

Discussed with folks at FAIR/Xformers, who gave a +1 to this API change.

#### Make function Public
In preparation for the PT 2.0 launch, we make the function public to start generating user feedback.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92189
Approved by: https://github.com/cpuhrsch
2023-01-23 20:50:46 +00:00
Michael Gschwind
af589b3d1f switch causal mask for is_causal flag (#91171)
Summary: switch causal mask for is_causal flag

Test Plan: sandcastle & github

Differential Revision: D42089340

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91171
Approved by: https://github.com/wushirong, https://github.com/drisspg
2022-12-30 17:24:58 +00:00
joncrall
ad782ff7df Enable xdoctest runner in CI for real this time (#83816)
Builds on #83317 and enables running the doctests. Just need to figure out what is causing the failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83816
Approved by: https://github.com/ezyang, https://github.com/malfet
2022-12-29 05:32:42 +00:00
Joel Schlosser
3d8834bdbf SymIntify F.interpolate() with recompute_scale_factor=True (#91318)
This PR makes the minor changes necessary to get `F.interpolate()` working with symbolic shapes when `recompute_scale_factor=True` + adds `OpInfo` samples to test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91318
Approved by: https://github.com/ezyang
2022-12-29 01:42:56 +00:00
Michael Gschwind
512ec181ec Introduce causal mask (#90508)
Summary: Introduce causal mask

This PR introduces a causal mask option _causal_mask (as well as causal mask detection if attn_mask is provided), since current custom kernels do not support arbitrary masks.

Test Plan: sandcastle & github ci/cd

Differential Revision: D41723137

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90508
Approved by: https://github.com/albanD
2022-12-16 21:39:42 +00:00
Driss Guessous
78bdb858f9 Call _sdp_attention in nn.functional.mha (#89470)
# Summary
Replaces the inline block of code in nn.functional.mha with `_scaled_dot_product_attention`. This function allows the fused kernels to be called if all the required input conditions are met.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89470
Approved by: https://github.com/cpuhrsch, https://github.com/mikekgfb
2022-12-02 19:46:22 +00:00
PyTorch MergeBot
f1415b8cb6 Revert "Call _sdp_attention in nn.functional.mha (#89470)"
This reverts commit 4d7ec30220.

Reverted https://github.com/pytorch/pytorch/pull/89470 on behalf of https://github.com/jeanschmidt due to breaking internal builds
2022-11-30 16:16:24 +00:00
PyTorch MergeBot
618a585f6c Revert "replace double transpose with single permute in nn.f.mha (#89847)"
This reverts commit b9afa92827.

Reverted https://github.com/pytorch/pytorch/pull/89847 on behalf of https://github.com/jeanschmidt due to Need to revert this commit as it is causing a conflict when reverting #89470
2022-11-30 16:03:48 +00:00
Driss Guessous
b9afa92827 replace double transpose with single permute in nn.f.mha (#89847)
# Summary

I forgot about permute, which was exactly what I wanted. Quick perf bump.
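Not the exact change in the PR, just a small sketch of the equivalence it relies on:

```
import torch

x = torch.randn(4, 8, 16)

# Two successive transposes ...
y1 = x.transpose(0, 1).transpose(1, 2)  # shape (8, 16, 4)
# ... collapse into a single permute call.
y2 = x.permute(1, 2, 0)
assert torch.equal(y1, y2)
```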
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89847
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
2022-11-29 22:18:42 +00:00
Driss Guessous
4d7ec30220 Call _sdp_attention in nn.functional.mha (#89470)
# Summary
Replaces the inline block of code in nn.functional.mha with `_scaled_dot_product_attention`. This function allows the fused kernels to be called if all the required input conditions are met.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89470
Approved by: https://github.com/cpuhrsch, https://github.com/mikekgfb
2022-11-29 03:02:10 +00:00
foram-chandra
e19a7165fd [nn] Remove deprecation warning from nn.functional.{tanh, sigmoid} (#86905)
Fixes #65909

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86905
Approved by: https://github.com/albanD, https://github.com/kit1980
2022-11-24 00:34:26 +00:00
Nikita Karetnikov
0a1a53083e [primTorch] Enable regex error testing for some refs (#87765)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87765
Approved by: https://github.com/mruberry
2022-11-23 23:36:27 +00:00