Commit Graph

1899 Commits

Author SHA1 Message Date
XiaobingSuper
136dadd689 fix narrow_copy correctness issue for non-contiguous input for cpu path (#91789)
Fix https://github.com/pytorch/pytorch/issues/91690.
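
A minimal repro sketch of the fixed case (shapes and values are illustrative, not taken from the issue):

```
import torch

# narrow_copy on a non-contiguous (transposed) CPU input should match narrow + clone
x = torch.arange(12.0).reshape(3, 4).t()   # transpose -> non-contiguous view
out = torch.narrow_copy(x, 0, 1, 2)        # copy rows 1..2 of the transposed view
ref = x.narrow(0, 1, 2).clone()            # reference result
print(torch.equal(out, ref))               # expected: True after the fix
```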

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91789
Approved by: https://github.com/jgong5, https://github.com/lezcano
2023-01-09 00:55:03 +00:00
PyTorch MergeBot
b3603f8129 Revert "Deduplicate c10 error and PyTorchError hierarchy (#87855)"
This reverts commit 34f2d3e6ae.

Reverted https://github.com/pytorch/pytorch/pull/87855 on behalf of https://github.com/osalpekar due to perf regression in quantization tests
2023-01-06 19:56:35 +00:00
William Phetsinorath
34f2d3e6ae Deduplicate c10 error and PyTorchError hierarchy (#87855)
Fixes #53370

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87855
Approved by: https://github.com/albanD
2023-01-02 15:53:36 +00:00
ecao
274d3b24c3 use scatter_add for index_add when dim is the most inner dim (#88729)
### Motivation
When dim is -1 and the slice of source or result is noncontiguous, the original `index_add` is slow: it applies add to each sliced tensor, staying serial over the index and parallelizing over the slice to avoid write conflicts. Parallelizing over the sliced tensor is not optimal because the slice may be too small to parallelize well, and it also launches a parallel region per index.

`scatter_add` is used to speed up this case: it parallelizes over the outer dimensions of the input and stays serial over the inner dimension to avoid write conflicts, so it needs only one parallel region, over the (larger) outer dimensions.
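
A rough sketch of the equivalence being exploited (small illustrative shapes, not the benchmark shapes below):

```
import torch

# index_add along the innermost dim vs. the scatter_add formulation used for the speedup
torch.manual_seed(0)
dst_index = torch.zeros(4, 8)
dst_scatter = torch.zeros(4, 8)
src = torch.randn(4, 3)
index = torch.tensor([1, 5, 1])

dst_index.index_add_(-1, index, src)                     # dst[:, index[j]] += src[:, j]
dst_scatter.scatter_add_(-1, index.expand_as(src), src)  # same update, parallel over outer dims
print(torch.allclose(dst_index, dst_scatter))            # expected: True
```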

### Testing

- Single core:

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.82E-03 | 2.11E-03
[10, 128, 50, 50] | 0.023604 | 0.023794

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 9.30E-04 | 1.66E-03
[10, 128, 50, 50] | 0.005995 | 0.010003

- Single socket (28 cores):

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.96E-03 | 2.52E-03
[10, 128, 50, 50] | 0.012208 | 0.012568

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 7.44E-05 | 1.33E-04
[10, 128, 50, 50] | 0.000333 | 0.000469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88729
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/malfet
2022-12-28 12:04:17 +00:00
PyTorch MergeBot
eadd557266 Revert "use scatter_add for index_add when dim is the most inner dim (#88729)"
This reverts commit 68e9da68cb.

Reverted https://github.com/pytorch/pytorch/pull/88729 on behalf of https://github.com/atalman due to Break internal build
2022-12-22 18:06:45 +00:00
ecao
68e9da68cb use scatter_add for index_add when dim is the most inner dim (#88729)
### Motivation
When dim is -1 and the slice of source or result is noncontiguous, the original `index_add` is slow: it applies add to each sliced tensor, staying serial over the index and parallelizing over the slice to avoid write conflicts. Parallelizing over the sliced tensor is not optimal because the slice may be too small to parallelize well, and it also launches a parallel region per index.

`scatter_add` is used to speed up this case: it parallelizes over the outer dimensions of the input and stays serial over the inner dimension to avoid write conflicts, so it needs only one parallel region, over the (larger) outer dimensions.

### Testing

- Single core:

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.82E-03 | 2.11E-03
[10, 128, 50, 50] | 0.023604 | 0.023794

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 9.30E-04 | 1.66E-03
[10, 128, 50, 50] | 0.005995 | 0.010003

- Single socket (28 cores):

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.96E-03 | 2.52E-03
[10, 128, 50, 50] | 0.012208 | 0.012568

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 7.44E-05 | 1.33E-04
[10, 128, 50, 50] | 0.000333 | 0.000469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88729
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/malfet
2022-12-22 01:13:35 +00:00
PyTorch MergeBot
3194281ca7 Revert "use scatter_add for index_add when dim is the most inner dim (#88729)"
This reverts commit 13dbad6369.

Reverted https://github.com/pytorch/pytorch/pull/88729 on behalf of https://github.com/desertfire due to causing inductor test failure
2022-12-20 15:19:54 +00:00
ecao
13dbad6369 use scatter_add for index_add when dim is the most inner dim (#88729)
### Motivation
When dim is -1 and the slice of source or result is noncontiguous, the original `index_add` is slow: it applies add to each sliced tensor, staying serial over the index and parallelizing over the slice to avoid write conflicts. Parallelizing over the sliced tensor is not optimal because the slice may be too small to parallelize well, and it also launches a parallel region per index.

`scatter_add` is used to speed up this case: it parallelizes over the outer dimensions of the input and stays serial over the inner dimension to avoid write conflicts, so it needs only one parallel region, over the (larger) outer dimensions.

### Testing

- Single core:

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.82E-03 | 2.11E-03
[10, 128, 50, 50] | 0.023604 | 0.023794

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 9.30E-04 | 1.66E-03
[10, 128, 50, 50] | 0.005995 | 0.010003

- Single socket (28 cores):

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.96E-03 | 2.52E-03
[10, 128, 50, 50] | 0.012208 | 0.012568

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 7.44E-05 | 1.33E-04
[10, 128, 50, 50] | 0.000333 | 0.000469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88729
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/malfet
2022-12-20 13:12:36 +00:00
Yanbo Liang
511fbad830 [Dynamo] Fix builder for class with metaclass (#90807)
Fixes a Meta-internal use case: a class with a metaclass can't be identified as `UserDefinedClassVariable`.
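
A hedged sketch of the pattern this enables (class and function names are illustrative; assumes the pre-torch.compile `torch._dynamo.optimize` entry point):

```
import torch
import torch._dynamo as dynamo

class Meta(type):
    pass

class Foo(metaclass=Meta):   # previously not recognized as a user-defined class by the builder
    scale = 2.0

@dynamo.optimize("eager")
def fn(x):
    return x * Foo.scale

print(fn(torch.ones(3)))
```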

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90807
Approved by: https://github.com/jansel
2022-12-20 05:02:28 +00:00
Edward Z. Yang
e686a442b4 If a torch.* returns non-Tensor, make this unimplemented rather than assert. (#89918)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89918
Approved by: https://github.com/albanD
2022-12-15 21:53:54 +00:00
Edward Z. Yang
283cf718ed Fix _fix_weakref memory leak (#90823)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90823
Approved by: https://github.com/eellison, https://github.com/albanD
2022-12-15 01:07:29 +00:00
Edward Z. Yang
cc504ce292 Restore test_warn_types (#90810)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90810
Approved by: https://github.com/ngimel
2022-12-14 05:15:32 +00:00
Bin Bao
7035bcdd0f [inductor] Enable test_torch (#90518)
Summary: Skipping failures in those tests so that CI can guard other
passing cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90518
Approved by: https://github.com/jansel
2022-12-13 16:21:35 +00:00
Yuxin Wu
5d8618dfbd Some memory saving in large unittests (#90148)
Two tests, test_large_cumsum and test_large_cumprod, use a lot of memory. This PR:
* Reduces their memory usage by avoiding `self.assertEqual` and a temporary Python variable
* Marks their memory requirement with a decorator.

related to https://github.com/pytorch/pytorch/issues/84944
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90148
Approved by: https://github.com/soumith
2022-12-11 21:04:38 +00:00
Edward Z. Yang
2ad6ed8ac9 Fix some typed storage is deprecated warnings. (#89867)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89867
Approved by: https://github.com/albanD
2022-12-07 20:09:57 +00:00
eqy
f7520cb51e Reduce memory usage requirement of test_pdist_norm_large in test_torch.py (#90075)
Basically the same fix as #85373: `/usr/bin/time` indicates that the host-side memory requirement was actually ~64GiB before the workaround and ~30GiB after.

CC @ptrblck @davidberard98

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90075
Approved by: https://github.com/davidberard98
2022-12-03 05:28:21 +00:00
Yu, Guangye
4144ad16af add XPU backend to support torch.save and torch.load (#89679)
# Motivation
We need to add an XPU backend to support torch.save and torch.load when the parameter _use_new_zipfile_serialization=False.

# Solution
We wrap the data as a tensor and:
>1. use an in-place copy for H2D
>2. directly call tensor.to() for D2H.

This helps us:
>1. unify the generic code for all backends.
>2. support all non-CPU device backends.

# Additional Context
No additional unit tests are needed;
test/test_serialization.py covers this code change.
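
For context, a usage sketch of the legacy serialization path this change extends (the `xpu` check is an assumption; any available backend works, with a CPU fallback):

```
import torch

device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"
t = torch.randn(3, 3, device=device)
torch.save(t, "tensor.pt", _use_new_zipfile_serialization=False)  # legacy format; D2H via tensor.to()
loaded = torch.load("tensor.pt", map_location=device)             # H2D via in-place copy
print(torch.equal(t.cpu(), loaded.cpu()))
```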

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89679
Approved by: https://github.com/ezyang
2022-11-30 20:38:02 +00:00
albanD
8713119c89 Stream actually overrides __new__ so we need to patch it as well (#89592)
Avoids
```
$ python foo.py
Traceback (most recent call last):
  File "foo.py", line 3, in <module>
    a = torch.cuda.Stream()
  File "/home/albandes/local/pytorch/3.8_debug_source/torch/cuda/streams.py", line 34, in __new__
    return super(Stream, cls).__new__(cls, priority=priority, **kwargs)
TypeError: object.__new__() takes exactly one argument (the type to instantiate)
```
And now gets
```
$ python foo.py
Traceback (most recent call last):
  File "foo.py", line 3, in <module>
    a = torch.cuda.Stream()
  File "/home/albandes/local/pytorch/3.8_debug_source/torch/cuda/streams.py", line 34, in __new__
    return super(Stream, cls).__new__(cls, priority=priority, **kwargs)
  File "/home/albandes/local/pytorch/3.8_debug_source/torch/cuda/_utils.py", line 44, in err_fn
    raise RuntimeError(
RuntimeError: Tried to instantiate dummy base class Stream

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89592
Approved by: https://github.com/soumith
2022-11-29 21:43:23 +00:00
David Berard
a029ec2c88 Move gpu slow tests to sm86 (#87880)
NVFuser tests (which are slow tests) are better run on more
modern GPU hardware.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87880
Approved by: https://github.com/malfet
2022-11-29 19:29:59 +00:00
Nikita Karetnikov
57af0c8245 Bug fix: make sure copy_impl doesn't read out of bounds (#88544)
Fixes #88543.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88544
Approved by: https://github.com/lezcano
2022-11-16 13:23:38 +00:00
PyTorch MergeBot
8441443132 Revert "Add nondeterministic error for scatter (#88244)"
This reverts commit e940a2f8e2.

Reverted https://github.com/pytorch/pytorch/pull/88244 on behalf of https://github.com/mehtanirav due to Internal test failures
2022-11-10 23:56:49 +00:00
Kurt Mohler
ee28b865ee Deprecate TypedStorage, its derived classes, and all of their public methods (#85303)
Part of #85302

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85303
Approved by: https://github.com/ezyang
2022-11-08 18:11:01 +00:00
Kurt Mohler
e940a2f8e2 Add nondeterministic error for scatter (#88244)
Fixes #88096

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88244
Approved by: https://github.com/ezyang, https://github.com/mruberry
2022-11-04 20:23:59 +00:00
Nikolay Korovaiko
0f6304ef1e disable the out variants in test_cumprod test for inductor (#88328)
`out=` variants aren't supported by autograd and it's not a must fix, so disabling the test (https://github.com/pytorch/torchdynamo/issues/1798) for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88328
Approved by: https://github.com/desertfire
2022-11-03 16:52:37 +00:00
Nikolay Korovaiko
529ba076c6 add an exclude for test_constructor for inductor (#88143)
This test (https://github.com/pytorch/torchdynamo/issues/1800) fails since none of the c-tor ops support `pin_memory=True`. Natalia suggests it's not a priority to fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88143
Approved by: https://github.com/desertfire
2022-11-03 16:21:18 +00:00
Edward Z. Yang
f884e817d4 Make Python op registration work with torchdeploy/multipy (#87162)
See strategy at PythonOpRegistrationTrampoline.cpp for the
big picture.

Along the way, I made OperatorHandle support == and hashing,
and slightly changed the low-level python_dispatch impl API
to disallow empty strings for the dispatch key, which had the
knock-on effect of requiring us to explicitly pass
CompositeImplicitAutograd where we would have passed "" (I didn't apply
this to the rest of the file because I'm lazy).

The test strategy is to delete the logic that prevented Python op
registrations in torch from being skipped in a torchdeploy context
and show that CI still works.
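
A small, hedged illustration of registering a Python op with an explicit dispatch key (the `mylib` namespace and schema are made up for the example):

```
import torch
from torch.library import Library

lib = Library("mylib", "DEF")
lib.define("add_one(Tensor x) -> Tensor")
# An explicit key is required where "" used to be accepted.
lib.impl("add_one", lambda x: x + 1, "CompositeImplicitAutograd")

print(torch.ops.mylib.add_one(torch.tensor([1.0, 2.0])))  # tensor([2., 3.])
```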

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87162
Approved by: https://github.com/anjali411, https://github.com/bdhirsh
2022-11-03 12:56:44 +00:00
Philip Meier
bc73affdad prepare removal of deprecated functionality in torch.testing (#87969)
_Redo of #86586 with all BC breaking changes granularly placed into separate commits._

---

Per title. Deprecation happened on Feb 25, 2022 in c6f1bbc0ac, which made it into the 1.12 release. Since it is now 245 days later and the next release will be 1.14, the removals later in the stack comply with the [BC policy](https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#minimizing-the-disruption-of-bc-breaking-changes).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87969
Approved by: https://github.com/mruberry
2022-11-02 14:04:48 +00:00
Kurt Mohler
1dbc8ad3b7 Add Warning class and refactor C++ warnings to use it (#84101)
Also adds `TORCH_WARN_WITH` and `TORCH_WARN_DEPRECATION` macros

Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84101
Approved by: https://github.com/albanD
2022-10-18 20:02:42 +00:00
Natalia Gimelshein
1704256b10 Enables where to have cpu scalar args (#87022)
This is for decompositions only; no attempt is made to have good performance for this case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87022
Approved by: https://github.com/ezyang, https://github.com/eellison, https://github.com/mruberry
2022-10-17 17:08:47 +00:00
Mikayla Gawarecki
afaee00fec Add python nested_tensor and as_nested_tensor constructors in torch.nested (#85593)
Remove `torch.nested_tensor`, which has erroneous behavior w.r.t. gradients (the result could be either a leaf or not a leaf). Introduce `torch.nested.nested_tensor` and `torch.nested.as_nested_tensor` in the vein of `torch.tensor` and `torch.as_tensor`. Done in the nested `__init__.py` for now, but this can move to pybind in the future (when we want to load from numpy/nested lists).

Discussed offline with @cpuhrsch and pybind constructor (https://github.com/pytorch/pytorch/pull/85536) was more gnarly than expected, so we can move to that when we do need loading from numpy etc.
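
A brief usage sketch of the two constructors (illustrative shapes):

```
import torch

a, b = torch.randn(2, 3), torch.randn(4, 3)
nt = torch.nested.nested_tensor([a, b])      # like torch.tensor: copies inputs, new leaf
nt2 = torch.nested.as_nested_tensor([a, b])  # like torch.as_tensor
print(nt.is_leaf, nt2.is_nested)
```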

Differential Revision: [D39806622](https://our.internmc.facebook.com/intern/diff/D39806622)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85593
Approved by: https://github.com/drisspg, https://github.com/cpuhrsch
2022-09-28 20:15:02 +00:00
Kurt Mohler
b0a631cd14 Add nondeterministic alert for MaxUnpool1d/2d/3d (#84766)
Part of #80827
Part of #78249
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84766
Approved by: https://github.com/Lezcano, https://github.com/mruberry, https://github.com/nikitaved
2022-09-17 11:58:18 +00:00
soulitzer
02f654abca Disable torch.library.Library with PYTORCH_DISABLE_LIBRARY (#85190)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85190
Approved by: https://github.com/d4l3k
2022-09-17 03:05:43 +00:00
Khushi Agrawal
a9258eba8e [Testing] Port bernoulli and multinomial to ErrorInputs. (#74683)
Hi,
The PR aims to port `bernoulli` and `multinomial` to error inputs. Thanks!

cc: @kshitij12345! :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74683
Approved by: https://github.com/kshitij12345, https://github.com/mruberry
2022-09-16 21:24:09 +00:00
Elias Ellison
f37069aac7 Re-enable fixed dynamo tests (#84969)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84969
Approved by: https://github.com/bdhirsh, https://github.com/ezyang
2022-09-16 15:36:52 +00:00
Kurt Mohler
95a2c3df31 Replace expectedAlertNondeterministic with simpler check function (#84808)
Fixes #84807

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84808
Approved by: https://github.com/mruberry
2022-09-16 01:10:12 +00:00
Kurt Mohler
5b58140d1a Add deterministic impl of scatter_add CUDA for all input sizes (#79466)
Fixes #50469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79466
Approved by: https://github.com/ngimel
2022-09-07 03:12:49 +00:00
Natalia Gimelshein
0b363c5c5c don't synchronize single element any/all reductions (#84465)
Fixes #84291

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84465
Approved by: https://github.com/ezyang
2022-09-02 21:18:58 +00:00
Elias Ellison
f701cb04fb Test Dynamo CI w Fake Tensors (#84282)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84282
Approved by: https://github.com/anijain2305
2022-09-01 00:15:05 +00:00
mattip
4dfa6d28a1 Normalize DLPack stride to 1 where shape < 2 (#83158)
Fixes #83069. Also moves all the DLPack tests to a new file, `test_dlpack.py`.

The fix involves always allocating a "strides" int array when converting to DLPack and deleting the strides when the capsule destructor is called. The strides are copied from the tensor, and `strides[i]` is set to `1` where `shape[i] < 2`.
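
A hedged round-trip sketch (the exported strides aren't directly visible from Python, so this just exercises the path with size-1 dims):

```
import torch
from torch.utils import dlpack

x = torch.randn(1, 3, 1)          # contiguous, but with size-1 dims
capsule = dlpack.to_dlpack(x)     # strides[i] is exported as 1 where shape[i] < 2
y = dlpack.from_dlpack(capsule)
print(y.shape, torch.equal(x, y))
```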
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83158
Approved by: https://github.com/ezyang
2022-08-23 15:03:29 +00:00
Brian Hirsh
0c24af4985 Always allow tensor metadata changes (#83590)
Make it so that it is valid to set metadata after detach calls, like `x.detach().resize_(...)`.

This technically lifts some restrictions around `.data`. This PR means that you can now technically call `x.data.resize_(...)`, which can now directly resize `x` instead of erroring.

My understanding: Before the tensor-variable merge, when `x` and `x.data` were really different tensors, you could resize `x.data` independently of `x`, and during the merge, this error was added to avoid silent confusing behavior changes.

It was agreed that this error has been around long enough (several years) that it's acceptable to drop.  cc @albanD @ezyang.

(Ed already had a prototype PR [here](https://github.com/pytorch/pytorch/pull/83545) - I ended up making one to try to slog through test failures).
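
A quick sketch of what is now permitted (these previously errored):

```
import torch

x = torch.randn(2, 3, requires_grad=True)
x.detach().resize_(4, 4)  # setting metadata after detach() is now allowed
x.data.resize_(5)         # likewise no longer errors
```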

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83590
Approved by: https://github.com/ezyang
2022-08-19 23:30:43 +00:00
Nikita Shulga
1a09b05c94 Fix torch.equal on CPU (#83350)
`torch.equal` should not raise an exception when comparing tensors of different types.
I.e. `torch.equal(torch.tensor([1, 2]), torch.tensor([1, 2], dtype=torch.float))` should return True rather than raise an exception.
This also makes the CPU behaviour consistent with the GPU behaviour.

Fixes https://github.com/pytorch/pytorch/issues/83314

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83350
Approved by: https://github.com/albanD
2022-08-17 03:22:56 +00:00
Nikita Karetnikov
4010f96121 [primTorch] Fix off by 1 in canonicalize_dim (#83198)
Also fix an issue in the `unsqueeze` ref due to this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83198
Approved by: https://github.com/ngimel
2022-08-16 17:57:01 +00:00
PyTorch MergeBot
f534b2c627 Revert "Remove split functional wrapper (#74727)"
This reverts commit a58876ace7.

Reverted https://github.com/pytorch/pytorch/pull/74727 on behalf of https://github.com/seemethere due to Fails internal use cases, might extend out to external use cases as well. Need to assess overall impact of this change more widely
2022-08-10 19:45:23 +00:00
Peter Bell
a58876ace7 Remove split functional wrapper (#74727)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74727
Approved by: https://github.com/albanD, https://github.com/khabinov
2022-08-10 17:57:48 +00:00
Kurt Mohler
c379915969 Add nondeterministic alert to CUDA cumsum (#75693)
Part of #75240

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75693
Approved by: https://github.com/ngimel
2022-08-04 01:58:29 +00:00
Kurt Mohler
14d0296e5c Rename _Typed/_UntypedStorage to Typed/UntypedStorage and update docs (#82438)
### Description

Since the major changes for `_TypedStorage` and `_UntypedStorage` are now complete, they can be renamed to be public.

`TypedStorage._untyped()` is renamed to `TypedStorage.untyped()`.

Documentation for storages is improved as well.
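
A small usage sketch of the renamed, now-public API:

```
import torch

t = torch.arange(4, dtype=torch.float32)
ts = t.storage()   # torch.TypedStorage (underscore prefix removed)
us = ts.untyped()  # renamed from TypedStorage._untyped()
print(type(ts).__name__, type(us).__name__)
```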

### Issue
Fixes #82436

### Testing
N/A

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82438
Approved by: https://github.com/ezyang
2022-07-30 19:37:08 +00:00
Fabio Rocha
fd84c458f4 Add torch.unflatten and improve its docs (#81399)
unflatten now has a free function version, torch.unflatten, in addition to the method torch.Tensor.unflatten.

Updated the docs to reflect this and polished them a little. For consistency, changed the signature of the int version of unflatten in native_functions.yaml.

Some override tests were failing because the .int and .Dimname versions of unflatten take different numbers of arguments, so this required some changes to test/test_override.py.

Removed support for using a mix of integer and string arguments when specifying dimensions in unflatten.
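
A short usage sketch of the free function alongside the existing method:

```
import torch

x = torch.randn(2, 12)
a = torch.unflatten(x, 1, (3, 4))  # new free-function form
b = x.unflatten(1, (3, 4))         # existing Tensor method
print(a.shape, torch.equal(a, b))  # torch.Size([2, 3, 4]) True
```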
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81399
Approved by: https://github.com/Lezcano, https://github.com/ngimel
2022-07-29 15:02:42 +00:00
ecao
1ebe98220c Optimize the copy of BFloat16 to Float and Float to BFloat16 (#79685)
Optimize the copy of BFloat16 to Float and Float to BFloat16.
* Vectorize the BFloat16 <-> Float copy.
* Use `at::internal::serial_for_each` instead of directly using `cpu_kernel_vec`, since `cpu_kernel_vec` can't handle input and output with different data types.
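
For reference, a hedged sketch of how such a copy might be timed (shape taken from the tables below; not the script used for these numbers):

```
import torch
from torch.utils.benchmark import Timer

x = torch.randn(10, 128, 10, 124)
t = Timer(stmt="x.to(torch.bfloat16)", globals={"x": x, "torch": torch})
print(t.timeit(100))  # fp32 -> bf16 copy
```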

single socket (28 cores):
```
before: torch.Size([10, 128, 10, 124])  bf16 -> fp32: 4.18e-05 ms;   fp32 -> bf16: 5.04e-05 ms
        torch.Size([10, 128, 30, 124])  bf16 -> fp32: 0.00011868 ms; fp32 -> bf16: 0.0001476 ms

after:  torch.Size([10, 128, 10, 124])  bf16 -> fp32: 1.35e-05 ms;   fp32 -> bf16: 1.97e-05 ms
        torch.Size([10, 128, 30, 124])  bf16 -> fp32: 7.32e-05 ms;   fp32 -> bf16: 5.70e-05 ms
```
single core:
```
before: torch.Size([10, 128, 10, 124])  bf16 -> fp32: 0.000848 ms;   fp32 -> bf16: 0.00105 ms
        torch.Size([10, 128, 30, 124])  bf16 -> fp32: 0.00269 ms;    fp32 -> bf16: 0.00321 ms

after:  torch.Size([10, 128, 10, 124])  bf16 -> fp32: 0.000370 ms;   fp32 -> bf16: 0.000382 ms
        torch.Size([10, 128, 30, 124])  bf16 -> fp32: 0.00153 ms;    fp32 -> bf16: 0.00113 ms
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79685
Approved by: https://github.com/malfet
2022-07-28 14:34:08 +00:00
Huy Do
edf1868e67 Fix test_doc_template regex (#81755)
### The problem

This original regex abuses `.*` in combination with `re.DOTALL`, which leads to a catastrophic-backtracking perf issue when there is no match. When that happens, test_doc_template runs "forever" and times out. Here is an example timeout test: https://github.com/pytorch/pytorch/runs/7413337595

Another minor issue with this regex is that it won't match concatenated doc strings like `"""FOO""" + """BAR"""`, which are used for some APIs in `_torch_docs.py`

### The fix
* Remove most of the match-all `.*` usage. I have tested to make sure that the test finishes even when there is no match, i.e. it fails successfully.
* Update the regex to match all of the following cases before and after linting (you can also try it out on https://pythex.org):

BEFORE
```
add_docstr(torch.abs, r"""
abs(input, *, out=None) -> Tensor

Computes the absolute value of each element in :attr:`input`.

.. math::
    \text{out}_{i} = |\text{input}_{i}|
""" + r"""
Args:
    {input}

Keyword args:
    {out}

Example::

    >>> torch.abs(torch.tensor([-1, -2, 3]))
    tensor([ 1,  2,  3])
""".format(**common_args))

add_docstr(torch.absolute,
           r"""
absolute(input, *, out=None) -> Tensor

Alias for :func:`torch.abs`
""")
```

AFTER
```
add_docstr(
    torch.abs,
    r"""
abs(input, *, out=None) -> Tensor

Computes the absolute value of each element in :attr:`input`.

.. math::
    \text{out}_{i} = |\text{input}_{i}|
"""
    + r"""
Args:
    {input}

Keyword args:
    {out}

Example::

    >>> torch.abs(torch.tensor([-1, -2, 3]))
    tensor([ 1,  2,  3])
""".format(
        **common_args
    ),
)

add_docstr(
    torch.absolute,
    r"""
absolute(input, *, out=None) -> Tensor

Alias for :func:`torch.abs`
""",
)
```

This will unblock https://github.com/pytorch/pytorch/pull/81643
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81755
Approved by: https://github.com/atalman
2022-07-21 16:28:29 +00:00
Animesh Jain
1d90d6ee60 Setup for running PyTorch tests with TorchDynamo and skips for known failing tests (#80106)
@ezyang I am going to keep adding more skips in this PR for now. Once we have the CI running, I will replace them with the appropriate decorators.

cc @mlazos , we should add those tests in test_ops.py in this PR as well

cc @jansel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80106
Approved by: https://github.com/ezyang, https://github.com/jansel
2022-07-07 18:57:33 +00:00