Commit Graph

2225 Commits

Author SHA1 Message Date
Philip Meier
1f74e082e2 only compare attributes for meta tensors (#72508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72508

Todo:

- [x] document this behavior
- [x] add tests

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D34262452

Pulled By: ezyang

fbshipit-source-id: bc5c9653d5c3ad5c6efccc9c8e0efc0d28e15104
(cherry picked from commit 233142c88e)
2022-02-17 02:33:08 +00:00
Philip Meier
b5f2574f36 no longer coalesce sparse COO tensors before comparison (#69751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69751

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D34262453

Pulled By: ezyang

fbshipit-source-id: e2e62d2aa03fc569d2951c880960b256f5dc4aaa
(cherry picked from commit cb6b0ef719)
2022-02-17 02:33:08 +00:00
Kurt Mohler
8e7fe87630 Rename Typed/UntypedStorage to _Typed/_UntypedStorage (#72540)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72540

Reviewed By: jbschlosser

Differential Revision: D34216823

Pulled By: bdhirsh

fbshipit-source-id: 1bc9930ab582771ebf02308e035576cd1a0dbe47
(cherry picked from commit 329238f612)
2022-02-15 23:53:01 +00:00
soulitzer
67adc0cb11 Remove xfail for trapz and trapezoid on meta device (#72677)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72677

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D34182326

Pulled By: soulitzer

fbshipit-source-id: 9697b9e144780a4f3f60bea0978878f7edb72606
(cherry picked from commit 0386263175)
2022-02-15 22:04:01 +00:00
Nikita Vedeneev
961bbe1c6a linalg_det_singular: modify samples such that CUDA IMA disappears. (#72585)
Summary:
Implicitly fixes https://github.com/pytorch/pytorch/issues/72203 and https://github.com/pytorch/pytorch/issues/72204.

The issue comes from an incorrect use of `scatter` with wrong indices; see https://github.com/pytorch/pytorch/issues/72204#issuecomment-1034087199. I do not know yet what exactly calls `scatter`; investigating...
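For context, a minimal sketch of the failure mode described above (the shapes and index values here are hypothetical, chosen only to trigger the out-of-bounds case):

```
import torch

# scatter_ with an index that is out of bounds for the destination:
# on CPU this raises a RuntimeError eagerly; on CUDA the check runs on
# device, fires asynchronously, and can surface as an illegal memory
# access (IMA) at a later synchronization point.
x = torch.zeros(4, device="cuda")
idx = torch.tensor([10], device="cuda")  # 10 is out of bounds for a size-4 dim
x.scatter_(0, idx, 1.0)
torch.cuda.synchronize()  # the error is reported here, not at the call site
```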

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72585

Reviewed By: cpuhrsch

Differential Revision: D34245279

Pulled By: anjali411

fbshipit-source-id: 460f030524f9228f2269eaee0a3a72e1978caeb4
(cherry picked from commit e48295716a)
2022-02-15 21:12:17 +00:00
Alban Desmaison
a7cac05ca6 Add new tls snapshot feature (#72832)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/72623, which was reverted because the tls cleanup was removed.

From close inspection of how the number of available keys is counted, I think there is one more available, since the guard is actually one past the last usable key. With this updated assert, the last usable key will still be <= 63, which will fit just fine.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72832

Reviewed By: H-Huang

Differential Revision: D34228571

Pulled By: albanD

fbshipit-source-id: ce5e10a841ea87386727346cfc8d9327252574c4
(cherry picked from commit 59d3b86353)
2022-02-15 19:02:05 +00:00
Huamin Li
32dd4a8639 move fx_acc out of pytorch core (#72803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72803

as title

Reviewed By: jfix71

Differential Revision: D34101788

fbshipit-source-id: a9fd84671929af21405c049603e9895ec68de3d8
(cherry picked from commit e98fd1c32d)
2022-02-15 16:13:43 +00:00
Nikita Shulga
80f23469dd Revert D34152115: [pytorch][PR] [ROCm] Enable sort operator BF16 support
Test Plan: revert-hammer

Differential Revision:
D34152115 (aa44480b40)

Original commit changeset: 53841c91976b

Original Phabricator Diff: D34152115 (aa44480b40)

fbshipit-source-id: c9b5cc06198032af73cd6390466de2c62576a1e1
(cherry picked from commit eb72533ae9)
2022-02-15 15:23:29 +00:00
Brian Hirsh
f1a9650e4f Revert D34214953: Add new tls snapshot feature
Test Plan: revert-hammer

Differential Revision:
D34214953 (6199b5231f)

Original commit changeset: 7aa5d5e3540a

Original Phabricator Diff: D34214953 (6199b5231f)

fbshipit-source-id: 5d271e9a5ab021b8202402630dbf917b43c55421
(cherry picked from commit a12c630198)
2022-02-14 23:14:19 +00:00
Alban Desmaison
6199b5231f Add new tls snapshot feature (#72623)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72623

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D34214953

Pulled By: albanD

fbshipit-source-id: 7aa5d5e3540a45a0ae70c5af3a4495c755908aa9
(cherry picked from commit dc0a1ab54a)
2022-02-14 20:46:54 +00:00
Alban Desmaison
3c33f0bdcd Clean up LoggingTensor semantics (#72620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72620

Clarify how LoggingTensor works with autograd.
The updated comment should cover the semantic changes.

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D34214956

Pulled By: albanD

fbshipit-source-id: 730d0a68f4228d2a84758e6807d869a34cbc1b31
(cherry picked from commit 66110bf16b)
2022-02-14 20:13:30 +00:00
Brian Hirsh
f87f753bb9 avoiding adding some functions to the public python API before 1.11 release (#72543)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72543

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D34085724

Pulled By: bdhirsh

fbshipit-source-id: 941d5a90a6fa5328268d623e0e2b01577e4132ca
(cherry picked from commit 6676a0c79a)
2022-02-14 19:49:01 +00:00
Pruthvi Madugundu
aa44480b40 [ROCm] Enable sort operator BF16 support (#71226)
Summary:
Related to [https://github.com/pytorch/pytorch/issues/58196](https://github.com/pytorch/pytorch/pull/58196)

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH amathews-amd

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71226

Reviewed By: malfet

Differential Revision: D34152115

Pulled By: seemethere

fbshipit-source-id: 53841c91976bdb5a0002362f22a54ec23aa2f78f
(cherry picked from commit 963027c7f2)
2022-02-14 19:26:07 +00:00
Ryan Spring
4f8b986e28 Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds an `approximate` string flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enables Tanh Gelu in NvFuser

```
def gelu(x, approximate : str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```
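Once this lands, the approximation is selected via the `approximate` keyword; a quick numeric comparison of the two paths:

```
import torch
import torch.nn.functional as F

x = torch.randn(8)
exact = F.gelu(x)                       # default: approximate='none' (erf-based)
approx = F.gelu(x, approximate='tanh')  # tanh approximation added by this PR
print((exact - approx).abs().max())     # small but nonzero difference
```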

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: VitalyFedyunin

Differential Revision: D33894937

Pulled By: jbschlosser

fbshipit-source-id: b65e8fb6ea66168af8f34f45ed50e92737a33851
(cherry picked from commit 6e986f91a9)
2022-02-14 03:40:32 +00:00
Douglas Lehr
28549b618a [ROCm] Enable skipped ROCm unit tests (#67706)
Summary:
A number of ROCm tests were skipped via the skipCUDAIfRocm flag.
A majority of the test cases are now supported on the ROCm platform. This fix enables all of the test_ops tests for ROCm and most operators in `common_methods_invocations.py`, minus the `SpectralFuncInfo` class, which still has some fft issues.

Partially Fixes https://github.com/pytorch/pytorch/issues/51303

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH amathews-amd

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67706

Reviewed By: seemethere, janeyx99

Differential Revision: D34153457

Pulled By: malfet

fbshipit-source-id: 95f4420f306ca7580cd438d3b5cc0b24efbfae99
(cherry picked from commit 0d178fffd3)
2022-02-11 22:14:54 +00:00
Rohan Varma
37651894f9 [Easy] Small DDP fixes (#72455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72455

- Improve helper function
- Improve/fix some logging
ghstack-source-id: 148840678

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D34044865

fbshipit-source-id: d2ae820effaaaecdd7155ffa8d3a1d8ebbd9f39e
(cherry picked from commit 3efbea8f41)
2022-02-11 15:55:09 +00:00
David Berard
933e0e8991 Opinfo test for mvlgamma: add epsilon (#72491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72491

Retry of #71794; the base revision in that stack had been reverted.

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D34062260

Pulled By: davidberard98

fbshipit-source-id: 40fbb2d2de3b10000645e25e7fe89f3ce929f0a2
(cherry picked from commit 917676f076)
2022-02-09 19:01:22 +00:00
David Berard
bbd42c605a [JIT] Opinfo tests for nnc fusion - retry (#72486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72486

Retry #70465.

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D34061628

Pulled By: davidberard98

fbshipit-source-id: e27ed315bc4ad57cdbfbc9cedffcbb7886004524
(cherry picked from commit 7937808d2e)
2022-02-09 19:01:22 +00:00
Kushashwa Ravi Shrimali
cdc9b1160e Port linalg_cross to structured kernels (#72413)
Summary:
This PR ports `linalg_cross` to structured kernels. Please see https://github.com/pytorch/pytorch/issues/55070 for the tracker issue.

cc: ysiraichi bdhirsh AnirudhDagar

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72413

Reviewed By: ejguan

Differential Revision: D34076711

Pulled By: bdhirsh

fbshipit-source-id: 45b073681692a6f0406e92a4da86aef4dd1423da
(cherry picked from commit 3780fb2611)
2022-02-09 07:13:13 +00:00
Yinghai Lu
3670466201 Move fx2trt out of PyTorch core (#72499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72499

Pull Request resolved: https://github.com/pytorch/benchmark/pull/740

Move fx2trt out of tree to reduce the bloat of PyTorch core.

This is the first and major step. Next, we will move acc_tracer out of the tree and rearrange some fx passes.

Reviewed By: suo

Differential Revision: D34065866

fbshipit-source-id: c72b7ad752d0706abd9a63caeef48430e85ec56d
(cherry picked from commit 91647adbca)
2022-02-09 04:04:49 +00:00
Kushashwa Ravi Shrimali
bc03c1d000 Structured Kernels for index_copy, add out variant (#67329)
Summary:
This PR ports the `index_copy` implementation to structured kernels and also adds an `out` variant.
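A quick demonstration of the op and the new `out=` variant (the values are arbitrary):

```
import torch

x = torch.zeros(5, 3)
src = torch.arange(6, dtype=torch.float).reshape(2, 3)
idx = torch.tensor([0, 4])

out = torch.empty_like(x)
torch.index_copy(x, 0, idx, src, out=out)  # out variant added by this PR
# rows 0 and 4 of `out` now hold the rows of `src`; the rest come from `x`
print(out)
```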

~Note to the reviewers: This is in draft mode, waiting for the tests from the CI, and I'll give a final look before requesting the review.~

Issue tracker: https://github.com/pytorch/pytorch/issues/55070

cc: bdhirsh ysiraichi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67329

Reviewed By: ejguan

Differential Revision: D34077219

Pulled By: bdhirsh

fbshipit-source-id: 6accda33957f654b753261c5c3d765a27a64d2c0
(cherry picked from commit f3ac83217a)
2022-02-08 22:52:27 +00:00
Nikita Shulga
bb101ec78d Revert D33595240: [JIT] Opinfo tests for nnc fusion
Test Plan: revert-hammer

Differential Revision:
D33595240 (0b57bd4c66)

Original commit changeset: e2e17a921bc3

Original Phabricator Diff: D33595240 (0b57bd4c66)

fbshipit-source-id: 172a3ffd19d180b1b3617956b1f881be62f37bc9
(cherry picked from commit 324cfaea86)
2022-02-08 01:28:42 +00:00
Nikita Shulga
58f25678bd Revert D33780905: Opinfo test for mvlgamma: add epsilon
Test Plan: revert-hammer

Differential Revision:
D33780905 (72cedba655)

Original commit changeset: c9afd443bc90

Original Phabricator Diff: D33780905 (72cedba655)

fbshipit-source-id: 180b862ed03e18f96cc1c7f956476eb16dd56225
(cherry picked from commit 623643b362)
2022-02-08 01:28:42 +00:00
David Berard
72cedba655 Opinfo test for mvlgamma: add epsilon (#71794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71794

mvlgamma(inp, p) requires that all the elements of inp are > (p-1)/2.

The opinfo test was occasionally producing inputs with elements == (p-1)/2, which would generate errors like:

```
ERROR: test_nnc_correctness_mvlgamma_mvlgamma_p_5_cpu_bfloat16 (__main__.TestNNCOpInfoCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/path/pytorch/torch/testing/_internal/common_device_type.py", line 381, in instantiated_test
    raise rte
  File "/path/pytorch/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
    result = test(self, **param_kwargs)
  File "/path/pytorch/torch/testing/_internal/common_device_type.py", line 753, in test_wrapper
    return test(*args, **kwargs)
  File "/path/pytorch/torch/testing/_internal/common_device_type.py", line 907, in only_fn
    return fn(slf, *args, **kwargs)
  File "/path/pytorch/test/test_jit_fuser_te.py", line 2293, in test_nnc_correctness
    ref = variant(*clone_inputs((sample.input, *sample.args)), **sample.kwargs)
RuntimeError: All elements must be greater than (p-1)/2
```

repro example: https://gist.github.com/davidberard98/9da688e31cdfbaed7e990746b28a4ba2
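A minimal sketch of the boundary condition (the epsilon here is illustrative, not the exact value the OpInfo uses):

```
import torch

p = 5
eps = 1e-3  # illustrative epsilon; the OpInfo nudges samples off the boundary
# mvlgamma(x, p) requires every element of x to be strictly > (p - 1) / 2
ok = torch.rand(3, 3) + (p - 1) / 2 + eps
torch.mvlgamma(ok, p)  # succeeds

bad = torch.full((3,), (p - 1) / 2)
# torch.mvlgamma(bad, p)  # RuntimeError: All elements must be greater than (p-1)/2
```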

Test Plan: Imported from OSS

Reviewed By: qihqi

Differential Revision: D33780905

Pulled By: davidberard98

fbshipit-source-id: c9afd443bc90ce68f33b97498921b447e4f7d1d8
(cherry picked from commit a974b03f07)
2022-02-07 22:21:03 +00:00
David Berard
0b57bd4c66 [JIT] Opinfo tests for nnc fusion (#70465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70465

These tests check to ensure that
(a) the result after nnc fusion (of a single op) is the same as the
unfused op
(b) for certain ops where fusion is expected to occur, ensure that
fusion does actually occur

Test Plan: Imported from OSS

Reviewed By: wenleix

Differential Revision: D33595240

Pulled By: davidberard98

fbshipit-source-id: e2e17a921bc30c313e92e8e5bbc6c1b5fcd14bc1
(cherry picked from commit b1ba221acc)
2022-02-07 20:56:21 +00:00
Andrew Gu
b047963983 [PT-D][BE] Fix DDP no_sync() test logic (#72348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72348

**Overview**
#43307 changed `_test_accumulate_gradients_no_sync()` to add a `num_iters` argument. However, I think the change misconstrued the test logic slightly.

61ab04e1db/torch/testing/_internal/distributed/distributed_test.py (L4369-L4397)

- `iteration % num_iters == 0` evaluates to `True` only for `iteration == 0`, since `iteration` comes from `for iteration in range(num_iters)`.
- IIUC, the intention is to alternate between accumulating gradients (using `no_sync()`) and synchronizing gradients normally. In the existing implementation, any iterations following the second one are non-productive since gradients are already in sync, meaning the test reduces to testing normal DDP.
- This PR changes the check back to `iteration % 2 == 0` to restore the alternating behavior (see the sketch below).
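A minimal runnable sketch of the alternating pattern the test is meant to exercise (single process, gloo backend, toy model):

```
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(4, 4))
for iteration in range(4):
    x = torch.randn(2, 4)
    if iteration % 2 == 0:
        with model.no_sync():      # accumulate gradients locally, skip all-reduce
            model(x).sum().backward()
    else:
        model(x).sum().backward()  # this backward all-reduces the gradients

dist.destroy_process_group()
```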

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D34011559

Pulled By: awgu

fbshipit-source-id: 4ba771e45b28a343167a324462571e4b8e25ae72
(cherry picked from commit 8492a8b803)
2022-02-07 18:05:19 +00:00
anjali411
2b53121ddd Reenable tests that now run fine after the IMA fix (#72288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72288

Fixes https://github.com/pytorch/pytorch/issues/72177 and https://github.com/pytorch/pytorch/issues/72189

These tests were disabled in https://github.com/pytorch/pytorch/pull/72016

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D34009065

Pulled By: anjali411

fbshipit-source-id: 0874d0cc612bb2fb368a092389448b76ac31a989
(cherry picked from commit 5b40a7ba63)
2022-02-04 18:39:04 +00:00
Nikita Shulga
38ebb776a4 Fail with unexpected success for fatal errors (#72016)
Summary:
Rest of the tests from CUDA testuite is skipped after GPU context corruption is encountered.
For tests decorated with `expectedFailure` creates false impression that entire testsuite is passing.
Remedy it by suppressing the exception and printing the warning about unexpected success if `should_stop_early` is true
Also, prints warning when this happens (to make attribution easier) as well as when this condition is detected.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72016

Test Plan:
`python test_ops.py -v  -k test_fn_fwgrad_bwgrad_gradient`
Before the change:
```
test_fn_fwgrad_bwgrad_gradient_cpu_complex128 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cpu_float64 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cuda_complex128 (__main__.TestGradientsCUDA) ... expected failure

----------------------------------------------------------------------
Ran 3 tests in 0.585s
OK (expected failures=1)
```

After the change:
```
test_fn_fwgrad_bwgrad_gradient_cpu_complex128 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cpu_float64 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cuda_complex128 (__main__.TestGradientsCUDA) ... /home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py:1670: UserWarning: TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
  warn(f"TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with {rte}")
/home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_device_type.py:382: UserWarning: Suppressed expected failure that resulted in fatal error
  warn("Suppressed expected failure that resulted in fatal error")
unexpected success

----------------------------------------------------------------------
Ran 3 tests in 0.595s

FAILED (unexpected successes=1)
```
And `stderr` from XML file contains requested info:
```
/home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py:1670: UserWarning: TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
  warn(f"TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with {rte}")
/home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_device_type.py:382: UserWarning: Suppressed expected failure that resulted in fatal error
  warn("Suppressed expected failure that resulted in fatal error")
```

Fixes https://github.com/pytorch/pytorch/issues/71973

Reviewed By: janeyx99, ngimel

Differential Revision: D33854287

Pulled By: malfet

fbshipit-source-id: dd0f5a4d2fcd21ebb7ee50ce4ec4914405a812d0
(cherry picked from commit 0c0baf3931)
2022-02-03 17:49:59 +00:00
Junjie Wang (PyTorch)
88547396eb [PT-D] Enable megatron-lm style MLP layers (Changes mainly on sharded linear op) (#69735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69735

We want to build a prototype of Megatron-LM so that we can apply PT-D ops to models like transformers and other Meta flagship models.

The basic idea of Megatron-LM is as follows:
1. Col-wise sharding of the linear weight. Perform the linear op for the first layer.
2. Perform a math op (optional), such as ReLU or GeLU. We use GeLU in our example unit test. The input is from step 1.
3. Row-wise sharding of the linear weight. Perform the linear op for the second layer. The input is from step 2.

We thereby save the communication needed to concatenate the col-wise sharding results and to spread the input across ranks for row-wise sharding.
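The saving follows from the math; here is a single-process sketch verifying that the sharded computation matches the unsharded MLP, with plain tensors standing in for the sharded ops:

```
import torch
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(4, 8)    # input activations
A = torch.randn(8, 16)   # first linear weight (to be col-wise sharded)
B = torch.randn(16, 8)   # second linear weight (to be row-wise sharded)

ref = F.gelu(X @ A) @ B  # unsharded reference

A1, A2 = A.chunk(2, dim=1)  # col-wise shards, one per "rank"
B1, B2 = B.chunk(2, dim=0)  # row-wise shards, rows matching each A shard
# GeLU is elementwise, so each rank can apply it to its shard locally.
Y1, Y2 = F.gelu(X @ A1), F.gelu(X @ A2)
# Each rank holds a partial result; a single all-reduce (a plain sum here)
# recovers the output -- no activation concat or re-scatter is needed.
out = Y1 @ B1 + Y2 @ B2

assert torch.allclose(ref, out, atol=1e-5)
```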

The change is as follows:
1. Return a ShardedTensor for the col-wise sharding in the sharded_linear op.
2. Return a `PartialTensor` for the row-wise sharding in the sharded_linear op.
3. Leverage APIs already defined for `reshard` to merge/aggregate local results into a fully synced local result if needed.
4. Add a helper function to create a sharded tensor based on the local result.
5. Add a unit test to test the Megatron-LM idea mentioned above and compare with local ops, including the grad and optimizer so that we can ensure the correctness of the implementation.
6. Refactor the unit test of sharded linear to reflect the changes in the code.
ghstack-source-id: 148273049

Test Plan: Unit test + CI

Reviewed By: pritamdamania87

Differential Revision: D32978221

fbshipit-source-id: 565fc92e7807e19d53b0261f8ace3945bef69e3e
(cherry picked from commit 344abe7520)
2022-02-03 06:12:15 +00:00
Junjie Wang (PyTorch)
19d0de8a57 [PT-D][RFC] Resharding related API implement for ShardedTensor and Partial Tensor (#70079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70079

We defined a new concept named `PartialTensor`, which is an abstraction to represent Tensors that need aggregation across multiple devices and multiple processes.

We also defined an API `reshard_output` to reshard a `PartialTensor` to `Tensor` or reshard a `ShardedTensor` to `ShardedTensor/Tensor`. This is done via the class `ModuleResharder`, which acts as a wrapper around the original module plus a reshard in the final step.

The `reshard` logic is defined in each class (`ShardedTensor` and `PartialTensor`).
ghstack-source-id: 148273050

Test Plan: Unit test is in the next PR.

Reviewed By: pritamdamania87

Differential Revision: D33121037

fbshipit-source-id: 5f56617ea526b857c5b73df6e069697d428ec359
(cherry picked from commit 58b1457cbc)
2022-02-03 05:26:02 +00:00
anjali411
f607af126e Set correct device id on efficientzerotensors (#71611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71611

Fixes https://github.com/pytorch/pytorch/issues/71160, https://github.com/pytorch/pytorch/issues/69925, and #69913

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D33897543

Pulled By: anjali411

fbshipit-source-id: f1d8608c351876b8c2619da5ef891f74bad30ab5
(cherry picked from commit 643e666ea3)
2022-02-02 21:51:32 +00:00
kshitij12345
02f6226bff [fix] Dropout2d-3d no-batch-dim (#69885)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69801

TODO:
* [x] Update C++ API

cc albanD mruberry jbschlosser walterddr kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69885

Reviewed By: mruberry

Differential Revision: D33175470

Pulled By: jbschlosser

fbshipit-source-id: c9d7d9e0f59ba290a0157725c338a345f3d58b9f
(cherry picked from commit 7e4271a156)
2022-02-02 16:40:32 +00:00
Yanli Zhao
2336571cb7 make the fsdp folder public (#72084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72084

Make the fsdp folder public.
ghstack-source-id: 148173447

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D33903417

fbshipit-source-id: 7852a2adc4af09af48a5ffa52ebf210489f834d5
(cherry picked from commit bd06513cfe)
2022-02-02 15:50:14 +00:00
Pritam Damania
64670e414e [reland] Create torch.distributed._shard package. (#72141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72141

We have many sharding components currently:
torch.distributed._sharded_tensor, torch.distributed._sharding_spec,
torch.distributed._sharded_optimizer and more coming.

As a result, organizing all of this under the `torch.distributed._shard`
package. For BC reasons, I'm still keeping the old packages and have them just
reference the new package.
ghstack-source-id: 148150861

Test Plan: waitforbuildbot

Reviewed By: fduwjj

Differential Revision: D33904585

fbshipit-source-id: 057e847eb7521b536a3ee4e0f94871aacc752062
(cherry picked from commit 29a70dd7af)
2022-02-02 06:58:20 +00:00
Andrew Or
e118d6e59f Add lowering path for LinearReLU module (#71427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71427

This commit adds a lowering path for the LinearReLU modules
in static quantization mode. This includes torch.nn.qat.Linear,
torch.nn.intrinsic.LinearReLU, and torch.nn.intrinsic.qat.LinearReLU.
Future commits will add support for dynamic quantization and functional
LinearReLU.
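
For reference, a minimal eager-mode sketch that produces the fused torch.nn.intrinsic.LinearReLU module this lowering path consumes:

```
import torch
import torch.nn as nn
from torch.ao.quantization import fuse_modules

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.linear(x))

m = M().eval()
fused = fuse_modules(m, [["linear", "relu"]])
print(type(fused.linear))  # torch.nn.intrinsic.modules.fused.LinearReLU
```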

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_linear_module

Imported from OSS

Reviewed By: george-qi

Differential Revision: D33694742

fbshipit-source-id: 19af11f82b1ad8ade0c307498971c29a3f776036
(cherry picked from commit b3f607de43)
2022-02-01 19:31:31 +00:00
Richard Zou
f20fa66f70 Revert "[fix] max_pool1d: composite compliance (#70900)" (#71992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71992

This reverts commit b7222e15b6.

We are conservatively reverting this because it broke a test in functorch.
The original PR added a `_max_pool1d_cpu` operator. I'm not sure if it
is actually safe to revert this due to the addition of the new operator
(someone may have serialized it between now and then), but since it has
only been two weeks this should be fine.

Test Plan: - wait for tests

Reviewed By: jbschlosser, VitalyFedyunin

Differential Revision: D33882918

Pulled By: zou3519

fbshipit-source-id: f146e82e6b46690376b3d8825dc7f7da62e2c7de
(cherry picked from commit 1606333e6c)
2022-02-01 15:07:21 +00:00
Jerry Zhang
082ff25f37 [reland][bc-breaking][quant][be] Refactor fuser_method to include is_qat argument" (#71956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71956

Pull Request resolved: https://github.com/facebookresearch/mobile-vision/pull/59

Original commit changeset: f3912e210e8c

Original Phabricator Diff: D33178977 (ef501e8fed)

Test Plan:
Please see original diff for test plans

**Static Docs Preview: classyvision**
|[Full Site](https://our.intern.facebook.com/intern/staticdocs/eph/D33833203/V3/classyvision/)|

|**Modified Pages**|

Reviewed By: andrewor14

Differential Revision: D33833203

fbshipit-source-id: 74a8f22730b00aafa6a173b208e635c1d696959e
(cherry picked from commit fb88772b18)
2022-01-31 23:02:22 +00:00
Nikita Shulga
34494e6252 Back out "Create torch.distributed.shard package." (#72062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72062

Original commit changeset: dc692b31e260

Original Phabricator Diff: D33755913 (87bbcf70f7)

Test Plan: CI

Reviewed By: pbelevich

Differential Revision: D33891115

fbshipit-source-id: 37286e03d743d8691319f07c95e9561d54f3d6d0
(cherry picked from commit 0c1b3fe008)
2022-01-31 18:29:27 +00:00
Nikita Shulga
74c44ba9d6 Revert D33850228: [pytorch][PR] Implement Tanh Gelu Approximation
Test Plan: revert-hammer

Differential Revision:
D33850228 (23d03025dc)

Original commit changeset: 3cc33fb298e4

Original Phabricator Diff: D33850228 (23d03025dc)

fbshipit-source-id: 9436e7df73c2b2e2011f321674f24973316d3692
(cherry picked from commit c9efb58223)
2022-01-31 17:44:19 +00:00
Ryan Spring
23d03025dc Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds an `approximate` string flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enables Tanh Gelu in NvFuser

```
def gelu(x, approximate : str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: cpuhrsch

Differential Revision: D33850228

Pulled By: jbschlosser

fbshipit-source-id: 3cc33fb298e480d7ecc5c67716da019d60c6ab33
(cherry picked from commit 3a53b3e94f)
2022-01-31 17:07:45 +00:00
Anjali Chourdia
1e4aefaa2f Revert D33834916: Set correct device id on efficientzerotensors
Test Plan: revert-hammer

Differential Revision:
D33834916 (a18cfb790d)

Original commit changeset: 11cec343e95e

Original Phabricator Diff: D33834916 (a18cfb790d)

fbshipit-source-id: 3d3f60b760b445383768161b1d21ea4dadbe5d7c
(cherry picked from commit eba41aa646)
2022-01-31 03:49:56 +00:00
anjali411
a18cfb790d Set correct device id on efficientzerotensors (#71611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71611

Fixes https://github.com/pytorch/pytorch/issues/71160 https://github.com/pytorch/pytorch/issues/69925

Test Plan: Imported from OSS

Reviewed By: george-qi

Differential Revision: D33834916

Pulled By: anjali411

fbshipit-source-id: 11cec343e95e2ee188ab7576f26f64aa19317891
(cherry picked from commit f6e86f8a6b)
2022-01-30 20:53:15 +00:00
Pritam Damania
87bbcf70f7 Create torch.distributed.shard package. (#71742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71742

We have many sharding components currently:
torch.distributed._sharded_tensor, torch.distributed._sharding_spec,
torch.distributed._sharded_optimizer and more coming.

As a result, organizing all of this under the `torch.distributed.shard`
package. For BC reasons, I'm still keeping the old packages and have them just
reference the new package.
ghstack-source-id: 147899768

Test Plan: waitforbuildbot

Reviewed By: fduwjj, wanchaol

Differential Revision: D33755913

fbshipit-source-id: dc692b31e2607063d55dfcb3db33ec53961d5a5b
(cherry picked from commit 5b6885f358)
2022-01-29 00:48:06 +00:00
Joel Schlosser
cb823d9f07 Revert D33744717: [pytorch][PR] Implement Tanh Gelu Approximation
Test Plan: revert-hammer

Differential Revision:
D33744717 (f499ab9cef)

Original commit changeset: d64532a562ed

Original Phabricator Diff: D33744717 (f499ab9cef)

fbshipit-source-id: 396c3f63de5865f894dbc353d0790a01a624be93
(cherry picked from commit e9fb2d1db1)
2022-01-28 18:35:01 +00:00
Ryan Spring
f499ab9cef Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds an `approximate` string flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enables Tanh Gelu in NvFuser

```
def gelu(x, approximate : str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: mikaylagawarecki

Differential Revision: D33744717

Pulled By: jbschlosser

fbshipit-source-id: d64532a562ed53247bb4fa52bb16722634d5c187
(cherry picked from commit 4713dd9cca)
2022-01-28 16:59:09 +00:00
soulitzer
5bd19ba846 Expect test_fn_fwgrad_bwgrad to fail because forward AD is not implemented (#71944)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71944

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33828924

Pulled By: soulitzer

fbshipit-source-id: f754d0f08567f20324d10f37502b1ab37aca3d8f
2022-01-27 20:46:35 -08:00
Rohan Varma
d0ff1f0013 [FSDP] Backward prefetch in recursive call (#71804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71804

Add backward prefetch arg when using auto_wrap_policy. Unittests are
updated appropriately.
ghstack-source-id: 147753214

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D33782346

fbshipit-source-id: c0176b48db29c3756a8873e809610ed53480102b
(cherry picked from commit 764acb3f1c)
2022-01-28 00:34:08 +00:00
lezcano
6cb128c8dd Generalize noncontiguous tests to several outputs (#67996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67996

This is necessary for most matrix decompositions in `linalg`.

cc mruberry

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33774418

Pulled By: mruberry

fbshipit-source-id: 576f2dda9d484808b4acf0621514c0ffe26834e6
(cherry picked from commit fb07c50aa9)
2022-01-27 23:13:17 +00:00
lezcano
a675770adc Deactivate the tracking of gradients in sampling functions within OpInfos (#68522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68522

Some OpInfos were inadvertently generating samples with `grad_fn`, for
example when using functions like `transpose()` or `conj()` on the
inputs to generate transposed or conjugated inputs. This PR corrects
this and deactivates the tracking of gradients in all the sampling
functions.
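
A small sketch of the issue and, conceptually, the fix:

```
import torch

x = torch.randn(3, 3, requires_grad=True)

y = x.transpose(0, 1)
print(y.grad_fn)        # TransposeBackward0 -- this sample is not a leaf tensor

with torch.no_grad():   # conceptually what the sampling functions now do
    z = x.transpose(0, 1)
print(z.grad_fn)        # None
z.requires_grad_(True)  # a proper leaf input for gradient tests
```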

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33774420

Pulled By: mruberry

fbshipit-source-id: da0e6189a2d67a2cb0fd458054558d36dbad9b61
(cherry picked from commit 42b0870774)
2022-01-27 23:13:17 +00:00
lezcano
e2011b29aa Add OpInfo test to check that floating point inputs in OpInfos have requires_grad set to True (#69909)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69909

This test detected a number of sampling methods that were not generating
the samples as expected, e.g. `index_put`, `cosine_embedding`, `stft`, but
perhaps most notably the generator for `BinOps`.

It also detected that `remainder` and `fmod` did not implement the
backward formula for the second input. I added this in the previous PR.
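
As a sanity check of that backward formula: since `remainder(a, b) = a - floor(a / b) * b`, the gradient with respect to `b` is `-floor(a / b)`:

```
import torch

a = torch.tensor([5.5])
b = torch.tensor([2.0], requires_grad=True)
torch.remainder(a, b).backward()
print(b.grad)  # tensor([-2.]) == -floor(5.5 / 2.0)
```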

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33774422

Pulled By: mruberry

fbshipit-source-id: 76cfc75b1fdfd72ee64aa524665f83a75fe52509
(cherry picked from commit 13ea7b436b)
2022-01-27 23:13:17 +00:00