Commit Graph

935 Commits

Jerry Zhang
096bea5251 [reland][quant][graphmode][fx][fp16] Add fp16 support for {add|mul}{_relu} (#52714) (#53019)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53019

Test Plan:
python test/test_quantization.py TestQuantizedOps.test_add
python test/test_quantization.py TestQuantizedOps.test_mul
python test/test_quantization.py TestQuantizedOps.test_add_relu
python test/test_quantization.py TestQuantizedOps.test_mul_relu

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D26725350

fbshipit-source-id: 2a89f5da6a21908f454f870521d2a4549fdd291e
2021-03-01 13:19:42 -08:00
Kyle Chen
0a70ec45d1 [ROCm] Enable test cases in autocast_test_lists.py for ROCm (#52737)
Summary:
Enable the test cases in autocast_test_lists.py for ROCm because they now pass.

Signed-off-by: Kyle Chen <kylechen@amd.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52737

Reviewed By: H-Huang

Differential Revision: D26706346

Pulled By: ngimel

fbshipit-source-id: c1b3b3d8c0ef2a5b1f7e2bd061a749afbae16590
2021-03-01 12:51:56 -08:00
kshitij12345
a06cf5d8a4 [numpy] torch.{rad2deg, deg2rad}: promote integer inputs to float (#51853)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515

Depends on https://github.com/pytorch/pytorch/issues/51283
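
For example, after this change integer inputs are promoted (illustrative usage, not taken from the PR):

```
import torch

x = torch.tensor([0, 90, 180])   # int64 input
torch.deg2rad(x)                 # promoted to a floating dtype
# tensor([0.0000, 1.5708, 3.1416])
```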

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51853

Reviewed By: albanD

Differential Revision: D26399743

Pulled By: mruberry

fbshipit-source-id: a6f0e12723e1451c6479d818752fe5d41788715d
2021-03-01 06:25:23 -08:00
kshitij12345
f5617b0932 [testing] Add Opinfo for torch.frac and minor fixes (#52660)
Summary:
Reference : https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52660

Reviewed By: ailzhang

Differential Revision: D26618151

Pulled By: mruberry

fbshipit-source-id: cf0df38e46f44d3afff6e0015af5a840c661aa0e
2021-03-01 04:58:31 -08:00
Mike Ruberry
312b297b82 Revert D26626092: [quant][graphmode][fx][fp16] Add fp16 support for {add|mul}{_relu}
Test Plan: revert-hammer

Differential Revision:
D26626092 (2962fbb03c)

Original commit changeset: 91d040efa51e

fbshipit-source-id: cc6bcc0f451d6adcd7bf7572451e6e3cd6ad59d1
2021-03-01 04:52:47 -08:00
jiej
4d94ee566e Ge v1 (#52136)
Summary:
This is a second attempt to use a graph executor to run the forward pass on a gradient graph. This gives a second chance to profile intermediate tensors introduced by autodiff.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52136

Reviewed By: pbelevich

Differential Revision: D26693978

Pulled By: Krovatkin

fbshipit-source-id: 91dde8009a210950af8e5173668ada241e16dd52
2021-02-28 00:53:13 -08:00
Jerry Zhang
2962fbb03c [quant][graphmode][fx][fp16] Add fp16 support for {add|mul}{_relu} (#52714)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52714

Test Plan:
python test/test_quantization.py TestQuantizedOps.test_add
python test/test_quantization.py TestQuantizedOps.test_mul
python test/test_quantization.py TestQuantizedOps.test_add_relu
python test/test_quantization.py TestQuantizedOps.test_mul_relu

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D26626092

fbshipit-source-id: 91d040efa51e9c955eb688ec16a30f0c12233958
2021-02-27 22:12:10 -08:00
Jerry Zhang
177694681e [quant][graphmode][fx] Add reference option support for linear_dynamic_fp16 (#52534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52534

Currently linear_dynamic_fp16 has a signature that's tied to fbgemm/qnnpack. We'll need to produce
a pattern equivalent to linear_dynamic_fp16 to support extensions to other backends.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_linear_dynamic_fp16

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D26557726

fbshipit-source-id: 270c9f781f73c79416a092b7831294cabca84b0c
2021-02-26 21:12:22 -08:00
Rohan Varma
b8e6e2971c Run distributed_test with NCCL_ASYNC_ERROR_HANDLING (#52619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52619

Runs this test suite with NCCL_ASYNC_ERROR_HANDLING enabled. It is the
default for many distributed training jobs, and it can also help catch
errors/hangs in tests more easily. We don't expect any changes in the
existing tests since they shouldn't have any hangs.

Also removes a commented-out line.
ghstack-source-id: 122595646

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D26588108

fbshipit-source-id: a57bbe2ae5a0c86731d77be45756b17151618eb6
2021-02-26 11:59:49 -08:00
Vasiliy Kuznetsov
d2e88246d8 ns for fx: make return type of ns APIs future proof (#52789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52789

Changes the return type of NS APIs from

```
{
  layer_name: {
    model_name: [torch.Tensor(...), ...],
  },
}
```

to

```
{
  layer_name: {
    model_name: {
      'type': 'weight',  # or node_output, etc
      'values': [torch.Tensor(...), ...],
      # future info can be added here, such as node name, etc
    },
  },
}
```
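
For illustration, a caller could walk the new structure like this (a minimal sketch against the format shown above; `results` is a hypothetical variable holding the API's return value):

```
import torch

def summarize_ns_results(results):
    # `results` is assumed to follow the new nested format described above.
    for layer_name, per_model in results.items():
        for model_name, info in per_model.items():
            kind = info['type']        # e.g. 'weight' or 'node_output'
            tensors = info['values']   # list of torch.Tensor
            print(layer_name, model_name, kind, [t.shape for t in tensors])
```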

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D26652640

fbshipit-source-id: 4b31164e402754141368d5a04d595f2b643af3bb
2021-02-25 20:45:44 -08:00
Vasiliy Kuznetsov
fe068157de ns for fx: unify return types of weight and activation APIs (#52779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52779

1. makes the return type of the weight comparison APIs match the return
type of the activation comparison APIs:

```
# before
{layer_name: {model_name: weight_tensor}}
{layer_name: {model_name: [activation_tensor]}}

# after
{layer_name: {model_name: [weight_tensor]}}
{layer_name: {model_name: [activation_tensor]}}
```

2. makes a type alias for the type, so future changes are easier
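
For example, the alias could look like this (the name is hypothetical; the actual alias in torch/quantization may differ):

```
from typing import Dict, List
import torch

# {layer_name: {model_name: [tensor, ...]}}
NSComparisonResult = Dict[str, Dict[str, List[torch.Tensor]]]
```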

Test Plan:
```
mypy torch/quantization
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D26652639

fbshipit-source-id: eb1f04d6913cedf88d628f362468875ae9ced928
2021-02-25 20:45:39 -08:00
Xiang
a52001f923 Improve test_reference_numerics (#51604)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50749
ci-all version of https://github.com/pytorch/pytorch/pull/50550

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51604

Reviewed By: anjali411

Differential Revision: D26666951

Pulled By: mruberry

fbshipit-source-id: b87db68f1d2a0f6c151edbc5c7809bbceece69b0
2021-02-25 15:38:42 -08:00
Joel Schlosser
f974cf4688 Test for distributed RL with RPC (#52393)
Summary:
Addresses one item in https://github.com/pytorch/pytorch/issues/46321

## Background
This is a test version of the RL RPC example defined [here](https://github.com/pytorch/examples/blob/master/distributed/rpc/rl/main.py) and [here](https://pytorch.org/tutorials/intermediate/rpc_tutorial.html), with the following differences:
* It defines and uses a `DummyEnv` to avoid a dependency on `gym`. The `DummyEnv` simply returns random states & rewards for a small number of iterations.
* It removes the `ArgumentParser` and utilizes `RpcAgentTestFixture` + hard-coded constants for configuration and launching.
* It changes the worker names to match what the internal Thrift RPC tests expect.

The code is purposefully kept very similar to the original example code outside of these differences.
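
A rough idea of what such a `DummyEnv` could look like (hypothetical sketch, not the code in this PR):

```
import torch

class DummyEnv:
    def __init__(self, state_dim=4, num_iters=10):
        self.state_dim = state_dim
        self.num_iters = num_iters
        self.t = 0

    def reset(self):
        self.t = 0
        return torch.randn(self.state_dim)

    def step(self, action):
        self.t += 1
        done = self.t >= self.num_iters
        # random state and reward, matching the "random states & rewards" description above
        return torch.randn(self.state_dim), torch.rand(1).item(), done
```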

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52393

Test Plan:
```
pytest test/distributed/rpc/test_tensorpipe_agent.py -k test_rl_rpc -vs
pytest test/distributed/rpc/test_process_group_agent.py -k test_rl_rpc -vs
```

Reviewed By: glaringlee

Differential Revision: D26515435

Pulled By: jbschlosser

fbshipit-source-id: 548548c4671fe353d83c04108580d807108ca76e
2021-02-25 10:52:53 -08:00
pbialecki
39fa0b5d0a Add scatter_add to amp promote list (#52133)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51730

I've added the `scatter_add` and `scatter_add.dimname` to the promote list as well as test cases for the former op.
However, it seems that `scatter_add` [doesn't support named tensors yet](8b0cb5ede3/aten/src/ATen/native/NamedTensor.cpp (L356-L358)) (thanks t-vi for the pointer):
```python
dev = 'cuda'
torch.scatter_add(torch.zeros(2, 2, 2, dtype=torch.float16, device=dev, names=('N', 'C', 'L')),
                             'C',
                             torch.randint(0, 2, (2, 2, 2), device=dev),
                             torch.randn((2, 2, 2), dtype=torch.float32, device=dev))
> RuntimeError: scatter_add: You passed a dimname (string) to this op in place of a dimension index but it does not yet support this behavior. Please pass a dimension index to work around this.
```
which raised this error after adding this test case.

I'm thus unsure whether I should also remove `scatter_add.dimname` from the promote list or not.

In any case, once named tensors are supported a potential test could be added as:
```python
            ("scatter_add", (torch.zeros(2, 2, 2, dtype=torch.float16, device=dev, names=('N', 'C', 'L')),
                             'C',
                             torch.randint(0, 2, (2, 2, 2), device=dev),
                             torch.randn((2, 2, 2), dtype=torch.float32, device=dev))),
```

CC mcarilli ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52133

Reviewed By: ejguan

Differential Revision: D26440392

Pulled By: ngimel

fbshipit-source-id: f4ee2d0b9e1f81afb6f94261c497cf2bf79ec115
2021-02-25 09:37:01 -08:00
Shen Li
1ac59d9db3 Fix RPC get_worker_info for rank=0 (#52804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52804

`rpc.get_worker_info` used to only take string in v1.6. We recently
allow it to accept `int` and `WorkerInfo`, but the previous check
on `worker_name` is no longer correct. This commit adds explicit
`not None` check.
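
A rough sketch of the intent (illustrative only, not the actual torch.distributed.rpc source):

```
def _worker_name_provided(worker_name):
    # A truthiness check (`if worker_name:`) would wrongly treat rank 0 (a falsy int)
    # as "not provided"; the explicit `is not None` check accepts str, int, and
    # WorkerInfo values uniformly.
    return worker_name is not None
```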

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D26655089

Pulled By: mrshenli

fbshipit-source-id: fa1545bd6dd2b33bc1e919de46b94e799ab9719c
2021-02-25 08:15:01 -08:00
Jane Xu
f71d9e28f9 Store test filename in test report path (#52791)
Summary:
This way, we can have a mapping from the test files we directly execute (the tests [here](https://github.com/pytorch/pytorch/blob/master/test/run_test.py#L20)) to the test suites that we store data for in XML reports.

This will come in use later for categorizing the tests we run in CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52791

Reviewed By: samestep

Differential Revision: D26655086

Pulled By: janeyx99

fbshipit-source-id: 94be32f80d7bc0ea1a7a11d4c4b1d3d8e774c5ea
2021-02-25 07:53:30 -08:00
Luca Wehrstedt
92a4ee1cf6 Revert D26375734: Implemented torch.linalg.multi_dot
Test Plan: revert-hammer

Differential Revision:
D26375734 (0396f492b9)

Original commit changeset: 839642692424

fbshipit-source-id: cb64db646010128d802e1930d5e9526c1f7aa6a2
2021-02-25 00:43:57 -08:00
Bel H
30cb6ac53c Introduce mlc device (ML Compute device) to PyTorch's device list (#50634)
Summary:
Apple recently announced ML Compute, a new framework available in macOS Big Sur, which enables users to accelerate the training of neural networks on Mac hardware. This PR is the first on a series of PRs that will enable the integration with ML Compute. Most of the integration code will live on a separate subrepo named `mlc`.
The integration with `mlc` (ML Compute) will be very similar to that of xla. We rely on registering our ops through:

```
TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
  m.impl_UNBOXED(<op_schema_name>, &customized_op_kernel);
  ...
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50634

Reviewed By: malfet

Differential Revision: D26614213

Pulled By: smessmer

fbshipit-source-id: 3b492b346c61cc3950ac880ac01a82fbdddbc07b
2021-02-24 22:39:11 -08:00
Heitor Schueroff
0396f492b9 Implemented torch.linalg.multi_dot (#51807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51807

Implemented torch.linalg.multi_dot similar to [numpy.linalg.multi_dot](https://numpy.org/doc/stable/reference/generated/numpy.linalg.multi_dot.html).

This function does not support broadcasting or batched inputs at the moment.

**NOTE**
numpy.linalg.multi_dot allows the first and last tensors to have more than 2 dimensions despite their docs stating these must be either 1D or 2D. This PR diverges from NumPy in that it enforces this restriction.
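
For example (usage sketch; `multi_dot` chooses an efficient multiplication order for the chain):

```
import torch

A = torch.randn(3, 4)
B = torch.randn(4, 5)
C = torch.randn(5, 2)
out = torch.linalg.multi_dot([A, B, C])   # shape (3, 2), same result as A @ B @ C
```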

**TODO**
- [ ] Benchmark against NumPy
- [x] Add OpInfo testing
- [x] Remove unnecessary copy for out= argument

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D26375734

Pulled By: heitorschueroff

fbshipit-source-id: 839642692424c4b1783606c76dd5b34455368f0b
2021-02-24 15:32:30 -08:00
Heitor Schueroff
08d7f29601 Add discontiguous kwarg to make_tensor (#51985)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51985

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D26375733

Pulled By: heitorschueroff

fbshipit-source-id: bb7831dc28c24b90c6f83885681eeccfdbb83438
2021-02-24 08:57:24 -08:00
Pritam Damania
1c63cb2c0f Pass child error to parent in distributed tests. (#52632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52632

Distributed tests run in a multiprocessing environment, where a parent
process drives the tests through several child processes. As a result, when a
child process fails the parent only prints the following:

```
Process 0 exited with error code 10
```

The child process also logs its own exception, but it is cumbersome to go
through the logs and track this down.

To alleviate this, I've added a pipe for each child process so that
the child process writes the error to the pipe before exiting and the parent
process can read the appropriate error from the pipe and display it.

The new output printed by the parent is as follows:

```
> RuntimeError: Process 0 exited with error code 10 and exception:
Traceback (most recent call last):
  File "torch/testing/_internal/common_distributed.py", line 361, in _run
    getattr(self, test_name)()
  File "torch/testing/_internal/common_distributed.py", line 288, in wrapper
    fn()
  File "test_c10d.py", line 789, in test_broadcast_checks
    pg.broadcast([t1], opts)
ValueError: ProcessGroupGloo::broadcast: invalid root rank: -1

Process 1 exited with error code 10 and exception:
Traceback (most recent call last):
  File "torch/testing/_internal/common_distributed.py", line 361, in _run
    getattr(self, test_name)()
  File "torch/testing/_internal/common_distributed.py", line 288, in wrapper
    fn()
  File "test_c10d.py", line 789, in test_broadcast_checks
    pg.broadcast([t1], opts)
ValueError: ProcessGroupGloo::broadcast: invalid root rank: -1

Process 2 exited with error code 10 and exception:
Traceback (most recent call last):
  File "torch/testing/_internal/common_distributed.py", line 361, in _run
    getattr(self, test_name)()
  File "torch/testing/_internal/common_distributed.py", line 288, in wrapper
    fn()
  File "test_c10d.py", line 789, in test_broadcast_checks
    pg.broadcast([t1], opts)
ValueError: ProcessGroupGloo::broadcast: invalid root rank: -1

Process 3 exited with error code 10 and exception:
Traceback (most recent call last):
  File "torch/testing/_internal/common_distributed.py", line 361, in _run
    getattr(self, test_name)()
  File "torch/testing/_internal/common_distributed.py", line 288, in wrapper
    fn()
  File "test_c10d.py", line 789, in test_broadcast_checks
    pg.broadcast([t1], opts)
ValueError: ProcessGroupGloo::broadcast: invalid root rank: -1
```
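
The core idea can be sketched roughly like this (simplified; the actual harness lives in torch/testing/_internal/common_distributed.py):

```
import multiprocessing
import traceback

def _child(test_fn, conn):
    try:
        test_fn()
        conn.send(None)                      # success
    except Exception:
        conn.send(traceback.format_exc())    # ship the traceback to the parent
        raise
    finally:
        conn.close()

def run_in_subprocess(test_fn):
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=_child, args=(test_fn, child_conn))
    p.start()
    p.join()
    err = parent_conn.recv() if parent_conn.poll() else None
    if p.exitcode != 0:
        raise RuntimeError(f"Process exited with error code {p.exitcode} and exception:\n{err}")
```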
ghstack-source-id: 122273793

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D26589274

fbshipit-source-id: 7b7a71ec790b216a89db7c157377f426531349a5
2021-02-23 11:50:25 -08:00
kshitij12345
49b59e3472 Add OpInfo entries for i0 and logical_not (#51956)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51956

Reviewed By: albanD

Differential Revision: D26404440

Pulled By: mruberry

fbshipit-source-id: dd73e63155dd4a200afb38a5e566eb2132e69fde
2021-02-23 10:12:05 -08:00
kshitij12345
ed71cbdd39 Revert PR 52483 "[reland][complex] masked_fill (#52587)
Summary:
Revert "[reland][complex] `masked_fill`: Complex Autograd support update masked_scatter skips. (https://github.com/pytorch/pytorch/issues/52483)"

This reverts commit b6cf17deee.

Reference: https://github.com/pytorch/pytorch/pull/52483#issuecomment-783023560

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52587

Reviewed By: anjali411

Differential Revision: D26579741

Pulled By: malfet

fbshipit-source-id: 9b53c8aab51d844d0f65393609861a4ff72ef7bb
2021-02-22 10:53:37 -08:00
Brian Hirsh
57637e0ab4 port upsample_nearest3d and upsample_trilinear3d to structured (#52065)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52065

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D26373027

Pulled By: bdhirsh

fbshipit-source-id: 76b283ea8142732ffc8f7b200a8494349739e326
2021-02-22 10:38:52 -08:00
Brian Hirsh
d659477ae0 port upsample_bilinear2d and upsample_bicubic2d to structured (#52012)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52012

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D26356329

Pulled By: bdhirsh

fbshipit-source-id: 8f974224799493e3172fe5dff3fbd43af8c09722
2021-02-22 10:38:48 -08:00
Brian Hirsh
f3ea5ca672 port upsample_linear1d to structured (#51917)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51917

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D26327750

Pulled By: bdhirsh

fbshipit-source-id: 443ad278010ce655eb5f08fa6889c45ccb328268
2021-02-22 10:38:43 -08:00
Rohan Varma
ef8d17e112 [DDP] Separate error messages for unused params in forward and not all outputs (#52391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52391

There are 2 ways DDP can throw the exception refactored here -
1) Unused params in the forward pass. We provide `find_unused_parameters=True` for this.
2) All params used in fwd pass, but not all outputs used in loss computation. There are a few workarounds for this but we do not provide native support.

Previously, these 2 issues were combined into 1 error message but that has historically resulted in confusion, with users reporting getting this error even when they enable `find_unused_parameters=True` (which they expect to fix this error). As a result there is additional churn to debug these issues because the true cause (1) vs (2) is not known.

This commit helps to fix the issue by separating out the 2 error messages depending on if we ran with unused parameter detection or not. Hopefully this should make the error message much more clear and actionable.

error msg with `find_unused_params=True`:
```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. Since `find_unused_parameters=True` is enabled, this likely  means that not all `forward` outputs participate in computing loss. You can fix this by making sure all `forward` function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
```
error msg without `find_unused_params` specified:
```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
making sure all `forward` function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
```
ghstack-source-id: 122097900

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D26496688

fbshipit-source-id: 4a9eeeda10293da13d94a692d10cb954e4506d7c
2021-02-19 17:09:22 -08:00
Jane Xu
09516d2d0c Reenables skipped tests for all CUDA versions except 11.2 (#52359)
Summary:
This PR adds functionality to skip a test based on CUDA version.

This way, we can be more specific when skipping a test, such as when the test only fails for a particular CUDA version.

This allows us to re-enable, for other CUDA versions such as 10.1 and 11.1, the tests that were previously skipped because of CUDA 11.2.

I tested this locally (by using 11.0 instead of 11.2), but will run all the CI to make sure it works.
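
The gist can be illustrated with a sketch like the following (the decorator name and details here are hypothetical, not necessarily the helper added in this PR):

```
import unittest
import torch

def skipIfCUDAVersionIn(versions):
    # Skip when the active CUDA toolkit version is in `versions`, e.g. [(11, 2)].
    cuda = torch.version.cuda
    current = tuple(int(x) for x in cuda.split(".")) if cuda else None
    return unittest.skipIf(current in versions, f"skipped on CUDA {cuda}")
```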

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52359

Reviewed By: walterddr

Differential Revision: D26487951

Pulled By: janeyx99

fbshipit-source-id: 45c71cc6105ffd9985054880009cf68ea5ef3f6a
2021-02-19 15:30:55 -08:00
Jerry Zhang
626756ac39 [quant][graphmode][api] debug --> reference (#52179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52179

Rename debug to reference. We'll use this to produce a reference quantized model
that can be used as a common interface between PyTorch quantized models and backends.

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D26424656

fbshipit-source-id: a0299b023f6ba7d98f5750724c517b0ecb987b35
2021-02-19 14:20:01 -08:00
kshitij12345
b6cf17deee [reland][complex] masked_fill: Complex Autograd support and update masked_scatter skips. (#52483)
Summary:
Reland https://github.com/pytorch/pytorch/issues/52035

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52483

Reviewed By: heitorschueroff

Differential Revision: D26545097

Pulled By: anjali411

fbshipit-source-id: f154c239183279be381a7393a8226778b36148bb
2021-02-19 12:36:49 -08:00
Nikita Vedeneev
9699c703c2 Stable sort for the CPU take 2. (#51790)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38681.
A duplicate of https://github.com/pytorch/pytorch/pull/50052 created to become importable to the fb internal tests.
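
Assuming the stable= keyword that this work targets, usage would look roughly like this (sketch, not taken from the PR):

```
import torch

x = torch.tensor([3, 1, 2, 1])
# With a stable sort, elements that compare equal keep their original relative order,
# so the two 1s stay in their input order.
values, indices = torch.sort(x, stable=True)
```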

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51790

Reviewed By: agolynski

Differential Revision: D26279045

Pulled By: glaringlee

fbshipit-source-id: 348e171dee9c370a76002b65d0c82c329f57a421
2021-02-19 09:28:57 -08:00
kshitij12345
5fda3b094c Add conj OpInfo and fix out inconsistency (#52059)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515
Fixes: https://github.com/pytorch/pytorch/issues/51949

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52059

Reviewed By: ailzhang

Differential Revision: D26373800

Pulled By: anjali411

fbshipit-source-id: d2c92263a690072c0f23cb60885be42eebea48c6
2021-02-19 08:18:55 -08:00
Yanli Zhao
c75fa39b6c add stats that can only be collected at runtime (#51386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51386

add stats such as rebuilt bucket stats, unused parameter stats, and performance stats to the DDP logging data

1. GPU time stats are not collected for single-process multi-device mode in this diff, as that requires events to be created and recorded on multiple devices
2. use the at::cuda event API for safer calls
3. events may not be created in the autograd hook if the hook is not triggered in the user's code, e.g., when a user runs in non-sync mode in some iterations; so we check whether events were created before synchronizing, and skip invalid results (see the timing sketch below)
4. users may not set the device upfront, so we explicitly set the proper device before creating events in our prepare_forward() and prepare_backward() calls
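
A minimal sketch of the event-based GPU timing pattern referenced above (illustrative only, not the DDP-internal code):

```
import torch

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
# ... forward/backward work ...
end.record()

torch.cuda.synchronize()               # make sure both events have completed
elapsed_ms = start.elapsed_time(end)   # milliseconds between the two events
```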

ghstack-source-id: 121933566

Test Plan: unit tests

Reviewed By: SciPioneer

Differential Revision: D26158645

fbshipit-source-id: ce5f15187802eba76accb980449be68902c10178
2021-02-19 00:13:11 -08:00
Rohan Varma
c29e279f72 [DDP] unittest for when params arent used in backward pass (#52384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52384

Adds a simple unittest-based test that we can modify when we enable DDP backward without requiring all parameters to receive gradients.
ghstack-source-id: 122001930

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D26482479

fbshipit-source-id: c80bdeea7cf9db35390e385084ef28d64ed239eb
2021-02-18 23:34:16 -08:00
76181208+imaginary-person@users.noreply.github.com
3adc8f8cf7 Enable min & max for Float16 & BFloat16 (#51244)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50790.

Added `min()` & `max()` support for `Float16` & `BFloat16`.
CUDA already supported these ops on `Float16`, so the other three combinations had to be enabled.
`OpInfo`s for `min` & `max` were also added, and their sample inputs were removed from `method_tests()`.

### MORE INFO
The (slightly) long-term goal is to add dispatch for `min()` & `max()` related operations on CPU & CUDA for `Float16` & `BFloat16`,
wherever they aren't present already:
1. `amin()`
2. `argmax()`
3. `amax()`
4. `argmin()`
5. `torch._aminmax()`
6. `torch.clamp()` on CPU. Was already supported on CUDA
7. `min()` (in this PR)
8. `max()` (in this PR)
9. `minimum()`
10. `maximum()`

I'll submit separate PRs for the other ops.
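
A quick illustration of the newly enabled CPU dtypes (assuming the behavior described above):

```
import torch

x = torch.randn(5, dtype=torch.bfloat16)   # CPU bfloat16
print(x.min(), x.max())                    # min()/max() now dispatch on CPU for Float16/BFloat16
```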

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51244

Reviewed By: jbschlosser

Differential Revision: D26503455

Pulled By: anjali411

fbshipit-source-id: c32247f214e9272ca2e4322a23337874e737b140
2021-02-18 23:13:51 -08:00
Raghavan Raman
c7a70eec1b Make LLVM the default backend for TE (#52314)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52264

When CPU fusion is enabled without LLVM support in PyTorch, it causes a huge slowdown (> 50x). This PR makes the LLVM backend the default backend for TE. Now, an error is reported if CPU fusion is enabled without LLVM support, to avoid this performance regression.

This PR also updates the tests to not use LLVM, so that the old flow is continued. This is necessary because tests run in CI do not have LLVM.
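
For context, CPU fusion is typically toggled in tests via internal flags like these (internal APIs, subject to change; shown only to illustrate what "enabling CPU fusion" refers to):

```
import torch

torch._C._jit_set_texpr_fuser_enabled(True)    # use the tensor-expression fuser
torch._C._jit_override_can_fuse_on_cpu(True)   # allow fusion on CPU; per this change, doing so
                                               # without LLVM support reports an error
```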

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52314

Reviewed By: ejguan

Differential Revision: D26491294

Pulled By: navahgar

fbshipit-source-id: 74561db1207da805d6d28039450db046ba2988fb
2021-02-18 12:00:38 -08:00
Gregory Chanan
983347fa25 Allow broadcasting against lerp weights. (#52319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52319

Fixes: https://github.com/pytorch/pytorch/issues/52254
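
An illustrative example of the newly allowed broadcasting (a sketch assuming the behavior described in the linked issue):

```
import torch

start = torch.zeros(4, 1)
end = torch.ones(4, 1)
weight = torch.rand(1, 3)              # weight now broadcasts against start/end
out = torch.lerp(start, end, weight)   # shape (4, 3)
```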

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26488411

Pulled By: gchanan

fbshipit-source-id: 60eb471609986584c4235ba7f263581e988e7642
2021-02-18 09:53:25 -08:00
Rong Rong (AI Infra)
b52e2e6045 [BE] _get_torch_cuda_version should return tuple (#52409)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52409

Reviewed By: jbschlosser, glaringlee

Differential Revision: D26513924

Pulled By: walterddr

fbshipit-source-id: ee18ef357c326c5ad344d80c59821cc2b8814734
2021-02-18 09:28:38 -08:00
Vasiliy Kuznetsov
d903106bad [wip] ns for fx: add support for subgraph matching (#52130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52130

We have patterns like (F.linear, F.relu) which need to match
to (toq.linear_relu).  So, we need to match subgraphs.

This PR does the following:
* defines a "subgraph" as (start_node, end_node). The current assumption
is that subgraphs are simple, there is always a path from start_node to
end_node, and we can ignore any non-input args/kwargs of these nodes
for the purposes of matching and copying things. An example one node
subgraph is (F.linear, F.linear).  An example two node subgraph
is (F.linear, F.relu).
* changes the matching logic to iterate over subgraphs instead of nodes
* changes the NS core APIs to use subgraph pairs instead of node pairs:
1. for weights, we match on the start node
2. for unshadowed activations, we observe the end nodes
3. for shadowed activations, we copy the subgraph of a to graph c

TODO(before review) write up better, not ready for review yet
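
Illustrative only: one way to represent the (start_node, end_node) subgraph described above (the actual data structure in the PR may differ):

```
from typing import NamedTuple
from torch.fx import Node

class NSSubgraph(NamedTuple):
    start_node: Node   # e.g. the F.linear node
    end_node: Node     # e.g. the F.relu node, or the same node for a one-node subgraph
```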

Test Plan:
TODO before land: better test plan

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D26403092

fbshipit-source-id: e49aaad4b02b8d60589435848bee422b8f41937a
2021-02-18 08:20:04 -08:00
Vasiliy Kuznetsov
3978ffb37a NS for FX: add test for a simple sparsenn model (#52092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52092

Adds a very simple toy sparsenn model, and enables
its inspection with the new NS APIs.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_sparsenn_compare_activations
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_sparsenn_shadow
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D26403095

fbshipit-source-id: 3c3650aca47186deb32f2b3f1d87a0716d1ad9d1
2021-02-18 08:17:57 -08:00
Anjali Chourdia
758aa45563 Revert D26369476: [pytorch][PR] [complex] masked_fill: Complex Autograd support and update masked_scatter skips.
Test Plan: revert-hammer

Differential Revision:
D26369476 (7a408c7290)

Original commit changeset: 7a79d5a609b0

fbshipit-source-id: f0011f40962ccbcd8e7c19bd727e1e49cf2ec0c4
2021-02-18 05:01:03 -08:00
kshitij12345
7a408c7290 [complex] masked_fill: Complex Autograd support and update masked_scatter skips. (#52035)
Summary:
Now that `masked_fill` CUDA is migrated, skips on masked_scatter can be removed.

Reference: https://github.com/pytorch/pytorch/issues/33152

**Note**:

Have decreased the shape of Tensor for `masked_scatter` from (M, M) -> (S, S) and so on.

With shapes of M : **96.53s**
```
test/test_ops.py ........................................ssssssssssss........................ssssssssssss........................                                                   [100%]

=============================================================== 88 passed, 24 skipped, 7981 deselected in 96.53s (0:01:36) ================================================================
```

With shapes of S : **46.53s**
```
test/test_ops.py ........................................ssssssssssss........................ssssssssssss........................                                                   [100%]

==================================================================== 88 passed, 24 skipped, 7981 deselected in 46.53s =====================================================================
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52035

Reviewed By: VitalyFedyunin

Differential Revision: D26369476

Pulled By: anjali411

fbshipit-source-id: 7a79d5a609b0019f8fe9ce6452924dd33390dce1
2021-02-17 22:49:26 -08:00
Kimish Patel
08b95e3c48 [Pytorch, Sparsity] Integrate sparse qnnpack operator in framework (#52377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52377

Add QNNPACK specific packed params for sparse linear.
Add sparse linear dynamic op with appropriate registration.
Add python side LinearDynamic module for sparsity.
Add tests to validate sparse linear qnnpack kernels.
Note that since these tests are mostly run on x86 platforms, and
given that the 1x4 sparse kernels are implemented in both SSE and ARM,
LinearDynamic defaults to the 1x4 pattern at the moment.
The plan is to add another diff that will allow a global override for the 8x1 pattern
so that the prepare/convert flow can work for exporting models for mobile.

Test Plan: buck run caffe2/torch/fb/model_optimization:sparsity_test

Reviewed By: z-a-f

Differential Revision: D26491944

fbshipit-source-id: b98839b4c62664e1fabbb0cbeb2e5c1bd5903b4d
2021-02-17 18:25:13 -08:00
Ansley Ussery
440fddf07b Remove unnecessary statement in capture_stderr (#52366)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52366

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D26489602

Pulled By: ansley

fbshipit-source-id: dd0db0a631840b5efd5dc48887fbf724781c6be4
2021-02-17 12:28:46 -08:00
Rohan Varma
6dabe0b291 [Dist Profiling] Enable dist profiling for DDP (gloo only) (#52031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52031

Closes https://github.com/pytorch/pytorch/issues/52020
Ensures that we can profile collectives in DDP by propagating the profiler threadLocalState appropriately. As described in the above issue, before this wouldn't work as the profiler would only be enabled on the main thread.
ghstack-source-id: 121818080

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D26356192

fbshipit-source-id: 0158b5833a3f857a0b4b2943ae3037e9d998dfd1
2021-02-17 12:21:37 -08:00
Nikita Shulga
72d1ccd3ca Revert D26263480: [Pytorch, Sparsity] Integrate sparse qnnpack operator in framework
Test Plan: revert-hammer

Differential Revision:
D26263480 (87ebaa4eb1)

Original commit changeset: 04ab60aec624

fbshipit-source-id: ad7690eebdc4b2782c2c94b5bbadbde4ef7c0627
2021-02-17 11:29:08 -08:00
Kimish Patel
87ebaa4eb1 [Pytorch, Sparsity] Integrate sparse qnnpack operator in framework
Summary:
Add QNNPACK specific packed params for sparse linear.
Add sparse linear dynamic op with appropriate registration.
Add python side LinearDynamic module for sparsity.
Add tests to validate sparse linear qnnpack kernels.
Note that since these tests are mostly run on x86 platforms, and
given that the 1x4 sparse kernels are implemented in both SSE and ARM,
LinearDynamic defaults to the 1x4 pattern at the moment.
The plan is to add another diff that will allow a global override for the 8x1 pattern
so that the prepare/convert flow can work for exporting models for mobile.

Test Plan: buck run caffe2/torch/fb/model_optimization:sparsity_test

Reviewed By: z-a-f

Differential Revision: D26263480

fbshipit-source-id: 04ab60aec624d1ecce8cfb38b79c7e94f501cdf6
2021-02-17 08:44:16 -08:00
Vasiliy Kuznetsov
bfc7e28188 reland - ns for fx - stubs of the three APIs (compare weights, activations, activations with shadow) (#52302)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52302

Adds the basic functionality for the three Numeric Suite core APIs to work on FX models:
1. comparing weights
2. comparing activations, with same input fed to both models
3. comparing activations, with nodes of A shadowing nodes of B

Note: there are a lot of TODOs in the code, and some/most of the APIs and implementation details may change as we iterate.  This is just the first PR.

Test Plan:
We have unit test coverage for all of the APIs, for now this is with toy models:

```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```

Reviewed By: raghuramank100

Differential Revision: D26463013

Pulled By: vkuzo

fbshipit-source-id: e454115099ad18e4037d3c54986951cdffcab367
2021-02-16 19:59:32 -08:00
Natalia Gimelshein
eaddadd4f7 Revert D26403094: ns for fx - stubs of the three APIs (compare weights, activations, activations with shadow)
Test Plan: revert-hammer

Differential Revision:
D26403094 (37622db76a)

Original commit changeset: 9752331d4ae0

fbshipit-source-id: f0a32d443a29b25af33d90420dfd1bada40c917c
2021-02-14 15:09:16 -08:00
Ansley Ussery
4cc10563e7 Customize traceback for calls to symbolically-traced code (#51648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51648

The following code will throw during the call to `traced(5)`:
```python
class M(nn.Module):
    def __init__(self):
        super(M, self).__init__()
        self.W = torch.nn.Parameter(torch.randn(5))

    def forward(self, x):
        return torch.dot(self.W, x)

traced = fx.symbolic_trace(M())
traced(5)
```

Traceback before:
```
Traceback (most recent call last):
  File "test/tinytest.py", line 26, in <module>
    traced(5)
  File "/home/ansley/local/pytorch/torch/fx/graph_module.py", line 338, in wrapped_call
    return self._cls_call(self, *args, **kwargs)
  File "/home/ansley/local/pytorch/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "<eval_with_key_0>", line 4, in forward
TypeError: dot(): argument 'tensor' (position 2) must be Tensor, not int
```

Traceback after:
```
Traceback (most recent call last):
  File "/home/ansley/local/pytorch/torch/fx/graph_module.py", line 338, in wrapped_call
    return torch.nn.Module.__call__(self, *args, **kwargs)
  File "/home/ansley/local/pytorch/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "<eval_with_key_1>", line 4, in forward
    dot_1 = torch.dot(w, x);  w = x = None
TypeError: dot(): argument 'tensor' (position 2) must be Tensor, not int

Call using an FX-traced Module, line 4 of the traced Module’s generated forward function:
    w = self.W
    dot_1 = torch.dot(w, x);  w = x = None

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    relu_1 = dot_1.relu();  dot_1 = None

    return relu_1
```

(Note that the same `TypeError` is thrown despite modifying the traceback.)

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D26424005

Pulled By: ansley

fbshipit-source-id: 368f46ba81fb3111bd09654825bb2ac5595207d1
2021-02-12 18:31:23 -08:00