Commit Graph

620 Commits

Author SHA1 Message Date
James Reed
9bc8f071a3 [WIP] Move torch.fx into its own target (#46658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46658

ghstack-source-id: 115213192

Test Plan: waitforsadcastle

Reviewed By: zdevito, vkuzo

Differential Revision: D24374723

fbshipit-source-id: 2b5708001f5df2ffb21ea5e586e26030653ccdcf
2020-10-29 17:03:08 -07:00
Rohan Varma
d850b5c98c Fix DDP issue where parameters share same grad_accumulator (#46755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46755

As reported in https://github.com/pytorch/pytorch/issues/41324, there is a bug in DDP when `find_unused_parameters=True` and 2 or more parameters share the same gradient accumulator.

In the reducer, we currently keep a mapping of grad accumulator to index and populate it with map[accumulator] = index, but this overwrites indices when the accumulator is the same. To fix this, switch the mapping values to a vector of indices to hold all such indices that share the same accumulator.
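
A minimal Python sketch of the bookkeeping change described above (illustrative names; the real reducer is implemented in C++):

```python
from collections import defaultdict

# Two parameters (indices 0 and 1) share the same grad accumulator.
accumulators = ["acc_w", "acc_w", "acc_b"]

# Buggy mapping: accumulator -> single index; index 0 is silently overwritten by 1.
buggy = {}
for idx, acc in enumerate(accumulators):
    buggy[acc] = idx                  # {"acc_w": 1, "acc_b": 2}

# Fixed mapping: accumulator -> list of all indices sharing that accumulator.
fixed = defaultdict(list)
for idx, acc in enumerate(accumulators):
    fixed[acc].append(idx)            # {"acc_w": [0, 1], "acc_b": [2]}

print(buggy, dict(fixed))
```
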
ghstack-source-id: 115453567

Test Plan: Added UT

Reviewed By: pritamdamania87

Differential Revision: D24497388

fbshipit-source-id: d32dfa9c5cd0b7a8df13c7873d5d28917b766640
2020-10-29 12:23:06 -07:00
Yi Wang
cab32d9cdf [RPC Framework] Support remote device format "<workername>/<device>" (#46773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46773

Changed the constructor of RemoteModule to accept a `remote_device` arg in the following format:
"<workername>/<device>" (e.g., "trainer0/cpu", "ps0/cuda:0")

This arg merges the original `on` and `device` args.
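
A hedged sketch of how such a `remote_device` string could be split into worker name and device (the helper below is hypothetical, not the actual RemoteModule code):

```python
import torch

def parse_remote_device(remote_device: str):
    """Split '<workername>/<device>' into its two parts (illustrative helper)."""
    worker_name, _, device_str = remote_device.partition("/")
    # Default to CPU if no device part is given, e.g. "trainer0".
    return worker_name, torch.device(device_str or "cpu")

print(parse_remote_device("trainer0/cpu"))   # ('trainer0', device(type='cpu'))
print(parse_remote_device("ps0/cuda:0"))     # ('ps0', device(type='cuda', index=0))
```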

Original PR issue: RemoteDevice Format #46554
ghstack-source-id: 115448051

Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule

Reviewed By: pritamdamania87

Differential Revision: D24482562

fbshipit-source-id: 5acfc73772576a4b674df27625bf560b8f8e67c1
2020-10-29 00:14:56 -07:00
Kshiteej K
5c8aad1141 [numpy] torch.cos, torch.tan : promote integer inputs to float (#46706)
Summary:
References https://github.com/pytorch/pytorch/issues/42515

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46706

Reviewed By: izdeby

Differential Revision: D24537262

Pulled By: mruberry

fbshipit-source-id: e57377a625814a3f34a765ce6bfd63a33c02a5d9
2020-10-28 22:02:52 -07:00
Jerry Zhang
c2a3951352 [quant][graphmode][fx] Remove inplace option for convert_fx (#46955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46955

Initially we were thinking of adding an `invalidate_quantized_float_parameters` option to free the memory
held by the floating point parameters after quantization, but it turns out we will do module swap just like in eager mode for the modules
that are quantized, so the old floating point module will not be referenced after quantization. Therefore this feature
is only needed for functionals; since most people use quantization with modules, we may not need it.

We'll revisit this if we find there is a need for it.

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D24579400

fbshipit-source-id: fbb0e567405dc0604a2089fc001573affdade986
2020-10-28 21:07:19 -07:00
Rohan Varma
c7183c9878 Fix object-based collectives API to use torch.cuda.current_device instead of (#46897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46897

These APIs implicitly assumed that the GPU for a rank equals the rank index, but
that is not necessarily true. For example, the first GPU could be used for a
different purpose, rank 0 could use GPU 1, rank 1 could use GPU 2, etc. Thus, we
mandate that the user specify the device to use via `torch.cuda.set_device()`
before making calls to this API. This expectation should be okay since we
clearly document it, and we expect the user to set this for
DistributedDataParallel as well.

Also adds/tidies up some documentation.
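
A minimal usage sketch under the documented expectation (process-group setup is elided; the function and device index below are illustrative):

```python
import torch
import torch.distributed as dist

def gather_results(world_size: int, my_device_index: int, result):
    # The object-based collectives use the current CUDA device, so set it
    # explicitly instead of assuming device index == rank.
    torch.cuda.set_device(my_device_index)
    gathered = [None] * world_size
    dist.all_gather_object(gathered, result)   # assumes init_process_group was called
    return gathered
```
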
ghstack-source-id: 115359633

Test Plan: Modified unittests

Reviewed By: divchenko

Differential Revision: D24556177

fbshipit-source-id: 7e826007241eba0fde3019180066ed56faf3c0ca
2020-10-28 18:12:50 -07:00
Richard Barnes
353e7f940f Ensure kernel launches are checked (#46474)
Summary:
Caffe2 and Torch currently do not have a consistent mechanism for determining whether a kernel has launched successfully. The result is difficult-to-detect or silent errors. This diff provides functionality to fix that. Subsequent diffs on the stack fix the identified issues.

Kernel launch errors may arise if launch parameters (number of blocks, number of threads, shared memory, or stream id) are invalid for the hardware, or for other reasons. Unless these launch errors are specifically checked for, CUDA will silently fail and return garbage answers, which can affect downstream computation. Therefore, catching launch errors is important.

Launches are currently checked by placing
```
AT_CUDA_CHECK(cudaGetLastError());
```
somewhere below the kernel launch. This is bad for two reasons.
1. The check may be performed at a site distant from the kernel launch, making debugging difficult.
2. The separation of the launch from the check means that it is difficult for humans and static analyzers to determine whether the check has taken place.

This diff defines a macro:
```
#define TORCH_CUDA_KERNEL_LAUNCH_CHECK() AT_CUDA_CHECK(cudaGetLastError())
```
which clearly indicates the check.

This diff also introduces a new test which analyzes code to identify kernel launches and determines whether the line immediately following the launch contains `TORCH_CUDA_KERNEL_LAUNCH_CHECK();`.

A search of the Caffe2 codebase identifies 104 instances of `AT_CUDA_CHECK(cudaGetLastError());` while the foregoing test identifies 1,467 launches which are not paired with a check. Visual inspection indicates that few of these are false positives, highlighting the need for some sort of static analysis system.
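
A rough Python sketch of the kind of source scan such a test can perform (a simplified heuristic, not the actual test code; real launches may span multiple lines):

```python
import re

LAUNCH_RE = re.compile(r"<<<.*?>>>")           # CUDA triple-chevron kernel launch
CHECK = "TORCH_CUDA_KERNEL_LAUNCH_CHECK();"

def find_unchecked_launches(cuda_source: str):
    """Return 1-based line numbers of launches whose next line lacks the check."""
    lines = cuda_source.splitlines()
    unchecked = []
    for i, line in enumerate(lines):
        if LAUNCH_RE.search(line):
            following = lines[i + 1] if i + 1 < len(lines) else ""
            if CHECK not in following:
                unchecked.append(i + 1)
    return unchecked

sample = "my_kernel<<<blocks, threads>>>(arg);\nTORCH_CUDA_KERNEL_LAUNCH_CHECK();\nother_kernel<<<1, 1>>>(arg);\n"
print(find_unchecked_launches(sample))          # [3]
```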

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46474

Test Plan:
The new test is run with:
```
buck test //caffe2/test:kernel_launch_checks -- --print-passing-details
```
And should be launched automatically with the other land tests. (TODO: Is it?)

The test is currently set up only to provide warnings but can later be adjusted to require checks.

Otherwise, I rely on the existing test frameworks to ensure that changes resulting from reorganizing existing launch checks don't cause regressions.

Reviewed By: ngimel

Differential Revision: D24309971

Pulled By: r-barnes

fbshipit-source-id: 0dc97984a408138ad06ff2bca86ad17ef2fdf0b6
2020-10-28 09:27:48 -07:00
Yang Wang
810c68fb1d [OpBench] fix jit tracing with quantized op/tensor by enabling _compare_tensors_internal to compare quantized tensors (#46772)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46772

When running `buck run caffe2/benchmarks/operator_benchmark/pt:qactivation_test -- --use_jit`, I encountered the following error P146518683. The error was traced down to the fact that `torch.allclose` does not work with quantized tensors (the error was triggered by this particular multiplication https://fburl.com/diffusion/8vw647o6, since native mul cannot work with a float scalar and a quantized tensor).

Minimum example to reproduce:
```
(Pdb) input = torch.ones(5)
(Pdb) aa = torch.quantize_per_tensor(input, scale=1.0, zero_point=0, dtype=torch.quint8)
(Pdb) bb = torch.quantize_per_tensor(input, scale=1.0, zero_point=0, dtype=torch.quint8)
(Pdb) torch.allclose(aa, bb)
Comparison exception: 	promoteTypes with quantized numbers is not handled yet; figure out what the correct rules should be, offending types: QUInt8 Float
```

Here the proposed fix is to compare quantized tensors strictly within `_compare_tensors_internal`.

The other two possible fixes are:
1. Convert quantized tensors to float tensors first before sending them to `torch.allclose`.
2. Change `torch.allclose` to handle quantized tensors.
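
A hedged sketch of the strict-comparison idea proposed above (an illustrative helper, not the actual `_compare_tensors_internal` code):

```python
import torch

def quantized_tensors_equal(a, b):
    """Strict comparison of two per-tensor quantized tensors (illustrative)."""
    return (a.qscheme() == b.qscheme()
            and a.q_scale() == b.q_scale()
            and a.q_zero_point() == b.q_zero_point()
            and torch.equal(a.int_repr(), b.int_repr()))

x = torch.quantize_per_tensor(torch.ones(5), scale=1.0, zero_point=0, dtype=torch.quint8)
y = torch.quantize_per_tensor(torch.ones(5), scale=1.0, zero_point=0, dtype=torch.quint8)
print(quantized_tensors_equal(x, y))   # True
```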

Test Plan: buck run caffe2/benchmarks/operator_benchmark/pt:qactivation_test -- --use_jit

Reviewed By: kimishpatel

Differential Revision: D24506723

fbshipit-source-id: 6426ea2a88854b4fb89abef0edd2b49921283796
2020-10-27 18:53:13 -07:00
kshitij12345
21e60643c0 [numpy] torch.log{2,10} : promote integer inputs to float (#46810)
Summary:
References https://github.com/pytorch/pytorch/issues/42515

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46810

Reviewed By: izdeby

Differential Revision: D24536187

Pulled By: mruberry

fbshipit-source-id: b7dd7678d4e996f3dea0245c65055654e02be459
2020-10-27 13:07:44 -07:00
Pritam Damania
adafd3d4b2 Support RRef.backward() for local RRefs. (#46568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46568

This PR adds support for an RRef.backward() API. This would be useful
in applications like pipeline parallelism as described here:
https://github.com/pytorch/pytorch/issues/44827

This PR only adds support for local RRefs, remote RRef support will be added in
a follow up PR.
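
A small usage sketch of the new API on a local RRef (hedged; RPC initialization is elided):

```python
import torch
import torch.distributed.rpc as rpc

def local_rref_backward():
    # Assumes rpc.init_rpc(...) has already been called on this worker.
    t = torch.rand(2, 2, requires_grad=True)
    rref = rpc.RRef((t * t).sum())   # local RRef owning a scalar loss
    rref.backward()                  # runs autograd through the RRef's value
    return t.grad                    # populated by the call above
```
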
ghstack-source-id: 115100729

Test Plan:
1) unit tests.
2) waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D24406311

fbshipit-source-id: fb0b4e185d9721bf57f4dea9847e0aaa66b3e513
2020-10-26 17:31:17 -07:00
anjali411
13a5be571b Enable complex backward for torch.take() and tensor.fill_() (#46860)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46860

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D24544601

Pulled By: anjali411

fbshipit-source-id: 4e29d48da30da3630cb558ccee464d89780b1ab7
2020-10-26 15:46:08 -07:00
Oscar Sandoval
58ed60c259 Added context manager enabling all futures returned by rpc_async and custom build rpc functions to be automatically waited on (#41807)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41807

Test Plan: Make sure ci tests pass, including newly written test

Reviewed By: mrshenli

Differential Revision: D22640839

Pulled By: osandoval-fb

fbshipit-source-id: 3ff98d8e8c6e6d08575e307f05b5e159442d7216
2020-10-26 12:53:35 -07:00
anjali411
d94bd998ec Update backward formulas (Re #44444) (#46275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46275

Re #44444

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D24285785

Pulled By: anjali411

fbshipit-source-id: c60ecd4fe4f144132085f2c91d3b950e92b2a491
2020-10-25 19:40:59 -07:00
Guilherme Leobas
789e935304 Annotate torch.nn.cpp (#46490)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46489

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46490

Reviewed By: zhangguanheng66

Differential Revision: D24509519

Pulled By: ezyang

fbshipit-source-id: edffd32ab2ac17ae4bbd44826b71f5cb9f1da1c5
2020-10-23 17:40:32 -07:00
Nikita Vedeneev
c31ced4246 make torch.lu differentiable. (#46284)
Summary:
As per title. Limitations: only for batches of square full-rank matrices.
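
A small sketch of what becomes possible (hedged; assumes a well-conditioned, square, full-rank input per the stated limitation):

```python
import torch

A = torch.randn(3, 3, dtype=torch.double)
A = A @ A.t() + 3 * torch.eye(3, dtype=torch.double)   # well-conditioned, full rank
A.requires_grad_(True)

LU, pivots = torch.lu(A)    # LU factorization, now differentiable w.r.t. A
LU.sum().backward()
print(A.grad.shape)         # torch.Size([3, 3])
```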

CC albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46284

Reviewed By: zou3519

Differential Revision: D24448266

Pulled By: albanD

fbshipit-source-id: d98215166268553a648af6bdec5a32ad601b7814
2020-10-23 10:13:46 -07:00
Supriya Rao
e34c825b77 [quant][fx] Embedding quantization support (#46677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46677

Add support for weight only embedding quantization

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_qembedding_module

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D24463305

fbshipit-source-id: 2dba49d8a77cf237a8e6da2efdd83b1ebdc432d6
2020-10-22 17:59:52 -07:00
Alexander Grund
93719440b8 Replace map(lambda constructs (#46462)
Summary:
Follow-up of https://github.com/pytorch/pytorch/issues/46461 with a similar goal

Makes them more readable and possibly faster. Care has to be taken when choosing the replacement: a list comprehension `[f(x) for x in xs]` builds the whole list immediately, while a generator expression `(f(x) for x in xs)` is evaluated lazily, like `map` itself. The lazy form is a benefit in cases where the list of values never needs to exist in memory (e.g. when the result is passed to `tuple`, `extend`, or `join`).
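
A couple of before/after examples in the spirit of this change (illustrative, not taken from the diff):

```python
xs = [1, 2, 3, 4]

# Before: map with a lambda.
squares = list(map(lambda x: x * x, xs))

# After: a list comprehension, equivalent and easier to read.
squares = [x * x for x in xs]

# When the values are consumed once, a generator expression avoids
# materializing the intermediate list at all.
joined = ", ".join(str(x * x) for x in xs)
print(squares, joined)
```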

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46462

Reviewed By: zou3519

Differential Revision: D24422343

Pulled By: ezyang

fbshipit-source-id: 252e33499c92ac0b15238f2df32681dbbda2b237
2020-10-22 09:50:22 -07:00
Rohan Varma
25dc0056f2 [RPC] print exception message on workers that run python functions (#46372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46372

Currently, in `_run_function`, we catch an exception from the python
function which is run, and report it back to the master. However in some large
scale training jobs, it would be valuable to also log the error on the trainer
itself for faster debugging.

Test Plan: Added unittest.

Reviewed By: pritamdamania87

Differential Revision: D24324578

fbshipit-source-id: 88460d7599ea69d2c38fd9c10eb6471f7edd4100
2020-10-22 09:44:15 -07:00
Ivan Kobzarev
3112e23428 [py][vulkan][reland] Add is_vulkan to py api, add vulkan to device type parsing (#46655)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46655

Test Plan: Imported from OSS

Pulled By: IvanKobzarev

Reviewed By: mrshenli

Differential Revision: D24448984

fbshipit-source-id: 5000846a06077f7a5a06dd51da422d2a42f70820
2020-10-22 09:35:50 -07:00
Rohan Varma
7245d2c939 Avoid scatter for single-device case in DDP (#46304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46304

In the case that a single process operates only on one GPU, we can
avoid this scatter and instead replace it with a recursive version of `to`
which transfers the input tensors to the correct device.

The implementation of `_recursive_to` is modeled after `scatter` in https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/scatter_gather.py, in order to keep parity with the previous conventions (i.e. custom types not having their tensors moved).
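
A hedged sketch of the recursive-move idea (a simplified stand-in for `_recursive_to`, not the actual implementation):

```python
import torch

def recursive_to(obj, device):
    """Move tensors nested in lists/tuples/dicts to `device` (simplified sketch)."""
    if isinstance(obj, torch.Tensor):
        return obj.to(device)
    if isinstance(obj, (list, tuple)):
        return type(obj)(recursive_to(o, device) for o in obj)
    if isinstance(obj, dict):
        return {k: recursive_to(v, device) for k, v in obj.items()}
    return obj   # custom types are left untouched, mirroring scatter's convention

batch = {"x": torch.ones(2), "meta": [torch.zeros(1), "label"]}
print(recursive_to(batch, "cpu"))
```
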
ghstack-source-id: 114896677

Test Plan: Added unittest, and CI

Reviewed By: pritamdamania87

Differential Revision: D24296377

fbshipit-source-id: 536242da05ecabfcd36dffe14168b1f2cf58ca1d
2020-10-22 08:29:37 -07:00
kshitij12345
8e13fe6c44 [numpy] torch.sin : support and promote integer inputs to float (#45733)
Summary:
References https://github.com/pytorch/pytorch/issues/42515

> Enable integer -> float unary type promotion for ops like sin

Will follow-up for other such Ops once this PR is merged.
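
A quick illustration of the promotion behavior (the output dtype shown assumes the default float dtype):

```python
import torch

x = torch.arange(4)      # int64 tensor: [0, 1, 2, 3]
y = torch.sin(x)         # integer input is promoted to float
print(x.dtype, y.dtype)  # torch.int64 torch.float32
```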

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45733

Reviewed By: zou3519

Differential Revision: D24431194

Pulled By: mruberry

fbshipit-source-id: db600bc5de0e535b538d2aa301c3526b7c75ed17
2020-10-22 01:58:57 -07:00
Xiao Wang
fe4f90c40b Cusolver inverse check info (#46625)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46557

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46625

Reviewed By: zou3519

Differential Revision: D24438577

Pulled By: ngimel

fbshipit-source-id: d00e6eb2eae4aa39ca6ecf5914fe9cf37c24b906
2020-10-21 21:46:33 -07:00
Rahul Nambiar
adbb50ea67 Enabling alias annotation checks for all operations during autograd tests (#46601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46601

* except excluded tests and magic methods.

https://github.com/pytorch/pytorch/issues/38731

Previously, we'd only run these tests for inplace operations. Since this runs a lot more tests, the following issues that came up when running them were fixed:
- Updated schema of conj() to reflect existing behaviour.
- Updated deepEquals method in check_alias_annotation.cpp to re-use the overloaded == operator. Previous implementation did not cover all types of IValues.
- Corrected the order inputs are passed in during autograd testing of 'view' & 'reshape'.
- Subbed out aten::ger with the func it's aliased to, aten::outer, for testing. The alias annotation checking code doesn't handle aliased operators properly.
ghstack-source-id: 114830903

Test Plan: Ran all tests in test:jit and verified they pass.

Reviewed By: eellison

Differential Revision: D24424955

fbshipit-source-id: 382d7e2585911b81b1573f21fff1d54a5e9a2054
2020-10-21 20:01:57 -07:00
Howard Huang
611f028168 Add Batch-Updating Parameter Server Example to CI Tests (#46510)
Summary:
Resolves one item in https://github.com/pytorch/pytorch/issues/46321

This PR sets up DistExamplesTest which will be used as the class to implement future tests for examples. This class is run as part of CI tests. It also creates a dist_examples folder and includes the [batch server example](https://github.com/pytorch/examples/blob/master/distributed/rpc/batch/parameter_server.py), which is slightly modified to allow it to be tested.

Run test:
pytest test/distributed/rpc/test_tensorpipe_agent.py -k test_batch_updating_parameter_server -vs
pytest test/distributed/rpc/test_process_group_agent.py -k test_batch_updating_parameter_server -vs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46510

Reviewed By: mrshenli

Differential Revision: D24379296

Pulled By: H-Huang

fbshipit-source-id: 1c102041e338b022b7a659a51894422addc0e06f
2020-10-21 08:46:46 -07:00
Jerry Zhang
f9446cb15a [quant][refactor] Remove register api and rename get_*_mapping to get_default_*_mapping (#46337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46337

We plan to pass around the mappings instead of using global registration api to keep
the mappings local to the transformations user is performing

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24317436

fbshipit-source-id: 81569b88f05eeeaa9595447e482a12827aeb961f
2020-10-20 15:53:47 -07:00
Jane Xu
0d4590c279 renaming env var IN_CIRCLECI to a broader name of IN_CI (#46567)
Summary:
The `IN_CIRCLECI` variable is a misnomer: the flag really indicates that we enable XML reporting because the test is running in CI. Since this doesn't necessarily mean CircleCI in particular, IN_CI is more accurate and general.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46567

Reviewed By: walterddr

Differential Revision: D24407642

Pulled By: janeyx99

fbshipit-source-id: 5e141a0571b914310a174a58ac0fde58e9521c6b
2020-10-20 08:25:39 -07:00
Alexander Grund
5b0f400488 Replace list(map(...)) constructs by list comprehensions (#46461)
Summary:
As discussed in https://github.com/pytorch/pytorch/issues/46392 this makes the code more readable and possibly more performant.

It also fixes a bug detected by this where the argument order of `map` was confused: 030a24906e (diff-5bb26bd3a23ee3bb540aeadcc0385df2a4e48de39f87ed9ea76b21990738fe98L1537-R1537)

Fixes https://github.com/pytorch/pytorch/issues/46392

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46461

Reviewed By: ailzhang

Differential Revision: D24367015

Pulled By: ezyang

fbshipit-source-id: d55a67933cc22346b00544c9671f09982ad920e7
2020-10-19 18:42:49 -07:00
Jerry Zhang
30d687522d [reland][quant][eagermode] Move custom_module registration to prepare/convert_custom_config_dict (#46293) (#46364)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46364

Test Plan:
Imported from OSS

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24322747

fbshipit-source-id: 4801ba1835fc805bf767fe9810b9edfa2ceeefb4
2020-10-19 15:21:00 -07:00
Nikita Shulga
172ed51a17 Mark parts of spectral tests as slow (#46509)
Summary:
According to https://app.circleci.com/pipelines/github/pytorch/pytorch/228154/workflows/31951076-b633-4391-bd0d-b2953c940876/jobs/8290059
TestFFTCUDA.test_fftn_backward_cuda_complex128 takes 242 seconds to finish, with most of the time spent checking the 2nd gradient.

- Refactor the common part of test_fft_backward and test_fftn_backward into _fft_grad_check_helper
- Introduce a `slowAwareTest` decorator
- Split the test into fast and slow parts by checking the 2nd degree gradient only during the slow part

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46509

Reviewed By: walterddr

Differential Revision: D24378901

Pulled By: malfet

fbshipit-source-id: 606670c2078480219905f63b9b278b835e760a66
2020-10-19 10:11:46 -07:00
Rohan Varma
5c5484c889 [Flaky tests] Fix test_all_gather_timeout test (#45989)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45989

This test was failing internally for the Thrift-based RPC agent, since
it has a different error regex. Use `self.get_timeout_error_regex` which gets
the timeout error string for each backend to fix this.
ghstack-source-id: 114463458

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D24170394

fbshipit-source-id: 9b30945e3e30f36472268d042173f8175ad88098
2020-10-16 09:02:46 -07:00
Xiong Wei
7b788d113e Fix deprecated warnings for nan_to_num (#46309)
Summary:
Related to https://github.com/pytorch/pytorch/issues/44592

This PR is to fix the deprecated warnings for the nan_to_num function.

Below is the warning message when building the latest code.
```
../aten/src/ATen/native/UnaryOps.cpp: In function ‘at::Tensor& at::native::nan_to_num_out(at::Tensor&,
const at::Tensor&, c10::optional<double>, c10::optional<double>, c10::optional<double>)’:
../aten/src/ATen/native/UnaryOps.cpp:397:45: warning: ‘bool c10::isIntegralType(c10::ScalarType)’
is deprecated: isIntegralType is deprecated.
Please use the overload with 'includeBool' parameter instead. [-Wdeprecated-declarations]
   if (c10::isIntegralType(self.scalar_type())) {
```

The deprecated warning is defined in `ScalarType.h`.
d790ec6de0/c10/core/ScalarType.h (L255-L260)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46309

Reviewed By: mrshenli

Differential Revision: D24310248

Pulled By: heitorschueroff

fbshipit-source-id: 0f9f2ad304eb5a2da9d2b415343f2fc9029037af
2020-10-16 06:07:14 -07:00
Rong Rong
d1745c36dc fix type check for torch.quantization._numeric_suite (#46330)
Summary:
fix https://github.com/pytorch/pytorch/issues/42977

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46330

Reviewed By: malfet

Differential Revision: D24320449

Pulled By: walterddr

fbshipit-source-id: f892b5c83cb932aee53245d6c825568b3e05f3c6
2020-10-15 20:45:07 -07:00
Ivan Yashchuk
c1141b6f68 Added support for complex torch.pinverse (#45819)
Summary:
This PR adds support for complex-valued input for `torch.pinverse`.
Fixed the CUDA SVD implementation to return singular values with a real dtype.

Fixes https://github.com/pytorch/pytorch/issues/45385.
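
A short example of the newly supported call (a hedged sketch; a random tall matrix is full rank almost surely):

```python
import torch

A = torch.randn(4, 3, dtype=torch.complex128)
A_pinv = torch.pinverse(A)
# For a full-column-rank A, pinverse(A) @ A should be close to the identity.
print(torch.allclose(A_pinv @ A, torch.eye(3, dtype=torch.complex128), atol=1e-10))
```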

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45819

Reviewed By: heitorschueroff

Differential Revision: D24306539

Pulled By: anjali411

fbshipit-source-id: 2fe19bc630de528e0643132689e1bc5ffeaa162a
2020-10-15 12:28:22 -07:00
Xiaomeng Yang
a87a1c1103 Fix performance issue of GroupNorm on CUDA when feature map is small. (#46170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46170

Fix performance issue of GroupNorm on CUDA when feature map is small.

Benchmark script:

```
import torch
import torch.nn.functional as F

from timeit import Timer

norm = torch.nn.GroupNorm(8, 512).cuda()

num = 5000

sizes = [(1024, 512, 14, 14), (1024, 512, 7, 7), (1024, 512)]

def forward(x):
    _ = norm(x)
    torch.cuda.synchronize()

def backward(y, grad):
    y.backward(grad, retain_graph=True)
    torch.cuda.synchronize()

if __name__ == "__main__":
    # warm up
    x = torch.rand(*(sizes[0]), dtype=torch.float,
                   device="cuda", requires_grad=True)
    for _ in range(100):
        forward(x)

    for size in sizes:
        x = torch.rand(*size, dtype=torch.float,
                       device="cuda", requires_grad=True)
        t = Timer("forward(x)", "from __main__ import forward, x")
        print(f"size = {size}:")
        t1 = t.timeit(num) / num * 1e6
        print(f"avg_forward_time =  {t1}us")

        y = norm(x)
        grad = torch.randn_like(y)
        t = Timer("backward(y, grad)", "from __main__ import backward, y, grad")
        t2 = t.timeit(num) / num * 1e6
        print(f"avg_backward_time = {t2}us")
```
Benchmark result before this Diff:
```
size = (1024, 512, 14, 14):
avg_forward_time =  1636.729855206795us
avg_backward_time = 5488.682465581223us
size = (1024, 512, 7, 7):
avg_forward_time =  465.88476160541177us
avg_backward_time = 3129.9425506033003us
size = (1024, 512):
avg_forward_time =  96.90486900508404us
avg_backward_time = 2319.4099438143894us
```

Benchmark result after this Diff:
```
size = (1024, 512, 14, 14):
avg_forward_time =  1635.6191572034732us
avg_backward_time = 4140.7730475999415us
size = (1024, 512, 7, 7):
avg_forward_time =  463.6513736099005us
avg_backward_time = 1641.7451039887965us
size = (1024, 512):
avg_forward_time =  66.59087920561433us
avg_backward_time = 128.6882139975205us

```

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "GroupNorm"

Reviewed By: hl475, houseroad

Differential Revision: D24242738

fbshipit-source-id: b52c82d7b6e47855c48fa8ceacd0c55d03bb92d5
2020-10-14 23:34:33 -07:00
Meghan Lele
75bf5f2b59 [JIT] Improve class type annotation inference (#45940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45940

**Summary**
In `try_ann_to_type`, if an annotation has an attribute named
`__torch_script_class__`, it is assumed to be a TorchScript class that
has already been scripted. However, if it is a class that extends
another class, this code path causes a crash because it looks up the
JIT type for the class by name in the compilation unit. This JIT type
obviously cannot exist because inheritance is not supported.

This commit fixes this by looking up the qualified name of a class
in torch.jit._state._script_class in order to ascertain whether it has
already been scripted (instead of looking for a `__torch_script_class__`
attribute on the class object).

**Test Plan**
This commit adds a unit test consisting of the code sample from the
issue that reported this problem.

**Fixes**
This commit fixes #45860.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D24310027

Pulled By: SplitInfinity

fbshipit-source-id: 9f8225f3316fd50738d98e3544bf5562b16425b6
2020-10-14 23:28:47 -07:00
HyunJun
a69910868a Fix possible padding length overflow in DistributedSampler (#45329)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45324

This fix handles the case where `len(dataset) * 2 < num_replicas` in DistributedSampler (which previously resulted in an error).
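
A hedged sketch of the padding logic for that case (indices are cycled as many times as needed instead of assuming one extra copy suffices; helper name is illustrative):

```python
import math

def pad_indices(indices, total_size):
    """Pad `indices` up to `total_size` by cycling through them (sketch)."""
    padding_size = total_size - len(indices)
    if padding_size <= len(indices):
        return indices + indices[:padding_size]
    # Dataset much smaller than total_size: repeat the indices enough times.
    return indices + (indices * math.ceil(padding_size / len(indices)))[:padding_size]

print(pad_indices([0, 1], 7))   # [0, 1, 0, 1, 0, 1, 0]
```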

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45329

Reviewed By: mruberry

Differential Revision: D24205035

Pulled By: rohan-varma

fbshipit-source-id: f94329d9c1e7deaee41e5af319e7c7d0c741910c
2020-10-14 17:19:44 -07:00
Mike Ruberry
ff0af7242b Revert D24290811: [quant][eagermode] Move custom_module registration to prepare/convert_custom_config_dict
Test Plan: revert-hammer

Differential Revision:
D24290811 (3ad797c937)

Original commit changeset: 7d2aee98e194

fbshipit-source-id: 24013e92044f2a1b36b1a9f475bbaa6f17bdaa11
2020-10-14 16:42:55 -07:00
Jerry Zhang
3ad797c937 [quant][eagermode] Move custom_module registration to prepare/convert_custom_config_dict (#46293)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46293

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D24290811

fbshipit-source-id: 7d2aee98e1946c2a4268efb94443f1e5daaa793e
2020-10-14 12:10:37 -07:00
ashish
5500b62f28 Enable zero batch conv tests for ROCm (#46305)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26669

This PR enables convolution tests for zero batch size implemented in https://github.com/pytorch/pytorch/pull/26214/.

jamesr66a jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46305

Reviewed By: navahgar

Differential Revision: D24307981

Pulled By: heitorschueroff

fbshipit-source-id: dfc595fa855ae084b60a693e209b0fdcc714221d
2020-10-14 11:36:30 -07:00
Brian Hirsh
1f791c06f0 adding BAND/BOR/BXOR reduce ops to unsupported list for complex numbers. added tests (#46270)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46270

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D24284702

Pulled By: bdhirsh

fbshipit-source-id: 7e6c3fce83a4367808a638f0400999399b2c35b0
2020-10-14 08:48:14 -07:00
Pritam Damania
f89498f3f8 Allow RPC framework to use rank in addition to WorkerInfo and name. (#46221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46221

The RPC framework only allowed sending RPCs based on provided
WorkerInfo or name. When using RPC with DDP, sometimes it might just be easier
to refer to everything in terms of ranks since DDP doesn't support names yet.

As a result, supporting a `to` parameter in the RPC APIs which allows for
specifying a rank as well would be helpful.
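
A small usage sketch (hedged; RPC initialization is elided):

```python
import torch
import torch.distributed.rpc as rpc

def add_on_rank(rank: int, t: torch.Tensor):
    # `to` may now be a rank (int) in addition to a worker name or WorkerInfo.
    result = rpc.rpc_sync(rank, torch.add, args=(t, t))
    fut = rpc.rpc_async(rank, torch.mul, args=(t, 3))
    return result, fut.wait()

# Example (assuming rpc.init_rpc(...) has been called and rank 1 exists):
#   add_on_rank(1, torch.ones(2))
```
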
ghstack-source-id: 114207172

Test Plan:
1) waitforbuildbot
2) Unit Tests

Reviewed By: mrshenli

Differential Revision: D24264989

fbshipit-source-id: 5edf5d92e2bd2f213471dfe7c74eebfa9efc9f70
2020-10-13 17:52:54 -07:00
Edward Yang
2118d58d45 Add some more docs to expecttest. (#46263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46263

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: robieta

Differential Revision: D24281640

Pulled By: ezyang

fbshipit-source-id: 88c5b3bf091f47b69ce58aa321669158c5afda79
2020-10-13 15:17:11 -07:00
Brian Hirsh
a3caa719af fix #45552 - adding add_done_callback(fn) to torch.futures.Future (#45675)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45675
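
A quick usage sketch of the added method:

```python
import torch

fut = torch.futures.Future()
# The callback receives the completed future itself.
fut.add_done_callback(lambda f: print("done with value:", f.value()))
fut.set_result(42)   # triggers the callback
```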

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D24055353

Pulled By: bdhirsh

fbshipit-source-id: 9233c8e17acc878f0fecbe740a4397fb55cf722f
2020-10-13 07:47:36 -07:00
Erjia Guan
bed3b40523 Implement ravel (#46098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46098

Doc:
![image](https://user-images.githubusercontent.com/68879799/95611323-ae5cf380-0a2f-11eb-9b8e-56bf79ce68af.png)
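
A quick usage example of the new op:

```python
import torch

x = torch.arange(6).reshape(2, 3)
y = torch.ravel(x)         # flattens to a contiguous 1-D tensor
print(y)                   # tensor([0, 1, 2, 3, 4, 5])
print(x.ravel().shape)     # torch.Size([6]) -- also available as a method
```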

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D24253213

Pulled By: ejguan

fbshipit-source-id: 42a866c902272cbe3743a9d0cb3afb9165d51c0b
2020-10-12 16:00:44 -07:00
Brian Hirsh
c02efdefa8 adding complex support for distributed functions. Fixes #45760 (#45879)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45879

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D24127949

Pulled By: bdhirsh

fbshipit-source-id: 8061b14fa1c0adbe22b9397c2d7f92618556d223
2020-10-12 12:44:47 -07:00
Mingzhe Li
281463ba0b [NCCL] Enable send/recv tests (#45994)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45994

Send/Recv tests were disabled because of https://github.com/pytorch/pytorch/issues/42517. With that issue fixed, this diff enables those tests.
ghstack-source-id: 113970569

Test Plan: waitforsandcastle

Reviewed By: jiayisuse

Differential Revision: D24172484

fbshipit-source-id: 7492ee2e9bf88840c0d0086003ce8e99995aeb91
2020-10-09 15:00:39 -07:00
Rohan Varma
62554a3bd2 Prioritize raising error message about unused parameters when rebuild_buckets fails (#45933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45933

Occasionally users run DDP with models that have unused params; in this
case we would like to surface an error message telling them to run with
find_unused_parameters=True. However, a recent change to the rebuild_buckets logic (https://github.com/pytorch/pytorch/pull/44798) made
it so that we raise a size mismatch error when this happens, but the
information about unused parameters is more useful and is likely the
most common cause of failure. Prefer raising this error over the
subsequent size mismatch errors.
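
A hedged sketch of the situation the error message points users toward (process-group setup elided; the model is illustrative):

```python
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 4)
        self.unused = nn.Linear(4, 4)   # never touched in forward

    def forward(self, x):
        return self.used(x)

net = Net()
if dist.is_initialized():               # DDP needs an initialized process group
    model = DDP(net, find_unused_parameters=True)   # required for models like this
```
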
ghstack-source-id: 113914759

Test Plan: Added unittest

Reviewed By: mrshenli

Differential Revision: D24151256

fbshipit-source-id: 5d349a988b4aac7d3e0ef7b3cd84dfdcbe9db675
2020-10-09 09:16:45 -07:00
Nikita Shulga
f363a2e106 Mark top 3 slowest tests as slow (#46068)
Summary:
`TCPStoreTest.test_numkeys_delkeys` takes 5+ min (mostly in idle wait for socket timeout)
`TestDataLoader.test_proper_exit` and `TestDataLoaderPersistentWorkers.test_proper_exit` take 2.5 min each
`TestXNNPACKConv1dTransformPass.test_conv1d_with_relu_fc` takes 2 min to finish

Add option to skip reporting test classes that run for less than a second to `print_test_stats.py` and speed up `TestTorchDeviceTypeCUDA.test_matmul_45724_cuda`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46068

Reviewed By: mruberry

Differential Revision: D24208660

Pulled By: malfet

fbshipit-source-id: 780e0d8be4f0cf69ea28de79e423291a1f3349b7
2020-10-08 21:10:03 -07:00
Mingzhe Li
b7f7378b2d [NCCL] support send/recv to/from self when communicator is created on demand (#45873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45873

This diff adds support for sending/receiving to/from self. It also fixes a bug where p2p operations are not used by all processes.
ghstack-source-id: 113910526

Test Plan: waitforsandcastle

Reviewed By: jiayisuse

Differential Revision: D24124413

fbshipit-source-id: edccb830757ac64f569e7908fec8cb2b43cd098d
2020-10-08 19:19:15 -07:00
Shen Li
96d48178c8 Make pipeWrite and pipeRead noexcept (#45783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45783

After the previous device maps commits, `pipeWrite` might throw. In
this case, if we increment active calls before `pipeWrite` on the
caller, that active call won't be decremented properly when `pipeWrite`
throws. As a result, `shutdown` can silently timeout. I noticed this
as some tests take more than 60s to finish.

This commit extracts the tensor device checking logic out of pipeWrite
and makes sure the error is thrown before the active call count is
incremented.

Differential Revision: D24094803

Test Plan: Imported from OSS

Reviewed By: mruberry

Pulled By: mrshenli

fbshipit-source-id: d30316bb23d2afd3ba4f5540c3bd94a2ac10969b
2020-10-08 18:53:51 -07:00