Commit Graph

Meghan Lele
05b802d4e0 [pytorch] Bring back RemoveInplaceOps() (#62200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62200

This commit brings back the `RemoveInplaceOps` pass removed in D29523283 (dec5aa2260) that apparently had a bunch of internal users.

Test Plan: danthe3rd

Reviewed By: danthe3rd

Differential Revision: D29833316

fbshipit-source-id: 6cf13d463ab0a5e50ba3eb3243f79a9c51623809
2021-07-28 12:00:38 -07:00
Raghavan Raman
b91a917616 [Static Runtime] Fixed another build failure in OSS due to test_utils.h (#62338)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62338

Test Plan: Imported from OSS

Reviewed By: d1jang

Differential Revision: D29965744

Pulled By: navahgar

fbshipit-source-id: cf3e54ac13432ea8afc4b718fac6c9768743d01b
2021-07-28 11:41:33 -07:00
Thomas J. Fan
7c588d5d00 ENH Adds no_batch_dim support for pad 2d and 3d (#62183)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62183

Reviewed By: ejguan

Differential Revision: D29942250

Pulled By: jbschlosser

fbshipit-source-id: d1df4ddcb90969332dc1a2a7937e66ecf46f0443
2021-07-28 11:10:44 -07:00
zhouzhuojie
6da4a25509 Use private squid proxy (#62244)
Summary:
This PR adds a **private** squid proxy (note that the internal ELB is only accessible from the private VPC subnets of GitHub Runners), deployed specifically for PyTorch CI on GitHub runners.

```
dig $SQUID_PROXY

10.0.x.x
10.0.x.x
```

http_proxy and https_proxy are compatible with the following http clients:

- curl
- wget
- python

Existing cache policy:

```
refresh_pattern -i .(7z|deb|rpm|exe|zip|tar|tgz|gz|ram|rar|bin|tiff|bz2|run|csv|sh)$ 1440 80% 2880
```

It uses the standard squid `refresh_pattern` to cache requests. In our setup, objects are cached for at least 1440 minutes (1 day) and at most 2880 minutes (2 days), with a last-modified factor of 80% (see the squid docs). Please refer to pytorch/test-infra for details.
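The refresh_pattern heuristic described above can be sketched in plain Python. This is a simplified model of squid's assumed semantics (minutes throughout), not squid's actual implementation:

```python
def is_fresh(age, lm_age, min_minutes=1440, percent=0.8, max_minutes=2880):
    """Squid-style refresh_pattern heuristic (sketch, assumed semantics).

    age:    minutes since the object entered the cache
    lm_age: minutes between the object's Last-Modified time and its retrieval
    """
    if age <= min_minutes:
        return True   # younger than min: always fresh
    if age >= max_minutes:
        return False  # older than max: always stale
    # in between: fresh while age is within percent of the last-modified age
    return age <= percent * lm_age
```

So an object that changed long ago (large `lm_age`) stays cached longer than one that changed recently.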

Right now, it only applies to the build and test step, to limit the scope and make sure build and test are more reliable with egress cache.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62244

Test Plan:
```
# first time, cache miss (4min20s)
http_proxy=$SQUID_PROXY https_proxy=$SQUID_PROXY curl -v -L http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz --output /tmp/tmp_mnist.zip
100 9680k  100 9680k    0     0  37836      0  0:04:21  0:04:21 --:--:-- 29908

# second time, cache hit (0s)
http_proxy=$SQUID_PROXY https_proxy=$SQUID_PROXY curl -v -L http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz --output /tmp/tmp_mnist.zip
100 9680k  100 9680k    0     0   103M      0 --:--:-- --:--:-- --:--:--  103M
```

Load Test Plan:
```
# ab load test with `-n 100` requests
ab -X $SQUID_PROXY -n 100 http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz

Concurrency Level:      1
Time taken for tests:   9.044 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      991326300 bytes
HTML transferred:       991242200 bytes
Requests per second:    11.06 [#/sec] (mean)
Time per request:       90.442 [ms] (mean)
Time per request:       90.442 [ms] (mean, across all concurrent requests)
Transfer rate:          107040.50 [Kbytes/sec] received
```

Reviewed By: malfet

Differential Revision: D29928698

Pulled By: zhouzhuojie

fbshipit-source-id: 4ee78be0abe35411666c6121991b0addded57106
2021-07-28 10:37:42 -07:00
Yi Wang
2581dfc249 [Model Averaging] Create a base class for model averaging (#62111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62111

This base class will be passed to the post-localSGD optimizer in the next PR. This way, the same post-localSGD optimizer can choose different model averaging algorithms.
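A base class along these lines might look like the following minimal sketch. Class and method names here are hypothetical stand-ins, not the actual API landed in this commit, and the all-reduce is simulated with a plain list average:

```python
from abc import ABC, abstractmethod

class ModelAverager(ABC):
    """Hypothetical base class: concrete subclasses implement one
    model-averaging strategy that a post-localSGD optimizer can call."""

    def __init__(self, period):
        self.period = period  # average every `period` steps
        self.step = 0

    @abstractmethod
    def average_parameters(self, params):
        """Average `params` across workers; return the (possibly) new values."""

class PeriodicModelAverager(ModelAverager):
    def average_parameters(self, params):
        self.step += 1
        if self.step % self.period != 0:
            return params  # not an averaging step
        # stand-in for an all-reduce: average a list of per-worker values
        mean = sum(params) / len(params)
        return [mean] * len(params)
```

The optimizer only ever talks to the base-class interface, so swapping in a different averaging schedule means subclassing, not changing the optimizer.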

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 134489187

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

Reviewed By: rohan-varma

Differential Revision: D29884954

fbshipit-source-id: 1dc5e35c58895902991567f633afd621c7108938
2021-07-28 10:15:36 -07:00
Howard Huang
a15fff0a7f Revert D29794666: Remove faulty process group code
Test Plan: revert-hammer

Differential Revision:
D29794666 (afe3644321)

Original commit changeset: 0b35191cc072

fbshipit-source-id: 6467bc5100f4115f2fdb385e205740cd68c89743
2021-07-28 10:15:34 -07:00
Thomas J. Fan
71a6ef17a5 ENH Adds no_batch_dim tests/docs for Maxpool1d & MaxUnpool1d (#62206)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62206

Reviewed By: ejguan

Differential Revision: D29942341

Pulled By: jbschlosser

fbshipit-source-id: a3fad774cee30478f7d6cdd49d2eec31be3fc518
2021-07-28 10:15:32 -07:00
Jerry Zhang
cdf85a82ed [quant][graphmode][fx] Add reference pattern support for BatchNorm (#62215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62215

including batchnorm2d, batchnorm3d, batchnormrelu2d and batchnormrelu3d

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29917524

fbshipit-source-id: 3a9520ff659cb21e6e2fe614973b3d08aa0af923
2021-07-28 10:14:16 -07:00
leslie-fang-intel
7443c90f15 optimize non lastdim softmax bf16 (#60371)
Summary:
Here is the PR to enable the softmax calculation with data type of `bfloat16` when not along the last dim.
* Use bf16 specialization for forward calculation to reduce the bf16/fp32 cast in vec template.
* Release the bf16 limitation for backward calculation.
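Softmax along a non-last dim reduces over the chosen dimension while iterating the others; a pure-Python sketch for dim 0 of a 2-D input (illustrative only, not the vectorized bf16 kernel this PR touches):

```python
import math

def softmax_dim0(mat):
    """Softmax over dim 0 (column-wise) of a 2-D list-of-lists."""
    rows, cols = len(mat), len(mat[0])
    out = [[0.0] * cols for _ in range(rows)]
    for c in range(cols):
        col = [mat[r][c] for r in range(rows)]
        m = max(col)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in col]
        s = sum(exps)
        for r in range(rows):
            out[r][c] = exps[r] / s
    return out
```

Each column of the output sums to 1, which is the invariant any dim-0 softmax must preserve.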

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60371

Reviewed By: ejguan

Differential Revision: D29563109

Pulled By: cpuhrsch

fbshipit-source-id: f6b439fa3850a6c633f35db65ea3d735b747863e
2021-07-28 10:06:51 -07:00
Don Jang
68efa186cc [static runtime] Implement aten::full (#62227)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62227

Test Plan: Added `StaticRuntime.IndividualOps_Full` to cover the newly added code path.

Reviewed By: hlu1

Differential Revision: D29923649

fbshipit-source-id: 722950137c35ae325590a670b97f03b395e8eac3
2021-07-28 09:50:27 -07:00
Rohan Varma
10c6811a6b [DDP] Run test_ddp_new_tensor_in_fwd with static graph (#61992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61992

This test previously was not enabled for static graph, but to ensure
this feature is supported with DDPSink, enable it for static graph,
which currently passes outputs to DDPSink.
ghstack-source-id: 134471406

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29830887

fbshipit-source-id: 2d3f750d9eb4289558ed21acccd172d83d9b82cc
2021-07-28 09:49:12 -07:00
Alban Desmaison
acf8907e94 These should be equivalent per the previous formula but breaks xla (#62329)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62329

Reviewed By: ejguan

Differential Revision: D29961527

Pulled By: albanD

fbshipit-source-id: 46e46726591f4c0c8faf6ec0d7136a2d4ca976ea
2021-07-28 09:23:51 -07:00
Jerry Zhang
f4baa83eae [bc-breaking] reference option for conv produce a pattern instead of reference conv module (#61942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61942

This PR changes is_reference=True for conv to produce a pattern consisting of dequant - float conv - quant instead of a reference conv module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of convert in the future.
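The dequant - float op - quant pattern can be illustrated with plain-Python fake-quant arithmetic (an analogy, not the actual FX graph nodes this pass emits):

```python
def quantize(x, scale, zero_point):
    # affine quantization of a single value (sketch)
    return round(x / scale) + zero_point

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

def reference_pattern(q_input, float_op, scale, zero_point):
    # dequant -> float op -> quant: the pattern is_reference=True now produces
    x = dequantize(q_input, scale, zero_point)
    y = float_op(x)
    return quantize(y, scale, zero_point)
```

A backend that recognizes this three-node pattern can fuse it into its own quantized kernel, which is why it is easier to transform than an opaque reference module.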

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29810656

fbshipit-source-id: 549237a62bfda4341a2a7474c124f5e33350e267
2021-07-28 09:13:40 -07:00
Richard Zou
52d1ffb789 Teach pytrees about namedtuple (#62292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62292

This PR adds pytree support for namedtuples. The challenge about namedtuple
is that each namedtuple class is actually different. This PR does the
following:
- it adds a namedtuple flatten/unflatten. The flatten function returns
a context that is the actual type of the namedtuple subclass. The
unflatten function uses that type to reconstruct the namedtuple
- Special cases all pytree logic to consider all namedtuples the same.
This is done by creating a `_get_node_type(pytree)` helper function that
returns `namedtuple` if `pytree` is any namedtuple subclass. The effect
of this is that all namedtuple subclasses will go through the namedtuple
flatten/unflatten functions
- Adds a `_namedtuple_flatten_spec` function for FX pytrees. This function
flattens the namedtuple based on the spec and is equivalent to the
`_tuple_flatten_spec`.

Test Plan
- new tests in test/test_pytree.py and test/test_fx.py

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29947302

Pulled By: zou3519

fbshipit-source-id: 19c00665b13546642c315df0f243ad99b8e7ff7c
2021-07-28 06:27:44 -07:00
Nikita Shulga
c06b6e445f Build M1 binaries with PocketFFT (#62222)
Summary:
As MKL is only available on x86_64 platform, clone header-only PocketFFT
library and use it as FFT provider

Fixes https://github.com/pytorch/pytorch/issues/62107

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62222

Reviewed By: ejguan

Differential Revision: D29938718

Pulled By: malfet

fbshipit-source-id: ac0bd98b5090d6c8a26c36c4e34a4d6e1d9f1a92
2021-07-27 22:41:29 -07:00
Nikita Shulga
cb2b5f06c9 Revert D29816592: [pytorch][PR] [fix] polygamma n>=1
Test Plan: revert-hammer

Differential Revision:
D29816592 (b73d759708)

Original commit changeset: 2c020a6e4c32

fbshipit-source-id: 310c93ade300966366ef04f206a5908fb27745db
2021-07-27 22:14:10 -07:00
Amy He
73f1e2d1dc [8/N] Nnapi backend delegation preprocess: New refactored design (#62225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62225

Rewrote the preprocess function for Android NNAPI delegate.
Previously, `preprocess()` called `convert_model_to_nnapi()` using Pybind and returned a NnapiModule that is serialized for mobile. Now, `preprocess()` calls a sub-function of `convert_model_to_nnapi()` and returns several preprocessed items (that were previously components of NnapiModule).

Dictionary returned contains:
   "shape_compute_module": torch::jit::Module,
   "ser_model": torch::Tensor,
   "weights": List[torch.Tensor],
   "inp_mem_fmts": List[int],
   "out_mem_fmts": List[int]

**Purpose and Future:**
The purpose of these changes is to move more implementation from bytecode and TorchScript to the delegate API, since bytecode is less efficient.
Now, only the shape computation uses bytecode. In the future, shape computation will be moved out of TorchScript as well.

**nnapi_backend_preprocess.cpp:** preprocess implementation
**prepare.py**: refactored a portion of `convert_model_to_nnapi()` to `process_for_nnapi()`, so preprocess can get components of NnapiModule

**Test:**
Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully
ghstack-source-id: 134444190

Test Plan: Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully

Reviewed By: raziel

Differential Revision: D29922279

fbshipit-source-id: cadcf8908d8a745dc7abbe286e97d6ead937d4ab
2021-07-27 18:52:48 -07:00
Nikita Shulga
7aabda6d5d Update nccl to v2.10.3-1 (#62276)
Summary:
Which, at the time of creating this PR, points to 7e51592129

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62276

Reviewed By: ngimel

Differential Revision: D29940950

Pulled By: malfet

fbshipit-source-id: 59c6fda76a9023af3adbfb5a96b83ca50950df6c
2021-07-27 18:32:53 -07:00
Nikita Shulga
1f1d01df3e Revert D29943356: .github: Migrate ecr_gc to github actions
Test Plan: revert-hammer

Differential Revision:
D29943356 (8e0622abf1)

Original commit changeset: 493592baf2f7

fbshipit-source-id: f0e604aab2b828561adc3e8fabf0f39221e15615
2021-07-27 18:14:31 -07:00
Wanchao Liang
af0f083d42 [dist_optim] fix the bug of none grads on functional optimizers (#62249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62249

The parameters and grads passed to torch.optim.functional optimizers should always match, so we should skip parameters that have None gradients to avoid a size mismatch
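The fix amounts to filtering out parameters whose gradient is None before handing both lists to the functional optimizer; a minimal sketch (helper name is illustrative, not the actual function in the PR):

```python
def filter_none_grads(params, grads):
    """Keep only (param, grad) pairs where the grad exists, so the two
    lists passed to a functional optimizer always have matching sizes."""
    kept = [(p, g) for p, g in zip(params, grads) if g is not None]
    if not kept:
        return [], []
    ps, gs = zip(*kept)
    return list(ps), list(gs)
```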
ghstack-source-id: 134452467

Test Plan: test_dist_optim_none_grads

Reviewed By: mrshenli

Differential Revision: D29929653

fbshipit-source-id: 4ca6167fecdfe1db422236655edee3aa59b8b044
2021-07-27 18:10:51 -07:00
Nikita Shulga
c0b806694f Do not use deprecated data accessor in IndexKernel.cu (#62268)
Summary:
Fixes repeated warnings like:
```
/var/lib/jenkins/workspace/aten/src/ATen/native/cuda/IndexKernel.cu: In lambda function:
/var/lib/jenkins/workspace/aten/src/ATen/native/cuda/IndexKernel.cu:354:683: warning: 'T* at::Tensor::data() const [with T = c10::BFloat16]' is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
   AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16, iter.dtype(), "take_cuda", [&] {
   ^
/var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:559:1: note: declared here
   T * data() const {
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62268

Reviewed By: walterddr

Differential Revision: D29937267

Pulled By: malfet

fbshipit-source-id: 6413deb9762b973880f4a7db47652eacd013214f
2021-07-27 17:58:19 -07:00
Christopher Dewan
e3be185069 [PyTorch] Add KWargs support to script module forward (#62224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62224

The underlying operator allows both args and kwargs, but we only exposed args in this convenience method. This brings them in line while not changing any existing programs.

Test Plan: CI

Reviewed By: gunchu

Differential Revision: D29920830

fbshipit-source-id: f4b2aa88d4a679e33595625b7ef355e4d14e54c4
2021-07-27 17:02:57 -07:00
Peter Bell
9776e1ff2f Migrate thnn_conv_depthwise2d from THC to ATen (#62281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62281

Closes gh-24646, Closes gh-24647

There is no `TensorIterator` equivalent to these kernels so this is just
migrating the existing kernels over to the ATen style.

I've benchmarked for contiguous tensors with this script:
```
import torch
shape = (10, 10, 100, 100)
x = torch.randn(*shape, device='cuda')
w = torch.randn((10, 1, 5, 5), device='cuda')

for _ in range(100):
    torch.nn.functional.conv2d(x, w, groups=10)
```

and similarly for backwards. I see these as the same to within measurement error.

|                   | Master Forward (us) | This PR Forward (us) |
|------------------:|:-------------------:|:--------------------:|
|           Forward |        133.5        |         133.6        |
|  Backward (input) |        1,102        |         1,119        |
| Backward (weight) |        2,220        |         2,217        |

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29943062

Pulled By: ngimel

fbshipit-source-id: fc5d16496eb733743face7c5a14e532d7b8ee26a
2021-07-27 16:51:23 -07:00
Alban Desmaison
ba9423aa93 Fix forward ad for matrix power land race (#62291)
Summary:
Fix land race from https://github.com/pytorch/pytorch/pull/59993

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62291

Reviewed By: driazati, seemethere

Differential Revision: D29946599

Pulled By: albanD

fbshipit-source-id: 16411e1a0c298fad12a6a6788ec2427923b0112a
2021-07-27 16:17:51 -07:00
Peter Bell
171e13fde9 Rework PowKernel.cu (#62260)
Summary:
PowKernel.cu is the single slowest file to compile in all of pytorch, taking
7 m 34 s on my machine. After investigating, I discovered that the case with
complex inputs and a cpu scalar for the first argument takes more than half that
time just on its own.

Noting that [`thrust::pow`] for complex is just `exp(log(base) * exponent)`,
we can improve this kernel by precomputing `log(base)` on cpu and computing
only the `exp` on CUDA. This is faster in both runtime and compile time.
For 1 million elements, master takes 61.6 us vs 56.9 us with this PR.
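The identity being exploited: for complex values, `pow(base, exponent) = exp(log(base) * exponent)`, so for a scalar base, `log(base)` can be computed once up front and only `exp` runs per element. A sketch of the same idea with Python's `cmath` (illustrative only, not the CUDA kernel):

```python
import cmath

def pow_scalar_base(base, exponents):
    """Complex pow with a scalar base: hoist log(base) out of the loop."""
    log_base = cmath.log(base)  # computed once, analogous to the CPU-side precompute
    return [cmath.exp(log_base * e) for e in exponents]  # only exp per element
```

Each result agrees with `base ** e` up to floating-point rounding (both use the principal branch of the complex log).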

I also noticed that the constant exponent case is implemented twice, once in
`gpu_kernel_with_scalars` and again in `pow_tensor_scalar_kernel`. Further, the
`Pow.cpp` code detects cpu-scalar exponents and redispatches to the `tensor_scalar`
overload, making the `gpu_kernel_with_scalars` version dead code. Now instead,
we unconditionally run `tensor_tensor` and it will call into `tensor_scalar` if appropriate.

With these changes, PowKernel.cu takes just 2 m 30 s to compile.

[`thrust::pow`]: 368266e80e/thrust/detail/complex/cpow.h (L33)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62260

Reviewed By: ejguan

Differential Revision: D29938789

Pulled By: ngimel

fbshipit-source-id: 7ab7d81ececc92a9e6e62e60b0a4f2e6e3146df8
2021-07-27 16:16:20 -07:00
Jerry Zhang
7507aeded5 [reland][bc-breaking] reference option for linear produce a pattern instead of reference linear module (#61892) (#62277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62277

This PR changes is_reference=True for linear to produce a pattern consisting of dequant - float linear - quant instead of a reference linear module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of convert in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Imported from OSS

Reviewed By: ejguan

Differential Revision: D29941079

fbshipit-source-id: 84bdfc0bb872c34fc345875e545c8b323e77c41e
2021-07-27 15:46:44 -07:00
Jane Xu
24d94f5102 Limit smoke tests on PRs to just one config (#62288)
Summary:
When coming across the short runtime of a periodic job on this PR, I realized the current setup for smoke tests on PRs was flawed. In a previous attempt at better future compatibility, our conditional for running only smoke tests checked just for USE_CUDA=1 on Windows.

This is BAD and has unintended consequences, such as misleading results when a ci/scheduled workflow is triggered but fails to test the full test suite, e.g. with PR https://github.com/pytorch/pytorch/issues/62266 https://github.com/pytorch/pytorch/actions/runs/1071698069

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62288

Reviewed By: seemethere, ejguan

Differential Revision: D29945540

Pulled By: janeyx99

fbshipit-source-id: 3cc91511c151f7348872b039c94d7752b6ea4692
2021-07-27 15:33:37 -07:00
Eli Uriegas
8e0622abf1 .github: Migrate ecr_gc to github actions (#62284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62284

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, zhouzhuojie

Differential Revision: D29943356

Pulled By: seemethere

fbshipit-source-id: 493592baf2f7abe206e1fb17438bac4e908b1251
2021-07-27 15:11:01 -07:00
Eli Uriegas
d0e5ef5eba .circleci: Remove conda-package-handling pin (#62290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62290

No longer needed anymore.

Fixes nightly failures that we're observing as well:

```
Jul 27 07:33:02 Found conflicts! Looking for incompatible packages.
Jul 27 07:33:02 This can take several minutes.  Press CTRL-C to abort.
Jul 27 07:33:02 failed
Jul 27 07:33:02
Jul 27 07:33:02 UnsatisfiableError: The following specifications were found
Jul 27 07:33:02 to be incompatible with the existing python installation in your environment:
Jul 27 07:33:02
Jul 27 07:33:02 Specifications:
Jul 27 07:33:02
Jul 27 07:33:02   - conda-package-handling=1.6.0 -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0']
Jul 27 07:33:02
Jul 27 07:33:02 Your python: python=3.9
```

From: https://app.circleci.com/pipelines/github/pytorch/pytorch/356478/workflows/2102acf1-c92a-4a59-919c-61d32d3bcd71/jobs/15027876

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D29946501

Pulled By: seemethere

fbshipit-source-id: 3e9182f4cbcf2aab185dbbc21b7a6171746e2281
2021-07-27 14:59:41 -07:00
Rong Rong (AI Infra)
8fe32c9c13 fix test-report uploading uniqueness issue (#62217)
Summary:
Should fix: https://github.com/pytorch/pytorch/issues/61978.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62217

Reviewed By: seemethere, ejguan

Differential Revision: D29944444

Pulled By: walterddr

fbshipit-source-id: 4b737d1535fd5cbfafb24245fad9ef67285f1dc0
2021-07-27 14:17:50 -07:00
Rong Rong (AI Infra)
190cdcb08c remove print for status on scribe sending (#62285)
Summary:
Following up on https://github.com/pytorch/pytorch/issues/61768.

Currently the printout is hugely long because each test case returns an OK status code without an exception. Logging should be avoided when no exception was raised from send_to_scribe.

This removes the log printing when the response has no error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62285

Reviewed By: zhouzhuojie

Differential Revision: D29944461

Pulled By: walterddr

fbshipit-source-id: fc3c2b88bba27c68521cef7079ca2b6197d2d58b
2021-07-27 14:16:32 -07:00
Mike Iovine
e1bee3eb30 [Static Runtime] Add missing unit tests for static runtime ops (#62238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62238

Added tests for the following ops:

* `aten::mul`
* `aten::nan_to_num`
* `aten::stack`
* `aten::relu`
* `aten::tanh`

Reviewed By: hlu1

Differential Revision: D29914217

fbshipit-source-id: 6a6c39629310e7131127e24fdce7253ccdf80340
2021-07-27 14:12:21 -07:00
Sameer Deshmukh
4a15f4a902 Allow 0-dim batch sizes in Bilinear NN layer. (#47106)
Summary:
Part of the fix for https://github.com/pytorch/pytorch/issues/12013

Checks that the input and output sizes are non-zero in order to allow the Bilinear layer to accept 0-dim batch sizes. The check covers both input and output dim sizes since the `_trilinear` function is written to work with both the forward and backward of Bilinear.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47106

Reviewed By: ejguan

Differential Revision: D29935589

Pulled By: jbschlosser

fbshipit-source-id: 607d3352bd4f88e2528c64408f04999960be049d
2021-07-27 13:59:42 -07:00
albanD
ab0354b650 All remaining linear/element-wise formulas (#59993)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59993

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29914594

Pulled By: albanD

fbshipit-source-id: 2ffc5993cb66586e1458d7016774a03dfe786863
2021-07-27 13:06:46 -07:00
albanD
4c3eea26bd Fix out= variant forward grad detection (#60499)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60499

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29914595

Pulled By: albanD

fbshipit-source-id: c51bb3aed91ab1f6ebc57936143b249590a43bd5
2021-07-27 13:06:45 -07:00
albanD
4a36e2a223 Add forward AD inplace check and fix codegen (#60498)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60498

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29914593

Pulled By: albanD

fbshipit-source-id: bde649d5a03639a240dfe5fe027c6a3f758428a4
2021-07-27 13:04:55 -07:00
Tanvir Zaman
df18d05429 Make bytes_read available for OperatorCost (#62059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62059

GetOperatorCost in Workspace exposes flops and bytes_written only. Make an additional piece, bytes_read, available from OperatorSchema::Cost.

Test Plan:
Added the two additional pieces in the unit test testGetOperatorCost in workspace_test

buck test caffe2/caffe2/python:workspace_test -- testGetOperatorCost

buck test //aml/ml_foundation/exp_platform/large_scale_training/distributed_hogwild/auto_device_placement/tests/...

buck test //aiplatform/training/autotuning/tests/...

buck test //aiplatform/training/pipelining/tests/...

buck test //deeplearning/fblsim/tests/...

Flow tests:

ADP Greedy: f288078287
ADP MILP: f288079278

Reviewed By: CrazySherman, xtaofb

Differential Revision: D29860676

fbshipit-source-id: 8b3a9f2bf17c0dae48cfe2800e8821bf441e0b03
2021-07-27 12:48:36 -07:00
JackCaoG
bba7800933 Add logical op symbol (#62063)
Summary:
This is for xla side [pr](https://github.com/pytorch/xla/pull/3054) to add logical op lowering

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62063

Reviewed By: ejguan

Differential Revision: D29937449

Pulled By: bdhirsh

fbshipit-source-id: ba421f6c2dad67395a383b5ed0b81ad9d59abe86
2021-07-27 12:19:56 -07:00
Laurence Rouesnel
3bdee2bbed [jit] Rewrote DFS graph iterator to remove unnecessary local state (#61326) (#61980)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61980

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29917766

Pulled By: laurencer

fbshipit-source-id: 536c4806636fe9e709e8bffdefa9320127064dea
2021-07-27 11:50:20 -07:00
Eli Uriegas
fa52b4b922 .github: chown workspace for render_test_results (#62207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62207

The workspace was getting held back due to permission-denied errors; let's
ensure we have a chown'd / clean workspace for all render_test_results
runs

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr, janeyx99

Differential Revision: D29915232

Pulled By: seemethere

fbshipit-source-id: dd9fcc9c00d9665569bd8cfa57e5d2d8da965aac
2021-07-27 11:44:15 -07:00
Erjia Guan
acaac70f63 Revert D29883676: Migrate thnn_conv_depthwise2d from THC to ATen
Test Plan: revert-hammer

Differential Revision:
D29883676 (de3a4eb583)

Original commit changeset: 9b2ac62cdd8a

fbshipit-source-id: d211d3cb7723b5d2e73de6941a7e649e5f78864f
2021-07-27 11:28:52 -07:00
Pritam Damania
82d81455ae [2/N] Remove unittest.skip across all of torch.distributed. (#61887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61887

1) Introduced a `sandcastle_skip_if` decorator that ensures these
tests just get passed on sandcastle.
2) Fixed all test files under `test/distributed` to not use `unittest.skip`

Overall goal is to avoid using skips since sandcastle tags these tests as
continuously skipping.
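The decorator's behavior can be sketched as follows: when the condition holds, the test body is replaced with a no-op that passes, instead of raising SkipTest. This is an illustrative reimplementation, not the exact code landed here:

```python
import functools

def sandcastle_skip_if(condition, reason):
    """If `condition` is true, run a no-op in place of the test so the
    harness records a pass rather than a continuously-reported skip."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if condition:
                print(f"Skipping (as pass): {reason}")
                return None
            return fn(*args, **kwargs)
        return wrapper
    return decorator
```

The trade-off is deliberate: a skipped test that can never run on a platform is indistinguishable from a dead test, so it is reported as passing there instead.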
ghstack-source-id: 134382237

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29784152

fbshipit-source-id: 17b4df6c5a55ff1d1e8e1de128fa679c3dfbcb7d
2021-07-27 10:53:23 -07:00
huqinghao
7fc96db45d fix typo errors in quantization-support.rst Line320 (#44447)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44379

change
"`torch.per_channel_symmetric` — per tensor, symmetric"
to
 "`torch.per_channel_symmetric` — per channel, symmetric"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44447

Reviewed By: mruberry

Differential Revision: D29909645

Pulled By: ezyang

fbshipit-source-id: e1505d070ec2b335dd6503b528e6a2f3bda2f1e3
2021-07-27 10:42:29 -07:00
Edward Yang
5f7f08f498 Reenable AMP on XLA (#61861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61861

Fixes https://github.com/pytorch/pytorch/issues/61804

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29881903

Pulled By: ezyang

fbshipit-source-id: 91530c10fa37715bec33f477285da119415a9da9
2021-07-27 10:32:01 -07:00
Oleg Khabinov
a0c1c7e5d4 Fixing the case when starter nodes depend on get_attr node (#62234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62234

There was a typo that we didn't catch until recently; this fixes it.

Reviewed By: 842974287

Differential Revision: D29924190

fbshipit-source-id: ee6259fcd41358aefe9680b419acc87c0c2821cb
2021-07-27 10:29:53 -07:00
Erjia Guan
8cdf16d1de Revert D29810657: [bc-breaking] reference option for linear produce a pattern instead of reference linear module
Test Plan: revert-hammer

Differential Revision:
D29810657 (9df605133e)

Original commit changeset: 949615bbc017

fbshipit-source-id: 54597d1f9636b0f94ae01c66018ff2592e5c39fc
2021-07-27 10:10:13 -07:00
Nikita Vedeneev
d7ddae8e4f det_backward: correct, more robust and with complex support [clone] (#61905)
Summary:
Clone of https://github.com/pytorch/pytorch/pull/58195 to ease the import. Done at the request of anjali411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61905

Reviewed By: albanD

Differential Revision: D29937920

Pulled By: anjali411

fbshipit-source-id: 025892a8e6147790825b20458986730ad8c5bb0f
2021-07-27 10:08:26 -07:00
Peter Bell
de3a4eb583 Migrate thnn_conv_depthwise2d from THC to ATen (#62006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62006

Closes gh-24646, gh-24647

There is no `TensorIterator` equivalent to these kernels so this is just
migrating the existing kernels over to the ATen style.

I've benchmarked for contiguous tensors with this script:
```
import torch
shape = (10, 10, 100, 100)
x = torch.randn(*shape, device='cuda')
w = torch.randn((10, 1, 5, 5), device='cuda')

for _ in range(100):
    torch.nn.functional.conv2d(x, w, groups=10)
```

and similarly for backwards. I see these as the same to within measurement error.

|                   | Master Forward (us) | This PR Forward (us) |
|------------------:|:-------------------:|:--------------------:|
|           Forward |        133.5        |         133.6        |
|  Backward (input) |        1,102        |         1,119        |
| Backward (weight) |        2,220        |         2,217        |

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29883676

Pulled By: ngimel

fbshipit-source-id: 9b2ac62cdd8a84e1a23ffcd66035b2b2fe2374d8
2021-07-27 10:00:25 -07:00
Jerry Zhang
9df605133e [bc-breaking] reference option for linear produce a pattern instead of reference linear module (#61892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61892

This PR changes is_reference=True for linear to produce a pattern consisting of dequant - float linear - quant instead of a reference linear module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of convert in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29810657

fbshipit-source-id: 949615bbc017bc454d81c8a6b2bdec53badaab19
2021-07-27 09:49:20 -07:00
Amy He
6c6a9c73f2 [7/N] Nnapi backend delegation preprocess: compile_spec sanity check (#62213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62213

Added sanity checks in preprocess function for Android NNAPI delegate.
`preprocess()` requires some input metadata passed through its `method_compile_spec` function argument.

`preprocess()` now throws specific error messages, if it cannot find the correct input arguments.
Example error message:
```
RuntimeError: method_compile_spec does not contain the "forward" key.
method_compile_spec should contain a Tensor or Tensor List which bundles input parameters: shape, dtype, quantization, and dimorder.
For input shapes, use 0 for run/load time flexible input.
method_compile_spec must use the following format: {"forward": {"inputs": at::Tensor}} OR {"forward": {"inputs": c10::List<at::Tensor>}}
```
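A sanity check along these lines could look like the following Python analogue of the C++ check (the key names come from the error message above; everything else is a simplified sketch):

```python
def check_compile_spec(method_compile_spec):
    """Validate the minimal structure {"forward": {"inputs": ...}} (sketch)."""
    if "forward" not in method_compile_spec:
        raise RuntimeError('method_compile_spec does not contain the "forward" key.')
    forward = method_compile_spec["forward"]
    if "inputs" not in forward:
        raise RuntimeError('"forward" must contain an "inputs" key bundling input parameters.')
    return forward["inputs"]
```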

nnapi_backend_preprocess.cpp: contains sanity check implementation
test_backend_nnapi.py: sanity check unit tests

Test: Ran `python test/test_jit.py TestNnapiBackend` in OSS successfully.

TODO: Using Tensors to pass input parameters is a temporary hack. When a dedicated object is implemented, update the sanity check error message.
ghstack-source-id: 134339282

Test Plan: Ran `python test/test_jit.py TestNnapiBackend` in OSS successfully.

Reviewed By: raziel, iseeyuan

Differential Revision: D29917004

fbshipit-source-id: 0d5c6b35889c556cda905ffc29c25c5422ae9ee4
2021-07-27 09:31:35 -07:00