Commit Graph

11958 Commits

Author SHA1 Message Date
Alex Suhan
5d57025206 [TensorExpr] Add log1p support to the LLVM backend (#44839)
Summary:
Also corrected the Sleef_log1p registrations; the float versions had a redundant f.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44839

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVMElemwiseLog1pFloat_LLVM

Reviewed By: glaringlee

Differential Revision: D23762113

Pulled By: asuhan

fbshipit-source-id: b5cf003b5c0c1ad549c7f04470352231929ac459
2020-09-17 13:38:35 -07:00
Rohan Varma
bee97d5be0 Document the default behavior for dist.new_group() when ranks=None (#44000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44000

This wasn't documented, so add a note stating that all ranks are used when
ranks=None.
ghstack-source-id: 111206308

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D23465034

fbshipit-source-id: 4c51f37ffcba3d58ffa5a0adcd5457e0c5676a5d
2020-09-17 11:30:37 -07:00
Yanan Cao
2558e5769d Implement sort for list of tuples (#43448)
Summary:
* Implement tuple sort by traversing the contained IValue types and generating a lambda function as the comparator for sort (see the sketch below).
* Tuples and class objects can now be arbitrarily nested within each other and still be sortable.
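
A rough illustration of what becomes sortable (a sketch, not taken from the PR's tests):
```python
import torch
from typing import List, Tuple

@torch.jit.script
def sort_pairs(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    pairs.sort()  # tuples are compared element-wise, as in eager Python
    return pairs

print(sort_pairs([(3, 0), (1, 2), (2, 1)]))  # [(1, 2), (2, 1), (3, 0)]
```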

Fixes https://github.com/pytorch/pytorch/issues/43219

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43448

Reviewed By: eellison

Differential Revision: D23352273

Pulled By: gmagogsfm

fbshipit-source-id: b6efa8d00e112178de8256da3deebdba7d06c0e1
2020-09-17 11:20:56 -07:00
Supriya Rao
1fde54d531 [quant][qat] Ensure fake_quant and observer can be disabled on scriptmodule (#44773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44773

The model is created and prepared using the fx APIs and then scripted for training.
In order to test QAT on the scripted model we need to be able to disable/enable the fake_quant
and observer modules on it.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23741354

fbshipit-source-id: 3fee7aa9b049d9901313b977710f4dc1c4501532
2020-09-17 10:21:52 -07:00
Supriya Rao
361b38da19 [quant][fx] Add node name as prefix to observer module name (#44765)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44765

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_save_observer_state_dict

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23741355

fbshipit-source-id: 7185ceae5b3b520ac0beebb627c44eab7ae7d231
2020-09-17 10:17:42 -07:00
Natalia Gimelshein
74c3dcd1d2 Revert D23725053: [pytorch][PR] change self.generator to generator
Test Plan: revert-hammer

Differential Revision:
D23725053 (a011b86115)

Original commit changeset: 89706313013d

fbshipit-source-id: 035214f0d4298d29a52f8032d364b52dfd956fe8
2020-09-17 09:42:37 -07:00
Yanli Zhao
d2b4534d4d refactor intialize bucket views (#44330)
Summary:
[test all]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44330

Part of relanding PR #41954, this refactor separates initialize_bucket_views and populate_bucket_views_out, as they do different things and are called from different callsites.
ghstack-source-id: 112257271

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D23583347

fbshipit-source-id: a5f2041b2c4f2c2b5faba1af834c7143eaade938
2020-09-17 09:20:23 -07:00
Jane Xu
4affbbd9f8 minor style edits to torch/testing/_internal/common_quantized.py (#44807)
Summary:
style nits

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44807

Reviewed By: malfet

Differential Revision: D23742537

Pulled By: janeyx99

fbshipit-source-id: 446343822d61f8fd9ef6dfcb8e5da4feff6522b6
2020-09-17 08:02:43 -07:00
Heitor Schueroff de Souza
28085cbd39 Fixed quantile nan propagation and implemented nanquantile (#44393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44393

torch.quantile now correctly propagates NaN, and torch.nanquantile is implemented similarly to numpy.nanquantile.
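
A small sketch of the intended semantics (illustrative values):
```python
import torch

t = torch.tensor([1.0, float("nan"), 3.0])
print(torch.quantile(t, 0.5))     # NaN propagates -> tensor(nan)
print(torch.nanquantile(t, 0.5))  # NaN ignored, like numpy.nanquantile -> tensor(2.)
```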

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23649613

Pulled By: heitorschueroff

fbshipit-source-id: 5201d076745ae1237cedc7631c28cf446be99936
2020-09-17 05:53:25 -07:00
Yanan Cao
99093277c0 Support Python Slice class in TorchScript (#44335)
Summary:
Implements support for the [Python Slice class](https://docs.python.org/3/c-api/slice.html) (not the slice expression, which is already supported).

A Slice object can be used in any place that supports a slice expression, including multi-dim tensor slicing.
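
A minimal sketch of the newly supported pattern (illustrative only):
```python
import torch
from typing import List

@torch.jit.script
def take_middle(xs: List[int]) -> List[int]:
    s = slice(1, -1)  # a Python slice object rather than a slice expression
    return xs[s]

print(take_middle([0, 1, 2, 3]))  # [1, 2]
```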

Fixes https://github.com/pytorch/pytorch/issues/43511
Fixes https://github.com/pytorch/pytorch/issues/43125

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44335

Reviewed By: suo, jamesr66a

Differential Revision: D23682213

Pulled By: gmagogsfm

fbshipit-source-id: f74fe25370e89fbfd2b3727d95ce4e1c4ba8dec4
2020-09-17 00:41:53 -07:00
Sameer Deshmukh
e18a2219dd Implement scatter reductions (CUDA), remove divide/subtract (#41977)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33394 .

This PR does two things:
1. Implement CUDA scatter reductions with revamped GPU atomic operations.
2. Remove support for divide and subtract for CPU reduction, as was discussed with ngimel.

I've also updated the docs to reflect that only multiply and add are supported.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41977

Reviewed By: mruberry

Differential Revision: D23748888

Pulled By: ngimel

fbshipit-source-id: ea643c0da03c9058e433de96db02b503514c4e9c
2020-09-16 23:25:21 -07:00
Muthu Arivoli
b61d3d8be8 Implement torch.kaiser_window (#44271)
Summary:
Related to https://github.com/pytorch/pytorch/issues/38349
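
A quick usage sketch of the new op (parameter values are illustrative):
```python
import torch

w = torch.kaiser_window(10, periodic=True, beta=12.0)
print(w.shape)  # torch.Size([10])
```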

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44271

Reviewed By: ngimel

Differential Revision: D23727972

Pulled By: mruberry

fbshipit-source-id: b4c931b2eb3a536231ad6d6c3cb66e52a13286ac
2020-09-16 20:41:31 -07:00
alanashine
ba6534ae2b enable type check common_distributed (#44821)
Summary:
Enabled type checking in common_distributed by using tensors of ints

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44821

Test Plan: Run python test/test_type_hints.py; errors are no longer ignored by mypy.ini

Reviewed By: walterddr

Differential Revision: D23747466

Pulled By: alanadakotashine

fbshipit-source-id: 820fd502d7ff715728470fbef0be90ae7f128dd6
2020-09-16 19:19:36 -07:00
Xiang Gao
e48201c5cf Mention TF32 on related docs (#44690)
Summary:
cc: ptrblck

![image](https://user-images.githubusercontent.com/1032377/93168022-cbbfcb80-f6d6-11ea-8f6e-f2c8a15c5bea.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44690

Reviewed By: ngimel

Differential Revision: D23727921

Pulled By: mruberry

fbshipit-source-id: db7cc8e74cde09c13d6a57683129fd839863b914
2020-09-16 19:18:30 -07:00
James Reed
29664e6aa3 [FX] Further sanitize generated names (#44808)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44808

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D23739413

Pulled By: jamesr66a

fbshipit-source-id: b759c3ea613dfa717fb23977b72ff4773d9dcc99
2020-09-16 18:47:38 -07:00
Nick Gibson
204f985fc3 [NNC] Add simplification of Loop + Condition patterns. (#44764)
Summary:
Adds a new optimization to the IRSimplifier which changes this pattern:
```
for ...
  if ...
   do thing;
```
into:
```
if ...
  for ...
    do thing;
```

This should be almost strictly better.

There are many cases where this isn't safe to do, hence the tests; most obviously, when the condition depends on something modified within the loop.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44764

Reviewed By: mruberry

Differential Revision: D23734463

Pulled By: nickgg

fbshipit-source-id: 51617e837de96b354fb702d0090ac65ddc523d36
2020-09-16 18:41:58 -07:00
Yanan Cao
6befc09465 Fix misuse of PyObject_IsSubclass (#44769)
Summary:
PyObject_IsSubclass may set the Python live exception bit if the given object is not a class. `IsNamedTuple` is currently using it incorrectly, which may trip all following Python operations in debug-build Python. Normal release-build Python is not affected because `assert` is a no-op in release builds.

Fixes https://github.com/pytorch/pytorch/issues/43577

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44769

Reviewed By: jamesr66a

Differential Revision: D23725584

Pulled By: gmagogsfm

fbshipit-source-id: 2dabd4f8667a045d5bf75813500876c6fd81542b
2020-09-16 16:19:01 -07:00
Meghan Lele
43fe034514 [JIT] Disallow plain Optional type annotation without arg (#44586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44586

**Summary**
This commit disallows plain `Optional` type annotations without
any contained types both in type comments and in-line as
Python3-style type annotations.

**Test Plan**
This commit adds a unit test for these two situations.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23721517

Pulled By: SplitInfinity

fbshipit-source-id: ead411e94aa0ccce227af74eb0341e2a5331370a
2020-09-16 16:07:26 -07:00
Mingzhe Li
574f9af160 [NCCL] Add option to run NCCL on high priority cuda stream (#43796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43796

This diff adds an option for the process group NCCL backend to pick high-priority CUDA streams.

Test Plan: waitforsandcastle

Reviewed By: jiayisuse

Differential Revision: D23404286

fbshipit-source-id: b79ae097b7cd945a26e8ba1dd13ad3147ac790eb
2020-09-16 16:00:41 -07:00
Michael Suo
161490d441 Move torch/version.py generation to cmake (#44577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44577

I would like to move this to cmake so that I can depend on it
happening from other parts of the build.

This PR pulls out the logic for determining the version string and
writing the version file into its own module. `setup.py` still receives
the version string and uses it as before, but now the code for writing
out `torch/version.py` lives in a custom command in torch/CMakeLists.txt

I noticed a small inconsistency in how version info is populated.
`TORCH_BUILD_VERSION` is populated from `setup.py` at configuration
time, while `torch/version.py` is written at build time. So if, e.g., you
configured cmake on a certain git rev and then built on another, the
two versions would be inconsistent.

This does not appear to matter, so I opted to preserve the existing
behavior.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23734781

Pulled By: suo

fbshipit-source-id: 4002c9ec8058503dc0550f8eece2256bc98c03a4
2020-09-16 15:49:22 -07:00
Meghan Lele
ffe127e4f1 [JIT] Disallow plain Tuple type annotation without arg (#44585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44585

**Summary**
This commit disallows plain `Tuple` type annotations without any
contained types both in type comments and in-line as Python3-style
type annotations.

**Test Plan**
This commit adds a unit test for these two situations.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23721515

Pulled By: SplitInfinity

fbshipit-source-id: e11c77a4fac0b81cd535c37a31b9f4129c276592
2020-09-16 15:49:19 -07:00
qxu
09a84071a3 enable mypy check for jit_metaprogramming_utils (#44752)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42969
Enables the mypy check for jit_metaprogramming_utils.py and fixes all errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44752

Reviewed By: walterddr

Differential Revision: D23741285

Pulled By: qxu-fb

fbshipit-source-id: 21e36ca5d25c8682fb93b806e416b9e1db76f71e
2020-09-16 15:44:37 -07:00
Alex Suhan
7b3432caff [TensorExpr] Support boolean in simplifier (#44659)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44659

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.ConstantFoldCastToBool

Reviewed By: ngimel

Differential Revision: D23714675

Pulled By: asuhan

fbshipit-source-id: 4c18d972b628d5ad55bad58eddd5f6974e043d9c
2020-09-16 15:30:19 -07:00
Meghan Lele
78b806ab4a [JIT] Disallow plain List type annotation without arg (#44584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44584

**Summary**
This commit extends the work done in #38130 and disallows plain
Python3-style `List` type annotations.

**Test Plan**
This commit extends `TestList.test_no_element_type_annotation` to the
Python3-style type annotation.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23721514

Pulled By: SplitInfinity

fbshipit-source-id: 48957868286f44ab6d5bf5e1bf97f0a4ebf955df
2020-09-16 15:08:04 -07:00
Meghan Lele
cb3b8a33f1 [JIT] Disallow plain Dict type annotation without arg (#44334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44334

**Summary**
This commit detects and prohibits the case in which `typing.Dict` is
used as an annotation without type arguments (i.e. not as `typing.Dict[K, V]`).
At present, `typing.Dict` is always assumed to have two arguments, and
when it is used without them, `typing.Dict.__args__` is nonempty and
contains some `typing.TypeVar` instances, which have no JIT type equivalent.
Consequently, trying to convert `typing.Dict` to a JIT type results in
a `c10::DictType` with `nullptr` for its key and value types, which can cause
a segmentation fault.

This is fixed by returning a `DictType` from
`jit.annotations.try_ann_to_type` only if the key and value types are converted
successfully to a JIT type and returning `None` otherwise.
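
A sketch of the now-rejected annotation (the exact error message may differ):
```python
import torch
from typing import Dict

def keys_len(d: Dict) -> int:  # plain `Dict` with no key/value types
    return len(d)

try:
    torch.jit.script(keys_len)
except Exception as e:  # rejected up front instead of segfaulting later
    print("rejected:", e)
```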

**Test Plan**
This commit adds a unit test to `TestDict` that checks that plain `Dict`
annotations throw an error.

**Fixes**
This commit closes #43530.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23610766

Pulled By: SplitInfinity

fbshipit-source-id: 036b10eff6e3206e0da3131cfb4997d8189c4fec
2020-09-16 14:38:28 -07:00
Edward Yang
5027c161a9 Add TORCH_SELECTIVE_NAME to AMP definitions (#44711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44711

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23711425

Pulled By: ezyang

fbshipit-source-id: d4b0ef77893af80fe9b74791e66825e223ae221d
2020-09-16 14:25:17 -07:00
Nick Gibson
82ab167cce [NNC] Fix masking for all block and thread dimensions in CudaCodeGen (#44733)
Summary:
Unifies a number of partial solutions to the thread and block dimension extent masking, including the NoThreadIdxWriter and my last fix https://github.com/pytorch/pytorch/issues/44325. The NoThreadIdxWriter is gone in favour of tracking the current loop extents and masking any statements that have a lower rank than the launch parameters in any Block or Thread dimension, which handles both the "no" and "smaller" axis binding cases.

For example it will transform the following:
```
for i in 0..10 // blockIdx.x
  for j in 0..10 // threadIdx.x
    do thing(i, j);
  for k in 0..5 // threadIdx.x
    do other thing(i, k);
```

Into:
```
do thing(blockIdx.x, threadIdx.x);
if (threadIdx.x < 5) {
  do other thing(blockIdx.x, threadIdx.x);
}
```

It also handles the case where statements are not bound by any axis, e.g.:
```
do outer thing;
for i in 0..10 // blockIdx.x
  for j in 0..10 // threadIdx.x
    do thing(i, j);
  do other thing(i);
```

will become:

```
if (blockIdx.x < 1) {
  if (threadIdx.x < 1) {
    do outer thing;
  }
}
syncthreads();
do thing(blockIdx.x, threadIdx.x);
syncthreads();
if (threadIdx.x < 1) {
  do other thing(blockIdx.x);
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44733

Reviewed By: mruberry

Differential Revision: D23736878

Pulled By: nickgg

fbshipit-source-id: 52d08626ae8043d53eb937843466874d479a6768
2020-09-16 14:23:47 -07:00
Yi Wang
f3bd984e44 Move the description comment of compute_bucket_assignment_by_size from cpp to the header file. (#44703)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44703

The description of this public function should be in the header file.

Also fix some typos.

Test Plan: N/A.

Reviewed By: pritamdamania87

Differential Revision: D23703661

fbshipit-source-id: 24ae63de9498e321b31dfb2efadb44183c6370df
2020-09-16 13:44:14 -07:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
James Reed
e9c6449b46 [FX][EZ] Allow constructing GraphModule with dict for root (#44679)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44679

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23696766

Pulled By: jamesr66a

fbshipit-source-id: fe18b7b579c1728d00589bd5fd5e54c917cc61fe
2020-09-16 12:43:23 -07:00
Nikita Shulga
c44e4878ae Enable torch.backends.quantized typechecks (#44794)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44793

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44794

Reviewed By: walterddr

Differential Revision: D23734353

Pulled By: malfet

fbshipit-source-id: 491bd7c8f147759715eb296d7537a172685aa066
2020-09-16 12:21:20 -07:00
Shen Li
cce7680a23 Add bound method tests for async_execution with RRef helper (#44716)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44716

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23707326

Pulled By: mrshenli

fbshipit-source-id: a2f8db17447e9f82c9f6ed941ff1f8cb9090ad74
2020-09-16 12:01:07 -07:00
Shen Li
257c6d0fde Make async_execution compatible with RRef helpers (#44666)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44666

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23691989

Pulled By: mrshenli

fbshipit-source-id: b36f4b1c9d7782797a0220434a8272610a23e83e
2020-09-16 12:01:05 -07:00
Shen Li
924717bf51 Add _get_type() API to RRef (#44663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44663

The new API returns the type of the data object referenced by this
`RRef`. On the owner, this is same as `type(rref.local_value())`.
On a user, this will trigger an RPC to fetch the `type` object from
the owner. After this function is run once, the `type` object is
cached by the `RRef`, and subsequent invocations no longer trigger
RPC.

closes #33210
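
A single-process sketch of the new API (address/port values are placeholders):
```python
import os
import torch
import torch.distributed.rpc as rpc

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
rpc.init_rpc("worker0", rank=0, world_size=1)

rref = rpc.RRef(torch.ones(2, 2))
print(rref._get_type())  # on the owner this equals type(rref.local_value())
rpc.shutdown()
```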

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23691990

Pulled By: mrshenli

fbshipit-source-id: a2d87cd601a691dd75164b6bcd7315245e9cf6bd
2020-09-16 11:59:22 -07:00
Yanan Cao
07d07e3c6c Remove EXPERIMENTAL_ENUM_SUPPORT feature guard (#44243)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41095

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44243

Reviewed By: ZolotukhinM

Differential Revision: D23605979

Pulled By: gmagogsfm

fbshipit-source-id: 098ae69049c4664ad5d1521c45b8a7dd22e72f6c
2020-09-16 11:45:59 -07:00
Michael Carilli
3e6bb5233f Reference amp tutorial (recipe) from core amp docs (#44725)
Summary:
https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html is live.  The core amp docs should reference it.

Also, I fixed some typos in the `zero_grad` docs that we ignored when git was behaving weirdly during ngimel's merge of https://github.com/pytorch/pytorch/pull/44423.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44725

Reviewed By: mruberry

Differential Revision: D23723807

Pulled By: ngimel

fbshipit-source-id: ca0b76365f8ca908bd978e3b38bf81857fa6c2a3
2020-09-16 11:37:58 -07:00
Fang Zhang
a011b86115 change self.generator to generator (#44461)
Summary:
bug fix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44461

Reviewed By: mruberry

Differential Revision: D23725053

Pulled By: ngimel

fbshipit-source-id: 89706313013d9eae96aaaf144924867457efd2c0
2020-09-16 11:32:17 -07:00
Jimmy Yao
5e717f0d5e delete the space for the docs rendering (#44740)
Summary:
see the docs rendering of `jacobian` and `hessian` at https://pytorch.org/docs/stable/autograd.html

![image](https://user-images.githubusercontent.com/20907377/93268949-f0618500-f762-11ea-9ec6-ddd062540c59.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44740

Reviewed By: ngimel

Differential Revision: D23724899

Pulled By: mrshenli

fbshipit-source-id: f7558ff53989e5dc7e678706207be2ac7ce22c66
2020-09-16 11:13:45 -07:00
Pritam Damania
dbf17a1d4c Fixing a few links in distributed CONTRIBUTING.md (#44753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44753

ghstack-source-id: 112132781

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D23719077

fbshipit-source-id: 3d943dfde100d175f417554fc7fca1fdb295129f
2020-09-16 10:14:19 -07:00
Rohan Varma
63469da3bb Add a test to ensure DDP join works with RPC (#44439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44439

Adds a test to ddp_under_dist_autograd_test to ensure that the uneven
inputs join() API works properly when DDP + RPC are combined. We test that when
running in outside-DDP mode (DDP applied to the whole hybrid module) we can
correctly process uneven inputs across different trainers.
ghstack-source-id: 112156980

Test Plan: CI

Reviewed By: albanD

Differential Revision: D23612409

fbshipit-source-id: f1e328c096822042daaba263aa8747a9c7e89de7
2020-09-16 09:51:43 -07:00
Supriya Rao
3f512b0de2 [quant][qat] Ensure observers and fq modules are scriptable (#44749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44749

Ensure fx module is scriptable after calling prepare_qat on it

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23718380

fbshipit-source-id: abf63ffb21e707f7def8f6c88246877f5aded58c
2020-09-16 09:30:07 -07:00
Mikhail Zolotukhin
d66520ba08 [TensorExpr] Fuser: try merging adjacent fusion groups. (#43671)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43671

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23360796

Pulled By: ZolotukhinM

fbshipit-source-id: 60ec318fe77ae9f2c821d9c4d106281845266e0f
2020-09-15 21:31:02 -07:00
Kent Gauen
2efc618f19 lr_schedule.py redundant code (#44613)
Summary:
The subclass sets "self.last_epoch" when this is set in the parent class's init function. Why would we need to set last_epoch twice? I think calling "super" resets last_epoch anyway, so I am not sure why we would want to include this in the subclass. Am I missing something?

For the record, I am just a Pytorch enthusiast. I hope my question isn't totally silly.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44613

Reviewed By: albanD

Differential Revision: D23691770

Pulled By: mrshenli

fbshipit-source-id: 080d9acda86e1a2bfaafe2c6fcb8fc1544f8cf8a
2020-09-15 20:28:39 -07:00
Zachary DeVito
2c1b215b48 [fx] remove delegate, replace with tracer (#44566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44566

The Delegate objects were confusing. They were supposed to be a way to
configure how tracing works, but in some cases they appeared necessary
for constructing graphs, which was not true. This makes the organization
clearer by removing Delegate and moving its functionality into a Tracer class,
similar to how pickle has a Pickler class.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23683177

Pulled By: zdevito

fbshipit-source-id: 7605a34e65dfac9a487c0bada39a23ca1327ab00
2020-09-15 16:52:22 -07:00
Ailing Zhang
fb085d90e3 Revert D23583017: move rebuild buckets from end of first iteration to beginning of second iteration
Test Plan: revert-hammer

Differential Revision:
D23583017 (f5d231d593)

Original commit changeset: ef67f79437a8

fbshipit-source-id: fd914b7565aba6a5574a32b31403525abb80ff07
2020-09-15 15:10:52 -07:00
Dmytro Dzhulgakov
2f4c31ce3a [jit] Speed up saving in case of many classes (#44589)
Summary:
There's an annoying O(N^2) in the module export logic that makes saving some models (if they have many classes) take an eternity.

I'm not familiar enough with this code to properly untangle the deps and make it a pure hash lookup. So I just added a side lookup table for raw pointers. It's still quadratic, but it's O(num_classes^2) instead of O(num_classes * num_references), which already gives huge savings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44589

Test Plan:
Tested with one of the offending models - just loading and saving a TorchScript file:

```
Before:
load 1.9239683151245117
save 165.74712467193604

After:
load 1.9409027099609375
save 1.4711427688598633
```

Reviewed By: suo

Differential Revision: D23675278

Pulled By: dzhulgakov

fbshipit-source-id: 8f3fa7730941085ea20d9255b49a149ac1bf64fe
2020-09-15 15:10:45 -07:00
Nick Gibson
69839ea3f6 [NNC] make inlining immediate (take 3) (#44231)
Summary:
This is a re-up of https://github.com/pytorch/pytorch/issues/43885 with an extra commit which should fix the bugs that caused it to be reverted. Read that PR for general context.

The issue here was that we were still using the side maps `tensor_to_stmt_` and `stmt_to_tensor_` which get invalidated by any transform of the IR (rather than just any transform that isn't computeInline). I added a comment about this but didn't actually address our usages of it.

I've removed these maps and changed the `getLoopBodyFor` and `getLoopStatementsFor` helpers to search the root stmt directly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44231

Reviewed By: albanD

Differential Revision: D23689688

Pulled By: nickgg

fbshipit-source-id: 1c6009a880f8c0cebf2300fd06b5cc9322bffbf9
2020-09-15 11:12:24 -07:00
Elias Ellison
8df0400a50 Fix fallback graph in specialize autogradzero (#44654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44654

Previously we weren't creating a fallback graph as intended in specialize autograd zero, so if a Tensor failed one of our undefinedness checks we would run the backward normally without reprofiling & optimizing.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23691764

Pulled By: eellison

fbshipit-source-id: 10c6fa79518c84a6f5ef2bfbd9ea10843af751eb
2020-09-15 11:12:20 -07:00
kshitij12345
1d733d660d [docs] torch.min/max: remove incorrect warning from docs (#44615)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44195

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44615

Reviewed By: ngimel

Differential Revision: D23703525

Pulled By: mruberry

fbshipit-source-id: 471ebd764be667e29c03a30f3ef341440adc54d2
2020-09-15 10:42:08 -07:00
Xiang Gao
6bc77f4d35 Use amax/maximum instead of max in optimizers (#43797)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43797

Reviewed By: malfet

Differential Revision: D23406641

Pulled By: mruberry

fbshipit-source-id: 0cd075124aa6533b21375fe2c90c44a5d05ad6e6
2020-09-15 10:39:40 -07:00
Muthu Arivoli
9c364da9b9 Fix doc builds for bool kwargs (#44686)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43669

The bool will still link to https://docs.python.org/3/library/functions.html#bool.
Tested using bmm:
![image](https://user-images.githubusercontent.com/16063114/93156438-2ad11080-f6d6-11ea-9b81-96e02ee68d90.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44686

Reviewed By: ngimel

Differential Revision: D23703823

Pulled By: mruberry

fbshipit-source-id: 7286afad084f5ab24a1254ad84e5d01907781c85
2020-09-15 10:34:58 -07:00
Yanli Zhao
f5d231d593 move rebuild buckets from end of first iteration to beginning of second iteration (#44326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44326

Part of relanding PR #41954, this refactoring is to move rebuild_buckets call from end of first iteration to beginning of second iteration
ghstack-source-id: 112011490

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D23583017

fbshipit-source-id: ef67f79437a820d9b5699b651803622418499a83
2020-09-15 09:51:33 -07:00
Vasiliy Kuznetsov
5f692a67db qat conv_fused.py: one more patch for forward compatibility (#44671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44671

See comments inline - the FC between
https://github.com/pytorch/pytorch/pull/38478 and
https://github.com/pytorch/pytorch/pull/38820 was broken,
patching it.

Test Plan: Verified with customer hitting the issue that this fixes their issue.

Reviewed By: jerryzh168

Differential Revision: D23694029

fbshipit-source-id: a5e1733334e22305a111df750b190776889705d0
2020-09-15 09:43:29 -07:00
Vitaliy Chiley
c71ce10cfc add dilation to transposeconv's _output_padding method (#43793)
Summary:
This PR adds dilation to the _ConvTransposeNd._output_padding method and adds tests using a variety of input sizes.
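
A rough sketch of the case this helps (sizes chosen for illustration):
```python
import torch
import torch.nn as nn

x = torch.randn(1, 4, 10, 10)
deconv = nn.ConvTranspose2d(4, 8, kernel_size=3, stride=2, dilation=2)
# Requesting a reachable output size relies on _output_padding accounting for dilation.
y = deconv(x, output_size=[24, 24])
print(y.shape)  # torch.Size([1, 8, 24, 24])
```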

Fixes https://github.com/pytorch/pytorch/issues/14272

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43793

Reviewed By: zou3519

Differential Revision: D23493313

Pulled By: ezyang

fbshipit-source-id: bca605c428cbf3a97d3d24316d8d7fde4bddb307
2020-09-14 21:28:27 -07:00
Meghan Lele
e7d782e724 [JIT] Add property support for ScriptModules (#42390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42390

**Summary**
This commit extends support for properties to include
ScriptModules.
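
A minimal sketch of a scripted module with a property (illustrative module; the usual TorchScript restrictions still apply):
```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = 2.0

    @property
    def doubled(self) -> float:
        return self.scale * 2

    def forward(self, x):
        return x * self.doubled

scripted = torch.jit.script(M())
print(scripted(torch.ones(2)))  # tensor([4., 4.])
```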

**Test Plan**
This commit adds a unit test that has a ScriptModule with
a user-defined property.

`python test/test_jit_py3.py TestScriptPy3.test_module_properties`

Test Plan: Imported from OSS

Reviewed By: eellison, mannatsingh

Differential Revision: D22880298

Pulled By: SplitInfinity

fbshipit-source-id: 74f6cb80f716084339e2151ca25092b6341a1560
2020-09-14 18:49:21 -07:00
Guilherme Leobas
e107ef5ca2 Add type annotations for torch.nn.utils.* (#43080)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43013

Redo of gh-42954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43080

Reviewed By: albanD

Differential Revision: D23681334

Pulled By: malfet

fbshipit-source-id: 20ec78aa3bfecb7acffc12eb89d3ad833024394c
2020-09-14 17:52:37 -07:00
Elias Ellison
551494b01d [JIT] Fix torch.tensor for empty multidimensional-typed lists (#44652)
Summary:
We were hitting an assertion error when an empty `List[List[int]]` was passed in; this fixes that error by not recursing into 0-element tensors.
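
A sketch of the previously failing case (the pre-fix behavior is paraphrased in the comment):
```python
import torch
from typing import List

@torch.jit.script
def make(x: List[List[int]]) -> torch.Tensor:
    return torch.tensor(x)

print(make([]))  # used to hit an assert; now returns an empty tensor
```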

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44652

Reviewed By: ZolotukhinM

Differential Revision: D23688247

Pulled By: eellison

fbshipit-source-id: d48ea24893044fae96bc39f76c0f1f9726eaf4c7
2020-09-14 17:28:23 -07:00
Mike Ruberry
686e281bcf Updates div to perform true division (#42907)
Summary:
This PR:

- updates div to perform true division
- makes torch.true_divide an alias of torch.div

This follows on work in previous PyTorch releases that first deprecated div performing "integer" or "floor" division, then prevented it by throwing a runtime error.
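
A quick sketch of the updated behavior:
```python
import torch

a = torch.tensor([3, 5])
b = torch.tensor([2, 2])
print(torch.div(a, b))          # tensor([1.5000, 2.5000]) -- true division
print(torch.true_divide(a, b))  # same result; true_divide is now an alias of div
```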

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42907

Reviewed By: ngimel

Differential Revision: D23622114

Pulled By: mruberry

fbshipit-source-id: 414c7e3c1a662a6c3c731ad99cc942507d843927
2020-09-14 15:50:38 -07:00
Jerry Zhang
e594c30bc2 [quant][graphmode][fx] Support fp16 dynamic quantization for linear (#44582)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44582

Test Plan:
test_quantize_fx.py

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23665974

fbshipit-source-id: 19ba6c61a9c77ef570b00614016506e9a2729f7c
2020-09-14 15:43:08 -07:00
BowenBao
43406e218a [ONNX] Update ONNX shape inference (#43929)
Summary:
* Support sequence type (de)serialization, enables onnx shape inference on sequence nodes.
* Fix shape inference with block input/output: e.g. Loop and If nodes.
* Fix bugs in symbolic discovered by coverage of onnx shape inference.
* Improve debuggability: added more jit logs. For simplicity, the default log level, when jit log is enabled, will not dump ir graphs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43929

Reviewed By: albanD

Differential Revision: D23674604

Pulled By: bzinodev

fbshipit-source-id: ab6aacb16d0e3b9a4708845bce27c6d65e567ba7
2020-09-14 15:36:19 -07:00
Ksenija Stanojevic
f7cfbac89b [ONNX] Update len symbolic (#43824)
Summary:
Update len symbolic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43824

Reviewed By: izdeby

Differential Revision: D23575765

Pulled By: bzinodev

fbshipit-source-id: 0e5c8c8d4a5297f65e2dc43168993350f784c776
2020-09-14 15:00:44 -07:00
shubhambhokare1
da11d932bc [ONNX] Update arange op to support out argument (#43777)
Summary:
Update arange op to support out argument

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43777

Reviewed By: albanD

Differential Revision: D23674583

Pulled By: bzinodev

fbshipit-source-id: 6fb65e048c6b1a551569d4d2a33223522d2a960c
2020-09-14 14:56:17 -07:00
neginraoof
62ebad4ff9 [ONNX] Export new_empty and new_zeros (#43506)
Summary:
Adding symbolic to export new_empty and new_zeros

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43506

Reviewed By: houseroad

Differential Revision: D23674574

Pulled By: bzinodev

fbshipit-source-id: ecfcdbd4845fd3a3c6618a060129fbeee4df5dd7
2020-09-14 14:48:34 -07:00
Zafar
742654d1b6 [quant] ConvTranspose1d / ConvTranspose2d (#40371)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40371

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22158981

Pulled By: z-a-f

fbshipit-source-id: defbf6fbe730a58d5b155dcb2460dd969797215c
2020-09-14 14:25:06 -07:00
Alex Suhan
a188dbdf3f Check for index-rank consistency in FunctionInliner (#44561)
Summary:
When caller / callee pairs are inserted into the mapping, verify that
the arity of the buffer access is consistent with its declared rank.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44561

Test Plan: CI, test_tensorexpr --gtest_filter=TensorExprTest.DetectInlineRankMismatch

Reviewed By: albanD

Differential Revision: D23684342

Pulled By: asuhan

fbshipit-source-id: dd3a0cdd4c2492853fa68381468e0ec037136cab
2020-09-14 14:07:22 -07:00
Rong Rong
b5dd6e3e61 split torch.testing._internal.* and add type checking for torch.testing._internal.common_cuda (#44575)
Summary:
First step to fix https://github.com/pytorch/pytorch/issues/42969.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44575

Reviewed By: malfet

Differential Revision: D23668740

Pulled By: walterddr

fbshipit-source-id: eeb3650b1780aaa5727b525b4e6182e1bc47a83f
2020-09-14 14:04:02 -07:00
mariosasko
cfba33bde3 Fix the ELU formula in the docs (#43764)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43389.

This PR replaces the old ELU formula in the docs, which yields wrong results for negative alphas, with a new one that fixes the issue and relies on the cases notation, which makes the formula more straightforward.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43764

Reviewed By: ailzhang

Differential Revision: D23425532

Pulled By: albanD

fbshipit-source-id: d0931996e5667897d926ba4fc7a8cc66e8a66837
2020-09-14 14:01:56 -07:00
Zafar
9d4943daaf [quant] conv_transpose1d / conv_transpose2d (#40370)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40370

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22158979

Pulled By: z-a-f

fbshipit-source-id: f5cb812c9953efa7608f06cf0188de447f73f358
2020-09-14 13:45:28 -07:00
Rong Rong
ecac8294a6 enable type checking for torch._classes (#44576)
Summary:
Fix https://github.com/pytorch/pytorch/issues/42980

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44576

Reviewed By: malfet

Differential Revision: D23668741

Pulled By: walterddr

fbshipit-source-id: 4201ea3187a40051ebff53d28c8e571ea1a61126
2020-09-14 13:26:46 -07:00
Raghavan Raman
ad7a2eb1c9 Simplify nested Min and Max patterns. (#44142)
Summary:
Improve simplification of nested Min and Max patterns.

Specifically, it handles the following pattern simplifications:
  * `Max(A, Max(A, Const)) => Max(A, Const)`
  * `Max(Min(A, B), Min(A, C)) => Min(A, Max(B, C))`
  * `Max(Const, Max(A, OtherConst)) => Max(A, Max(Const, OtherConst))`
     - This case can have an arbitrarily long chain of Max ops. For example: `Max(5, Max(x, Max(y, Max(z, 8)))) => Max(Max(Max(x, 8), y), z)`

Similarly, for the case of Min as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44142

Reviewed By: albanD

Differential Revision: D23644486

Pulled By: navahgar

fbshipit-source-id: 42bd241e6c2af820566744c8494e5dee172107f4
2020-09-14 13:24:46 -07:00
Heitor Schueroff de Souza
199435af90 Update median doc to note return value of even-sized input (#44562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44562

Add a note that torch.median returns the smaller of the two middle elements for even-sized input, and refer the user to torch.quantile for the mean of the middle values.
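
An illustration of the documented behavior (a sketch):
```python
import torch

t = torch.tensor([1.0, 2.0, 3.0, 4.0])
print(torch.median(t))         # tensor(2.) -- smaller of the two middle values
print(torch.quantile(t, 0.5))  # tensor(2.5000) -- mean of the two middle values
```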

fixes https://github.com/pytorch/pytorch/issues/39520

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23657208

Pulled By: heitorschueroff

fbshipit-source-id: 2747aa652d1e7f10229d9299b089295aeae092c2
2020-09-14 13:18:33 -07:00
Bram Wasti
a475613d1d [static runtime] Swap to out-variant compatible nodes (#44127)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44127

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D23604306

Pulled By: bwasti

fbshipit-source-id: 18ccfb9b466b822e28130be3d5c4fae36c76820b
2020-09-14 12:38:25 -07:00
Elias Ellison
856510c96d [JIT] Dont optimize shape info in batch_mm (#44565)
Summary:
We run remove-profile-nodes and specialize-types before batch_mm, so we cannot run peepholes on the type information of tensors, since these properties have not been guarded and are therefore not guaranteed to be correct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44565

Reviewed By: albanD

Differential Revision: D23661538

Pulled By: eellison

fbshipit-source-id: 0dd23a65714f047f49b4db4ec582b21870925fe1
2020-09-14 12:34:20 -07:00
Yi Wang
ace81b6794 Remove an extra empty line in the warning comments. (#44622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44622

Remove an extra empty line in the warning comments.

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D23674070

fbshipit-source-id: 4ee570590c66a72fb808e9ee034fb773b833efcd
2020-09-14 11:15:35 -07:00
Natalia Gimelshein
95a69a7d09 adds list_gpu_processes function (#44616)
Summary:
per title, to make it easier to track the creation of stray contexts:
```
python -c "import torch; a=torch.randn(1, device='cuda'); print(torch.cuda.memory.list_gpu_processes(0)); print(torch.cuda.memory.list_gpu_processes(1))"
GPU:0
process      79749 uses      601.000 MB GPU memory
GPU:1
no processes are running
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44616

Reviewed By: mruberry

Differential Revision: D23675739

Pulled By: ngimel

fbshipit-source-id: ffa14cad9d7144e883de13b1c2c6817bd432f53a
2020-09-14 09:54:32 -07:00
Thomas Viehmann
bd257a17a1 Add HIP/ROCm version to collect_env.py (#44106)
Summary:
This adds HIP version info to the `collect_env.py` output.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44106

Reviewed By: VitalyFedyunin

Differential Revision: D23652341

Pulled By: zou3519

fbshipit-source-id: a1f5bce8da7ad27a1277a95885934293d0fd43c5
2020-09-14 09:19:18 -07:00
Jeremy Lilley
7040a070e3 [torch] Minor: Avoid ostreamstring in Operator's canonicalSchemaString() (#44442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44442

I noticed lock contention on startup as lookupByLiteral() was
calling registerPendingOperators() - some calls were holding the
lock for 10+ ms, as operators were being registered.

canonicalSchemaString() was using ostringstream, which isn't typically
particularly fast (partly because of C++ spec locale requirements).
If we replace it with regular C++ string appends, it's somewhat faster
(which isn't hard when comparing with stringstream, albeit with a bit
more codegen).

Over the first couple of minutes of run time (mostly front-loaded), this
cuts out 1.4 seconds spent under the OperatorRegistry lock (as part of
registerPendingOperators) when running sync SGD.

As an example, before:
   registerPendingOperators 12688 usec for 2449 operators
After:
   registerPendingOperators 6853 usec for 2449 operators
ghstack-source-id: 111862971

Test Plan: buck test mode/dev-nosan caffe2/test/cpp/...

Reviewed By: ailzhang

Differential Revision: D23614515

fbshipit-source-id: e712f9dac5bca0b1876e11fb8f0850402f03873a
2020-09-14 08:24:16 -07:00
kshitij12345
c68a99bd61 [numpy] Add torch.exp2 (#44184)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

TODO
* [x] Add tests
* [x] Add docs
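
A quick sketch of the new op (it mirrors numpy.exp2):
```python
import torch

print(torch.exp2(torch.tensor([0.0, 1.0, 3.0])))  # tensor([1., 2., 8.])
```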

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44184

Reviewed By: ngimel

Differential Revision: D23674237

Pulled By: mruberry

fbshipit-source-id: 7f4fb1900fad3051cd7fc9d3d7f6d985c5fb093c
2020-09-14 04:05:37 -07:00
Victor Bittorf
68a5c361ae Adding Adapative Autorange to benchmark utils. (#44607)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44219

Rebasing https://github.com/pytorch/pytorch/pull/44288 and fixing the git history.

This allows users to benchmark code without having to specify how long to run the benchmark. It runs the benchmark until the variance (IQR / median) is low enough that we can be confident in the measurement.
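
A usage sketch, assuming the Timer API exposed under torch.utils.benchmark at the time:
```python
from torch.utils.benchmark import Timer

timer = Timer(
    stmt="torch.mm(a, b)",
    setup="import torch; a = torch.randn(64, 64); b = torch.randn(64, 64)",
)
measurement = timer.adaptive_autorange()  # runs until IQR / median is small enough
print(measurement)
```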

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44607

Test Plan: There are unit tests, and we manually tested using Examples posted in git.

Reviewed By: robieta

Differential Revision: D23671208

Pulled By: bitfort

fbshipit-source-id: d63184290b88b26fb81c2452e1ae701c7d513d12
2020-09-13 20:55:40 -07:00
Peter Bell
8daaa3bc7e Fix latex error in heaviside docs (#44481)
Summary:
This fixes a `katex` error I was getting trying to build the docs:
```
ParseError: KaTeX parse error: Undefined control sequence: \0 at position 55: …gin{cases}
```

This failure was introduced in https://github.com/pytorch/pytorch/issues/42523.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44481

Reviewed By: colesbury

Differential Revision: D23627700

Pulled By: mruberry

fbshipit-source-id: 9cc09c687a7d9349da79a0ac87d6c962c9cfbe2d
2020-09-13 16:42:19 -07:00
Martin Yuan
7862827269 [pytorch] Add variadic run_method for lite intepreter (#44337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44337

Add a new run_method to the mobile Module which is variadic (takes any number of arguments) to match the full JIT.
ghstack-source-id: 111909068

Test Plan: Added new unit test to test_jit test suite

Reviewed By: linbinyu, ann-ss

Differential Revision: D23585763

fbshipit-source-id: 007cf852290f03615b78c35aa6f7a21287ccff9e
2020-09-13 13:26:30 -07:00
Mikhail Zolotukhin
bcf97b8986 [JIT] Cleanup some places where we log graphs in executors. (#44588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44588

1) SOURCE_DUMP crashes when invoked on a backward graph since
   `prim::GradOf` nodes can't be printed as sources (they don't have a
   schema).
2) Dumping the graph each time we execute an optimized plan produces lots of
   output in tests where we run the graph multiple times (e.g.
   benchmarks). Outputting that at the lowest level of verbosity seems
   like overkill.
3) A duplicated log statement is removed.

Differential Revision: D23666812

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: b9a30e34fd39c85f3e13c3f1e3594e157e1c130f
2020-09-13 11:31:02 -07:00
Mikhail Zolotukhin
82da6b3702 [JIT] Fix jit-log verbosity selection logic. (#44587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44587

Currently it's skewed by one.

The following test demonstrates it:
```
$ cat test.py

import torch
def foo(a,b):
    return a*a*b
torch._C._jit_set_profiling_executor(True)
torch._C._jit_set_profiling_mode(True)
torch._C._jit_override_can_fuse_on_cpu(True)
torch._C._jit_set_texpr_fuser_enabled(True)
f = torch.jit.script(foo)
for _ in range(10):
    f(torch.rand(10), torch.rand(10))

$ cat test_logging_levels.sh

PYTORCH_JIT_LOG_LEVEL="tensorexpr_fuser"    python test.py 2>&1 | grep DUMP   >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL="tensorexpr_fuser"    python test.py 2>&1 | grep UPDATE >& /dev/null && echo FAIL || echo OK
PYTORCH_JIT_LOG_LEVEL="tensorexpr_fuser"    python test.py 2>&1 | grep DEBUG  >& /dev/null && echo FAIL || echo OK

PYTORCH_JIT_LOG_LEVEL=">tensorexpr_fuser"   python test.py 2>&1 | grep DUMP   >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL=">tensorexpr_fuser"   python test.py 2>&1 | grep UPDATE >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL=">tensorexpr_fuser"   python test.py 2>&1 | grep DEBUG  >& /dev/null && echo FAIL || echo OK

PYTORCH_JIT_LOG_LEVEL=">>tensorexpr_fuser"  python test.py 2>&1 | grep DUMP   >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL=">>tensorexpr_fuser"  python test.py 2>&1 | grep UPDATE >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL=">>tensorexpr_fuser"  python test.py 2>&1 | grep DEBUG  >& /dev/null && echo OK || echo FAIL
```

Before this change:
```
OK
FAIL
OK
OK
OK
FAIL
OK
OK
OK
```

With this change everything passes.

Differential Revision: D23666813

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: 4adaa5a3d06deadf54eae014a0d76588cdc5e20a
2020-09-13 11:29:25 -07:00
Bert Maher
6d4a605ce9 Fix bug simplifying if-then-else when it can be removed (#44462)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44462

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23671157

Pulled By: bertmaher

fbshipit-source-id: b9b92ad0de1a7bd9bc1fcac390b542d885d0ca58
2020-09-13 10:29:28 -07:00
Mike Ruberry
7e91728f68 Deprecates calling linspace and logspace without setting steps explicitly (#43860)
Summary:
**BC-breaking note**

This change is BC-breaking for C++ callers of linspace and logspace if they were providing a steps argument that could not be converted to an optional.

**PR note**

This PR deprecates calling linspace and logspace without setting steps explicitly by:

- updating the documentation to warn that not setting steps is deprecated
- warning (once) when linspace and logspace are called without steps being specified

A test for this behavior is added to test_tensor_creation_ops. The warning only appears once per process, however, so the test would pass even if no warning were thrown. Ideally there would be a mechanism to force all warnings, including those from TORCH_WARN_ONCE, to trigger.
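
A small sketch of the deprecation (behavior at the time of this change):
```python
import torch

torch.linspace(0, 1, steps=5)  # preferred: steps given explicitly
torch.linspace(0, 1)           # deprecated: warns once per process (old default of 100 steps still applies)
torch.logspace(0, 2)           # the same deprecation applies to logspace
```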

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43860

Reviewed By: izdeby

Differential Revision: D23498980

Pulled By: mruberry

fbshipit-source-id: c48d7a58896714d184cb6ff2a48e964243fafc90
2020-09-13 06:09:19 -07:00
Yi Wang
82b4477948 Pass the input tensor vector by const reference. (#44340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44340

Changed the constructor of GradBucket to pass the input by const
reference and hence avoid unnecessary explicit move semantics. Since
the declaration and definition were previously separated, passing the input
tensor vector by value looked quite bizarre.

Test Plan: buck test caffe2/torch/lib/c10d:ProcessGroupGlooTest

Reviewed By: pritamdamania87

Differential Revision: D23569939

fbshipit-source-id: db761d42e76bf938089a0b38e98e76a05bcf4162
2020-09-11 18:03:56 -07:00
Yi Wang
ab5fee2784 Move the inline implementations of GradBucket class to the header. (#44339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44339

Moved the inline implementations of the GradBucket class to the header for
succinctness and readability. This coding style is also consistent with
reducer.h under the same directory.

Test Plan: buck test caffe2/torch/lib/c10d:ProcessGroupGlooTest

Reviewed By: pritamdamania87

Differential Revision: D23569701

fbshipit-source-id: 237d9e2c5f63a6bcac829d0fcb4a5ba3bede75e5
2020-09-11 18:01:37 -07:00
Elias Ellison
1f0dcf39fc [JIT] dont optimize device dtype on inline (#43363)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/36404

Adding prim::device and prim::dtype to the list of skipped peepholes when we run inlining. In the long term, another fix may be to not encode shape / dtype info on the traced graph, because it is not guaranteed to be correct. This is currently blocked by ONNX.

Partial fix for https://github.com/pytorch/pytorch/issues/43134

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43363

Reviewed By: glaringlee

Differential Revision: D23383987

Pulled By: eellison

fbshipit-source-id: 2e9c5160d39d690046bd9904be979d58af8d3a20
2020-09-11 17:29:54 -07:00
Mikhail Zolotukhin
d729e2965e [TensorExpr] Do not inline autodiff graphs if they contain prim::TypeCheck nodes. (#44564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44564

Before this change we sometimes inlined autodiff subgraphs containing
fusion groups. This happened because we didn't look for 'unsupported'
nodes recursively (maybe we should), but the fusion groups were inside
if-nodes.

The problem was detected by bertmaher in the 'LearningToPaint' benchmark
investigation, where this bug caused us to keep constantly hitting the
fallback paths of the graph.

Test Plan: Imported from OSS

Reviewed By: bwasti

Differential Revision: D23657049

Pulled By: ZolotukhinM

fbshipit-source-id: 7c853424f6dce4b5c344d6cd9c467ee04a8f167e
2020-09-11 17:28:53 -07:00
Nick Gibson
64b4307d47 [NNC] Cuda Codegen - mask loops bound to block/thread dimensions (#44325)
Summary:
Fix an issue where loops of different sizes are bound to the same CUDA dimension / metavar.

More info and tests coming soon...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44325

Reviewed By: colesbury

Differential Revision: D23628859

Pulled By: nickgg

fbshipit-source-id: 3621850a4cc38a790b62ad168d32e7a0e2462fad
2020-09-11 16:48:16 -07:00
Nikita Shulga
2ae74c0632 Compile less legacy code when BUILD_CAFFE2 is set to False (take 2) (#44453)
Summary:
2nd attempt to land https://github.com/pytorch/pytorch/pull/44079

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44453

Reviewed By: walterddr, seemethere

Differential Revision: D23619528

Pulled By: malfet

fbshipit-source-id: c7c206ebd327dcf3994789bd47008b05ff862fe7
2020-09-11 16:27:47 -07:00
Jerry Zhang
b6f0ea0c71 [quant][graphmode][fx][fix] Remove qconfig in convert (#44526)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44526

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23641960

fbshipit-source-id: 546da1c16694d1e1dfb72629085acaae2165e759
2020-09-11 15:51:47 -07:00
Jerry Zhang
a82ea6a91f [quant][graphmode][fx][fix] Support None qconfig in convert (#44524)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44524

A None qconfig was not handled previously.
Closes: https://github.com/pytorch/pytorch/issues/44438

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23640269

fbshipit-source-id: 8bfa88c8c78d4530338d9d7fa9669876c386d91f
2020-09-11 15:22:25 -07:00
Zafar
1fb5883072 removing conv filters from conv pattern matching (#44512)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44512

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23637409

Pulled By: z-a-f

fbshipit-source-id: ad5be0fa6accfbcceaae9171bf529772d87b4098
2020-09-11 15:16:29 -07:00
Wanchao Liang
ab6126b50e [rpc][jit] support remote call in TorchScript (#43046)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43046

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23621108

Pulled By: wanchaol

fbshipit-source-id: e8152c6cdd3831f32d72d46ac86ce22f3f13c651
2020-09-11 14:59:51 -07:00
Wanchao Liang
3e5df5f216 [rpc][jit] support rpc_sync in TorchScript (#43043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43043

This add the support for rpc_sync in TorchScript in a way similar to
rpc_async

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23252039

Pulled By: wanchaol

fbshipit-source-id: 8a05329cb8a24079b2863178b73087d47273914c
2020-09-11 14:59:47 -07:00
Wanchao Liang
8bec7cfa91 [rpc] rename some functions (#43042)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43042

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23228894

Pulled By: wanchaol

fbshipit-source-id: 3702b7826ecb455073fabb9dc5dca804c0e092b2
2020-09-11 14:58:39 -07:00
Vasiliy Kuznetsov
70dfeb44bd MinMax based observers: respect device affinity for state_dict (#44537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44537

Originally, the `min_val`, `max_val`, `min_vals`, `max_vals`
attributes of observers were Tensors but not buffers.  They had custom
state_dict save/load code to ensure their state was saved.

At some point, these attributes became buffers, and the custom
save/load code remained. This introduced a subtle bug:
* create model A, move it to a device (cpu/cuda) and save its state_dict
* create model B, load its state dict.
* `min_val|min_vals|max_val|max_vals` would always be loaded to model A's device, even if the rest of model B was on a different device
* the above is inconsistent with how save/load on different devices is expected to work (see https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-across-devices)

In practice, the case people would sometimes hit is:
* model A is on CPU, state dict is saved
* model B is created and moved to GPU, state_dict from model A is loaded
* assertions throw when operations are attempted across different devices

This PR fixes the behavior by removing the custom save/load where
possible and letting the default `nn.Module` save/load code handle
device assignment. We special-case `PerChannelMinMaxObserver` and its
children to allow for loading buffers of different size, which is
normal.

There are some followups to also enable this for HistogramObserver
and FakeQuantize, which can be done in separate PRs due to higher
complexity.
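
A sketch of the scenario being fixed (requires a CUDA device; uses MinMaxObserver, as in the test plan below):
```python
import torch
from torch.quantization import MinMaxObserver

obs_a = MinMaxObserver()          # "model A" on CPU
obs_a(torch.randn(4))             # record min/max
state = obs_a.state_dict()

obs_b = MinMaxObserver().cuda()   # "model B" on GPU
obs_b.load_state_dict(state)
print(obs_b.min_val.device)       # expected: cuda:0 after this change
```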

Test Plan:
```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23644493

fbshipit-source-id: 0dbb6aa309ad569a91a663b9ee7e44644080032e
2020-09-11 14:48:56 -07:00
Gregory Chanan
192c4111a3 Simplify target handling in nn gradcheck. (#44507)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44507

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23635799

Pulled By: gchanan

fbshipit-source-id: 75090d6a48771e5c92e737a0829fbfa949f7c8a7
2020-09-11 13:25:59 -07:00
Gregory Chanan
5579b53a7f Fix SmoothL1Loss when target.requires_grad is True. (#44486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44486

SmoothL1Loss had a completely different (and incorrect, see #43228) path when target.requires_grad was True.

This PR does the following:

1) adds derivative support for target via the normal derivatives.yaml route
2) kills the different (and incorrect) path for when target.requires_grad was True
3) modifies the SmoothL1Loss CriterionTests to verify that the target derivative is checked.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23630699

Pulled By: gchanan

fbshipit-source-id: 0f94d1a928002122d6b6875182867618e713a917
2020-09-11 12:13:36 -07:00
Cheng Chang
b7ef4eec46 [NNC] Add loop slicing transforms (#43854)
Summary:
Add new transforms `sliceHead` and `sliceTail` to `LoopNest`, for example:

Before transformation:
```
for x in 0..10:
  A[x] = x*2
```

After `sliceHead(x, 4)`:

```
for x in 0..4:
  A[x] = x*2
for x in 4..10:
  A[x] = x*2
```

After `sliceTail(x, 1)`:
```
for x in 0..4:
  A[x] = x*2
for x in 4..9:
  A[x] = x*2
for x in 9..10:
  A[x] = x*2
```

`sliceHead(x, 10)` and `sliceTail(x, 10)` are no-ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43854

Test Plan: Tests are added in `test_loopnest.cpp`; they cover the basic transformations and also test the combination with other transformations such as `splitWithTail`.

Reviewed By: nickgg

Differential Revision: D23417366

Pulled By: cheng-chang

fbshipit-source-id: 06c6348285f2bafb4be3286d1642bfbe1ea499bf
2020-09-11 12:09:12 -07:00
Jerry Zhang
11fb51d093 [quant][graphmode][fx][fix] Support dictionary output (#44508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44508

Bug fix for dictionary output

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23636182

fbshipit-source-id: 0c00cd6b9747fa3f8702d7f7a0d5edb31265f466
2020-09-11 11:29:20 -07:00
Ann Shan
442957d8b6 [pytorch] Remove mobile nonvariadic run_method (#44235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44235

Removes nonvariadic run_method() from mobile Module entirely (to be later replaced by a variadic version). All use cases should have been migrated to use get_method() and Method::operator() in D23436351
ghstack-source-id: 111848220

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D23484577

fbshipit-source-id: 602fcde61e13047a34915b509da048b9550103b1
2020-09-11 10:23:08 -07:00
Ann Shan
a61318a535 [pytorch] Replace mobile run_method with get_method and operator() (#44202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44202

In preparation for changing mobile run_method() to be variadic, this diff:

* Implements get_method() for mobile Module, which is similar to find_method but expects the method to exist.
* Replaces calls to the current nonvariadic implementation of run_method() by calling get_method() and then invoking the operator() overload on Method objects.
ghstack-source-id: 111848222

Test Plan: CI, and all the unit tests which currently contain run_method that are being changed.

Reviewed By: iseeyuan

Differential Revision: D23436351

fbshipit-source-id: 4655ed7182d8b6f111645d69798465879b67a577
2020-09-11 10:23:06 -07:00
Guilherme Leobas
cdf5e2ae86 add typing annotations for a few torch.utils.* modules (#43806)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43431. Depends on [gh-43862](https://github.com/pytorch/pytorch/pull/43862) (EDIT: now merged)

Modules:
- torch.utils.mkldnn
- torch.utils.mobile_optimizer
- torch.utils.bundled_inputs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43806

Reviewed By: gmagogsfm

Differential Revision: D23635151

Pulled By: SplitInfinity

fbshipit-source-id: a85b75a7927dde6cc55bcb361f8ff601ffb0b2a1
2020-09-11 10:20:55 -07:00
David Reiss
7d78a6fcdd Update interpolate to use new upsample overloads (#43025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43025

- Use new overloads that better reflect the arguments to interpolate.
- More uniform interface for upsample ops allows simplifying the Python code.
- Also reorder overloads in native_functions.yaml to give them priority.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37177

ghstack-source-id: 106938111

Test Plan:
test_nn has pretty good coverage.

Relying on CI for ONNX, etc.

Didn't test FC because this change is *not* forward compatible.

To ensure backwards compatibility, I ran this code before this change

```python
def test_func(arg):
    interp = torch.nn.functional.interpolate
    with_size = interp(arg, size=(16,16))
    with_scale = interp(arg, scale_factor=[2.1, 2.2], recompute_scale_factor=False)
    with_compute = interp(arg, scale_factor=[2.1, 2.2])
    return (with_size, with_scale, with_compute)

traced_func = torch.jit.trace(test_func, torch.randn(1,1,1,1))

sample = torch.randn(1, 3, 7, 7)
output = traced_func(sample)

assert not torch.allclose(output[1], output[2])

torch.jit.save(traced_func, "model.pt")
torch.save((sample, output), "data.pt")
```

then this code after this change

```python
model = torch.jit.load("model.pt")
sample, golden = torch.load("data.pt")
result = model(sample)
for r, g in zip(result, golden):
    assert torch.allclose(r, g)
```

Reviewed By: AshkanAliabadi

Differential Revision: D21209991

fbshipit-source-id: 5b2ebb7c3ed76947361fe532d1dbdd6faa3544c8
2020-09-11 09:59:14 -07:00
Gregory Chanan
3de2c0b42f Fix L1Loss when target.requires_grad is True. (#44471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44471

L1Loss had a completely different (and incorrect, see #43228) path when target.requires_grad was True.

This PR does the following:

1) adds derivative support for target via the normal derivatives.yaml route
2) kills the different (and incorrect) path for when target.requires_grad was True
3) modifies the L1Loss CriterionTests to verify that the target derivative is checked.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23626008

Pulled By: gchanan

fbshipit-source-id: 2828be16b56b8dabe114962223d71b0e9a85f0f5
2020-09-11 09:51:16 -07:00
Martin Yuan
b73b44f976 [PyTorch Mobile] Move some string ops to register_prim_ops.cpp and make them selective (#44500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44500

Some user models are using those operators. Unblock them while keeping the ops selective.

Test Plan: CI

Reviewed By: linbinyu

Differential Revision: D23634769

fbshipit-source-id: 55841d1b07136b6a27b6a39342f321638dc508cd
2020-09-11 09:24:35 -07:00
Rohan Varma
567c51cce9 In common_distributed, fix TEST_SKIPS multiprocessing manager (#44525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44525

Since `TEST_SKIPS` is a global multiprocessing.manager, this was causing
issues when one test would fail and make the rest of the tests fail during
setup due to networking errors.

See the failed CI job: https://app.circleci.com/pipelines/github/pytorch/pytorch/212491/workflows/0450151d-ca09-4cf6-863d-272de6ed917f/jobs/7389065 for an example, where `test_ddp_backward` failed but then caused the rest of the tests to fail at the line `test_skips.update(TEST_SKIPS)`.

To fix this issue, at the end of every test we revert `TEST_SKIPS` back to a regular dict, and redo the conversion to a `multiprocessing.Manager` in the next test, which prevents these errors.
ghstack-source-id: 111844724

Test Plan: CI

Reviewed By: malfet

Differential Revision: D23641618

fbshipit-source-id: 27ce823968ece9804bb4dda898ffac43ef732b89
2020-09-11 09:16:33 -07:00
Gregory Chanan
d07d25a8c5 Fix MSELoss when target.requires_grad is True. (#44437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44437

MSELoss had a completely different (and incorrect, see https://github.com/pytorch/pytorch/issues/43228) path when target.requires_grad was True.

This PR does the following:
1) adds derivative support for target via the normal derivatives.yaml route
2) kills the different (and incorrect) path for when target.requires_grad was True
3) modifies the MSELoss CriterionTests to verify that the target derivative is checked.

TODO:
1) do we still need check_criterion_jacobian when we run grad/gradgrad checks?
2) ensure the Module tests check when target.requires_grad
3) do we actually test when reduction='none' and reduction='mean'?

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23612166

Pulled By: gchanan

fbshipit-source-id: 4f74d38d8a81063c74e002e07fbb7837b2172a10
2020-09-11 08:51:28 -07:00
Shen Li
a9754fb860 Use TP Tensor.metadata to carry device info (#44396)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44396

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D23602576

Pulled By: mrshenli

fbshipit-source-id: c639789979b2b71fc165efbcf70f37b4c39469df
2020-09-11 08:33:22 -07:00
Shen Li
f44de7cdc3 Add missing rpc.shutdown() (#44417)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44417

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D23626208

Pulled By: mrshenli

fbshipit-source-id: 4ff8cad0e1193f99518804c21c9dd26ae718f4eb
2020-09-11 08:32:15 -07:00
lixinyu
77cc7d1ecd C++ APIs Transformer NN Module Top Layer (#44333)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44333

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23584010

Pulled By: glaringlee

fbshipit-source-id: 990026e3f1b5ae276776e344ea981386cb7528fe
2020-09-11 08:25:27 -07:00
Tongzhou Wang
09892de815 Clarify track_running_stats docs; Make SyncBatchNorm track_running_stats behavior consistent (#44445)
Summary:
context: https://github.com/pytorch/pytorch/pull/38084

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44445

Reviewed By: colesbury

Differential Revision: D23634216

Pulled By: mrshenli

fbshipit-source-id: d1242c694dec0e7794651f8031327625eb9989ee
2020-09-11 08:20:34 -07:00
Nick Gibson
30fccc53a9 [NNC] Don't attempt to refactor conditional scalars (#44223)
Summary:
Fixes a bug in the NNC registerizer for Cuda where it would hoist reads out of a conditional context when trying to cache them. As a quick fix, prevent scalar replacement if a usage is within a condition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44223

Reviewed By: gchanan

Differential Revision: D23551247

Pulled By: nickgg

fbshipit-source-id: 17a7bf2be4c8c3dd8a9ab7997dce9aea200c3685
2020-09-11 04:22:16 -07:00
Zafar
c967e7724e [quant] conv_transpose1d_prepack / conv_transpose1d_unpack (#40360)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40360

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22158982

Pulled By: z-a-f

fbshipit-source-id: 844d02806554aaa68b521283703e630cc544d419
2020-09-11 04:12:28 -07:00
Elias Ellison
8b8986662f [JIT] Remove profiling nodes in autodiff forward graph (#44420)
Summary:
Previously we were not removing profiling nodes in graphs that required grad and contained diff graphs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44420

Reviewed By: bertmaher

Differential Revision: D23607482

Pulled By: eellison

fbshipit-source-id: af095f3ed8bb3c5d09610f38cc7d1481cbbd2613
2020-09-11 02:59:39 -07:00
Mikhail Zolotukhin
c6febc6480 [JIT] Add a python hook for a function to interpret JIT graphs. (#44493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44493

This function allows one to execute a graph exactly as it is, without going
through a graph executor, which would run passes on the graph before
interpreting it. I found this feature extremely helpful when I worked on
a stress-testing script to shake out bugs from the TE fuser: I needed to
run a very specific set of passes on a graph and nothing else, and
then execute exactly that graph.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23632505

Pulled By: ZolotukhinM

fbshipit-source-id: ea81fc838933743e2057312d3156b77284d832ef
2020-09-11 02:55:26 -07:00
Pritam Damania
51ed31269e Replace FutureMessage with c10::ivalue::Future in DistEngine. (#44239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44239

As part of https://github.com/pytorch/pytorch/issues/41574, use
c10::ivalue::Future everywhere in DistEngine.
ghstack-source-id: 111645070

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D23553507

fbshipit-source-id: 1b51ba13d1ebfa6c5c70b12028e9e96ce8ba51ff
2020-09-11 01:03:42 -07:00
Jerry Zhang
0c58a017bd [quant][eagermode][refactor] Add set/get method for quantization and fusion mappings (#43990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43990

Allow users to register custom quantization and fusion patterns

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23485344

fbshipit-source-id: 4f0174ee6d8000d83de0f73cb370e9a1941d54aa
2020-09-10 21:29:39 -07:00
Omkar Salpekar
f7278473d3 [NCCL] Fix NCCL_BLOCKING_WAIT functionality with Async Error Handling (#44411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44411

This basically aborts errored NCCL communicators if either blocking
wait or async error handling is enabled. Otherwise we may abort nccl
communicators where neither are enabled, and this may result in subsequent GPU
operations using corrupted data.
ghstack-source-id: 111839264

Test Plan: Successful Flow run: f217591683

Reviewed By: jiayisuse

Differential Revision: D23605382

fbshipit-source-id: 6c16f9626362be3b0ce2feaf0979b2dff97ce61b
2020-09-10 20:57:55 -07:00
Richard Zou
69f6d94caa Register diag_backward, diagonal_backward, infinitely...gelu_backward as operators (#44422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44422

See #44052 for context.

Test Plan:
- `pytest test/test_autograd.py -v`
- `pytest test/test_nn.py -v`

Reviewed By: mrshenli

Differential Revision: D23607691

Pulled By: zou3519

fbshipit-source-id: 09fbcd66b877af4fa85fd9b2f851ed3912ce84d6
2020-09-10 18:43:18 -07:00
Richard Zou
7ff7e6cfc8 Register cummaxmin_backward, cumprod_backward as operators (#44410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44410

See #44052 for context. One of the cumprod_backward overloads was unused
so I just deleted it.

Test Plan: - `pytest test/test_autograd.py -v`

Reviewed By: mrshenli

Differential Revision: D23605503

Pulled By: zou3519

fbshipit-source-id: f9c5b595e62d2d6e71f26580ba96df15cc9de4f7
2020-09-10 18:43:15 -07:00
Richard Zou
08b431f54c Add trace_backward, masked_select_backward, and take_backward as ops (#44408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44408

See #44052 for context.

Test Plan: - `pytest test/test_autograd.py -v`

Reviewed By: mrshenli

Differential Revision: D23605504

Pulled By: zou3519

fbshipit-source-id: b9b1646d13caa6e536d08669c29bfc2ad8ff89a3
2020-09-10 18:41:07 -07:00
Rohan Varma
41f62b17e7 Fix DDP join() API in the case of model.no_sync() (#44427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44427

Closes https://github.com/pytorch/pytorch/issues/44425

DDP join API currently does not work properly with `model.no_sync()`, see https://github.com/pytorch/pytorch/issues/44425 for details. This PR fixes the problem via the approach mentioned in the issue, namely scheduling an allreduce that tells joined ranks whether to sync in the backwards pass or not. Tests are added for skipping gradient synchronization for various `sync_interval`s.
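
A rough sketch of the usage pattern being fixed (a single-process gloo group purely to keep it self-contained; the model, interval, and port are illustrative):

```python
import contextlib
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# single-process gloo group for illustration only
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(8, 8))
batches = [torch.randn(4, 8) for _ in range(6)]  # ranks may have uneven input counts
sync_interval = 2  # illustrative interval

with model.join():  # joined ranks must also learn whether peers are syncing grads
    for step, batch in enumerate(batches):
        # synchronize gradients only on every sync_interval-th step
        ctx = contextlib.nullcontext() if (step + 1) % sync_interval == 0 else model.no_sync()
        with ctx:
            model(batch).sum().backward()

dist.destroy_process_group()
```
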
ghstack-source-id: 111786479

Reviewed By: pritamdamania87

Differential Revision: D23609070

fbshipit-source-id: e8716b7881f8eee95e3e3499283e716bd3d7fe76
2020-09-10 18:31:40 -07:00
Mike Ruberry
c48f511c7e Moves some of TestTorchMathOps to OpInfos (#44277)
Summary:
This PR fixes three OpInfo-related bugs and moves some functions from TestTorchMathOps to be tested using the OpInfo pattern. The bugs are:

- A skip test path in test_ops.py incorrectly formatted its string argument
- Decorating the tests in common_device_type.py was incorrectly always applying decorators to the original test, not the op-specific variant of the test. This could cause the same decorator to be applied multiple times, overriding past applications.
- make_tensor was incorrectly constructing tensors in some cases

The functions moved are:

- asin
- asinh
- sinh
- acosh
- tan
- atan
- atanh
- tanh
- log
- log10
- log1p
- log2

In a follow-up PR more or all of the remaining functions in TestTorchMathOps will be refactored as OpInfo-based tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44277

Reviewed By: mrshenli, ngimel

Differential Revision: D23617361

Pulled By: mruberry

fbshipit-source-id: edb292947769967de9383f6a84eb327f027509e0
2020-09-10 17:31:50 -07:00
Mehdi Mirzazadeh
2e744b1820 Support work.result() to get result tensors for allreduce for Gloo, NCCL backends (#43970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43970

It is a resubmission of #43386

Original commit changeset: 27fbeb161706
ghstack-source-id: 111775070

Test Plan:
Added checks to existing unit test and ran it on gpu devserver.
Verified the test that was failing in original diff also passes: https://app.circleci.com/pipelines/github/pytorch/pytorch/210229/workflows/86bde47b-f2da-48e3-a618-566ae2713102/jobs/7253683

Reviewed By: pritamdamania87

Differential Revision: D23455047

fbshipit-source-id: b8dc4a30b95570d68a482c19131674fff2a3bc7c
2020-09-10 17:13:37 -07:00
Ann Shan
1dd3fae3d2 [pytorch] Add logging to mobile Method run (#44234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44234

Changes mobile Method to point to a mobile Module directly instead of the Module ivalue in order to access metadata for logging/debugging, and then adds said logging.
ghstack-source-id: 111775806

Test Plan:
CI/existing unit tests to test BC
Testing fb4a logging:
Built fb4a on D23436351 (because usage of run_method isn't replaced yet in this diff), and then checked the Scuba logs to see that the appropriate ad clicks were logged (one ad for Buzzfeed shopping and another about Netflix from Bustle)

{F328510687}
{F328511201}
[Scuba sample of QPL metrics](https://www.internalfb.com/intern/scuba/query/?dataset=qpl_metrics%2Fpytorch_employee&pool=uber&view=samples_client&drillstate=%7B%22sampleCols%22%3A[%22device_model%22%2C%22instance_id_sampled%22%2C%22method%22%2C%22ios_device_class%22%2C%22points_path%22%2C%22userid_sampled%22%2C%22client_sample_rate%22%2C%22browser_name%22%2C%22ios_device_name%22%2C%22points%22%2C%22is_employee%22%2C%22is_test_user%22%2C%22network_only_queries%22%2C%22annotations%22%2C%22oncall_shortname%22%2C%22environment_tags%22%2C%22revoked_queries%22%2C%22annotations_bool%22%2C%22points_data%22%2C%22annotations_double_array%22%2C%22annotations_string_array%22%2C%22revoked_steps%22%2C%22points_set%22%2C%22device_os_version%22%2C%22ota_version_rollout%22%2C%22steps%22%2C%22vadar_calculation_result%22%2C%22app_name%22%2C%22client_push_phase%22%2C%22vadar%22%2C%22release_channel%22%2C%22interaction_class%22%2C%22exposures%22%2C%22annotations_double%22%2C%22deviceid_sampled%22%2C%22is_logged_in%22%2C%22device_os%22%2C%22time%22%2C%22major_os_ver%22%2C%22annotations_int_array%22%2C%22duration_ns%22%2C%22app_build%22%2C%22bucket_id%22%2C%22cache_and_network_queries%22%2C%22value%22%2C%22vadar_v2%22%2C%22quicklog_event%22%2C%22unixname%22%2C%22vadar_calculation_result_v2%22%2C%22trace_tags%22%2C%22annotations_int%22%2C%22quicklog_module%22%2C%22push_phase%22%2C%22year_class%22%2C%22country%22%2C%22capped_duration%22%2C%22ram_class%22%2C%22weight%22%2C%22carrier%22%2C%22app_id%22%2C%22app_version%22%2C%22react_bundle_version%22%2C%22logging_source%22%2C%22is_unsampled_for_scuba%22%2C%22instrumentation_errors%22%2C%22android_cpu_abi_list%22%2C%22days_after_release%22%2C%22cpu_cores%22%2C%22user_bucket%22%2C%22quicklog_action%22%2C%22server_scuba_sample_rate%22%2C%22points_vector%22%2C%22annotations_bool_array%22%2C%22android_device_class%22%2C%22browser_full_version%22%2C%22major_app_ver%22]%2C%22derivedCols%22%3A[]%2C%22mappedCols%22%3A[]%2C%22enumCols%22%3A[]%2C%22hideEmptyColumns%22%3Afalse%2C%22focused_event%22%3A%22%22%2C%22show_metadata%22%3A%22false%22%2C%22start%22%3A%222020-09-08%2011%3A27%3A00%22%2C%22end%22%3A%22start%20%2B%201%20minute%22%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22samplingRatio%22%3A%221%22%2C%22num_samples%22%3A%22100%22%2C%22aggregateList%22%3A[]%2C%22param_dimensions%22%3A[]%2C%22modifiers%22%3A[]%2C%22order%22%3A%22none%22%2C%22order_desc%22%3Atrue%2C%22filterMode%22%3A%22DEFAULT%22%2C%22constraints%22%3A[[%7B%22column%22%3A%22quicklog_event%22%2C%22op%22%3A%22eq%22%2C%22value%22%3A[%22[%5C%22MOBILE_MODULE_STATS%5C%22]%22]%7D%2C%7B%22column%22%3A%22userid_sampled%22%2C%22op%22%3A%22eq%22%2C%22value%22%3A[%22[%5C%22100013484978975%5C%22]%22]%7D]]%2C%22c_constraints%22%3A[[]]%2C%22b_constraints%22%3A[[]]%2C%22metrik_view_params%22%3A%7B%22should_use_legacy_colors%22%3Afalse%2C%22columns_skip_formatting%22%3A[]%2C%22view%22%3A%22samples_client%22%2C%22width%22%3A%221358%22%2C%22height%22%3A%22912%22%2C%22tableID%22%3A%22qpl_metrics%2Fpytorch_employee%22%2C%22fitToContent%22%3Afalse%2C%22format_tooltip_in_percent%22%3Afalse%2C%22use_y_axis_hints_as_limits%22%3Atrue%2C%22has_dynamic_context_menu%22%3Atrue%2C%22has_context_menu%22%3Afalse%2C%22legend_mode%22%3A%22nongrid%22%2C%22connect_nulls%22%3Atrue%2C%22timezone_offset%22%3A420%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22y_min_hint%22%3A0%2C%22should_render_plugins_menu%22%3Afalse%7D%7D&normalized=1599581160)
[Scuba sample showing ad source; just the bottom two results](https://www.internalfb.com/intern/scuba/query/?dataset=business_integrity_webpage_semantic&pool=uber&drillstate=%7B%22sampleCols%22%3A[%22from_custom_sampling%22%2C%22data_version%22%2C%22scribe_category_type%22%2C%22page_id%22%2C%22name%22%2C%22source_url%22%2C%22time%22%2C%22title_semantic%22%2C%22major_version%22%2C%22server_protocol%22%2C%22custom_sampling_enabled%22%2C%22ad_id%22%2C%22appversion%22%2C%22clienttime%22%2C%22isemployee%22%2C%22title%22%2C%22images%22%2C%22weight%22%2C%22carrier%22%2C%22is_ad%22%2C%22locale%22%2C%22appid%22%2C%22ip_country%22%2C%22iab_models%22]%2C%22derivedCols%22%3A[]%2C%22mappedCols%22%3A[]%2C%22enumCols%22%3A[]%2C%22return_remainder%22%3Afalse%2C%22should_pivot%22%3Afalse%2C%22is_timeseries%22%3Afalse%2C%22hideEmptyColumns%22%3Afalse%2C%22main_dimension%22%3A%22time%22%2C%22start%22%3A%22-5%20minutes%22%2C%22samplingRatio%22%3A%221%22%2C%22compare%22%3A%22none%22%2C%22axes%22%3A%22linked%22%2C%22overlay_types%22%3A[]%2C%22minBucketSamples%22%3A%22%22%2C%22dimensions%22%3A[]%2C%22scale_type%22%3A%22absolute%22%2C%22num_samples%22%3A%22100%22%2C%22metric%22%3A%22avg%22%2C%22fill_missing_buckets%22%3A%22connect%22%2C%22smoothing_bucket%22%3A%221%22%2C%22top%22%3A%227%22%2C%22markers%22%3A%22%22%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22end%22%3A%22now%22%2C%22show_p95_ci%22%3Afalse%2C%22time_bucket%22%3A%22auto%22%2C%22compare_mode%22%3A%22normal%22%2C%22aggregateList%22%3A[]%2C%22param_dimensions%22%3A[]%2C%22modifiers%22%3A[]%2C%22order%22%3A%22none%22%2C%22order_desc%22%3Atrue%2C%22filterMode%22%3A%22DEFAULT%22%2C%22constraints%22%3A[[%7B%22column%22%3A%22major_version%22%2C%22op%22%3A%22eq%22%2C%22value%22%3A[%22[%5C%22288%5C%22]%22]%7D]]%2C%22c_constraints%22%3A[[]]%2C%22b_constraints%22%3A[[]]%2C%22metrik_view_params%22%3A%7B%22should_use_legacy_colors%22%3Afalse%2C%22columns_skip_formatting%22%3A[]%2C%22view%22%3A%22time_view%22%2C%22width%22%3A%221358%22%2C%22height%22%3A%22912%22%2C%22tableID%22%3A%22business_integrity_webpage_semantic%22%2C%22fitToContent%22%3Afalse%2C%22format_tooltip_in_percent%22%3Afalse%2C%22use_y_axis_hints_as_limits%22%3Atrue%2C%22has_dynamic_context_menu%22%3Atrue%2C%22has_context_menu%22%3Afalse%2C%22legend_mode%22%3A%22nongrid%22%2C%22connect_nulls%22%3Atrue%2C%22timezone_offset%22%3A420%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22y_min_hint%22%3A0%2C%22should_render_plugins_menu%22%3Afalse%7D%7D&view=samples_client&normalized=1599587280)

Reviewed By: iseeyuan

Differential Revision: D23548687

fbshipit-source-id: 3e63085663f5fd8de90a4c7dbad0a17947aee973
2020-09-10 15:26:33 -07:00
Pritam Damania
a2a81e1335 Add a CONTRIBUTING.md for the distributed package. (#44224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44224

The purpose of this file is to help developers on PT distributed get
up to speed on the code structure and layout for PT Distributed.
ghstack-source-id: 111644842

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D23548377

fbshipit-source-id: 561d5b8e257642de172def8fdcc1311fae20690b
2020-09-10 14:58:00 -07:00
Nikita Shulga
4bead6438a Enable torch.autograd typechecks (#44451)
Summary:
To help with further typing, move dynamically added native contributions from `torch.autograd` to `torch._C._autograd`
Fix invalid error handling pattern in
89ac30afb8/torch/csrc/autograd/init.cpp (L13-L15)
`PyImport_ImportModule` already raises a Python exception, and nullptr should be returned to properly propagate the error to the Python runtime.

All native methods/types are added to `torch/autograd/__init__.py` after `torch._C._init_autograd()` has been called.
Use f-strings instead of `.format` in test_type_hints.py
Fixes https://github.com/pytorch/pytorch/issues/44450

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44451

Reviewed By: ezyang

Differential Revision: D23618261

Pulled By: malfet

fbshipit-source-id: fa5f739d7cff8410641128b55b810318c5f636ae
2020-09-10 13:37:29 -07:00
Elias Ellison
cc5a1cf616 [JIT] Erase shapes before fallback graph (#44434)
Summary:
Previously the specialized types were copied over to the fallback function, even though the tensors passed to the fallback were not of those types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44434

Reviewed By: SplitInfinity

Differential Revision: D23611943

Pulled By: eellison

fbshipit-source-id: 2ea88a97529409f6c5c4c1f59a14b623524933de
2020-09-10 12:07:31 -07:00
Yi Wang
38c10b4f30 [NCCL] Fix the initialization of futureNCCLCallbackStreams (#44347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44347

Cloned from Pull Request resolved: https://github.com/pytorch/pytorch/pull/44097, because the original author Sinan has completed the internship and now is unable to submit this diff.

As johnsonpaul mentioned in D23277575 (7d517cf96f), it looks like all processes were allocating memory on GPU-ID=0.

I was able to reproduce it by running `test_ddp_comm_hook_allreduce_with_then_hook_nccl` unit test of `test_c10d.py` and running `nvidia-smi` while test was running. The issue was reproduced as:
```
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0   3132563      C   python                                       777MiB |
|    0   3132564      C   python                                       775MiB |
|    4   3132564      C   python                                       473MiB |
+-----------------------------------------------------------------------------+
```
I realized that, as we initialize ProcessGroupNCCL, both processes were initially allocating memory on GPU 0.

We later also realized that I had forgotten the `isHighPriority` input of `getStreamFromPool`, so `futureNCCLCallbackStreams_.push_back(std::make_shared<at::cuda::CUDAStream>(at::cuda::getStreamFromPool(device_index)));` was just creating a vector of GPU 0 streams. After I changed `at::cuda::getStreamFromPool(device_index)` to `at::cuda::getStreamFromPool(false, device_index)`, `nvidia-smi` looked like:
```
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    673925      C   python                                       771MiB |
|    0    673926      C   python                                       771MiB |
|    1    673925      C   python                                       771MiB |
|    1    673926      C   python                                       771MiB |
|    2    673925      C   python                                       771MiB |
|    2    673926      C   python                                       771MiB |
|    3    673925      C   python                                       771MiB |
|    3    673926      C   python                                       771MiB |
|    4    673925      C   python                                       771MiB |
|    4    673926      C   python                                       771MiB |
|    5    673925      C   python                                       771MiB |
|    5    673926      C   python                                       771MiB |
|    6    673925      C   python                                       771MiB |
|    6    673926      C   python                                       771MiB |
|    7    673925      C   python                                       707MiB |
|    7    673926      C   python                                       623MiB |
+-----------------------------------------------------------------------------+
```
This confirms that we were just getting GPU 0 streams for the callback. I think this does not explain the `fp16_compress` stability issue, because we were able to reproduce that even without any `then` callback, just copying from fp32 to fp16 before allreduce. However, this can explain other issues where `allreduce` was not on par with `no_hook`. I'll run some additional simulations with this diff.

I tried to replace `getStreamFromPool` with `getDefaultCUDAStream(deviceIndex)` and it wasn't causing additional memory usage. In this diff, I temporarily solved the issue by just initializing null pointers for each device in the constructor and setting the callback stream for corresponding devices inside `ProcessGroupNCCL::getNCCLComm`. After the fix it looks like the memory issue was resolved:
```
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0   2513142      C   python                                       745MiB |
|    4   2513144      C   python                                       747MiB |
+-----------------------------------------------------------------------------+
```
I could use a dictionary instead of a vector for `futureNCCLCallbackStreams_`, but since the number of devices is fixed, I think it isn't necessary. Please let me know what you think in the comments.
ghstack-source-id: 111485483

Test Plan:
`test_c10d.py` and some perf tests. Also check `nvidia-smi` while running tests to validate memory looks okay.

This diff also fixes the regression in HPC tests as we register a hook:

{F322730175}

See https://fb.quip.com/IGuaAbD8bnvy for details.

Reviewed By: pritamdamania87

Differential Revision: D23495436

fbshipit-source-id: ad08e1d94343252224595d7c8a279fe75e244822
2020-09-10 11:25:38 -07:00
Kenichi Maehashi
cb90fef770 Fix return value of PyErr_WarnEx ignored (SystemError) (#44371)
Summary:
This PR fixes unexpected `SystemError` when warnings are emitted and warning filters are set.

## Current behavior

```
$ python -Werror
>>> import torch
>>> torch.range(1, 3)
UserWarning: torch.range is deprecated in favor of torch.arange and will be removed in 0.5. Note that arange generates values in [start; end), not [start; end].

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: <built-in method range of type object at 0x7f38c7703a60> returned a result with an error set
```

## Expected behavior

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UserWarning: torch.range is deprecated and will be removed in a future release because its behavior is inconsistent with Python's range builtin. Instead, use torch.arange, which produces values in [start, end).
```

## Note

Python exception must be raised if `PyErr_WarnEx` returns `-1` ([python docs](https://docs.python.org/3/c-api/exceptions.html#issuing-warnings)). This PR fixes warnings raised in the following code:
```py
import torch

torch.range(1, 3)
torch.autograd.Variable().volatile
torch.autograd.Variable().volatile = True
torch.tensor(torch.tensor([]))
torch.tensor([]).new_tensor(torch.tensor([]))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44371

Reviewed By: mrshenli

Differential Revision: D23598410

Pulled By: albanD

fbshipit-source-id: 2fbcb13fe4025dbebaf1fd837d4c8e0944e05010
2020-09-10 10:15:21 -07:00
Hameer Abbasi
f9a0d0c21e Allow Tensor-likes in torch.autograd.gradcheck (#43877)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42942

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43877

Reviewed By: zou3519

Differential Revision: D23493257

Pulled By: ezyang

fbshipit-source-id: 6cdaabe17157b484e9491189706ccc15420ac239
2020-09-10 09:02:17 -07:00
Gregory Chanan
c8914afdfa Merge criterion_tests and new_criterion_tests. (#44398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44398

These end up executing the same tests, so no reason to have them separate.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23600855

Pulled By: gchanan

fbshipit-source-id: 0952492771498bf813f1bf8e1d7c8dce574ec965
2020-09-10 08:29:59 -07:00
Gregory Chanan
fa158c4ca6 Combine criterion and new criterion tests in test_jit. (#43958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43958

There is not any difference between these tests (I'm merging them), so let's merge them in the JIT as well.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23452337

Pulled By: gchanan

fbshipit-source-id: e6d13cdb164205eec3dbb7cdcd0052b02c961778
2020-09-10 08:28:14 -07:00
Gregory Chanan
af9cad761a Stop ignoring NotImplementedErrors in cuda CriterionTests. (#44381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44381

Perhaps this was necessary when the test was originally introduced, but it's difficult to figure out what is actually tested. And I don't think we actually use NotImplementedErrors.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23598646

Pulled By: gchanan

fbshipit-source-id: aa18154bfc4969cca22323e61683a301198823be
2020-09-10 08:18:33 -07:00
generatedunixname89002005287564
356aa54694 [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D23621463

fbshipit-source-id: 1cd7e94e480c7073c9a0aad55aeba98de4b96164
2020-09-10 04:24:43 -07:00
Kurt Mohler
28a23fce4c Deprecate torch.norm and torch.functional.norm (#44321)
Summary:
Part of https://github.com/pytorch/pytorch/issues/24802
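
For callers, the migration path looks roughly like this (a sketch assuming `torch.linalg.norm` as the replacement the deprecation points to):

```python
import torch

x = torch.randn(3, 4)
frob = torch.linalg.norm(x)              # Frobenius norm, matches torch.norm(x) default
row_norms = torch.linalg.norm(x, dim=1)  # per-row vector norms
```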

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44321

Reviewed By: mrshenli

Differential Revision: D23617273

Pulled By: mruberry

fbshipit-source-id: 6f88b5cb097fd0acb9cf0e415172c5a86f94e9f2
2020-09-10 01:16:41 -07:00
Chris Huynh
7b547f086f To fix extra memory allocation when using circular padding (#39273)
Summary:
For fixing https://github.com/pytorch/pytorch/issues/39256
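
For context, a minimal snippet exercising the circular-padding path in question (shapes are arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
y = F.pad(x, (2, 2, 2, 2), mode="circular")  # wraps values around instead of zero-padding
print(y.shape)  # torch.Size([1, 3, 12, 12])
```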

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39273

Reviewed By: anjali411

Differential Revision: D23471811

Pulled By: mruberry

fbshipit-source-id: fb324b51baea765311715cdf14642b334f335733
2020-09-10 00:15:31 -07:00
Jeff Daily
65d4a6b7c0 [ROCm] fix cub hipify mappings (#44431)
Summary:
Fixes ROCm-specific workarounds introduced by https://github.com/pytorch/pytorch/issues/44259.  This adds new hipify mappings that properly handle cub outside of caffe2 sources.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44431

Reviewed By: mrshenli

Differential Revision: D23617417

Pulled By: ngimel

fbshipit-source-id: 5d16afb6b8e6ec5ed049c51571866b0878d534ca
2020-09-09 23:39:25 -07:00
Cheng Chang
28bd4929bd [NNC] Make it able to normalize loop with variable start (#44133)
Summary:
Loops with variable start can also be normalized.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44133

Test Plan: updated testNormalizeStartVariable.

Reviewed By: navahgar

Differential Revision: D23507097

Pulled By: cheng-chang

fbshipit-source-id: 4e9aad1cd4f4a839f59a00bf8ddf97637a1a6648
2020-09-09 23:05:57 -07:00
taiyuanz
c515881137 Add reset_grad() function (#44423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44423

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42754

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23010859

Pulled By: ngimel

fbshipit-source-id: 56eec43eba88b98cbf714841813977c68f983564
2020-09-09 22:05:45 -07:00
Meghan Lele
89ac30afb8 [JIT] Propagate type sharing setting to submodule compilation (#44226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44226

**Summary**
At present, the `share_types` argument to `create_script_module` is used
to decide whether to reuse a previously created type for a top-level
module that has not yet been compiled. However, that setting does not apply
to the compilation of submodules of the top-level module; types are
still reused if possible.

This commit modifies `create_script_module` so that the `share_types`
flag is honoured during submodule compilation as well.

**Test Plan**
This commit adds a unit test to `TestTypeSharing` that checks that
submodule types are not shared or reused when `share_types` is set to
`False`.

**Fixes**
This commit fixes #43605.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23602371

Pulled By: SplitInfinity

fbshipit-source-id: b909b8b6abbe3b4cb9be8319ac263ade90e83bd3
2020-09-09 20:06:35 -07:00
Meghan Lele
d3b6d5caf1 [JIT] Add support for del to TS classes (#44352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44352

**Summary**
This commit adds support for `del` with class instances. If a class
implements `__delitem__`, then `del class_instance[key]` is syntactic
sugar for `class_instance.__delitem__(key)`.
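
A small sketch of the syntax this enables (class and attribute names here are made up for illustration):

```python
from typing import Dict

import torch

@torch.jit.script
class Cache(object):
    def __init__(self):
        self.entries: Dict[str, int] = {"a": 1, "b": 2}

    def __delitem__(self, key: str):
        # remove the entry for key
        self.entries.pop(key)

@torch.jit.script
def evict() -> int:
    c = Cache()
    del c["a"]          # desugars to c.__delitem__("a")
    return len(c.entries)
```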

**Test Plan**
This commit adds a unit test to TestClassTypes to test this feature.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23603102

Pulled By: SplitInfinity

fbshipit-source-id: 28ad26ddc9a693a58a6c48a0e853a1c7cf5c9fd6
2020-09-09 19:52:35 -07:00
Omkar Salpekar
e028ad0762 Fix HashStoreTests and move to Gtest (#43384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43384

Much like the FileStoreTests, the HashStoreTests were also run in a single blob and threw exceptions upon failure. This modularizes the tests by moving each function into a separate gtest test case.
ghstack-source-id: 111690834

Test Plan: Confirmed that the tests pass on devvm.

Reviewed By: jiayisuse

Differential Revision: D23257579

fbshipit-source-id: 7e821f0e9ee74c8b815f06facddfdb7dc2724294
2020-09-09 17:56:33 -07:00
Omkar Salpekar
69a3ff005d Modularize FileStoreTest and move to Gtest (#43383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43383

FileStore Test currently has a large blob of tests that throw
exceptions upon failure. This PR modularizes each test so they can run
independently, and migrates the framework to gtest.
ghstack-source-id: 111690831

Test Plan: Confirmed tests pass on devvm

Reviewed By: jiayisuse

Differential Revision: D22879473

fbshipit-source-id: 6fa5468e594a53c9a6b972757068dfc41645703e
2020-09-09 17:56:30 -07:00
Omkar Salpekar
a7fba7de22 Convert StoreTestUtils to Gtest (#43382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43382

StoreTestCommon defines standard helper functions that are used by all of our Store tests. These helpers currently throw exceptions upon failure, this PR changes them to use gtest assertions instead.
ghstack-source-id: 111690833

Test Plan: Tested the 2 PR's above this on devvm

Reviewed By: jiayisuse

Differential Revision: D22828156

fbshipit-source-id: 9e116cf2904e05ac0342a441e483501e00aad3dd
2020-09-09 17:55:25 -07:00
Elias Ellison
b69c28d02c Improving ModuleList indexing error msg (#43361)
Summary:
Follow-up to https://github.com/pytorch/pytorch/pull/41946/: suggest enumerating the ModuleList as an alternative if a user tries indexing into a ModuleList/Sequential with a non-integer literal.
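
The suggested alternative looks roughly like this (layer sizes are arbitrary):

```python
import torch

class Stack(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.ModuleList([torch.nn.Linear(4, 4) for _ in range(3)])

    def forward(self, x):
        # Indexing self.layers with a non-constant value fails to compile in
        # TorchScript; enumerating (or iterating) the ModuleList works instead.
        for i, layer in enumerate(self.layers):
            x = layer(x)
        return x

scripted = torch.jit.script(Stack())
print(scripted(torch.randn(2, 4)).shape)
```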

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43361

Reviewed By: mrshenli

Differential Revision: D23602388

Pulled By: eellison

fbshipit-source-id: 51fa28d5bc45720529b3d45e92d367ee6c9e3316
2020-09-09 16:22:57 -07:00
Elias Ellison
e0c65abd38 Revert D23568330: [pytorch][PR] Moves some of TestTorchMathOps to OpInfos
Test Plan: revert-hammer

Differential Revision:
D23568330 (a953a825cc)

Original commit changeset: 03e69fccdbfd

fbshipit-source-id: 04ec6843c5eb3c84ddf226dad0088172d9bed84d
2020-09-09 15:48:56 -07:00
Lillian Johnson
b0bcdbb1ab [JIT] Support partially specified sizes/strides in IRParser (#44113)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44113

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23508149

Pulled By: Lilyjjo

fbshipit-source-id: b6b2d32109fae599bc5347dae742b67a2e4a0a49
2020-09-09 14:45:51 -07:00
Yuchen Huang
a00d36b0e7 [PyTorch][Mobile] Insert the module name as name() to metadata dict if metadata doesn't contain "model_name" (#44400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44400

This diff does the same thing as D23549149 (398409f072). A fix is included for OSS CI: pytorch_windows_vs2019_py36_cuda10.1_test1
ghstack-source-id: 111679745

Test Plan:
- CI
- OSS CI

Reviewed By: xcheng16

Differential Revision: D23601050

fbshipit-source-id: 8ebdcd8fdc5865078889b54b0baeb397a90ddc40
2020-09-09 13:01:17 -07:00
Ailing Zhang
24efd29d19 Check commutativity for computed dispatch table and add a test to check entries. (#44088)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44088

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23492793

Pulled By: ailzhang

fbshipit-source-id: 37502f2a8a4d755219b400fcbb029e49d6cdb6e9
2020-09-09 12:48:34 -07:00
Omkar Salpekar
48c47db8fe [NCCL] Add Environment Variable to guard Async Error Handling feature (#44163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44163

In this PR, we introduce a new environment variable
(NCCL_ASYNC_ERROR_HANDLING), which guards the asynchronous error handling
feature. We intend to eventually turn this feature on by default for all users,
but this is a temporary solution so that the change in behavior from hanging to
crashing does not become the default for users all of a sudden.
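
Opting in is just a matter of setting the variable in the environment before the process group is constructed, e.g. (illustrative):

```python
import os

# must be set before torch.distributed.init_process_group(backend="nccl", ...)
os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "1"
```
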
ghstack-source-id: 111637788

Test Plan:
CI/Sandcastle. We will turn on this env var by default in
torchelastic and HPC trainer soon.

Reviewed By: jiayisuse

Differential Revision: D23517895

fbshipit-source-id: e7cd244b2ddf2dc0800ff7df33c73a6f00b63dcc
2020-09-09 12:26:25 -07:00
Omkar Salpekar
211ece7267 [NCCL] ProcessGroupNCCL Destructor Blocks on WorkNCCL Completion (#41054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41054

**This Commit:**
ProcessGroupNCCL destructor now blocks until all WorkNCCL objects have either been aborted or completed and removed from the work vector.

**This Stack:**
The purpose of this stack is to fix the hanging behavior observed when using PyTorch DDP training with NCCL. In various situations (desynchronization, high GPU utilization, etc.), NCCL collectives may hang due to waiting on an unresponsive worker. This stack detects such hanging behavior and aborts timed-out collectives by throwing a user-visible exception, all with minimal perf regression. Training can then be restarted from a previous checkpoint with something like torchelastic.

ghstack-source-id: 111614314

Test Plan:
1. **DDP Sanity Check**: First we have a sanity check based on the PyTorch DDP benchmark. This verifies that the baseline DDP training with NCCL for  standard CU workloads works well (esp. with standard models like Resnet50 and BERT). Here is a sample Flow: f213293473

1. **HPC Performance Benchmarks**: This stack has undergone thorough testing and profiling on the Training Cluster with varying number of nodes. This introduces 1-1.5% QPS regression only (~200-400 QPS regression for 8-64 GPUs).

1. **HPC Accuracy Benchmarks**: We've confirmed NE parity with the existing NCCL/DDP stack without this change.

1. **Kernel-Specific Benchmarks**: We have profiled other approaches for this system (such as cudaStreamAddCallback) and performed microbenchmarks to confirm the current solution is optimal.

1. **Sandcastle/CI**: Apart from the recently fixed ProcessGroupNCCL tests, we will also introduce a new test for desynchronization scenarios.

Reviewed By: jiayisuse

Differential Revision: D22054298

fbshipit-source-id: 2b95a4430a4c9e9348611fd9cbcb476096183c06
2020-09-09 12:26:22 -07:00
Omkar Salpekar
afbf2f140b [NCCL] WorkNCCL Helper Functions (#41053)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41053

**This Commit:**
Some minor refactoring - added helper to check if `WorkNCCL` objects have timed out. Adding a new finish function to ProcessGroupNCCL::WorkNCCL that avoids notifying CV and uses `lock_guard`. Also renaming the timeoutCVMutex mutex to be more descriptive.

**This Stack:**
The purpose of this stack is to fix the hanging behavior observed when using PyTorch DDP training with NCCL. In various situations (desynchronization, high GPU utilization, etc.), NCCL collectives may hang due to waiting on an unresponsive worker. This stack detects such hanging behavior and aborts timed-out collectives by throwing a user-visible exception, all with minimal perf regression. Training can then be restarted from a previous checkpoint with something like torchelastic.

ghstack-source-id: 111614315

Test Plan: See D22054298 for verification of correctness and performance

Reviewed By: jiayisuse

Differential Revision: D21943520

fbshipit-source-id: b27ee329f0da6465857204ee9d87953ed6072cbb
2020-09-09 12:26:18 -07:00
Omkar Salpekar
f8f7b7840d [NCCL] Abort Errored and Timed Out NCCL Communicators from Watchdog Thread (#41052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41052

**This Commit:**
Watchdog Thread checks for errored or timed-out `WorkNCCL` objects and aborts all associated NCCL Communicators. For now, we also process these aborted communicators as with the existing Watchdog logic (by adding them to abortedCommIds and writing aborted communicator ids to the store).

**This Stack:**
The purpose of this stack is to fix the hanging behavior observed when using PyTorch DDP training with NCCL. In various situations (desynchronization, high GPU utilization, etc.), NCCL collectives may hang due to waiting on an unresponsive worker. This stack detects such hanging behavior and aborts timed-out collectives by throwing a user-visible exception, all with minimal perf regression. Training can then be restarted from a previous checkpoint with something like torchelastic.

ghstack-source-id: 111614313

Test Plan: See D22054298 for verification of correctness and performance

Reviewed By: jiayisuse

Differential Revision: D21943151

fbshipit-source-id: 337bfcb8af7542c451f1e4b3dcdfc5870bdec453
2020-09-09 12:26:15 -07:00
Omkar Salpekar
4e5c55ef69 [NCCL] Use cudaEventQuery to Poll for GPU operation errors (#41051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41051

**This Commit:**
In the workCleanupThread, we process completion and exception handling for workNCCL objects corresponding to collective calls that have either completed GPU Execution, or have already thrown an exception. This way, we throw an exception from the workCleanupThread for failed GPU operations. This approach replaces the previous (and lower performance) approach of enqueuing a callback on the CUDA stream to process failures.

**This Stack:**
The purpose of this stack is to fix the hanging behavior observed when using PyTorch DDP training with NCCL. In various situations (desynchronization, high GPU utilization, etc.), NCCL collectives may hang due to waiting on an unresponsive worker. This stack detects such hanging behavior and aborts timed-out collectives by throwing a user-visible exception, all with minimal perf regression. Training can then be restarted from a previous checkpoint with something like torchelastic.

ghstack-source-id: 111614319

Test Plan: See D22054298 for verification of correctness and performance

Reviewed By: jiayisuse

Differential Revision: D21938498

fbshipit-source-id: df598365031ff210afba57e0c7be865e3323ca07
2020-09-09 12:26:12 -07:00
Omkar Salpekar
1df24fd457 [NCCL] Timeout Loop Thread for Async Error Handling (#41050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41050

**This Commit:**
We introduce a workVector to track live workNCCL objects corresponding to collective operations. Further, we introduce a workCleanupLoop, which busy-polls the vector of workNCCL objects and removes them upon completion.

**This Stack:**
The purpose of this stack is to fix the hanging behavior observed when using PyTorch DDP training with NCCL. In various situations (desynchronization, high GPU utilization, etc.), NCCL collectives may hang due to waiting on an unresponsive worker. This stack detects such hanging behavior and aborts timed-out collectives by throwing a user-visible exception, all with minimal perf regression. Training can then be restarted from a previous checkpoint with something like torchelastic.

Test Plan: See D22054298 for verification of correctness and performance

Reviewed By: jiayisuse

Differential Revision: D21916637

fbshipit-source-id: f8cadaab0071aaad1c4e31f9b089aa23cba0cfbe
2020-09-09 12:25:06 -07:00
Nikita Shulga
683380fc91 Use compile time cudnn version if linking with it statically (#44402)
Summary:
This should prevent torch_python from linking the entire cudnn library statically just to query its version

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44402

Reviewed By: seemethere

Differential Revision: D23602720

Pulled By: malfet

fbshipit-source-id: 185b15b789bd48b1df178120801d140ea54ba569
2020-09-09 11:33:41 -07:00
Bert Maher
6ec8fabc29 Fix frac in CUDA fuser (#44152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44152

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23528506

fbshipit-source-id: bfd468d72fa55ce317f88ae83e1f2d5eee041aa0
2020-09-09 11:10:08 -07:00
Bert Maher
350130a69d Prevent the TE fuser from getting datatypes it can't handle (#44160)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44160

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D23528508

Pulled By: bertmaher

fbshipit-source-id: 03b22725fb2666f441cb504b35397ea6d155bb85
2020-09-09 11:10:04 -07:00
Bert Maher
960c088a58 [te] Fix casting of unsigned char, and abs(int) (#44157)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44157

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D23528507

Pulled By: bertmaher

fbshipit-source-id: c5ef0422a91a4665b616601bed8b7cd137be39f9
2020-09-09 11:08:36 -07:00
Omkar Salpekar
7c464eed16 Skipping CUDA tests in ProcessGroupGloo and logs (#42488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42488

Currently, ProcessGroupGloo tests do not emit logs if the test was
skipped due to CUDA not being available or not enough CUDA devices being present. This PR clarifies
the reason for skipping through these logs.
ghstack-source-id: 111638111

Test Plan: tested on devvm and devgpu

Reviewed By: jiayisuse

Differential Revision: D22879396

fbshipit-source-id: d483ca46b5e22ed986521262c11a1c6dbfbe7efd
2020-09-09 10:52:52 -07:00
Michael Carilli
2a87742ffa Autocast wrappers for RNN cell apis (#44296)
Summary:
Should fix https://github.com/pytorch/pytorch/issues/42605.
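
A small sketch of the pattern that should now work under autocast (needs a CUDA device; shapes are arbitrary):

```python
import torch

if torch.cuda.is_available():
    cell = torch.nn.LSTMCell(10, 20).cuda()
    x = torch.randn(3, 10, device="cuda")
    hx = torch.randn(3, 20, device="cuda")
    cx = torch.randn(3, 20, device="cuda")
    with torch.cuda.amp.autocast():
        h1, c1 = cell(x, (hx, cx))  # cell inputs are now cast consistently by the wrappers
```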

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44296

Reviewed By: izdeby

Differential Revision: D23580447

Pulled By: ezyang

fbshipit-source-id: 86027b693fd2b648f043ab781b84ffcc1f72854d
2020-09-09 09:44:59 -07:00
Mike Ruberry
a953a825cc Moves some of TestTorchMathOps to OpInfos (#44277)
Summary:
This PR fixes three OpInfo-related bugs and moves some functions from TestTorchMathOps to be tested using the OpInfo pattern. The bugs are:

- A skip test path in test_ops.py incorrectly formatted its string argument
- Decorating the tests in common_device_type.py was incorrectly always applying decorators to the original test, not the op-specific variant of the test. This could cause the same decorator to be applied multiple times, overriding past applications.
- make_tensor was incorrectly constructing tensors in some cases

The functions moved are:

- asin
- asinh
- sinh
- acosh
- tan
- atan
- atanh
- tanh
- log
- log10
- log1p
- log2

In a follow-up PR more or all of the remaining functions in TestTorchMathOps will be refactored as OpInfo-based tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44277

Reviewed By: ngimel

Differential Revision: D23568330

Pulled By: mruberry

fbshipit-source-id: 03e69fccdbfd560217c34ce4e9a5f20e10d05a5e
2020-09-09 09:41:03 -07:00
Bert Maher
8acce55015 Dump optimized graph when logging in already-optimized PE (#44315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44315

I find it more intuitive to dump the optimized graph if we have one;
when I first saw the unoptimized graph being dumped I thought we had failed to
apply any optimizations.

Test Plan: Observe output by hand

Reviewed By: Lilyjjo

Differential Revision: D23578813

Pulled By: bertmaher

fbshipit-source-id: e2161189fb0e1cd53aae980a153aea610871662a
2020-09-09 01:28:48 -07:00
Taewook Oh
7a64b0c27a Export Node::isBefore/isAfter for PythonAPI (#44162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44162

This diff exports the Node::isBefore/isAfter methods to the Python API.

Test Plan: Tested locally. Please let me know if there is a set of unit tests to be passed.

Reviewed By: soumith

Differential Revision: D23514448

fbshipit-source-id: 7ef709b036370217ffebef52fd93fbd68c464e89
2020-09-09 00:57:08 -07:00
Rohan Varma
b22abbe381 Enable test_distributed to work with spawn mode (#41769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41769

Currently the tests in `test_distributed` only work with `fork` mode multiprocessing; this PR introduces support for `spawn` mode multiprocessing as well (while keeping the `fork` mode intact).

Motivations for the change:
1) Spawn multiprocessing is the default on MacOS, so it better emulates how MacOS users would use distributed
2) With python 3.8+, spawn is the default on linux, so we should have test coverage for this
3) PT multiprocessing suggests using spawn/forkserver over fork, for sharing cuda tensors: https://pytorch.org/docs/stable/multiprocessing.html
4) Spawn is better supported with respect to certain sanitizers such as TSAN, so adding this sanitizer coverage may help us uncover issues.

How it is done:
1) Move `test_distributed` tests in `_DistTestBase` class to a shared file `distributed_test` (similar to how the RPC tests are structured)
2) For `Barrier`, refactor the setup of temp directories, as the current version did not work with spawn: each process would get a different randomly generated directory and thus would write to different barriers.
3) Add all the relevant builds to run internally and in OSS.
Running test_distributed with spawn mode in OSS can be done with:
`python test/run_test.py -i distributed/test_distributed_spawn -v`

Reviewed By: izdeby

Differential Revision: D22408023

fbshipit-source-id: e206be16961fd80438f995e221f18139d7e6d2a9
2020-09-08 23:11:12 -07:00
Natalia Gimelshein
ecc6358dbe Port nonzero cuda from THC to ATen (#44259)
Summary:
1) Ports nonzero from THC to ATen
2) replaces most thrust uses with cub, to avoid synchronization and to improve performance. There is still one necessary synchronization point, communicating number of nonzero elements from GPU to CPU
3) slightly changes algorithm, now we first compute the number of nonzeros, and then allocate correct-sized output, instead of allocating full-sized output as was done before, to account for possibly all elements being non-zero
4) unfortunately, since the last transforms are still done with thrust, 2) is slightly beside the point, however it is a step towards a future without thrust
5) hard limits the number of elements in the input tensor to MAX_INT. The previous implementation allocated a Long tensor with the size ndim*nelements, so that would be at least 16 GB for a tensor with MAX_INT elements. It is reasonable to say that larger tensors could not be used anyway.

Benchmarking is done for tensors with approximately half non-zeros
<details><summary>Benchmarking script</summary>
<p>

```
import torch
from torch.utils._benchmark import Timer
from torch.utils._benchmark import Compare
import sys

device = "cuda"
results = []
for numel in (1024 * 128,):#, 1024 * 1024, 1024 * 1024 * 128):
    inp = torch.randint(2, (numel,), device="cuda", dtype=torch.float)
    for ndim in range(2,3):#(1,4):
        if ndim == 1:
            shape = (numel,)
        elif ndim == 2:
            shape = (1024, numel // 1024)
        else:
            shape = (1024, 128, numel // 1024 // 128)
        inp = inp.reshape(shape)
        repeats = 3
        timer = Timer(stmt="torch.nonzero(inp, as_tuple=False)", label="Nonzero", sub_label=f"number of elts {numel}",
        description = f"ndim {ndim}", globals=globals())
        for i in range(repeats):
            results.append(timer.blocked_autorange())
        print(f"\rnumel {numel} ndim {ndim}", end="")
        sys.stdout.flush()

comparison = Compare(results)
comparison.print()
```
</p>
</details>

### Results
Before:
```
[--------------------------- Nonzero ---------------------------]
                                 |  ndim 1  |   ndim 2  |   ndim 3
 1 threads: ------------------------------------------------------
       number of elts 131072     |    55.2  |     71.7  |     90.5
       number of elts 1048576    |   113.2  |    250.7  |    497.0
       number of elts 134217728  |  8353.7  |  23809.2  |  54602.3

 Times are in microseconds (us).
```
After:
```
[-------------------------- Nonzero --------------------------]
                                |  ndim 1  |  ndim 2  |  ndim 3
1 threads: ----------------------------------------------------
      number of elts 131072     |    48.6  |    79.1  |    90.2
      number of elts 1048576    |    64.7  |   134.2  |   161.1
      number of elts 134217728  |  3748.8  |  7881.3  |  9953.7

Times are in microseconds (us).

```
There's a real regression for smallish 2D tensors due to the added work of computing the number of nonzero elements; however, for other sizes there are significant gains, and the memory requirements are drastically lower. Perf gains would be even larger for tensors with fewer nonzeros.
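
For reference, a minimal usage sketch of the op being ported; the device handling and values below are illustrative, not taken from this PR:
```
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.tensor([[0., 1.], [2., 0.]], device=device)

idx = torch.nonzero(x, as_tuple=False)        # (num_nonzero, x.dim()) index tensor
rows, cols = torch.nonzero(x, as_tuple=True)  # one 1-D index tensor per dimension
```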

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44259

Reviewed By: izdeby

Differential Revision: D23581955

Pulled By: ngimel

fbshipit-source-id: 0b99a767fd60d674003d83f0848dc550d7a363dc
2020-09-08 20:52:51 -07:00
Mikhail Zolotukhin
bd8e38cd88 [TensorExpr] Fuser: check node inputs' device before merging the node into a fusion group. (#44241)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44241

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23554192

Pulled By: ZolotukhinM

fbshipit-source-id: fb03262520303152b83671603e08e7aecc24f5f2
2020-09-08 19:32:23 -07:00
Supriya Rao
646ffd4886 [quant] Move EmbeddingBag eager quantization to static (#44217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44217

Move the tests to static ones as well

Test Plan:
python test/test_quantization.py TestStaticQuantizedModule.test_embedding_bag_api

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23547386

fbshipit-source-id: 41f81c31e1613098ecf6a7eff601c7dcd4b09c76
2020-09-08 19:05:02 -07:00
Supriya Rao
57b87aaf59 [quant] Add quantized Embedding module (#44208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44208

Add a quantized Embedding module in the static quantization namespace. Embedding
quantization requires only the weights to be quantized, so it is static.
Internally it calls the embedding_bag_byte op with the offsets set to correspond to the
indices.

A future PR will move EmbeddingBag quantization from dynamic to static as well.
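
A rough usage sketch of the new module; the constructor arguments below mirror nn.Embedding and are assumptions, not code from this PR:
```
import torch
import torch.nn.quantized as nnq

# Weight-only (static) quantization: no observers or calibration needed for the inputs.
emb = nnq.Embedding(num_embeddings=10, embedding_dim=4)
out = emb(torch.tensor([0, 3, 7]))  # float output; weights stored byte-quantized internally
```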

Test Plan:
python test/test_quantization.py test_embedding_api

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23547384

fbshipit-source-id: eddc6fb144b4a771060e7bab5853656ccb4443f0
2020-09-08 19:04:59 -07:00
Jerry Zhang
6269b6e0f0 [quant][graphmode][fx][api] Call fuse in prepare (#43984)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43984

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23459261

fbshipit-source-id: 6b56b0916d76df67b9cc2f4be1fcee905d604019
2020-09-08 18:09:26 -07:00
Nick Gibson
be94dba429 [NNC] fix support for FP16 in CudaCodgen (#44209)
Summary:
Fixes a bug where FP16 values could be incorrectly cast to a half type that doesn't have a cast operator, by inserting the CUDA-specific cast to float during handling of the Cast node rather than as a wrapper around printing Loads and Stores. Two main changes: the HalfChecker now inserts the casts to float explicitly in the IR, and the PrioritizeLoad mutator now consumes both Loads and a Cast which immediately precedes a load.

Tested with test_jit_fuser_te.py and test_tensorexpr.py, plus C++ tests obv.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44209

Reviewed By: izdeby

Differential Revision: D23575577

Pulled By: nickgg

fbshipit-source-id: 808605aeb2af812758f96f9fdc11b07e08053b46
2020-09-08 18:00:39 -07:00
Jerry Zhang
9f54bcc522 [quant][graphmode][fx] Support inplace option (#43983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43983

Support inplace option in apis

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23459260

fbshipit-source-id: 80409c7984f17d1a4e13fb1eece8e18a69ee43b3
2020-09-08 17:39:13 -07:00
Vasiliy Kuznetsov
00b5bd536f fx quant: add docblocks to _find_matches and _find_quants (#43928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43928

Improving readability, no logic change.

Test Plan:
CI

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23440249

fbshipit-source-id: a7ebfc7ad15c73e26b9a94758e7254413cc17d29
2020-09-08 16:13:11 -07:00
Jerry Zhang
43e38d60d6 [quant][graphmode][fx] Support quantize per channel in all cases (#44042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44042

Missed one case last time

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23479345

fbshipit-source-id: 30e6713120c494e9fab5584de4df9b25bec83d32
2020-09-08 15:45:14 -07:00
James Reed
1fcccd6a18 [FX] Minor fixups in Graph printout (#44214)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44214

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D23545501

Pulled By: jamesr66a

fbshipit-source-id: dabb3b051ed4da213b2087979ade8a649288bd5d
2020-09-08 14:45:32 -07:00
Sujoy Saraswati
54931ebb7b Release saved variable from DifferentiableGraphBackward (#42994)
Summary:
When the backward ops execute via the autograd engine's evaluate_function(), fn.release_variables() is called to release the SavedVariables. For eager mode ops, this releases the saved inputs that were required for the backward grad function. However, with TorchScript we get a DifferentiableGraph, and DifferentiableGraphBackward() doesn't implement release_variables(). This causes the SavedVariables to stay alive longer. Implement release_variables() for DifferentiableGraphBackward to release these SavedVariables early.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42994

Reviewed By: izdeby

Differential Revision: D23503172

Pulled By: albanD

fbshipit-source-id: d87127498cfa72883ae6bb31d0e6c7056c4c36d4
2020-09-08 14:36:52 -07:00
Mike Ruberry
63d62d3e44 Skips test_addcmul_cuda if using ROCm (#44304)
Summary:
This test is failing consistently on linux-bionic-rocm3.7-py3.6-test2. Relevant log snippet:

```
03:43:11 FAIL: test_addcmul_cuda_float16 (__main__.TestForeachCUDA)
03:43:11 ----------------------------------------------------------------------
03:43:11 Traceback (most recent call last):
03:43:11   File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 818, in wrapper
03:43:11     method(*args, **kwargs)
03:43:11   File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 258, in instantiated_test
03:43:11     result = test(self, *args)
03:43:11   File "test_foreach.py", line 83, in test_addcmul
03:43:11     self._test_pointwise_op(device, dtype, torch._foreach_addcmul, torch._foreach_addcmul_, torch.addcmul)
03:43:11   File "test_foreach.py", line 58, in _test_pointwise_op
03:43:11     self.assertEqual(tensors, expected)
03:43:11   File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1153, in assertEqual
03:43:11     exact_dtype=exact_dtype, exact_device=exact_device)
03:43:11   File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1127, in assertEqual
03:43:11     self.assertTrue(result, msg=msg)
03:43:11 AssertionError: False is not true : Tensors failed to compare as equal! With rtol=0.001 and atol=1e-05, found 10 element(s) (out of 400) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.00048828125 (-0.46484375 vs. -0.46533203125), which occurred at index (11, 18).
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44304

Reviewed By: malfet, izdeby

Differential Revision: D23578316

Pulled By: mruberry

fbshipit-source-id: 558eecf42677383e7deaa4961e12ef990ffbe28c
2020-09-08 13:14:25 -07:00
Meghan Lele
caf23d110f [JIT] Unshare types for modules that define() in __init__ (#44233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44233

**Summary**
By default, scripting tries to share concrete and JIT types across
compilations. However, this can lead to incorrect results if a module
extends `torch.jit.ScriptModule`, and injects instance variables into
methods defined using `define`.

This commit detects when this has happened and disables type sharing
for the compilation of the module that uses `define` in `__init__`.

**Test Plan**
This commit adds a test to TestTypeSharing that tests this scenario.

**Fixes**
This commit fixes #43580.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23553870

Pulled By: SplitInfinity

fbshipit-source-id: d756e87fcf239befa0012998ce29eeb25728d3e1
2020-09-08 12:16:45 -07:00
James Reed
4e0ac120e9 [FX] Only copy over training attr if it's there (#44314)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44314

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D23578189

Pulled By: jamesr66a

fbshipit-source-id: fb7643f28582bd5009a826663a937fbe188c50bc
2020-09-08 11:50:08 -07:00
Vasiliy Kuznetsov
fd8e2064e0 quant: switch observers to use min_max (#42957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42957

Switches observers to use the new min_max function to calculate
min and max at the same time.  We see around 45-50% speedup on
representative input shapes on the microbenchmarks for all observers except `HistogramObserver`.
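
As context, a minimal sketch of what an observer does; this is standard torch.quantization usage, not code from this diff:
```
import torch
from torch.quantization import MinMaxObserver

obs = MinMaxObserver(dtype=torch.quint8)
obs(torch.randn(16, 32))                     # forward pass records the running min/max
scale, zero_point = obs.calculate_qparams()  # qparams derived from the observed range
```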

Test Plan:
CI for correctness

performance:
```
cd benchmarks/operator_benchmark
// repeat (before diff, after diff) x (cpu, cuda)
python -m pt.qobserver_test --tag_filter all --device cpu
/*
    * before, cpu: https://our.intern.facebook.com/intern/paste/P138633280/
    * before, cuda: https://our.intern.facebook.com/intern/paste/P138639473/
    * after, cpu: https://our.intern.facebook.com/intern/paste/P138635458/
    * after, cuda: https://our.intern.facebook.com/intern/paste/P138636344/
*/
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D23093995

fbshipit-source-id: 9f416d144109b5b80baf089eb4bcfabe8fe358d5
2020-09-08 11:39:44 -07:00
Ailing Zhang
1b2da9ed82 Expose alias key info in dumpState and update test_dispatch. (#44081)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44081

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23492794

Pulled By: ailzhang

fbshipit-source-id: 27a2978591900463bda2e92e0201c9fd719f9792
2020-09-06 18:43:05 -07:00
Mike Ruberry
bb861e1d69 Ports CUDA var and std reduce all (with no out argument) to ATen, fixes var docs (#43858)
Summary:
When var and std are called without args (other than unbiased) they currently call into TH or THC. This PR:

- Removes the THC var_all and std_all functions and updates CUDA var and std to use the ATen reduction
- Fixes var's docs, which listed its arguments in the incorrect order
- Adds new tests comparing var and std with their NumPy counterparts

Performance appears to have improved as a result of this change. I ran experiments on 1D tensors, 1D tensors with every other element viewed ([::2]), 2D tensors and 2D transposed tensors. Some notable datapoints:

- torch.randn((8000, 8000))
  - var measured 0.0022215843200683594s on CUDA before the change
  - var measured 0.0020322799682617188s on CUDA after the change
- torch.randn((8000, 8000)).T
  - var measured .015128850936889648 on CUDA before the change
  - var measured 0.001912832260131836 on CUDA after the change
- torch.randn(8000 ** 2)
  - std measured 0.11031460762023926 on CUDA before the change
  - std measured 0.0017833709716796875 on CUDA after the change

Timings for var and std are, as expected, similar.

On the CPU, however, the performance change from making the analogous update was more complicated, and ngimel and I decided not to remove CPU var_all and std_all. ngimel wrote the following script that showcases how single-threaded CPU inference would suffer from this change:

```
import torch
import numpy as np
from torch.utils._benchmark import Timer
from torch.utils._benchmark import Compare
import sys
base = 8
multiplier = 1

def stdfn(a):
    meanv = a.mean()
    ac = a-meanv
    return torch.sqrt(((ac*ac).sum())/a.numel())

results = []
num_threads=1
for _ in range(7):
    size = base*multiplier
    input = torch.randn(size)

    tasks = [("torch.var(input)", "torch_var"),
             ("torch.var(input, dim=0)", "torch_var0"),
             ("stdfn(input)", "stdfn"),
             ("torch.sum(input, dim=0)", "torch_sum0")
            ]
    timers = [Timer(stmt=stmt, num_threads=num_threads, label="Index", sub_label=f"{size}",
    description=label, globals=globals()) for stmt, label in tasks]
    repeats = 3

    for i, timer in enumerate(timers * repeats):
        results.append(
            timer.blocked_autorange()
        )
        print(f"\r{i + 1} / {len(timers) * repeats}", end="")
        sys.stdout.flush()
    multiplier *=10
print()

comparison = Compare(results)

comparison.print()
```

The TH timings using this script on my devfair are:

```
[------------------------------ Index ------------------------------]
              |  torch_var  |  torch_var0  |   stdfn   |  torch_sum0
1 threads: ----------------------------------------------------------
      8       |     16.0    |      5.6     |     40.9  |      5.0
      80      |     15.9    |      6.1     |     41.6  |      4.9
      800     |     16.7    |     12.0     |     42.3  |      5.0
      8000    |     27.2    |     72.7     |     51.5  |      6.2
      80000   |    129.0    |    715.0     |    133.0  |     18.0
      800000  |   1099.8    |   6961.2     |    842.0  |    112.6
      8000000 |  11879.8    |  68948.5     |  20138.4  |   1750.3
```

and the ATen timings are:

```
[------------------------------ Index ------------------------------]
              |  torch_var  |  torch_var0  |   stdfn   |  torch_sum0
1 threads: ----------------------------------------------------------
      8       |      4.3    |      5.4     |     41.4  |      5.4
      80      |      4.9    |      5.7     |     42.6  |      5.4
      800     |     10.7    |     11.7     |     43.3  |      5.5
      8000    |     69.3    |     72.2     |     52.8  |      6.6
      80000   |    679.1    |    676.3     |    129.5  |     18.1
      800000  |   6770.8    |   6728.8     |    819.8  |    109.7
      8000000 |  65928.2    |  65538.7     |  19408.7  |   1699.4
```

which demonstrates that performance is analogous to calling the existing var and std with `dim=0` on a 1D tensor. This would be a significant performance hit. Another simple script shows the performance is mixed when using multiple threads, too:

```
import torch
import time

# Benchmarking var and std, 1D with varying sizes
base = 8
multiplier = 1

op = torch.var
reps = 1000

for _ in range(7):
    size = base * multiplier
    t = torch.randn(size)
    elapsed = 0
    for _ in range(reps):
        start = time.time()
        op(t)
        end = time.time()
        elapsed += end - start
    multiplier *= 10

    print("Size: ", size)
    print("Avg. elapsed time: ", elapsed / reps)
```

```
var cpu TH vs ATen timings

Size:  8
Avg. elapsed time:  1.7853736877441406e-05 vs 4.9788951873779295e-06 (ATen wins)
Size:  80
Avg. elapsed time:  1.7803430557250977e-05 vs 6.156444549560547e-06 (ATen wins)
Size:  800
Avg. elapsed time:  1.8569469451904296e-05 vs 1.2302875518798827e-05 (ATen wins)
Size:  8000
Avg. elapsed time:  2.8756141662597655e-05 vs. 6.97789192199707e-05 (TH wins)
Size:  80000
Avg. elapsed time:  0.00026622867584228516 vs. 0.0002447957992553711 (ATen wins)
Size:  800000
Avg. elapsed time:  0.0010556647777557374 vs 0.00030616092681884767 (ATen wins)
Size:  8000000
Avg. elapsed time:  0.009990205764770508 vs 0.002938544034957886 (ATen wins)

std cpu TH vs ATen timings

Size:  8
Avg. elapsed time:  1.6681909561157225e-05 vs. 4.659652709960938e-06 (ATen wins)
Size:  80
Avg. elapsed time:  1.699185371398926e-05 vs. 5.431413650512695e-06 (ATen wins)
Size:  800
Avg. elapsed time:  1.768803596496582e-05 vs. 1.1279821395874023e-05 (ATen wins)
Size:  8000
Avg. elapsed time:  2.7791500091552735e-05  vs 7.031106948852539e-05 (TH wins)
Size:  80000
Avg. elapsed time:  0.00018650460243225096 vs 0.00024368906021118164 (TH wins)
Size:  800000
Avg. elapsed time:  0.0010522041320800782 vs 0.0003039860725402832 (ATen wins)
Size:  8000000
Avg. elapsed time:  0.009976618766784668 vs. 0.0029211788177490234 (ATen wins)
```

These results show the TH solution still performs better than the ATen solution with default threading for some sizes.

It seems like removing CPU var_all and std_all will require an improvement in ATen reductions. https://github.com/pytorch/pytorch/issues/40570 has been updated with this information.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43858

Reviewed By: zou3519

Differential Revision: D23498981

Pulled By: mruberry

fbshipit-source-id: 34bee046c4872d11c3f2ffa1b5beee8968b22050
2020-09-06 09:40:54 -07:00
Mike Ruberry
83a6e7d342 Adds inequality testing aliases for better NumPy compatibility (#43870)
Summary:
This PR adds the following aliases:

- not_equal for torch.ne
- greater for torch.gt
- greater_equal for torch.ge
- less for torch.lt
- less_equal for torch.le

These aliases are consistent with NumPy's naming for these functions.
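
A quick sketch showing the new aliases agree with the existing operators:
```
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([3, 2, 1])

assert torch.equal(torch.not_equal(a, b), torch.ne(a, b))
assert torch.equal(torch.greater(a, b), torch.gt(a, b))
assert torch.equal(torch.greater_equal(a, b), torch.ge(a, b))
assert torch.equal(torch.less(a, b), torch.lt(a, b))
assert torch.equal(torch.less_equal(a, b), torch.le(a, b))
```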

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43870

Reviewed By: zou3519

Differential Revision: D23498975

Pulled By: mruberry

fbshipit-source-id: 78560df98c9f7747e804a420c1e53fd1dd225002
2020-09-06 09:36:23 -07:00
Nikita Shulga
e358d516c8 Revert D23549149: [PyTorch][Mobile] Insert the module name as name() to metadata dict if metadata doesn't contain "model_name"
Test Plan: revert-hammer

Differential Revision:
D23549149 (398409f072)

Original commit changeset: fad742a8d4e6

fbshipit-source-id: bd92a2033a804d3e6a2747b4fda4ca527991a993
2020-09-06 00:06:35 -07:00
Martin Yuan
70c8daf439 Apply selective build on RNN operators (#44132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44132

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43985

Added
```
def(detail::SelectiveStr<true>, ...)
impl(detail::SelectiveStr<true>, ...)
```
in torch/library, which can also be used for other templated selective registration.

Size saves for this diff:
fbios-pika: 78 KB
igios: 87 KB

Test Plan: Imported from OSS

Reviewed By: ljk53, smessmer

Differential Revision: D23459774

Pulled By: iseeyuan

fbshipit-source-id: 86d34cfe8e3f852602f203db06f23fa99af2c018
2020-09-05 23:47:51 -07:00
Muthu Arivoli
719d29dab5 Implement torch.i0 and torch.kaiser_window (#43132)
Summary:
Related to https://github.com/pytorch/pytorch/issues/38349
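
A brief smoke-test sketch of the two new functions; the argument values are arbitrary:
```
import torch

w = torch.kaiser_window(10, periodic=True, beta=12.0)  # 10-point Kaiser window
y = torch.i0(torch.linspace(0, 5, steps=6))            # modified Bessel function of order 0
```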

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43132

Reviewed By: smessmer

Differential Revision: D23479072

Pulled By: mruberry

fbshipit-source-id: 4fb1de44830771c6a7222cf19f7728d9ac7c043b
2020-09-05 23:11:47 -07:00
Yi Wang
396469f18c Explicitly forbidden the other inherited methods of RemoteModule. (#43895)
Summary:
Throw exceptions when inherited methods other than forwardXXX are used.

Original PR issue: RemoteModule enhancements #40550

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43895

Test Plan: buck test test/distributed/rpc:process_group_agent -- RemoteModule

Reviewed By: rohan-varma

Differential Revision: D23392842

Pulled By: SciPioneer

fbshipit-source-id: 7c09a55a03f9f0b7e9f9264a42bfb907607f4651
2020-09-05 14:48:56 -07:00
Supriya Rao
199c73be0f [quant][pyper] Support quantization of ops in fork-wait subgraph (#44048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44048

Inline the fork-wait calls to make sure we can see the ops to be quantized in the main graph.

Also fix the InlineForkWait JIT pass to account for the case where the aten::wait call isn't present in the main graph
and we return a future tensor from the subgraph.

Example

```
graph(%self.1 : __torch__.dper3.core.interop.___torch_mangle_6325.DperModuleWrapper,
       %argument_1.1 : Tensor,
       %argument_2.1 : Tensor):
   %3 : Future[Tensor[]] = prim::fork_0(%self.1, %argument_1.1, %argument_2.1) # :0:0
   return (%3)
 with prim::fork_0 = graph(%self.1 : __torch__.dper3.core.interop.___torch_mangle_5396.DperModuleWrapper,
       %argument_1.1 : Tensor,
       %argument_2.1 : Tensor):
   %3 : __torch__.dper3.core.interop.___torch_mangle_6330.DperModuleWrapper = prim::GetAttr[name="x"](%self.1)
   %4 : __torch__.dper3.core.interop.___torch_mangle_5397.DperModuleWrapper = prim::GetAttr[name="y"](%self.1)
   %5 : __torch__.dper3.core.interop.___torch_mangle_6327.DperModuleWrapper = prim::GetAttr[name="z"](%4)
   %6 : Tensor = prim::CallMethod[name="forward"](%5, %argument_1.1, %argument_2.1) # :0:0
   %7 : None = prim::CallMethod[name="forward"](%3, %6) # :0:0
   %8 : Tensor[] = prim::ListConstruct(%6)
   return (%8)
```

Test Plan:
python test/test_quantization.py test_interface_with_fork

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23481003

fbshipit-source-id: 2e756be73c248319da38e053f021888b40593032
2020-09-05 12:06:19 -07:00
Supriya Rao
164b96c34c [quant][pyper] make embedding_bag quantization static (#44008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44008

embedding_bag requires only quantization of weights (no dynamic quantization of inputs),
so the type of quantization is essentially static (without calibration).
This will enable pyper to do fc and embedding_bag quantization using the same API call.

Test Plan:
python test/test_quantization.py test_embedding_bag

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23467019

fbshipit-source-id: 41a61a17ee34bcb737ba5b4e19fb7a576d4aeaf9
2020-09-05 12:06:16 -07:00
Supriya Rao
a0ae416d60 [quant] Support aten::embedding_bag quantization in graph mode (#43989)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43989

When we trace the model it produces an aten::embedding_bag node in the graph.
Add the necessary passes in graph mode to support quantizing it as well.

Test Plan:
python test/test_quantization.py TestQuantizeDynamicJitOps.test_embedding_bag

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23460485

fbshipit-source-id: 328c5e1816cfebb10ba951113f657665b6d17575
2020-09-05 12:05:06 -07:00
Yi Wang
15a7368115 Add const to getTensors method of GradBucket. (#44126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44126

Add const to getTensors method of GradBucket.

Test Plan: buck test caffe2/torch/lib/c10d:ProcessGroupGlooTest

Reviewed By: sinannasir, jiayisuse

Differential Revision: D23504088

fbshipit-source-id: 427d9591042e0c03cde02629c1146ff1e5e027f9
2020-09-05 09:19:42 -07:00
Elias Ellison
5bd2902796 [JIT] Remove references to no longer generated _tanh_backward and _sigmoid_backward (#44138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44138

If you look at the sigmoid and tanh backward formulas, they are composed of other ops: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/runtime/symbolic_script.cpp#L786
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/runtime/symbolic_script.cpp#L164

So tanh_backward and sigmoid_backward are no longer generated and are legacy ops.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23543603

Pulled By: eellison

fbshipit-source-id: ce8353e53043cf969b536aac47c9576d66d4ce02
2020-09-05 01:41:36 -07:00
Elias Ellison
df67f0beab [TensorExpr fuser] Guard nodes that have tensor output properties determined by non-tensor inputs (#44137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44137

We only insert guards on Tensor types, so we rely on the output
of a node being uniquely determined by its input types.
Bail if any non-Tensor input affects the output type
and cannot be reasoned about statically.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23543602

Pulled By: eellison

fbshipit-source-id: abd6fe0b1fd7fe6fc251694d4cd442b19c032dd7
2020-09-05 01:40:18 -07:00
Wanchao Liang
d07a36e0c1 Revert D23490149: [pytorch][PR] Compile less legacy code when BUILD_CAFFE2 is set to False
Test Plan: revert-hammer

Differential Revision:
D23490149 (15e99b6ff6)

Original commit changeset: a76382c30d83

fbshipit-source-id: 75057fa9af2c19eb976962552118bf0a99911b38
2020-09-04 22:59:39 -07:00
Vasiliy Kuznetsov
618b4dd763 fx quant prepare: clarify naming (#44125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44125

In `Quantizer._prepare`, `observed` was used for two different variables
with different types.  Making the names a bit cleaner and removing the
name conflict.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: dskhudia

Differential Revision: D23504109

fbshipit-source-id: 0f73eac3d6dd5f72ad5574a4d47d33808a70174a
2020-09-04 21:29:56 -07:00
Vasiliy Kuznetsov
a940f5ea5d torchscript graph mode quant: remove benchmark filter (#44165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44165

Allows convolutions to be quantized if the `torch.backends.cudnn.benchmark`
flag was set.

Not for land yet, just testing.

Test Plan:
in the gist below, the resulting graph now has quantized convolutions
https://gist.github.com/vkuzo/622213cb12faa0996b6700b08d6ab2f0

Imported from OSS

Reviewed By: supriyar

Differential Revision: D23518775

fbshipit-source-id: 294f678c6afbd3feeb89b7a6655bc66ac9f8bfbc
2020-09-04 21:25:35 -07:00
Yuchen Huang
398409f072 [PyTorch][Mobile] Insert the module name as name() to metadata dict if metadata doesn't contain "model_name" (#44227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44227

As title
ghstack-source-id: 111490242

Test Plan: CI

Reviewed By: xcheng16

Differential Revision: D23549149

fbshipit-source-id: fad742a8d4e6f844f83495514cd60ff2bf0d5bcb
2020-09-04 21:18:12 -07:00
Nikita Shulga
15e99b6ff6 Compile less legacy code when BUILD_CAFFE2 is set to False (#44079)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44079

Reviewed By: walterddr

Differential Revision: D23490149

Pulled By: malfet

fbshipit-source-id: a76382c30d83127d180ec63ac15093a7297aae53
2020-09-04 20:04:21 -07:00
shubhambhokare1
f3bf6a41ca [ONNX] Update repeat op (#43430)
Summary:
Update the repeat op so that the inputs to the sizes argument can be a mixture of dynamic and constant inputs.
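
A hypothetical module exercising the mixed dynamic/constant sizes case this PR targets; the module and export arguments are illustrative only:
```
import torch

class RepeatModel(torch.nn.Module):
    def forward(self, x, y):
        # first repeat factor is dynamic (depends on y's runtime shape), second is constant
        return x.repeat(y.size(0), 2)

torch.onnx.export(RepeatModel(), (torch.randn(1, 3), torch.randn(4)),
                  "repeat.onnx", opset_version=11)
```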

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43430

Reviewed By: houseroad

Differential Revision: D23494257

Pulled By: bzinodev

fbshipit-source-id: 90c5e90e4f73e98f3a9d5c8772850e72cecdf0d4
2020-09-04 18:53:31 -07:00
Yi Wang
8b17fd2516 Add remote_parameters() into RemoteModule class. (#43906)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43906

This method returns a list of RRefs of remote parameters that can be fed into the DistributedOptimizer.
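
A hedged sketch of the intended wiring; the import paths and constructor arguments are assumptions based on the surrounding RemoteModule work, not code from this PR:
```
import torch
from torch.distributed.nn.api.remote_module import RemoteModule
from torch.distributed.optim import DistributedOptimizer

# Create a module on a remote worker, then hand its parameter RRefs to the optimizer.
remote_linear = RemoteModule("worker1/cpu", torch.nn.Linear, args=(20, 30))
opt = DistributedOptimizer(
    torch.optim.SGD,
    remote_linear.remote_parameters(),  # the method added in this PR
    lr=0.05,
)
```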

Original PR issue: RemoteModule enhancements #40550

Test Plan: buck test caffe2/test/distributed/rpc:process_group_agent -- RemoteModule

Reviewed By: rohan-varma

Differential Revision: D23399586

fbshipit-source-id: 4b0f1ccf2e47c8a9e4f79cb2c8668f3cdbdff820
2020-09-04 16:22:40 -07:00
neginraoof
3d7c22a2ce [ONNX] Enable new scripting passes for functionalization and remove_mutation (#43791)
Summary:
Duplicate of https://github.com/pytorch/pytorch/issues/41413
This PR initiates the process of updating the TorchScript backend interface used by the ONNX exporter.

Replace jit lower graph pass by freeze module pass

Enable ScriptModule tests for ONNX operator tests (ORT backend) and model tests by default.

Replace the jit remove_inplace_ops pass with remove_mutation and consolidate all passes for handling inplace ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43791

Reviewed By: houseroad

Differential Revision: D23421872

Pulled By: bzinodev

fbshipit-source-id: a98710c45ee905748ec58385e2a232de2486331b
2020-09-04 15:21:45 -07:00
Zachary DeVito
2ad5a82c43 [fx] get rid of graph_module.root (#44092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44092

Instead, submodules and weights are installed directly on the
graph_module by transferring the original modules. This makes it more
likely that scripting will succeed (since we no longer have submodules
that are not used in the trace). It also prevents layered transforms
from having to special-case handling of the `root` module. GraphModules
can now be re-traced as part of the input to other transforms.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23504210

Pulled By: zdevito

fbshipit-source-id: f79e5c4cbfc52eb0ffb5d6ed89b37ce35a7dc467
2020-09-04 11:35:32 -07:00
Mikhail Zolotukhin
6474057c76 Revert D23503636: [pytorch][PR] [NNC] make inlining immediate (take 2) and fix bugs
Test Plan: revert-hammer

Differential Revision:
D23503636 (70aecd2a7f)

Original commit changeset: cdbdc902b7a1

fbshipit-source-id: b5164835f874a56213de4bed9ad690164eae9230
2020-09-04 10:58:23 -07:00
neginraoof
539d029d8c [ONNX] Fix split export using slice (#43670)
Summary:
Fix for exporting split with fixed output shape using slice.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43670

Reviewed By: houseroad

Differential Revision: D23420318

Pulled By: bzinodev

fbshipit-source-id: 09c2b58049fe32dca2f2977d91dd64de6ee9a72f
2020-09-04 10:52:44 -07:00
James Reed
af13faf18b [FX] __str__ for GraphModule and Graph (#44166)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44166

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23520801

Pulled By: jamesr66a

fbshipit-source-id: f77e3466e435127ec01e66291964395f32a18992
2020-09-04 10:46:43 -07:00
Vinod Kumar S
2a1fc56694 replace the white list from default mappings (#41802)
Summary:
Replaced "whitelist" from default_mappings.py
Fixes https://github.com/pytorch/pytorch/issues/41756

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41802

Reviewed By: ngimel

Differential Revision: D23521452

Pulled By: malfet

fbshipit-source-id: 019a2d5c06dc59dc53d6c48b70fb35b216299cf4
2020-09-04 10:04:28 -07:00
Richard Zou
9a5a732866 Register some backwards functions as operators (#44052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44052

Summary
=======

This PR registers the following backwards functions as operators:
- slice_backward
- select_backward
- gather_backward
- index_select_backward (the backward function for index_select)
- select_index_backward (previously known as index_select_backward, but is actually the backward function for max.dim, min.dim, etc)

In the future, I'd like to register more backward functions as operators
so that we can write batching rules for the backward functions. Batching
rules for backward functions makes it so that we can compute batched
gradients.

Motivation
==========
The rationale behind this PR is that a lot of backwards functions (27 in total)
are incompatible with BatchedTensor due to using in-place operations.
Sometimes we can allow the in-place operations, but other times we can't.
For example, consider select_backward:

```
Tensor select_backward(const Tensor& grad, IntArrayRef input_sizes, int64_t dim, int64_t index) {
  auto grad_input = at::zeros(input_sizes, grad.options());
  grad_input.select(dim, index).copy_(grad);
  return grad_input;
}
```

and consider the following code:
```
import torch

B0 = 3  # example batch size for the batched gradient
x = torch.randn(5, requires_grad=True)

def select_grad(v):
    # gradient of x[0] w.r.t. x, weighted by v
    return torch.autograd.grad(x[0], x, v)[0]

vs = torch.randn(B0)
batched_grads = torch.vmap(select_grad)(vs)  # torch.vmap was a prototype API at this point
```

For the batched gradient use case, `grad` is a BatchedTensor.
The physical version of `grad` has size `(B0,)`.
However, select_backward creates a `grad_input` of shape `(5)`, and
tries to copy `grad` to a slice of it.

Other approaches
================

I've considered the following:
- register select_backward as an operator (this PR)
- have a branch inside select_backward for if `grad` is batched.
    - this is OK, but what if we have more tensor extensions that want to override this?
- modify select_backward to work with BatchedTensor, by creating a new operator for the "select + copy_ behavior".
    - select + copy_ isn't used elsewhere in derivative formulas so this doesn't seem useful

Test Plan
=========

- `pytest test/test_autograd.py -v`
- Registering backward functions may impact performance. I benchmarked
select_backward to see if registering it as an operator led to any noticable
performance overheads: https://gist.github.com/zou3519/56d6cb53775649047b0e66de6f0007dc.
The TL;DR is that the overhead is pretty minimal.

Test Plan: Imported from OSS

Reviewed By: ezyang, fbhuba

Differential Revision: D23481183

Pulled By: zou3519

fbshipit-source-id: 125af62eb95824626dc83d06bbc513262ee27350
2020-09-04 08:30:39 -07:00
Nikita Shulga
0c01f136f3 [BE] Use f-string in various Python functions (#44161)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44161

Reviewed By: seemethere

Differential Revision: D23515874

Pulled By: malfet

fbshipit-source-id: 868cf65aedd58fce943c08f8e079e84e0a36df1f
2020-09-04 07:38:25 -07:00
generatedunixname89002005287564
ef28ee50b0 [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D23536086

fbshipit-source-id: 56e9c70a6998086515f59d74c5d8a2280ac2f669
2020-09-04 03:33:32 -07:00
Bert Maher
98ad5ff41f [te] Disable reductions by default (#44122)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44122

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D23504769

Pulled By: bertmaher

fbshipit-source-id: 1889217cd22da529e46ab30c9319a5646267e4ec
2020-09-03 23:37:45 -07:00
Martin Yuan
d221256888 [Message] Add what to do for missing operators.
Summary: As title.

Test Plan: N/A

Reviewed By: gaurav-work

Differential Revision: D23502416

fbshipit-source-id: a341eb10030e3f319266019ba4c02d9d9a0a6298
2020-09-03 22:41:27 -07:00
Nikita Shulga
b60ffcdfdd Enable typechecks for torch.nn.quantized.modules.linear (#44154)
Summary:
Also import `Optional` directly from `typing` rather than from `_jit_internal`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44154

Reviewed By: seemethere

Differential Revision: D23511833

Pulled By: malfet

fbshipit-source-id: f78c5fd679c002b218e4d287a9e56fa198171981
2020-09-03 19:52:49 -07:00
Zafar
69e38828f5 [quant] conv_transpose2d_prepack/conv_transpose2d_unpack (#40351)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40351

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22158983

Pulled By: z-a-f

fbshipit-source-id: 3ca064c2d826609724b2740fcc9b9eb40556168d
2020-09-03 17:21:32 -07:00
Nick Gibson
70aecd2a7f [NNC] make inlining immediate (take 2) and fix bugs (#43885)
Summary:
A rework of `computeInline` which makes it work a bit better, particularly when combined with other transformations. Previously we stored Functions that were inlined and then deferred the actual inlining of the function body until prepareForCodegen was called. This has an issue when transformations are applied to the LoopNest: the function body can be different from what appears in the root_stmt and result in inlining that a) fails, b) reverses other transformations or c) a weird unpredictable combination of the two.

This PR changes that behaviour so that the inlining occurs in the root stmt immediately, which means it reflects any previous transformations and any future transformations have a true view of the internal IR. It also has the benefit that inspecting the root statement gives an accurate view of it without needing to call prepareForCodegen. I also removed the difference between `computeInline` and `computeInlineWithRand` and we handle calls to `rand()` in all branches.

This is a rework of https://github.com/pytorch/pytorch/issues/38696, with the agreed changes from ZolotukhinM and zheng-xq: we should only inline if the dimensions are trivial (ie. they are vars not exprs).

This PR is mostly tests, and I fixed a bunch of bugs I found along the way. Partial list:
* When inlining an expression involving rand, we would create random vars equal to the dimensionality of the enclosing Tensor not the produced Tensor - meaning we'd use an incorrect value if the inlined tensor was smaller. E.g: `X[i] = rand(); A[i, j] = X[i]` would produce a tensor where `A[0, 0] != A[0, 1]`. This is fixed by inserting the Let binding of the random variable at the correct loop body.
* When inlining we'd replace all calls to `rand()` rather than just those present in the Tensor being inlined.
* `rand()` was treated symbolically by the simplifier and we would aggregate or cancel calls to `rand()`. Have fixed the hasher to hash all calls to `rand()` distinctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43885

Reviewed By: gmagogsfm

Differential Revision: D23503636

Pulled By: nickgg

fbshipit-source-id: cdbdc902b7a14d269911d978a74a1c11eab004fa
2020-09-03 16:49:24 -07:00
Mikhail Zolotukhin
3105d8a9b2 [TensorExpr] Fuser: rely on input types when checking whether a device is supported. (#44139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44139

Also, make sure that we're checking that condition when we're starting a
new fusion group, not only when we merge a node into an existing fusion
group. Oh, and one more: add a test checking that we're rejecting graphs
with unspecified shapes.

Differential Revision: D23507510

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: 9c268825ac785671d7c90faf2aff2a3e5985ac5b
2020-09-03 16:27:14 -07:00
Vasiliy Kuznetsov
71510c60ad fx qat: respect device affinity (#44115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44115

Fixes device affinity in the FX prepare pass for QAT. Before this PR, observers
were always created on CPU. After this PR, observers are created on the
same device as the rest of the model. This will enable QAT prepare to
work regardless of whether users move the model to cuda before or after
calling this pass.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_qat_prepare_device_affinity
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D23502291

fbshipit-source-id: ec4ed20c21748a56a25e3395b35ab8640d71b5a8
2020-09-03 16:16:59 -07:00
Meghan Lele
7816d53798 [JIT] Add mypy type annotations for JIT (#43862)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43862

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23491151

Pulled By: SplitInfinity

fbshipit-source-id: 88367b89896cf409bb9ac3db7490d6779efdc3a4
2020-09-03 15:09:24 -07:00
Michael Suo
9dd8670d7d [jit] Better match behavior of loaded ScriptModules vs. freshly created ones (#43298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43298

IR emitter uses `ModuleValue` to represent ScriptModules and emit IR for
attribute access, submodule access, etc.

`ModuleValue` relies on two pieces of information, the JIT type of the
module, and the `ConcreteModuleType`, which encapsulates Python-only
information about the module.

ScriptModules loaded from a package used to create a dummy
ConcreteModuleType without any info in it. This led to divergences in
behavior during compilation.

This PR makes the two ways of constructing a ConcreteModuleType equivalent,
modulo any py-only information (which, by definition, is never present in
packaged files anyway).

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23228738

Pulled By: suo

fbshipit-source-id: f6a660f42272640ca1a1bb8c4ee7edfa2d1b07cc
2020-09-03 15:03:39 -07:00
Michael Suo
74f18476a2 [jit] fix segfault in attribute lookup on loaded ScriptModules (#43284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43284

The IR emitter looks for attributes on modules like:
1. Check the JIT type for the attribute
2. Check the originating Python class, in order to fulfill requests for, e.g. static methods or ignored methods.

In the case where you do:
```
inner_module = torch.jit.load("inner.pt")
wrapped = Wrapper(inner_module)  # wrap the loaded ScriptModule in an nn.Module
torch.jit.script(wrapped)
```

The IR emitter may check for attributes on `inner_module`. There is no
originating Python class for `inner_module`, since it was directly
compiled from the serialized format.

Due to a bug in the code, we don't guard for this case and a segfault
results if the wrapper asks for an undefined attribute. The lookup in
this case looks like:
1. Check the JIT type for the attribute (not there!)
2. Check the originating Python class (this is a nullptr! segfault!)

This PR guards this case and properly just raises an attribute missing
compiler error instead of segfaulting.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23224337

Pulled By: suo

fbshipit-source-id: 0cf3060c427f2253286f76f646765ec37b9c4c49
2020-09-03 15:01:59 -07:00
Elias Ellison
6868bf95c6 [JIT] Fuser match on schemas not node kind (#44083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44083

Match on the complete schema of a node instead of its node kind when deciding to fuse it. Previously we matched on node kind, which could fail with something like `aten::add(int, int)` and if a new overload was added to an op without corresponding NNC support we would fuse it.

Follow ups are:
 - bail when an output tensor type isn't uniquely determined by the input types (e.g. aten::add and the second input could be either a float or an int)
- remove NNC lowering for _tanh_backward & _sigmoid_backward
- Validate that we support all of the overloads here. I optimistically added ops that included Tensors, it's possible that we do not support every overload here. This isn't a regression, and this PR is at least improving our failures in that regard.

I can do any of these as part of this PR if desired, but there are a number of failures people have run into that this PR fixes so I think it would be good to land this sooner than later.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23503704

Pulled By: eellison

fbshipit-source-id: 3ce971fb1bc3a7f1cbaa38f1ed853e2db3d67c18
2020-09-03 14:47:19 -07:00
Ann Shan
9b3c72d46e [pytorch] Make mobile find_method return an optional (#43965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43965

As part of a larger effort to unify the API between the lite interpreter and full JIT:
- implement torch::jit::mobile::Method, a proxy for torch::jit::mobile::Function
- add support for overloaded operator() to mobile Method and Function
- mobile find_method now returns a c10::optional<Method> (so signature matches full jit)
- moves some implementation of Function from module.cpp to function.cpp
ghstack-source-id: 111161942

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D23330762

fbshipit-source-id: bf0ba0d711d9566c92af31772057ecd35983ee6d
2020-09-03 14:46:18 -07:00
Nikolay Korovaiko
f91bdbeabd Enable function calls in TEFuser and SpecializeAutogradZero (#43866)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43866

Reviewed By: ezyang

Differential Revision: D23452798

Pulled By: Krovatkin

fbshipit-source-id: 2cff4c905bf1b5d9de56e7869458ffa6fce1f1b5
2020-09-03 14:42:52 -07:00
Zafar
e05fa2f553 [quant] Prep for conv_transpose packing (#39714)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39714

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22087071

Pulled By: z-a-f

fbshipit-source-id: 507f8a414026eb4c9926f68c1e94d2f56119bca6
2020-09-03 14:10:32 -07:00
Yanan Cao
f3da9e3b50 Enable Enum pickling/unpickling. (#43188)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **https://github.com/pytorch/pytorch/issues/43188 Enable Enum pickling/unpickling.**
* https://github.com/pytorch/pytorch/issues/42963 Add Enum TorchScript serialization and deserialization support
* https://github.com/pytorch/pytorch/issues/42874 Fix enum constant printing and add FileCheck to all Enum tests
* https://github.com/pytorch/pytorch/issues/43121 Add Enum convert back to Python object support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43188

Reviewed By: zdevito

Differential Revision: D23365141

Pulled By: gmagogsfm

fbshipit-source-id: f0c93d4ac614dec047ad8640eb6bd9c74159b558
2020-09-03 13:51:02 -07:00
Rohan Varma
3806c939bd Polish DDP join API docstrings (#43973)
Summary:
Polishes DDP join api docstrings and makes a few minor cosmetic changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43973

Reviewed By: zou3519

Differential Revision: D23467238

Pulled By: rohan-varma

fbshipit-source-id: faf0ee56585fca5cc16f6891ea88032336b3be56
2020-09-03 13:39:45 -07:00
Nikita Shulga
442684cb25 Enable typechecks for torch.nn.modules.[activation|upsampling] (#44093)
Summary:
Add missing `hardsigmoid`, `silu`, `hardswish` and `multi_head_attention_forward` to functional.pyi.in
 Embed some typing annotations into functional.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44093

Reviewed By: ezyang

Differential Revision: D23494384

Pulled By: malfet

fbshipit-source-id: 27023c16ff5951ceaebb78799c4629efa25f7c5c
2020-09-03 13:20:04 -07:00
Kimish Patel
a153f69417 Fix replaceAtenConvolution for BC. (#44036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44036

Running replaceAtenConvolution on older traced models won't work, as the
_convolution signature has changed and replaceAtenConvolution was
changed to account for that.
But we did not preserve the old behavior during that change. This change
restores the old behavior while keeping the new one.

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23476775

fbshipit-source-id: 73a0c2b7387f2a8d82a8d26070d0059972126836
2020-09-03 12:57:57 -07:00
Kimish Patel
ba65cce2a2 Fix transposed conv2d rewrite pattern to account for convolution api (#44035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44035

This accounts for the convolution API signature change.

Also added a test to capture such cases in the future.

Test Plan:
python test/test_xnnpack_integration.py

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D23476773

fbshipit-source-id: a62c4429351c909245106a70b4c60b1bacffa817
2020-09-03 12:55:43 -07:00
Bert Maher
55ff9aa185 Test TE fuser unary ops and fix sigmoid(half) (#44094)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44094

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23494950

Pulled By: bertmaher

fbshipit-source-id: 676c4e57267c4ad92065ea90b06323918dd5b0de
2020-09-03 12:48:46 -07:00
Gregory Chanan
49215d7f26 For CriterionTests, have check_gradgrad actually only affect gradgrad checks. (#44060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44060

Right now it skips grad checks as well.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23484018

Pulled By: gchanan

fbshipit-source-id: 24a8f1af41f9918aaa62bc3cd78b139b2f8de1e1
2020-09-03 12:29:32 -07:00
Meghan Lele
de672e874d [JIT] Improve error message for unsupported Optional types (#44054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44054

**Summary**
This commit improves the error message that is printed when an
`Optional` type annotation with an unsupported contained type is
encountered. At present, the `Optional` is printed as-is, and
`Optional[T]` is syntatic sugar for `Union[T, None]`, so that is what
shows up in the error message and can be confusing. This commit modifies
the error message so that it prints `T` instead of `Union[T, None]`.

**Test Plan**
Continuous integration.

Example of old message:
```
AssertionError: Unsupported annotation typing.Union[typing.List, NoneType] could not be resolved.
```
Example of new message:
```
AssertionError: Unsupported annotation typing.Union[typing.List, NoneType] could not be resolved because typing.List could not be resolved.
```

**Fixes**
This commit fixes #42859.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23490365

Pulled By: SplitInfinity

fbshipit-source-id: 2aa9233718e78cf1ba3501ae11f5c6f0089e29cd
2020-09-03 11:55:06 -07:00
Xingying Cheng
c59e11bfbb Add soft error reporting to capture all the inference runtime failure. (#44078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44078

When PyTorch mobile inference fails and throws an exception, if the caller catches it and does not crash the app, we are not able to track all the inference failures.

So we are adding native soft error reporting to capture all the failures occurring during module loading and running, including both crashing and non-crashing failures. Since c10::Error has good error messaging stack handling (D21202891 (a058e938f9)), we are utilizing it for the error handling and message printout.
ghstack-source-id: 111307080

Test Plan:
Verified that the soft error reporting is sent through module.cpp when operator is missing, make sure a logview mid is generated with stack trace: https://www.internalfb.com/intern/logview/details/facebook_android_softerrors/5dd347d1398c1a9a73c804b20f7c2179/?selected-logview-tab=latest.

Error message with context is logged below:

```
soft_error.cpp		[PyTorchMobileInference] : Error occured during model running entry point: Could not run 'aten::embedding' with arguments from the 'CPU' backend. 'aten::embedding' is only available for these backends: [BackendSelect, Named, Autograd, Autocast, Batched, VmapMode].

BackendSelect: fallthrough registered at xplat/caffe2/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at xplat/caffe2/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Autograd: fallthrough registered at xplat/caffe2/aten/src/ATen/core/VariableFallbackKernel.cpp:31 [backend fallback]
Autocast: fallthrough registered at xplat/caffe2/aten/src/ATen/autocast_mode.cpp:253 [backend fallback]
Batched: registered at xplat/caffe2/aten/src/ATen/BatchingRegistrations.cpp:317 [backend fallback]
VmapMode: fallthrough registered at xplat/caffe2/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

Exception raised from reportError at xplat/caffe2/aten/src/ATen/core/dispatch/OperatorEntry.cpp:261 (m
```

Reviewed By: iseeyuan

Differential Revision: D23428636

fbshipit-source-id: 82d5d9c054300dff18d144f264389402d0b55a8a
2020-09-03 10:54:43 -07:00
Gregory Chanan
5973b44d9e Rename NewCriterionTest to CriterionTest. (#44056)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44056

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23482573

Pulled By: gchanan

fbshipit-source-id: dde0f1624330dc85f48e5a0b9d98fb55fdb72f68
2020-09-03 10:29:20 -07:00
Sinan Nasir
98320061ad DDP Communication hook: (Patch) Fix the way we pass future result to buckets. (#43734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43734

Following the additional GH comments on the original PR https://github.com/pytorch/pytorch/pull/43307.
ghstack-source-id: 111327130

Test Plan: Run `python test/distributed/test_c10d.py`

Reviewed By: smessmer

Differential Revision: D23380288

fbshipit-source-id: 4b8889341c57b3701f0efa4edbe1d7bbc2a82ced
2020-09-03 08:59:10 -07:00
Gregory Chanan
cae52b4036 Merge CriterionTest into NewCriterionTest. (#44055)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44055

There is no functional change here.  Another patch will rename NewCriterionTest to CriterionTest.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23482572

Pulled By: gchanan

fbshipit-source-id: de364579067e2cc9de7df6767491f8fa3a685de2
2020-09-03 08:14:34 -07:00
Andrew Jones
24ca6aab02 Improves type-checking guards. (#43339)
Summary:
PR https://github.com/pytorch/pytorch/issues/38157 fixed type checking for mypy by including `if False` guards on some type-checker-only imports. However other typecheckers - [like pyright](https://github.com/microsoft/pylance-release/issues/262#issuecomment-677758245) - will respect this logic and ignore the imports. Using [`if TYPE_CHECKING`](https://docs.python.org/3/library/typing.html#typing.TYPE_CHECKING) instead means both mypy and pyright will work correctly.

[For background, an example of where the current code fails](https://github.com/microsoft/pylance-release/issues/262) is if you make a file `tmp.py` with the contents
```python
import torch
torch.ones((1,))
```
Then [`pyright tmp.py --lib`](https://github.com/microsoft/pyright#command-line) will fail with a `"ones" is not a known member of module` error. This is because it can't find the `_VariableFunctions.pyi` stub file, as pyright respects the `if False` logic. After adding the `TYPE_CHECKING` guard, all works correctly.
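
A minimal sketch of the pattern being switched to; the guarded import below is a stand-in, not the actual import this PR touches:
```
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Visible to mypy and pyright, skipped at runtime.
    from some_stub_only_module import SomeType  # hypothetical module
```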

Credit to erictraut for suggesting the fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43339

Reviewed By: agolynski

Differential Revision: D23348142

Pulled By: ezyang

fbshipit-source-id: c8a58122a7b0016845c311da39a1cc48748ba03f
2020-09-03 07:45:53 -07:00
Gregory Chanan
68a1fbe308 Allow criterion backwards test on modules requiring extra args (i.e. CTCLoss). (#44050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44050

We don't actually turn on the CTCLoss tests since they fail, but this allows you to toggle check_forward_only and lets the code actually run.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23481091

Pulled By: gchanan

fbshipit-source-id: f2a3b0a2dee27341933c5d25f1e37a878b04b9f6
2020-09-03 07:41:21 -07:00
Gregory Chanan
5f89aa36cf Actually run backward criterion tests. (#44030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44030

This looks to have been a mistake from https://github.com/pytorch/pytorch/pull/9287.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23476274

Pulled By: gchanan

fbshipit-source-id: 81ed9d0c9a40d49153fc97cd69fdcd469bec0c73
2020-09-03 07:39:13 -07:00
Mike Ruberry
665feda15b Adds opinfo-based autograd tests and (un)supported dtype tests (#43451)
Summary:
This PR adds a new test suite, test_ops.py, designed for generic tests across all operators with OpInfos. It currently has two kinds of tests:

- it validates that the OpInfo has the correct supported dtypes by verifying that unsupported dtypes throw an error and supported dtypes do not
- it runs grad and gradgrad checks on each op and its variants (method and inplace) that has an OpInfo

This is a significant expansion and simplification of the current autogenerated autograd tests, which spend considerable time processing their inputs. As an alternative, this PR extends OpInfos with "SampleInputs" that are much easier to use. These sample inputs are analogous to the existing tuples in `method_tests()`.
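
A loose, hypothetical sketch of the idea; the field and helper names here are illustrative, not the actual OpInfo API:
```
from collections import namedtuple
import torch

SampleInput = namedtuple("SampleInput", ["input", "args"])

def sample_inputs_addmm(device, dtype):
    # ready-to-use inputs, instead of positional tuples decoded by the test harness
    make = lambda: torch.randn(3, 3, device=device, dtype=dtype)
    inp = make().requires_grad_()
    return [SampleInput(inp, args=(make(), make()))]

samples = sample_inputs_addmm("cpu", torch.float64)
out = torch.addmm(samples[0].input, *samples[0].args)
```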

Future PRs will extend OpInfo-based testing to other uses of `method_tests()`, like test_jit.py, to ensure that new operator tests can be implemented entirely using an OpInfo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43451

Reviewed By: albanD

Differential Revision: D23481723

Pulled By: mruberry

fbshipit-source-id: 0c2cdeacc1fdaaf8c69bcd060d623fa3db3d6459
2020-09-03 02:50:48 -07:00
Milind Yishu Ujjawal
ab7606702c Rectified a few grammatical errors in documentation (#43695)
Summary:
Rectified a few grammatical errors in the PyTorch documentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43695

Reviewed By: anjali411

Differential Revision: D23451600

Pulled By: ezyang

fbshipit-source-id: bc7b34c240fde1b31cac811080befa2ff2989395
2020-09-02 23:59:45 -07:00
Mikhail Zolotukhin
40fec4e739 [TensorExpr] Fuser: do not fuse ops with 0-dim tensors. (#44073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44073

We don't yet have proper support for this in NNC or in the JIT IR->NNC lowering.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23487905

Pulled By: ZolotukhinM

fbshipit-source-id: da0da7478fc8ce7b455176c95d8fd610c94352c1
2020-09-02 22:59:04 -07:00
Mikhail Zolotukhin
3da82aee03 [JIT] Remove profile nodes before BatchMM. (#43961)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43961

Currently we remove prim::profile nodes and embed the type info directly
in the IR right before the fuser, because it is difficult to fuse in the
presence of prim::profile nodes. It turns out that BatchMM has a similar
problem: it doesn't work when there are prim::profile nodes in the graph.
These two passes run next to each other, so we can simply remove the
prim::profile nodes slightly earlier: before the BatchMM pass.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23453266

Pulled By: ZolotukhinM

fbshipit-source-id: 92cb50863962109b3c0e0112e56c1f2cb7467ff1
2020-09-02 22:57:39 -07:00
Gao, Xiang
37658b144b Remove useless py2 compatibility import __future__, part 1 (#43808)
Summary:
To avoid conflicts, this PR does not remove all imports. More are coming in further PRs.
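
For context, the py2 compatibility boilerplate being removed is typically a line such as the one below (the exact import list varies per file):
```python
# Before: Python 2 compatibility imports, redundant on Python 3.
from __future__ import absolute_import, division, print_function, unicode_literals

# After: the import line is simply deleted; Python 3 has these semantics by default.
print(3 / 2)  # prints 1.5 either way on Python 3
```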

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43808

Reviewed By: wanchaol

Differential Revision: D23436675

Pulled By: ailzhang

fbshipit-source-id: ccc21a1955c244f0804277e9e47e54bfd23455cd
2020-09-02 19:15:11 -07:00
Mikhail Zolotukhin
b2aaf212aa [TensorExpr] Add option to enforce TensorExprKernel fallbacks. (#43972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43972

When debugging, it is useful to be able to disable the NNC backend to see
whether a bug lies there or in the fuser logic.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23455624

Pulled By: ZolotukhinM

fbshipit-source-id: f7c0452a29b860afc806e2d58acf35aa89afc060
2020-09-02 18:34:24 -07:00
Bert Maher
33d51a9b32 Respect canFuseOn{CPU,GPU} in TE fuser (#43967)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43967

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D23469048

Pulled By: bertmaher

fbshipit-source-id: 1005a7ae08974059ff9d467492caa3a388070eeb
2020-09-02 18:00:25 -07:00
anjali411
129f406062 Make torch.conj() a composite function and return self for real tensors (#43270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43270

`torch.conj` is a very commonly used operator for complex tensors, but it is mathematically a no-op for real tensors. Switching to TensorFlow-style gradients for complex tensors (as discussed in #41857) would involve adding `torch.conj()` to the backward definitions for a lot of operators. In order to preserve autograd performance for real tensors and maintain NumPy compatibility for `torch.conj`, this PR updates `torch.conj()` so that it behaves the same for complex tensors but returns the `self` tensor for non-complex dtypes. The documentation states that the returned tensor for a real input shouldn't be mutated. We could perhaps return an immutable tensor for this case in the future when that functionality is available (zdevito ezyang).
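
A quick illustration of the behavior described, assuming a build that includes this change (where `torch.conj` on a real tensor involves no data copy):
```python
import torch

z = torch.tensor([1 + 2j, 3 - 4j])
print(torch.conj(z))                 # imaginary parts negated: tensor([1.-2.j, 3.+4.j])

x = torch.randn(3)                   # real dtype: conj is mathematically a no-op
y = torch.conj(x)
print(y.data_ptr() == x.data_ptr())  # expected True here: same storage, no copy
```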

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23460493

Pulled By: anjali411

fbshipit-source-id: 3b3bf0af55423b77ff2d0e29f5d2c160291ae3d9
2020-09-02 17:06:04 -07:00