Commit Graph

11958 Commits

Nikita Shulga
63fd257879 Add Ellipsis constant to the list of recognized tokens (#44959)
Summary:
Per https://docs.python.org/3.6/library/constants.html
> `Ellipsis` is the same as the ellipsis literal `...`
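As a rough illustration of what this enables in scripted code (a minimal sketch; the indexing example is my own, not from the PR):

```python
import torch

@torch.jit.script
def last_col(x: torch.Tensor) -> torch.Tensor:
    # `Ellipsis` can now be written by name instead of the `...` literal
    return x[Ellipsis, -1]

print(last_col(torch.arange(6).reshape(2, 3)))  # tensor([2, 5])
```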

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44959

Reviewed By: suo

Differential Revision: D23785660

Pulled By: malfet

fbshipit-source-id: f68461849e7d16ef68042eb96566f2c936c06b0f
2020-09-22 09:05:25 -07:00
albanD
e155fbe915 add warning when ParameterList/Dict is used with DataParallel (#44405)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44405

Test Plan: Imported from OSS

Reviewed By: agolynski

Differential Revision: D23783987

Pulled By: albanD

fbshipit-source-id: 5018b0d381cb09301d2f88a98a910854f740ace1
2020-09-22 08:58:00 -07:00
Rong Rong
4a0aa69a66 Fix undefined variable 'namedshape' in tensor.py (#45085)
Summary:
Hot Fix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45085

Reviewed By: malfet, seemethere

Differential Revision: D23824444

Pulled By: walterddr

fbshipit-source-id: c9f37b394d281b7ef44b14c30699bb7510a362a7
2020-09-22 08:52:47 -07:00
anjali411
58b6ab69e5 torch.sgn for complex tensors (#39955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39955

resolves https://github.com/pytorch/pytorch/issues/36323 by adding `torch.sgn` for complex tensors.
`torch.sgn` returns `x/abs(x)` for `x != 0` and returns `0 + 0j` for `x==0`
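For illustration, a small sketch of the described behavior (the values in the comment are the mathematically expected results):

```python
import torch

z = torch.tensor([3 + 4j, 0 + 0j])
print(torch.sgn(z))  # z / abs(z) elementwise, with 0 at z == 0: tensor([0.6000+0.8000j, 0.0000+0.0000j])
```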

This PR doesn't test the correctness of the gradients. It will be done as a part of auditing all the ops in future once we decide the autograd behavior (JAX vs TF) and add gradchek.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23460526

Pulled By: anjali411

fbshipit-source-id: 70fc4e14e4d66196e27cf188e0422a335fc42f92
2020-09-22 08:24:53 -07:00
Bugra Akyildiz
1b059f2c6d Directly use work.result() to retrieve tensor rather than passing as a separate argument (#44914)
Summary:
We currently fetch an allreduced tensor from Python into C++, where the resulting tensor is stored in a struct's member. This PR removes the extra tensor parameter from the function signature and fetches the result from a single place.

Fixes https://github.com/pytorch/pytorch/issues/43960

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44914

Reviewed By: rohan-varma

Differential Revision: D23798888

Pulled By: bugra

fbshipit-source-id: ad1b8c31c15e3758a57b17218bbb9dc1f61f1577
2020-09-22 06:28:47 -07:00
Jerry Zhang
5aed75b21b [quant][graphmode][jit] Try to support append (#44641)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44641

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23682356

fbshipit-source-id: 09a03dfde0b1346a5764e8e28ba56e32b343d239
2020-09-21 23:13:56 -07:00
Gao, Xiang
2111ec3bf3 CUDA BFloat16 losses (#45011)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45011

Reviewed By: mruberry

Differential Revision: D23805840

Pulled By: ngimel

fbshipit-source-id: 3eb60d4367c727100763879e20e9df9d58bf5ad6
2020-09-21 22:51:17 -07:00
Ksenija Stanojevic
0dda65ac77 [ONNX] add jit pass for lists (#43820)
Summary:
Add jit preprocessing pass for adding int lists.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43820

Reviewed By: albanD

Differential Revision: D23674598

Pulled By: bzinodev

fbshipit-source-id: 35766403a073e202563bba5251c07efb7cc5cfb1
2020-09-21 22:05:25 -07:00
Shen Li
09e7f62ce2 Fix RPC and ProcessGroup GIL deadlock (#45088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45088

Fixes #45082

Found a few problems while working on #44983

1. We deliberately swallow RPC timeouts during shutdown, as we haven't
found a good way to handle them. When we converted `_wait_all_workers`
into `_all_gather`, the same logic was inherited. However, as
`_all_gather` is meant to be used in more general scenarios, we should
no longer stay silent about errors. This commit lets the error throw
in `_all_gather` and also lets `shutdown()` catch and log them.
2. After fixing (1), I found that `UnpickledPythonCall` needs to
acquire the GIL on destruction, and this can lead to deadlock when used
in conjunction with `ProcessGroup`, because the `ProcessGroup` ctor is a
synchronization point which holds the GIL. In `init_rpc`, followers
(`rank != 0`) can exit before the leader (`rank == 0`). If the two
happen together, the following can occur: a follower exits `init_rpc`
after running `_broadcast_to_followers` and before reaching the dtor
of `UnpickledPythonCall`. It then runs the ctor of `ProcessGroup`,
which holds the GIL and waits for the leader to join. However, the
leader is waiting for the response from `_broadcast_to_followers`,
which is blocked by the dtor of `UnpickledPythonCall`. Hence
the deadlock. This commit drops the GIL in the `ProcessGroup` ctor.
3. After fixing (2), I found that the `TensorPipe` backend
nondeterministically fails in `test_local_shutdown`, due to a
similar reason as (2), but this time it is that `shutdown()` on a
follower runs before the leader finishes `init_rpc`. This commit
adds a join for the `TensorPipe` backend's `init_rpc` after `_all_gather`.

The 3rd fix should be able to solve the 2nd issue as well, but since
I didn't see a reason to hold the GIL during the `ProcessGroup` ctor, I
made that change too.

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D23825592

Pulled By: mrshenli

fbshipit-source-id: 94920f2ad357746a6b8e4ffaa380dd56a7310976
2020-09-21 21:47:27 -07:00
Lin.Sung
f77ba0e48c Change typo 'momemtum' to 'momentum' (#45045)
Summary:
As the title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45045

Reviewed By: mruberry

Differential Revision: D23808563

Pulled By: mrshenli

fbshipit-source-id: ca818377f4c23d67b037c146fef667ab8731961e
2020-09-21 19:03:26 -07:00
Nikita Shulga
81bb19c9f0 [JIT] Prohibit subscripted assignments for tuple types (#44929)
Summary:
This forces jit.script to raise an error if someone tries to mutate a tuple:
```
Tuple[int, int] does not support subscripted assignment:
  File "/home/nshulga/test/tupleassignment.py", line 9
torch.jit.script
def foo(x: Tuple[int, int]) -> int:
    x[-1] = x[0] + 1
    ~~~~~ <--- HERE
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44929

Reviewed By: suo

Differential Revision: D23777668

Pulled By: malfet

fbshipit-source-id: 8efaa4167354ffb4930ccb3e702736a3209151b6
2020-09-21 16:35:44 -07:00
Xiang Gao
581a364437 CUDA BFloat16 unary ops part 1 (#44813)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44813

Reviewed By: mruberry

Differential Revision: D23805816

Pulled By: ngimel

fbshipit-source-id: 28c645dc31f094c8b6c3d3803f0b4152f0475a64
2020-09-21 14:22:31 -07:00
ahassan@azavea.com
1cab27d485 Add a torch.hub.load_local() function that can load models from any local directory with a hubconf.py (#44204)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43622

- Moves the model loading part of `torch.hub.load()` into a new `torch.hub.load_local()` function that takes in a path to a local directory containing a `hubconf.py` instead of a repo name (see the usage sketch below).
- Refactors `torch.hub.load()` so that it now calls `torch.hub.load_local()` after downloading and extracting the repo.
- Updates `torch.hub` docs to include the new function + minor fixes.
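A rough usage sketch of the new entry point (the directory path and model name are hypothetical):

```python
import torch

# Any local directory containing a hubconf.py, e.g. a previously extracted hub repo
model = torch.hub.load_local('/path/to/pytorch_vision', 'resnet18', pretrained=True)
```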

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44204

Reviewed By: malfet

Differential Revision: D23817429

Pulled By: ailzhang

fbshipit-source-id: 788fd83c87a94f487b558715b2809d346ead02b2
2020-09-21 14:17:21 -07:00
James Reed
c941dd3492 [FX] s/get_param/get_attr/ (#45000)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45000

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D23798016

Pulled By: jamesr66a

fbshipit-source-id: 1d2f3db1994a62b95d0ced03bf958e54d30c35dd
2020-09-21 14:09:32 -07:00
Ailing Zhang
92f8f75c59 Add alias dispatch key Math. (#44354)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44354

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23591481

Pulled By: ailzhang

fbshipit-source-id: 6e93c4ec99a07f3fc920ba2d09dc222e6ced5adf
2020-09-21 11:10:39 -07:00
Lucas Hosseini
ac8c7c4e9f Make Channel API accept buffer structs rather than raw pointers. (#45014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45014

Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/219

Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/212

+ Introduce buffer.h defining the buffer struct(s). The `CpuBuffer`
struct is always defined, while the `CudaBuffer` struct is defined
only when `TENSORPIPE_SUPPORTS_CUDA` is true.
+ Update all channels to take a `CpuBuffer` or `CudaBuffer` for
`send`/`recv` rather than a raw pointer and a length.
+ Make the base `Channel`/`Context` classes templated on `TBuffer`,
effectively creating two channel hierarchies (one for CPU channels,
one for CUDA channels).
+ Update the Pipe and the generic channel tests to use the new API. So
far, generic channel tests are CPU only, and tests for the CUDA IPC
channel are (temporarily) disabled. A subsequent PR will take care of
refactoring tests so that generic tests work for CUDA channels. An
other PR will add support for CUDA tensors in the Pipe.

Differential Revision: D23598033

Test Plan: Imported from OSS

Reviewed By: lw

Pulled By: beauby

fbshipit-source-id: 1d6c3f91e288420858835cd5e7962e8da051b44b
2020-09-21 10:18:45 -07:00
Nick Gibson
4bbb6adff5 [NNC] fix SyncThreads insertion and reenable CudaSharedMem test (#44909)
Summary:
A previous fix for masking Cuda dimensions (https://github.com/pytorch/pytorch/issues/44733) changed how thread synchronization barriers are inserted in the Cuda CodeGen, causing the CudaSharedMemReduce_1 test to become flaky; it was ultimately disabled.

The issue is working out where these barriers must be inserted. Solving this optimally is very hard, and I think not possible without dependency analysis we don't have, so I've changed our logic to be quite pessimistic: we insert barriers before and after any blocks that have thread dimensions masked (even between blocks that have no data dependencies). This should be correct, but it's an area where we could improve performance. To address this somewhat, I've added a simplifier pass that removes obviously unnecessary syncThreads.

To avoid this test being flaky again, I've added a check against the generated code to ensure there is a syncThread in the right place.

Also fixed a couple of non-functional clarity issues in the generated code: added the missing newline after Stores in the CudaPrinter, and prevented the PrioritizeLoad mutator from pulling out loads contained within simple Let statements (such as those produced by the Registerizer).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44909

Reviewed By: agolynski

Differential Revision: D23800565

Pulled By: nickgg

fbshipit-source-id: bddef1f40d8d461da965685f01d00b468d8a2c2f
2020-09-21 09:27:22 -07:00
Gregory Chanan
a6895d43b6 Turn on gradgrad check for BCELoss Criterion Tests. (#44894)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44894

Looks like we added double backwards support but only turned on the ModuleTests.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23762544

Pulled By: gchanan

fbshipit-source-id: b5cef579608dd71f3de245c4ba92e49216ce8a5e
2020-09-21 07:14:22 -07:00
Kaushik Ram Sadagopan
4810365576 Enabled torch.testing._internal.jit_utils.* typechecking. (#44985)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44985

Reviewed By: malfet

Differential Revision: D23794444

Pulled By: kauterry

fbshipit-source-id: 9893cc91780338a8223904fb574efa77fa3ab2b9
2020-09-21 01:19:06 -07:00
anjali411
9f67176b82 Complex gradcheck logic (#43208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43208

This PR adds gradcheck for complex. The logic used for complex gradcheck is described in Section 3.5.3 here: https://arxiv.org/pdf/1701.00392.pdf

More concretely, this PR introduces the following changes:
1. Updates get_numerical_jacobian to take as input a scalar value for vector (v). Adds gradcheck logic for C -> C, C-> R, R -> C. For R -> C functions, only the real value of gradient is propagated.
2. Adds backward definition for `torch.complex` and also adds a test to verify the definition added.
3. Updates backward for `mul`, `sin`, `cos`, `sinh`, `cosh`.
4. Adds tests for `torch.real`, `torch.imag`, `torch.view_as_real`, `torch.view_as_complex`, `torch.conj`.

Follow up tasks:
1. Add more thorough tests for R -> C cases. Specifically, add R -> C test variants for functions, e.g., `torch.mul(complex_tensor, real_tensor)`.
2. Add back commented test in `common_methods_invocation.py`.
3. Add more special case checking for complex gradcheck to make debugging easier.
4. Update complex autograd note.
5. disable complex autograd for operators not tested for complex.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23655088

Pulled By: anjali411

fbshipit-source-id: caa75e09864b5f6ead0f988f6368dce64cf15deb
2020-09-20 22:05:04 -07:00
Peter Bell
da7863f46b Add one dimensional FFTs to torch.fft namespace (#43011)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43011

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23751850

Pulled By: mruberry

fbshipit-source-id: 8dc5fec75102d8809eeb85a3d347ba1b5de45b33
2020-09-19 23:32:22 -07:00
Mike Ruberry
60709ad1bf Adds multiply and divide aliases (#44463)
Summary:
These aliases are consistent with NumPy. Note that C++'s naming would be different (std::multiplies and std::divides), and that PyTorch's existing names (mul and div) are consistent with Python's dunders.
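A quick sketch of the aliasing behavior:

```python
import torch

a = torch.tensor([2.0, 4.0])
b = torch.tensor([3.0, 5.0])
assert torch.equal(torch.multiply(a, b), torch.mul(a, b))  # multiply is an alias of mul
assert torch.equal(torch.divide(a, b), torch.div(a, b))    # divide is an alias of div
```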

This also improves the instructions for adding an alias to clarify that dispatch keys should be removed when copying native_functions.yaml entries to create the alias entries.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44463

Reviewed By: ngimel

Differential Revision: D23670782

Pulled By: mruberry

fbshipit-source-id: 9f1bdf8ff447abc624ff9e9be7ac600f98340ac4
2020-09-19 15:47:52 -07:00
Vasiliy Kuznetsov
2163d31016 histogram observer: ensure buffer shape consistency (#44956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44956

Makes the buffers of HistogramObserver have the same shapes in uninitialized and initialized states.

This is useful because the detectron2 checkpointer assumes
that these states will stay the same, so it removes the
need for manual hacks around the shapes changing.

Test Plan:
```
python test/test_quantization.py TestObserver.test_histogram_observer_consistent_buffer_shape
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23785382

fbshipit-source-id: 1a83fd4f39b244b00747c368d5d305a07d877c92
2020-09-19 09:29:39 -07:00
Xiao Wang
d75c402755 Add cusolver to build, rewrite MAGMA inverse with cusolver (#42403)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42265

This PR adds cusolver to the pytorch build, and enables the use of cusolver/cublas library functions on GPU `torch.inverse` on certain tensor shapes.

Specifically, when

* the tensor is two dimensional (single batch), or
* has >2 dimensions (multiple batches) and `batch_size <= 2`, or
* magma is not linked,

cusolver/cublas will be used. In other conditions, the current implementation of MAGMA will still be used.
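In Python-level pseudocode, the dispatch rule described above is roughly the following (the function and variable names are made up for illustration; the real logic lives in BatchLinearAlgebra.cu):

```python
import torch

def use_cusolver_or_cublas(t: torch.Tensor, magma_linked: bool) -> bool:
    # Sketch of the dispatch condition for torch.inverse on CUDA tensors
    if not magma_linked:
        return True
    if t.dim() == 2:                  # single (non-batched) matrix
        return True
    batch_size = t.numel() // (t.shape[-1] * t.shape[-2])
    return batch_size <= 2            # small batches go to cusolver/cublas, larger ones stay on MAGMA
```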

8c0949ae45/aten/src/ATen/native/cuda/BatchLinearAlgebra.cu (L742-L752)

The reason for this is that for tensors with large batch_size, `cublasXgetrfBatched` and `cublasXgetriBatched` don't perform very well. For `batch_size > 1`, we launch cusolver functions on multiple streams. This lets the cusolver functions run in parallel and can greatly increase performance. When `batch_size > 2`, the parallel-launched cusolver functions are slightly slower than the current magma implementation, so we still use the current magma impl.

On CUDA 9.2, there were some numerical issues detected, so cusolver impl will not be used. The cusolver impl will also not be used on platforms other than Nvidia CUDA.

060769feaf/aten/src/ATen/native/cuda/BatchLinearAlgebraLib.h (L10-L13)

Note that there is a new heuristic used before cusolver/cublas calls here:

8c0949ae45/aten/src/ATen/native/cuda/MiscUtils.h (L113-L121)

where `use_loop_launch = true` means launching single-batch cusolver functions in parallel, and `use_loop_launch = false` means using the cublas_X_batched functions. When magma is enabled (only `batch_size <= 2` is dispatched to cusolver/cublas), the heuristic always returns `true` and the cusolver calls are faster than small-batch_size magma calls. When magma is disabled, this adds the functionality of `torch.inverse`, which was previously disabled for all shapes (though large-batch_size cublas performance may not be as good as magma's).

Checklist:
- [X] Add benchmark, cpu, gpu-before (magma), gpu-after (cusolver)
- [X] Rewrite single inverse (ndim == 2) with cusolver
- [X] Rewrite batched inverse (ndim > 2) with cublas
- [X] Add cusolver to build
- [x] Clean up functions related to `USE_MAGMA` define guard
- [x] Workaround for non-cuda platform
- [x] Workaround for cuda 9.2
- [x] Add zero size check
- [x] Add tests

Next step:

If cusolver doesn't cause any problem in pytorch build, and there are no major performance regressions reported after this PR being merged, I will start porting other cusolver/cublas functions for linear algebra to improve the performance.

<details>
<summary> benchmark 73499c6 </summary>

benchmark code: https://github.com/xwang233/code-snippet/blob/master/torch.inverse/inverse-cusolver.ipynb

shape meaning:

* `[] 2 torch.float32 -> torch.randn(2, 2, dtype=torch.float32)`
* `[2] 4 torch.float32 -> torch.randn(2, 4, 4, dtype=torch.float32)`

| shape | cpu_time (ms) | gpu_time_before (magma) (ms) | gpu_time_after (ms) |
| --- | --- | --- | --- |
| [] 2 torch.float32 |  0.095 |  7.534 |  0.129  |
| [] 4 torch.float32 |  0.009 |  7.522 |  0.129  |
| [] 8 torch.float32 |  0.011 |  7.647 |  0.138  |
| [] 16 torch.float32 |  0.075 |  7.582 |  0.135  |
| [] 32 torch.float32 |  0.073 |  7.573 |  0.191  |
| [] 64 torch.float32 |  0.134 |  7.694 |  0.288  |
| [] 128 torch.float32 |  0.398 |  8.073 |  0.491  |
| [] 256 torch.float32 |  1.054 |  11.860 |  1.074  |
| [] 512 torch.float32 |  5.218 |  14.130 |  2.582  |
| [] 1024 torch.float32 |  19.010 |  18.780 |  6.936  |
| [1] 2 torch.float32 |  0.009 |  0.113 |  0.128 ***regressed |
| [1] 4 torch.float32 |  0.009 |  0.113 |  0.131 ***regressed |
| [1] 8 torch.float32 |  0.011 |  0.116 |  0.129 ***regressed |
| [1] 16 torch.float32 |  0.015 |  0.122 |  0.135 ***regressed |
| [1] 32 torch.float32 |  0.032 |  0.177 |  0.178 ***regressed |
| [1] 64 torch.float32 |  0.070 |  0.420 |  0.281  |
| [1] 128 torch.float32 |  0.328 |  0.816 |  0.490  |
| [1] 256 torch.float32 |  1.125 |  1.690 |  1.084  |
| [1] 512 torch.float32 |  4.344 |  4.305 |  2.576  |
| [1] 1024 torch.float32 |  16.510 |  16.340 |  6.928  |
| [2] 2 torch.float32 |  0.009 |  0.113 |  0.186 ***regressed |
| [2] 4 torch.float32 |  0.011 |  0.115 |  0.184 ***regressed |
| [2] 8 torch.float32 |  0.012 |  0.114 |  0.184 ***regressed |
| [2] 16 torch.float32 |  0.019 |  0.119 |  0.173 ***regressed |
| [2] 32 torch.float32 |  0.050 |  0.170 |  0.240 ***regressed |
| [2] 64 torch.float32 |  0.120 |  0.429 |  0.375  |
| [2] 128 torch.float32 |  0.576 |  0.830 |  0.675  |
| [2] 256 torch.float32 |  2.021 |  1.748 |  1.451  |
| [2] 512 torch.float32 |  9.070 |  4.749 |  3.539  |
| [2] 1024 torch.float32 |  33.655 |  18.240 |  12.220  |
| [4] 2 torch.float32 |  0.009 |  0.112 |  0.318 ***regressed |
| [4] 4 torch.float32 |  0.010 |  0.115 |  0.319 ***regressed |
| [4] 8 torch.float32 |  0.013 |  0.115 |  0.320 ***regressed |
| [4] 16 torch.float32 |  0.027 |  0.120 |  0.331 ***regressed |
| [4] 32 torch.float32 |  0.085 |  0.173 |  0.385 ***regressed |
| [4] 64 torch.float32 |  0.221 |  0.431 |  0.646 ***regressed |
| [4] 128 torch.float32 |  1.102 |  0.834 |  1.055 ***regressed |
| [4] 256 torch.float32 |  4.042 |  1.811 |  2.054 ***regressed |
| [4] 512 torch.float32 |  18.390 |  4.884 |  5.087 ***regressed |
| [4] 1024 torch.float32 |  69.025 |  19.840 |  20.000 ***regressed |

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42403

Reviewed By: ailzhang, mruberry

Differential Revision: D23717984

Pulled By: ngimel

fbshipit-source-id: 54cbd9ea72a97989cff4127089938e8a8e29a72b
2020-09-18 20:43:29 -07:00
Ivan Kobzarev
e9941a5dd4 [vulkan][py] torch.utils.optimize_for_vulkan (#44903)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44903

Test Plan: Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D23766039

Pulled By: IvanKobzarev

fbshipit-source-id: dbdf484ee7d3a7719aab105efba51b92ebc51568
2020-09-18 18:20:11 -07:00
Shawn Wu
572f7e069c Enable type check for torch.testing._internal.te_utils.* (#44927)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44927

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D23776842

Pulled By: sshawnwu

fbshipit-source-id: 65c028169a37e1f2f7d9fdce8a958234ee1caa26
2020-09-18 18:09:15 -07:00
James Reed
043466f978 [FX] Pass module's qualname to is_leaf_module (#44966)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44966

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D23790360

Pulled By: jamesr66a

fbshipit-source-id: 7ef569fd93646584b27af7a615fa69c8d8bbdd3b
2020-09-18 17:02:33 -07:00
Peter Bell
fd4e21c91e Add optional string support to native_functions schema (#43010)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43010

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23751851

Pulled By: mruberry

fbshipit-source-id: 648f7430e1b7311eff28421f38e01f52d998fcbd
2020-09-18 14:57:24 -07:00
Michael Suo
374e9373b5 [jit] Pull (most) tests out of libtorch_python (#44795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44795

Today, we build our cpp tests twice, once as a standalone gtest binary,
and once linked in `libtorch_python` so we can call them from
`test_jit.py`.

This is convenient (it means that `test_jit.py` is a single entry point
for all our tests), but has a few drawbacks:
1. We can't actually use the gtest APIs, since we don't link gtest into
`libtorch_python`. We're stuck with the subset that we want to write
polyfills for, and an awkward registration scheme where you have to
write a test and then include it in `tests.h`.
2. More seriously, we register custom operators and classes in these
tests. In a world where we may be linking many `libtorch_python`s, this
has a tendency to cause errors with `libtorch`.

So now, only tests that explicitly require cooperation with Python are
built into `libtorch_python`. The rest are built into
`build/bin/test_jit`.

There are tests which require that we define custom classes and
operators. In these cases, I've built them into separate `.so`s that we
call `torch.ops.load_library()` on.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity, ZolotukhinM

Differential Revision: D23735520

Pulled By: suo

fbshipit-source-id: d146bf4e7eb908afa6f96b394e4d395d63ad72ff
2020-09-18 14:04:40 -07:00
Lucas Hosseini
af3fc9725d Extract rpc/tensorpipe_utils.{cpp,h} from rpc/utils.{cpp,h} (#44803)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44803

Test Plan: CI

Reviewed By: lw

Differential Revision: D23732022

fbshipit-source-id: 5b839c7997bbee162a14d03414ee32baabbc8ece
2020-09-18 13:51:43 -07:00
wuyangz
d22dd80128 Enable type check for torch.testing._internal.common_device_type. (#44911)
Summary:
This PR intends to fix the type-checking errors in common_device_type.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44911

Reviewed By: walterddr

Differential Revision: D23768397

Pulled By: wuyangzhang

fbshipit-source-id: 053692583b4d6169b0eb5ffe0c3d30635c0db699
2020-09-18 13:42:11 -07:00
Richard Zou
6d312132e1 Beef up vmap docs and expose to master documentation (#44825)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44825

Test Plan: - build and view docs locally.

Reviewed By: ezyang

Differential Revision: D23742727

Pulled By: zou3519

fbshipit-source-id: f62b7a76b5505d3387b7816c514c086c01089de0
2020-09-18 13:26:25 -07:00
Sam Estep
c2cf6efd96 Enable type check for torch.testing._internal.dist_utils.* (#44832)
Summary:
Addresses a sub-task of https://github.com/pytorch/pytorch/issues/44752.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44832

Reviewed By: malfet

Differential Revision: D23744260

Pulled By: samestep

fbshipit-source-id: 46aede57b4fa66a770d5df382b0aea2bd6772b9b
2020-09-18 12:50:48 -07:00
Nick Gibson
f175830558 [NNC] Fuse identical conditions in simplifier (#44886)
Summary:
Adds a pass to the IR Simplifier which fuses together the bodies of Cond statements which have identical conditions. e.g.

```
if (i < 10) {
  do_thing_1;
} else {
  do_thing_2;
}
if (i < 10) {
  do_thing_3;
}
```

is transformed into:

```
if (i < 10) {
  do_thing_1;
  do_thing_3;
} else {
  do_thing_2;
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44886

Reviewed By: glaringlee

Differential Revision: D23768565

Pulled By: nickgg

fbshipit-source-id: 3fe40d91e82bdfff8dcb8c56a02a4fd579c070df
2020-09-18 11:38:03 -07:00
Yanan Cao
174cbff00a Improve sugared value's error message (#42889)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **https://github.com/pytorch/pytorch/issues/42889 Improve sugared value's error message**

I think most (if not all) cases where this code path is reached can be attributed to closing over a global variable.
Improving the error message to make this clearer to users.

close https://github.com/pytorch/pytorch/issues/41288

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42889

Reviewed By: SplitInfinity

Differential Revision: D23779347

Pulled By: gmagogsfm

fbshipit-source-id: ced702a96234040f79eb16ad998d202e360d6654
2020-09-18 11:01:40 -07:00
shubhambhokare1
0063512a4b [ONNX] Updates to diagnostic tool to find missing ops (#44124)
Summary:
Moved the description of the tool and updated the function name.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44124

Reviewed By: albanD

Differential Revision: D23674618

Pulled By: bzinodev

fbshipit-source-id: 5db0bb14fc106fc96358b1e0590f08e975388c6d
2020-09-18 10:32:30 -07:00
Yi Wang
c68cc78299 Add a device parameter to RemoteModule (#44254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44254

Add a device parameter to RemoteModule, so it can be placed on any device
and not just CPU.

Original PR issue: RemoteModule enhancements #40550

Test Plan: buck test test/distributed/rpc:process_group_agent -- RemoteModule

Reviewed By: pritamdamania87

Differential Revision: D23483803

fbshipit-source-id: 4918583c15c6a38a255ccbf12c9168660ab7f6db
2020-09-18 10:31:03 -07:00
Gregory Chanan
07b7e44ed1 Stop using check_criterion_jacobian. (#44786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44786

This predates gradcheck and gradcheck does the same and more.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23731902

Pulled By: gchanan

fbshipit-source-id: 425fd30e943194f63a663708bada8960265b8f05
2020-09-18 07:04:57 -07:00
Gregory Chanan
6d178f6b8e Stop ignoring errors in cuda nn module tests. (#44783)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44783

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23731778

Pulled By: gchanan

fbshipit-source-id: 32df903a9e36bbf3f66645ee2d77efa5ed6ee429
2020-09-18 07:03:41 -07:00
Peter Bell
df39c40054 Cleanup tracer handling of optional arguments (#43009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43009

* **#43009 Cleanup tracer handling of optional arguments**

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23766621

Pulled By: mruberry

fbshipit-source-id: c1b46cd23b58b18ef4c03021b2514d7e692badb6
2020-09-18 06:54:09 -07:00
Peter Bell
caea1adc35 Complex support for stft and istft (#43886)
Summary:
Ref https://github.com/pytorch/pytorch/issues/42175, fixes https://github.com/pytorch/pytorch/issues/34797

This adds complex support to `torch.stft` and `torch.istft`. Note that there are really two issues with complex here: complex signals, and returning complex tensors.

## Complex signals and windows
`stft` currently assumes all signals are real and uses `rfft` with `onesided=True` by default. Similarly, `istft` always takes a complex fourier series and uses `irfft` to return real signals.

For `stft`, I now allow complex inputs and windows by calling the full `fft` if either are complex. If the user gives `onesided=True` and the signal is complex, then this doesn't work and raises an error instead. For `istft`, there's no way to automatically know what to do when `onesided=False` because that could either be a redundant representation of a real signal or a complex signal. So there, the user needs to pass the argument `return_complex=True` in order to use `ifft` and get a complex result back.

## stft returning complex tensors
The other issue is that `stft` returns a complex result, represented as a `(... X 2)` real tensor. I think ideally we want this to return proper complex tensors, but to preserve BC I've had to add a `return_complex` argument to manage this transition. `return_complex` defaults to false for real inputs to preserve BC, but defaults to True for complex inputs where there is no BC to consider.
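A small usage sketch of the intended interface after this change (shapes and arguments are illustrative):

```python
import torch

x = torch.randn(1000)
S = torch.stft(x, n_fft=256, return_complex=True)   # complex-valued spectrogram
y = torch.istft(S, n_fft=256, length=x.numel())      # back to a real signal
```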

In order to `return_complex` by default everywhere without a sudden BC-breaking change, a simple transition plan could be:
1. introduce `return_complex`, defaulted to false when BC is an issue but giving a warning. (this PR)
2. raise an error in cases where `return_complex` defaults to false, making it a required argument.
3. change `return_complex` default to true in all cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43886

Reviewed By: glaringlee

Differential Revision: D23760174

Pulled By: mruberry

fbshipit-source-id: 2fec4404f5d980ddd6bdd941a63852a555eb9147
2020-09-18 01:39:47 -07:00
Rohan Varma
5dbcbea265 TorchScript with record_function (#44345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44345

As part of enhancing profiler support for RPC, when executing TorchScript functions over RPC, we would like to be able to support user-defined profiling scopes created by `with record_function(...)`.

Since after https://github.com/pytorch/pytorch/pull/34705, we support `with` statements in TorchScript, this PR adds support for `with torch.autograd.profiler.record_function` to be used within TorchScript.

This can be accomplished via the following without this PR:
```
torch.ops.profiler._record_function_enter(...)
# Script code, such as forward pass
torch.ops.profiler._record_function_exit(...)
```

This is a bit hacky, and it would be much cleaner to use the context manager now that we support `with` statements. Also, the `_record_function_*`-style operators are internal operators that are subject to change; this change will help avoid BC issues in the future.
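With this change, the same thing can be expressed with the context manager directly inside a scripted function; a minimal sketch (the scope name is arbitrary):

```python
import torch

@torch.jit.script
def forward_with_scope(x: torch.Tensor) -> torch.Tensor:
    with torch.autograd.profiler.record_function("my_scoped_region"):
        return x + 1
```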

Tested with `python test/test_jit.py TestWith.test_with_record_function -v`
ghstack-source-id: 112320645

Test Plan:
Repro instructions:
1) Change `def script_add_ones_return_any(x) -> Any` to `def script_add_ones_return_any(x) -> Tensor` in `jit/rpc_test.py`
2) `buck test mode/dev-nosan //caffe2/test/distributed/rpc:process_group_agent -- test_record_function_on_caller_rpc_async --print-passing-details`
3) The function which ideally should accept `Future[Any]` is `def _call_end_callbacks_on_future` in `autograd/profiler.py`.

python test/test_jit.py TestWith.test_with_foo -v

Reviewed By: pritamdamania87

Differential Revision: D23332074

fbshipit-source-id: 61b0078578e8b23bfad5eeec3b0b146b6b35a870
2020-09-17 18:45:00 -07:00
Yuxin Wu
9a007ba4cb [jit] stop parsing the block after seeing exit statements (#44870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44870

fix https://github.com/pytorch/pytorch/issues/44864

Test Plan: buck test mode/dev-nosan //caffe2/test:jit -- 'test_assert_is_script'

Reviewed By: eellison

Differential Revision: D23755094

fbshipit-source-id: ca3f8b27dc6f9dc9364a22a1bce0e2f588ed4308
2020-09-17 18:09:16 -07:00
James Reed
60ae6c9c18 [FX] Fix GraphModule copy methods not regenerating forward (#44806)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44806

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23738732

Pulled By: jamesr66a

fbshipit-source-id: 14e13551c6568c562f3f789b6274b6c86afefd0b
2020-09-17 17:14:38 -07:00
Yanli Zhao
e14b2080be [reland] move rebuild buckets from end of first iteration to beginning of second iteration (#44798)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44798

[test all]

Update for relanding: in ddp.join(), moved _rebuild_buckets from end of backward to beginning of forward as well.

Part of relanding PR #41954, this refactoring moves the rebuild_buckets call from the end of the first iteration to the beginning of the second iteration
ghstack-source-id: 112279261
ghstack-source-id: 112279261

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D23735185

fbshipit-source-id: c26e0efeecb3511640120faa1122a2c856cd694e
2020-09-17 17:10:21 -07:00
Nikita Shulga
2043fbdfb6 Enable torch.backends.cuda typechecking in CI (#44916)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44916

Reviewed By: walterddr

Differential Revision: D23769844

Pulled By: malfet

fbshipit-source-id: 3be3616fba9e2f9c6d89cc71d5f0d24ffcc45cf2
2020-09-17 15:31:38 -07:00
Alex Suhan
18b77d7d17 [TensorExpr] Add Mod support to the LLVM backend (#44823)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44823

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVMElemwiseMod_LLVM

Reviewed By: glaringlee

Differential Revision: D23761996

Pulled By: asuhan

fbshipit-source-id: c3c5b2fe0d989dec04f0152ce47c5cae35ed19c9
2020-09-17 15:25:42 -07:00
Jane (Yuan) Xu
1c996b7170 Enable typechecking for torch.testing._internal.common_quantized.* (#44805)
Summary:
Addresses a subproblem of [Issue 42969](https://github.com/pytorch/pytorch/issues/42969)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44805

Reviewed By: malfet

Differential Revision: D23742754

Pulled By: janeyx99

fbshipit-source-id: e916a6a0c049cac318549a485d47f19363087d15
2020-09-17 14:24:32 -07:00
Alex Suhan
f5b92332c1 [TensorExpr] Fix order comparisons for unsigned types (#44857)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44857

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVMCompareSelectByte*_LLVM

Reviewed By: glaringlee

Differential Revision: D23762162

Pulled By: asuhan

fbshipit-source-id: 1553429bd2d5292ccda57910326b8c70e4e6ab88
2020-09-17 14:16:54 -07:00
Nikita Shulga
4066022146 Do not use PRId64 in torch/csrc (#44767)
Summary:
Instead use `fmt::format()` or `%lld` and cast argument to `(long long)`
Fix typos and add helper `PyErr_SetString()` method in torch/csrc/Exceptions.h

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44767

Reviewed By: ezyang

Differential Revision: D23723671

Pulled By: malfet

fbshipit-source-id: c0101aed222184aa436b1e8768480d1531dff232
2020-09-17 14:00:02 -07:00
Alex Suhan
5d57025206 [TensorExpr] Add log1p support to the LLVM backend (#44839)
Summary:
Also corrected the Sleef_log1p registrations; the float versions had a redundant f.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44839

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVMElemwiseLog1pFloat_LLVM

Reviewed By: glaringlee

Differential Revision: D23762113

Pulled By: asuhan

fbshipit-source-id: b5cf003b5c0c1ad549c7f04470352231929ac459
2020-09-17 13:38:35 -07:00
Rohan Varma
bee97d5be0 Document the default behavior for dist.new_group() when ranks=None (#44000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44000

This wasn't documented, so add a doc saying all ranks are used when
ranks=None
ghstack-source-id: 111206308

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D23465034

fbshipit-source-id: 4c51f37ffcba3d58ffa5a0adcd5457e0c5676a5d
2020-09-17 11:30:37 -07:00
Yanan Cao
2558e5769d Implement sort for list of tuples (#43448)
Summary:
* Implement tuple sort by traversing the contained IValue types and generating a lambda function as the comparator for the sort.
* Tuples and class objects can now be arbitrarily nested within each other and remain sortable (see the sketch below).
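A minimal sketch of what now scripts and sorts (the element values are illustrative):

```python
from typing import List, Tuple

import torch

@torch.jit.script
def sort_pairs(pairs: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    pairs.sort()  # compares tuples lexicographically, element by element
    return pairs

print(sort_pairs([(2, "b"), (1, "c"), (1, "a")]))  # [(1, 'a'), (1, 'c'), (2, 'b')]
```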

Fixes https://github.com/pytorch/pytorch/issues/43219

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43448

Reviewed By: eellison

Differential Revision: D23352273

Pulled By: gmagogsfm

fbshipit-source-id: b6efa8d00e112178de8256da3deebdba7d06c0e1
2020-09-17 11:20:56 -07:00
Supriya Rao
1fde54d531 [quant][qat] Ensure fake_quant and observer can be disabled on scriptmodule (#44773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44773

The model is created and prepared using fx APIs and then scripted for training.
In order to test QAT on the scripted model, we need to be able to disable/enable
the fake_quant and observer modules on it.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23741354

fbshipit-source-id: 3fee7aa9b049d9901313b977710f4dc1c4501532
2020-09-17 10:21:52 -07:00
Supriya Rao
361b38da19 [quant][fx] Add node name as prefix to observer module name (#44765)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44765

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_save_observer_state_dict

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23741355

fbshipit-source-id: 7185ceae5b3b520ac0beebb627c44eab7ae7d231
2020-09-17 10:17:42 -07:00
Natalia Gimelshein
74c3dcd1d2 Revert D23725053: [pytorch][PR] change self.generator to generator
Test Plan: revert-hammer

Differential Revision:
D23725053 (a011b86115)

Original commit changeset: 89706313013d

fbshipit-source-id: 035214f0d4298d29a52f8032d364b52dfd956fe8
2020-09-17 09:42:37 -07:00
Yanli Zhao
d2b4534d4d refactor intialize bucket views (#44330)
Summary:
[test all]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44330

Part of relanding PR #41954, this refactor separates initialize_bucket_views and populate_bucket_views_out, as they do different things and are called from different callsites
ghstack-source-id: 112257271

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D23583347

fbshipit-source-id: a5f2041b2c4f2c2b5faba1af834c7143eaade938
2020-09-17 09:20:23 -07:00
Jane Xu
4affbbd9f8 minor style edits to torch/testing/_internal/common_quantized.py (#44807)
Summary:
style nits

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44807

Reviewed By: malfet

Differential Revision: D23742537

Pulled By: janeyx99

fbshipit-source-id: 446343822d61f8fd9ef6dfcb8e5da4feff6522b6
2020-09-17 08:02:43 -07:00
Heitor Schueroff de Souza
28085cbd39 Fixed quantile nan propagation and implemented nanquantile (#44393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44393

torch.quantile now correctly propagates nan, and torch.nanquantile is implemented similarly to numpy.nanquantile.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23649613

Pulled By: heitorschueroff

fbshipit-source-id: 5201d076745ae1237cedc7631c28cf446be99936
2020-09-17 05:53:25 -07:00
Yanan Cao
99093277c0 Support Python Slice class in TorchScript (#44335)
Summary:
Implements support for the [Python Slice class](https://docs.python.org/3/c-api/slice.html) (not slice expressions, which are already supported).

A Slice object can be used anywhere a slice expression is supported, including multi-dim tensor slicing.
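A small sketch of the newly supported pattern, with a `slice` object standing in for a slice expression (the example function is my own):

```python
import torch

@torch.jit.script
def crop(x: torch.Tensor) -> torch.Tensor:
    rows = slice(0, 2)       # Python slice object rather than the `0:2` expression
    cols = slice(1, None)
    return x[rows, cols]

print(crop(torch.arange(12).reshape(3, 4)))  # rows 0-1, columns 1-3
```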

Fixes https://github.com/pytorch/pytorch/issues/43511
Fixes https://github.com/pytorch/pytorch/issues/43125

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44335

Reviewed By: suo, jamesr66a

Differential Revision: D23682213

Pulled By: gmagogsfm

fbshipit-source-id: f74fe25370e89fbfd2b3727d95ce4e1c4ba8dec4
2020-09-17 00:41:53 -07:00
Sameer Deshmukh
e18a2219dd Implement scatter reductions (CUDA), remove divide/subtract (#41977)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33394 .

This PR does two things:
1. Implement CUDA scatter reductions with revamped GPU atomic operations.
2. Remove support for divide and subtract for CPU reduction, as was discussed with ngimel.

I've also updated the docs to reflect the existence of only multiply and add.
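For reference, a tiny sketch of the two remaining reduction modes (the commented values are the expected results):

```python
import torch

idx = torch.tensor([0, 1, 1, 3])
src = torch.tensor([1.0, 2.0, 3.0, 4.0])

out_add = torch.zeros(4).scatter_(0, idx, src, reduce='add')       # tensor([1., 5., 0., 4.])
out_mul = torch.ones(4).scatter_(0, idx, src, reduce='multiply')   # tensor([1., 6., 1., 4.])
```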

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41977

Reviewed By: mruberry

Differential Revision: D23748888

Pulled By: ngimel

fbshipit-source-id: ea643c0da03c9058e433de96db02b503514c4e9c
2020-09-16 23:25:21 -07:00
Muthu Arivoli
b61d3d8be8 Implement torch.kaiser_window (#44271)
Summary:
Related to https://github.com/pytorch/pytorch/issues/38349

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44271

Reviewed By: ngimel

Differential Revision: D23727972

Pulled By: mruberry

fbshipit-source-id: b4c931b2eb3a536231ad6d6c3cb66e52a13286ac
2020-09-16 20:41:31 -07:00
alanashine
ba6534ae2b enable type check common_distributed (#44821)
Summary:
Enabled type checking in common_distributed by using tensors of ints

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44821

Test Plan: Run python test/test_type_hints.py; errors are no longer ignored by mypy.ini

Reviewed By: walterddr

Differential Revision: D23747466

Pulled By: alanadakotashine

fbshipit-source-id: 820fd502d7ff715728470fbef0be90ae7f128dd6
2020-09-16 19:19:36 -07:00
Xiang Gao
e48201c5cf Mention TF32 on related docs (#44690)
Summary:
cc: ptrblck

![image](https://user-images.githubusercontent.com/1032377/93168022-cbbfcb80-f6d6-11ea-8f6e-f2c8a15c5bea.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44690

Reviewed By: ngimel

Differential Revision: D23727921

Pulled By: mruberry

fbshipit-source-id: db7cc8e74cde09c13d6a57683129fd839863b914
2020-09-16 19:18:30 -07:00
James Reed
29664e6aa3 [FX] Further sanitize generated names (#44808)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44808

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D23739413

Pulled By: jamesr66a

fbshipit-source-id: b759c3ea613dfa717fb23977b72ff4773d9dcc99
2020-09-16 18:47:38 -07:00
Nick Gibson
204f985fc3 [NNC] Add simplification of Loop + Condition patterns. (#44764)
Summary:
Adds a new optimization to the IRSimplifier which changes this pattern:
```
for ...
  if ...
   do thing;
```
into:
```
if ...
  for ...
    do thing;
```

Which should be almost strictly better.

There are many cases where this isn't safe to do, hence the tests. Most obviously, it is unsafe when the condition depends on something modified within the loop.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44764

Reviewed By: mruberry

Differential Revision: D23734463

Pulled By: nickgg

fbshipit-source-id: 51617e837de96b354fb702d0090ac65ddc523d36
2020-09-16 18:41:58 -07:00
Yanan Cao
6befc09465 Fix misuse of PyObject_IsSubclass (#44769)
Summary:
PyObject_IsSubclass may set the Python live-exception bit if the given object is not a class. `IsNamedTuple` is currently using it incorrectly, which may trip all subsequent Python operations in a debug build of Python. Normal release-build Python is not affected because `assert` is a no-op in release builds.

Fixes https://github.com/pytorch/pytorch/issues/43577

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44769

Reviewed By: jamesr66a

Differential Revision: D23725584

Pulled By: gmagogsfm

fbshipit-source-id: 2dabd4f8667a045d5bf75813500876c6fd81542b
2020-09-16 16:19:01 -07:00
Meghan Lele
43fe034514 [JIT] Disallow plain Optional type annotation without arg (#44586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44586

**Summary**
This commit disallows plain `Optional` type annotations without
any contained types both in type comments and in-line as
Python3-style type annotations.
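Roughly the kind of annotation that is now rejected versus accepted (a sketch, not the test code from this commit):

```python
from typing import Optional

import torch

@torch.jit.script
def ok(x: Optional[int]) -> int:   # accepted: the contained type is given
    if x is None:
        return 0
    return x

# @torch.jit.script
# def bad(x: Optional) -> int:     # now rejected: plain `Optional` without an argument
#     return 0
```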

**Test Plan**
This commit adds a unit test for these two situations.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23721517

Pulled By: SplitInfinity

fbshipit-source-id: ead411e94aa0ccce227af74eb0341e2a5331370a
2020-09-16 16:07:26 -07:00
Mingzhe Li
574f9af160 [NCCL] Add option to run NCCL on high priority cuda stream (#43796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43796

This diff adds an option for the process group NCCL backend to pick high priority cuda streams.

Test Plan: waitforsandcastle

Reviewed By: jiayisuse

Differential Revision: D23404286

fbshipit-source-id: b79ae097b7cd945a26e8ba1dd13ad3147ac790eb
2020-09-16 16:00:41 -07:00
Michael Suo
161490d441 Move torch/version.py generation to cmake (#44577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44577

I would like to move this to cmake so that I can depend on it
happening from other parts of the build.

This PR pulls out the logic for determining the version string and
writing the version file into its own module. `setup.py` still receives
the version string and uses it as before, but now the code for writing
out `torch/version.py` lives in a custom command in torch/CMakeLists.txt

I noticed a small inconsistency in how version info is populated.
`TORCH_BUILD_VERSION` is populated from `setup.py` at configuration
time, while `torch/version.py` is written at build time. So if, e.g., you
configured cmake on a certain git rev, then built on another, the
two versions would be inconsistent.

This does not appear to matter, so I opted to preserve the existing
behavior.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23734781

Pulled By: suo

fbshipit-source-id: 4002c9ec8058503dc0550f8eece2256bc98c03a4
2020-09-16 15:49:22 -07:00
Meghan Lele
ffe127e4f1 [JIT] Disallow plain Tuple type annotation without arg (#44585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44585

**Summary**
This commit disallows plain `Tuple` type annotations without any
contained types both in type comments and in-line as Python3-style
type annotations.

**Test Plan**
This commit adds a unit test for these two situations.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23721515

Pulled By: SplitInfinity

fbshipit-source-id: e11c77a4fac0b81cd535c37a31b9f4129c276592
2020-09-16 15:49:19 -07:00
qxu
09a84071a3 enable mypy check for jit_metaprogramming_utils (#44752)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42969
Enabled the mypy check for jit_metaprogramming_utils.py and fixed all errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44752

Reviewed By: walterddr

Differential Revision: D23741285

Pulled By: qxu-fb

fbshipit-source-id: 21e36ca5d25c8682fb93b806e416b9e1db76f71e
2020-09-16 15:44:37 -07:00
Alex Suhan
7b3432caff [TensorExpr] Support boolean in simplifier (#44659)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44659

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.ConstantFoldCastToBool

Reviewed By: ngimel

Differential Revision: D23714675

Pulled By: asuhan

fbshipit-source-id: 4c18d972b628d5ad55bad58eddd5f6974e043d9c
2020-09-16 15:30:19 -07:00
Meghan Lele
78b806ab4a [JIT] Disallow plain List type annotation without arg (#44584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44584

**Summary**
This commit extends the work done in #38130 and disallows plain
Python3-style `List` type annotations.

**Test Plan**
This commit extends `TestList.test_no_element_type_annotation` to the
Python3-style type annotation.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23721514

Pulled By: SplitInfinity

fbshipit-source-id: 48957868286f44ab6d5bf5e1bf97f0a4ebf955df
2020-09-16 15:08:04 -07:00
Meghan Lele
cb3b8a33f1 [JIT] Disallow plain Dict type annotation without arg (#44334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44334

**Summary**
This commit detects and prohibits the case in which `typing.Dict` is
used as an annotation without type arguments (i.e. plain `typing.Dict` rather than `typing.Dict[K, V]`).
At present, `typing.Dict` is always assumed to have two arguments, and
when it is used without them, `typing.Dict.__args__` is nonempty and
contains some `typing.TypeVar` instances, which have no JIT type equivalent.
Consequently, trying to convert `typing.Dict` to a JIT type results in
a `c10::DictType` with `nullptr` for its key and value types, which can cause
a segmentation fault.

This is fixed by returning a `DictType` from
`jit.annotations.try_ann_to_type` only if the key and value types are converted
successfully to a JIT type and returning `None` otherwise.

**Test Plan**
This commit adds a unit test to `TestDict` that tests the plain `Dict`
annotations throw an error.

**Fixes**
This commit closes #43530.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23610766

Pulled By: SplitInfinity

fbshipit-source-id: 036b10eff6e3206e0da3131cfb4997d8189c4fec
2020-09-16 14:38:28 -07:00
Edward Yang
5027c161a9 Add TORCH_SELECTIVE_NAME to AMP definitions (#44711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44711

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23711425

Pulled By: ezyang

fbshipit-source-id: d4b0ef77893af80fe9b74791e66825e223ae221d
2020-09-16 14:25:17 -07:00
Nick Gibson
82ab167cce [NNC] Fix masking for all block and thread dimensions in CudaCodeGen (#44733)
Summary:
Unifies a number of partial solutions to the thread and block dimension extent masking, including the NoThreadIdxWriter and my last fix https://github.com/pytorch/pytorch/issues/44325. The NoThreadIdxWriter is gone in favour of tracking the current loop extents and masking any statements that have a lower rank than the launch parameters in any Block or Thread dimension, which handles both the "no" and "smaller" axis binding cases.

For example it will transform the following:
```
for i in 0..10 // blockIdx.x
  for j in 0..10 // threadIdx.x
    do thing(i, j);
  for k in 0..5 // threadIdx.x
    do other thing(i, k);
```

Into:
```
do thing(blockIdx.x, threadIdx.x);
if (threadIdx.x < 5) {
  do other thing(blockIdx.x, threadIdx.x);
}
```

And handle the case where statements are not bound by any axis, e.g.
```
do outer thing;
for i in 0..10 // blockIdx.x
  for j in 0..10 // threadIdx.x
    do thing(i, j);
  do other thing(i);
```

will become:

```
if (blockIdx.x < 1) {
  if (threadIdx.x < 1) {
    do outer thing;
  }
}
syncthreads();
do thing(blockIdx.x, threadIdx.x);
syncthreads();
if (threadIdx.x < 1) {
  do other thing(blockIdx.x);
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44733

Reviewed By: mruberry

Differential Revision: D23736878

Pulled By: nickgg

fbshipit-source-id: 52d08626ae8043d53eb937843466874d479a6768
2020-09-16 14:23:47 -07:00
Yi Wang
f3bd984e44 Move the description comment of compute_bucket_assignment_by_size from cpp to the header file. (#44703)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44703

The description of this public function should be in the header file.

Also fix some typos.

Test Plan: N/A.

Reviewed By: pritamdamania87

Differential Revision: D23703661

fbshipit-source-id: 24ae63de9498e321b31dfb2efadb44183c6370df
2020-09-16 13:44:14 -07:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
James Reed
e9c6449b46 [FX][EZ] Allow constructing GraphModule with dict for root (#44679)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44679

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23696766

Pulled By: jamesr66a

fbshipit-source-id: fe18b7b579c1728d00589bd5fd5e54c917cc61fe
2020-09-16 12:43:23 -07:00
Nikita Shulga
c44e4878ae Enable torch.backends.quantized typechecks (#44794)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44793

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44794

Reviewed By: walterddr

Differential Revision: D23734353

Pulled By: malfet

fbshipit-source-id: 491bd7c8f147759715eb296d7537a172685aa066
2020-09-16 12:21:20 -07:00
Shen Li
cce7680a23 Add bound method tests for async_execution with RRef helper (#44716)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44716

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23707326

Pulled By: mrshenli

fbshipit-source-id: a2f8db17447e9f82c9f6ed941ff1f8cb9090ad74
2020-09-16 12:01:07 -07:00
Shen Li
257c6d0fde Make async_execution compatible with RRef helpers (#44666)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44666

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23691989

Pulled By: mrshenli

fbshipit-source-id: b36f4b1c9d7782797a0220434a8272610a23e83e
2020-09-16 12:01:05 -07:00
Shen Li
924717bf51 Add _get_type() API to RRef (#44663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44663

The new API returns the type of the data object referenced by this
`RRef`. On the owner, this is same as `type(rref.local_value())`.
On a user, this will trigger an RPC to fetch the `type` object from
the owner. After this function is run once, the `type` object is
cached by the `RRef`, and subsequent invocations no longer trigger
RPC.
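A rough usage sketch (the worker name is hypothetical, and RPC is assumed to be initialized already):

```python
import torch
import torch.distributed.rpc as rpc

# assumes rpc.init_rpc(...) has already been called on this worker
rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
t = rref._get_type()  # torch.Tensor; on a user the first call may issue an RPC, later calls hit the cache
```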

closes #33210

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23691990

Pulled By: mrshenli

fbshipit-source-id: a2d87cd601a691dd75164b6bcd7315245e9cf6bd
2020-09-16 11:59:22 -07:00
Yanan Cao
07d07e3c6c Remove EXPERIMENTAL_ENUM_SUPPORT feature guard (#44243)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41095

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44243

Reviewed By: ZolotukhinM

Differential Revision: D23605979

Pulled By: gmagogsfm

fbshipit-source-id: 098ae69049c4664ad5d1521c45b8a7dd22e72f6c
2020-09-16 11:45:59 -07:00
Michael Carilli
3e6bb5233f Reference amp tutorial (recipe) from core amp docs (#44725)
Summary:
https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html is live.  Core amp docs should reference it.

Also I fixed some typos in the `zero_grad` docs that we ignored when git was behaving weirdly during ngimel's merge of https://github.com/pytorch/pytorch/pull/44423.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44725

Reviewed By: mruberry

Differential Revision: D23723807

Pulled By: ngimel

fbshipit-source-id: ca0b76365f8ca908bd978e3b38bf81857fa6c2a3
2020-09-16 11:37:58 -07:00
Fang Zhang
a011b86115 change self.generator to generator (#44461)
Summary:
bug fix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44461

Reviewed By: mruberry

Differential Revision: D23725053

Pulled By: ngimel

fbshipit-source-id: 89706313013d9eae96aaaf144924867457efd2c0
2020-09-16 11:32:17 -07:00
Jimmy Yao
5e717f0d5e delete the space for the docs rendering (#44740)
Summary:
see the docs rendering of `jacobian` and `hessian` at https://pytorch.org/docs/stable/autograd.html

![image](https://user-images.githubusercontent.com/20907377/93268949-f0618500-f762-11ea-9ec6-ddd062540c59.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44740

Reviewed By: ngimel

Differential Revision: D23724899

Pulled By: mrshenli

fbshipit-source-id: f7558ff53989e5dc7e678706207be2ac7ce22c66
2020-09-16 11:13:45 -07:00
Pritam Damania
dbf17a1d4c Fixing a few links in distributed CONTRIBUTING.md (#44753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44753

ghstack-source-id: 112132781

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D23719077

fbshipit-source-id: 3d943dfde100d175f417554fc7fca1fdb295129f
2020-09-16 10:14:19 -07:00
Rohan Varma
63469da3bb Add a test to ensure DDP join works with RPC (#44439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44439

Adds a test to ddp_under_dist_autograd_test to ensure that the uneven
inputs join() API works properly when DDP + RPC are combined. We test that when
running in outside-DDP mode (DDP applied to the whole hybrid module) we can
correctly process uneven inputs across different trainers.
ghstack-source-id: 112156980

Test Plan: CI

Reviewed By: albanD

Differential Revision: D23612409

fbshipit-source-id: f1e328c096822042daaba263aa8747a9c7e89de7
2020-09-16 09:51:43 -07:00
Supriya Rao
3f512b0de2 [quant][qat] Ensure observers and fq modules are scriptable (#44749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44749

Ensure fx module is scriptable after calling prepare_qat on it

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23718380

fbshipit-source-id: abf63ffb21e707f7def8f6c88246877f5aded58c
2020-09-16 09:30:07 -07:00
Mikhail Zolotukhin
d66520ba08 [TensorExpr] Fuser: try merging adjacent fusion groups. (#43671)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43671

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23360796

Pulled By: ZolotukhinM

fbshipit-source-id: 60ec318fe77ae9f2c821d9c4d106281845266e0f
2020-09-15 21:31:02 -07:00
Kent Gauen
2efc618f19 lr_schedule.py redundant code (#44613)
Summary:
The subclass sets "self.last_epoch" when this is set in the parent class's init function. Why would we need to set last_epoch twice? I think calling "super" resets last_epoch anyway, so I am not sure why we would want to include this in the subclass. Am I missing something?

For the record, I am just a Pytorch enthusiast. I hope my question isn't totally silly.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44613

Reviewed By: albanD

Differential Revision: D23691770

Pulled By: mrshenli

fbshipit-source-id: 080d9acda86e1a2bfaafe2c6fcb8fc1544f8cf8a
2020-09-15 20:28:39 -07:00
Zachary DeVito
2c1b215b48 [fx] remove delegate, replace with tracer (#44566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44566

The Delegate objects were confusing. They were supposed to be a way to
configure how tracing works, but in some cases they appeared necessary
for constructing graphs, which was not true. This makes the organization
clearer by removing Delegate and moving its functionality into a Tracer class,
similar to how pickle has a Pickler class.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23683177

Pulled By: zdevito

fbshipit-source-id: 7605a34e65dfac9a487c0bada39a23ca1327ab00
2020-09-15 16:52:22 -07:00
Ailing Zhang
fb085d90e3 Revert D23583017: move rebuild buckets from end of first iteration to beginning of second iteration
Test Plan: revert-hammer

Differential Revision:
D23583017 (f5d231d593)

Original commit changeset: ef67f79437a8

fbshipit-source-id: fd914b7565aba6a5574a32b31403525abb80ff07
2020-09-15 15:10:52 -07:00
Dmytro Dzhulgakov
2f4c31ce3a [jit] Speed up saving in case of many classes (#44589)
Summary:
There's an annoying O(N^2) in the module export logic that makes saving some models (if they have many classes) take an eternity.

I'm not familiar enough with this code to properly untangle the deps and make it a pure hash lookup, so I just added a side lookup table for raw pointers. It's still quadratic, but it's O(num_classes^2) instead of O(num_classes * num_references), which already gives huge savings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44589

Test Plan:
Tested with one of the offending models - just loading a saving a Torchscript file:

```
Before:
load 1.9239683151245117
save 165.74712467193604

After:
load 1.9409027099609375
save 1.4711427688598633
```

Reviewed By: suo

Differential Revision: D23675278

Pulled By: dzhulgakov

fbshipit-source-id: 8f3fa7730941085ea20d9255b49a149ac1bf64fe
2020-09-15 15:10:45 -07:00
Nick Gibson
69839ea3f6 [NNC] make inlining immediate (take 3) (#44231)
Summary:
This is a reup of https://github.com/pytorch/pytorch/issues/43885 with an extra commit which should fix the bugs that caused it to be reverted. Read that PR for general context.

The issue here was that we were still using the side maps `tensor_to_stmt_` and `stmt_to_tensor_`, which get invalidated by any transform of the IR (rather than just any transform that isn't computeInline). I added a comment about this but didn't actually address our usages of them.

I've removed these maps and changed the `getLoopBodyFor` and `getLoopStatementsFor` helpers to search the root stmt directly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44231

Reviewed By: albanD

Differential Revision: D23689688

Pulled By: nickgg

fbshipit-source-id: 1c6009a880f8c0cebf2300fd06b5cc9322bffbf9
2020-09-15 11:12:24 -07:00
Elias Ellison
8df0400a50 Fix fallback graph in specialize autogradzero (#44654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44654

Previously we weren't creating a fallback graph as intended in specialize autograd zero, so if a Tensor failed one of our undefinedness checks we would run the backward normally without reprofiling & optimizing.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23691764

Pulled By: eellison

fbshipit-source-id: 10c6fa79518c84a6f5ef2bfbd9ea10843af751eb
2020-09-15 11:12:20 -07:00
kshitij12345
1d733d660d [docs] torch.min/max: remove incorrect warning from docs (#44615)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44195

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44615

Reviewed By: ngimel

Differential Revision: D23703525

Pulled By: mruberry

fbshipit-source-id: 471ebd764be667e29c03a30f3ef341440adc54d2
2020-09-15 10:42:08 -07:00
Xiang Gao
6bc77f4d35 Use amax/maximum instead of max in optimizers (#43797)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43797

Reviewed By: malfet

Differential Revision: D23406641

Pulled By: mruberry

fbshipit-source-id: 0cd075124aa6533b21375fe2c90c44a5d05ad6e6
2020-09-15 10:39:40 -07:00
Muthu Arivoli
9c364da9b9 Fix doc builds for bool kwargs (#44686)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43669

The bool will still link to https://docs.python.org/3/library/functions.html#bool.
Tested using bmm:
![image](https://user-images.githubusercontent.com/16063114/93156438-2ad11080-f6d6-11ea-9b81-96e02ee68d90.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44686

Reviewed By: ngimel

Differential Revision: D23703823

Pulled By: mruberry

fbshipit-source-id: 7286afad084f5ab24a1254ad84e5d01907781c85
2020-09-15 10:34:58 -07:00
Yanli Zhao
f5d231d593 move rebuild buckets from end of first iteration to beginning of second iteration (#44326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44326

Part of relanding PR #41954, this refactoring is to move rebuild_buckets call from end of first iteration to beginning of second iteration
ghstack-source-id: 112011490

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D23583017

fbshipit-source-id: ef67f79437a820d9b5699b651803622418499a83
2020-09-15 09:51:33 -07:00
Vasiliy Kuznetsov
5f692a67db qat conv_fused.py: one more patch for forward compatibility (#44671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44671

See comments inline - the FC between
https://github.com/pytorch/pytorch/pull/38478 and
https://github.com/pytorch/pytorch/pull/38820 was broken,
patching it.

Test Plan: Verified with customer hitting the issue that this fixes their issue.

Reviewed By: jerryzh168

Differential Revision: D23694029

fbshipit-source-id: a5e1733334e22305a111df750b190776889705d0
2020-09-15 09:43:29 -07:00
Vitaliy Chiley
c71ce10cfc add dilation to transposeconv's _output_padding method (#43793)
Summary:
This PR adds dilation to _ConvTransposeNd._output_padding method and tests using a bunch of different sized inputs.

Fixes https://github.com/pytorch/pytorch/issues/14272

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43793

Reviewed By: zou3519

Differential Revision: D23493313

Pulled By: ezyang

fbshipit-source-id: bca605c428cbf3a97d3d24316d8d7fde4bddb307
2020-09-14 21:28:27 -07:00
Meghan Lele
e7d782e724 [JIT] Add property support for ScriptModules (#42390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42390

**Summary**
This commit extends support for properties to include
ScriptModules.

**Test Plan**
This commit adds a unit test that has a ScriptModule with
a user-defined property.

`python test/test_jit_py3.py TestScriptPy3.test_module_properties`
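
For illustration, a rough sketch of the kind of module this enables (not taken from the PR; the exact set of supported property forms is defined by the PR and its tests):

```python
import torch

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.x = 3

    @property
    def doubled(self) -> int:
        # A user-defined property that the compiler can now handle on ScriptModules.
        return 2 * self.x

    def forward(self, y: int) -> int:
        return y + self.doubled

scripted = torch.jit.script(MyModule())
print(scripted(1))  # 7
```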

Test Plan: Imported from OSS

Reviewed By: eellison, mannatsingh

Differential Revision: D22880298

Pulled By: SplitInfinity

fbshipit-source-id: 74f6cb80f716084339e2151ca25092b6341a1560
2020-09-14 18:49:21 -07:00
Guilherme Leobas
e107ef5ca2 Add type annotations for torch.nn.utils.* (#43080)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43013

Redo of gh-42954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43080

Reviewed By: albanD

Differential Revision: D23681334

Pulled By: malfet

fbshipit-source-id: 20ec78aa3bfecb7acffc12eb89d3ad833024394c
2020-09-14 17:52:37 -07:00
Elias Ellison
551494b01d [JIT] Fix torch.tensor for empty multidimensional-typed lists (#44652)
Summary:
We were hitting an assert error when you passed in an empty `List[List[int]]` - this fixes that error by not recursing into 0-element tensors.
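
For illustration, a minimal sketch of the case that used to hit the assert (names are illustrative):

```python
import torch
from typing import List

@torch.jit.script
def make_empty() -> torch.Tensor:
    # An empty multidimensional-typed list previously triggered an assert in torch.tensor.
    x = torch.jit.annotate(List[List[int]], [])
    return torch.tensor(x)

print(make_empty().shape)
```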

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44652

Reviewed By: ZolotukhinM

Differential Revision: D23688247

Pulled By: eellison

fbshipit-source-id: d48ea24893044fae96bc39f76c0f1f9726eaf4c7
2020-09-14 17:28:23 -07:00
Mike Ruberry
686e281bcf Updates div to perform true division (#42907)
Summary:
This PR:

- updates div to perform true division
- makes torch.true_divide an alias of torch.div

This follows on work in previous PyTorch releases that first deprecated div performing "integer" or "floor" division, then prevented it by throwing a runtime error.
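
A quick illustration of the new behavior (values chosen for clarity, not taken from the PR):

```python
import torch

a = torch.tensor([5, 3])
b = torch.tensor([2, 2])

# div now performs true division, even for integer inputs.
print(torch.div(a, b))           # tensor([2.5000, 1.5000])
# true_divide is an alias of div, so the results match.
print(torch.true_divide(a, b))   # tensor([2.5000, 1.5000])
```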

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42907

Reviewed By: ngimel

Differential Revision: D23622114

Pulled By: mruberry

fbshipit-source-id: 414c7e3c1a662a6c3c731ad99cc942507d843927
2020-09-14 15:50:38 -07:00
Jerry Zhang
e594c30bc2 [quant][graphmode][fx] Support fp16 dynamic quantization for linear (#44582)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44582

Test Plan:
test_quantize_fx.py

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23665974

fbshipit-source-id: 19ba6c61a9c77ef570b00614016506e9a2729f7c
2020-09-14 15:43:08 -07:00
BowenBao
43406e218a [ONNX] Update ONNX shape inference (#43929)
Summary:
* Support sequence type (de)serialization, enables onnx shape inference on sequence nodes.
* Fix shape inference with block input/output: e.g. Loop and If nodes.
* Fix bugs in symbolic discovered by coverage of onnx shape inference.
* Improve debuggability: added more jit logs. For simplicity, the default log level, when jit log is enabled, will not dump ir graphs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43929

Reviewed By: albanD

Differential Revision: D23674604

Pulled By: bzinodev

fbshipit-source-id: ab6aacb16d0e3b9a4708845bce27c6d65e567ba7
2020-09-14 15:36:19 -07:00
Ksenija Stanojevic
f7cfbac89b [ONNX] Update len symbolic (#43824)
Summary:
Update len symbolic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43824

Reviewed By: izdeby

Differential Revision: D23575765

Pulled By: bzinodev

fbshipit-source-id: 0e5c8c8d4a5297f65e2dc43168993350f784c776
2020-09-14 15:00:44 -07:00
shubhambhokare1
da11d932bc [ONNX] Update arange op to support out argument (#43777)
Summary:
Update arange op to support out argument

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43777

Reviewed By: albanD

Differential Revision: D23674583

Pulled By: bzinodev

fbshipit-source-id: 6fb65e048c6b1a551569d4d2a33223522d2a960c
2020-09-14 14:56:17 -07:00
neginraoof
62ebad4ff9 [ONNX] Export new_empty and new_zeros (#43506)
Summary:
Adding symbolic to export new_empty and new_zeros

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43506

Reviewed By: houseroad

Differential Revision: D23674574

Pulled By: bzinodev

fbshipit-source-id: ecfcdbd4845fd3a3c6618a060129fbeee4df5dd7
2020-09-14 14:48:34 -07:00
Zafar
742654d1b6 [quant] ConvTranspose1d / ConvTranspose2d (#40371)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40371

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22158981

Pulled By: z-a-f

fbshipit-source-id: defbf6fbe730a58d5b155dcb2460dd969797215c
2020-09-14 14:25:06 -07:00
Alex Suhan
a188dbdf3f Check for index-rank consistency in FunctionInliner (#44561)
Summary:
When caller / callee pairs are inserted into the mapping, verify that
the arity of the buffer access is consistent with its declared rank.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44561

Test Plan: CI, test_tensorexpr --gtest_filter=TensorExprTest.DetectInlineRankMismatch

Reviewed By: albanD

Differential Revision: D23684342

Pulled By: asuhan

fbshipit-source-id: dd3a0cdd4c2492853fa68381468e0ec037136cab
2020-09-14 14:07:22 -07:00
Rong Rong
b5dd6e3e61 split torch.testing._internal.* and add type checking for torch.testing._internal.common_cuda (#44575)
Summary:
First step to fix https://github.com/pytorch/pytorch/issues/42969.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44575

Reviewed By: malfet

Differential Revision: D23668740

Pulled By: walterddr

fbshipit-source-id: eeb3650b1780aaa5727b525b4e6182e1bc47a83f
2020-09-14 14:04:02 -07:00
mariosasko
cfba33bde3 Fix the ELU formula in the docs (#43764)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43389.

This PR replaces the old ELU formula from the docs that yields wrong results for negative alphas with the new one that fixes the issue and relies on the cases notation which makes the formula more straightforward.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43764

Reviewed By: ailzhang

Differential Revision: D23425532

Pulled By: albanD

fbshipit-source-id: d0931996e5667897d926ba4fc7a8cc66e8a66837
2020-09-14 14:01:56 -07:00
Zafar
9d4943daaf [quant] conv_transpose1d / conv_transpose2d (#40370)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40370

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22158979

Pulled By: z-a-f

fbshipit-source-id: f5cb812c9953efa7608f06cf0188de447f73f358
2020-09-14 13:45:28 -07:00
Rong Rong
ecac8294a6 enable type checking for torch._classes (#44576)
Summary:
Fix https://github.com/pytorch/pytorch/issues/42980

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44576

Reviewed By: malfet

Differential Revision: D23668741

Pulled By: walterddr

fbshipit-source-id: 4201ea3187a40051ebff53d28c8e571ea1a61126
2020-09-14 13:26:46 -07:00
Raghavan Raman
ad7a2eb1c9 Simplify nested Min and Max patterns. (#44142)
Summary:
Improve simplification of nested Min and Max patterns.

Specifically, handles the following pattern simplications:
  * `Max(A, Max(A, Const)) => Max(A, Const)`
  * `Max(Min(A, B), Min(A, C)) => Min(A, Max(B, C))`
  * `Max(Const, Max(A, OtherConst)) => Max(A, Max(Const, OtherConst))`
     - This case can have an arbitrarily long chain of Max ops. For example: `Max(5, Max(x, Max(y, Max(z, 8)))) => Max(Max(Max(x, 8), y), z)`

Similarly, for the case of Min as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44142

Reviewed By: albanD

Differential Revision: D23644486

Pulled By: navahgar

fbshipit-source-id: 42bd241e6c2af820566744c8494e5dee172107f4
2020-09-14 13:24:46 -07:00
Heitor Schueroff de Souza
199435af90 Update median doc to note return value of even-sized input (#44562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44562

Add a note that torch.median returns the smaller of the two middle elements for even-sized input, and refer the user to torch.quantile for the mean of the middle values.

fixes https://github.com/pytorch/pytorch/issues/39520
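
For illustration, a small example of the documented behavior (values chosen for clarity):

```python
import torch

t = torch.tensor([1., 2., 3., 4.])
print(torch.median(t))         # tensor(2.) -- the smaller of the two middle values
print(torch.quantile(t, 0.5))  # tensor(2.5000) -- the mean of the two middle values
```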

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23657208

Pulled By: heitorschueroff

fbshipit-source-id: 2747aa652d1e7f10229d9299b089295aeae092c2
2020-09-14 13:18:33 -07:00
Bram Wasti
a475613d1d [static runtime] Swap to out-variant compatible nodes (#44127)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44127

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D23604306

Pulled By: bwasti

fbshipit-source-id: 18ccfb9b466b822e28130be3d5c4fae36c76820b
2020-09-14 12:38:25 -07:00
Elias Ellison
856510c96d [JIT] Dont optimize shape info in batch_mm (#44565)
Summary:
We run the remove-profile-nodes-and-specialize-types pass before batch_mm, so we cannot run peepholes on the type information of tensors, since these properties have not been guarded and are therefore not guaranteed to be correct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44565

Reviewed By: albanD

Differential Revision: D23661538

Pulled By: eellison

fbshipit-source-id: 0dd23a65714f047f49b4db4ec582b21870925fe1
2020-09-14 12:34:20 -07:00
Yi Wang
ace81b6794 Remove an extra empty line in the warning comments. (#44622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44622

Remove an extra empty line in the warning comments.

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D23674070

fbshipit-source-id: 4ee570590c66a72fb808e9ee034fb773b833efcd
2020-09-14 11:15:35 -07:00
Natalia Gimelshein
95a69a7d09 adds list_gpu_processes function (#44616)
Summary:
Per the title, this makes it easier to track the creation of stray contexts:
```
python -c "import torch; a=torch.randn(1, device='cuda'); print(torch.cuda.memory.list_gpu_processes(0)); print(torch.cuda.memory.list_gpu_processes(1))"
GPU:0
process      79749 uses      601.000 MB GPU memory
GPU:1
no processes are running
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44616

Reviewed By: mruberry

Differential Revision: D23675739

Pulled By: ngimel

fbshipit-source-id: ffa14cad9d7144e883de13b1c2c6817bd432f53a
2020-09-14 09:54:32 -07:00
Thomas Viehmann
bd257a17a1 Add HIP/ROCm version to collect_env.py (#44106)
Summary:
This adds HIP version info to the `collect_env.py` output.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44106

Reviewed By: VitalyFedyunin

Differential Revision: D23652341

Pulled By: zou3519

fbshipit-source-id: a1f5bce8da7ad27a1277a95885934293d0fd43c5
2020-09-14 09:19:18 -07:00
Jeremy Lilley
7040a070e3 [torch] Minor: Avoid ostreamstring in Operator's canonicalSchemaString() (#44442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44442

I noticed lock contention on startup as lookupByLiteral() was
calling registerPendingOperators() - some calls were holding the
lock for 10+ ms while operators were being registered.

canonicalSchemaString() was using ostringstream, which isn't typically
particularly fast (partly because of the C++ spec's locale requirements).
If we replace it with regular C++ string appends, it's somewhat faster
(which isn't hard when comparing with stringstream; albeit a bit
more codegen).

Over the first minute or so, this cuts out 1.4 seconds under the
OperatorRegistry lock (as part of registerPendingOperators) in the
first couple minutes of run time (mostly front-loaded) when running
sync sgd.

As an example, before:
   registerPendingOperators 12688 usec for 2449 operators
After:
   registerPendingOperators 6853 usec for 2449 operators
ghstack-source-id: 111862971

Test Plan: buck test mode/dev-nosan caffe2/test/cpp/...

Reviewed By: ailzhang

Differential Revision: D23614515

fbshipit-source-id: e712f9dac5bca0b1876e11fb8f0850402f03873a
2020-09-14 08:24:16 -07:00
kshitij12345
c68a99bd61 [numpy] Add torch.exp2 (#44184)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

TODO
* [x] Add tests
* [x] Add docs
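
For illustration, a small usage example (values chosen for clarity, not taken from the PR):

```python
import torch

x = torch.tensor([0., 1., 3., 10.])
print(torch.exp2(x))  # tensor([1., 2., 8., 1024.]), i.e. 2 ** x elementwise
```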

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44184

Reviewed By: ngimel

Differential Revision: D23674237

Pulled By: mruberry

fbshipit-source-id: 7f4fb1900fad3051cd7fc9d3d7f6d985c5fb093c
2020-09-14 04:05:37 -07:00
Victor Bittorf
68a5c361ae Adding Adapative Autorange to benchmark utils. (#44607)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44219

Rebasing https://github.com/pytorch/pytorch/pull/44288 and fixing the git history.

This allows users to benchmark code without having to specify how long to run the benchmark. It runs the benchmark until the variance (IQR / median) is low enough that we can be confident in the measurement.
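
For illustration, a rough sketch of how this might be used, assuming the new functionality is exposed as `Timer.adaptive_autorange()` in `torch.utils.benchmark` (the statement being timed is illustrative):

```python
import torch
from torch.utils import benchmark

t = benchmark.Timer(
    stmt="x + y",
    setup="x = torch.ones(1024); y = torch.ones(1024)",
)
# Keeps taking measurements until IQR / median is small enough to trust,
# instead of requiring the caller to pick a fixed number of runs.
m = t.adaptive_autorange()
print(m)
```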

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44607

Test Plan: There are unit tests, and we manually tested using Examples posted in git.

Reviewed By: robieta

Differential Revision: D23671208

Pulled By: bitfort

fbshipit-source-id: d63184290b88b26fb81c2452e1ae701c7d513d12
2020-09-13 20:55:40 -07:00
Peter Bell
8daaa3bc7e Fix latex error in heaviside docs (#44481)
Summary:
This fixes a `katex` error I was getting trying to build the docs:
```
ParseError: KaTeX parse error: Undefined control sequence: \0 at position 55: …gin{cases}
```

This failure was introduced in https://github.com/pytorch/pytorch/issues/42523.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44481

Reviewed By: colesbury

Differential Revision: D23627700

Pulled By: mruberry

fbshipit-source-id: 9cc09c687a7d9349da79a0ac87d6c962c9cfbe2d
2020-09-13 16:42:19 -07:00
Martin Yuan
7862827269 [pytorch] Add variadic run_method for lite intepreter (#44337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44337

Add a new run_method to mobile Module which is variadic (takes any number of arguments) to match full jit.
ghstack-source-id: 111909068

Test Plan: Added new unit test to test_jit test suite

Reviewed By: linbinyu, ann-ss

Differential Revision: D23585763

fbshipit-source-id: 007cf852290f03615b78c35aa6f7a21287ccff9e
2020-09-13 13:26:30 -07:00
Mikhail Zolotukhin
bcf97b8986 [JIT] Cleanup some places where we log graphs in executors. (#44588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44588

1) SOURCE_DUMP crashes when invoked on a backward graph since
   `prim::GradOf` nodes can't be printed as sources (they don't have
   schema).
2) Dumping the graph each time we execute an optimized plan produces lots of
   output in tests where we run the graph multiple times (e.g.
   benchmarks). Outputting that at the lowest level of verbosity seems
   like overkill.
3) Duplicated log statement is removed.

Differential Revision: D23666812

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: b9a30e34fd39c85f3e13c3f1e3594e157e1c130f
2020-09-13 11:31:02 -07:00
Mikhail Zolotukhin
82da6b3702 [JIT] Fix jit-log verbosity selection logic. (#44587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44587

Currently it's skewed by one.

The following test demonstrates it:
```
$ cat test.py

import torch
def foo(a,b):
    return a*a*b
torch._C._jit_set_profiling_executor(True)
torch._C._jit_set_profiling_mode(True)
torch._C._jit_override_can_fuse_on_cpu(True)
torch._C._jit_set_texpr_fuser_enabled(True)
f = torch.jit.script(foo)
for _ in range(10):
    f(torch.rand(10), torch.rand(10))

$ cat test_logging_levels.sh

PYTORCH_JIT_LOG_LEVEL="tensorexpr_fuser"    python test.py 2>&1 | grep DUMP   >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL="tensorexpr_fuser"    python test.py 2>&1 | grep UPDATE >& /dev/null && echo FAIL || echo OK
PYTORCH_JIT_LOG_LEVEL="tensorexpr_fuser"    python test.py 2>&1 | grep DEBUG  >& /dev/null && echo FAIL || echo OK

PYTORCH_JIT_LOG_LEVEL=">tensorexpr_fuser"   python test.py 2>&1 | grep DUMP   >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL=">tensorexpr_fuser"   python test.py 2>&1 | grep UPDATE >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL=">tensorexpr_fuser"   python test.py 2>&1 | grep DEBUG  >& /dev/null && echo FAIL || echo OK

PYTORCH_JIT_LOG_LEVEL=">>tensorexpr_fuser"  python test.py 2>&1 | grep DUMP   >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL=">>tensorexpr_fuser"  python test.py 2>&1 | grep UPDATE >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL=">>tensorexpr_fuser"  python test.py 2>&1 | grep DEBUG  >& /dev/null && echo OK || echo FAIL
```

Before this change:
```
OK
FAIL
OK
OK
OK
FAIL
OK
OK
OK
```

With this change everything passes.

Differential Revision: D23666813

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: 4adaa5a3d06deadf54eae014a0d76588cdc5e20a
2020-09-13 11:29:25 -07:00
Bert Maher
6d4a605ce9 Fix bug simplifying if-then-else when it can be removed (#44462)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44462

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23671157

Pulled By: bertmaher

fbshipit-source-id: b9b92ad0de1a7bd9bc1fcac390b542d885d0ca58
2020-09-13 10:29:28 -07:00
Mike Ruberry
7e91728f68 Deprecates calling linspace and logspace without setting steps explicitly (#43860)
Summary:
**BC-breaking note**

This change is BC-breaking for C++ callers of linspace and logspace if they were providing a steps argument that could not be converted to an optional.

**PR note**

This PR deprecates calling linspace and logspace without setting steps explicitly by:

- updating the documentation to warn that not setting steps is deprecated
- warning (once) when linspace and logspace are called without steps being specified

A test for this behavior is added to test_tensor_creation_ops. The warning only appears once per process, however, so the test would pass even if no warning were thrown. Ideally there would be a mechanism to force all warnings, including those from TORCH_WARN_ONCE, to trigger.
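
For illustration, calls that explicitly pass steps and therefore avoid the new warning (values chosen for clarity):

```python
import torch

print(torch.linspace(0, 1, steps=5))  # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
print(torch.logspace(0, 3, steps=4))  # tensor([   1.,   10.,  100., 1000.])
```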

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43860

Reviewed By: izdeby

Differential Revision: D23498980

Pulled By: mruberry

fbshipit-source-id: c48d7a58896714d184cb6ff2a48e964243fafc90
2020-09-13 06:09:19 -07:00
Yi Wang
82b4477948 Pass the input tensor vector by const reference. (#44340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44340

Changed the constructor of GradBucket to pass the input by const
reference, avoiding unnecessary explicit move semantics. Since
the declaration and definition were previously separated, passing the input
tensor vector by value looked quite bizarre.

Test Plan: buck test caffe2/torch/lib/c10d:ProcessGroupGlooTest

Reviewed By: pritamdamania87

Differential Revision: D23569939

fbshipit-source-id: db761d42e76bf938089a0b38e98e76a05bcf4162
2020-09-11 18:03:56 -07:00
Yi Wang
ab5fee2784 Move the inline implementations of GradBucket class to the header. (#44339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44339

Moved the inline implementations of GradBucket class to the header for
succinctness and readability. This coding style is also consistent with
reducer.h under the same directory.

Test Plan: buck test caffe2/torch/lib/c10d:ProcessGroupGlooTest

Reviewed By: pritamdamania87

Differential Revision: D23569701

fbshipit-source-id: 237d9e2c5f63a6bcac829d0fcb4a5ba3bede75e5
2020-09-11 18:01:37 -07:00
Elias Ellison
1f0dcf39fc [JIT] dont optimize device dtype on inline (#43363)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/36404

Adding prim::device and prim::dtype to the list of skipped peepholes when we run inlining. In the long term, another fix may be to not encode shape / dtype info on the traced graph, because it is not guaranteed to be correct. This is currently blocked by ONNX.

Partial fix for https://github.com/pytorch/pytorch/issues/43134

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43363

Reviewed By: glaringlee

Differential Revision: D23383987

Pulled By: eellison

fbshipit-source-id: 2e9c5160d39d690046bd9904be979d58af8d3a20
2020-09-11 17:29:54 -07:00
Mikhail Zolotukhin
d729e2965e [TensorExpr] Do not inline autodiff graphs if they contain prim::TypeCheck nodes. (#44564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44564

Before this change we sometimes inlined autodiff subgraphs containing
fusion groups. This happened because we didn't look for 'unsupported'
nodes recursively (maybe we should), and the fusion groups were inside
if-nodes.

The problem was detected by bertmaher in the 'LearningToPaint' benchmark
investigation, where this bug caused us to constantly hit the
fallback paths of the graph.

Test Plan: Imported from OSS

Reviewed By: bwasti

Differential Revision: D23657049

Pulled By: ZolotukhinM

fbshipit-source-id: 7c853424f6dce4b5c344d6cd9c467ee04a8f167e
2020-09-11 17:28:53 -07:00
Nick Gibson
64b4307d47 [NNC] Cuda Codegen - mask loops bound to block/thread dimensions (#44325)
Summary:
Fix an issue where loops of different sizes are bound to the same Cuda dimension / metavar.

More info and tests coming soon...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44325

Reviewed By: colesbury

Differential Revision: D23628859

Pulled By: nickgg

fbshipit-source-id: 3621850a4cc38a790b62ad168d32e7a0e2462fad
2020-09-11 16:48:16 -07:00
Nikita Shulga
2ae74c0632 Compile less legacy code when BUILD_CAFFE2 is set to False (take 2) (#44453)
Summary:
2nd attempt to land https://github.com/pytorch/pytorch/pull/44079

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44453

Reviewed By: walterddr, seemethere

Differential Revision: D23619528

Pulled By: malfet

fbshipit-source-id: c7c206ebd327dcf3994789bd47008b05ff862fe7
2020-09-11 16:27:47 -07:00
Jerry Zhang
b6f0ea0c71 [quant][graphmode][fx][fix] Remove qconfig in convert (#44526)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44526

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23641960

fbshipit-source-id: 546da1c16694d1e1dfb72629085acaae2165e759
2020-09-11 15:51:47 -07:00
Jerry Zhang
a82ea6a91f [quant][graphmode][fx][fix] Support None qconfig in convert (#44524)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44524

None qconfig was not handled previously.
closes: https://github.com/pytorch/pytorch/issues/44438

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23640269

fbshipit-source-id: 8bfa88c8c78d4530338d9d7fa9669876c386d91f
2020-09-11 15:22:25 -07:00
Zafar
1fb5883072 removing conv filters from conv pattern matching (#44512)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44512

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23637409

Pulled By: z-a-f

fbshipit-source-id: ad5be0fa6accfbcceaae9171bf529772d87b4098
2020-09-11 15:16:29 -07:00
Wanchao Liang
ab6126b50e [rpc][jit] support remote call in TorchScript (#43046)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43046

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23621108

Pulled By: wanchaol

fbshipit-source-id: e8152c6cdd3831f32d72d46ac86ce22f3f13c651
2020-09-11 14:59:51 -07:00
Wanchao Liang
3e5df5f216 [rpc][jit] support rpc_sync in TorchScript (#43043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43043

This adds support for rpc_sync in TorchScript in a way similar to
rpc_async.
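
For illustration only, a rough sketch of the kind of call this enables from TorchScript; the worker name and function are assumptions, and the exact scripting requirements are defined by the PR's tests:

```python
import torch
import torch.distributed.rpc as rpc

@torch.jit.script
def add_remote(to: str, x: torch.Tensor) -> torch.Tensor:
    # rpc_sync can now be invoked from a scripted function, mirroring rpc_async.
    return rpc.rpc_sync(to, torch.add, args=(x, 1))
```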

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23252039

Pulled By: wanchaol

fbshipit-source-id: 8a05329cb8a24079b2863178b73087d47273914c
2020-09-11 14:59:47 -07:00
Wanchao Liang
8bec7cfa91 [rpc] rename some functions (#43042)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43042

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23228894

Pulled By: wanchaol

fbshipit-source-id: 3702b7826ecb455073fabb9dc5dca804c0e092b2
2020-09-11 14:58:39 -07:00
Vasiliy Kuznetsov
70dfeb44bd MinMax based observers: respect device affinity for state_dict (#44537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44537

Originally, the `min_val`, `max_val`, `min_vals`, `max_vals`
attributes of observers were Tensors but not buffers.  They had custom
state_dict save/load code to ensure their state was saved.

At some point, these attributes became buffers, and the custom
save/load code remained. This introduced a subtle bug:
* create model A, move it to a device (cpu/cuda) and save its state_dict
* create model B, load its state dict.
* `min_val|min_vals|max_val|max_vals` would always be loaded to model A's device, even if the rest of model B was on a different device
* the above is inconsistent with how save/load on different devices is expected to work (see https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-across-devices)

In practice, the case people would sometimes hit is:
* model A is on CPU, state dict is saved
* model B is created and moved to GPU, state_dict from model A is loaded
* assertions throw when operations are attempted across different devices

This PR fixes the behavior by removing the custom save/load where
possible and letting the default `nn.Module` save/load code handle
device assignment.  We special-case `PerChannelMinMaxObserver` and its
children to allow for loading buffers of different size, which is
normal.
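
For illustration, a sketch of the round-trip this fixes (requires a CUDA device; not taken from the PR's tests):

```python
import torch
from torch.quantization import MinMaxObserver

# Observer populated on CPU, state saved.
obs_a = MinMaxObserver()
obs_a(torch.randn(16))
state = obs_a.state_dict()

# Observer living on GPU loads that state.
obs_b = MinMaxObserver().cuda()
obs_b.load_state_dict(state)

# With this fix, min_val / max_val stay on obs_b's device instead of
# being pulled back to obs_a's device.
print(obs_b.min_val.device, obs_b.max_val.device)
```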

There are some followups to also enable this for HistogramObserver
and FakeQuantize, which can be done in separate PRs due to higher
complexity.

Test Plan:
```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23644493

fbshipit-source-id: 0dbb6aa309ad569a91a663b9ee7e44644080032e
2020-09-11 14:48:56 -07:00
Gregory Chanan
192c4111a3 Simplify target handling in nn gradcheck. (#44507)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44507

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23635799

Pulled By: gchanan

fbshipit-source-id: 75090d6a48771e5c92e737a0829fbfa949f7c8a7
2020-09-11 13:25:59 -07:00
Gregory Chanan
5579b53a7f Fix SmoothL1Loss when target.requires_grad is True. (#44486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44486

SmoothL1Loss had a completely different (and incorrect, see #43228) path when target.requires_grad was True.

This PR does the following:

1) adds derivative support for target via the normal derivatives.yaml route
2) kill the different (and incorrect) path for when target.requires_grad was True
3) modify the SmoothL1Loss CriterionTests to verify that the target derivative is checked.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23630699

Pulled By: gchanan

fbshipit-source-id: 0f94d1a928002122d6b6875182867618e713a917
2020-09-11 12:13:36 -07:00
Cheng Chang
b7ef4eec46 [NNC] Add loop slicing transforms (#43854)
Summary:
Add new transforms `sliceHead` and `sliceTail` to `LoopNest`, for example:

Before transformation:
```
for x in 0..10:
  A[x] = x*2
```

After `sliceHead(x, 4)`:

```
for x in 0..4:
  A[x] = x*2
for x in 4..10:
  A[x] = x*2
```

After `sliceTail(x, 1)`:
```
for x in 0..4:
  A[x] = x*2
for x in 4..9:
  A[x] = x*2
for x in 9..10:
  A[x] = x*2
```

`sliceHead(x, 10)` and `sliceTail(x, 10)` are no-ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43854

Test Plan: Tests are added in `test_loopnest.cpp`, the tests cover the basic transformations, and also tests the combination with other transformations such as `splitWithTail`.

Reviewed By: nickgg

Differential Revision: D23417366

Pulled By: cheng-chang

fbshipit-source-id: 06c6348285f2bafb4be3286d1642bfbe1ea499bf
2020-09-11 12:09:12 -07:00
Jerry Zhang
11fb51d093 [quant][graphmode][fx][fix] Support dictionary output (#44508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44508

Bug fix for dictionary output

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23636182

fbshipit-source-id: 0c00cd6b9747fa3f8702d7f7a0d5edb31265f466
2020-09-11 11:29:20 -07:00
Ann Shan
442957d8b6 [pytorch] Remove mobile nonvariadic run_method (#44235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44235

Removes nonvariadic run_method() from mobile Module entirely (to be later replaced by a variadic version). All use cases should have been migrated to use get_method() and Method::operator() in D23436351
ghstack-source-id: 111848220

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D23484577

fbshipit-source-id: 602fcde61e13047a34915b509da048b9550103b1
2020-09-11 10:23:08 -07:00
Ann Shan
a61318a535 [pytorch] Replace mobile run_method with get_method and operator() (#44202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44202

In preparation for changing mobile run_method() to be variadic, this diff:

* Implements get_method() for mobile Module, which is similar to find_method but expects the method to exist.
* Replaces calls to the current nonvariadic implementation of run_method() by calling get_method() and then invoking the operator() overload on Method objects.
ghstack-source-id: 111848222

Test Plan: CI, and all the unit tests which currently contain run_method that are being changed.

Reviewed By: iseeyuan

Differential Revision: D23436351

fbshipit-source-id: 4655ed7182d8b6f111645d69798465879b67a577
2020-09-11 10:23:06 -07:00
Guilherme Leobas
cdf5e2ae86 add typing annotations for a few torch.utils.* modules (#43806)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43431. Depends on [gh-43862](https://github.com/pytorch/pytorch/pull/43862) (EDIT: now merged)

Modules:
- torch.utils.mkldnn
- torch.utils.mobile_optimizer
- torch.utils.bundled_inputs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43806

Reviewed By: gmagogsfm

Differential Revision: D23635151

Pulled By: SplitInfinity

fbshipit-source-id: a85b75a7927dde6cc55bcb361f8ff601ffb0b2a1
2020-09-11 10:20:55 -07:00
David Reiss
7d78a6fcdd Update interpolate to use new upsample overloads (#43025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43025

- Use new overloads that better reflect the arguments to interpolate.
- More uniform interface for upsample ops allows simplifying the Python code.
- Also reorder overloads in native_functions.yaml to give them priority.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37177

ghstack-source-id: 106938111

Test Plan:
test_nn has pretty good coverage.

Relying on CI for ONNX, etc.

Didn't test FC because this change is *not* forward compatible.

To ensure backwards compatibility, I ran this code before this change

```python
def test_func(arg):
    interp = torch.nn.functional.interpolate
    with_size = interp(arg, size=(16,16))
    with_scale = interp(arg, scale_factor=[2.1, 2.2], recompute_scale_factor=False)
    with_compute = interp(arg, scale_factor=[2.1, 2.2])
    return (with_size, with_scale, with_compute)

traced_func = torch.jit.trace(test_func, torch.randn(1,1,1,1))

sample = torch.randn(1, 3, 7, 7)
output = traced_func(sample)

assert not torch.allclose(output[1], output[2])

torch.jit.save(traced_func, "model.pt")
torch.save((sample, output), "data.pt")
```

then this code after this change

```python
model = torch.jit.load("model.pt")
sample, golden = torch.load("data.pt")
result = model(sample)
for r, g in zip(result, golden):
    assert torch.allclose(r, g)
```

Reviewed By: AshkanAliabadi

Differential Revision: D21209991

fbshipit-source-id: 5b2ebb7c3ed76947361fe532d1dbdd6faa3544c8
2020-09-11 09:59:14 -07:00
Gregory Chanan
3de2c0b42f Fix L1Loss when target.requires_grad is True. (#44471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44471

L1Loss had a completely different (and incorrect, see #43228) path when target.requires_grad was True.

This PR does the following:

1) adds derivative support for target via the normal derivatives.yaml route
2) kill the different (and incorrect) path for when target.requires_grad was True
3) modify the L1Loss CriterionTests to verify that the target derivative is checked.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23626008

Pulled By: gchanan

fbshipit-source-id: 2828be16b56b8dabe114962223d71b0e9a85f0f5
2020-09-11 09:51:16 -07:00
Martin Yuan
b73b44f976 [PyTorch Mobile] Move some string ops to register_prim_ops.cpp and make them selective (#44500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44500

Some user models are using those operators. Unblock them while keeping the ops selective.

Test Plan: CI

Reviewed By: linbinyu

Differential Revision: D23634769

fbshipit-source-id: 55841d1b07136b6a27b6a39342f321638dc508cd
2020-09-11 09:24:35 -07:00
Rohan Varma
567c51cce9 In common_distributed, fix TEST_SKIPS multiprocessing manager (#44525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44525

Since `TEST_SKIPS` is a global multiprocessing.manager, this was causing
issues when one test would fail and make the rest of the tests fail during
setup due to networking errors.

See the failed CI job: https://app.circleci.com/pipelines/github/pytorch/pytorch/212491/workflows/0450151d-ca09-4cf6-863d-272de6ed917f/jobs/7389065 for an example, where `test_ddp_backward` failed but then caused the rest of the tests to fail at the line `test_skips.update(TEST_SKIPS)`.

To fix this issue, at the end of every test we revert `TEST_SKIPS` back to a regular dict, and redo the conversion to a `multiprocessing.Manager` in the next test, which prevents these errors.
ghstack-source-id: 111844724

Test Plan: CI

Reviewed By: malfet

Differential Revision: D23641618

fbshipit-source-id: 27ce823968ece9804bb4dda898ffac43ef732b89
2020-09-11 09:16:33 -07:00
Gregory Chanan
d07d25a8c5 Fix MSELoss when target.requires_grad is True. (#44437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44437

MSELoss had a completely different (and incorrect, see https://github.com/pytorch/pytorch/issues/43228) path when target.requires_grad was True.

This PR does the following:
1) adds derivative support for target via the normal derivatives.yaml route
2) kill the different (and incorrect) path for when target.requires_grad was True
3) modify the MSELoss CriterionTests to verify that the target derivative is checked.

TODO:
1) do we still need check_criterion_jacobian when we run grad/gradgrad checks?
2) ensure the Module tests check when target.requires_grad
3) do we actually test when reduction='none' and reduction='mean'?
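
For illustration, a minimal sketch of what the new derivative support makes checkable (shapes are illustrative, not taken from the PR's tests):

```python
import torch
import torch.nn.functional as F

inp = torch.randn(3, 5, dtype=torch.double, requires_grad=True)
target = torch.randn(3, 5, dtype=torch.double, requires_grad=True)

# Gradients w.r.t. target now flow through the regular derivatives.yaml path,
# so numerical/analytical gradients can be compared for both arguments.
print(torch.autograd.gradcheck(F.mse_loss, (inp, target)))
```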

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23612166

Pulled By: gchanan

fbshipit-source-id: 4f74d38d8a81063c74e002e07fbb7837b2172a10
2020-09-11 08:51:28 -07:00
Shen Li
a9754fb860 Use TP Tensor.metadata to carry device info (#44396)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44396

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D23602576

Pulled By: mrshenli

fbshipit-source-id: c639789979b2b71fc165efbcf70f37b4c39469df
2020-09-11 08:33:22 -07:00
Shen Li
f44de7cdc3 Add missing rpc.shutdown() (#44417)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44417

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D23626208

Pulled By: mrshenli

fbshipit-source-id: 4ff8cad0e1193f99518804c21c9dd26ae718f4eb
2020-09-11 08:32:15 -07:00
lixinyu
77cc7d1ecd C++ APIs Transformer NN Module Top Layer (#44333)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44333

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23584010

Pulled By: glaringlee

fbshipit-source-id: 990026e3f1b5ae276776e344ea981386cb7528fe
2020-09-11 08:25:27 -07:00
Tongzhou Wang
09892de815 Clarify track_running_stats docs; Make SyncBatchNorm track_running_stats behavior consistent (#44445)
Summary:
context: https://github.com/pytorch/pytorch/pull/38084

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44445

Reviewed By: colesbury

Differential Revision: D23634216

Pulled By: mrshenli

fbshipit-source-id: d1242c694dec0e7794651f8031327625eb9989ee
2020-09-11 08:20:34 -07:00
Nick Gibson
30fccc53a9 [NNC] Don't attempt to refactor conditional scalars (#44223)
Summary:
Fixes a bug in the NNC registerizer for Cuda where it would hoist reads out of a conditional context when trying to cache them. As a quick fix, prevent scalar replacement if a usage is within a condition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44223

Reviewed By: gchanan

Differential Revision: D23551247

Pulled By: nickgg

fbshipit-source-id: 17a7bf2be4c8c3dd8a9ab7997dce9aea200c3685
2020-09-11 04:22:16 -07:00
Zafar
c967e7724e [quant] conv_transpose1d_prepack / conv_transpose1d_unpack (#40360)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40360

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22158982

Pulled By: z-a-f

fbshipit-source-id: 844d02806554aaa68b521283703e630cc544d419
2020-09-11 04:12:28 -07:00
Elias Ellison
8b8986662f [JIT] Remove profiling nodes in autodiff forward graph (#44420)
Summary:
Previously we were not removing profiling nodes in graphs that required grad and contained diff graphs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44420

Reviewed By: bertmaher

Differential Revision: D23607482

Pulled By: eellison

fbshipit-source-id: af095f3ed8bb3c5d09610f38cc7d1481cbbd2613
2020-09-11 02:59:39 -07:00
Mikhail Zolotukhin
c6febc6480 [JIT] Add a python hook for a function to interpret JIT graphs. (#44493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44493

This function allows one to execute a graph exactly as it is, without going
through a graph executor, which would run passes on the graph before
interpreting it. I found this feature extremely helpful when I worked on
a stress-testing script to shake out bugs from the TE fuser: I needed to
run a very specific set of passes on a graph and nothing else, and
then execute exactly that graph.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23632505

Pulled By: ZolotukhinM

fbshipit-source-id: ea81fc838933743e2057312d3156b77284d832ef
2020-09-11 02:55:26 -07:00
Pritam Damania
51ed31269e Replace FutureMessage with c10::ivalue::Future in DistEngine. (#44239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44239

As part of https://github.com/pytorch/pytorch/issues/41574, use
c10::ivalue::Future everywhere in DistEngine.
ghstack-source-id: 111645070

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D23553507

fbshipit-source-id: 1b51ba13d1ebfa6c5c70b12028e9e96ce8ba51ff
2020-09-11 01:03:42 -07:00
Jerry Zhang
0c58a017bd [quant][eagermode][refactor] Add set/get method for quantization and fusion mappings (#43990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43990

Allow user to register custom quantization and fusion patterns

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23485344

fbshipit-source-id: 4f0174ee6d8000d83de0f73cb370e9a1941d54aa
2020-09-10 21:29:39 -07:00
Omkar Salpekar
f7278473d3 [NCCL] Fix NCCL_BLOCKING_WAIT functionality with Async Error Handling (#44411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44411

This basically aborts errored NCCL communicators only if either blocking
wait or async error handling is enabled. Otherwise we may abort NCCL
communicators where neither is enabled, and this may result in subsequent GPU
operations using corrupted data.
ghstack-source-id: 111839264

Test Plan: Succesful Flow run: f217591683

Reviewed By: jiayisuse

Differential Revision: D23605382

fbshipit-source-id: 6c16f9626362be3b0ce2feaf0979b2dff97ce61b
2020-09-10 20:57:55 -07:00
Richard Zou
69f6d94caa Register diag_backward, diagonal_backward, infinitely...gelu_backward as operators (#44422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44422

See #44052 for context.

Test Plan:
- `pytest test/test_autograd.py -v`
- `pytest test/test_nn.py -v`

Reviewed By: mrshenli

Differential Revision: D23607691

Pulled By: zou3519

fbshipit-source-id: 09fbcd66b877af4fa85fd9b2f851ed3912ce84d6
2020-09-10 18:43:18 -07:00
Richard Zou
7ff7e6cfc8 Register cummaxmin_backward, cumprod_backward as operators (#44410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44410

See #44052 for context. One of the cumprod_backward overloads was unused
so I just deleted it.

Test Plan: - `pytest test/test_autograd.py -v`

Reviewed By: mrshenli

Differential Revision: D23605503

Pulled By: zou3519

fbshipit-source-id: f9c5b595e62d2d6e71f26580ba96df15cc9de4f7
2020-09-10 18:43:15 -07:00
Richard Zou
08b431f54c Add trace_backward, masked_select_backward, and take_backward as ops (#44408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44408

See #44052 for context.

Test Plan: - `pytest test/test_autograd.py -v`

Reviewed By: mrshenli

Differential Revision: D23605504

Pulled By: zou3519

fbshipit-source-id: b9b1646d13caa6e536d08669c29bfc2ad8ff89a3
2020-09-10 18:41:07 -07:00
Rohan Varma
41f62b17e7 Fix DDP join() API in the case of model.no_sync() (#44427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44427

Closes https://github.com/pytorch/pytorch/issues/44425

The DDP join API currently does not work properly with `model.no_sync()`; see https://github.com/pytorch/pytorch/issues/44425 for details. This PR fixes the problem via the approach mentioned in the issue, namely scheduling an allreduce that tells joined ranks whether or not to sync in the backwards pass. Tests are added for skipping gradient synchronization for various `sync_interval`s.
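
For illustration, a rough sketch of the training-loop pattern this fix addresses; `model`, `inputs`, and `sync_interval` are assumptions, and the process group is assumed to be initialized already:

```python
def train(model, inputs, sync_interval):
    # `model` is assumed to be a torch.nn.parallel.DistributedDataParallel instance.
    with model.join():
        for i, inp in enumerate(inputs):
            if i % sync_interval != 0:
                # Gradient synchronization is skipped; the fix schedules an extra
                # allreduce so that already-joined ranks also know not to sync.
                with model.no_sync():
                    model(inp).sum().backward()
            else:
                model(inp).sum().backward()
```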
ghstack-source-id: 111786479

Reviewed By: pritamdamania87

Differential Revision: D23609070

fbshipit-source-id: e8716b7881f8eee95e3e3499283e716bd3d7fe76
2020-09-10 18:31:40 -07:00
Mike Ruberry
c48f511c7e Moves some of TestTorchMathOps to OpInfos (#44277)
Summary:
This PR fixes three OpInfo-related bugs and moves some functions from TestTorchMathOps to be tested using the OpInfo pattern. The bugs are:

- A skip test path in test_ops.py incorrectly formatted its string argument
- Decorating the tests in common_device_type.py was incorrectly always applying decorators to the original test, not the op-specific variant of the test. This could cause the same decorator to be applied multiple times, overriding past applications.
- make_tensor was incorrectly constructing tensors in some cases

The functions moved are:

- asin
- asinh
- sinh
- acosh
- tan
- atan
- atanh
- tanh
- log
- log10
- log1p
- log2

In a follow-up PR more or all of the remaining functions in TestTorchMathOps will be refactored as OpInfo-based tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44277

Reviewed By: mrshenli, ngimel

Differential Revision: D23617361

Pulled By: mruberry

fbshipit-source-id: edb292947769967de9383f6a84eb327f027509e0
2020-09-10 17:31:50 -07:00
Mehdi Mirzazadeh
2e744b1820 Support work.result() to get result tensors for allreduce for Gloo, NCCL backends (#43970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43970

It is resubmition of #43386

Original commit changeset: 27fbeb161706
ghstack-source-id: 111775070

Test Plan:
Added checks to existing unit test and ran it on gpu devserver.
Verified the test that was failing in original diff also passes: https://app.circleci.com/pipelines/github/pytorch/pytorch/210229/workflows/86bde47b-f2da-48e3-a618-566ae2713102/jobs/7253683

Reviewed By: pritamdamania87

Differential Revision: D23455047

fbshipit-source-id: b8dc4a30b95570d68a482c19131674fff2a3bc7c
2020-09-10 17:13:37 -07:00
Ann Shan
1dd3fae3d2 [pytorch] Add logging to mobile Method run (#44234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44234

Changes mobile Method to point to a mobile Module directly instead of the Module ivalue in order to access metadata for logging/debugging, and then adds said logging.
ghstack-source-id: 111775806

Test Plan:
CI/existing unit tests to test BC
Testing fb4a logging:
Built fb4a on D23436351 (because usage of run_method isn't replaced yet in this diff), and then checked the Scuba logs to see that the appropriate ad clicks were logged (one ad for Buzzfeed shopping and another about Netflix from Bustle)

{F328510687}
{F328511201}
[Scuba sample of QPL metrics](https://www.internalfb.com/intern/scuba/query/?dataset=qpl_metrics%2Fpytorch_employee&pool=uber&view=samples_client&drillstate=%7B%22sampleCols%22%3A[%22device_model%22%2C%22instance_id_sampled%22%2C%22method%22%2C%22ios_device_class%22%2C%22points_path%22%2C%22userid_sampled%22%2C%22client_sample_rate%22%2C%22browser_name%22%2C%22ios_device_name%22%2C%22points%22%2C%22is_employee%22%2C%22is_test_user%22%2C%22network_only_queries%22%2C%22annotations%22%2C%22oncall_shortname%22%2C%22environment_tags%22%2C%22revoked_queries%22%2C%22annotations_bool%22%2C%22points_data%22%2C%22annotations_double_array%22%2C%22annotations_string_array%22%2C%22revoked_steps%22%2C%22points_set%22%2C%22device_os_version%22%2C%22ota_version_rollout%22%2C%22steps%22%2C%22vadar_calculation_result%22%2C%22app_name%22%2C%22client_push_phase%22%2C%22vadar%22%2C%22release_channel%22%2C%22interaction_class%22%2C%22exposures%22%2C%22annotations_double%22%2C%22deviceid_sampled%22%2C%22is_logged_in%22%2C%22device_os%22%2C%22time%22%2C%22major_os_ver%22%2C%22annotations_int_array%22%2C%22duration_ns%22%2C%22app_build%22%2C%22bucket_id%22%2C%22cache_and_network_queries%22%2C%22value%22%2C%22vadar_v2%22%2C%22quicklog_event%22%2C%22unixname%22%2C%22vadar_calculation_result_v2%22%2C%22trace_tags%22%2C%22annotations_int%22%2C%22quicklog_module%22%2C%22push_phase%22%2C%22year_class%22%2C%22country%22%2C%22capped_duration%22%2C%22ram_class%22%2C%22weight%22%2C%22carrier%22%2C%22app_id%22%2C%22app_version%22%2C%22react_bundle_version%22%2C%22logging_source%22%2C%22is_unsampled_for_scuba%22%2C%22instrumentation_errors%22%2C%22android_cpu_abi_list%22%2C%22days_after_release%22%2C%22cpu_cores%22%2C%22user_bucket%22%2C%22quicklog_action%22%2C%22server_scuba_sample_rate%22%2C%22points_vector%22%2C%22annotations_bool_array%22%2C%22android_device_class%22%2C%22browser_full_version%22%2C%22major_app_ver%22]%2C%22derivedCols%22%3A[]%2C%22mappedCols%22%3A[]%2C%22enumCols%22%3A[]%2C%22hideEmptyColumns%22%3Afalse%2C%22focused_event%22%3A%22%22%2C%22show_metadata%22%3A%22false%22%2C%22start%22%3A%222020-09-08%2011%3A27%3A00%22%2C%22end%22%3A%22start%20%2B%201%20minute%22%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22samplingRatio%22%3A%221%22%2C%22num_samples%22%3A%22100%22%2C%22aggregateList%22%3A[]%2C%22param_dimensions%22%3A[]%2C%22modifiers%22%3A[]%2C%22order%22%3A%22none%22%2C%22order_desc%22%3Atrue%2C%22filterMode%22%3A%22DEFAULT%22%2C%22constraints%22%3A[[%7B%22column%22%3A%22quicklog_event%22%2C%22op%22%3A%22eq%22%2C%22value%22%3A[%22[%5C%22MOBILE_MODULE_STATS%5C%22]%22]%7D%2C%7B%22column%22%3A%22userid_sampled%22%2C%22op%22%3A%22eq%22%2C%22value%22%3A[%22[%5C%22100013484978975%5C%22]%22]%7D]]%2C%22c_constraints%22%3A[[]]%2C%22b_constraints%22%3A[[]]%2C%22metrik_view_params%22%3A%7B%22should_use_legacy_colors%22%3Afalse%2C%22columns_skip_formatting%22%3A[]%2C%22view%22%3A%22samples_client%22%2C%22width%22%3A%221358%22%2C%22height%22%3A%22912%22%2C%22tableID%22%3A%22qpl_metrics%2Fpytorch_employee%22%2C%22fitToContent%22%3Afalse%2C%22format_tooltip_in_percent%22%3Afalse%2C%22use_y_axis_hints_as_limits%22%3Atrue%2C%22has_dynamic_context_menu%22%3Atrue%2C%22has_context_menu%22%3Afalse%2C%22legend_mode%22%3A%22nongrid%22%2C%22connect_nulls%22%3Atrue%2C%22timezone_offset%22%3A420%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22y_min_hint%22%3A0%2C%22should_render_plugins_menu%22%3Afalse%7D%7D&normalized=1599581160)
[Scuba sample showing ad source; just the bottom two results](https://www.internalfb.com/intern/scuba/query/?dataset=business_integrity_webpage_semantic&pool=uber&drillstate=%7B%22sampleCols%22%3A[%22from_custom_sampling%22%2C%22data_version%22%2C%22scribe_category_type%22%2C%22page_id%22%2C%22name%22%2C%22source_url%22%2C%22time%22%2C%22title_semantic%22%2C%22major_version%22%2C%22server_protocol%22%2C%22custom_sampling_enabled%22%2C%22ad_id%22%2C%22appversion%22%2C%22clienttime%22%2C%22isemployee%22%2C%22title%22%2C%22images%22%2C%22weight%22%2C%22carrier%22%2C%22is_ad%22%2C%22locale%22%2C%22appid%22%2C%22ip_country%22%2C%22iab_models%22]%2C%22derivedCols%22%3A[]%2C%22mappedCols%22%3A[]%2C%22enumCols%22%3A[]%2C%22return_remainder%22%3Afalse%2C%22should_pivot%22%3Afalse%2C%22is_timeseries%22%3Afalse%2C%22hideEmptyColumns%22%3Afalse%2C%22main_dimension%22%3A%22time%22%2C%22start%22%3A%22-5%20minutes%22%2C%22samplingRatio%22%3A%221%22%2C%22compare%22%3A%22none%22%2C%22axes%22%3A%22linked%22%2C%22overlay_types%22%3A[]%2C%22minBucketSamples%22%3A%22%22%2C%22dimensions%22%3A[]%2C%22scale_type%22%3A%22absolute%22%2C%22num_samples%22%3A%22100%22%2C%22metric%22%3A%22avg%22%2C%22fill_missing_buckets%22%3A%22connect%22%2C%22smoothing_bucket%22%3A%221%22%2C%22top%22%3A%227%22%2C%22markers%22%3A%22%22%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22end%22%3A%22now%22%2C%22show_p95_ci%22%3Afalse%2C%22time_bucket%22%3A%22auto%22%2C%22compare_mode%22%3A%22normal%22%2C%22aggregateList%22%3A[]%2C%22param_dimensions%22%3A[]%2C%22modifiers%22%3A[]%2C%22order%22%3A%22none%22%2C%22order_desc%22%3Atrue%2C%22filterMode%22%3A%22DEFAULT%22%2C%22constraints%22%3A[[%7B%22column%22%3A%22major_version%22%2C%22op%22%3A%22eq%22%2C%22value%22%3A[%22[%5C%22288%5C%22]%22]%7D]]%2C%22c_constraints%22%3A[[]]%2C%22b_constraints%22%3A[[]]%2C%22metrik_view_params%22%3A%7B%22should_use_legacy_colors%22%3Afalse%2C%22columns_skip_formatting%22%3A[]%2C%22view%22%3A%22time_view%22%2C%22width%22%3A%221358%22%2C%22height%22%3A%22912%22%2C%22tableID%22%3A%22business_integrity_webpage_semantic%22%2C%22fitToContent%22%3Afalse%2C%22format_tooltip_in_percent%22%3Afalse%2C%22use_y_axis_hints_as_limits%22%3Atrue%2C%22has_dynamic_context_menu%22%3Atrue%2C%22has_context_menu%22%3Afalse%2C%22legend_mode%22%3A%22nongrid%22%2C%22connect_nulls%22%3Atrue%2C%22timezone_offset%22%3A420%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22y_min_hint%22%3A0%2C%22should_render_plugins_menu%22%3Afalse%7D%7D&view=samples_client&normalized=1599587280)

Reviewed By: iseeyuan

Differential Revision: D23548687

fbshipit-source-id: 3e63085663f5fd8de90a4c7dbad0a17947aee973
2020-09-10 15:26:33 -07:00
Pritam Damania
a2a81e1335 Add a CONTRIBUTING.md for the distributed package. (#44224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44224

The purpose of this file is to help developers on PT distributed get
upto speed on the code structure and layout for PT Distributed.
ghstack-source-id: 111644842

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D23548377

fbshipit-source-id: 561d5b8e257642de172def8fdcc1311fae20690b
2020-09-10 14:58:00 -07:00
Nikita Shulga
4bead6438a Enable torch.autograd typechecks (#44451)
Summary:
To help with further typing, move dynamically added native functions from `torch.autograd` to `torch._C._autograd`
Fix invalid error handling pattern in
89ac30afb8/torch/csrc/autograd/init.cpp (L13-L15)
`PyImport_ImportModule` already raises a Python exception, and `nullptr` should be returned to properly propagate the error to the Python runtime.

All native methods/types in `torch/autograd/__init__.py` are added after `torch._C._init_autograd()` has been called
Use f-strings instead of `.format` in test_type_hints.py
Fixes https://github.com/pytorch/pytorch/issues/44450

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44451

Reviewed By: ezyang

Differential Revision: D23618261

Pulled By: malfet

fbshipit-source-id: fa5f739d7cff8410641128b55b810318c5f636ae
2020-09-10 13:37:29 -07:00
Elias Ellison
cc5a1cf616 [JIT] Erase shapes before fallback graph (#44434)
Summary:
Previously, the specialized types were copied over to the fallback function, even though the tensors passed to the fallback were not of those specialized types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44434

Reviewed By: SplitInfinity

Differential Revision: D23611943

Pulled By: eellison

fbshipit-source-id: 2ea88a97529409f6c5c4c1f59a14b623524933de
2020-09-10 12:07:31 -07:00
Yi Wang
38c10b4f30 [NCCL] Fix the initialization of futureNCCLCallbackStreams (#44347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44347

Cloned from Pull Request resolved: https://github.com/pytorch/pytorch/pull/44097, because the original author Sinan has completed the internship and now is unable to submit this diff.

As johnsonpaul mentioned in D23277575 (7d517cf96f), it looks like all processes were allocating memory on GPU 0.

I was able to reproduce it by running `test_ddp_comm_hook_allreduce_with_then_hook_nccl` unit test of `test_c10d.py` and running `nvidia-smi` while test was running. The issue was reproduced as:
```
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0   3132563      C   python                                       777MiB |
|    0   3132564      C   python                                       775MiB |
|    4   3132564      C   python                                       473MiB |
+-----------------------------------------------------------------------------+
```
I realized that as we initialize ProcessGroupNCCL both processes were initially allocating memory on GPU 0.

We later also realized that I forgot the `isHighPriority` input of `getStreamFromPool`, so `futureNCCLCallbackStreams_.push_back(std::make_shared<at::cuda::CUDAStream>(at::cuda::getStreamFromPool(device_index)));` was just creating a vector of GPU 0 streams. After I changed `at::cuda::getStreamFromPool(device_index)` to `at::cuda::getStreamFromPool(false, device_index)`, `nvidia-smi` looked like:
```
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    673925      C   python                                       771MiB |
|    0    673926      C   python                                       771MiB |
|    1    673925      C   python                                       771MiB |
|    1    673926      C   python                                       771MiB |
|    2    673925      C   python                                       771MiB |
|    2    673926      C   python                                       771MiB |
|    3    673925      C   python                                       771MiB |
|    3    673926      C   python                                       771MiB |
|    4    673925      C   python                                       771MiB |
|    4    673926      C   python                                       771MiB |
|    5    673925      C   python                                       771MiB |
|    5    673926      C   python                                       771MiB |
|    6    673925      C   python                                       771MiB |
|    6    673926      C   python                                       771MiB |
|    7    673925      C   python                                       707MiB |
|    7    673926      C   python                                       623MiB |
+-----------------------------------------------------------------------------+
```
This confirms that we were just getting GPU 0 streams for the callback. I think this does not explain the `fp16_compress` stability issue, because we were able to reproduce that even without any `then` callback, just by copying from fp32 to fp16 before the allreduce. However, it can explain other issues where `allreduce` was not on par with `no_hook`. I'll run some additional simulations with this diff.

I tried replacing `getStreamFromPool` with `getDefaultCUDAStream(deviceIndex)` and it did not cause additional memory usage. In this diff, I temporarily solved the issue by initializing null pointers for each device in the constructor and setting the callback stream for the corresponding device inside `ProcessGroupNCCL::getNCCLComm`. After the fix, it looks like the memory issue is resolved:
```
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0   2513142      C   python                                       745MiB |
|    4   2513144      C   python                                       747MiB |
+-----------------------------------------------------------------------------+
```
I could use a dictionary instead of a vector for `futureNCCLCallbackStreams_`, but since the number of devices is fixed, I don't think it is necessary. Please let me know what you think in the comments.
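
A rough Python-level analogue of the per-device fix (a hypothetical sketch, not the actual C++ change in `ProcessGroupNCCL`): allocate one callback stream per device and bind each stream to its device explicitly, so nothing silently lands on GPU 0.

```py
import torch

# Hypothetical sketch: one callback stream per CUDA device, explicitly
# bound to that device so no stream is silently created on GPU 0.
callback_streams = []
for device_index in range(torch.cuda.device_count()):
    callback_streams.append(torch.cuda.Stream(device=device_index))

# callback_streams[i] now belongs to device i, mirroring the intent of
# passing the device index correctly to at::cuda::getStreamFromPool.
```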
ghstack-source-id: 111485483

Test Plan:
`test_c10d.py` and some perf tests. Also check `nvidia-smi` while running tests to validate memory looks okay.

This diff also fixes the regression in HPC tests when we register a hook:

{F322730175}

See https://fb.quip.com/IGuaAbD8bnvy (474fdd7e2d) for details.

Reviewed By: pritamdamania87

Differential Revision: D23495436

fbshipit-source-id: ad08e1d94343252224595d7c8a279fe75e244822
2020-09-10 11:25:38 -07:00
Kenichi Maehashi
cb90fef770 Fix return value of PyErr_WarnEx ignored (SystemError) (#44371)
Summary:
This PR fixes unexpected `SystemError` when warnings are emitted and warning filters are set.

## Current behavior

```
$ python -Werror
>>> import torch
>>> torch.range(1, 3)
UserWarning: torch.range is deprecated in favor of torch.arange and will be removed in 0.5. Note that arange generates values in [start; end), not [start; end].

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: <built-in method range of type object at 0x7f38c7703a60> returned a result with an error set
```

## Expected behavior

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UserWarning: torch.range is deprecated and will be removed in a future release because its behavior is inconsistent with Python's range builtin. Instead, use torch.arange, which produces values in [start, end).
```

## Note

A Python exception must be raised if `PyErr_WarnEx` returns `-1` ([python docs](https://docs.python.org/3/c-api/exceptions.html#issuing-warnings)). This PR fixes the warnings raised by the following code:
```py
import torch

torch.range(1, 3)
torch.autograd.Variable().volatile
torch.autograd.Variable().volatile = True
torch.tensor(torch.tensor([]))
torch.tensor([]).new_tensor(torch.tensor([]))
```
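
As a user-level sanity check of the intended behavior (a sketch, assuming the fix is applied), the deprecated call should surface the original `UserWarning` instead of a `SystemError` when warnings are turned into errors:

```py
import warnings

import torch

# Turn warnings into exceptions, as "python -W error" does.
warnings.simplefilter("error")

try:
    torch.range(1, 3)
except UserWarning as e:
    # With the fix, the deprecation warning propagates as a regular
    # Python exception instead of triggering a SystemError.
    print("caught:", e)
```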

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44371

Reviewed By: mrshenli

Differential Revision: D23598410

Pulled By: albanD

fbshipit-source-id: 2fbcb13fe4025dbebaf1fd837d4c8e0944e05010
2020-09-10 10:15:21 -07:00
Hameer Abbasi
f9a0d0c21e Allow Tensor-likes in torch.autograd.gradcheck (#43877)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42942

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43877

Reviewed By: zou3519

Differential Revision: D23493257

Pulled By: ezyang

fbshipit-source-id: 6cdaabe17157b484e9491189706ccc15420ac239
2020-09-10 09:02:17 -07:00
Gregory Chanan
c8914afdfa Merge criterion_tests and new_criterion_tests. (#44398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44398

These end up executing the same tests, so no reason to have them separate.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23600855

Pulled By: gchanan

fbshipit-source-id: 0952492771498bf813f1bf8e1d7c8dce574ec965
2020-09-10 08:29:59 -07:00
Gregory Chanan
fa158c4ca6 Combine criterion and new criterion tests in test_jit. (#43958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43958

There is no difference between these tests (I'm merging them), so let's merge them in the JIT as well.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23452337

Pulled By: gchanan

fbshipit-source-id: e6d13cdb164205eec3dbb7cdcd0052b02c961778
2020-09-10 08:28:14 -07:00
Gregory Chanan
af9cad761a Stop ignoring NotImplementedErrors in cuda CriterionTests. (#44381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44381

Perhaps this was necessary when the test was originally introduced, but it's difficult to figure out what is actually being tested. And I don't think we actually use NotImplementedErrors.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23598646

Pulled By: gchanan

fbshipit-source-id: aa18154bfc4969cca22323e61683a301198823be
2020-09-10 08:18:33 -07:00
generatedunixname89002005287564
356aa54694 [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D23621463

fbshipit-source-id: 1cd7e94e480c7073c9a0aad55aeba98de4b96164
2020-09-10 04:24:43 -07:00
Kurt Mohler
28a23fce4c Deprecate torch.norm and torch.functional.norm (#44321)
Summary:
Part of https://github.com/pytorch/pytorch/issues/24802
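
For context (a hedged sketch, not part of this commit): the broader deprecation effort steers users toward `torch.linalg.norm`, so the migration looks roughly like this in recent PyTorch:

```py
import torch

x = torch.randn(3, 4)

# Deprecated spelling and its suggested replacement; both compute the
# Frobenius norm of the matrix here.
old = torch.norm(x, p="fro")
new = torch.linalg.norm(x, ord="fro")
assert torch.allclose(old, new)
```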

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44321

Reviewed By: mrshenli

Differential Revision: D23617273

Pulled By: mruberry

fbshipit-source-id: 6f88b5cb097fd0acb9cf0e415172c5a86f94e9f2
2020-09-10 01:16:41 -07:00
Chris Huynh
7b547f086f To fix extra memory allocation when using circular padding (#39273)
Summary:
For fixing https://github.com/pytorch/pytorch/issues/39256

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39273

Reviewed By: anjali411

Differential Revision: D23471811

Pulled By: mruberry

fbshipit-source-id: fb324b51baea765311715cdf14642b334f335733
2020-09-10 00:15:31 -07:00
Jeff Daily
65d4a6b7c0 [ROCm] fix cub hipify mappings (#44431)
Summary:
Fixes ROCm-specific workarounds introduced by https://github.com/pytorch/pytorch/issues/44259.  This adds new hipify mappings that properly handle cub outside of caffe2 sources.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44431

Reviewed By: mrshenli

Differential Revision: D23617417

Pulled By: ngimel

fbshipit-source-id: 5d16afb6b8e6ec5ed049c51571866b0878d534ca
2020-09-09 23:39:25 -07:00
Cheng Chang
28bd4929bd [NNC] Make it able to normalize loop with variable start (#44133)
Summary:
Loops with variable start can also be normalized.
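
Conceptually (a plain-Python sketch, assuming the usual definition of loop normalization), a loop starting at a variable `start` is rewritten so the induction variable runs from 0 and the body's index is shifted back:

```py
def original(a, start, stop):
    for i in range(start, stop):
        a[i] *= 2

def normalized(a, start, stop):
    # Normalized form: the loop runs over [0, stop - start) and the old
    # index is recovered as j + start inside the body.
    for j in range(stop - start):
        i = j + start
        a[i] *= 2
```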

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44133

Test Plan: updated testNormalizeStartVariable.

Reviewed By: navahgar

Differential Revision: D23507097

Pulled By: cheng-chang

fbshipit-source-id: 4e9aad1cd4f4a839f59a00bf8ddf97637a1a6648
2020-09-09 23:05:57 -07:00
taiyuanz
c515881137 Add reset_grad() function (#44423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44423

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42754

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23010859

Pulled By: ngimel

fbshipit-source-id: 56eec43eba88b98cbf714841813977c68f983564
2020-09-09 22:05:45 -07:00
Meghan Lele
89ac30afb8 [JIT] Propagate type sharing setting to submodule compilation (#44226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44226

**Summary**
At present, the `share_types` argument to `create_script_module` is used
to decide whether to reuse a previously created type for a top-level
module that has not yet been compiled. However, that setting does not apply
to the compilation of submodules of the top-level module; types are
still reused if possible.

This commit modifies `create_script_module` so that the `share_types`
flag is honoured during submodule compilation as well.
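
A small illustration of the kind of module this affects (a sketch only; `share_types` is an internal argument to `create_script_module` and is not set here): two structurally identical submodules that, with sharing enabled, are compiled against a single TorchScript type.

```py
import torch
import torch.nn as nn

class Sub(nn.Module):
    def forward(self, x):
        return x + 1

class Top(nn.Module):
    def __init__(self):
        super().__init__()
        # Structurally identical submodules: with type sharing (the
        # default), the compiler can reuse one TorchScript type for both.
        self.a = Sub()
        self.b = Sub()

    def forward(self, x):
        return self.a(x) + self.b(x)

scripted = torch.jit.script(Top())
```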

**Test Plan**
This commit adds a unit test to `TestTypeSharing` that checks that
submodule types are not shared or reused when `share_types` is set to
`False`.

**Fixes**
This commit fixes #43605.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23602371

Pulled By: SplitInfinity

fbshipit-source-id: b909b8b6abbe3b4cb9be8319ac263ade90e83bd3
2020-09-09 20:06:35 -07:00
Meghan Lele
d3b6d5caf1 [JIT] Add support for del to TS classes (#44352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44352

**Summary**
This commit adds support for `del` with class instances. If a class
implements `__delitem__`, then `del class_instance[key]` is syntactic
sugar for `class_instance.__delitem__(key)`.
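
In plain Python the desugaring looks like the sketch below; this commit makes the same lowering work for TorchScript class instances.

```py
class Cache:
    def __init__(self):
        self.data = {"a": 1}

    def __delitem__(self, key):
        del self.data[key]

c = Cache()
del c["a"]  # desugars to c.__delitem__("a")
assert "a" not in c.data
```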

**Test Plan**
This commit adds a unit test to TestClassTypes to test this feature.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23603102

Pulled By: SplitInfinity

fbshipit-source-id: 28ad26ddc9a693a58a6c48a0e853a1c7cf5c9fd6
2020-09-09 19:52:35 -07:00
Omkar Salpekar
e028ad0762 Fix HashStoreTests and move to Gtest (#43384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43384

Much like the FileStoreTests, the HashStoreTests were also run in a single blob and threw exceptions upon failure. This modularizes the tests by separating each function into its own gtest test case.
ghstack-source-id: 111690834

Test Plan: Confirmed that the tests pass on devvm.

Reviewed By: jiayisuse

Differential Revision: D23257579

fbshipit-source-id: 7e821f0e9ee74c8b815f06facddfdb7dc2724294
2020-09-09 17:56:33 -07:00
Omkar Salpekar
69a3ff005d Modularize FileStoreTest and move to Gtest (#43383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43383

The FileStore test currently has a large blob of tests that throw
exceptions upon failure. This PR modularizes each test so that it can run
independently, and migrates the framework to gtest.
ghstack-source-id: 111690831

Test Plan: Confirmed tests pass on devvm

Reviewed By: jiayisuse

Differential Revision: D22879473

fbshipit-source-id: 6fa5468e594a53c9a6b972757068dfc41645703e
2020-09-09 17:56:30 -07:00
Omkar Salpekar
a7fba7de22 Convert StoreTestUtils to Gtest (#43382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43382

StoreTestCommon defines standard helper functions that are used by all of our Store tests. These helpers currently throw exceptions upon failure; this PR changes them to use gtest assertions instead.
ghstack-source-id: 111690833

Test Plan: Tested the 2 PRs above this on devvm

Reviewed By: jiayisuse

Differential Revision: D22828156

fbshipit-source-id: 9e116cf2904e05ac0342a441e483501e00aad3dd
2020-09-09 17:55:25 -07:00
Elias Ellison
b69c28d02c Improving ModuleList indexing error msg (#43361)
Summary:
Follow-up to https://github.com/pytorch/pytorch/pull/41946/, suggesting enumeration as an alternative when a user tries to index into a ModuleList/Sequential with a non-integer literal.
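
A sketch of the pattern the message points users to (hypothetical example; the exact error wording is not reproduced here): instead of indexing a ModuleList with a non-constant value, iterate with `enumerate`.

```py
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

    def forward(self, x, i: int):
        # self.layers[i] would not compile: TorchScript requires an
        # integer literal to index a ModuleList. Enumerate instead.
        for idx, layer in enumerate(self.layers):
            if idx == i:
                x = layer(x)
        return x

scripted = torch.jit.script(Net())
```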

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43361

Reviewed By: mrshenli

Differential Revision: D23602388

Pulled By: eellison

fbshipit-source-id: 51fa28d5bc45720529b3d45e92d367ee6c9e3316
2020-09-09 16:22:57 -07:00
Elias Ellison
e0c65abd38 Revert D23568330: [pytorch][PR] Moves some of TestTorchMathOps to OpInfos
Test Plan: revert-hammer

Differential Revision:
D23568330 (a953a825cc)

Original commit changeset: 03e69fccdbfd

fbshipit-source-id: 04ec6843c5eb3c84ddf226dad0088172d9bed84d
2020-09-09 15:48:56 -07:00