Commit Graph

6361 Commits

Author SHA1 Message Date
Mikhail Zolotukhin
71e6ce6616 [JIT] Specialize AutogradZero: merge AutogradAnyNonZero and Not(AutogradAnyNonZero) checks into one. (#44987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44987

This PR introduces new `prim::AutogradAllZero` and
`prim::AutogradAllNonZero` ops that perform a batched check over
multiple tensors. The specialize-autogradzero pass now generates one
check for all expected-to-be-undefined tensors, one check for all
expected-to-be-defined tensors, and a set of checks for the size
parameters passed to `grad_sum_to_size` (these could probably be cleaned
up as well in the future).

An example of what we generated before this change:
```
%1626 : bool = prim::AutogradAnyNonZero(%0)
%1627 : bool = prim::AutogradAnyNonZero(%2)
%1628 : bool = aten::__not__(%1627)
%1629 : bool = prim::AutogradAnyNonZero(%3)
%1630 : bool = aten::__not__(%1629)
%1631 : bool = prim::AutogradAnyNonZero(%4)
%1632 : bool = aten::__not__(%1631)
%1633 : bool = prim::AutogradAnyNonZero(%5)
%1634 : bool = aten::__not__(%1633)
%1635 : bool = prim::AutogradAnyNonZero(%6)
%1636 : bool = aten::__not__(%1635)
%1637 : bool = prim::AutogradAnyNonZero(%7)
%1638 : bool = aten::__not__(%1637)
%1639 : bool = prim::AutogradAnyNonZero(%8)
%1640 : bool = aten::__not__(%1639)
%1641 : bool = prim::AutogradAnyNonZero(%9)
%1642 : bool = aten::__not__(%1641)
%1643 : bool = prim::AutogradAnyNonZero(%10)
%1644 : bool = aten::__not__(%1643)
%1645 : bool = prim::AutogradAnyNonZero(%11)
%1646 : bool = aten::__not__(%1645)
%1647 : bool = prim::AutogradAnyNonZero(%12)
%1648 : bool = aten::__not__(%1647)
%1649 : bool = prim::AutogradAnyNonZero(%13)
%1650 : bool = aten::__not__(%1649)
%1651 : bool = prim::AutogradAnyNonZero(%14)
%1652 : bool = aten::__not__(%1651)
%1653 : bool = prim::AutogradAnyNonZero(%15)
%1654 : bool = aten::__not__(%1653)
%1655 : bool = prim::AutogradAnyNonZero(%16)
%1656 : bool = aten::__not__(%1655)
%1657 : bool = prim::AutogradAnyNonZero(%17)
%1658 : bool = prim::AutogradAnyNonZero(%18)
%1659 : bool = prim::AutogradAnyNonZero(%19)
%1660 : bool = prim::AutogradAnyNonZero(%20)
%1661 : bool = aten::__is__(%self_size.16, %1625)
%1662 : bool = aten::__is__(%other_size.16, %1625)
%1663 : bool = aten::__is__(%self_size.14, %1625)
%1664 : bool = aten::__is__(%self_size.12, %1625)
%1665 : bool = prim::AutogradAnyNonZero(%ingate.7)
%1666 : bool = prim::AutogradAnyNonZero(%forgetgate.7)
%1667 : bool = prim::AutogradAnyNonZero(%cellgate.7)
%1668 : bool = prim::AutogradAnyNonZero(%30)
%1669 : bool = prim::AutogradAnyNonZero(%31)
%1670 : bool = aten::__is__(%self_size.10, %1625)
%1671 : bool = aten::__is__(%other_size.10, %1625)
%1672 : bool = prim::AutogradAnyNonZero(%34)
%1673 : bool = prim::AutogradAnyNonZero(%35)
%1674 : bool = aten::__is__(%self_size.8, %1625)
%1675 : bool = aten::__is__(%other_size.8, %1625)
%1676 : bool = aten::__is__(%self_size.6, %1625)
%1677 : bool = aten::__is__(%other_size.6, %1625)
%1678 : bool = prim::AutogradAnyNonZero(%outgate.7)
%1679 : bool = prim::AutogradAnyNonZero(%41)
%1680 : bool = prim::AutogradAnyNonZero(%42)
%1681 : bool = prim::AutogradAnyNonZero(%43)
%1682 : bool = aten::__is__(%self_size.4, %1625)
%1683 : bool = aten::__is__(%other_size.4, %1625)
%1684 : bool[] = prim::ListConstruct(%1626, %1628, %1630, %1632, %1634, %1636, %1638, %1640, %1642, %1644, %1646, %1648, %1650, %1652, %1654, %1656, %1657, %1658, %1659, %1660, %1661, %1662, %1663, %1664, %1665, %1666, %1667, %1668, %1669, %1670, %1671, %1672, %1673, %1674, %1675, %1676, %1677, %1678, %1679, %1680, %1681, %1682, %1683)
%1685 : bool = aten::all(%1684)
```

Same example after this change:
```
%1625 : None = prim::Constant()
%1626 : bool = aten::__is__(%self_size.16, %1625)
%1627 : bool = aten::__is__(%other_size.16, %1625)
%1628 : bool = aten::__is__(%self_size.14, %1625)
%1629 : bool = aten::__is__(%self_size.12, %1625)
%1630 : bool = aten::__is__(%self_size.10, %1625)
%1631 : bool = aten::__is__(%other_size.10, %1625)
%1632 : bool = aten::__is__(%self_size.8, %1625)
%1633 : bool = aten::__is__(%other_size.8, %1625)
%1634 : bool = aten::__is__(%self_size.6, %1625)
%1635 : bool = aten::__is__(%other_size.6, %1625)
%1636 : bool = aten::__is__(%self_size.4, %1625)
%1637 : bool = aten::__is__(%other_size.4, %1625)
%1638 : bool = prim::AutogradAllNonZero(%0, %17, %18, %19, %20, %ingate.7, %forgetgate.7, %cellgate.7, %30, %31, %34, %35, %outgate.7, %41, %42, %43)
%1639 : bool = prim::AutogradAllZero(%2, %3, %4, %5, %6, %7, %8, %9, %10, %11, %12, %13, %14, %15, %16)
%1640 : bool[] = prim::ListConstruct(%1626, %1627, %1628, %1629, %1630, %1631, %1632, %1633, %1634, %1635, %1636, %1637, %1638, %1639)
%1641 : bool = aten::all(%1640)
```

My performance measurements showed some changes, but I don't really
trust them and think they are probably just noise. Below are
tables with min-aggregation over 10 runs:

FastRNN models:

| name                                             | base time (s) |   diff time (s) |   % change |
| :---                                             |          ---: |            ---: |       ---: |
| lstm[aten]:bwd                                   |     30.059927 |       29.834089 |      -0.8% |
| lstm[aten]:fwd                                   |     25.673708 |       25.700039 |       0.1% |
| lstm[cudnn]:bwd                                  |     17.866232 |       17.893120 |       0.2% |
| lstm[cudnn]:fwd                                  |     11.418444 |       11.408514 |      -0.1% |
| lstm[jit]:bwd                                    |     27.127205 |       27.141029 |       0.1% |
| lstm[jit]:fwd                                    |     17.018047 |       16.975451 |      -0.3% |
| lstm[jit_multilayer]:bwd                         |     27.502396 |       27.365149 |      -0.5% |
| lstm[jit_multilayer]:fwd                         |     16.918591 |       16.917767 |      -0.0% |
| lstm[jit_premul]:bwd                             |     22.281199 |       22.215082 |      -0.3% |
| lstm[jit_premul]:fwd                             |     14.848708 |       14.896231 |       0.3% |
| lstm[jit_premul_bias]:bwd                        |     20.761206 |       21.170969 |       2.0% |
| lstm[jit_premul_bias]:fwd                        |     15.013515 |       15.037978 |       0.2% |
| lstm[jit_simple]:bwd                             |     26.715771 |       26.697786 |      -0.1% |
| lstm[jit_simple]:fwd                             |     16.675898 |       16.545893 |      -0.8% |
| lstm[py]:bwd                                     |     56.327065 |       54.731030 |      -2.8% |
| lstm[py]:fwd                                     |     39.876324 |       39.230572 |      -1.6% |

Torch Hub models:

| name                                             | base time (s) |   diff time (s) |   % change |
| :---                                             |          ---: |            ---: |       ---: |
| test_eval[BERT_pytorch-cuda-jit]                 |      0.111706 |        0.106604 |      -4.6% |
| test_eval[LearningToPaint-cuda-jit]              |      0.002841 |        0.002801 |      -1.4% |
| test_eval[Super_SloMo-cuda-jit]                  |      0.384869 |        0.384737 |      -0.0% |
| test_eval[attension_is_all_you_nee...-cuda-jit]  |      0.123857 |        0.123923 |       0.1% |
| test_eval[demucs-cuda-jit]                       |      0.077270 |        0.076878 |      -0.5% |
| test_eval[fastNLP-cuda-jit]                      |      0.000255 |        0.000249 |      -2.3% |
| test_eval[moco-cuda-jit]                         |      0.426472 |        0.427380 |       0.2% |
| test_eval[pytorch_CycleGAN_and_pix...-cuda-jit]  |      0.026483 |        0.026423 |      -0.2% |
| test_eval[pytorch_mobilenet_v3-cuda-jit]         |      0.036202 |        0.035853 |      -1.0% |
| test_eval[pytorch_struct-cuda-jit]               |      0.001439 |        0.001495 |       3.9% |
| test_train[BERT_pytorch-cuda-jit]                |      0.247236 |        0.247188 |      -0.0% |
| test_train[Background_Matting-cuda-jit]          |      3.536659 |        3.581864 |       1.3% |
| test_train[LearningToPaint-cuda-jit]             |      0.015341 |        0.015331 |      -0.1% |
| test_train[Super_SloMo-cuda-jit]                 |      1.018626 |        1.019098 |       0.0% |
| test_train[attension_is_all_you_nee...-cuda-jit] |      0.446314 |        0.444893 |      -0.3% |
| test_train[demucs-cuda-jit]                      |      0.169647 |        0.169846 |       0.1% |
| test_train[fastNLP-cuda-jit]                     |      0.001990 |        0.001978 |      -0.6% |
| test_train[moco-cuda-jit]                        |      0.855323 |        0.856974 |       0.2% |
| test_train[pytorch_mobilenet_v3-cuda-jit]        |      0.497723 |        0.485416 |      -2.5% |
| test_train[pytorch_struct-cuda-jit]              |      0.309692 |        0.308792 |      -0.3% |

Differential Revision: D23794659

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: 859b68868ef839c5c6cbc7021879ee22d3144ea8
2020-09-24 14:31:49 -07:00
Xinyu Li
26001a2334 Revert D23753711: [pytorch][PR] Add foreach APIs for binary ops with ScalarList
Test Plan: revert-hammer

Differential Revision:
D23753711 (71d1b5b0e2)

Original commit changeset: bf3e8c54bc07

fbshipit-source-id: 192692e0d3fff4cade9983db0a1760fedfc9674c
2020-09-24 11:55:49 -07:00
Raziel Alvarez Guevara
2b38c09f69 Moves prim ops from C10 back to JIT (#45144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45144

Moves prim ops from C10 back to JIT.

These were originally moved to C10 from JIT in D19237648 (f362cd510d)
ghstack-source-id: 112775781

Test Plan:
buck test //caffe2/test/cpp/jit:jit

https://pxl.cl/1l22N

buck test adsatlas/gavel/lib/ata_processor/tests:ata_processor_test

https://pxl.cl/1lBxD

Reviewed By: iseeyuan

Differential Revision: D23697598

fbshipit-source-id: 36d1eb8c346e9b161ba6af537a218440a9bafd27
2020-09-24 09:44:20 -07:00
iurii zdebskyi
71d1b5b0e2 Add foreach APIs for binary ops with ScalarList (#44743)
Summary:
In this PR:
1) Added binary operations with ScalarLists.
2) Fixed a _foreach_div(...) bug in native_functions
3) Covered all possible cases with scalars and scalar lists in tests
4) [minor] Fixed a bug in native_functions by adding "use_c10_dispatcher: full" to all _foreach functions

Tested via unit tests.
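
Not part of the PR text, but a minimal usage sketch of the idea, assuming the internal `_foreach_add` entry point accepts one scalar per tensor as described above:
```
import torch

tensors = [torch.ones(3), torch.full((3,), 2.0)]
scalars = [10.0, 0.5]

# One fused call instead of a Python loop; out[i] == tensors[i] + scalars[i].
out = torch._foreach_add(tensors, scalars)
print(out)
```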

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44743

Reviewed By: bwasti, malfet

Differential Revision: D23753711

Pulled By: izdeby

fbshipit-source-id: bf3e8c54bc07867e8f6e82b5d3d35ff8e99b5a0a
2020-09-24 08:30:42 -07:00
Alex Suhan
3dd0e362db [TensorExpr] Fix min and max for integral inputs in CUDA backend (#44984)
Summary:
For integral types, isnan is meaningless. Provide specializations for
maximum and minimum which don't call it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44984

Test Plan: python test/test_jit_fuser_te.py -k TestTEFuser.test_minmax_int_ops

Reviewed By: ezyang

Differential Revision: D23885259

Pulled By: asuhan

fbshipit-source-id: 2e6da2c43c0ed18f0b648a2383d510894c574437
2020-09-23 23:19:12 -07:00
Peter Bell
6a2e9eb51c torch.fft: Multi-dimensional transforms (#44550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44550

Part of the `torch.fft` work (gh-42175).
This adds n-dimensional transforms: `fftn`, `ifftn`, `rfftn` and `irfftn`.

This is aiming for correctness first, with the implementation on top of the existing `_fft_with_size` restrictions. I plan to follow up later with a more efficient rewrite that makes `_fft_with_size` work with arbitrary numbers of dimensions.
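
A hedged usage sketch of the new transforms (names as listed above; exact signatures may differ):
```
import torch
import torch.fft

x = torch.randn(4, 8, 8)
X = torch.fft.fftn(x)                 # complex result, same shape
x_back = torch.fft.ifftn(X)           # round-trips up to numerical error

r = torch.fft.rfftn(x)                # real-input transform, last dim becomes 8 // 2 + 1
x_r = torch.fft.irfftn(r, s=x.shape)  # pass the original sizes to invert exactly
```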

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23846032

Pulled By: mruberry

fbshipit-source-id: e6950aa8be438ec5cb95fb10bd7b8bc9ffb7d824
2020-09-23 22:09:58 -07:00
Supriya Rao
60665ace17 [quant] Add optimized approach to calculate qparams for qembedding_bag (#45149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45149

The `choose_qparams_optimized` function calculates the optimized qparams.
It uses a greedy approach to nudge the min and max, computing the L2 norm
and trying to minimize the quantization error `torch.norm(x - fake_quant(x, s, z))`.
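
A simplified sketch of the objective being minimized, assuming per-tensor affine fake quantization (illustrative only, not the PR's implementation):
```
import torch

def quant_error(x, scale, zero_point, qmin=0, qmax=255):
    # L2 distance between x and its fake-quantized version; the greedy
    # min/max search nudges the range to make this as small as possible.
    fq = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, qmin, qmax)
    return torch.norm(x - fq)

x = torch.randn(1000)
print(quant_error(x, scale=x.abs().max().item() / 127.5, zero_point=128))
```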

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23848060

fbshipit-source-id: c6c57c9bb07664c3f1c87dd7664543e09f634aee
2020-09-23 19:00:22 -07:00
Alex Suhan
76c185dcca [TensorExpr] When lanes differ, insert Broadcast instead of Cast (#45179)
Summary:
We need to check if dtypes differ in scalar type or lanes to decide between
Cast and Broadcast.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45179

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.SimplifyBroadcastTermExpander

Reviewed By: bwasti

Differential Revision: D23873316

Pulled By: asuhan

fbshipit-source-id: ca141be67e10c2b6c5f2ff9c11e42dcfc62ac620
2020-09-23 17:06:54 -07:00
Alex Suhan
0495998862 [TensorExpr] Disallow arithmetic binary operations on Bool (#44677)
Summary:
Arithmetic operations on Bool aren't fully supported in the evaluator. Moreover,
such semantics can be implemented by the client code through insertion of
explicit casts to widen and narrow to the desired types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44677

Test Plan:
test_tensorexpr --gtest_filter=TensorExprTest.ExprDisallowBoolArithmetic
python test/test_jit_fuser_te.py

Reviewed By: agolynski

Differential Revision: D23801412

Pulled By: asuhan

fbshipit-source-id: fff5284e3a216655dbf5a9a64d1cb1efda271a36
2020-09-23 14:59:11 -07:00
Alex Suhan
8e0fc711f4 [TensorExpr] Remove unused EvalConstExpr function (#45180)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45180

Test Plan: build

Reviewed By: ezyang

Differential Revision: D23877151

Pulled By: asuhan

fbshipit-source-id: a5d4d211c1dc85e6f7045330606163a933b9474e
2020-09-23 14:55:27 -07:00
Yi Wang
2a1a51facb Fix typos. (#45195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45195

Fix some typos in reducer class.
ghstack-source-id: 112673443

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D23862399

fbshipit-source-id: 0dc69e5ea1fa7d33c85d1909b2216bcd1f579f6a
2020-09-23 14:51:15 -07:00
Nick Gibson
9e206ee9f1 [NNC] Fix a bug in SplitWithMask when splitting multiple times (#45141)
Summary:
When doing a splitWithMask we only mask if the loop extent is not cleanly divided by the split factor. However, the logic does not simplify the extent expression, so any nontrivial loop extent will always cause a mask to be added, e.g. if the loop has previously been split. Unlike splitWithTail, the masks added by splitWithMask are pure overhead, and we don't have the analysis to optimize them out when they are unnecessary, so it's good to avoid inserting them if we can.

The fix is simply to simplify the loop extents before doing the extent calculation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45141

Reviewed By: ezyang

Differential Revision: D23869170

Pulled By: nickgg

fbshipit-source-id: 44686fd7b802965ca4f5097b0172a41cf837a1f5
2020-09-23 14:04:58 -07:00
Bradley Davis
21fabae47a Remove expensive call to PyObject_GetAttrString in PyTorch_LookupSpecial (#44684)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44684

The ad-hoc quantization benchmarking script in D23689062 recently highlighted that quantized ops were surprisingly slow after the introduction of support for custom ops in torch.fx in D23203204 (f15e27265f).

Using strobelight, it's immediately clear that up to 66% of samples were seen in `c10::get_backtrace`, which descends from `torch::is_tensor_and_append_overloaded -> torch::check_has_torch_function -> torch::PyTorch_LookupSpecial -> PyObject_HasAttrString -> PyObject_GetAttrString`.

I'm no expert by any means so please correct any/all misinterpretation, but it appears that:
- `check_has_torch_function` only needs to return a bool
- `PyTorch_LookupSpecial` should return `NULL` if a matching method is not found on the object
- in the impl of `PyTorch_LookupSpecial` the return value from `PyObject_HasAttrString` only serves as a bool to return early, but ultimately ends up invoking `PyObject_GetAttrString`, which raises, spawning the generation of a backtrace
- `PyObject_FastGetAttrString` returns `NULL` (stolen ref to an empty py::object if the if/else if isn't hit) if the method is not found, anyway, so it could be used singularly instead of invoking both `GetAttrString` and `FastGetAttrString`
- D23203204 (f15e27265f) compounded (but maybe not directly caused) the problem by increasing the number of invocations

so, removing it in this diff and seeing how many things break :)

before:
strobelight: see internal section
output from D23689062 script:
```
$ ./buck-out/gen/scripts/v/test_pt_quant_perf.par
Sequential(
  (0): Quantize(scale=tensor([0.0241]), zero_point=tensor([60]), dtype=torch.quint8)
  (1): QuantizedLinear(in_features=4, out_features=4, scale=0.017489388585090637, zero_point=68, qscheme=torch.per_tensor_affine)
  (2): DeQuantize()
)
fp 0.010896682739257812
q 0.11908197402954102
```

after:
strobelight: see internal section
output from D23689062 script:
```
$ ./buck-out/gen/scripts/v/test_pt_quant_perf.par
Sequential(
  (0): Quantize(scale=tensor([0.0247]), zero_point=tensor([46]), dtype=torch.quint8)
  (1): QuantizedLinear(in_features=4, out_features=4, scale=0.012683945707976818, zero_point=41, qscheme=torch.per_tensor_affine)
  (2): DeQuantize()
)
fp 0.011141300201416016
q 0.022639036178588867
```

which roughly restores original performance seen in P142370729

UPDATE: 9/22 mode/opt benchmarks
```
buck run //scripts/x:test_pt_quant_perf mode/opt
Sequential(
  (0): Quantize(scale=tensor([0.0263]), zero_point=tensor([82]), dtype=torch.quint8)
  (1): QuantizedLinear(in_features=4, out_features=4, scale=0.021224206313490868, zero_point=50, qscheme=torch.per_tensor_affine)
  (2): DeQuantize()
)
fp 0.002968311309814453
q 0.5138928890228271
```

with patch:
```
buck run //scripts/x:test_pt_quant_perf mode/opt
Sequential(
  (0): Quantize(scale=tensor([0.0323]), zero_point=tensor([70]), dtype=torch.quint8)
  (1): QuantizedLinear(in_features=4, out_features=4, scale=0.017184294760227203, zero_point=61, qscheme=torch.per_tensor_affine)
  (2): DeQuantize()
)
fp 0.0026655197143554688
q 0.0064449310302734375
```

Reviewed By: ezyang

Differential Revision: D23697334

fbshipit-source-id: f756d744688615e01c94bf5c48c425747458fb33
2020-09-23 13:52:54 -07:00
Zino Benaissa
4d80c8c648 Fix inlining interface call in fork subgraph (#43790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43790

Interface calls were not handled properly when used in a fork
subgraph. This PR fixes that issue.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23402039

Pulled By: bzinodev

fbshipit-source-id: 41adc5ee7d942250e732e243ab30e356d78d9bf7
2020-09-23 11:17:19 -07:00
Edward Yang
da4033d32a Make cudaHostRegister actually useful on cudart. (#45159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45159

By default, pybind11 binds void* to be capsules.  After a lot of
Googling, I have concluded that this is not actually useful:
you can't actually create a capsule from Python land, and our
data_ptr() function returns an int, which means that the
function is effectively unusable.  It didn't help that we had no
tests exercising it.

I've replaced the void* with uintptr_t, so that we now accept int
(and you can pass data_ptr() in directly).  I'm not sure if we
should make these functions accept ctypes types; unfortunately,
pybind11 doesn't seem to have any easy way to do this.

Fixes #43006

Also added cudaHostUnregister which was requested.
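
A hedged usage sketch, assuming the bindings are reached through `torch.cuda.cudart()` and that flag `0` corresponds to `cudaHostRegisterDefault` (requires a CUDA build):
```
import torch

t = torch.empty(1024)                 # ordinary pageable CPU tensor
cudart = torch.cuda.cudart()

nbytes = t.numel() * t.element_size()
# data_ptr() returns an int, which the binding now accepts directly.
cudart.cudaHostRegister(t.data_ptr(), nbytes, 0)
# ... async copies now treat t's storage as pinned ...
cudart.cudaHostUnregister(t.data_ptr())
```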

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D23849731

Pulled By: ezyang

fbshipit-source-id: 8a79986f3aa9546abbd2a6a5828329ae90fd298f
2020-09-23 11:05:44 -07:00
Shen Li
94c3cdd994 Let rpc._all_gather use default RPC timeout (#44983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44983

`_all_gather` was converted from `_wait_all_workers` and inherited its
fixed 5-second timeout. As `_all_gather` is meant to support a broader
set of use cases, the timeout configuration should be more flexible.
This PR makes `rpc._all_gather` use the global default RPC timeout.

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D23794383

Pulled By: mrshenli

fbshipit-source-id: 382f52c375f0f25c032c5abfc910f72baf4c5ad9
2020-09-23 08:06:09 -07:00
Martin Yuan
e5bade7b2c [PyTorch Mobile] Move string op registrations to prim and make them selective (#44960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44960

Since we have templated selective build, it should be safe to move the operators to prim so that they can be selectively built on mobile.

Test Plan: CI

Reviewed By: linbinyu

Differential Revision: D23772025

fbshipit-source-id: 52cebae76e4df5a6b2b51f2cd82f06f75e2e45d0
2020-09-23 07:42:35 -07:00
Luca Wehrstedt
76dc50e9c8 [RPC] Infer backend type if only options are given (#45065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45065

To preserve backwards compatibility with applications that were passing in ProcessGroupRpcBackendOptions but not explicitly setting backend=BackendType.PROCESS_GROUP, we now infer the backend type from the options if only the options are passed. If neither is passed, we default to TensorPipe, as before this change.
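
A hedged sketch of the backwards-compatible call pattern described above (worker name, thread count, and init method are illustrative):
```
import torch.distributed.rpc as rpc
from torch.distributed.rpc import ProcessGroupRpcBackendOptions

opts = ProcessGroupRpcBackendOptions(
    init_method="tcp://localhost:29500",
    num_send_recv_threads=8,
    rpc_timeout=30,
)

# No explicit backend=BackendType.PROCESS_GROUP: the backend is inferred from
# the options type. With neither argument, TensorPipe remains the default.
rpc.init_rpc("worker0", rank=0, world_size=1, rpc_backend_options=opts)
rpc.shutdown()
```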
ghstack-source-id: 112586258

Test Plan: Added new unit tests.

Reviewed By: pritamdamania87

Differential Revision: D23814289

fbshipit-source-id: f4be7919e0817a4f539a50ab12216dc3178cb752
2020-09-23 00:46:27 -07:00
Alex Suhan
215679573e [TensorExpr] Fix operator order in combineMultilane (#45157)
Summary:
combineMultilane used the wrong operand order when the ramp was on the
left-hand side, which matters for subtraction.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45157

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.SimplifyRampSubBroadcast

Reviewed By: ailzhang

Differential Revision: D23851751

Pulled By: asuhan

fbshipit-source-id: 864d1611e88769fb43327ef226bb3310017bf858
2020-09-22 23:50:47 -07:00
Rohan Varma
d4a634c209 [RPC profiling] Don't wrap toHere() calls with profiling (#44655)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44655

Since `toHere()` does not execute operations over RPC and simply
transfers the value to the local node, we don't need to enable the profiler
remotely for this message; doing so only adds unnecessary overhead.

Since `toHere` is a blocking call, we already profile the call on the local node using `RECORD_USER_SCOPE`, so this does not change the expected profiler results (validated by ensuring all remote profiling tests pass).
ghstack-source-id: 112605610

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D23641466

fbshipit-source-id: 109d9eb10bd7fe76122b2026aaf1c7893ad10588
2020-09-22 21:17:00 -07:00
Rohan Varma
70d2e4d1f6 [RPC profiling] Allow disableProfiler() to be called from another thread. (#44653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44653

Per an offline discussion with ilia-cher, this changes the profiler so that the `disableProfiler()` event consolidation logic can be called from different threads (i.e. threads where the profiler was not explicitly enabled). This is needed to support the functionality enabled by D23638387 where we defer profiling event collection until executing an async callback that can execute on a different thread, to support RPC async function profiling.

This is done by introducing two flags, `cleanupTLSState` and `consolidate`, which control whether we should clean up thread-local settings (we don't do this when calling `disableProfiler()` on non-main threads) and whether we should consolidate all profiled events. Backwards compatibility is ensured since both options are true by default.

Added a test in `test_misc.cpp` to test this.
ghstack-source-id: 112605620

Reviewed By: mrshenli

Differential Revision: D23638499

fbshipit-source-id: f5bbb0d41ef883c5e5870bc27e086b8b8908f46b
2020-09-22 21:16:58 -07:00
Rohan Varma
1bd6533d60 Remove thread_local RecordFunctionGuard from profiler. (#44646)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44646

Per a discussion with ilia-cher, this is not needed anymore and
removing it would make some future changes to support async RPC profiling
easier. Tested by ensuring profiling tests in `test_autograd.py` still pass.
ghstack-source-id: 112605618

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D23683998

fbshipit-source-id: 4e49a439509884fe04d922553890ae353e3331ab
2020-09-22 21:15:31 -07:00
Jerry Zhang
f575df201f [quant][graphmode][jit][api] Expose preserved_attrs from finalize to convert_jit (#44490)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44490

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23631142

fbshipit-source-id: f0913f0cb4576067e2a7288326024942d12e0ae0
2020-09-22 19:37:25 -07:00
Meghan Lele
e045119956 [JIT] Add default arguments for class types (#45098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45098

**Summary**
This commit adds support for default arguments in methods of class
types. Similar to how default arguments are supported for regular
script functions and methods on scripted modules, default values are
retrieved from the definition of a TorchScript class in Python as Python
objects, converted to IValues, and then attached to the schemas of
already compiled class methods; a hedged sketch follows below.
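
A hedged TorchScript sketch of what this enables (the class and values are hypothetical):
```
import torch

@torch.jit.script
class Counter(object):
    def __init__(self, start: int = 0):
        self.value = start

    def bump(self, by: int = 1) -> int:
        self.value += by
        return self.value

@torch.jit.script
def use_counter() -> int:
    c = Counter()      # uses the default start=0
    return c.bump()    # uses the default by=1
```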

**Test Plan**
This commit adds a set of new tests to TestClassType to test default
arguments.

**Fixes**
This commit fixes #42562.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23844769

Pulled By: SplitInfinity

fbshipit-source-id: ceedff7703bf9ede8bd07b3abcb44a0f654936bd
2020-09-22 18:37:44 -07:00
Bram Wasti
ebde5a80bb [tensorexpr] Add flag to fuse with unknown shapes (#44401)
Summary:
This flag simply allows users to get fusion groups that will *eventually* have shapes (such that `getOperation` is valid).

This is useful for doing early analysis and compiling just in time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44401

Reviewed By: ZolotukhinM

Differential Revision: D23656140

Pulled By: bwasti

fbshipit-source-id: 9a26c202752399d1932ad7d69f21c88081ffc1e5
2020-09-22 18:17:47 -07:00
Yanan Cao
c253b10154 Fix incorrect EnumValue serialization issue (#44891)
Summary:
Previously, `prim::EnumValue` was serialized to `ops.prim.EnumValue`, which doesn't have the right implementation to refine the return type. This diff correctly serializes it to `enum.value`, fixing the issue.

Fixes https://github.com/pytorch/pytorch/issues/44892

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44891

Reviewed By: malfet

Differential Revision: D23818962

Pulled By: gmagogsfm

fbshipit-source-id: 6edfdf9c4b932176b08abc69284a916cab10081b
2020-09-22 11:59:45 -07:00
Ailing Zhang
10f287539f Align casing in test_dispatch with dispatch keys. (#44933)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44933

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23778247

Pulled By: ailzhang

fbshipit-source-id: bc3725eae670b03543015afe763cb3bb16baf8f6
2020-09-22 10:50:08 -07:00
Elias Ellison
ae286d81e0 [JIT] improve alias analysis for list constructs (#39111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39111

In our present alias analysis, we consider any Value that enters another container as entering the heap, and thus aliasing all other heap values of the same type. There are a number of advantages to this approach:
- it is not too hard to maintain the aliasDb implementation
- it is much easier from an op schema perspective - there are many composite list ops registered internally and externally that would be tricky to register and get right if we did something more complicated
- it limits the size of the AliasDb, because a container of size 10 only contains a single memory DAG element instead of 10 elements.

The downside is that we are unable to handle the simple and extremely common case of a list of tensors being used in an ATen op.

In an example like:

```
 def foo(input):
    x = torch.tensor([1, 2, 3, 4])
    y = [x, x]
    input.add_(1)
    return torch.cat(y)
```

we will consider x to be written to: any write to any wildcard element (an element that enters a tuple, an element that is taken from a list) will mark x as written to. This limits our ability to create a functional subset and fuse graphs - as a result, 4 of the TorchVision classification models could not be functionalized.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23828003

Pulled By: eellison

fbshipit-source-id: 9109fcb6f2ca20ca897cae71683530285da9d537
2020-09-22 09:38:59 -07:00
Nikita Shulga
63fd257879 Add Ellipsis constant to the list of recognized tokens (#44959)
Summary:
Per https://docs.python.org/3.6/library/constants.html
> `Ellipsis` is the same as the ellipsis literal `...`
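
A hedged sketch of the newly recognized constant in TorchScript (the function is hypothetical):
```
import torch

@torch.jit.script
def first_channel(x):
    # `Ellipsis` now parses the same as the literal `...`
    return x[Ellipsis, 0]

print(first_channel(torch.arange(6).reshape(2, 3)))  # tensor([0, 3])
```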

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44959

Reviewed By: suo

Differential Revision: D23785660

Pulled By: malfet

fbshipit-source-id: f68461849e7d16ef68042eb96566f2c936c06b0f
2020-09-22 09:05:25 -07:00
anjali411
58b6ab69e5 torch.sgn for complex tensors (#39955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39955

resolves https://github.com/pytorch/pytorch/issues/36323 by adding `torch.sgn` for complex tensors.
`torch.sgn` returns `x/abs(x)` for `x != 0` and returns `0 + 0j` for `x==0`

This PR doesn't test the correctness of the gradients. That will be done as part of auditing all the ops in the future, once we decide the autograd behavior (JAX vs TF) and add gradcheck.
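
A small numerical sketch of the behavior described above (values are approximate):
```
import torch

z = torch.tensor([3 + 4j, 0j, -2j])
# Each element is z / abs(z); exact zeros map to 0 + 0j.
print(torch.sgn(z))  # ~ [0.6 + 0.8j, 0 + 0j, 0 - 1j]
```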

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23460526

Pulled By: anjali411

fbshipit-source-id: 70fc4e14e4d66196e27cf188e0422a335fc42f92
2020-09-22 08:24:53 -07:00
Bugra Akyildiz
1b059f2c6d Directly use work.result() to retrieve tensor rather than passing as a separate argument (#44914)
Summary:
We currently fetch an allreduced tensor from Python in C++, storing the resulting tensor in a struct member. This PR removes the extra tensor parameter from the function signature and fetches the tensor from a single place.

Fixes https://github.com/pytorch/pytorch/issues/43960

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44914

Reviewed By: rohan-varma

Differential Revision: D23798888

Pulled By: bugra

fbshipit-source-id: ad1b8c31c15e3758a57b17218bbb9dc1f61f1577
2020-09-22 06:28:47 -07:00
Jerry Zhang
5aed75b21b [quant][graphmode][jit] Try to support append (#44641)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44641

Test Plan: Imported from OSS

Reviewed By: z-a-f

Differential Revision: D23682356

fbshipit-source-id: 09a03dfde0b1346a5764e8e28ba56e32b343d239
2020-09-21 23:13:56 -07:00
Ksenija Stanojevic
0dda65ac77 [ONNX] add jit pass for lists (#43820)
Summary:
Add jit preprocessing pass for adding int lists.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43820

Reviewed By: albanD

Differential Revision: D23674598

Pulled By: bzinodev

fbshipit-source-id: 35766403a073e202563bba5251c07efb7cc5cfb1
2020-09-21 22:05:25 -07:00
Shen Li
09e7f62ce2 Fix RPC and ProcessGroup GIL deadlock (#45088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45088

Fixes #45082

Found a few problems while working on #44983

1. We deliberately swallow RPC timeouts during shutdown, as we haven't
found a good way to handle those. When we converted `_wait_all_workers`
into `_all_gather`, the same logic was inherited. However, as
`_all_gather` is meant to be used in more general scenarios, we should
no longer keep silent about errors. This commit lets the error throw
in `_all_gather` and also lets `shutdown()` catch and log it.
2. After fixing (1), I found that `UnpickledPythonCall` needs to
acquire the GIL on destruction, and this can lead to deadlock when used
in conjunction with `ProcessGroup`, because the `ProcessGroup` ctor is a
synchronization point which holds the GIL. In `init_rpc`, followers
(`rank != 0`) can exit before the leader (`rank == 0`). If the two
happen together, we can end up in this state: a follower exits `init_rpc`
after running `_broadcast_to_followers` and before reaching the dtor
of `UnpickledPythonCall`. It then runs the ctor of `ProcessGroup`,
which holds the GIL and waits for the leader to join. However, the
leader is waiting for the response from `_broadcast_to_followers`,
which is blocked by the dtor of `UnpickledPythonCall`. Hence
the deadlock. This commit drops the GIL in the `ProcessGroup` ctor.
3. After fixing (2), I found that `TensorPipe` backend
nondeterministically fails with `test_local_shutdown`, due to a
similar reason as (2), but this time it is that `shutdown()` on a
follower runs before the leader finishes `init_rpc`. This commit
adds a join for `TensorPipe` backend `init_rpc` after `_all_gather`.

The 3rd fix should be able to solve the 2nd issue as well, but since
I didn't see a reason to hold the GIL during the `ProcessGroup` ctor, I
made that change too.

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D23825592

Pulled By: mrshenli

fbshipit-source-id: 94920f2ad357746a6b8e4ffaa380dd56a7310976
2020-09-21 21:47:27 -07:00
Nikita Shulga
81bb19c9f0 [JIT] Prohibit subscripted assignments for tuple types (#44929)
Summary:
This forces jit.script to raise an error if someone tries to mutate a tuple:
```
Tuple[int, int] does not support subscripted assignment:
  File "/home/nshulga/test/tupleassignment.py", line 9
torch.jit.script
def foo(x: Tuple[int, int]) -> int:
    x[-1] = x[0] + 1
    ~~~~~ <--- HERE
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44929

Reviewed By: suo

Differential Revision: D23777668

Pulled By: malfet

fbshipit-source-id: 8efaa4167354ffb4930ccb3e702736a3209151b6
2020-09-21 16:35:44 -07:00
Ailing Zhang
92f8f75c59 Add alias dispatch key Math. (#44354)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44354

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23591481

Pulled By: ailzhang

fbshipit-source-id: 6e93c4ec99a07f3fc920ba2d09dc222e6ced5adf
2020-09-21 11:10:39 -07:00
Lucas Hosseini
ac8c7c4e9f Make Channel API accept buffer structs rather than raw pointers. (#45014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45014

Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/219

Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/212

+ Introduce buffer.h defining the buffer struct(s). The `CpuBuffer`
struct is always defined, while the `CudaBuffer` struct is defined
only when `TENSORPIPE_SUPPORTS_CUDA` is true.
+ Update all channels to take a `CpuBuffer` or `CudaBuffer` for
`send`/`recv` rather than a raw pointer and a length.
+ Make the base `Channel`/`Context` classes templated on `TBuffer`,
effectively creating two channel hierarchies (one for CPU channels,
one for CUDA channels).
+ Update the Pipe and the generic channel tests to use the new API. So
far, generic channel tests are CPU only, and tests for the CUDA IPC
channel are (temporarily) disabled. A subsequent PR will take care of
refactoring tests so that generic tests work for CUDA channels. Another
PR will add support for CUDA tensors in the Pipe.

Differential Revision: D23598033

Test Plan: Imported from OSS

Reviewed By: lw

Pulled By: beauby

fbshipit-source-id: 1d6c3f91e288420858835cd5e7962e8da051b44b
2020-09-21 10:18:45 -07:00
Nick Gibson
4bbb6adff5 [NNC] fix SyncThreads insertion and reenable CudaSharedMem test (#44909)
Summary:
A previous fix for masking Cuda dimensions (https://github.com/pytorch/pytorch/issues/44733) changed the behaviour of inserting thread synchronization barriers in the Cuda CodeGen, causing the CudaSharedMemReduce_1 test to become flaky and ultimately be disabled.

The issue is working out where these barriers must be inserted - solving this optimally is very hard, and I think not possible without dependency analysis we don't have, so I've changed our logic to be quite pessimistic. We'll insert barriers before and after any blocks that have thread dimensions masked (even between blocks that have no data dependencies). This should be correct, but it's an area we could improve performance. To address this somewhat I've added a simplifier pass that removes obviously unnecessary syncThreads.

To avoid this test being flaky again, I've added a check against the generated code to ensure there is a syncThread in the right place.

Also fixed a couple of non-functional clarity issues in the generated code: added the missing newline after Stores in the CudaPrinter, and prevented the PrioritizeLoad mutator from pulling out loads contained within simple Let statements (such as those produced by the Registerizer).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44909

Reviewed By: agolynski

Differential Revision: D23800565

Pulled By: nickgg

fbshipit-source-id: bddef1f40d8d461da965685f01d00b468d8a2c2f
2020-09-21 09:27:22 -07:00
anjali411
9f67176b82 Complex gradcheck logic (#43208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43208

This PR adds gradcheck for complex. The logic used for complex gradcheck is described in Section 3.5.3 here: https://arxiv.org/pdf/1701.00392.pdf

More concretely, this PR introduces the following changes:
1. Updates get_numerical_jacobian to take as input a scalar value for vector (v). Adds gradcheck logic for C -> C, C-> R, R -> C. For R -> C functions, only the real value of gradient is propagated.
2. Adds backward definition for `torch.complex` and also adds a test to verify the definition added.
3. Updates backward for `mul`, `sin`, `cos`, `sinh`, `cosh`.
4. Adds tests for all `torch.real`, `torch.imag`, `torch.view_as_real`, `torch.view_as_complex`, `torch.conj` (see the hedged sketch below).
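
A hedged sketch of complex gradcheck on ops from the list above (illustrative; gradcheck needs double precision):
```
import torch
from torch.autograd import gradcheck

x = torch.randn(4, dtype=torch.cdouble, requires_grad=True)

# C -> C and C -> R cases; gradcheck raises if the numerical and
# analytical Jacobians disagree.
gradcheck(torch.conj, (x,))
gradcheck(torch.real, (x,))
```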

Follow up tasks:
1. Add more thorough tests for R -> C cases. Specifically, add R -> C test variants for functions, e.g. `torch.mul(complex_tensor, real_tensor)`.
2. Add back commented test in `common_methods_invocation.py`.
3. Add more special case checking for complex gradcheck to make debugging easier.
4. Update complex autograd note.
5. disable complex autograd for operators not tested for complex.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23655088

Pulled By: anjali411

fbshipit-source-id: caa75e09864b5f6ead0f988f6368dce64cf15deb
2020-09-20 22:05:04 -07:00
Peter Bell
da7863f46b Add one dimensional FFTs to torch.fft namespace (#43011)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43011

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23751850

Pulled By: mruberry

fbshipit-source-id: 8dc5fec75102d8809eeb85a3d347ba1b5de45b33
2020-09-19 23:32:22 -07:00
Mike Ruberry
60709ad1bf Adds multiply and divide aliases (#44463)
Summary:
These aliases are consistent with NumPy. Note that C++'s naming would be different (std::multiplies and std::divides), and that PyTorch's existing names (mul and div) are consistent with Python's dunders.

This also improves the instructions for adding an alias to clarify that dispatch keys should be removed when copying native_function.yaml entries to create the alias entries.
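
A quick sketch of the aliases in use (behavior identical to the existing names):
```
import torch

a = torch.tensor([6.0, 9.0])
b = torch.tensor([3.0, 3.0])

assert torch.equal(torch.multiply(a, b), torch.mul(a, b))
assert torch.equal(torch.divide(a, b), torch.div(a, b))
```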

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44463

Reviewed By: ngimel

Differential Revision: D23670782

Pulled By: mruberry

fbshipit-source-id: 9f1bdf8ff447abc624ff9e9be7ac600f98340ac4
2020-09-19 15:47:52 -07:00
Ivan Kobzarev
e9941a5dd4 [vulkan][py] torch.utils.optimize_for_vulkan (#44903)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44903

Test Plan: Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D23766039

Pulled By: IvanKobzarev

fbshipit-source-id: dbdf484ee7d3a7719aab105efba51b92ebc51568
2020-09-18 18:20:11 -07:00
Shawn Wu
572f7e069c Enable type check for torch.testing._internal.te_utils.* (#44927)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44927

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D23776842

Pulled By: sshawnwu

fbshipit-source-id: 65c028169a37e1f2f7d9fdce8a958234ee1caa26
2020-09-18 18:09:15 -07:00
Peter Bell
fd4e21c91e Add optional string support to native_functions schema (#43010)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43010

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23751851

Pulled By: mruberry

fbshipit-source-id: 648f7430e1b7311eff28421f38e01f52d998fcbd
2020-09-18 14:57:24 -07:00
Michael Suo
374e9373b5 [jit] Pull (most) tests out of libtorch_python (#44795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44795

Today, we build our cpp tests twice, once as a standalone gtest binary,
and once linked in `libtorch_python` so we can call them from
`test_jit.py`.

This is convenient (it means that `test_jit.py` is a single entry point
for all our tests), but has a few drawbacks:
1. We can't actually use the gtest APIs, since we don't link gtest into
`libtorch_python`. We're stuck with the subset that we want to write
polyfills for, and an awkward registration scheme where you have to
write a test and then include it in `tests.h`.
2. More seriously, we register custom operators and classes in these
tests. In a world where we may be linking many `libtorch_python`s, this
has a tendency to cause errors with `libtorch`.

So now, only tests that explicitly require cooperation with Python are
built into `libtorch_python`. The rest are built into
`build/bin/test_jit`.

There are tests which require that we define custom classes and
operators. In these cases, I've built them into separate `.so`s that we
call `torch.ops.load_library()` on.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity, ZolotukhinM

Differential Revision: D23735520

Pulled By: suo

fbshipit-source-id: d146bf4e7eb908afa6f96b394e4d395d63ad72ff
2020-09-18 14:04:40 -07:00
Lucas Hosseini
af3fc9725d Extract rpc/tensorpipe_utils.{cpp,h} from rpc/utils.{cpp,h} (#44803)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44803

Test Plan: CI

Reviewed By: lw

Differential Revision: D23732022

fbshipit-source-id: 5b839c7997bbee162a14d03414ee32baabbc8ece
2020-09-18 13:51:43 -07:00
Nick Gibson
f175830558 [NNC] Fuse identical conditions in simplifier (#44886)
Summary:
Adds a pass to the IR Simplifier which fuses together the bodies of Cond statements which have identical conditions. e.g.

```
if (i < 10) {
  do_thing_1;
} else {
  do_thing_2;
}
if (i < 10) {
  do_thing_3;
}
```

is transformed into:

```
if (i < 10) {
  do_thing_1;
  do_thing_3;
} else {
  do_thing_2;
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44886

Reviewed By: glaringlee

Differential Revision: D23768565

Pulled By: nickgg

fbshipit-source-id: 3fe40d91e82bdfff8dcb8c56a02a4fd579c070df
2020-09-18 11:38:03 -07:00
Yanan Cao
174cbff00a Improve sugared value's error message (#42889)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **https://github.com/pytorch/pytorch/issues/42889 Improve sugared value's error message**

I think most (if not all) cases where this code path is reached can be attributed to closing over a global variable.
Improving the error message to make this clearer to users.

close https://github.com/pytorch/pytorch/issues/41288

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42889

Reviewed By: SplitInfinity

Differential Revision: D23779347

Pulled By: gmagogsfm

fbshipit-source-id: ced702a96234040f79eb16ad998d202e360d6654
2020-09-18 11:01:40 -07:00
Peter Bell
df39c40054 Cleanup tracer handling of optional arguments (#43009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43009

* **#43009 Cleanup tracer handling of optional arguments**

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23766621

Pulled By: mruberry

fbshipit-source-id: c1b46cd23b58b18ef4c03021b2514d7e692badb6
2020-09-18 06:54:09 -07:00
Rohan Varma
5dbcbea265 TorchScript with record_function (#44345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44345

As part of enhancing profiler support for RPC, when executing TorchScript functions over RPC, we would like to be able to support user-defined profiling scopes created by `with record_function(...)`.

Since after https://github.com/pytorch/pytorch/pull/34705, we support `with` statements in TorchScript, this PR adds support for `with torch.autograd.profiler.record_function` to be used within TorchScript.

This can be accomplished via the following without this PR:
```
torch.ops.profiler._record_function_enter(...)
# Script code, such as the forward pass
torch.ops.profiler._record_function_exit(...)
```

This is a bit hacky, and it would be much cleaner to use the context manager now that we support `with` statements. Also, the `_record_function_*` operators are internal and subject to change; this change will help avoid BC issues in the future.
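
With this change, the same scope can be expressed with the context manager; a hedged sketch (the function and scope name are hypothetical):
```
import torch
from torch.autograd.profiler import record_function

@torch.jit.script
def scripted_forward(x):
    with record_function("my_forward_scope"):
        return torch.relu(x + 1)
```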

Tested with `python test/test_jit.py TestWith.test_with_record_function -v`
ghstack-source-id: 112320645

Test Plan:
Repro instructions:
1) Change `def script_add_ones_return_any(x) -> Any` to `def script_add_ones_return_any(x) -> Tensor` in `jit/rpc_test.py`
2) `buck test mode/dev-nosan //caffe2/test/distributed/rpc:process_group_agent -- test_record_function_on_caller_rpc_async --print-passing-details`
3) The function which ideally should accept `Future[Any]` is `def _call_end_callbacks_on_future` in `autograd/profiler.py`.

python test/test_jit.py TestWith.test_with_foo -v

Reviewed By: pritamdamania87

Differential Revision: D23332074

fbshipit-source-id: 61b0078578e8b23bfad5eeec3b0b146b6b35a870
2020-09-17 18:45:00 -07:00
Yuxin Wu
9a007ba4cb [jit] stop parsing the block after seeing exit statements (#44870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44870

fix https://github.com/pytorch/pytorch/issues/44864

Test Plan: buck test mode/dev-nosan //caffe2/test:jit -- 'test_assert_is_script'

Reviewed By: eellison

Differential Revision: D23755094

fbshipit-source-id: ca3f8b27dc6f9dc9364a22a1bce0e2f588ed4308
2020-09-17 18:09:16 -07:00
Yanli Zhao
e14b2080be [reland] move rebuild buckets from end of first iteration to beginning of second iteration (#44798)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44798

[test all]

Update for relanding: in ddp.join(), moved _rebuild_buckets from end of backward to beginning of forward as well.

As part of relanding PR #41954, this refactoring moves the rebuild_buckets call from the end of the first iteration to the beginning of the second iteration
ghstack-source-id: 112279261

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D23735185

fbshipit-source-id: c26e0efeecb3511640120faa1122a2c856cd694e
2020-09-17 17:10:21 -07:00
Alex Suhan
18b77d7d17 [TensorExpr] Add Mod support to the LLVM backend (#44823)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44823

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVMElemwiseMod_LLVM

Reviewed By: glaringlee

Differential Revision: D23761996

Pulled By: asuhan

fbshipit-source-id: c3c5b2fe0d989dec04f0152ce47c5cae35ed19c9
2020-09-17 15:25:42 -07:00
Alex Suhan
f5b92332c1 [TensorExpr] Fix order comparisons for unsigned types (#44857)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44857

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVMCompareSelectByte*_LLVM

Reviewed By: glaringlee

Differential Revision: D23762162

Pulled By: asuhan

fbshipit-source-id: 1553429bd2d5292ccda57910326b8c70e4e6ab88
2020-09-17 14:16:54 -07:00
Nikita Shulga
4066022146 Do not use PRId64 in torch/csrc (#44767)
Summary:
Instead use `fmt::format()` or `%lld` and cast argument to `(long long)`
Fix typos and add helper `PyErr_SetString()` method in torch/csrc/Exceptions.h

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44767

Reviewed By: ezyang

Differential Revision: D23723671

Pulled By: malfet

fbshipit-source-id: c0101aed222184aa436b1e8768480d1531dff232
2020-09-17 14:00:02 -07:00
Alex Suhan
5d57025206 [TensorExpr] Add log1p support to the LLVM backend (#44839)
Summary:
Also corrected the Sleef_log1p registrations; the float versions had a redundant `f`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44839

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVMElemwiseLog1pFloat_LLVM

Reviewed By: glaringlee

Differential Revision: D23762113

Pulled By: asuhan

fbshipit-source-id: b5cf003b5c0c1ad549c7f04470352231929ac459
2020-09-17 13:38:35 -07:00
Yanan Cao
2558e5769d Implement sort for list of tuples (#43448)
Summary:
* Implement tuple sort by traversing the contained IValue types and generating a lambda function as the comparator for sort.
* Tuples and class objects can now nest arbitrarily within each other and still be sortable (see the sketch below).
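
A hedged TorchScript sketch of the newly sortable nesting (values are hypothetical):
```
import torch
from typing import List, Tuple

@torch.jit.script
def sort_pairs(pairs: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    pairs.sort()   # tuples are compared element-wise, as in Python
    return pairs

print(sort_pairs([(3, 0.5), (1, 2.0), (2, 1.5)]))
```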

Fixes https://github.com/pytorch/pytorch/issues/43219

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43448

Reviewed By: eellison

Differential Revision: D23352273

Pulled By: gmagogsfm

fbshipit-source-id: b6efa8d00e112178de8256da3deebdba7d06c0e1
2020-09-17 11:20:56 -07:00
Yanli Zhao
d2b4534d4d refactor intialize bucket views (#44330)
Summary:
[test all]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44330

As part of relanding PR #41954, this refactor separates initialize_bucket_views and populate_bucket_views_out, as they do different things and are called from different call sites
ghstack-source-id: 112257271

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D23583347

fbshipit-source-id: a5f2041b2c4f2c2b5faba1af834c7143eaade938
2020-09-17 09:20:23 -07:00
Yanan Cao
99093277c0 Support Python Slice class in TorchScript (#44335)
Summary:
Implements support for the [Python Slice class](https://docs.python.org/3/c-api/slice.html) (not the slice expression, which is already supported).

A Slice object can be used in any place that supports a slice expression, including multi-dim tensor slicing.
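
A hedged sketch (the function is hypothetical):
```
import torch

@torch.jit.script
def take_head(x, n: int):
    s = slice(0, n)   # a Python slice object constructed in script
    return x[s]       # usable wherever a slice expression is allowed

print(take_head(torch.arange(10), 3))  # tensor([0, 1, 2])
```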

Fixes https://github.com/pytorch/pytorch/issues/43511
Fixes https://github.com/pytorch/pytorch/issues/43125

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44335

Reviewed By: suo, jamesr66a

Differential Revision: D23682213

Pulled By: gmagogsfm

fbshipit-source-id: f74fe25370e89fbfd2b3727d95ce4e1c4ba8dec4
2020-09-17 00:41:53 -07:00
Nick Gibson
204f985fc3 [NNC] Add simplification of Loop + Condition patterns. (#44764)
Summary:
Adds a new optimization to the IRSimplifier which changes this pattern:
```
for ...
  if ...
   do thing;
```
into:
```
if ...
  for ...
    do thing;
```

Which should be almost strictly better.

There are many cases where this isn't safe to do, hence tests. Most  obviously when the condition depends on something modified within the loop.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44764

Reviewed By: mruberry

Differential Revision: D23734463

Pulled By: nickgg

fbshipit-source-id: 51617e837de96b354fb702d0090ac65ddc523d36
2020-09-16 18:41:58 -07:00
Yanan Cao
6befc09465 Fix misuse of PyObject_IsSubclass (#44769)
Summary:
PyObject_IsSubclass may set the Python live-exception bit if the given object is not a class. `IsNamedTuple` is currently using it incorrectly, which may trip all subsequent Python operations in debug builds of Python. Normal release builds are not affected because `assert` is a no-op there.

Fixes https://github.com/pytorch/pytorch/issues/43577

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44769

Reviewed By: jamesr66a

Differential Revision: D23725584

Pulled By: gmagogsfm

fbshipit-source-id: 2dabd4f8667a045d5bf75813500876c6fd81542b
2020-09-16 16:19:01 -07:00
Mingzhe Li
574f9af160 [NCCL] Add option to run NCCL on high priority cuda stream (#43796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43796

This diff adds an option for the process group NCCL backend to pick high-priority CUDA streams; a hedged sketch follows below.
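
A hedged sketch of how the option might be toggled; the constructor and field names are assumed for illustration and are not taken from this diff:
```
import torch.distributed as dist

store = dist.FileStore("/tmp/nccl_store", 1)
opts = dist.ProcessGroupNCCL.Options()
opts.is_high_priority_stream = True          # assumed option name
pg = dist.ProcessGroupNCCL(store, 0, 1, opts)
```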

Test Plan: waitforsandcastle

Reviewed By: jiayisuse

Differential Revision: D23404286

fbshipit-source-id: b79ae097b7cd945a26e8ba1dd13ad3147ac790eb
2020-09-16 16:00:41 -07:00
Alex Suhan
7b3432caff [TensorExpr] Support boolean in simplifier (#44659)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44659

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.ConstantFoldCastToBool

Reviewed By: ngimel

Differential Revision: D23714675

Pulled By: asuhan

fbshipit-source-id: 4c18d972b628d5ad55bad58eddd5f6974e043d9c
2020-09-16 15:30:19 -07:00
Nick Gibson
82ab167cce [NNC] Fix masking for all block and thread dimensions in CudaCodeGen (#44733)
Summary:
Unifies a number of partial solutions to the thread and block dimension extent masking, including the NoThreadIdxWriter and my last fix https://github.com/pytorch/pytorch/issues/44325. The NoThreadIdxWriter is gone in favour of tracking the current loop extents and masking any statements that have a lower rank than the launch parameters in any Block or Thread dimension, which handles both the "no" and "smaller" axis binding cases.

For example it will transform the following:
```
for i in 0..10 // blockIdx.x
  for j in 0..10 // threadIdx.x
    do thing(i, j);
  for k in 0..5 // threadIdx.x
    do other thing(i, k);
```

Into:
```
do thing(blockIdx.x, threadIdx.x);
if (threadIdx.x < 5) {
  do other thing(blockIdx.x, threadIdx.x);
}
```

And handle the case where statements are not bound by any axis, eg.
```
do outer thing;
for i in 0..10 // blockIdx.x
  for j in 0..10 // threadIdx.x
    do thing(i, j);
  do other thing(i);
```

will become:

```
if (blockIdx.x < 1) {
  if (threadIdx.x < 1) {
    do outer thing;
  }
}
syncthreads();
do thing(blockIdx.x, threadIdx.x);
syncthreads();
if (threadIdx.x < 1) {
  do other thing(blockIdx.x);
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44733

Reviewed By: mruberry

Differential Revision: D23736878

Pulled By: nickgg

fbshipit-source-id: 52d08626ae8043d53eb937843466874d479a6768
2020-09-16 14:23:47 -07:00
Yi Wang
f3bd984e44 Move the description comment of compute_bucket_assignment_by_size from cpp to the header file. (#44703)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44703

The description of this public function should be in the header file.

Also fix some typos.

Test Plan: N/A.

Reviewed By: pritamdamania87

Differential Revision: D23703661

fbshipit-source-id: 24ae63de9498e321b31dfb2efadb44183c6370df
2020-09-16 13:44:14 -07:00
Shen Li
924717bf51 Add _get_type() API to RRef (#44663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44663

The new API returns the type of the data object referenced by this
`RRef`. On the owner, this is the same as `type(rref.local_value())`.
On a user, this will trigger an RPC to fetch the `type` object from
the owner. After this function is run once, the `type` object is
cached by the `RRef`, and subsequent invocations no longer trigger
RPC.
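
A hedged owner-side sketch (the API is internal; `rpc.init_rpc(...)` is assumed to have been called already):
```
import torch
import torch.distributed.rpc as rpc

rref = rpc.RRef(torch.ones(2, 2))
# On the owner this is equivalent to type(rref.local_value()).
assert rref._get_type() is type(rref.local_value())
```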

closes #33210

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23691990

Pulled By: mrshenli

fbshipit-source-id: a2d87cd601a691dd75164b6bcd7315245e9cf6bd
2020-09-16 11:59:22 -07:00
Mikhail Zolotukhin
d66520ba08 [TensorExpr] Fuser: try merging adjacent fusion groups. (#43671)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43671

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23360796

Pulled By: ZolotukhinM

fbshipit-source-id: 60ec318fe77ae9f2c821d9c4d106281845266e0f
2020-09-15 21:31:02 -07:00
Ailing Zhang
fb085d90e3 Revert D23583017: move rebuild buckets from end of first iteration to beginning of second iteration
Test Plan: revert-hammer

Differential Revision:
D23583017 (f5d231d593)

Original commit changeset: ef67f79437a8

fbshipit-source-id: fd914b7565aba6a5574a32b31403525abb80ff07
2020-09-15 15:10:52 -07:00
Dmytro Dzhulgakov
2f4c31ce3a [jit] Speed up saving in case of many classes (#44589)
Summary:
There's an annoying O(N^2) in the module export logic that makes saving some models (if they have many classes) take an eternity.

I'm not familiar enough with this code to properly untangle the deps and make it a pure hash lookup. So I just added a side lookup table for raw pointers. It's still quadratic, but it's O(num_classes^2) instead of O(num_classes * num_references), which already gives huge savings.
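The actual change lives in the C++ export code; the Python sketch below only illustrates the side-lookup-table idea of memoizing by object identity (names here are hypothetical):

```python
# Illustration only, not the actual implementation: memoize classes by
# identity so repeated exports of the same class become a dictionary lookup
# instead of a linear scan over all known references.
seen = {}    # id(cls) -> position in `order`
order = []   # classes in first-seen order

def intern_class(cls):
    key = id(cls)
    if key not in seen:
        seen[key] = len(order)
        order.append(cls)
    return seen[key]
```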

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44589

Test Plan:
Tested with one of the offending models - just loading and saving a TorchScript file:

```
Before:
load 1.9239683151245117
save 165.74712467193604

After:
load 1.9409027099609375
save 1.4711427688598633
```

Reviewed By: suo

Differential Revision: D23675278

Pulled By: dzhulgakov

fbshipit-source-id: 8f3fa7730941085ea20d9255b49a149ac1bf64fe
2020-09-15 15:10:45 -07:00
Nick Gibson
69839ea3f6 [NNC] make inlining immediate (take 3) (#44231)
Summary:
This is a reup of https://github.com/pytorch/pytorch/issues/43885 with an extra commit which should fix the bugs that caused it to be reverted. Read that PR for general context.

The issue here was that we were still using the side maps `tensor_to_stmt_` and `stmt_to_tensor_` which get invalidated by any transform of the IR (rather than just any transform that isn't computeInline). I added a comment about this but didn't actually address our usages of it.

I've removed these maps and changed the `getLoopBodyFor` and `getLoopStatementsFor` helpers to search the root stmt directly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44231

Reviewed By: albanD

Differential Revision: D23689688

Pulled By: nickgg

fbshipit-source-id: 1c6009a880f8c0cebf2300fd06b5cc9322bffbf9
2020-09-15 11:12:24 -07:00
Elias Ellison
8df0400a50 Fix fallback graph in specialize autogradzero (#44654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44654

Previously we weren't creating a fallback graph as intended in specialize autograd zero, so if a Tensor failed one of our undefinedness checks we would run the backward normally without reprofiling & optimizing.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23691764

Pulled By: eellison

fbshipit-source-id: 10c6fa79518c84a6f5ef2bfbd9ea10843af751eb
2020-09-15 11:12:20 -07:00
Yanli Zhao
f5d231d593 move rebuild buckets from end of first iteration to beginning of second iteration (#44326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44326

Part of relanding PR #41954, this refactoring moves the rebuild_buckets call from the end of the first iteration to the beginning of the second iteration.
ghstack-source-id: 112011490

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D23583017

fbshipit-source-id: ef67f79437a820d9b5699b651803622418499a83
2020-09-15 09:51:33 -07:00
Meghan Lele
e7d782e724 [JIT] Add property support for ScriptModules (#42390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42390

**Summary**
This commit extends support for properties to include
ScriptModules.

**Test Plan**
This commit adds a unit test that has a ScriptModule with
a user-defined property.

`python test/test_jit_py3.py TestScriptPy3.test_module_properties`

Test Plan: Imported from OSS

Reviewed By: eellison, mannatsingh

Differential Revision: D22880298

Pulled By: SplitInfinity

fbshipit-source-id: 74f6cb80f716084339e2151ca25092b6341a1560
2020-09-14 18:49:21 -07:00
Elias Ellison
551494b01d [JIT] Fix torch.tensor for empty multidimensional-typed lists (#44652)
Summary:
We were hitting an assertion error when an empty `List[List[int]]` was passed in - this fixes that error by not recursing into 0-element tensors.
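A minimal repro sketch of the fixed case (assuming a build that includes this change):

```python
import torch
from typing import List

@torch.jit.script
def make_empty() -> torch.Tensor:
    xs: List[List[int]] = []
    # Previously this tripped an internal assert; now it yields an empty tensor.
    return torch.tensor(xs)

print(make_empty().numel())  # 0
```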

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44652

Reviewed By: ZolotukhinM

Differential Revision: D23688247

Pulled By: eellison

fbshipit-source-id: d48ea24893044fae96bc39f76c0f1f9726eaf4c7
2020-09-14 17:28:23 -07:00
Mike Ruberry
686e281bcf Updates div to perform true division (#42907)
Summary:
This PR:

- updates div to perform true division
- makes torch.true_divide an alias of torch.div

This follows on work in previous PyTorch releases that first deprecated div performing "integer" or "floor" division, then prevented it by throwing a runtime error.
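For example, on a build that includes this change:

```python
import torch

a = torch.tensor([5])
b = torch.tensor([2])

print(torch.div(a, b))          # tensor([2.5000]): true division, even for integer inputs
print(torch.true_divide(a, b))  # same result; torch.true_divide is now an alias of torch.div
```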

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42907

Reviewed By: ngimel

Differential Revision: D23622114

Pulled By: mruberry

fbshipit-source-id: 414c7e3c1a662a6c3c731ad99cc942507d843927
2020-09-14 15:50:38 -07:00
BowenBao
43406e218a [ONNX] Update ONNX shape inference (#43929)
Summary:
* Support sequence type (de)serialization, enables onnx shape inference on sequence nodes.
* Fix shape inference with block input/output: e.g. Loop and If nodes.
* Fix bugs in symbolic discovered by coverage of onnx shape inference.
* Improve debuggability: added more jit logs. For simplicity, the default log level, when jit log is enabled, will not dump ir graphs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43929

Reviewed By: albanD

Differential Revision: D23674604

Pulled By: bzinodev

fbshipit-source-id: ab6aacb16d0e3b9a4708845bce27c6d65e567ba7
2020-09-14 15:36:19 -07:00
Alex Suhan
a188dbdf3f Check for index-rank consistency in FunctionInliner (#44561)
Summary:
When caller / callee pairs are inserted into the mapping, verify that
the arity of the buffer access is consistent with its declared rank.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44561

Test Plan: CI, test_tensorexpr --gtest_filter=TensorExprTest.DetectInlineRankMismatch

Reviewed By: albanD

Differential Revision: D23684342

Pulled By: asuhan

fbshipit-source-id: dd3a0cdd4c2492853fa68381468e0ec037136cab
2020-09-14 14:07:22 -07:00
Zafar
9d4943daaf [quant] conv_transpose1d / conv_transpose2d (#40370)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40370

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22158979

Pulled By: z-a-f

fbshipit-source-id: f5cb812c9953efa7608f06cf0188de447f73f358
2020-09-14 13:45:28 -07:00
Raghavan Raman
ad7a2eb1c9 Simplify nested Min and Max patterns. (#44142)
Summary:
Improve simplification of nested Min and Max patterns.

Specifically, handles the following pattern simplications:
  * `Max(A, Max(A, Const)) => Max(A, Const)`
  * `Max(Min(A, B), Min(A, C)) => Min(A, Max(B, C))`
  * `Max(Const, Max(A, OtherConst)) => Max(A, Max(Const, OtherConst))`
     - This case can have an arbitrarily long chain of Max ops. For example: `Max(5, Max(x, Max(y, Max(z, 8)))) => Max(Max(Max(x, 8), y), z)`

Similarly, for the case of Min as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44142

Reviewed By: albanD

Differential Revision: D23644486

Pulled By: navahgar

fbshipit-source-id: 42bd241e6c2af820566744c8494e5dee172107f4
2020-09-14 13:24:46 -07:00
Bram Wasti
a475613d1d [static runtime] Swap to out-variant compatible nodes (#44127)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44127

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D23604306

Pulled By: bwasti

fbshipit-source-id: 18ccfb9b466b822e28130be3d5c4fae36c76820b
2020-09-14 12:38:25 -07:00
Elias Ellison
856510c96d [JIT] Dont optimize shape info in batch_mm (#44565)
Summary:
We run the remove-profile-nodes and specialize-types passes before batch_mm, so we cannot run peepholes on the type information of tensors: these properties have not been guarded, and thus are not guaranteed to be correct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44565

Reviewed By: albanD

Differential Revision: D23661538

Pulled By: eellison

fbshipit-source-id: 0dd23a65714f047f49b4db4ec582b21870925fe1
2020-09-14 12:34:20 -07:00
Jeremy Lilley
7040a070e3 [torch] Minor: Avoid ostringstream in Operator's canonicalSchemaString() (#44442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44442

I noticed lock contention on startup as lookupByLiteral() was
calling registerPendingOperators() - some calls were holding the
lock for 10+ ms, as operators were being registered.

canonicalSchemaString() was using ostringstream, which isn't typically
particularly fast (partly because of C++ spec locale requirements).
If we replace it with regular C++ string appends, it's somewhat faster
(which isn't hard when comparing against stringstream, albeit with a bit
more codegen).

Over the first minute or so, this cuts out 1.4 seconds under the
OperatorRegistry lock (as part of registerPendingOperators) in the
first couple minutes of run time (mostly front-loaded) when running
sync sgd.

As an example, before:
   registerPendingOperators 12688 usec for 2449 operators
After:
   registerPendingOperators 6853 usec for 2449 operators
ghstack-source-id: 111862971

Test Plan: buck test mode/dev-nosan caffe2/test/cpp/...

Reviewed By: ailzhang

Differential Revision: D23614515

fbshipit-source-id: e712f9dac5bca0b1876e11fb8f0850402f03873a
2020-09-14 08:24:16 -07:00
Martin Yuan
7862827269 [pytorch] Add variadic run_method for lite intepreter (#44337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44337

Add a new run_method to mobile Module which is variadic (takes any number of arguments) to match full jit.
ghstack-source-id: 111909068

Test Plan: Added new unit test to test_jit test suite

Reviewed By: linbinyu, ann-ss

Differential Revision: D23585763

fbshipit-source-id: 007cf852290f03615b78c35aa6f7a21287ccff9e
2020-09-13 13:26:30 -07:00
Mikhail Zolotukhin
bcf97b8986 [JIT] Cleanup some places where we log graphs in executors. (#44588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44588

1) SOURCE_DUMP crashes when invoked on a backward graph since
   `prim::GradOf` nodes can't be printed as sources (they don't have
   schema).
2) Dumping graph each time we execute an optimized plan produces lots of
   output in tests where we run the graph multiple times (e.g.
   benchmarks). Outputting that on the least level of verbosity seems
   like an overkill.
3) Duplicated log statement is removed.

Differential Revision: D23666812

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: b9a30e34fd39c85f3e13c3f1e3594e157e1c130f
2020-09-13 11:31:02 -07:00
Mikhail Zolotukhin
82da6b3702 [JIT] Fix jit-log verbosity selection logic. (#44587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44587

Currently it's skewed by one.

The following test demonstrates it:
```
$ cat test.py

import torch
def foo(a,b):
    return a*a*b
torch._C._jit_set_profiling_executor(True)
torch._C._jit_set_profiling_mode(True)
torch._C._jit_override_can_fuse_on_cpu(True)
torch._C._jit_set_texpr_fuser_enabled(True)
f = torch.jit.script(foo)
for _ in range(10):
    f(torch.rand(10), torch.rand(10))

$ cat test_logging_levels.sh

PYTORCH_JIT_LOG_LEVEL="tensorexpr_fuser"    python test.py 2>&1 | grep DUMP   >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL="tensorexpr_fuser"    python test.py 2>&1 | grep UPDATE >& /dev/null && echo FAIL || echo OK
PYTORCH_JIT_LOG_LEVEL="tensorexpr_fuser"    python test.py 2>&1 | grep DEBUG  >& /dev/null && echo FAIL || echo OK

PYTORCH_JIT_LOG_LEVEL=">tensorexpr_fuser"   python test.py 2>&1 | grep DUMP   >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL=">tensorexpr_fuser"   python test.py 2>&1 | grep UPDATE >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL=">tensorexpr_fuser"   python test.py 2>&1 | grep DEBUG  >& /dev/null && echo FAIL || echo OK

PYTORCH_JIT_LOG_LEVEL=">>tensorexpr_fuser"  python test.py 2>&1 | grep DUMP   >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL=">>tensorexpr_fuser"  python test.py 2>&1 | grep UPDATE >& /dev/null && echo OK || echo FAIL
PYTORCH_JIT_LOG_LEVEL=">>tensorexpr_fuser"  python test.py 2>&1 | grep DEBUG  >& /dev/null && echo OK || echo FAIL
```

Before this change:
```
OK
FAIL
OK
OK
OK
FAIL
OK
OK
OK
```

With this change everything passes.

Differential Revision: D23666813

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: 4adaa5a3d06deadf54eae014a0d76588cdc5e20a
2020-09-13 11:29:25 -07:00
Bert Maher
6d4a605ce9 Fix bug simplifying if-then-else when it can be removed (#44462)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44462

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23671157

Pulled By: bertmaher

fbshipit-source-id: b9b92ad0de1a7bd9bc1fcac390b542d885d0ca58
2020-09-13 10:29:28 -07:00
Yi Wang
82b4477948 Pass the input tensor vector by const reference. (#44340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44340

Changed the constructor of GradBucket to pass the input by const
reference and hence avoided unnecessary explicit move semantics. Since
the declaration and definition were previously separated, passing the input
tensor vector by value looked quite bizarre.

Test Plan: buck test caffe2/torch/lib/c10d:ProcessGroupGlooTest

Reviewed By: pritamdamania87

Differential Revision: D23569939

fbshipit-source-id: db761d42e76bf938089a0b38e98e76a05bcf4162
2020-09-11 18:03:56 -07:00
Yi Wang
ab5fee2784 Move the inline implementations of GradBucket class to the header. (#44339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44339

Moved the inline implementations of GradBucket class to the header for
succinctness and readability. This coding style is also consistent with
reducer.h under the same directory.

Test Plan: buck test caffe2/torch/lib/c10d:ProcessGroupGlooTest

Reviewed By: pritamdamania87

Differential Revision: D23569701

fbshipit-source-id: 237d9e2c5f63a6bcac829d0fcb4a5ba3bede75e5
2020-09-11 18:01:37 -07:00
Elias Ellison
1f0dcf39fc [JIT] dont optimize device dtype on inline (#43363)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/36404

Adding prim::device and prim::dtype to the list of skipped peepholes when we run inlining. In the long term, another fix may be to not encode shape / dtype info on the traced graph, because it is not guaranteed to be correct. This is currently blocked by ONNX.
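For reference, a small sketch of the kind of scripted code that produces these nodes:

```python
import torch

@torch.jit.script
def to_like(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # x.device and x.dtype lower to prim::device / prim::dtype nodes in the IR.
    # Folding them to constants when inlining would bake in whatever device and
    # dtype happened to be recorded at trace time.
    return y.to(device=x.device, dtype=x.dtype)
```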

Partial fix for https://github.com/pytorch/pytorch/issues/43134

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43363

Reviewed By: glaringlee

Differential Revision: D23383987

Pulled By: eellison

fbshipit-source-id: 2e9c5160d39d690046bd9904be979d58af8d3a20
2020-09-11 17:29:54 -07:00
Mikhail Zolotukhin
d729e2965e [TensorExpr] Do not inline autodiff graphs if they contain prim::TypeCheck nodes. (#44564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44564

Before this change we sometimes inlined autodiff subgraphs containing
fusion groups. This happened because we didn't look for 'unsupported'
nodes recursively (maybe we should), while the fusion groups were inside
if-nodes.

The problem was detected by bertmaher in the 'LearningToPaint' benchmark
investigation, where this bug caused us to constantly hit fallback
paths of the graph.

Test Plan: Imported from OSS

Reviewed By: bwasti

Differential Revision: D23657049

Pulled By: ZolotukhinM

fbshipit-source-id: 7c853424f6dce4b5c344d6cd9c467ee04a8f167e
2020-09-11 17:28:53 -07:00
Nick Gibson
64b4307d47 [NNC] Cuda Codegen - mask loops bound to block/thread dimensions (#44325)
Summary:
Fix an issue where loops of different sizes are bound to the same Cuda dimension / metavar.

More info and tests coming soon...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44325

Reviewed By: colesbury

Differential Revision: D23628859

Pulled By: nickgg

fbshipit-source-id: 3621850a4cc38a790b62ad168d32e7a0e2462fad
2020-09-11 16:48:16 -07:00
Nikita Shulga
2ae74c0632 Compile less legacy code when BUILD_CAFFE2 is set to False (take 2) (#44453)
Summary:
2nd attempt to land https://github.com/pytorch/pytorch/pull/44079

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44453

Reviewed By: walterddr, seemethere

Differential Revision: D23619528

Pulled By: malfet

fbshipit-source-id: c7c206ebd327dcf3994789bd47008b05ff862fe7
2020-09-11 16:27:47 -07:00
Zafar
1fb5883072 removing conv filters from conv pattern matching (#44512)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44512

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23637409

Pulled By: z-a-f

fbshipit-source-id: ad5be0fa6accfbcceaae9171bf529772d87b4098
2020-09-11 15:16:29 -07:00
Wanchao Liang
ab6126b50e [rpc][jit] support remote call in TorchScript (#43046)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43046

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23621108

Pulled By: wanchaol

fbshipit-source-id: e8152c6cdd3831f32d72d46ac86ce22f3f13c651
2020-09-11 14:59:51 -07:00
Wanchao Liang
3e5df5f216 [rpc][jit] support rpc_sync in TorchScript (#43043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43043

This adds support for rpc_sync in TorchScript in a way similar to
rpc_async.
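A rough usage sketch (the worker name is hypothetical; RPC initialization is assumed):

```python
import torch
import torch.distributed.rpc as rpc

@torch.jit.script
def add_one(x: torch.Tensor) -> torch.Tensor:
    return x + 1

@torch.jit.script
def call_remote(x: torch.Tensor) -> torch.Tensor:
    # Unlike rpc_async, rpc_sync blocks until the remote result is available,
    # so no explicit wait on a Future is needed.
    return rpc.rpc_sync("worker1", add_one, args=(x,))
```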

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23252039

Pulled By: wanchaol

fbshipit-source-id: 8a05329cb8a24079b2863178b73087d47273914c
2020-09-11 14:59:47 -07:00
Gregory Chanan
5579b53a7f Fix SmoothL1Loss when target.requires_grad is True. (#44486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44486

SmoothL1Loss had a completely different (and incorrect, see #43228) path when target.requires_grad was True.

This PR does the following:

1) adds derivative support for target via the normal derivatives.yaml route
2) kill the different (and incorrect) path for when target.requires_grad was True
3) modify the SmoothL1Loss CriterionTests to verify that the target derivative is checked.
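A minimal sketch of the kind of check (3) refers to, using gradcheck in double precision (illustration only, not the actual test code):

```python
import torch
import torch.nn.functional as F

inp = torch.randn(3, 5, dtype=torch.double, requires_grad=True)
target = torch.randn(3, 5, dtype=torch.double, requires_grad=True)

# The derivative w.r.t. target now flows through the regular derivatives.yaml path.
assert torch.autograd.gradcheck(lambda i, t: F.smooth_l1_loss(i, t), (inp, target))
```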

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23630699

Pulled By: gchanan

fbshipit-source-id: 0f94d1a928002122d6b6875182867618e713a917
2020-09-11 12:13:36 -07:00
Cheng Chang
b7ef4eec46 [NNC] Add loop slicing transforms (#43854)
Summary:
Add new transforms `sliceHead` and `sliceTail` to `LoopNest`, for example:

Before transformation:
```
for x in 0..10:
  A[x] = x*2
```

After `sliceHead(x, 4)`:

```
for x in 0..4:
  A[x] = x*2
for x in 4..10:
  A[x] = x*2
```

After `sliceTail(x, 1)`:
```
for x in 0..4:
  A[x] = x*2
for x in 4..9:
  A[x] = x*2
for x in 9..10:
  A[x] = x*2
```

`sliceHead(x, 10)` and `sliceTail(x, 10)` is no-op.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43854

Test Plan: Tests are added in `test_loopnest.cpp`, the tests cover the basic transformations, and also tests the combination with other transformations such as `splitWithTail`.

Reviewed By: nickgg

Differential Revision: D23417366

Pulled By: cheng-chang

fbshipit-source-id: 06c6348285f2bafb4be3286d1642bfbe1ea499bf
2020-09-11 12:09:12 -07:00
Ann Shan
442957d8b6 [pytorch] Remove mobile nonvariadic run_method (#44235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44235

Removes nonvariadic run_method() from mobile Module entirely (to be later replaced by a variadic version). All use cases should have been migrated to use get_method() and Method::operator() in D23436351
ghstack-source-id: 111848220

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D23484577

fbshipit-source-id: 602fcde61e13047a34915b509da048b9550103b1
2020-09-11 10:23:08 -07:00
Ann Shan
a61318a535 [pytorch] Replace mobile run_method with get_method and operator() (#44202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44202

In preparation for changing mobile run_method() to be variadic, this diff:

* Implements get_method() for mobile Module, which is similar to find_method but expects the method to exist.
* Replaces calls to the current nonvariadic implementation of run_method() by calling get_method() and then invoking the operator() overload on Method objects.
ghstack-source-id: 111848222

Test Plan: CI, and all the unit tests which currently contain run_method that are being changed.

Reviewed By: iseeyuan

Differential Revision: D23436351

fbshipit-source-id: 4655ed7182d8b6f111645d69798465879b67a577
2020-09-11 10:23:06 -07:00
Martin Yuan
b73b44f976 [PyTorch Mobile] Move some string ops to register_prim_ops.cpp and make them selective (#44500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44500

Some user models are using those operators. Unblock them while keeping the ops selective.

Test Plan: CI

Reviewed By: linbinyu

Differential Revision: D23634769

fbshipit-source-id: 55841d1b07136b6a27b6a39342f321638dc508cd
2020-09-11 09:24:35 -07:00
Gregory Chanan
d07d25a8c5 Fix MSELoss when target.requires_grad is True. (#44437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44437

MSELoss had a completely different (and incorrect, see https://github.com/pytorch/pytorch/issues/43228) path when target.requires_grad was True.

This PR does the following:
1) adds derivative support for target via the normal derivatives.yaml route
2) kill the different (and incorrect) path for when target.requires_grad was True
3) modify the MSELoss CriterionTests to verify that the target derivative is checked.

TODO:
1) do we still need check_criterion_jacobian when we run grad/gradgrad checks?
2) ensure the Module tests check when target.requires_grad
3) do we actually test when reduction='none' and reduction='mean'?

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23612166

Pulled By: gchanan

fbshipit-source-id: 4f74d38d8a81063c74e002e07fbb7837b2172a10
2020-09-11 08:51:28 -07:00
Shen Li
a9754fb860 Use TP Tensor.metadata to carry device info (#44396)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44396

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D23602576

Pulled By: mrshenli

fbshipit-source-id: c639789979b2b71fc165efbcf70f37b4c39469df
2020-09-11 08:33:22 -07:00
lixinyu
77cc7d1ecd C++ APIs Transformer NN Module Top Layer (#44333)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44333

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23584010

Pulled By: glaringlee

fbshipit-source-id: 990026e3f1b5ae276776e344ea981386cb7528fe
2020-09-11 08:25:27 -07:00
Nick Gibson
30fccc53a9 [NNC] Don't attempt to refactor conditional scalars (#44223)
Summary:
Fixes a bug in the NNC registerizer for Cuda where it would hoist reads out of a conditional context when trying to cache them. As a quick fix, prevent scalar replacement if a usage is within a condition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44223

Reviewed By: gchanan

Differential Revision: D23551247

Pulled By: nickgg

fbshipit-source-id: 17a7bf2be4c8c3dd8a9ab7997dce9aea200c3685
2020-09-11 04:22:16 -07:00
Elias Ellison
8b8986662f [JIT] Remove profiling nodes in autodiff forward graph (#44420)
Summary:
Previously we were not removing profiling nodes in graphs that required grad and contained diff graphs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44420

Reviewed By: bertmaher

Differential Revision: D23607482

Pulled By: eellison

fbshipit-source-id: af095f3ed8bb3c5d09610f38cc7d1481cbbd2613
2020-09-11 02:59:39 -07:00
Mikhail Zolotukhin
c6febc6480 [JIT] Add a python hook for a function to interpret JIT graphs. (#44493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44493

This function allows executing a graph exactly as it is, without going
through a graph executor, which would run passes on the graph before
interpreting it. I found this feature extremely helpful when I worked on
a stress-testing script to shake out bugs from the TE fuser: I needed to
run a very specific set of passes on a graph and nothing else, and
then execute exactly that graph.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23632505

Pulled By: ZolotukhinM

fbshipit-source-id: ea81fc838933743e2057312d3156b77284d832ef
2020-09-11 02:55:26 -07:00
Pritam Damania
51ed31269e Replace FutureMessage with c10::ivalue::Future in DistEngine. (#44239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44239

As part of https://github.com/pytorch/pytorch/issues/41574, use
c10::ivalue::Future everywhere in DistEngine.
ghstack-source-id: 111645070

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D23553507

fbshipit-source-id: 1b51ba13d1ebfa6c5c70b12028e9e96ce8ba51ff
2020-09-11 01:03:42 -07:00
Richard Zou
69f6d94caa Register diag_backward, diagonal_backward, infinitetely...gelu_backward as operators (#44422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44422

See #44052 for context.

Test Plan:
- `pytest test/test_autograd.py -v`
- `pytest test/test_nn.py -v`

Reviewed By: mrshenli

Differential Revision: D23607691

Pulled By: zou3519

fbshipit-source-id: 09fbcd66b877af4fa85fd9b2f851ed3912ce84d6
2020-09-10 18:43:18 -07:00
Richard Zou
7ff7e6cfc8 Register cummaxmin_backward, cumprod_backward as operators (#44410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44410

See #44052 for context. One of the cumprod_backward overloads was unused
so I just deleted it.

Test Plan: - `pytest test/test_autograd.py -v`

Reviewed By: mrshenli

Differential Revision: D23605503

Pulled By: zou3519

fbshipit-source-id: f9c5b595e62d2d6e71f26580ba96df15cc9de4f7
2020-09-10 18:43:15 -07:00
Richard Zou
08b431f54c Add trace_backward, masked_select_backward, and take_backward as ops (#44408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44408

See #44052 for context.

Test Plan: - `pytest test/test_autograd.py -v`

Reviewed By: mrshenli

Differential Revision: D23605504

Pulled By: zou3519

fbshipit-source-id: b9b1646d13caa6e536d08669c29bfc2ad8ff89a3
2020-09-10 18:41:07 -07:00
Ann Shan
1dd3fae3d2 [pytorch] Add logging to mobile Method run (#44234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44234

Changes mobile Method to point to a mobile Module directly instead of the Module ivalue in order to access metadata for logging/debugging, and then adds said logging.
ghstack-source-id: 111775806

Test Plan:
CI/existing unit tests to test BC
Testing fb4a logging:
Built fb4a on D23436351 (because usage of run_method isn't replaced yet in this diff), and then checked the Scuba logs to see that the appropriate ad clicks were logged (one ad for Buzzfeed shopping and another about Netflix from Bustle)

{F328510687}
{F328511201}
[Scuba sample of QPL metrics](https://www.internalfb.com/intern/scuba/query/?dataset=qpl_metrics%2Fpytorch_employee&pool=uber&view=samples_client&drillstate=%7B%22sampleCols%22%3A[%22device_model%22%2C%22instance_id_sampled%22%2C%22method%22%2C%22ios_device_class%22%2C%22points_path%22%2C%22userid_sampled%22%2C%22client_sample_rate%22%2C%22browser_name%22%2C%22ios_device_name%22%2C%22points%22%2C%22is_employee%22%2C%22is_test_user%22%2C%22network_only_queries%22%2C%22annotations%22%2C%22oncall_shortname%22%2C%22environment_tags%22%2C%22revoked_queries%22%2C%22annotations_bool%22%2C%22points_data%22%2C%22annotations_double_array%22%2C%22annotations_string_array%22%2C%22revoked_steps%22%2C%22points_set%22%2C%22device_os_version%22%2C%22ota_version_rollout%22%2C%22steps%22%2C%22vadar_calculation_result%22%2C%22app_name%22%2C%22client_push_phase%22%2C%22vadar%22%2C%22release_channel%22%2C%22interaction_class%22%2C%22exposures%22%2C%22annotations_double%22%2C%22deviceid_sampled%22%2C%22is_logged_in%22%2C%22device_os%22%2C%22time%22%2C%22major_os_ver%22%2C%22annotations_int_array%22%2C%22duration_ns%22%2C%22app_build%22%2C%22bucket_id%22%2C%22cache_and_network_queries%22%2C%22value%22%2C%22vadar_v2%22%2C%22quicklog_event%22%2C%22unixname%22%2C%22vadar_calculation_result_v2%22%2C%22trace_tags%22%2C%22annotations_int%22%2C%22quicklog_module%22%2C%22push_phase%22%2C%22year_class%22%2C%22country%22%2C%22capped_duration%22%2C%22ram_class%22%2C%22weight%22%2C%22carrier%22%2C%22app_id%22%2C%22app_version%22%2C%22react_bundle_version%22%2C%22logging_source%22%2C%22is_unsampled_for_scuba%22%2C%22instrumentation_errors%22%2C%22android_cpu_abi_list%22%2C%22days_after_release%22%2C%22cpu_cores%22%2C%22user_bucket%22%2C%22quicklog_action%22%2C%22server_scuba_sample_rate%22%2C%22points_vector%22%2C%22annotations_bool_array%22%2C%22android_device_class%22%2C%22browser_full_version%22%2C%22major_app_ver%22]%2C%22derivedCols%22%3A[]%2C%22mappedCols%22%3A[]%2C%22enumCols%22%3A[]%2C%22hideEmptyColumns%22%3Afalse%2C%22focused_event%22%3A%22%22%2C%22show_metadata%22%3A%22false%22%2C%22start%22%3A%222020-09-08%2011%3A27%3A00%22%2C%22end%22%3A%22start%20%2B%201%20minute%22%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22samplingRatio%22%3A%221%22%2C%22num_samples%22%3A%22100%22%2C%22aggregateList%22%3A[]%2C%22param_dimensions%22%3A[]%2C%22modifiers%22%3A[]%2C%22order%22%3A%22none%22%2C%22order_desc%22%3Atrue%2C%22filterMode%22%3A%22DEFAULT%22%2C%22constraints%22%3A[[%7B%22column%22%3A%22quicklog_event%22%2C%22op%22%3A%22eq%22%2C%22value%22%3A[%22[%5C%22MOBILE_MODULE_STATS%5C%22]%22]%7D%2C%7B%22column%22%3A%22userid_sampled%22%2C%22op%22%3A%22eq%22%2C%22value%22%3A[%22[%5C%22100013484978975%5C%22]%22]%7D]]%2C%22c_constraints%22%3A[[]]%2C%22b_constraints%22%3A[[]]%2C%22metrik_view_params%22%3A%7B%22should_use_legacy_colors%22%3Afalse%2C%22columns_skip_formatting%22%3A[]%2C%22view%22%3A%22samples_client%22%2C%22width%22%3A%221358%22%2C%22height%22%3A%22912%22%2C%22tableID%22%3A%22qpl_metrics%2Fpytorch_employee%22%2C%22fitToContent%22%3Afalse%2C%22format_tooltip_in_percent%22%3Afalse%2C%22use_y_axis_hints_as_limits%22%3Atrue%2C%22has_dynamic_context_menu%22%3Atrue%2C%22has_context_menu%22%3Afalse%2C%22legend_mode%22%3A%22nongrid%22%2C%22connect_nulls%22%3Atrue%2C%22timezone_offset%22%3A420%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22y_min_hint%22%3A0%2C%22should_render_plugins_menu%22%3Afalse%7D%7D&normalized=1599581160)
[Scuba sample showing ad source; just the bottom two results](https://www.internalfb.com/intern/scuba/query/?dataset=business_integrity_webpage_semantic&pool=uber&drillstate=%7B%22sampleCols%22%3A[%22from_custom_sampling%22%2C%22data_version%22%2C%22scribe_category_type%22%2C%22page_id%22%2C%22name%22%2C%22source_url%22%2C%22time%22%2C%22title_semantic%22%2C%22major_version%22%2C%22server_protocol%22%2C%22custom_sampling_enabled%22%2C%22ad_id%22%2C%22appversion%22%2C%22clienttime%22%2C%22isemployee%22%2C%22title%22%2C%22images%22%2C%22weight%22%2C%22carrier%22%2C%22is_ad%22%2C%22locale%22%2C%22appid%22%2C%22ip_country%22%2C%22iab_models%22]%2C%22derivedCols%22%3A[]%2C%22mappedCols%22%3A[]%2C%22enumCols%22%3A[]%2C%22return_remainder%22%3Afalse%2C%22should_pivot%22%3Afalse%2C%22is_timeseries%22%3Afalse%2C%22hideEmptyColumns%22%3Afalse%2C%22main_dimension%22%3A%22time%22%2C%22start%22%3A%22-5%20minutes%22%2C%22samplingRatio%22%3A%221%22%2C%22compare%22%3A%22none%22%2C%22axes%22%3A%22linked%22%2C%22overlay_types%22%3A[]%2C%22minBucketSamples%22%3A%22%22%2C%22dimensions%22%3A[]%2C%22scale_type%22%3A%22absolute%22%2C%22num_samples%22%3A%22100%22%2C%22metric%22%3A%22avg%22%2C%22fill_missing_buckets%22%3A%22connect%22%2C%22smoothing_bucket%22%3A%221%22%2C%22top%22%3A%227%22%2C%22markers%22%3A%22%22%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22end%22%3A%22now%22%2C%22show_p95_ci%22%3Afalse%2C%22time_bucket%22%3A%22auto%22%2C%22compare_mode%22%3A%22normal%22%2C%22aggregateList%22%3A[]%2C%22param_dimensions%22%3A[]%2C%22modifiers%22%3A[]%2C%22order%22%3A%22none%22%2C%22order_desc%22%3Atrue%2C%22filterMode%22%3A%22DEFAULT%22%2C%22constraints%22%3A[[%7B%22column%22%3A%22major_version%22%2C%22op%22%3A%22eq%22%2C%22value%22%3A[%22[%5C%22288%5C%22]%22]%7D]]%2C%22c_constraints%22%3A[[]]%2C%22b_constraints%22%3A[[]]%2C%22metrik_view_params%22%3A%7B%22should_use_legacy_colors%22%3Afalse%2C%22columns_skip_formatting%22%3A[]%2C%22view%22%3A%22time_view%22%2C%22width%22%3A%221358%22%2C%22height%22%3A%22912%22%2C%22tableID%22%3A%22business_integrity_webpage_semantic%22%2C%22fitToContent%22%3Afalse%2C%22format_tooltip_in_percent%22%3Afalse%2C%22use_y_axis_hints_as_limits%22%3Atrue%2C%22has_dynamic_context_menu%22%3Atrue%2C%22has_context_menu%22%3Afalse%2C%22legend_mode%22%3A%22nongrid%22%2C%22connect_nulls%22%3Atrue%2C%22timezone_offset%22%3A420%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22y_min_hint%22%3A0%2C%22should_render_plugins_menu%22%3Afalse%7D%7D&view=samples_client&normalized=1599587280)

Reviewed By: iseeyuan

Differential Revision: D23548687

fbshipit-source-id: 3e63085663f5fd8de90a4c7dbad0a17947aee973
2020-09-10 15:26:33 -07:00
Nikita Shulga
4bead6438a Enable torch.autograd typechecks (#44451)
Summary:
To help with further typing, move dynamically added native contributions from `torch.autograd` to `torch._C._autograd`
Fix invalid error handling pattern in
89ac30afb8/torch/csrc/autograd/init.cpp (L13-L15)
`PyImport_ImportModule` already raises a Python exception, and nullptr should be returned to properly propagate it to the Python runtime.

All native methods/types in `torch/autograd/__init__.py` become available after `torch._C._init_autograd()` has been called.
Use f-strings instead of `.format` in test_type_hints.py
Fixes https://github.com/pytorch/pytorch/issues/44450

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44451

Reviewed By: ezyang

Differential Revision: D23618261

Pulled By: malfet

fbshipit-source-id: fa5f739d7cff8410641128b55b810318c5f636ae
2020-09-10 13:37:29 -07:00
Elias Ellison
cc5a1cf616 [JIT] Erase shapes before fallback graph (#44434)
Summary:
Previously the specialized types were copied over to the fallback function, even though the tensors passed to the fallback were not of those types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44434

Reviewed By: SplitInfinity

Differential Revision: D23611943

Pulled By: eellison

fbshipit-source-id: 2ea88a97529409f6c5c4c1f59a14b623524933de
2020-09-10 12:07:31 -07:00
Kenichi Maehashi
cb90fef770 Fix return value of PyErr_WarnEx ignored (SystemError) (#44371)
Summary:
This PR fixes unexpected `SystemError` when warnings are emitted and warning filters are set.

## Current behavior

```
$ python -Werror
>>> import torch
>>> torch.range(1, 3)
UserWarning: torch.range is deprecated in favor of torch.arange and will be removed in 0.5. Note that arange generates values in [start; end), not [start; end].

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: <built-in method range of type object at 0x7f38c7703a60> returned a result with an error set
```

## Expected behavior

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UserWarning: torch.range is deprecated and will be removed in a future release because its behavior is inconsistent with Python's range builtin. Instead, use torch.arange, which produces values in [start, end).
```

## Note

Python exception must be raised if `PyErr_WarnEx` returns `-1` ([python docs](https://docs.python.org/3/c-api/exceptions.html#issuing-warnings)). This PR fixes warnings raised in the following code:
```py
import torch

torch.range(1, 3)
torch.autograd.Variable().volatile
torch.autograd.Variable().volatile = True
torch.tensor(torch.tensor([]))
torch.tensor([]).new_tensor(torch.tensor([]))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44371

Reviewed By: mrshenli

Differential Revision: D23598410

Pulled By: albanD

fbshipit-source-id: 2fbcb13fe4025dbebaf1fd837d4c8e0944e05010
2020-09-10 10:15:21 -07:00
generatedunixname89002005287564
356aa54694 [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D23621463

fbshipit-source-id: 1cd7e94e480c7073c9a0aad55aeba98de4b96164
2020-09-10 04:24:43 -07:00
Cheng Chang
28bd4929bd [NNC] Make it able to normalize loop with variable start (#44133)
Summary:
Loops with variable start can also be normalized.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44133

Test Plan: updated testNormalizeStartVariable.

Reviewed By: navahgar

Differential Revision: D23507097

Pulled By: cheng-chang

fbshipit-source-id: 4e9aad1cd4f4a839f59a00bf8ddf97637a1a6648
2020-09-09 23:05:57 -07:00
Meghan Lele
d3b6d5caf1 [JIT] Add support for del to TS classes (#44352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44352

**Summary**
This commit adds support for `del` with class instances. If a class
implements `__delitem__`, then `del class_instance[key]` is syntactic
sugar for `class_instance.__delitem__(key)`.
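A rough sketch of what this enables (hypothetical class, written as a TorchScript class):

```python
import torch
from typing import Dict

@torch.jit.script
class Counter(object):
    def __init__(self):
        self.counts = torch.jit.annotate(Dict[str, int], {"a": 1, "b": 2})

    def __delitem__(self, key: str):
        del self.counts[key]

@torch.jit.script
def drop(c: Counter, key: str) -> int:
    del c[key]  # desugars to c.__delitem__(key)
    return len(c.counts)
```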

**Test Plan**
This commit adds a unit test to TestClassTypes to test this feature.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23603102

Pulled By: SplitInfinity

fbshipit-source-id: 28ad26ddc9a693a58a6c48a0e853a1c7cf5c9fd6
2020-09-09 19:52:35 -07:00
Elias Ellison
b69c28d02c Improving ModuleList indexing error msg (#43361)
Summary:
Follow-up to https://github.com/pytorch/pytorch/pull/41946/: suggest enumerating the module as an alternative if a user tries indexing into a ModuleList/Sequential with something other than an integer literal, as in the sketch below.
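A minimal sketch of the suggested pattern:

```python
import torch
import torch.nn as nn

class Stack(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

    def forward(self, x):
        # Indexing self.layers[i] with a non-constant i is rejected by the
        # compiler; enumerating the ModuleList is the suggested alternative.
        for i, layer in enumerate(self.layers):
            x = layer(x)
        return x

m = torch.jit.script(Stack())
print(m(torch.randn(2, 4)).shape)
```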

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43361

Reviewed By: mrshenli

Differential Revision: D23602388

Pulled By: eellison

fbshipit-source-id: 51fa28d5bc45720529b3d45e92d367ee6c9e3316
2020-09-09 16:22:57 -07:00
Lillian Johnson
b0bcdbb1ab [JIT] Support partially specified sizes/strides in IRParser (#44113)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44113

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23508149

Pulled By: Lilyjjo

fbshipit-source-id: b6b2d32109fae599bc5347dae742b67a2e4a0a49
2020-09-09 14:45:51 -07:00
Yuchen Huang
a00d36b0e7 [PyTorch][Mobile] Insert the module name as name() to metadata dict if metadata doesn't contain "model_name" (#44400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44400

This diff does the same thing as D23549149 (398409f072). A fix is included for the OSS CI job pytorch_windows_vs2019_py36_cuda10.1_test1.
ghstack-source-id: 111679745

Test Plan:
- CI
- OSS CI

Reviewed By: xcheng16

Differential Revision: D23601050

fbshipit-source-id: 8ebdcd8fdc5865078889b54b0baeb397a90ddc40
2020-09-09 13:01:17 -07:00
Ailing Zhang
24efd29d19 Check commutativity for computed dispatch table and add a test to check entries. (#44088)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44088

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23492793

Pulled By: ailzhang

fbshipit-source-id: 37502f2a8a4d755219b400fcbb029e49d6cdb6e9
2020-09-09 12:48:34 -07:00
Nikita Shulga
683380fc91 Use compile time cudnn version if linking with it statically (#44402)
Summary:
This should prevent torch_python from linking the entire cudnn library statically just to query its version

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44402

Reviewed By: seemethere

Differential Revision: D23602720

Pulled By: malfet

fbshipit-source-id: 185b15b789bd48b1df178120801d140ea54ba569
2020-09-09 11:33:41 -07:00
Bert Maher
6ec8fabc29 Fix frac in CUDA fuser (#44152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44152

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23528506

fbshipit-source-id: bfd468d72fa55ce317f88ae83e1f2d5eee041aa0
2020-09-09 11:10:08 -07:00
Bert Maher
350130a69d Prevent the TE fuser from getting datatypes it can't handle (#44160)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44160

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D23528508

Pulled By: bertmaher

fbshipit-source-id: 03b22725fb2666f441cb504b35397ea6d155bb85
2020-09-09 11:10:04 -07:00
Bert Maher
960c088a58 [te] Fix casting of unsigned char, and abs(int) (#44157)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44157

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D23528507

Pulled By: bertmaher

fbshipit-source-id: c5ef0422a91a4665b616601bed8b7cd137be39f9
2020-09-09 11:08:36 -07:00
Bert Maher
8acce55015 Dump optimized graph when logging in already-optimized PE (#44315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44315

I find it more intuitive to dump the optimized graph if we have one;
when I first saw the unoptimized graph being dumped I thought we had failed to
apply any optimizations.

Test Plan: Observe output by hand

Reviewed By: Lilyjjo

Differential Revision: D23578813

Pulled By: bertmaher

fbshipit-source-id: e2161189fb0e1cd53aae980a153aea610871662a
2020-09-09 01:28:48 -07:00
Taewook Oh
7a64b0c27a Export Node::isBefore/isAfter for PythonAPI (#44162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44162

This diff exports the Node::isBefore/isAfter methods to the Python API.

Test Plan: Tested locally. Please let me know if there is a set of unit tests to be passed.

Reviewed By: soumith

Differential Revision: D23514448

fbshipit-source-id: 7ef709b036370217ffebef52fd93fbd68c464e89
2020-09-09 00:57:08 -07:00
Mikhail Zolotukhin
bd8e38cd88 [TensorExpr] Fuser: check node inputs' device before merging the node into a fusion group. (#44241)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44241

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23554192

Pulled By: ZolotukhinM

fbshipit-source-id: fb03262520303152b83671603e08e7aecc24f5f2
2020-09-08 19:32:23 -07:00
Nick Gibson
be94dba429 [NNC] fix support for FP16 in CudaCodeGen (#44209)
Summary:
Fixes a bug where FP16 values could be incorrectly cast to a half type that doesn't have a cast operator. The fix inserts the CUDA-specific cast to float while handling the Cast node, rather than as a wrapper around printing Loads and Stores. Two main changes: the HalfChecker now inserts the casts to float explicitly in the IR, and the PrioritizeLoad mutator now consumes both Loads and a Cast which immediately precedes a load.

Tested with test_jit_fuser_te.py and test_tensorexpr.py, plus the C++ tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44209

Reviewed By: izdeby

Differential Revision: D23575577

Pulled By: nickgg

fbshipit-source-id: 808605aeb2af812758f96f9fdc11b07e08053b46
2020-09-08 18:00:39 -07:00
Sujoy Saraswati
54931ebb7b Release saved variable from DifferentiableGraphBackward (#42994)
Summary:
When the backward ops execute via the autograd engine's evaluate_function(), fn.release_variables() is called to release the SavedVariables. For eager mode ops, this releases the saved inputs that were required for the backward grad function. However, with TorchScript we get a DifferentiableGraph, and DifferentiableGraphBackward() doesn't implement release_variables(). This causes the SavedVariables to stay alive longer. Implement release_variables() for DifferentiableGraphBackward to release these SavedVariables early.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42994

Reviewed By: izdeby

Differential Revision: D23503172

Pulled By: albanD

fbshipit-source-id: d87127498cfa72883ae6bb31d0e6c7056c4c36d4
2020-09-08 14:36:52 -07:00
Ailing Zhang
1b2da9ed82 Expose alias key info in dumpState and update test_dispatch. (#44081)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44081

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23492794

Pulled By: ailzhang

fbshipit-source-id: 27a2978591900463bda2e92e0201c9fd719f9792
2020-09-06 18:43:05 -07:00
Mike Ruberry
83a6e7d342 Adds inequality testing aliases for better NumPy compatibility (#43870)
Summary:
This PR adds the following aliases:

- not_equal for torch.ne
- greater for torch.gt
- greater_equal for torch.ge
- less for torch.lt
- less_equal for torch.le

These aliases are consistent with NumPy's naming for these functions.
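For example, on a build that includes these aliases:

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([3, 2, 1])

# Each alias produces the same result as the op it aliases.
assert torch.equal(torch.not_equal(a, b), torch.ne(a, b))
assert torch.equal(torch.greater(a, b), torch.gt(a, b))
assert torch.equal(torch.greater_equal(a, b), torch.ge(a, b))
assert torch.equal(torch.less(a, b), torch.lt(a, b))
assert torch.equal(torch.less_equal(a, b), torch.le(a, b))
```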

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43870

Reviewed By: zou3519

Differential Revision: D23498975

Pulled By: mruberry

fbshipit-source-id: 78560df98c9f7747e804a420c1e53fd1dd225002
2020-09-06 09:36:23 -07:00
Nikita Shulga
e358d516c8 Revert D23549149: [PyTorch][Mobile] Insert the module name as name() to metadata dict if metadata doesn't contain "model_name"
Test Plan: revert-hammer

Differential Revision:
D23549149 (398409f072)

Original commit changeset: fad742a8d4e6

fbshipit-source-id: bd92a2033a804d3e6a2747b4fda4ca527991a993
2020-09-06 00:06:35 -07:00
Supriya Rao
199c73be0f [quant][pyper] Support quantization of ops in fork-wait subgraph (#44048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44048

Inline the fork-wait calls to make sure we can see the ops to be quantized in the main graph.

Also fix the InlineForkWait JIT pass to account for the case where the aten::wait call isn't present in the main graph
and we return a future tensor from the subgraph.

Example

```
graph(%self.1 : __torch__.dper3.core.interop.___torch_mangle_6325.DperModuleWrapper,
       %argument_1.1 : Tensor,
       %argument_2.1 : Tensor):
   %3 : Future[Tensor[]] = prim::fork_0(%self.1, %argument_1.1, %argument_2.1) # :0:0
   return (%3)
 with prim::fork_0 = graph(%self.1 : __torch__.dper3.core.interop.___torch_mangle_5396.DperModuleWrapper,
       %argument_1.1 : Tensor,
       %argument_2.1 : Tensor):
   %3 : __torch__.dper3.core.interop.___torch_mangle_6330.DperModuleWrapper = prim::GetAttr[name="x"](%self.1)
   %4 : __torch__.dper3.core.interop.___torch_mangle_5397.DperModuleWrapper = prim::GetAttr[name="y"](%self.1)
   %5 : __torch__.dper3.core.interop.___torch_mangle_6327.DperModuleWrapper = prim::GetAttr[name="z"](%4)
   %6 : Tensor = prim::CallMethod[name="forward"](%5, %argument_1.1, %argument_2.1) # :0:0
   %7 : None = prim::CallMethod[name="forward"](%3, %6) # :0:0
   %8 : Tensor[] = prim::ListConstruct(%6)
   return (%8)
```

Test Plan:
python test/test_quantization.py test_interface_with_fork

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23481003

fbshipit-source-id: 2e756be73c248319da38e053f021888b40593032
2020-09-05 12:06:19 -07:00
Supriya Rao
164b96c34c [quant][pyper] make embedding_bag quantization static (#44008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44008

embedding_bag requires only quantization of weights (no dynamic quantization of inputs),
so the type of quantization is essentially static (without calibration).
This will enable pyper to do fc and embedding_bag quantization using the same API call.

Test Plan:
python test/test_quantization.py test_embedding_bag

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23467019

fbshipit-source-id: 41a61a17ee34bcb737ba5b4e19fb7a576d4aeaf9
2020-09-05 12:06:16 -07:00
Supriya Rao
a0ae416d60 [quant] Support aten::embedding_bag quantization in graph mode (#43989)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43989

When we trace the model it produces an aten::embedding_bag node in the graph.
Add the necessary passes in graph mode to support quantizing it as well.

Test Plan:
python test/test_quantization.py TestQuantizeDynamicJitOps.test_embedding_bag

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23460485

fbshipit-source-id: 328c5e1816cfebb10ba951113f657665b6d17575
2020-09-05 12:05:06 -07:00
Yi Wang
15a7368115 Add const to getTensors method of GradBucket. (#44126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44126

Add const to getTensors method of GradBucket.

Test Plan: buck test caffe2/torch/lib/c10d:ProcessGroupGlooTest

Reviewed By: sinannasir, jiayisuse

Differential Revision: D23504088

fbshipit-source-id: 427d9591042e0c03cde02629c1146ff1e5e027f9
2020-09-05 09:19:42 -07:00
Elias Ellison
5bd2902796 [JIT] Remove references to no longer generated _tanh_backward and _sigmoid_backward (#44138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44138

If you look at the sigmoid and tanh backward they are composed of other ops: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/runtime/symbolic_script.cpp#L786
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/runtime/symbolic_script.cpp#L164

So tanh_backward and sigmoid_backward are no longer generated / legacy ops.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23543603

Pulled By: eellison

fbshipit-source-id: ce8353e53043cf969b536aac47c9576d66d4ce02
2020-09-05 01:41:36 -07:00
Elias Ellison
df67f0beab [TensorExpr fuser] Guard nodes that have tensor output properties determined by non-tensor inputs (#44137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44137

We only insert guards on Tensor types, so we rely on the output
of a node being uniquely determined by its input types.
Bail if any non-Tensor input affects the output type
and cannot be reasoned about statically.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23543602

Pulled By: eellison

fbshipit-source-id: abd6fe0b1fd7fe6fc251694d4cd442b19c032dd7
2020-09-05 01:40:18 -07:00
Wanchao Liang
d07a36e0c1 Revert D23490149: [pytorch][PR] Compile less legacy code when BUILD_CAFFE2 is set to False
Test Plan: revert-hammer

Differential Revision:
D23490149 (15e99b6ff6)

Original commit changeset: a76382c30d83

fbshipit-source-id: 75057fa9af2c19eb976962552118bf0a99911b38
2020-09-04 22:59:39 -07:00
Vasiliy Kuznetsov
a940f5ea5d torchscript graph mode quant: remove benchmark filter (#44165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44165

Allows convolutions to be quantized if the `torch.backends.cudnn.benchmark`
flag is set.

Not for land yet, just testing.

Test Plan:
in the gist below, the resulting graph now has quantized convolutions
https://gist.github.com/vkuzo/622213cb12faa0996b6700b08d6ab2f0

Imported from OSS

Reviewed By: supriyar

Differential Revision: D23518775

fbshipit-source-id: 294f678c6afbd3feeb89b7a6655bc66ac9f8bfbc
2020-09-04 21:25:35 -07:00
Yuchen Huang
398409f072 [PyTorch][Mobile] Insert the module name as name() to metadata dict if metadata doesn't contain "model_name" (#44227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44227

As title
ghstack-source-id: 111490242

Test Plan: CI

Reviewed By: xcheng16

Differential Revision: D23549149

fbshipit-source-id: fad742a8d4e6f844f83495514cd60ff2bf0d5bcb
2020-09-04 21:18:12 -07:00
Nikita Shulga
15e99b6ff6 Compile less legacy code when BUILD_CAFFE2 is set to False (#44079)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44079

Reviewed By: walterddr

Differential Revision: D23490149

Pulled By: malfet

fbshipit-source-id: a76382c30d83127d180ec63ac15093a7297aae53
2020-09-04 20:04:21 -07:00
neginraoof
3d7c22a2ce [ONNX] Enable new scripting passes for functionalization and remove_mutation (#43791)
Summary:
Duplicate of https://github.com/pytorch/pytorch/issues/41413
This PR initiates the process of updating the TorchScript backend interface used by the ONNX exporter.

Replace jit lower graph pass by freeze module pass

Enable ScriptModule tests for ONNX operator tests (ORT backend) and model tests by default.

Replace the jit remove_inplace_ops pass with remove_mutation and consolidate all passes for handling inplace ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43791

Reviewed By: houseroad

Differential Revision: D23421872

Pulled By: bzinodev

fbshipit-source-id: a98710c45ee905748ec58385e2a232de2486331b
2020-09-04 15:21:45 -07:00
Mikhail Zolotukhin
6474057c76 Revert D23503636: [pytorch][PR] [NNC] make inlining immediate (take 2) and fix bugs
Test Plan: revert-hammer

Differential Revision:
D23503636 (70aecd2a7f)

Original commit changeset: cdbdc902b7a1

fbshipit-source-id: b5164835f874a56213de4bed9ad690164eae9230
2020-09-04 10:58:23 -07:00
Richard Zou
9a5a732866 Register some backwards functions as operators (#44052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44052

Summary
=======

This PR registers the following backwards functions as operators:
- slice_backward
- select_backward
- gather_backward
- index_select_backward (the backward function for index_select)
- select_index_backward (previously known as index_select_backward, but is actually the backward function for max.dim, min.dim, etc)

In the future, I'd like to register more backward functions as operators
so that we can write batching rules for the backward functions. Batching
rules for backward functions make it so that we can compute batched
gradients.

Motivation
==========
The rationale behind this PR is that a lot of backwards functions (27 in total)
are incompatible with BatchedTensor due to using in-place operations.
Sometimes we can allow the in-place operations, but other times we can't.
For example, consider select_backward:

```
Tensor select_backward(const Tensor& grad, IntArrayRef input_sizes, int64_t dim, int64_t index) {
  auto grad_input = at::zeros(input_sizes, grad.options());
  grad_input.select(dim, index).copy_(grad);
  return grad_input;
}
```

and consider the following code:
```
x = torch.randn(5, requires_grad=True)
def select_grad(v):
    torch.autograd.grad(x[0], x, v)

vs = torch.randn(B0)
batched_grads = vmap(select_grad)(vs)
```

For the batched gradient use case, `grad` is a BatchedTensor.
The physical version of `grad` has size `(B0,)`.
However, select_backward creates a `grad_input` of shape `(5)`, and
tries to copy `grad` to a slice of it.

Other approaches
================

I've considered the following:
- register select_backward as an operator (this PR)
- have a branch inside select_backward for if `grad` is batched.
    - this is OK, but what if we have more tensor extensions that want to override this?
- modify select_backward to work with BatchedTensor, by creating a new operator for the "select + copy_ behavior".
    - select + copy_ isn't used elsewhere in derivative formulas so this doesn't seem useful

Test Plan
=========

- `pytest test/test_autograd.py -v`
- Registering backward functions may impact performance. I benchmarked
select_backward to see if registering it as an operator led to any noticable
performance overheads: https://gist.github.com/zou3519/56d6cb53775649047b0e66de6f0007dc.
The TL;DR is that the overhead is pretty minimal.

Test Plan: Imported from OSS

Reviewed By: ezyang, fbhuba

Differential Revision: D23481183

Pulled By: zou3519

fbshipit-source-id: 125af62eb95824626dc83d06bbc513262ee27350
2020-09-04 08:30:39 -07:00
generatedunixname89002005287564
ef28ee50b0 [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D23536086

fbshipit-source-id: 56e9c70a6998086515f59d74c5d8a2280ac2f669
2020-09-04 03:33:32 -07:00
Bert Maher
98ad5ff41f [te] Disable reductions by default (#44122)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44122

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D23504769

Pulled By: bertmaher

fbshipit-source-id: 1889217cd22da529e46ab30c9319a5646267e4ec
2020-09-03 23:37:45 -07:00
Martin Yuan
d221256888 [Message] Add what to do for missing operators.
Summary: As title.

Test Plan: N/A

Reviewed By: gaurav-work

Differential Revision: D23502416

fbshipit-source-id: a341eb10030e3f319266019ba4c02d9d9a0a6298
2020-09-03 22:41:27 -07:00
Nick Gibson
70aecd2a7f [NNC] make inlining immediate (take 2) and fix bugs (#43885)
Summary:
A rework of `computeInline` which makes it work a bit better, particularly when combined with other transformations. Previously we stored Functions that were inlined and then deferred the actual inlining of the function body until prepareForCodegen was called. This has an issue when transformations are applied to the LoopNest: the function body can differ from what appears in the root_stmt, and result in inlining that a) fails, b) reverses other transformations, or c) produces a weird, unpredictable combination of the two.

This PR changes that behaviour so that the inlining occurs in the root stmt immediately, which means it reflects any previous transformations, and any future transformations have a true view of the internal IR. It also has the benefit that inspecting the root statement gives an accurate view of it without needing to call prepareForCodegen. I also removed the difference between `computeInline` and `computeInlineWithRand`, and we handle calls to `rand()` in all branches.

This is a rework of https://github.com/pytorch/pytorch/issues/38696, with the agreed changes from ZolotukhinM and zheng-xq: we should only inline if the dimensions are trivial (ie. they are vars not exprs).

This PR is mostly tests, and I fixed a bunch of bugs I found along the way. Partial list:
* When inlining an expression involving rand, we would create random vars matching the dimensionality of the enclosing Tensor, not the produced Tensor - meaning we'd use an incorrect value if the inlined tensor was smaller. E.g. `X[i] = rand(); A[i, j] = X[i]` would produce a tensor where `A[0, 0] != A[0, 1]` (see the sketch after this list). This is fixed by inserting the Let binding of the random variable at the correct loop body.
* When inlining we'd replace all calls to `rand()` rather than just those present in the Tensor being inlined.
* `rand()` was treated symbolically by the simplifier and we would aggregate or cancel calls to `rand()`. Have fixed the hasher to hash all calls to `rand()` distinctly.
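
A rough Python analogy (not NNC code) of the `rand()` semantics the first bullet fixes: the random value must be bound once per `X[i]` and reused across `j`, rather than re-drawn for every `A[i, j]`.

```python
import random

N, M = 2, 3

# Correct semantics: X[i] is sampled once and reused across j.
X = [random.random() for _ in range(N)]
A_correct = [[X[i] for _ in range(M)] for i in range(N)]
assert A_correct[0][0] == A_correct[0][1]

# Naive inlining re-draws rand() for every (i, j), so rows stop being constant.
A_buggy = [[random.random() for _ in range(M)] for i in range(N)]
# A_buggy[0][0] != A_buggy[0][1] almost surely.
```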

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43885

Reviewed By: gmagogsfm

Differential Revision: D23503636

Pulled By: nickgg

fbshipit-source-id: cdbdc902b7a14d269911d978a74a1c11eab004fa
2020-09-03 16:49:24 -07:00
Mikhail Zolotukhin
3105d8a9b2 [TensorExpr] Fuser: rely on input types when checking whether a device is supported. (#44139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44139

Also, make sure that we're checking that condition when we're starting a
new fusion group, not only when we merge a node into an existing fusion
group. Oh, and one more: add a test checking that we're rejecting graphs
with unspecified shapes.

Differential Revision: D23507510

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: 9c268825ac785671d7c90faf2aff2a3e5985ac5b
2020-09-03 16:27:14 -07:00
Meghan Lele
7816d53798 [JIT] Add mypy type annotations for JIT (#43862)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43862

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23491151

Pulled By: SplitInfinity

fbshipit-source-id: 88367b89896cf409bb9ac3db7490d6779efdc3a4
2020-09-03 15:09:24 -07:00
Michael Suo
9dd8670d7d [jit] Better match behavior of loaded ScriptModules vs. freshly created ones (#43298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43298

IR emitter uses `ModuleValue` to represent ScriptModules and emit IR for
attribute access, submodule access, etc.

`ModuleValue` relies on two pieces of information, the JIT type of the
module, and the `ConcreteModuleType`, which encapsulates Python-only
information about the module.

ScriptModules loaded from a package used to create a dummy
ConcreteModuleType without any info in it. This led to divergences in
behavior during compilation.

This PR makes the two ways of constructing a ConcreteModuleType equivalent,
modulo any py-only information (which, by definition, is never present in
packaged files anyway).

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23228738

Pulled By: suo

fbshipit-source-id: f6a660f42272640ca1a1bb8c4ee7edfa2d1b07cc
2020-09-03 15:03:39 -07:00
Michael Suo
74f18476a2 [jit] fix segfault in attribute lookup on loaded ScriptModules (#43284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43284

The IR emitter looks for attributes on modules like:
1. Check the JIT type for the attribute
2. Check the originating Python class, in order to fulfill requests for, e.g. static methods or ignored methods.

In the case where you do:
```
inner_module = torch.jit.load("inner.pt")
wrapped = Wrapper(inner_module)  # wrap the loaded ScriptModule in an nn.Module
torch.jit.script(wrapped)
```

The IR emitter may check for attributes on `inner_module`. There is no
originating Python class for `inner_module`, since it was directly
compiled from the serialized format.

Due to a bug in the code, we don't guard for this case an a segfault
results if the wrapper asks for an undefined attribute. The lookup in
this case looks like:
1. Check the JIT type for the attribute (not there!)
2. Check the originating Python class (this is a nullptr! segfault!)

This PR guards this case and properly just raises an attribute missing
compiler error instead of segfaulting.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23224337

Pulled By: suo

fbshipit-source-id: 0cf3060c427f2253286f76f646765ec37b9c4c49
2020-09-03 15:01:59 -07:00
Elias Ellison
6868bf95c6 [JIT] Fuser match on schemas not node kind (#44083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44083

Match on the complete schema of a node instead of its node kind when deciding whether to fuse it. Previously we matched on node kind, which could fail for something like `aten::add(int, int)`; and if a new overload was added to an op without corresponding NNC support, we would still fuse it.

Follow-ups are:
- bail when an output tensor type isn't uniquely determined by the input types (e.g. aten::add, where the second input could be either a float or an int)
- remove NNC lowering for _tanh_backward & _sigmoid_backward
- Validate that we support all of the overloads here. I optimistically added ops that included Tensors; it's possible that we do not support every overload here. This isn't a regression, and this PR is at least improving our failures in that regard.

I can do any of these as part of this PR if desired, but there are a number of failures people have run into that this PR fixes so I think it would be good to land this sooner than later.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23503704

Pulled By: eellison

fbshipit-source-id: 3ce971fb1bc3a7f1cbaa38f1ed853e2db3d67c18
2020-09-03 14:47:19 -07:00
Ann Shan
9b3c72d46e [pytorch] Make mobile find_method return an optional (#43965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43965

As part of a larger effort to unify the API between the lite interpreter and full JIT:
- implement torch::jit::mobile::Method, a proxy for torch::jit::mobile::Function
- add support for overloaded operator() to mobile Method and Function
- mobile find_method now returns a c10::optional<Method> (so signature matches full jit)
- moves some implementation of Function from module.cpp to function.cpp
ghstack-source-id: 111161942

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D23330762

fbshipit-source-id: bf0ba0d711d9566c92af31772057ecd35983ee6d
2020-09-03 14:46:18 -07:00
Nikolay Korovaiko
f91bdbeabd Enable function calls in TEFuser and SpecializeAutogradZero (#43866)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43866

Reviewed By: ezyang

Differential Revision: D23452798

Pulled By: Krovatkin

fbshipit-source-id: 2cff4c905bf1b5d9de56e7869458ffa6fce1f1b5
2020-09-03 14:42:52 -07:00
Zafar
e05fa2f553 [quant] Prep for conv_transpose packing (#39714)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39714

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22087071

Pulled By: z-a-f

fbshipit-source-id: 507f8a414026eb4c9926f68c1e94d2f56119bca6
2020-09-03 14:10:32 -07:00
Yanan Cao
f3da9e3b50 Enable Enum pickling/unpickling. (#43188)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **https://github.com/pytorch/pytorch/issues/43188 Enable Enum pickling/unpickling.**
* https://github.com/pytorch/pytorch/issues/42963 Add Enum TorchScript serialization and deserialization support
* https://github.com/pytorch/pytorch/issues/42874 Fix enum constant printing and add FileCheck to all Enum tests
* https://github.com/pytorch/pytorch/issues/43121 Add Enum convert back to Python object support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43188

Reviewed By: zdevito

Differential Revision: D23365141

Pulled By: gmagogsfm

fbshipit-source-id: f0c93d4ac614dec047ad8640eb6bd9c74159b558
2020-09-03 13:51:02 -07:00
Kimish Patel
a153f69417 Fix replaceAtenConvolution for BC. (#44036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44036

Running replaceAtenConvolution on older traced models won't work, as the
_convolution signature has changed and replaceAtenConvolution was
changed to account for that.
But we did not preserve the old behavior when making that change. This change
restores the old behavior while keeping the new one.

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23476775

fbshipit-source-id: 73a0c2b7387f2a8d82a8d26070d0059972126836
2020-09-03 12:57:57 -07:00
Kimish Patel
ba65cce2a2 Fix transposed conv2d rewrite pattern to account for convolution api (#44035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44035

This accounts for the change in the convolution API signature.

Also added a test to capture such cases in the future.

Test Plan:
python test/test_xnnpack_integration.py

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D23476773

fbshipit-source-id: a62c4429351c909245106a70b4c60b1bacffa817
2020-09-03 12:55:43 -07:00
Bert Maher
55ff9aa185 Test TE fuser unary ops and fix sigmoid(half) (#44094)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44094

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23494950

Pulled By: bertmaher

fbshipit-source-id: 676c4e57267c4ad92065ea90b06323918dd5b0de
2020-09-03 12:48:46 -07:00
Xingying Cheng
c59e11bfbb Add soft error reporting to capture all the inference runtime failure. (#44078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44078

When PyTorch mobile inference fails and throws an exception, if the caller catches it and does not crash the app, we are not able to track all the inference failures.

So we are adding native soft error reporting to capture all the failures occurring during module loading and running, including both crashing and non-crashing failures. Since c10::Error has good error messaging stack handling (D21202891 (a058e938f9)), we are utilizing it for the error handling and message print out.
ghstack-source-id: 111307080

Test Plan:
Verified that the soft error reporting is sent through module.cpp when an operator is missing, and made sure a logview mid is generated with a stack trace: https://www.internalfb.com/intern/logview/details/facebook_android_softerrors/5dd347d1398c1a9a73c804b20f7c2179/?selected-logview-tab=latest.

Error message with context is logged below:

```
soft_error.cpp		[PyTorchMobileInference] : Error occured during model running entry point: Could not run 'aten::embedding' with arguments from the 'CPU' backend. 'aten::embedding' is only available for these backends: [BackendSelect, Named, Autograd, Autocast, Batched, VmapMode].

BackendSelect: fallthrough registered at xplat/caffe2/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at xplat/caffe2/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Autograd: fallthrough registered at xplat/caffe2/aten/src/ATen/core/VariableFallbackKernel.cpp:31 [backend fallback]
Autocast: fallthrough registered at xplat/caffe2/aten/src/ATen/autocast_mode.cpp:253 [backend fallback]
Batched: registered at xplat/caffe2/aten/src/ATen/BatchingRegistrations.cpp:317 [backend fallback]
VmapMode: fallthrough registered at xplat/caffe2/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

Exception raised from reportError at xplat/caffe2/aten/src/ATen/core/dispatch/OperatorEntry.cpp:261 (m
```

Reviewed By: iseeyuan

Differential Revision: D23428636

fbshipit-source-id: 82d5d9c054300dff18d144f264389402d0b55a8a
2020-09-03 10:54:43 -07:00
Sinan Nasir
98320061ad DDP Communication hook: (Patch) Fix the way we pass future result to buckets. (#43734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43734

Following the additional GH comments on the original PR https://github.com/pytorch/pytorch/pull/43307.
ghstack-source-id: 111327130

Test Plan: Run `python test/distributed/test_c10d.py`

Reviewed By: smessmer

Differential Revision: D23380288

fbshipit-source-id: 4b8889341c57b3701f0efa4edbe1d7bbc2a82ced
2020-09-03 08:59:10 -07:00
Mikhail Zolotukhin
40fec4e739 [TensorExpr] Fuser: do not fuse ops with 0-dim tensors. (#44073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44073

We don't yet have proper support for it on the NNC side or in the JIT IR->NNC lowering.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23487905

Pulled By: ZolotukhinM

fbshipit-source-id: da0da7478fc8ce7b455176c95d8fd610c94352c1
2020-09-02 22:59:04 -07:00
Mikhail Zolotukhin
3da82aee03 [JIT] Remove profile nodes before BatchMM. (#43961)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43961

Currently we're removing prim::profile nodes and embedding the type info
directly in the IR right before the fuser, because it is difficult to
fuse in the presence of prim::profile nodes. It turns out that BatchMM has
a similar problem: it doesn't work when there are prim::profile nodes in
the graph. These two passes run next to each other, so we could simply
remove prim::profile nodes slightly earlier: before the BatchMM pass.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23453266

Pulled By: ZolotukhinM

fbshipit-source-id: 92cb50863962109b3c0e0112e56c1f2cb7467ff1
2020-09-02 22:57:39 -07:00
Mikhail Zolotukhin
b2aaf212aa [TensorExpr] Add option to enforce TensorExprKernel fallbacks. (#43972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43972

When debugging, it is useful to disable the NNC backend to see whether
the bug is there or in the fuser logic.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23455624

Pulled By: ZolotukhinM

fbshipit-source-id: f7c0452a29b860afc806e2d58acf35aa89afc060
2020-09-02 18:34:24 -07:00
Bert Maher
33d51a9b32 Respect canFuseOn{CPU,GPU} in TE fuser (#43967)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43967

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D23469048

Pulled By: bertmaher

fbshipit-source-id: 1005a7ae08974059ff9d467492caa3a388070eeb
2020-09-02 18:00:25 -07:00
Lu Fang
f15e27265f [torch.fx] Add support for custom op (#43248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43248

We add support for __torch_function__ overrides for C++ custom ops. The logic is the same as for other components, like torch.nn.Module.
Refactored some code a little bit to make it reusable.

Test Plan: buck test //caffe2/test:fx -- test_torch_custom_ops

Reviewed By: bradleyhd

Differential Revision: D23203204

fbshipit-source-id: c462a86e407e46c777171da32d7a40860acf061e
2020-09-02 16:08:37 -07:00
Elias Ellison
544a56ef69 [JIT] Always map node output in vmap (#43988)
Summary:
Previously when merging a node without a subgraph, we would merge the node's outputs to the corresponding subgraph values, but when merging a node with a subgraph the node's outputs would be absent in the value mapping. This PR makes it so they are included.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43988

Reviewed By: ZolotukhinM

Differential Revision: D23462116

Pulled By: eellison

fbshipit-source-id: 232c081261e9ae040df0accca34b1b96a5a5af57
2020-09-02 10:30:43 -07:00
albanD
73f009a2aa refactor manual function definitions (#43711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43711

this makes them available in forward if needed

No change to the file content, just a copy-paste.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23454146

Pulled By: albanD

fbshipit-source-id: 6269a4aaf02ed53870fadf8b769ac960e49af195
2020-09-02 09:23:21 -07:00
Lillian Johnson
cd58114c6c Adjust level of verbosity of debug dumps in graph executor T74227880 (#43682)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43682

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23397980

Pulled By: Lilyjjo

fbshipit-source-id: b0114efbd63b2a29eb14086b0a8963880023c2a8
2020-09-02 08:45:16 -07:00
Hong Xu
4bb5d33076 is_numpy_scalar should also consider bool and complex types (#43644)
Summary:
Before this PR,

```python
import torch
import numpy as np

a = torch.tensor([1, 2], dtype=torch.bool)
c = np.array([1, 2], dtype=np.bool)
print(a[0] == c[0])

a = torch.tensor([1, 2], dtype=torch.complex64)
c = np.array([1, 2], dtype=np.complex64)
print(a[0] == c[0])

 # This case is still broken
a = torch.tensor([1 + 1j, 2 + 2j], dtype=torch.complex64)
c = np.array([1 + 1j, 2 + 2j], dtype=np.complex64)
print(a[0] == c[0])
```

outputs

```
False
False
False
```

After this PR, it outputs:

```
tensor(True)
/home/user/src/pytorch/torch/tensor.py:25: ComplexWarning: Casting complex values to real discards the imaginary part
  return f(*args, **kwargs)
tensor(True)
tensor(False)
```

Related issue: https://github.com/pytorch/pytorch/issues/43579

cc anjali411 mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43644

Reviewed By: ailzhang

Differential Revision: D23425569

Pulled By: anjali411

fbshipit-source-id: a868209376b30cea601295e54015c47803923054
2020-09-02 07:41:50 -07:00
Yuchen Huang
7000c2efb5 [2/2][PyTorch][Mobile] Added mobile module metadata logging (#43853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43853

Add QPL logging for mobile module's metadata
ghstack-source-id: 111113492

(Note: this ignores all push blocking failures!)

Test Plan:
- CI

- Load the model trained by `mobile_model_util.py`

- Local QPL logger standard output.
{F319012106}

Reviewed By: xcheng16

Differential Revision: D23417304

fbshipit-source-id: 7bc834f39e616be1eccfae698b3bccdf2f7146e5
2020-09-01 22:27:10 -07:00
Alex Suhan
9db90fe1f3 [TensorExpr] Remove unused functions in kernel.cpp (#43966)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43966

Test Plan: build.

Reviewed By: ZolotukhinM

Differential Revision: D23456660

Pulled By: asuhan

fbshipit-source-id: c13411b61cf62dd5d038e7246f79a8682822b472
2020-09-01 20:25:16 -07:00
Gao, Xiang
5e97f251a8 Enable TF32 support for cuDNN (#40737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40737

Reviewed By: mruberry

Differential Revision: D22801525

Pulled By: ngimel

fbshipit-source-id: ac7f7e728b4b3e01925337e8c9996f26a6433fd2
2020-09-01 15:34:24 -07:00
Lillian Johnson
e3cb582e05 Error printing extension support for multiline errors (#43807)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43807

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23407457

Pulled By: Lilyjjo

fbshipit-source-id: 05a6a50dc39c00474d9087ef56028a2c183aa53a
2020-09-01 10:02:43 -07:00
Ailing Zhang
224232032c Move Autograd to an alias dispatch key (#43070)
Summary:
This PR moves `DispatchKey::Autograd` to an alias dispatch key mapping to `AutogradCPU, AutogradCUDA, AutogradXLA, AutogradOther, AutogradPrivate*` keys.

A few things are handled in this PR:
- Update alias dispatch key mapping and precompute dispatchTable logic
- Move `Autograd` key from `always_included` set to TensorImpl constructor.
- Update `dummyTensor` constructor to take `requires_grad` as optional argument so that it's closer to the real application in op_registration_test.
- Use `BackendSelect` key for both backend select before and after autograd layer. (1 liner in backend_select codegen)

A few planned followups ordered by priority:
- [cleanup] Update `test_dispatch.py` to include testing `Autograd`.
- [cleanup] Add Math alias key and move catchAll to Math. (to remove 2.2 in `computeDispatchTableEntryWithDebug`)
- [new feature] Add support for Math in native_functions.yaml
- [cleanup] Add iterator like functionality to DispatchKeySet
- [cleanup/large] Only add Autograd backend keys when tensor requires grad. (cc: ljk53 ?)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43070

Reviewed By: ezyang

Differential Revision: D23281535

Pulled By: ailzhang

fbshipit-source-id: 9ad00b17142e9b83304f63cf599f785500f28f71
2020-09-01 09:05:29 -07:00
Bert Maher
c14a3613a8 Fix NaN propagation in TE fuser's min/max implementation (#43609)
Summary:
Per eager mode source-of-truth, NaNs shall be propagated by min/max.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43609

Reviewed By: ZolotukhinM

Differential Revision: D23349184

Pulled By: bertmaher

fbshipit-source-id: 094eb8b89a02b27d5ecf3988d0f473c0f91e4afb
2020-09-01 02:10:13 -07:00
Pritam Damania
f1624b82b5 Preserve python backtrace in autograd engine errors. (#43684)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43684

This PR attempts to address #42560 by capturing the appropriate
exception_ptr in the autograd engine and passing it over to the Future.

As part of this change, there is a significant change to the Future API where we
now only accept an exception_ptr as part of setError.

For the example in #42560, the exception trace would now look like:

```
> Traceback (most recent call last):
>   File "test_autograd.py", line 6914, in test_preserve_backtrace
>     Foo.apply(t).sum().backward()
>   File "torch/tensor.py", line 214, in backward
>     torch.autograd.backward(self, gradient, retain_graph, create_graph)
>   File "torch/autograd/__init__.py", line 127, in backward
>     allow_unreachable=True)  # allow_unreachable flag
>   File "torch/autograd/function.py", line 87, in apply
>     return self._forward_cls.backward(self, *args)
>   File "test_autograd.py", line 6910, in backward
>     raise ValueError("something")
> ValueError: something
```
ghstack-source-id: 111109637

Test Plan: waitforbuildbot

Reviewed By: albanD

Differential Revision: D23365408

fbshipit-source-id: 1470c4776ec8053ea92a6ee1663460a3bae6edc5
2020-09-01 01:28:47 -07:00
Ailing Zhang
f229d2c07b Revert D23335106: [quant][graphmode][fix] Fix insert quant dequant for observers without qparams
Test Plan: revert-hammer

Differential Revision:
D23335106 (602209751e)

Original commit changeset: 84af2884d521

fbshipit-source-id: 8d227fe2048b532016407d8ecfbaa6ffd1c313fd
2020-08-31 22:12:37 -07:00
Jerry Zhang
602209751e [quant][graphmode][fix] Fix insert quant dequant for observers without qparams (#43606)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43606

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23335106

fbshipit-source-id: 84af2884d52118c069fc43a9f166dc336a8a87c8
2020-08-31 18:27:53 -07:00
Mikhail Zolotukhin
98b846cd1d [JIT] Remove loop peeling from the profiling executor pipeline. (#43847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43847

It seems to slow down two fastRNN benchmarks and does not speed up
others.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23416197

Pulled By: ZolotukhinM

fbshipit-source-id: 598144561979e84bcf6bccf9b0ca786f5af18383
2020-08-31 17:26:55 -07:00
Mikhail Zolotukhin
d69d603061 [JIT] Specialize autograd zero: actually remove the original graph after we created its versioned copy. (#43900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43900

The original code assumed that the versioning `if` was inserted at the
beginning of the graph, while in fact it was inserted at the end. We're
now also not removing `profile_optional` nodes and rely on DCE to clean
them up later (the reason we're not doing it here is that deletion could
invalidate the insertion point being used).

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23432175

Pulled By: ZolotukhinM

fbshipit-source-id: 1bf55affaa3f17af1bf71bad3ef64edf71a3e3fb
2020-08-31 17:26:51 -07:00
Mikhail Zolotukhin
f150f924d3 [JIT] Specialize autograd zero: fix the guarding condition. (#43846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43846

We are looking for tensors that are expected to be undefined (according
to the profile info) and should be checking for them to satisfy the
following condition: "not(have any non-zero)", which is equivalent to
"tensor is all zeros". The issue was that we've been checking tensors
that were expected *not* to be undefined.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23416198

Pulled By: ZolotukhinM

fbshipit-source-id: 71e22f552680f68f2af29f427b7355df9b1a4278
2020-08-31 17:25:50 -07:00
Ralf Gommers
4c19a1e350 Move torch/autograd/grad_mode.pyi stubs inline (#43415)
Summary:
- Add `torch._C` bindings from `torch/csrc/autograd/init.cpp`
- Renamed `torch._C.set_grad_enabled` to `torch._C._set_grad_enabled`
  so it doesn't conflict with torch.set_grad_enabled anymore

This is a continuation of gh-38201. All I did was resolve merge conflicts and finish the annotation of `_DecoratorContextManager.__call__` that ezyang started in the first commit.

~Reverts commit b5cd3a80bb, which was only motivated by not having `typing_extensions` available.~ (JIT can't be made to understand `Literal[False]`, so keep as is).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43415

Reviewed By: ngimel

Differential Revision: D23301168

Pulled By: malfet

fbshipit-source-id: cb5290f2e556b4036592655b9fe54564cbb036f6
2020-08-31 16:14:41 -07:00
Rohan Varma
4e4626a23d Join-based API to support DDP uneven inputs (#42577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42577

Closes https://github.com/pytorch/pytorch/issues/38174. Implements a join-based API to support training with the DDP module in the scenario where different processes have different no. of inputs. The implementation follows the description in https://github.com/pytorch/pytorch/issues/38174. Details are available in the RFC, but as a summary, we make the following changes:

#### Approach
1) Add a context manager `torch.nn.parallel.distributed.join`
2) In the forward pass, we schedule a "present" allreduce where non-joined process contribute 1 and joined processes contribute 0. This lets us keep track of joined processes and know when all procs are joined.
3) When a process depletes its input and exits the context manager, it enters "joining" mode and attempts to "shadow" the collective comm. calls made in the model's forward and backward pass. For example we schedule the same allreduces in the same order as the backward pass, but with zeros
4) We adjust the allreduce division logic to divide by the effective world size (no. of non-joined procs) rather than the absolute world size to maintain correctness.
5) At the end of training, the last joined process is selected to be the "authoritative" model copy

We also make some misc. changes such as adding a `rank` argument to `_distributed_broadcast_coalesced` and exposing some getters/setters on `Reducer` to support the above changes.

#### How is it tested?
We have tests covering the following models/scenarios:
- [x] Simple linear model
- [x] Large convolutional model
- [x] Large model with module buffers that are broadcast in the forward pass (resnet). We verify this with a helper function `will_sync_module_buffers` and ensure this is true for ResNet (due to batchnorm)
- [x] Scenario where a rank calls join() without iterating at all, so without rebuilding buckets (which requires collective comm)
- [x] Model with unused params (with find unused parameters=True)
- [x] Scenarios where different processes iterate for a varying number of different iterations.
- [x] Test consistency in tie-breaking when multiple ranks are the last ones to join
- [x] Test that we divide by the effective world_size (no. of unjoined processes)

#### Performance implications

###### Trunk vs PR patched, 32 GPUs, batch size = 32
P50, forward + backward + optimizer batch latency & total QPS: 0.121 264/s vs 0.121 264/s
P50 backwards only batch latency & total QPS: 0.087 369/s vs 0.087 368/s

###### join(enable=True) vs without join, 32 GPUs, batch size = 32, even inputs
P50, forward + backward + optimizer batch latency & total QPS: 0.120 265/s vs 0.121 264/s
P50 backwards only batch latency & total QPS: 0.088 364/s vs 0.087 368/s

###### join(enable=False) vs without join, 32 GPUs, batch size = 32, even inputs
P50 forward + backward + optimizer batch latency & total QPS: 0.121 264/s vs 0.121 264/s
P50 backwards only batch latency & total QPS: 0.087 368/s vs 0.087 368/s

###### join(enable=True) with uneven inputs (offset = 2000), 32 GPUs, batch size = 32
P50 forward + backward + optimizer batch latency & total QPS: 0.183 174/s vs 0.121 264/s
P50 backwards only batch latency & total QPS: 0.150 213/s vs 0.087 368/s

###### join(enable=True) with uneven inputs ((offset = 2000)), 8 GPUs, batch size = 32
P50 forward + backward + optimizer batch latency & total QPS: 0.104 308/s vs 0.104 308/s
P50 backwards only batch latency & total QPS: 0.070 454/s vs 0.070 459/s

The 2 uneven-input benchmarks above were conducted with 32 GPUs, with 4 GPUs immediately depleting their inputs and entering "join" mode (i.e. not iterating at all), while the other 28 iterated as normal. It looks like there is a pretty significant perf hit for this case when there are uneven inputs and multi-node training. Strangely, when there is a single node (8 GPUs), this does not reproduce.

#### Limitations
1) This is only implemented for MPSD, not SPMD. Per a discussion with mrshenli we want to encourage the use of MPSD over SPMD for DDP.
2) This does not currently work with SyncBN or custom collective calls made in the model's forward pass. This is because the `join` class only shadows the `broadcast` for buffers in the forward pass, the gradient allreduces in the bwd pass, unused parameters reduction, and (optionally) the rebuild buckets broadcasting in the backwards pass. Supporting this will require additional design thought.
3) Has not been tested with the [DDP comm. hook](https://github.com/pytorch/pytorch/issues/39272) as this feature is still being finalized/in progress. We will add support for this in follow up PRs.
ghstack-source-id: 111033819

Reviewed By: mrshenli

Differential Revision: D22893859

fbshipit-source-id: dd02a7aac6c6cd968db882c62892ee1c48817fbe
2020-08-31 13:29:03 -07:00
Elias Ellison
3c8b1d73c9 Update aliasing in tensorexpr fuser (#43743)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43743

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D23385205

Pulled By: eellison

fbshipit-source-id: 097a15d5bcf216453e1dd144d6117108b3deae4d
2020-08-31 11:52:26 -07:00
Elias Ellison
5da8a7bf2d use types in the IR instead of vmap (#43742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43742

We can remove all prim::profiles, update the values to their specialized profiled types, and then later guard the input graphs based on the input types of the fusion group. After that we remove specialized tensor types from the graph. This gets rid of having to update the vmap and removes all of the profile nodes in fusing.

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D23385206

Pulled By: eellison

fbshipit-source-id: 2c84bd1d1c38df0d7585e523c30f7bd28f399d7c
2020-08-31 11:52:23 -07:00
Elias Ellison
259e5b7d71 Add passes to profiling executor pipeline (#43636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43636

We weren't running inlining in the forward graph of differentiable subgraphs, and we weren't getting rid of all profiles as part of optimization.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23358804

Pulled By: eellison

fbshipit-source-id: 05ede5fa356a15ca385f899006cb5b35484ef620
2020-08-31 11:52:20 -07:00
Elias Ellison
a7e7981c0b Use prim::TensorExprGroup interned symbol (#43635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43635

Intern the symbol, no functional changes. Aliasing needs to be looked at, but this should be done in a separate PR; this PR is just changing the symbol.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23358806

Pulled By: eellison

fbshipit-source-id: f18bcd142a0daf514136f019ae607e4c3f45d9f8
2020-08-31 11:52:16 -07:00
Elias Ellison
1c0faa759e Update requires grad property (#43634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43634

Because differentiable graphs detach the gradients of input Tensors, creating and inlining differentiable graphs changes the requires_grad property of tensors in the graph. In the legacy executor, this was not a problem as the Fuser would simply ignore the gradient property, because it was an invariant that the LegacyExecutor only passed tensors with grad = False. This is not the case with the profiler, as the Fuser does its own guarding.

Updating the type also helps with other typechecks, e.g. the ones specializing the backward, and with debugging the graph.

Other possibilities considered were:
- Fuser/Specialize AutogradZero always guards against requires_grad=False regardless of the profiled type
- Re-profile forward execution of differentiable graph

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23358803

Pulled By: eellison

fbshipit-source-id: b106998accd5d0f718527bc00177de9af5bad5fc
2020-08-31 11:51:06 -07:00
Nick Gibson
1390cad2d8 [NNC] Hook up registerizer to Cuda codegen [2/x] (#42878)
Summary:
Insert the registerizer into the Cuda Codegen pass list, to enable scalar replacement and close the gap in simple reduction performance.

First up the good stuff, benchmark before:
```
          Column sum          Caffe2             NNC          Simple          Better
           (10, 100)          5.7917          9.7037          6.9386          6.0448
          (100, 100)          5.9338          14.972          7.1139          6.3254
        (100, 10000)          21.453          741.54          145.74          12.555
        (1000, 1000)          8.0678          122.75          22.833          9.0778

             Row sum          Caffe2             NNC          Simple          Better
           (10, 100)          5.4502          7.9661          6.1469          5.5587
          (100, 100)          5.7613          13.897           21.49          5.5808
        (100, 10000)          21.702          82.398          75.462          22.793
        (1000, 1000)          22.527             129          176.51          22.517

```

After:
```
          Column sum          Caffe2             NNC          Simple          Better
           (10, 100)          6.0458          9.4966          7.1094           6.056
          (100, 100)          5.9299          9.1482          7.1693           6.593
        (100, 10000)          21.739          121.97          162.63          14.376
        (1000, 1000)          9.2374           29.01          26.883          10.127

             Row sum          Caffe2             NNC          Simple          Better
           (10, 100)          5.9773          8.1792          7.2307          5.8941
          (100, 100)          6.1456          9.3155          24.563          5.8163
        (100, 10000)          25.384          30.212          88.531          27.185
        (1000, 1000)          26.517          32.702          209.31          26.537
```

Speedup about 3-8x depending on the size of the data (increasing with bigger inputs).

The gap between NNC and simple is closed or eliminated - remaining issue appears to be kernel launch overhead. Next up is getting us closer to the _Better_ kernel.

It required a lot of refactoring and bug fixes on the way:
* Refactored flattening of parallelized loops out of the CudaPrinter and into its own stage, so we can transform the graph in the stage between flattening and printing (where registerization occurs).
* Made AtomicAddFuser less pessimistic: it will now recognize that if an Add to a buffer depends on all used Block and Thread vars, then it has no overlap and does not need to be atomic. This allows registerization to apply to these stores.
* Fixed PrioritizeLoad mutator so that it does not attempt to separate the Store and Load to the same buffer (i.e. reduction case).
* Moved CudaAnalysis earlier in the process, allowing later stages to use the analyzed bufs.
* Fixed a bug in the Registerizer where when adding a default initializer statement it would use the dtype of the underlying var (which is always kHandle) instead of the dtype of the Buf.
* Fixed a bug in the IRMutator where the logic for Allocate statements was inverted, so they were replaced only if they did not change.
* Added simplification of simple Division patterns to the IRSimplifier.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42878

Reviewed By: glaringlee

Differential Revision: D23382499

Pulled By: nickgg

fbshipit-source-id: 3640a98fd843723abad9f54e67070d48c96fe949
2020-08-31 10:39:46 -07:00
mfkasim91
576880febf Print all traceback for nested backwards in detect_anomaly (#43626)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43405.

This pull request adds the ability to print all tracebacks if `detect_anomaly` mode detects `nan` in nested backward operations.
The way I did it is by assigning a node as a parent to all nodes it produces during its backward calculation. Then, if one of the children produces `nan`, it will print the traceback from the parent and grandparents (if any).

The parent is assigned in `parent_node_` member in `Node` class which is accessible in C++ by function `node->parent()` and in Python by `node.parent_function`.
A node has a parent iff:

1. it is created from a backward operation, and
2. created when anomaly mode and grad mode are both enabled.

An example of this feature:

    import torch

    def example():
        x = torch.tensor(1.0, requires_grad=True)
        y = torch.tensor(1e-8, requires_grad=True)  # small to induce nan in n-th backward
        a = x * y
        b = x * y
        z1 = a / b  # can produce nan in n-th backward as long as https://github.com/pytorch/pytorch/issues/43414 is unsolved
        z = z1 * z1
        gy , = torch.autograd.grad( z , (y,), create_graph=True)
        gy2, = torch.autograd.grad(gy , (y,), create_graph=True)
        gy3, = torch.autograd.grad(gy2, (y,), create_graph=True)
        gy4, = torch.autograd.grad(gy3, (y,), create_graph=True)
        return gy4

    with torch.autograd.detect_anomaly():
        gy4 = example()

with output:

    example.py:16: UserWarning: Anomaly Detection has been enabled. This mode will increase the runtime and should only be enabled for debugging.
      with torch.autograd.detect_anomaly():
    /home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py:190: UserWarning: Error detected in DivBackward0. Traceback of forward call that caused the error:
      File "example.py", line 17, in <module>
        gy4 = example()
      File "example.py", line 12, in example
        gy3, = torch.autograd.grad(gy2, (y,), create_graph=True)
      File "/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 190, in grad
        return Variable._execution_engine.run_backward(
     (Triggered internally at  ../torch/csrc/autograd/python_anomaly_mode.cpp:61.)
      return Variable._execution_engine.run_backward(
    /home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py:190: UserWarning:

    Traceback of forward call that induces the previous calculation:
      File "example.py", line 17, in <module>
        gy4 = example()
      File "example.py", line 11, in example
        gy2, = torch.autograd.grad(gy , (y,), create_graph=True)
      File "/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 190, in grad
        return Variable._execution_engine.run_backward(
     (Triggered internally at  ../torch/csrc/autograd/python_anomaly_mode.cpp:65.)
      return Variable._execution_engine.run_backward(
    /home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py:190: UserWarning:

    Traceback of forward call that induces the previous calculation:
      File "example.py", line 17, in <module>
        gy4 = example()
      File "example.py", line 8, in example
        z1 = a / b  # can produce nan in n-th backward as long as https://github.com/pytorch/pytorch/issues/43414 is unsolved
     (Triggered internally at  ../torch/csrc/autograd/python_anomaly_mode.cpp:65.)
      return Variable._execution_engine.run_backward(
    Traceback (most recent call last):
      File "example.py", line 17, in <module>
        gy4 = example()
      File "example.py", line 13, in example
        gy4, = torch.autograd.grad(gy3, (y,), create_graph=True)
      File "/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 190, in grad
        return Variable._execution_engine.run_backward(
    RuntimeError: Function 'DivBackward0' returned nan values in its 1th output.

cc & thanks to albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43626

Reviewed By: malfet

Differential Revision: D23397499

Pulled By: albanD

fbshipit-source-id: aa7435ec2a7f0d23a7a02ab7db751c198faf3b7d
2020-08-31 08:23:07 -07:00
BowenBao
08126c9153 [ONNX] Utilize ONNX shape inference for ONNX exporter (#40628)
Summary:
It is often the case that converting a torch operator to an onnx operator requires the input rank/dtype/shape to be known. Previously, the conversion depended on the tracer to provide this info, leaving a gap in the conversion of scripted modules.

We are extending the exporter with support for onnx shape inference. If enabled, onnx shape inference will be called whenever an onnx node is created. This is the first PR, introducing the initial version of the feature. More and more cases will be supported following this PR.

* Added pass to run onnx shape inference on a given node. The node has to have namespace `onnx`.
* Moved helper functions from `export.cpp` to a common place for re-use.
* This feature is currently experimental, and can be turned on through the flag `onnx_shape_inference` in the internal API `torch.onnx._export` (see the sketch after this list).
* Currently skipping ONNX Sequence ops, If/Loop and ConstantOfShape due to limitations. Support will be added in the future.
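
A hedged sketch of turning the experimental flag on; it assumes `onnx_shape_inference` is accepted as a keyword argument by the internal `torch.onnx._export` entry point, which may differ from the exact API.

```python
import io
import torch

class AddModel(torch.nn.Module):
    def forward(self, x, y):
        return x + y

buf = io.BytesIO()
# onnx_shape_inference is the experimental flag named above (assumed keyword).
torch.onnx._export(AddModel(), (torch.randn(2, 3), torch.randn(2, 3)), buf,
                   onnx_shape_inference=True)
```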

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40628

Reviewed By: mrshenli

Differential Revision: D22709746

Pulled By: bzinodev

fbshipit-source-id: b52aeeae00667e66e0b0c1144022f7af9a8b2948
2020-08-30 18:35:46 -07:00
Mike Ruberry
3aeb70db0b Documents sub properly, adds subtract alias (#43850)
Summary:
`torch.sub` was undocumented, so this PR adds its documentation, analogous to `torch.add`'s documentation, and adds the alias `torch.subtract` for `torch.sub`, too. This alias comes from NumPy (see https://numpy.org/doc/stable/reference/generated/numpy.subtract.html?highlight=subtract#numpy.subtract)
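
A quick usage sketch of the new alias (illustrative, not taken from the PR):

```python
import torch

a, b = torch.tensor([3., 5.]), torch.tensor([1., 2.])
assert torch.equal(torch.subtract(a, b), torch.sub(a, b))
torch.subtract(a, b, alpha=2)   # a - 2 * b, same signature as torch.sub
```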

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43850

Reviewed By: ngimel

Differential Revision: D23416908

Pulled By: mruberry

fbshipit-source-id: 6c4d2ebaf6ecae91f3a6efe484ce6c4dad96f016
2020-08-30 15:44:56 -07:00
Bert Maher
1830e4f08c Remove unnamed namespace in headers (#43689)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43689

Test Plan: Imported from OSS

Reviewed By: eellison, asuhan

Differential Revision: D23367636

Pulled By: bertmaher

fbshipit-source-id: ddb6d34d2f7cadff3a591c3650e1dd1b401c3d2d
2020-08-29 22:45:53 -07:00
Gao, Xiang
ab3ea95e90 #include <string> in loopnest.h (#43835)
Summary:
This file was causing a compilation failure on my gcc 10.1.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43835

Reviewed By: bhosmer

Differential Revision: D23416417

Pulled By: ZolotukhinM

fbshipit-source-id: d0c2998347438fb729212574d52ce20dd6faae85
2020-08-29 19:06:44 -07:00
Ashkan Aliabadi
4e39c310eb Move torch/csrc/utils/hash.h to c10/util/hash.h. (#42503)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42503

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23252331

Pulled By: AshkanAliabadi

fbshipit-source-id: 3c4c0e27b9a7eec8560e374c2a3ba5f1c65dae48
2020-08-29 17:47:00 -07:00
Alex Suhan
60ad7e9c04 [TensorExpr] Make sum available from Python (#43730)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43730

Test Plan:
python test/test_jit_fuser_te.py -k TestTEFuser.test_sum
test_tensorexpr --gtest_filter=TensorExprTest.KernelSum*

Reviewed By: ZolotukhinM

Differential Revision: D23407600

Pulled By: asuhan

fbshipit-source-id: e6da4690ae6d802f9be012e39e61b7467aa5285c
2020-08-29 10:38:21 -07:00
Nikita Shulga
d10056652b Enable torch.half for lt and masked_select (#43704)
Summary:
Enable testing of those options in `TestTorchDeviceTypeCPU.test_logical_cpu` and `TestTorchDeviceTypeCPU.test_masked_select_cpu_float16`
Add `view_as_real` testing for `torch.complex32` type

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43704

Reviewed By: albanD

Differential Revision: D23373070

Pulled By: malfet

fbshipit-source-id: 00f17f23b48513379a414227aea91e2d3c0dd5f9
2020-08-29 02:37:26 -07:00
Pritam Damania
931b8b4ac8 Use ivalue::Future in autograd engine and DistEngine. (#43676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43676

This is one part of https://github.com/pytorch/pytorch/issues/41574 to
ensure we consolidate everything around ivalue::Future.

I've removed the use of torch/csrc/utils/future.h from the autograd engines and
used ivalue::Future instead.
ghstack-source-id: 110895545

Test Plan: waitforbuildbot.

Reviewed By: albanD

Differential Revision: D23362415

fbshipit-source-id: aa109b3f8acf0814d59fc5264a85a8c27ef4bdb6
2020-08-29 02:15:26 -07:00
Nikolay Korovaiko
000739c31a Function calls for fallback paths (#43274)
Summary:
This PR adds API to package unoptimized/fallback blocks as function calls. It's mainly meant to be used by TensorExpressionsFuser and SpecializeAutogradZero passes as both specialize the original graph but would also like to provide a fallback path in case the assumptions under which the graph was specialized do not hold for some inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43274

Reviewed By: malfet

Differential Revision: D23406961

Pulled By: Krovatkin

fbshipit-source-id: ef21fc9ad886953461b09418d02c75c58375490c
2020-08-28 23:31:02 -07:00
Hao Lu
8538a79bfe [jit][static] Basic executor (#43647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43647

Nothing fancy, just a basic implementation of the graph executor without using stack machine.

Reviewed By: bwasti

Differential Revision: D23208413

fbshipit-source-id: e483bb6ad7ba8591bbe1767e669654d82f42c356
2020-08-28 23:20:07 -07:00
Martin Yuan
58148c85f4 Use template OperatorGenerator for prim and special operator registration (#43481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43481

Apply OperatorGenerator for prim and special operator registration. It does not affect the existing build by default. However, if a whitelist of operators exists, only the operators in the whitelist will be registered. It has the potential to save up to 200 KB of binary size, depending on the usage.

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D23287251

Pulled By: iseeyuan

fbshipit-source-id: 3ca39fbba645bad8d69e69195f3680e4f6d633c5
2020-08-28 21:18:00 -07:00
Vinod Kumar S
13c7c6227e Python/C++ API Parity: TransformerDecoder (#42886)
Summary:
Fixes [#37756](https://github.com/pytorch/pytorch/issues/37756)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42886

Reviewed By: zhangguanheng66

Differential Revision: D23385631

Pulled By: glaringlee

fbshipit-source-id: 610a2fabb4c25b2dfd37b33287215bb8872d653d
2020-08-28 20:13:53 -07:00
Dmytro Dzhulgakov
47e489b135 Make ExtraFilesMap return bytes instead of str (#43241)
Summary:
This is for the case where we want to store binary files using the `ScriptModule.save(..., _extra_files=...)` functionality. With Python 3 we can just use bytes and not bother about it.

I had to do a copy-paste from pybind sources; maybe we should upstream it, but it'd mean adding a bunch of template arguments to `bind_map`, which is a bit untidy.

Let me know if there's a better place to park this function (it seems to be the only invocation of `bind_map` so I put it in the same file)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43241

Reviewed By: zdevito

Differential Revision: D23205244

Pulled By: dzhulgakov

fbshipit-source-id: 8f291eb4294945fe1c581c620d48ba2e81b3dd9c
2020-08-28 19:11:33 -07:00
Kurt Mohler
68b9daa9bf Add torch.linalg.norm (#42749)
Summary:
Adds `torch.linalg.norm` function that matches the behavior of `numpy.linalg.norm`.
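
A short usage sketch of the NumPy-compatible behavior (illustrative, not taken from the PR):

```python
import torch

x = torch.arange(9, dtype=torch.float) - 4
A = x.reshape(3, 3)

torch.linalg.norm(x)                     # vector 2-norm
torch.linalg.norm(A)                     # Frobenius norm for a matrix
torch.linalg.norm(A, ord=1)              # max absolute column sum
torch.linalg.norm(x, ord=float('inf'))   # max absolute value
```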

Additional changes:
* Add support for dimension wrapping in `frobenius_norm` and `nuclear_norm`
* Fix `out` argument behavior for `nuclear_norm`
* Fix issue where `frobenius_norm` allowed duplicates in `dim` argument
* Add `_norm_matrix`

Closes https://github.com/pytorch/pytorch/issues/24802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42749

Reviewed By: ngimel

Differential Revision: D23336234

Pulled By: mruberry

fbshipit-source-id: f0aba3089a3a0bf856aa9c4215e673ff34228fac
2020-08-28 18:28:33 -07:00
neginraoof
cd0bab8d8d [ONNX] Where op (#41544)
Summary:
Extending where op export

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41544

Reviewed By: malfet

Differential Revision: D23279515

Pulled By: bzinodev

fbshipit-source-id: 4627c95ba18c8a5ac8d06839c343e06e71c46aa7
2020-08-28 18:15:01 -07:00
Sinan Nasir
7d517cf96f [NCCL] Dedicated stream to run all FutureNCCL callbacks. (#43447)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43447

Two main better-engineering motivations to run all FutureNCCL callbacks on a dedicated stream:
1. Each time a then callback was called, we would get a stream from the pool and run the callback on that stream. If we observe the stream traces using that approach, we would see a lot of streams and debugging would become more complicated. If we have a dedicated stream to run all then callback operations, the trace results will be much cleaner and easier to follow.
2. getStreamFromPool may eventually return the default stream or a stream that is used for other operations. This can cause slowdowns.

Unless the then callback takes longer than the preceding allreduce, this approach will be as performant as the previous approach.
ghstack-source-id: 110909401

Test Plan:
Perf trace runs to validate the desired behavior:
See the dedicated stream 152 is running the then callback operations:

{F299759342}

I run pytorch.benchmark.main.workflow using resnet50 and 32 GPUs registering allreduce with then hook.
See f213777896 [traces](https://www.internalfb.com/intern/perfdoctor/results?run_id=26197585)

After updates, same observation: see f214890101

Reviewed By: malfet

Differential Revision: D23277575

fbshipit-source-id: 67a89900ed7b70f3daa92505f75049c547d6b4d9
2020-08-28 17:26:23 -07:00
Vasiliy Kuznetsov
3f5ea2367e Adding a version serialization type to ConvPackedParam (#43086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43086

This PR changes the format of `ConvPackedParam` in a nearly backwards-compatible way:
* a new format is introduced which has more flexibility and a lower on-disk size
* custom pickle functions are added to `ConvPackedParams` which know how to load the old format (a generic sketch of this pattern follows the list below)
* the custom pickle functions are **not** BC because the output type of `__getstate__` has changed.  We expect this to be acceptable as no user flows are actually broken (loading a v1 model with v2 code works), which is why we whitelist the failure.
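
A generic Python sketch of the version-tagged state pattern described above (purely illustrative; the real logic lives in the C++ `ConvPackedParams` pickle functions, and the field names here are made up):

```python
class PackedParamsLike:
    """Illustrative stand-in, not the real ConvPackedParams."""

    def __init__(self, weight, stride):
        self.weight, self.stride = weight, stride

    def __getstate__(self):
        # v2 format: a version tag plus a dict -- smaller and more flexible.
        return ("2", {"weight": self.weight, "stride": self.stride})

    def __setstate__(self, state):
        if isinstance(state, tuple) and state and state[0] == "2":
            self.__dict__.update(state[1])        # new (v2) format
        else:
            self.weight, self.stride = state      # legacy v1 tuple format
```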

Test plan (TODO finalize):

```
// adhoc testing of saving v1 and loading in v2: https://gist.github.com/vkuzo/f3616c5de1b3109cb2a1f504feed69be

// test that loading models with v1 conv params format works and leads to the same numerics
python test/test_quantization.py TestSerialization.test_conv2d_graph
python test/test_quantization.py TestSerialization.test_conv2d_nobias_graph

// test that saving and loading models with v2 conv params format works and leads to same numerics
python test/test_quantization.py TestSerialization.test_conv2d_graph_v2
python test/test_quantization.py TestSerialization.test_conv2d_nobias_graph_v2

// TODO before land:
// test numerics for a real model
// test legacy ONNX path
```

Note: this is a newer copy of https://github.com/pytorch/pytorch/pull/40003

Test Plan: Imported from OSS

Reviewed By: dreiss

Differential Revision: D23347832

Pulled By: vkuzo

fbshipit-source-id: 06bbe4666421ebad25dc54004c3b49a481d3cc92
2020-08-28 15:41:30 -07:00
Iurii Zdebskyi
4cb8d306e6 Add _foreach_add_(TensorList tensors, Scalar scalar) API (#42531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42531

[First PR: Add private API to support tensor lists: _foreach_add(TensorList tensors, Scalar scalar)](https://github.com/pytorch/pytorch/pull/41554).

**Motivation**
[GitHub issue](https://github.com/pytorch/pytorch/issues/38655)
Current PyTorch optimizer implementations are not efficient in cases when we work with a lot of small feature tensors. Starting a lot of kernels slows down the whole process. We need to reduce the number of kernels that we start.
As an example, we should be looking at [NVIDIA's Apex](https://github.com/NVIDIA/apex).
In order to track progress, we will pick PyTorch's DCGAN model with the Adam optimizer and, once the optimizer is reimplemented with tensor lists, benchmark the model performance against the original model version, Apex's version with the original Adam optimizer, and its FusedAdam optimizer.

**Current API restrictions**
- List can't be empty (will be fixed in upcoming PRs).
- All tensors in the list must have the same dtype, device and size.

**Broadcasting**
At this point we don't support broadcasting.

**What is 'Fast' and 'Slow' route**
In particular cases, we can't process an op with a fast list CUDA kernel. Still, we can fall back to a regular for-loop where the op is applied to each tensor individually through the dispatch mechanisms. There are a few checks that decide whether the op will be performed via a 'fast' or 'slow' path.
To go the fast route,
- All tensors must have strided layout
- All tensors must be dense and not have overlapping memory
- The resulting tensor type must be the same.

---------------
**In this PR**
- Adding a `std::vector<Tensor> _foreach_add_(TensorList tensors, Scalar scalar)` API (a short usage sketch follows this list)
- Resolving some additional comments from previous [PR](https://github.com/pytorch/pytorch/pull/41554).
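
A minimal usage sketch, assuming the new op is exposed in Python as `torch._foreach_add_`:

```python
import torch

params = [torch.zeros(2, 3) for _ in range(4)]
torch._foreach_add_(params, 1.0)   # one call updates every tensor in the list
assert all(bool(p.eq(1).all()) for p in params)
```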

**Tests**
Tested via unit tests

**TODO**
1. Properly handle empty lists

**Plan for the next PRs**
1. APIs
- Binary Ops for list with Scalar
- Binary Ops for list with list
- Unary Ops for list
- Pointwise Ops

2. Complete tasks from TODO
3. Rewrite PyTorch optimizers to use for-each operators for performance gains.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23331892

Pulled By: izdeby

fbshipit-source-id: c585b72e1e87f6f273f904f75445618915665c4c
2020-08-28 14:34:46 -07:00
Mike Ruberry
20abfc21e4 Adds arctanh, arcsinh aliases, simplifies arc* alias dispatch (#43762)
Summary:
Adds two more "missing" NumPy aliases: arctanh and arcsinh, and simplifies the dispatch of other arc* aliases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43762

Reviewed By: ngimel

Differential Revision: D23396370

Pulled By: mruberry

fbshipit-source-id: 43eb0c62536615fed221d460c1dec289526fb23c
2020-08-28 13:59:19 -07:00
Mikhail Zolotukhin
776c2d495f [JIT] IRParser: store list attributes as generic ivalue lists. (#43785)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43785

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23400565

Pulled By: ZolotukhinM

fbshipit-source-id: e248eb1854c4ec40da9455d4279ea6e47b1f2a16
2020-08-28 13:27:28 -07:00
Mike Ruberry
f4695203c2 Fixes fft function calls for C++ API (#43749)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43732.

Requires importing the fft namespace in the C++ API, just like the Python API does, to avoid clobbering the torch::fft function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43749

Reviewed By: glaringlee

Differential Revision: D23391544

Pulled By: mruberry

fbshipit-source-id: d477d0b6d9a689d5c154ad6c31213a7d96fdf271
2020-08-28 12:41:30 -07:00
mspryn@fb.com
b630c1870d Add stateful XNNPack deconvolution2d operator to torch. (#43233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43233

XNNPack is already being used for the convolution2d operation. Add the
ability for it to be used with transpose convolution.

Test Plan: buck run caffe2/test:xnnpack_integration

Reviewed By: kimishpatel

Differential Revision: D23184249

fbshipit-source-id: 3fa728ce1eaca154d24e60f800d5e946d768c8b7
2020-08-28 10:31:36 -07:00
Protonu Basu
58a7e73a95 [TensorExpr] Block Codegen (#40054)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40054

Reviewed By: ZolotukhinM

Differential Revision: D22061350

Pulled By: protonu

fbshipit-source-id: 004f7c316629b16610ecdbb97e43036c72c65067
2020-08-28 09:53:42 -07:00
generatedunixname89002005287564@sandcastle148.ftw3.facebook.com
26161e8ab6 [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D23393950

fbshipit-source-id: 6a31b7ab6961cba88014f41b3ed1eda108edebab
2020-08-28 05:38:13 -07:00
Meghan Lele
87d7c362b1 [JIT] Add JIT support for torch.no_grad (#41371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41371

**Summary**
This commit enables the use of `torch.no_grad()` in a with item of a
with statement within JIT. Note that the use of this context manager as
a decorator is not supported.
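
A minimal example of the newly supported pattern (the function name is illustrative); using `torch.no_grad()` as a decorator remains unsupported:
```python
import torch

@torch.jit.script
def frozen_matmul(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # torch.no_grad() as a with item is now supported in TorchScript.
    with torch.no_grad():
        return x @ w

out = frozen_matmul(torch.randn(2, 3, requires_grad=True), torch.randn(3, 4))
print(out.requires_grad)  # False
```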

**Test Plan**
This commit adds a test case to the existing with statements tests for
`torch.no_grad()`.

**Fixes**
This commit fixes #40259.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D22649519

Pulled By: SplitInfinity

fbshipit-source-id: 7fa675d04835377666dfd0ca4e6bc393dc541ab9
2020-08-27 15:32:57 -07:00
Jiakai Liu
3a0e35c9f2 [pytorch] deprecate static dispatch (#43564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564

Static dispatch was originally introduced for mobile selective build.

Since we have added selective build support for dynamic dispatch and
tested it in FB production for months, we can deprecate static dispatch
to reduce the complexity of the codebase.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23324452

Pulled By: ljk53

fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7
2020-08-27 14:52:48 -07:00
Elias Ellison
01f974eb1e Specialize optionals for grad_sum_to_size (#43633)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43633

In the backward graph, _grad_sum_to_size is inserted whenever a possibly broadcasting op is called:
`"aten::_grad_sum_to_size(Tensor(a) self, int[]? size) -> Tensor(a)"`
If a broadcast occurred, a sum is called; otherwise the second input is None and the op is a no-op. Most of the time it's a no-op (in the fast RNNs benchmark, > 90% of the time).
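
For intuition, a minimal Python sketch (not the JIT implementation; the helper name is illustrative) of what the op computes: it sums a broadcasted gradient back down to the original shape, and does nothing when `size` is None:
```python
import torch

def grad_sum_to_size(grad, size):
    # No broadcast happened: the op is a no-op (the common case).
    if size is None:
        return grad
    # Otherwise sum away the broadcasted dimensions.
    while grad.dim() > len(size):
        grad = grad.sum(0)
    for i, s in enumerate(size):
        if s == 1 and grad.size(i) != 1:
            grad = grad.sum(i, keepdim=True)
    return grad

g = torch.ones(4, 3)                      # grad of a broadcasted result
print(grad_sum_to_size(g, [1, 3]).shape)  # torch.Size([1, 3])
print(grad_sum_to_size(g, None).shape)    # torch.Size([4, 3]), no-op
```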

We can get rid of this op by profiling the optionality of the second input. I added `prim::profile_optional` to do this, which counts the number of times it saw a None value and the number of times it saw a value present. When specializing the backward graph, we insert checks for values we profiled as None, and in the optimized block can remove the grad_sum_to_size calls that use those values.

In the future we may revisit this when NNC supports reductions and we want to replace grad_sum_to_size with sums as well, but I think this is worth landing now.

Test Plan: Imported from OSS

Reviewed By: bwasti, ZolotukhinM

Differential Revision: D23358809

Pulled By: eellison

fbshipit-source-id: a30a148ca581370789d57ba082d23cbf7ef2cd4d
2020-08-27 14:35:37 -07:00
Elias Ellison
a19fd3a388 Add undefined specializations in backward (#43632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43632

Specialize the backward graph by guarding on the undefinedness of the input tensors. The graph will look like:
```
ty1, ty2, successful_checks = prim::TypeCheck(...)
if (successful_checks)
-> optimized graph
else:
-> fallback graph
```

Specializing on the undefinedness of tensors allows us to clean up the
```
if any_defined(inputs):
 outputs = <original_computation>
else:
 outputs = autograd zero tensors
```
blocks that make up the backward graph, so that we can fuse the original_computation nodes together.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23358808

Pulled By: eellison

fbshipit-source-id: f5bb28f78a4a3082ecc688a8fe0345a8a098c091
2020-08-27 14:35:35 -07:00
Elias Ellison
e189ef5577 Refactor pass to class (#43630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43630

No functional changes here - just refactoring specialize autograd zero to a class, and standardizing its API to take in a shared_ptr<Graph>

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23358805

Pulled By: eellison

fbshipit-source-id: 42e19ef2e14df66b44592252497a47d03cb07a7f
2020-08-27 14:35:30 -07:00
Elias Ellison
d1c4d75c14 Add API for unexecuted op (#43629)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43629

We have a few places where we count the size of a block / subgraph - it's nice to have a shared API to ignore operators that are not executed in the optimized graph (will be used when I add a new profiling node in PR ^^)

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23358807

Pulled By: eellison

fbshipit-source-id: 62c745d9025de94bdafd9f748f7c5a8574cace3f
2020-08-27 14:34:05 -07:00
aizjForever
cdc3e232e9 Add __str__ and __repr__ bindings to SourceRange (#43601)
Summary:
Added bindings for the `__str__` and `__repr__` methods of SourceRange

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43601

Test Plan:
`python test/test_jit.py`

cc gmagogsfm

Reviewed By: agolynski

Differential Revision: D23366500

Pulled By: gmagogsfm

fbshipit-source-id: ab4be6e8f9ad5f67a323554437878198483f4320
2020-08-27 12:30:47 -07:00
Martin Yuan
288a2effa0 Operator generator based on templated selective build. (#43456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43456

Introduce the template OperatorGenerator, which returns an optional Operator. It is null if the templated bool value is false.

RegisterOperators() is updated to take the optional Operator. A null will not be registered.

With this update, selective operator registration can be done at compile time. Tests are added to show that an operator is registered if it's in the whitelist and is not registered otherwise.

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D23283563

Pulled By: iseeyuan

fbshipit-source-id: 456e0c72b2f335256be800aeabb797bd83bcf0b3
2020-08-27 07:26:07 -07:00
Alex Suhan
de84db2a9d [TensorExpr] Add aten::sum lowering to the kernel (#43585)
Summary:
Handles all dimensions and selected dimensions, per PyTorch semantics.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43585

Test Plan: test_tensorexpr

Reviewed By: bertmaher

Differential Revision: D23362382

Pulled By: asuhan

fbshipit-source-id: e8d8f1197a026be0b46603b0807d996a0de5d58c
2020-08-27 02:46:47 -07:00
lixinyu
48e08f884e C++ APIs TransformerEncoder (#43187)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43187

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23182770

Pulled By: glaringlee

fbshipit-source-id: 968846138d4b1c391a74277216111dba8b72d683
2020-08-27 01:31:46 -07:00
Meghan Lele
00c1501bc0 [JIT] Cast return values of functions returning Any (#42259)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42259

**Summary**
This commit modifies IR generation to insert explicit casts that cast each
return value to `Any` when a function is annotated as returning `Any`.
This precludes the failure in type unification (see below) that caused
this issue.

Issue #41962 reported that the use of an `Any` return type in
combination with different code paths returning values of different
types causes a segmentation fault. This is because the exit transform
pass tries to unify the different return types, fails, but silently sets
the type of the if node to c10::nullopt. This causes problems later in
shape analysis when that type object is dereferenced.
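
A small repro in the spirit of the one in #41962 (names are illustrative): a scripted function annotated as returning `Any` whose branches return values of different types:
```python
from typing import Any
import torch

@torch.jit.script
def pick(flag: bool) -> Any:
    # Different branches return different types; each return value is now
    # explicitly cast to Any during IR generation, so type unification in
    # the exit transform no longer fails.
    if flag:
        return torch.ones(2)
    return "no tensor"

print(pick(True))   # tensor([1., 1.])
print(pick(False))  # no tensor
```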

**Test Plan**
This commit adds a unit test that checks that a function similar to the
one in #41962 can be scripted and executed.

**Fixes**
This commit fixes #41962.

Differential Revision: D22883244

Test Plan: Imported from OSS

Reviewed By: eellison, yf225

Pulled By: SplitInfinity

fbshipit-source-id: 523d002d846239df0222cd07f0d519956e521c5f
2020-08-26 18:24:11 -07:00
Bert Maher
0bf27d64f4 Fix NaN propagation in fuser's min/max implementation (#43590)
Summary:
fmax/fmin return the non-NaN operand when one argument is NaN, which doesn't match the eager mode behavior (where NaN propagates).
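
For reference, a minimal illustration of the eager-mode behavior the fused code should match:
```python
import torch

a = torch.tensor([1.0, float("nan")])
b = torch.tensor([2.0, 3.0])
print(torch.max(a, b))  # tensor([2., nan]) -- NaN propagates in eager mode
```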

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43590

Reviewed By: mruberry

Differential Revision: D23338664

Pulled By: bertmaher

fbshipit-source-id: b0316a6f01fcf8946ba77621efa18f339379b2d0
2020-08-26 17:31:06 -07:00
shubhambhokare1
9ca338a9d4 [ONNX] Modified slice node in inplace ops pass (#43275)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42292

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43275

Reviewed By: hl475

Differential Revision: D23352540

Pulled By: houseroad

fbshipit-source-id: 7fce3087c333efe3db4b03e9b678d0bee418e93a
2020-08-26 16:51:20 -07:00
Sinan Nasir
769b9381fc DDP Communication hook: Fix the way we pass future result to buckets. (#43307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43307

I identified a bug with the DDP communication hook while running accuracy benchmarks: I was getting `loss=nan`.

It looks like when we re-run `initialize_bucketviews` with the value of `future_work`, the `bucket_view.copy_(grad)` call in `Reducer::mark_variable_ready_dense` no longer copies the `grads` back into the contents, since `bucket_view` loses its relationship with `contents` once it has been re-initialized with something else. Because we run multiple iterations, this was causing problems.
I solved this by adding two states for `bucket_view`:
```
    // bucket_views_in[i].copy_(grad) and
    // grad.copy_(bucket_views_out[i])
    // provide convenient ways to move grad data in/out of contents.
    std::vector<at::Tensor> bucket_views_in;
    std::vector<at::Tensor> bucket_views_out;
```
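
For intuition, a minimal Python sketch (illustrative only, not the Reducer code) of why the views have to keep aliasing `contents`: a copy into a view only lands in the flat buffer while the view still aliases it:
```python
import torch

contents = torch.zeros(6)                    # flat bucket buffer
views_in = [contents[0:2], contents[2:6]]    # grads are copied into these
grads = [torch.ones(2), torch.full((4,), 2.0)]

for v, g in zip(views_in, grads):
    v.copy_(g)                               # writes land in `contents` because the views alias it
print(contents)                              # tensor([1., 1., 2., 2., 2., 2.])

# If a view is rebound to an unrelated tensor (as happened after the bucket
# views were re-initialized from the hook's future result), the copy no longer
# updates `contents`:
views_in[0] = torch.empty(2)
views_in[0].copy_(torch.full((2,), 5.0))
print(contents[0:2])                         # still tensor([1., 1.])
```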

I included two additional unit tests where we run multiple iterations for better test coverage:
1) `test_accumulate_gradients_no_sync_allreduce_hook`
2) `test_accumulate_gradients_no_sync_allreduce_with_then_hook`.

ghstack-source-id: 110728299

Test Plan:
Run `python test/distributed/test_c10d.py`, some perf&accuracy benchmarks.

New tests:
`test_accumulate_gradients_no_sync_allreduce_hook`
`test_accumulate_gradients_no_sync_allreduce_with_then_hook`

Acc benchmark results look okay:
f214188350

Reviewed By: agolynski

Differential Revision: D23229309

fbshipit-source-id: 329470036cbc05ac12049055828495fdb548a082
2020-08-26 14:22:09 -07:00
Yuchen Huang
0521c71241 [D23047144 Duplicate][2/3][lite interpreter] add metadata when saving and loading models for mobile (#43584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43584

1. add `metadata.pkl` to the `.bc` file, which includes the model info that we are interested in
2. load `metadata.pkl` as an attribute `unordered_map<string, string>` in the module
ghstack-source-id: 110730013

Test Plan:
- CI
```
buck build //xplat/caffe2:jit_module_saving
buck build //xplat/caffe2:torch_mobile_core
```

Reviewed By: xcheng16

Differential Revision: D23330080

fbshipit-source-id: 5d65bd730b4b566730930d3754fa1bf16aa3957e
2020-08-26 14:07:49 -07:00
Wenfang Xu
db1fbc5729 [OACR][NLU] Add aten::str operator (#43573)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43573

We recently updated the Stella NLU model in D23307228, and the App started to crash with `Following ops cannot be found:{aten::str, }`.

Test Plan: Verified by installing the assistant-playground app on Android.

Reviewed By: czlx0701

Differential Revision: D23325409

fbshipit-source-id: d670242868774bb0aef4be5c8212bc3a3f2f667c
2020-08-26 13:27:11 -07:00
Hao Lu
25dcc28cd6 [jit][static] Replace deepcopy with copy (#43182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43182

We should avoid using `deepcopy` on the module because it involves copying the weights.

Comparing the implementation of `c10::ivalue::Object::copy()` vs `c10::ivalue::Object::deepcopy()`, the only difference is `deepcopy` copies the attributes (slots) while `copy` does not.

Reviewed By: bwasti

Differential Revision: D23171770

fbshipit-source-id: 3cd711c6a2a19ea31d1ac1ab2703a0248b5a4ef3
2020-08-26 11:15:49 -07:00
Mikhail Zolotukhin
3ec24f02af [TensorExpr] Start using typecheck in the fuser. (#43173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43173

With this change the fuser starts to generate typechecks for the inputs of
a fusion group. For each fusion group we generate a typecheck and an if
node: the true block contains the fused subgraph, the false block
contains the unoptimized original subgraph.

Differential Revision: D23178230

Test Plan: Imported from OSS

Reviewed By: eellison

Pulled By: ZolotukhinM

fbshipit-source-id: f56e9529613263fb3e6575869fdb49973c7a520b
2020-08-25 18:13:32 -07:00
Mikhail Zolotukhin
b763666f9f [JIT] Subgraph utils: add an optional vmap argument to the API to allow retrieving value mappings. (#43235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43235

This functionality is needed when we don't want to lose track of
nodes/values as we merge and unmerge them into other nodes. For
instance, if we have a side data structure with some meta information
about values or nodes, this new functionality allows us to keep that
metadata up to date after merging and unmerging nodes.

Differential Revision: D23202648

Test Plan: Imported from OSS

Reviewed By: eellison

Pulled By: ZolotukhinM

fbshipit-source-id: 350d21a5d462454166f8a61b51d833551c49fcc9
2020-08-25 18:13:29 -07:00
Mikhail Zolotukhin
d18566c617 [TensorExpr] Fuser: disallow aten::slice nodes. (#43365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43365

We don't have shape inference for them yet.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23253418

Pulled By: ZolotukhinM

fbshipit-source-id: 9c38778b8a616e70f6b2cb5aab03d3c2013b34b0
2020-08-25 18:13:27 -07:00
Mikhail Zolotukhin
8dc4b415eb [TensorExpr] Fuser: only require input shapes to be known (output shapes can be inferred). (#43171)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43171

Differential Revision: D23178228

Test Plan: Imported from OSS

Reviewed By: eellison

Pulled By: ZolotukhinM

fbshipit-source-id: e3465066e0cc4274d28db655de274a51c67594c4
2020-08-25 18:13:25 -07:00
Mikhail Zolotukhin
f6b7c6da19 [TensorExpr] Fuser: move canHandle and some other auxiliary functions into TensorExprFuser class. (#43170)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43170

Differential Revision: D23178227

Test Plan: Imported from OSS

Reviewed By: eellison

Pulled By: ZolotukhinM

fbshipit-source-id: 3c3a0215344fb5942c4f3078023fef32ad062fe9
2020-08-25 18:12:01 -07:00
Haoran Li
f35e069622 Back out "Make grad point to bucket buffer in DDP to save memory usage" (#43557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43557

Back out the diff that caused some errors in pytext distributed training.

Test Plan: Tested by rayhou who verified reverting the diff works

Differential Revision: D23320238

fbshipit-source-id: caa0fe74404059e336cd95fdb41373f58ecf486e
2020-08-25 18:04:39 -07:00
Yuchen Huang
05f27b18fb Back out D23047144 "[2/3][lite interpreter] add metadata when saving and loading models for mobile"
Summary:
Original commit changeset: f368d00f7bae

Back out "[2/3][lite interpreter] add metadata when saving and loading models for mobile"

D23047144 (e37f871e87)

Pull Request: https://github.com/pytorch/pytorch/pull/43516

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: xcheng16

Differential Revision: D23304639

fbshipit-source-id: 970ca3438c1858f8656cbcf831ffee2c4a551110
2020-08-25 14:58:38 -07:00
Yuchen Huang
e37f871e87 [2/3][lite interpreter] add metadata when saving and loading models for mobile
Summary:
1. add `metadata.pkl` to the `.bc` file, which includes the model info that we are interested in
2. load `metadata.pkl` as an attribute `unordered_map<string, string>` in the module

Test Plan:
- CI
```
buck build //xplat/caffe2:jit_module_saving
buck build //xplat/caffe2:torch_mobile_core
```

Reviewed By: xcheng16

Differential Revision: D23047144

fbshipit-source-id: f368d00f7baef2d3d15f89473cdb146467aa1e0b
2020-08-24 13:40:52 -07:00
David Reiss
ed8b08a3ba Update quantize_jit to handle new upsample overloads (#43407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43407

ghstack-source-id: 110404846

Test Plan:
test_general_value_ops passes with D21209991 applied.
(Without this diff D21209991 breaks that test.)

Reviewed By: jerryzh168

Differential Revision: D23256503

fbshipit-source-id: 0f75e50a9f7fccb5b4325604319a5f76b42dfe5e
2020-08-24 13:33:47 -07:00
Yanan Cao
35a36c1280 Implement JIT Enum type serialization and deserialization (#43460)
Summary:
[Re-review tips: nothing changed other than a type in python_ir.cpp to fix a windows build failure]

- Adds code printing for enum type
- Enhances enum type to include all contained enum names and values
- Adds code parsing for enum type in deserialization
- Enables serialization/deserialization tests in most TestCases (with a few dangling issues to be addressed in later PRs to keep this one from growing too large)
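
A small sketch of the kind of program this round-trips (the enum is illustrative; basic Enum support in TorchScript comes from earlier PRs in this stack):
```python
import enum
import torch

class Color(enum.Enum):
    RED = 1
    GREEN = 2

@torch.jit.script
def is_red(c: Color) -> bool:
    return c == Color.RED

# Serialization relies on printing and re-parsing the enum type in TorchScript code.
print(is_red.code)
print(is_red(Color.RED))  # True
```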

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43460

Reviewed By: albanD

Differential Revision: D23284929

Pulled By: gmagogsfm

fbshipit-source-id: e3e81d6106f18b7337ac3ff5cd1eeaff854904f3
2020-08-24 12:04:31 -07:00
Ann Shan
7cc1efec13 Add lite SequentialSampler to torch mobile (#43299)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43299

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D23228415

Pulled By: ann-ss

fbshipit-source-id: eebe54353a128783f039c7dac0e2dd765a61940d
2020-08-24 09:45:24 -07:00
generatedunixname89002005287564@sandcastle105.ftw3.facebook.com
2f9c9796f1 [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D23290730

fbshipit-source-id: ee3ffbd6f9c0fade4586d8f4f8c8dd3d310d1f33
2020-08-24 05:36:38 -07:00
Hameer Abbasi
c4e841654d Add alias torch.negative to torch.neg. (#43400)
Summary:
xref https://github.com/pytorch/pytorch/issues/42515
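
A one-line sketch of the alias in use:
```python
import torch

x = torch.tensor([1.0, -2.0])
assert torch.equal(torch.negative(x), torch.neg(x))
```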

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43400

Reviewed By: albanD

Differential Revision: D23266011

Pulled By: mruberry

fbshipit-source-id: ca20b30d99206a255cf26438b09c3ca1f99445c6
2020-08-24 01:15:04 -07:00
Nikolay Korovaiko
a97ca93c0e remove prim::profile and special-casing (#43160)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43160

Reviewed By: ZolotukhinM

Differential Revision: D23284421

Pulled By: Krovatkin

fbshipit-source-id: 35e97aad299509a682ae7e95d7cef53301625309
2020-08-22 23:52:36 -07:00
Nikolay Korovaiko
b1d31428e7 Reduce number of prim::profile (#43147)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43147

Reviewed By: colesbury

Differential Revision: D23190137

Pulled By: Krovatkin

fbshipit-source-id: bf5f29a76e5ebfb5b9d3b6adee424e213c25891b
2020-08-22 16:06:30 -07:00