Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45144
Moves prim ops from C10 back to JIT.
These were originally moved to C10 from JIT in D19237648 (f362cd510d)
ghstack-source-id: 112775781
Test Plan:
buck test //caffe2/test/cpp/jit:jit
https://pxl.cl/1l22N
buck test adsatlas/gavel/lib/ata_processor/tests:ata_processor_test
https://pxl.cl/1lBxD
Reviewed By: iseeyuan
Differential Revision: D23697598
fbshipit-source-id: 36d1eb8c346e9b161ba6af537a218440a9bafd27
Summary:
In this PR:
1) Added binary operations with ScalarLists.
2) Fixed _foreach_div(...) bug in native_functions
3) Covered all possible cases with scalars and scalar lists in tests
4) [minor] fixed bug in native_functions by adding "use_c10_dispatcher: full" to all _foreach functions
Tested via unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44743
Reviewed By: bwasti, malfet
Differential Revision: D23753711
Pulled By: izdeby
fbshipit-source-id: bf3e8c54bc07867e8f6e82b5d3d35ff8e99b5a0a
Summary:
For integral types, isnan is meaningless. Provide specializations for
maximum and minimum which don't call it.
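As an illustrative sketch (in Python, with hypothetical names; the actual specializations are C++ templates), the float path must propagate NaN while the integral path can skip the check entirely:

```python
import math

def maximum_float(a, b):
    # Floating-point maximum must propagate NaN, so it checks isnan first.
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a > b else b

def maximum_int(a, b):
    # Specialization for integral types: isnan is meaningless, so the
    # check (and the call) is dropped entirely.
    return a if a > b else b
```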
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44984
Test Plan: python test/test_jit_fuser_te.py -k TestTEFuser.test_minmax_int_ops
Reviewed By: ezyang
Differential Revision: D23885259
Pulled By: asuhan
fbshipit-source-id: 2e6da2c43c0ed18f0b648a2383d510894c574437
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44550
Part of the `torch.fft` work (gh-42175).
This adds n-dimensional transforms: `fftn`, `ifftn`, `rfftn` and `irfftn`.
This is aiming for correctness first, with the implementation on top of the existing `_fft_with_size` restrictions. I plan to follow up later with a more efficient rewrite that makes `_fft_with_size` work with arbitrary numbers of dimensions.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D23846032
Pulled By: mruberry
fbshipit-source-id: e6950aa8be438ec5cb95fb10bd7b8bc9ffb7d824
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45149
The choose_qparams_optimized calculates the optimized qparams.
It uses a greedy approach that nudges the min and max and tries to
minimize the quant error, measured as the l2 norm `torch.norm(x - fake_quant(x, s, z))`.
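A minimal pure-Python sketch of the error being minimized (hypothetical helper names; the real implementation operates on tensors and greedily nudges min/max):

```python
def fake_quant(x, scale, zero_point, qmin=0, qmax=255):
    # Quantize-then-dequantize roundtrip; the difference from x is the
    # quantization error.
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in x]
    return [(qv - zero_point) * scale for qv in q]

def quant_error(x, scale, zero_point):
    # L2 norm of (x - fake_quant(x, s, z)), the quantity the greedy
    # search over nudged min/max tries to minimize.
    fq = fake_quant(x, scale, zero_point)
    return sum((a - b) ** 2 for a, b in zip(x, fq)) ** 0.5
```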
Test Plan: Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D23848060
fbshipit-source-id: c6c57c9bb07664c3f1c87dd7664543e09f634aee
Summary:
We need to check if dtypes differ in scalar type or lanes to decide between
Cast and Broadcast.
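A rough sketch of the decision (hypothetical Python; the real code operates on TE IR dtypes): a matching lane count with a different scalar type calls for a Cast, while a differing lane count calls for a Broadcast:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dtype:
    scalar_type: str  # e.g. "float", "int"
    lanes: int        # vector width

def coercion_node(src: Dtype, dst: Dtype) -> str:
    # Dtypes can differ in scalar type, in lanes, or both; the two
    # mismatches require different IR nodes.
    if src == dst:
        return "NoOp"
    if src.lanes != dst.lanes:
        return "Broadcast"  # replicate a value across vector lanes
    return "Cast"           # same lanes, different scalar type
```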
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45179
Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.SimplifyBroadcastTermExpander
Reviewed By: bwasti
Differential Revision: D23873316
Pulled By: asuhan
fbshipit-source-id: ca141be67e10c2b6c5f2ff9c11e42dcfc62ac620
Summary:
Arithmetic operations on Bool aren't fully supported in the evaluator. Moreover,
such semantics can be implemented by the client code through insertion of
explicit casts to widen and narrow to the desired types.
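The client-side workaround looks roughly like this (Python sketch of the cast pattern):

```python
def add_bools(a: bool, b: bool) -> bool:
    # The evaluator disallows arithmetic directly on Bool; client code
    # widens to int, does the arithmetic, then narrows back.
    widened = int(a) + int(b)  # explicit widening cast
    return bool(widened)       # explicit narrowing cast
```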
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44677
Test Plan:
test_tensorexpr --gtest_filter=TensorExprTest.ExprDisallowBoolArithmetic
python test/test_jit_fuser_te.py
Reviewed By: agolynski
Differential Revision: D23801412
Pulled By: asuhan
fbshipit-source-id: fff5284e3a216655dbf5a9a64d1cb1efda271a36
Summary:
When doing a splitWithMask we only mask if the loop extent is not cleanly divided by the split factor. However, the logic does not simplify the extent expression, so any nontrivial loop extent (e.g. one produced by a previous split) will always cause a mask to be added. Unlike splitWithTail, the masks added by splitWithMask are pure overhead, and we don't have the analysis to optimize them out when they are unnecessary, so it's best to avoid inserting them when we can.
The fix is just to simplify the loop extents before doing the extent calculation.
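A sketch of the change (hypothetical names; the real code simplifies TE IR expressions): fold the extent to a constant first, and only mask when the split factor does not divide it cleanly:

```python
def simplify(extent_expr):
    # Stand-in for the IR simplifier: fold a (lo, hi) range left over
    # from a previous split into a constant trip count when possible.
    if isinstance(extent_expr, tuple):
        lo, hi = extent_expr
        return hi - lo
    return extent_expr

def needs_mask(extent_expr, factor):
    # Simplify the loop extent before the divisibility check, so a
    # nontrivial-but-constant extent no longer forces a mask.
    extent = simplify(extent_expr)
    return not (isinstance(extent, int) and extent % factor == 0)
```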
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45141
Reviewed By: ezyang
Differential Revision: D23869170
Pulled By: nickgg
fbshipit-source-id: 44686fd7b802965ca4f5097b0172a41cf837a1f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44684
The ad-hoc quantization benchmarking script in D23689062 recently highlighted that quantized ops were surprisingly slow after the introduction of support for custom ops in torch.fx in D23203204 (f15e27265f).
Using strobelight, it's immediately clear that up to 66% of samples were seen in `c10::get_backtrace`, which descends from `torch::is_tensor_and_append_overloaded -> torch::check_has_torch_function -> torch::PyTorch_LookupSpecial -> PyObject_HasAttrString -> PyObject_GetAttrString`.
I'm no expert by any means so please correct any/all misinterpretation, but it appears that:
- `check_has_torch_function` only needs to return a bool
- `PyTorch_LookupSpecial` should return `NULL` if a matching method is not found on the object
- in the impl of `PyTorch_LookupSpecial` the return value from `PyObject_HasAttrString` only serves as a bool to return early, but ultimately ends up invoking `PyObject_GetAttrString`, which raises, spawning the generation of a backtrace
- `PyObject_FastGetAttrString` returns `NULL` (stolen ref to an empty py::object if the if/else if isn't hit) if the method is not found, anyway, so it could be used singularly instead of invoking both `GetAttrString` and `FastGetAttrString`
- D23203204 (f15e27265f) compounded (but maybe not directly caused) the problem by increasing the number of invocations
so, removing it in this diff and seeing how many things break :)
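A pure-Python analogy of the change (not the actual C API code): a lookup that returns a sentinel on failure, like `PyObject_FastGetAttrString` returning `NULL`, avoids the exception (and backtrace) that a raising lookup like `PyObject_GetAttrString` pays for:

```python
_MISSING = object()

def has_attr_fast(obj, name="__torch_function__"):
    # Returns a plain bool without ever raising AttributeError, so no
    # exception object or backtrace is materialized on the miss path.
    return getattr(obj, name, _MISSING) is not _MISSING
```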
before:
strobelight: see internal section
output from D23689062 script:
```
$ ./buck-out/gen/scripts/v/test_pt_quant_perf.par
Sequential(
(0): Quantize(scale=tensor([0.0241]), zero_point=tensor([60]), dtype=torch.quint8)
(1): QuantizedLinear(in_features=4, out_features=4, scale=0.017489388585090637, zero_point=68, qscheme=torch.per_tensor_affine)
(2): DeQuantize()
)
fp 0.010896682739257812
q 0.11908197402954102
```
after:
strobelight: see internal section
output from D23689062 script:
```
$ ./buck-out/gen/scripts/v/test_pt_quant_perf.par
Sequential(
(0): Quantize(scale=tensor([0.0247]), zero_point=tensor([46]), dtype=torch.quint8)
(1): QuantizedLinear(in_features=4, out_features=4, scale=0.012683945707976818, zero_point=41, qscheme=torch.per_tensor_affine)
(2): DeQuantize()
)
fp 0.011141300201416016
q 0.022639036178588867
```
which roughly restores original performance seen in P142370729
UPDATE: 9/22 mode/opt benchmarks
```
buck run //scripts/x:test_pt_quant_perf mode/opt
Sequential(
(0): Quantize(scale=tensor([0.0263]), zero_point=tensor([82]), dtype=torch.quint8)
(1): QuantizedLinear(in_features=4, out_features=4, scale=0.021224206313490868, zero_point=50, qscheme=torch.per_tensor_affine)
(2): DeQuantize()
)
fp 0.002968311309814453
q 0.5138928890228271
```
with patch:
```
buck run //scripts/x:test_pt_quant_perf mode/opt
Sequential(
(0): Quantize(scale=tensor([0.0323]), zero_point=tensor([70]), dtype=torch.quint8)
(1): QuantizedLinear(in_features=4, out_features=4, scale=0.017184294760227203, zero_point=61, qscheme=torch.per_tensor_affine)
(2): DeQuantize()
)
fp 0.0026655197143554688
q 0.0064449310302734375
```
Reviewed By: ezyang
Differential Revision: D23697334
fbshipit-source-id: f756d744688615e01c94bf5c48c425747458fb33
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43790
Interface calls were not handled properly when they are used in a fork
subgraph. This PR fixes the issue.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D23402039
Pulled By: bzinodev
fbshipit-source-id: 41adc5ee7d942250e732e243ab30e356d78d9bf7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45159
By default, pybind11 binds void* to be capsules. After a lot of
Googling, I have concluded that this is not actually useful:
you can't actually create a capsule from Python land, and our
data_ptr() function returns an int, which means that the
function is effectively unusable. It didn't help that we had no
tests exercising it.
I've replaced the void* with uintptr_t, so that we now accept int
(and you can pass data_ptr() in directly). I'm not sure if we
should make these functions accept ctypes types; unfortunately,
pybind11 doesn't seem to have any easy way to do this.
Fixes #43006
Also added cudaHostUnregister which was requested.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: lw
Differential Revision: D23849731
Pulled By: ezyang
fbshipit-source-id: 8a79986f3aa9546abbd2a6a5828329ae90fd298f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44983
`_all_gather` was converted from `_wait_all_workers` and inherited its
fixed 5-second timeout. As `_all_gather` is meant to support a broader
set of use cases, the timeout configuration should be more flexible.
This PR makes `rpc._all_gather` use the global default RPC timeout.
Test Plan: Imported from OSS
Reviewed By: pritamdamania87
Differential Revision: D23794383
Pulled By: mrshenli
fbshipit-source-id: 382f52c375f0f25c032c5abfc910f72baf4c5ad9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44960
Since we have templated selective build, it should be safe to move the operators to prim so that they can be selectively built on mobile.
Test Plan: CI
Reviewed By: linbinyu
Differential Revision: D23772025
fbshipit-source-id: 52cebae76e4df5a6b2b51f2cd82f06f75e2e45d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45065
To preserve backwards compatibility with applications that were passing in some ProcessGroupRpcBackendOptions but were not explicitly setting backend=BackendType.PROCESS_GROUP, we now infer the backend type from the options when only the options are passed. If neither is passed, we default to TensorPipe, as before this change.
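The inference rule sketched in Python (hypothetical class/constant names):

```python
class ProcessGroupRpcBackendOptions:
    pass

class TensorPipeRpcBackendOptions:
    pass

def infer_backend(backend=None, rpc_backend_options=None):
    # Backwards-compatibility rule: if only options are passed, infer
    # the backend from their type; if neither is passed, default to
    # TensorPipe, as before this change.
    if backend is not None:
        return backend
    if isinstance(rpc_backend_options, ProcessGroupRpcBackendOptions):
        return "PROCESS_GROUP"
    return "TENSORPIPE"
```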
ghstack-source-id: 112586258
Test Plan: Added new unit tests.
Reviewed By: pritamdamania87
Differential Revision: D23814289
fbshipit-source-id: f4be7919e0817a4f539a50ab12216dc3178cb752
Summary:
combineMultilane used the wrong order when ramp was on the left hand side,
which matters for subtract.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45157
Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.SimplifyRampSubBroadcast
Reviewed By: ailzhang
Differential Revision: D23851751
Pulled By: asuhan
fbshipit-source-id: 864d1611e88769fb43327ef226bb3310017bf858
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44655
Since `toHere()` does not execute operations over RPC and simply
transfers the value to the local node, we don't need to enable the profiler
remotely for this message; doing so only added unnecessary overhead.
Since `toHere` is a blocking call, we already profile the call on the local node using `RECORD_USER_SCOPE`, so this does not change the expected profiler results (validated by ensuring all remote profiling tests pass).
ghstack-source-id: 112605610
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D23641466
fbshipit-source-id: 109d9eb10bd7fe76122b2026aaf1c7893ad10588
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44653
Per an offline discussion with ilia-cher, this changes the profiler so that the `disableProfiler()` event consolidation logic can be called from different threads (i.e. threads where the profiler was not explicitly enabled). This is needed to support the functionality enabled by D23638387, where we defer profiling event collection until executing an async callback that can execute on a different thread, to support RPC async function profiling.
This is done by introducing 2 flags, `cleanupTLSState` and `consolidate`, which control whether we should clean up thread-local settings (we don't do this when calling `disableProfiler()` on non-main threads) and whether we should consolidate all profiled events. Backwards compatibility is ensured since both options are true by default.
Added a test in `test_misc.cpp` to test this.
ghstack-source-id: 112605620
Reviewed By: mrshenli
Differential Revision: D23638499
fbshipit-source-id: f5bbb0d41ef883c5e5870bc27e086b8b8908f46b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44646
Per a discussion with ilia-cher, this is not needed anymore and
removing it would make some future changes to support async RPC profiling
easier. Tested by ensuring profiling tests in `test_autograd.py` still pass.
ghstack-source-id: 112605618
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D23683998
fbshipit-source-id: 4e49a439509884fe04d922553890ae353e3331ab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45098
**Summary**
This commit adds support for default arguments in methods of class
types. Similar to how default arguments are supported for regular
script functions and methods on scripted modules, default values are
retrieved from the definition of a TorchScript class in Python as Python
objects, converted to IValues, and then attached to the schemas of
already compiled class methods.
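A rough Python sketch of the retrieval step (hypothetical class; the actual conversion produces IValues and attaches them to compiled schemas):

```python
import inspect

class MyScriptedClass:
    def method(self, x, scale=2, bias=0):
        return x * scale + bias

def extract_defaults(fn):
    # Read default values off the Python definition, as compilation
    # does before converting them to IValues for the method schema.
    sig = inspect.signature(fn)
    return {name: p.default for name, p in sig.parameters.items()
            if p.default is not inspect.Parameter.empty}
```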
**Test Plan**
This commit adds a set of new tests to TestClassType to test default
arguments.
**Fixes**
This commit fixes #42562.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D23844769
Pulled By: SplitInfinity
fbshipit-source-id: ceedff7703bf9ede8bd07b3abcb44a0f654936bd
Summary:
This flag simply allows users to get fusion groups that will *eventually* have shapes (such that `getOperation` is valid).
This is useful for doing early analysis and compiling just in time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44401
Reviewed By: ZolotukhinM
Differential Revision: D23656140
Pulled By: bwasti
fbshipit-source-id: 9a26c202752399d1932ad7d69f21c88081ffc1e5
Summary:
Previously, `prim::EnumValue` was serialized to `ops.prim.EnumValue`, which doesn't have the right implementation to refine the return type. This diff correctly serializes it to `enum.value`, thus fixing the issue.
Fixes https://github.com/pytorch/pytorch/issues/44892
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44891
Reviewed By: malfet
Differential Revision: D23818962
Pulled By: gmagogsfm
fbshipit-source-id: 6edfdf9c4b932176b08abc69284a916cab10081b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39111
In our present alias analysis, we consider any Value that enters another container as entering the heap, and thus aliasing all other heap values of the same type. There are a number of advantages to this approach:
- it is not too hard to maintain the aliasDb implementation
- it is much easier from an op schema perspective - there are many composite list ops registered internally and externally that would be tricky to register and get right if we did something more complicated
- It limits the size of the AliasDb, because a container of size 10 only contains a single memory dag element instead of 10 elements.
The downside is that we are unable to handle the simple and extremely common case of a list of tensors being used in an ATen op.
In an example like:
```
def foo(input):
x = torch.tensor([1, 2, 3, 4])
y = [x, x]
input.add_(1)
return torch.cat(y)
```
we will consider x to be written to. Any write to any wildcard element (an element that enters a tuple, or an element that is taken from a list) will mark x as written to. This can be limiting for our ability to create a functional subset and fuse graphs - as a result, 4 of the TorchVision classification models could not be functionalized.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D23828003
Pulled By: eellison
fbshipit-source-id: 9109fcb6f2ca20ca897cae71683530285da9d537
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39955
resolves https://github.com/pytorch/pytorch/issues/36323 by adding `torch.sgn` for complex tensors.
`torch.sgn` returns `x/abs(x)` for `x != 0` and returns `0 + 0j` for `x==0`
This PR doesn't test the correctness of the gradients. That will be done as part of auditing all the ops in the future, once we decide the autograd behavior (JAX vs TF) and add gradcheck.
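The `torch.sgn` definition in plain Python complex arithmetic (an illustrative sketch, not the tensor implementation):

```python
def sgn(z: complex) -> complex:
    # torch.sgn semantics for complex input: z / abs(z) for z != 0,
    # and 0 + 0j at z == 0.
    return z / abs(z) if z != 0 else 0j
```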
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D23460526
Pulled By: anjali411
fbshipit-source-id: 70fc4e14e4d66196e27cf188e0422a335fc42f92
Summary:
We currently fetch an allreduced tensor from Python in C++ and store the resulting tensor in a struct's parameter. This PR removes the extra tensor parameter from the function signature and fetches the tensor from a single place.
Fixes https://github.com/pytorch/pytorch/issues/43960
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44914
Reviewed By: rohan-varma
Differential Revision: D23798888
Pulled By: bugra
fbshipit-source-id: ad1b8c31c15e3758a57b17218bbb9dc1f61f1577
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45088
Fixes #45082
Found a few problems while working on #44983
1. We deliberately swallow RPC timeouts during shutdown, as we haven't
found a good way to handle them. When we converted `_wait_all_workers`
into `_all_gather`, the same logic was inherited. However, as
`_all_gather` is meant to be used in more general scenarios, we should
no longer keep silent about errors. This commit lets errors throw
in `_all_gather` and also lets `shutdown()` catch and log them.
2. After fixing (1), I found that `UnpickledPythonCall` needs to
acquire the GIL on destruction, and this can lead to deadlock when used
in conjunction with `ProcessGroup`, because the `ProcessGroup` ctor is a
synchronization point which holds the GIL. In `init_rpc`, followers
(`rank != 0`) can exit before the leader (`rank == 0`). If the two
happen together, then on a follower: it exits `init_rpc` after running
`_broadcast_to_followers` and before reaching the dtor of
`UnpickledPythonCall`. It then runs the ctor of `ProcessGroup`, which
holds the GIL and waits for the leader to join. However, the leader is
waiting for the response from `_broadcast_to_followers`, which is
blocked by the dtor of `UnpickledPythonCall`. Hence the deadlock. This
commit drops the GIL in the `ProcessGroup` ctor.
3. After fixing (2), I found that the `TensorPipe` backend
nondeterministically fails `test_local_shutdown`, due to a similar
reason as (2), but this time it is `shutdown()` on a follower running
before the leader finishes `init_rpc`. This commit adds a join for the
`TensorPipe` backend's `init_rpc` after `_all_gather`.
The 3rd fix should be able to solve the 2nd issue as well. But since
I didn't see a reason to hold the GIL during the `ProcessGroup` ctor, I
made that change too.
Test Plan: Imported from OSS
Reviewed By: pritamdamania87
Differential Revision: D23825592
Pulled By: mrshenli
fbshipit-source-id: 94920f2ad357746a6b8e4ffaa380dd56a7310976
Summary:
This would force jit.script to raise an error if someone tries to mutate a tuple:
```
Tuple[int, int] does not support subscripted assignment:
File "/home/nshulga/test/tupleassignment.py", line 9
torch.jit.script
def foo(x: Tuple[int, int]) -> int:
x[-1] = x[0] + 1
~~~~~ <--- HERE
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44929
Reviewed By: suo
Differential Revision: D23777668
Pulled By: malfet
fbshipit-source-id: 8efaa4167354ffb4930ccb3e702736a3209151b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45014
Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/219
Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/212
+ Introduce buffer.h defining the buffer struct(s). The `CpuBuffer`
struct is always defined, while the `CudaBuffer` struct is defined
only when `TENSORPIPE_SUPPORTS_CUDA` is true.
+ Update all channels to take a `CpuBuffer` or `CudaBuffer` for
`send`/`recv` rather than a raw pointer and a length.
+ Make the base `Channel`/`Context` classes templated on `TBuffer`,
effectively creating two channel hierarchies (one for CPU channels,
one for CUDA channels).
+ Update the Pipe and the generic channel tests to use the new API. So
far, generic channel tests are CPU only, and tests for the CUDA IPC
channel are (temporarily) disabled. A subsequent PR will take care of
refactoring tests so that generic tests work for CUDA channels. Another
PR will add support for CUDA tensors in the Pipe.
Differential Revision: D23598033
Test Plan: Imported from OSS
Reviewed By: lw
Pulled By: beauby
fbshipit-source-id: 1d6c3f91e288420858835cd5e7962e8da051b44b
Summary:
A previous fix for masking Cuda dimensions (https://github.com/pytorch/pytorch/issues/44733) changed the behaviour of inserting thread synchronization barriers in the Cuda CodeGen, causing CudaSharedMemReduce_1 to be flaky and ultimately disabled.
The issue is working out where these barriers must be inserted - solving this optimally is very hard, and I think not possible without dependency analysis we don't have, so I've changed our logic to be quite pessimistic. We'll insert barriers before and after any blocks that have thread dimensions masked (even between blocks that have no data dependencies). This should be correct, but it's an area we could improve performance. To address this somewhat I've added a simplifier pass that removes obviously unnecessary syncThreads.
To avoid this test being flaky again, I've added a check against the generated code to ensure there is a syncThread in the right place.
Also fixed a couple of non-functional clarity issues in the generated code: added the missing newline after Stores in the CudaPrinter, and prevented the PrioritizeLoad mutator from pulling out loads contained within simple Let statements (such as those produced by the Registerizer).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44909
Reviewed By: agolynski
Differential Revision: D23800565
Pulled By: nickgg
fbshipit-source-id: bddef1f40d8d461da965685f01d00b468d8a2c2f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43208
This PR adds gradcheck for complex. The logic used for complex gradcheck is described in Section 3.5.3 here: https://arxiv.org/pdf/1701.00392.pdf
More concretely, this PR introduces the following changes:
1. Updates get_numerical_jacobian to take as input a scalar value for vector (v). Adds gradcheck logic for C -> C, C-> R, R -> C. For R -> C functions, only the real value of gradient is propagated.
2. Adds backward definition for `torch.complex` and also adds a test to verify the definition added.
3. Updates backward for `mul`, `sin`, `cos`, `sinh`, `cosh`.
4. Adds tests for all `torch.real`, `torch.imag`, `torch.view_as_real`, `torch.view_as_complex`, `torch.conj`.
Follow up tasks:
1. Add more thorough tests for R -> C cases. Specifically, add R -> C test variants for functions, e.g., `torch.mul(complex_tensor, real_tensor)`.
2. Add back commented test in `common_methods_invocation.py`.
3. Add more special case checking for complex gradcheck to make debugging easier.
4. Update complex autograd note.
5. disable complex autograd for operators not tested for complex.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D23655088
Pulled By: anjali411
fbshipit-source-id: caa75e09864b5f6ead0f988f6368dce64cf15deb
Summary:
These aliases are consistent with NumPy. Note that C++'s naming would be different (std::multiplies and std::divides), and that PyTorch's existing names (mul and div) are consistent with Python's dunders.
This also improves the instructions for adding an alias to clarify that dispatch keys should be removed when copying native_functions.yaml entries to create the alias entries.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44463
Reviewed By: ngimel
Differential Revision: D23670782
Pulled By: mruberry
fbshipit-source-id: 9f1bdf8ff447abc624ff9e9be7ac600f98340ac4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44795
Today, we build our cpp tests twice, once as a standalone gtest binary,
and once linked in `libtorch_python` so we can call them from
`test_jit.py`.
This is convenient (it means that `test_jit.py` is a single entry point
for all our tests), but has a few drawbacks:
1. We can't actually use the gtest APIs, since we don't link gtest into
`libtorch_python`. We're stuck with the subset that we want to write
polyfills for, and an awkward registration scheme where you have to
write a test, then include it in `tests.h`.
2. More seriously, we register custom operators and classes in these
tests. In a world where we may be linking many `libtorch_python`s, this
has a tendency to cause errors with `libtorch`.
So now, only tests that explicitly require cooperation with Python are
built into `libtorch_python`. The rest are built into
`build/bin/test_jit`.
There are tests which require that we define custom classes and
operators. In these cases, I've built them into separate `.so`s that we
call `torch.ops.load_library()` on.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity, ZolotukhinM
Differential Revision: D23735520
Pulled By: suo
fbshipit-source-id: d146bf4e7eb908afa6f96b394e4d395d63ad72ff
Summary:
Adds a pass to the IR Simplifier which fuses together the bodies of Cond statements which have identical conditions. e.g.
```
if (i < 10) {
do_thing_1;
} else {
do_thing_2;
}
if (i < 10) {
do_thing_3;
}
```
is transformed into:
```
if (i < 10) {
do_thing_1;
do_thing_3;
} else {
do_thing_2;
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44886
Reviewed By: glaringlee
Differential Revision: D23768565
Pulled By: nickgg
fbshipit-source-id: 3fe40d91e82bdfff8dcb8c56a02a4fd579c070df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44345
As part of enhancing profiler support for RPC, when executing TorchScript functions over RPC, we would like to be able to support user-defined profiling scopes created by `with record_function(...)`.
Since after https://github.com/pytorch/pytorch/pull/34705, we support `with` statements in TorchScript, this PR adds support for `with torch.autograd.profiler.record_function` to be used within TorchScript.
This can be accomplished via the following without this PR:
```
torch.ops.profiler._record_function_enter(...)
# Script code, such as forward pass
torch.ops.profiler._record_function_exit(....)
```
This is a bit hacky and it would be much cleaner to use the context manager now that we support `with` statements. Also, `_record_function_` type operators are internal operators that are subject to change, this change will help avoid BC issues in the future.
Tested with `python test/test_jit.py TestWith.test_with_record_function -v`
ghstack-source-id: 112320645
Test Plan:
Repro instructions:
1) Change `def script_add_ones_return_any(x) -> Any` to `def script_add_ones_return_any(x) -> Tensor` in `jit/rpc_test.py`
2) `buck test mode/dev-nosan //caffe2/test/distributed/rpc:process_group_agent -- test_record_function_on_caller_rpc_async --print-passing-details`
3) The function which ideally should accept `Future[Any]` is `def _call_end_callbacks_on_future` in `autograd/profiler.py`.
python test/test_jit.py TestWith.test_with_foo -v
Reviewed By: pritamdamania87
Differential Revision: D23332074
fbshipit-source-id: 61b0078578e8b23bfad5eeec3b0b146b6b35a870
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44798
[test all]
Update for relanding: in ddp.join(), moved _rebuild_buckets from end of backward to beginning of forward as well.
Part of relanding PR #41954, this refactoring is to move rebuild_buckets call from end of first iteration to beginning of second iteration
ghstack-source-id: 112279261
Test Plan: unit tests
Reviewed By: rohan-varma
Differential Revision: D23735185
fbshipit-source-id: c26e0efeecb3511640120faa1122a2c856cd694e
Summary:
* Implement tuple sort by traversing contained IValue types and generating a lambda function as the comparator for sort.
* Tuples and class objects can now nest arbitrarily within each other and still be sortable.
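A Python sketch of the generated comparator (the real implementation builds a lambda over IValues in C++):

```python
from functools import cmp_to_key

def compare(a, b):
    # Recurse element-wise through nested tuples, falling back to
    # ordinary three-way comparison at the leaves.
    if isinstance(a, tuple) and isinstance(b, tuple):
        for x, y in zip(a, b):
            c = compare(x, y)
            if c != 0:
                return c
        return len(a) - len(b)
    return (a > b) - (a < b)

def sort_nested(lst):
    return sorted(lst, key=cmp_to_key(compare))
```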
Fixes https://github.com/pytorch/pytorch/issues/43219
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43448
Reviewed By: eellison
Differential Revision: D23352273
Pulled By: gmagogsfm
fbshipit-source-id: b6efa8d00e112178de8256da3deebdba7d06c0e1
Summary:
[test all]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44330
Part of relanding PR #41954, this refactor separates initialize_bucket_views and populate_bucket_views_out, as they do different things and are called from different callsites.
ghstack-source-id: 112257271
Test Plan: unit tests
Reviewed By: mrshenli
Differential Revision: D23583347
fbshipit-source-id: a5f2041b2c4f2c2b5faba1af834c7143eaade938
Summary:
Adds a new optimization to the IRSimplifier which changes this pattern:
```
for ...
if ...
do thing;
```
into:
```
if ...
for ...
do thing;
```
Which should be almost strictly better.
There are many cases where this isn't safe to do, hence tests. Most obviously when the condition depends on something modified within the loop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44764
Reviewed By: mruberry
Differential Revision: D23734463
Pulled By: nickgg
fbshipit-source-id: 51617e837de96b354fb702d0090ac65ddc523d36
Summary:
`PyObject_IsSubclass` may set the Python live-exception bit if the given object is not a class. `IsNamedTuple` is currently using it incorrectly, which may trip all subsequent Python operations in a debug build of Python. A normal release build is not affected because `assert` is a no-op there.
Fixes https://github.com/pytorch/pytorch/issues/43577
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44769
Reviewed By: jamesr66a
Differential Revision: D23725584
Pulled By: gmagogsfm
fbshipit-source-id: 2dabd4f8667a045d5bf75813500876c6fd81542b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43796
This diff adds an option for the process group NCCL backend to pick high priority cuda streams.
Test Plan: waitforsandcastle
Reviewed By: jiayisuse
Differential Revision: D23404286
fbshipit-source-id: b79ae097b7cd945a26e8ba1dd13ad3147ac790eb
Summary:
Unifies a number of partial solutions to the thread and block dimension extent masking, including the NoThreadIdxWriter and my last fix https://github.com/pytorch/pytorch/issues/44325. The NoThreadIdxWriter is gone in favour of tracking the current loop extents and masking any statements that have a lower rank than the launch parameters in any Block or Thread dimension, which handles both the "no" and "smaller" axis binding cases.
For example it will transform the following:
```
for i in 0..10 // blockIdx.x
for j in 0..10 // threadIdx.x
do thing(i, j);
for k in 0..5 // threadIdx.x
do other thing(i, k);
```
Into:
```
do thing(blockIdx.x, threadIdx.x);
if (threadIdx.x < 5) {
do other thing(blockIdx.x, threadIdx.x);
}
```
And handle the case where statements are not bound by any axis, eg.
```
do outer thing;
for i in 0..10 // blockIdx.x
for j in 0..10 // threadIdx.x
do thing(i, j);
do other thing(i);
```
will become:
```
if (blockIdx.x < 1) {
if (threadIdx.x < 1) {
do outer thing;
}
}
syncthreads();
do thing(blockIdx.x, threadIdx.x);
syncthreads();
if (threadIdx.x < 1) {
do other thing(blockIdx.x);
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44733
Reviewed By: mruberry
Differential Revision: D23736878
Pulled By: nickgg
fbshipit-source-id: 52d08626ae8043d53eb937843466874d479a6768
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44703
The description of this public function should be in the header file.
Also fix some typos.
Test Plan: N/A.
Reviewed By: pritamdamania87
Differential Revision: D23703661
fbshipit-source-id: 24ae63de9498e321b31dfb2efadb44183c6370df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44663
The new API returns the type of the data object referenced by this
`RRef`. On the owner, this is same as `type(rref.local_value())`.
On a user, this will trigger an RPC to fetch the `type` object from
the owner. After this function is run once, the `type` object is
cached by the `RRef`, and subsequent invocations no longer trigger
RPC.
Closes #33210
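The caching contract described above can be sketched in plain Python (a hypothetical class, not the real torch.distributed.rpc API):

```python
class RRef:
    """Minimal sketch of the type-caching behavior (illustrative, not the real API)."""
    def __init__(self, value=None, owner=True):
        self._value = value
        self._owner = owner
        self._cached_type = None          # filled in on the first _get_type() call
        self.rpc_calls = 0                # counts simulated RPCs to the owner

    def _fetch_type_via_rpc(self):
        self.rpc_calls += 1               # stand-in for the RPC a user-side RRef issues
        return type(self._value)

    def _get_type(self):
        if self._cached_type is None:
            if self._owner:
                # on the owner this is the same as type(rref.local_value())
                self._cached_type = type(self._value)
            else:
                self._cached_type = self._fetch_type_via_rpc()
        return self._cached_type          # later calls hit the cache: no RPC

user_rref = RRef(value=[1, 2, 3], owner=False)
assert user_rref._get_type() is list
assert user_rref._get_type() is list      # second call is served from the cache
assert user_rref.rpc_calls == 1           # only one RPC was ever issued
```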
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D23691990
Pulled By: mrshenli
fbshipit-source-id: a2d87cd601a691dd75164b6bcd7315245e9cf6bd
Summary:
There's an annoying O(N^2) in module export logic that makes saving some of the models (if they have many classes) take eternity.
I'm not super familiar with this code to properly untangle the deps and make it a pure hash lookup. So I just added a side lookup table for raw pointers. It's still quadratic, but it's O(num_classes^2) instead of O(num_classes * num_references) which already gives huge savings.
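The side-lookup idea can be sketched in Python, where `id()` plays the role of the raw pointer (names are illustrative, not the export code's):

```python
# Memoize class lookups by object identity so repeated references to the same
# class are an O(1) dict hit instead of a rescan of all known classes.
def make_class_index():
    by_id = {}    # id(cls) -> position in the deduplicated class list
    classes = []  # the deduplicated list itself

    def intern(cls):
        pos = by_id.get(id(cls))
        if pos is None:
            pos = len(classes)
            classes.append(cls)
            by_id[id(cls)] = pos
        return pos

    return intern, classes

intern, classes = make_class_index()
class A: pass
class B: pass
assert intern(A) == 0
assert intern(B) == 1
assert intern(A) == 0        # repeated reference: dict hit, no rescan
assert len(classes) == 2     # no duplicates stored
```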
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44589
Test Plan:
Tested with one of the offending models - just loading and saving a TorchScript file:
```
Before:
load 1.9239683151245117
save 165.74712467193604
After:
load 1.9409027099609375
save 1.4711427688598633
```
Reviewed By: suo
Differential Revision: D23675278
Pulled By: dzhulgakov
fbshipit-source-id: 8f3fa7730941085ea20d9255b49a149ac1bf64fe
Summary:
This is a reup of https://github.com/pytorch/pytorch/issues/43885 with an extra commit which should fix the bugs that caused it to be reverted. Read that PR for general context.
The issue here was that we were still using the side maps `tensor_to_stmt_` and `stmt_to_tensor_` which get invalidated by any transform of the IR (rather than just any transform that isn't computeInline). I added a comment about this but didn't actually address our usages of it.
I've removed these maps and changed the `getLoopBodyFor` and `getLoopStatementsFor` helpers to search the root stmt directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44231
Reviewed By: albanD
Differential Revision: D23689688
Pulled By: nickgg
fbshipit-source-id: 1c6009a880f8c0cebf2300fd06b5cc9322bffbf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44654
Previously we weren't creating a fallback graph as intended in specialize autograd zero, so if a Tensor failed one of our undefinedness checks we would run the backward normally without reprofiling & optimizing.
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D23691764
Pulled By: eellison
fbshipit-source-id: 10c6fa79518c84a6f5ef2bfbd9ea10843af751eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44326
Part of relanding PR #41954, this refactoring moves the rebuild_buckets call from the end of the first iteration to the beginning of the second iteration.
ghstack-source-id: 112011490
Test Plan: unit tests
Reviewed By: mrshenli
Differential Revision: D23583017
fbshipit-source-id: ef67f79437a820d9b5699b651803622418499a83
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42390
**Summary**
This commit extends support for properties to include
ScriptModules.
**Test Plan**
This commit adds a unit test that has a ScriptModule with
a user-defined property.
`python test/test_jit_py3.py TestScriptPy3.test_module_properties`
Test Plan: Imported from OSS
Reviewed By: eellison, mannatsingh
Differential Revision: D22880298
Pulled By: SplitInfinity
fbshipit-source-id: 74f6cb80f716084339e2151ca25092b6341a1560
Summary:
We were hitting an assert error when you passed in an empty `List[List[int]]` - this fixes that error by not recursing into 0-element tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44652
Reviewed By: ZolotukhinM
Differential Revision: D23688247
Pulled By: eellison
fbshipit-source-id: d48ea24893044fae96bc39f76c0f1f9726eaf4c7
Summary:
This PR:
- updates div to perform true division
- makes torch.true_divide an alias of torch.div
This follows on work in previous PyTorch releases that first deprecated div performing "integer" or "floor" division, then prevented it by throwing a runtime error.
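The before/after semantics can be shown with plain Python numbers, whose `/` and `//` operators mirror the distinction this PR settles on (a sketch in terms of Python scalars, not torch itself):

```python
# True division always produces an exact floating-point quotient; floor
# division truncates toward negative infinity. torch.div previously performed
# the latter on integer inputs and now performs the former.
assert 7 / 2 == 3.5      # true division: the new torch.div behavior
assert 7 // 2 == 3       # floor ("integer") division: the deprecated-then-removed behavior
assert -7 // 2 == -4     # floor division rounds toward negative infinity
```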
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42907
Reviewed By: ngimel
Differential Revision: D23622114
Pulled By: mruberry
fbshipit-source-id: 414c7e3c1a662a6c3c731ad99cc942507d843927
Summary:
* Support sequence type (de)serialization, enables onnx shape inference on sequence nodes.
* Fix shape inference with block input/output: e.g. Loop and If nodes.
* Fix bugs in symbolic discovered by coverage of onnx shape inference.
* Improve debuggability: added more jit logs. For simplicity, the default log level, when jit log is enabled, will not dump ir graphs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43929
Reviewed By: albanD
Differential Revision: D23674604
Pulled By: bzinodev
fbshipit-source-id: ab6aacb16d0e3b9a4708845bce27c6d65e567ba7
Summary:
When caller / callee pairs are inserted into the mapping, verify that
the arity of the buffer access is consistent with its declared rank.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44561
Test Plan: CI, test_tensorexpr --gtest_filter=TensorExprTest.DetectInlineRankMismatch
Reviewed By: albanD
Differential Revision: D23684342
Pulled By: asuhan
fbshipit-source-id: dd3a0cdd4c2492853fa68381468e0ec037136cab
Summary:
Improve simplification of nested Min and Max patterns.
Specifically, handles the following pattern simplications:
* `Max(A, Max(A, Const)) => Max(A, Const)`
* `Max(Min(A, B), Min(A, C)) => Min(A, Max(B, C))`
* `Max(Const, Max(A, OtherConst)) => Max(A, Max(Const, OtherConst))`
- This case can have an arbitrarily long chain of Max ops. For example: `Max(5, Max(x, Max(y, Max(z, 8)))) => Max(Max(Max(x, 8), y), z)`
Similarly, for the case of Min as well.
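A toy rewriter on a tuple-based IR illustrates the first rule above (a sketch, not the NNC simplifier; leaves are variable names or int constants):

```python
# Expressions are ("max", lhs, rhs); simplify Max(A, Max(A, Const)) => Max(A, Const).
def simplify_max(expr):
    if not (isinstance(expr, tuple) and expr[0] == "max"):
        return expr
    _, a, b = expr
    a, b = simplify_max(a), simplify_max(b)
    # Max(A, Max(A, Const)) => Max(A, Const)
    if isinstance(b, tuple) and b[0] == "max" and b[1] == a and isinstance(b[2], int):
        return ("max", a, b[2])
    return ("max", a, b)

assert simplify_max(("max", "A", ("max", "A", 5))) == ("max", "A", 5)
assert simplify_max(("max", "A", "B")) == ("max", "A", "B")  # no rule applies
```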
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44142
Reviewed By: albanD
Differential Revision: D23644486
Pulled By: navahgar
fbshipit-source-id: 42bd241e6c2af820566744c8494e5dee172107f4
Summary:
We run remove-profile-nodes and type specialization before batch_mm, so we cannot run peepholes on the type information of tensors: those properties have not been guarded and are therefore not guaranteed to be correct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44565
Reviewed By: albanD
Differential Revision: D23661538
Pulled By: eellison
fbshipit-source-id: 0dd23a65714f047f49b4db4ec582b21870925fe1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44442
I noticed lock contention on startup as lookupByLiteral() was
calling registerPendingOperators() - some calls were holding the
lock for 10+ ms, as operators were being registered.
canonicalSchemaString() was using ostringstream, which isn't typically
particularly fast (partly because of C++ spec locale requirements).
If we replace it with regular C++ string appends, it's somewhat faster
(which isn't hard when comparing with stringstream; albeit a bit
more codegen).
Over the first minute or so, this cuts out 1.4 seconds under the
OperatorRegistry lock (as part of registerPendingOperators) in the
first couple minutes of run time (mostly front-loaded) when running
sync sgd.
As an example, before:
registerPendingOperators 12688 usec for 2449 operators
After:
registerPendingOperators 6853 usec for 2449 operators
ghstack-source-id: 111862971
Test Plan: buck test mode/dev-nosan caffe2/test/cpp/...
Reviewed By: ailzhang
Differential Revision: D23614515
fbshipit-source-id: e712f9dac5bca0b1876e11fb8f0850402f03873a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44337
Add a new run_method to mobile Module which is variadic (takes any number of arguments) to match full jit.
ghstack-source-id: 111909068
Test Plan: Added new unit test to test_jit test suite
Reviewed By: linbinyu, ann-ss
Differential Revision: D23585763
fbshipit-source-id: 007cf852290f03615b78c35aa6f7a21287ccff9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44588
1) SOURCE_DUMP crashes when invoked on a backward graph since
`prim::GradOf` nodes can't be printed as sources (they don't have
schema).
2) Dumping the graph each time we execute an optimized plan produces lots of
output in tests where we run the graph multiple times (e.g.
benchmarks). Outputting that at the lowest verbosity level seems
like overkill.
3) Duplicated log statement is removed.
Differential Revision: D23666812
Test Plan: Imported from OSS
Reviewed By: bertmaher
Pulled By: ZolotukhinM
fbshipit-source-id: b9a30e34fd39c85f3e13c3f1e3594e157e1c130f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44340
Changed the constructor of GradBucket to pass the input by const
reference, avoiding unnecessary explicit move semantics. Since
the declaration and definition were previously separated, passing the input
tensor vector by value looked quite bizarre.
Test Plan: buck test caffe2/torch/lib/c10d:ProcessGroupGlooTest
Reviewed By: pritamdamania87
Differential Revision: D23569939
fbshipit-source-id: db761d42e76bf938089a0b38e98e76a05bcf4162
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44339
Moved the inline implementations of GradBucket class to the header for
succinctness and readability. This coding style is also consistent with
reducer.h under the same directory.
Test Plan: buck test caffe2/torch/lib/c10d:ProcessGroupGlooTest
Reviewed By: pritamdamania87
Differential Revision: D23569701
fbshipit-source-id: 237d9e2c5f63a6bcac829d0fcb4a5ba3bede75e5
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/36404
Adding prim::device and prim::dtype to the list of skipped peepholes when we run inlining. In the long term, a better fix may be to not encode shape/dtype info on the traced graph, because it is not guaranteed to be correct. This is currently blocked by ONNX.
Partial fix for https://github.com/pytorch/pytorch/issues/43134
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43363
Reviewed By: glaringlee
Differential Revision: D23383987
Pulled By: eellison
fbshipit-source-id: 2e9c5160d39d690046bd9904be979d58af8d3a20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44564
Before this change we sometimes inlined autodiff subgraph containing
fusion groups. This happened because we didn't look for 'unsupported'
nodes recursively (maybe we should), but fusion groups were inside
if-nodes.
The problem was detected by bertmaher in 'LearningToPaint' benchmark
investigation where this bug caused us to keep constantly hitting
fallback paths of the graph.
Test Plan: Imported from OSS
Reviewed By: bwasti
Differential Revision: D23657049
Pulled By: ZolotukhinM
fbshipit-source-id: 7c853424f6dce4b5c344d6cd9c467ee04a8f167e
Summary:
Fix an issue where loops of different sizes are bound to the same Cuda dimension / metavar.
More info and tests coming soon...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44325
Reviewed By: colesbury
Differential Revision: D23628859
Pulled By: nickgg
fbshipit-source-id: 3621850a4cc38a790b62ad168d32e7a0e2462fad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43043
This adds support for rpc_sync in TorchScript, in a way similar to
rpc_async.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D23252039
Pulled By: wanchaol
fbshipit-source-id: 8a05329cb8a24079b2863178b73087d47273914c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44486
SmoothL1Loss had a completely different (and incorrect, see #43228) path when target.requires_grad was True.
This PR does the following:
1) adds derivative support for target via the normal derivatives.yaml route
2) kills the different (and incorrect) path for when target.requires_grad was True
3) modifies the SmoothL1Loss CriterionTests to verify that the target derivative is checked.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D23630699
Pulled By: gchanan
fbshipit-source-id: 0f94d1a928002122d6b6875182867618e713a917
Summary:
Add new transforms `sliceHead` and `sliceTail` to `LoopNest`, for example:
Before transformation:
```
for x in 0..10:
A[x] = x*2
```
After `sliceHead(x, 4)`:
```
for x in 0..4:
A[x] = x*2
for x in 4..10:
A[x] = x*2
```
After `sliceTail(x, 1)`:
```
for x in 0..4:
A[x] = x*2
for x in 4..9:
A[x] = x*2
for x in 9..10:
A[x] = x*2
```
`sliceHead(x, 10)` and `sliceTail(x, 10)` are no-ops.
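The arithmetic of the two transforms can be sketched on plain index ranges (function names are illustrative, not the C++ API):

```python
# Each slice splits one half-open loop range [start, stop) into two.
def slice_head(start, stop, n):
    cut = min(start + n, stop)       # peel the first n iterations
    return [(start, cut), (cut, stop)]

def slice_tail(start, stop, n):
    cut = max(stop - n, start)       # peel the last n iterations
    return [(start, cut), (cut, stop)]

# sliceHead(x, 4) on "for x in 0..10" yields 0..4 and 4..10:
assert slice_head(0, 10, 4) == [(0, 4), (4, 10)]
# sliceTail(x, 1) on the resulting 4..10 loop yields 4..9 and 9..10:
assert slice_tail(4, 10, 1) == [(4, 9), (9, 10)]
# slicing by the full extent is a no-op (the second range is empty):
assert slice_head(0, 10, 10) == [(0, 10), (10, 10)]
```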
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43854
Test Plan: Tests are added in `test_loopnest.cpp`, the tests cover the basic transformations, and also tests the combination with other transformations such as `splitWithTail`.
Reviewed By: nickgg
Differential Revision: D23417366
Pulled By: cheng-chang
fbshipit-source-id: 06c6348285f2bafb4be3286d1642bfbe1ea499bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44235
Removes nonvariadic run_method() from mobile Module entirely (to be later replaced by a variadic version). All use cases should have been migrated to use get_method() and Method::operator() in D23436351
ghstack-source-id: 111848220
Test Plan: CI
Reviewed By: iseeyuan
Differential Revision: D23484577
fbshipit-source-id: 602fcde61e13047a34915b509da048b9550103b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44202
In preparation for changing mobile run_method() to be variadic, this diff:
* Implements get_method() for mobile Module, which is similar to find_method but expects the method to exist.
* Replaces calls to the current nonvariadic implementation of run_method() by calling get_method() and then invoking the operator() overload on Method objects.
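The shape of the migration can be sketched in plain Python (a hypothetical class mirroring the mobile Module API described above):

```python
# run_method(name, *args) becomes get_method(name)(*args); get_method is like
# find_method but raises when the method does not exist.
class Module:
    def __init__(self, methods):
        self._methods = methods

    def find_method(self, name):        # returns None when absent
        return self._methods.get(name)

    def get_method(self, name):         # like find_method, but the method must exist
        method = self.find_method(name)
        if method is None:
            raise RuntimeError(f"Method '{name}' is not defined.")
        return method

    def run_method(self, name, *args):  # variadic, matching full JIT
        return self.get_method(name)(*args)

m = Module({"add": lambda a, b: a + b})
assert m.run_method("add", 2, 3) == 5       # old call site style
assert m.get_method("add")(2, 3) == 5       # migrated call site style
```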
ghstack-source-id: 111848222
Test Plan: CI, and all the unit tests which currently contain run_method that are being changed.
Reviewed By: iseeyuan
Differential Revision: D23436351
fbshipit-source-id: 4655ed7182d8b6f111645d69798465879b67a577
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44500
Some user models are using those operators. Unblock them while keeping the ops selective.
Test Plan: CI
Reviewed By: linbinyu
Differential Revision: D23634769
fbshipit-source-id: 55841d1b07136b6a27b6a39342f321638dc508cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44437
MSELoss had a completely different (and incorrect, see https://github.com/pytorch/pytorch/issues/43228) path when target.requires_grad was True.
This PR does the following:
1) adds derivative support for target via the normal derivatives.yaml route
2) kills the different (and incorrect) path for when target.requires_grad was True
3) modifies the MSELoss CriterionTests to verify that the target derivative is checked.
TODO:
1) do we still need check_criterion_jacobian when we run grad/gradgrad checks?
2) ensure the Module tests check when target.requires_grad
3) do we actually test when reduction='none' and reduction='mean'?
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D23612166
Pulled By: gchanan
fbshipit-source-id: 4f74d38d8a81063c74e002e07fbb7837b2172a10
Summary:
Fixes a bug in the NNC registerizer for Cuda where it would hoist reads out of a conditional context when trying to cache them. As a quick fix, prevent scalar replacement if a usage is within a condition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44223
Reviewed By: gchanan
Differential Revision: D23551247
Pulled By: nickgg
fbshipit-source-id: 17a7bf2be4c8c3dd8a9ab7997dce9aea200c3685
Summary:
Previously we were not removing profiling nodes in graphs that required grad and contained diff graphs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44420
Reviewed By: bertmaher
Differential Revision: D23607482
Pulled By: eellison
fbshipit-source-id: af095f3ed8bb3c5d09610f38cc7d1481cbbd2613
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44493
This function allows executing a graph exactly as it is, without going
through a graph executor which would run passes on the graph before
interpreting it. I found this feature extremely helpful when I worked on
a stress-testing script to shake out bugs from the TE fuser: I needed to
execute a very specific set of passes on a graph and nothing else, and
then execute exactly that graph.
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D23632505
Pulled By: ZolotukhinM
fbshipit-source-id: ea81fc838933743e2057312d3156b77284d832ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44410
See #44052 for context. One of the cumprod_backward overloads was unused
so I just deleted it.
Test Plan: - `pytest test/test_autograd.py -v`
Reviewed By: mrshenli
Differential Revision: D23605503
Pulled By: zou3519
fbshipit-source-id: f9c5b595e62d2d6e71f26580ba96df15cc9de4f7
Summary:
To help with further typing, move dynamically added native contributions from `torch.autograd` to `torch._C._autograd`.
Fix an invalid error-handling pattern in
89ac30afb8/torch/csrc/autograd/init.cpp (L13-L15):
`PyImport_ImportModule` already raises a Python exception, so nullptr should be returned to properly propagate it to the Python runtime.
All native methods/types in `torch/autograd/__init__.py` are added after `torch._C._init_autograd()` has been called.
Use f-strings instead of `.format` in test_type_hints.py
Fixes https://github.com/pytorch/pytorch/issues/44450
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44451
Reviewed By: ezyang
Differential Revision: D23618261
Pulled By: malfet
fbshipit-source-id: fa5f739d7cff8410641128b55b810318c5f636ae
Summary:
Previously the specialized types were copied over to the fallback function, although the tensors in the fallback function were not of those types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44434
Reviewed By: SplitInfinity
Differential Revision: D23611943
Pulled By: eellison
fbshipit-source-id: 2ea88a97529409f6c5c4c1f59a14b623524933de
Summary:
This PR fixes unexpected `SystemError` when warnings are emitted and warning filters are set.
## Current behavior
```
$ python -Werror
>>> import torch
>>> torch.range(1, 3)
UserWarning: torch.range is deprecated in favor of torch.arange and will be removed in 0.5. Note that arange generates values in [start; end), not [start; end].
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
SystemError: <built-in method range of type object at 0x7f38c7703a60> returned a result with an error set
```
## Expected behavior
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UserWarning: torch.range is deprecated and will be removed in a future release because its behavior is inconsistent with Python's range builtin. Instead, use torch.arange, which produces values in [start, end).
```
## Note
Python exception must be raised if `PyErr_WarnEx` returns `-1` ([python docs](https://docs.python.org/3/c-api/exceptions.html#issuing-warnings)). This PR fixes warnings raised in the following code:
```py
import torch
torch.range(1, 3)
torch.autograd.Variable().volatile
torch.autograd.Variable().volatile = True
torch.tensor(torch.tensor([]))
torch.tensor([]).new_tensor(torch.tensor([]))
```
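The observable effect of the fix can be mimicked in pure Python: with an "error" warning filter installed, a warning must surface as a normal exception rather than a `SystemError` (a sketch; the real fix is on the C side, where a -1 return from `PyErr_WarnEx` must be propagated as a Python exception):

```python
import warnings

# Equivalent of running under `python -Werror`: the filter turns warnings into
# raised exceptions, which the caller must propagate cleanly.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        warnings.warn("torch.range is deprecated", UserWarning)
        caught = None
    except UserWarning as e:
        caught = str(e)

assert caught == "torch.range is deprecated"
```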
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44371
Reviewed By: mrshenli
Differential Revision: D23598410
Pulled By: albanD
fbshipit-source-id: 2fbcb13fe4025dbebaf1fd837d4c8e0944e05010
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44352
**Summary**
This commit adds support for `del` with class instances. If a class
implements `__delitem__`, then `del class_instance[key]` is syntactic
sugar for `class_instance.__delitem__(key)`.
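Eager Python already implements this desugaring; a minimal class shows the behavior TorchScript now matches:

```python
# del obj[key] dispatches to obj.__delitem__(key).
class Registry:
    def __init__(self):
        self.data = {"a": 1, "b": 2}

    def __delitem__(self, key):
        del self.data[key]

r = Registry()
del r["a"]                  # syntactic sugar for r.__delitem__("a")
assert "a" not in r.data
assert "b" in r.data
```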
**Test Plan**
This commit adds a unit test to TestClassTypes to test this feature.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D23603102
Pulled By: SplitInfinity
fbshipit-source-id: 28ad26ddc9a693a58a6c48a0e853a1c7cf5c9fd6
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/41946/, to suggest enumerating a module as an alternative if a user tries indexing into a modulelist/sequential with a non-integer literal
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43361
Reviewed By: mrshenli
Differential Revision: D23602388
Pulled By: eellison
fbshipit-source-id: 51fa28d5bc45720529b3d45e92d367ee6c9e3316
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44400
This diff does the same thing as D23549149 (398409f072). A fix is included for the OSS CI: pytorch_windows_vs2019_py36_cuda10.1_test1
ghstack-source-id: 111679745
Test Plan:
- CI
- OSS CI
Reviewed By: xcheng16
Differential Revision: D23601050
fbshipit-source-id: 8ebdcd8fdc5865078889b54b0baeb397a90ddc40
Summary:
This should prevent torch_python from linking the entire cudnn library statically just to query its version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44402
Reviewed By: seemethere
Differential Revision: D23602720
Pulled By: malfet
fbshipit-source-id: 185b15b789bd48b1df178120801d140ea54ba569
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44315
I find it more intuitive to dump the optimized graph if we have one;
when I first saw the unoptimized graph being dumped I thought we had failed to
apply any optimizations.
Test Plan: Observe output by hand
Reviewed By: Lilyjjo
Differential Revision: D23578813
Pulled By: bertmaher
fbshipit-source-id: e2161189fb0e1cd53aae980a153aea610871662a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44162
This diff exports Node::isBefore/isAfter method to PythonAPI.
Test Plan: Tested locally. Please let me know if there is a set of unit tests to be passed.
Reviewed By: soumith
Differential Revision: D23514448
fbshipit-source-id: 7ef709b036370217ffebef52fd93fbd68c464e89
Summary:
Fixes a bug where FP16 values could be incorrectly cast to a half type that doesn't have a cast operator, by inserting the CUDA-specific cast to float during handling of the Cast node rather than as a wrapper around printing Loads and Stores. Two main changes: the HalfChecker now inserts the casts to float explicitly in the IR, and the PrioritizeLoad mutator now consumes both Loads and a Cast which immediately precedes a Load.
Tested with test_jit_fuser_te.py and test_tensorexpr.py, plus C++ tests obv.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44209
Reviewed By: izdeby
Differential Revision: D23575577
Pulled By: nickgg
fbshipit-source-id: 808605aeb2af812758f96f9fdc11b07e08053b46
Summary:
When backward ops execute via the autograd engine's evaluate_function(), fn.release_variables() is called to release the SavedVariables. For eager-mode ops, this releases the saved inputs that were required by the backward grad function. However, with TorchScript we get a DifferentiableGraph, and DifferentiableGraphBackward() doesn't implement release_variables(). This causes the SavedVariables to stay alive longer. Implement release_variables() for DifferentiableGraphBackward to release these SavedVariables early.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42994
Reviewed By: izdeby
Differential Revision: D23503172
Pulled By: albanD
fbshipit-source-id: d87127498cfa72883ae6bb31d0e6c7056c4c36d4
Summary:
This PR adds the following aliases:
- not_equal for torch.ne
- greater for torch.gt
- greater_equal for torch.ge
- less for torch.lt
- less_equal for torch.le
These aliases are consistent with NumPy's naming for these functions.
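The naming correspondence can be sketched with Python's `operator` module, which uses the same abbreviated names as the existing torch functions (the `alias_to_op` mapping is illustrative, not a torch API):

```python
import operator

# New NumPy-style name -> the comparison it aliases, per the list above.
alias_to_op = {
    "not_equal": operator.ne,      # torch.ne
    "greater": operator.gt,        # torch.gt
    "greater_equal": operator.ge,  # torch.ge
    "less": operator.lt,           # torch.lt
    "less_equal": operator.le,     # torch.le
}

assert alias_to_op["greater"](3, 2) is True
assert alias_to_op["less_equal"](2, 2) is True
assert alias_to_op["not_equal"](1, 1) is False
```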
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43870
Reviewed By: zou3519
Differential Revision: D23498975
Pulled By: mruberry
fbshipit-source-id: 78560df98c9f7747e804a420c1e53fd1dd225002
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44048
Inline the fork-wait calls to make sure we can see the ops to be quantized in the main graph
Also fix the InlineForkWait JIT pass to account for the case where the aten::wait call isn't present in the main graph
and we return a future tensor from the subgraph.
Example
```
graph(%self.1 : __torch__.dper3.core.interop.___torch_mangle_6325.DperModuleWrapper,
%argument_1.1 : Tensor,
%argument_2.1 : Tensor):
%3 : Future[Tensor[]] = prim::fork_0(%self.1, %argument_1.1, %argument_2.1) # :0:0
return (%3)
with prim::fork_0 = graph(%self.1 : __torch__.dper3.core.interop.___torch_mangle_5396.DperModuleWrapper,
%argument_1.1 : Tensor,
%argument_2.1 : Tensor):
%3 : __torch__.dper3.core.interop.___torch_mangle_6330.DperModuleWrapper = prim::GetAttr[name="x"](%self.1)
%4 : __torch__.dper3.core.interop.___torch_mangle_5397.DperModuleWrapper = prim::GetAttr[name="y"](%self.1)
%5 : __torch__.dper3.core.interop.___torch_mangle_6327.DperModuleWrapper = prim::GetAttr[name="z"](%4)
%6 : Tensor = prim::CallMethod[name="forward"](%5, %argument_1.1, %argument_2.1) # :0:0
%7 : None = prim::CallMethod[name="forward"](%3, %6) # :0:0
%8 : Tensor[] = prim::ListConstruct(%6)
return (%8)
```
Test Plan:
python test/test_quantization.py test_interface_with_fork
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23481003
fbshipit-source-id: 2e756be73c248319da38e053f021888b40593032
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44008
embedding_bag requires only quantization of weights (no dynamic quantization of inputs),
so the type of quantization is essentially static (without calibration).
This will enable pyper to do fc and embedding_bag quantization using the same API call.
Test Plan:
python test/test_quantization.py test_embedding_bag
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23467019
fbshipit-source-id: 41a61a17ee34bcb737ba5b4e19fb7a576d4aeaf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43989
When we trace the model, it produces an aten::embedding_bag node in the graph.
Add the necessary passes in graph mode to help support quantizing it as well.
Test Plan:
python test/test_quantization.py TestQuantizeDynamicJitOps.test_embedding_bag
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23460485
fbshipit-source-id: 328c5e1816cfebb10ba951113f657665b6d17575
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44137
We only insert guards on Tensor types, so we rely on the output
of a node being uniquely determined by its input types.
Bail if any non-Tensor input affects the output type
and cannot be reasoned about statically.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23543602
Pulled By: eellison
fbshipit-source-id: abd6fe0b1fd7fe6fc251694d4cd442b19c032dd7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44165
Allows convolutions to be quantized if the `torch.backends.cudnn.benchmark`
flag was set.
Not for land yet, just testing.
Test Plan:
in the gist below, the resulting graph now has quantized convolutions
https://gist.github.com/vkuzo/622213cb12faa0996b6700b08d6ab2f0
Imported from OSS
Reviewed By: supriyar
Differential Revision: D23518775
fbshipit-source-id: 294f678c6afbd3feeb89b7a6655bc66ac9f8bfbc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44227
As title
ghstack-source-id: 111490242
Test Plan: CI
Reviewed By: xcheng16
Differential Revision: D23549149
fbshipit-source-id: fad742a8d4e6f844f83495514cd60ff2bf0d5bcb
Summary:
Duplicate of https://github.com/pytorch/pytorch/issues/41413
This PR initiates the process of updating the TorchScript backend interface used by the ONNX exporter.
Replace the JIT lower-graph pass with the freeze-module pass.
Enable ScriptModule tests for ONNX operator tests (ORT backend) and model tests by default.
Replace the JIT remove_inplace_ops pass with remove_mutation, and consolidate all passes for handling in-place ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43791
Reviewed By: houseroad
Differential Revision: D23421872
Pulled By: bzinodev
fbshipit-source-id: a98710c45ee905748ec58385e2a232de2486331b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44052
Summary
=======
This PR registers the following backwards functions as operators:
- slice_backward
- select_backward
- gather_backward
- index_select_backward (the backward function for index_select)
- select_index_backward (previously known as index_select_backward, but is actually the backward function for max.dim, min.dim, etc)
In the future, I'd like to register more backward functions as operators
so that we can write batching rules for the backward functions. Batching
rules for backward functions makes it so that we can compute batched
gradients.
Motivation
==========
The rationale behind this PR is that a lot of backwards functions (27 in total)
are incompatible with BatchedTensor due to using in-place operations.
Sometimes we can allow the in-place operations, but other times we can't.
For example, consider select_backward:
```
Tensor select_backward(const Tensor& grad, IntArrayRef input_sizes, int64_t dim, int64_t index) {
auto grad_input = at::zeros(input_sizes, grad.options());
grad_input.select(dim, index).copy_(grad);
return grad_input;
}
```
and consider the following code:
```
x = torch.randn(5, requires_grad=True)
def select_grad(v):
torch.autograd.grad(x[0], x, v)
vs = torch.randn(B0)
batched_grads = vmap(select_grad)(vs)
```
For the batched gradient use case, `grad` is a BatchedTensor.
The physical version of `grad` has size `(B0,)`.
However, select_backward creates a `grad_input` of shape `(5)`, and
tries to copy `grad` to a slice of it.
Other approaches
================
I've considered the following:
- register select_backward as an operator (this PR)
- have a branch inside select_backward for if `grad` is batched.
- this is OK, but what if we have more tensor extensions that want to override this?
- modify select_backward to work with BatchedTensor, by creating a new operator for the "select + copy_ behavior".
- select + copy_ isn't used elsewhere in derivative formulas so this doesn't seem useful
Test Plan
=========
- `pytest test/test_autograd.py -v`
- Registering backward functions may impact performance. I benchmarked
select_backward to see if registering it as an operator led to any noticable
performance overheads: https://gist.github.com/zou3519/56d6cb53775649047b0e66de6f0007dc.
The TL;DR is that the overhead is pretty minimal.
Test Plan: Imported from OSS
Reviewed By: ezyang, fbhuba
Differential Revision: D23481183
Pulled By: zou3519
fbshipit-source-id: 125af62eb95824626dc83d06bbc513262ee27350
Summary:
A rework of `computeInline` which makes it work a bit better, particularly when combined with other transformations. Previously we stored Functions that were inlined and then deferred the actual inlining of the function body until prepareForCodegen was called. This has an issue when transformations are applied to the LoopNest: the function body can be different from what appears in the root_stmt and result in inlining that a) fails, b) reverses other transformations or c) a weird unpredictable combination of the two.
This PR changes that behaviour so that the inlining occurs in the root stmt immediately, which means it reflects any previous transformations and any future transformations have a true view of the internal IR. It also has the benefit that inspecting the root statement gives an accurate view of it without needing to call prepareForCodegen. I also removed the difference between `computeInline` and `computeInlineWithRand` and we handle calls to `rand()` in all branches.
This is a rework of https://github.com/pytorch/pytorch/issues/38696, with the agreed changes from ZolotukhinM and zheng-xq: we should only inline if the dimensions are trivial (ie. they are vars not exprs).
This PR is mostly tests, and I fixed a bunch of bugs I found along the way. Partial list:
* When inlining an expression involving rand, we would create random vars equal to the dimensionality of the enclosing Tensor not the produced Tensor - meaning we'd use an incorrect value if the inlined tensor was smaller. E.g: `X[i] = rand(); A[i, j] = X[i]` would produce a tensor where `A[0, 0] != A[0, 1]`. This is fixed by inserting the Let binding of the random variable at the correct loop body.
* When inlining we'd replace all calls to `rand()` rather than just those present in the Tensor being inlined.
* `rand()` was treated symbolically by the simplifier and we would aggregate or cancel calls to `rand()`. Have fixed the hasher to hash all calls to `rand()` distinctly.
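The first rand() bug can be illustrated with a toy plain-Python sketch (the helper below is hypothetical, not the NNC implementation): the random value must be Let-bound once per index of the producer `X`, not once per element of the consumer `A`.

```python
import random

# Toy model of inlining X[i] = rand() into A[i, j] = X[i]: the Let binding
# of the random value belongs at the i loop (X's dimensionality), so every
# j shares the same value. Binding it per (i, j) was the bug.
def build_a(n, m):
    a = []
    for i in range(n):
        x_i = random.random()            # one random value per i, as in X[i]
        a.append([x_i for _ in range(m)])
    return a

a = build_a(2, 3)
assert a[0][0] == a[0][1]                # entries in a row agree: A[0,0] == A[0,1]
```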
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43885
Reviewed By: gmagogsfm
Differential Revision: D23503636
Pulled By: nickgg
fbshipit-source-id: cdbdc902b7a14d269911d978a74a1c11eab004fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44139
Also, make sure that we're checking that condition when we're starting a
new fusion group, not only when we merge a node into an existing fusion
group. Oh, and one more: add a test checking that we're rejecting graphs
with unspecified shapes.
Differential Revision: D23507510
Test Plan: Imported from OSS
Reviewed By: bertmaher
Pulled By: ZolotukhinM
fbshipit-source-id: 9c268825ac785671d7c90faf2aff2a3e5985ac5b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43298
IR emitter uses `ModuleValue` to represent ScriptModules and emit IR for
attribute access, submodule access, etc.
`ModuleValue` relies on two pieces of information, the JIT type of the
module, and the `ConcreteModuleType`, which encapsulates Python-only
information about the module.
ScriptModules loaded from a package used to create a dummy
ConcreteModuleType without any info in it. This led to divergences in
behavior during compilation.
This PR makes the two ways of constructing a ConcreteModuleType equivalent,
modulo any py-only information (which, by definition, is never present in
packaged files anyway).
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23228738
Pulled By: suo
fbshipit-source-id: f6a660f42272640ca1a1bb8c4ee7edfa2d1b07cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43284
The IR emitter looks for attributes on modules like:
1. Check the JIT type for the attribute
2. Check the originating Python class, in order to fulfill requests for, e.g. static methods or ignored methods.
In the case where you do:
```
inner_module = torch.jit.load("inner.pt")
wrapped = Wrapper(inner_module) # wrap the loaded ScriptModule in an nn.Module
torch.jit.script(wrapped)
```
The IR emitter may check for attributes on `inner_module`. There is no
originating Python class for `inner_module`, since it was directly
compiled from the serialized format.
Due to a bug in the code, we don't guard for this case, and a segfault
results if the wrapper asks for an undefined attribute. The lookup in
this case looks like:
1. Check the JIT type for the attribute (not there!)
2. Check the originating Python class (this is a nullptr! segfault!)
This PR guards this case and properly just raises an attribute missing
compiler error instead of segfaulting.
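The guarded lookup can be sketched in plain Python (names and data structures here are illustrative, not the IR emitter's actual API):

```python
# Step 1: check the JIT type's attributes. Step 2: fall back to the
# originating Python class. A ScriptModule loaded from disk has no such
# class (the nullptr case), so the guard raises instead of crashing.
def find_attribute(jit_type_attrs, py_class, name):
    if name in jit_type_attrs:
        return jit_type_attrs[name]
    if py_class is None:
        raise AttributeError(f"module has no attribute '{name}'")
    return getattr(py_class, name)
```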
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23224337
Pulled By: suo
fbshipit-source-id: 0cf3060c427f2253286f76f646765ec37b9c4c49
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44083
Match on the complete schema of a node instead of its node kind when deciding to fuse it. Previously we matched on node kind, which could fail with something like `aten::add(int, int)`: if a new overload was added to an op without corresponding NNC support, we would fuse it anyway.
Follow ups are:
- bail when an output tensor type isn't uniquely determined by the input types (e.g. aten::add, where the second input could be either a float or an int)
- remove NNC lowering for _tanh_backward & _sigmoid_backward
- Validate that we support all of the overloads here. I optimistically added ops that included Tensors; it's possible that we do not support every overload here. This isn't a regression, and this PR is at least improving our failures in that regard.
I can do any of these as part of this PR if desired, but there are a number of failures people have run into that this PR fixes so I think it would be good to land this sooner than later.
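The idea can be sketched as a toy lookup keyed on the full schema string rather than the op kind (the schemas below are illustrative, not the exact supported set):

```python
# Fusibility keyed on the complete schema, not just "aten::add": a newly
# added overload of a supported kind is rejected instead of mis-fused.
SUPPORTED_SCHEMAS = {
    "aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> Tensor",
}

def can_fuse(node_schema: str) -> bool:
    return node_schema in SUPPORTED_SCHEMAS
```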
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D23503704
Pulled By: eellison
fbshipit-source-id: 3ce971fb1bc3a7f1cbaa38f1ed853e2db3d67c18
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43965
As part of a larger effort to unify the API between the lite interpreter and full JIT:
- implement torch::jit::mobile::Method, a proxy for torch::jit::mobile::Function
- add support for overloaded operator() to mobile Method and Function
- mobile find_method now returns a c10::optional<Method> (so signature matches full jit)
- moves some implementation of Function from module.cpp to function.cpp
ghstack-source-id: 111161942
Test Plan: CI
Reviewed By: iseeyuan
Differential Revision: D23330762
fbshipit-source-id: bf0ba0d711d9566c92af31772057ecd35983ee6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44036
Running replaceAtenConvolution on older traced models won't work, as the
_convolution signature has changed and replaceAtenConvolution was
changed to account for that.
But we did not preserve the old behavior during that change. This change
restores the old behavior while keeping the new one.
Test Plan: Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23476775
fbshipit-source-id: 73a0c2b7387f2a8d82a8d26070d0059972126836
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44035
change
Also added a test to capture such cases in the future.
Test Plan:
python test/test_xnnpack_integration.py
Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D23476773
fbshipit-source-id: a62c4429351c909245106a70b4c60b1bacffa817
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44078
When PyTorch mobile inference fails and throws an exception, and the caller catches it without crashing the app, we are not able to track all the inference failures.
So we are adding native soft error reporting to capture all the failures occurring during module loading and running, including both crashing and non-crashing failures. Since c10::Error has good error messaging and stack handling (D21202891 (a058e938f9)), we are utilizing it for the error handling and message printout.
ghstack-source-id: 111307080
Test Plan:
Verified that the soft error reporting is sent through module.cpp when operator is missing, make sure a logview mid is generated with stack trace: https://www.internalfb.com/intern/logview/details/facebook_android_softerrors/5dd347d1398c1a9a73c804b20f7c2179/?selected-logview-tab=latest.
Error message with context is logged below:
```
soft_error.cpp [PyTorchMobileInference] : Error occured during model running entry point: Could not run 'aten::embedding' with arguments from the 'CPU' backend. 'aten::embedding' is only available for these backends: [BackendSelect, Named, Autograd, Autocast, Batched, VmapMode].
BackendSelect: fallthrough registered at xplat/caffe2/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at xplat/caffe2/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Autograd: fallthrough registered at xplat/caffe2/aten/src/ATen/core/VariableFallbackKernel.cpp:31 [backend fallback]
Autocast: fallthrough registered at xplat/caffe2/aten/src/ATen/autocast_mode.cpp:253 [backend fallback]
Batched: registered at xplat/caffe2/aten/src/ATen/BatchingRegistrations.cpp:317 [backend fallback]
VmapMode: fallthrough registered at xplat/caffe2/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
Exception raised from reportError at xplat/caffe2/aten/src/ATen/core/dispatch/OperatorEntry.cpp:261 (m
```
Reviewed By: iseeyuan
Differential Revision: D23428636
fbshipit-source-id: 82d5d9c054300dff18d144f264389402d0b55a8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43734
Following the additional GH comments on the original PR https://github.com/pytorch/pytorch/pull/43307.
ghstack-source-id: 111327130
Test Plan: Run `python test/distributed/test_c10d.py`
Reviewed By: smessmer
Differential Revision: D23380288
fbshipit-source-id: 4b8889341c57b3701f0efa4edbe1d7bbc2a82ced
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44073
We don't have proper support for it yet on the NNC and JIT IR->NNC lowering side.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D23487905
Pulled By: ZolotukhinM
fbshipit-source-id: da0da7478fc8ce7b455176c95d8fd610c94352c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43961
Currently we're removing prim::profile nodes and embed the type info
directly in the IR right before the fuser, because it is difficult to
fuse in the presence of prim::profile nodes. It turns out that BatchMM has
a similar problem: it doesn't work when there are prim::profile nodes in
the graph. These two passes run next to each other, so we could simply
remove prim::profile nodes slightly earlier: before the BatchMM pass.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D23453266
Pulled By: ZolotukhinM
fbshipit-source-id: 92cb50863962109b3c0e0112e56c1f2cb7467ff1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43972
It is useful when debugging a bug to disable NNC backend to see whether
the bug is there or in the fuser logic.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23455624
Pulled By: ZolotukhinM
fbshipit-source-id: f7c0452a29b860afc806e2d58acf35aa89afc060
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43248
We add support for __torch_function__ overrides for C++ custom ops. The logic is the same as for the other components, like torch.nn.Module.
Refactored some code a little to make it reusable.
Test Plan: buck test //caffe2/test:fx -- test_torch_custom_ops
Reviewed By: bradleyhd
Differential Revision: D23203204
fbshipit-source-id: c462a86e407e46c777171da32d7a40860acf061e
Summary:
Previously when merging a node without a subgraph, we would merge the node's outputs to the corresponding subgraph values, but when merging a node with a subgraph the node's outputs would be absent in the value mapping. This PR makes it so they are included.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43988
Reviewed By: ZolotukhinM
Differential Revision: D23462116
Pulled By: eellison
fbshipit-source-id: 232c081261e9ae040df0accca34b1b96a5a5af57
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43711
This makes them available in forward if needed.
No change to the file content, just a copy-paste.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D23454146
Pulled By: albanD
fbshipit-source-id: 6269a4aaf02ed53870fadf8b769ac960e49af195
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43853
Add QPL logging for mobile module's metadata
ghstack-source-id: 111113492
(Note: this ignores all push blocking failures!)
Test Plan:
- CI
- Load the model trained by `mobile_model_util.py`
- Local QPL logger standard output.
{F319012106}
Reviewed By: xcheng16
Differential Revision: D23417304
fbshipit-source-id: 7bc834f39e616be1eccfae698b3bccdf2f7146e5
Summary:
This PR moves `DispatchKey::Autograd` to an alias dispatch key mapping to `AutogradCPU, AutogradCUDA, AutogradXLA, AutogradOther, AutogradPrivate*` keys.
A few things are handled in this PR:
- Update alias dispatch key mapping and precompute dispatchTable logic
- Move `Autograd` key from `always_included` set to TensorImpl constructor.
- Update `dummyTensor` constructor to take `requires_grad` as optional argument so that it's closer to the real application in op_registration_test.
- Use `BackendSelect` key for both backend select before and after autograd layer. (1 liner in backend_select codegen)
A few planned followups ordered by priority:
- [cleanup] Update `test_dispatch.py` to include testing `Autograd`.
- [cleanup] Add Math alias key and move catchAll to Math. (to remove 2.2 in `computeDispatchTableEntryWithDebug`)
- [new feature] Add support for Math in native_functions.yaml
- [cleanup] Add iterator like functionality to DispatchKeySet
- [cleanup/large] Only add Autograd backend keys when tensor requires grad. (cc: ljk53 ?)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43070
Reviewed By: ezyang
Differential Revision: D23281535
Pulled By: ailzhang
fbshipit-source-id: 9ad00b17142e9b83304f63cf599f785500f28f71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43684
This PR attempts to address #42560 by capturing the appropriate
exception_ptr in the autograd engine and passing it over to the Future.
As part of this change, there is a significant change to the Future API: we
now only accept an exception_ptr as part of setError.
For the example in #42560, the exception trace would now look like:
```
> Traceback (most recent call last):
> File "test_autograd.py", line 6914, in test_preserve_backtrace
> Foo.apply(t).sum().backward()
> File "torch/tensor.py", line 214, in backward
> torch.autograd.backward(self, gradient, retain_graph, create_graph)
> File "torch/autograd/__init__.py", line 127, in backward
> allow_unreachable=True) # allow_unreachable flag
> File "torch/autograd/function.py", line 87, in apply
> return self._forward_cls.backward(self, *args)
> File "test_autograd.py", line 6910, in backward
> raise ValueError("something")
> ValueError: something
```
ghstack-source-id: 111109637
Test Plan: waitforbuildbot
Reviewed By: albanD
Differential Revision: D23365408
fbshipit-source-id: 1470c4776ec8053ea92a6ee1663460a3bae6edc5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43847
It seems to slow down two fastRNN benchmarks and does not speed up
others.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23416197
Pulled By: ZolotukhinM
fbshipit-source-id: 598144561979e84bcf6bccf9b0ca786f5af18383
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43900
The original code assumed that the versioning if was inserted at the
beginning of the graph while in fact it was inserted at the end. We're
now also not removing `profile_optional` nodes, relying on DCE to clean
them up later (the reason we're not removing them directly is that
deletion could invalidate the insertion point being used).
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D23432175
Pulled By: ZolotukhinM
fbshipit-source-id: 1bf55affaa3f17af1bf71bad3ef64edf71a3e3fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43846
We are looking for tensors that are expected to be undefined (according
to the profile info) and should be checking for them to satisfy the
following condition: "not(have any non-zero)", which is equivalent to
"tensor is all zeros". The issue was that we've been checking tensors
that were expected *not* to be undefined.
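In plain-Python terms (a toy check, not the actual guard code):

```python
def matches_expected_undefined(tensor):
    # "not (has any non-zero element)" is exactly "tensor is all zeros",
    # which is what an undefined gradient materializes as.
    return not any(v != 0 for v in tensor)
```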
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D23416198
Pulled By: ZolotukhinM
fbshipit-source-id: 71e22f552680f68f2af29f427b7355df9b1a4278
Summary:
- Add `torch._C` bindings from `torch/csrc/autograd/init.cpp`
- Renamed `torch._C.set_grad_enabled` to `torch._C._set_grad_enabled`
so it doesn't conflict with torch.set_grad_enabled anymore
This is a continuation of gh-38201. All I did was resolve merge conflicts and finish the annotation of `_DecoratorContextManager.__call__` that ezyang started in the first commit.
~Reverts commit b5cd3a80bb, which was only motivated by not having `typing_extensions` available.~ (JIT can't be made to understand `Literal[False]`, so keep as is).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43415
Reviewed By: ngimel
Differential Revision: D23301168
Pulled By: malfet
fbshipit-source-id: cb5290f2e556b4036592655b9fe54564cbb036f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42577
Closes https://github.com/pytorch/pytorch/issues/38174. Implements a join-based API to support training with the DDP module in the scenario where different processes have different no. of inputs. The implementation follows the description in https://github.com/pytorch/pytorch/issues/38174. Details are available in the RFC, but as a summary, we make the following changes:
#### Approach
1) Add a context manager `torch.nn.parallel.distributed.join`
2) In the forward pass, we schedule a "present" allreduce where non-joined process contribute 1 and joined processes contribute 0. This lets us keep track of joined processes and know when all procs are joined.
3) When a process depletes its input and exits the context manager, it enters "joining" mode and attempts to "shadow" the collective comm. calls made in the model's forward and backward pass. For example we schedule the same allreduces in the same order as the backward pass, but with zeros
4) We adjust the allreduce division logic to divide by the effective world size (no. of non-joined procs) rather than the absolute world size to maintain correctness.
5) At the end of training, the last joined process is selected to be the "authoritative" model copy
We also make some misc. changes such as adding a `rank` argument to `_distributed_broadcast_coalesced` and exposing some getters/setters on `Reducer` to support the above changes.
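The present-count and effective-world-size logic in steps 2-4 can be sketched as a toy single-process simulation (all names below are hypothetical; real DDP uses collective comms across processes):

```python
# Joined ranks shadow the allreduce with zero contributions, and the
# division uses the number of unjoined ranks (the effective world size)
# rather than the absolute world size.
def joined_allreduce(grads, has_input):
    present = sum(1 for alive in has_input if alive)   # the "present" allreduce
    total = sum(g if alive else 0.0                    # zeros from joined ranks
                for g, alive in zip(grads, has_input))
    return total / present                             # effective world size

# Ranks 0 and 1 still have input; rank 2 has joined (its stale grad is ignored).
avg = joined_allreduce([2.0, 4.0, 99.0], [True, True, False])
```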
#### How is it tested?
We have tests covering the following models/scenarios:
- [x] Simple linear model
- [x] Large convolutional model
- [x] Large model with module buffers that are broadcast in the forward pass (resnet). We verify this with a helper function `will_sync_module_buffers` and ensure this is true for ResNet (due to batchnorm)
- [x] Scenario where a rank calls join() without iterating at all, so without rebuilding buckets (which requires collective comm)
- [x] Model with unused params (with find unused parameters=True)
- [x] Scenarios where different processes iterate for a varying number of different iterations.
- [x] Test consistency in tie-breaking when multiple ranks are the last ones to join
- [x] Test that we divide by the effective world_size (no. of unjoined processes)
#### Performance implications
###### Trunk vs PR patched, 32 GPUs, batch size = 32
P50, forward + backward + optimizer batch latency & total QPS: 0.121 264/s vs 0.121 264/s
P50 backwards only batch latency & total QPS: 0.087 369/s vs 0.087 368/s
###### join(enable=True) vs without join, 32 GPUs, batch size = 32, even inputs
P50, forward + backward + optimizer batch latency & total QPS: 0.120 265/s vs 0.121 264/s
P50 backwards only batch latency & total QPS: 0.088 364/s vs 0.087 368/s
###### join(enable=False) vs without join, 32 GPUs, batch size = 32, even inputs
P50 forward + backward + optimizer batch latency & total QPS: 0.121 264/s vs 0.121 264/s
P50 backwards only batch latency & total QPS: 0.087 368/s vs 0.087 368/s
###### join(enable=True) with uneven inputs (offset = 2000), 32 GPUs, batch size = 32
P50 forward + backward + optimizer batch latency & total QPS: 0.183 174/s vs 0.121 264/s
P50 backwards only batch latency & total QPS: 0.150 213/s vs 0.087 368/s
###### join(enable=True) with uneven inputs ((offset = 2000)), 8 GPUs, batch size = 32
P50 forward + backward + optimizer batch latency & total QPS: 0.104 308/s vs 0.104 308/s
P50 backwards only batch latency & total QPS: 0.070 454/s vs 0.070 459/s
The two uneven-inputs benchmarks above were conducted with 32 GPUs, with 4 GPUs immediately depleting their inputs and entering "join" mode (i.e. not iterating at all) while the other 28 iterated as normal. It looks like there is a pretty significant perf hit for this case when there are uneven inputs and multi-node training. Strangely, with a single node (8 GPUs), this does not reproduce.
#### Limitations
1) This is only implemented for MPSD, not SPMD. Per a discussion with mrshenli we want to encourage the use of MPSD over SPMD for DDP.
2) This does not currently work with SyncBN or custom collective calls made in the model's forward pass. This is because the `join` class only shadows the `broadcast` for buffers in the forward pass, the gradient allreduces in the bwd pass, unused parameters reduction, and (optionally) the rebuild buckets broadcasting in the backwards pass. Supporting this will require additional design thought.
3) Has not been tested with the [DDP comm. hook](https://github.com/pytorch/pytorch/issues/39272) as this feature is still being finalized/in progress. We will add support for this in follow up PRs.
ghstack-source-id: 111033819
Reviewed By: mrshenli
Differential Revision: D22893859
fbshipit-source-id: dd02a7aac6c6cd968db882c62892ee1c48817fbe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43742
We can remove all prim::profiles, update the values to their specialized profiled types, and then later guard the input graphs based on the input types of the fusion group. After that we remove specialized tensor types from the graph. This gets rid of having to update the vmap and removes all of the profile nodes in fusing.
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D23385206
Pulled By: eellison
fbshipit-source-id: 2c84bd1d1c38df0d7585e523c30f7bd28f399d7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43636
We weren't running inlining in the forward graph of differentiable subgraphs, and we weren't getting rid of all profiles as part of optimization.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D23358804
Pulled By: eellison
fbshipit-source-id: 05ede5fa356a15ca385f899006cb5b35484ef620
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43635
Intern the symbol, no functional changes. Aliasing needs to be looked at, but that should be done in a separate PR; this PR is just changing the symbol.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23358806
Pulled By: eellison
fbshipit-source-id: f18bcd142a0daf514136f019ae607e4c3f45d9f8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43634
Because differentiable graphs detach the gradients of input Tensors, creating and inlining differentiable graphs changes the requires_grad property of tensors in the graph. In the legacy executor this was not a problem, as the Fuser would simply ignore the gradient property: it was an invariant that the LegacyExecutor only passed tensors with grad = False. This is not the case with the profiler, as the Fuser does its own guarding.
Updating the type also helps with other typechecks, e.g. the ones specializing the backward, and with debugging the graph.
Other possibilities considered were:
- Fuser/Specialize AutogradZero always guards against requires_grad=False regardless of the profiled type
- Re-profile forward execution of differentiable graph
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D23358803
Pulled By: eellison
fbshipit-source-id: b106998accd5d0f718527bc00177de9af5bad5fc
Summary:
Insert the registerizer into the Cuda Codegen pass list, to enable scalar replacement and close the gap in simple reduction performance.
First up the good stuff, benchmark before:
```
Column sum Caffe2 NNC Simple Better
(10, 100) 5.7917 9.7037 6.9386 6.0448
(100, 100) 5.9338 14.972 7.1139 6.3254
(100, 10000) 21.453 741.54 145.74 12.555
(1000, 1000) 8.0678 122.75 22.833 9.0778
Row sum Caffe2 NNC Simple Better
(10, 100) 5.4502 7.9661 6.1469 5.5587
(100, 100) 5.7613 13.897 21.49 5.5808
(100, 10000) 21.702 82.398 75.462 22.793
(1000, 1000) 22.527 129 176.51 22.517
```
After:
```
Column sum Caffe2 NNC Simple Better
(10, 100) 6.0458 9.4966 7.1094 6.056
(100, 100) 5.9299 9.1482 7.1693 6.593
(100, 10000) 21.739 121.97 162.63 14.376
(1000, 1000) 9.2374 29.01 26.883 10.127
Row sum Caffe2 NNC Simple Better
(10, 100) 5.9773 8.1792 7.2307 5.8941
(100, 100) 6.1456 9.3155 24.563 5.8163
(100, 10000) 25.384 30.212 88.531 27.185
(1000, 1000) 26.517 32.702 209.31 26.537
```
Speedup is about 3-8x depending on the size of the data (increasing with bigger inputs).
The gap between NNC and simple is closed or eliminated - remaining issue appears to be kernel launch overhead. Next up is getting us closer to the _Better_ kernel.
It required a lot of refactoring and bug fixes on the way:
* Refactored flattening of parallelized loops out of the CudaPrinter and into its own stage, so we can transform the graph in the stage between flattening and printing (where registerization occurs).
* Made AtomicAddFuser less pessimistic, it will now recognize that if an Add to a buffer is dependent on all used Block and Thread vars then it has no overlap and does not need to be atomic. This allows registerization to apply to these stores.
* Fixed PrioritizeLoad mutator so that it does not attempt to separate the Store and Load to the same buffer (i.e. reduction case).
* Moved CudaAnalysis earlier in the process, allowing later stages to use the analyzed bufs.
* Fixed a bug in the Registerizer where when adding a default initializer statement it would use the dtype of the underlying var (which is always kHandle) instead of the dtype of the Buf.
* Fixed a bug in the IRMutator where the logic for replacing Allocate statements was inverted, so they were replaced only if they did not change.
* Added simplification of simple Division patterns to the IRSimplifier.
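What the registerizer's scalar replacement does to a reduction can be sketched in Python (a toy illustration; in the Cuda codegen the local variable maps to a register):

```python
# Before: every iteration updates out[0] through the buffer.
def column_sum_naive(inp, out):
    for v in inp:
        out[0] += v                # repeated buffer load + store per iteration

# After registerization: accumulate in a scalar, write back once.
def column_sum_registerized(inp, out):
    acc = out[0]                   # hoisted load into a scalar
    for v in inp:
        acc += v
    out[0] = acc                   # single write-back
```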
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42878
Reviewed By: glaringlee
Differential Revision: D23382499
Pulled By: nickgg
fbshipit-source-id: 3640a98fd843723abad9f54e67070d48c96fe949
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43405.
This pull request adds a feature of printing all tracebacks if a `detect_anomaly` mode detects `nan` in nested backward operations.
The way I did it is by assigning a node as a parent to all nodes it produces during its backward calculation. Then if one of the children produces `nan`, it will print the traceback from the parent and grandparents (if any).
The parent is assigned in `parent_node_` member in `Node` class which is accessible in C++ by function `node->parent()` and in Python by `node.parent_function`.
A node has a parent iff:
1. it is created from a backward operation, and
2. created when anomaly mode and grad mode are both enabled.
An example of this feature:
```
import torch

def example():
    x = torch.tensor(1.0, requires_grad=True)
    y = torch.tensor(1e-8, requires_grad=True)  # small to induce nan in n-th backward
    a = x * y
    b = x * y
    z1 = a / b  # can produce nan in n-th backward as long as https://github.com/pytorch/pytorch/issues/43414 is unsolved
    z = z1 * z1
    gy, = torch.autograd.grad(z, (y,), create_graph=True)
    gy2, = torch.autograd.grad(gy, (y,), create_graph=True)
    gy3, = torch.autograd.grad(gy2, (y,), create_graph=True)
    gy4, = torch.autograd.grad(gy3, (y,), create_graph=True)
    return gy4

with torch.autograd.detect_anomaly():
    gy4 = example()
```
with output:
example.py:16: UserWarning: Anomaly Detection has been enabled. This mode will increase the runtime and should only be enabled for debugging.
with torch.autograd.detect_anomaly():
/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py:190: UserWarning: Error detected in DivBackward0. Traceback of forward call that caused the error:
File "example.py", line 17, in <module>
gy4 = example()
File "example.py", line 12, in example
gy3, = torch.autograd.grad(gy2, (y,), create_graph=True)
File "/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 190, in grad
return Variable._execution_engine.run_backward(
(Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:61.)
return Variable._execution_engine.run_backward(
/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py:190: UserWarning:
Traceback of forward call that induces the previous calculation:
File "example.py", line 17, in <module>
gy4 = example()
File "example.py", line 11, in example
gy2, = torch.autograd.grad(gy , (y,), create_graph=True)
File "/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 190, in grad
return Variable._execution_engine.run_backward(
(Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:65.)
return Variable._execution_engine.run_backward(
/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py:190: UserWarning:
Traceback of forward call that induces the previous calculation:
File "example.py", line 17, in <module>
gy4 = example()
File "example.py", line 8, in example
z1 = a / b # can produce nan in n-th backward as long as https://github.com/pytorch/pytorch/issues/43414 is unsolved
(Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:65.)
return Variable._execution_engine.run_backward(
Traceback (most recent call last):
File "example.py", line 17, in <module>
gy4 = example()
File "example.py", line 13, in example
gy4, = torch.autograd.grad(gy3, (y,), create_graph=True)
File "/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 190, in grad
return Variable._execution_engine.run_backward(
RuntimeError: Function 'DivBackward0' returned nan values in its 1th output.
cc & thanks to albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43626
Reviewed By: malfet
Differential Revision: D23397499
Pulled By: albanD
fbshipit-source-id: aa7435ec2a7f0d23a7a02ab7db751c198faf3b7d
Summary:
It is often the case that converting a torch operator to an ONNX operator requires the input rank/dtype/shape to be known. Previously, the conversion depended on the tracer to provide this info, leaving a gap in the conversion of scripted modules.
We are extending the export with support for ONNX shape inference. If enabled, ONNX shape inference will be called whenever an ONNX node is created. This is the first PR introducing the initial look of the feature. More and more cases will be supported following this PR.
* Added pass to run onnx shape inference on a given node. The node has to have namespace `onnx`.
* Moved helper functions from `export.cpp` to a common place for re-use.
* This feature is currently experimental, and can be turned on through flag `onnx_shape_inference` in internal api `torch.onnx._export`.
* Currently skipping ONNX Sequence ops, If/Loop and ConstantOfShape due to limitations. Support will be added in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40628
Reviewed By: mrshenli
Differential Revision: D22709746
Pulled By: bzinodev
fbshipit-source-id: b52aeeae00667e66e0b0c1144022f7af9a8b2948
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43676
This is one part of https://github.com/pytorch/pytorch/issues/41574 to
ensure we consolidate everything around ivalue::Future.
I've removed the use of torch/csrc/utils/future.h from the autograd engines and
used ivalue::Future instead.
ghstack-source-id: 110895545
Test Plan: waitforbuildbot.
Reviewed By: albanD
Differential Revision: D23362415
fbshipit-source-id: aa109b3f8acf0814d59fc5264a85a8c27ef4bdb6
Summary:
This PR adds API to package unoptimized/fallback blocks as function calls. It's mainly meant to be used by TensorExpressionsFuser and SpecializeAutogradZero passes as both specialize the original graph but would also like to provide a fallback path in case the assumptions under which the graph was specialized do not hold for some inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43274
Reviewed By: malfet
Differential Revision: D23406961
Pulled By: Krovatkin
fbshipit-source-id: ef21fc9ad886953461b09418d02c75c58375490c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43647
Nothing fancy, just a basic implementation of the graph executor without using a stack machine.
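The contrast with a stack-machine interpreter can be sketched in toy Python (illustrative only, not the static runtime's actual code):

```python
import operator

# Stack machine: every op pops its operands and pushes its result.
def run_stack(ops, inputs):
    stack = list(inputs)
    for op in ops:
        b, a = stack.pop(), stack.pop()
        stack.append(op(a, b))
    return stack.pop()

# Direct execution: each node reads and writes fixed value slots,
# with no per-op stack traffic.
def run_direct(nodes, values):
    for op, a, b, out in nodes:
        values[out] = op(values[a], values[b])
    return values[-1]

# Compute 2 * (3 + 4) both ways.
stack_result = run_stack([operator.add, operator.mul], [2, 3, 4])
direct_result = run_direct([(operator.add, 1, 2, 3), (operator.mul, 0, 3, 4)],
                           [2, 3, 4, 0, 0])
```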
Reviewed By: bwasti
Differential Revision: D23208413
fbshipit-source-id: e483bb6ad7ba8591bbe1767e669654d82f42c356
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43481
Apply OperatorGenerator for prim and special operator registration. It does not affect the existing build by default. However, if a whitelist of operator exists, only the operators in the whitelist will be registered. It has the potential to save up to 200 KB binary size, depending on the usage.
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D23287251
Pulled By: iseeyuan
fbshipit-source-id: 3ca39fbba645bad8d69e69195f3680e4f6d633c5
Summary:
This allows storing binary files via the `ScriptModule.save(..., _extra_files=...)` functionality. With Python 3 we can just use `bytes` and not worry about encoding.
I had to copy-paste from the pybind sources; maybe we should upstream it, but that would mean adding a bunch of template arguments to `bind_map`, which is a bit untidy.
Let me know if there's a better place to park this function (this seems to be the only invocation of `bind_map`, so I put it in the same file).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43241
Reviewed By: zdevito
Differential Revision: D23205244
Pulled By: dzhulgakov
fbshipit-source-id: 8f291eb4294945fe1c581c620d48ba2e81b3dd9c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43447
Two main better-engineering motivations to run all FutureNCCL callbacks on a dedicated stream:
1. Each time a then callback was called, we would get a stream from the pool and run the callback on that stream. If we observe the stream traces using that approach, we would see a lot of streams and debugging would become more complicated. If we have a dedicated stream to run all then callback operations, the trace results will be much cleaner and easier to follow.
2. getStreamFromPool may eventually return the default stream or a stream that is used for other operations. This can cause slowdowns.
Unless the then callback takes longer than the preceding allreduce, this approach will be as performant as the previous one.
ghstack-source-id: 110909401
Test Plan:
Perf trace runs to validate the desired behavior:
See the dedicated stream 152 is running the then callback operations:
{F299759342}
I run pytorch.benchmark.main.workflow using resnet50 and 32 GPUs registering allreduce with then hook.
See f213777896 [traces](https://www.internalfb.com/intern/perfdoctor/results?run_id=26197585)
After updates, same observation: see f214890101
Reviewed By: malfet
Differential Revision: D23277575
fbshipit-source-id: 67a89900ed7b70f3daa92505f75049c547d6b4d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43086
This PR changes the format of `ConvPackedParam` in a nearly backwards-compatible way:
* a new format is introduced which has more flexibility and a lower on-disk size
* custom pickle functions are added to `ConvPackedParams` which know how to load the old format
* the custom pickle functions are **not** BC because the output type of `__getstate__` has changed. We expect this to be acceptable as no user flows are actually broken (loading a v1 model with v2 code works), which is why we whitelist the failure.
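The versioned-state pattern described above can be sketched in plain Python (the class and field names here are illustrative stand-ins, not the real `ConvPackedParams` API): `__getstate__` always writes the new tagged format, while `__setstate__` recognizes both the old and the new layout.

```python
import pickle

class PackedParams:
    """Sketch of versioned serialization: v2 writes a (version, dict) pair,
    while __setstate__ can still load the old bare-tuple v1 format."""

    def __init__(self, weight=None, stride=(1, 1)):
        self.weight = weight
        self.stride = stride

    def __getstate__(self):
        # v2: a version tag plus a dict -- more flexible, and new fields
        # can be added later without breaking readers that check the tag
        return (2, {"weight": self.weight, "stride": self.stride})

    def __setstate__(self, state):
        if (isinstance(state, tuple) and len(state) == 2
                and state[0] == 2 and isinstance(state[1], dict)):
            d = state[1]
            self.weight, self.stride = d["weight"], d["stride"]
        else:
            # v1: a bare (weight, stride) tuple written by older code
            self.weight, self.stride = state
```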
Test plan (TODO finalize):
```
// adhoc testing of saving v1 and loading in v2: https://gist.github.com/vkuzo/f3616c5de1b3109cb2a1f504feed69be
// test that loading models with v1 conv params format works and leads to the same numerics
python test/test_quantization.py TestSerialization.test_conv2d_graph
python test/test_quantization.py TestSerialization.test_conv2d_nobias_graph
// test that saving and loading models with v2 conv params format works and leads to same numerics
python test/test_quantization.py TestSerialization.test_conv2d_graph_v2
python test/test_quantization.py TestSerialization.test_conv2d_nobias_graph_v2
// TODO before land:
// test numerics for a real model
// test legacy ONNX path
```
Note: this is a newer copy of https://github.com/pytorch/pytorch/pull/40003
Test Plan: Imported from OSS
Reviewed By: dreiss
Differential Revision: D23347832
Pulled By: vkuzo
fbshipit-source-id: 06bbe4666421ebad25dc54004c3b49a481d3cc92
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42531
[First PR: Add private API to support tensor lists: _foreach_add(TensorList tensors, Scalar scalar)](https://github.com/pytorch/pytorch/pull/41554).
**Motivation**
[GitHub issue](https://github.com/pytorch/pytorch/issues/38655)
Current PyTorch optimizer implementations are not efficient in cases when we work with a lot of small feature tensors. Starting a lot of kernels slows down the whole process. We need to reduce the number of kernels that we start.
As an example, we should be looking at [NVIDIAs Apex](https://github.com/NVIDIA/apex).
In order to track progress, we will pick PyTorch's DCGAN model with the Adam optimizer and, once the optimizer is reimplemented with tensor lists, benchmark the model's performance against the original model, Apex's version with the original Adam optimizer, and its FusedAdam optimizer.
**Current API restrictions**
- List can't be empty (will be fixed in upcoming PRs).
- All tensors in the list must have the same dtype, device and size.
**Broadcasting**
At this point we don't support broadcasting.
**What is 'Fast' and 'Slow' route**
In particular cases, we can't process an op with a fast list CUDA kernel. Still, we can fall back to a regular for-loop where the op is applied to each tensor individually through the dispatch mechanism. A few checks decide whether the op will be performed via the 'fast' or 'slow' path.
To go the fast route,
- All tensors must have strided layout
- All tensors must be dense and not have overlapping memory
- The resulting tensor type must be the same.
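The checks above can be sketched as a single predicate (a minimal Python sketch; the dicts stand in for real tensor metadata, and the real check lives in the CUDA foreach kernels):

```python
def can_use_fast_route(tensors):
    # 'tensors' are dicts standing in for tensor metadata: all must be
    # strided, dense (non-overlapping), and share dtype and device.
    first = tensors[0]
    return all(
        t["layout"] == "strided"
        and not t["overlapping"]
        and t["dtype"] == first["dtype"]
        and t["device"] == first["device"]
        for t in tensors
    )
```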
---------------
**In this PR**
- Adding a `std::vector<Tensor> _foreach_add_(TensorList tensors, Scalar scalar)` API
- Resolving some additional comments from previous [PR](https://github.com/pytorch/pytorch/pull/41554).
**Tests**
Tested via unit tests
**TODO**
1. Properly handle empty lists
**Plan for the next PRs**
1. APIs
- Binary Ops for list with Scalar
- Binary Ops for list with list
- Unary Ops for list
- Pointwise Ops
2. Complete tasks from TODO
3. Rewrite PyTorch optimizers to use for-each operators for performance gains.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D23331892
Pulled By: izdeby
fbshipit-source-id: c585b72e1e87f6f273f904f75445618915665c4c
Summary:
Adds two more "missing" NumPy aliases: arctanh and arcsinh, and simplifies the dispatch of other arc* aliases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43762
Reviewed By: ngimel
Differential Revision: D23396370
Pulled By: mruberry
fbshipit-source-id: 43eb0c62536615fed221d460c1dec289526fb23c
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43732.
Requires importing the fft namespace in the C++ API, just like the Python API does, to avoid clobbering torch::fft the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43749
Reviewed By: glaringlee
Differential Revision: D23391544
Pulled By: mruberry
fbshipit-source-id: d477d0b6d9a689d5c154ad6c31213a7d96fdf271
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43233
XNNPack is already being used for the convolution2d operation. Add the
ability for it to be used with transpose convolution.
Test Plan: buck run caffe2/test:xnnpack_integration
Reviewed By: kimishpatel
Differential Revision: D23184249
fbshipit-source-id: 3fa728ce1eaca154d24e60f800d5e946d768c8b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41371
**Summary**
This commit enables the use of `torch.no_grad()` in a with item of a
with statement within JIT. Note that the use of this context manager as
a decorator is not supported.
**Test Plan**
This commit adds a test case to the existing with statements tests for
`torch.no_grad()`.
**Fixes**
This commit fixes #40259.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D22649519
Pulled By: SplitInfinity
fbshipit-source-id: 7fa675d04835377666dfd0ca4e6bc393dc541ab9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564
Static dispatch was originally introduced for mobile selective build.
Since we have added selective build support for dynamic dispatch and
tested it in FB production for months, we can deprecate static dispatch
to reduce the complexity of the codebase.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D23324452
Pulled By: ljk53
fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43633
In the backward graph, _grad_sum_to_size is inserted whenever a possibly broadcasting op is called:
`"aten::_grad_sum_to_size(Tensor(a) self, int[]? size) -> Tensor(a)"`
If a broadcast occurred, a sum is called, otherwise the second input is None and it is a no-op. Most of the time, it's a no-op (in the fast RNNs benchmark > 90% of the time).
We can get rid of this op by profiling the optionality of the second input. I added `prim::profile_optional` to do this, which counts the number of times it saw a None value and the number of times it saw a value present. When specializing the backward graph, we insert checks for values we profiled as None, and in the optimized block can remove the grad_sum_to_size calls that use those values.
In the future we may revisit this when NNC supports reductions and we want to replace grad_sum_to_size with sums as well, but I think this is worth landing now.
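The profiling-node bookkeeping can be sketched like this (a toy Python stand-in for `prim::profile_optional`, not the real IR node):

```python
class ProfileOptional:
    # Sketch of what prim::profile_optional records at runtime: how often
    # the optional input was None vs. present.
    def __init__(self):
        self.num_none = 0
        self.num_present = 0

    def observe(self, value):
        if value is None:
            self.num_none += 1
        else:
            self.num_present += 1
        return value  # profiling nodes pass the value through unchanged

    def always_none(self):
        # if the value was None on every profiled run, the specialized graph
        # may guard on None-ness and drop the _grad_sum_to_size call
        return self.num_present == 0
```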
Test Plan: Imported from OSS
Reviewed By: bwasti, ZolotukhinM
Differential Revision: D23358809
Pulled By: eellison
fbshipit-source-id: a30a148ca581370789d57ba082d23cbf7ef2cd4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43632
Specialize the backward graph by guarding on the undefinedness of the input tensors. The graph will look like:
```
ty1, ty2, successful_checks = prim::TypeCheck(...)
if (successful_checks)
-> optimized graph
else:
-> fallback graph
```
Specializing on the undefinedness of tensors allows us to clean up the
```
if any_defined(inputs):
outputs = <original_computation>
else:
outputs = autograd zero tensors
```
blocks that make up the backward graph, so that we can fuse the original_computation nodes together.
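The guard structure can be sketched in Python (hypothetical names; the real check is the `prim::TypeCheck` node operating on tensor undefinedness):

```python
def with_guard(check, optimized, fallback):
    # Run the optimized graph only when the runtime check on the inputs
    # passes; otherwise take the unspecialized fallback graph.
    def run(*inputs):
        if check(*inputs):
            return optimized(*inputs)
        return fallback(*inputs)
    return run
```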
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D23358808
Pulled By: eellison
fbshipit-source-id: f5bb28f78a4a3082ecc688a8fe0345a8a098c091
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43630
No functional changes here - just refactoring specialize autograd zero to a class, and standardizing its API to take in a shared_ptr<Graph>
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23358805
Pulled By: eellison
fbshipit-source-id: 42e19ef2e14df66b44592252497a47d03cb07a7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43629
We have a few places where we count the size of a block / subgraph - it's nice to have a shared API to ignore operators that are not executed in the optimized graph (will be used when I add a new profiling node in PR ^^)
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23358807
Pulled By: eellison
fbshipit-source-id: 62c745d9025de94bdafd9f748f7c5a8574cace3f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43456
Introduce the template OperatorGenerator, which returns an optional Operator. It is empty if the templated bool value is false.
RegisterOperators() is updated to take the optional Operator. A null will not be registered.
With this update the selective operator registration can be done at compile time. Tests are added to show an operator can be registered if it's in a whitelist and it will not be registered if it's not in the whitelist.
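The registration flow can be sketched in Python (illustrative names only; in the C++ version the selection is a compile-time template bool, so unselected ops never make it into the binary at all):

```python
def operator_generator(name, fn, whitelist=None):
    # Returns an operator entry, or None if the op is not in the whitelist.
    if whitelist is not None and name not in whitelist:
        return None
    return (name, fn)

class RegisterOperators:
    def __init__(self, *maybe_ops):
        # a None (unselected) entry is silently skipped, not registered
        self.ops = dict(op for op in maybe_ops if op is not None)
```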
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D23283563
Pulled By: iseeyuan
fbshipit-source-id: 456e0c72b2f335256be800aeabb797bd83bcf0b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42259
**Summary**
This commit modifies IR generation to insert explicit cast that cast
each return value to `Any` when a function is annotated as returning `Any`.
This precludes the failure in type unification (see below) that caused
this issue.
Issue #41962 reported that the use of an `Any` return type in
combination with different code paths returning values of different
types causes a segmentation fault. This is because the exit transform
pass tries to unify the different return types, fails, but silently sets
the type of the if node to c10::nullopt. This causes problems later in
shape analysis when that type object is dereferenced.
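The failure mode and the fix can be sketched in a few lines of Python (a toy unifier, not the real TorchScript type system):

```python
def unify(t1, t2):
    # Toy type unification: returns the unified type, or None on failure --
    # that silent None is what later got dereferenced in shape analysis.
    if t1 == t2:
        return t1
    return None

def return_type(branch_types, annotated):
    if annotated == "Any":
        # the fix: cast every return value to Any up front, so unification
        # always sees matching types
        branch_types = ["Any" for _ in branch_types]
    result = branch_types[0]
    for t in branch_types[1:]:
        result = unify(result, t)
        if result is None:
            raise TypeError("cannot unify return types")
    return result
```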
**Test Plan**
This commit adds a unit test that checks that a function similar to the
one in #41962 can be scripted and executed.
**Fixes**
This commit fixes #41962.
Differential Revision: D22883244
Test Plan: Imported from OSS
Reviewed By: eellison, yf225
Pulled By: SplitInfinity
fbshipit-source-id: 523d002d846239df0222cd07f0d519956e521c5f
Summary:
fmax/fmin propagate the number if one argument is NaN, which doesn't match the eager mode behavior.
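The semantic difference, in a scalar Python sketch (function names here are illustrative, not the actual kernels):

```python
import math

def fmax(a, b):
    # IEEE fmax: if exactly one argument is NaN, the other (the number) wins
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)

def eager_max(a, b):
    # eager-mode torch.max semantics: NaN propagates
    if math.isnan(a) or math.isnan(b):
        return float("nan")
    return max(a, b)
```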
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43590
Reviewed By: mruberry
Differential Revision: D23338664
Pulled By: bertmaher
fbshipit-source-id: b0316a6f01fcf8946ba77621efa18f339379b2d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43307
I identified a bug with DDP communication hook while I was trying accuracy benchmarks: I was getting `loss=nan`.
Looks like when we re-`initialize_bucketviews` with the value of `future_work`, `Reducer::mark_variable_ready_dense` does `bucket_view.copy_(grad)` but the `grads` were no longer copied back to the contents, since `bucket_view` loses its relationship with `contents` after being re-initialized with something else. Over multiple iterations, this was causing problems.
I solved this by adding two states for `bucket_view`:
```
// bucket_views_in[i].copy_(grad) and
// grad.copy_(bucket_views_out[i])
// provide convenient ways to move grad data in/out of contents.
std::vector<at::Tensor> bucket_views_in;
std::vector<at::Tensor> bucket_views_out;
```
I included two additional unit tests where we run multiple iterations for better test coverage:
1) `test_accumulate_gradients_no_sync_allreduce_hook`
2) `test_accumulate_gradients_no_sync_allreduce_with_then_hook`.
ghstack-source-id: 110728299
Test Plan:
Run `python test/distributed/test_c10d.py`, some perf&accuracy benchmarks.
New tests:
`test_accumulate_gradients_no_sync_allreduce_hook`
`test_accumulate_gradients_no_sync_allreduce_with_then_hook`
Acc benchmark results look okay:
f214188350
Reviewed By: agolynski
Differential Revision: D23229309
fbshipit-source-id: 329470036cbc05ac12049055828495fdb548a082
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43584
1. add `metadata.pkl` to `.bc` file which includes the model info that we are interested in
2. load `metadata.pkl` as an attribute `unordered_map<string, string>` in the module
ghstack-source-id: 110730013
Test Plan:
- CI
```
buck build //xplat/caffe2:jit_module_saving
buck build //xplat/caffe2:torch_mobile_core
```
Reviewed By: xcheng16
Differential Revision: D23330080
fbshipit-source-id: 5d65bd730b4b566730930d3754fa1bf16aa3957e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43573
We recently updated the Stella NLU model in D23307228, and the App started to crash with `Following ops cannot be found:{aten::str, }`.
Test Plan: Verified by installing the assistant-playground app on Android.
Reviewed By: czlx0701
Differential Revision: D23325409
fbshipit-source-id: d670242868774bb0aef4be5c8212bc3a3f2f667c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43182
We should avoid using `deepcopy` on the module because it involves copying the weights.
Comparing the implementation of `c10::ivalue::Object::copy()` vs `c10::ivalue::Object::deepcopy()`, the only difference is `deepcopy` copies the attributes (slots) while `copy` does not.
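The distinction can be sketched with a toy Python analogue of `c10::ivalue::Object` (illustrative only):

```python
import copy

class Object:
    # Toy analogue of c10::ivalue::Object: 'slots' holds the attribute
    # values (e.g. weight tensors).
    def __init__(self, slots):
        self.slots = slots

    def copy(self):
        # like Object::copy(): the new object shares the same slots
        new = Object.__new__(Object)
        new.slots = self.slots
        return new

    def deepcopy(self):
        # like Object::deepcopy(): slots (and hence weights) are duplicated
        return Object(copy.deepcopy(self.slots))
```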
Reviewed By: bwasti
Differential Revision: D23171770
fbshipit-source-id: 3cd711c6a2a19ea31d1ac1ab2703a0248b5a4ef3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43173
With this change the fuser starts to generate typechecks for inputs of
fusion group. For each fusion group we generate a typecheck and an if
node: the true block contains the fused subgraph, the false block
contains unoptimized original subgraph.
Differential Revision: D23178230
Test Plan: Imported from OSS
Reviewed By: eellison
Pulled By: ZolotukhinM
fbshipit-source-id: f56e9529613263fb3e6575869fdb49973c7a520b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43235
This functionality is needed when we want to not lose track of
nodes/values as we merge and unmerge them into other nodes. For
instance, if we have a side data structure with some meta information
about values or nodes, this new functionality would allow to keep that
metadata up to date after merging and unmerging nodes.
Differential Revision: D23202648
Test Plan: Imported from OSS
Reviewed By: eellison
Pulled By: ZolotukhinM
fbshipit-source-id: 350d21a5d462454166f8a61b51d833551c49fcc9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43365
We don't have shape inference for them yet.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D23253418
Pulled By: ZolotukhinM
fbshipit-source-id: 9c38778b8a616e70f6b2cb5aab03d3c2013b34b0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43557
Back out the diff that caused some errors in pytext distributed training.
Test Plan: Tested by rayhou who verified reverting the diff works
Differential Revision: D23320238
fbshipit-source-id: caa0fe74404059e336cd95fdb41373f58ecf486e
Summary:
Original commit changeset: f368d00f7bae
Back out "[2/3][lite interpreter] add metadata when saving and loading models for mobile"
D23047144 (e37f871e87)
Pull Request: https://github.com/pytorch/pytorch/pull/43516
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: xcheng16
Differential Revision: D23304639
fbshipit-source-id: 970ca3438c1858f8656cbcf831ffee2c4a551110
Summary:
1. add `metadata.pkl` to `.bc` file which includes the model info that we are interested in
2. load `metadata.pkl` as an attribute `unordered_map<string, string>` in the module
Test Plan:
- CI
```
buck build //xplat/caffe2:jit_module_saving
buck build //xplat/caffe2:torch_mobile_core
```
Reviewed By: xcheng16
Differential Revision: D23047144
fbshipit-source-id: f368d00f7baef2d3d15f89473cdb146467aa1e0b
Summary:
[Re-review tips: nothing changed other than a typo in python_ir.cpp to fix a windows build failure]
* Adds code printing for enum type
* Enhances enum type to include all contained enum names and values
* Adds code parsing for enum type in deserialization
* Enables serialization/deserialization tests in most TestCases (with a few dangling issues to be addressed in later PRs to avoid this PR growing too large)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43460
Reviewed By: albanD
Differential Revision: D23284929
Pulled By: gmagogsfm
fbshipit-source-id: e3e81d6106f18b7337ac3ff5cd1eeaff854904f3