I need this for later. This returns, roughly, all the OpOverloads
for an OpOverloadPacket in the order in which the OpOverloadPacket
resolves them.
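For reference, a rough eager-mode sketch of enumerating a packet's overloads from Python (illustrative only; the order shown by `.overloads()` is not necessarily the packet's resolution order):
```
import torch

# An OpOverloadPacket (e.g. torch.ops.aten.add) exposes its overload names via
# .overloads(); getattr-ing each name off the packet yields the OpOverload.
packet = torch.ops.aten.add
for overload_name in packet.overloads():
    op_overload = getattr(packet, overload_name)
    print(op_overload)
```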
Test Plan:
- wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112198
Approved by: https://github.com/ezyang
We want to make TorchRec sharded models TorchScriptable.
TorchRec sharded models use the generic types Awaitable[W] and LazyAwaitable[W] (https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/types.py#L212).
In a sharded model those types are used in place of the contained type W and hold an initialization function that produces an object of type W.
When the first attribute of W is requested, `LazyAwaitable[W]` calls its initialization function (on the same stack), caches the result inside, and from then on works transparently as an object of W. So we can think of it as delayed object initialization.
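For intuition, a minimal eager-mode sketch of this delayed-initialization pattern (a hypothetical `LazyInit` class, not the TorchRec implementation):
```
import torch

# Hypothetical sketch: the wrapped object is only constructed on first
# attribute access and then cached, so the wrapper behaves transparently
# like the contained object afterwards.
class LazyInit:
    def __init__(self, init_fn):
        self._init_fn = init_fn
        self._obj = None

    def __getattr__(self, name):
        # Only reached for attributes not defined on the wrapper itself,
        # i.e. everything we want to forward to the wrapped object.
        if self._obj is None:
            self._obj = self._init_fn()
        return getattr(self._obj, name)

lazy = LazyInit(lambda: torch.nn.Linear(4, 4))  # nothing constructed yet
out = lazy.forward(torch.randn(2, 4))           # first access builds and caches it
```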
To support this behavior in TorchScript, we propose a new TorchScript type: `Await`.
In eager mode it works the same way as `LazyAwaitable[W]` in TorchRec: it is dynamically typed, acting as type `W` while it is an `Await[W]`.
Within TorchScript it is an `Await[W]` and can only be converted to W explicitly, using the special function `torch.jit._awaitable_wait(aw)`.
Creation of an `Await[W]` is done via another special function, `torch.jit._awaitable(func, *args)`.
The semantics are close to `torch.jit.Future` with fork/wait, and the same JIT mechanics (inlined fork closures) are used, with the difference that the function is not started in parallel on creation. It is only stored as a lambda inside the IValue and is called on the same thread when `torch.jit._awaitable_wait` is called.
For example (more examples are in `test/jit/test_await.py` in this PR):
```
def delayed(z: int) -> int:
    return z * 3

@torch.jit.script
def fn(x: Tensor):
    aw: Await[int] = torch.jit._awaitable(delayed, 99)
    a = torch.eye(2)
    b = torch.jit._awaitable_wait(aw)
    return a + b + x
```
Function semantics:
`_awaitable(func: Callable[..., W], *args, **kwargs) -> Await[W]`
Creates an Await object that owns args and kwargs. When `_awaitable_wait` is first called, it executes `func` and owns the result. Subsequent `_awaitable_wait` calls return that same result.
`_awaitable_wait(Await[W]) -> W`
On the first `_awaitable_wait` call for a given Await object, executes the stored function and returns its result; on later calls, returns the cached result.
`_awaitable_nowait(W) -> Await[W]`
Creates a trivial Await[W] wrapper around the specified object, to stay type compliant in corner cases.
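A small eager-mode usage sketch of `_awaitable_nowait`, based on the semantics above (the values are arbitrary):
```
import torch

# _awaitable_nowait wraps an already-computed value, so code that expects an
# Await[W] (and unconditionally calls _awaitable_wait) also works when no
# delayed computation is needed.
aw = torch.jit._awaitable_nowait(torch.ones(2))
assert torch.equal(torch.jit._awaitable_wait(aw), torch.ones(2))
```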
Differential Revision: [D42502706](https://our.internmc.facebook.com/intern/diff/D42502706)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90863
Approved by: https://github.com/davidberard98
Re-landing #68111/#74596
## Description
v0.5 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).
On the basis of #50256, the below improvements are included:
* The [v0.5 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.5) of the oneDNN Graph API is used
* The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.
### User API:
The optimization pass is disabled by default. Users can enable it with:
```
torch.jit.enable_onednn_fusion(True)
```
`torch.jit.freeze` should be used after tracing (recommended) or scripting a model.
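A hedged end-to-end usage sketch (the toy model and shapes are placeholders):
```
import torch
import torch.nn as nn

torch.jit.enable_onednn_fusion(True)

# A small conv + eltwise model; oneDNN Graph fusion targets patterns like this.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3), nn.ReLU()).eval()
example = torch.randn(1, 3, 32, 32)

with torch.no_grad():
    frozen = torch.jit.freeze(torch.jit.trace(model, example))
    # A couple of warm-up runs let the profiling executor record tensor
    # properties before the fused graph is used.
    frozen(example)
    frozen(example)
    out = frozen(example)
```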
### Performance:
[pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:
* SkyLake 8180 (1 socket of 28 cores):

* SkyLake 8180 (single thread):

* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops
### Directory structure of the integration code
Fuser-related code is placed under:
```
torch/csrc/jit/codegen/onednn/
```
Optimization pass registration is done in:
```
torch/csrc/jit/passes/onednn_graph_fuser.h
```
CMake for the integration code is in:
```
caffe2/CMakeLists.txt
cmake/public/mkldnn.cmake
cmake/Modules/FindMKLDNN.cmake
```
## Limitations
* In this PR, we only support the PyTorch-oneDNN Graph integration on the Linux platform. Support on Windows and macOS will be enabled as a next step.
* We have only optimized the inference use-case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76622
Approved by: https://github.com/eellison
Summary:
## Description
Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).
On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included:
- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used
- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.
### User API:
The optimization pass is disabled by default. Users can enable it with:
```
torch.jit.enable_onednn_fusion(True)
```
### Performance:
[pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:
- SkyLake 8180 (1 socket of 28 cores):

- SkyLake 8180 (single thread):

\* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
\** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops
### Directory structure of the integration code
Fuser-related code is placed under:
```
torch/csrc/jit/codegen/onednn/
```
Optimization pass registration is done in:
```
torch/csrc/jit/passes/onednn_graph_fuser.h
```
CMake for the integration code is in:
```
caffe2/CMakeLists.txt
```
## Limitations
- In this PR, we only support the optimization on the Linux platform. Support on Windows and macOS will be enabled as a next step.
- We have only optimized the inference use case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68111
Reviewed By: eellison
Differential Revision: D34584878
Pulled By: malfet
fbshipit-source-id: ce817aa8cc9052ee9ed930c9cf66be83449e61a4
(cherry picked from commit cd17683aa7d9c0947df45a1ab53627feff795587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70871
We had previously handled reusing memory in the optimized kernel execution path, but had not yet handled it when we hit the unoptimized fallback.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33458652
Pulled By: eellison
fbshipit-source-id: 4eb62181ed02c95813a99638f5e2d0f9347b5c08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69476
This diff adds a new op, `TensorExprDynamicGroup`, that composes all the logic behind running a dynamically shaped fused node. This includes a guard instruction that checks the required conditions, and a conditional that calls either the fused node or the fallback graph depending on the guard.
ghstack-source-id: 146107006
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/cpp/jit:jit
```
Reviewed By: eellison
Differential Revision: D32320082
fbshipit-source-id: 2bd1a43391ca559837d78ddb892d931abe9ebb73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69406
Most files that include `interned_strings.h` don't actually depend on
anything generated from `FORALL_NS_SYMBOLS`, yet because it all lives in a
single file they need to be recompiled whenever a new symbol is added. Here
I move the class definition into a separate file so this doesn't
happen.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D32923637
Pulled By: albanD
fbshipit-source-id: 6e488cbfcfe2c041a99d9ff22e167dbddf3f46d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64336
Currently AliasDB rejects any user-defined ops with `AliasAnalysisKind::CONSERVATIVE` if they do not have a special treatment for alias analysis. For example, the following alias schema gets rejected:
```
m.def(torch::schema(
"namescope::my_op(...) -> ...",
c10::AliasAnalysisKind::CONSERVATIVE));
```
This rejection condition is contradictory: AliasDB can handle ops with `CONSERVATIVE` in a general way, without any special casing, at https://fburl.com/diffusion/op5u72sk (calling https://fburl.com/diffusion/h3aws5dd), which is exactly the conservative treatment appropriate for alias analysis.
This change corrects the rejection condition so that it fires for ops that *do* have special casing yet are marked `CONSERVATIVE`, since the two cannot be used simultaneously.
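For illustration, a similar registration can be sketched from Python; this uses a hypothetical op and assumes `torch.library.Library.define` accepts an `alias_analysis` argument:
```
import torch

# Hypothetical op registered with conservative alias analysis: AliasDB should
# treat all of its inputs and outputs as pointing to the wildcard set.
lib = torch.library.Library("namescope", "DEF")
lib.define("my_op(Tensor a, Tensor b) -> Tensor", alias_analysis="CONSERVATIVE")

def my_op_impl(a, b):
    return a + b

lib.impl("my_op", my_op_impl, "CompositeExplicitAutograd")
```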
Test Plan:
Confirmed that
```
m.def(torch::schema(
"namescope::my_op(...) -> ...",
c10::AliasAnalysisKind::CONSERVATIVE));
```
gets accepted, and AliasDB points all of `my_op`'s inputs and outputs to the wildcard (*).
Reviewed By: eellison
Differential Revision: D30690121
fbshipit-source-id: 431cc1a84edd5227f52b44a0fd85d5eb16f3c288
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51600
Looking for notes on the implementation first; will post more notes on benchmarks and overall thoughts/implementation and solicit more input soon.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696702
Pulled By: eellison
fbshipit-source-id: cd612f093fe3859e42fb0b77560ebd1b44fccff7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51483
This PR moves the conv weights of a frozen model to MKLDNN and reorders the weights ahead of time. When the weights are already in MKLDNN, computing even a single conv by converting the input and output to/from MKLDNN provides large speedups. I benchmarked the results of the top 200 shapes in predictor [here](https://www.internalfb.com/phabricator/paste/view/P171537938), and verified that it speeds up popular models in torchvision.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696703
Pulled By: eellison
fbshipit-source-id: 0b4441bee4f6e0890a4540fbca3bb5e58b8c5adf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47665
Re-enabled the pass that removes fusion outputs that are only used by aten::size;
Added size computation for reduction ops via the new operator prim::ReductionSizes;
Test Plan: Imported from OSS
Reviewed By: navahgar, jamesr66a
Differential Revision: D25254675
Pulled By: Krovatkin
fbshipit-source-id: e9a057b0287ed0ac93b415647fd8e5e836ba9856
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46686
I was trying to page this code back in after a while and some things
stuck out as unnecessarily confusing.
1. Improve documentation of closures and fork stuff to be more accurate
to how we use them today.
2. Change `prim::LocalVariableScope` to `prim::ListComprehension`. It is
only ever used for list comprehensions, and in general the nodes
emitted by `ir_emitter` should correspond to concrete operations or
language features rather than semantic constraints.
3. Change the somewhat mysterious "inputs" and "attributes" argument
names throughout the codebase to be the more obvious "args" and "kwargs"
that they generally represent (I think "inputs" and "attributes" come
from the AST naming).
Test Plan: Imported from OSS
Reviewed By: navahgar, jamesr66a
Differential Revision: D24464197
Pulled By: suo
fbshipit-source-id: 1f4b1475b58b5690a0b204e705caceff969533b4
Summary:
1. Added CudaFusionGuard as the custom TypeCheck for nvfuser; enabled dynamic shape support with profiling executor;
2. dropped support for legacy fuser;
3. re-enabled nvfuser tests;
4. added registration for profiling record to allow profiling on user specified nodes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46452
Reviewed By: zou3519, anjali411
Differential Revision: D24364642
Pulled By: ngimel
fbshipit-source-id: daf53a9a6b6636e1ede420a3a6d0397d4a8b450b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44442
I noticed lock contention on startup as lookupByLiteral() was
calling registerPendingOperators() - some calls were holding the
lock for 10+ ms while operators were being registered.
canonicalSchemaString() was using ostringstream, which isn't typically
particularly fast (partly because of C++ spec locale requirements).
If we replace it with regular C++ string appends, it's somewhat faster
(which isn't hard when comparing with stringstream; albeit with a bit
more codegen).
This cuts out 1.4 seconds spent under the OperatorRegistry lock
(as part of registerPendingOperators) in the first couple of minutes
of run time (mostly front-loaded) when running sync SGD.
As an example, before:
registerPendingOperators 12688 usec for 2449 operators
After:
registerPendingOperators 6853 usec for 2449 operators
ghstack-source-id: 111862971
Test Plan: buck test mode/dev-nosan caffe2/test/cpp/...
Reviewed By: ailzhang
Differential Revision: D23614515
fbshipit-source-id: e712f9dac5bca0b1876e11fb8f0850402f03873a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43043
This adds support for rpc_sync in TorchScript, in a way similar to rpc_async.
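A hedged usage sketch (assumes an RPC agent is already initialized and a peer worker exists; the names are placeholders):
```
import torch
import torch.distributed.rpc as rpc

def add_tensors(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return a + b

@torch.jit.script
def remote_add(dst: str, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # rpc_sync blocks until the remote result is available, mirroring how
    # rpc_async is already usable from TorchScript.
    return rpc.rpc_sync(dst, add_tensors, args=(a, b))

# result = remote_add("worker1", torch.ones(2), torch.ones(2))
```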
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D23252039
Pulled By: wanchaol
fbshipit-source-id: 8a05329cb8a24079b2863178b73087d47273914c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43635
Intern the symbol; no functional changes. Aliasing needs to be looked at, but that should be done in a separate PR; this PR just changes the symbol.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23358806
Pulled By: eellison
fbshipit-source-id: f18bcd142a0daf514136f019ae607e4c3f45d9f8
Summary:
This PR adds an API to package unoptimized/fallback blocks as function calls. It's mainly meant to be used by the TensorExpressionsFuser and SpecializeAutogradZero passes, as both specialize the original graph but would also like to provide a fallback path in case the assumptions under which the graph was specialized do not hold for some inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43274
Reviewed By: malfet
Differential Revision: D23406961
Pulled By: Krovatkin
fbshipit-source-id: ef21fc9ad886953461b09418d02c75c58375490c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43633
In the backward graph, _grad_sum_to_size is inserted whenever a possibly broadcasting op is called:
`"aten::_grad_sum_to_size(Tensor(a) self, int[]? size) -> Tensor(a)"`
If a broadcast occurred, a sum is called, otherwise the second input is None and it is a no-op. Most of the time, it's a no-op (in the fast RNNs benchmark > 90% of the time).
We can get rid of this op by profiling the optionality of the second input. I added `prim::profile_optional` to do this, which counts the number of times it saw a None value and the number of times it saw a value present. When specializing the backward graph, we insert checks for values we profiled as None, and in the optimized block can remove the grad_sum_to_size calls that use those values.
In the future we may revisit this when NNC supports reductions and we want to replace grad_sum_to_size with sums as well, but I think this is worth landing now.
Test Plan: Imported from OSS
Reviewed By: bwasti, ZolotukhinM
Differential Revision: D23358809
Pulled By: eellison
fbshipit-source-id: a30a148ca581370789d57ba082d23cbf7ef2cd4d
Summary: Enforce duplicate op name check on mobile
Test Plan: run full/lite predictor
Reviewed By: iseeyuan
Differential Revision: D22639758
fbshipit-source-id: 2993c4bc1b14c833b273183f4f343ffad62121b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41214
Same as D21032976, add a check for duplicate op names in the JIT
Test Plan:
run full JIT predictor
also
buck test pytorch-playground
Reviewed By: smessmer
Differential Revision: D22467871
fbshipit-source-id: 9b7a40a217e6c63cca44cad54f9f657b8b207a45
Summary:
**Summary**
This commit adds support for with statements to PyTorch JIT. Each
of the with items in a with statement is represented in the JIT IR
as a pair of `prim::Enter` and `prim::Exit` nodes that call the
`__enter__` and `__exit__` methods defined on the context manager objects
returned by the expressions in the with item.
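A hedged sketch of what this enables in user code (a toy context manager, for illustration only):
```
import torch
from typing import Any

@torch.jit.script
class Counter(object):
    def __init__(self, start: int):
        self.count = torch.tensor([start], dtype=torch.float)

    def __enter__(self):
        self.count.add_(1)
        return self.count

    def __exit__(self, type: Any, value: Any, tb: Any):
        self.count.sub_(1)

@torch.jit.script
def with_example(x: torch.Tensor) -> torch.Tensor:
    # The with item is lowered to a prim::Enter/prim::Exit pair that calls
    # __enter__/__exit__ on the Counter instance.
    with Counter(0) as c:
        x = x + c
    return x
```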
**Testing**
This commit adds unit tests for with statements with named with items,
nameless with items, and with statements that encounter exceptions.
```
$ python test/test_jit.py TestWith.test_with_as
Fail to import hypothesis in common_utils, tests are not derandomized
.
----------------------------------------------------------------------
Ran 1 test in 0.430s
OK
```
```
$ python test/test_jit.py TestWith.test_with_no_as
Fail to import hypothesis in common_utils, tests are not derandomized
.
----------------------------------------------------------------------
Ran 1 test in 0.264s
OK
```
```
$ python test/test_jit.py TestWith.test_with_exceptions
Fail to import hypothesis in common_utils, tests are not derandomized
Couldn't download test skip set, leaving all tests enabled...
.
----------------------------------------------------------------------
Ran 1 test in 1.053s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34705
Differential Revision: D22095945
Pulled By: SplitInfinity
fbshipit-source-id: f661565a834786725259b8ea014b4d7532f9419d
Summary:
Use std::list instead of std::vector to avoid iterating over list of registered listeners
Also, fix formatting
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35486
Differential Revision: D20677764
Pulled By: malfet
fbshipit-source-id: d2a545454a29a12bbbf4aa62d9f8c4029a109e6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115
This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.
Testing:
Ran the script, CI.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D20568523
Pulled By: SplitInfinity
fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33020
This is a pass to create functional blocks. The other PRs in the stack help avoid some of the limitations that are often found in graphs. It's possible that this would work well with a graph that is frozen. Follow-up work items that will help this pass:
- We don't currently have any capacity in alias analysis to tell whether a Value that came from the wildcard set "re-escapes" back into the wildcard set.
- More comments on the semantics of the graph and correctness conditions
- We could consider using dynamic dag if the perf of this is a limitation.
- potentially make Functional Graphs Functional Blocks instead, so that we do not repeatedly copy constants, and to make the IR easier to read.
Test Plan: Imported from OSS
Differential Revision: D20603188
Pulled By: eellison
fbshipit-source-id: 6822a6e65f4cc2676f8f6445fe8aa1cb858ebeeb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515
Once upon a time we thought this was necessary. In reality it is not, so
removing it.
For backcompat, our public interface (defined in `api/`) still has
typedefs to the old `script::` names.
There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph
transform. I renamed one of them.
Test Plan: Imported from OSS
Differential Revision: D20353503
Pulled By: suo
fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93
Summary:
Separating CUDA fuser from CPU fuser.
1. New node in IR - prim::CudaFusionGroup:
This enables the cuda fuser to co-exist along side the old fuser. Allows us
to incrementally build and expand cuda fuser.
2. Copied FuseGraph optimization passes to CudaFuserGraph:
We will refactor & reuse Chunk/Concat from the old fuser logic, which is
handled in the optimization pass at this moment. Unfortunately, much of the
code in the pass is tightly bound to the legacy fuser, which makes code
sharing difficult.
The CudaFusionGraph will support only a subset of operations compared to the
legacy fuser (CUDA only). It is registered as a custom pass post fusion via
```torch._C._jit_register_cuda_fuser()```
To have it in effect, you should also turn off fusion on GPU via
```torch._C._jit_override_can_fuse_on_gpu(False)```
3. We don't have codegen in this PR yet (WIP). Currently we just fall back to
the old fuser.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33527
Differential Revision: D20171598
Pulled By: ZolotukhinM
fbshipit-source-id: 9a3c0f06f46da7eaa80ae7551c04869f5b03ef71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33329
# Use case
```
@torch.jit.script
def send_rpc_async(dst_worker_name, user_callable_qual_name, tensor):
    # type: (str, str, Tensor) -> None
    rpc._rpc_async_torchscript(
        dst_worker_name, user_callable_qual_name, args=(tensor,)
    )
```
# Problem
```
torch.jit.frontend.NotSupportedError: keyword-arg expansion is not supported:
File "/data/users/shihaoxu/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/rpc/rpc_spawn#binary,link-tree/torch/distributed/rpc/api.py", line 722
args = args if args else ()
kwargs = kwargs if kwargs else {}
fut = _invoke_rpc_torchscript(to, qualified_name, *args, **kwargs)
~~~~~~ <--- HERE
return fut
```
# Solution
Register `rpc.rpc_async(..)` as a JIT operator to handle variable-length argument list.
# Plan
This PR contains the required changes to make `rpc.rpc_async(..)` a JIT prim operator, which can dynamically handle a variable number of arguments.
- Register "prim::rpc_async" as a `Symbol` in "interned_string.h"
- Add an if branch in the "python_sugared_value.cpp" `toSugarValue(py::object, ..)` entry utility function to set up how the JIT frontend converts the `torch.distributed.rpc.rpc_async(..)` Python function (Python object) into a `SpecialFormValue` (IR SugaredValue).
- Add a switch case for the "prim::rpc_async" Symbol in "ir_emitter.cpp" and `emitApplySpecialForm(..)` to set up how the JIT compiler provides inputs to the "prim::rpc_async" Operator.
- Register "prim::rpc_async" as a `jit::Operator` and provide its implementation in "register_distributed_ops.cpp".
Note that since the distributed module is an optional part of the PyTorch build, the code added in this PR should be wrapped in a preprocessor macro.
```
#ifdef USE_DISTRIBUTED
new code here
#endif
```
Test Plan:
Items that need to be confirmed in the test cases
https://fb.quip.com/DCvdA9ZLjeO0
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
  && buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_call_python_function_remotely_from_script_not_supported
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_spawn
```
```
buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:layer_norm_op_test-2.7 -- test_layer_norm_op_jit
```
Differential Revision: D5738300
fbshipit-source-id: a4604fe762e00be062dc8232ca9790df31fb2074
Summary:
**Summary**
This commit adds an implementation of `Tensor.tolist()` to the JIT interpreter.
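A hedged usage sketch, following the annotation pattern used in the new tests (where `torch.jit.annotate` tells the compiler the element type and dimensionality of the resulting list):
```
import torch
from typing import List

@torch.jit.script
def matrix_to_list(x: torch.Tensor) -> List[List[float]]:
    # The annotation fixes the element type (float) and number of list
    # dimensions (2) that tolist() should produce.
    result = torch.jit.annotate(List[List[float]], x.tolist())
    return result

print(matrix_to_list(torch.eye(2)))  # [[1.0, 0.0], [0.0, 1.0]]
```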
**Testing**
This commit adds several unit tests that test that this function works correctly for
0D, 1D, 2D and 3D tensors of type `float`, `int` and `bool`.
```
(base) meghanl-mbp:pytorch meghanl$ python test/test_jit.py TestList.test_to_list -v
Fail to import hypothesis in common_utils, tests are not derandomized
test_to_list (jit.test_list_dict.TestList)
Unit tests for Tensor.tolist() function. ... ok
----------------------------------------------------------------------
Ran 1 test in 0.329s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33472
Differential Revision: D20109738
Pulled By: SplitInfinity
fbshipit-source-id: a6e3fee5e3201d5e1f0c4ca45048488ae2bf5e33
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33806
as title
Test Plan: Imported from OSS
Differential Revision: D20122117
Pulled By: suo
fbshipit-source-id: 209d29ed2c873181140c9fb5cdc305c200ce4008