Mirror of https://github.com/zebrajr/pytorch.git, synced 2025-12-07 00:21:07 +01:00
5 Commits
39be20f259
[JIT][NNC] Add handling of strides to dynamic shape support. (#70464)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70464

Adds handling of strided input tensors to dynamic fusion. This is done with the same set of input striding specializations as https://github.com/pytorch/pytorch/pull/60684/:

```
S_ONE,       // STRIDE_ONE: packed
S_CONT,      // STRIDE_CONTIGUOUS: stride[i + 1] * sizes[i + 1]
S_TRAN_CONT, // STRIDE_TRANSPOSED_CONTIGUOUS: stride[i - 1] * sizes[i - 1]
S_AS_ARG,    // STRIDE_AS_ARG: stride passed in as runtime value
```

plus two additional specializations for (a) a contiguous tensor and (b) a channels-last tensor. Channels-last is a common case and we should optimize for it. Additionally, tensors natively store whether they are contiguous/channels-last contiguous, which makes it faster to check whether tensors follow this pattern. Output striding will be done in a follow-up.

The striding is stored on both the TensorExprGroup node and on the guard node. The striding descriptors are stored as a vector of strings on the node for debuggability and to make use of the ability to store IValues as attributes on nodes. As an example:

```
%8 : Double(10, 11, 12, 13, strides=[1716, 1, 143, 11], requires_grad=0, device=cpu) = prim::TensorExprGroup_0[symbolic_shape_inputs=[-37, -36, -35, -34], striding_inputs_desc=[["TENSOR_CONT_CHANNELS_LAST"]]](%x, %24, %23, %22, %21)
```

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33458649

Pulled By: eellison

fbshipit-source-id: c42616d3c683d70f6258180d23d3841a31a6030d
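The summary's point that tensors natively track contiguous/channels-last contiguity can be observed directly from the Python API. A minimal sketch, for illustration only and not code from this PR, showing that the channels-last layout is something the tensor already knows about:

```python
import torch

# Illustration only (not from PR #70464): a tensor records its memory format,
# so the contiguous and channels-last specializations can be matched with a
# cheap check instead of comparing every stride at guard time.
x = torch.randn(2, 3, 4, 5)                       # default contiguous (NCHW-style) strides
cl = x.to(memory_format=torch.channels_last)      # same data, NHWC-style strides

print(x.stride(), x.is_contiguous())              # (60, 20, 5, 1) True
print(cl.stride(), cl.is_contiguous(memory_format=torch.channels_last))
# (60, 1, 15, 3) True
```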
616afcf981
[jit] [shape analysis] Move constant tensors out of fused subgraphs during generalization (#70320)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70320

ghstack-source-id: 146514368

Test Plan: `buck test mode/dev-nosan //caffe2/test/cpp/jit:jit`

Reviewed By: eellison

Differential Revision: D33280508

fbshipit-source-id: fe4291d7c49f0a498b330de96b698e99f6f6a505
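The summary above does not show where constant tensors inside a fused subgraph come from. One common way they appear, given here purely as a hedged illustration (this PR does not mention freezing), is parameter folding via `torch.jit.freeze`, after which the folded tensor sits in the graph as a `prim::Constant` that a later fusion pass could capture:

```python
import torch

class Scale(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Freezing folds this parameter into the graph as a constant tensor.
        self.weight = torch.nn.Parameter(torch.randn(4))

    def forward(self, x):
        return torch.relu(x * self.weight)

# Illustration only: script, freeze, and inspect the graph to see the
# prim::Constant tensor value that fusion passes would otherwise pull in.
frozen = torch.jit.freeze(torch.jit.script(Scale()).eval())
print(frozen.graph)
```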
fc82ad186a
Add Initial NNC Dynamic Shapes Flow (#66136)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66136

FOR REVIEWERS: this is ready to review; the test failures come from elsewhere in the stack.

Takes in a TensorExprGraph of static shapes and generalizes the input shapes to symbolic dimensions. Dimensions of value 1 are preserved; otherwise, dimensions with the same value are bucketed to the same symbolic shape. E.g. `Tensor(5, 3), Tensor(3, 1) -> Tensor(SS(-1), SS(-2)), Tensor(SS(-2), 1)`

From there, it runs symbolic shape inference on the graph and creates a versioning if in the graph, with prim::TensorExprDynamicGuard checking whether the runtime inputs match the generalized symbolic shapes that are inputs to the TE kernel. The computation of all symbolic dimensions is inlined into the if block with the TE kernel. All symbolic-dimension Value*s are appended to the end of the TE kernel graph/node inputs, and the node is augmented with an integer-list attribute `symbolic_shape_inputs` that gives the mapping from Value* -> symbolic shape int64_t value. For lengthier IR examples and a walkthrough, see ShapeAnalysisTest.DynamicShapesFusion in `test_shape_analysis`.

Returns True on success, False on failure; it can fail if shape propagation fails to propagate the number of dims, or if complete shapes are not set on the inputs.

Example transformation:

```
graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : Tensor = prim::TensorExprGroup_0(%x_inp, %y_inp, %z_inp)
  return ()
with prim::TensorExprGroup_0 = graph(%x.1 : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y.1 : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : int = prim::Constant[value=0]()
  %4 : Tensor = aten::tanh(%x.1)
  %5 : Tensor = aten::erf(%4)
  %6 : Tensor = aten::relu(%y.1)
  %7 : Tensor[] = prim::ListConstruct(%5, %6)
  %8 : Tensor = aten::cat(%7, %3)
  %9 : Tensor = aten::hardswish(%8)
  %10 : Tensor = aten::mul(%9, %z)
  return (%9)
```

->

```
graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %4 : bool = prim::TensorExprDynamicGuard[types=[Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)]](%x_inp, %y_inp, %z_inp)
  %5 : Tensor = prim::If(%4)
    block0():
      %15 : int[] = aten::size(%x_inp)
      %16 : int[] = aten::size(%y_inp)
      %17 : int = prim::Constant[value=1]()
      %18 : int = prim::Constant[value=0]()
      %elem.3 : int = aten::__getitem__(%15, %18) # <string>:40:10
      %elem.5 : int = aten::__getitem__(%15, %17) # <string>:40:10
      %elem.11 : int = aten::__getitem__(%16, %18) # <string>:40:10
      %cat_dim_size.48 : int = aten::add(%elem.3, %elem.11) # <string>:321:29
      %3 : Tensor = prim::TensorExprGroup_0[symbolic_shape_inputs=[-5, -4, -3, -2]](%x_inp, %y_inp, %z_inp, %cat_dim_size.48, %elem.11, %elem.5, %elem.3)
      -> (%3)
    block1():
      %14 : Tensor = prim::FallbackGraph_1(%x_inp, %y_inp, %z_inp)
      -> (%14)
  return ()
with prim::TensorExprGroup_0 = graph(%x.1 : Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
      %y.1 : Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
      %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu),
      %SS_5 : int, %SS_4 : int, %SS_3 : int, %SS_2 : int):
  %3 : int = prim::Constant[value=0]()
  %4 : Tensor(SS(-2), SS(-3)) = aten::tanh(%x.1)
  %5 : Tensor(SS(-2), SS(-3)) = aten::erf(%4)
  %6 : Tensor(SS(-4), SS(-3)) = aten::relu(%y.1)
  %7 : Tensor[] = prim::ListConstruct(%5, %6)
  %8 : Tensor(SS(-5), SS(-3)) = aten::cat(%7, %3)
  %9 : Tensor(SS(-5), SS(-3)) = aten::hardswish(%8)
  %10 : Tensor(SS(-5), SS(-3)) = aten::mul(%9, %z)
  return (%9)
```

Test Plan: Imported from OSS

Reviewed By: navahgar, anjali411

Differential Revision: D31797466

Pulled By: eellison

fbshipit-source-id: b508d2f5baef6e8e4020955ab1d4bc4b9c7bdfdd
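The size-generalization rule described above (dimensions of value 1 stay literal; every other repeated concrete size is bucketed to one shared symbolic id) can be mimicked outside the compiler. A minimal Python sketch of that bucketing, written for illustration and not taken from the PR:

```python
from typing import Dict, List, Sequence, Tuple, Union

def generalize_shapes(shapes: List[Sequence[int]]) -> List[Tuple[Union[int, str], ...]]:
    """Mimic the bucketing described above: keep size-1 dims literal and map
    each distinct non-1 size to a shared negative symbolic id."""
    sym_ids: Dict[int, int] = {}
    next_id = -1
    out: List[Tuple[Union[int, str], ...]] = []
    for shape in shapes:
        dims: List[Union[int, str]] = []
        for d in shape:
            if d == 1:
                dims.append(1)                    # dims of value 1 are preserved
            else:
                if d not in sym_ids:
                    sym_ids[d] = next_id          # same concrete size -> same SS id
                    next_id -= 1
                dims.append(f"SS({sym_ids[d]})")
        out.append(tuple(dims))
    return out

# Matches the commit's example: Tensor(5, 3), Tensor(3, 1)
# -> Tensor(SS(-1), SS(-2)), Tensor(SS(-2), 1)
print(generalize_shapes([(5, 3), (3, 1)]))
```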
1bf0e1acb4
Revert D31732414: Add Initial NNC Dynamic Shapes Flow

Test Plan: revert-hammer

Differential Revision: D31732414
de4fe7a38c
Add Initial NNC Dynamic Shapes Flow (#66136)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66136

FOR REVIEWERS: this is ready to review; the test failures come from elsewhere in the stack.

Takes in a TensorExprGraph of static shapes and generalizes the input shapes to symbolic dimensions. Dimensions of value 1 are preserved; otherwise, dimensions with the same value are bucketed to the same symbolic shape. E.g. `Tensor(5, 3), Tensor(3, 1) -> Tensor(SS(-1), SS(-2)), Tensor(SS(-2), 1)`

From there, it runs symbolic shape inference on the graph and creates a versioning if in the graph, with prim::TensorExprDynamicGuard checking whether the runtime inputs match the generalized symbolic shapes that are inputs to the TE kernel. The computation of all symbolic dimensions is inlined into the if block with the TE kernel. All symbolic-dimension Value*s are appended to the end of the TE kernel graph/node inputs, and the node is augmented with an integer-list attribute `symbolic_shape_inputs` that gives the mapping from Value* -> symbolic shape int64_t value. For lengthier IR examples and a walkthrough, see ShapeAnalysisTest.DynamicShapesFusion in `test_shape_analysis`.

Returns True on success, False on failure; it can fail if shape propagation fails to propagate the number of dims, or if complete shapes are not set on the inputs.

Example transformation:

```
graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : Tensor = prim::TensorExprGroup_0(%x_inp, %y_inp, %z_inp)
  return ()
with prim::TensorExprGroup_0 = graph(%x.1 : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y.1 : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : int = prim::Constant[value=0]()
  %4 : Tensor = aten::tanh(%x.1)
  %5 : Tensor = aten::erf(%4)
  %6 : Tensor = aten::relu(%y.1)
  %7 : Tensor[] = prim::ListConstruct(%5, %6)
  %8 : Tensor = aten::cat(%7, %3)
  %9 : Tensor = aten::hardswish(%8)
  %10 : Tensor = aten::mul(%9, %z)
  return (%9)
```

->

```
graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %4 : bool = prim::TensorExprDynamicGuard[types=[Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)]](%x_inp, %y_inp, %z_inp)
  %5 : Tensor = prim::If(%4)
    block0():
      %15 : int[] = aten::size(%x_inp)
      %16 : int[] = aten::size(%y_inp)
      %17 : int = prim::Constant[value=1]()
      %18 : int = prim::Constant[value=0]()
      %elem.3 : int = aten::__getitem__(%15, %18) # <string>:40:10
      %elem.5 : int = aten::__getitem__(%15, %17) # <string>:40:10
      %elem.11 : int = aten::__getitem__(%16, %18) # <string>:40:10
      %cat_dim_size.48 : int = aten::add(%elem.3, %elem.11) # <string>:321:29
      %3 : Tensor = prim::TensorExprGroup_0[symbolic_shape_inputs=[-5, -4, -3, -2]](%x_inp, %y_inp, %z_inp, %cat_dim_size.48, %elem.11, %elem.5, %elem.3)
      -> (%3)
    block1():
      %14 : Tensor = prim::FallbackGraph_1(%x_inp, %y_inp, %z_inp)
      -> (%14)
  return ()
with prim::TensorExprGroup_0 = graph(%x.1 : Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
      %y.1 : Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
      %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu),
      %SS_5 : int, %SS_4 : int, %SS_3 : int, %SS_2 : int):
  %3 : int = prim::Constant[value=0]()
  %4 : Tensor(SS(-2), SS(-3)) = aten::tanh(%x.1)
  %5 : Tensor(SS(-2), SS(-3)) = aten::erf(%4)
  %6 : Tensor(SS(-4), SS(-3)) = aten::relu(%y.1)
  %7 : Tensor[] = prim::ListConstruct(%5, %6)
  %8 : Tensor(SS(-5), SS(-3)) = aten::cat(%7, %3)
  %9 : Tensor(SS(-5), SS(-3)) = aten::hardswish(%8)
  %10 : Tensor(SS(-5), SS(-3)) = aten::mul(%9, %z)
  return (%9)
```

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732414

Pulled By: eellison

fbshipit-source-id: 290a94a667c20467717202a43c60e4f9ca4c00e2
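The guard described above decides at runtime whether the specialized kernel may run. A simplified Python sketch of that shape check, assuming only what the summary states (it ignores the stride/dtype/device parts of the real prim::TensorExprDynamicGuard and is not the PR's implementation):

```python
from typing import Dict, List, Sequence, Union

Dim = Union[int, str]  # a concrete size such as 1, or a symbolic id such as "SS(-2)"

def guard_passes(expected: List[Sequence[Dim]], actual: List[Sequence[int]]) -> bool:
    """Sketch of the guard's shape check: literal dims must match exactly and
    every occurrence of a symbolic id must bind to a single runtime size."""
    bindings: Dict[str, int] = {}
    if len(expected) != len(actual):
        return False
    for exp_shape, real_shape in zip(expected, actual):
        if len(exp_shape) != len(real_shape):
            return False
        for exp_dim, real_dim in zip(exp_shape, real_shape):
            if isinstance(exp_dim, int):
                if exp_dim != real_dim:           # e.g. the literal 1 dims of %z
                    return False
            elif bindings.setdefault(exp_dim, real_dim) != real_dim:
                return False                      # same SS id must agree everywhere
    return True

# Generalized input types from the example: x (SS(-2), SS(-3)), y (SS(-4), SS(-3)), z (1, 1)
expected = [("SS(-2)", "SS(-3)"), ("SS(-4)", "SS(-3)"), (1, 1)]
print(guard_passes(expected, [(10, 5), (4, 5), (1, 1)]))   # True: take the TE kernel branch
print(guard_passes(expected, [(10, 5), (4, 7), (1, 1)]))   # False: SS(-3) mismatch -> fallback
```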