pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	2d72cb3373	Revert "[JIT] Allow registering Decompositions" This reverts commit `d9f0774f98`. Reverted https://github.com/pytorch/pytorch/pull/76252 on behalf of https://github.com/zengk95	2022-04-26 04:47:05 +00:00
Elias Ellison	d9f0774f98	[JIT] Allow registering Decompositions - Allow registering custom decompositions - Add easier API for invoking decompositions - Shorten API names (no users yet) I am doing these as one pr because they are fairly short/simple and because github first does not support ghstack yet. cc @Chillee @zou3519 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76252 Approved by: https://github.com/davidberard98	2022-04-26 03:00:35 +00:00
David Berard	272890998e	[JIT] pass more exception info through the JIT interpreter If TORCH_SHOW_CPP_STACKTRACES=1, then dump e.what() into the RuntimeError, which should make it easier to debug exceptions that happen within interpreted sections. Test: ```patch diff --git a/test/cpp/jit/test_dce.cpp b/test/cpp/jit/test_dce.cpp index 6f9161d0d9..7c574787cf 100644 --- a/test/cpp/jit/test_dce.cpp +++ b/test/cpp/jit/test_dce.cpp @@ -3,6 +3,10 @@ #include <torch/csrc/jit/ir/irparser.h> #include <torch/csrc/jit/passes/dead_code_elimination.h> #include <torch/csrc/jit/testing/file_check.h> +#include <torch/csrc/jit/runtime/interpreter.h> +#include <test/cpp/jit/test_utils.h> + +#include <ATen/ATen.h> namespace torch { namespace jit { @@ -48,5 +52,30 @@ graph(): // Check that dead code elimin testing::FileCheck().run(input, *graph); } + +TEST(EliminateDeadCodeTest, interpreterfailure) { + const std::string input = R"IR( +graph(%x.1 : Tensor): + %2 : int = prim::Constant[value=128]() # /data/users/dberard/scripts/DGB/sz.py:4:38 + %3 : int = prim::Constant[value=256]() # /data/users/dberard/scripts/DGB/sz.py:4:43 + %5 : int = prim::Constant[value=1]() # /data/users/dberard/scripts/DGB/sz.py:4:53 + %4 : int[] = prim::ListConstruct(%2, %3) + %6 : Tensor[] = aten::split_with_sizes(%x.1, %4, %5) # /data/users/dberard/scripts/DGB/sz.py:4:11 + return (%6) +)IR"; + auto graph = std::make_shared<Graph>(); + parseIR(input, graph.get()); + + //auto stack = createStack({at::randn({2, 383}, at::kCPU)}); + auto stack = createStack({at::Tensor{}}); + + Code code(graph, ""); + InterpreterState interpreter{code}; + interpreter.run(stack); + ASSERT_EQ(2, stack.size()); + ASSERT_FALSE(stack[0].toTensor().defined()); + ASSERT_FALSE(stack[1].toTensor().defined()); +} + } // namespace jit } // namespace torch ``` ^ use this to repro the interpreter issue: `TORCH_SHOW_CPP_STACKTRACES=1 ./bin/test_jit --gtest_filter="EliminateDeadCodeTest.interpreterfailure"` and the stack trace is shown. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75682 Approved by: https://github.com/eellison	2022-04-21 18:26:49 +00:00
Elias Ellison	0c671c15ec	[JIT] Remove CSE Hoisting This has led to a couple bugs, and I don't think the additional complexity was worth keeping in codebase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75756 Approved by: https://github.com/davidberard98	2022-04-19 20:59:25 +00:00
John Clow	f281d83d77	Moving Remove Tensor Type Specializations to after custom passes This is to allow for Intel folks to use type information in their custom passes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/71748 Approved by: https://github.com/eellison	2022-04-11 22:12:01 +00:00
Elias Ellison	43b56b3814	Add Parsing of tensor constants (#75119 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75119 Add support for parsing Tensor constants like Double(4, 4) ... by initializing random tensors. This makes saving IR and then parsing it lossy, so I have it toggled as default not on, but is useful in cases like repro-ing Fusions with tensor constants post-freezing. cc Krovatkin Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D35373999 Pulled By: eellison fbshipit-source-id: a5c8d9f93f23a7442258fc745ed6b6def330dca8 (cherry picked from commit 32dd6567522973563bd452bf486ed27b02e4e35c)	2022-04-06 18:00:53 +00:00
David Berard	e9e75215e2	[JIT] Optionally validate nvfuser outputs after execution (#74361) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74361 This adds an optional validation after executing an NVFuser node, which checks that the output is the same as the unfused implementation. Then the outputs and the graph are reported via a callback.
```python
import torch

def callback(x, y, graph):
    # `amt` (how many trailing outputs to print) is not defined in this snippet
    for i in range(len(x)-amt, len(x)):
        print(x[i])
        print(y[i])
    print(graph)

with torch.jit.fuser("fuser2"):
    torch._C._jit_nvfuser_set_comparison_callback(True, callback)

    @torch.jit.script
    def g(x, y):
        z = torch.add(x, y)
        return torch.sin(z)

    def f(x, y, a):
        z = torch.add(x, y)
        return g(torch.relu(z), a)

    f_s = torch.jit.script(f)
    x = torch.rand((10, 10), dtype=torch.half).cuda()
    y = torch.rand((10, 10), dtype=torch.half).cuda()
    a = torch.rand((10, 10), dtype=torch.half).cuda()
    f_s(x, y, a)
    f_s(x, y, a)
    f_s(x, y, a)
```
Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D34975310 Pulled By: davidberard98 fbshipit-source-id: 2379c9a6f371cd58da6a187c1f16882f3923ab24 (cherry picked from commit 96c87992c65f5e6bb1bdd51791682dd837af99b4)	2022-04-01 23:48:30 +00:00
Elias Ellison	2ef5611f31	Add comments for adding shape function and linting (#73570 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73570 Approved by: https://github.com/huiguoo Test Plan: contbuild & OSS CI, see `6d36bbde7e` Reviewed By: pbelevich Differential Revision: D35192688 Pulled By: atalman fbshipit-source-id: b12b80e6a6dd1adaa57a8facb6bb077989faa543 (cherry picked from commit e50478c02592597f12b8490ec5496f76c7d8b8cc)	2022-03-31 04:25:43 +00:00
Nikita Shulga	3036a0309d	[skip ci]Revert "Add comments for adding shape function and linting" This is a technical revert of `6d36bbde7e` to reconcile it with e50478c02592597f12b8490ec5496f76c7d8b8cc (which is the same + lint changes applied) Should be skipped during import	2022-03-30 21:21:28 -07:00
Elias Ellison	6d36bbde7e	Add comments for adding shape function and linting Pull Request resolved: https://github.com/pytorch/pytorch/pull/73570 Approved by: https://github.com/huiguoo	2022-03-29 23:02:22 +00:00
Elias Ellison	aacdf291e0	[JIT] Make aot autograd decompositions usable in JIT, add script for serializing the decompositions (#73938 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73938 This is a first step in porting and making usable all of the decompositions defined in [functorch](https://github.com/pytorch/functorch/blob/main/functorch/_src/decompositions.py#L349) in core and in JIT as well as C++. The decompositions are defined in python, scripted and inlined, and then serialized as C++ code which TorchScript can parse. The workflow is edit python decomposition file then run [tools/codegen/decompositions/gen_jit_decompositions.py](https://github.com/pytorch/pytorch/pull/73938/files#diff-6adef2116be233c3524e3b583e373ab0ffc9169beb6c1f6d96b5d0385e75afa1). Decompositions are mapped to their corresponding aten schemas via the schema in their python def. This allows multiple decompositions for an overloaded op like `aten.var` (shown here in the example). This is just a first PR, i'm sure there will be many follows ups such as: - making these runnable in C++ with simple executor - porting over more decompositions from AOT Autograd - Using opinfos / more robust testing - Categorizing decompositions - Hooking in decompositions at various points of JIT execution Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D34938126 Pulled By: eellison fbshipit-source-id: 9559a7cb731982e3a726f2f95af498b84fb09c13 (cherry picked from commit a4e0e748791e378e7e12a9dd0b63fb3c62dc1890)	2022-03-29 18:38:52 +00:00
Oleg Khabinov	5079321b71	Fix issue with prim::Print() and torch::deploy (#74513 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74513 Reviewed By: d4l3k, houseroad Differential Revision: D35035089 fbshipit-source-id: d67b98600c74e2ed16b4d80f52148cd64b9e6ca0 (cherry picked from commit 16caf865077e28be31b805f015b9a61962632c8f)	2022-03-25 03:14:34 +00:00
CodemodService FBSourceClangFormatLinterBot	c9612cddb7	[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT` Reviewed By: zsol Differential Revision: D35109008 fbshipit-source-id: 35d37cc1d991569c6df8e65fc789803ac881012b (cherry picked from commit f5beda976adc343f90b8e622257b2bcac3ac0d27)	2022-03-24 09:35:26 +00:00
jiej	e4e19d5beb	nvfuser parser skip api (#74520) Summary: Added a Python API to disable nvfuser for certain op kinds. ``` "_jit_set_nvfuser_skip_node_kind", [](const std::string& op_name, bool flip = true) { return fuser::cuda::skipNode(op_name, flip); }) ``` Args: `op_name`: Symbol of op; `flip`: flag indicating whether to flip the given op in the skip list. Returns: a bool flag indicating whether `op_name` was already in the skip list. A Python example that disables the fusion of `aten::add` from then on: `torch._C._jit_set_nvfuser_skip_node_kind("aten::add", True) # returns False, as no op is in skip list by default` Pull Request resolved: https://github.com/pytorch/pytorch/pull/74520 Reviewed By: saketh-are Differential Revision: D35046110 Pulled By: davidberard98 fbshipit-source-id: 689f5286513dbab206768823a852467b9f6b49b6 (cherry picked from commit 9a31129f7591ba2d393ab057b1cd137a6a25e7e8)	2022-03-23 20:56:43 +00:00
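The `flip` and return-value semantics in the commit above can be modelled in a few lines of plain Python. This is only a sketch of the behaviour as the message describes it, not the nvfuser implementation; `skip_node` and `_skip_list` are hypothetical names used for illustration.

```python
# Hypothetical stand-in for the skip list described in the commit message.
_skip_list = set()

def skip_node(op_name, flip=True):
    """Report whether op_name was already skipped; toggle it when flip is True."""
    was_skipped = op_name in _skip_list
    if flip:
        if was_skipped:
            _skip_list.discard(op_name)   # flipping a skipped op re-enables it
        else:
            _skip_list.add(op_name)       # flipping a fused op starts skipping it
    return was_skipped

first = skip_node("aten::add")             # nothing is skipped by default
query = skip_node("aten::add", flip=False) # query without changing state
second = skip_node("aten::add")            # flips the op back out of the list
final = skip_node("aten::add", flip=False)
```

Passing `flip=False` makes the call a pure query, which is why the returned flag describes the state before the call rather than after it.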
Michael Suo	e5bf87963d	Revert D34584878: [pytorch][PR] Add JIT graph fuser for oneDNN Graph API (Preview4) Test Plan: revert-hammer Differential Revision: D34584878 (`7dd0823011`) Original commit changeset: ce817aa8cc90 Original Phabricator Diff: D34584878 (`7dd0823011`) fbshipit-source-id: a941aaad34f8fe5f0c51f719f9f5c29b811c4d5b (cherry picked from commit a43262ec7521b1665b02a64d3f279e72ee2344b9)	2022-03-21 23:07:14 +00:00
chunyuan	7dd0823011	Add JIT graph fuser for oneDNN Graph API (Preview4) (#68111 ) Summary: ## Description Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444). On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included: - The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used - The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties. ### User API: The optimization pass is disabled by default. Users could enable it by: ``` torch.jit.enable_onednn_fusion(True) ``` ### Performance: [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance: - SkyLake 8180 (1 socket of 28 cores): ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png) - SkyLake 8180 (single thread): ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png) \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI) \** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops ### Directory structure of the integration code Fuser-related code are placed under: ``` torch/csrc/jit/codegen/onednn/ ``` Optimization pass registration is done in: ``` torch/csrc/jit/passes/onednn_graph_fuser.h ``` CMake for the integration code is: ``` caffe2/CMakeLists.txt ``` ## Limitations - In this PR, we have only supported the optimization on Linux platform. The support on Windows and MacOS will be enabled as the next step. - We have only optimized the inference use case. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68111 Reviewed By: eellison Differential Revision: D34584878 Pulled By: malfet fbshipit-source-id: ce817aa8cc9052ee9ed930c9cf66be83449e61a4 (cherry picked from commit cd17683aa7d9c0947df45a1ab53627feff795587)	2022-03-21 22:12:19 +00:00
jjsjann123	0120ff759c	fixing assert condition (#74239 ) Summary: fixing assert for `_jit_set_fusion_strategy` Pull Request resolved: https://github.com/pytorch/pytorch/pull/74239 Reviewed By: H-Huang Differential Revision: D34896284 Pulled By: eellison fbshipit-source-id: a4daec70f68dcae2098447551ea071c744f6b0b7 (cherry picked from commit 60746f45b69e0448232626d1d601e8051dc5d427)	2022-03-15 19:28:52 +00:00
David Berard	b5244b8470	[JIT] add keep_unique_names arg to canonicalize python bindings (#74074 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74074 Adds the keep_unique_names argument to the python binding for Canonicalize. Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D34821816 Pulled By: davidberard98 fbshipit-source-id: 7932562cb20e504494f53b83484393bb296e717a (cherry picked from commit 62bbcff972287550eeaa3ddb0e5c35ff2bbe60ad)	2022-03-11 22:35:55 +00:00
anjali411	086645ad77	Update __torch_dispatch__ to return op overload instead of the opoverload packet function (#72673 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72673 Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D34627164 Pulled By: anjali411 fbshipit-source-id: 3cb6406a392d530bf9da36b4d8e0a62b30e6497e (cherry picked from commit 65b85a0a67df4d0f16ac8964e2b685d478a610fb)	2022-03-07 22:38:42 +00:00
Vasiliy Kuznetsov	bf896a2988	dbr quant: add torchscript pass to remove redundant aliases (#71230 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71230 DBR quantization uses `torch.Tensor.as_subclass` frequently. When the quantized model is traced with `torch.jit.trace`, these calls appear in the resulting graph as `aten::alias`. This PR adds a pass to remove these calls from the graph, for two reasons: 1. ease of debugging (these calls do nothing) 2. less work for downstream passes (for example, converting to ONNX currently breaks if these alias calls are present) For now, we have to inline the graph in order for `aliasDb` to determine safety properly. In the future, we may choose to relax this if there is a need for it. Test Plan: Test plan is pretty basic for now, it can be improved in future PRs. ``` python test/test_quantization.py TestQuantizeDBR.test_jit_tracing_removes_aliases ``` Reviewed By: eellison Differential Revision: D33552387 Pulled By: vkuzo fbshipit-source-id: 681a33ddfff394a91e971263ac593afd93c5ea78 (cherry picked from commit 0f8412725d0c6fd9ef1072a50d4203465aa5d1f9)	2022-03-03 15:31:53 +00:00
BowenBao	bbac8c9c48	[ONNX] List of files to consider for mergebot onnx rule (#72297 ) Summary: Based on past PRs, here is an non-exhaustive list of files to consider for extension. The PR is not meant to be final. Based on feedback and discussion, files could be dropped from the list, or PR could be updated to move code around such that extension is no longer needed. List of files below and description: * These files are for converting from IR to ONNX proto. These should be used only for ONNX. ``` "torch/csrc/jit/serialization/export.", "torch/csrc/jit/serialization/onnx.", ``` * This file is touched whenever pass signature is updated. ``` "torch/_C/__init__.pyi.in", ``` * These files are touched whenever pass signature is updated. Somehow it's been convention that onnx passes are also added here, but it could be possible to move them. Let me know what you think. ~~"torch/csrc/jit/python/init.cpp",~~ ~~"torch/csrc/jit/python/script_init.cpp",~~ Update: Bowen will move onnx passes to files under onnx folder. * ~~Touched when need new attr::xxx, or onnx::xxx.~~ ~~"aten/src/ATen/core/interned_strings.h"~~ Update: Nikita will help separate this file. malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/72297 Reviewed By: H-Huang Differential Revision: D34254666 Pulled By: malfet fbshipit-source-id: 032cfa590cbedf4648b7335fe8f09a2380ab14cb (cherry picked from commit `88653eadbf`)	2022-02-16 23:01:13 +00:00
BowenBao	cc792746d2	[ONNX] De-duplicate initializers (#68202) (#69547) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69547 ScriptModule export introduces duplicated ONNX initializers for shared weights, which unnecessarily increases ONNX model size. This PR de-duplicates ONNX initializers for models exported in eval mode by checking whether the underlying tensors share the same `data_ptr`, `strides` and `sizes`. Test Plan: Imported from OSS Reviewed By: msaroufim Differential Revision: D32994271 Pulled By: malfet fbshipit-source-id: 10ac66638b6255890875272472aa9ed07a5b1d9a Co-authored-by: BowenBao <bowbao@microsoft.com> (cherry picked from commit `d7cbde940c`)	2022-02-11 22:05:15 +00:00
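The de-duplication check named in the commit above (same `data_ptr`, `strides` and `sizes`) can be sketched without PyTorch. `FakeTensor` and `dedup_initializers` are hypothetical names, and the real exporter pass works on graph initializers rather than a plain dictionary, so treat this as an illustration of the keying idea only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FakeTensor:
    """Minimal stand-in carrying only the three fields the check compares."""
    data_ptr: int
    sizes: tuple
    strides: tuple

def dedup_initializers(initializers):
    """Map every initializer name to the first name that shares its storage view."""
    first_seen = {}   # (data_ptr, sizes, strides) -> canonical name
    canonical = {}
    for name, tensor in initializers.items():
        key = (tensor.data_ptr, tensor.sizes, tensor.strides)
        canonical[name] = first_seen.setdefault(key, name)
    return canonical

shared = FakeTensor(data_ptr=0x1000, sizes=(4, 4), strides=(4, 1))
mapping = dedup_initializers({
    "encoder.weight": shared,
    "decoder.weight": shared,  # tied weight: same pointer, sizes, strides
    "decoder.bias": FakeTensor(data_ptr=0x2000, sizes=(4,), strides=(1,)),
})
```

Comparing sizes and strides in addition to the pointer keeps two different views of the same storage from being merged.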
Anjali Chourdia	a1383a9cfa	Reland torch.ops API change machinery with the core functionality disabled (#71785 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71785 see https://github.com/pytorch/pytorch/pull/67254 ghstack-source-id: 147648699 Test Plan: github CI Reviewed By: albanD Differential Revision: D33777229 fbshipit-source-id: 517b36be9743025eb40d708d380dae62e3663184 (cherry picked from commit `a637e69569`)	2022-02-02 16:06:29 +00:00
CodemodService FBSourceClangFormatLinterBot	ed435e903f	[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT` Reviewed By: zertosh Differential Revision: D33938055 fbshipit-source-id: 6c0643a18f09854e87e183341f252c66dd6395a6 (cherry picked from commit `fd183aedbc`)	2022-02-02 11:27:15 +00:00
Elias Ellison	f1499d6c18	Refactor PE so fusion specializations are configurable (#71650 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71650 * Refactors PE so there is a current fusion strategy set, which will take in a vector of e.g. [(STATIC, 2), (DYNAMIC, 10)] which means fuse two static invocations then fuse 10 dynamic ones, then stop specializing. Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D33801501 Pulled By: eellison fbshipit-source-id: ebc7ac3c57e35a3b9bb15ab751f0aa1d25cc9bd5 (cherry picked from commit `8dd89088d3`)	2022-02-01 19:07:02 +00:00
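One way to read the strategy vector `[(STATIC, 2), (DYNAMIC, 10)]` from the commit above is as an ordered budget of specialization attempts. The toy model below assumes that reading; `kind_for_attempt` is a hypothetical helper and not part of the profiling executor.

```python
def kind_for_attempt(strategy, attempt):
    """Return the specialization kind used for the given 0-based attempt,
    or None once the strategy is exhausted (no further specializing)."""
    remaining = attempt
    for kind, depth in strategy:
        if kind not in ("STATIC", "DYNAMIC"):
            raise ValueError(f"unknown fusion kind: {kind!r}")
        if remaining < depth:
            return kind
        remaining -= depth   # this entry's budget is used up; move to the next
    return None

strategy = [("STATIC", 2), ("DYNAMIC", 10)]
kinds = [kind_for_attempt(strategy, i) for i in (0, 1, 2, 11, 12)]
```

Under this reading the first two attempts specialize statically, the next ten dynamically, and the thirteenth gets no specialization at all.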
Yan Li	6964aa2ced	backout D33469839 (#71443 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71443 cogwheel test inline_cvr_infer_canary_pyper_model_publish is timing out. The convert_fx call takes > 20 mins for local and local_ro sub modules, which used to take ~ 2 mins. Test Plan: Fblearn flow run * the following cmd took 1113 seconds before the diff and 5002 seconds after. flow-cli clone-locally 320014219 --run-as-secure-group pytorch_at_scale --operators pyper_model_publish_workflow.pyper_model_publish_workflow.process_torch_package_model_files.process_non_sparse_parameters[0] Cogwheel test * Cogwheel test with packages in B3588 (the last good run) took 4694.48s * Cogwheel test with packages in B3590 (the first timeout) took 13975.83s * Cogwheel test with the following packages took 4535.04s * all packages in B3588 except the model publish * the model publish built with D33469839 (`043e84b3d2`) reversed (created D33633570) Reviewed By: albanD, jerryzh168 Differential Revision: D33633570 fbshipit-source-id: dc5e777c48a90c551641a3f79126461f6a60449e (cherry picked from commit `03ab65023a`)	2022-01-18 23:51:51 +00:00
CodemodService FBSourceClangFormatLinterBot	88012c7daf	[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT` Reviewed By: zertosh Differential Revision: D33577744 fbshipit-source-id: 7ecc8367998ee1dffde54c2f4dd3cfafe19a53c9	2022-01-14 06:10:57 -08:00
John Clow	ade83ed90c	Building Default Inference for Device Type (#69049 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69049 Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D33555885 Pulled By: Gamrix fbshipit-source-id: 7364066cbc544ab8442a47c82ea89f0e73eaaa06	2022-01-13 13:57:08 -08:00
Elias Ellison	39be20f259	[JIT][NNC] Add handling of strides to dynamic shape support. (#70464) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70464 Add handling of strided input tensors to dynamic fusion. This is done with the same set of input striding specializations as https://github.com/pytorch/pytorch/pull/60684/:
```
S_ONE, // STRIDE_ONE: packed
S_CONT, // STRIDE_CONTIGUOUS: stride[i + 1] * sizes[i + 1]
S_TRAN_CONT, // STRIDE_TRANSPOSED_CONTIGUOUS: stride[i-1] * sizes[i-1]
S_AS_ARG, // STRIDE_AS_ARG: stride passed in as runtime value
```
and then two additional specializations for a) contiguous tensor and b) channels-last tensor. Channels-last is a common case and we should optimize for it. Additionally, tensors natively store whether they are contiguous/channels-last contiguous, which makes it faster to check if tensors follow this pattern. Output striding will be done in a follow up. The striding is stored on both the TensorGroup node and on the guard node. The striding descriptors are stored as a vector of strings on the node for debuggability and to make use of storing ivalues as attributes on nodes. As an example:
```
%8 : Double(10, 11, 12, 13, strides=[1716, 1, 143, 11], requires_grad=0, device=cpu) = prim::TensorExprGroup_0[symbolic_shape_inputs=[-37, -36, -35, -34], striding_inputs_desc=[["TENSOR_CONT_CHANNELS_LAST"]](%x, %24, %23, %22, %21)
```
Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D33458649 Pulled By: eellison fbshipit-source-id: c42616d3c683d70f6258180d23d3841a31a6030d	2022-01-12 09:11:31 -08:00
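The strides in the example above, `[1716, 1, 143, 11]` for sizes `(10, 11, 12, 13)`, are exactly what a channels-last layout produces. The sketch below recomputes them in plain Python. Only the label `TENSOR_CONT_CHANNELS_LAST` comes from the commit message; the function names and the `CONTIGUOUS` and `OTHER` labels are placeholders, and size-1 dimensions are not given any special treatment here.

```python
def contiguous_strides(sizes):
    """Row-major strides: stride[i] = stride[i + 1] * sizes[i + 1]."""
    strides = [1] * len(sizes)
    for i in range(len(sizes) - 2, -1, -1):
        strides[i] = strides[i + 1] * sizes[i + 1]
    return strides

def channels_last_strides(sizes):
    """Strides of an NCHW-shaped tensor whose memory is laid out as NHWC."""
    n, c, h, w = sizes
    return [h * w * c, 1, w * c, c]

def describe_striding(sizes, strides):
    """Classify strides; only the channels-last label is taken from the source."""
    strides = list(strides)
    if strides == contiguous_strides(sizes):
        return "CONTIGUOUS"                      # placeholder label
    if len(sizes) == 4 and strides == channels_last_strides(sizes):
        return "TENSOR_CONT_CHANNELS_LAST"
    return "OTHER"                               # placeholder label

sizes = (10, 11, 12, 13)
label = describe_striding(sizes, [1716, 1, 143, 11])
```

The row-major loop is the same recurrence as the `S_CONT` comment in the enum, applied from the innermost dimension outwards.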
CodemodService FBSourceClangFormatLinterBot	fb8a9732d9	[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT` Reviewed By: zertosh Differential Revision: D33524330 fbshipit-source-id: 112291a23e2efe2d573bee86ead8ce2fc3957e5b	2022-01-11 04:33:21 -08:00
anjali411	043e84b3d2	Per-overload torch.ops API (#67254 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67254 Fixes https://github.com/pytorch/pytorch/issues/65997 BC breaking: `output = torch.ops._test.leaky_relu(self=torch.tensor(-1.0))` now fails with the error `TypeError: __call__() got multiple values for argument 'self'` since we call into `OpOverloadBundle`'s `__call__` method that has `self` bound to it as its first argument. Follow up work: 1. disallow `default` as an overload name for aten operators. 2. Add a method to obtain a list of all overloads (exclude the ones registered by JIT) 3. Add methods/properties to `OpOverload` to access more schema information (types of input and output args etc) cc ezyang gchanan Test Plan: Imported from OSS Reviewed By: pbelevich Differential Revision: D33469839 Pulled By: anjali411 fbshipit-source-id: c3fc43460f1c7c9651c64b4d46337be21c400621	2022-01-10 17:29:06 -08:00
John Clow	80659b71a5	Hoisting common expressions out of If blocks [retry] (#65645 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65645 This is a retry of PR: https://github.com/pytorch/pytorch/pull/59492 Latest Changes: Added more tests, added the getOrCreateDB pattern, updated logic to remove unnecessary checks addressed all comments. Adding code to find common expressions from the two subblocks of an if operation and hoist them before the if block. This also allows Dead Code Elimination to then eliminate some if blocks. Test Plan: python test_jit.py TestIfHoisting Reviewed By: eellison Differential Revision: D33302065 Pulled By: Gamrix fbshipit-source-id: a5a184a480cf07354359aaca344c6e27b687a3c2	2022-01-10 13:28:17 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	8bdbe94344	Add forward compatability tests in CI (#64139 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64139 Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D30626912 Pulled By: tugsbayasgalan fbshipit-source-id: 781a88386701b42e2e86daaca0a779d1fc1c4df3	2022-01-05 23:40:06 -08:00
Michael Suo	402f2934bf	Revert D33262228: Per-overload torch.ops API Test Plan: revert-hammer Differential Revision: D33262228 (`8e6d1738a4`) Original commit changeset: 600dbf511514 Original Phabricator Diff: D33262228 (`8e6d1738a4`) fbshipit-source-id: 238fa88ea9c4f26c7511334765c07452fbca9655	2022-01-05 22:10:11 -08:00
anjali411	8e6d1738a4	Per-overload torch.ops API (#67254 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67254 Fixes https://github.com/pytorch/pytorch/issues/65997 TODO: disallow `default` as an overload name for aten operators. BC breaking: `output = torch.ops._test.leaky_relu(self=torch.tensor(-1.0))` now fails with the error `TypeError: __call__() got multiple values for argument 'self'` since we call into `OpOverloadBundle`'s `__call__` method that has `self` bound to it as its first argument. cc ezyang gchanan Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D33262228 Pulled By: anjali411 fbshipit-source-id: 600dbf511514ea9b41aea3e6b1bc1102dab08909	2022-01-05 15:17:41 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	4ae71c8d34	Add graph op replacement pass (#69915 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69915 Test Plan: Imported from OSS Reviewed By: samdow Differential Revision: D33198158 Pulled By: tugsbayasgalan fbshipit-source-id: f2b924edf9959aaf51f97db994fae031fa062cf8	2021-12-25 13:03:19 -08:00
jjsjann123	e429a68478	Allow single node fusion for nvfuser (#70000 ) Summary: Setting `PYTORCH_NVFUSER_ONE_OP_FUSION=1` will take all nodes nvFuser support, instead of waiting for fusion opportunity. Pull Request resolved: https://github.com/pytorch/pytorch/pull/70000 Reviewed By: samdow Differential Revision: D33292195 Pulled By: davidberard98 fbshipit-source-id: 8ed5ce5e82fbb6737e8ab5ce4223b038eaf47756	2021-12-23 17:07:57 -08:00
David Berard	c21169ea41	[JIT] optimize_for_inference on methods other than forward (#69367 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69367 Test Plan: Imported from OSS Reviewed By: cpuhrsch Differential Revision: D32835529 Pulled By: davidberard98 fbshipit-source-id: d3066c23d071bc2a3bee59b8ab03b6ab0e43efcf	2021-12-07 12:36:47 -08:00
Nikolay Korovaiko	ab1d879b33	[WIP] forbid aliasing between the outputs of a differentiable graph (#67732 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/67732 Reviewed By: cpuhrsch Differential Revision: D32522826 Pulled By: Krovatkin fbshipit-source-id: 9fdf3509dcd1b885f7c7f06d22b340c0f93bbe12	2021-11-18 15:03:35 -08:00
John Clow	a9c2f11d2a	Update Freezing Logic and add new passes (#68024 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68024 Pull Request resolved: #67949 Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D32260614 Pulled By: eellison fbshipit-source-id: 41d7a9b45e33297a17560a22eba8973e2fc48b43	2021-11-09 13:21:52 -08:00
John Clow	ec8a71f9ac	Dtype Analysis for Unary and Binary ops with Metatensors (#66898 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66898 Test Plan: Imported from OSS Reviewed By: malfet Differential Revision: D32175961 Pulled By: Gamrix fbshipit-source-id: 72721259b900e5a311b6bcb5c350366ba420b734	2021-11-04 19:00:50 -07:00
Natalia Gimelshein	3d4a6ff15d	Revert D32154788: Move Concat Linear out of Optimize Numerics Test Plan: revert-hammer Differential Revision: D32154788 (`ea94dde573`) Original commit changeset: faa6465c89b3 fbshipit-source-id: 0dcaa65268b68ed01e6a5bc7b73ade1f51163b33	2021-11-04 12:20:02 -07:00
John Clow	ea94dde573	Move Concat Linear out of Optimize Numerics (#67196 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67196 Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D32154788 Pulled By: Gamrix fbshipit-source-id: faa6465c89b3676d6b1ff7c20a677738a7fbdf88	2021-11-04 11:30:39 -07:00
Elias Ellison	2486061c72	[JIT] make x (+ or -) 0 and x (* or /) 1 peepholes type promotion aware (#67688 ) Summary: Some of the "no-ops" are not actually no-ops because they can change the dtype Pull Request resolved: https://github.com/pytorch/pytorch/pull/67688 Reviewed By: davidberard98 Differential Revision: D32104601 Pulled By: eellison fbshipit-source-id: ccb99179a4b30fd20b5a9228374584f2cdc8ec21	2021-11-03 20:11:46 -07:00
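Why `x + 0` or `x / 1` may not be a no-op can be seen with plain Python scalars, where mixing an int with a float, or using true division, changes the result type. This is an analogy only: the commit is about tensor dtypes in TorchScript graphs, and `is_true_noop` is a hypothetical helper.

```python
import operator

def is_true_noop(op, x, identity):
    """A rewrite to plain `x` is only safe if value and type are both unchanged."""
    result = op(x, identity)
    return result == x and type(result) is type(x)

same_type_add = is_true_noop(operator.add, 3, 0)      # int + int stays int
promoting_add = is_true_noop(operator.add, 3, 0.0)    # int + float becomes float
promoting_div = is_true_noop(operator.truediv, 4, 1)  # int / int becomes float
same_type_mul = is_true_noop(operator.mul, 2.5, 1.0)  # float * float stays float
```

A peephole that drops the operation without checking the result type would silently change the type seen by downstream operations in the two promoting cases.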
Nikolay Korovaiko	3db536e55e	add jit_trace_module python binding (#67425 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67425 Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D31998564 Pulled By: Krovatkin fbshipit-source-id: f7e38c8c3f560f2c4e5ed62e1acae2c100efebd4	2021-11-02 23:55:23 -07:00
jjsjann123	1ec732bc46	Add fp16/fp32 autocasting to JIT/TorchScript (#63939) Summary: Adds mixed precision autocasting support between fp32/fp16 to torchscript/JIT. A more in-depth description can be found at [torch/csrc/jit/JIT-AUTOCAST.md](https://github.com/pytorch/pytorch/pull/63939/files#diff-1f1772aaa508841c5bb58b74ab98f49a1e577612cd9ea5c386c8714a75db830b) This PR implements an autocast optimization pass that inserts casting ops per AMP rule (torch/csrc/jit/passes/autocast.cpp), mimicking the behavior of eager autocast. The pass also takes into consideration the context of `torch.cuda.amp.autocast` and only inserts casting ops within the enabled context manager, giving feature parity with eager AMP autocast. We currently provide JIT AMP autocast as a prototyping feature, so it is off by default and can be turned on via `torch._C._jit_set_autocast_mode(True)`. The JIT support for autocast is subject to different constraints compared to the eager mode implementation (mostly related to the fact that TorchScript is statically typed); the restrictions on user-facing Python code are described in torch/csrc/jit/JIT-AUTOCAST.md. This is a prototype; there are also implementation limitations that were necessary to keep this PR small and get something functioning quickly upstream, so we can iterate on designs. A few limitations/challenges that are not properly resolved in this PR: 1. Autocast inserts cast operations, which affect the scalar type of output tensors feeding downstream operations. We are not currently propagating the updated scalar types; this can cause issues/wrong results for operations subject to promotion rules. 2. Backward for autodiff in JIT misses the casting of dgrad to the input scalar type, which autograd does in eager. This forces us to explicitly mark the casting operation for certain operations (e.g. binary ops); otherwise, we might be feeding dgrad with a mismatched scalar type to the input. This could potentially break gradient functions consuming dgrad
(e.g. gemm backwards, which assumes grad_output to be of the same scalar type as input). 3. The `torch.autocast` API has an optional argument `dtype` which is not currently supported in the JIT autocast, and we require a static value. Credit goes mostly to: tlemo kevinstephano Pull Request resolved: https://github.com/pytorch/pytorch/pull/63939 Reviewed By: navahgar Differential Revision: D31093381 Pulled By: eellison fbshipit-source-id: da6e26c668c38b01e296f304507048d6c1794314	2021-10-27 12:11:36 -07:00
Nikolay Korovaiko	a7ebf76a15	jit trace (#59949 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/59949 Reviewed By: ZolotukhinM Differential Revision: D31366787 Pulled By: Krovatkin fbshipit-source-id: 798cbcd97e8ecfba984f98cd70214954be9309af	2021-10-24 18:04:22 -07:00
Nikita Shulga	6f3f302d9f	[ONNX] Deprecate fold_if pass (#65697 ) (#66145 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66145 Deprecate fold_if pass Test Plan: Imported from OSS Reviewed By: jansel Differential Revision: D31424097 fbshipit-source-id: 25b89679c756393a1065ca6aaa24d29db960cbd4 Co-authored-by: jiafatom <jiafa@microsoft.com>	2021-10-22 13:46:20 -07:00
Nikita Shulga	53a163a015	[ONNX] Export nn.Module call as ONNX local function (#63589 ) (#66140 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66140 * Add new argument to export api to enable users specifying `nn.Module` classes that they wish to be exported as local function in ONNX model. * Refactor `torch/csrc/jit/serialization/export.cpp`, and remove redundant `EncoderBase` class. * ~~Contains changes from #63268~~ * Depends on #63716 to update onnx submodule. Test Plan: Imported from OSS Reviewed By: jansel Differential Revision: D31424098 fbshipit-source-id: c949d0b01c206c30b4182c2dd1a5b90e32b7a0d3 Co-authored-by: BowenBao <bowbao@microsoft.com>	2021-10-22 13:44:56 -07:00
Elias Ellison	63b41e1f4d	[JIT] Add partial evaluation graph stitching logic (#65377) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65377 When we run symbolic shape analysis on ``` conv = torch.nn.Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) max_pool = torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) mod = nn.Sequential(conv, max_pool) ... graph(%self : __torch__.torch.nn.modules.container.___torch_mangle_0.Sequential, %input.1 : Tensor): %18 : bool = prim::Constant[value=0]() %30 : int[] = prim::Constant[value=[1, 1]]() %29 : int[] = prim::Constant[value=[3, 3]]() %28 : int[] = prim::Constant[value=[2, 2]]() %6 : int = prim::Constant[value=1]() %self.0.bias : NoneType = prim::Constant() %self.0.weight : Double(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]() %input.5 : Tensor(SS(-2), 64, SS(-3), SS(-4)) = aten::conv2d(%input.1, %self.0.weight, %self.0.bias, %28, %29, %30, %6) %input.9 : Tensor(SS(-2), 64, SS(-5), SS(-6)) = aten::max_pool2d(%input.5, %29, %28, %30, %30, %18) return (%input.9) ``` we partially evaluate the shape compute graph of `conv2d`, whose output gets passed in and used to partially evaluate the shape compute graph of `max_pool2d`. The remaining partially eval'd conv2d graph is [here](https://gist.github.com/eellison/0598bd224a422211efa1a45d2b7560b7), and the partially eval'd maxpool2d graph is [here](https://gist.github.com/eellison/625540b84f650ddbefd3ae5511ab8814). We can take the partially eval'd graphs of a series of operators and stitch them together, which allows us to a) recover symbolic equivalences by CSE'ing & other optimizations b) calculate shapes for a whole block of operators just on the input, such as for fusing the whole model to nnc with dynamic shapes and then passing along the computed symbolic shapes. The calculation also handles error checking.
c) (future-looking) generate inputs on demand for straight-line networks that are composed just of aten operators The combined graph of the two gives us compute for the unknown symbolic dimensions - `SS(-2), SS(-3), SS(-4), SS(-5), and SS(-6)`. ``` graph(%input.1 : int[]): %42 : bool = prim::Constant[value=0]() # <string>:152:17 %15 : int = prim::Constant[value=3]() %input_batch_size_dim.1 : int = prim::Constant[value=0]() # <string>:417:41 %13 : int = prim::Constant[value=1]() # <string>:426:61 %12 : int = prim::Constant[value=4]() # <string>:437:32 %11 : str = prim::Constant[value="AssertionError: "]() %9 : int = prim::Constant[value=2]() %8 : int = prim::Constant[value=6]() %7 : int = prim::Constant[value=7]() %16 : int = aten::len(%input.1) # <string>:438:17 %17 : bool = aten::eq(%16, %12) # <string>:438:17 = prim::If(%17) # <string>:438:10 block0(): -> () block1(): = prim::RaiseException(%11) # <string>:438:10 -> () %18 : int = aten::__getitem__(%input.1, %13) # <string>:407:17 %19 : bool = aten::eq(%18, %15) # <string>:407:17 = prim::If(%19) # <string>:407:10 block0(): -> () block1(): = prim::RaiseException(%11) # <string>:407:10 -> () %20 : int = aten::__getitem__(%input.1, %9) # <string>:411:20 %21 : int = aten::add(%20, %8) # <string>:411:20 %22 : bool = aten::ge(%21, %7) # <string>:411:20 = prim::If(%22) # <string>:411:12 block0(): -> () block1(): = prim::RaiseException(%11) # <string>:411:12 -> () %23 : int = aten::__getitem__(%input.1, %15) # <string>:411:20 %24 : int = aten::add(%23, %8) # <string>:411:20 %25 : bool = aten::ge(%24, %7) # <string>:411:20 = prim::If(%25) # <string>:411:12 block0(): -> () block1(): = prim::RaiseException(%11) # <string>:411:12 -> () %26 : int = aten::__getitem__(%input.1, %input_batch_size_dim.1) # <string>:422:29 %27 : int = aten::sub(%20, %13) # <string>:428:32 %28 : int = aten::floordiv(%27, %9) # <string>:428:32 %29 : int = aten::add(%28, %13) # <string>:428:32 %30 : int = aten::sub(%23, %13) # <string>:428:32 %31 : 
int = aten::floordiv(%30, %9) # <string>:428:32 %32 : int = aten::add(%31, %13) # <string>:428:32 %48 : int = aten::floordiv(%28, %9) # <string>:133:17 %outputSize.2 : int = aten::add(%48, %13) # <string>:136:23 %51 : int = aten::floordiv(%31, %9) # <string>:133:17 %outputSize.1 : int = aten::add(%51, %13) # <string>:136:23 %53 : bool = aten::ne(%29, %input_batch_size_dim.1) # <string>:156:41 %54 : bool = prim::If(%53) # <string>:157:64 block0(): %55 : bool = aten::ne(%32, %input_batch_size_dim.1) # <string>:157:93 -> (%55) block1(): -> (%42) = prim::If(%54) # <string>:157:10 block0(): -> () block1(): = prim::RaiseException(%11) # <string>:157:10 -> () %56 : bool = aten::ge(%outputSize.1, %13) # <string>:160:17 %57 : bool = prim::If(%56) # <string>:160:17 block0(): %58 : bool = aten::ge(%outputSize.2, %13) # <string>:160:38 -> (%58) block1(): -> (%42) = prim::If(%57) # <string>:160:10 block0(): -> () block1(): = prim::RaiseException(%11) # <string>:160:10 -> () return (%26, %29, %32, %outputSize.2, %outputSize.1) ``` This PR runs shape analysis, retains the partially evaluated graphs, and then stitches them together, keeping track of what inputs in the partial eval graph correspond to what inputs in the encompassing graph IR and what outputs correspond to what symbolic shape. Adding NNC ppl as reviewers because it is relevant to dynamic shape fusion. Question for reviewers : should I make this a separate file ? Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D31797472 Pulled By: eellison fbshipit-source-id: a41ed31fad085d3563e71c815f49af0cd18aaeed	2021-10-20 16:12:58 -07:00
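As a rough cross-check of the stitched graph in the commit above, its integer arithmetic can be transcribed into plain Python. This is a hand-written sketch, not code from the PR: the function name `stitched_shapes` and the `_check` helper are hypothetical, and the constants (3 input channels, 7x7 kernel, stride 2, padding 3 for the convolution, then a 3x3 pool with stride 2 and padding 1) are read off the example module.

```python
from typing import List, Tuple


def _check(cond: bool) -> None:
    # Stands in for the prim::RaiseException("AssertionError: ") branches.
    if not cond:
        raise AssertionError()


def stitched_shapes(input_shape: List[int]) -> Tuple[int, int, int, int, int]:
    """Compute (SS(-2), SS(-3), SS(-4), SS(-5), SS(-6)) from the input sizes."""
    _check(len(input_shape) == 4)
    n, c, h, w = input_shape
    _check(c == 3)          # conv expects 3 input channels
    _check(h + 6 >= 7)      # padded height must cover the 7x7 kernel
    _check(w + 6 >= 7)      # padded width must cover the 7x7 kernel

    # conv2d output: (size + 2*3 - (7 - 1) - 1) // 2 + 1 == (size - 1) // 2 + 1
    conv_h = (h - 1) // 2 + 1
    conv_w = (w - 1) // 2 + 1

    # max_pool2d output: (size + 2*1 - (3 - 1) - 1) // 2 + 1 == (size - 1) // 2 + 1
    pool_h = (conv_h - 1) // 2 + 1
    pool_w = (conv_w - 1) // 2 + 1

    _check(conv_h != 0 and conv_w != 0)   # pooling input must be non-empty
    _check(pool_h >= 1 and pool_w >= 1)   # pooling output must be non-empty
    return n, conv_h, conv_w, pool_h, pool_w


if __name__ == "__main__":
    print(stitched_shapes([1, 3, 224, 224]))
```

The five returned values follow the order of the graph's `return` statement: the batch size, the two spatial sizes after `conv2d`, then the two spatial sizes after `max_pool2d`.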

252 Commits