pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Michael Andreas Dagitses	acd072967a	canonicalize includes of form <aten/src/ATen/...> Pull Request resolved: https://github.com/pytorch/pytorch/pull/78033 This was never intended to be supported. @override-unit-failures (Note: this ignores all push blocking failures!) Differential Revision: [D36567054](https://our.internmc.facebook.com/intern/diff/D36567054/) Approved by: https://github.com/kit1980	2022-06-16 17:46:45 +00:00
Nikolay Korovaiko	8ef6356f26	Reland PySymInt (#79617 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/79617 Approved by: https://github.com/Chillee	2022-06-16 04:18:06 +00:00
PyTorch MergeBot	b8db0a0475	Revert "Python Bindings for SymInts (#78135 )" This reverts commit `d332724071`. Reverted https://github.com/pytorch/pytorch/pull/78135 on behalf of https://github.com/ezyang due to broke torchvision tests	2022-06-15 13:52:14 +00:00
Nikolay Korovaiko	d332724071	Python Bindings for SymInts (#78135 ) This PR adds support for `SymInt`s in python. Namely, * `THPVariable_size` now returns `sym_sizes()` * python arg parser is modified to parse PyObjects into ints and `SymbolicIntNode`s * pybind11 bindings for `SymbolicIntNode` are added, so size expressions can be traced * a large number of tests added to demonstrate how to implement python symints. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78135 Approved by: https://github.com/ezyang	2022-06-14 02:17:59 +00:00
goldenxuett	2f7ed05f22	Retry - [JIT] Add mutation checks for tensor inputs Pull Request resolved: https://github.com/pytorch/pytorch/pull/79316 Approved by: https://github.com/davidberard98	2022-06-13 18:16:50 +00:00
Michael Andreas Dagitses	ab2ca95dd1	turn on -Werror=unused-variable in our Bazel CPU build Summary: We also fix any existing issues. Note that we only do this for the CPU build because nvcc is considered a C++ toolchain but it does not have the same flag support. Adding flags to the GPU build will cause nvcc errors. Test Plan: Built locally, rely on CI to confirm. Reviewers: malfet Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/79156 Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD	2022-06-11 02:46:34 +00:00
anjali411	38350acf8f	Autogen Tags enum, and allow specifying tags while defining an op Pull Request resolved: https://github.com/pytorch/pytorch/pull/79322 Approved by: https://github.com/albanD	2022-06-11 00:29:32 +00:00
PyTorch MergeBot	b712467cd1	Revert "Add mutation checks for tensor inputs" This reverts commit `83c0a2bc38`. Reverted https://github.com/pytorch/pytorch/pull/79078 on behalf of https://github.com/davidberard98 due to broke bazel build-and-test, see [https://github.com/pytorch/pytorch/runs/6836001002?check_suite_focus=true](https://github.com/pytorch/pytorch/runs/6836001002?check_suite_focus=true%22)	2022-06-10 20:15:30 +00:00
goldenxuett	83c0a2bc38	Add mutation checks for tensor inputs Pull Request resolved: https://github.com/pytorch/pytorch/pull/79078 Approved by: https://github.com/davidberard98, https://github.com/Krovatkin	2022-06-10 18:17:33 +00:00
Luka Mushkudiani	c0a7c1d02e	Expose _export_data from C++ to Python (#79207 ) Summary: https://www.internalfb.com/code/fbsource/[477a5768452957f87e56044169de47f051197567]/fbcode/caffe2/torch/csrc/jit/mobile/train/export_data.cpp export_data is used to serialize data. I bound this method to Python with PyBind11 Test Plan: Wrote a file pybind_check.py which checks if the binding works. Then, tried to read the produced data file from C++ with "torch::jit::_load_parameters" and checked that content matched. Differential Revision: D37029253 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79207 Approved by: https://github.com/qihqi	2022-06-10 00:41:33 +00:00
Yanan Cao (PyTorch)	67badf0d5c	Add missing QSCheme IValue conversion logic (#78862 ) Differential Revision: D36913736 Pull Request resolved: https://github.com/pytorch/pytorch/pull/78862 Approved by: https://github.com/suo	2022-06-07 08:34:17 +00:00
goldenxuett	eb49dde9cf	Disable TracerWarnings on NNC opinfo tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/78756 Approved by: https://github.com/davidberard98	2022-06-03 18:11:12 +00:00
Elias Ellison	26d273959c	Add Caching of Conversion to Fake/Meta tensors in FakeTensorMode Pull Request resolved: https://github.com/pytorch/pytorch/pull/78090 Approved by: https://github.com/ezyang	2022-06-03 13:56:00 +00:00
PyTorch MergeBot	954522a485	Revert "Autogen Tags enum, and allow specifying tags while defining an op" This reverts commit `9476a78f37`. Reverted https://github.com/pytorch/pytorch/pull/77313 on behalf of https://github.com/malfet due to Broke OSS buck builds, see `9476a78f37`	2022-06-03 01:53:53 +00:00
anjali411	9476a78f37	Autogen Tags enum, and allow specifying tags while defining an op Pull Request resolved: https://github.com/pytorch/pytorch/pull/77313 Approved by: https://github.com/ezyang, https://github.com/albanD	2022-06-03 01:13:44 +00:00
Pavithran Ramachandran	9b81e81771	[PyTorchEdge] Extend Flatbuffer to get mobile_info for NMLML workflows Pull Request resolved: https://github.com/pytorch/pytorch/pull/78306 Extending the feature available from pickle that helps NMLML system get info of mobile models from `extra_files` dir Differential Revision: [D36609548](https://our.internmc.facebook.com/intern/diff/D36609548/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36609548/)! Approved by: https://github.com/iseeyuan	2022-06-01 20:09:09 +00:00
Tugsbayasgalan Manlaibaatar	c7e9eea915	Expose is_out to python Pull Request resolved: https://github.com/pytorch/pytorch/pull/78591 Approved by: https://github.com/zhxchen17	2022-06-01 07:39:24 +00:00
Elias Ellison	678213ead2	Fake Tensor Part 1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77969 Approved by: https://github.com/ezyang	2022-05-31 16:20:35 +00:00
Edward Z. Yang	6b273444c4	Add logit ref; allow non-refs to be called in refs. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/77816 Approved by: https://github.com/mruberry	2022-05-21 02:35:14 +00:00
Elias Ellison	05ce0f9be6	Add option to disable autocast pass Pull Request resolved: https://github.com/pytorch/pytorch/pull/77566 Approved by: https://github.com/anijain2305, https://github.com/davidberard98	2022-05-18 14:57:25 +00:00
David Berard	d0dc7cb774	Reland "[JIT] during freezing, cast optional bias to half if weight is half" Original PR: #77295 Original commit message: On GPU, conv errors if not all its inputs have the same dtype. In the case of autocasting during freezing, what we see is: 1) inputs to conv are casted to half 2) inputs to batchnorm are not casted, so many are still floats 3) we try to fold conv + batchnorm, by finding different weight and bias such that conv(input, new_weight, new_bias) is equivalent to the original conv -> batchnorm. If conv previously had an optional bias, then during freezing we will temporarily create a zero-valued bias as a placeholder for conv_bias. We want to construct it to have the same dtype as the weight input to conv, to avoid errors on GPU. Reland changes: There's a memory leak from cuda caching allocator that is a side effect of this fix. The memory leak causes the test to fail, though for some reason it didn't fail on CI in the last PR. This skips the tests for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77617 Approved by: https://github.com/eellison	2022-05-17 12:25:26 +00:00
PyTorch MergeBot	246078e251	Revert "[JIT] during freezing, cast optional bias to half if weight is half" This reverts commit `2547be5135`. Reverted https://github.com/pytorch/pytorch/pull/77295 on behalf of https://github.com/malfet	2022-05-17 00:34:51 +00:00
Tugsbayasgalan Manlaibaatar	31d9f7c303	Move other div variants to upgraders map Pull Request resolved: https://github.com/pytorch/pytorch/pull/73586 Approved by: https://github.com/gmagogsfm	2022-05-16 22:32:15 +00:00
David Berard	2547be5135	[JIT] during freezing, cast optional bias to half if weight is half On GPU, conv errors if not all its inputs have the same dtype. In the case of autocasting during freezing, what we see is: 1) inputs to conv are casted to half 2) inputs to batchnorm are not casted, so many are still floats 3) we try to fold conv + batchnorm, by finding different weight and bias such that conv(input, new_weight, new_bias) is equivalent to the original conv -> batchnorm. If conv previously had an optional bias, then during freezing we will temporarily create a zero-valued bias as a placeholder for conv_bias. We want to construct it to have the same dtype as the weight input to conv, to avoid errors on GPU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77295 Approved by: https://github.com/eellison	2022-05-16 22:18:47 +00:00
max	25a6aabe71	Expose permute inputs (#77391 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/77391 Approved by: https://github.com/eellison	2022-05-13 22:18:51 +00:00
Hongxia Yang	8d34a8325d	TorchScript to support capability to rethrow the original python exception (#77093 ) Summary: In order to categorize exceptions/errors, the observability/migration team faced a problem that currently the exception is shown as RuntimeError, and hard to categorize. The solution to this problem is to be able to get the original python exception's class name and msg, and hopefully to recreate a python exception from that. To support this approach, we did the following in this diff: (1) TorchScript to translate JITException so that it does not show as RuntimeError (2) record python exception class name, original message during translation. Then, later, the python exception can be reconstructed. (3) Added a new decorator to reconstruct the python exception and then rethrow it. Test Plan: buck test //caffe2/torch/fb/translate_exception/tests:test_rethrow mode/dev-tsan ``` More details at https://www.internalfb.com/intern/buck/build/1180a788-3767-48e5-a64d-06d284b91a17 BUILD SUCCEEDED Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details. 
Running with tpx session id: 24ae6c7c-a647-404e-8f12-d12c762bf728 Trace available for this run at /tmp/tpx-20220507-195320.698499-24ae6c7c-a647-404e-8f12-d12c762bf728/trace.log RemoteExecution session id: reSessionID-24ae6c7c-a647-404e-8f12-d12c762bf728-tpx Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8162774413147962 ✓ ListingSuccess: caffe2/torch/fb/translate_exception/tests:test_rethrow : 3 tests discovered (27.233) ✓ Pass: caffe2/torch/fb/translate_exception/tests:test_rethrow - test_one_parameter (test_rethrow.TestTranslateRethrowPythonException) (28.467) ✓ Pass: caffe2/torch/fb/translate_exception/tests:test_rethrow - test_no_parameter (test_rethrow.TestTranslateRethrowPythonException) (28.495) ✓ Pass: caffe2/torch/fb/translate_exception/tests:test_rethrow - test_2_parameter_with_torch_script_only (test_rethrow.TestTranslateRethrowPythonException) (28.708) Summary Pass: 3 ListingSuccess: 1 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8162774413147962 ``` Differential Revision: D36166520 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77093 Approved by: https://github.com/qihqi	2022-05-13 16:40:25 +00:00
David Berard	0925597707	[JIT] Support for ParameterDict getattr Adds support for scripting ParameterDicts and getattr() on them. It does not support iterating on ParameterDicts because torch/nn/container.py implementation of ParameterDict.items() uses a generator, which is not supported by torchscript. torch/nn/container.py would need to be updated so that iter gets correctly registered in python_sugared_value.cpp Added a test in test_module_containers.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/77143 Approved by: https://github.com/eellison	2022-05-13 01:03:25 +00:00
Henry Tu	f6eb811786	Add RefineTypes JIT pass for Tuple (#76919 ) Consider the following JIT graph, where the type of `%a` and `%b` are out of sync with tuple `%c`. Before: ``` graph(%a : Float(123), %b : Float(4, 5, 6)): %c : (Tensor, Tensor) = prim::TupleConstruct(%a, %b) return (%c) ``` After: ``` graph(%a : Float(123), %b : Float(4, 5, 6)): %c : (Float(123), Float(4, 5, 6)) = prim::TupleConstruct(%a, %b) return (%c) ``` This PR adds a pass `RefineTypes(...)` to update all such instances with the correct type. This is also available via Python by using `torch._C._jit_pass_refine_types(...)`. A unit test has been added for unnamed tuples, but no test exists for `NamedTuple` (though it was tested manually) since it isn't supported by the parser: ``` RuntimeError: unknown type specifier: graph(%a : Float(123), %b : Float(4, 5, 6)): %c : NamedTuple(Tensor : Tuple, Tensor : Tuple) = prim::TupleConstruct(%a, %b) ~~~~~~~~~~ <--- HERE return (%c) ``` cc: @ke1337 @antoniojkim @wconstab @eellison Pull Request resolved: https://github.com/pytorch/pytorch/pull/76919 Approved by: https://github.com/eellison	2022-05-12 00:48:39 +00:00
Edward Z. Yang	0a14a4c280	Register prims as operators. This makes prims look as if they were defined in native_functions.yaml but they're still all written in Python. You now need to give a full schema string for your prims. The returned prim object is now torch.ops.prim overload (prims are not allowed to be overloaded, so we return the overload, not the overload packet, for speed.) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/77117 Approved by: https://github.com/mruberry, https://github.com/albanD	2022-05-11 16:38:14 +00:00
Han Qi	41ff6f8c49	make has_bundled_input work for flatbuffer (#76854 ) Summary: title Test Plan: unit test Differential Revision: D36120947 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76854 Approved by: https://github.com/Jack-Khuu	2022-05-09 23:04:08 +00:00
Edward Z. Yang	f2eed9400d	Register PrimTorch refs as decompositions. For the most part, PrimTorch refs have the same signature as their ATen equivalents. I modify most PrimTorch refs to register themselves as decompositions, using the prim name they wrap to find the aten name (except for a few cases where the prim/aten names mismatch). There are some exclusions, falling into one of two categories: - The torch equivalent was already implemented as a CompositeImplicitAutograd decomposition in C++ - The ref doesn't support enough features (e.g., the real deal has more kwargs / overloads than are currently implemented) PrimTorch refs are written as a single function that supports all overloads, and this style is convenient for cases where we have a bundle of overloads for what morally is a single overload with a Union type on an argument (which we ought to have supported in native_functions.yaml but blah); to support registering a single decomp for all the overloads, we modify register_decomposition to register to ALL overloads if you pass it an overload packet. This is technically BC breaking but no tests started failing because of it. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/76835 Approved by: https://github.com/Chillee, https://github.com/mruberry	2022-05-06 20:11:45 +00:00
sanchitintel	4ee29d6033	[Reland take-2] Add JIT graph fuser for oneDNN Graph API (v0.5) Re-landing #68111/#74596 ## Description v0.5 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444). On the basis of #50256, the below improvements are included: * The [v0.5 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.5) of the oneDNN Graph API is used * The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties. ### User API: The optimization pass is disabled by default. Users could enable it by: ``` torch.jit.enable_onednn_fusion(True) ``` `torch.jit.freeze` should be used after tracing (recommended) or scripting a model. ### Performance: [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance: * SkyLake 8180 (1 socket of 28 cores): ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png) * SkyLake 8180 (single thread): ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png) * By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI) ** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops ### Directory structure of the integration code Fuser-related code is placed under: ``` torch/csrc/jit/codegen/onednn/ ``` Optimization pass registration is done in: ``` torch/csrc/jit/passes/onednn_graph_fuser.h ``` CMake for the integration code is in: ``` caffe2/CMakeLists.txt cmake/public/mkldnn.cmake cmake/Modules/FindMKLDNN.cmake ``` ## Limitations * In this PR, we only support Pytorch-oneDNN-Graph integration on Linux platform. Support on Windows and MacOS will be enabled as a next step. * We have only optimized the inference use-case. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76622 Approved by: https://github.com/eellison	2022-05-05 16:57:03 +00:00
Edward Z. Yang	3a6da16a5a	Return all overloads for an operator in _jit_get_operation This allows us to provide OpOverloadPacket.overloads method that lists all of the overloads. This isn't tested; will be exercised in the next PR. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/76814 Approved by: https://github.com/mruberry	2022-05-04 23:49:47 +00:00
BowenBao	679fc90cdb	[ONNX] Support optional type (#68793 ) (#73284 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73284 Some important ops won't support optional type until opset 16, so we can't fully test things end-to-end, but I believe this should be all that's needed. Once ONNX Runtime supports opset 16, we can do more testing and fix any remaining bugs. Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D34625646 Pulled By: malfet fbshipit-source-id: 537fcbc1e9d87686cc61f5bd66a997e99cec287b Co-authored-by: BowenBao <bowbao@microsoft.com> Co-authored-by: neginraoof <neginmr@utexas.edu> Co-authored-by: Nikita Shulga <nshulga@fb.com> (cherry picked from commit 822e79f31ae54d73407f34f166b654f4ba115ea5)	2022-05-04 20:24:30 +00:00
David Berard	e33f3229a2	[NVFuser] environment variable to turn nvfuser on or off (#76485 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/76485 Adds an environment variable `PYTORCH_JIT_ENABLE_NVFUSER` for controlling whether or not nvfuser is enabled. This required changing the PassManager behavior to support the case where nvfuser gets enabled by default when PYTORCH_JIT_ENABLE_NVFUSER=1. Previously the solution for turning nvfuser on or off was to use the PassManager to register or un-register the pass. That works fine if the pass starts off _disabled_, but causes issues once we try to enable the pass by default. The main issue with enabling by default is with the validation check to see whether NVFuser can be turned on. The check relies on at::globalContext().hasCUDA(), which requires CUDAHooks to be registered before hasCUDA() will work correctly. At static initialization time it's difficult to ensure that CUDAHooks will be registered _before_ we attempt to register the nvfuser pass. In OSS it worked fine, but in internal builds it would fail on ROCm builds. To fix this, we switch the control of NVFuser enablement to a check in the pass. i.e. previously, we enabled/disabled nvfuser by registering or de-registering the pass in pass manager; now, the pass is always registered in pass manager, and enablement is done by a check within the nvfuser pass. Remaining TODO: Connect this with NNC so that in cases where NNC is available but not NVFuser (i.e. on AMD gpus), NNC can be turned on automatically. Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D35982618 Pulled By: davidberard98 fbshipit-source-id: fd5b76bc0b8c8716c96fdc04bebfb15026a7ef60 (cherry picked from commit ff14603ff5ac8d9b6c749c4f111f4a8be8023b7f)	2022-05-03 23:05:40 +00:00
PyTorch MergeBot	3dcd67a1b3	Revert "[Re-landing 68111] Add JIT graph fuser for oneDNN Graph API (Preview4.1)" This reverts commit `8b11d81058`. Reverted https://github.com/pytorch/pytorch/pull/74596 on behalf of https://github.com/janeyx99	2022-04-29 15:40:17 +00:00
chunyuan	8b11d81058	[Re-landing 68111] Add JIT graph fuser for oneDNN Graph API (Preview4.1) Re-landing https://github.com/pytorch/pytorch/pull/68111 ## Description Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444). On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included: - The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used - The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties. ### User API: The optimization pass is disabled by default. Users could enable it by: ``` torch.jit.enable_onednn_fusion(True) ``` ### Performance: [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance: - SkyLake 8180 (1 socket of 28 cores): ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png) - SkyLake 8180 (single thread): ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png) * By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI) ** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops ### Directory structure of the integration code Fuser-related code is placed under: ``` torch/csrc/jit/codegen/onednn/ ``` Optimization pass registration is done in: ``` torch/csrc/jit/passes/onednn_graph_fuser.h ``` CMake for the integration code is: ``` caffe2/CMakeLists.txt ``` ## Limitations - In this PR, we have only supported the optimization on Linux platform. The support on Windows and MacOS will be enabled as the next step. - We have only optimized the inference use case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74596 Approved by: https://github.com/malfet	2022-04-29 01:01:33 +00:00
Elias Ellison	e5a55af305	Reland reland Reland of https://github.com/pytorch/pytorch/pull/76397 and https://github.com/pytorch/pytorch/pull/76493 This time I'll get it right 😢 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76539 Approved by: https://github.com/davidberard98, https://github.com/osalpekar	2022-04-28 20:41:55 +00:00
PyTorch MergeBot	a5bc02aeb2	Revert "[JIT] Register decomp reland" This reverts commit `81b9cb741c`. Reverted https://github.com/pytorch/pytorch/pull/76397 on behalf of https://github.com/osalpekar	2022-04-28 03:33:29 +00:00
Elias Ellison	81b9cb741c	[JIT] Register decomp reland Reland of https://github.com/pytorch/pytorch/pull/76252 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76397 Approved by: https://github.com/davidberard98	2022-04-26 23:17:18 +00:00
Kevin Stephano	b17b2b1cc7	Add NVFuser Python Frontend New functionality. 1. Adds Pybind11 bindings for NVFuser. 2. Requires a build file change and JIT python file change outside of NVFuser's code area. Example: ``` import torch from torch._C._nvfuser import Fusion, FusionDefinition # Construct and Define Fusion fusion = Fusion() with FusionDefinition(fusion) as fd : t0 = fd.define_tensor(3) t1 = fd.define_tensor(1) s0 = fd.define_scalar() fd.add_input(t0) fd.add_input(t1) fd.add_input(s0) c0 = fd.define_constant(3.0) t1_b = fd.Ops.broadcast(t1, [True, True, False]) t2 = fd.Ops.add(t0, t1) t3 = fd.Ops.mul(t2, c0) t4 = fd.Ops.mul(t3, s0) t5 = fd.Ops.relu(t4) t6 = fd.Ops.sum(t5, [-1], False) fd.add_output(t6) fusion.print_ir() # Execute Fusion input1 = torch.ones(2, 4, 8, device='cuda') input2 = torch.ones(8, device='cuda') # Kernel compilation should be cached for the 2nd iteration # with input tensors of the same shape for _ in range(5) : outputs = fusion.execute([input1, input2, 2.0]) print(outputs[0]) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/76353 Approved by: https://github.com/csarofeen, https://github.com/mruberry	2022-04-26 06:10:19 +00:00
PyTorch MergeBot	2d72cb3373	Revert "[JIT] Allow registering Decompositions" This reverts commit `d9f0774f98`. Reverted https://github.com/pytorch/pytorch/pull/76252 on behalf of https://github.com/zengk95	2022-04-26 04:47:05 +00:00
Elias Ellison	d9f0774f98	[JIT] Allow registering Decompositions - Allow registering custom decompositions - Add easier API for invoking decompositions - Shorten API names (no users yet) I am doing these as one pr because they are fairly short/simple and because github first does not support ghstack yet. cc @Chillee @zou3519 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76252 Approved by: https://github.com/davidberard98	2022-04-26 03:00:35 +00:00
David Berard	82421b0fb8	[JIT] support parameterlist iteration Followup to https://github.com/pytorch/pytorch/pull/75479. This adds support for iterating through parameterlists Pull Request resolved: https://github.com/pytorch/pytorch/pull/76140 Approved by: https://github.com/tugsbayasgalan	2022-04-21 18:51:27 +00:00
David Berard	272890998e	[JIT] pass more exception info through the JIT interpreter If TORCH_SHOW_CPP_STACKTRACES=1, then dump e.what() into the RuntimeError, which should make it easier to debug exceptions that happen within interpreted sections. Test: ```patch diff --git a/test/cpp/jit/test_dce.cpp b/test/cpp/jit/test_dce.cpp index 6f9161d0d9..7c574787cf 100644 --- a/test/cpp/jit/test_dce.cpp +++ b/test/cpp/jit/test_dce.cpp @@ -3,6 +3,10 @@ #include <torch/csrc/jit/ir/irparser.h> #include <torch/csrc/jit/passes/dead_code_elimination.h> #include <torch/csrc/jit/testing/file_check.h> +#include <torch/csrc/jit/runtime/interpreter.h> +#include <test/cpp/jit/test_utils.h> + +#include <ATen/ATen.h> namespace torch { namespace jit { @@ -48,5 +52,30 @@ graph(): // Check that dead code elimin testing::FileCheck().run(input, *graph); } + +TEST(EliminateDeadCodeTest, interpreterfailure) { + const std::string input = R"IR( +graph(%x.1 : Tensor): + %2 : int = prim::Constant[value=128]() # /data/users/dberard/scripts/DGB/sz.py:4:38 + %3 : int = prim::Constant[value=256]() # /data/users/dberard/scripts/DGB/sz.py:4:43 + %5 : int = prim::Constant[value=1]() # /data/users/dberard/scripts/DGB/sz.py:4:53 + %4 : int[] = prim::ListConstruct(%2, %3) + %6 : Tensor[] = aten::split_with_sizes(%x.1, %4, %5) # /data/users/dberard/scripts/DGB/sz.py:4:11 + return (%6) +)IR"; + auto graph = std::make_shared<Graph>(); + parseIR(input, graph.get()); + + //auto stack = createStack({at::randn({2, 383}, at::kCPU)}); + auto stack = createStack({at::Tensor{}}); + + Code code(graph, ""); + InterpreterState interpreter{code}; + interpreter.run(stack); + ASSERT_EQ(2, stack.size()); + ASSERT_FALSE(stack[0].toTensor().defined()); + ASSERT_FALSE(stack[1].toTensor().defined()); +} + } // namespace jit } // namespace torch ``` ^ use this to repro the interpreter issue: `TORCH_SHOW_CPP_STACKTRACES=1 ./bin/test_jit --gtest_filter="EliminateDeadCodeTest.interpreterfailure"` and the stack trace is shown. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75682 Approved by: https://github.com/eellison	2022-04-21 18:26:49 +00:00
jishaomin	91e9fcf5b0	sup torch script parameterlist Fixes #61176 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75479 Approved by: https://github.com/davidberard98	2022-04-20 20:53:07 +00:00
Elias Ellison	0c671c15ec	[JIT] Remove CSE Hoisting This has led to a couple bugs, and I don't think the additional complexity was worth keeping in codebase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75756 Approved by: https://github.com/davidberard98	2022-04-19 20:59:25 +00:00
Han Qi	b34b192d6b	Reland "Make debug_pkl smaller by only emitting unique traces." (#73368 ) Summary: ## Original commit message: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368 debug_pkl file inside of pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source which is a stack trace, filename, and start, end numbers. Those are emitted in debug_pkl file as strings. Since many SourceRanges share the same source, the string for trace can be deduped. The newer format saves a set of unique traces in a tuple, then each SourceRange will save the offset of its trace w.r.t. position in that tuple. (i.e. manually applying dictionary compression). The above helps with smaller file size. On loading, if we copy each trace to Source as string the runtime memory would still blow up. To mitigate this, we use SourceView directly instead of source which will take the reference of string inside of Deserializer and make that into string_view. This is safe because Deserializer is held by Unpickler by shared_ptr, and Unpickler is also held by shared_ptr by another Source object. That Source object will be alive during the model construction. Test Plan: ## Original Test plan unit test Took original file (312271638_930.predictor.disagg.local); loaded with `torch.jit.load` save again with `torch.jit.save`. 
Unzip both, look at contents: ``` [qihan@devvm5585.vll0 ~]$ du archive -h 4.0K archive/xl_model_weights 3.7M archive/extra 8.0K archive/code/__torch__/caffe2/torch/fb/model_transform/splitting 8.0K archive/code/__torch__/caffe2/torch/fb/model_transform 8.0K archive/code/__torch__/caffe2/torch/fb 8.0K archive/code/__torch__/caffe2/torch 8.0K archive/code/__torch__/caffe2 20M archive/code/__torch__/torch/fx/graph_module 20M archive/code/__torch__/torch/fx 8.0K archive/code/__torch__/torch/classes 20M archive/code/__torch__/torch 20M archive/code/__torch__ 20M archive/code 2.7M archive/constants 35M archive [qihan@devvm5585.vll0 ~]$ du resaved -h 4.0K resaved/extra 8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting 8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform 8.0K resaved/code/__torch__/caffe2/torch/fb 8.0K resaved/code/__torch__/caffe2/torch 8.0K resaved/code/__torch__/caffe2 1.3M resaved/code/__torch__/torch/fx/graph_module 1.3M resaved/code/__torch__/torch/fx 8.0K resaved/code/__torch__/torch/classes 1.4M resaved/code/__torch__/torch 1.4M resaved/code/__torch__ 1.4M resaved/code 2.7M resaved/constants 13M resaved [qihan@devvm5585.vll0 ~]$ ``` ## Additional test: `buck test mode/dev-tsan //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --exact 'caffe2/benchmarks/static_runtime:static_runtime_cpptest - StaticRuntime.to'` passes test jest.fbios.startup_cold_start.local.simulator f333356873 - Differential Revision: D35196883 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74869 Approved by: https://github.com/gmagogsfm	2022-04-18 22:34:21 +00:00
John Clow	f281d83d77	Moving Remove Tensor Type Specializations to after custom passes This is to allow for Intel folks to use type information in their custom passes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/71748 Approved by: https://github.com/eellison	2022-04-11 22:12:01 +00:00
Emma Blink	ca056cc918	[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT` Reviewed By: zertosh Differential Revision: D35543681 fbshipit-source-id: 0453f35c2a39299df172dc2b4fc77fb73963bb97 (cherry picked from commit aae11d9628a1cf7fd88a2113191f31e979750bc8)	2022-04-11 13:48:41 +00:00
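
The debug_pkl change in commit `b34b192d6b` describes storing each distinct trace once in a tuple and having every SourceRange refer to its trace by position ("manually applying dictionary compression"). A minimal Python sketch of that idea follows. It is an illustration only, not the PyTorch serializer: the record layout `(trace, start, end)` and the helper names `compress_traces` and `expand_traces` are assumptions made up for this example.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout for this sketch: (trace text, start, end).
SourceRange = Tuple[str, int, int]
CompactRange = Tuple[int, int, int]


def compress_traces(
    ranges: Sequence[SourceRange],
) -> Tuple[Tuple[str, ...], List[CompactRange]]:
    """Store each distinct trace once; ranges keep only its index."""
    index_of: Dict[str, int] = {}
    unique: List[str] = []
    compact: List[CompactRange] = []
    for trace, start, end in ranges:
        if trace not in index_of:
            # First time this trace is seen: give it the next slot.
            index_of[trace] = len(unique)
            unique.append(trace)
        compact.append((index_of[trace], start, end))
    return tuple(unique), compact


def expand_traces(
    unique: Sequence[str], compact: Sequence[CompactRange]
) -> List[SourceRange]:
    """Inverse of compress_traces: look each trace up by index."""
    return [(unique[i], start, end) for i, start, end in compact]


ranges = [
    ("model.py: forward", 0, 10),
    ("model.py: forward", 10, 25),
    ("utils.py: helper", 0, 7),
]
unique_traces, compact_ranges = compress_traces(ranges)
```

In this sketch the repeated trace string is kept once in `unique_traces`, and `expand_traces(unique_traces, compact_ranges)` reconstructs the original list. The commit message additionally avoids copying trace strings at load time by using string views; that part is not modeled here.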

600 Commits