Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74768
As commented in code:
```
// Empty List Literals that are not assigned to variables
// may match to any list type in schema matching,
// but still default to List[Tensor] if assigned to a variable
// or returned from a function
// Restricting empty list matching to temporary values
// avoids difficult to handle cases such as
// a = []
// b = a
// if cond:
// b.append(2)
// else:
// a.append("hi")
// This is also the same behavior that C++ allows with {}
// (cannot assign to a variable typed as auto)
```
Fix for https://github.com/facebookresearch/torchdynamo/issues/95
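A minimal TorchScript sketch of the rule described in the comment (the function names here are illustrative):
```python
import torch
from typing import List

@torch.jit.script
def assigned_empty_list() -> List[torch.Tensor]:
    # Assigned to a variable (or returned), an empty list literal still
    # defaults to List[Tensor].
    xs = []
    xs.append(torch.ones(2))
    return xs

@torch.jit.script
def temporary_empty_list() -> torch.Tensor:
    # Used directly as an argument, the same literal may match any list
    # type during schema matching, e.g. the int[] size argument here.
    return torch.zeros([])
```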
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D35362760
Pulled By: eellison
fbshipit-source-id: da23e8889312001b60d64a1758da5c578b6fe5ea
(cherry picked from commit 75682f17204d6d444e7e7144472c6e833150c601)
Summary:
## Description
Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).
On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included:
- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used
- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.
### User API:
The optimization pass is disabled by default. Users could enable it by:
```
torch.jit.enable_onednn_fusion(True)
```
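A minimal usage sketch (the torchvision model and the freezing step are illustrative; the pass targets inference graphs run under the profiling executor):
```python
import torch
import torchvision  # example model source only

torch.jit.enable_onednn_fusion(True)  # turn on the oneDNN Graph fusion pass

model = torchvision.models.resnet50().eval()
with torch.no_grad():
    traced = torch.jit.trace(model, torch.rand(1, 3, 224, 224))
    frozen = torch.jit.freeze(traced)
    # Warm-up runs let the profiling executor record tensor properties
    # before the fused graph is executed.
    x = torch.rand(1, 3, 224, 224)
    frozen(x)
    frozen(x)
    out = frozen(x)
```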
### Performance:
The [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare performance:
- SkyLake 8180 (1 socket of 28 cores):

- SkyLake 8180 (single thread):

\* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
\** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops
### Directory structure of the integration code
Fuser-related code is placed under:
```
torch/csrc/jit/codegen/onednn/
```
Optimization pass registration is done in:
```
torch/csrc/jit/passes/onednn_graph_fuser.h
```
CMake for the integration code is:
```
caffe2/CMakeLists.txt
```
## Limitations
- In this PR, the optimization is supported only on the Linux platform. Support for Windows and macOS will be enabled as the next step.
- We have only optimized the inference use case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68111
Reviewed By: eellison
Differential Revision: D34584878
Pulled By: malfet
fbshipit-source-id: ce817aa8cc9052ee9ed930c9cf66be83449e61a4
(cherry picked from commit cd17683aa7d9c0947df45a1ab53627feff795587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73329
There is a quantization use case for having better alias analysis with function calls remaining. This takes the straightforward approach of getting the inlined graph of each function call and then analyzing that subgraph. Since we need a single, unique analysis of every `Value*`, we make a copy of the function's graph for every analysis past the first. This is relatively slow, but given the limited use case here it should work well enough (and is no slower than calling the inlining pass).
cc vkuzo
Test Plan: Imported from OSS
Reviewed By: davidberard98
Differential Revision: D34451424
Pulled By: eellison
fbshipit-source-id: b7c7e54679d723f5ded1e11ffb32eb6d2176431d
(cherry picked from commit 81a42b31522b890311a3f512448b372c4ebbefd1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69854
ghstack-source-id: 148315147
Test Plan: Time reported to start up static runtime on ctr_mobile_feed local_ro net is 8.8s instead of 9.5s
Reviewed By: suo, d1jang
Differential Revision: D33039733
fbshipit-source-id: 218dc7ff9aa421a352b71952ec77757368095860
(cherry picked from commit 7586712948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69853
We can implement this overload more efficiently.
ghstack-source-id: 146924693
Test Plan:
patched alias_analysis tests
Time reported to initialize a predictor by static runtime when given ctr_mobile_feed local_ro net is 9.5s instead of 10.5s.
Reviewed By: mikeiovine
Differential Revision: D33039731
fbshipit-source-id: 52559d678e9eb00e335b9e0db304e7a5840ea397
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71170
As with an output returned in a tuple, an output returned in a list will not have any further uses that would make adding it directly to the list's contained elements incorrect. This unblocks a use case in op authoring.
cc Chillee
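For illustration, this is the kind of scripted function affected (hypothetical example):
```python
import torch
from typing import List

@torch.jit.script
def returns_list(x: torch.Tensor, y: torch.Tensor) -> List[torch.Tensor]:
    # The outputs are created only to be placed in the returned list, so
    # there are no later uses that aliasing them with the list's contained
    # elements could break.
    return [x + 1, y * 2]
```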
Test Plan: Imported from OSS
Reviewed By: d1jang
Differential Revision: D33535608
Pulled By: eellison
fbshipit-source-id: 2066d28e98c2f5d1b3d7e0206c7e39a27b3884b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69476
This diff adds a new op, `TensorExprDynamicGroup`, that composes all the logic behind running a dynamically shaped fused node. This includes a guard instruction that checks the conditions, and a conditional that calls either the fused node or the fallback graph depending on the guard.
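A conceptual Python sketch of the control flow the new op encapsulates (not the actual implementation; all names here are illustrative):
```python
def tensor_expr_dynamic_group(inputs, guard, fused_kernel, fallback_graph):
    # The guard re-checks the conditions recorded when the node was fused
    # (e.g. ranks, dtypes, devices of the inputs).
    if guard(inputs):
        return fused_kernel(inputs)    # run the dynamically shaped fused node
    return fallback_graph(inputs)      # otherwise run the original fallback graph
```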
ghstack-source-id: 146107006
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/cpp/jit:jit
```
Reviewed By: eellison
Differential Revision: D32320082
fbshipit-source-id: 2bd1a43391ca559837d78ddb892d931abe9ebb73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68099
When an op in the graph cannot be matched to any known ops, alias_analysis.cpp throws an error.
Before:
```
RuntimeError: 0INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":612, please report a bug to PyTorch. We don't have an op for aten::add but it isn't a special case. Argument types: Tensor, float, Tensor,
```
After:
```
RuntimeError: 0INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":612, please report a bug to PyTorch. We don't have an op for aten::add but it isn't a special case. Argument types: Tensor, float, Tensor,
Candidates:
aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> (Tensor)
aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::add.out(Tensor self, Tensor other, *, Scalar alpha=1, Tensor(a!) out) -> (Tensor(a!))
aten::add.t(t[] a, t[] b) -> (t[])
aten::add.str(str a, str b) -> (str)
aten::add.int(int a, int b) -> (int)
aten::add.complex(complex a, complex b) -> (complex)
aten::add.float(float a, float b) -> (float)
aten::add.int_complex(int a, complex b) -> (complex)
aten::add.complex_int(complex a, int b) -> (complex)
aten::add.float_complex(float a, complex b) -> (complex)
aten::add.complex_float(complex a, float b) -> (complex)
aten::add.int_float(int a, float b) -> (float)
aten::add.float_int(float a, int b) -> (float)
aten::add(Scalar a, Scalar b) -> (Scalar)
```
Test Plan:
Run
```
import torch

if __name__ == '__main__':
    ir = """
    graph(%x : Tensor,
          %y : Tensor):
      %2 : float = prim::Constant[value=1.2]()
      %result : Tensor = aten::add(%x, %2, %y)
      return (%result)
    """
    x = torch.tensor([[1., 2.], [3., 4.]])
    y = torch.tensor([[2., 1.], [2., 1.]])
    graph = torch._C.parse_ir(ir)
    print(graph)
    graph.alias_db().analyze()
    # print(script(x, y))
```
to get the results above
Imported from OSS
Reviewed By: anjali411
Differential Revision: D32339639
fbshipit-source-id: a79a3c2f157154b5fb1e3f33a23e43b7884e8e38
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66554
In native_functions.yaml, the schemas for batch_norm and instance_norm
are incorrect: the inputs `running_mean` and `running_var` are mutated,
but are not marked as such in the function schema. Since `(a!)?`
annotations are currently not working (see #65760), this instead adds a
special case to `alias_analysis.cpp`. If the value of `training` or
`use_input_stats` is known to be `false`, then `alias_analysis` will
mark the input as _not_ being written to.
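For context, a small eager-mode sketch of the mutation in question:
```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 4, 4)
running_mean = torch.zeros(3)
running_var = torch.ones(3)

# With training=True the running stats are updated in place, so alias
# analysis must treat them as written to.
F.batch_norm(x, running_mean, running_var, training=True)

# With training=False they are only read; this is the case the new
# special case in alias_analysis.cpp can take advantage of.
F.batch_norm(x, running_mean, running_var, training=False)
```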
Test Plan:
Removed the `skip` annotation on the following test, and added a special
exception in `check_alias_annotations`:
```
python test/test_ops.py -k test_variant_consistency_jit_nn_functional_batch_norm
```
Also:
```
./build/bin/test_jit --gtest_filter="*BatchAndInstanceNormFixture*"
```
Imported from OSS
Reviewed By: eellison
Differential Revision: D31612339
fbshipit-source-id: 12ca61b782b9e41e06883ba080a276209dc435bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66295
Tidying up the top sources of reference count decrements seen during static runtime startup in alias_analysis.cpp specifically.
ghstack-source-id: 140484160
Test Plan:
CI
perf now shows under 2% of time spent in ~__shared_count instead of about 5%.
Reviewed By: suo
Differential Revision: D31490761
fbshipit-source-id: bbdcb7f9065c3aafa7fff7bfea9cea6dbc41f9d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65344
Callsites that know they are using a cache can borrow AliasTypeSets from the cache instead of copying them.
ghstack-source-id: 140484162
Test Plan: Running perf on static runtime startup seems to show less inclusive time spent in AliasDb::getElements
Reviewed By: ejguan
Differential Revision: D31027363
fbshipit-source-id: b7a1473f4f9e9f14566f56f4b3b4e6317076beeb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65177
There is no need to heap-allocate any vectors in this case.
ghstack-source-id: 140052520
Test Plan:
CI
Startup for static runtime on ctr_mobile_feed local net decreased from 7.8s to about 7.0s
Reviewed By: malfet
Differential Revision: D30984194
fbshipit-source-id: 85091e55445f653ec728b27da4b459a2f1873013
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66025
This change adds an option to selectively enable precise alias analysis for `prim::TupleConstruct` (introduced by D30437737 (cd458fe092)), limiting its exposure to `StaticRuntime` only as of now.
Test Plan: Modified existing unit tests whose behavior depends on D30437737 (cd458fe092).
Reviewed By: eellison
Differential Revision: D31350285
fbshipit-source-id: 3ce777f07f99650d74634481ad0805192dce55c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64879
This change makes the output of `prim::TupleConstruct` alias only with its inputs *when* the created tuple is directly returned from the graph.
The same treatment could be applied to any tuple newly constructed by `prim::TupleConstruct` whose elements do not escape. However, this change focuses on only the simplest, but most frequently used, case: tuples constructed solely to be returned from a graph.
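For illustration, the affected pattern looks like this (hypothetical example):
```python
import torch
from typing import Tuple

@torch.jit.script
def returns_tuple(x: torch.Tensor, y: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    # The tuple is constructed only to be returned from the graph, so its
    # output needs to alias only its inputs instead of becoming a wildcard.
    return (x + 1, y * 2)
```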
Test Plan:
Added
- `AliasMoveForTupleConstructWithSingleUseAsGraphOutput`
- `WildcardAliasForTupleConstructWithUses`
to cover the newly added code.
Reviewed By: eellison
Differential Revision: D30437737
fbshipit-source-id: 417fbc6bc348062e60e7acdddd340d4754d090eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64824
See comment in function_schema.h for explanation. I claim that this is a good tradeoff because the aliasing information seems to be used only in compiler-ish code paths, where performance isn't as critical as actual execution. If performance is important there too, perhaps we should hoist isWrite into the Argument itself since there are several paths that only care about isWrite.
ghstack-source-id: 138958896
Test Plan: CI, profile schema parsing on startup and see far fewer page faults in createArgumentVector.
Reviewed By: suo
Differential Revision: D30860719
fbshipit-source-id: 1d4d2328f2b8e34f5ddf9d82083fd4dd7b7f738f
Summary:
This PR replaces the https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. A replacement was needed due to a messy Sandcastle issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234
Reviewed By: gmagogsfm
Differential Revision: D30656444
Pulled By: ansley
fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os
def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files


def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])


def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)


if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51600
Looking for notes on implementation first, will post more notes on benchmarks and overall thoughts/implementation and solicit more input soon.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696702
Pulled By: eellison
fbshipit-source-id: cd612f093fe3859e42fb0b77560ebd1b44fccff7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51483
This PR moves the conv weights of a frozen model to MKLDNN and AOT-reorders the weights. When the weights are already in MKLDNN, computing even a single conv by converting the input and output from/to MKLDNN provides large speedups. I benchmarked the results of the top 200 shapes in predictor [here](https://www.internalfb.com/phabricator/paste/view/P171537938), and verified that it sped up popular models in torchvision.
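A rough sketch of the intended inference flow, assuming the pass is applied to a frozen scripted model (the small model is illustrative; the exact entry point of the weight-conversion pass is internal to the freezing optimizations):
```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU()).eval()
scripted = torch.jit.script(model)
# Freezing turns the conv weights into constants, which is what allows the
# pass to reorder them into the MKLDNN layout ahead of time.
frozen = torch.jit.freeze(scripted)
with torch.no_grad():
    out = frozen(torch.randn(1, 3, 32, 32))
```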
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696703
Pulled By: eellison
fbshipit-source-id: 0b4441bee4f6e0890a4540fbca3bb5e58b8c5adf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51999
as in title
Test Plan: waiting on CI for now
Reviewed By: eellison
Differential Revision: D26349297
fbshipit-source-id: bd5574ed1f8448ba18a6fda4bdc45f45d8b158e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50229
`fastmod -m 'cast(<((at|c10)::)?\w+Type>\(\)\s*)->' 'castRaw${1}->'`
Presuming it builds, this is a safe change: the result of `cast()` wasn't being saved anywhere, so we didn't need it, and we can use a raw pointer instead of a new `shared_ptr`.
ghstack-source-id: 120769170
Test Plan: CI
Reviewed By: SplitInfinity
Differential Revision: D25837494
fbshipit-source-id: 46319100dc0dfc78f6d2b45148207f83481f2ada
Summary:
This PR adds a simple debugging helper which exports the AliasDb state as a [GraphViz](http://www.graphviz.org/) graph definition. The generated files can be viewed with any Graphviz viewer (including online based, for example http://viz-js.com)
Usage:
1. Call `AliasDb::dumpToGraphvizFile()` from a debugger. Using gdb for example:
`call aliasDb_->dumpToGraphvizFile("alias.dot")`
2. Add explicit calls to `AliasDb::dumpToGraphvizFile()`, which returns `true` if it succeeds.
An example output file is attached: [example.zip](https://github.com/pytorch/pytorch/files/5805840/example.zip)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50452
Reviewed By: ngimel
Differential Revision: D25980222
Pulled By: eellison
fbshipit-source-id: 47805a0a81ce73c6ba859340d37b9a806f9000d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50228
`fastmod -m 'expect(<((at|c10)::)?\w+Type>\(\)\s*)->' 'expectRef${1}.'`
Presuming it builds, this is a safe change: the result of `expect()` wasn't being saved anywhere, so we didn't need it, and we can take a reference instead of a new `shared_ptr`.
ghstack-source-id: 119782961
Test Plan: CI
Reviewed By: SplitInfinity
Differential Revision: D25837374
fbshipit-source-id: 86757b70b1520e3dbaa141001e7976400cdd3b08
Summary:
=======
This PR addresses the following:
* Adds JIT support for CUDA Streams
* Adds JIT support for CUDA Events
* Adds JIT support for the CUDA Stream context manager (see the sketch below)
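A rough sketch of the kind of scripted program these features enable (assumes a CUDA build; the exact scripted signatures may differ slightly from this eager-style sketch):
```python
import torch

@torch.jit.script
def run_on_side_stream(x: torch.Tensor) -> torch.Tensor:
    s = torch.cuda.Stream()
    with torch.cuda.stream(s):  # the stream context manager, now scriptable
        y = x * 2
    # Make the default stream wait for the side-stream work to finish.
    torch.cuda.current_stream().wait_stream(s)
    return y
```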
Testing:
======
python test/test_jit.py -v TestCUDA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48020
Reviewed By: navahgar
Differential Revision: D25725749
Pulled By: nikithamalgifb
fbshipit-source-id: b0addeb49630f8f0c430ed7badeca43bb9d2535c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46686
I was trying to page this code back in after a while and some things
stuck out as unnecessarily confusing.
1. Improve documentation of closures and fork stuff to be more accurate
to how we use them today.
2. Change `prim::LocalVariableScope` to `prim::ListComprehension`. It is
only ever used for list comprehensions, and in general the nodes
emitted by `ir_emitter` should correspond to concrete operations or
language features rather than semantic constraints.
3. Change the somewhat mysterious "inputs" and "attributes" argument
names throughout the codebase to be the more obvious "args" and "kwargs"
that they generally represent (I think "inputs" and "attributes" come
from the AST naming).
Test Plan: Imported from OSS
Reviewed By: navahgar, jamesr66a
Differential Revision: D24464197
Pulled By: suo
fbshipit-source-id: 1f4b1475b58b5690a0b204e705caceff969533b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39111
In our present alias analysis, we consider any Value that enters another container as entering the heap, and thus aliasing all other heap values of the same type. There are a number of advantages to this approach:
- it is not too hard to maintain the aliasDb implementation
- it is much easier from an op schema perspective - there are many composite list ops registered internally and externally that would be tricky to register and get right if we did something more complicated
- It limits the size of the AliasDb, because a container of size 10 only contains a single memory dag element instead of 10 elements.
The downside is that we are unable to handle the simple and extremely common case of a list of tensors being used in an ATen op.
In an example like:
```
def foo(input):
    x = torch.tensor([1, 2, 3, 4])
    y = [x, x]
    input.add_(1)
    return torch.cat(y)
```
we will consider x to be written to. Any write to any wildcard element (an element that enters a tuple, or an element that is taken from a list) will mark x as written to. This limits our ability to create a functional subset and fuse graphs - as a result, 4 of the TorchVision classification models could not be functionalized.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D23828003
Pulled By: eellison
fbshipit-source-id: 9109fcb6f2ca20ca897cae71683530285da9d537
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43043
This adds support for rpc_sync in TorchScript, in a way similar to rpc_async.
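A rough sketch of the scripted usage, mirroring the existing rpc_async pattern (assumes an initialized RPC agent and a peer named "worker1"; names are illustrative):
```python
import torch
import torch.distributed.rpc as rpc

@torch.jit.script
def add_tensors(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return x + y

@torch.jit.script
def call_remote_sync(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Unlike rpc_async, rpc_sync blocks until the remote result is available
    # and returns the value directly rather than a future.
    return rpc.rpc_sync("worker1", add_tensors, (x, y))
```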
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D23252039
Pulled By: wanchaol
fbshipit-source-id: 8a05329cb8a24079b2863178b73087d47273914c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43635
Intern the symbol; no functional changes. Aliasing needs to be looked at, but that should be done in a separate PR; this PR just changes the symbol.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23358806
Pulled By: eellison
fbshipit-source-id: f18bcd142a0daf514136f019ae607e4c3f45d9f8
Summary:
This PR adds an API to package unoptimized/fallback blocks as function calls. It's mainly meant to be used by the TensorExpressionsFuser and SpecializeAutogradZero passes, as both specialize the original graph but would also like to provide a fallback path in case the assumptions under which the graph was specialized do not hold for some inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43274
Reviewed By: malfet
Differential Revision: D23406961
Pulled By: Krovatkin
fbshipit-source-id: ef21fc9ad886953461b09418d02c75c58375490c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43633
In the backward graph, _grad_sum_to_size is inserted whenever a possibly broadcasting op is called:
`"aten::_grad_sum_to_size(Tensor(a) self, int[]? size) -> Tensor(a)"`
If a broadcast occurred, a sum is called, otherwise the second input is None and it is a no-op. Most of the time, it's a no-op (in the fast RNNs benchmark > 90% of the time).
We can get rid of this op by profiling the optionality of the second input. I added `prim::profile_optional` to do this, which counts the number of times it saw a None value and the number of times it saw a value present. When specializing the backward graph, we insert checks for values we profiled as None, and in the optimized block can remove the grad_sum_to_size calls that use those values.
In the future we may revisit this when NNC supports reductions and we want to replace grad_sum_to_size with sums as well, but I think this is worth landing now.
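A conceptual Python sketch of what the op does (not the actual implementation):
```python
import torch
from typing import List, Optional

def grad_sum_to_size(grad: torch.Tensor, size: Optional[List[int]]) -> torch.Tensor:
    # No broadcast happened in the forward op: size is None and this is a no-op.
    if size is None:
        return grad
    out = grad
    # Collapse leading dimensions added by broadcasting.
    while out.dim() > len(size):
        out = out.sum(0)
    # Collapse dimensions that were broadcast up from size 1.
    for i, s in enumerate(size):
        if s == 1 and out.size(i) != 1:
            out = out.sum(i, keepdim=True)
    return out
```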
Test Plan: Imported from OSS
Reviewed By: bwasti, ZolotukhinM
Differential Revision: D23358809
Pulled By: eellison
fbshipit-source-id: a30a148ca581370789d57ba082d23cbf7ef2cd4d