Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875
Previously we had a few settings:
- getExecutor - which toggled between Profiling Executor and Legacy
- getGraphOptimize - if true, overrides PE/Legacy to run with simple executor (no optimizations)
and then...
- getProfilingMode - which would set the PE to 0 specializations.
The last mode is redundant with getGraphOptimize; we should just remove it and use getGraphOptimize in these cases. It also allows potentially invalid combinations of settings - what does it mean if getProfilingMode is true but getExecutor is set to false? That combination leads to a bug in specialize_autograd_zero; see: https://github.com/pytorch/pytorch/blob/master/torch%2Fcsrc%2Fjit%2Fpasses%2Fspecialize_autogradzero.cpp#L93.
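For context, these toggles roughly correspond to the following Python bindings (the mapping here is an illustration, not something this PR changes):
```python
import torch

# getExecutor: toggle between the Profiling Executor and the Legacy executor.
torch._C._jit_set_profiling_executor(True)

# getGraphOptimize: toggle whether the graph executor optimizes at all or
# falls back to the simple executor.
torch._C._set_graph_executor_optimize(True)

# getProfilingMode: the redundant toggle being removed; it controlled whether
# the Profiling Executor actually profiled/specialized.
torch._C._jit_set_profiling_mode(True)
```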
The tests here are failing but get fixed with the PR above it, so I'll squash for landing.
Test Plan: Imported from OSS
Reviewed By: cpuhrsch
Differential Revision: D34938130
Pulled By: eellison
fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776
I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack. Let's try again.
I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847
Test Plan: CI
Reviewed By: huiguoo
Differential Revision: D30484555
fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56120
This reverts commit ad17fadbfc (D27786457).
The big annoyance here is that depending on the threading mode you may not be
able to toggle num_threads at will, so the fusion tests won't fail.
I hate this solution, but I'm adding a secondary override for the TE fuser.
Now you need to both turn on fusion (_jit_override_can_fuse_on_cpu) and either be running with 1 thread, or additionally call `_jit_set_texpr_parallel_cpu_enabled` to enable it anyway.
This is (a) mainly for tests, since a real user probably won't fiddle aimlessly
with the thread count, and (b) will go away once NNC's threading support is
fully baked.
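Roughly, the intended usage looks like this (illustrative sketch; assuming `_jit_set_texpr_parallel_cpu_enabled` takes a bool):
```python
import torch

# Turn on TE fusion on CPU.
torch._C._jit_override_can_fuse_on_cpu(True)

if torch.get_num_threads() == 1:
    pass  # single-threaded: fusion proceeds as before
else:
    # multi-threaded: fusion stays disabled unless we opt in explicitly
    torch._C._jit_set_texpr_parallel_cpu_enabled(True)
```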
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D27788199
Pulled By: bertmaher
fbshipit-source-id: 070d04474f15e9689dbdf8cc1fde43050c6506b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55621
Fuser support for thread-level parallelism is a work in progress, so
only fuse when the program is running single-threaded.
ghstack-source-id: 126069259
Test Plan: observe fusion groups formed when torch.get_num_threads() == 1 vs. not
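For example, something along these lines (illustrative, not an actual test from this PR):
```python
import torch

torch._C._jit_override_can_fuse_on_cpu(True)
torch.set_num_threads(1)  # compare against e.g. torch.set_num_threads(4)

@torch.jit.script
def f(a, b):
    return (a * b) + b

a, b = torch.randn(1024), torch.randn(1024)
for _ in range(3):  # warm up so the profiling executor can specialize
    f(a, b)

# With one thread we expect a fusion group (prim::TensorExprGroup) in the
# optimized graph; with several threads we expect none.
print(torch.jit.last_executed_optimized_graph())
```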
Reviewed By: ZolotukhinM
Differential Revision: D27652485
fbshipit-source-id: 182580cf758d99dd499cc4591eb9d080884aa7ef
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52264
When CPU fusion is enabled without LLVM support in PyTorch, it causes a huge slowdown (>50x). This PR makes the LLVM backend the default backend for TE. Now, an error is reported if CPU fusion is enabled without LLVM support, to avoid this performance regression.
This PR also updates the tests to not use LLVM, so that the old flow continues to be exercised. This is necessary because the tests run in CI do not have LLVM.
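A hedged sketch of how a user would guard the opt-in (assuming `torch._C._llvm_enabled()` is available to report whether the LLVM backend was built; if that binding differs, treat this as pseudocode):
```python
import torch

# Only opt into CPU fusion when the TE LLVM backend is compiled in; with this
# change, enabling CPU fusion without LLVM raises an error rather than silently
# running a much slower path.
if torch._C._llvm_enabled():
    torch._C._jit_override_can_fuse_on_cpu(True)
```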
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52314
Reviewed By: ejguan
Differential Revision: D26491294
Pulled By: navahgar
fbshipit-source-id: 74561db1207da805d6d28039450db046ba2988fb
Summary:
This adds guarding for DifferentiableGraph nodes in order to not depend on
It also bails out on required gradients for the CUDA fuser.
Fixes https://github.com/pytorch/pytorch/issues/49299
I still need to look into a handful of failing tests, but maybe this can serve as a basis for discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49433
Reviewed By: ngimel
Differential Revision: D25681374
Pulled By: Krovatkin
fbshipit-source-id: 8e7be53a335c845560436c0cceeb5e154c9cf296
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47814
Previously, we would bail completely if a node had a constant tensor input. This PR adds support for this case by lifting the constant out of the fusion graph after we've done fusion. It might be nice to add support for Tensor Constants in NNC itself, but it looked kind of tricky and this is an easy enough temporary solution.
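An illustrative case (my own example, not taken from the PR): a scripted function that closes over a tensor bakes it into the graph as a Constant, which is exactly the situation handled here:
```python
import torch

w = torch.randn(8)  # captured tensor becomes a Constant node in the scripted graph

@torch.jit.script
def f(x):
    return x * w + 1.0

x = torch.randn(8)
for _ in range(3):
    f(x)

# The fuser can now still form a fusion group here; the constant `w` is lifted
# out of the fusion subgraph and passed in as a regular input.
print(torch.jit.last_executed_optimized_graph())
```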
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D25286215
Pulled By: eellison
fbshipit-source-id: 9ff67f92f5a2d43fd3ca087569898666525ca8cf
Summary:
Copying myself from the code comments:
A value can be profiled with differently typed uses. This can occur from:
- having a use which is not executed, so the type will be TensorType::get()
- control-flow that depends on tensor type: `if x.size() == 2 op(x) else op(x)`
- mutation of the value on a field represented in the tensor type: `op(x); x.resize_([...]); op(x)`
The most common case today with num_profiles = 1 is from the first case. Here we can just ignore non-profiled uses, and choose any of the profiled uses. Because we guard all tensor types in the runtime, even if we set a Value to have a profiled type from one use and then execute a use with a different profiled type, we will still be correct. In the future we could consider unifying the types of uses, or adding a type refinement node so uses can have the correct corresponding type.
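A made-up illustration of the first (and most common) case, a use that is never executed:
```python
import torch

@torch.jit.script
def f(x, flag: bool):
    if flag:
        return x + 1  # this use of x gets a concrete profiled type
    else:
        return x * 2  # never executed below, so this use stays TensorType::get()

x = torch.randn(4, 4)
for _ in range(3):
    f(x, True)  # only the first branch ever runs
```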
Fix for https://github.com/pytorch/pytorch/issues/48043. I think there's probably too much context required for that to be a good bootcamp task...
There was an observed missed fusion opportunity in detectron2 because of this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48689
Reviewed By: ngimel
Differential Revision: D25278791
Pulled By: eellison
fbshipit-source-id: 443e5e1254446a31cc895a275b5f1ac3798c327f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44972
Previously, our fusion strategy would be:
- start at the end of the block, find a fusable node
- iteratively try to merge inputs into the fusion group, sorted topologically
This strategy works pretty well, but has the possibility of missing fusion groups. See my attached test case for an example where we wouldn't find all possible fusion groups. bertmaher found an example of a missed fusion group in one of our rnn examples (jit_premul) that caused a regression from the legacy fuser.
Here, I'm updating our fusion strategy to be the same as our other fusion passes - create_autodiff_subgraphs, and graph_fuser.cpp.
The basic strategy is (see the pseudocode sketch below):
- iterate until you find a fusible node
- try to merge the node's inputs; whenever a successful merge occurs, restart at the beginning of the node's inputs
- after you've exhausted a node, continue searching the block for fusion opportunities from that node
- continue doing this on the block until we go through an iteration without any successful merges
Since we create the fusion groups once and only re-specialize within the fusion groups, we should be running this very infrequently (it only re-triggers when we fail undefinedness specializations). Also, because it's the same algorithm as the existing fuser, it is unlikely to cause a regression.
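Roughly, in pseudocode (the real pass is in C++; the helpers here are just placeholders for the sketch):
```python
def fuse_block(nodes, is_fusible, start_group, input_producers, try_merge):
    """Sketch of the sweep-until-no-merges fusion strategy (not the real C++ pass)."""
    changed_in_sweep = True
    while changed_in_sweep:
        changed_in_sweep = False
        for node in nodes():
            if not is_fusible(node):
                continue
            group = start_group(node)
            merged = True
            while merged:  # after every successful merge, restart the scan of the group's inputs
                merged = False
                for producer in input_producers(group):
                    if try_merge(producer, group):
                        merged = True
                        changed_in_sweep = True
                        break
            # once this group is exhausted, keep scanning the block from here
```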
Test Plan: Imported from OSS
Reviewed By: Krovatkin, robieta
Differential Revision: D23821581
Pulled By: eellison
fbshipit-source-id: e513d1ef719120dadb0bfafc7a14f4254cd806ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44654
Previously we weren't creating a fallback graph as intended in specialize autograd zero, so if a Tensor failed one of our undefinedness checks, we would run the backward without reprofiling & optimizing.
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D23691764
Pulled By: eellison
fbshipit-source-id: 10c6fa79518c84a6f5ef2bfbd9ea10843af751eb
Summary:
We run remove profile nodes and specialize types before batch_mm, so we cannot run peepholes on the type information of tensors, since these properties have not been guarded and are not guaranteed to be correct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44565
Reviewed By: albanD
Differential Revision: D23661538
Pulled By: eellison
fbshipit-source-id: 0dd23a65714f047f49b4db4ec582b21870925fe1
Summary:
Previously the specialized types were copied over to the fallback function, even though the tensors reaching the fallback are not guaranteed to be of those types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44434
Reviewed By: SplitInfinity
Differential Revision: D23611943
Pulled By: eellison
fbshipit-source-id: 2ea88a97529409f6c5c4c1f59a14b623524933de
Summary:
When backward ops execute via the autograd engine's evaluate_function(), fn.release_variables() is called to release the SavedVariables. For eager mode ops, this releases the saved inputs that were required for the backward grad function. However, with TorchScript we get a DifferentiableGraph, and DifferentiableGraphBackward doesn't implement release_variables(). This causes the SavedVariables to stay alive longer. Implement release_variables() for DifferentiableGraphBackward to release these SavedVariables early.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42994
Reviewed By: izdeby
Differential Revision: D23503172
Pulled By: albanD
fbshipit-source-id: d87127498cfa72883ae6bb31d0e6c7056c4c36d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44137
We only insert guards on Tensor types, so we rely on the output of a node being uniquely determined by its input types. Bail if any non-Tensor input affects the output type and cannot be reasoned about statically.
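A made-up example of a non-Tensor input changing the output type (scalar type promotion):
```python
import torch

x = torch.ones(3, dtype=torch.int64)

# Same op kind (aten::add), same Tensor input type, but the non-Tensor scalar
# argument changes the output dtype via type promotion.
print((x + 1).dtype)    # torch.int64
print((x + 1.0).dtype)  # torch.float32 (promoted by the float scalar)
```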
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23543602
Pulled By: eellison
fbshipit-source-id: abd6fe0b1fd7fe6fc251694d4cd442b19c032dd7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44083
Match on the complete schema of a node instead of just its node kind when deciding to fuse it. Previously we matched on node kind, which could fail with something like `aten::add(int, int)`; also, if a new overload was added to an op without corresponding NNC support, we would fuse it.
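A small sketch of the distinction (assuming the Python `Node.kind()` / `Node.schema()` bindings; just an illustration, not part of the PR):
```python
import torch

@torch.jit.script
def f(a: int, b: int):
    return a + b  # aten::add(int, int) -> int; nothing here for NNC to fuse

for node in f.graph.nodes():
    # node.kind() alone ("aten::add") cannot tell this int overload apart from
    # the Tensor overloads; the full schema can.
    print(node.kind(), node.schema())
```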
Follow ups are:
- bail when an output tensor type isn't uniquely determined by the input types (e.g. aten::add, where the second input could be either a float or an int)
- remove NNC lowering for _tanh_backward & _sigmoid_backward
- Validate that we support all of the overloads here. I optimistically added ops that include Tensors; it's possible that we do not support every overload here. This isn't a regression, and this PR is at least an improvement in that regard.
I can do any of these as part of this PR if desired, but there are a number of failures people have run into that this PR fixes, so I think it would be good to land it sooner rather than later.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D23503704
Pulled By: eellison
fbshipit-source-id: 3ce971fb1bc3a7f1cbaa38f1ed853e2db3d67c18
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43742
We can remove all prim::profile nodes, update the values to their specialized profiled types, and then later guard the fusion groups based on their input types. After that we remove the specialized tensor types from the graph. This gets rid of having to update the vmap and removes all of the profile nodes during fusion.
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D23385206
Pulled By: eellison
fbshipit-source-id: 2c84bd1d1c38df0d7585e523c30f7bd28f399d7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43633
In the backward graph, _grad_sum_to_size is inserted whenever a possibly broadcasting op is called:
`"aten::_grad_sum_to_size(Tensor(a) self, int[]? size) -> Tensor(a)"`
If a broadcast occurred, a sum is called, otherwise the second input is None and it is a no-op. Most of the time, it's a no-op (in the fast RNNs benchmark > 90% of the time).
We can get rid of this op by profiling the optionality of the second input. I added `prim::profile_optional` to do this, which counts the number of times it saw a None value and the number of times it saw a value present. When specializing the backward graph, we insert checks for values we profiled as None, and in the optimized block we can remove the grad_sum_to_size calls that use those values.
In the future we may revisit this when NNC supports reductions and we want to replace grad_sum_to_size with sums as well, but I think this is worth landing now.
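A made-up example of where this shows up: in the backward of a broadcasting multiply, only the broadcast side actually needs the sum, so the `size` argument is None most of the time:
```python
import torch

@torch.jit.script
def f(x, y):
    return (x * y).sum()

x = torch.randn(4, 4, requires_grad=True)
y = torch.randn(4, requires_grad=True)  # broadcasts against x

for _ in range(3):  # warm up so the backward graph gets profiled
    f(x, y).backward()

# In the differentiable backward graph, the grad w.r.t. y goes through
# aten::_grad_sum_to_size with a real size (y was broadcast), while the grad
# w.r.t. x gets None for the size argument.
```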
Test Plan: Imported from OSS
Reviewed By: bwasti, ZolotukhinM
Differential Revision: D23358809
Pulled By: eellison
fbshipit-source-id: a30a148ca581370789d57ba082d23cbf7ef2cd4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43632
Specialize the backward graph by guarding on the undefinedness of the input tensors. The graph will look like:
```
ty1, ty2, successful_checks = prim::TypeCheck(...)
if (successful_checks)
-> optimized graph
else:
-> fallback graph
```
Specializing on the undefinedness of tensors allows us to clean up the
```
if any_defined(inputs):
outputs = <original_computation>
else:
outputs = autograd zero tensors
```
blocks that make up the backward graph, so that we can fuse the original_computation nodes together.
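A made-up example of where undefined tensors come from in the backward: an output that never receives a gradient.
```python
import torch

@torch.jit.script
def f(x, y):
    return x * 2, y * 3

x = torch.randn(4, requires_grad=True)
y = torch.randn(4, requires_grad=True)

a, b = f(x, y)
# Only `a` contributes to the loss, so the incoming grad for `b` is an
# undefined (autograd zero) tensor in the backward of the DifferentiableGraph;
# the guard shown above lets us specialize on exactly that.
a.sum().backward()
```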
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D23358808
Pulled By: eellison
fbshipit-source-id: f5bb28f78a4a3082ecc688a8fe0345a8a098c091
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43631
I added a new test for just profiler stuff - I don't think the test should go in test_jit.py. Maybe this should just go in test_tensorexpr_fuser, but I'm not really testing tensorexpr stuff either... LMK
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23358810
Pulled By: eellison
fbshipit-source-id: 074238e1b60e4c4a919a052b7a5312b790ad5d82