pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Jiewen Tan	dc37090ec5	[LT] Support diagonal op (#75230 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75230 Op diagonal is a view op which we can't code-gen yet. Therefore, support it by making hand-written IR construction and lowering. Test Plan: ./build/bin/test_lazy --gtest_filter=LazyOpsTest.TestDiagonal* Reviewed By: wconstab Differential Revision: D35378316 Pulled By: alanwaketan fbshipit-source-id: 7958d00107aef20ac37aabcf2868346240977530 (cherry picked from commit 84155528fce484627c9688cfd92fd4aeb68219e5)	2022-04-08 19:49:42 +00:00
Nikolay Korovaiko	4a85145bbd	Ansley's rebase of DimensionNode onto master (#75352 ) Summary: Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/75352 Reviewed By: wconstab Differential Revision: D35455859 Pulled By: Krovatkin fbshipit-source-id: e24c81d63dc66d03b752cc8de5cb551d84b003ac (cherry picked from commit 4ad371cb4cc88860ce8ec398d82083f6759e3fcf)	2022-04-08 17:22:56 +00:00
John Clow	f1db3e465a	Adding integration of SSA into LazyTensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/75050 Approved by: https://github.com/Krovatkin	2022-04-07 19:49:41 +00:00
Pavithran Ramachandran	3001bda304	[PyTorchEdge] Backport from v9 flatbuffer to v8 pickle (#75201 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75201 In this diff: 1. Bump supported version to 9, which will serve as a placeholder for upcoming version bump to v9 for flatbuffer format migration. 2. Implements backport from v9 flatbuffer file to v8 pickle file. ghstack-source-id: 153225189 (Note: this ignores all push blocking failures!) Test Plan: fb: ``` cd ~/fbsource/fbcode/ && buck test -c fbcode.caffe2_enable_flatbuffer=1 caffe2/test/cpp/jit:jit -- LiteInterpreterTest.BackPortByteCodeModelAllVersions Parsing buck files: finished in 0.7 sec Downloaded 0/25 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules) Building: finished in 20.7 sec (100%) 21783/21783 jobs, 5/21783 updated cd ~/fbsource/fbcode/ && buck test caffe2/test/cpp/jit:jit -- FlatbufferTest.FlatbufferBackPortTest Parsing buck files: finished in 0.7 sec Building: finished in 4.5 sec (100%) 12972/53298 jobs, 0/53298 updated Total time: 5.3 sec More details at https://www.internalfb.com/intern/buck/build/b658d597-d358-4293-97cb-28e7612b96e8 BUILD SUCCEEDED Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details. 
Running with tpx session id: 35d5542d-6ee3-4c28-be10-1d822c7a6fef Trace available for this run at /tmp/tpx-20220308-090347.891303-35d5542d-6ee3-4c28-be10-1d822c7a6fef/trace.log RemoteExecution session id: reSessionID-35d5542d-6ee3-4c28-be10-1d822c7a6fef-tpx Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000 ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 490 tests discovered (22.838) ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.FlatbufferBackPortTest (0.289) Summary Pass: 1 ListingSuccess: 1 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000 ``` Reviewed By: iseeyuan Differential Revision: D34702597 fbshipit-source-id: 5c203c29d13360d7934ce6e57557739e7038c05e (cherry picked from commit 6189e08a2bd968fdab636f77cb6bd73d6c36beb2)	2022-04-07 19:43:57 +00:00
Wang, Eikan	252e1ccce6	Enable TE fuser to support user defined operator (#73073 ) Summary: PyTorch supports registering a custom operator via `TORCH_LIBRARY_FRAGMENT` / `TORCH_LIBRARY_IMPL`, and `torch::jit::tensorexpr::getNNCLoweringRegistry` could insert a custom operator. But the TE fuser pass's conditional check does not support custom operators. The `isSupported` function of `tensorexpr_fuser` checks whether the `Node` is in `get_tensorexpr_elementwise_set()`, `supported_non_eltwise_set()`, `supported_misc_set` or `supported_reduction_set`. If a custom operator needs to be added to the TE fusion group, this check will block it. Taking RN50 as an example, we can speed up the model by fusing the convolution and the consecutive element-wise operators into a custom operator. The framework overhead becomes non-negligible when the computation becomes more efficient, especially for latency mode and tiny models. If the TE fuser allows adding the custom operator to the fusion group, then the entire RN50 model could be fused by TE as a single operator/function consisting of "ExternalCalls" and TE-IR. This could significantly reduce framework overhead, which in turn improves RN50 E2E performance. The same goes for other models. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73073 Reviewed By: pbelevich Differential Revision: D35453165 Pulled By: ZolotukhinM fbshipit-source-id: a764cf340b0b1e05fe230649cbe44f5786bdd37d (cherry picked from commit ee95aa4d36714540fbb216a338799e6a6bb966d5)	2022-04-07 04:36:39 +00:00
Martin Yuan	00c1e01ad0	Remove internal logic to handle bytecode version 3 (#57775 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57775 The minimum supported bytecode version is updated from 3 to 4. We no longer support version 3 bytecode models. Why? * There are hacky codes in operator loading, that performs differently on one operator on the global bytecode version 3. Instead operator related metadata should be passed (for example, in #56845). To allow future development, we remove the hacky way first. * The bytecode version was bumped from 3 to 4 more than half a year ago. Since all the production models are all bumped to version 4, it's not practical to keep and maintain version 3. The risk to deprecate version 3 is low. Test Plan: Imported from OSS Reviewed By: raziel Differential Revision: D28270791 Pulled By: cccclai fbshipit-source-id: 70b1bd6352fdaae5f8d2173b81578d77018c8e44 (cherry picked from commit 3e930fa381cd01f3705116795c6426df992372fc)	2022-04-07 01:45:52 +00:00
Pavithran Ramachandran	f984e50f39	Extend jit::load to work on flatbuffer file; Take 2 (#75256 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75256 ghstack-source-id: 153138970 Test Plan: CI Reviewed By: iseeyuan Differential Revision: D35399581 fbshipit-source-id: dafe9d301009d3f70986ed92bfe06d160ab90ba0 (cherry picked from commit ccc860fd07946de5aae12bc179a0b8bbba83b997)	2022-04-06 17:54:01 +00:00
John Clow	26dcec152c	Added support for SSA for ops not in a JIT graph Pull Request resolved: https://github.com/pytorch/pytorch/pull/74340 Approved by: https://github.com/eellison	2022-04-06 01:45:37 +00:00
Antonio Kim	e1b4117e30	Move shape and operand definitions to base node (#75223 ) Summary: First stage of breaking up https://github.com/pytorch/pytorch/pull/74710 Moves the shape and operand definitions from `TsNode` to the base `Node` CC: wconstab JackCaoG henrytwo Partially Fixes https://github.com/pytorch/pytorch/issues/74628 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75223 Reviewed By: zou3519 Differential Revision: D35410285 Pulled By: wconstab fbshipit-source-id: bb84d3fb636882cbe7e18af4b35ff2c0e22aaa58 (cherry picked from commit a4144c9a48379d8a9007cff845796608b597cce1)	2022-04-06 01:43:46 +00:00
Lu Fang	32e58c73c4	Back out "Extend jit::load to work on flatbuffer file" (#75244 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75244 Original commit changeset: d653a5af662a Original Phabricator Diff: D35060736 (`d9d34922a0`) Test Plan: Model loading test, verified that D35060736 (`d9d34922a0`) will cause the torch::save => torch::load failure. Reviewed By: yinghai, jianyuh Differential Revision: D35387009 fbshipit-source-id: 9d176992d402d57779e2af3d905b3c1538335298 (cherry picked from commit 6c8cc0d3b8a88b15e35702d70e18bbae8aa4628a)	2022-04-05 09:55:04 +00:00
Nikita Shulga	81d765ef1f	Fix sign-compare violations in cpp tests Prerequisite change for enabling `-Werror=sign-compare` across PyTorch repo Pull Request resolved: https://github.com/pytorch/pytorch/pull/75080 Approved by: https://github.com/atalman	2022-04-04 23:05:31 +00:00
Chen Lai	6efc5c1acf	Rewrite upgrader bytecode version from 3 to 4 (content unchanged) (#75120 ) Summary: update the upgrader models by hacking backport logic - copy everything in the model and only rewrite the bytecode version to 4 in D35265596 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75120 ghstack-source-id: 152823046 Test Plan: CI Reviewed By: qihqi Differential Revision: D35321154 fbshipit-source-id: 333158bd0fd9b4819b3b7cf47d80c285934adf3e (cherry picked from commit 74bb2da73a4d18f448b8486772643eac89eb759a)	2022-04-02 01:51:39 +00:00
Pavithran Ramachandran	d9d34922a0	Extend jit::load to work on flatbuffer file (#75022 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75022 Extending torch::jit::load to read flatbuffer file ghstack-source-id: 152820697 Test Plan: CI Reviewed By: iseeyuan Differential Revision: D35060736 fbshipit-source-id: d653a5af662a46107ff4fd70209fd2a0a4d40f20 (cherry picked from commit 109e14a54bd279011c8f9066e6c29e8e0b1fc4db)	2022-04-02 01:33:34 +00:00
Pavithran Ramachandran	7aaa75af05	Extending _get_bytecode_version to support flatbuffers format (#75021 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75021 Extending `_get_bytecode_version` to support flatbuffers. ghstack-source-id: 152771695 (Note: this ignores all push blocking failures!) Test Plan: ``` ~/fbsource/xplat] cd ~/fbsource/xplat/ && buck test //xplat/caffe2:test_lite_interpreter Building: finished in 0.8 sec (100%) 327/327 jobs, 0/327 updated Total time: 0.9 sec Testing: finished in 06:59.5 min (85 PASS/0 FAIL) BUILD SUCCEEDED RESULTS FOR //xplat/caffe2:test_lite_interpreter PASS 412.3s 85 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_interpreter TESTS PASSED ``` Reviewed By: iseeyuan Differential Revision: D34900498 fbshipit-source-id: 65743076d43a933c5381ec128d0268f22c0a8441 (cherry picked from commit 457c76c7d1df6050b941c56a8198162e2e4a3388)	2022-04-01 15:05:37 +00:00
Will Constable	b9e535a64a	Add non-eager registration to dispatch autogen (#74557 ) Summary: Previously, the torchscript backend would be (partially) initialized at startup. - the dispatcher registrations would be registered, - but other backend components would not be initialized until explicitly calling the backend init function With this change, the torchscript backend is not initialized until its explicit initialization function is called. This enables external backends to register their own backend instead of the torchscript backend to the same (Lazy) key. Lands a change contributed by antoniojkim via lazy_tensor_staging branch (https://github.com/pytorch/pytorch/issues/73973) Pull Request resolved: https://github.com/pytorch/pytorch/pull/74557 Reviewed By: bdhirsh Differential Revision: D35051464 Pulled By: wconstab fbshipit-source-id: 5a8b0851293e394f49427d1416ee571a8881fe9f (cherry picked from commit ef745a4a2c8d1d7f9510541a20f1f40625ce29de)	2022-04-01 03:42:53 +00:00
Will Constable	14affba799	Fix ir_metadata Python frames func and remove dead code (#74979 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74979 Reviewed By: alanwaketan Differential Revision: D35261641 Pulled By: wconstab fbshipit-source-id: e82b5f17d0043c4a3de72c16fb42fd02a85414fe (cherry picked from commit fc6c0a1654256871361a5ad08926bc39d74cd0c5)	2022-03-31 23:23:36 +00:00
Nikolay Korovaiko	5177f95d21	Introducing SymInt to Pytorch (for tracing size arithmetic) (master rebase) (#74861 ) Summary: This PR introduces `SymInt` type to Pytorch which will be used by LTC and AOTAutograd for tracing size arithmetic and tests. `SymInt` is a C++ union structure [int64_t, SymbolicIntNode*] that wraps around an int64_t field where the value of the field could be an index into a list of `shared_ptr<SymbolicIntNode>` or a real int. This PR doesn't add any support for actually tracing symbolic ints. i.e. data_ for now can only contain real ints. ``` Goal 1: just to show we can add a type to PyTorch core. (wraps int) LANDEABLE Finalize the naming - symint Want the name to be short Does invoke “size” - NO SInt/SymInt/SymbolicInt SInt could mean signed int sym_int or symint or SymInt (originally it was “int”; capitalized implies object semantics, whereas lowercase implies value semantics) JIT schema - symint C++ - symint ``` See more details here: https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8 (`d843f63f2a`)YLw-jxEw Pull Request resolved: https://github.com/pytorch/pytorch/pull/74861 Reviewed By: qihqi, ngimel Differential Revision: D35226230 Pulled By: Krovatkin fbshipit-source-id: 34acf342bd50fcaa4d8d5dd49c2fd6a98823a5b3 (cherry picked from commit 218643f63ef181cabb92d13a6e837eb64f2dda3c)	2022-03-31 21:59:59 +00:00
jjsjann123	873ced7cd0	Nvfuser code bump 030122 (#73627 ) Summary: Things changed in this PR that require review: test/forward_backward_compatibility/check_forward_backward_compatibility.py Our previous function overload extension names were wrong and have been updated in this PR, hence the compatibility list is updated. nvfuser code updates with bug fixes for failures we encountered in OpInfoTests as well as failures reported by the AOTAutograd team. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73627 Reviewed By: Chillee Differential Revision: D34765458 Pulled By: davidberard98 fbshipit-source-id: c81f3d6a1b723fb3a8ba419b7f82227f70440ca7 (cherry picked from commit b6a2c362c37051e44fac31687b2fe272f776551e)	2022-03-31 08:18:22 +00:00
Nikita Shulga	43313cbde3	Revert D34647822: [tensorexpr] Add support for aten::stack Test Plan: revert-hammer Differential Revision: D34647822 (`954c7e2a77`) Original commit changeset: 3b863c71886c Original Phabricator Diff: D34647822 (`954c7e2a77`) fbshipit-source-id: e9ce06c9c8d7caf0fbb2565f0d99035bad685793 (cherry picked from commit b2ff355e9dbaa4e940fb221254223984c3c8a215)	2022-03-31 04:25:43 +00:00
Nikita Shulga	320e5a8268	Revert D34808051: [tensorexpr] Enabled aten::stack in the fuser pass with static shapes Test Plan: revert-hammer Differential Revision: D34808051 Original commit changeset: 213e2ffdf87f Original Phabricator Diff: D34808051 fbshipit-source-id: b618daeb346f784e8ab9525040edcb4a30a39613 (cherry picked from commit e47b973cba5c95e9410f8aecdfd5619de6d4be7c)	2022-03-31 04:25:43 +00:00
Hui Guo	90c3699cc8	[tensorexpr] Enabled aten::stack in the fuser pass with static shapes (#74077 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74077 Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D34808051 Pulled By: huiguoo fbshipit-source-id: 213e2ffdf87fb1a74104037cea7ef25e4bfd4307 (cherry picked from commit ad9e84842e5b47eda845827d325b08ba361a8286)	2022-03-31 04:25:43 +00:00
Elias Ellison	2ef5611f31	Add comments for adding shape function and linting (#73570 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73570 Approved by: https://github.com/huiguoo Test Plan: contbuild & OSS CI, see `6d36bbde7e` Reviewed By: pbelevich Differential Revision: D35192688 Pulled By: atalman fbshipit-source-id: b12b80e6a6dd1adaa57a8facb6bb077989faa543 (cherry picked from commit e50478c02592597f12b8490ec5496f76c7d8b8cc)	2022-03-31 04:25:43 +00:00
Nikita Shulga	3036a0309d	[skip ci]Revert "Add comments for adding shape function and linting" This is a technical revert of `6d36bbde7e` to reconcile it with e50478c02592597f12b8490ec5496f76c7d8b8cc (which is the same + lint changes applied) Should be skipped during import	2022-03-30 21:21:28 -07:00
Hui Guo	954c7e2a77	[tensorexpr] Add support for aten::stack (#73801 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73801 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D34647822 Pulled By: huiguoo fbshipit-source-id: 3b863c71886c7c6616b16f5d3313079714c8b82a (cherry picked from commit c71778cf6a5724d26b671bf3ee0478add24990e8)	2022-03-30 21:25:15 +00:00
Dave Bort	f82b2d4a82	[PyTorchEdge] Make _load_parameters() handle flatbuffer inputs (#74580 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74580 Handle Flatbuffer-serialized parameters. Make `_load_parameters()` detect the input data format and use the correct deserializer to load the parameters. Also, rename `BytecodeDeserializer` to `IValueUnpickler` to make it clear that it unpickles an `IValue` and doesn't have anything to do with bytecode. ghstack-source-id: 152487890 Test Plan: New unit test shows a successful round trip from _save_parameters() to _load_parameters() using flatbuffers. ``` $ buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer Building: finished in 0.5 sec (100%) 346/346 jobs, 0/346 updated Total time: 0.6 sec Testing: finished in 0.5 sec (26 PASS/0 FAIL) BUILD SUCCEEDED RESULTS FOR //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer PASS <100ms 13 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_trainer PASS <100ms 13 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer TESTS PASSED ``` Reviewed By: qihqi Differential Revision: D34488913 fbshipit-source-id: 8d2c0b895699f3b336115d33bf96d49cbf9245d2 (cherry picked from commit 319345deff260826197f8cdf5ac03071b412c72f)	2022-03-30 20:39:58 +00:00
Dave Bort	1659a267f9	[PyTorchEdge] Export flatbuffers from _save_parameters() (#74579 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74579 Now that we can convert a module to a flatbuffer, update `_save_parameters()` to optionally write to that format. Also, rename the internal `ScriptModuleSerializer` class to `IValuePickler` to make it more clear that a) it's pickle-specific, and b) it serializes IValues, not Modules. ghstack-source-id: 152487889 Test Plan: New unit test shows that we can produce Flatbuffer-formatted output. ``` $ buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer Building: finished in 0.5 sec (100%) 346/346 jobs, 0/346 updated Total time: 0.6 sec Testing: finished in 0.5 sec (26 PASS/0 FAIL) BUILD SUCCEEDED RESULTS FOR //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer PASS <100ms 13 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_trainer PASS <100ms 13 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer TESTS PASSED ``` A new test in later commit D34488913 tests the full round trip. Reviewed By: qihqi Differential Revision: D34408538 fbshipit-source-id: eea183c31b5e1b2b75a65f384d8a479223a4ae72 (cherry picked from commit de310a15422b65fb7e443f7005d287d9f5f586bc)	2022-03-30 20:39:58 +00:00
Elias Ellison	6d36bbde7e	Add comments for adding shape function and linting Pull Request resolved: https://github.com/pytorch/pytorch/pull/73570 Approved by: https://github.com/huiguoo	2022-03-29 23:02:22 +00:00
Elias Ellison	9c4a63787b	Add api for changing function executor settings, hook up execution with decomposition registry (#74186 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74186 Make the execution settings mutable on function_impl so that we can set it for running op decompositions. Add mapping to function objects and show example in test of executing op decompositions. Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D34938125 Pulled By: eellison fbshipit-source-id: adf108b2f6c1bd166910c6d7b94245661d67ce0d (cherry picked from commit 9957e33803002d9e71abe4ff802769270b6960d3)	2022-03-29 18:38:52 +00:00
Elias Ellison	0ecf1add1b	Introduce function-local settings for executor, expose in c++ (#74012 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74012 This allows setting an executor on a function. The first use case is to use decompositions in C++ without additional fusion passes etc., which might not work with custom tensors like batched tensors/vmap. A subsequent use case might be taking advantage of invokees of JIT execution which guard on certain properties before invocation (such as complete shapes in AOT autograd, rank in lazy tensor). Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D34938124 Pulled By: eellison fbshipit-source-id: cf7a45416457942b872322cab47d871a8336bdb5 (cherry picked from commit 9c600eb9ad0f2173f003e511268e97584edae36d)	2022-03-29 18:38:52 +00:00
Elias Ellison	6694fdaccd	Clean up profiling mode and profiling executor strategy (#73875 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875 Previously we had a few settings: - getExecutor - which toggled between Profiling Executor and Legacy - getGraphOptimize - if true, overrides PE/Legacy to run with simple executor (no optimizations) and then... - getProfilingMode - which would set PE to 0 specializations. The last mode is redundant with getGraphOptimize; we should just remove it and use getGraphOptimize in these cases. It would lead to potentially invalid combinations of logic - what does it mean if getProfilingMode is true but getExecutor is set to false? This would lead to a bug in specialize_autograd_zero in this case, see: https://github.com/pytorch/pytorch/blob/master/torch%2Fcsrc%2Fjit%2Fpasses%2Fspecialize_autogradzero.cpp#L93. The tests here are failing but get fixed with the PR above it, so I'll squash for landing. Test Plan: Imported from OSS Reviewed By: cpuhrsch Differential Revision: D34938130 Pulled By: eellison fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b (cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)	2022-03-29 18:38:51 +00:00
Kurt Mohler	5375b2e994	Resolve `int[]?` arguments to new OptionalIntArrayRef class This PR uses the `OptionalArrayRef` template class that was drafted in #64084. Fixes #44409 Pull Request resolved: https://github.com/pytorch/pytorch/pull/70864 Approved by: https://github.com/ezyang	2022-03-26 01:45:50 +00:00
Pavithran Ramachandran	fc2cf3d26f	Back out "Revert D34805092: Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support `only pickle` and `pickle + flatbuffer` for migration" (#74594 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74594 Extending `_save_for_mobile` and `_load_for_mobile` to support flatbuffer format with additional optional argument which is set to pick pickle by default. Adding new binary target with suffix `_pickle_and_flatbuffer` to help migration. Size test in D34909502 shows the size has regressed by ~40K but after removing pickle and comparing lite_predictors we have ~120K size measure that we will achieve when deprecating pickle and moving to flatbuffer BEFORE: ```lang=mermaid graph TD; torch_core-->torch_mobile_deserialize; torch_mobile_core-->torch_mobile_deserialize; jit_module_saving-->torch_core; jit_module_saving-->torch_mobile_core; torch_mobile_deserialize-->caffe2_serialize; torch_mobile_deserialize-->torch_mobile_module; caffe2_serialize-->miniz; flatbuffer_loader-->mobile_bytecode; flatbuffer_serializer-->mobile_bytecode; mobile_bytecode-->flatbuffer_2.0; flatbuffer_loader-->torch_mobile_module; flatbuffer_serializer-->torch_mobile_module; ``` AFTER: ```lang=mermaid graph TD; torch_core-->torch_mobile_deserialize; torch_mobile_core-->torch_mobile_deserialize; jit_module_saving-->torch_core; jit_module_saving-->torch_mobile_core; torch_mobile_deserialize-->caffe2_serialize; torch_mobile_deserialize-->torch_mobile_module; caffe2_serialize-->miniz; flatbuffer_loader-->mobile_bytecode; flatbuffer_serializer-->mobile_bytecode; mobile_bytecode-->flatbuffer_2.0; torch_mobile_deserialize_pickle_and_flatbuffer-->\|new\| flatbuffer_loader; torch_mobile_deserialize_pickle_and_flatbuffer-->\|new\| torch_mobile_deserialize; torch_mobile_core_pickle_and_flatbuffer-->\|new\| torch_mobile_deserialize_pickle_and_flatbuffer; 
torch_core_pickle_and_flatbuffer-->\|new\| torch_mobile_deserialize_pickle_and_flatbuffer; jit_module_saving_pickle_and_flatbuffer-->\|new\| torch_core_pickle_and_flatbuffer; jit_module_saving_pickle_and_flatbuffer-->\|new\| torch_mobile_core_pickle_and_flatbuffer; flatbuffer_serializer-->torch_mobile_module; jit_module_saving_pickle_and_flatbuffer-->\|new\|jit_module_saving; jit_module_saving_pickle_and_flatbuffer-->\|new\|flatbuffer_serializer; flatbuffer_loader-->torch_mobile_module; ``` Original commit changeset: 780dfb6fd6ba Original Phabricator Diff: D34805092 (`284b2b7135`) ghstack-source-id: 152044801 (Note: this ignores all push blocking failures!) Test Plan: CI ``` ~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test -c fbcode.caffe2_enable_flatbuffer=1 //caffe2/test/cpp/jit:jit -- FlatbufferTest.ExtraFiles Parsing buck files: finished in 0.9 sec Building: finished in 5.3 sec (100%) 12992/54304 jobs, 0/54304 updated Total time: 6.2 sec More details at https://www.internalfb.com/intern/buck/build/2b387fff-f813-4cfa-b53f-eb2378630d4e BUILD SUCCEEDED Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details. 
Running with tpx session id: f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d Trace available for this run at /tmp/tpx-20220323-134108.766518-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d/trace.log RemoteExecution session id: reSessionID-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d-tpx Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693 ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 486 tests discovered (19.122) ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.ExtraFiles (0.187) Summary Pass: 1 ListingSuccess: 1 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693 ``` Similar Build Deps Dags ``` [pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops_pickle_and_flatbuffer, //xplat/caffe2:torch_mobile_deserialize_pickle_and_flatbuffer)' --output-format dot-compact \| pastry P486770901: https://www.internalfb.com/intern/paste/P486770901/ [pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops, //xplat/caffe2:torch_mobile_deserialize)' --output-format dot-compact \| pastry P486771278: https://www.internalfb.com/intern/paste/P486771278/ ``` pickle_and_flatbuffer: https://www.internalfb.com/intern/dgw/graph/?build_id=P486770901 pickle: https://www.internalfb.com/intern/dgw/graph/?build_id=P486771278 Reviewed By: iseeyuan Differential Revision: D35067157 fbshipit-source-id: 9044259c17a2e0da79bd6aedb28efbdfd57e23e0 (cherry picked from commit f738069ec3a72e79da56172741d027de514e9e5f)	2022-03-24 21:51:05 +00:00
Will Constable	3547f20872	Land remaining parts of Torchscript Lazy Tensor backend (#74111 ) Summary: Also enables bazel build to run lazy codegen. Bazel (oss) build feeds off the same filelists as cmake/buck (build_variables.bzl), so enabling it is easier than keeping it disabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74111 Test Plan: Run CI and verify test_lazy_ops is running via OSS cmake builds Reviewed By: bdhirsh Differential Revision: D34772403 fbshipit-source-id: 8a63f58b9536e6ac1be530667932176ef2549496 (cherry picked from commit e807ffb1918853d10b924fdc24f85ee5b1a39021)	2022-03-22 23:14:03 +00:00
Nikita Shulga	c53b3ed20f	Revert D34805092: Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support `only pickle` and `pickle + flatbuffer` for migration Test Plan: revert-hammer Differential Revision: D34805092 (`284b2b7135`) Original commit changeset: 57f3fc81d68f Original Phabricator Diff: D34805092 (`284b2b7135`) fbshipit-source-id: 780dfb6fd6ba5f9348f24a2fb3c57971b7155541 (cherry picked from commit bebeb8b84e11c34cbde4857d0e1c291731a7c781)	2022-03-22 22:45:50 +00:00
Pavithran Ramachandran	284b2b7135	Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support `only pickle` and `pickle + flatbuffer` for migration (#74209 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74209 Extending `_save_for_mobile` and `_load_for_mobile` to support flatbuffer format with additional optional argument which is set to pick pickle by default. Adding new binary target with suffix `_pickle_and_flatbuffer` to help migration. Size test in D34909502 shows the size has regressed by ~40K but after removing pickle and comparing lite_predictors we have ~120K size measure that we will achieve when deprecating pickle and moving to flatbuffer BEFORE: ```lang=mermaid graph TD; torch_core-->torch_mobile_deserialize; torch_mobile_core-->torch_mobile_deserialize; jit_module_saving-->torch_core; jit_module_saving-->torch_mobile_core; torch_mobile_deserialize-->caffe2_serialize; torch_mobile_deserialize-->torch_mobile_module; caffe2_serialize-->miniz; flatbuffer_loader-->mobile_bytecode; flatbuffer_serializer-->mobile_bytecode; mobile_bytecode-->flatbuffer_2.0; flatbuffer_loader-->torch_mobile_module; flatbuffer_serializer-->torch_mobile_module; ``` AFTER: ```lang=mermaid graph TD; torch_core-->torch_mobile_deserialize; torch_mobile_core-->torch_mobile_deserialize; jit_module_saving-->torch_core; jit_module_saving-->torch_mobile_core; torch_mobile_deserialize-->caffe2_serialize; torch_mobile_deserialize-->torch_mobile_module; caffe2_serialize-->miniz; flatbuffer_loader-->mobile_bytecode; flatbuffer_serializer-->mobile_bytecode; mobile_bytecode-->flatbuffer_2.0; torch_mobile_deserialize_pickle_and_flatbuffer-->\|new\| flatbuffer_loader; torch_mobile_deserialize_pickle_and_flatbuffer-->\|new\| torch_mobile_deserialize; torch_mobile_core_pickle_and_flatbuffer-->\|new\| torch_mobile_deserialize_pickle_and_flatbuffer; torch_core_pickle_and_flatbuffer-->\|new\| 
torch_mobile_deserialize_pickle_and_flatbuffer; jit_module_saving_pickle_and_flatbuffer-->\|new\| torch_core_pickle_and_flatbuffer; jit_module_saving_pickle_and_flatbuffer-->\|new\| torch_mobile_core_pickle_and_flatbuffer; flatbuffer_serializer-->torch_mobile_module; jit_module_saving_pickle_and_flatbuffer-->\|new\|jit_module_saving; jit_module_saving_pickle_and_flatbuffer-->\|new\|flatbuffer_serializer; flatbuffer_loader-->torch_mobile_module; ``` ghstack-source-id: 151744258 Test Plan: Similar Build Deps Dags ``` [pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops_pickle_and_flatbuffer, //xplat/caffe2:torch_mobile_deserialize_pickle_and_flatbuffer)' --output-format dot-compact \| pastry P486770901: https://www.internalfb.com/intern/paste/P486770901/ [pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops, //xplat/caffe2:torch_mobile_deserialize)' --output-format dot-compact \| pastry P486771278: https://www.internalfb.com/intern/paste/P486771278/ ``` pickle_and_flatbuffer: https://www.internalfb.com/intern/dgw/graph/?build_id=P486770901 pickle: https://www.internalfb.com/intern/dgw/graph/?build_id=P486771278 Reviewed By: iseeyuan Differential Revision: D34805092 fbshipit-source-id: 57f3fc81d68fce941a050c35bd8e6f05951183b3 (cherry picked from commit 671ae4ed29e65b86ffe507a503548d3e86ab0ea4)	2022-03-22 20:00:53 +00:00
Han Qi	4b4f652f79	[3/5] Put JIT source inside flatbuffer (#74245 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74245 title Test Plan: unittest Reviewed By: iseeyuan Differential Revision: D34881612 fbshipit-source-id: 7037982e9267ad72b86e91cd5f2d92426d71dd56 (cherry picked from commit 88f34eb55b2bee6ef8ef27188e075fa2b8767fdf)	2022-03-17 18:46:47 +00:00
Will Constable	d67a265881	Sync lazy_tensor_staging to master (#74311 ) Summary: This merges changes that have already been reviewed/landed onto lazy_tensor_staging branch. It combines changes from multiple PRs into one diff. updated from lazy_tensor_staging on 3/16 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74311 Test Plan: Run CI to ensure compilation on various platforms Run unit tests on lazy_tensor_staging branch with source version of all these diffs Reviewed By: desertfire Differential Revision: D34929235 fbshipit-source-id: babbc3bbeabc5b8107ee9284ed7765887a148622 (cherry picked from commit d91577a6557343ec536f6859e4808ec1a8a9b685)	2022-03-17 16:08:57 +00:00
Will Constable	44a8d4d998	Add lazy tensor unit tests, disabled (#74309 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74309 Since the test file is large, it can be landed on its own and then switched on in the diff that actually builds lazy tensor code. Test Plan: verify CI passes Reviewed By: desertfire Differential Revision: D34928619 fbshipit-source-id: cd556155326f7fb55b3f29031f80bc36c936d565 (cherry picked from commit 60945adbefb6a8d19f89e330f8b344d076b13bfc)	2022-03-17 15:31:26 +00:00
Will Constable	72b1194464	Run lazy tensor codegen in generate_code.py (#73996 ) Summary: Hooks into existing autograd codegen script (generate_code.py) to take advantage of its integrations into buck/cmake/bazel. Adds a new option (--gen_lazy_ts_backend) to generate_code.py, calling this from the CMake OSS build and the fbcode build, but not from other internal xplat/ovrsource builds (these could be opted in later). Bazel support is added in a later diff. Includes one generated file (torch/csrc/lazy/generated/LazyIr.h) in a unit test (test/cpp/lazy/test_ir.cpp) to partially verify the generator is working, but does not compile the remaining output sources from the generator yet as they depend on other files not yet landed from lazy_tensor_staging branch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73996 Test Plan: OSS/internal CI - verify all builds are working and test_ir.cpp compiles LazyIr.h Reviewed By: ezyang Differential Revision: D34408536 fbshipit-source-id: 8af0aea3b95d81eccafc17d64390d70ddd176515 (cherry picked from commit f930612f2bad61c76eb02d85cfbec9f33a1459dc)	2022-03-17 15:31:26 +00:00
Han Qi	ded82ad7c7	Create method to map JIT module to (source, constant) and back. (#74119 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74119 Implemented a function to generate source as ExtraFilesMap and constants. Wrote a function to construct a JIT module given an (ivalue, source, constant) triple. Test Plan: unittest Reviewed By: pavithranrao Differential Revision: D34803945 fbshipit-source-id: 2edc798407fe68294cb4c3c7516f5bd143df88c3 (cherry picked from commit 35e54e166b8f0f5cfe8f08c07866b59ae61ee79d)	2022-03-15 18:30:08 +00:00
Taylor Robie	0b1f3bd158	[Profiler] Prefer TSC to wall clock when available (#73855 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73855 Calling the clock is one of the most expensive parts of profiling. We can reduce the profiling overhead by using `rdtsc` instead. The tradeoff is that we have to measure and convert (shift and scale). Test Plan: I added a cpp unit test with very aggressive anti-flake measures. I also ran the overhead benchmark (9 replicates) with `--stressTestKineto` (0.94 -> 0.89 us) and `--stressTestKineto --kinetoProfileMemory` (1.27 -> 1.17 us) Reviewed By: chaekit Differential Revision: D34231071 fbshipit-source-id: e3b3dd7580d93bcc783e87c7f2fc726cb74f4df8 (cherry picked from commit e8be9f8160793c6ee35d5af02bca3e01703e377d)	2022-03-13 18:29:06 +00:00
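The "measure and convert (shift and scale)" step mentioned in 0b1f3bd158 can be pictured as a linear calibration between a raw tick counter and wall-clock nanoseconds. The sketch below is only an illustration under stated assumptions: Python cannot read `rdtsc` directly, so a simulated counter (`fake_ticks`, hypothetical) stands in for it, and `make_converter` is an illustrative name, not part of the profiler.

```python
def make_converter(t0_ticks, t0_ns, t1_ticks, t1_ns):
    """Build a ticks -> nanoseconds function from two paired samples."""
    # Scale: nanoseconds per tick, measured over the calibration window.
    scale = (t1_ns - t0_ns) / (t1_ticks - t0_ticks)

    def to_ns(ticks):
        # Shift to the first sample, then scale into nanoseconds.
        return t0_ns + (ticks - t0_ticks) * scale

    return to_ns


def fake_ticks(ns):
    """Simulated counter (assumption): 3 ticks per ns with an arbitrary offset."""
    return 1000 + 3 * ns


# Calibrate once using two (ticks, wall-clock) pairs, then convert cheaply.
to_ns = make_converter(fake_ticks(0), 0, fake_ticks(1_000_000), 1_000_000)
```

The point of the design is that the expensive wall-clock call happens only during calibration; every later timestamp needs just a counter read plus one subtraction and one multiplication.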
Taylor Robie	5a58820f01	[Profiler] Specialized AppendOnlyQueue (#73409 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73409 We can do better than `vector` or `deque`, and it's sufficiently important to the hot path to justify a custom container. (This is part of the larger queue refactor, but this is a standalone drop-in replacement so we don't need to wait.) Test Plan: It's a pretty simple container type, so I just added a few cpp tests for emplace and read back. I also ran the overhead benchmark (replicates=9) with both `--stressTestKineto` (0.99 -> 0.94 us) and `--stressTestKineto --kinetoProfileMemory` (1.36 -> 1.27 us). Reviewed By: swolchok Differential Revision: D34231072 fbshipit-source-id: ed57299729d444d59cf843a0d38a3ee2240eeec1 (cherry picked from commit 43907948f3a8d2137244e7bb59f43999bd660917)	2022-03-11 19:47:40 +00:00
David Dang	abfaef0aec	[Quant][core] Merged conv packed params and linear packed params (#73486 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73486 conv and linear packed params were previously defined in <ATen/native/quantized/cpu/conv_packed_params.h> and <ATen/native/quantized/cpu/packed_params.h>. These two files have been merged into one, which has been relocated to <ATen/native/quantized/cpu/packed_params.h>. Differential Revision: D34513286 Test Plan: Imported from OSS Reviewed By: dagitses Pulled By: dzdang fbshipit-source-id: 813845af7ea9449e316ab7822efe7460f0bd0d88 (cherry picked from commit 2f627561f27f81977ff73b8863c5e9e719dc4c60)	2022-03-11 15:18:45 +00:00
Ivan Kobzarev	519e226b66	[tensorexp] ExternalCall2 without memcpy (#72225 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72225 Test Plan: Imported from OSS Reviewed By: dagitses Differential Revision: D33960933 Pulled By: IvanKobzarev fbshipit-source-id: fc73a3de9e5150919e3806516065b4a6c8316000 (cherry picked from commit f637842c341e0ba94906a0c8a1efc81691dc512c)	2022-03-09 21:19:26 +00:00
Han Qi	0723639b60	Revert D34455360: Multisect successfully blamed D34455360 for test failures Summary: This diff is reverting D34455360 (`61d6c43864`) D34455360 (`61d6c43864`) is making the following tests to fail and this revert diff is either the revert of the blame diff or the revert of the stack of diffs that need to be reverted to revert the blame diff Tests affected: - https://www.internalfb.com/intern/test/562950004334605/ Multisect link: https://www.internalfb.com/intern/testinfra/multisect/756170 Test Plan: NA Reviewed By: zhxchen17 Differential Revision: D34596156 fbshipit-source-id: a465bca0094db3caf6130c80f1ed49eea981359b (cherry picked from commit ef5e5578c64ce9827570757fb016aafa9c782c6a)	2022-03-08 23:18:54 +00:00
Elias Ellison	52ccbf4494	Lock thread/block computation (#73800 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73800 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D34647281 Pulled By: eellison fbshipit-source-id: adbdaf24191c4c1b85e0b62564388f2481002ed2 (cherry picked from commit 6cf38015cc14691518b1b5cb7d636e80eb3684fc)	2022-03-04 22:32:08 +00:00
Dave Bort	7b51629c53	[PyTorchEdge] Add getFileFormat() so we can differentiate Zip/Pickle from Flatbuffer (#73707 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73707 Add a helper function to detect the file format from the first bytes of a data file or stream. This will be necessary during the migration from Pickle-serialized modules to Flatbuffer-serialized modules. ghstack-source-id: 150384317 Test Plan: Existing tests for ZIP+Pickle continue to pass. New unit tests pass: ``` cd xplat && buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_interpreter Building: finished in 26.6 sec (100%) 3180/3180 jobs, 571/3180 updated Total time: 32.2 sec Testing: finished in 07:08.3 min (89 PASS/0 FAIL) BUILD SUCCEEDED RESULTS FOR //xplat/caffe2:test_lite_interpreter //xplat/caffe2:test_lite_trainer PASS 421.1s 81 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_interpreter PASS 103ms 8 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_trainer TESTS PASSED ``` Reviewed By: iseeyuan Differential Revision: D34527859 fbshipit-source-id: ff2d1eabc2f8be1de2e44709c878e2d1a373f0df (cherry picked from commit 5c394848346ab9e374c9e7eed479ad70ed09a7ae)	2022-03-04 19:35:41 +00:00
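Detecting a file format "from the first bytes of a data file or stream", as 7b51629c53 describes, can be sketched as a magic-number check. This is a rough Python illustration, not the C++ helper itself. The ZIP local-file-header signature `PK\x03\x04` and the FlatBuffers convention of placing a 4-byte file identifier at bytes 4..7 are standard; the identifier value `PTMF` and all names below are assumptions made for the example.

```python
import io
from enum import Enum


class FileFormat(Enum):
    UNKNOWN = 0
    FLATBUFFER = 1
    ZIP = 2


ZIP_MAGIC = b"PK\x03\x04"      # ZIP local file header signature
FLATBUFFER_IDENT = b"PTMF"     # assumed identifier; lives at bytes 4..7
HEADER_SIZE = 8


def get_file_format(stream):
    """Peek at the first bytes of a seekable stream without consuming them."""
    pos = stream.tell()
    header = stream.read(HEADER_SIZE)
    stream.seek(pos)  # restore the position so the real loader can re-read
    if len(header) >= HEADER_SIZE and header[4:8] == FLATBUFFER_IDENT:
        return FileFormat.FLATBUFFER
    if header[:4] == ZIP_MAGIC:
        return FileFormat.ZIP
    return FileFormat.UNKNOWN
```

Restoring the stream position matters because the caller typically hands the same stream to whichever loader matches the detected format.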
Han Qi	61d6c43864	Make debug_pkl smaller by only emitting unique traces. (#73368 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368 debug_pkl file inside of pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source which is a stack trace, filename, and start, end numbers. Those are emitted in debug_pkl file as strings. Since many SourceRanges share the same source, the string for trace can be deduped. The newer format saves a set of unique traces in a tuple, then each SourceRange will save the offset of its trace w.r.t. position in that tuple. (i.e. manually applying dictionary compression). The above helps with smaller file size. On loading, if we copy each trace to Source as string the runtime memory would still blow up. To mitigate this, we use SourceView directly instead of source which will take the reference of string inside of Deserializer and make that into string_view. This is safe because Deserializer is held by Unpickler by shared_ptr, and Unpickler is also held by shared_ptr by another Source object. That Source object will be alive during the model construction. Test Plan: unit test Took original file (312271638_930.predictor.disagg.local); loaded with `torch.jit.load` save again with `torch.jit.save`. 
Unzip both, look at contents: ``` [qihan@devvm5585.vll0 ~]$ du archive -h 4.0K archive/xl_model_weights 3.7M archive/extra 8.0K archive/code/__torch__/caffe2/torch/fb/model_transform/splitting 8.0K archive/code/__torch__/caffe2/torch/fb/model_transform 8.0K archive/code/__torch__/caffe2/torch/fb 8.0K archive/code/__torch__/caffe2/torch 8.0K archive/code/__torch__/caffe2 20M archive/code/__torch__/torch/fx/graph_module 20M archive/code/__torch__/torch/fx 8.0K archive/code/__torch__/torch/classes 20M archive/code/__torch__/torch 20M archive/code/__torch__ 20M archive/code 2.7M archive/constants 35M archive [qihan@devvm5585.vll0 ~]$ du resaved -h 4.0K resaved/extra 8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting 8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform 8.0K resaved/code/__torch__/caffe2/torch/fb 8.0K resaved/code/__torch__/caffe2/torch 8.0K resaved/code/__torch__/caffe2 1.3M resaved/code/__torch__/torch/fx/graph_module 1.3M resaved/code/__torch__/torch/fx 8.0K resaved/code/__torch__/torch/classes 1.4M resaved/code/__torch__/torch 1.4M resaved/code/__torch__ 1.4M resaved/code 2.7M resaved/constants 13M resaved [qihan@devvm5585.vll0 ~]$ ``` Reviewed By: gmagogsfm Differential Revision: D34455360 fbshipit-source-id: 8cc716f9bba7183746b1b4ecc33a2de34ac503b9 (cherry picked from commit f1a04730fc9ac8fdab6c8e4c44cb5529e42090e4)	2022-03-02 08:37:08 +00:00
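The "dictionary compression" described in 61d6c43864 (store each unique trace once in a tuple, then let every SourceRange refer to it by offset) can be sketched as follows. This is a minimal Python illustration of the idea only; the function names are hypothetical and the actual debug_pkl layout is not reproduced here.

```python
def compress_traces(traces):
    """Return (unique_traces, offsets), where offsets[i] indexes unique_traces."""
    unique = []     # each distinct trace string, in first-seen order
    index_of = {}   # trace string -> position in `unique`
    offsets = []    # one small integer per original entry
    for trace in traces:
        if trace not in index_of:
            index_of[trace] = len(unique)
            unique.append(trace)
        offsets.append(index_of[trace])
    return tuple(unique), offsets


def decompress_traces(unique, offsets):
    """Rebuild the original list; entries share the stored string objects."""
    return [unique[i] for i in offsets]
```

For example, `compress_traces(["a.py:1", "b.py:2", "a.py:1"])` yields the tuple `("a.py:1", "b.py:2")` and the offsets `[0, 1, 0]`, so a long repeated trace is written once instead of once per range.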
Mengwei Liu	9ce9803abe	[PyTorch] Add codegen unboxing ability (#69881 ) Summary: RFC: https://github.com/pytorch/rfcs/pull/40 This PR (re)introduces python codegen for unboxing wrappers. Given an entry of `native_functions.yaml` the codegen should be able to generate the corresponding C++ code to convert ivalues from the stack to their proper types. To trigger the codegen, run ``` tools/jit/gen_unboxing.py -d cg/torch/share/ATen ``` Merged changes on CI test. In https://github.com/pytorch/pytorch/issues/71782 I added an e2e test for static dispatch + codegen unboxing. The test exports a mobile model of mobilenetv2, load and run it on a new binary for lite interpreter: `test/mobile/custom_build/lite_predictor.cpp`. ## Lite predictor build specifics 1. Codegen: `gen.py` generates `RegisterCPU.cpp` and `RegisterSchema.cpp`. Now with this PR, once `static_dispatch` mode is enabled, `gen.py` will not generate `TORCH_LIBRARY` API calls in those cpp files, hence avoids interaction with the dispatcher. Once `USE_LIGHTWEIGHT_DISPATCH` is turned on, `cmake/Codegen.cmake` calls `gen_unboxing.py` which generates `UnboxingFunctions.h`, `UnboxingFunctions_[0-4].cpp` and `RegisterCodegenUnboxedKernels_[0-4].cpp`. 2. Build: `USE_LIGHTWEIGHT_DISPATCH` adds generated sources into `all_cpu_cpp` in `aten/src/ATen/CMakeLists.txt`. All other files remain unchanged. In reality all the `Operators_[0-4].cpp` are not necessary but we can rely on linker to strip them off. ## Current CI job test coverage update Created a new CI job `linux-xenial-py3-clang5-mobile-lightweight-dispatch-build` that enables the following build options: * `USE_LIGHTWEIGHT_DISPATCH=1` * `BUILD_LITE_INTERPRETER=1` * `STATIC_DISPATCH_BACKEND=CPU` This job triggers `test/mobile/lightweight_dispatch/build.sh` and builds `libtorch`. Then the script runs C++ tests written in `test_lightweight_dispatch.cpp` and `test_codegen_unboxing.cpp`. 
Recent commits added tests to cover as many C++ argument types as possible: in `build.sh` we installed PyTorch Python API so that we can export test models in `tests_setup.py`. Then we run C++ test binary to run these models on lightweight dispatch enabled runtime. Pull Request resolved: https://github.com/pytorch/pytorch/pull/69881 Reviewed By: iseeyuan Differential Revision: D33692299 Pulled By: larryliu0820 fbshipit-source-id: 211e59f2364100703359b4a3d2ab48ca5155a023 (cherry picked from commit 58e1c9a25e3d1b5b656282cf3ac2f548d98d530b)	2022-03-01 23:28:13 +00:00
Elias Ellison	d3d74e9040	Allow custom registration of shape functions (#73270 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73270 Together with open registration of NNC lowerings this should make it possible to add support for custom operators, including internal fb-ops Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D34451275 Pulled By: eellison fbshipit-source-id: ae8ae2deb93caa6770e738217461e65853897b55 (cherry picked from commit ea6b7e8a6d8f970a20e68d02eefc5c951e32aa07)	2022-02-28 17:44:45 +00:00
Pavithran Ramachandran	62eb7d64cf	[PyTorchEdge] Extend flatbuffer to support extra files map (#72951 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72951 Extend flatbuffer to support extra files map Flatbuffer schema has extra files. The users can write extra files by providing a `map<string, string>` which will be part of the flatbuffer model asset and can be loaded back similar to pickle. ghstack-source-id: 149622799 Test Plan: fb: ```[pavithran@devvm5216.vll0 ~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test caffe2/test/cpp/jit:jit -- FlatbufferTest.ExtraFiles Parsing buck files: finished in 0.7 sec Downloaded 0/8 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules) Building: finished in 20.0 sec (100%) 22343/22343 jobs, 4/22343 updated Total time: 20.7 sec More details at https://www.internalfb.com/intern/buck/build/7dba5034-d623-4a1e-afa1-b0e809df7066 BUILD SUCCEEDED Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details. Running with tpx session id: 9c1ac1e0-a8c0-4a62-95df-8f49695aa7d1 Trace available for this run at /tmp/tpx-20220216-144630.207992/trace.log RemoteExecution session id: reSessionID-9c1ac1e0-a8c0-4a62-95df-8f49695aa7d1-tpx Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7318349470518809 ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 468 tests discovered (17.211) ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.ExtraFiles (0.169) Summary Pass: 1 ListingSuccess: 1 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users Finished test run: https://www.internalfb.com/intern/testinfra/testrun/7318349470518809``` Reviewed By: iseeyuan Differential Revision: D34286346 fbshipit-source-id: 4e09ab25b8ed6af6f8923db3aab046c255f13bb8 (cherry picked from commit ce8d88e22a360b25253d8a75f428d523fa88a79a)	2022-02-24 19:39:32 +00:00
Jacob Szwejbka	faacb8ab36	[Pytorch Edge] Lean Runtime Test Summary: As far as I can tell there's no CI that actually runs the lean_runtime. This should add it I think. (Is this directory covered by CI?) Next up is to create some test for min_runtime_lib (Note: this ignores all push blocking failures!) Test Plan: buck test :lean_runtime_delegate_flatbuffer_test Reviewed By: iseeyuan Differential Revision: D34255148 fbshipit-source-id: b44693220e93869edd984bbcd17d33db4007a4ea (cherry picked from commit 0a4a6b5bd2b4a1f8cce8bc1c4a22dad9539631c1)	2022-02-24 18:40:47 +00:00
Alban Desmaison	3bd1507ff2	Revert D33994011: Make debug_pkl smaller by only emitting unique traces. Test Plan: revert-hammer Differential Revision: D33994011 (`3d37f5b052`) Original commit changeset: 8e6224c6e942 Original Phabricator Diff: D33994011 (`3d37f5b052`) fbshipit-source-id: 885e739efa1081382e1fcf9c6cccba92c57e9f7a (cherry picked from commit a6d98c85a736c2eb321a6f38005dd0f5dc43eb87)	2022-02-24 16:38:55 +00:00
Han Qi	3d37f5b052	Make debug_pkl smaller by only emitting unique traces. (#72596 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72596 debug_pkl file inside of pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source which is a stack trace, filename, and start, end numbers. Those are emitted in debug_pkl file as strings. Since many SourceRanges share the same source, the string for trace can be deduped. The newer format saves a set of unique traces in a tuple, then each SourceRange will save the offset of its trace w.r.t. position in that tuple. (i.e. manually applying dictionary compression). The above helps with smaller file size. On loading, if we copy each trace to Source as string the runtime memory would still blow up. To mitigate this, we use SourceView directly instead of source which will take the reference of string inside of Deserializer and make that into string_view. This is safe because Deserializer is held by Unpickler by shared_ptr, and Unpickler is also held by shared_ptr by another Source object. That Source object will be alive during the model construction. Test Plan: unit test Took original file (312271638_930.predictor.disagg.local); loaded with `torch.jit.load` save again with `torch.jit.save`. 
Unzip both, look at contents: ``` [qihan@devvm5585.vll0 ~]$ du archive -h 4.0K archive/xl_model_weights 3.7M archive/extra 8.0K archive/code/__torch__/caffe2/torch/fb/model_transform/splitting 8.0K archive/code/__torch__/caffe2/torch/fb/model_transform 8.0K archive/code/__torch__/caffe2/torch/fb 8.0K archive/code/__torch__/caffe2/torch 8.0K archive/code/__torch__/caffe2 20M archive/code/__torch__/torch/fx/graph_module 20M archive/code/__torch__/torch/fx 8.0K archive/code/__torch__/torch/classes 20M archive/code/__torch__/torch 20M archive/code/__torch__ 20M archive/code 2.7M archive/constants 35M archive [qihan@devvm5585.vll0 ~]$ du resaved -h 4.0K resaved/extra 8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting 8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform 8.0K resaved/code/__torch__/caffe2/torch/fb 8.0K resaved/code/__torch__/caffe2/torch 8.0K resaved/code/__torch__/caffe2 1.3M resaved/code/__torch__/torch/fx/graph_module 1.3M resaved/code/__torch__/torch/fx 8.0K resaved/code/__torch__/torch/classes 1.4M resaved/code/__torch__/torch 1.4M resaved/code/__torch__ 1.4M resaved/code 2.7M resaved/constants 13M resaved [qihan@devvm5585.vll0 ~]$ ``` Reviewed By: JasonHanwen Differential Revision: D33994011 fbshipit-source-id: 8e6224c6e942e91c3403f686c8f0937d1002ed41 (cherry picked from commit a7014dd4029308c95007f362a57c31796d686647)	2022-02-24 09:31:16 +00:00
Hui Guo	5eb5b61221	[tensorexpre] Add typecast when src and dest buf types are different in PlacementAllocate (#71934 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71934 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D33826700 Pulled By: huiguoo fbshipit-source-id: 9fb29a43ab5983586a6bfde3a34d7e2f2120ab0a (cherry picked from commit 2bee018691ec888cb1ec761528951f5745d7ef79)	2022-02-23 19:36:50 +00:00
Yedidya Feldblum	7a5b0efc64	[caffe2] fix build failures in optimized builds under clang Summary: There are various possible approaches, but the approach chosen minimizes disruption to source control blame. Addresses: ``` error: Function _ZN23FunctionalTest_Pad_Test8TestBodyEv is too big to optimize [-Werror,-Wignored-optimization-argument] ``` Test Plan: buck2 build mode/opt caffe2/test/cpp/api:functional Reviewed By: jamesr66a Differential Revision: D34027291 fbshipit-source-id: 9dfd771ad56d3d4bc0d41b38b04654c8dae7c006 (cherry picked from commit `d43b5a7ed6`)	2022-02-22 22:31:47 +00:00
Raghavan Raman	0d66748948	[jit] Add tests for JIT with dynamic shape fusion (#72201 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72201 Reviewed By: mikaylagawarecki Differential Revision: D34067211 Pulled By: navahgar fbshipit-source-id: 2c13bb43c76c7fed720ad37892d2177c3dc0b924 (cherry picked from commit `eed2d8cea4`)	2022-02-18 23:29:08 +00:00
Alban Desmaison	0951cb513a	Revert D34342689: Revert D34250357: Sync lazy_tensor_staging back to master Test Plan: revert-hammer Differential Revision: D34342689 Original commit changeset: 43f6da6986f7 Original Phabricator Diff: D34250357 (`69389fb542`) fbshipit-source-id: 8a3fb74877e719e9b9577b58027b4e7061a04ef0 (cherry picked from commit `c749f08e7a`)	2022-02-18 17:31:21 +00:00
Alban Desmaison	86a961af87	Revert D34250357: Sync lazy_tensor_staging back to master Test Plan: revert-hammer Differential Revision: D34250357 (`69389fb542`) Original commit changeset: aa7d589f6050 Original Phabricator Diff: D34250357 (`69389fb542`) fbshipit-source-id: 43f6da6986f7fc5189d641b7803adc5ada27194c (cherry picked from commit `3c930a5e4e`)	2022-02-18 15:47:37 +00:00
Will Constable	69389fb542	Sync lazy_tensor_staging back to master (#72875 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72875 This diff contains changes from several PRs landed to lazy_tensor_staging branch. * generating 'fallback' overrides for each codegenned op, useful for debugging * supports operators which are missing aten:: symbols for op names, instead using their string counterpart * makes the IR class a base class instead of hardcoding the assumption of TS It also resolves lint issues and in particular cleans up the following: * {Type}s shouldn't be passed into isValueType, and using the catch-all base class of CType is nicer than specifying a list of types. Fixes #72852 Test Plan: test manually on lazy_tensor_staging branch Reviewed By: shunting314 Differential Revision: D34250357 fbshipit-source-id: aa7d589f605055d5d02bc77c77fa6f1182ff7497 (cherry picked from commit `2f8f5e4971`)	2022-02-18 03:49:46 +00:00
Raghavan Raman	6d33852685	[NNC] TensorExprKernel state should not be modified on calls to run methods (#73028 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73028 A typical use case for `TensorExprKernel` is to create the kernel once and call it multiple times, possibly in parallel. For the parallel calls to work, we need to ensure that the run() method calls do not change any state in `TensorExprKernel`. Before this change, the `run()` method was modifying the sizes and strides vectors when dynamic shapes were present. This manifested as a data race when running a model with Static Runtime. ghstack-source-id: 149398820 Test Plan: ``` buck build mode/dev-asan //caffe2/test/cpp/tensorexpr:tensorexpr ./buck-out/dev/gen/caffe2/test/cpp/tensorexpr/tensorexpr --gtest_filter="DynamicShapes.MultiThreadedExecution" ``` Reviewed By: eellison Differential Revision: D34287960 fbshipit-source-id: d311f3c5a66c5d5de4e1deaeaa01816b53e9906e (cherry picked from commit `161568bfae`)	2022-02-17 23:14:27 +00:00
Mike Iovine	d1c5f9e439	[JIT][SR] Introduce prim::IfThenElse (#72587 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72587 This pattern frequently appears in a few graphs: ``` %result = prim::If(%condition) block0(): -> (%a) block1(): -> (%b) ``` This is slow, particularly in static runtime. Static runtime creates memory planners/block runners for each sub-block, which eats up a lot of memory and introduces a lot of extra overhead for this relatively simple operation. This diff introduces a new op that replaces nodes like the above with a single op meant to act like a ternary operator: ``` %result = prim::IfThenElse(%condition, %a, %b) ``` Test Plan: New unit tests Reviewed By: eellison Differential Revision: D34091789 fbshipit-source-id: eb6a8c460c39b4c019a1f4ab1f3f1e5b6edc400c (cherry picked from commit `0f1b335e5b`)	2022-02-17 18:22:48 +00:00
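The rewrite described in d1c5f9e439 (collapsing a `prim::If` whose two blocks do nothing but return a value into one `prim::IfThenElse` node) can be illustrated on a toy IR. The `Node` and `Block` classes below are hypothetical stand-ins invented for this sketch; they are not TorchScript's graph API, and the real pass operates on the JIT graph in C++.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    nodes: list = field(default_factory=list)    # nodes computed inside the block
    returns: list = field(default_factory=list)  # value names the block yields


@dataclass
class Node:
    kind: str
    inputs: list
    output: str
    blocks: list = field(default_factory=list)


def fuse_if_then_else(nodes):
    """Replace trivially simple prim::If nodes with a ternary-like node."""
    rewritten = []
    for n in nodes:
        is_simple = (
            n.kind == "prim::If"
            and len(n.blocks) == 2
            # Both sub-blocks must be empty and return exactly one value.
            and all(not b.nodes and len(b.returns) == 1 for b in n.blocks)
        )
        if is_simple:
            then_val = n.blocks[0].returns[0]
            else_val = n.blocks[1].returns[0]
            rewritten.append(
                Node("prim::IfThenElse", [n.inputs[0], then_val, else_val], n.output)
            )
        else:
            rewritten.append(n)  # leave anything with real work in its blocks
    return rewritten
```

The replacement node has no sub-blocks, which is the property the commit message relies on: nothing per-block has to be set up just to select between two already-computed values.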
Will Constable	889f3f48b2	Revert D34178476: Update lazy_ir.py from lazy_tensor_staging Test Plan: revert-hammer Differential Revision: D34178476 (`3842140fd5`) Original commit changeset: 7190b2e0d82b Original Phabricator Diff: D34178476 (`3842140fd5`) fbshipit-source-id: 4c969a355f01244c6f5acc52bc31679f2182aa55 (cherry picked from commit `17082075dd`)	2022-02-16 19:34:41 +00:00
Will Constable	3842140fd5	Update lazy_ir.py from lazy_tensor_staging (#72730 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72730 This diff contains changes from several PRs landed to lazy_tensor_staging branch. - generating 'fallback' overrides for each codegenned op, useful for debugging - supports operators which are missing aten:: symbols for op names, instead using their string counterpart - makes the IR class a base class instead of hardcoding the assumption of TS Test Plan: tested on lazy_tensor_staging branch Reviewed By: desertfire Differential Revision: D34178476 fbshipit-source-id: 7190b2e0d82b4eb1f4510c858c24446c6df3f9d0 (cherry picked from commit `6713d3f0ef`)	2022-02-16 18:33:31 +00:00
Shunting Zhang	763ad1bf25	(2/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: frontend change (#72899 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72899 Reland D33282878 (`911d527b87`). This is the frontend change. ghstack-source-id: 149204031 Test Plan: Refer to D33282878 (`911d527b87`). Also check CI Reviewed By: gmagogsfm Differential Revision: D34252127 fbshipit-source-id: 27b17ddd4d05d904eb91fd9ee094d9121f00e388 (cherry picked from commit `1d276baca3`)	2022-02-16 03:45:15 +00:00
Ivan Kobzarev	67cd98fad4	[tensorexpr] Fix isNLC segfault (#72786 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72786 Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D34204523 Pulled By: IvanKobzarev fbshipit-source-id: 9a0f2ce0a1921e261932029c3ebd842330fdf528 (cherry picked from commit `b8326064f6`)	2022-02-15 20:31:56 +00:00
Michael Suo	7db4a48d92	Revert D33342569: (2/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: frontend change Test Plan: revert-hammer Differential Revision: D33342569 (`856157fcee`) Original commit changeset: 57984ac67ae2 Original Phabricator Diff: D33342569 (`856157fcee`) fbshipit-source-id: 4c12235a1776a3652e7f91e93b626705759d5176 (cherry picked from commit `4cbd7d8bab`)	2022-02-15 18:45:44 +00:00
Shunting Zhang	856157fcee	(2/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: frontend change (#70471 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70471 Reland D33282878 (`911d527b87`). This is the frontend change. ghstack-source-id: 149114933 Test Plan: Refer to D33282878 (`911d527b87`). Also check CI Reviewed By: gmagogsfm Differential Revision: D33342569 fbshipit-source-id: 57984ac67ae2c56c38f72d3b1fb69105901fb472 (cherry picked from commit `b47cc935ee`)	2022-02-15 07:21:19 +00:00
Pavithran Ramachandran	a482aeb0ce	[PyTorchEdge] backport v8 to v7 to support promoted ops as instruction (#71662 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71662 backport v8 to v7 to support promoted ops as instruction a flag to help export as instruction from v8 and export as operators for v7 and below Test Plan: ``` buck test caffe2/test/cpp/jit:jit -- LiteInterpreterTest.BackPortByteCodeModelAllVersions Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5629499620570927 ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 461 tests discovered (15.693) ✓ Pass: caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions (2.712) Summary Pass: 1 ListingSuccess: 1 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5629499620570927 ``` ``` buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_codegen buck test mode/opt //caffe2/test:upgrader_codegen -- mobile.test_upgrader_codegen.TestLiteScriptModule Parsing buck files: finished in 0.8 sec Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules) Building: finished in 01:39.4 min (100%) 11031/11031 jobs, 2/11031 updated Total time: 01:40.2 min More details at https://www.internalfb.com/intern/buck/build/a8b0e417-019c-44ba-be6b-23379411a965 BUILD SUCCEEDED Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details. 
Running with tpx session id: 44fbfa66-cce8-4277-82ac-f89d79558581 Trace available for this run at /tmp/tpx-20220202-160956.915412/trace.log RemoteExecution session id: reSessionID-44fbfa66-cce8-4277-82ac-f89d79558581-tpx Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/281475200877601 ✓ ListingSuccess: caffe2/test:upgrader_codegen : 1 tests discovered (1.249) ✓ Pass: caffe2/test:upgrader_codegen - test_generate_bytecode (mobile.test_upgrader_codegen.TestLiteScriptModule) (1.365) Summary Pass: 1 ListingSuccess: 1 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users Finished test run: https://www.internalfb.com/intern/testinfra/testrun/281475200877601 ``` Reviewed By: iseeyuan Differential Revision: D33719098 fbshipit-source-id: e2d2b23d298f98e4d4fcdfc344f7b8c6f92cff26 (cherry picked from commit `81b956c23a`)	2022-02-15 03:47:39 +00:00
jiej	2d110d514f	Nvfuser code bump 2_1_2022 (#72127 ) Summary: Things changed in this PR that require review: 1. aten/src/ATen/core/interned_strings.h 2. torch/csrc/jit/ir/alias_analysis.h : exposing createValue to allow efficient mutation 3. torch/csrc/jit/runtime/symbolic_shape_registry.cpp : added gelu/tanh/erf in registry 4. torch/jit/_script.py : throws if scripting a model sees autocast as a decorator, since it's not supported nvfuser code update: 1. codegen improvements and performance tuning 2. integration bug fixes for shape expression logic 3. kernel segmentation update to address perf regression from horizontal fusion 4. scalar cpu tensor promotion to support inter-device operation between cpu scalar tensor and cuda tensor Things reverted from local changes: aten::gelu with approximation (tracked in PR: https://github.com/pytorch/pytorch/pull/61439) Pull Request resolved: https://github.com/pytorch/pytorch/pull/72127 Reviewed By: HamidShojanazeri Differential Revision: D34113233 Pulled By: jbschlosser fbshipit-source-id: b82cde32b71e324eca0ea57cb8c9f9647278ca74 (cherry picked from commit `e009bc5c4e`)	2022-02-15 00:43:16 +00:00
Jacob Szwejbka	52c516ecb8	[Pytorch Edge] Minor improve documentation in test_backend_with_compiler Summary: Went through all these files and the design doc to understand the to_backend api. Figured I could add some comments to these files to make the apis a little clearer for those that come after. (Note: this ignores all push blocking failures!) Test Plan: na Reviewed By: raziel, larryliu0820 Differential Revision: D34221989 fbshipit-source-id: 699fcbd8714bfb6b58c6c0bf0e5fbc019d2ef6f8 (cherry picked from commit `0b3f5d73e8`)	2022-02-14 23:44:46 +00:00
Ryan Spring	4f8b986e28	Implement Tanh Gelu Approximation (#61439 ) Summary: 1. Implements https://github.com/pytorch/pytorch/issues/39853 2. Adds approximate boolean flag to Gelu 3. Enables Tanh Gelu approximation 4. Adds double backward support for Gelu 5. Enable Tanh Gelu in NvFuser ``` def gelu(x, approximate : str = 'none'): if approximate == 'tanh': # sqrt(2/pi) = 0.7978845608028654 return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0)))) else: return x * normcdf(x) ``` Linking XLA PR - https://github.com/pytorch/xla/pull/3039 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439 Reviewed By: VitalyFedyunin Differential Revision: D33894937 Pulled By: jbschlosser fbshipit-source-id: b65e8fb6ea66168af8f34f45ed50e92737a33851 (cherry picked from commit `6e986f91a9`)	2022-02-14 03:40:32 +00:00
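The snippet in 4f8b986e28 depends on `torch` and an undefined `normcdf`. As a self-contained illustration of the same two formulas, here is a scalar version using only the standard library, where the normal CDF is written via `math.erf`. The function names are illustrative and this is not the PyTorch implementation.

```python
import math


def gelu_exact(x: float) -> float:
    """GELU as x * Phi(x), with Phi the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU, using the constants from the commit message."""
    k = math.sqrt(2.0 / math.pi)  # approximately 0.7978845608028654
    return 0.5 * x * (1.0 + math.tanh(k * (x + 0.044715 * x ** 3)))


def gelu(x: float, approximate: str = "none") -> float:
    """Dispatch on the string flag, mirroring the signature shown above."""
    return gelu_tanh(x) if approximate == "tanh" else gelu_exact(x)
```

The two variants agree closely over typical activation ranges, which is why the approximation can be offered as a drop-in option selected by a flag.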
Mikhail Zolotukhin	1855b14922	[TensorExpr] Delete `DimArg` class. (#72390 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72390 This class didn't add much value and only caused more boilerplate code. This change removes the class and updates all the use cases with uses of `ExprHandle`. A side effect of this change is different names in loop variables, which caused massive mechanical changes in our tests. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D34030296 Pulled By: ZolotukhinM fbshipit-source-id: 2ba4e313506a43ab129a10d99e72b638b7d40108 (cherry picked from commit `c2ec46a058`)	2022-02-11 01:21:59 +00:00
Mikhail Zolotukhin	9123e9b3b5	[TensorExpr] Switch from `ExprPtr` to `ExprHandle` in Compute impl. (#72389 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72389 This is an NFC change that just prepares the code for the upcoming deletion of `DimArg` class. This change makes `Compute` and `Reduce` APIs to use `ExprHandle` everywhere. There should be no observable behavior change from this PR. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D34030295 Pulled By: ZolotukhinM fbshipit-source-id: 3fd035b6a6bd0a07ccfa92e118819478ae85412a (cherry picked from commit `1b0a4b6fac`)	2022-02-11 01:21:59 +00:00
David Berard	c314750401	[JIT] enable profiling optional tensors (#70532 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70532 This adds profiling to Optional[Tensor] types. First, in profiling_record.cpp, profiling nodes are added to Optional[Tensor] inputs. The nodes record (a) whether or not any `None` types are encountered, and (b) of the Tensor types, what's the most specific type matching all of the non-null tensors that were encountered (shape, dtype, etc.). In tensorexpr_fuser, when specializing types based on the profiled information, an Optional[Tensor] type will always be Optional[], but the Tensor type contained in the optional type can be specialized (e.g. `Optional[Float(2x2x2, cpu, etc)]`) Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D33714748 Pulled By: davidberard98 fbshipit-source-id: 93c819054450de7ac84b112de1012c0c12e34120 (cherry picked from commit `21cfd80123`)	2022-02-08 22:52:26 +00:00
Raghavan Raman	765908708b	[nnc] Adding a test with dynamic shapes from a model (#72198 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72198 Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D33951741 Pulled By: navahgar fbshipit-source-id: 596b193eba14c8e1affa9fa13070079f05d64cac (cherry picked from commit `ddbb78ff80`)	2022-02-08 02:00:46 +00:00
Raghavan Raman	ff71429906	[nnc] Add stride args while running with allocated outputs (#72223 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72223 ghstack-source-id: 148494871 Test Plan: ``` buck test mode/opt //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - DynamicShapes.GraphWithSymbolicStrides' ``` Reviewed By: eellison Differential Revision: D33960592 fbshipit-source-id: 6334978d5e3713889b4ad12bcd8ed8c69df39d58 (cherry picked from commit `95cc102bc2`)	2022-02-07 19:24:56 +00:00
Han Qi	57f039b41f	Fixing few bugs in torch flatbuffer (#72349 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72349 1. Interface call'd methods need to be registered to class. Previously all interface calls are inlined so there was no such problem. 2. parseDoubleList and parseBoolList got reversed when refactoring. Test Plan: 1. Get ASR's test model at ``` mkdir ~/asr1 && cd ~/asr1 fbpkg fetch speech.tuna.milan.ondevice.en_us ``` 2. Convert model: ``` cd ~/fbsource buck run //xplat/caffe2/fb/lite_predictor:convert_model -- --model=$HOME/asr1/pytorchmodel.pt --output_name=$HOME/asr1/pytorchmodel.ff ``` 3. Ran lite_predictor_flatbuffer ``` buck run //xplat/caffe2/fb/lite_predictor:lite_predictor_flatbuffer -- --model=$HOME/asr1/pytorchmodel.ff --method_to_call=encode_src --method_to_generate_input=get_all_bundled_inputs_for_encode_src ``` See perf metric generated (means loading and inference succeeded). Reviewed By: gmagogsfm, zhxchen17 Differential Revision: D33959746 fbshipit-source-id: 24671e1189438119f477032eb6c29bd7736e74ca (cherry picked from commit `5e18809350`)	2022-02-05 00:25:27 +00:00
Raghavan Raman	38f696c0cd	[nnc] Add a API to unroll loops by a given factor (#72071 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72071 Reviewed By: ngimel Differential Revision: D33946250 Pulled By: navahgar fbshipit-source-id: 3f3f92054174620025a9d71154d006f1738953e2 (cherry picked from commit `d8b53598e9`)	2022-02-03 18:41:21 +00:00
kshitij12345	02f6226bff	[fix] Dropout2d-3d no-batch-dim (#69885 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/69801 TODO: * [x] Update C++ API cc albanD mruberry jbschlosser walterddr kshitij12345 Pull Request resolved: https://github.com/pytorch/pytorch/pull/69885 Reviewed By: mruberry Differential Revision: D33175470 Pulled By: jbschlosser fbshipit-source-id: c9d7d9e0f59ba290a0157725c338a345f3d58b9f (cherry picked from commit `7e4271a156`)	2022-02-02 16:40:32 +00:00
CodemodService FBSourceClangFormatLinterBot	ed435e903f	[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT` Reviewed By: zertosh Differential Revision: D33938055 fbshipit-source-id: 6c0643a18f09854e87e183341f252c66dd6395a6 (cherry picked from commit `fd183aedbc`)	2022-02-02 11:27:15 +00:00
Ivan Kobzarev	34e4418dfa	[nnc] tensorexpr for quantized/aten::upsample_nearest2d (#71236 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71236 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D33553305 Pulled By: IvanKobzarev fbshipit-source-id: 2442afee6d23123bb3a4bc52d3555393b0254106 (cherry picked from commit `90a263fc08`)	2022-02-01 19:48:53 +00:00
Elias Ellison	cf1833df70	[WIP] add explicit dynamic fusion arg (#71173 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71173 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D33536222 Pulled By: eellison fbshipit-source-id: a097408ecdd6e284432de128feb297993d882d52 (cherry picked from commit `0e3419b2d3`)	2022-02-01 19:07:02 +00:00
Nikita Shulga	74c44ba9d6	Revert D33850228: [pytorch][PR] Implement Tanh Gelu Approximation Test Plan: revert-hammer Differential Revision: D33850228 (`23d03025dc`) Original commit changeset: 3cc33fb298e4 Original Phabricator Diff: D33850228 (`23d03025dc`) fbshipit-source-id: 9436e7df73c2b2e2011f321674f24973316d3692 (cherry picked from commit `c9efb58223`)	2022-01-31 17:44:19 +00:00
Ryan Spring	23d03025dc	Implement Tanh Gelu Approximation (#61439 ) Summary: 1. Implements https://github.com/pytorch/pytorch/issues/39853 2. Adds approximate boolean flag to Gelu 3. Enables Tanh Gelu approximation 4. Adds double backward support for Gelu 5. Enable Tanh Gelu in NvFuser ``` def gelu(x, approximate : str = 'none'): if approximate == 'tanh': # sqrt(2/pi) = 0.7978845608028654 return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0)))) else: return x * normcdf(x) ``` Linking XLA PR - https://github.com/pytorch/xla/pull/3039 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439 Reviewed By: cpuhrsch Differential Revision: D33850228 Pulled By: jbschlosser fbshipit-source-id: 3cc33fb298e480d7ecc5c67716da019d60c6ab33 (cherry picked from commit `3a53b3e94f`)	2022-01-31 17:07:45 +00:00
Tristan Rice	6208c2800e	torch/monitor: merge Interval and FixedCount stats (#72009 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72009 This simplifies the Stats interface by merging IntervalStat and FixedCountStat into a single Stat w/ a specific window size duration and an optional max samples per window. This allows for the original intention of having comparably sized windows (for statistical purposes) while also having a consistent output bandwidth. Test Plan: ``` buck test //caffe2/test:monitor //caffe2/test/cpp/monitor:monitor ``` Reviewed By: kiukchung Differential Revision: D33822956 fbshipit-source-id: a74782492421be613a1a8b14341b6fb2e8eeb8b4 (cherry picked from commit `293b94e0b4`)	2022-01-30 23:21:59 +00:00
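The merged stat described above (one window duration plus an optional per-window sample cap) can be sketched as follows. This is a hypothetical stand-in written for illustration, not the `torch.monitor` implementation: the class name `WindowedStat`, the explicit `now` argument, and the choice to drop samples once the cap is reached are all assumptions.

```python
from typing import List, Optional, Tuple


class WindowedStat:
    """Illustrative stat with a window duration and an optional sample cap."""

    def __init__(self, window_size: float, max_samples: Optional[int] = None):
        self.window_size = window_size      # window duration in seconds
        self.max_samples = max_samples      # optional cap on samples per window
        self._start: Optional[float] = None
        self._values: List[float] = []
        # One (sample_count, mean) pair per closed window.
        self.emitted: List[Tuple[int, float]] = []

    def add(self, value: float, now: float) -> None:
        # Close the current window first if its duration has elapsed.
        if self._start is not None and now - self._start >= self.window_size:
            self.flush()
        if self._start is None:
            self._start = now
        # Assumed semantics: samples beyond the cap are discarded, so every
        # window aggregates a comparable number of samples.
        if self.max_samples is not None and len(self._values) >= self.max_samples:
            return
        self._values.append(value)

    def flush(self) -> None:
        """Emit the aggregate for the open window (if any) and reset."""
        if self._values:
            count = len(self._values)
            self.emitted.append((count, sum(self._values) / count))
        self._values = []
        self._start = None


if __name__ == "__main__":
    stat = WindowedStat(window_size=10.0, max_samples=2)
    for value, t in [(1.0, 0.0), (3.0, 1.0), (100.0, 2.0), (5.0, 10.0)]:
        stat.add(value, now=t)
    stat.flush()
    print(stat.emitted)
```

Passing timestamps explicitly keeps the sketch deterministic; a real implementation would read a clock instead.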
David Berard	99bc978b78	[JIT] Propagate requires_grad to autodiff subgraphs (#71666 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71666 When JIT autodiff is constructing a gradient computation graph, it will only add gradients for tensors that require_grad. Previously, require_grad information was not propagated to the subgraph that autodiff used; as a result, autodiff would calculate all gradients, even if requires_grad had never been set during profiling runs. In certain cases, this can lead to performance issues. For example, during training, the gradient of the input data is not needed, but is still computed. This propagates requires_grad to the subgraph passed into autodiff, so that autodiff will not compute unnecessary gradients. Test: `./bin/test_jit --gtest_filter="AutodiffRemoveUnusedGradientsTest.Linear"` Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D33725304 Pulled By: davidberard98 fbshipit-source-id: ca7ab4c9a6a26f94f93aff2d5a4135e125323ba1 (cherry picked from commit `a97fe0556d`)	2022-01-28 18:57:36 +00:00
Joel Schlosser	cb823d9f07	Revert D33744717: [pytorch][PR] Implement Tanh Gelu Approximation Test Plan: revert-hammer Differential Revision: D33744717 (`f499ab9cef`) Original commit changeset: d64532a562ed Original Phabricator Diff: D33744717 (`f499ab9cef`) fbshipit-source-id: 396c3f63de5865f894dbc353d0790a01a624be93 (cherry picked from commit `e9fb2d1db1`)	2022-01-28 18:35:01 +00:00
Ryan Spring	f499ab9cef	Implement Tanh Gelu Approximation (#61439 ) Summary: 1. Implements https://github.com/pytorch/pytorch/issues/39853 2. Adds approximate boolean flag to Gelu 3. Enables Tanh Gelu approximation 4. Adds double backward support for Gelu 5. Enable Tanh Gelu in NvFuser ``` def gelu(x, approximate : str = 'none'): if approximate == 'tanh': # sqrt(2/pi) = 0.7978845608028654 return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0)))) else: return x * normcdf(x) ``` Linking XLA PR - https://github.com/pytorch/xla/pull/3039 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439 Reviewed By: mikaylagawarecki Differential Revision: D33744717 Pulled By: jbschlosser fbshipit-source-id: d64532a562ed53247bb4fa52bb16722634d5c187 (cherry picked from commit `4713dd9cca`)	2022-01-28 16:59:09 +00:00
John Clow	c85965600c	Fix bug where frozen mod not used for OFI #68903 (#71436 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71436 Fixes issue #68903 Test Plan: Imported from OSS Reviewed By: george-qi Differential Revision: D33824857 Pulled By: Gamrix fbshipit-source-id: 8d351feb4a621916f55003c58527a1e85eec476e (cherry picked from commit `57bb420040`)	2022-01-27 23:37:50 +00:00
Pavithran Ramachandran	bf69a61293	(1/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: backend change Summary: Reland for D33282878 (`911d527b87`) . Land backend change first to maintain FC. Will wait for 2 weeks after this diff is in. And then land the front-end change in the next diff. Test Plan: test in next diff time buck test mode/dev-nosan fblearner/flow/projects/langtech/translation:tests -- test_e2e_base_training Reviewed By: gmagogsfm Differential Revision: D33342547 fbshipit-source-id: b3dee9a4bdfd78103848c12629e5fccafdd621e3 (cherry picked from commit `ae1935f1af`)	2022-01-27 03:29:40 +00:00
Mikhail Zolotukhin	1dbcde2ade	[TensorExpr] Support scalar intermediate and output values. (#71186 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71186 So far we've only supported scalar inputs, but couldn't handle scalar outputs or intermediates. This PR adds it. Scalar outputs are returned as 0-dim tensors. If the kernel is invoked on a stack of IValues, we correctly convert the results to scalar IValues when needed. If the kernel is invoked with a vector of void* pointers, everything works out of the box without any conversions. Lowerings for scalar operators are a bit tricky. Usual lowerings return a pair <Buf, Stmt> (aka Tensor), but for scalar operators we also want to have the corresponding Var that the lowering function supposedly creates (in theory we could just use Loads and Stores, but I'm worried it can affect performance as there is no guarantee this will be optimized by LLVM). So, what we do here to work around this is we return a fake buf + stmt that sets the corresponding var. Then outside of the lowering we create a real buffer and generate a Store to it with the value from the variable we passed as the base handle of the fake buf. This real buffer is then treated as usual by the rest of the system and we can use it if we need to return this scalar value as a kernel output. If we do not need to return it, then the Store will be deleted by the DCE pass. Differential Revision: D33539324 D33539324 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: ab4524b9820ce204f106effcf6232ed33d4ee223 (cherry picked from commit `7faa0939f0`)	2022-01-26 06:32:51 +00:00
Jacob Szwejbka	70f3078dd6	[Pytorch Edge] Wrap lowered module in to_backend (#71597 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71597 Problem: _jit_to_backend overrides get/set state. This means any attributes added to the module after lowering will not be preserved after serialization. For edge workflows the biggest problem here is it breaks bundled_inputs. Solution?: Real quick and easy way to handle issues with to_backend overriding get/set state. Wraps the lowered module in another module and has forwarding functions for the api specified in 'method_compile_spec'. The tradeoff with this approach is now the actual workhorse of the module is 1 layer deep which might make debugging slightly grosser/more difficult/confusing. The other approach Martin David and I talked about would be to only lower the portions that require custom get/set state logic. This leaves the top level the same, and only specific backend internals are changed. Personally I'm not sure how much that really addresses the debugging concern all that well. It seems like if you cracked the model open you'd still run into similar amounts of confusion with a lot of the variables and logic referenced coming from another module. The other concern with this approach is whether or not 'compile_spec' specifies the public api of the module (since that's our source of truth for this wrapper). While it may not be enforced, it certainly seems to be true by convention and the to_backend api already uses it as a source of truth for all functions that get generated in the resulting module. I say we just formally commit to this (compile spec keys being functions) being the contract of the api instead of just assuming it to be the case and then having weird behavior if it's not. Test Plan: New Unit Test CI to check for existing behavior and contracts. manually tested in a notebook with bundled inputs. 
{P475790313} Reviewed By: raziel Differential Revision: D33694257 fbshipit-source-id: 9ff27db421eba41bac083dff11a22e9e40a36970 (cherry picked from commit `91ef49977e`)	2022-01-25 06:30:19 +00:00
Peter Bell	40d1f77384	Codegen: python_torch_functions only include relevant operators (#68693 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68693 Generation of python bindings for native functions is split over 8 different files. One for each namespace, with the torch namespace split into 3 shards, and methods in their own file as well. This change ensures that editing any single (non-method) operator only causes one of these files to be rebuilt. Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D32596270 Pulled By: albanD fbshipit-source-id: 0570ec69e7476b8f1bc21138ba18fe8f95ebbe3f (cherry picked from commit `ba0fc71a3a`)	2022-01-21 15:37:06 +00:00
Jacob Szwejbka	e926360cb8	[Pytorch Edge] Refactor Compatibility Stuff into own directory (#71432 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71432 Organizing jit/mobile a little more ghstack-source-id: 147184536 Test Plan: ci. Reviewed By: iseeyuan Differential Revision: D33640527 fbshipit-source-id: f3a7884fe0d06d80bb8d9cf141ecaee34b6f88ff (cherry picked from commit `4c3d1e5435`)	2022-01-20 19:38:41 +00:00
Han Qi	21b697b646	add flatbuffer_loader and flatbuffer_serializer as BUCK target (#71463 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71463 title Test Plan: unittest Reviewed By: zhxchen17 Differential Revision: D33651339 fbshipit-source-id: 4bf325a40e263a441fd86bce560645ad0c1ebb23 (cherry picked from commit `4cb02e62a6`)	2022-01-20 04:51:10 +00:00
Raghavan Raman	70c9146c40	[nnc] Update block and thread extents in cuda_codegen to use int64_t (#71428 ) Summary: The block and thread extent calculations in `cuda_codegen` should be using `int64_t` instead of `int`. The updated test, `test_dynamic_shapes`, fails without this change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/71428 Reviewed By: samdow Differential Revision: D33640374 Pulled By: navahgar fbshipit-source-id: 64c340ad2a9a1fa1fe066cf1c5dfc3b546b7be6d (cherry picked from commit `6ea546ce11`)	2022-01-19 23:21:24 +00:00
Peter Bell	6f4c491c6b	empty_cpu: Add functions that don't depend on Tensor (#70613 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70613 This refactors `at::detail::empty_cpu` to use only `TensorBase` so you can construct tensors without including `Tensor.h`. It also adds a `TensorOptions` version to reduce friction in operators moving from the `at::empty` API. Test Plan: Imported from OSS Reviewed By: samdow Differential Revision: D33623682 Pulled By: ngimel fbshipit-source-id: 7a7b08bc2ed06830a3d698197a0c8389a096dc1d (cherry picked from commit `2e17ad0bbd`)	2022-01-19 00:01:58 +00:00
Jiewen Tan	680d61daab	[LT] Remove torch::lazy::convertShapes (#71291 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71291 This commit removes torch::lazy::convertShapes since it's no longer used. In addition, it replaces a numel logic within LTCTensorImpl. Test Plan: ./build/bin/test_lazy CI in lazy_tensor_staging branch Reviewed By: wconstab Differential Revision: D33575084 Pulled By: alanwaketan fbshipit-source-id: b104ef39fd552822e1f4069eab2cb942d48423a6	2022-01-14 12:06:39 -08:00
CodemodService FBSourceClangFormatLinterBot	88012c7daf	[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT` Reviewed By: zertosh Differential Revision: D33577744 fbshipit-source-id: 7ecc8367998ee1dffde54c2f4dd3cfafe19a53c9	2022-01-14 06:10:57 -08:00
Mike Ruberry	3a0c680a14	Jiterates exp2, erfc, erfinv and entr and refactors code_template.h to ATen (#71295 ) Summary: Per title. cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/71295 Reviewed By: ngimel Differential Revision: D33575885 Pulled By: mruberry fbshipit-source-id: bc841b46fc0b5458a26a4d4465b18a7a54cd5a5b	2022-01-13 23:58:51 -08:00
Zhengxu Chen	5f2b4be3b9	[jit] Split DynamicType conformance test into smaller pieces. (#71275 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71275 Currently it's taking more than 10 minutes to run the conformance test. Instead we should use parametrized test to shard into test segments so that they can run in parallel. ghstack-source-id: 146990608 Test Plan: ``` [zhxchen17@devbig560.ftw3 /data/users/zhxchen17/fbsource/fbcode] buck test mode/dev-tsan //caffe2/test/cpp/jit:jit -- -r 'LiteInterpreterDynamicTypeTestFixture' Building... 34.9 sec (99%) 12110/12111 jobs, 0/12111 updated Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details. Running with tpx session id: ebea52b3-7c7f-46be-9f69-18e2e7b040cc Trace available for this run at /tmp/tpx-20220113-113635.717778/trace.log RemoteExecution session id: reSessionID-ebea52b3-7c7f-46be-9f69-18e2e7b040cc-tpx Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4222124735827748 ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 431 tests discovered (11.173) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/0 (51.331) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/1 (65.614) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/3 (76.875) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/5 (77.271) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/4 (78.871) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/6 (78.984) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/7 (84.068) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/2 (85.198) ✓ Pass: caffe2/test/cpp/jit:jit - 
Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/8 (88.815) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/9 (90.332) Summary Pass: 10 ListingSuccess: 1 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4222124735827748 ``` Reviewed By: qihqi Differential Revision: D33570442 fbshipit-source-id: 5c49e03b0f88068d444c84b4adeaaf45433ce1fa	2022-01-13 18:22:55 -08:00
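The idea of splitting one long-running test into parametrized segments can be sketched as below. This is a generic illustration under stated assumptions, not the gtest fixture from the diff: `shard` is a hypothetical helper, and the round-robin assignment is one possible way to divide the cases.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard(cases: Sequence[T], num_shards: int) -> List[List[T]]:
    """Distribute test cases round-robin across a fixed number of shards."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Shard i takes cases i, i + num_shards, i + 2 * num_shards, ...
    return [list(cases[i::num_shards]) for i in range(num_shards)]


if __name__ == "__main__":
    # Ten shards, matching the Conformance/0 .. Conformance/9 segments above.
    all_cases = list(range(25))
    for index, segment in enumerate(shard(all_cases, 10)):
        print(f"Conformance/{index}: {segment}")
```

Each shard can then be registered as its own parametrized test instance, so a runner is free to schedule the segments in parallel.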
CodemodService FBSourceClangFormatLinterBot	60632a00fe	[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT` Reviewed By: zertosh Differential Revision: D33561057 fbshipit-source-id: 79873717c45c8bbe6d0ae760e718770fd960185d	2022-01-13 03:27:06 -08:00
Scott Wolchok	1bbea3c3a2	[PyTorch][JIT] Support mayContainAlias(Value, ArrayRef<Value>) (#69853 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69853 We can implement this overload more efficiently. ghstack-source-id: 146924693 Test Plan: patched alias_analysis tests Time reported to initialize a predictor by static runtime when given ctr_mobile_feed local_ro net is 9.5s instead of 10.5s. Reviewed By: mikeiovine Differential Revision: D33039731 fbshipit-source-id: 52559d678e9eb00e335b9e0db304e7a5840ea397	2022-01-12 16:53:54 -08:00
Han Qi	1bc3571078	[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#70201 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70201 Included functions: save_mobile_module -> saves a mobile::Module to flatbuffer load_mobile_module_from_file -> loads a flatbuffer into mobile::Module parse_mobile_module -> parses from bytes or deserialized flatbuffer module object Compared to previous attempts, this diff only adds flatbuffer to cmake target and leaves fbcode/xplat ones unchanged. Test Plan: unittest Reviewed By: malfet, gmagogsfm Differential Revision: D33239362 fbshipit-source-id: b9ca36b83d6af2d78cc50b9eb9e2a6fa7fce0763	2022-01-12 16:30:39 -08:00
Raghavan Raman	9ca367d48b	[nnc] Use given kernel function name while emitting code (#67781 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67781 Update `LLVMCodeGen` in NNC to use the given kernel function name while emitting code. This was earlier committed as D31445799 (`c30dc52739`) and got reverted as part of a stack of diffs that included a cache for `PyTorchLLVMJIT`, which was the likely culprit. Test Plan: ``` buck test mode/opt //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - LLVM.CodeGenKernelFuncName' ``` Reviewed By: ZolotukhinM, bdhirsh Differential Revision: D32145958 fbshipit-source-id: 5f4e0400c4fa7cabce5b91e6de2a294fa0cad88e	2022-01-12 15:49:17 -08:00
Tristan Rice	bfe1abd3b5	torch/monitor: add pybind (#69567 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69567 This exposes torch.monitor events and stats via pybind11 to the underlying C++ implementation. * The registration interface is a tad different since it takes a lambda function in Python where as in C++ it's a full class. * This has a small amount of changes to the counter interfaces since there's no way to create an initializer list at runtime so they now also take a vector. * Only double based stats are provided in Python since it's intended more for high level stats where float imprecision shouldn't be an issue. This can be changed down the line if need arises. ``` events = [] def handler(event): events.append(event) handle = register_event_handler(handler) log_event(Event(type="torch.monitor.TestEvent", timestamp=datetime.now(), metadata={"foo": 1.0})) ``` D32969391 is now included in this diff. This cleans up the naming for events. type is now name, message is gone, and metadata is renamed data. Test Plan: buck test //caffe2/test:monitor //caffe2/test/cpp/monitor:monitor Reviewed By: kiukchung Differential Revision: D32924141 fbshipit-source-id: 563304c2e3261a4754e40cca39fc64c5a04b43e8	2022-01-12 13:35:11 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	70951884d4	Add option to load historic operators in IR when the operator is deprecated (#71148 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71148 Test Plan: Imported from OSS Reviewed By: gmagogsfm Differential Revision: D33521300 Pulled By: tugsbayasgalan fbshipit-source-id: a0607dba5e7233590384326537017eb0b18da419	2022-01-12 11:07:04 -08:00
Elias Ellison	5480deb183	Add support for permuting dynamic fusion group outputs to channels last format (#70656 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70656 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D33458650 Pulled By: eellison fbshipit-source-id: f0c7d20743deac7a87f7c9176e60da8100aefe41	2022-01-12 09:11:34 -08:00
Elias Ellison	39be20f259	[JIT][NNC] Add handling of strides to dynamic shape support. (#70464 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70464 Add handling of strided input tensors to dynamic fusion. This is done with the same set of input striding specializations as https://github.com/pytorch/pytorch/pull/60684/: ``` S_ONE, // STRIDE_ONE: packed S_CONT, // STRIDE_CONTIGUOUS: stride[i + 1] * sizes[i + 1] S_TRAN_CONT, // STRIDE_TRANSPOSED_CONTIGUOUS: stride[i-1] * sizes[i-1] S_AS_ARG, // STRIDE_AS_ARG: stride passed in as runtime value ``` and then two additional specializations for a) contiguous tensor and b) channels-last tensor. Channels-last is a common case and we should optimize for it. Additionally, tensors natively store whether they are contiguous/channels-last contiguous, which makes it faster to check if tensors follow this pattern. Output striding will be done in a follow up. The striding is stored on both the TensorGroup node and on the guard node. The striding descriptors are stored as a vector of strings on the node for debuggability and to make use of storing ivalues as attributes on nodes. As an example: ``` %8 : Double(10, 11, 12, 13, strides=[1716, 1, 143, 11], requires_grad=0, device=cpu) = prim::TensorExprGroup_0[symbolic_shape_inputs=[-37, -36, -35, -34], striding_inputs_desc=[["TENSOR_CONT_CHANNELS_LAST"]](%x, %24, %23, %22, %21) ``` Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D33458649 Pulled By: eellison fbshipit-source-id: c42616d3c683d70f6258180d23d3841a31a6030d	2022-01-12 09:11:31 -08:00
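The strides in the example above follow from the sizes. The sketch below recomputes them in plain Python; the helper names and the `"contiguous"` / `"other"` labels are illustrative assumptions, and only the `TENSOR_CONT_CHANNELS_LAST` string is taken from the commit message. Tensors with size-1 dimensions, where stride patterns become ambiguous, are not handled.

```python
from typing import List, Sequence


def contiguous_strides(sizes: Sequence[int]) -> List[int]:
    """Row-major strides: stride[i] = stride[i + 1] * sizes[i + 1]."""
    strides = [1] * len(sizes)
    for i in range(len(sizes) - 2, -1, -1):
        strides[i] = strides[i + 1] * sizes[i + 1]
    return strides


def channels_last_strides(sizes: Sequence[int]) -> List[int]:
    """Strides of a 4-D NCHW-shaped tensor stored in NHWC order."""
    n, c, h, w = sizes
    return [h * w * c, 1, w * c, c]


def describe_striding(sizes: Sequence[int], strides: Sequence[int]) -> str:
    """Classify a stride pattern (labels other than channels-last are made up)."""
    if list(strides) == contiguous_strides(sizes):
        return "contiguous"
    if len(sizes) == 4 and list(strides) == channels_last_strides(sizes):
        return "TENSOR_CONT_CHANNELS_LAST"
    return "other"


if __name__ == "__main__":
    sizes = (10, 11, 12, 13)
    print(contiguous_strides(sizes))
    print(channels_last_strides(sizes))
    print(describe_striding(sizes, [1716, 1, 143, 11]))
```

For sizes `(10, 11, 12, 13)` the channels-last formula gives `[12 * 13 * 11, 1, 13 * 11, 11]`, which matches the `strides=[1716, 1, 143, 11]` shown in the IR dump.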
Elias Ellison	975e7d246e	Remove ignore shapes arg (#71144 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71144 This wasn't being used anywhere. It was originally intended for the SR flow but we're doing something else now. Test Plan: Imported from OSS Reviewed By: navahgar, ZolotukhinM Differential Revision: D33521061 Pulled By: eellison fbshipit-source-id: 0574698a2b7409df6feb703f81e806d886225307	2022-01-12 09:09:49 -08:00
CodemodService FBSourceClangFormatLinterBot	93b2399c6c	[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT` Reviewed By: zertosh Differential Revision: D33544281 fbshipit-source-id: 4f0b5d6d490e6fcb967550cfb1dc0111b1770f73	2022-01-12 04:16:43 -08:00
Elias Ellison	9bccb31306	Remove precise tuple construct flag (#71121 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71121 Test Plan: Imported from OSS Reviewed By: d1jang Differential Revision: D33515234 Pulled By: eellison fbshipit-source-id: 57cfe171b583a6bb4d3493a34b159061e97a11b8	2022-01-11 22:12:36 -08:00
Zhengxu Chen	9465c24245	[jit][edge] Use dynamic type instead of union types for schema parsers. (#70509 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70509 TypeFactory will construct DynamicType when building on Edge platforms. We use this facility to make FunctionSchema return DynamicType all the time for OptionalType. We don't explicitly use DynamicTypeFactory everywhere because that requires too many changes and will split the entire aten codebase. ghstack-source-id: 146818621 Test Plan: CI Reviewed By: iseeyuan Differential Revision: D33306737 fbshipit-source-id: d7ce00b438f7c03b43945d578280cfd254b1f634	2022-01-11 20:14:25 -08:00
Zhengxu Chen	e7634f83ce	[jit][edge] Migrate base types to DynamicType on mobile. (#70233 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70233 Make type parser to produce DynamicType for all base types which don't have type arguments, and return DynamicType pointer for IValue::type(). ghstack-source-id: 146818622 Test Plan: no behavior change. Reviewed By: iseeyuan Differential Revision: D33137219 fbshipit-source-id: 1612c924f5619261ebb21359936309b41b2754f5	2022-01-11 13:53:29 -08:00
Zhengxu Chen	4f35b9144c	[jit][edge] Migrate ListType to DynamicType on mobile. (#70212 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70212 Use DynamicType instead of ListType all over the place in Lite Interpreter. Namely we need to modify the following places: 1. Type parser which produces the Type constants. 2. IValue::type() which returns reflected Type from IValues. 3. Helper functions to construct the container value. 4. Typechecks which test whether a type instance is a particular container type. ghstack-source-id: 146818619 Test Plan: CI Reviewed By: iseeyuan Differential Revision: D33176931 fbshipit-source-id: 9144787f5fc4778538e5c665946974eb6171a2e6	2022-01-11 10:57:53 -08:00
Zhengxu Chen	b12ca69179	[jit][edge] Migrate DictType to DynamicType on mobile. (#70202 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70202 Use DynamicType instead of DictType all over the place in Lite Interpreter. Namely we need to modify the following places: 1. Type parser which produces the Type constants. 2. IValue::type() which returns reflected Type from IValues. 3. Helper functions to construct the container value. 4. Typechecks which test whether a type instance is a particular container type. ghstack-source-id: 146735648 Test Plan: no behavior change. Reviewed By: iseeyuan Differential Revision: D33137257 fbshipit-source-id: 971bf431658c422ea9353cc32cdab66e98876e9d	2022-01-10 15:55:29 -08:00
Zhengxu Chen	30699cbfd5	Reland D33284352: [jit][edge] Do not reuse mobile type parser for all unpicklers. (#71048 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71048 reland D33284352 (`0a921ba0d0`) ghstack-source-id: 146735646 Test Plan: All Github CI: ciflow rerun -l ciflow/all Reviewed By: gmagogsfm Differential Revision: D33489731 fbshipit-source-id: 3e160209a1abb193ad3eed3018054aa7d331025e	2022-01-10 12:42:23 -08:00
Elias Ellison	fb66f561b1	Add copy out to the fallback path in SR invocation of composed op (#70871 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70871 We had previously handled reusing memory in the optimized kernel execution path, but not yet handled it if we hit the unoptimized fallback. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D33458652 Pulled By: eellison fbshipit-source-id: 4eb62181ed02c95813a99638f5e2d0f9347b5c08	2022-01-10 12:16:38 -08:00
Zhengxu Chen	9762aa0fdc	Revert D33284352: [jit][edge] Do not reuse mobile type parser for all unpicklers. Test Plan: revert-hammer Differential Revision: D33284352 (`0a921ba0d0`) Original commit changeset: 997c4f110b36 Original Phabricator Diff: D33284352 (`0a921ba0d0`) fbshipit-source-id: af316727442a64f1ae40d53d7a9d26ec550d634e	2022-01-07 19:58:03 -08:00
Zhengxu Chen	0a921ba0d0	[jit][edge] Do not reuse mobile type parser for all unpicklers. (#70338 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70338 Today Unpickler is used by both server and mobile for deserializing models, and it always falls back to the mobile parser when no type resolver is provided by the user. However, this is not intended, as the server and mobile type parsers support different things. In this diff we provide a default fallback using the script parser and opt out of it for all mobile cases. ghstack-source-id: 146727330 (Note: this ignores all push blocking failures!) Test Plan: CI Reviewed By: iseeyuan Differential Revision: D33284352 fbshipit-source-id: 997c4f110b36eee6596e8f23f6a87bf91a4197ed	2022-01-07 18:35:32 -08:00
Jiewen Tan	338eb1b2b3	[LTC] Export torch::lazy::GetBackendDevice() (#70963 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70963 This commit exports torch::lazy::GetBackendDevice(). Test Plan: CI in the lazy_tensor_staging branch. Reviewed By: wconstab Differential Revision: D33468938 Pulled By: alanwaketan fbshipit-source-id: f65599c9238bf6b4f4ffbd5194befdc267272831	2022-01-07 13:13:18 -08:00
Zhengxu Chen	1011ac188f	[jit][edge] Create DynamicType for OptionalType in mobile. (#68137 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68137 A small step to replace existing OptionalType usage to DynamicType in Edge runtime. ghstack-source-id: 146670520 Test Plan: CI Reviewed By: iseeyuan Differential Revision: D32264617 fbshipit-source-id: 62d3ffad40901842deac19ca2098ea5ca132e718	2022-01-07 11:23:12 -08:00
Zhengxu Chen	0517e719ac	[jit] Add conformance test for DynamicType with server JIT types. (#69482 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69482 Add a test to enumerate a number of JIT type combinations and see if their subtyping behavior is preserved in the new DynamicType system. ghstack-source-id: 146670526 Test Plan: buck test mode/opt //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.DynamicType' Reviewed By: gmagogsfm Differential Revision: D32891263 fbshipit-source-id: 728211b39778e93db011b69b0a4047df78a8fc5b	2022-01-07 11:23:09 -08:00
Xiang Gao	6e16c9bb1d	Add support for deleteKey for FileStore (#69953 ) Summary: torch_ucc uses `deleteKey`, and trying to run PyTorch tests with torch_ucc leads to failure about `deleteKey not implemented for FileStore`. cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/69953 Reviewed By: ngimel Differential Revision: D33458457 Pulled By: H-Huang fbshipit-source-id: f46afd59f950722ae594d9aafb8843f14019e930	2022-01-07 06:20:59 -08:00
Mikhail Zolotukhin	8223ef1cd8	[TensorExpr] Clean-up logic for copying input tensors and remove some dead code. (#70535 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70535 This also fixes handling of inputs that happen to be outputs (they require copy). Test Plan: Imported from OSS Reviewed By: pbelevich Differential Revision: D33399116 Pulled By: ZolotukhinM fbshipit-source-id: 9845838eb653b82ae47b527631b51893990d5319	2022-01-07 01:03:56 -08:00
Mikhail Zolotukhin	5d7cc8f22a	[TensorExpr] Add some graph-rewrite passes to prepare models for AOT compilation. (#66515 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66515 These passes should not be used generally, as they change the API of the model's forward method, but they help with experimenting on the model and ironing out all the kinks before it can be compiled properly. In the long run, ideally we should provide a better way to enable such experiments. Differential Revision: D31590862 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: 74ded34c6c871d4cafa29f43dc27c7e71daff8fc	2022-01-07 01:03:53 -08:00
Joel Schlosser	e6befbe85c	Add flag to optionally average output attention weights across heads (#70055 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/47583 Pull Request resolved: https://github.com/pytorch/pytorch/pull/70055 Reviewed By: bhosmer Differential Revision: D33457866 Pulled By: jbschlosser fbshipit-source-id: 17746b3668b0148c1e1ed8333227b7c42f1e3bf5	2022-01-06 17:32:37 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	b0fdca8855	Bump version number to 7 and compile old operators with old schema (#68358 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68358 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D33433730 Pulled By: tugsbayasgalan fbshipit-source-id: 202c58365bae13195d3545cefcb0da9162b02151	2022-01-05 23:57:22 -08:00
Raghavan Raman	616afcf981	[jit] [shape analysis] Move constant tensors out of fused subgraphs during generalization (#70320 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70320 ghstack-source-id: 146514368 Test Plan: `buck test mode/dev-nosan //caffe2/test/cpp/jit:jit` Reviewed By: eellison Differential Revision: D33280508 fbshipit-source-id: fe4291d7c49f0a498b330de96b698e99f6f6a505	2022-01-05 10:19:14 -08:00
Michael Suo	0ece9a49d7	Revert D33198155: Bump version number to 7 and compile old operators with old schema Test Plan: revert-hammer Differential Revision: D33198155 (`d35fc409ad`) Original commit changeset: 38a1185f9ecb Original Phabricator Diff: D33198155 (`d35fc409ad`) fbshipit-source-id: 411aaeb4e047aad9202db50d4d0f2ff35bc51f9d	2022-01-04 13:44:59 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	d35fc409ad	Bump version number to 7 and compile old operators with old schema (#68358 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68358 Test Plan: Imported from OSS Reviewed By: samdow Differential Revision: D33198155 Pulled By: tugsbayasgalan fbshipit-source-id: 38a1185f9ecb34a33f737ad0b060b3490956300c	2022-01-04 01:31:25 -08:00
Salil Desai	35251a5528	[PyTorch] Add Enum to IValue Deepcopy (#69937 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69937 This enables ```export_torch_mobile_model``` compatibility with Enum IValues Test Plan: ModuleAPITest.DeepCopyEnum Reviewed By: gmagogsfm Differential Revision: D33104681 fbshipit-source-id: ca2a6d259c312487fe38dd1bed33ab6b7910bc2a	2021-12-30 07:52:22 -08:00
George Qi	8af39b7668	AdaptiveLogSoftmaxWithLoss no_batch_dim support (#69054 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69054 Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D33200166 Pulled By: george-qi fbshipit-source-id: 9d953744351a25f372418d2a64e8402356d1e9b7	2021-12-29 10:25:26 -08:00
Bo Wu	bf610f08b0	Back out "Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions" Summary: as title Test Plan: ``` buck run mode/opt-split-dwarf -c=python.package_style=inplace //ai_infra/distributed_ai/pyper_test_framework/templates:pyper_release_v2 -- --model inline_cvr_post_imp_deterministic_shrunk_pyper_release_v2 --cluster TSCTestCluster --hpc_identity oncall_pyper_oncall --stage prod_offline_training --test_module training_platform ... ############## Start inline_cvr_post_imp_model Test Results Analysis ############## I1226 22:03:56.789000 3346280 test_driver.py:139 UNKNOWN ] Test finished in 808.2743511786684 seconds. +-------------------------+---------+------------------------+-----------------+ \| Test Case \| Status \| Message \| Model Entity ID \| +-------------------------+---------+------------------------+-----------------+ \| SmallWorld_release_test \| Success \| finished successfully. \| 987987491 \| +-------------------------+---------+------------------------+-----------------+ I1226 22:03:56.790000 3346280 test_driver.py:143 UNKNOWN ] test_run_id: 3d085f61-28d1-411d-bd27-940ea2554b23 use this id to find your run in scuba pyper_test_framework I1226 22:03:56.792000 3346280 test_driver.py:160 UNKNOWN ] Calling cleanup I1226 22:03:56.792000 3346280 training_platform_test_launcher.py:385 UNKNOWN ] Stopping launched jobs 1 I1226 22:03:59.563122 3346280 ClientSingletonManager.cpp:100] Shutting down Manifold ClientSingletonManager ``` Reviewed By: seemethere Differential Revision: D33325936 fbshipit-source-id: 64414bf7061ad77e8ac12eb8abafee4043e0fa1e	2021-12-27 09:11:46 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	4ae71c8d34	Add graph op replacement pass (#69915 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69915 Test Plan: Imported from OSS Reviewed By: samdow Differential Revision: D33198158 Pulled By: tugsbayasgalan fbshipit-source-id: f2b924edf9959aaf51f97db994fae031fa062cf8	2021-12-25 13:03:19 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	df3cbcff28	Add utility methods to find an upgrader (#68355 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68355 Test Plan: Imported from OSS Reviewed By: samdow Differential Revision: D33198156 Pulled By: tugsbayasgalan fbshipit-source-id: 68380148f0d9bee96d8090bf01c8dfca8e1f8b12	2021-12-24 12:23:04 -08:00
Shunting Zhang	911d527b87	Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions (#70339 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70339 When a Python program is translated to TorchScript, the Python exception type is dropped. This makes users' lives hard when they need to categorize errors based on more than just the exception message. Here we make a change so that when we raise a Python exception, we record the fully qualified class name of the exception. Later on, when the TorchScript is interpreted, a special exception CustomJITException is thrown. Users can get the Python class name from CustomJITException::getPythonClassName. Note that this diff does not customize the mapping from C++ exceptions to Python exceptions. It's left to the users to do whatever mapping they want. Code under scripts/shunting is just my own experimental code. I can split it out if requested. ghstack-source-id: 146221879 Test Plan: buck test mode/opt //caffe2/test:jit Reviewed By: gmagogsfm Differential Revision: D33282878 fbshipit-source-id: 910f67a764519f1053a48589d1a34df69001525d	2021-12-24 00:25:40 -08:00
Jiewen Tan	ab57f6d12c	[LTC] Upstream utils to extract BackendDevice from at::Tensor (#70069 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70069 This commit upstreams utils to extract BackendDevice from at::Tensor. Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.GetBackendDevice* Reviewed By: samdow Differential Revision: D33293160 Pulled By: alanwaketan fbshipit-source-id: 78647239f90b4d04adce84ae6022b8983ad30c09	2021-12-23 12:42:03 -08:00
Michael Suo	795af1578c	Revert D33172665: [LTC] Upstream utils to extract BackendDevice from at::Tensor Test Plan: revert-hammer Differential Revision: D33172665 (`121d067999`) Original commit changeset: b334ee358ea7 Original Phabricator Diff: D33172665 (`121d067999`) fbshipit-source-id: 8bff43cddfc5d30483ec5cea8eff037aab9d1cfa	2021-12-22 21:12:49 -08:00
Jiewen Tan	121d067999	[LTC] Upstream utils to extract BackendDevice from at::Tensor (#70069 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70069 This commit upstreams utils to extract BackendDevice from at::Tensor. Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.GetBackendDevice* Reviewed By: wconstab Differential Revision: D33172665 Pulled By: alanwaketan fbshipit-source-id: b334ee358ea7b031bbffb0a16fa634715dba83f5	2021-12-22 18:15:45 -08:00
vfdev-5	ce9a2f8ba9	[C++ API] Added missing nearest-exact mode and anti-alias flag (#69318 ) Summary: Description: Following https://github.com/pytorch/pytorch/pull/65142#issuecomment-981995692 adding missing nearest-exact mode and anti-alias flag to C++ frontend. - https://github.com/pytorch/pytorch/pull/65142 - https://github.com/pytorch/pytorch/pull/64501 - added tests in pytorch/test/cpp/api/functional.cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/69318 Reviewed By: davidberard98 Differential Revision: D33278995 Pulled By: jbschlosser fbshipit-source-id: fa87c0c78df6b398e4f9688cc02111eed187afa7	2021-12-22 11:10:51 -08:00
Jiewen Tan	e02d836cb2	[LTC] Upstream LTCTensorImpl (#70062 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70062 This commit upstreams LTCTensorImpl from the lazy_tensor_staging branch. It inherits from c10::TensorImpl and thus manages the lifetime/storage of LazyTensor. Test Plan: ./build/bin/test_lazy --gtest_filter=LazyTensorImplTest.* Reviewed By: desertfire Differential Revision: D33171186 Pulled By: alanwaketan fbshipit-source-id: 6af9f91cc7c7e997f120cb89a7bcd6785c03ace0	2021-12-22 03:21:52 -08:00
Raghavan Raman	4dec15e6d8	[nnc] Add a run method to TensorExprKernel that takes in output tensors (#69477 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69477 This diff adds a new run method to `TensorExprKernel` which takes in output tensors as inputs and stores the output in those given tensors. ghstack-source-id: 146107009 Test Plan: buck test mode/dev-nosan //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - Kernel.RunWithAllocatedOutputs' Reviewed By: ZolotukhinM Differential Revision: D32823890 fbshipit-source-id: edc1f4839785124048b034060feb71cb8c1be34f	2021-12-22 00:30:15 -08:00
George Qi	bb51519937	bug fix FractionalMaxPool2d (random_samples dimensions) (#70031 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70031 Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D33200618 Pulled By: george-qi fbshipit-source-id: 142f224c2cab1008d2d4e9ed333697a92d2d42db	2021-12-21 12:21:54 -08:00
Hui Guo	7abb7667a6	[tensorexpr] Add memory planning to reuse intermediate buffers (#66452 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66452 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D31557188 Pulled By: huiguoo fbshipit-source-id: f18dfeba1df20d5d4f118640fc10782534eb9219	2021-12-17 01:38:02 -08:00
Hui Guo	bbfd7b75ca	[tensorexpr] Move the allocation of intermediate buffers from TEK to CodeGen (#67143 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67143 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D31881151 Pulled By: huiguoo fbshipit-source-id: 457e5d4ff8a15f70af9c797c9ab4803d8e779abe	2021-12-17 01:37:56 -08:00
Hui Guo	c7e0951524	[tensorexpr] Add a stmt recorder to obtain stmt PCs (#66450 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66450 Test Plan: Imported from OSS Reviewed By: navahgar, ZolotukhinM Differential Revision: D31557189 Pulled By: huiguoo fbshipit-source-id: 416d79ddfc46a0109187cdeb919ad9b5abde8030	2021-12-17 01:36:37 -08:00
Zhengxu Chen	d459e79500	[jit][edge] Remove usage of shared_ptr<mobile::Code>. (#68037 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68037 Right now mobile::Code doesn't outlive its enclosing Function, and all accesses to Code happen inside the interpreter loop, which doesn't outlive the module, so we don't need to use std::shared_ptr here. This should also save us 1-2 KB of binary size, because shared_ptr seems to bloat on arm64 Android. ghstack-source-id: 145818696 Test Plan: eyes. Reviewed By: qihqi, tugsbayasgalan Differential Revision: D32264616 fbshipit-source-id: d83f538d6604cf75fd7728a25127b4849ce7ab2a	2021-12-16 13:11:46 -08:00
Jiawei Lv	b4c4a015d6	Revert D33163841: Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers" Test Plan: revert-hammer Differential Revision: D33163841 Original commit changeset: e262b6d8c80a Original Phabricator Diff: D33102715 (`eb374de3f5`) fbshipit-source-id: 644216036a238a458f0a2198460b36d24fb035f8	2021-12-16 11:12:18 -08:00
Jiawei Lv	c80b5b8c8f	Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers" Test Plan: revert-hammer Differential Revision: D33102715 (`eb374de3f5`) Original commit changeset: 3816ff01c578 Original Phabricator Diff: D33102715 (`eb374de3f5`) fbshipit-source-id: e262b6d8c80a05f3a67e024fedfbadefdbfe6e29	2021-12-16 09:39:57 -08:00
David Berard	8c7f4a0d0b	[tensorexpr] check for index out of bounds in ir_eval (#68858 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68858 when executing with ir_eval, check for index out of bounds. Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D32657881 Pulled By: davidberard98 fbshipit-source-id: 62dd0f85bb182b34e9c9f795ff761081290f6922	2021-12-16 09:27:45 -08:00
jiej	76d282d447	Nvfuser code bump 12 5 (#69964 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69964 Things added in this PR that require review: 1. cuLaunchCooperativeKernel driver API added aten/src/ATen/cuda/detail/LazyNVRTC.cpp aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h nvfuser code update: 1. perf tuning on codegen scheduler that improves performance. 2. permutation support has been extended beyond contiguous/channels-last. (The improvements could be observed on PW benchmark) Things reverted from local changes: 1. aten::gelu with approximation 2. local changes that are upstreamed in PR https://github.com/pytorch/pytorch/issues/68804 Pull Request resolved: https://github.com/pytorch/pytorch/pull/69428 Reviewed By: ngimel Differential Revision: D33073817 Pulled By: wconstab fbshipit-source-id: e77d32e81d037d7370822b040456fd4c3bd68edb	2021-12-16 08:28:54 -08:00
Tristan Rice	eb374de3f5	Back out "Revert D32606547: torch/monitor: add C++ events and handlers" (#69923 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69923 Original commit changeset: fbaf2cc06ad4 Original Phabricator Diff: D32606547 (`e61fc1c03b`) This is the same thing as the original diff but just using a normal std::mutex instead of std::shared_timed_mutex which is not available on OSX 10.11. The performance difference should be negligible and easy to change down the line if it does become a bottleneck. Old failing build: https://github.com/pytorch/pytorch/runs/4495465412?check_suite_focus=true Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783 Test Plan: buck test //caffe2/test/cpp/monitor:monitor will add ciflow tags to ensure mac builds are fine Reviewed By: aivanou Differential Revision: D33102715 fbshipit-source-id: 3816ff01c578d8e844d303d881a63cf5c3817bdb	2021-12-15 22:51:43 -08:00
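The mutex swap described in the entry above can be illustrated with a small sketch. `HandlerRegistry` and its methods are hypothetical names chosen for this example and are not the torch/monitor API. The sketch only shows the trade-off the commit accepts: with a plain `std::mutex`, readers take the same exclusive lock as writers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical registry of handler ids (illustration only, not torch/monitor).
class HandlerRegistry {
 public:
  void add(int id) {
    std::lock_guard<std::mutex> guard(mutex_);
    ids_.push_back(id);
  }

  // Returns true if the id was registered and has now been removed.
  bool remove(int id) {
    std::lock_guard<std::mutex> guard(mutex_);
    const auto it = std::find(ids_.begin(), ids_.end(), id);
    if (it == ids_.end()) {
      return false;
    }
    ids_.erase(it);
    return true;
  }

  // Readers take the same exclusive lock as writers. With
  // std::shared_timed_mutex this could have been a shared (reader) lock.
  std::size_t size() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return ids_.size();
  }

 private:
  mutable std::mutex mutex_;  // mutable so const readers can lock it
  std::vector<int> ids_;
};
```

If reads ever dominated and contention showed up, the lock type would be the only thing to change, which matches the commit's remark that it is easy to revisit.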
Taylor Robie	24bc3be146	[Profiler] Clean up profiler includes. (#69421 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69421 I've hit a lot of build issues in D32671972, and I've come to realize that a lot of it boils down to header hygiene. `function.h` includes `profiler.h` solely to transitively include `record_function.h`, which winds up leaking the profiler symbols. Moreover, several files are relying on transitive includes to get access to `getTime`. As long as I have to touch all the places that use `getTime`, I may as well also move them to the new namespace. Test Plan: Unit tests and CI. Reviewed By: aaronenyeshi, albanD Differential Revision: D32865907 fbshipit-source-id: f87d6fd5afb784dca2146436e72c69e34623020e	2021-12-15 12:50:24 -08:00
Chen Lai	408283319a	[Operator Versioning][Edge] Change OP to CALL when there is a valid upgrader (#67731 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67731 1. Register upgrader function at loading stage 2. Change OP to CALL when there operator_version from model is smaller than current runtime version and there exists a valid upgrader The interpreter log is : ``` RUNNING 0 STOREN 1 3 RUNNING 1 DROPR 1 RUNNING 2 LOAD 2 RUNNING 3 LOAD 3 RUNNING 4 CALL 0 RUNNING 0 STOREN 1 2 RUNNING 1 LOAD 1 RUNNING 2 OP 0, aten::is_floating_point RUNNING 3 JF 3 RUNNING 4 LOADC 1 RUNNING 5 JMP 3 RUNNING 8 STORE 3 RUNNING 9 MOVE 3 RUNNING 10 JF 5 RUNNING 11 LOAD 1 RUNNING 12 LOAD 2 RUNNING 13 OP 1, aten::div.Tensor RUNNING 14 JMP 5 RUNNING 19 STORE 4 RUNNING 20 DROPR 2 RUNNING 21 DROPR 1 RUNNING 22 MOVE 4 RUNNING 23 RET RUNNING 5 LOAD 2 RUNNING 6 LOAD 3 RUNNING 7 CALL 0 RUNNING 0 STOREN 1 2 RUNNING 1 LOAD 1 RUNNING 2 OP 0, aten::is_floating_point RUNNING 3 JF 3 RUNNING 4 LOADC 1 RUNNING 5 JMP 3 RUNNING 8 STORE 3 RUNNING 9 MOVE 3 RUNNING 10 JF 5 RUNNING 11 LOAD 1 RUNNING 12 LOAD 2 RUNNING 13 OP 1, aten::div.Tensor RUNNING 14 JMP 5 RUNNING 19 STORE 4 RUNNING 20 DROPR 2 RUNNING 21 DROPR 1 RUNNING 22 MOVE 4 RUNNING 23 RET RUNNING 8 MOVE 2 RUNNING 9 MOVE 3 RUNNING 10 CALL 0 RUNNING 0 STOREN 1 2 RUNNING 1 LOAD 1 RUNNING 2 OP 0, aten::is_floating_point RUNNING 3 JF 3 RUNNING 4 LOADC 1 RUNNING 5 JMP 3 RUNNING 8 STORE 3 RUNNING 9 MOVE 3 RUNNING 10 JF 5 RUNNING 11 LOAD 1 RUNNING 12 LOAD 2 RUNNING 13 OP 1, aten::div.Tensor RUNNING 14 JMP 5 RUNNING 19 STORE 4 RUNNING 20 DROPR 2 RUNNING 21 DROPR 1 RUNNING 22 MOVE 4 RUNNING 23 RET RUNNING 11 TUPLE_CONSTRUCT 3 RUNNING 12 RET ``` The upgrader bytecode is: ``` (STOREN, 1, 2) (LOAD, 1, 0) (OP, 0, 0) (JF, 3, 0) (LOADC, 1, 0) (JMP, 3, 0) (LOAD, 2, 0) (OP, 0, 0) (STORE, 3, 0) (MOVE, 3, 0) (JF, 5, 0) (LOAD, 1, 0) (LOAD, 2, 0) (OP, 1, 0) (JMP, 5, 0) (LOAD, 1, 0) (LOAD, 2, 0) (LOADC, 0, 0) (OP, 2, 0) (STORE, 4, 0) (DROPR, 2, 0) 
(DROPR, 1, 0) (MOVE, 4, 0) (RET, 0, 0) ``` ghstack-source-id: 145635622 Test Plan: describe in summary and CI Reviewed By: iseeyuan Differential Revision: D32092517 fbshipit-source-id: 0314b4bda5d2578cdd4e7cfbfd1e3c07fbccf8a3	2021-12-14 19:13:12 -08:00
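The OP-to-CALL rewrite described in the entry above can be sketched roughly as follows. Every type and name here (`Instruction`, `Upgrader`, `apply_upgraders`, the version-range fields) is a hypothetical simplification for illustration and is not the Lite Interpreter's actual data structure. The sketch assumes an upgrader is "valid" when the model's operator version falls inside the range the upgrader covers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified bytecode model (not the real mobile::Code layout).
enum class OpCode { OP, CALL, RET };

struct Instruction {
  OpCode op;
  int32_t index;  // OP: index into op_names; CALL: index into the function table
};

struct Upgrader {
  int64_t min_version;     // assumed: oldest model operator version covered
  int64_t max_version;     // assumed: newest model operator version covered
  int32_t function_index;  // where the registered upgrader function lives
};

// At load time, turn OP into CALL for operators that have a valid upgrader,
// but only when the model is older than the runtime. Returns the number of
// instructions rewritten.
int apply_upgraders(std::vector<Instruction>& code,
                    const std::vector<std::string>& op_names,
                    const std::unordered_map<std::string, Upgrader>& upgraders,
                    int64_t model_version,
                    int64_t runtime_version) {
  if (model_version >= runtime_version) {
    return 0;  // model is current: keep every OP as is
  }
  int rewritten = 0;
  for (auto& inst : code) {
    if (inst.op != OpCode::OP) {
      continue;
    }
    assert(inst.index >= 0);
    const auto& name = op_names.at(static_cast<std::size_t>(inst.index));
    const auto it = upgraders.find(name);
    if (it == upgraders.end()) {
      continue;  // no upgrader registered for this operator
    }
    const Upgrader& upgrader = it->second;
    if (model_version >= upgrader.min_version &&
        model_version <= upgrader.max_version) {
      inst.op = OpCode::CALL;
      inst.index = upgrader.function_index;
      ++rewritten;
    }
  }
  return rewritten;
}
```

In the interpreter log quoted in the entry, the `CALL 0` steps followed by a nested `STOREN`/`LOAD`/`OP` sequence are what such a rewrite looks like at run time: the upgrader body runs in place of the original operator.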
Chen Lai	9e4d60a552	[Operator Versioning][Edge] Use check in cpp source file for upgrader (#67728 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67728 1. Check in upgrader_mobile.h and upgrader_mobile.cpp 2. Add test to parse all bytecode from upgrader_mobile.h ghstack-source-id: 145635621 Test Plan: buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterUpgraderTest.Upgrader' Reviewed By: iseeyuan Differential Revision: D32087295 fbshipit-source-id: 21e95aabb5e9db76be27e01adfea8fbc41caeaf6	2021-12-14 19:10:51 -08:00
Michael Suo	f565167fbd	Revert D32606547: torch/monitor: add C++ events and handlers Test Plan: revert-hammer Differential Revision: D32606547 (`e61fc1c03b`) Original commit changeset: a00d0364092d Original Phabricator Diff: D32606547 (`e61fc1c03b`) fbshipit-source-id: fbaf2cc06ad4bec606e8a9c6f591d65c04e6fa56	2021-12-11 22:51:03 -08:00
Tristan Rice	e61fc1c03b	torch/monitor: add C++ events and handlers (#68783 ) Summary: This adds a C++ event handler corresponding to the Python one mentioned in the RFC. This changes the counters a bit to all be push driven instead of being polled. The two window types are "fixed count" and "interval". One is based off the number of logged events and the other is based off of time windows. There's currently no active ticker for interval so it needs a regular stream of events to ensure events are produced. A follow up diff can add support for things like HHWheel / simple ticker. Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783 Test Plan: buck test //caffe2/test/cpp/monitor:monitor Reviewed By: kiukchung Differential Revision: D32606547 fbshipit-source-id: a00d0364092d7d8a98e0b18e503c0ca8ede2bead	2021-12-11 16:44:46 -08:00
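The "fixed count" window mentioned in the entry above can be sketched as a push-driven aggregator. `FixedCountSum` and its handler signature are hypothetical names for this example and are not taken from torch/monitor. An "interval" window would close on elapsed time instead of on the number of logged values, which is why, without a ticker, it needs a steady stream of events.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical fixed-count window that sums values (illustration only).
class FixedCountSum {
 public:
  using Handler = std::function<void(int64_t)>;

  FixedCountSum(std::size_t window_size, Handler handler)
      : window_size_(window_size), handler_(std::move(handler)) {
    assert(window_size_ > 0);
  }

  // Push driven: nothing polls the counter. Logging the value that fills the
  // window is what emits the event and resets the state.
  void add(int64_t value) {
    sum_ += value;
    ++count_;
    if (count_ == window_size_) {
      handler_(sum_);
      sum_ = 0;
      count_ = 0;
    }
  }

 private:
  std::size_t window_size_;
  Handler handler_;
  int64_t sum_ = 0;
  std::size_t count_ = 0;
};
```

With a window size of 3, two logged values produce no event; the third one does.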
Yanan Cao	17f3179d60	Back out "[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer" (#69796 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69796 (Note: this ignores all push blocking failures!) Test Plan: External CI + Sandcastle Reviewed By: zhxchen17 Differential Revision: D33032671 fbshipit-source-id: dbf6690e960e25d6a5f19043cbe792add2acd7ef	2021-12-10 21:29:53 -08:00
Hao Lu	91d16cb633	[Jit] Fix schema of aten::split int[] version (#69745 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69745 Missed in D31935573 (`6b44e75f6b`). Reviewed By: d1jang Differential Revision: D31889867 fbshipit-source-id: 417bd0b15db4891dbd641b35a803553f11d0d756	2021-12-10 02:33:36 -08:00
Nikita Shulga	3bb20ae49f	Make c10d tests -Werror clean (#69703 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69703 Test Plan: Imported from OSS Reviewed By: seemethere Differential Revision: D32997001 Pulled By: malfet fbshipit-source-id: 38b5f195c04f2b3b920e6883a96fe9a36345b9d2	2021-12-09 22:10:04 -08:00
Ivan Kobzarev	7dba88dfdb	[nnc][quant] Fix quantized concat (#69596 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69596 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D32941108 Pulled By: IvanKobzarev fbshipit-source-id: 727f608b98625648e2e444396d910838c95f58f2	2021-12-09 18:55:32 -08:00
Peter Bell	b2e79ed5ec	Remove WindowsTorchApiMacro.h in favor of Export.h (#69585 ) Summary: Follow up to https://github.com/pytorch/pytorch/issues/68095 This also changes the files from the ATen folder to include c10's `Export.h` instead since they can't ever be exporting `TORCH_PYTHON_API`. cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/69585 Reviewed By: mrshenli Differential Revision: D32958594 Pulled By: albanD fbshipit-source-id: 1ec7ef63764573fa2b486928955e3a1172150061	2021-12-09 17:30:09 -08:00
Han Qi	d3649309e6	[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#69306 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69306 Included functions: save_mobile_module -> saves a mobile::Module to flatbuffer load_mobile_module_from_file -> loads a flatbuffer into mobile::Module parse_mobile_module -> parses from bytes or deserialized flatbuffer Module object Test Plan: unittests Reviewed By: gmagogsfm Differential Revision: D32806835 fbshipit-source-id: 71913c6650e225634f878946bd16960d377a7f57	2021-12-09 14:53:31 -08:00
Richard Barnes	afb742382a	use irange for loops 10 (#69394 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69394 Modified loops in files under fbsource/fbcode/caffe2/ from the format ``` for(TYPE var=x0;var<x_max;var++) ``` to the format ``` for(const auto var: irange(x_max)) ``` This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand. Test Plan: Sandcastle Reviewed By: malfet Differential Revision: D32837991 fbshipit-source-id: fc7c4f76d2f32a17a0faf329294b3fe7cb81df32	2021-12-09 09:49:34 -08:00
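The loop rewrite described in the entry above can be tried out with a minimal sketch. The `irange` defined here is a hypothetical stand-in so that the snippet compiles without PyTorch headers; it only covers the half-open range [0, end) and assumes `end` is non-negative. The real helper is `c10::irange`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for c10::irange: iterates over [0, end).
// Assumes end >= 0; a negative end is not handled here.
template <typename T>
class IntRange {
 public:
  class Iterator {
   public:
    explicit Iterator(T value) : value_(value) {}
    T operator*() const { return value_; }
    Iterator& operator++() {
      ++value_;
      return *this;
    }
    bool operator!=(const Iterator& other) const {
      return value_ != other.value_;
    }

   private:
    T value_;
  };

  explicit IntRange(T end) : end_(end) {}
  Iterator begin() const { return Iterator(0); }
  Iterator end() const { return Iterator(end_); }

 private:
  T end_;
};

template <typename T>
IntRange<T> irange(T end) {
  return IntRange<T>(end);
}

// Before: classic index loop with an explicit counter type.
long sum_indexed(const std::vector<int>& values) {
  long total = 0;
  for (std::size_t i = 0; i < values.size(); i++) {
    total += values[i];
  }
  return total;
}

// After: the index is const and its type is deduced from the bound.
long sum_irange(const std::vector<int>& values) {
  long total = 0;
  for (const auto i : irange(values.size())) {
    total += values[i];
  }
  return total;
}
```

Both functions visit the same indices; the range form removes the chance of comparing or incrementing the wrong variable and avoids signed/unsigned mismatches between the counter and the bound.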
Chen Lai	13faaff54c	[Operator Versioning][Edge] Implement register function for upgrader (#67730 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67730 This PR implements the register function for the upgrader so it can be used at the loading stage. ghstack-source-id: 145170986 Test Plan: ``` buck test //caffe2/test/cpp/jit:jit ``` Reviewed By: iseeyuan Differential Revision: D32092518 fbshipit-source-id: 779b51eb12b8cb162a93a55c1e66fe0becc4cb36	2021-12-09 02:18:09 -08:00
Peter Bell	e279963eef	Remove remaining THC code (#69039 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69039 Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D32872476 Pulled By: ngimel fbshipit-source-id: 7972aacc24aef9450fb59b707ed6396c501bcb31	2021-12-08 12:18:08 -08:00
Bin Bao	e8f4c9cc40	[LT] Upstream LazyView and view ops IR Nodes (#69277 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69277 LazyView is the main class for tracking alias caused by view ops. The corresponding IR classes for view ops are hand-written now, and we can switch to code-gen them in future. For certain view ops, they have a reverse IR class to perform inplace update in the backward direction on a chain of alias ops. As part of the future work, we will simplify the logic for LazyView once the functionalization pass in core is ready to use. Test Plan: Imported from OSS Reviewed By: wconstab Differential Revision: D32820014 Pulled By: desertfire fbshipit-source-id: d9eb526cb23885f667e4815dc9dd291a7b7e4256	2021-12-04 08:44:54 -08:00
Ramanpreet Nara	f587267dc7	Revert D31705359: use irange for loops 8 Test Plan: revert-hammer Differential Revision: D31705359 (`17e5200441`) Original commit changeset: c9ea2fbc0f9c fbshipit-source-id: 08fff2d12beca953ad30dd0baabf86e39ac84f14	2021-12-02 12:55:08 -08:00
Richard Barnes	17e5200441	use irange for loops 8 (#66743 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66743 Modified loops in files under fbsource/fbcode/caffe2/ from the format `for(TYPE var=x0;var<x_max;var++)` to the format `for(const auto var: irange(x_max))` This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand. Test Plan: Sandcastle Reviewed By: malfet Differential Revision: D31705359 fbshipit-source-id: c9ea2fbc0f9cd29e97a52dcb203addc5f2abb09b	2021-12-02 10:21:29 -08:00
Alban Desmaison	00ebbd5ef6	Revert D32010095: [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer Test Plan: revert-hammer Differential Revision: D32010095 (`41d35dc201`) Original commit changeset: d763b0557780 fbshipit-source-id: bf746a0389135c9f5f67f00f449435ce08fb5f6d	2021-12-02 06:41:40 -08:00
Han Qi	41d35dc201	Add ability for a mobile::Module to save as flatbuffer (#67351 ) Summary: Included functions: * save_mobile_module -> saves a mobile::Module to flatbuffer * load_mobile_module_from_file -> loads a flatbuffer into mobile::Module * parse_mobile_module -> parses from bytes or deserialized flatbuffer Module object Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/67351 Reviewed By: iseeyuan Differential Revision: D32010095 Pulled By: qihqi fbshipit-source-id: d763b0557780f7c2661b6485105b045e41a5e8f1	2021-12-01 23:58:15 -08:00
Jacob Szwejbka	291e56eda4	[Pytorch Edge] Update Black Box Api with operator versioning (#68678 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68678 Test Plan: I'll update the unit test before landing Reviewed By: cccclai Differential Revision: D32573603 fbshipit-source-id: 19271bcbb68b61d24d6943e61a943f4f75fddb5d	2021-12-01 19:13:32 -08:00
Chen Lai	b9738e923e	[Operator Versioning][Edge] Add old models and unittest (#67726 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67726 1. Check in one model with aten:div_tensor old op with unittest in both cpp and python. The following two lines are commented out and expected to work after using upgrader. ``` _helper(mobile_module_v2, div_tensor_0_3) _helper(current_mobile_module, torch.div) ``` 2. Update the commented code accordingly. Currently there are 6 upgraders. The following old models with operators are added to cover these 6 upgraders: ``` // Tensor x Tensor test_versioned_div_tensor_v3 // Tensor x Scalar test_versioned_div_scalar_float_v3 test_versioned_div_scalar_reciprocal_int_v3 test_versioned_div_scalar_inplace_float_v3 // Scalar x Scalar test_versioned_div_scalar_scalar_v3 // Tensor x Tensor with out kwarg test_versioned_div_tensor_out_v3 // Tensor x Tensor inplace test_versioned_div_tensor_inplace_v3 // Tensor x Scalar inplace test_versioned_div_scalar_inplace_int_v3 ``` Note: In this pr, per model, it includes the following test: 1. Model (with old op) load/run test will be in both cpp and python 2. Model (with old op) + upgrader test will be in python Other tests considered adding: 1. per upgrader bytecode test 2. app level integration test ghstack-source-id: 144422418 Test Plan: CI and the added unittest Reviewed By: iseeyuan Differential Revision: D32069653 fbshipit-source-id: 96d9567088a1f709bc7795f78beed7a308e71ca9	2021-12-01 18:46:30 -08:00
Jiewen Tan	e6c435bf96	[LTC] Upstream helpers for c10::Device <=> BackendDevice (#69064 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69064 This commit upstreams helpers for converting a c10::Device to BackendDevice and vice versa. Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.FromAten:BackendDeviceTest.ToAten Reviewed By: wconstab Differential Revision: D32732607 Pulled By: alanwaketan fbshipit-source-id: 0dd233d37a4a30fc4b22dba322ddd85d4cb3635b	2021-12-01 12:15:32 -08:00
Scott Wolchok	1d84d8c5d8	[PyTorch] Remove StringView from RecordFunction interface (1/2) (#68410 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68410 First step toward not heap-allocating a string in RecordFunction::before() every time ghstack-source-id: 144287654 Test Plan: CI Reviewed By: chaekit Differential Revision: D32453847 fbshipit-source-id: 080d95095fb568287b65fcc41a4ca6929b5f9a87	2021-11-30 13:20:08 -08:00
Joel Schlosser	8fef7c09f5	Remove finput from slow2d signatures (#68896 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68896 Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D32655874 Pulled By: jbschlosser fbshipit-source-id: 3c9acb106961c40af1432652179edb2bc5a4bfa5	2021-11-30 09:47:24 -08:00
Jiewen Tan	0cdeb586ae	[LTC] Upstream some utilities (#69046 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69046 This commit upstreams utilities including ExceptionCleanup, MaybeRef, Iota, ToVector, ToOptionalVector and GetEnumValue. Test Plan: ./build/bin/test_lazy --gtest_filter=UtilTest.* Reviewed By: wconstab, Chillee Differential Revision: D32709090 Pulled By: alanwaketan fbshipit-source-id: 5147433becd4dbb07be7d36d66b0b8685054d714	2021-11-30 02:44:02 -08:00
Mikhail Zolotukhin	75ce040620	[TensorExpr] Allow for 'keepdim' argument in aten::mean in NNC's external call. (#68756 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68756 That fixes some warnings in our tests. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D32600952 Pulled By: ZolotukhinM fbshipit-source-id: 548eaf3659e20795cce44d8f57e77f4a47d44d98	2021-11-30 00:06:34 -08:00
Vinnam Kim	7b701ce2d4	Add set_to_none option to C++ API (#68801 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/68167. Signed-off-by: Vinnam Kim <vinnam.kim@makinarocks.ai> Pull Request resolved: https://github.com/pytorch/pytorch/pull/68801 Reviewed By: mruberry Differential Revision: D32625239 Pulled By: jbschlosser fbshipit-source-id: 5f09b959e23d5448106a47029d06ec20ad094d82	2021-11-29 08:42:39 -08:00
Bin Bao	787ded5103	Add lazy::Shape::numel() (#68314 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68314 Add a convenience to lazy::Shape for counting the number of elements (by multiplying out the dimensions). This is a method on Tensor, and in switching other lazy tensor shape utils to use aten shape inference, we need numel counts. Test Plan: add unit tests Reviewed By: alanwaketan Differential Revision: D32409138 fbshipit-source-id: 3ae725300f8826d38e45412f46501d5e5f776fb2	2021-11-29 08:38:09 -08:00
Han Qi	959cb03132	Populate operator_input_sizes_ (#68542 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68542 title Test Plan: unittest Reviewed By: iseeyuan Differential Revision: D32508159 fbshipit-source-id: 0773a725973a493f19a2e9a340365e559dfdf7f8	2021-11-23 12:18:06 -08:00
Tristan Rice	758d7dea9c	torch.monitor - Initial C++ Stats (#68074 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68074 This is the first step of many PRs towards implementing the `torch.monitor` RFC https://github.com/pytorch/rfcs/pull/30 This defines the aggregation types, the `Stat` class and provides some simple collection of the stats. This doesn't match the RFC exactly as it incorporates some of the comments on the RFC as well as a few changes for performance. Changes: * added window_size to the stats. If specified it will always compute the stat using the `window_size` number of values. If there aren't enough values within that window it reports the previous stats. * This doesn't include the push metrics yet (will be coming). After more discussion it looks like the best way to handle this is to support a hybrid where the metric can set how frequently it'll be logged. For fixed window_size metrics it'll be logged each time it hits the window size. This will allow performant counters as well as lower frequency push counters (window_size=1). Performance considerations: * Updating the stats acquires a lock on that Stat object. This should be performant unless there's many-many threads writing to the same stat. Single thread will typically use futex so should be quite fast. * Adding/removing/fetching all stats sets a global lock on the stat list -- this shouldn't be an issue since these events happen infrequently. * Fetching stats accesses one stat at a time instead of a global lock. This means the exported values are linearizable but not serializable across multiple stats but I don't expect this to be an issue. Next steps: 1. Add StatCollector interface for push style metrics 1. Add pybind interfaces to expose to Python 1. Add default metric providers 1. 
Integrate into Kineto trace view Test Plan: buck test //caffe2/test/cpp/monitor:monitor CI Reviewed By: kiukchung Differential Revision: D32266032 fbshipit-source-id: dab8747b4712f5dba5644387817a3a0fda18b66a	2021-11-18 21:46:23 -08:00
Hongyi Jia	146a7f68e2	Enable desync root cause analysis for NCCL (#68310 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68310 Enable desync root cause analysis by recording the last footprint of collective calls. On timeout, we parse the store trace and figure out the root cause of the desync issue. This feature is built on async error handling. Test Plan: Standalone test * Typical desync - P467288969 * Mismatched collectives - P467288916 * Mismatched broadcast size - P467288873 DDP benchmark * DDP benchmark desync - P467433483, P467520195 No perf regression: * w/o this diff https://www.internalfb.com/intern/fblearner/details/308379789?tab=Outputs * w/ this diff https://www.internalfb.com/intern/fblearner/details/308534088?tab=Outputs Reviewed By: mingzhe09088 Differential Revision: D32348647 fbshipit-source-id: 43e7e96e3fa2be0ac66c1325bceb639b461a8b3a	2021-11-17 20:29:03 -08:00
Han Qi	4eb772fde6	Refactor saving jit::Module to mobile .pt in 2 steps: (#66494 ) Summary: 1. is to convert Function -> mobile::Function 2. is to serialize mobile::Function This also opens opportunity to create mobile::Module without saving/reloading Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/66494 Reviewed By: zhxchen17 Differential Revision: D32293022 Pulled By: qihqi fbshipit-source-id: 29b43d47ff86071d5e2f9d6ca4dba4445711ce3d	2021-11-17 12:02:20 -08:00
jjsjann123	0dc3f829d9	Nvfuser code bump 11 5 (#67943 ) Summary: nvfuser code update: 1. Tuning heuristics on schedulers for reduction/normalization kernels; 2. bfloat16 on IO tensor support; 3. Refactored memory format support, now we can support dimension collapsing with non-coherent input tensors with different memory format. e.g. channels last tensor input to batch normalization. Note that we are currently limiting memory format to only Contiguous and Channels last; 4. Refactored nvfuser graph partitioning in `graph_fuser.cpp`, separated node merge and profile node API. Updated `profiling_record.cpp`. Things that are reverted from our local branch: 1. changes on some entries in autodiff 2. aten::gelu with approximation 3. native_dropout(_backward) Pull Request resolved: https://github.com/pytorch/pytorch/pull/67943 Reviewed By: ngimel Differential Revision: D32288709 Pulled By: dzhulgakov fbshipit-source-id: fc9491182ea7e0158bc112c66f096823c588eaf1	2021-11-17 01:22:17 -08:00
Raghavan Raman	2fd468e5f8	[jit] Set the graph input types before interpreting the graph during tracing (#68242 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68242 Test Plan: Imported from OSS Reviewed By: saketh-are Differential Revision: D32382958 Pulled By: navahgar fbshipit-source-id: 4e82a604a9ea2046af2755de23944147e618a65f	2021-11-15 15:44:32 -08:00
Mike Iovine	c697eeba72	[JIT] Combine concat nodes where possible (#67000 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67000 See the [related issue](https://github.com/pytorch/pytorch/issues/66654) for context. This new JIT optimization transforms patterns like this: ``` %inputs.1 : Tensor[] = prim::ListConstruct(%a, %b, %c) %concat.1 : Tensor = aten::cat(%inputs.1, %dim) %inputs.2 : Tensor[] = prim::ListConstruct(%x, %concat.1, %y) %concat.2 : Tensor = aten::cat(%inputs.2, %dim) ``` into this: ``` %inputs.2 : Tensor[] = prim::ListConstruct(%x, %a, %b, %c, %y) %concat.2 : Tensor = aten::cat(%inputs.2, %dim) ``` (it can do this for chains of `aten::cat` longer than 2 as well) A few conditions have to hold: 1. The `dim`s have to match. 2. `inputs.1` and `inputs.2` cannot be mutated Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOpt` Reviewed By: d1jang Differential Revision: D31819491 fbshipit-source-id: 9f1a501d52099eb1a630b5dd906df4c38c3817ba	2021-11-15 12:02:45 -08:00
Mikhail Zolotukhin	e511a7a5b4	[TensorExpr] Remove non-determinism in iterating over unordered_set of intermediate buffers. (#68277 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68277 Differential Revision: D32400553 Test Plan: Imported from OSS Reviewed By: saketh-are, priyaramani Pulled By: ZolotukhinM fbshipit-source-id: a8fe820bbddaa19f95db432efaa6d3e36095a05e	2021-11-13 00:50:57 -08:00
Will Constable	6ddaf3bd37	[LT] Upstream TsNode, TsNodeLowering, TsLoweringContext (#68154 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68154 Test Plan: added a basic test; cover more by using lazy_tensor_staging tests Reviewed By: Krovatkin, alanwaketan Differential Revision: D32224303 fbshipit-source-id: ac3e1161229b8ae60fdb15ffa72e17072b595914	2021-11-12 12:57:20 -08:00
Will Constable	dc24503a89	Fix Hash(c10::Scalar), account for garbage data in union (#68201 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68201 Hash(c10::Scalar) made a bad assumption that it was valid to just hash over all the bytes of data of the c10::Scalar struct. Because c10::Scalar stores a union of different (float/int/complex) types with different sizes, not all bytes are valid in all cases. Hash() should only read the bytes corresponding to the currently active type. Test Plan: Added new unit tests. Verified HashTest.Scalar failed with the original Hash() impl and then fixed. Reviewed By: alanwaketan Differential Revision: D32367564 fbshipit-source-id: ac30dd4f6dd0513954986d3d23c0c11ba802c37b	2021-11-12 07:20:08 -08:00
Howard Huang	7b376bf844	Remove ProcessGroup from TensorPipeAgent initialization (#68128 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68128 Reland of D31762735 (`0cbfd466d2`). This diff was originally reverted due to failure in test_send_export_type_through_rpc_with_custom_pickler. I updated rpc_pickler_test.py to prevent a race condition where processes were not registering their pickler before handling their rpc_sync calls. Test Plan: rpc_pickler_test file: buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test //caffe2/torch/fb/training_toolkit/backend/metrics/collectors/fbdata_aggregator/tests:batch_collector_test -- --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx rpc_pickler stress test: buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test -- --exact 'caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test - test_send_export_type_through_rpc_with_custom_pickler (caffe2.torch.fb.training_toolkit.backend.metrics.tests.rpc_pickler_test.CythonTypeRpcSpawnTest)' --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx --jobs 18 --stress-runs 10 --record-results Reviewed By: mrshenli Differential Revision: D32316077 fbshipit-source-id: e58de2335fbaa3ab46d46fe222c659197633a5e4	2021-11-11 12:28:55 -08:00
Martin Yuan	bd5f33f91e	demo backend decoupled from operators (#66100 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66100 A backend should not directly depend on ATen operators. The demo backend is changed accordingly for testing purposes. Test Plan: Imported from OSS Reviewed By: pavithranrao Differential Revision: D31384614 Pulled By: iseeyuan fbshipit-source-id: c97f0c4aa12feb1d124f1d7a852e9955a7a2ce42	2021-11-11 10:26:17 -08:00
Will Constable	d6e6064efc	[LT] Upstream backend interfaces (#67927 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67927 BackendData - represents 'tensor data' in opaque backend storage LoweringContext - interface for performing backend-specific IR lowering BackendImplInterface - interface for lazy tensors backends to implement Reorgs backend-related files into lazy/backend subdir includes a few small fixes, which were made on lazy_tensor_staging but need to be back-ported to master. Test Plan: used by lazy_tensor_staging branch Reviewed By: desertfire Differential Revision: D32142032 fbshipit-source-id: 828c717bcd0d511876e64ad209b50f7bfb10cec5	2021-11-10 12:55:31 -08:00
Jiewen Tan	6011c35a79	[LTC] Upstream class BackendDevice (#68027 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68027 This commit upstreams class BackendDevice to master. It is a backend-specific representation of the actual hardware, for instance CPU, GPU, or TPU. This concept is important for backends like XLA, which need to tell the actual hardware type from the c10::DeviceType::Lazy virtual device during both IR construction and lowering. Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.* Reviewed By: wconstab Differential Revision: D32261838 Pulled By: alanwaketan fbshipit-source-id: 579c3fc5f9da7847c887a383c6047e8ecb9cc5bc	2021-11-10 07:05:43 -08:00
Bin Bao	a027551358	[LT] Merge cache.h (#67929 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67929 1. Write a node-hash based unit test for Cache 2. Replace CHECK with TORCH_CHECK in IrUtil Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D32246134 Pulled By: desertfire fbshipit-source-id: c464bc300126d47e9ad4af3b3e8484a389757dc0	2021-11-09 12:02:02 -08:00
Bin Bao	a473417076	[LT] Merge permutation_util into master (#67766 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67766 Test Plan: `build/bin/test_lazy` Reviewed By: wconstab Differential Revision: D32147676 Pulled By: desertfire fbshipit-source-id: 528b48c9cf789abc171235091c7146b2ab7a9c76	2021-11-09 12:00:39 -08:00
Howard Huang	9fb3ba9d7b	Revert D31762735 (#67924 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67924 This diff reverts the changes made in D31762735 (`0cbfd466d2`) Test Plan: Wait for CI Reviewed By: derekmod-fb Differential Revision: D32214744 fbshipit-source-id: e0a65b6a31a88216ae1243549fcbc901ef812374	2021-11-06 17:34:13 -07:00
Chen Lai	ae501a9727	[PyTorch Edge] Update bytecode version compatibility check (#67417 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67417 bytecode version is valid when it's smaller than kMaxSupported and larger than kMinSupported ghstack-source-id: 142609392 Test Plan: ``` buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleFail' ``` Reviewed By: JacobSzwejbka, iseeyuan Differential Revision: D31984839 fbshipit-source-id: 2011e77455c931c0a8a58267494d44bcf167b877	2021-11-05 19:34:01 -07:00
Raghavan Raman	e7a3bbce89	[nnc] Add support for dynamic shapes in TensorExprKernel (#67861 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67861 Previously submitted as https://github.com/pytorch/pytorch/pull/67197. This got reverted because its failures were hidden by the failures of another PR. Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D32178196 Pulled By: navahgar fbshipit-source-id: cc8a5c68aed360d06289e69645461cfa773e1300	2021-11-05 11:18:19 -07:00
Jiewen Tan	8bed46ef38	[WIP][LTC] Upstream class Shape (#67672 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67672 This commit upstreams class Shape from the lazy_tensor_staging branch. Test Plan: WIP. Reviewed By: malfet Differential Revision: D32095478 Pulled By: alanwaketan fbshipit-source-id: 61611b12fc079b195833b5b22a6cf73c0935b8b9	2021-11-04 14:12:03 -07:00
Rohan Varma	90d311b268	[RPC] Add exception logging to constValue() (#67802 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67802 In RPC C++ code, we might sometimes call constValue() when the future actually has an exception, and in unittests we want to assert on the exception. What happens is that we get a message basically saying "!eptr_" which indicates there is some exception but we don't know what it is. This diff simply adds logging for the exception and mentions that `value` over `constValue` should be used when the future can have an exception. The contract of `constValue` to throw when `eptr_` is set is still held, it is just enhanced with additional logging. ghstack-source-id: 142375391 Test Plan: Added UT Reviewed By: mrshenli Differential Revision: D32156552 fbshipit-source-id: 4dd5e73b92173209074c104a4b75c2021e20de4b	2021-11-04 10:04:09 -07:00
Natalia Gimelshein	ca445645f9	Revert D31902471: [nnc] Add support for dynamic shapes in TensorExprKernel Test Plan: revert-hammer Differential Revision: D31902471 (`15a3c374e2`) Original commit changeset: d2729a38ba1a fbshipit-source-id: 4c05de82e626bbf744df84fd2b914b66fd165a19	2021-11-03 14:48:12 -07:00
Raghavan Raman	15a3c374e2	[nnc] Add support for dynamic shapes in TensorExprKernel (#67197 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67197 Test Plan: Imported from OSS Reviewed By: eellison, ZolotukhinM Differential Revision: D31902471 Pulled By: navahgar fbshipit-source-id: d2729a38ba1ac607ff07f516ed56fbd9085715dc	2021-11-03 11:24:17 -07:00
Raghavan Raman	383c1f51b1	[nnc] Fixed handling of 0-sized tensors in cat (#67734 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67734 The implementation of `aten::cat` op in NNC has to ignore tensors that have 0-size in any dimension. Test Plan: `buck test mode/dev-nosan //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - Kernel.CatWithEmptyInputs'` Reviewed By: ZolotukhinM Differential Revision: D32122171 fbshipit-source-id: 90c697813bc504664673cdc262df6e7ce419c655	2021-11-03 10:16:16 -07:00
Mikhail Zolotukhin	ff5c61a74e	[TensorExpr] Add lowering for aten::max (reduction). (#66519 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66519 Differential Revision: D31590853 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: a702621621f681d7f5392912e8a77ca124e14170	2021-11-03 09:44:09 -07:00
Mikhail Zolotukhin	00afe9ba7b	[TensorExpr] Add lowering for aten::embedding. (#66518 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66518 Differential Revision: D31590855 Test Plan: Imported from OSS Reviewed By: pbelevich Pulled By: ZolotukhinM fbshipit-source-id: aace0a87b1649330dae44182f7873aca27160d64	2021-11-03 09:44:07 -07:00
Mikhail Zolotukhin	008a58d226	[TensorExpr] Add lowering for aten::conv1d. (#66517 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66517 Differential Revision: D31590856 Test Plan: Imported from OSS Reviewed By: pbelevich Pulled By: ZolotukhinM fbshipit-source-id: c05a37d8741acd0606c2adb8d6cfeb1f57bc8aa0	2021-11-03 09:44:05 -07:00
Mikhail Zolotukhin	d58ef2bbff	[TensorExpr] Fix lowering for aten::softmax for the case when dtype parameter is None. (#66516 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66516 Differential Revision: D31590858 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: 0aeee7a5be64b3b9c8fa00aacb1a94031a7e25d1	2021-11-03 09:42:48 -07:00
Rohan Varma	885da61d7d	[PG NCCL] Disable NCCL health check (#67668 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67668 This adds an env var to enable NCCL health check, which when left unspecified, results in the check not being run. Unit tests that need to test this functionality have the env variable set. Please see internal diff for more details. Test Plan: CI Reviewed By: yuguo68, mrshenli Differential Revision: D32089763 fbshipit-source-id: dff5664a5e607f711515cd1042089ca769914fbb	2021-11-02 16:21:59 -07:00
Scott Wolchok	82f7f8d471	[PyTorch] Adopt IValue::toTupleRef() where obvious (#65505 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65505 Generated with `fastmod -m 'toTuple(\s)->' 'toTupleRef()${1}.'` , followed by `fastmod '(std::move$.)toTupleRef\($.' '${1}toTuple()->'` to unbreak 2 callsites. ghstack-source-id: 142065835 Test Plan: CI Reviewed By: gchanan Differential Revision: D31131025 fbshipit-source-id: 54457ae5bbeb38db9c7f196d469b98521c3d3f34	2021-11-02 10:22:18 -07:00
Howard Huang	0cbfd466d2	Remove ProcessGroup from TensorPipeAgent initialization (#66708 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66708 cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D31762735 Pulled By: H-Huang fbshipit-source-id: 9f3879fca6b8258f7e6171b14d2c1d6cce21627d	2021-11-01 14:15:27 -07:00
Max Ren	ba369ea053	check to ensure profiler_edge is only added when use_kineto is on (#67494 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/67494 Reviewed By: jbschlosser Differential Revision: D32031142 Pulled By: mcr229 fbshipit-source-id: 8267f0e02c5bed0fbc4956af6935a551bedb27ef	2021-11-01 13:42:14 -07:00
Ivan Kobzarev	7fbcf79684	[tensorexpr][nnc] Support quantization (#66676 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66676 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D31676329 Pulled By: IvanKobzarev fbshipit-source-id: 288b41ff4ed603dfaacb465f296997f14bb23c22	2021-10-31 22:49:30 -07:00
Jacob Szwejbka	66202b7f8d	[Pytorch Edge] Expose runtime operators versioning (#67385 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67385 As part of the expanded operator versioning effort, we are going to start looking at this variable and what's stored locally in the model file. ghstack-source-id: 141782717 Test Plan: unit test Reviewed By: cccclai Differential Revision: D31976654 fbshipit-source-id: 255a23cff7c4f4039089de23b4da95772be48324	2021-10-29 13:42:59 -07:00
Elias Ellison	fc82ad186a	Add Initial NNC Dynamic Shapes Flow (#66136 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66136 FOR REVIEWERS: this is ready to review, test failures come from elsewhere in the stack. Takes in a TensorExprGraph of static shapes and generalizes the input shapes to symbolic dimensions. Dimensions of value 1 will be preserved, otherwise dimensions with the same value will be bucketed to the same symbolic shape. E.g. `Tensor(5, 3), Tensor(3, 1) -> Tensor(SS(-1), SS(-2)), Tensor(SS(-2), 1)` From there, runs symbolic shape inference on the graph, and creates a versioning `if` in the graph with prim::TensorExprDynamicGuard checking if the inputs at runtime match the Generalized Symbolic Shapes that are inputs to the TE Kernel. The computation of all symbolic dimensions is inlined into the if block with the TE Kernel. All Sym Dim Value* are appended to the end of the TE Kernel Graph/Node inputs, and the Node is augmented with an integer list attr `symbolic_shape_inputs` that gives the mapping from Value * -> Symbolic Shape int64_t value. For lengthier IR examples and a walkthrough, see ShapeAnalysisTest.DynamicShapesFusion in `test_shape_analysis`. Returns True on success and False on failure; it can fail if shape propagation fails to propagate the number of dims or if complete shapes on inputs are not set.
Example transformation ``` graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu), %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu), %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)): %3 : Tensor = prim::TensorExprGroup_0(%x_inp, %y_inp, %z_inp) return () with prim::TensorExprGroup_0 = graph(%x.1 : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu), %y.1 : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu), %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)): %3 : int = prim::Constant[value=0]() %4 : Tensor = aten::tanh(%x.1) %5 : Tensor = aten::erf(%4) %6 : Tensor = aten::relu(%y.1) %7 : Tensor[] = prim::ListConstruct(%5, %6) %8 : Tensor = aten::cat(%7, %3) %9 : Tensor = aten::hardswish(%8) %10 : Tensor = aten::mul(%9, %z) return (%9) ``` -> ``` graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu), %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu), %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)): %4 : bool = prim::TensorExprDynamicGuard[types=[Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)]](%x_inp, %y_inp, %z_inp) %5 : Tensor = prim::If(%4) block0(): %15 : int[] = aten::size(%x_inp) %16 : int[] = aten::size(%y_inp) %17 : int = prim::Constant[value=1]() %18 : int = prim::Constant[value=0]() %elem.3 : int = aten::__getitem__(%15, %18) # <string>:40:10 %elem.5 : int = aten::__getitem__(%15, %17) # <string>:40:10 %elem.11 : int = aten::__getitem__(%16, %18) # <string>:40:10 %cat_dim_size.48 : int = aten::add(%elem.3, %elem.11) # <string>:321:29 %3 : Tensor = prim::TensorExprGroup_0[symbolic_shape_inputs=[-5, -4, -3, -2]](%x_inp, %y_inp, %z_inp, %cat_dim_size.48, %elem.11, %elem.5, %elem.3) -> (%3) block1(): %14 : Tensor = prim::FallbackGraph_1(%x_inp, %y_inp, %z_inp) -> (%14) return () with 
prim::TensorExprGroup_0 = graph(%x.1 : Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), %y.1 : Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu), %SS_5 : int, %SS_4 : int, %SS_3 : int, %SS_2 : int): %3 : int = prim::Constant[value=0]() %4 : Tensor(SS(-2), SS(-3)) = aten::tanh(%x.1) %5 : Tensor(SS(-2), SS(-3)) = aten::erf(%4) %6 : Tensor(SS(-4), SS(-3)) = aten::relu(%y.1) %7 : Tensor[] = prim::ListConstruct(%5, %6) %8 : Tensor(SS(-5), SS(-3)) = aten::cat(%7, %3) %9 : Tensor(SS(-5), SS(-3)) = aten::hardswish(%8) %10 : Tensor(SS(-5), SS(-3)) = aten::mul(%9, %z) return (%9) ``` Test Plan: Imported from OSS Reviewed By: navahgar, anjali411 Differential Revision: D31797466 Pulled By: eellison fbshipit-source-id: b508d2f5baef6e8e4020955ab1d4bc4b9c7bdfdd	2021-10-28 17:09:03 -07:00
Zhengxu Chen	0795735351	[jit] Clean up unneeded virtual methods from Function interface. (#65968 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65968 tryToGraphFunction() should cover all cases and is more composable than ad hoc virtual methods. ghstack-source-id: 141759214 Test Plan: no behavior change. Reviewed By: gmagogsfm Differential Revision: D31326154 fbshipit-source-id: 692a35df424f7d4f777a96489c4cbb24b3ae7807	2021-10-28 12:28:48 -07:00
Bin Bao	2366948085	[LT] Add ir_util for ComputePostOrder (#67282 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67282 Test Plan: `build/bin/test_lazy` Reviewed By: wconstab, ngimel Differential Revision: D31961754 Pulled By: desertfire fbshipit-source-id: 28466588ece8057640a7202b8c79cc1a4357d373	2021-10-28 08:17:52 -07:00
Zhengxu Chen	b55a2500d2	[jit] Remove graph() call from abstract Function interface. (#65967 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65967 Graph is an implementation detail. If a user wants access to the underlying graph, they should be able to explicitly dynamic cast instead. ghstack-source-id: 141659819 Test Plan: no behavior change. Reviewed By: gmagogsfm Differential Revision: D31326153 fbshipit-source-id: a0e984f57c6013494b92a7095bf5bb660035eb84	2021-10-27 11:54:26 -07:00
Pavithran Ramachandran	1ce500f56f	[easy][PyTorch] Use `at::native::is_nonzero` (#67195 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67195 Now that `is_nonzero` is part of `at::native` (see https://github.com/pytorch/pytorch/pull/66663 ), replace `TensorCompare::is_nonzero` with `at::native::is_nonzero`. ghstack-source-id: 141514416 Test Plan: CI Reviewed By: larryliu0820 Differential Revision: D31704041 fbshipit-source-id: 36813e5411d0aa2eb2d0442e2a195bbed417b33d	2021-10-26 12:40:32 -07:00
Michael Shi	ad5731cacc	[PyTorch] Add flop count for bmm and baddbmm (#66636 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66636 Add FLOP count for bmm and baddbmm, which is `2bmnk`. Reviewed By: ngimel Differential Revision: D31622061 fbshipit-source-id: f3e1e1e34c45228693117b81647fb4a623c4085b	2021-10-25 17:31:12 -07:00
Zhengxu Chen	12daa4f663	[jit][edge] Enable CALL instruction in lite interpreter. (#65964 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65964 ghstack-source-id: 141425519 Test Plan: buck run xplat/caffe2:test_lite_interpreter Reviewed By: cccclai Differential Revision: D31326149 fbshipit-source-id: 8a599d92f3fa4e6c125100adb36d89592e71e547	2021-10-25 14:44:33 -07:00
Nikolay Korovaiko	a7ebf76a15	jit trace (#59949 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/59949 Reviewed By: ZolotukhinM Differential Revision: D31366787 Pulled By: Krovatkin fbshipit-source-id: 798cbcd97e8ecfba984f98cd70214954be9309af	2021-10-24 18:04:22 -07:00
Chen Lai	5f58764d1d	[PyTorch Edge][type] Add type support for NamedTuple custom class (import) (#63130 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63130 Extend `type_parser` to handle `NamedTuple` type. It can be extended to handle other types when needed. The custom type uses the following format: ``` "qualified_named[ NamedTuple, [ [field_name_1, field_type_1], [field_name_2, field_type_2] ] ]" ``` For example: ``` "__torch__.base_models.sparse_nn.pytorch_preproc_types.PreprocOutputType[ NamedTuple, [ [float_features, Tensor], [id_list_features, List[Tensor]], [label, Tensor], [weight, Tensor], ] ]" ``` For nested types, the order of type lists from type table should be: ``` std::string type_str_1 = "__torch__.C [ NamedTuple, [ [field_name_c_1, Tensor], [field_name_c_2, Tuple[Tensor, Tensor]], ] ]"; std::string type_str_2 = "__torch__.B [ NamedTuple, [ [field_name_b, __torch__.C ] ] ]"; std::string type_str_3 = "__torch__.A[ NamedTuple, [ [field_name_a, __torch__.B] ] ]"; std::vector<std::string> type_strs = {type_str_1, type_str_2, type_str_3}; std::vector<TypePtr> type_ptrs = c10::parseType(type_strs); ``` namedtuple from both `collections` and `typing` are supported ``` from typing import NamedTuple from collections import namedtuple ``` This change only adds the parser, so the new runtime can now read the above format.
ghstack-source-id: 141293658 Test Plan: ``` buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.CompatiblePrimitiveType' buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.CompatibleCustomType' buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.InCompatiblePrimitiveType' buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.InCompatibleCustomType' ``` Reviewed By: iseeyuan Differential Revision: D30261547 fbshipit-source-id: 68a9974338464e320b39a5c613dc048f6c5adeb5	2021-10-22 00:40:57 -07:00
David Berard	e86d8323cb	[JIT] Add special cases for batch_norm, instance_norm in alias_analysis (#66554 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66554 In native_functions.yaml, the schemas for batch_norm and instance_norm are incorrect: the inputs `running_mean` and `running_var` are mutated, but are not marked as such in the function schema. Since `(a!)?` annotations are currently not working (see #65760), this instead adds a special case to `alias_analysis.cpp`. If the value of `training` or `use_input_stats` is known to be `false`, then `alias_analysis` will mark the input as _not_ being written to. Test Plan: Removed the `skip` annotation on the following test, and added a special exception in `check_alias_annotations`: ``` python test/test_ops.py -k test_variant_consistency_jit_nn_functional_batch_norm ``` Also: ``` ./build/bin/test_jit --gtest_filter="BatchAndInstanceNormFixture" ``` Imported from OSS Reviewed By: eellison Differential Revision: D31612339 fbshipit-source-id: 12ca61b782b9e41e06883ba080a276209dc435bb	2021-10-20 10:22:10 -07:00
Michael Suo	1bf0e1acb4	Revert D31732414: Add Initial NNC Dynamic Shapes Flow Test Plan: revert-hammer Differential Revision: D31732414 (`de4fe7a38c`) Original commit changeset: 290a94a667c2 fbshipit-source-id: 3021a1d7a8661967e37d4f9cfc86ed47cc4a7f3d	2021-10-19 20:05:29 -07:00
Elias Ellison	de4fe7a38c	Add Initial NNC Dynamic Shapes Flow (#66136 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66136 FOR REVIEWERS: this is ready to review, test failures come from somewhere else in the stack. Takes in a TensorExprGraph of static shapes and generalizes the input shapes to symbolic dimensions. Dimensions of value 1 will be preserved, otherwise dimensions with the same value will be bucketed to the same symbolic shape. E.g. `Tensor(5, 3), Tensor(3, 1) -> Tensor(SS(-1), SS(-2)), Tensor(SS(-2), 1)` From there, runs symbolic shape inference on the graph, and creates a versioning if in the graph with prim::TensorExprDynamicGuard checking if the inputs at runtime match the Generalized Symbolic Shapes that are inputs to the TE Kernel. The computation to calculate all symbolic dimensions is inlined into the if block with the TE Kernel. All Sym Dim Value* are appended to the end of the TE Kernel Graph/Node inputs, and the Node is augmented with an integer list attr `symbolic_shape_inputs` that gives the mapping from Value * -> Symbolic Shape int64_t value. For lengthier IR examples and a walkthrough, look at ShapeAnalysisTest.DynamicShapesFusion in `test_shape_analysis` Returns True on success and False on failure; it can fail if shape propagation fails to propagate the number of dims or if complete shapes on inputs are not set. 
Example transformation ``` graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu), %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu), %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)): %3 : Tensor = prim::TensorExprGroup_0(%x_inp, %y_inp, %z_inp) return () with prim::TensorExprGroup_0 = graph(%x.1 : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu), %y.1 : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu), %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)): %3 : int = prim::Constant[value=0]() %4 : Tensor = aten::tanh(%x.1) %5 : Tensor = aten::erf(%4) %6 : Tensor = aten::relu(%y.1) %7 : Tensor[] = prim::ListConstruct(%5, %6) %8 : Tensor = aten::cat(%7, %3) %9 : Tensor = aten::hardswish(%8) %10 : Tensor = aten::mul(%9, %z) return (%9) ``` -> ``` graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu), %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu), %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)): %4 : bool = prim::TensorExprDynamicGuard[types=[Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)]](%x_inp, %y_inp, %z_inp) %5 : Tensor = prim::If(%4) block0(): %15 : int[] = aten::size(%x_inp) %16 : int[] = aten::size(%y_inp) %17 : int = prim::Constant[value=1]() %18 : int = prim::Constant[value=0]() %elem.3 : int = aten::__getitem__(%15, %18) # <string>:40:10 %elem.5 : int = aten::__getitem__(%15, %17) # <string>:40:10 %elem.11 : int = aten::__getitem__(%16, %18) # <string>:40:10 %cat_dim_size.48 : int = aten::add(%elem.3, %elem.11) # <string>:321:29 %3 : Tensor = prim::TensorExprGroup_0[symbolic_shape_inputs=[-5, -4, -3, -2]](%x_inp, %y_inp, %z_inp, %cat_dim_size.48, %elem.11, %elem.5, %elem.3) -> (%3) block1(): %14 : Tensor = prim::FallbackGraph_1(%x_inp, %y_inp, %z_inp) -> (%14) return () with 
prim::TensorExprGroup_0 = graph(%x.1 : Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), %y.1 : Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu), %SS_5 : int, %SS_4 : int, %SS_3 : int, %SS_2 : int): %3 : int = prim::Constant[value=0]() %4 : Tensor(SS(-2), SS(-3)) = aten::tanh(%x.1) %5 : Tensor(SS(-2), SS(-3)) = aten::erf(%4) %6 : Tensor(SS(-4), SS(-3)) = aten::relu(%y.1) %7 : Tensor[] = prim::ListConstruct(%5, %6) %8 : Tensor(SS(-5), SS(-3)) = aten::cat(%7, %3) %9 : Tensor(SS(-5), SS(-3)) = aten::hardswish(%8) %10 : Tensor(SS(-5), SS(-3)) = aten::mul(%9, %z) return (%9) ``` Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D31732414 Pulled By: eellison fbshipit-source-id: 290a94a667c20467717202a43c60e4f9ca4c00e2	2021-10-19 16:41:49 -07:00
gmagogsfm	147f7559b1	Add `SourceView` which doesn't own source text as base class of `Source` (#65309 ) Summary: This would save the cost copying text from stack to heap in some cases (like parsing function schema during loading phase of libtorch.so) Pull Request resolved: https://github.com/pytorch/pytorch/pull/65309 Reviewed By: swolchok Differential Revision: D31060315 Pulled By: gmagogsfm fbshipit-source-id: 0caf7a688b40df52bb4388c5191d1a42351d6f1a	2021-10-18 23:17:22 -07:00
Richard Barnes	e0643fa3fc	use irange for loops 5 (#66744 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66744 Modified loops in files under fbsource/fbcode/caffe2/ from the format `for(TYPE var=x0;var<x_max;x++)` to the format `for(const auto var: irange(xmax))` This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand. Test Plan: Sandcastle Reviewed By: ngimel Differential Revision: D31705358 fbshipit-source-id: d6ea350cbaa8f452fc78f238160e5374be637a48	2021-10-18 21:59:50 -07:00
Will Constable	d05c1ec007	Add lazy Node base and associated infra (#66601 ) Summary: - Adds Node base class and unit tests - Also adds metadata utils to enable source code annotation and scope tracking Pull Request resolved: https://github.com/pytorch/pytorch/pull/66601 Test Plan: Add new unit tests Reviewed By: desertfire Differential Revision: D31634044 fbshipit-source-id: a042d54f06fbc480acfc63c18d43cb6fceb6fea5	2021-10-18 19:09:42 -07:00
Ivan Yashchuk	0d203a16fe	Add relative and absolute tolerances for matrix_rank, pinv (#63102 ) Summary: This pull request introduces new keyword arguments for `torch.linalg.matrix_rank` and `torch.linalg.pinv`: `atol` and `rtol`. Currently, only tensor overload has default values for either `atol` or `rtol`, the float overload requires both arguments to be specified. FC compatibility: https://github.com/pytorch/pytorch/pull/63102#discussion_r710930509 Fixes https://github.com/pytorch/pytorch/issues/54151. Fixes https://github.com/pytorch/pytorch/issues/66618. cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/63102 Reviewed By: H-Huang Differential Revision: D31641456 Pulled By: mruberry fbshipit-source-id: 4c765508ab1657730703e42975fc8c0d0a60eb7c	2021-10-17 22:15:42 -07:00
Xue Li	2f099c7555	Revert D30652629: use irange for loops Test Plan: revert-hammer Differential Revision: D30652629 (`687c2267d4`) Original commit changeset: 0ae6c4bbbb55 fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3	2021-10-15 15:23:10 -07:00
Richard Barnes	687c2267d4	use irange for loops (#66234 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234 Modified loops in files under fbsource/fbcode/caffe2/ from the format `for(TYPE var=x0;var<x_max;x++)` to the format `for(const auto var: irange(xmax))` This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand. bypass_size_limit allow-large-files Test Plan: Sandcastle Reviewed By: ngimel Differential Revision: D30652629 fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e	2021-10-15 13:50:33 -07:00
Scott Wolchok	e88d1c4f10	[PyTorch] Add tuple inline storage (#64066 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64066 I noticed a bunch of time being spent heap-allocating Tuples in the unpickler. 1-, 2-, and 3-element Tuples are apparently common enough that they get their own bytecode instructions, so I decided to try also giving them their own representation. We store up to 3 IValues inline in `Tuple` rather than doing a second heap allocation for a `std::vector<IValue>`. ghstack-source-id: 140695395 Test Plan: Added automated tests for TupleElements. Pixel 3 before: https://www.internalfb.com/intern/aibench/details/761596366576284 Pixel 3 after: https://www.internalfb.com/intern/aibench/details/591414145082422 We went from 347 ms to 302 ms. Reviewed By: dhruvbird Differential Revision: D30592622 fbshipit-source-id: 93625c54c9dca5f765ef6d5c191944179cb281a8	2021-10-15 12:16:51 -07:00
Rohan Varma	06fa6c15c0	Back out "Revert D31299350: Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor"" (#66393 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66393 Third try! Fixes: - test_nccl_timeout can be flaky because of 1s timeout, bump up the timeout to resolve the flakiness. But in general we should not have been relying on time.sleep for this test, filed https://github.com/pytorch/pytorch/issues/66354 to track that. - ciflow/all did not actually run tests due to a bug causing multigpu tests to not be run. This has since been fixed. ghstack-source-id: 140560113 Test Plan: CI Reviewed By: mrshenli Differential Revision: D31534735 fbshipit-source-id: 8b7e0f4fed3972b7a77cbcda28876c9eefb0c7e2	2021-10-14 22:23:22 -07:00
soulitzer	93d326c868	Add InplaceOrView boxed kernel (#63878 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63878 See https://github.com/pytorch/pytorch/issues/64407, https://github.com/pytorch/pytorch/issues/62032 for context: In this PR: - Add boxed kernel by replicating `gen_inplace_or_view`'s logic that is ONLY for use with the Autograd not-implemented kernel - Unlike `gen_inplace_or_view` we always pass a view_func to as_view in order to ensure that a "derivative is not implemented" error is raised even if an in-place update is performed on the view. Without the `view_func`, the CopySlice + AsStridedBackward nodes would replace the NotImplemented node. - This limitation makes it impossible to use this node for general use - view relationship must be between first input (must be tensor) and first output (may be tensor or vec of tensor) - do not support non-differentiable views (_values, _indices, view.dtype) - view relationship is always fw and bw differentiable - Adds the macro `#define REGISTER_AUTOGRAD_NOT_IMPLEMENTED_FALLBACK(ns, op)` to be the interface for this feature: - static initialization can be slowed down (not measured) if there are many registrations, because each line translates to 2 library calls but the workaround is just to manually use the two functions `AutogradNotImplementedFallback` and `ADInplaceOrViewFallback` and call `m.impl`. 
- Adds testing: - for views: view relationship created - performing in-place operation on the view, raises properly - trying to create two view relationships is not allowed, - single view relationship but not first input/first output should error - view relation created properly for tensor vector output - for in-place: - version count bump - triggers rebase_history - multiple mutations is okay and also updates version counter - TODO (follow up): Update tutorials for adding third-party operators (and document the above limitations) - TODO (follow up): Look at torch-audio/torch-vision and identify places where this can simplify existing code EDIT: Made it more clear what is introduced in this PR and moved some more contextual stuff into the issue itself Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D30901714 Pulled By: soulitzer fbshipit-source-id: 48de14c28be023ff4bd31b7ea5e7cba88aeee04c	2021-10-12 18:55:50 -07:00
Kimish Patel	c6216b2a43	Back out "Revert D30710710: [Pytorch Edge] Support profiling kineto events from external source" (#66421 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66421 Original commit changeset: ab6bb8fe4e83 Plus this includes BUILD.bazel changes, the reason for the revert. Test Plan: See original diff Reviewed By: gdankel Differential Revision: D31542513 fbshipit-source-id: ee30aca2d6705638f97e04b77a9ae31fe5cc4ebb	2021-10-12 10:55:29 -07:00
Animesh Jain	cc24e4e5d0	[NNC] Normalize loops in SplitWithTail (#66242 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66242 While working on random test generation, I observed that many simple transformations were upsetting vectorization. Digging deeper, I found that it calls SplitWithTail, which incorrectly splits the loop when the loop start is not zero. This patch normalizes the loop before we start splitting it. Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D31506853 Pulled By: anijain2305 fbshipit-source-id: 5c5f2568ce0a239bfaa515458be52541eafd23b1	2021-10-11 13:44:05 -07:00
Nikita Shulga	c373387709	Update CMake and use native CUDA language support (#62445 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445 PyTorch currently uses the old style of compiling CUDA in CMake which is just a bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as a language just like C++ or C. Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D31503350 fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55	2021-10-11 09:05:48 -07:00
Jane Xu	0a48f56318	Revert D31299350: Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor" Test Plan: revert-hammer Differential Revision: D31299350 (`f1f3bd8c36`) Original commit changeset: 9ad5c8fa17f7 fbshipit-source-id: d63d889922f507a4a0e2e042e451b95b9591c317	2021-10-08 17:55:28 -07:00
Jane Xu	c62ed96496	Revert D30710710: [Pytorch Edge] Support profiling kineto events from external source Test Plan: revert-hammer Differential Revision: D30710710 (`c1343ff706`) Original commit changeset: 51399f9b0b64 fbshipit-source-id: ab6bb8fe4e83ed1052e621e427259192a4f0f540	2021-10-08 17:46:18 -07:00
Rohan Varma	f1f3bd8c36	Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor" (#65883 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65883 Original commit changeset: d8e962b8aab6 ghstack-source-id: 139836954 Test Plan: ci Reviewed By: zhaojuanmao Differential Revision: D31299350 fbshipit-source-id: 9ad5c8fa17f7038ba579cb1eda6d9271ac07a130	2021-10-08 16:04:20 -07:00
Kimish Patel	c1343ff706	[Pytorch Edge] Support profiling kineto events from external source (#64397 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64397 This diff exposes a way to add events to kineto profiler from external source. This can be a backend that executes a subgraph and wants to record this execution in kineto profiler. This diff also adds "backend" metadata to identify the backend an event would have executed on. Test Plan: test_lite_interpreter Imported from OSS Reviewed By: raziel Differential Revision: D30710710 fbshipit-source-id: 51399f9b0b647bc2d0076074ad4ea9286d0ef3e2	2021-10-08 15:59:42 -07:00
Raghavan Raman	92ce188510	Revert D31445799: [nnc] Use given kernel function name while emitting code Test Plan: revert-hammer Differential Revision: D31445799 (`c30dc52739`) Original commit changeset: 8d1642098313 fbshipit-source-id: 6b9d8c816437e9fcba8eb19cc683bc0a46a04cf5	2021-10-08 12:39:01 -07:00
Raghavan Raman	2e6fa0261f	Revert D31445797: [nnc] Added a cache to use singleton instances of PytorchLLVMJIT for every triple,cpu,attrs combination Test Plan: revert-hammer Differential Revision: D31445797 (`7e5ef5e517`) Original commit changeset: 4e1450100928 fbshipit-source-id: fc13b34dbb66c7a22816eb46cf6d98ae9f332d39	2021-10-08 12:38:59 -07:00
Scott Wolchok	2d885ab73d	[jit] Reduce refcounting of Types (#65345 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65345 FooType::get() can return a const reference. Inconveniently, converting shared_ptr<FooType> to shared_ptr<Type> requires a copy & refcount bump, so to properly take advantage of this in unshapedType() we need to take a const Type& in isSubtypeOf(), which is good practice anyway -- don't require a shared_ptr if you don't need to take ownership. ghstack-source-id: 140044165 Test Plan: CI perf says c10::unshapedType time decreased from 2.8% to 2.2% during static runtime startup, though I expect this to be generally beneficial. Reviewed By: hlu1 Differential Revision: D31027361 fbshipit-source-id: 676feb81db9f74ad7b8651d8774f4ecb4cfa6ab8	2021-10-08 09:03:04 -07:00
Raghavan Raman	7e5ef5e517	[nnc] Added a cache to use singleton instances of PytorchLLVMJIT for every triple,cpu,attrs combination (#66217 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66217 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D31445797 Pulled By: navahgar fbshipit-source-id: 4e1450100928132ccce4ef3c6c20ad6661cfabed	2021-10-07 13:17:11 -07:00
Raghavan Raman	c30dc52739	[nnc] Use given kernel function name while emitting code (#66216 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66216 Test Plan: Imported from OSS Reviewed By: dagitses, priyaramani Differential Revision: D31445799 Pulled By: navahgar fbshipit-source-id: 8d164209831339d364710b14f6a263a16e108281	2021-10-07 13:15:46 -07:00
Will Constable	a8c0b362ce	[pytorch][PR] Add hash and int128 utils for Lazy Tensor Core (#66181 ) Summary: These utils are prerequisites for Lazy Node base class. - set up new torch/csrc/lazy, test/cpp/lazy dirs - add source files to build_variables.bzl in new lazy_core_sources var - create new test_lazy binary Fixes https://github.com/pytorch/pytorch/issues/65636 Pull Request resolved: https://github.com/pytorch/pytorch/pull/66181 Original commit changeset: 3d0d5377d71e Test Plan: Run PyTorch XLA corresponding PR in XLA CI: https://github.com/pytorch/xla/pull/3148/files Reviewed By: suo Differential Revision: D31416438 fbshipit-source-id: 58a6a49c5bc30134bc6bae2e42778f359b9a8f40	2021-10-07 10:05:26 -07:00
Chen Lai	a5895f85be	[PyTorch Edge][type] Add type check in compatibility api (#63129 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63129 1. Add an api to get `supported_types` from runtime, expose in c++ only. 2. Add an api to get `contained_types` from model, expose in both c++ and Python. 3. Add a field `contained_types_` in `type_parser.cpp` to track the contained types when parsing python string. 4. Expand `is_compatible` api to check type. When checking type, it will check the contained type list from the model against the supported type list from runtime. 5. Expand the unittest for compatibility to cover type 6. Add unit test in python to check type list ghstack-source-id: 139826944 Test Plan: ``` buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.GetContainTypes' buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleSuccess' buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleFail' buck test //caffe2/test:mobile ``` Reviewed By: iseeyuan Differential Revision: D30231419 fbshipit-source-id: 8427f423ec28cc5de56411f15fd960d8595d6947	2021-10-06 02:23:44 -07:00
Michael Suo	f062def486	Revert D31260343: [pytorch][PR] Add hash and int128 utils for Lazy Tensor Core Test Plan: revert-hammer Differential Revision: D31260343 (`e94fea08d0`) Original commit changeset: 8bb1194188e3 fbshipit-source-id: 3d0d5377d71ed928015bcb2105801be368e38cd8	2021-10-05 17:15:50 -07:00
Will Constable	e94fea08d0	Add hash and int128 utils for Lazy Tensor Core (#65635 ) Summary: These utils are prerequisites for Lazy Node base class. - set up new torch/csrc/lazy, test/cpp/lazy dirs - add source files to build_variables.bzl in new lazy_core_sources var - create new test_lazy binary Fixes https://github.com/pytorch/pytorch/issues/65636 Pull Request resolved: https://github.com/pytorch/pytorch/pull/65635 Reviewed By: alanwaketan Differential Revision: D31260343 Pulled By: wconstab fbshipit-source-id: 8bb1194188e3e77fc42e08a14ba37faed37a9c2e	2021-10-05 16:43:55 -07:00
Nikita Shulga	4c4525fa5c	Compile without -Wno-unused-variable (take 2) (#66041 ) Summary: Delete `-Wno-unused-variable` from top level `CMakeLists.txt` Still suppress those warnings for tests and `torch_python` Delete number of unused variables from caffe2 code Use `(void)var;` to suppress unused variable in range loops Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants Do not delete `caffe2::OperatorBase::Output` calls as they have side effects Pull Request resolved: https://github.com/pytorch/pytorch/pull/66041 Reviewed By: ngimel Differential Revision: D31360142 Pulled By: malfet fbshipit-source-id: 6fdfb9f91efdc49ca984a2f2a17ee377d28210c8	2021-10-04 20:39:39 -07:00
Don Jang	7941590a51	[JIT] Selectively enable precise alias analysis for TupleConstruct (#66025 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66025 This change adds an option to selectively enable precise alias analysis for `prim::TupleConstruct` (introduced by D30437737 (`cd458fe092`)) to minimize its exposure only to `StaticRuntime` as of now. Test Plan: Modified existing unit tests whose behavior depends on D30437737 (`cd458fe092`). Reviewed By: eellison Differential Revision: D31350285 fbshipit-source-id: 3ce777f07f99650d74634481ad0805192dce55c6	2021-10-01 20:42:22 -07:00
Nikita Shulga	e4ee5ca698	Revert D31326599: [pytorch][PR] Compile without -Wno-unused-variable Test Plan: revert-hammer Differential Revision: D31326599 (`a6280ab653`) Original commit changeset: 924155f1257a fbshipit-source-id: b8ee5bc0298637443232f5ee9ec79e51ed256faf	2021-10-01 20:40:47 -07:00
Nikita Shulga	a6280ab653	Compile without -Wno-unused-variable (#65954 ) Summary: Delete `-Wno-unused-variable` from top level `CMakeLists.txt` Still suppress those warnings for tests and `torch_python` Delete number of unused variables from caffe2 code Use `(void)var;` to suppress unused variable in range loops Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants Pull Request resolved: https://github.com/pytorch/pytorch/pull/65954 Reviewed By: ngimel Differential Revision: D31326599 Pulled By: malfet fbshipit-source-id: 924155f1257a2ba1896c50512f615e45ca1f61f3	2021-10-01 17:40:47 -07:00
Hariom Narang	2828ce53fd	Added jit log stream changing function and some refactor (#65768 ) Summary: Description: - Have only added `stdout` and `stderr` as possible options from python API for now. We can do file path passing later maybe. - Put the class `JitLoggingConfig` in the cpp file as none of its methods were being used outside of this file. Python API: `torch._C._jit_set_logging_stream('stdout|stderr')` C++ API: `::torch::jit::set_jit_logging_output_stream(ostream);` Testing: - Tested python API locally. - Unit test for the C++ API is written Fixes https://github.com/pytorch/pytorch/issues/54182 Pull Request resolved: https://github.com/pytorch/pytorch/pull/65768 Reviewed By: mrshenli Differential Revision: D31291739 Pulled By: ZolotukhinM fbshipit-source-id: eee72edc20488efad78a01c5b0ed8a132886a08d	2021-09-30 23:25:11 -07:00
Michael Suo	33c03cb61a	[deploy][1/n] Make deploy code conform to PyTorch style. (#65861 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65861 First in a series. This PR changes the code in deploy.h/cpp and interpreter_impl.h/cpp to be camel case instead of snake case. Starting with this as it has the most impact on downstream users. Test Plan: Imported from OSS Reviewed By: shannonzhu Differential Revision: D31291183 Pulled By: suo fbshipit-source-id: ba6f74042947c9a08fb9cb3ad7276d8dbb5b2934	2021-09-30 22:59:47 -07:00
Mikhail Zolotukhin	3a0165da49	[TensorExpr] Port NNC lowerings to the new registry mechanism. (#65551 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65551 Previously we had a big switch on Op kind to decide how to lower a given JIT operator to NNC. This PR changes this switch to a hash table lookup. Why? This helps us with at least two things: 1) With this approach we can easily check if we know how to handle a given node in advance - i.e. we can inspect the entire graph and tell whether it's possible to compile it or not without actually trying to do that and dying in the middle. This would allow us to, say, provide user-friendly error messages in AOT workflow. 2) We can switch to use schema instead of op kind to determine correct lowering. Unlike op schema, op kind might be ambiguous (see e.g. #64963) and using it instead of schema can lead to bugs. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D31148926 Pulled By: ZolotukhinM fbshipit-source-id: ac12684e2126c899426ef5e4cc1e3f70fa01f704	2021-09-30 22:56:18 -07:00
Don Jang	cd458fe092	[JIT] Make output of prim::TupleConstruct alias only with its inputs (#64879 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64879 This change makes the output of `prim::TupleConstruct` alias only with its inputs when the created tuple is directly returned from the graph. The same treatment could be made to any tuples newly constructed by `prim::TupleConstruct` if they do not let their elements escape. However, this change focuses on only the simplest but frequently used use case: tuples constructed only to be returned from a graph. This use case turns out to be very common. Test Plan: Added - `AliasMoveForTupleConstructWithSingleUseAsGraphOutput` - `WildcardAliasForTupleConstructWithUses` to cover the newly added code. Reviewed By: eellison Differential Revision: D30437737 fbshipit-source-id: 417fbc6bc348062e60e7acdddd340d4754d090eb	2021-09-29 21:56:31 -07:00
Mike Ruberry	91f8755b0e	Revert D31005792: [NCCL] Init dummy NCCL comms in constructor Test Plan: revert-hammer Differential Revision: D31005792 (`2b22a5dde2`) Original commit changeset: c2c582dee25a fbshipit-source-id: d8e962b8aab6fda8a6c013e8577492dff9568c27	2021-09-29 20:46:38 -07:00
Rohan Varma	2b22a5dde2	[NCCL] Init dummy NCCL comms in constructor (#65173 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65173 Initializes dummy NCCL communicators in constructor for a basic health check that communicators can be initialized prior to launching the first collective. After successful init, we immediately use `ncclCommAbort` to destroy these communicators to ensure they don't interfere with regular communicator creation during collectives. Test Plan: CI Reviewed By: pritamdamania87 Differential Revision: D31005792 fbshipit-source-id: c2c582dee25a098361ead6ef03f541e7833c606b	2021-09-29 15:36:54 -07:00
Scott Wolchok	ece25c453f	[PyTorch] Store Argument::alias_info_ on the heap (#64824 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64824 See comment in function_schema.h for explanation. I claim that this is a good tradeoff because the aliasing information seems to be used only in compiler-ish code paths, where performance isn't as critical as actual execution. If performance is important there too, perhaps we should hoist isWrite into the Argument itself since there are several paths that only care about isWrite. ghstack-source-id: 138958896 Test Plan: CI, profile schema parsing on startup and see much fewer page faults in createArgumentVector. Reviewed By: suo Differential Revision: D30860719 fbshipit-source-id: 1d4d2328f2b8e34f5ddf9d82083fd4dd7b7f738f	2021-09-24 17:00:51 -07:00
Peter Bell	68e5935498	Remove fgrad_input from slow_conv2d (#64280 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64280 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D30830887 Pulled By: jbschlosser fbshipit-source-id: 5a3a79ad9d9118177672eabf872f9d9a3313ebe4	2021-09-24 14:27:39 -07:00
XiaobingSuper	1682722152	keep output type after calling SubgraphRewriter (#65453 ) Summary: The jit SubgraphRewriter doesn't keep the output type after overwriting the old graph. For example, in profiling mode, the old graph has the old operator's shapes, but after replacing the old operator with a newer operator by applying SubgraphRewriter, the tensor shape info is eliminated. The motivation is that I want to replace pytorch convolution with a customer's convolution: I first register aten::_convolution as a profiler node that can record the input and output's shapes, and then use graph rewrite to replace it with aten::conv2d, whose tensors' shape info is eliminated. I hope to use the input size to do some pre-processing before replacing aten::conv2d with the customer's convolution. Before rewrite: ``` graph(%self.1 : __torch__.MyModule, %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)): %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0 %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0 %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0 %4 : NoneType = prim::Constant() %3 : int[] = prim::Constant[value=[1, 1]]() %2 : int[] = prim::Constant[value=[0, 0]]() %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1) %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0 %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv) %x : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::_convolution(%x.1, 
%weight, %4, %3, %2, %3, %6, %2, %7, %6, %6, %5, %5), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3. 6/site-packages/torch/nn/modules/conv.py:443:0 %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%x, %z, %7) # jit_test.py: 24:0 return (%16) ``` after rewrite by using aten::conv2d ``` graph(%self.1 : __torch__.MyModule, %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)): %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0 %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0 %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0 %4 : NoneType = prim::Constant() %3 : int[] = prim::Constant[value=[1, 1]]() %2 : int[] = prim::Constant[value=[0, 0]]() %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1) %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0 %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv) %18 : Tensor = aten::conv2d(%x.1, %weight, %4, %3, %2, %3, %7) %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%18, %z, %7) # jit_test.py:24:0 return (%16) ``` expected result after replace aten::_convolution with aten::conv2d: ``` graph(%self.1 : __torch__.MyModule, %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)): %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/ 
site-packages/torch/nn/modules/conv.py:443:0 %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6 /site-packages/torch/nn/modules/conv.py:443:0 %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6 /site-packages/torch/nn/modules/conv.py:443:0 %4 : NoneType = prim::Constant() %3 : int[] = prim::Constant[value=[1, 1]]() %2 : int[] = prim::Constant[value=[0, 0]]() %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1) %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:2 2:0 %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv) %18 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::conv2d(%x.1, %weight, %4, %3, %2, %3, %7) %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%18, %z, %7) # jit_test.py :24:0 return (%16) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/65453 Reviewed By: zdevito Differential Revision: D31162489 Pulled By: ZolotukhinM fbshipit-source-id: 0d1c1d607cb612df47c64f173d9f4c9e8b1d6c49	2021-09-24 11:07:40 -07:00
kshitij12345	a012216b96	[nn] Fold : no batch dim (#64909 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/64907 Reference: https://github.com/pytorch/pytorch/issues/60585 Pull Request resolved: https://github.com/pytorch/pytorch/pull/64909 Reviewed By: cpuhrsch, heitorschueroff Differential Revision: D30991087 Pulled By: jbschlosser fbshipit-source-id: 91a37e0b1d51472935ff2308719dfaca931513f3	2021-09-23 08:37:32 -07:00
jiej	127c9402d0	Revert "Revert D30752939: [pytorch][PR] nvfuser update" (#65137 ) Summary: This reverts commit `03389dc851`. Attempt again for PR: https://github.com/pytorch/pytorch/issues/63745 Fixes the windows build failure. Pull Request resolved: https://github.com/pytorch/pytorch/pull/65137 Reviewed By: seemethere, dzhulgakov, heitorschueroff Differential Revision: D30994556 Pulled By: malfet fbshipit-source-id: f1925b6c5cc1a1a441a96499667c91e8dfc1b53d	2021-09-22 04:54:51 -07:00
Chen Lai	880098a7e3	[PyTorch Edge] Backport function for defaults args with out args, flag on (#63651 ) Summary: 1. Enable support for operators with default args and out args. For `torch.add(x, h, out=x)`, the number of specified arguments will be 3 instead of 4. 2. Bump bytecode version from 6 to 7 3. Implement backport_v7_to_v6 function. Also slightly refactor the local_thread to allow re-emit operators. 4. unittest to cover backport function 5. Update expect result from 4 to 3 in unit test DefaultArgsWithOutArg to cover the number of specified arguments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/63651 ghstack-source-id: 138539912 Test Plan: ``` caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions ``` Reviewed By: raziel, tugsbayasgalan Differential Revision: D30454080 fbshipit-source-id: 357c50b96682430675142d20d688d1f64e1de307	2021-09-20 22:50:30 -07:00
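The "number of specified arguments" from item 1 of #63651 can be illustrated with a small sketch. This is a toy model in plain Python, not the lite interpreter's implementation: the `Arg` class, the `num_specified_args` helper, and the counting rule (drop trailing non-out arguments left at their default, then count every out argument) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, List

_NO_DEFAULT = object()  # sentinel: the argument has no default value


@dataclass
class Arg:
    """Hypothetical stand-in for one schema argument."""
    name: str
    default: Any = _NO_DEFAULT
    is_out: bool = False


def num_specified_args(schema: List[Arg], values: List[Any]) -> int:
    """Count the arguments that must be serialized (assumed rule, see lead-in)."""
    regular = [(a, v) for a, v in zip(schema, values) if not a.is_out]
    n = len(regular)
    # Trailing regular arguments that equal their default can be omitted.
    while n > 0:
        arg, value = regular[n - 1]
        if arg.default is _NO_DEFAULT or value != arg.default:
            break
        n -= 1
    # Out arguments are always passed explicitly.
    return n + sum(1 for a in schema if a.is_out)


# Modeled on aten::add.out(Tensor self, Tensor other, *, Scalar alpha=1, Tensor(a!) out)
ADD_OUT = [Arg("self"), Arg("other"), Arg("alpha", default=1), Arg("out", is_out=True)]

# torch.add(x, h, out=x): alpha keeps its default, so self, other and out are counted.
print(num_specified_args(ADD_OUT, ["x", "h", 1, "x"]))
```

Under this rule a call that overrides `alpha` would count all four arguments.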
Mengwei Liu	eaf85fad62	[PyTorch] Extract parseOperator() into a standalone source file (#65179 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65179 This is following up this PR: https://github.com/pytorch/pytorch/pull/61862. The purpose is to modularize operator parsing so that it can be used as needed without pulling the whole `import.cpp` into build. Test Plan: Added a unit test in `test_lite_predictor.cpp` called `ParseOperators`, similar to `ParseBytecode`. Reviewed By: iseeyuan Differential Revision: D31006555 fbshipit-source-id: c38e221800af4cf72963a353c452c5437f56a0ac	2021-09-17 13:31:59 -07:00
Jane Xu	1ee66a5278	Remove CUDA 9.2 references conditionals and workarounds (#65070 ) Summary: Title says it all Pull Request resolved: https://github.com/pytorch/pytorch/pull/65070 Reviewed By: malfet Differential Revision: D30966464 Pulled By: janeyx99 fbshipit-source-id: e454906fd5d7d321d390939ba5d237e1d9b150f8	2021-09-17 12:28:23 -07:00
Raghavan Raman	bbe25af0df	[nnc] Updated inlining to handle cases when producer indices are constants after eval (#65044 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65044 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D30954655 Pulled By: navahgar fbshipit-source-id: dfaedb5af710b2625ceec3a443a6c4e34158ab16	2021-09-17 11:28:48 -07:00
Raghavan Raman	03fc636d5c	[nnc] Updated inliner to remove assertions and exception (#64719 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64719 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D30828583 Pulled By: navahgar fbshipit-source-id: 9826a59085a210e44d101a843ff2cae440dfd633	2021-09-17 11:28:46 -07:00
Edward Yang	9601deb1b3	Disable autograd fallback tests on Windows (#65147 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65147 I think they trigger an MSVC bug per https://github.com/pytorch/pytorch/issues/48763 ghstack-source-id: 138247203 Test Plan: breakpointed https://www.internalfb.com/intern/sandcastle/job/9007199738584981/ and ssh'ed into the host and ran `buck build arvr/mode/win/opt //xplat/caffe2:autograd_libtorch_test_ovrsource` in `/cygdrive/d/ovrsource-null-hg` Reviewed By: soulitzer Differential Revision: D30992685 fbshipit-source-id: 06c6fb2c18d55490f89fc91ee5b7a4c5a7faf1c6	2021-09-17 08:32:43 -07:00
Eli Uriegas	03389dc851	Revert D30752939: [pytorch][PR] nvfuser update Test Plan: revert-hammer Differential Revision: D30752939 (`cfaecaf40b`) Original commit changeset: ce122e80f01b fbshipit-source-id: 57685df8f9946032a06eff1de8a3d1498500d2d2	2021-09-15 17:38:47 -07:00
Mikhail Zolotukhin	7e9c599784	[TensorExpr] Add a method for sanitizing Var and Buf names in Stmt. (#65010 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65010 This pass ensures all names are legal and not-duplicated. Fixes #52727. Test Plan: Imported from OSS Reviewed By: bertmaher, navahgar Differential Revision: D30939717 Pulled By: ZolotukhinM fbshipit-source-id: 7dbe7f937de41f22ad49137a5e067d698443ed63	2021-09-15 17:15:06 -07:00
jiej	cfaecaf40b	nvfuser update (#63745 ) Summary: Syncing nvfuser code base from devel branch. Listing a few of our developments since the last sync: - Extends support to normalization and reduction kernels. - Multiple kernel launch for single `CudaFusionGroup`. Hierarchical caching system has been updated to cache graph segmentation. - profile_ivalue is enabled to convert dynamic scalar into compile time constants, which are required by the codegen. (e.g. reduction axes). To keep this PR simple and relatively review-free, we stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle. internal updates are files located in: 1. updates in nvfuser codegen `torch/csrc/jit/codegen/cuda` 2. added nvfuser specific benchmarks `benchmarks/cpp/nvfuser` 3. nvfuser jit cpp tests `test/cpp/jit/test_gpu.cpp` `test/cpp/jit/test_gpu_shift.cpp` `test/cpp/jit/test_gpu_validator.h` updates affecting integration: 1. profile_ivalue enabled for nvfuser. related changes are in `torch/csrc/jit/runtime/`, 2. exposed a few more symbols `aten/src/ATen/core/` used by codegen Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745 Reviewed By: saketh-are Differential Revision: D30752939 Pulled By: malfet fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c	2021-09-15 14:42:55 -07:00
Mikhail Zolotukhin	f23f21dafe	[TensorExpr] Remove 'Placeholder' class. (#64887 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64887 BufHandle has exactly the same functionality and should be used instead. Differential Revision: D30889483 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: 365fe8e396731b88920535a3de96bd3301aaa3f3	2021-09-14 00:22:44 -07:00
Martin Yuan	30a7c768d7	[RFC] Modularize functions of parsing bytecode (#61862 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61862 Modularize functions of parsing bytecode tables so that they can be used as needed in situations other than mobile lite interpreter. * The decoupled functions are re-used by current lite interpreter loader. * The bytecode can be serialized/deserialized from other formats. * The decoupled functions have minimum dependencies on other PyTorch components. Next: Build a driver binary to include the parser and interpreter, but only has necessary dependency on other PyTorch components. ghstack-source-id: 137867287 Test Plan: As an example, a simple bytecode is parsed to a mobile function, and directly run in the added unit test, `RunTimeTest:ParseBytecode`. It contains basic control flow (if, else) and basic data orchestration (list construction). CI Reviewed By: larryliu0820 Differential Revision: D29798382 Pulled By: iseeyuan fbshipit-source-id: 1c173a5f5d37097e3a97baec3f3e48e1eea1400f	2021-09-11 22:24:05 -07:00
Mikhail Zolotukhin	180e4fbfae	[TensorExpr] LLVMCodegen: fix lowering for UInt->Float casts. (#64862 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64862 Previously we erroneously were looking at dst signedness. This was discovered when we tried to implement quantize/dequantize ops. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30881696 Pulled By: ZolotukhinM fbshipit-source-id: 34af842e5e52a3b6b5d2e70c4ef32f910a20341f	2021-09-11 09:24:36 -07:00
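Why the source type's signedness is the one that matters for an integer-to-float cast (#64862) can be seen in a tiny sketch. This is plain Python for illustration, not the LLVMCodegen lowering itself; `cast_to_float` and its parameters are hypothetical names. In LLVM terms the two branches correspond to choosing between the `uitofp` and `sitofp` instructions.

```python
def cast_to_float(raw: int, bits: int, src_is_signed: bool) -> float:
    """Convert a `bits`-wide integer bit pattern to float.

    The same bits mean different numbers depending on whether the *source*
    type is signed, so the source type must drive the conversion.
    """
    raw &= (1 << bits) - 1  # keep only the low `bits` bits
    if src_is_signed and raw >= 1 << (bits - 1):
        raw -= 1 << bits  # reinterpret as two's-complement negative
    return float(raw)


# The byte 0xC8 is 200 as uint8 but -56 as int8.
print(cast_to_float(0xC8, 8, src_is_signed=False))
print(cast_to_float(0xC8, 8, src_is_signed=True))
```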
Hui Guo	4481c87ac4	[tensorexpr] Simplify x/100 -> 0 if x is a non-negative integer less than 100. (#64763 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64763 Simplification pattern: x/N -> 0; N is a constant positive integer and x is a for-loop index whose range is a subset of [0, N). Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D30845854 Pulled By: huiguoo fbshipit-source-id: 814d69ed4be05e57405c222183cc1c6c526721cd	2021-09-10 20:33:02 -07:00
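The simplification pattern from #64763 can be sketched in a few lines. This is a toy model in plain Python, not NNC's IRSimplifier; `simplify_div` and its half-open `(lo, hi)` range argument are hypothetical names chosen for illustration.

```python
from typing import Optional, Tuple


def simplify_div(index_range: Tuple[int, int], divisor: int) -> Optional[int]:
    """Return 0 if `x // divisor` is provably 0 for every x in the range.

    `index_range` is a half-open interval [lo, hi) of loop-index values.
    Returns None when the rule does not apply.
    """
    lo, hi = index_range
    if divisor <= 0:
        return None  # the rule requires a positive constant N
    if lo >= 0 and hi <= divisor:
        return 0  # [lo, hi) is a subset of [0, N), so x // N == 0
    return None


# A loop `for x in range(0, 100)` lets `x // 100` fold to 0.
print(simplify_div((0, 100), 100))
# With the range [0, 101) the index can reach 100, so nothing is folded.
print(simplify_div((0, 101), 100))
```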
Raghavan Raman	cad7a4b0ea	[nnc] Added an implementation of sign op (#64033 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64033 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D30579197 Pulled By: navahgar fbshipit-source-id: f9f7fa7f2ffa109cf4e441eb1af821b8b891d4d3	2021-09-10 16:49:04 -07:00
Mikhail Zolotukhin	a17d6c7f80	[TensorExpr] Simplify TE IR before applying any transformations. (#64717 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64717 This also exposed several bugs, which are fixed in this PR. Differential Revision: D30826408 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: a67ec5739aceed9ffdf0d24f77eb3787cefe4560	2021-09-09 18:50:51 -07:00
Raghavan Raman	b7c86365d1	[nnc] Handled cast in index expression during inlining (#64716 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64716 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D30826388 Pulled By: navahgar fbshipit-source-id: 7e446602f650527e0d954e437f0370602019e040	2021-09-09 08:30:52 -07:00
Raghavan Raman	652a8bf7d0	[nnc] Updated indices during broadcast to use int64_t (#64627 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64627 This fixes the root cause of S242719 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D30801686 Pulled By: navahgar fbshipit-source-id: b6d3ebdc7eb57116eaced53c2f35c7798bb17e80	2021-09-09 08:29:37 -07:00
Elias Ellison	3bf93d769c	[JIT] Add gradient check in constants (#64613 ) Summary: fixes internal issue Pull Request resolved: https://github.com/pytorch/pytorch/pull/64613 Reviewed By: Gamrix Differential Revision: D30799016 Pulled By: eellison fbshipit-source-id: 48ef52d1cac627919e6cd232216d24878a2a8b58	2021-09-09 08:13:57 -07:00
Hui Guo	5c27a580ec	[tensorexpr] Allocate intermediate buffers at compile time (#64227 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64227 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D30652220 Pulled By: huiguoo fbshipit-source-id: cd75005cdfa42751318de7174b44e14a3a01634e	2021-09-08 15:34:44 -07:00
Peter Bell	d701357d92	Factor out TensorBase that doesn't depend on native operators (#63612 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63612 This makes Tensor inherit from a new class TensorBase, that provides a subset of Tensor that doesn't directly depend on native_functions.yaml. Code that only includes TensorBase.h will thus not need to be rebuilt every time someone changes an operator signature. Making `Tensor` inherit from this class means that `const TensorBase&` parameters will be callable with an ordinary `Tensor`. I've also made `Tensor` constructible and assignable from `TensorBase` to minimize friction in code mixing the two types. To help enforce that `Tensor.h` and `Functions.h` aren't accidentally included, I've added an error into `Operators.h` if `TORCH_ASSERT_NO_OPERATORS` is defined. We can either set this in the build system for certain folders, or just define it at the top of any file. I've also included an example of manually special-casing the commonly used `contiguous` operator. The inline function's slow path defers to `TensorBase::__dispatch_contiguous` which is defined in `Tensor.cpp`. I've made it so `OptionalTensorRef` is constructible from `TensorBase`, so I can materialize a `Tensor` for use in dispatch without actually increasing its refcount. Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D30728580 Pulled By: ezyang fbshipit-source-id: 2cbc8eee08043382ee6904ea8e743b1286921c03	2021-09-08 13:28:54 -07:00
Mikhail Zolotukhin	72274e2a2f	[TensorExpr] Don't rely on exceptions in Vectorizer. (#64609 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64609 We've been using exceptions to indicate whether vectorization succeeded or not, but that posed some problems (e.g. we spent too much time symbolizing these exceptions). This change converts this mechanism to a standard error return code. Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D30795342 Pulled By: ZolotukhinM fbshipit-source-id: 16e38b37bcdd78ceb438ac814cc377f35b058e17	2021-09-08 00:25:34 -07:00
Maksim Levental	81fe2c5e49	add out variant of linear (#61801 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61801 resubmitting because the last one was unrecoverable due to making changes incorrectly in the stack Test Plan: Imported from OSS Reviewed By: desertfire Differential Revision: D29812510 Pulled By: makslevental fbshipit-source-id: ba9685dc81b6699724104d5ff3211db5852370a6	2021-09-07 19:58:52 -07:00
Ansley Ussery	6831d8e379	Support Union in TorchScript (#64234 ) Summary: This PR is created to replace https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. Reason for needing a replacement is due to a messy Sandcastle issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234 Reviewed By: gmagogsfm Differential Revision: D30656444 Pulled By: ansley fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a	2021-09-03 06:12:24 -07:00
Chen Lai	8d5b95019d	[PyTorch Edge] Support default args with out arg, flag off (#63540 ) Summary: 1. Allow consuming operators with default arguments and out arguments. The flag is off to keep the same behavior as v6; PR 63651 turns the flag on. 2. Add two unittests to cover this type of operator. Pull Request resolved: https://github.com/pytorch/pytorch/pull/63540 ghstack-source-id: 137211562 Test Plan: ``` caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg ``` Reviewed By: raziel, iseeyuan, tugsbayasgalan Differential Revision: D30414156 fbshipit-source-id: 0f3a219a22aee10ac53184cbd95940726c459d1f	2021-09-02 01:36:16 -07:00
Kimish Patel	468001600c	Back out "Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling." (#64307 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64307 Original commit changeset: 0b2aa7c57d08 Restores original changes. This diff changes the way operator profiling is done in lite predictor benchmarking binary. Instead of using custom callbacks it uses KinetoEdgeCPUProfiler to profile events and then generate operator level metric from it. Since KinetoEvents do not contain cpu clock time, now we report only wallclock time. This unifies various profiling efforts that we have for benchmarking purposes. In production we will still use observer based mechanism, but the advantage of using kineto profiler is that we get a few other things for free, such as: - chrome trace generation. - operator level memory profiling (to be added) - flop counts (to be added) Furthermore, we can possibly use a Python post-processing script to parse the chrome trace and generate output similar to torch.profiler. (To be done) This diff also removes some tests from test_lite_interpreter.cpp which were testing module hierarchy in debug info. They should be covered by test_mobile_profiler.cpp. Test Plan: aibench run Model without debug info: https://www.internalfb.com/intern/aibench/details/219598441154763 Model with debug info and --print_module_info true (see Operator summary has now module hierarchy information). https://www.internalfb.com/intern/aibench/details/617154236292985 Reviewed By: raziel Differential Revision: D30680354 fbshipit-source-id: b6ba0d59c510c13d13d9935b1d8051cc82ffa4e9	2021-09-01 13:29:35 -07:00
Mikhail Zolotukhin	8337a3fb3f	[TensorExpr] Wrap error messages with buildErrorMessage call. (#64330 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64330 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D30687226 Pulled By: ZolotukhinM fbshipit-source-id: ade1be2ad6847c6afbba60307ef854696821b4e3	2021-08-31 20:31:16 -07:00
Kimish Patel	67cb131458	Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling. Test Plan: revert-hammer Differential Revision: D30327514 (`bc9277dca3`) Original commit changeset: 3bb2f2daaaed fbshipit-source-id: 0b2aa7c57d08de77c9aaa75e546a7d0938610f64	2021-08-31 08:30:36 -07:00
Kimish Patel	bc9277dca3	[Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling. (#63367 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63367 This diff changes the way operator profiling is done in lite predictor benchmarking binary. Instead of using custom callbacks it uses KinetoEdgeCPUProfiler to profile events and then generate operator level metric from it. Since KinetoEvents do not contain cpu clock time, now we report only wallclock time. This unifies various profiling efforts that we have for benchmarking purposes. In production we will still use observer based mechanism, but the advantage of using kineto profiler is that we get a few other things for free, such as: - chrome trace generation. - operator level memory profiling (to be added) - flop counts (to be added) Furthermore, we can possibly use a Python post-processing script to parse the chrome trace and generate output similar to torch.profiler. (To be done) Test Plan: aibench run Model without debug info: https://www.internalfb.com/intern/aibench/details/219598441154763 Model with debug info and `--print_module_info true` (see Operator summary has now module hierarchy information). https://www.internalfb.com/intern/aibench/details/617154236292985 Reviewed By: raziel Differential Revision: D30327514 fbshipit-source-id: 3bb2f2daaaedfb04bd6f5d9c91292783f9c4344f	2021-08-30 20:54:51 -07:00
Will Constable	85df73658c	Make name() part of IMethod interface (#63995 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63995 JIT methods already have name() in their interface, and Py methods have names in their implementation. I'm adding this for a particular case where someone tried to use name() on a JIT method that we're replacing with an IMethod. Test Plan: add case to imethod API test Reviewed By: suo Differential Revision: D30559401 fbshipit-source-id: 76236721f5cd9a9d9d488ddba12bfdd01d679a2c	2021-08-30 13:31:55 -07:00
Zhengxu Chen	ac99d63f83	[jit] Make operation call accept Stack& instead Stack* (#63414 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414 Misuse of raw pointer in here where stack is never nullable. ghstack-source-id: 136938318 Test Plan: compiles. Imported from OSS Reviewed By: ejguan Differential Revision: D30375410 fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee	2021-08-30 11:49:20 -07:00
Thomas J. Fan	d3bcba5f85	ENH Adds label_smoothing to cross entropy loss (#63122 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/7455 Partially resolves pytorch/vision#4281 Pull Request resolved: https://github.com/pytorch/pytorch/pull/63122 Reviewed By: iramazanli Differential Revision: D30586076 Pulled By: jbschlosser fbshipit-source-id: 06afc3aa1f8b9edb07fe9ed68c58968ad1926924	2021-08-29 23:33:04 -07:00
Bert Maher	2e6221a232	[nnc] Make 64-bit dimensions work (#64077 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64077 We were assuming kernel dimensions fit in 32 bits (the old fuser made this assumption too), but we should be able to support 64. ghstack-source-id: 136933272 Test Plan: unit tests; new IR level test with huge sizes Reviewed By: ZolotukhinM Differential Revision: D30596689 fbshipit-source-id: 23b7e393a2ebaecb0c391a6b1f0c4b05a98bcc94	2021-08-28 19:59:47 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	d0c63e857d	Enhancement for smart serialization for out schemas (#63096 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63096 Test Plan: Imported from OSS Reviewed By: gmagogsfm Differential Revision: D30415255 Pulled By: tugsbayasgalan fbshipit-source-id: eb40440a3b46258394d035479f5fc4a4baa12bcc	2021-08-28 11:46:27 -07:00
Mikhail Zolotukhin	2d75ab0c8f	[TensorExpr] Update tutorial. (#64109 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64109 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D30614050 Pulled By: ZolotukhinM fbshipit-source-id: e8f9bd9ef2483e6eafbc0bd5394d311cd694c7b2	2021-08-27 16:19:29 -07:00
soulitzer	90a6498a12	Add autograd not implemented boxed fallback (#63458 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63458 See description and discussion from https://github.com/pytorch/pytorch/pull/62450 Test Plan: Imported from OSS Reviewed By: heitorschueroff Differential Revision: D30518572 Pulled By: soulitzer fbshipit-source-id: 3b1504d49abb84560ae17077f0dec335749c9882	2021-08-27 15:00:28 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	19c1b45f25	Detect out argument in the schema (#62755 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62755 After this change, out argument can be checked by calling is_out() Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D30415256 Pulled By: tugsbayasgalan fbshipit-source-id: b2e1fa46bab7c813aaede1f44149081ef2df566d	2021-08-27 11:20:33 -07:00
Jiewen Tan	ed573a8e08	Enable test_api IMethodTest in OSS (#63345 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63345 This diff did the following few things to enable the tests: 1. Exposed IMethod as TORCH_API. 2. Linked torch_deploy to test_api if USE_DEPLOY == 1. 3. Generated torch::deploy examples when building torch_deploy library. Test Plan: ./build/bin/test_api --gtest_filter=IMethodTest.* Reviewed By: ngimel Differential Revision: D30346257 Pulled By: alanwaketan fbshipit-source-id: 932ae7d45790dfb6e00c51893933a054a0fad86d	2021-08-26 16:50:52 -07:00
Cheng Chang	0f6b524665	[NNC] Add C++ codegen backend to NNC (#62869 ) Summary: Adds a C++ codegen backend to NNC to generate C++ for CPU instead of generating LLVM IR. Tensors are represented as blobs of float. Vector operations are devectorized/unrolled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62869 Test Plan: https://github.com/pytorch/pytorch/tree/mvz-nnc-aot-prototype makes it able to AOT compile the whole MobileNetV3 model into binary code through LLVM codegen in NNC. I forked that branch to https://github.com/cheng-chang/pytorch/tree/cc-aot-cpp, merged this PR into it, and modified `fancy_compile` to compile MobileNetV3 into C++ through ``` import torch m = torch.jit.load('mobnet.pt') m.eval() f = torch.jit.freeze(m) torch._C._fancy_compile(f.graph, [1, 3, 224, 224]) ``` The generated C++ file `mobnet.cc` can be found at https://gist.github.com/cheng-chang/e2830cc6920b39204ebf368035b2bcec. I manually compiled the generated C++ through `g++ -o mobnet -std=c++14 -L./build/lib -ltorch_cpu -ltorch mobnet.cc`, and it succeeded. Reviewed By: ZolotukhinM Differential Revision: D30149482 Pulled By: cheng-chang fbshipit-source-id: e77b189f0353e37cd309423a48a513e668d07675	2021-08-26 09:56:37 -07:00
Raghavan Raman	6d31ba6ddc	[nnc] Sanitized the names of constants in the input graph. (#63990 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/63923 The input graph can contain constants whose names contain special characters. So, all names of constants in the input graph need to be sanitized. Pull Request resolved: https://github.com/pytorch/pytorch/pull/63990 Reviewed By: ZolotukhinM Differential Revision: D30558432 Pulled By: navahgar fbshipit-source-id: de5b0c23d50ee8997f40f2c0fc605dda3719186f	2021-08-26 09:52:02 -07:00
Bert Maher	8dda299d96	Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776 I reverted this out of an abundance of caution because some test failures occurred, but they were all due to precision issues fixed lower in this stack. Let's try again. I've rolled the elimination of the allow-parallelism-in-fusions toggle into this diff since they're pretty tightly coupled. ghstack-source-id: 136529847 Test Plan: CI Reviewed By: huiguoo Differential Revision: D30484555 fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59	2021-08-24 18:56:55 -07:00
yanbing-j	33a163d886	Enable BFloat16 LeakyReLU and RReLU in CPU path (#61514 ) Summary: Enable and optimize BFloat16 LeakyReLU and RReLU in CPU path. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61514 Reviewed By: ejguan Differential Revision: D30257612 Pulled By: VitalyFedyunin fbshipit-source-id: 8cc0d1faacd02dcc9827af724a86d95b6952748f	2021-08-24 08:34:56 -07:00
Mike Iovine	1385f9fb12	[JIT] Add variadic stack op (#63578 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63578 Added a new op `prim::VarStack` and a pass that transforms instances of `aten::stack(list, dim)` into `prim::VarStack(list[0], ..., list[n], dim)`. Also provided a JIT interpreter implementation. Most of the implementation/tests are the same as `prim::VarConcat`. Test Plan: `buck test caffe2/test/cpp/jit:jit -- TestStackOpt` Reviewed By: navahgar Differential Revision: D30426232 fbshipit-source-id: 9829a7db6e0a5038c9b7528c43c25b0c221aa2ce	2021-08-24 08:20:54 -07:00
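The rewrite described in #63578, turning `aten::stack(list, dim)` into `prim::VarStack(list[0], ..., list[n], dim)`, can be sketched on a toy IR. This is plain Python for illustration and not the actual JIT pass: the `Node` class and `use_variadic_stack` are hypothetical, and the sketch assumes the list is produced by a `prim::ListConstruct` in the same block and is not mutated in between.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Value = Union[str, int]  # a value name such as "%a", or an integer constant


@dataclass
class Node:
    """Toy IR node: one output, an op name, and its inputs."""
    output: str
    op: str
    inputs: List[Value]


def use_variadic_stack(nodes: List[Node]) -> List[Node]:
    """Replace aten::stack(list, dim) with prim::VarStack(elems..., dim)."""
    # Map each list value to the elements it was constructed from.
    list_elems: Dict[Value, List[Value]] = {
        n.output: n.inputs for n in nodes if n.op == "prim::ListConstruct"
    }
    rewritten = []
    for n in nodes:
        if n.op == "aten::stack" and n.inputs[0] in list_elems:
            lst, dim = n.inputs
            rewritten.append(Node(n.output, "prim::VarStack", [*list_elems[lst], dim]))
        else:
            rewritten.append(n)  # unrelated nodes are kept as-is
    return rewritten


graph = [
    Node("%l", "prim::ListConstruct", ["%a", "%b", "%c"]),
    Node("%y", "aten::stack", ["%l", 0]),
]
for node in use_variadic_stack(graph):
    print(node)
```

The sketch leaves the now-unused `prim::ListConstruct` node in place; removing it would be a separate dead-code-elimination step.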
Mikhail Zolotukhin	f0d274294d	[TensorExpr] Nuke KernelArena and KernelScope. (#63587 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587 Now that there are no classes using KernelArena for memory management we can remove it. Differential Revision: D30429115 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544	2021-08-24 00:32:16 -07:00
Mikhail Zolotukhin	62d02f2b57	[TensorExpr] Make 'Tensor' a value type. (#63586 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586 This is another commit in transition from KernelArena memory management. Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need to dynamically allocate it at all - it's cheap to pass it by value, and that's what we're switching to in this commit. After this change nothing uses KernelScope/KernelArena and they can be safely removed. Differential Revision: D30429114 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819	2021-08-24 00:32:13 -07:00
Mikhail Zolotukhin	dd96c26066	[TensorExpr] More NFC changes like Expr* -> ExprPtr. (#63778 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63778 This is a preparation for a switch from raw pointers to shared pointers as a memory model for TE expressions and statements. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30487425 Pulled By: ZolotukhinM fbshipit-source-id: 9cbe817b7d4e5fc2f150b29bb9b3bf578868f20c	2021-08-24 00:30:49 -07:00
Mike Iovine	fc6dd0bc00	[JIT] Move UseVariadicCat internals (#63577 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63577 Since other variadic ops will have an almost identical implementation, we can generalize the `UseVariadicCat` implementation and put it in a common folder. Also moved some test utilities that other variadic op tests will likely need. Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOptTest` Reviewed By: navahgar Differential Revision: D30409937 fbshipit-source-id: 925c11c27b58ce98cb8368d2a205e26ba66d3db9	2021-08-23 17:30:36 -07:00
Bert Maher	37d60c08e5	Revert D30360382: [nnc] Support thread level parallelism in fused kernels Test Plan: revert-hammer Differential Revision: D30360382 (`d6d86efb1c`) Original commit changeset: 29acf4e932c6 fbshipit-source-id: e0531113135d30eabb172dc1537d5dd6d65dc438	2021-08-21 03:46:43 -07:00
Bert Maher	76da46ccdc	Revert D30417127: Remove flag to toggle CPU fusion in the presence of parallelism Test Plan: revert-hammer Differential Revision: D30417127 (`6600bc9651`) Original commit changeset: b77d7c68364f fbshipit-source-id: 6b52fb83a84fe241945e3cb3eeb71050d1d9c8f1	2021-08-21 03:38:07 -07:00
Bert Maher	6600bc9651	Remove flag to toggle CPU fusion in the presence of parallelism (#63514 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63514 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30417127 Pulled By: bertmaher fbshipit-source-id: b77d7c68364f2af73570740540f3b1152313016e	2021-08-20 11:18:19 -07:00
Bert Maher	d6d86efb1c	[nnc] Support thread level parallelism in fused kernels (#63386 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63386 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30360382 Pulled By: bertmaher fbshipit-source-id: 29acf4e932c669ce0f35823faea9099bcd8119b6	2021-08-20 11:18:17 -07:00
Raghavan Raman	d82667f7e2	[nnc] Updated sliceTail to do inplace mutation (#63532 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63532 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D30412184 Pulled By: navahgar fbshipit-source-id: e7669d3b9d24e14501f3feb6505c88d1d42030c6	2021-08-19 22:55:30 -07:00
Raghavan Raman	5e31a3b904	[nnc] Updated sliceHead to do inplace mutation (#63531 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63531 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D30412183 Pulled By: navahgar fbshipit-source-id: 47ee9482a36e606788d28d22eee4edaca45ffa50	2021-08-19 22:54:05 -07:00
Mikhail Zolotukhin	6e00b31b15	[TensorExpr] Make CacheReplacer and IndexFlattener mutate stmts/exprs inplace. (#63527 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63527 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30411411 Pulled By: ZolotukhinM fbshipit-source-id: efb14ee57b36537fa4fefa89bdd6bafe7151c012	2021-08-18 22:59:31 -07:00
Mikhail Zolotukhin	1d62fb8a63	[TensorExpr] Speedup ExternalCall.ComputeInterop test by reducing tensor sizes. (#63526 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63526 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30411410 Pulled By: ZolotukhinM fbshipit-source-id: d9a99afac14d2238b5100c98ae9ed4467f9f05ea	2021-08-18 22:58:25 -07:00
Mikhail Zolotukhin	7fdba4564a	[TensorExpr] IRSimplifier: sort terms in polynomials, terms, minterms, maxterms. (#63197 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63197 This solves non-determinism from using hash values in sort methods. Changes in tests are mostly mechanical. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30292776 Pulled By: ZolotukhinM fbshipit-source-id: 74f57b53c3afc9d4be45715fd74781271373e055	2021-08-18 14:49:27 -07:00
Mikhail Zolotukhin	1dc2b52764	[TensorExpr] Add a wrapper for all expr and stmt pointers. (#63195 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195 This helps us to later switch from using KernelArena with raw pointers to shared pointers without having to change all our source files at once. The changes are mechanical and should not affect any functionality. With this PR, we're changing the following: * `Add` --> `AddPtr` `new Add(...)` --> `alloc<Add>(...)` * `dynamic_cast<Add>` --> `to<Add>` `static_cast<Add>` --> `static_to<Add>` Due to some complications with args forwarding, some places became more verbose, e.g.: `new Block({})` --> `new Block(std::vector<ExprPtr>())` Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30292779 Pulled By: ZolotukhinM fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9	2021-08-17 13:44:45 -07:00
Mikhail Zolotukhin	548c717cbd	[TensorExpr] Remove test_train from tensorexpr tests. (#63194 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63194 This test implements functionality used nowhere, and the author no longer works on that. This PR also adds test_approx to CMakeLists where it's been missing before. Test Plan: Imported from OSS Reviewed By: VitalyFedyunin Differential Revision: D30292777 Pulled By: ZolotukhinM fbshipit-source-id: ab6d98e729320a16f1b02ea0c69734f5e7fb2554	2021-08-16 20:36:31 -07:00
Don Jang	e7724bb100	[JIT] Set future's error to current exception as is when `--torch_jit_enable_rethrow_caught_exception=true` (#63348 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63348 This change addresses singlaiiit's comment on D30241792 (`61b49c8e41`), which makes the JIT interpreter's behavior consistent between `future` is set and not. Test Plan: Enhanced `EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException` to cover the modified code path. Reviewed By: singlaiiit Differential Revision: D30347782 fbshipit-source-id: 79ce57283154ca4372e5341217d942398db21ac8	2021-08-16 17:32:13 -07:00
Raghavan Raman	e50e8b07d8	[nnc] Updated IRMutator and IRSimplifier to perform in-place mutations. (#63246 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63246 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D30309636 Pulled By: navahgar fbshipit-source-id: 409ea8d6982888cfee9127e6248044dd2ed9d8d4	2021-08-16 00:09:22 -07:00
Kimish Patel	38c185189c	[Pytorch Edge] Enable kineto profiler on mobile via EdgeKinetoProfiler (#62419 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419 This diff adds support for a CPU-only kineto profiler on mobile, thus enabling chrome trace generation on mobile. This brings the cpp API for mobile profiling on par with Torchscript. This is done via: 1. Utilizing debug handle annotations in KinetoEvent. 2. Adding post processing capability, via callbacks, to KinetoThreadLocalState. 3. Creating a new RAII-style profiler, KinetoEdgeCPUProfiler, which can be used in the surrounding scope of model execution. This will write the chrome trace to the location specified in the profiler constructor. Test Plan: MobileProfiler.ModuleHierarchy Imported from OSS Reviewed By: raziel Differential Revision: D29993660 fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299	2021-08-13 21:40:19 -07:00
Kimish Patel	1b04d99f55	[Pytorch Profiler] Introduce scopes to enableProfiler (#62417 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62417 This diff adds an option to make enableProfiler enable callbacks only for certain RecordScopes. Why? Profiling has some overhead when we repeatedly execute callbacks for all scopes. On the mobile side, where we often have small quantized models, this overhead can be large. We observed that by profiling only the top-level op and skipping profiling of other aten ops called within it, we can limit this overhead. For example, instead of profiling at::conv2d -> at::convolution -> at::convolution_, and furthermore any ops like transpose etc. that are called, we skip profiling of those. Of course this limits the visibility, but at the least this way we get a choice. Test Plan: Imported from OSS Reviewed By: ilia-cher Differential Revision: D29993659 fbshipit-source-id: 852d3ae7822f0d94dc6e507bd4019b60d488ef69	2021-08-13 21:40:15 -07:00
Kimish Patel	b00afe135d	[Pytorch Profiler] Add debug_handles to KinetoEvent (#62228 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62228 This diff adds debug handles to events and provides a way to use RECORD_FUNCTIONs that will pass debug_handles down to the profiler, which will record them in the events. Why add debug_handles? For pytorch mobile, with lite interpreter, we generate debug handles that can be used to lazily symbolicate exception traces to a model level stack trace, similar to the model level stack trace you get in TorchScript models. The debug_handles also enable getting module hierarchy for a lite interpreter model, support for which was added to KinetoProfiler in previous diffs. Followup plan: 1. Enable scope callbacks such that lite interpreter can use them to profile only top level ops. 2. Enable post processing callbacks that take KinetoEvents and populate module hierarchy using debug handles. This will let us use KinetoProfiler for lite interpreter use cases on mobile. The aim is to use an RAII guard to similarly generate chrome trace for mobile use cases as well, although only for top level ops. Test Plan: test_misc : RecordDebugHandles.Basic Imported from OSS Reviewed By: ilia-cher Differential Revision: D29935899 fbshipit-source-id: 4f06dc411b6b5fe0ffaebdd26d3274c96f8f389b	2021-08-13 21:40:14 -07:00
Kimish Patel	54f2eb6e7e	[Pytorch Profiler] Add support for adding module hierarchy to KinetoEvent (#61792 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61792 This PR adds module hierarchy information to events. What is module hierarchy information attached to events? During profiling a TorchScript module, when events are added, we ask JIT what is the module hierarchy associated with the node being executed. At the time of execution of that node, there might be multiple frames in the stack of interpreter. For each frame, we find corresponding node and the corresponding module hierarchy is queried. Module hierarchy corresponding to the node is associated with node's InlinedCallStack. InlinedCallStack of node tracks the path via which the node is inlined. Thus during the inlining process we annotate module information corresponding to the CallMethod nodes being inlined. With this PR, chrome trace will contain additional metadata: "Module Hierarchy". This can look like this: TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward It contains module instance, type name and the method name in the callstack. Test Plan: test_profiler Imported from OSS Reviewed By: raziel, ilia-cher Differential Revision: D29745442 fbshipit-source-id: dc8dfaf7c5b8ab256ff0b2ef1e5ec265ca366528	2021-08-13 21:39:10 -07:00
Don Jang	61b49c8e41	[JIT] Add a flag to rethrow caught exception in jit interpreter (#63073 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63073 It turned out that it's less than ideal to print out verbose stacktrace in exception messages in high-QPS services (see the related task) with a non-significant failure rate due to the truncation of long stacktrace which results in losing the original exception message thrown from native code. It is actually desirable to retain only the message of the original exception directly thrown from native code in such a usecase. This change adds a new flag `torch_jit_disable_exception_stacktrace` to the pytorch jit interpreter to suppress stacktrace in the messages of exception thrown from the interpreter. Reviewed By: Krovatkin Differential Revision: D30241792 fbshipit-source-id: c340225c69286663cbd857bd31ba6f1736b1ac4c	2021-08-13 08:44:24 -07:00
Shen Li	1022443168	Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: revert-hammer Differential Revision: D30279364 (`b004307252`) Original commit changeset: c1ed77dfe43a fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e	2021-08-12 11:45:01 -07:00
Zsolt Dollenstein	b004307252	[codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: manual inspection & sandcastle Reviewed By: zertosh Differential Revision: D30279364 fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a	2021-08-12 10:58:35 -07:00
Nikita Shulga	709ac6853a	Fix warnings (#62930 ) Summary: Add `-Wno-writable-strings`(which is clang's flavor of `-Wwrite-strings`) to list of warnings ignored while compiling torch_python. Avoid unnecessary copies in range loop Fix number of signed-unsigned comparisons Found while building locally on M1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/62930 Reviewed By: albanD Differential Revision: D30171981 Pulled By: malfet fbshipit-source-id: 25bd43dab5675f927ca707e32737ed178b04651e	2021-08-11 14:07:10 -07:00
Howard Cheng	fa22f6303f	[PyTorch] Add flop count for addmm (#61895 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61895 * Add FLOP count for addmm, should be `2mnk`. Share the same code path for `addmm` and `mm`. Test Plan: Imported from OSS `python test/test_profiler.py` Run a sample profile and check that FLOPS for `aten::addmm` is correct. `[chowar@devbig053.frc2 ~/local/pytorch/build] ninja bin/test_jit` `[chowar@devbig053.frc2 ~/local/pytorch/build] ./bin/test_jit --gtest_filter='ComputeFlopsTest'` Reviewed By: dskhudia Differential Revision: D29785671 fbshipit-source-id: d1512036202d7234a981bda897af1f75808ccbfe	2021-08-11 12:33:43 -07:00
Jacob Szwejbka	b746fed164	[Pytorch Edge] Move RuntimeCompatibilityInfo Factory Method (#63005 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63005 Realized I forgot to move the Runtime half of these functions be within the struct. Test Plan: ci Reviewed By: pavithranrao Differential Revision: D30205521 fbshipit-source-id: ccd87d7d78450dd0dd23ba493bbb9d87be4640a5	2021-08-11 11:15:57 -07:00
tktrungna	2f5ac9c0ba	update test distributed (#62796 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62796 Fixes #62380 * update test functions to call wheel install folder {sitepackages}/torch instead of build/ folder * add symbolic link for shared libraries which are called by the tests (this is a bit hacky and should be fixed the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208). ### Test plan check if all ci workflows pass Test Plan: Imported from OSS Reviewed By: driazati Differential Revision: D30193142 Pulled By: tktrungna fbshipit-source-id: 1247f9eda1c11c763c31c7383c77545b1ead1a60	2021-08-10 16:29:47 -07:00
Howard Huang	4d0497034c	Remove process_group_agent and faulty_process_group_agent files (#62985 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62985 Remove the process_group_agent and faulty_process_group_agent code now that PROCESS_GROUP backend has been deprecated for RPC (https://github.com/pytorch/pytorch/issues/55615). Discussed with xush6528 that it was okay to remove ProcessGroupAgentTest and ProcessGroupAgentBench which depended on process_group_agent. Test Plan: CI tests Reviewed By: pritamdamania87 Differential Revision: D30195576 fbshipit-source-id: 8b4381cffadb868b19d481198015d0a67b205811	2021-08-10 15:57:39 -07:00
Will Constable	22e3cc21e5	Back out "Enable test_api IMethodTest in OSS" (#62893 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62893 Original commit changeset: 50eb3689cf84 Test Plan: Confirm pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test2 passes in OSS Reviewed By: seemethere, alanwaketan Differential Revision: D30159999 fbshipit-source-id: 74ff8975328409a3dc8222d3e2707a1bb0ab930c	2021-08-06 16:43:50 -07:00
Jiewen Tan	4b68801c69	Enable test_api IMethodTest in OSS (#62521 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62521 This diff did the following few things to enable the tests: 1. Exposed IMethod as TORCH_API. 2. Linked torch_deploy to test_api if USE_DEPLOY == 1. Test Plan: ./build/bin/test_api --gtest_filter=IMethodTest.* To be noted, one needs to run `python torch/csrc/deploy/example/generate_examples.py` before the above command. Reviewed By: ezyang Differential Revision: D30055372 Pulled By: alanwaketan fbshipit-source-id: 50eb3689cf84ed0f48be58cd109afcf61ecca508	2021-08-04 21:14:20 -07:00
Raghavan Raman	59dd12042e	[nnc] Removed const from all fields in IR. (#62336 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62336 This PR was generated by removing `const` for all types of nodes in NNC IR, and fixing compilation errors that were the result of this change. This is the first step in making all NNC mutations in-place. Test Plan: Imported from OSS Reviewed By: iramazanli Differential Revision: D30049829 Pulled By: navahgar fbshipit-source-id: ed14e2d2ca0559ffc0b92ac371f405579c85dd63	2021-08-03 11:44:36 -07:00
Jacob Szwejbka	474d7ec43b	[Pytorch Edge] Black Box Compatibility API (#61477 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61477 It would be nice if the compatibility api was just kinda plug and play with no care about the internals of the api at all. That's what this diff aims to provide. The general usage would be something like < On the Client > RuntimeCompatibilityInfo runtime_info = get_runtime_compatibility_info(); . . . < On the Server > ModelCompatibilityInfo model_info = get_model_compatibility_info(<model_path>); bool compatible = is_compatible(runtime_info, model_info); Currently RuntimeCompatibilityInfo and ModelCompatibilityInfo are exactly the same, but it seemed feasible to me that they may end up diverging as more information is added to the api (such as a min supported bytecode version being exposed from the runtime). Test Plan: unit test and ci Reviewed By: dhruvbird, raziel Differential Revision: D29624080 fbshipit-source-id: 43c1ce15531f6f1a92f357f9cde4e6634e561700	2021-08-03 11:27:28 -07:00
yanbing-j	c7a7c2b62f	Enable Gelu fp32/bf16 in CPU path using Mkldnn implementation (#58525 ) Summary: Enable Gelu bf16/fp32 in CPU path using Mkldnn implementation. User doesn't need to_mkldnn() explicitly. New Gelu fp32 performs better than original one. Add Gelu backward for https://github.com/pytorch/pytorch/pull/53615. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58525 Reviewed By: ejguan Differential Revision: D29940369 Pulled By: ezyang fbshipit-source-id: df9598262ec50e5d7f6e96490562aa1b116948bf	2021-08-03 06:52:23 -07:00
Hui Guo	3a592730d5	[nnc] Simplify i%100 to i if i is less than 100; fixed #52580 (#60693 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60693 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D29375938 Pulled By: huiguoo fbshipit-source-id: 1388729c5b93805cb156efa53e8823d5462885bf	2021-08-02 18:38:54 -07:00
Hui Guo	8f7ae77040	[nnc] Add context-sensitive simplification for div/mod (#60688 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60688 Test Plan: Imported from OSS Reviewed By: navahgar, ZolotukhinM Differential Revision: D29373313 Pulled By: huiguoo fbshipit-source-id: 90d7f2fbfce583b0ea3b0f1c7899e22b0210bd62	2021-08-02 18:37:39 -07:00
Joel Schlosser	ee482edf0a	Callable activation function support for Transformer modules (C++) (#62342 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/60747 Enhances the C++ versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to support callables as their activation functions. The old way of specifying activation function still works as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62342 Reviewed By: malfet Differential Revision: D30022592 Pulled By: jbschlosser fbshipit-source-id: d3c62410b84b1bd8c5ed3a1b3a3cce55608390c4	2021-08-02 08:06:39 -07:00
Will Constable	bc787f2402	Fix setArgumentNames and make Script/Python consistent (#62442 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62442 For PythonMethodWrapper::setArgumentNames, make sure to use the correct method specified by method_name_ rather than using the parent model_ obj which itself _is_ callable, but that callable is not the right signature to extract. For Python vs Script, unify the behavior to avoid the 'self' parameter, so we only list the argument names to the unbound arguments which is what we need in practice. Test Plan: update unit test and it passes Reviewed By: alanwaketan Differential Revision: D29965283 fbshipit-source-id: a4e6a1d0f393f2a41c3afac32285548832da3fb4	2021-07-29 21:29:06 -07:00
Dhruv Matani	0b3f42fa4f	[PyTorch Edge] Add test for lite interpreter operator caching (#62306 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62306 Test to see if caching of operators works as expected. When caching operators during model load we look up using the operator name. This test ensures that even if there are multiple operators with the same name (in the same model), the caching distinguishes between the ones that have a different number of arguments specified during the call in the serialized bytecode. In this specific test, there's a model with 3 methods, 2 of which return a `float32` tensor and one which return an `int64` dtype. Please see the comments in the diff for details. ghstack-source-id: 134634613 Test Plan: Test command: ``` cd fbsource/fbcode/ buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.OperatorCacheDifferentiatesDefaultArgs' ``` ``` cd fbsource/ buck test xplat/caffe2:test_lite_interpreter ``` Reviewed By: raziel Differential Revision: D29929116 fbshipit-source-id: 1d42bd3e6d33128631e970c477344564b0337325	2021-07-29 20:14:45 -07:00
Dhruv Matani	0bbdf0e1e3	[PyTorch Edge] Add test_lite_interpreter to fbsource xplat BUCK files (#62305 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62305 Currently, it's super time consuming to run a lite interpreter test from fbcode since it takes > 10 minutes to build. Recently, I haven't been able to do that either due to low disk space. Having this test available in fbsource/xplat/ is a great win for productivity since I can re-run it in ~2 minutes even after significant changes! I've had to disarm some tests that can only run in OSS or fbcode builds (since they need functionality that we don't include for on-device FB builds). They are disarmed using the macro `FB_XPLAT_BUILD`. ghstack-source-id: 134634611 Test Plan: New test! Reviewed By: raziel, JacobSzwejbka, cccclai Differential Revision: D29954943 fbshipit-source-id: e55eab14309472ef6bc9b0afe0af126c561dbdb1	2021-07-29 20:13:06 -07:00

2136 Commits