pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-08 07:39:33 +01:00

Author	SHA1	Message	Date
Pavithran Ramachandran	62eb7d64cf	[PyTorchEdge] Extend flatbuffer to support extra files map (#72951 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72951 Extend flatbuffer to support extra files map Flatbuffer schema has extra files. The users can write extra files by providing a `map<string, string>` which will be part of the flatbuffer model asset and can be loaded back similar to pickle. ghstack-source-id: 149622799 Test Plan: fb: ```[pavithran@devvm5216.vll0 ~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test caffe2/test/cpp/jit:jit -- FlatbufferTest.ExtraFiles Parsing buck files: finished in 0.7 sec Downloaded 0/8 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules) Building: finished in 20.0 sec (100%) 22343/22343 jobs, 4/22343 updated Total time: 20.7 sec More details at https://www.internalfb.com/intern/buck/build/7dba5034-d623-4a1e-afa1-b0e809df7066 BUILD SUCCEEDED Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details. Running with tpx session id: 9c1ac1e0-a8c0-4a62-95df-8f49695aa7d1 Trace available for this run at /tmp/tpx-20220216-144630.207992/trace.log RemoteExecution session id: reSessionID-9c1ac1e0-a8c0-4a62-95df-8f49695aa7d1-tpx Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7318349470518809 ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 468 tests discovered (17.211) ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.ExtraFiles (0.169) Summary Pass: 1 ListingSuccess: 1 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users Finished test run: https://www.internalfb.com/intern/testinfra/testrun/7318349470518809``` Reviewed By: iseeyuan Differential Revision: D34286346 fbshipit-source-id: 4e09ab25b8ed6af6f8923db3aab046c255f13bb8 (cherry picked from commit ce8d88e22a360b25253d8a75f428d523fa88a79a)	2022-02-24 19:39:32 +00:00
Jacob Szwejbka	faacb8ab36	[Pytorch Edge] Lean Runtime Test Summary: As far as I can tell there's no CI that actually runs the lean_runtime. This should add it I think. (Is this directory covered by CI?) Next up is to create some test for min_runtime_lib (Note: this ignores all push blocking failures!) Test Plan: buck test :lean_runtime_delegate_flatbuffer_test Reviewed By: iseeyuan Differential Revision: D34255148 fbshipit-source-id: b44693220e93869edd984bbcd17d33db4007a4ea (cherry picked from commit 0a4a6b5bd2b4a1f8cce8bc1c4a22dad9539631c1)	2022-02-24 18:40:47 +00:00
Alban Desmaison	3bd1507ff2	Revert D33994011: Make debug_pkl smaller by only emitting unique traces. Test Plan: revert-hammer Differential Revision: D33994011 (`3d37f5b052`) Original commit changeset: 8e6224c6e942 Original Phabricator Diff: D33994011 (`3d37f5b052`) fbshipit-source-id: 885e739efa1081382e1fcf9c6cccba92c57e9f7a (cherry picked from commit a6d98c85a736c2eb321a6f38005dd0f5dc43eb87)	2022-02-24 16:38:55 +00:00
Han Qi	3d37f5b052	Make debug_pkl smaller by only emitting unique traces. (#72596 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72596 debug_pkl file inside of pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source which is a stack trace, filename, and start, end numbers. Those are emitted in debug_pkl file as strings. Since many SourceRanges share the same source, the string for trace can be deduped. The newer format saves a set of unique traces in a tuple, then each SourceRange will save the offset of its trace w.r.t. position in that tuple. (i.e. manually applying dictionary compression). The above helps with smaller file size. On loading, if we copy each trace to Source as string the runtime memory would still blow up. To mitigate this, we use SourceView directly instead of source which will take the reference of string inside of Deserializer and make that into string_view. This is safe because Deserializer is held by Unpickler by shared_ptr, and Unpickler is also held by shared_ptr by another Source object. That Source object will be alive during the model construction. Test Plan: unit test Took original file (312271638_930.predictor.disagg.local); loaded with `torch.jit.load` save again with `torch.jit.save`. 
Unzip both, look at contents: ``` [qihan@devvm5585.vll0 ~]$ du archive -h 4.0K archive/xl_model_weights 3.7M archive/extra 8.0K archive/code/__torch__/caffe2/torch/fb/model_transform/splitting 8.0K archive/code/__torch__/caffe2/torch/fb/model_transform 8.0K archive/code/__torch__/caffe2/torch/fb 8.0K archive/code/__torch__/caffe2/torch 8.0K archive/code/__torch__/caffe2 20M archive/code/__torch__/torch/fx/graph_module 20M archive/code/__torch__/torch/fx 8.0K archive/code/__torch__/torch/classes 20M archive/code/__torch__/torch 20M archive/code/__torch__ 20M archive/code 2.7M archive/constants 35M archive [qihan@devvm5585.vll0 ~]$ du resaved -h 4.0K resaved/extra 8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting 8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform 8.0K resaved/code/__torch__/caffe2/torch/fb 8.0K resaved/code/__torch__/caffe2/torch 8.0K resaved/code/__torch__/caffe2 1.3M resaved/code/__torch__/torch/fx/graph_module 1.3M resaved/code/__torch__/torch/fx 8.0K resaved/code/__torch__/torch/classes 1.4M resaved/code/__torch__/torch 1.4M resaved/code/__torch__ 1.4M resaved/code 2.7M resaved/constants 13M resaved [qihan@devvm5585.vll0 ~]$ ``` Reviewed By: JasonHanwen Differential Revision: D33994011 fbshipit-source-id: 8e6224c6e942e91c3403f686c8f0937d1002ed41 (cherry picked from commit a7014dd4029308c95007f362a57c31796d686647)	2022-02-24 09:31:16 +00:00
Hui Guo	5eb5b61221	[tensorexpre] Add typecast when src and dest buf types are different in PlacementAllocate (#71934 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71934 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D33826700 Pulled By: huiguoo fbshipit-source-id: 9fb29a43ab5983586a6bfde3a34d7e2f2120ab0a (cherry picked from commit 2bee018691ec888cb1ec761528951f5745d7ef79)	2022-02-23 19:36:50 +00:00
Yedidya Feldblum	7a5b0efc64	[caffe2] fix build failures in optimized builds under clang Summary: There are various possible approaches, but the approach chosen minimizes disruption to source control blame. Addresses: ``` error: Function _ZN23FunctionalTest_Pad_Test8TestBodyEv is too big to optimize [-Werror,-Wignored-optimization-argument] ``` Test Plan: buck2 build mode/opt caffe2/test/cpp/api:functional Reviewed By: jamesr66a Differential Revision: D34027291 fbshipit-source-id: 9dfd771ad56d3d4bc0d41b38b04654c8dae7c006 (cherry picked from commit `d43b5a7ed6`)	2022-02-22 22:31:47 +00:00
Raghavan Raman	0d66748948	[jit] Add tests for JIT with dynamic shape fusion (#72201 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72201 Reviewed By: mikaylagawarecki Differential Revision: D34067211 Pulled By: navahgar fbshipit-source-id: 2c13bb43c76c7fed720ad37892d2177c3dc0b924 (cherry picked from commit `eed2d8cea4`)	2022-02-18 23:29:08 +00:00
Alban Desmaison	0951cb513a	Revert D34342689: Revert D34250357: Sync lazy_tensor_staging back to master Test Plan: revert-hammer Differential Revision: D34342689 Original commit changeset: 43f6da6986f7 Original Phabricator Diff: D34250357 (`69389fb542`) fbshipit-source-id: 8a3fb74877e719e9b9577b58027b4e7061a04ef0 (cherry picked from commit `c749f08e7a`)	2022-02-18 17:31:21 +00:00
Alban Desmaison	86a961af87	Revert D34250357: Sync lazy_tensor_staging back to master Test Plan: revert-hammer Differential Revision: D34250357 (`69389fb542`) Original commit changeset: aa7d589f6050 Original Phabricator Diff: D34250357 (`69389fb542`) fbshipit-source-id: 43f6da6986f7fc5189d641b7803adc5ada27194c (cherry picked from commit `3c930a5e4e`)	2022-02-18 15:47:37 +00:00
Will Constable	69389fb542	Sync lazy_tensor_staging back to master (#72875 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72875 This diff contains changes from several PRs landed to lazy_tensor_staging branch. * generating 'fallback' overrides for each codegenned op, useful for debugging * supports operators which are missing aten:: symbols for op names, instead using their string counterpart * makes the IR class a base class instead of hardcoding the assumption of TS It also resolves lint issues and in particular cleans up the following: * {Type}s shouldn't be passed into isValueType, and using the catch-all base class of CType is nicer than specifying a list of types. Fixes #72852 Test Plan: test manually on lazy_tensor_staging branch Reviewed By: shunting314 Differential Revision: D34250357 fbshipit-source-id: aa7d589f605055d5d02bc77c77fa6f1182ff7497 (cherry picked from commit `2f8f5e4971`)	2022-02-18 03:49:46 +00:00
Raghavan Raman	6d33852685	[NNC] TensorExprKernel state should not be modified on calls to run methods (#73028 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73028 A typical use case for `TensorExprKernel` is to create the kernel once and call it multiple times, possibly in parallel. For the parallel calls to work, we need to ensure that the run() method calls do not change any state in `TensorExprKernel`. Before this change, the `run()` method was modifying the sizes and strides vectors when dynamic shapes were present. This manifested as a data race when running a model with Static Runtime. ghstack-source-id: 149398820 Test Plan: ``` buck build mode/dev-asan //caffe2/test/cpp/tensorexpr:tensorexpr ./buck-out/dev/gen/caffe2/test/cpp/tensorexpr/tensorexpr --gtest_filter="DynamicShapes.MultiThreadedExecution" ``` Reviewed By: eellison Differential Revision: D34287960 fbshipit-source-id: d311f3c5a66c5d5de4e1deaeaa01816b53e9906e (cherry picked from commit `161568bfae`)	2022-02-17 23:14:27 +00:00
Mike Iovine	d1c5f9e439	[JIT][SR] Introduce prim::IfThenElse (#72587 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72587 This pattern frequently appears in a few graphs: ``` %result = prim::If(%condition) block0(): -> (%a) block1(): -> (%b) ``` This is slow, particularly in static runtime. Static runtime creates memory planners/block runners for each sub-block, which eats up a lot of memory and introduces a lot of extra overhead for this relatively simple operation. This diff introduces a new op that replaces nodes like the above with a single op meant to act like a ternary operator: ``` %result = prim::IfThenElse(%condition, %a, %b) ``` Test Plan: New unit tests Reviewed By: eellison Differential Revision: D34091789 fbshipit-source-id: eb6a8c460c39b4c019a1f4ab1f3f1e5b6edc400c (cherry picked from commit `0f1b335e5b`)	2022-02-17 18:22:48 +00:00
Will Constable	889f3f48b2	Revert D34178476: Update lazy_ir.py from lazy_tensor_staging Test Plan: revert-hammer Differential Revision: D34178476 (`3842140fd5`) Original commit changeset: 7190b2e0d82b Original Phabricator Diff: D34178476 (`3842140fd5`) fbshipit-source-id: 4c969a355f01244c6f5acc52bc31679f2182aa55 (cherry picked from commit `17082075dd`)	2022-02-16 19:34:41 +00:00
Will Constable	3842140fd5	Update lazy_ir.py from lazy_tensor_staging (#72730 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72730 This diff contains changes from several PRs landed to lazy_tensor_staging branch. - generating 'fallback' overrides for each codegenned op, useful for debugging - supports operators which are missing aten:: symbols for op names, instead using their string counterpart - makes the IR class a base class instead of hardcoding the assumption of TS Test Plan: tested on lazy_tensor_staging branch Reviewed By: desertfire Differential Revision: D34178476 fbshipit-source-id: 7190b2e0d82b4eb1f4510c858c24446c6df3f9d0 (cherry picked from commit `6713d3f0ef`)	2022-02-16 18:33:31 +00:00
Shunting Zhang	763ad1bf25	(2/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: frontend change (#72899 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72899 Reland D33282878 (`911d527b87`). This is the frontend change. ghstack-source-id: 149204031 Test Plan: Refer to D33282878 (`911d527b87`). Also check CI Reviewed By: gmagogsfm Differential Revision: D34252127 fbshipit-source-id: 27b17ddd4d05d904eb91fd9ee094d9121f00e388 (cherry picked from commit `1d276baca3`)	2022-02-16 03:45:15 +00:00
Ivan Kobzarev	67cd98fad4	[tensorexpr] Fix isNLC segfault (#72786 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72786 Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D34204523 Pulled By: IvanKobzarev fbshipit-source-id: 9a0f2ce0a1921e261932029c3ebd842330fdf528 (cherry picked from commit `b8326064f6`)	2022-02-15 20:31:56 +00:00
Michael Suo	7db4a48d92	Revert D33342569: (2/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: frontend change Test Plan: revert-hammer Differential Revision: D33342569 (`856157fcee`) Original commit changeset: 57984ac67ae2 Original Phabricator Diff: D33342569 (`856157fcee`) fbshipit-source-id: 4c12235a1776a3652e7f91e93b626705759d5176 (cherry picked from commit `4cbd7d8bab`)	2022-02-15 18:45:44 +00:00
Shunting Zhang	856157fcee	(2/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: frontend change (#70471 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70471 Reland D33282878 (`911d527b87`). This is the frontend change. ghstack-source-id: 149114933 Test Plan: Refer to D33282878 (`911d527b87`). Also check CI Reviewed By: gmagogsfm Differential Revision: D33342569 fbshipit-source-id: 57984ac67ae2c56c38f72d3b1fb69105901fb472 (cherry picked from commit `b47cc935ee`)	2022-02-15 07:21:19 +00:00
Pavithran Ramachandran	a482aeb0ce	[PyTorchEdge] backport v8 to v7 to support promoted ops as instruction (#71662 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71662 backport v8 to v7 to support promoted ops as instruction a flag to help export as instruction from v8 and export as operators for v7 and below Test Plan: ``` buck test caffe2/test/cpp/jit:jit -- LiteInterpreterTest.BackPortByteCodeModelAllVersions Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5629499620570927 ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 461 tests discovered (15.693) ✓ Pass: caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions (2.712) Summary Pass: 1 ListingSuccess: 1 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5629499620570927 ``` ``` buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_codegen buck test mode/opt //caffe2/test:upgrader_codegen -- mobile.test_upgrader_codegen.TestLiteScriptModule Parsing buck files: finished in 0.8 sec Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules) Building: finished in 01:39.4 min (100%) 11031/11031 jobs, 2/11031 updated Total time: 01:40.2 min More details at https://www.internalfb.com/intern/buck/build/a8b0e417-019c-44ba-be6b-23379411a965 BUILD SUCCEEDED Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details. 
Running with tpx session id: 44fbfa66-cce8-4277-82ac-f89d79558581 Trace available for this run at /tmp/tpx-20220202-160956.915412/trace.log RemoteExecution session id: reSessionID-44fbfa66-cce8-4277-82ac-f89d79558581-tpx Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/281475200877601 ✓ ListingSuccess: caffe2/test:upgrader_codegen : 1 tests discovered (1.249) ✓ Pass: caffe2/test:upgrader_codegen - test_generate_bytecode (mobile.test_upgrader_codegen.TestLiteScriptModule) (1.365) Summary Pass: 1 ListingSuccess: 1 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users Finished test run: https://www.internalfb.com/intern/testinfra/testrun/281475200877601 ``` Reviewed By: iseeyuan Differential Revision: D33719098 fbshipit-source-id: e2d2b23d298f98e4d4fcdfc344f7b8c6f92cff26 (cherry picked from commit `81b956c23a`)	2022-02-15 03:47:39 +00:00
jiej	2d110d514f	Nvfuser code bump 2_1_2022 (#72127 ) Summary: Things changed in this PR that requires review: 1. aten/src/ATen/core/interned_strings.h 2. torch/csrc/jit/ir/alias_analysis.h : exposing createValue to allow efficient mutation 3. torch/csrc/jit/runtime/symbolic_shape_registry.cpp : added gelu/tanh/erf in registry 4. torch/jit/_script.py : throws when scripting a model that uses autocast as a decorator, since it's not supported nvfuser code update: 1. codegen improvements and performance tuning 2. integration bug fixes for shape expression logic 3. kernel segmentation update to address perf regression from horizontal fusion 4. scalar cpu tensor promotion to support inter-device operation between cpu scalar tensor and cuda tensor Things reverted from local changes: aten::gelu with approximation (tracked in PR: https://github.com/pytorch/pytorch/pull/61439) Pull Request resolved: https://github.com/pytorch/pytorch/pull/72127 Reviewed By: HamidShojanazeri Differential Revision: D34113233 Pulled By: jbschlosser fbshipit-source-id: b82cde32b71e324eca0ea57cb8c9f9647278ca74 (cherry picked from commit `e009bc5c4e`)	2022-02-15 00:43:16 +00:00
Jacob Szwejbka	52c516ecb8	[Pytorch Edge] Minor improve documentation in test_backend_with_compiler Summary: Went through all these files and the design doc to understand the to_backend api. Figured I could add some comments to these files to make the apis a little clearer for those that come after. (Note: this ignores all push blocking failures!) Test Plan: na Reviewed By: raziel, larryliu0820 Differential Revision: D34221989 fbshipit-source-id: 699fcbd8714bfb6b58c6c0bf0e5fbc019d2ef6f8 (cherry picked from commit `0b3f5d73e8`)	2022-02-14 23:44:46 +00:00
Ryan Spring	4f8b986e28	Implement Tanh Gelu Approximation (#61439 ) Summary: 1. Implements https://github.com/pytorch/pytorch/issues/39853 2. Adds approximate boolean flag to Gelu 3. Enables Tanh Gelu approximation 4. Adds double backward support for Gelu 5. Enable Tanh Gelu in NvFuser ``` def gelu(x, approximate : str = 'none'): if approximate == 'tanh': # sqrt(2/pi) = 0.7978845608028654 return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0)))) else: return x * normcdf(x) ``` Linking XLA PR - https://github.com/pytorch/xla/pull/3039 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439 Reviewed By: VitalyFedyunin Differential Revision: D33894937 Pulled By: jbschlosser fbshipit-source-id: b65e8fb6ea66168af8f34f45ed50e92737a33851 (cherry picked from commit `6e986f91a9`)	2022-02-14 03:40:32 +00:00
Mikhail Zolotukhin	1855b14922	[TensorExpr] Delete `DimArg` class. (#72390 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72390 This class didn't add much value and only caused more boilerplate code. This change removes the class and updates all the use cases with uses of `ExprHandle`. A side effect of this change is different names in loop variables, which caused massive mechanical changes in our tests. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D34030296 Pulled By: ZolotukhinM fbshipit-source-id: 2ba4e313506a43ab129a10d99e72b638b7d40108 (cherry picked from commit `c2ec46a058`)	2022-02-11 01:21:59 +00:00
Mikhail Zolotukhin	9123e9b3b5	[TensorExpr] Switch from `ExprPtr` to `ExprHandle` in Compute impl. (#72389 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72389 This is an NFC change that just prepares the code for the upcoming deletion of `DimArg` class. This change makes `Compute` and `Reduce` APIs to use `ExprHandle` everywhere. There should be no observable behavior change from this PR. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D34030295 Pulled By: ZolotukhinM fbshipit-source-id: 3fd035b6a6bd0a07ccfa92e118819478ae85412a (cherry picked from commit `1b0a4b6fac`)	2022-02-11 01:21:59 +00:00
David Berard	c314750401	[JIT] enable profiling optional tensors (#70532 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70532 This adds profiling to Optional[Tensor] types. First, in profiling_record.cpp, profiling nodes are added to Optional[Tensor] inputs. The nodes record (a) whether or not any `None` types are encountered, and (b) of the Tensor types, what's the most specific type matching all of the non-null tensors that were encountered (shape, dtype, etc.) In tensorexpr_fuser, when specializing types based on the profiled information, an Optional[Tensor] type will always be Optional[], but the Tensor type contained in the optional type can be specialized (e.g. `Optional[Float(2x2x2, cpu, etc)]`) Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D33714748 Pulled By: davidberard98 fbshipit-source-id: 93c819054450de7ac84b112de1012c0c12e34120 (cherry picked from commit `21cfd80123`)	2022-02-08 22:52:26 +00:00
Raghavan Raman	765908708b	[nnc] Adding a test with dynamic shapes from a model (#72198 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72198 Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D33951741 Pulled By: navahgar fbshipit-source-id: 596b193eba14c8e1affa9fa13070079f05d64cac (cherry picked from commit `ddbb78ff80`)	2022-02-08 02:00:46 +00:00
Raghavan Raman	ff71429906	[nnc] Add stride args while running with allocated outputs (#72223 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72223 ghstack-source-id: 148494871 Test Plan: ``` buck test mode/opt //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - DynamicShapes.GraphWithSymbolicStrides' ``` Reviewed By: eellison Differential Revision: D33960592 fbshipit-source-id: 6334978d5e3713889b4ad12bcd8ed8c69df39d58 (cherry picked from commit `95cc102bc2`)	2022-02-07 19:24:56 +00:00
Han Qi	57f039b41f	Fixing a few bugs in torch flatbuffer (#72349 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72349 1. Interface call'd methods need to be registered to class. Previously all interface calls were inlined so there was no such problem. 2. parseDoubleList and parseBoolList got reversed when refactoring. Test Plan: 1. Get ASR's test model at ``` mkdir ~/asr1 && cd ~/asr1 fbpkg fetch speech.tuna.milan.ondevice.en_us ``` 2. Convert model: ``` cd ~/fbsource buck run //xplat/caffe2/fb/lite_predictor:convert_model -- --model=$HOME/asr1/pytorchmodel.pt --output_name=$HOME/asr1/pytorchmodel.ff ``` 3. Ran lite_predictor_flatbuffer ``` buck run //xplat/caffe2/fb/lite_predictor:lite_predictor_flatbuffer -- --model=$HOME/asr1/pytorchmodel.ff --method_to_call=encode_src --method_to_generate_input=get_all_bundled_inputs_for_encode_src ``` See perf metric generated (means loading and inference succeeded). Reviewed By: gmagogsfm, zhxchen17 Differential Revision: D33959746 fbshipit-source-id: 24671e1189438119f477032eb6c29bd7736e74ca (cherry picked from commit `5e18809350`)	2022-02-05 00:25:27 +00:00
Raghavan Raman	38f696c0cd	[nnc] Add an API to unroll loops by a given factor (#72071 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72071 Reviewed By: ngimel Differential Revision: D33946250 Pulled By: navahgar fbshipit-source-id: 3f3f92054174620025a9d71154d006f1738953e2 (cherry picked from commit `d8b53598e9`)	2022-02-03 18:41:21 +00:00
kshitij12345	02f6226bff	[fix] Dropout2d-3d no-batch-dim (#69885 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/69801 TODO: * [x] Update C++ API cc albanD mruberry jbschlosser walterddr kshitij12345 Pull Request resolved: https://github.com/pytorch/pytorch/pull/69885 Reviewed By: mruberry Differential Revision: D33175470 Pulled By: jbschlosser fbshipit-source-id: c9d7d9e0f59ba290a0157725c338a345f3d58b9f (cherry picked from commit `7e4271a156`)	2022-02-02 16:40:32 +00:00
CodemodService FBSourceClangFormatLinterBot	ed435e903f	[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT` Reviewed By: zertosh Differential Revision: D33938055 fbshipit-source-id: 6c0643a18f09854e87e183341f252c66dd6395a6 (cherry picked from commit `fd183aedbc`)	2022-02-02 11:27:15 +00:00
Ivan Kobzarev	34e4418dfa	[nnc] tensorexpr for quantized/aten::upsample_nearest2d (#71236 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71236 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D33553305 Pulled By: IvanKobzarev fbshipit-source-id: 2442afee6d23123bb3a4bc52d3555393b0254106 (cherry picked from commit `90a263fc08`)	2022-02-01 19:48:53 +00:00
Elias Ellison	cf1833df70	[WIP] add explicit dynamic fusion arg (#71173 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71173 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D33536222 Pulled By: eellison fbshipit-source-id: a097408ecdd6e284432de128feb297993d882d52 (cherry picked from commit `0e3419b2d3`)	2022-02-01 19:07:02 +00:00
Nikita Shulga	74c44ba9d6	Revert D33850228: [pytorch][PR] Implement Tanh Gelu Approximation Test Plan: revert-hammer Differential Revision: D33850228 (`23d03025dc`) Original commit changeset: 3cc33fb298e4 Original Phabricator Diff: D33850228 (`23d03025dc`) fbshipit-source-id: 9436e7df73c2b2e2011f321674f24973316d3692 (cherry picked from commit `c9efb58223`)	2022-01-31 17:44:19 +00:00
Ryan Spring	23d03025dc	Implement Tanh Gelu Approximation (#61439 ) Summary: 1. Implements https://github.com/pytorch/pytorch/issues/39853 2. Adds approximate boolean flag to Gelu 3. Enables Tanh Gelu approximation 4. Adds double backward support for Gelu 5. Enable Tanh Gelu in NvFuser ``` def gelu(x, approximate : str = 'none'): if approximate == 'tanh': # sqrt(2/pi) = 0.7978845608028654 return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0)))) else: return x * normcdf(x) ``` Linking XLA PR - https://github.com/pytorch/xla/pull/3039 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439 Reviewed By: cpuhrsch Differential Revision: D33850228 Pulled By: jbschlosser fbshipit-source-id: 3cc33fb298e480d7ecc5c67716da019d60c6ab33 (cherry picked from commit `3a53b3e94f`)	2022-01-31 17:07:45 +00:00
Tristan Rice	6208c2800e	torch/monitor: merge Interval and FixedCount stats (#72009 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72009 This simplifies the Stats interface by merging IntervalStat and FixedCountStat into a single Stat w/ a specific window size duration and an optional max samples per window. This allows for the original intention of having comparably sized windows (for statistical purposes) while also having a consistent output bandwidth. Test Plan: ``` buck test //caffe2/test:monitor //caffe2/test/cpp/monitor:monitor ``` Reviewed By: kiukchung Differential Revision: D33822956 fbshipit-source-id: a74782492421be613a1a8b14341b6fb2e8eeb8b4 (cherry picked from commit `293b94e0b4`)	2022-01-30 23:21:59 +00:00
David Berard	99bc978b78	[JIT] Propagate requires_grad to autodiff subgraphs (#71666 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71666 When JIT autodiff is constructing a gradient computation graph, it will only add gradients for tensors that require_grad. Previously, require_grad information was not propagated to the subgraph that autodiff used; as a result, autodiff would calculate all gradients, even if requires_grad had never been set during profiling runs. In certain cases, this can lead to performance issues. For example, during training, the gradient of the input data is not needed, but is still computed. This propagates requires_grad to the subgraph passed into autodiff, so that autodiff will not compute unnecessary gradients. Test: `./bin/test_jit --gtest_filter="AutodiffRemoveUnusedGradientsTest.Linear"` Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D33725304 Pulled By: davidberard98 fbshipit-source-id: ca7ab4c9a6a26f94f93aff2d5a4135e125323ba1 (cherry picked from commit `a97fe0556d`)	2022-01-28 18:57:36 +00:00
Joel Schlosser	cb823d9f07	Revert D33744717: [pytorch][PR] Implement Tanh Gelu Approximation Test Plan: revert-hammer Differential Revision: D33744717 (`f499ab9cef`) Original commit changeset: d64532a562ed Original Phabricator Diff: D33744717 (`f499ab9cef`) fbshipit-source-id: 396c3f63de5865f894dbc353d0790a01a624be93 (cherry picked from commit `e9fb2d1db1`)	2022-01-28 18:35:01 +00:00
Ryan Spring	f499ab9cef	Implement Tanh Gelu Approximation (#61439 ) Summary: 1. Implements https://github.com/pytorch/pytorch/issues/39853 2. Adds approximate boolean flag to Gelu 3. Enables Tanh Gelu approximation 4. Adds double backward support for Gelu 5. Enable Tanh Gelu in NvFuser ``` def gelu(x, approximate : str = 'none'): if approximate == 'tanh': # sqrt(2/pi) = 0.7978845608028654 return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0)))) else: return x * normcdf(x) ``` Linking XLA PR - https://github.com/pytorch/xla/pull/3039 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439 Reviewed By: mikaylagawarecki Differential Revision: D33744717 Pulled By: jbschlosser fbshipit-source-id: d64532a562ed53247bb4fa52bb16722634d5c187 (cherry picked from commit `4713dd9cca`)	2022-01-28 16:59:09 +00:00
John Clow	c85965600c	Fix bug where frozen mod not used for OFI #68903 (#71436 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71436 Fixes issue #68903 Test Plan: Imported from OSS Reviewed By: george-qi Differential Revision: D33824857 Pulled By: Gamrix fbshipit-source-id: 8d351feb4a621916f55003c58527a1e85eec476e (cherry picked from commit `57bb420040`)	2022-01-27 23:37:50 +00:00
Pavithran Ramachandran	bf69a61293	(1/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: backend change Summary: Reland for D33282878 (`911d527b87`). Land backend change first to maintain FC. Will wait for 2 weeks after this diff is in, and then land the front-end change in next diff. Test Plan: test in next diff time buck test mode/dev-nosan fblearner/flow/projects/langtech/translation:tests -- test_e2e_base_training Reviewed By: gmagogsfm Differential Revision: D33342547 fbshipit-source-id: b3dee9a4bdfd78103848c12629e5fccafdd621e3 (cherry picked from commit `ae1935f1af`)	2022-01-27 03:29:40 +00:00
Mikhail Zolotukhin	1dbcde2ade	[TensorExpr] Support scalar intermediate and output values. (#71186 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71186 So far we've only supported scalar inputs, but couldn't handle scalar outputs or intermediates. This PR adds it. Scalar outputs are returned as 0-dim tensors. If the kernel is invoked on a stack of IValues, we correctly convert the results to scalar IValues when needed. If the kernel is invoked with a vector of void* pointers, everything works out of the box without any conversions. Lowerings for scalar operators are a bit tricky. Usual lowerings return a pair <Buf, Stmt> (aka Tensor), but for scalar operators we also want to have the corresponding Var that the lowering function supposedly creates (in theory we could just use Loads and Stores, but I'm worried it can affect performance as there is no guarantee this will be optimized by LLVM). So, what we do here to work around this is we return a fake buf + stmt that sets the corresponding var. Then outside of the lowering we create a real buffer and generate a Store to it with the value from the variable we passed as the base handle of the fake buf. This real buffer is then treated as usual by the rest of the system and we can use it if we need to return this scalar value as a kernel output. If we do not need to return it, then the Store will be deleted by the DCE pass. Differential Revision: D33539324 D33539324 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: ab4524b9820ce204f106effcf6232ed33d4ee223 (cherry picked from commit `7faa0939f0`)	2022-01-26 06:32:51 +00:00
Jacob Szwejbka	70f3078dd6	[Pytorch Edge] Wrap lowered module in to_backend (#71597 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71597 Problem: _jit_to_backend overrides get/set state. This means any attributes added to the module after lowering will not be preserved after serialization. For edge workflows the biggest problem here is it breaks bundled_inputs. Solution?: Real quick and easy way to handle issues with to_backend overriding get/set state. Wraps the lowered module in another module and has forwarding functions for the api specified in 'method_compile_spec'. The tradeoff with this approach is now the actual workhorse of the module is 1 layer deep which might make debugging slightly grosser/more difficult/confusing. The other approach Martin David and I talked about would be to only lower the portions that require custom get/set state logic. This leaves the top level the same, and only specific backend internals are changed. Personally I'm not sure how much that really addresses the debugging concern all that well. It seems like if you cracked the model open you'd still run into similar amounts of confusion with a lot of the variables and logic referenced coming from another module. The other concern with this approach is whether or not 'compile_spec' specifies the public api of the module (since that's our source of truth for this wrapper). While it may not be enforced, it certainly seems to be true by convention and the to_backend api already uses it as a source of truth for all functions that get generated in the resulting module. I say we just formally commit to this (compile spec keys being functions) being the contract of the api instead of just assuming it to be the case and then having weird behavior if it's not. Test Plan: New Unit Test CI to check for existing behavior and contracts. Manually tested in a notebook with bundled inputs. {P475790313} Reviewed By: raziel Differential Revision: D33694257 fbshipit-source-id: 9ff27db421eba41bac083dff11a22e9e40a36970 (cherry picked from commit `91ef49977e`)	2022-01-25 06:30:19 +00:00
Peter Bell	40d1f77384	Codegen: python_torch_functions only include relevant operators (#68693 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68693 Generation of python bindings for native functions is split over 8 different files. One for each namespace, with the torch namespace split into 3 shards, and methods in their own file as well. This change ensures that editing any single (non-method) operator only causes one of these files to be rebuilt. Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D32596270 Pulled By: albanD fbshipit-source-id: 0570ec69e7476b8f1bc21138ba18fe8f95ebbe3f (cherry picked from commit `ba0fc71a3a`)	2022-01-21 15:37:06 +00:00
Jacob Szwejbka	e926360cb8	[Pytorch Edge] Refactor Compatibility Stuff into own directory (#71432 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71432 Organizing jit/mobile a little more ghstack-source-id: 147184536 Test Plan: ci. Reviewed By: iseeyuan Differential Revision: D33640527 fbshipit-source-id: f3a7884fe0d06d80bb8d9cf141ecaee34b6f88ff (cherry picked from commit `4c3d1e5435`)	2022-01-20 19:38:41 +00:00
Han Qi	21b697b646	add flatbuffer_loader and flatbuffer_serializer as BUCK target (#71463 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71463 title Test Plan: unittest Reviewed By: zhxchen17 Differential Revision: D33651339 fbshipit-source-id: 4bf325a40e263a441fd86bce560645ad0c1ebb23 (cherry picked from commit `4cb02e62a6`)	2022-01-20 04:51:10 +00:00
Raghavan Raman	70c9146c40	[nnc] Update block and thread extents in cuda_codegen to use int64_t (#71428 ) Summary: The block and thread extent calculations in `cuda_codegen` should be using `int64_t` instead of `int`. The updated test, `test_dynamic_shapes`, fails without this change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/71428 Reviewed By: samdow Differential Revision: D33640374 Pulled By: navahgar fbshipit-source-id: 64c340ad2a9a1fa1fe066cf1c5dfc3b546b7be6d (cherry picked from commit `6ea546ce11`)	2022-01-19 23:21:24 +00:00
Peter Bell	6f4c491c6b	empty_cpu: Add functions that don't depend on Tensor (#70613 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70613 This refactors `at::detail::empty_cpu` to use only `TensorBase` so you can construct tensors without including `Tensor.h`. It also adds a `TensorOptions` version to reduce friction in operators moving from the `at::empty` API. Test Plan: Imported from OSS Reviewed By: samdow Differential Revision: D33623682 Pulled By: ngimel fbshipit-source-id: 7a7b08bc2ed06830a3d698197a0c8389a096dc1d (cherry picked from commit `2e17ad0bbd`)	2022-01-19 00:01:58 +00:00
Jiewen Tan	680d61daab	[LT] Remove torch::lazy::convertShapes (#71291 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71291 This commit removes torch::lazy::convertShapes since it's no longer used. In addition, it replaces the numel logic within LTCTensorImpl. Test Plan: ./build/bin/test_lazy CI in lazy_tensor_staging branch Reviewed By: wconstab Differential Revision: D33575084 Pulled By: alanwaketan fbshipit-source-id: b104ef39fd552822e1f4069eab2cb942d48423a6	2022-01-14 12:06:39 -08:00
CodemodService FBSourceClangFormatLinterBot	88012c7daf	[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT` Reviewed By: zertosh Differential Revision: D33577744 fbshipit-source-id: 7ecc8367998ee1dffde54c2f4dd3cfafe19a53c9	2022-01-14 06:10:57 -08:00
Mike Ruberry	3a0c680a14	Jiterates exp2, erfc, erfinv and entr and refactors code_template.h to ATen (#71295 ) Summary: Per title. cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/71295 Reviewed By: ngimel Differential Revision: D33575885 Pulled By: mruberry fbshipit-source-id: bc841b46fc0b5458a26a4d4465b18a7a54cd5a5b	2022-01-13 23:58:51 -08:00
Zhengxu Chen	5f2b4be3b9	[jit] Split DynamicType conformance test into smaller pieces. (#71275 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71275 Currently it's taking more than 10 minutes to run the conformance test. Instead we should use parametrized test to shard into test segments so that they can run in parallel. ghstack-source-id: 146990608 Test Plan: ``` [zhxchen17@devbig560.ftw3 /data/users/zhxchen17/fbsource/fbcode] buck test mode/dev-tsan //caffe2/test/cpp/jit:jit -- -r 'LiteInterpreterDynamicTypeTestFixture' Building... 34.9 sec (99%) 12110/12111 jobs, 0/12111 updated Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details. Running with tpx session id: ebea52b3-7c7f-46be-9f69-18e2e7b040cc Trace available for this run at /tmp/tpx-20220113-113635.717778/trace.log RemoteExecution session id: reSessionID-ebea52b3-7c7f-46be-9f69-18e2e7b040cc-tpx Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4222124735827748 ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 431 tests discovered (11.173) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/0 (51.331) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/1 (65.614) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/3 (76.875) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/5 (77.271) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/4 (78.871) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/6 (78.984) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/7 (84.068) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/2 (85.198) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/8 (88.815) ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/9 (90.332) Summary Pass: 10 ListingSuccess: 1 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4222124735827748 ``` Reviewed By: qihqi Differential Revision: D33570442 fbshipit-source-id: 5c49e03b0f88068d444c84b4adeaaf45433ce1fa	2022-01-13 18:22:55 -08:00
CodemodService FBSourceClangFormatLinterBot	60632a00fe	[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT` Reviewed By: zertosh Differential Revision: D33561057 fbshipit-source-id: 79873717c45c8bbe6d0ae760e718770fd960185d	2022-01-13 03:27:06 -08:00
Scott Wolchok	1bbea3c3a2	[PyTorch][JIT] Support mayContainAlias(Value, ArrayRef<Value>) (#69853 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69853 We can implement this overload more efficiently. ghstack-source-id: 146924693 Test Plan: patched alias_analysis tests Time reported to initialize a predictor by static runtime when given ctr_mobile_feed local_ro net is 9.5s instead of 10.5s. Reviewed By: mikeiovine Differential Revision: D33039731 fbshipit-source-id: 52559d678e9eb00e335b9e0db304e7a5840ea397	2022-01-12 16:53:54 -08:00
Han Qi	1bc3571078	[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#70201 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70201 Included functions: save_mobile_module -> saves a mobile::Module to flatbuffer load_mobile_module_from_file -> loads a flatbuffer into mobile::Module parse_mobile_module -> parses from bytes or deserialized flatbuffer module object Compared to previous attempts, this diff only adds flatbuffer to cmake target and leaves fbcode/xplat ones unchanged. Test Plan: unittest Reviewed By: malfet, gmagogsfm Differential Revision: D33239362 fbshipit-source-id: b9ca36b83d6af2d78cc50b9eb9e2a6fa7fce0763	2022-01-12 16:30:39 -08:00
Raghavan Raman	9ca367d48b	[nnc] Use given kernel function name while emitting code (#67781 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67781 Update `LLVMCodeGen` in NNC to use the given kernel function name while emitting code. This was earlier committed as D31445799 (`c30dc52739`) and got reverted as part of a stack of diffs that included a cache for `PyTorchLLVMJIT`, which was the likely culprit. Test Plan: ``` buck test mode/opt //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - LLVM.CodeGenKernelFuncName' ``` Reviewed By: ZolotukhinM, bdhirsh Differential Revision: D32145958 fbshipit-source-id: 5f4e0400c4fa7cabce5b91e6de2a294fa0cad88e	2022-01-12 15:49:17 -08:00
Tristan Rice	bfe1abd3b5	torch/monitor: add pybind (#69567 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69567 This exposes torch.monitor events and stats via pybind11 to the underlying C++ implementation. * The registration interface is a tad different since it takes a lambda function in Python whereas in C++ it's a full class. * This has a small amount of changes to the counter interfaces since there's no way to create an initializer list at runtime so they now also take a vector. * Only double based stats are provided in Python since it's intended more for high level stats where float imprecision shouldn't be an issue. This can be changed down the line if the need arises. ``` events = [] def handler(event): events.append(event) handle = register_event_handler(handler) log_event(Event(type="torch.monitor.TestEvent", timestamp=datetime.now(), metadata={"foo": 1.0})) ``` D32969391 is now included in this diff. This cleans up the naming for events. type is now name, message is gone, and metadata is renamed data. Test Plan: buck test //caffe2/test:monitor //caffe2/test/cpp/monitor:monitor Reviewed By: kiukchung Differential Revision: D32924141 fbshipit-source-id: 563304c2e3261a4754e40cca39fc64c5a04b43e8	2022-01-12 13:35:11 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	70951884d4	Add option to load historic operators in IR when the operator is deprecated (#71148 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71148 Test Plan: Imported from OSS Reviewed By: gmagogsfm Differential Revision: D33521300 Pulled By: tugsbayasgalan fbshipit-source-id: a0607dba5e7233590384326537017eb0b18da419	2022-01-12 11:07:04 -08:00
Elias Ellison	5480deb183	Add support for permuting dynamic fusion group outputs to channels last format (#70656 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70656 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D33458650 Pulled By: eellison fbshipit-source-id: f0c7d20743deac7a87f7c9176e60da8100aefe41	2022-01-12 09:11:34 -08:00
Elias Ellison	39be20f259	[JIT][NNC] Add handling of strides to dynamic shape support. (#70464 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70464 Add handling of strided input tensors to dynamic fusion. This is done with the same set of input striding specializations as https://github.com/pytorch/pytorch/pull/60684/: ``` S_ONE, // STRIDE_ONE: packed S_CONT, // STRIDE_CONTIGUOUS: stride[i + 1] * sizes[i + 1] S_TRAN_CONT, // STRIDE_TRANSPOSED_CONTIGUOUS: stride[i-1] * sizes[i-1] S_AS_ARG, // STRIDE_AS_ARG: stride passed in as runtime value ``` and then two additional specializations for a) contiguous tensor and b) channels-last tensor. Channels-last is a common case and we should optimize for it. Additionally, tensors natively store whether they are contiguous/channels-last contiguous, which makes it faster to check if tensors follow this pattern. Output striding will be done in a follow-up. The striding is stored on both the TensorGroup node and on the guard node. The striding descriptors are stored as a vector of strings on the node for debuggability and to make use of storing ivalues as attributes on nodes. As an example: ``` %8 : Double(10, 11, 12, 13, strides=[1716, 1, 143, 11], requires_grad=0, device=cpu) = prim::TensorExprGroup_0[symbolic_shape_inputs=[-37, -36, -35, -34], striding_inputs_desc=[["TENSOR_CONT_CHANNELS_LAST"]](%x, %24, %23, %22, %21) ``` Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D33458649 Pulled By: eellison fbshipit-source-id: c42616d3c683d70f6258180d23d3841a31a6030d	2022-01-12 09:11:31 -08:00
Elias Ellison	975e7d246e	Remove ignore shapes arg (#71144 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71144 This wasn't being used anywhere. It was originally intended for the SR flow but we're doing something else now. Test Plan: Imported from OSS Reviewed By: navahgar, ZolotukhinM Differential Revision: D33521061 Pulled By: eellison fbshipit-source-id: 0574698a2b7409df6feb703f81e806d886225307	2022-01-12 09:09:49 -08:00
CodemodService FBSourceClangFormatLinterBot	93b2399c6c	[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT` Reviewed By: zertosh Differential Revision: D33544281 fbshipit-source-id: 4f0b5d6d490e6fcb967550cfb1dc0111b1770f73	2022-01-12 04:16:43 -08:00
Elias Ellison	9bccb31306	Remove precise tuple construct flag (#71121 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71121 Test Plan: Imported from OSS Reviewed By: d1jang Differential Revision: D33515234 Pulled By: eellison fbshipit-source-id: 57cfe171b583a6bb4d3493a34b159061e97a11b8	2022-01-11 22:12:36 -08:00
Zhengxu Chen	9465c24245	[jit][edge] Use dynamic type instead of union types for schema parsers. (#70509 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70509 TypeFactory will construct DynamicType when building on Edge platforms. We use this facility to make FunctionSchema return DynamicType all the time for OptionalType. We don't explicitly use DynamicTypeFactory everywhere because that requires too many changes and will split the entire aten codebase. ghstack-source-id: 146818621 Test Plan: CI Reviewed By: iseeyuan Differential Revision: D33306737 fbshipit-source-id: d7ce00b438f7c03b43945d578280cfd254b1f634	2022-01-11 20:14:25 -08:00
Zhengxu Chen	e7634f83ce	[jit][edge] Migrate base types to DynamicType on mobile. (#70233 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70233 Make the type parser produce DynamicType for all base types which don't have type arguments, and return a DynamicType pointer for IValue::type(). ghstack-source-id: 146818622 Test Plan: no behavior change. Reviewed By: iseeyuan Differential Revision: D33137219 fbshipit-source-id: 1612c924f5619261ebb21359936309b41b2754f5	2022-01-11 13:53:29 -08:00
Zhengxu Chen	4f35b9144c	[jit][edge] Migrate ListType to DynamicType on mobile. (#70212 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70212 Use DynamicType instead of ListType all over the place in Lite Interpreter. Namely we need to modify the following places: 1. Type parser which produces the Type constants. 2. IValue::type() which returns reflected Type from IValues. 3. Helper functions to construct the container value. 4. Typechecks which test whether a type instance is a particular container type. ghstack-source-id: 146818619 Test Plan: CI Reviewed By: iseeyuan Differential Revision: D33176931 fbshipit-source-id: 9144787f5fc4778538e5c665946974eb6171a2e6	2022-01-11 10:57:53 -08:00
Zhengxu Chen	b12ca69179	[jit][edge] Migrate DictType to DynamicType on mobile. (#70202 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70202 Use DynamicType instead of DictType all over the place in Lite Interpreter. Namely we need to modify the following places: 1. Type parser which produces the Type constants. 2. IValue::type() which returns reflected Type from IValues. 3. Helper functions to construct the container value. 4. Typechecks which test whether a type instance is a particular container type. ghstack-source-id: 146735648 Test Plan: no behavior change. Reviewed By: iseeyuan Differential Revision: D33137257 fbshipit-source-id: 971bf431658c422ea9353cc32cdab66e98876e9d	2022-01-10 15:55:29 -08:00
Zhengxu Chen	30699cbfd5	Reland D33284352: [jit][edge] Do not reuse mobile type parser for all unpicklers. (#71048 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71048 reland D33284352 (`0a921ba0d0`) ghstack-source-id: 146735646 Test Plan: All Github CI: ciflow rerun -l ciflow/all Reviewed By: gmagogsfm Differential Revision: D33489731 fbshipit-source-id: 3e160209a1abb193ad3eed3018054aa7d331025e	2022-01-10 12:42:23 -08:00
Elias Ellison	fb66f561b1	Add copy out to the fallback path in SR invocation of composed op (#70871 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70871 We had previously handled reusing memory in the optimized kernel execution path, but not yet handled it if we hit the unoptimized fallback. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D33458652 Pulled By: eellison fbshipit-source-id: 4eb62181ed02c95813a99638f5e2d0f9347b5c08	2022-01-10 12:16:38 -08:00
Zhengxu Chen	9762aa0fdc	Revert D33284352: [jit][edge] Do not reuse mobile type parser for all unpicklers. Test Plan: revert-hammer Differential Revision: D33284352 (`0a921ba0d0`) Original commit changeset: 997c4f110b36 Original Phabricator Diff: D33284352 (`0a921ba0d0`) fbshipit-source-id: af316727442a64f1ae40d53d7a9d26ec550d634e	2022-01-07 19:58:03 -08:00
Zhengxu Chen	0a921ba0d0	[jit][edge] Do not reuse mobile type parser for all unpicklers. (#70338 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70338 Today Unpickler is used by both server and mobile for deserializing models, and it always falls back to the mobile parser when there's no type resolver provided by the user. However this is not intended, as the server and mobile type parsers support different things. In this diff we provide a default fallback using the script parser and opt it out for all mobile cases. ghstack-source-id: 146727330 (Note: this ignores all push blocking failures!) Test Plan: CI Reviewed By: iseeyuan Differential Revision: D33284352 fbshipit-source-id: 997c4f110b36eee6596e8f23f6a87bf91a4197ed	2022-01-07 18:35:32 -08:00
Jiewen Tan	338eb1b2b3	[LTC] Export torch::lazy::GetBackendDevice() (#70963 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70963 This commit exports torch::lazy::GetBackendDevice(). Test Plan: CI in the lazy_tensor_staging branch. Reviewed By: wconstab Differential Revision: D33468938 Pulled By: alanwaketan fbshipit-source-id: f65599c9238bf6b4f4ffbd5194befdc267272831	2022-01-07 13:13:18 -08:00
Zhengxu Chen	1011ac188f	[jit][edge] Create DynamicType for OptionalType in mobile. (#68137 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68137 A small step to replace existing OptionalType usage with DynamicType in Edge runtime. ghstack-source-id: 146670520 Test Plan: CI Reviewed By: iseeyuan Differential Revision: D32264617 fbshipit-source-id: 62d3ffad40901842deac19ca2098ea5ca132e718	2022-01-07 11:23:12 -08:00
Zhengxu Chen	0517e719ac	[jit] Add conformance test for DynamicType with server JIT types. (#69482 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69482 Add a test to enumerate a number of JIT type combinations and see if their subtyping behavior is preserved in the new DynamicType system. ghstack-source-id: 146670526 Test Plan: buck test mode/opt //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.DynamicType' Reviewed By: gmagogsfm Differential Revision: D32891263 fbshipit-source-id: 728211b39778e93db011b69b0a4047df78a8fc5b	2022-01-07 11:23:09 -08:00
Xiang Gao	6e16c9bb1d	Add support for deleteKey for FileStore (#69953 ) Summary: torch_ucc uses `deleteKey`, and trying to run PyTorch tests with torch_ucc leads to failure about `deleteKey not implemented for FileStore`. cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/69953 Reviewed By: ngimel Differential Revision: D33458457 Pulled By: H-Huang fbshipit-source-id: f46afd59f950722ae594d9aafb8843f14019e930	2022-01-07 06:20:59 -08:00
Mikhail Zolotukhin	8223ef1cd8	[TensorExpr] Clean-up logic for copying input tensors and remove some dead code. (#70535 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70535 This also fixes handling of inputs that happen to be outputs (they require copy). Test Plan: Imported from OSS Reviewed By: pbelevich Differential Revision: D33399116 Pulled By: ZolotukhinM fbshipit-source-id: 9845838eb653b82ae47b527631b51893990d5319	2022-01-07 01:03:56 -08:00
Mikhail Zolotukhin	5d7cc8f22a	[TensorExpr] Add some graph-rewrite passes to prepare models for AOT compilation. (#66515 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66515 These passes should not be used generally as they change API of the model's forward method, but they help experimenting with the model and ironing out all the kinks before it can be compiled properly. In the long run ideally we should provide a better way to enable such experiments. Differential Revision: D31590862 D31590862 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: 74ded34c6c871d4cafa29f43dc27c7e71daff8fc	2022-01-07 01:03:53 -08:00
Joel Schlosser	e6befbe85c	Add flag to optionally average output attention weights across heads (#70055 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/47583 Pull Request resolved: https://github.com/pytorch/pytorch/pull/70055 Reviewed By: bhosmer Differential Revision: D33457866 Pulled By: jbschlosser fbshipit-source-id: 17746b3668b0148c1e1ed8333227b7c42f1e3bf5	2022-01-06 17:32:37 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	b0fdca8855	Bump version number to 7 and compile old operators with old schema (#68358 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68358 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D33433730 Pulled By: tugsbayasgalan fbshipit-source-id: 202c58365bae13195d3545cefcb0da9162b02151	2022-01-05 23:57:22 -08:00
Raghavan Raman	616afcf981	[jit] [shape analysis] Move constant tensors out of fused subgraphs during generalization (#70320 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70320 ghstack-source-id: 146514368 Test Plan: `buck test mode/dev-nosan //caffe2/test/cpp/jit:jit` Reviewed By: eellison Differential Revision: D33280508 fbshipit-source-id: fe4291d7c49f0a498b330de96b698e99f6f6a505	2022-01-05 10:19:14 -08:00
Michael Suo	0ece9a49d7	Revert D33198155: Bump version number to 7 and compile old operators with old schema Test Plan: revert-hammer Differential Revision: D33198155 (`d35fc409ad`) Original commit changeset: 38a1185f9ecb Original Phabricator Diff: D33198155 (`d35fc409ad`) fbshipit-source-id: 411aaeb4e047aad9202db50d4d0f2ff35bc51f9d	2022-01-04 13:44:59 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	d35fc409ad	Bump version number to 7 and compile old operators with old schema (#68358 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68358 Test Plan: Imported from OSS Reviewed By: samdow Differential Revision: D33198155 Pulled By: tugsbayasgalan fbshipit-source-id: 38a1185f9ecb34a33f737ad0b060b3490956300c	2022-01-04 01:31:25 -08:00
Salil Desai	35251a5528	[PyTorch] Add Enum to IValue Deepcopy (#69937 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69937 This enables `export_torch_mobile_model` compatibility with Enum IValues Test Plan: ModuleAPITest.DeepCopyEnum Reviewed By: gmagogsfm Differential Revision: D33104681 fbshipit-source-id: ca2a6d259c312487fe38dd1bed33ab6b7910bc2a	2021-12-30 07:52:22 -08:00
George Qi	8af39b7668	AdaptiveLogSoftmaxWithLoss no_batch_dim support (#69054 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69054 Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D33200166 Pulled By: george-qi fbshipit-source-id: 9d953744351a25f372418d2a64e8402356d1e9b7	2021-12-29 10:25:26 -08:00
Bo Wu	bf610f08b0	Back out "Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions" Summary: as title Test Plan: ``` buck run mode/opt-split-dwarf -c=python.package_style=inplace //ai_infra/distributed_ai/pyper_test_framework/templates:pyper_release_v2 -- --model inline_cvr_post_imp_deterministic_shrunk_pyper_release_v2 --cluster TSCTestCluster --hpc_identity oncall_pyper_oncall --stage prod_offline_training --test_module training_platform ... ############## Start inline_cvr_post_imp_model Test Results Analysis ############## I1226 22:03:56.789000 3346280 test_driver.py:139 UNKNOWN ] Test finished in 808.2743511786684 seconds. +-------------------------+---------+------------------------+-----------------+ | Test Case | Status | Message | Model Entity ID | +-------------------------+---------+------------------------+-----------------+ | SmallWorld_release_test | Success | finished successfully. | 987987491 | +-------------------------+---------+------------------------+-----------------+ I1226 22:03:56.790000 3346280 test_driver.py:143 UNKNOWN ] test_run_id: 3d085f61-28d1-411d-bd27-940ea2554b23 use this id to find your run in scuba pyper_test_framework I1226 22:03:56.792000 3346280 test_driver.py:160 UNKNOWN ] Calling cleanup I1226 22:03:56.792000 3346280 training_platform_test_launcher.py:385 UNKNOWN ] Stopping launched jobs 1 I1226 22:03:59.563122 3346280 ClientSingletonManager.cpp:100] Shutting down Manifold ClientSingletonManager ``` Reviewed By: seemethere Differential Revision: D33325936 fbshipit-source-id: 64414bf7061ad77e8ac12eb8abafee4043e0fa1e	2021-12-27 09:11:46 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	4ae71c8d34	Add graph op replacement pass (#69915 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69915 Test Plan: Imported from OSS Reviewed By: samdow Differential Revision: D33198158 Pulled By: tugsbayasgalan fbshipit-source-id: f2b924edf9959aaf51f97db994fae031fa062cf8	2021-12-25 13:03:19 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	df3cbcff28	Add utility methods to find an upgrader (#68355 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68355 Test Plan: Imported from OSS Reviewed By: samdow Differential Revision: D33198156 Pulled By: tugsbayasgalan fbshipit-source-id: 68380148f0d9bee96d8090bf01c8dfca8e1f8b12	2021-12-24 12:23:04 -08:00
Shunting Zhang	911d527b87	Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions (#70339 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70339 When a Python program is translated to TorchScript, the Python exception type is dropped. This makes users' lives hard when they need to categorize errors based on more than just the exception message. Here we make the change so that when we raise a Python exception, we record the fully qualified class name for the exception. Later on, when the TorchScript is interpreted, a special exception CustomJITException is thrown. Users can get the Python class name from CustomJITException::getPythonClassName. Note that this diff does not customize the mapping from C++ exception to Python exception. It's left to the users to do whatever mapping they want. Code under scripts/shunting is just my own experimental code. I can split it out if requested. ghstack-source-id: 146221879 Test Plan: buck test mode/opt //caffe2/test:jit Reviewed By: gmagogsfm Differential Revision: D33282878 fbshipit-source-id: 910f67a764519f1053a48589d1a34df69001525d	2021-12-24 00:25:40 -08:00
Jiewen Tan	ab57f6d12c	[LTC] Upstream utils to extract BackendDevice from at::Tensor (#70069 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70069 This commit upstreams utils to extract BackendDevice from at::Tensor. Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.GetBackendDevice* Reviewed By: samdow Differential Revision: D33293160 Pulled By: alanwaketan fbshipit-source-id: 78647239f90b4d04adce84ae6022b8983ad30c09	2021-12-23 12:42:03 -08:00
Michael Suo	795af1578c	Revert D33172665: [LTC] Upstream utils to extract BackendDevice from at::Tensor Test Plan: revert-hammer Differential Revision: D33172665 (`121d067999`) Original commit changeset: b334ee358ea7 Original Phabricator Diff: D33172665 (`121d067999`) fbshipit-source-id: 8bff43cddfc5d30483ec5cea8eff037aab9d1cfa	2021-12-22 21:12:49 -08:00
Jiewen Tan	121d067999	[LTC] Upstream utils to extract BackendDevice from at::Tensor (#70069 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70069 This commit upstreams utils to extract BackendDevice from at::Tensor. Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.GetBackendDevice* Reviewed By: wconstab Differential Revision: D33172665 Pulled By: alanwaketan fbshipit-source-id: b334ee358ea7b031bbffb0a16fa634715dba83f5	2021-12-22 18:15:45 -08:00
vfdev-5	ce9a2f8ba9	[C++ API] Added missing nearest-exact mode and anti-alias flag (#69318 ) Summary: Description: Following https://github.com/pytorch/pytorch/pull/65142#issuecomment-981995692 adding missing nearest-exact mode and anti-alias flag to C++ frontend. - https://github.com/pytorch/pytorch/pull/65142 - https://github.com/pytorch/pytorch/pull/64501 - added tests in pytorch/test/cpp/api/functional.cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/69318 Reviewed By: davidberard98 Differential Revision: D33278995 Pulled By: jbschlosser fbshipit-source-id: fa87c0c78df6b398e4f9688cc02111eed187afa7	2021-12-22 11:10:51 -08:00
Jiewen Tan	e02d836cb2	[LTC] Upstream LTCTensorImpl (#70062 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70062 This commit upstreams LTCTensorImpl from the lazy_tensor_staging branch. It inherits from c10::TensorImpl and thus manages the lifetime/storage of LazyTensor. Test Plan: ./build/bin/test_lazy --gtest_filter=LazyTensorImplTest.* Reviewed By: desertfire Differential Revision: D33171186 Pulled By: alanwaketan fbshipit-source-id: 6af9f91cc7c7e997f120cb89a7bcd6785c03ace0	2021-12-22 03:21:52 -08:00
Raghavan Raman	4dec15e6d8	[nnc] Add a run method to TensorExprKernel that takes in output tensors (#69477 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69477 This diff adds a new run method to `TensorExprKernel` which takes in output tensors as inputs and stores the output in those given tensors. ghstack-source-id: 146107009 Test Plan: buck test mode/dev-nosan //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - Kernel.RunWithAllocatedOutputs' Reviewed By: ZolotukhinM Differential Revision: D32823890 fbshipit-source-id: edc1f4839785124048b034060feb71cb8c1be34f	2021-12-22 00:30:15 -08:00
George Qi	bb51519937	bug fix FractionalMaxPool2d (random_samples dimensions) (#70031 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70031 Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D33200618 Pulled By: george-qi fbshipit-source-id: 142f224c2cab1008d2d4e9ed333697a92d2d42db	2021-12-21 12:21:54 -08:00
Hui Guo	7abb7667a6	[tensorexpr] Add memory planning to reuse intermediate buffers (#66452 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66452 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D31557188 Pulled By: huiguoo fbshipit-source-id: f18dfeba1df20d5d4f118640fc10782534eb9219	2021-12-17 01:38:02 -08:00
Hui Guo	bbfd7b75ca	[tensorexpr] Move the allocation of intermediate buffers from TEK to CodeGen (#67143 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67143 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D31881151 Pulled By: huiguoo fbshipit-source-id: 457e5d4ff8a15f70af9c797c9ab4803d8e779abe	2021-12-17 01:37:56 -08:00
Hui Guo	c7e0951524	[tensorexpr] Add a stmt recorder to obtain stmt PCs (#66450 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66450 Test Plan: Imported from OSS Reviewed By: navahgar, ZolotukhinM Differential Revision: D31557189 Pulled By: huiguoo fbshipit-source-id: 416d79ddfc46a0109187cdeb919ad9b5abde8030	2021-12-17 01:36:37 -08:00
Zhengxu Chen	d459e79500	[jit][edge] Remove usage of shared_ptr<mobile::Code>. (#68037 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68037 Right now mobile::Code doesn't outlive its enclosing Function, and all accesses to Code happen inside the interpreter loop, which doesn't outlive the module, so we don't need to use std::shared_ptr here. This should also save us 1-2 KB of binary size, because shared_ptr seems to bloat on arm64 Android. ghstack-source-id: 145818696 Test Plan: eyes. Reviewed By: qihqi, tugsbayasgalan Differential Revision: D32264616 fbshipit-source-id: d83f538d6604cf75fd7728a25127b4849ce7ab2a	2021-12-16 13:11:46 -08:00
Jiawei Lv	b4c4a015d6	Revert D33163841: Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers" Test Plan: revert-hammer Differential Revision: D33163841 Original commit changeset: e262b6d8c80a Original Phabricator Diff: D33102715 (`eb374de3f5`) fbshipit-source-id: 644216036a238a458f0a2198460b36d24fb035f8	2021-12-16 11:12:18 -08:00
Jiawei Lv	c80b5b8c8f	Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers" Test Plan: revert-hammer Differential Revision: D33102715 (`eb374de3f5`) Original commit changeset: 3816ff01c578 Original Phabricator Diff: D33102715 (`eb374de3f5`) fbshipit-source-id: e262b6d8c80a05f3a67e024fedfbadefdbfe6e29	2021-12-16 09:39:57 -08:00
David Berard	8c7f4a0d0b	[tensorexpr] check for index out of bounds in ir_eval (#68858 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68858 when executing with ir_eval, check for index out of bounds. Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D32657881 Pulled By: davidberard98 fbshipit-source-id: 62dd0f85bb182b34e9c9f795ff761081290f6922	2021-12-16 09:27:45 -08:00
jiej	76d282d447	Nvfuser code bump 12 5 (#69964 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69964 Things added in this PR that require review: 1. cuLaunchCooperativeKernel driver API added aten/src/ATen/cuda/detail/LazyNVRTC.cpp aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h nvfuser code update: 1. perf tuning on codegen scheduler that improves performance. 2. permutation support has been extended beyond contiguous/channels-last. (The improvements could be observed on PW benchmark) Things reverted from local changes: 1. aten::gelu with approximation 2. local changes that are upstreamed in PR https://github.com/pytorch/pytorch/issues/68804 Pull Request resolved: https://github.com/pytorch/pytorch/pull/69428 Reviewed By: ngimel Differential Revision: D33073817 Pulled By: wconstab fbshipit-source-id: e77d32e81d037d7370822b040456fd4c3bd68edb	2021-12-16 08:28:54 -08:00
Tristan Rice	eb374de3f5	Back out "Revert D32606547: torch/monitor: add C++ events and handlers" (#69923 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69923 Original commit changeset: fbaf2cc06ad4 Original Phabricator Diff: D32606547 (`e61fc1c03b`) This is the same thing as the original diff but just using a normal std::mutex instead of std::shared_timed_mutex which is not available on OSX 10.11. The performance difference should be negligible and easy to change down the line if it does become a bottleneck. Old failing build: https://github.com/pytorch/pytorch/runs/4495465412?check_suite_focus=true Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783 Test Plan: buck test //caffe2/test/cpp/monitor:monitor will add ciflow tags to ensure mac builds are fine Reviewed By: aivanou Differential Revision: D33102715 fbshipit-source-id: 3816ff01c578d8e844d303d881a63cf5c3817bdb	2021-12-15 22:51:43 -08:00
Taylor Robie	24bc3be146	[Profiler] Clean up profiler includes. (#69421 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69421 I've hit a lot of build issues in D32671972, and I've come to realize that a lot of it boils down to header hygiene. `function.h` includes `profiler.h` solely to transitively include `record_function.h`, which winds up leaking the profiler symbols. Moreover, several files are relying on transitive includes to get access to `getTime`. As long as I have to touch all the places that use `getTime`, I may as well also move them to the new namespace. Test Plan: Unit tests and CI. Reviewed By: aaronenyeshi, albanD Differential Revision: D32865907 fbshipit-source-id: f87d6fd5afb784dca2146436e72c69e34623020e	2021-12-15 12:50:24 -08:00
Chen Lai	408283319a	[Operator Versioning][Edge] Change OP to CALL when there is a valid upgrader (#67731 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67731 1. Register upgrader function at loading stage 2. Change OP to CALL when the operator_version from the model is smaller than the current runtime version and there exists a valid upgrader The interpreter log is: ``` RUNNING 0 STOREN 1 3 RUNNING 1 DROPR 1 RUNNING 2 LOAD 2 RUNNING 3 LOAD 3 RUNNING 4 CALL 0 RUNNING 0 STOREN 1 2 RUNNING 1 LOAD 1 RUNNING 2 OP 0, aten::is_floating_point RUNNING 3 JF 3 RUNNING 4 LOADC 1 RUNNING 5 JMP 3 RUNNING 8 STORE 3 RUNNING 9 MOVE 3 RUNNING 10 JF 5 RUNNING 11 LOAD 1 RUNNING 12 LOAD 2 RUNNING 13 OP 1, aten::div.Tensor RUNNING 14 JMP 5 RUNNING 19 STORE 4 RUNNING 20 DROPR 2 RUNNING 21 DROPR 1 RUNNING 22 MOVE 4 RUNNING 23 RET RUNNING 5 LOAD 2 RUNNING 6 LOAD 3 RUNNING 7 CALL 0 RUNNING 0 STOREN 1 2 RUNNING 1 LOAD 1 RUNNING 2 OP 0, aten::is_floating_point RUNNING 3 JF 3 RUNNING 4 LOADC 1 RUNNING 5 JMP 3 RUNNING 8 STORE 3 RUNNING 9 MOVE 3 RUNNING 10 JF 5 RUNNING 11 LOAD 1 RUNNING 12 LOAD 2 RUNNING 13 OP 1, aten::div.Tensor RUNNING 14 JMP 5 RUNNING 19 STORE 4 RUNNING 20 DROPR 2 RUNNING 21 DROPR 1 RUNNING 22 MOVE 4 RUNNING 23 RET RUNNING 8 MOVE 2 RUNNING 9 MOVE 3 RUNNING 10 CALL 0 RUNNING 0 STOREN 1 2 RUNNING 1 LOAD 1 RUNNING 2 OP 0, aten::is_floating_point RUNNING 3 JF 3 RUNNING 4 LOADC 1 RUNNING 5 JMP 3 RUNNING 8 STORE 3 RUNNING 9 MOVE 3 RUNNING 10 JF 5 RUNNING 11 LOAD 1 RUNNING 12 LOAD 2 RUNNING 13 OP 1, aten::div.Tensor RUNNING 14 JMP 5 RUNNING 19 STORE 4 RUNNING 20 DROPR 2 RUNNING 21 DROPR 1 RUNNING 22 MOVE 4 RUNNING 23 RET RUNNING 11 TUPLE_CONSTRUCT 3 RUNNING 12 RET ``` The upgrader bytecode is: ``` (STOREN, 1, 2) (LOAD, 1, 0) (OP, 0, 0) (JF, 3, 0) (LOADC, 1, 0) (JMP, 3, 0) (LOAD, 2, 0) (OP, 0, 0) (STORE, 3, 0) (MOVE, 3, 0) (JF, 5, 0) (LOAD, 1, 0) (LOAD, 2, 0) (OP, 1, 0) (JMP, 5, 0) (LOAD, 1, 0) (LOAD, 2, 0) (LOADC, 0, 0) (OP, 2, 0) (STORE, 4, 0) (DROPR, 2, 0) (DROPR, 1, 0) (MOVE, 4, 0) (RET, 0, 0) ``` ghstack-source-id: 145635622 Test Plan: described in the summary, plus CI Reviewed By: iseeyuan Differential Revision: D32092517 fbshipit-source-id: 0314b4bda5d2578cdd4e7cfbfd1e3c07fbccf8a3	2021-12-14 19:13:12 -08:00
Chen Lai	9e4d60a552	[Operator Versioning][Edge] Use check in cpp source file for upgrader (#67728 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67728 1. Check in upgrader_mobile.h and upgrader_mobile.cpp 2. Add test to parse all bytecode from upgrader_mobile.h ghstack-source-id: 145635621 Test Plan: buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterUpgraderTest.Upgrader' Reviewed By: iseeyuan Differential Revision: D32087295 fbshipit-source-id: 21e95aabb5e9db76be27e01adfea8fbc41caeaf6	2021-12-14 19:10:51 -08:00
Michael Suo	f565167fbd	Revert D32606547: torch/monitor: add C++ events and handlers Test Plan: revert-hammer Differential Revision: D32606547 (`e61fc1c03b`) Original commit changeset: a00d0364092d Original Phabricator Diff: D32606547 (`e61fc1c03b`) fbshipit-source-id: fbaf2cc06ad4bec606e8a9c6f591d65c04e6fa56	2021-12-11 22:51:03 -08:00
Tristan Rice	e61fc1c03b	torch/monitor: add C++ events and handlers (#68783 ) Summary: This adds a C++ event handler corresponding to the Python one mentioned in the RFC. This changes the counters a bit to all be push driven instead of being polled. The two window types are "fixed count" and "interval". One is based off the number of logged events and the other is based off of time windows. There's currently no active ticker for interval so it needs a regular stream of events to ensure events are produced. A follow up diff can add support for things like HHWheel / simple ticker. Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783 Test Plan: buck test //caffe2/test/cpp/monitor:monitor Reviewed By: kiukchung Differential Revision: D32606547 fbshipit-source-id: a00d0364092d7d8a98e0b18e503c0ca8ede2bead	2021-12-11 16:44:46 -08:00
Yanan Cao	17f3179d60	Back out "[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer" (#69796 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69796 (Note: this ignores all push blocking failures!) Test Plan: External CI + Sandcastle Reviewed By: zhxchen17 Differential Revision: D33032671 fbshipit-source-id: dbf6690e960e25d6a5f19043cbe792add2acd7ef	2021-12-10 21:29:53 -08:00
Hao Lu	91d16cb633	[Jit] Fix schema of aten::split int[] version (#69745 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69745 Missed in D31935573 (`6b44e75f6b`). Reviewed By: d1jang Differential Revision: D31889867 fbshipit-source-id: 417bd0b15db4891dbd641b35a803553f11d0d756	2021-12-10 02:33:36 -08:00
Nikita Shulga	3bb20ae49f	Make c10d tests -Werror clean (#69703 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69703 Test Plan: Imported from OSS Reviewed By: seemethere Differential Revision: D32997001 Pulled By: malfet fbshipit-source-id: 38b5f195c04f2b3b920e6883a96fe9a36345b9d2	2021-12-09 22:10:04 -08:00
Ivan Kobzarev	7dba88dfdb	[nnc][quant] Fix quantized concat (#69596 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69596 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D32941108 Pulled By: IvanKobzarev fbshipit-source-id: 727f608b98625648e2e444396d910838c95f58f2	2021-12-09 18:55:32 -08:00
Peter Bell	b2e79ed5ec	Remove WindowsTorchApiMacro.h in favor of Export.h (#69585 ) Summary: Follow up to https://github.com/pytorch/pytorch/issues/68095 This also changes the files from the ATen folder to include c10's `Export.h` instead since they can't ever be exporting `TORCH_PYTHON_API`. cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/69585 Reviewed By: mrshenli Differential Revision: D32958594 Pulled By: albanD fbshipit-source-id: 1ec7ef63764573fa2b486928955e3a1172150061	2021-12-09 17:30:09 -08:00
Han Qi	d3649309e6	[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#69306 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69306 Included functions: save_mobile_module -> saves a mobile::Module to flatbuffer load_mobile_module_from_file -> loads a flatbuffer into mobile::Module parse_mobile_module -> parses from bytes or deserialized flatbuffer Module object Test Plan: unittests Reviewed By: gmagogsfm Differential Revision: D32806835 fbshipit-source-id: 71913c6650e225634f878946bd16960d377a7f57	2021-12-09 14:53:31 -08:00
Richard Barnes	afb742382a	use irange for loops 10 (#69394 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69394 Modified loops in files under fbsource/fbcode/caffe2/ from the format ``` for(TYPE var=x0;var<x_max;x++) ``` to the format ``` for(const auto var: irange(xmax)) ``` This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand. Test Plan: Sandcastle Reviewed By: malfet Differential Revision: D32837991 fbshipit-source-id: fc7c4f76d2f32a17a0faf329294b3fe7cb81df32	2021-12-09 09:49:34 -08:00
Chen Lai	13faaff54c	[Operator Versioning][Edge] Implement register function for upgrader (#67730 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67730 This PR implements the register function for upgraders so that it can be used at the loading stage ghstack-source-id: 145170986 Test Plan: ``` buck test //caffe2/test/cpp/jit:jit ``` Reviewed By: iseeyuan Differential Revision: D32092518 fbshipit-source-id: 779b51eb12b8cb162a93a55c1e66fe0becc4cb36	2021-12-09 02:18:09 -08:00
Peter Bell	e279963eef	Remove remaining THC code (#69039 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69039 Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D32872476 Pulled By: ngimel fbshipit-source-id: 7972aacc24aef9450fb59b707ed6396c501bcb31	2021-12-08 12:18:08 -08:00
Bin Bao	e8f4c9cc40	[LT] Upstream LazyView and view ops IR Nodes (#69277 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69277 LazyView is the main class for tracking alias caused by view ops. The corresponding IR classes for view ops are hand-written now, and we can switch to code-gen them in future. For certain view ops, they have a reverse IR class to perform inplace update in the backward direction on a chain of alias ops. As part of the future work, we will simplify the logic for LazyView once the functionalization pass in core is ready to use. Test Plan: Imported from OSS Reviewed By: wconstab Differential Revision: D32820014 Pulled By: desertfire fbshipit-source-id: d9eb526cb23885f667e4815dc9dd291a7b7e4256	2021-12-04 08:44:54 -08:00
Ramanpreet Nara	f587267dc7	Revert D31705359: use irange for loops 8 Test Plan: revert-hammer Differential Revision: D31705359 (`17e5200441`) Original commit changeset: c9ea2fbc0f9c fbshipit-source-id: 08fff2d12beca953ad30dd0baabf86e39ac84f14	2021-12-02 12:55:08 -08:00
Richard Barnes	17e5200441	use irange for loops 8 (#66743 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66743 Modified loops in files under fbsource/fbcode/caffe2/ from the format `for(TYPE var=x0;var<x_max;x++)` to the format `for(const auto var: irange(xmax))` This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand. Test Plan: Sandcastle Reviewed By: malfet Differential Revision: D31705359 fbshipit-source-id: c9ea2fbc0f9cd29e97a52dcb203addc5f2abb09b	2021-12-02 10:21:29 -08:00
Alban Desmaison	00ebbd5ef6	Revert D32010095: [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer Test Plan: revert-hammer Differential Revision: D32010095 (`41d35dc201`) Original commit changeset: d763b0557780 fbshipit-source-id: bf746a0389135c9f5f67f00f449435ce08fb5f6d	2021-12-02 06:41:40 -08:00
Han Qi	41d35dc201	Add ability for a mobile::Module to save as flatbuffer (#67351 ) Summary: Included functions: * save_mobile_module -> saves a mobile::Module to flatbuffer * load_mobile_module_from_file -> loads a flatbuffer into mobile::Module * parse_mobile_module -> parses from bytes or deserialized flatbuffer Module object Pull Request resolved: https://github.com/pytorch/pytorch/pull/67351 Reviewed By: iseeyuan Differential Revision: D32010095 Pulled By: qihqi fbshipit-source-id: d763b0557780f7c2661b6485105b045e41a5e8f1	2021-12-01 23:58:15 -08:00
Jacob Szwejbka	291e56eda4	[Pytorch Edge] Update Black Box Api with operator versioning (#68678 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68678 Test Plan: I'll update the unit test before landing Reviewed By: cccclai Differential Revision: D32573603 fbshipit-source-id: 19271bcbb68b61d24d6943e61a943f4f75fddb5d	2021-12-01 19:13:32 -08:00
Chen Lai	b9738e923e	[Operator Versioning][Edge] Add old models and unittest (#67726 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67726 1. Check in one model with aten:div_tensor old op with unittest in both cpp and python. The following two lines are commented out and expected to work after using upgrader. ``` _helper(mobile_module_v2, div_tensor_0_3) _helper(current_mobile_module, torch.div) ``` 2. Update the commented code accordingly. Currently there are 6 upgraders. The following old models with operators are added to cover these 6 upgraders: ``` // Tensor x Tensor test_versioned_div_tensor_v3 // Tensor x Scalar test_versioned_div_scalar_float_v3 test_versioned_div_scalar_reciprocal_int_v3 test_versioned_div_scalar_inplace_float_v3 // Scalar x Scalar test_versioned_div_scalar_scalar_v3 // Tensor x Tensor with out kwarg test_versioned_div_tensor_out_v3 // Tensor x Tensor inplace test_versioned_div_tensor_inplace_v3 // Tensor x Scalar inplace test_versioned_div_scalar_inplace_int_v3 ``` Note: In this pr, per model, it includes the following test: 1. Model (with old op) load/run test will be in both cpp and python 2. Model (with old op) + upgrader test will be in python Other tests considered adding: 1. per upgrader bytecode test 2. app level integration test ghstack-source-id: 144422418 Test Plan: CI and the added unittest Reviewed By: iseeyuan Differential Revision: D32069653 fbshipit-source-id: 96d9567088a1f709bc7795f78beed7a308e71ca9	2021-12-01 18:46:30 -08:00
Jiewen Tan	e6c435bf96	[LTC] Upstream helpers for c10::Device <=> BackendDevice (#69064 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69064 This commit upstreams helpers for converting a c10::Device to BackendDevice and vice versa. Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.FromAten:BackendDeviceTest.ToAten Reviewed By: wconstab Differential Revision: D32732607 Pulled By: alanwaketan fbshipit-source-id: 0dd233d37a4a30fc4b22dba322ddd85d4cb3635b	2021-12-01 12:15:32 -08:00
Scott Wolchok	1d84d8c5d8	[PyTorch] Remove StringView from RecordFunction interface (1/2) (#68410 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68410 First step toward not heap-allocating a string in RecordFunction::before() every time ghstack-source-id: 144287654 Test Plan: CI Reviewed By: chaekit Differential Revision: D32453847 fbshipit-source-id: 080d95095fb568287b65fcc41a4ca6929b5f9a87	2021-11-30 13:20:08 -08:00
Joel Schlosser	8fef7c09f5	Remove finput from slow2d signatures (#68896 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68896 Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D32655874 Pulled By: jbschlosser fbshipit-source-id: 3c9acb106961c40af1432652179edb2bc5a4bfa5	2021-11-30 09:47:24 -08:00
Jiewen Tan	0cdeb586ae	[LTC] Upstream some utilities (#69046 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69046 This commit upstreams utilities including ExceptionCleanup, MaybeRef, Iota, ToVector, ToOptionalVector and GetEnumValue. Test Plan: ./build/bin/test_lazy --gtest_filter=UtilTest.* Reviewed By: wconstab, Chillee Differential Revision: D32709090 Pulled By: alanwaketan fbshipit-source-id: 5147433becd4dbb07be7d36d66b0b8685054d714	2021-11-30 02:44:02 -08:00
Mikhail Zolotukhin	75ce040620	[TensorExpr] Allow for 'keepdim' argument in aten::mean in NNC's external call. (#68756 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68756 That fixes some warnings in our tests. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D32600952 Pulled By: ZolotukhinM fbshipit-source-id: 548eaf3659e20795cce44d8f57e77f4a47d44d98	2021-11-30 00:06:34 -08:00
Vinnam Kim	7b701ce2d4	Add set_to_none option to C++ API (#68801 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/68167. Signed-off-by: Vinnam Kim <vinnam.kim@makinarocks.ai> Pull Request resolved: https://github.com/pytorch/pytorch/pull/68801 Reviewed By: mruberry Differential Revision: D32625239 Pulled By: jbschlosser fbshipit-source-id: 5f09b959e23d5448106a47029d06ec20ad094d82	2021-11-29 08:42:39 -08:00
Bin Bao	787ded5103	Add lazy::Shape::numel() (#68314 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68314 Add a convenience to lazy::Shape for counting the number of elements (by multiplying out the dimensions). This is a method on Tensor, and in switching other lazy tensor shape utils to use aten shape inference, we need numel counts. Test Plan: add unit tests Reviewed By: alanwaketan Differential Revision: D32409138 fbshipit-source-id: 3ae725300f8826d38e45412f46501d5e5f776fb2	2021-11-29 08:38:09 -08:00
Han Qi	959cb03132	Populate operator_input_sizes_ (#68542 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68542 title Test Plan: unittest Reviewed By: iseeyuan Differential Revision: D32508159 fbshipit-source-id: 0773a725973a493f19a2e9a340365e559dfdf7f8	2021-11-23 12:18:06 -08:00
Tristan Rice	758d7dea9c	torch.monitor - Initial C++ Stats (#68074 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68074 This is the first step of many PRs towards implementing the `torch.monitor` RFC https://github.com/pytorch/rfcs/pull/30 This defines the aggregation types, the `Stat` class and provides some simple collection of the stats. This doesn't match the RFC exactly as it incorporates some of the comments on the RFC as well as a few changes for performance. Changes: * added window_size to the stats. If specified it will always compute the stat using the `window_size` number of values. If there aren't enough values within that window it reports the previous stats. * This doesn't include the push metrics yet (will be coming). After more discussion it looks like the best way to handle this is to support a hybrid where the metric can set how frequently it'll be logged. For fixed window_size metrics it'll be logged each time it hits the window size. This will allow performant counters as well as lower frequency push counters (window_size=1). Performance considerations: * Updating the stats acquires a lock on that Stat object. This should be performant unless there are many threads writing to the same stat. A single thread will typically use a futex, so it should be quite fast. * Adding/removing/fetching all stats sets a global lock on the stat list -- this shouldn't be an issue since these events happen infrequently. * Fetching stats accesses one stat at a time instead of a global lock. This means the exported values are linearizable but not serializable across multiple stats but I don't expect this to be an issue. Next steps: 1. Add StatCollector interface for push style metrics 2. Add pybind interfaces to expose to Python 3. Add default metric providers 4. Integrate into Kineto trace view Test Plan: buck test //caffe2/test/cpp/monitor:monitor CI Reviewed By: kiukchung Differential Revision: D32266032 fbshipit-source-id: dab8747b4712f5dba5644387817a3a0fda18b66a	2021-11-18 21:46:23 -08:00
Hongyi Jia	146a7f68e2	Enable desync root cause analysis for NCCL (#68310 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68310 Enable desync root cause analysis by recording the last footprint of collective calls. On timeout, we parse the store trace and figure out the root cause of the desync issue. This feature is built on async error handling. Test Plan: Standalone test * Typical desync - P467288969 * Mismatched collectives - P467288916 * Mismatched broadcast size - P467288873 DDP benchmark * DDP benchmark desync - P467433483, P467520195 No perf regression: * w/o this diff https://www.internalfb.com/intern/fblearner/details/308379789?tab=Outputs * w/ this diff https://www.internalfb.com/intern/fblearner/details/308534088?tab=Outputs Reviewed By: mingzhe09088 Differential Revision: D32348647 fbshipit-source-id: 43e7e96e3fa2be0ac66c1325bceb639b461a8b3a	2021-11-17 20:29:03 -08:00
Han Qi	4eb772fde6	Refactor saving jit::Module to mobile .pt in 2 steps: (#66494 ) Summary: 1. Convert Function -> mobile::Function 2. Serialize mobile::Function This also opens the opportunity to create a mobile::Module without saving/reloading Pull Request resolved: https://github.com/pytorch/pytorch/pull/66494 Reviewed By: zhxchen17 Differential Revision: D32293022 Pulled By: qihqi fbshipit-source-id: 29b43d47ff86071d5e2f9d6ca4dba4445711ce3d	2021-11-17 12:02:20 -08:00
jjsjann123	0dc3f829d9	Nvfuser code bump 11 5 (#67943 ) Summary: nvfuser code update: 1. Tuning heuristics on schedulers for reduction/normalization kernels; 2. bfloat16 on IO tensor support; 3. Refactored memory format support, now we can support dimension collapsing with non-coherent input tensors with different memory format. e.g. channels last tensor input to batch normalization. Note that we are currently limiting memory format to only Contiguous and Channels last; 4. Refactored nvfuser graph partitioning in `graph_fuser.cpp`, separated node merge and profile node API. Updated `profiling_record.cpp`. Things that are reverted from our local branch: 1. changes on some entries in autodiff 2. aten::gelu with approximation 3. native_dropout(_backward) Pull Request resolved: https://github.com/pytorch/pytorch/pull/67943 Reviewed By: ngimel Differential Revision: D32288709 Pulled By: dzhulgakov fbshipit-source-id: fc9491182ea7e0158bc112c66f096823c588eaf1	2021-11-17 01:22:17 -08:00
Raghavan Raman	2fd468e5f8	[jit] Set the graph input types before interpreting the graph during tracing (#68242 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68242 Test Plan: Imported from OSS Reviewed By: saketh-are Differential Revision: D32382958 Pulled By: navahgar fbshipit-source-id: 4e82a604a9ea2046af2755de23944147e618a65f	2021-11-15 15:44:32 -08:00
Mike Iovine	c697eeba72	[JIT] Combine concat nodes where possible (#67000 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67000 See the [related issue](https://github.com/pytorch/pytorch/issues/66654) for context. This new JIT optimization transforms patterns like this: ``` %inputs.1 : Tensor[] = prim::ListConstruct(%a, %b, %c) %concat.1 : Tensor = aten::cat(%inputs, %dim) %inputs.2 : Tensor[] = prim::ListConstruct(%x, %concat.1, %y) %concat.2 : Tensor = aten::cat(%inputs.2, %dim) ``` into this: ``` %inputs.2 : Tensor[] = prim::ListConstruct(%x, %a, %b, %c, %y) %concat.2 : Tensor = aten::cat(%inputs.2, %dim) ``` (it can do this for chains of `aten::cat` longer than 2 as well) A few conditions have to hold: 1. The `dim`s have to match. 2. `inputs.1` and `inputs.2` cannot be mutated Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOpt` Reviewed By: d1jang Differential Revision: D31819491 fbshipit-source-id: 9f1a501d52099eb1a630b5dd906df4c38c3817ba	2021-11-15 12:02:45 -08:00
Mikhail Zolotukhin	e511a7a5b4	[TensorExpr] Remove non-determinism in iterating over unordered_set of intermediate buffers. (#68277 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68277 Differential Revision: D32400553 Test Plan: Imported from OSS Reviewed By: saketh-are, priyaramani Pulled By: ZolotukhinM fbshipit-source-id: a8fe820bbddaa19f95db432efaa6d3e36095a05e	2021-11-13 00:50:57 -08:00
Will Constable	6ddaf3bd37	[LT] Upstream TsNode, TsNodeLowering, TsLoweringContext (#68154 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68154 Test Plan: added a basic test; cover more by using lazy_tensor_staging tests Reviewed By: Krovatkin, alanwaketan Differential Revision: D32224303 fbshipit-source-id: ac3e1161229b8ae60fdb15ffa72e17072b595914	2021-11-12 12:57:20 -08:00
Will Constable	dc24503a89	Fix Hash(c10::Scalar), account for garbage data in union (#68201 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68201 Hash(c10::Scalar) made a bad assumption that it was valid to just hash over all the bytes of data of the c10::Scalar struct. Because c10::Scalar stores a union of different (float/int/complex) types with different sizes, not all bytes are valid in all cases. Hash() should only read the bytes corresponding to the currently active type. Test Plan: Added new unit tests. Verified HashTest.Scalar failed with the original Hash() impl and then fixed. Reviewed By: alanwaketan Differential Revision: D32367564 fbshipit-source-id: ac30dd4f6dd0513954986d3d23c0c11ba802c37b	2021-11-12 07:20:08 -08:00
Howard Huang	7b376bf844	Remove ProcessGroup from TensorPipeAgent initialization (#68128 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68128 Reland of D31762735 (`0cbfd466d2`). This diff was originally reverted due to failure in test_send_export_type_through_rpc_with_custom_pickler. I updated rpc_pickler_test.py to prevent a race condition where processes were not registering their pickler before handling their rpc_sync calls. Test Plan: rpc_pickler_test file: buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test //caffe2/torch/fb/training_toolkit/backend/metrics/collectors/fbdata_aggregator/tests:batch_collector_test -- --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx rpc_pickler stress test: buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test -- --exact 'caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test - test_send_export_type_through_rpc_with_custom_pickler (caffe2.torch.fb.training_toolkit.backend.metrics.tests.rpc_pickler_test.CythonTypeRpcSpawnTest)' --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx --jobs 18 --stress-runs 10 --record-results Reviewed By: mrshenli Differential Revision: D32316077 fbshipit-source-id: e58de2335fbaa3ab46d46fe222c659197633a5e4	2021-11-11 12:28:55 -08:00
Martin Yuan	bd5f33f91e	demo backend decoupled from operators (#66100 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66100 A backend should not directly depend on ATen operators. The demo backend is changed accordingly for testing purposes. Test Plan: Imported from OSS Reviewed By: pavithranrao Differential Revision: D31384614 Pulled By: iseeyuan fbshipit-source-id: c97f0c4aa12feb1d124f1d7a852e9955a7a2ce42	2021-11-11 10:26:17 -08:00
Will Constable	d6e6064efc	[LT] Upstream backend interfaces (#67927 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67927 BackendData - represents 'tensor data' in opaque backend storage LoweringContext - interface for performing backend-specific IR lowering BackendImplInterface - interface for lazy tensors backends to implement Reorgs backend-related files into lazy/backend subdir includes a few small fixes, which were made on lazy_tensor_staging but need to be back-ported to master. Test Plan: used by lazy_tensor_staging branch Reviewed By: desertfire Differential Revision: D32142032 fbshipit-source-id: 828c717bcd0d511876e64ad209b50f7bfb10cec5	2021-11-10 12:55:31 -08:00
Jiewen Tan	6011c35a79	[LTC] Upstream class BackendDevice (#68027 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68027 This commit upstreams class BackendDevice to master. BackendDevice is a backend-specific representation of the actual hardware, for instance CPU, GPU, or TPU. This concept is important for backends like XLA, which need to tell the actual hardware type from the c10::DeviceType::Lazy virtual device during both IR construction and lowering. Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.* Reviewed By: wconstab Differential Revision: D32261838 Pulled By: alanwaketan fbshipit-source-id: 579c3fc5f9da7847c887a383c6047e8ecb9cc5bc	2021-11-10 07:05:43 -08:00
Bin Bao	a027551358	[LT] Merge cache.h (#67929 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67929 1. Write a node-hash based unit test for Cache 2. Replace CHECK with TORCH_CHECK in IrUtil Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D32246134 Pulled By: desertfire fbshipit-source-id: c464bc300126d47e9ad4af3b3e8484a389757dc0	2021-11-09 12:02:02 -08:00
Bin Bao	a473417076	[LT] Merge permutation_util into master (#67766 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67766 Test Plan: `build/bin/test_lazy` Reviewed By: wconstab Differential Revision: D32147676 Pulled By: desertfire fbshipit-source-id: 528b48c9cf789abc171235091c7146b2ab7a9c76	2021-11-09 12:00:39 -08:00
Howard Huang	9fb3ba9d7b	Revert D31762735 (#67924 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67924 This diff reverts the changes made in D31762735 (`0cbfd466d2`) Test Plan: Wait for CI Reviewed By: derekmod-fb Differential Revision: D32214744 fbshipit-source-id: e0a65b6a31a88216ae1243549fcbc901ef812374	2021-11-06 17:34:13 -07:00
Chen Lai	ae501a9727	[PyTorch Edge] Update bytecode version compatibility check (#67417 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67417 bytecode version is valid when it's smaller than kMaxSupported and larger than kMinSupported ghstack-source-id: 142609392 Test Plan: ``` buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleFail' ``` Reviewed By: JacobSzwejbka, iseeyuan Differential Revision: D31984839 fbshipit-source-id: 2011e77455c931c0a8a58267494d44bcf167b877	2021-11-05 19:34:01 -07:00
Raghavan Raman	e7a3bbce89	[nnc] Add support for dynamic shapes in TensorExprKernel (#67861 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67861 Previously submitted as https://github.com/pytorch/pytorch/pull/67197. This got reverted because its failures were hidden by the failures of another PR. Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D32178196 Pulled By: navahgar fbshipit-source-id: cc8a5c68aed360d06289e69645461cfa773e1300	2021-11-05 11:18:19 -07:00
Jiewen Tan	8bed46ef38	[WIP][LTC] Upstream class Shape (#67672 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67672 This commit upstreams class Shape from the lazy_tensor_staging branch. Test Plan: WIP. Reviewed By: malfet Differential Revision: D32095478 Pulled By: alanwaketan fbshipit-source-id: 61611b12fc079b195833b5b22a6cf73c0935b8b9	2021-11-04 14:12:03 -07:00
Rohan Varma	90d311b268	[RPC] Add exception logging to constValue() (#67802 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67802 In RPC C++ code, we might sometimes call constValue() when the future actually has an exception, and in unittests we want to assert on the exception. What happens is that we get a message basically saying "!eptr_" which indicates there is some exception but we don't know what it is. This diff simply adds logging for the exception and mentions that `value` over `constValue` should be used when the future can have an exception. The contract of `constValue` to throw when `eptr_` is set is still held, it is just enhanced with additional logging. ghstack-source-id: 142375391 Test Plan: Added UT Reviewed By: mrshenli Differential Revision: D32156552 fbshipit-source-id: 4dd5e73b92173209074c104a4b75c2021e20de4b	2021-11-04 10:04:09 -07:00
Natalia Gimelshein	ca445645f9	Revert D31902471: [nnc] Add support for dynamic shapes in TensorExprKernel Test Plan: revert-hammer Differential Revision: D31902471 (`15a3c374e2`) Original commit changeset: d2729a38ba1a fbshipit-source-id: 4c05de82e626bbf744df84fd2b914b66fd165a19	2021-11-03 14:48:12 -07:00
Raghavan Raman	15a3c374e2	[nnc] Add support for dynamic shapes in TensorExprKernel (#67197 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67197 Test Plan: Imported from OSS Reviewed By: eellison, ZolotukhinM Differential Revision: D31902471 Pulled By: navahgar fbshipit-source-id: d2729a38ba1ac607ff07f516ed56fbd9085715dc	2021-11-03 11:24:17 -07:00
Raghavan Raman	383c1f51b1	[nnc] Fixed handling of 0-sized tensors in cat (#67734 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67734 The implementation of `aten::cat` op in NNC has to ignore tensors that have 0-size in any dimension. Test Plan: `buck test mode/dev-nosan //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - Kernel.CatWithEmptyInputs'` Reviewed By: ZolotukhinM Differential Revision: D32122171 fbshipit-source-id: 90c697813bc504664673cdc262df6e7ce419c655	2021-11-03 10:16:16 -07:00
Mikhail Zolotukhin	ff5c61a74e	[TensorExpr] Add lowering for aten::max (reduction). (#66519 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66519 Differential Revision: D31590853 D31590853 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: a702621621f681d7f5392912e8a77ca124e14170	2021-11-03 09:44:09 -07:00
Mikhail Zolotukhin	00afe9ba7b	[TensorExpr] Add lowering for aten::embedding. (#66518 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66518 Differential Revision: D31590855 D31590855 Test Plan: Imported from OSS Reviewed By: pbelevich Pulled By: ZolotukhinM fbshipit-source-id: aace0a87b1649330dae44182f7873aca27160d64	2021-11-03 09:44:07 -07:00
Mikhail Zolotukhin	008a58d226	[TensorExpr] Add lowering for aten::conv1d. (#66517 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66517 Differential Revision: D31590856 D31590856 Test Plan: Imported from OSS Reviewed By: pbelevich Pulled By: ZolotukhinM fbshipit-source-id: c05a37d8741acd0606c2adb8d6cfeb1f57bc8aa0	2021-11-03 09:44:05 -07:00
Mikhail Zolotukhin	d58ef2bbff	[TensorExpr] Fix lowering for aten::softmax for the case when dtype parameter is None. (#66516 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66516 Differential Revision: D31590858 D31590858 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: 0aeee7a5be64b3b9c8fa00aacb1a94031a7e25d1	2021-11-03 09:42:48 -07:00
Rohan Varma	885da61d7d	[PG NCCL] Disable NCCL health check (#67668 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67668 This adds an env var to enable NCCL health check, which when left unspecified, results in the check not being run. Unit tests that need to test this functionality have the env variable set. Please see internal diff for more details. Test Plan: CI Reviewed By: yuguo68, mrshenli Differential Revision: D32089763 fbshipit-source-id: dff5664a5e607f711515cd1042089ca769914fbb	2021-11-02 16:21:59 -07:00
Scott Wolchok	82f7f8d471	[PyTorch] Adopt IValue::toTupleRef() where obvious (#65505 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65505 Generated with `fastmod -m 'toTuple(\s)->' 'toTupleRef()${1}.'` , followed by `fastmod '(std::move$.)toTupleRef\($.' '${1}toTuple()->'` to unbreak 2 callsites. ghstack-source-id: 142065835 Test Plan: CI Reviewed By: gchanan Differential Revision: D31131025 fbshipit-source-id: 54457ae5bbeb38db9c7f196d469b98521c3d3f34	2021-11-02 10:22:18 -07:00
Howard Huang	0cbfd466d2	Remove ProcessGroup from TensorPipeAgent initialization (#66708 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66708 cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D31762735 Pulled By: H-Huang fbshipit-source-id: 9f3879fca6b8258f7e6171b14d2c1d6cce21627d	2021-11-01 14:15:27 -07:00
Max Ren	ba369ea053	check to ensure profiler_edge is only added when use_kineto is on (#67494 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67494 Reviewed By: jbschlosser Differential Revision: D32031142 Pulled By: mcr229 fbshipit-source-id: 8267f0e02c5bed0fbc4956af6935a551bedb27ef	2021-11-01 13:42:14 -07:00
Ivan Kobzarev	7fbcf79684	[tensorexpr][nnc] Support quantization (#66676 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66676 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D31676329 Pulled By: IvanKobzarev fbshipit-source-id: 288b41ff4ed603dfaacb465f296997f14bb23c22	2021-10-31 22:49:30 -07:00
Jacob Szwejbka	66202b7f8d	[Pytorch Edge] Expose runtime operators versioning (#67385 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67385 As part of the expanded operator versioning effort, we are going to start looking at this variable and what's stored locally in the model file. ghstack-source-id: 141782717 Test Plan: unit test Reviewed By: cccclai Differential Revision: D31976654 fbshipit-source-id: 255a23cff7c4f4039089de23b4da95772be48324	2021-10-29 13:42:59 -07:00
Elias Ellison	fc82ad186a	Add Initial NNC Dynamic Shapes Flow (#66136 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66136 FOR REVIEWERS: this is ready to review; test failures come from somewhere else in the stack. Takes in a TensorExprGraph of static shapes and generalizes the input shapes to symbolic dimensions. Dimensions of value 1 are preserved; otherwise, dimensions with the same value are bucketed to the same symbolic shape. E.g. `Tensor(5, 3), Tensor(3, 1) -> Tensor(SS(-1), SS(-2)), Tensor(SS(-2), 1)` From there, runs symbolic shape inference on the graph and creates a versioning `if` in the graph, with prim::TensorExprDynamicGuard checking whether the inputs at runtime match the generalized symbolic shapes that are inputs to the TE Kernel. The computation of all symbolic dimensions is inlined into the if block with the TE Kernel. All symbolic-dimension Value*s are appended to the end of the TE Kernel Graph/Node inputs, and the Node is augmented with an integer list attr `symbolic_shape_inputs` that gives the mapping from Value* to symbolic shape int64_t value. For lengthier IR examples and a walkthrough, see ShapeAnalysisTest.DynamicShapesFusion in `test_shape_analysis`. Returns True on success and False on failure; it can fail if shape propagation fails to propagate the number of dims or if complete shapes are not set on the inputs.
Example transformation ``` graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu), %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu), %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)): %3 : Tensor = prim::TensorExprGroup_0(%x_inp, %y_inp, %z_inp) return () with prim::TensorExprGroup_0 = graph(%x.1 : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu), %y.1 : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu), %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)): %3 : int = prim::Constant[value=0]() %4 : Tensor = aten::tanh(%x.1) %5 : Tensor = aten::erf(%4) %6 : Tensor = aten::relu(%y.1) %7 : Tensor[] = prim::ListConstruct(%5, %6) %8 : Tensor = aten::cat(%7, %3) %9 : Tensor = aten::hardswish(%8) %10 : Tensor = aten::mul(%9, %z) return (%9) ``` -> ``` graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu), %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu), %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)): %4 : bool = prim::TensorExprDynamicGuard[types=[Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)]](%x_inp, %y_inp, %z_inp) %5 : Tensor = prim::If(%4) block0(): %15 : int[] = aten::size(%x_inp) %16 : int[] = aten::size(%y_inp) %17 : int = prim::Constant[value=1]() %18 : int = prim::Constant[value=0]() %elem.3 : int = aten::__getitem__(%15, %18) # <string>:40:10 %elem.5 : int = aten::__getitem__(%15, %17) # <string>:40:10 %elem.11 : int = aten::__getitem__(%16, %18) # <string>:40:10 %cat_dim_size.48 : int = aten::add(%elem.3, %elem.11) # <string>:321:29 %3 : Tensor = prim::TensorExprGroup_0[symbolic_shape_inputs=[-5, -4, -3, -2]](%x_inp, %y_inp, %z_inp, %cat_dim_size.48, %elem.11, %elem.5, %elem.3) -> (%3) block1(): %14 : Tensor = prim::FallbackGraph_1(%x_inp, %y_inp, %z_inp) -> (%14) return () with 
prim::TensorExprGroup_0 = graph(%x.1 : Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), %y.1 : Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu), %SS_5 : int, %SS_4 : int, %SS_3 : int, %SS_2 : int): %3 : int = prim::Constant[value=0]() %4 : Tensor(SS(-2), SS(-3)) = aten::tanh(%x.1) %5 : Tensor(SS(-2), SS(-3)) = aten::erf(%4) %6 : Tensor(SS(-4), SS(-3)) = aten::relu(%y.1) %7 : Tensor[] = prim::ListConstruct(%5, %6) %8 : Tensor(SS(-5), SS(-3)) = aten::cat(%7, %3) %9 : Tensor(SS(-5), SS(-3)) = aten::hardswish(%8) %10 : Tensor(SS(-5), SS(-3)) = aten::mul(%9, %z) return (%9) ``` Test Plan: Imported from OSS Reviewed By: navahgar, anjali411 Differential Revision: D31797466 Pulled By: eellison fbshipit-source-id: b508d2f5baef6e8e4020955ab1d4bc4b9c7bdfdd	2021-10-28 17:09:03 -07:00
Zhengxu Chen	0795735351	[jit] Clean up unneeded virtual methods from Function interface. (#65968 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65968 tryToGraphFunction() should cover all cases and is more composable than ad hoc virtual methods. ghstack-source-id: 141759214 Test Plan: no behavior change. Reviewed By: gmagogsfm Differential Revision: D31326154 fbshipit-source-id: 692a35df424f7d4f777a96489c4cbb24b3ae7807	2021-10-28 12:28:48 -07:00
Bin Bao	2366948085	[LT] Add ir_util for ComputePostOrder (#67282 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67282 Test Plan: `build/bin/test_lazy` Reviewed By: wconstab, ngimel Differential Revision: D31961754 Pulled By: desertfire fbshipit-source-id: 28466588ece8057640a7202b8c79cc1a4357d373	2021-10-28 08:17:52 -07:00
Zhengxu Chen	b55a2500d2	[jit] Remove graph() call from abstract Function interface. (#65967 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65967 Graph is an implementation detail. If a user wants access to the underlying graph, they should explicitly dynamic cast instead. ghstack-source-id: 141659819 Test Plan: no behavior change. Reviewed By: gmagogsfm Differential Revision: D31326153 fbshipit-source-id: a0e984f57c6013494b92a7095bf5bb660035eb84	2021-10-27 11:54:26 -07:00
Pavithran Ramachandran	1ce500f56f	[easy][PyTorch] Use `at::native::is_nonzero` (#67195 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67195 Now that `is_nonzero` is part of `at::native` (see https://github.com/pytorch/pytorch/pull/66663), replace `TensorCompare::is_nonzero` with `at::native::is_nonzero`. ghstack-source-id: 141514416 Test Plan: CI Reviewed By: larryliu0820 Differential Revision: D31704041 fbshipit-source-id: 36813e5411d0aa2eb2d0442e2a195bbed417b33d	2021-10-26 12:40:32 -07:00
Michael Shi	ad5731cacc	[PyTorch] Add flop count for bmm and baddbmm (#66636 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66636 Add FLOP count for bmm and baddbmm, which is `2bmnk`. Reviewed By: ngimel Differential Revision: D31622061 fbshipit-source-id: f3e1e1e34c45228693117b81647fb4a623c4085b	2021-10-25 17:31:12 -07:00
Zhengxu Chen	12daa4f663	[jit][edge] Enable CALL instruction in lite interpreter. (#65964 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65964 ghstack-source-id: 141425519 Test Plan: buck run xplat/caffe2:test_lite_interpreter Reviewed By: cccclai Differential Revision: D31326149 fbshipit-source-id: 8a599d92f3fa4e6c125100adb36d89592e71e547	2021-10-25 14:44:33 -07:00
Nikolay Korovaiko	a7ebf76a15	jit trace (#59949 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59949 Reviewed By: ZolotukhinM Differential Revision: D31366787 Pulled By: Krovatkin fbshipit-source-id: 798cbcd97e8ecfba984f98cd70214954be9309af	2021-10-24 18:04:22 -07:00
Chen Lai	5f58764d1d	[PyTorch Edge][type] Add type support for NamedTuple custom class (import) (#63130 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63130 Extend `type_parser` to handle `NamedTuple` type. It can be extended to handle other types when needed. The custom type uses the following format: ``` "qualified_name[ NamedTuple, [ [field_name_1, field_type_1], [field_name_2, field_type_2] ] ]" ``` For example: ``` "__torch__.base_models.sparse_nn.pytorch_preproc_types.PreprocOutputType[ NamedTuple, [ [float_features, Tensor], [id_list_features, List[Tensor]], [label, Tensor], [weight, Tensor], ] ]" ``` For nested types, the order of type lists from type table should be: ``` std::string type_1 = "__torch__.C [ NamedTuple, [ [field_name_c_1, Tensor], [field_name_c_2, Tuple[Tensor, Tensor]], ] ]"; std::string type_2 = "__torch__.B [ NamedTuple, [ [field_name_b, __torch__.C ] ] ]"; std::string type_3 = "__torch__.A[ NamedTuple, [ [field_name_a, __torch__.B] ] ]"; std::vector<std::string> type_strs = {type_1, type_2, type_3}; std::vector<TypePtr> type_ptrs = c10::parseType(type_strs); ``` namedtuple from both `collections` and `typing` is supported: ``` from typing import NamedTuple from collections import namedtuple ``` This change only adds the parser, so the new runtime can now read the above format.
ghstack-source-id: 141293658 Test Plan: ``` buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.CompatiblePrimitiveType' buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.CompatibleCustomType' buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.InCompatiblePrimitiveType' buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.InCompatibleCustomType' ``` Reviewed By: iseeyuan Differential Revision: D30261547 fbshipit-source-id: 68a9974338464e320b39a5c613dc048f6c5adeb5	2021-10-22 00:40:57 -07:00
David Berard	e86d8323cb	[JIT] Add special cases for batch_norm, instance_norm in alias_analysis (#66554 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66554 In native_functions.yaml, the schemas for batch_norm and instance_norm are incorrect: the inputs `running_mean` and `running_var` are mutated, but are not marked as such in the function schema. Since `(a!)?` annotations are currently not working (see #65760), this instead adds a special case to `alias_analysis.cpp`. If the value of `training` or `use_input_stats` is known to be `false`, then `alias_analysis` will mark the input as _not_ being written to. Test Plan: Removed the `skip` annotation on the following test, and added a special exception in `check_alias_annotations`: ``` python test/test_ops.py -k test_variant_consistency_jit_nn_functional_batch_norm ``` Also: ``` ./build/bin/test_jit --gtest_filter="BatchAndInstanceNormFixture" ``` Imported from OSS Reviewed By: eellison Differential Revision: D31612339 fbshipit-source-id: 12ca61b782b9e41e06883ba080a276209dc435bb	2021-10-20 10:22:10 -07:00
Michael Suo	1bf0e1acb4	Revert D31732414: Add Initial NNC Dynamic Shapes Flow Test Plan: revert-hammer Differential Revision: D31732414 (`de4fe7a38c`) Original commit changeset: 290a94a667c2 fbshipit-source-id: 3021a1d7a8661967e37d4f9cfc86ed47cc4a7f3d	2021-10-19 20:05:29 -07:00
Elias Ellison	de4fe7a38c	Add Initial NNC Dynamic Shapes Flow (#66136 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66136 FOR REVIEWERS: this is ready to review; test failures come from somewhere else in the stack. Takes in a TensorExprGraph of static shapes and generalizes the input shapes to symbolic dimensions. Dimensions of value 1 are preserved; otherwise, dimensions with the same value are bucketed to the same symbolic shape. E.g. `Tensor(5, 3), Tensor(3, 1) -> Tensor(SS(-1), SS(-2)), Tensor(SS(-2), 1)` From there, runs symbolic shape inference on the graph and creates a versioning `if` in the graph, with prim::TensorExprDynamicGuard checking whether the inputs at runtime match the generalized symbolic shapes that are inputs to the TE Kernel. The computation of all symbolic dimensions is inlined into the if block with the TE Kernel. All symbolic-dimension Value*s are appended to the end of the TE Kernel Graph/Node inputs, and the Node is augmented with an integer list attr `symbolic_shape_inputs` that gives the mapping from Value* to symbolic shape int64_t value. For lengthier IR examples and a walkthrough, see ShapeAnalysisTest.DynamicShapesFusion in `test_shape_analysis`. Returns True on success and False on failure; it can fail if shape propagation fails to propagate the number of dims or if complete shapes are not set on the inputs.
Example transformation ``` graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu), %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu), %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)): %3 : Tensor = prim::TensorExprGroup_0(%x_inp, %y_inp, %z_inp) return () with prim::TensorExprGroup_0 = graph(%x.1 : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu), %y.1 : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu), %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)): %3 : int = prim::Constant[value=0]() %4 : Tensor = aten::tanh(%x.1) %5 : Tensor = aten::erf(%4) %6 : Tensor = aten::relu(%y.1) %7 : Tensor[] = prim::ListConstruct(%5, %6) %8 : Tensor = aten::cat(%7, %3) %9 : Tensor = aten::hardswish(%8) %10 : Tensor = aten::mul(%9, %z) return (%9) ``` -> ``` graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu), %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu), %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)): %4 : bool = prim::TensorExprDynamicGuard[types=[Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)]](%x_inp, %y_inp, %z_inp) %5 : Tensor = prim::If(%4) block0(): %15 : int[] = aten::size(%x_inp) %16 : int[] = aten::size(%y_inp) %17 : int = prim::Constant[value=1]() %18 : int = prim::Constant[value=0]() %elem.3 : int = aten::__getitem__(%15, %18) # <string>:40:10 %elem.5 : int = aten::__getitem__(%15, %17) # <string>:40:10 %elem.11 : int = aten::__getitem__(%16, %18) # <string>:40:10 %cat_dim_size.48 : int = aten::add(%elem.3, %elem.11) # <string>:321:29 %3 : Tensor = prim::TensorExprGroup_0[symbolic_shape_inputs=[-5, -4, -3, -2]](%x_inp, %y_inp, %z_inp, %cat_dim_size.48, %elem.11, %elem.5, %elem.3) -> (%3) block1(): %14 : Tensor = prim::FallbackGraph_1(%x_inp, %y_inp, %z_inp) -> (%14) return () with 
prim::TensorExprGroup_0 = graph(%x.1 : Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), %y.1 : Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu), %SS_5 : int, %SS_4 : int, %SS_3 : int, %SS_2 : int): %3 : int = prim::Constant[value=0]() %4 : Tensor(SS(-2), SS(-3)) = aten::tanh(%x.1) %5 : Tensor(SS(-2), SS(-3)) = aten::erf(%4) %6 : Tensor(SS(-4), SS(-3)) = aten::relu(%y.1) %7 : Tensor[] = prim::ListConstruct(%5, %6) %8 : Tensor(SS(-5), SS(-3)) = aten::cat(%7, %3) %9 : Tensor(SS(-5), SS(-3)) = aten::hardswish(%8) %10 : Tensor(SS(-5), SS(-3)) = aten::mul(%9, %z) return (%9) ``` Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D31732414 Pulled By: eellison fbshipit-source-id: 290a94a667c20467717202a43c60e4f9ca4c00e2	2021-10-19 16:41:49 -07:00
gmagogsfm	147f7559b1	Add `SourceView` which doesn't own source text as base class of `Source` (#65309 ) Summary: This saves the cost of copying text from stack to heap in some cases (like parsing function schema during the loading phase of libtorch.so). Pull Request resolved: https://github.com/pytorch/pytorch/pull/65309 Reviewed By: swolchok Differential Revision: D31060315 Pulled By: gmagogsfm fbshipit-source-id: 0caf7a688b40df52bb4388c5191d1a42351d6f1a	2021-10-18 23:17:22 -07:00
Richard Barnes	e0643fa3fc	use irange for loops 5 (#66744 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66744 Modified loops in files under fbsource/fbcode/caffe2/ from the format `for(TYPE var=x0;var<x_max;var++)` to the format `for(const auto var: irange(x_max))`. This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand. Test Plan: Sandcastle Reviewed By: ngimel Differential Revision: D31705358 fbshipit-source-id: d6ea350cbaa8f452fc78f238160e5374be637a48	2021-10-18 21:59:50 -07:00
Will Constable	d05c1ec007	Add lazy Node base and associated infra (#66601 ) Summary: - Adds Node base class and unit tests - Also adds metadata utils to enable source code annotation and scope tracking Pull Request resolved: https://github.com/pytorch/pytorch/pull/66601 Test Plan: Add new unit tests Reviewed By: desertfire Differential Revision: D31634044 fbshipit-source-id: a042d54f06fbc480acfc63c18d43cb6fceb6fea5	2021-10-18 19:09:42 -07:00
Ivan Yashchuk	0d203a16fe	Add relative and absolute tolerances for matrix_rank, pinv (#63102 ) Summary: This pull request introduces new keyword arguments for `torch.linalg.matrix_rank` and `torch.linalg.pinv`: `atol` and `rtol`. Currently, only tensor overload has default values for either `atol` or `rtol`, the float overload requires both arguments to be specified. FC compatibility: https://github.com/pytorch/pytorch/pull/63102#discussion_r710930509 Fixes https://github.com/pytorch/pytorch/issues/54151. Fixes https://github.com/pytorch/pytorch/issues/66618. cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/63102 Reviewed By: H-Huang Differential Revision: D31641456 Pulled By: mruberry fbshipit-source-id: 4c765508ab1657730703e42975fc8c0d0a60eb7c	2021-10-17 22:15:42 -07:00
Xue Li	2f099c7555	Revert D30652629: use irange for loops Test Plan: revert-hammer Differential Revision: D30652629 (`687c2267d4`) Original commit changeset: 0ae6c4bbbb55 fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3	2021-10-15 15:23:10 -07:00
Richard Barnes	687c2267d4	use irange for loops (#66234 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234 Modified loops in files under fbsource/fbcode/caffe2/ from the format `for(TYPE var=x0;var<x_max;var++)` to the format `for(const auto var: irange(x_max))`. This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand. bypass_size_limit allow-large-files Test Plan: Sandcastle Reviewed By: ngimel Differential Revision: D30652629 fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e	2021-10-15 13:50:33 -07:00
Scott Wolchok	e88d1c4f10	[PyTorch] Add tuple inline storage (#64066 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64066 I noticed a bunch of time being spent heap-allocating Tuples in the unpickler. 1-, 2-, and 3-element Tuples are apparently common enough that they get their own bytecode instructions, so I decided to try also giving them their own representation. We store up to 3 IValues inline in `Tuple` rather than doing a second heap allocation for a `std::vector<IValue>`. ghstack-source-id: 140695395 Test Plan: Added automated tests for TupleElements. Pixel 3 before: https://www.internalfb.com/intern/aibench/details/761596366576284 Pixel 3 after: https://www.internalfb.com/intern/aibench/details/591414145082422 We went from 347 ms to 302 ms. Reviewed By: dhruvbird Differential Revision: D30592622 fbshipit-source-id: 93625c54c9dca5f765ef6d5c191944179cb281a8	2021-10-15 12:16:51 -07:00
Rohan Varma	06fa6c15c0	Back out "Revert D31299350: Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor"" (#66393 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66393 Third try! Fixes: - test_nccl_timeout can be flaky because of 1s timeout, bump up the timeout to resolve the flakiness. But in general we should not have been relying on time.sleep for this test, filed https://github.com/pytorch/pytorch/issues/66354 to track that. - ciflow/all did not actually run tests due to a bug causing multigpu tests to not be run. This has since been fixed. ghstack-source-id: 140560113 Test Plan: CI Reviewed By: mrshenli Differential Revision: D31534735 fbshipit-source-id: 8b7e0f4fed3972b7a77cbcda28876c9eefb0c7e2	2021-10-14 22:23:22 -07:00
soulitzer	93d326c868	Add InplaceOrView boxed kernel (#63878 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63878 See https://github.com/pytorch/pytorch/issues/64407, https://github.com/pytorch/pytorch/issues/62032 for context: In this PR: - Add boxed kernel by replicating `gen_inplace_or_view`'s logic that is ONLY for use with the Autograd not-implemented kernel - Unlike `gen_inplace_or_view` we always pass a view_func to as_view in order to ensure that a "derivative is not implemented" error is raised even if an in-place update is performed on the view. Without the `view_func`, the CopySlice + AsStridedBackward nodes would replace the NotImplemented node. - This limitation makes it impossible to use this node for general use - view relationship must be between first input (must be tensor) and first output (may be tensor or vec of tensor) - do not support non-differentiable views (_values, _indices, view.dtype) - view relationship is always fw and bw differentiable - Adds the macro `#define REGISTER_AUTOGRAD_NOT_IMPLEMENTED_FALLBACK(ns, op)` to be the interface for this feature: - static initialization can be slowed down (not measured) if there are many registrations, because each line translates to 2 library calls, but the workaround is just to manually use the two functions `AutogradNotImplementedFallback` and `ADInplaceOrViewFallback` and call `m.impl`.
- Adds testing: - for views: view relationship created - performing in-place operation on the view, raises properly - trying to create two view relationships is not allowed, - single view relationship but not first input/first output should error - view relation created properly for tensor vector output - for in-place: - version count bump - triggers rebase_history - multiple mutations is okay and also updates version counter - TODO (follow up): Update tutorials for adding third-party operators (and document the above limitations) - TODO (follow up): Look at torch-audio/torch-vision and identify places where this can simplify existing code EDIT: Made it more clear what is introduced in this PR and moved some more contextual stuff into the issue itself Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D30901714 Pulled By: soulitzer fbshipit-source-id: 48de14c28be023ff4bd31b7ea5e7cba88aeee04c	2021-10-12 18:55:50 -07:00
Kimish Patel	c6216b2a43	Back out "Revert D30710710: [Pytorch Edge] Support profiling kineto events from external source" (#66421 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66421 Original commit changeset: ab6bb8fe4e83 Plus this includes BUILD.bazel changes, the reason for the revert. Test Plan: See original diff Reviewed By: gdankel Differential Revision: D31542513 fbshipit-source-id: ee30aca2d6705638f97e04b77a9ae31fe5cc4ebb	2021-10-12 10:55:29 -07:00
Animesh Jain	cc24e4e5d0	[NNC] Normalize loops in SplitWithTail (#66242 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66242 While working on random test generation, I observed that many simple transformations were upsetting vectorization. Digging deeper, I found that vectorization calls SplitWithTail, which incorrectly splits the loop when the loop start is not zero. This patch normalizes the loop before we start splitting it. Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D31506853 Pulled By: anijain2305 fbshipit-source-id: 5c5f2568ce0a239bfaa515458be52541eafd23b1	2021-10-11 13:44:05 -07:00
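The effect of normalizing before splitting can be illustrated with a small sketch. This is a hypothetical Python model of the idea, not NNC's C++ `SplitWithTail`; the function name `split_with_tail` and its signature are assumptions made for illustration only.

```python
def split_with_tail(start, stop, factor):
    """Return the indices visited when `for i in range(start, stop)` is
    normalized to start at zero and then split into a main loop nest
    (outer x inner, with `factor` inner iterations) plus a tail loop.
    """
    # Normalization: iterate over [0, extent) and add `start` back in the body.
    extent = max(stop - start, 0)
    main_trips = extent // factor

    visited = []
    # Main loop nest: covers the first main_trips * factor iterations.
    for outer in range(main_trips):
        for inner in range(factor):
            visited.append(start + outer * factor + inner)
    # Tail loop: covers the remaining extent % factor iterations.
    for tail in range(main_trips * factor, extent):
        visited.append(start + tail)
    return visited


# A loop that does not start at zero is still visited exactly once, in order.
print(split_with_tail(3, 13, 4))
```

Because the trip counts are computed from the extent rather than from `stop`, a non-zero loop start does not change how many main and tail iterations are produced.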
Nikita Shulga	c373387709	Update CMake and use native CUDA language support (#62445 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445 PyTorch currently uses the old style of compiling CUDA in CMake which is just a bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as a language just like C++ or C. Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D31503350 fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55	2021-10-11 09:05:48 -07:00
Jane Xu	0a48f56318	Revert D31299350: Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor" Test Plan: revert-hammer Differential Revision: D31299350 (`f1f3bd8c36`) Original commit changeset: 9ad5c8fa17f7 fbshipit-source-id: d63d889922f507a4a0e2e042e451b95b9591c317	2021-10-08 17:55:28 -07:00
Jane Xu	c62ed96496	Revert D30710710: [Pytorch Edge] Support profiling kineto events from external source Test Plan: revert-hammer Differential Revision: D30710710 (`c1343ff706`) Original commit changeset: 51399f9b0b64 fbshipit-source-id: ab6bb8fe4e83ed1052e621e427259192a4f0f540	2021-10-08 17:46:18 -07:00
Rohan Varma	f1f3bd8c36	Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor" (#65883 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65883 Original commit changeset: d8e962b8aab6 ghstack-source-id: 139836954 Test Plan: ci Reviewed By: zhaojuanmao Differential Revision: D31299350 fbshipit-source-id: 9ad5c8fa17f7038ba579cb1eda6d9271ac07a130	2021-10-08 16:04:20 -07:00
Kimish Patel	c1343ff706	[Pytorch Edge] Support profiling kineto events from external source (#64397 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64397 This diff exposes a way to add events to kineto profiler from external source. This can be a backend that executes a subgraph and wants to record this execution in kineto profiler. This diff also adds "backend" metadata to identify the backend an event would have executed on. Test Plan: test_lite_interpreter Imported from OSS Reviewed By: raziel Differential Revision: D30710710 fbshipit-source-id: 51399f9b0b647bc2d0076074ad4ea9286d0ef3e2	2021-10-08 15:59:42 -07:00
Raghavan Raman	92ce188510	Revert D31445799: [nnc] Use given kernel function name while emitting code Test Plan: revert-hammer Differential Revision: D31445799 (`c30dc52739`) Original commit changeset: 8d1642098313 fbshipit-source-id: 6b9d8c816437e9fcba8eb19cc683bc0a46a04cf5	2021-10-08 12:39:01 -07:00
Raghavan Raman	2e6fa0261f	Revert D31445797: [nnc] Added a cache to use singleton instances of PytorchLLVMJIT for every triple,cpu,attrs combination Test Plan: revert-hammer Differential Revision: D31445797 (`7e5ef5e517`) Original commit changeset: 4e1450100928 fbshipit-source-id: fc13b34dbb66c7a22816eb46cf6d98ae9f332d39	2021-10-08 12:38:59 -07:00
Scott Wolchok	2d885ab73d	[jit] Reduce refcounting of Types (#65345 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65345 FooType::get() can return a const reference. Inconveniently, converting shared_ptr<FooType> to shared_ptr<Type> requires a copy & refcount bump, so to properly take advantage of this in unshapedType() we need to take a const Type& in isSubtypeOf(), which is good practice anyway -- don't require a shared_ptr if you don't need to take ownership. ghstack-source-id: 140044165 Test Plan: CI perf says c10::unshapedType time decreased from 2.8% to 2.2% during static runtime startup, though I expect this to be generally beneficial. Reviewed By: hlu1 Differential Revision: D31027361 fbshipit-source-id: 676feb81db9f74ad7b8651d8774f4ecb4cfa6ab8	2021-10-08 09:03:04 -07:00
Raghavan Raman	7e5ef5e517	[nnc] Added a cache to use singleton instances of PytorchLLVMJIT for every triple,cpu,attrs combination (#66217 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66217 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D31445797 Pulled By: navahgar fbshipit-source-id: 4e1450100928132ccce4ef3c6c20ad6661cfabed	2021-10-07 13:17:11 -07:00
Raghavan Raman	c30dc52739	[nnc] Use given kernel function name while emitting code (#66216 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66216 Test Plan: Imported from OSS Reviewed By: dagitses, priyaramani Differential Revision: D31445799 Pulled By: navahgar fbshipit-source-id: 8d164209831339d364710b14f6a263a16e108281	2021-10-07 13:15:46 -07:00
Will Constable	a8c0b362ce	[pytorch][PR] Add hash and int128 utils for Lazy Tensor Core" (#66181 ) Summary: These utils are prerequisites for Lazy Node base class. - set up new torch/csrc/lazy, test/cpp/lazy dirs - add source files to build_variables.bzl in new lazy_core_sources var - create new test_lazy binary Fixes https://github.com/pytorch/pytorch/issues/65636 Pull Request resolved: https://github.com/pytorch/pytorch/pull/66181 Original commit changeset: 3d0d5377d71e Test Plan: Run PyTorch XLA corresponding PR in XLA CI: https://github.com/pytorch/xla/pull/3148/files Reviewed By: suo Differential Revision: D31416438 fbshipit-source-id: 58a6a49c5bc30134bc6bae2e42778f359b9a8f40	2021-10-07 10:05:26 -07:00
Chen Lai	a5895f85be	[PyTorch Edge][type] Add type check in compatibility api (#63129 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63129 1. Add an api to get `supported_types` from runtime, expose in c++ only. 2. Add an api to get `contained_types` from model, expose in both c++ and Python. 3. Add a field `contained_types_` in `type_parser.cpp` to track the contained types when parsing python string. 4. Expand `is_compatible` api to check type. When checking type, it will check the contained type list from the model with the support type list from runtime. 5. Expand the unittest for compatibility to cover type 6. Add unit test in python to check type list ghstack-source-id: 139826944 Test Plan: ``` buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.GetContainTypes' buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleSuccess' buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleFail' buck test //caffe2/test:mobile ``` Reviewed By: iseeyuan Differential Revision: D30231419 fbshipit-source-id: 8427f423ec28cc5de56411f15fd960d8595d6947	2021-10-06 02:23:44 -07:00
Michael Suo	f062def486	Revert D31260343: [pytorch][PR] Add hash and int128 utils for Lazy Tensor Core Test Plan: revert-hammer Differential Revision: D31260343 (`e94fea08d0`) Original commit changeset: 8bb1194188e3 fbshipit-source-id: 3d0d5377d71ed928015bcb2105801be368e38cd8	2021-10-05 17:15:50 -07:00
Will Constable	e94fea08d0	Add hash and int128 utils for Lazy Tensor Core (#65635 ) Summary: These utils are prerequisites for Lazy Node base class. - set up new torch/csrc/lazy, test/cpp/lazy dirs - add source files to build_variables.bzl in new lazy_core_sources var - create new test_lazy binary Fixes https://github.com/pytorch/pytorch/issues/65636 Pull Request resolved: https://github.com/pytorch/pytorch/pull/65635 Reviewed By: alanwaketan Differential Revision: D31260343 Pulled By: wconstab fbshipit-source-id: 8bb1194188e3e77fc42e08a14ba37faed37a9c2e	2021-10-05 16:43:55 -07:00
Nikita Shulga	4c4525fa5c	Compile without -Wno-unused-variable (take 2) (#66041 ) Summary: Delete `-Wno-unused-variable` from top level `CMakeLists.txt` Still suppress those warnings for tests and `torch_python` Delete number of unused variables from caffe2 code Use `(void)var;` to suppress unused variable in range loops Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants Do not delete `caffe2::OperatorBase::Output` calls as they have side effects Pull Request resolved: https://github.com/pytorch/pytorch/pull/66041 Reviewed By: ngimel Differential Revision: D31360142 Pulled By: malfet fbshipit-source-id: 6fdfb9f91efdc49ca984a2f2a17ee377d28210c8	2021-10-04 20:39:39 -07:00
Don Jang	7941590a51	[JIT] Selectively enable precise alias analysis for TupleConstruct (#66025 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66025 This change adds an option to selectively enable precise alias analysis for `prim::TupleConstruct` (introduced by D30437737 (`cd458fe092`)) to minimize its exposure only to `StaticRuntime` as of now. Test Plan: Modified existing unit tests whose behavior depends on D30437737 (`cd458fe092`). Reviewed By: eellison Differential Revision: D31350285 fbshipit-source-id: 3ce777f07f99650d74634481ad0805192dce55c6	2021-10-01 20:42:22 -07:00
Nikita Shulga	e4ee5ca698	Revert D31326599: [pytorch][PR] Compile without -Wno-unused-variable Test Plan: revert-hammer Differential Revision: D31326599 (`a6280ab653`) Original commit changeset: 924155f1257a fbshipit-source-id: b8ee5bc0298637443232f5ee9ec79e51ed256faf	2021-10-01 20:40:47 -07:00
Nikita Shulga	a6280ab653	Compile without -Wno-unused-variable (#65954 ) Summary: Delete `-Wno-unused-variable` from top level `CMakeLists.txt` Still suppress those warnings for tests and `torch_python` Delete number of unused variables from caffe2 code Use `(void)var;` to suppress unused variable in range loops Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants Pull Request resolved: https://github.com/pytorch/pytorch/pull/65954 Reviewed By: ngimel Differential Revision: D31326599 Pulled By: malfet fbshipit-source-id: 924155f1257a2ba1896c50512f615e45ca1f61f3	2021-10-01 17:40:47 -07:00
Hariom Narang	2828ce53fd	Added jit log stream changing function and some refactor (#65768 ) Summary: Description: - Have only added `stdout` and `stderr` as possible options from python API for now. We can do file path passing later maybe. - Put the class `JitLoggingConfig` in the cpp file as none of its methods were being used outside of this file. Python API: `torch._C._jit_set_logging_stream('stdout\|stderr')` C++ API: `::torch::jit::set_jit_logging_output_stream(ostream);` Testing: - Tested python API locally. - Unit test for the C++ API is written Fixes https://github.com/pytorch/pytorch/issues/54182 Pull Request resolved: https://github.com/pytorch/pytorch/pull/65768 Reviewed By: mrshenli Differential Revision: D31291739 Pulled By: ZolotukhinM fbshipit-source-id: eee72edc20488efad78a01c5b0ed8a132886a08d	2021-09-30 23:25:11 -07:00
Michael Suo	33c03cb61a	[deploy][1/n] Make deploy code conform to PyTorch style. (#65861 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65861 First in a series. This PR changes the code in deploy.h/cpp and interpreter_impl.h/cpp to be camel case instead of snake case. Starting with this as it has the most impact on downstream users. Test Plan: Imported from OSS Reviewed By: shannonzhu Differential Revision: D31291183 Pulled By: suo fbshipit-source-id: ba6f74042947c9a08fb9cb3ad7276d8dbb5b2934	2021-09-30 22:59:47 -07:00
Mikhail Zolotukhin	3a0165da49	[TensorExpr] Port NNC lowerings to the new registry mechanism. (#65551 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65551 Previously we had a big switch on Op kind to decide how to lower a given JIT operator to NNC. This PR changes this switch to a hash table lookup. Why? This helps us with at least two things: 1) With this approach we can easily check if we know how to handle a given node in advance - i.e. we can inspect the entire graph and tell whether it's possible to compile it or not without actually trying to do that and dying in the middle. This would allow us to, say, provide user-friendly error messages in AOT workflow. 2) We can switch to use schema instead of op kind to determine correct lowering. Unlike op schema, op kind might be ambiguous (see e.g. #64963) and using it instead of schema can lead to bugs. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D31148926 Pulled By: ZolotukhinM fbshipit-source-id: ac12684e2126c899426ef5e4cc1e3f70fa01f704	2021-09-30 22:56:18 -07:00
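The switch-to-registry idea described in the commit above can be sketched as follows. This is a hypothetical Python illustration, not the NNC registry API: `LOWERINGS`, `register_lowering`, `unsupported_nodes`, and the tuple-based "lowered" form are all assumptions for the sake of the example.

```python
# Hypothetical registry: maps an operator schema string to a lowering function.
LOWERINGS = {}


def register_lowering(schema):
    """Decorator that records a lowering function under its schema string."""
    def decorator(fn):
        LOWERINGS[schema] = fn
        return fn
    return decorator


@register_lowering("aten::relu(Tensor self) -> Tensor")
def lower_relu(inputs):
    # Stand-in for building an expression; returns a simple tuple.
    return ("max", inputs[0], 0)


def unsupported_nodes(graph_schemas):
    """Inspect the whole graph up front and list schemas with no lowering."""
    return [s for s in graph_schemas if s not in LOWERINGS]


def lower(schema, inputs):
    """Lower one node via a hash table lookup instead of a switch on op kind."""
    return LOWERINGS[schema](inputs)
```

With a table keyed by schema, checking whether a graph can be compiled is a membership test per node, so an unsupported operator can be reported before any lowering work starts.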
Don Jang	cd458fe092	[JIT] Make output of prim::TupleConstruct alias only with its inputs (#64879 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64879 This change makes the output of `prim::TupleConstruct` alias only with its inputs when the created tuple is directly returned from the graph. The same treatment could be made to any tuples newly constructed by `prim::TupleConstruct` if they do not let their elements escape. However, this change focuses only on the simplest case, which also turns out to be very common: tuples constructed only to be returned from a graph. Test Plan: Added - `AliasMoveForTupleConstructWithSingleUseAsGraphOutput` - `WildcardAliasForTupleConstructWithUses` to cover the newly added code. Reviewed By: eellison Differential Revision: D30437737 fbshipit-source-id: 417fbc6bc348062e60e7acdddd340d4754d090eb	2021-09-29 21:56:31 -07:00
Mike Ruberry	91f8755b0e	Revert D31005792: [NCCL] Init dummy NCCL comms in constructor Test Plan: revert-hammer Differential Revision: D31005792 (`2b22a5dde2`) Original commit changeset: c2c582dee25a fbshipit-source-id: d8e962b8aab6fda8a6c013e8577492dff9568c27	2021-09-29 20:46:38 -07:00
Rohan Varma	2b22a5dde2	[NCCL] Init dummy NCCL comms in constructor (#65173 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65173 Initializes dummy NCCL communicators in constructor for a basic health check that communicators can be initialized prior to launching the first collective. After successful init, we immediately use `ncclCommAbort` to destroy these communicators to ensure they don't interfere with regular communicator creation during collectives. Test Plan: CI Reviewed By: pritamdamania87 Differential Revision: D31005792 fbshipit-source-id: c2c582dee25a098361ead6ef03f541e7833c606b	2021-09-29 15:36:54 -07:00
Scott Wolchok	ece25c453f	[PyTorch] Store Argument::alias_info_ on the heap (#64824 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64824 See comment in function_schema.h for explanation. I claim that this is a good tradeoff because the aliasing information seems to be used only in compiler-ish code paths, where performance isn't as critical as actual execution. If performance is important there too, perhaps we should hoist isWrite into the Argument itself since there are several paths that only care about isWrite. ghstack-source-id: 138958896 Test Plan: CI, profile schema parsing on startup and see much fewer page faults in createArgumentVector. Reviewed By: suo Differential Revision: D30860719 fbshipit-source-id: 1d4d2328f2b8e34f5ddf9d82083fd4dd7b7f738f	2021-09-24 17:00:51 -07:00
Peter Bell	68e5935498	Remove fgrad_input from slow_conv2d (#64280 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64280 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D30830887 Pulled By: jbschlosser fbshipit-source-id: 5a3a79ad9d9118177672eabf872f9d9a3313ebe4	2021-09-24 14:27:39 -07:00
XiaobingSuper	1682722152	keep output type after calling SubgraphRewriter (#65453 ) Summary: The jit SubgraphRewriter doesn't keep the output type after overwriting the old graph. For example, in profiling mode the old graph has the old operator's shapes, but after replacing the old operator with a newer operator by applying SubgraphRewriter, the tensor shape info is eliminated. The motivation is that I want to replace the PyTorch convolution with a customer's convolution. I first register aten::_convolution as a profiler node that can record the input and output shapes, and then use a graph rewrite to replace it with aten::conv2d, whose tensor shape info is eliminated. I would like to use the input size for some preprocessing before replacing aten::conv2d with the customer's convolution. Before rewrite: ``` graph(%self.1 : __torch__.MyModule, %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)): %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0 %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0 %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0 %4 : NoneType = prim::Constant() %3 : int[] = prim::Constant[value=[1, 1]]() %2 : int[] = prim::Constant[value=[0, 0]]() %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1) %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0 %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv) %x : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::_convolution(%x.1, 
%weight, %4, %3, %2, %3, %6, %2, %7, %6, %6, %5, %5), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0 %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%x, %z, %7) # jit_test.py:24:0 return (%16) ``` After rewrite using aten::conv2d: ``` graph(%self.1 : __torch__.MyModule, %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)): %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0 %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0 %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0 %4 : NoneType = prim::Constant() %3 : int[] = prim::Constant[value=[1, 1]]() %2 : int[] = prim::Constant[value=[0, 0]]() %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1) %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0 %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv) %18 : Tensor = aten::conv2d(%x.1, %weight, %4, %3, %2, %3, %7) %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%18, %z, %7) # jit_test.py:24:0 return (%16) ``` Expected result after replacing aten::_convolution with aten::conv2d: ``` graph(%self.1 : __torch__.MyModule, %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)): %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0 
%6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0 %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0 %4 : NoneType = prim::Constant() %3 : int[] = prim::Constant[value=[1, 1]]() %2 : int[] = prim::Constant[value=[0, 0]]() %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1) %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0 %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv) %18 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::conv2d(%x.1, %weight, %4, %3, %2, %3, %7) %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%18, %z, %7) # jit_test.py:24:0 return (%16) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/65453 Reviewed By: zdevito Differential Revision: D31162489 Pulled By: ZolotukhinM fbshipit-source-id: 0d1c1d607cb612df47c64f173d9f4c9e8b1d6c49	2021-09-24 11:07:40 -07:00
kshitij12345	a012216b96	[nn] Fold : no batch dim (#64909 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/64907 Reference: https://github.com/pytorch/pytorch/issues/60585 Pull Request resolved: https://github.com/pytorch/pytorch/pull/64909 Reviewed By: cpuhrsch, heitorschueroff Differential Revision: D30991087 Pulled By: jbschlosser fbshipit-source-id: 91a37e0b1d51472935ff2308719dfaca931513f3	2021-09-23 08:37:32 -07:00
jiej	127c9402d0	Revert "Revert D30752939: [pytorch][PR] nvfuser update" (#65137 ) Summary: This reverts commit `03389dc851`. Attempt again for PR: https://github.com/pytorch/pytorch/issues/63745 Fixes the windows build failure. Pull Request resolved: https://github.com/pytorch/pytorch/pull/65137 Reviewed By: seemethere, dzhulgakov, heitorschueroff Differential Revision: D30994556 Pulled By: malfet fbshipit-source-id: f1925b6c5cc1a1a441a96499667c91e8dfc1b53d	2021-09-22 04:54:51 -07:00
Chen Lai	880098a7e3	[PyTorch Edge] Backport function for defaults args with out args, flag on (#63651 ) Summary: 1. Enable support for operators with default args and out args. For `torch.add(x, h, out=x)`, the number of specified arguments will be 3 instead of 4. 2. Bump bytecode version from 6 to 7 3. Implement backport_v7_to_v6 function. Also slightly refactor the local_thread to allow re-emit operators. 4. unittest to cover backport function 5. Update expect result from 4 to 3 in unit test DefaultArgsWithOutArg to cover the number of specified arguments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/63651 ghstack-source-id: 138539912 Test Plan: ``` caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions ``` Reviewed By: raziel, tugsbayasgalan Differential Revision: D30454080 fbshipit-source-id: 357c50b96682430675142d20d688d1f64e1de307	2021-09-20 22:50:30 -07:00
Mengwei Liu	eaf85fad62	[PyTorch] Extract parseOperator() into a standalone source file (#65179 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65179 This is following up this PR: https://github.com/pytorch/pytorch/pull/61862. The purpose is to modularize operator parsing so that it can be used as needed without pulling the whole `import.cpp` into build. Test Plan: Added a unit test in `test_lite_predictor.cpp` called `ParseOperators`, similar to `ParseBytecode`. Reviewed By: iseeyuan Differential Revision: D31006555 fbshipit-source-id: c38e221800af4cf72963a353c452c5437f56a0ac	2021-09-17 13:31:59 -07:00
Jane Xu	1ee66a5278	Remove CUDA 9.2 references conditionals and workarounds (#65070 ) Summary: Title says it all Pull Request resolved: https://github.com/pytorch/pytorch/pull/65070 Reviewed By: malfet Differential Revision: D30966464 Pulled By: janeyx99 fbshipit-source-id: e454906fd5d7d321d390939ba5d237e1d9b150f8	2021-09-17 12:28:23 -07:00
Raghavan Raman	bbe25af0df	[nnc] Updated inlining to handle cases when producer indices are constants after eval (#65044 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65044 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D30954655 Pulled By: navahgar fbshipit-source-id: dfaedb5af710b2625ceec3a443a6c4e34158ab16	2021-09-17 11:28:48 -07:00
Raghavan Raman	03fc636d5c	[nnc] Updated inliner to remove assertions and exception (#64719 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64719 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D30828583 Pulled By: navahgar fbshipit-source-id: 9826a59085a210e44d101a843ff2cae440dfd633	2021-09-17 11:28:46 -07:00
Edward Yang	9601deb1b3	Disable autograd fallback tests on Windows (#65147 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65147 I think they trigger an MSVC bug per https://github.com/pytorch/pytorch/issues/48763 ghstack-source-id: 138247203 Test Plan: breakpointed https://www.internalfb.com/intern/sandcastle/job/9007199738584981/ and ssh'ed into the host and ran `buck build arvr/mode/win/opt //xplat/caffe2:autograd_libtorch_test_ovrsource` in `/cygdrive/d/ovrsource-null-hg` Reviewed By: soulitzer Differential Revision: D30992685 fbshipit-source-id: 06c6fb2c18d55490f89fc91ee5b7a4c5a7faf1c6	2021-09-17 08:32:43 -07:00
Eli Uriegas	03389dc851	Revert D30752939: [pytorch][PR] nvfuser update Test Plan: revert-hammer Differential Revision: D30752939 (`cfaecaf40b`) Original commit changeset: ce122e80f01b fbshipit-source-id: 57685df8f9946032a06eff1de8a3d1498500d2d2	2021-09-15 17:38:47 -07:00
Mikhail Zolotukhin	7e9c599784	[TensorExpr] Add a method for sanitizing Var and Buf names in Stmt. (#65010 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65010 This pass ensures all names are legal and not-duplicated. Fixes #52727. Test Plan: Imported from OSS Reviewed By: bertmaher, navahgar Differential Revision: D30939717 Pulled By: ZolotukhinM fbshipit-source-id: 7dbe7f937de41f22ad49137a5e067d698443ed63	2021-09-15 17:15:06 -07:00
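A pass that makes all names legal and non-duplicated could look roughly like the sketch below. This is a hypothetical Python illustration of the idea only; the function `sanitize_names`, the `v` prefix, and the numeric-suffix scheme are assumptions and do not describe the actual NNC implementation.

```python
import re


def sanitize_names(names):
    """Return a list of names that are valid identifiers and pairwise distinct.

    Illegal characters become underscores, names that are empty or start with
    a digit get a `v` prefix, and duplicates get a numeric suffix.
    """
    used = set()
    result = []
    for name in names:
        clean = re.sub(r"[^0-9A-Za-z_]", "_", name)
        if not clean or clean[0].isdigit():
            clean = "v" + clean
        candidate = clean
        suffix = 1
        # Keep trying suffixes until the candidate is unused.
        while candidate in used:
            candidate = f"{clean}_{suffix}"
            suffix += 1
        used.add(candidate)
        result.append(candidate)
    return result


print(sanitize_names(["x.1", "x_1", "1y", ""]))
```

Note that sanitizing `x.1` yields `x_1`, which then collides with an existing `x_1`; handling both legality and uniqueness in one pass avoids introducing new duplicates while fixing illegal characters.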
jiej	cfaecaf40b	nvfuser update (#63745 ) Summary: Syncing nvfuser code base from devel branch. A few of our developments since the last sync: - Extends support to normalization and reduction kernels. - Multiple kernel launch for single `CudaFusionGroup`. Hierarchical caching system has been updated to cache graph segmentation. - profile_ivalue is enabled to convert dynamic scalar into compile time constants, which are required by the codegen. (e.g. reduction axes). To keep this PR simple and relatively review-free, we stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle. Internal updates are files located in: 1. updates in nvfuser codegen `torch/csrc/jit/codegen/cuda` 2. added nvfuser specific benchmarks `benchmarks/cpp/nvfuser` 3. nvfuser jit cpp tests `test/cpp/jit/test_gpu.cpp` `test/cpp/jit/test_gpu_shift.cpp` `test/cpp/jit/test_gpu_validator.h` Updates affecting integration: 1. profile_ivalue enabled for nvfuser. related changes are in `torch/csrc/jit/runtime/`, 2. exposed a few more symbols `aten/src/ATen/core/` used by codegen Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745 Reviewed By: saketh-are Differential Revision: D30752939 Pulled By: malfet fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c	2021-09-15 14:42:55 -07:00
Mikhail Zolotukhin	f23f21dafe	[TensorExpr] Remove 'Placeholder' class. (#64887 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64887 BufHandle has exactly the same functionality and should be used instead. Differential Revision: D30889483 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: 365fe8e396731b88920535a3de96bd3301aaa3f3	2021-09-14 00:22:44 -07:00
Martin Yuan	30a7c768d7	[RFC] Modularize functions of parsing bytecode (#61862 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61862 Modularize functions of parsing bytecode tables so that they can be used as needed in situations other than mobile lite interpreter. * The decoupled functions are re-used by current lite interpreter loader. * The bytecode can be serialized/deserialized from other formats. * The decoupled functions have minimum dependencies on other PyTorch components. Next: Build a driver binary to include the parser and interpreter, but only has necessary dependency on other PyTorch components. ghstack-source-id: 137867287 Test Plan: As an example, a simple bytecode is parsed to a mobile function, and directly run in the added unit test, `RunTimeTest:ParseBytecode`. It contains basic control flow (if, else) and basic data orchestration (list construction). CI Reviewed By: larryliu0820 Differential Revision: D29798382 Pulled By: iseeyuan fbshipit-source-id: 1c173a5f5d37097e3a97baec3f3e48e1eea1400f	2021-09-11 22:24:05 -07:00
Mikhail Zolotukhin	180e4fbfae	[TensorExpr] LLVMCodegen: fix lowering for UInt->Float casts. (#64862 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64862 Previously we erroneously were looking at dst signedness. This was discovered when we tried to implement quantize/dequantize ops. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30881696 Pulled By: ZolotukhinM fbshipit-source-id: 34af842e5e52a3b6b5d2e70c4ef32f910a20341f	2021-09-11 09:24:36 -07:00
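Why the source type's signedness matters for an integer-to-float cast can be shown with a small sketch. This is a hypothetical Python model, not the LLVMCodegen code; the helper `cast_int_bits_to_float` is an assumption for illustration. In LLVM terms the choice corresponds to using an unsigned conversion (`uitofp`) versus a signed one (`sitofp`).

```python
def cast_int_bits_to_float(bits, width, src_is_signed):
    """Convert a `width`-bit integer bit pattern to float.

    The interpretation of the bits depends on the *source* type's signedness;
    the destination (float) has no bearing on how the bits are read.
    """
    mask = (1 << width) - 1
    value = bits & mask
    # For a signed source, a set top bit means a negative two's-complement value.
    if src_is_signed and value >= 1 << (width - 1):
        value -= 1 << width
    return float(value)


# The same 8-bit pattern yields different floats depending on the source type.
print(cast_int_bits_to_float(0xFF, 8, src_is_signed=False))
print(cast_int_bits_to_float(0xFF, 8, src_is_signed=True))
```

Looking at the destination's signedness instead would pick the wrong interpretation for unsigned sources whose top bit is set.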
Hui Guo	4481c87ac4	[tensorexpr] Simplify x/100 -> 0 if x is a non-negative integer less than 100. (#64763 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64763 Simplification pattern: x/N -> 0; N is a constant positive integer and x is a for-loop index whose range is a subset of [0, N). Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D30845854 Pulled By: huiguoo fbshipit-source-id: 814d69ed4be05e57405c222183cc1c6c526721cd	2021-09-10 20:33:02 -07:00
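The simplification pattern in the commit above can be sketched in a few lines. This is a hypothetical Python illustration, not NNC's C++ simplifier; the function `simplify_div` and the half-open `(start, stop)` range representation are assumptions for the example.

```python
def simplify_div(index_range, divisor):
    """Return 0 if `x // divisor` is provably zero for every x in the range.

    `index_range` is a half-open (start, stop) pair, mirroring a for-loop
    index. Returns None when the pattern does not apply and the expression
    must be left unchanged.
    """
    start, stop = index_range
    if divisor <= 0:
        return None  # the pattern requires a positive constant divisor
    if start >= stop:
        return None  # empty range: leave the expression alone
    if start >= 0 and stop <= divisor:
        return 0     # every x satisfies 0 <= x < divisor
    return None


print(simplify_div((0, 100), 100))
print(simplify_div((0, 101), 100))
```

The check is purely on the loop bounds: the rewrite fires only when the index range is a subset of `[0, N)`.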
Raghavan Raman	cad7a4b0ea	[nnc] Added an implementation of sign op (#64033 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64033 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D30579197 Pulled By: navahgar fbshipit-source-id: f9f7fa7f2ffa109cf4e441eb1af821b8b891d4d3	2021-09-10 16:49:04 -07:00
Mikhail Zolotukhin	a17d6c7f80	[TensorExpr] Simplify TE IR before applying any transformations. (#64717 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64717 This also exposed several bugs, which are fixed in this PR. Differential Revision: D30826408 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: a67ec5739aceed9ffdf0d24f77eb3787cefe4560	2021-09-09 18:50:51 -07:00
Raghavan Raman	b7c86365d1	[nnc] Handled cast in index expression during inlining (#64716 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64716 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D30826388 Pulled By: navahgar fbshipit-source-id: 7e446602f650527e0d954e437f0370602019e040	2021-09-09 08:30:52 -07:00
Raghavan Raman	652a8bf7d0	[nnc] Updated indices during broadcast to use int64_t (#64627 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64627 This fixes the root cause of S242719 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D30801686 Pulled By: navahgar fbshipit-source-id: b6d3ebdc7eb57116eaced53c2f35c7798bb17e80	2021-09-09 08:29:37 -07:00
Elias Ellison	3bf93d769c	[JIT] Add gradient check in constants (#64613 ) Summary: fixes internal issue Pull Request resolved: https://github.com/pytorch/pytorch/pull/64613 Reviewed By: Gamrix Differential Revision: D30799016 Pulled By: eellison fbshipit-source-id: 48ef52d1cac627919e6cd232216d24878a2a8b58	2021-09-09 08:13:57 -07:00
Hui Guo	5c27a580ec	[tensorexpr] Allocate intermediate buffers at compile time (#64227 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64227 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D30652220 Pulled By: huiguoo fbshipit-source-id: cd75005cdfa42751318de7174b44e14a3a01634e	2021-09-08 15:34:44 -07:00
Peter Bell	d701357d92	Factor out TensorBase that doesn't depend on native operators (#63612 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63612 This makes Tensor inherit from a new class TensorBase, that provides a subset of Tensor that doesn't directly depend on native_functions.yaml. Code that only includes TensorBase.h will thus not need to be rebuilt every time someone changes an operator signature. Making `Tensor` inherit from this class means that `const TensorBase&` parameters will be callable with an ordinary `Tensor`. I've also made `Tensor` constructible and assignable from `TensorBase` to minimize friction in code mixing the two types. To help enforce that `Tensor.h` and `Functions.h` aren't accidentally included, I've added an error into `Operators.h` if `TORCH_ASSERT_NO_OPERATORS` is defined. We can either set this in the build system for certain folders, or just define it at the top of any file. I've also included an example of manually special-casing the commonly used `contiguous` operator. The inline function's slow path defers to `TensorBase::__dispatch_contiguous` which is defined in `Tensor.cpp`. I've made it so `OptionalTensorRef` is constructible from `TensorBase`, so I can materialize a `Tensor` for use in dispatch without actually increasing its refcount. Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D30728580 Pulled By: ezyang fbshipit-source-id: 2cbc8eee08043382ee6904ea8e743b1286921c03	2021-09-08 13:28:54 -07:00
Mikhail Zolotukhin	72274e2a2f	[TensorExpr] Don't rely on exceptions in Vectorizer. (#64609 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64609 We've been using exceptions to indicate whether vectorization succeeded or not, but that posed some problems (e.g. we spent too much time symbolizing these exceptions). This change converts this mechanism to a standard error return code. Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D30795342 Pulled By: ZolotukhinM fbshipit-source-id: 16e38b37bcdd78ceb438ac814cc377f35b058e17	2021-09-08 00:25:34 -07:00
Maksim Levental	81fe2c5e49	add out variant of linear (#61801 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61801 resubmitting because the last one was unrecoverable due to making changes incorrectly in the stack Test Plan: Imported from OSS Reviewed By: desertfire Differential Revision: D29812510 Pulled By: makslevental fbshipit-source-id: ba9685dc81b6699724104d5ff3211db5852370a6	2021-09-07 19:58:52 -07:00
Ansley Ussery	6831d8e379	Support Union in TorchScript (#64234 ) Summary: This PR is created to replace https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. Reason for needing a replacement is due to a messy Sandcastle issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234 Reviewed By: gmagogsfm Differential Revision: D30656444 Pulled By: ansley fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a	2021-09-03 06:12:24 -07:00
Chen Lai	8d5b95019d	[PyTorch Edge] Support default args with out arg, flag off (#63540 ) Summary: 1. Allow consuming operators with default arguments and out arguments. The flag is off to keep the same behavior as v6; PR 63651 turns the flag on. 2. Add two unit tests to cover this type of operator. Pull Request resolved: https://github.com/pytorch/pytorch/pull/63540 ghstack-source-id: 137211562 Test Plan: ``` caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg ``` Reviewed By: raziel, iseeyuan, tugsbayasgalan Differential Revision: D30414156 fbshipit-source-id: 0f3a219a22aee10ac53184cbd95940726c459d1f	2021-09-02 01:36:16 -07:00
Kimish Patel	468001600c	Back out "Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling." (#64307 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64307 Original commit changeset: 0b2aa7c57d08 Restores original changes. This diff changes the way operator profiling is done in the lite predictor benchmarking binary. Instead of using custom callbacks, it uses KinetoEdgeCPUProfiler to profile events and then generates operator-level metrics from them. Since KinetoEvents do not contain CPU clock time, we now report only wallclock time. This unifies the various profiling efforts that we have for benchmarking purposes. In production we will still use the observer-based mechanism, but the advantage of using the kineto profiler is that we get a few other things for free, such as: - chrome trace generation. - operator level memory profiling (to be added) - flop counts (to be added) Furthermore, we could possibly use a Python post-processing script to parse the chrome trace and generate output similar to torch.profiler. (To be done) This diff also removes some tests from test_lite_interpreter.cpp which were testing module hierarchy in debug info. They should be covered by test_mobile_profiler.cpp. Test Plan: aibench run Model without debug info: https://www.internalfb.com/intern/aibench/details/219598441154763 Model with debug info and `--print_module_info true` (see Operator summary has now module hierarchy information). https://www.internalfb.com/intern/aibench/details/617154236292985 Reviewed By: raziel Differential Revision: D30680354 fbshipit-source-id: b6ba0d59c510c13d13d9935b1d8051cc82ffa4e9	2021-09-01 13:29:35 -07:00
Mikhail Zolotukhin	8337a3fb3f	[TensorExpr] Wrap error messages with buildErrorMessage call. (#64330 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64330 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D30687226 Pulled By: ZolotukhinM fbshipit-source-id: ade1be2ad6847c6afbba60307ef854696821b4e3	2021-08-31 20:31:16 -07:00
Kimish Patel	67cb131458	Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling. Test Plan: revert-hammer Differential Revision: D30327514 (`bc9277dca3`) Original commit changeset: 3bb2f2daaaed fbshipit-source-id: 0b2aa7c57d08de77c9aaa75e546a7d0938610f64	2021-08-31 08:30:36 -07:00
Kimish Patel	bc9277dca3	[Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling. (#63367 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63367 This diff changes the way operator profiling is done in the lite predictor benchmarking binary. Instead of using custom callbacks, it uses KinetoEdgeCPUProfiler to profile events and then generates operator-level metrics from them. Since KinetoEvents do not contain CPU clock time, we now report only wallclock time. This unifies the various profiling efforts that we have for benchmarking purposes. In production we will still use the observer-based mechanism, but the advantage of using the kineto profiler is that we get a few other things for free, such as: - chrome trace generation. - operator level memory profiling (to be added) - flop counts (to be added) Furthermore, we could possibly use a Python post-processing script to parse the chrome trace and generate output similar to torch.profiler. (To be done) Test Plan: aibench run Model without debug info: https://www.internalfb.com/intern/aibench/details/219598441154763 Model with debug info and `--print_module_info true` (see Operator summary has now module hierarchy information). https://www.internalfb.com/intern/aibench/details/617154236292985 Reviewed By: raziel Differential Revision: D30327514 fbshipit-source-id: 3bb2f2daaaedfb04bd6f5d9c91292783f9c4344f	2021-08-30 20:54:51 -07:00
Will Constable	85df73658c	Make name() part of IMethod interface (#63995 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63995 JIT methods already have name() in their interface, and Py methods have names in their implementation. I'm adding this for a particular case where someone tried to use name() on a JIT method that we're replacing with an IMethod. Test Plan: add case to imethod API test Reviewed By: suo Differential Revision: D30559401 fbshipit-source-id: 76236721f5cd9a9d9d488ddba12bfdd01d679a2c	2021-08-30 13:31:55 -07:00
Zhengxu Chen	ac99d63f83	[jit] Make operation call accept Stack& instead Stack* (#63414 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414 Misuse of raw pointer in here where stack is never nullable. ghstack-source-id: 136938318 Test Plan: compiles. Imported from OSS Reviewed By: ejguan Differential Revision: D30375410 fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee	2021-08-30 11:49:20 -07:00
Thomas J. Fan	d3bcba5f85	ENH Adds label_smoothing to cross entropy loss (#63122 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/7455 Partially resolves pytorch/vision#4281 Pull Request resolved: https://github.com/pytorch/pytorch/pull/63122 Reviewed By: iramazanli Differential Revision: D30586076 Pulled By: jbschlosser fbshipit-source-id: 06afc3aa1f8b9edb07fe9ed68c58968ad1926924	2021-08-29 23:33:04 -07:00
Bert Maher	2e6221a232	[nnc] Make 64-bit dimensions work (#64077 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64077 We were assuming kernel dimensions fit in 32 bits (the old fuser made this assumption too), but we should be able to support 64. ghstack-source-id: 136933272 Test Plan: unit tests; new IR level test with huge sizes Reviewed By: ZolotukhinM Differential Revision: D30596689 fbshipit-source-id: 23b7e393a2ebaecb0c391a6b1f0c4b05a98bcc94	2021-08-28 19:59:47 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	d0c63e857d	Enhancement for smart serialization for out schemas (#63096 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63096 Test Plan: Imported from OSS Reviewed By: gmagogsfm Differential Revision: D30415255 Pulled By: tugsbayasgalan fbshipit-source-id: eb40440a3b46258394d035479f5fc4a4baa12bcc	2021-08-28 11:46:27 -07:00
Mikhail Zolotukhin	2d75ab0c8f	[TensorExpr] Update tutorial. (#64109 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64109 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D30614050 Pulled By: ZolotukhinM fbshipit-source-id: e8f9bd9ef2483e6eafbc0bd5394d311cd694c7b2	2021-08-27 16:19:29 -07:00
soulitzer	90a6498a12	Add autograd not implemented boxed fallback (#63458 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63458 See description and discussion from https://github.com/pytorch/pytorch/pull/62450 Test Plan: Imported from OSS Reviewed By: heitorschueroff Differential Revision: D30518572 Pulled By: soulitzer fbshipit-source-id: 3b1504d49abb84560ae17077f0dec335749c9882	2021-08-27 15:00:28 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	19c1b45f25	Detect out argument in the schema (#62755 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62755 After this change, out argument can be checked by calling is_out() Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D30415256 Pulled By: tugsbayasgalan fbshipit-source-id: b2e1fa46bab7c813aaede1f44149081ef2df566d	2021-08-27 11:20:33 -07:00
Jiewen Tan	ed573a8e08	Enable test_api IMethodTest in OSS (#63345 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63345 This diff did the following few things to enable the tests: 1. Exposed IMethod as TORCH_API. 2. Linked torch_deploy to test_api if USE_DEPLOY == 1. 3. Generated torch::deploy examples when building torch_deploy library. Test Plan: ./build/bin/test_api --gtest_filter=IMethodTest.* Reviewed By: ngimel Differential Revision: D30346257 Pulled By: alanwaketan fbshipit-source-id: 932ae7d45790dfb6e00c51893933a054a0fad86d	2021-08-26 16:50:52 -07:00
Cheng Chang	0f6b524665	[NNC] Add C++ codegen backend to NNC (#62869 ) Summary: Adds a C++ codegen backend to NNC to generate C++ for CPU instead of generating LLVM IR. Tensors are represented as blobs of float. Vector operations are devectorized/unrolled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62869 Test Plan: https://github.com/pytorch/pytorch/tree/mvz-nnc-aot-prototype makes it able to AOT compile the whole MobileNetV3 model into binary code through LLVM codegen in NNC. I forked that branch to https://github.com/cheng-chang/pytorch/tree/cc-aot-cpp, merged this PR into it, and modified `fancy_compile` to compile MobileNetV3 into C++ through ``` import torch m = torch.jit.load('mobnet.pt') m.eval() f = torch.jit.freeze(m) torch._C._fancy_compile(f.graph, [1, 3, 224, 224]) ``` The generated C++ file `mobnet.cc` can be found at https://gist.github.com/cheng-chang/e2830cc6920b39204ebf368035b2bcec. I manually compiled the generated C++ through `g++ -o mobnet -std=c++14 -L./build/lib -ltorch_cpu -ltorch mobnet.cc`, and it succeeded. Reviewed By: ZolotukhinM Differential Revision: D30149482 Pulled By: cheng-chang fbshipit-source-id: e77b189f0353e37cd309423a48a513e668d07675	2021-08-26 09:56:37 -07:00
Raghavan Raman	6d31ba6ddc	[nnc] Sanitized the names of constants in the input graph. (#63990 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/63923 The input graph can contain constants whose names contain special characters. So, all names of constants in the input graph need to be sanitized. Pull Request resolved: https://github.com/pytorch/pytorch/pull/63990 Reviewed By: ZolotukhinM Differential Revision: D30558432 Pulled By: navahgar fbshipit-source-id: de5b0c23d50ee8997f40f2c0fc605dda3719186f	2021-08-26 09:52:02 -07:00
Bert Maher	8dda299d96	Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776 I reverted this out of an abundance of caution because some test failures occurred, but they were all due to precision issues fixed lower in this stack. Let's try again. I've rolled the elimination of the allow-parallelism-in-fusions toggle into this diff since they're pretty tightly coupled. ghstack-source-id: 136529847 Test Plan: CI Reviewed By: huiguoo Differential Revision: D30484555 fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59	2021-08-24 18:56:55 -07:00
yanbing-j	33a163d886	Enable BFloat16 LeakyReLU and RReLU in CPU path (#61514 ) Summary: Enable and optimize BFloat16 LeakyReLU and RReLU in CPU path. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61514 Reviewed By: ejguan Differential Revision: D30257612 Pulled By: VitalyFedyunin fbshipit-source-id: 8cc0d1faacd02dcc9827af724a86d95b6952748f	2021-08-24 08:34:56 -07:00
Mike Iovine	1385f9fb12	[JIT] Add variadic stack op (#63578 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63578 Added a new op `prim::VarStack` and a pass that transforms instances of `aten::stack(list, dim)` into `prim::VarStack(list[0], ..., list[n], dim)`. Also provided a JIT interpreter implementation. Most of the implementation/tests are the same as `prim::VarConcat`. Test Plan: `buck test caffe2/test/cpp/jit:jit -- TestStackOpt` Reviewed By: navahgar Differential Revision: D30426232 fbshipit-source-id: 9829a7db6e0a5038c9b7528c43c25b0c221aa2ce	2021-08-24 08:20:54 -07:00
Mikhail Zolotukhin	f0d274294d	[TensorExpr] Nuke KernelArena and KernelScope. (#63587 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587 Now that there are no classes using KernelArena for memory management we can remove it. Differential Revision: D30429115 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544	2021-08-24 00:32:16 -07:00
Mikhail Zolotukhin	62d02f2b57	[TensorExpr] Make 'Tensor' a value type. (#63586 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586 This is another commit in transition from KernelArena memory management. Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need to dynamically allocate it at all - it's cheap to pass it by value, and that's what we're switching to in this commit. After this change nothing uses KernelScope/KernelArena and they can be safely removed. Differential Revision: D30429114 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819	2021-08-24 00:32:13 -07:00
Mikhail Zolotukhin	dd96c26066	[TensorExpr] More NFC changes like Expr* -> ExprPtr. (#63778 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63778 This is a preparation for a switch from raw pointers to shared pointers as a memory model for TE expressions and statements. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30487425 Pulled By: ZolotukhinM fbshipit-source-id: 9cbe817b7d4e5fc2f150b29bb9b3bf578868f20c	2021-08-24 00:30:49 -07:00
Mike Iovine	fc6dd0bc00	[JIT] Move UseVariadicCat internals (#63577 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63577 Since other variadic ops will have an almost identical implementation, we can generalize the `UseVariadicCat` implementation and put it in a common folder. Also moved some test utilities that other variadic op tests will likely need. Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOptTest` Reviewed By: navahgar Differential Revision: D30409937 fbshipit-source-id: 925c11c27b58ce98cb8368d2a205e26ba66d3db9	2021-08-23 17:30:36 -07:00
Bert Maher	37d60c08e5	Revert D30360382: [nnc] Support thread level parallelism in fused kernels Test Plan: revert-hammer Differential Revision: D30360382 (`d6d86efb1c`) Original commit changeset: 29acf4e932c6 fbshipit-source-id: e0531113135d30eabb172dc1537d5dd6d65dc438	2021-08-21 03:46:43 -07:00
Bert Maher	76da46ccdc	Revert D30417127: Remove flag to toggle CPU fusion in the presence of parallelism Test Plan: revert-hammer Differential Revision: D30417127 (`6600bc9651`) Original commit changeset: b77d7c68364f fbshipit-source-id: 6b52fb83a84fe241945e3cb3eeb71050d1d9c8f1	2021-08-21 03:38:07 -07:00
Bert Maher	6600bc9651	Remove flag to toggle CPU fusion in the presence of parallelism (#63514 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63514 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30417127 Pulled By: bertmaher fbshipit-source-id: b77d7c68364f2af73570740540f3b1152313016e	2021-08-20 11:18:19 -07:00
Bert Maher	d6d86efb1c	[nnc] Support thread level parallelism in fused kernels (#63386 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63386 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30360382 Pulled By: bertmaher fbshipit-source-id: 29acf4e932c669ce0f35823faea9099bcd8119b6	2021-08-20 11:18:17 -07:00
Raghavan Raman	d82667f7e2	[nnc] Updated sliceTail to do inplace mutation (#63532 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63532 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D30412184 Pulled By: navahgar fbshipit-source-id: e7669d3b9d24e14501f3feb6505c88d1d42030c6	2021-08-19 22:55:30 -07:00
Raghavan Raman	5e31a3b904	[nnc] Updated sliceHead to do inplace mutation (#63531 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63531 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D30412183 Pulled By: navahgar fbshipit-source-id: 47ee9482a36e606788d28d22eee4edaca45ffa50	2021-08-19 22:54:05 -07:00
Mikhail Zolotukhin	6e00b31b15	[TensorExpr] Make CacheReplacer and IndexFlattener mutate stmts/exprs inplace. (#63527 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63527 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30411411 Pulled By: ZolotukhinM fbshipit-source-id: efb14ee57b36537fa4fefa89bdd6bafe7151c012	2021-08-18 22:59:31 -07:00
Mikhail Zolotukhin	1d62fb8a63	[TensorExpr] Speedup ExternalCall.ComputeInterop test by reducing tensor sizes. (#63526 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63526 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30411410 Pulled By: ZolotukhinM fbshipit-source-id: d9a99afac14d2238b5100c98ae9ed4467f9f05ea	2021-08-18 22:58:25 -07:00
Mikhail Zolotukhin	7fdba4564a	[TensorExpr] IRSimplifier: sort terms in polynomials, terms, minterms, maxterms. (#63197 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63197 This solves non-determinism from using hash values in sort methods. Changes in tests are mostly mechanical. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30292776 Pulled By: ZolotukhinM fbshipit-source-id: 74f57b53c3afc9d4be45715fd74781271373e055	2021-08-18 14:49:27 -07:00
Mikhail Zolotukhin	1dc2b52764	[TensorExpr] Add a wrapper for all expr and stmt pointers. (#63195 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195 This helps us to later switch from using KernelArena with raw pointers to shared pointers without having to change all our source files at once. The changes are mechanical and should not affect any functionality. With this PR, we're changing the following: * `Add` --> `AddPtr` `new Add(...)` --> `alloc<Add>(...)` * `dynamic_cast<Add>` --> `to<Add>` `static_cast<Add>` --> `static_to<Add>` Due to some complications with args forwarding, some places became more verbose, e.g.: `new Block({})` --> `new Block(std::vector<ExprPtr>())` Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30292779 Pulled By: ZolotukhinM fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9	2021-08-17 13:44:45 -07:00
Mikhail Zolotukhin	548c717cbd	[TensorExpr] Remove test_train from tensorexpr tests. (#63194 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63194 This test implements functionality used nowhere, and the author no longer works on that. This PR also adds test_approx to CMakeLists where it's been missing before. Test Plan: Imported from OSS Reviewed By: VitalyFedyunin Differential Revision: D30292777 Pulled By: ZolotukhinM fbshipit-source-id: ab6d98e729320a16f1b02ea0c69734f5e7fb2554	2021-08-16 20:36:31 -07:00
Don Jang	e7724bb100	[JIT] Set future's error to current exception as is when `--torch_jit_enable_rethrow_caught_exception=true` (#63348 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63348 This change addresses singlaiiit's comment on D30241792 (`61b49c8e41`), and makes the JIT interpreter's behavior consistent whether or not `future` is set. Test Plan: Enhanced `EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException` to cover the modified code path. Reviewed By: singlaiiit Differential Revision: D30347782 fbshipit-source-id: 79ce57283154ca4372e5341217d942398db21ac8	2021-08-16 17:32:13 -07:00
Raghavan Raman	e50e8b07d8	[nnc] Updated IRMutator and IRSimplifier to perform in-place mutations. (#63246 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63246 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D30309636 Pulled By: navahgar fbshipit-source-id: 409ea8d6982888cfee9127e6248044dd2ed9d8d4	2021-08-16 00:09:22 -07:00
Kimish Patel	38c185189c	[Pytorch Edge] Enable kineto profiler on mobile via EdgeKinetoProfiler (#62419 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419 This diff adds support for a CPU-only kineto profiler on mobile, thus enabling chrome trace generation on mobile. This brings the cpp API for mobile profiling on par with TorchScript. This is done via: 1. Utilizing debug handle annotations in KinetoEvent. 2. Adding post-processing capability, via callbacks, to KinetoThreadLocalState. 3. Creating a new RAII-style profiler, KinetoEdgeCPUProfiler, which can be used in the surrounding scope of model execution. This will write the chrome trace to the location specified in the profiler constructor. Test Plan: MobileProfiler.ModuleHierarchy Imported from OSS Reviewed By: raziel Differential Revision: D29993660 fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299	2021-08-13 21:40:19 -07:00
Kimish Patel	1b04d99f55	[Pytorch Profiler] Introduce scopes to enableProfiler (#62417 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62417 This diff adds an option to make enableProfiler enable callbacks only for certain RecordScopes. Why? Profiling has some overhead when we repeatedly execute callbacks for all scopes. On the mobile side, where we often have small quantized models, this overhead can be large. We observed that by profiling only the top-level op and skipping profiling of the other aten ops called within it, we can limit this overhead. For example, instead of profiling at::conv2d -> at::convolution -> at::convolution_, and furthermore any ops like transpose etc. that are called, we skip profiling of those. Of course this limits the visibility, but at least this way we get a choice. Test Plan: Imported from OSS Reviewed By: ilia-cher Differential Revision: D29993659 fbshipit-source-id: 852d3ae7822f0d94dc6e507bd4019b60d488ef69	2021-08-13 21:40:15 -07:00
Kimish Patel	b00afe135d	[Pytorch Profiler] Add debug_handles to KinetoEvent (#62228 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62228 This diff adds debug handles to events and provides a way to use RECORD_FUNCTIONs that will pass debug_handles down to the profiler, which will record them in the events. Why add debug_handles? For pytorch mobile, with lite interpreter, we generate debug handles that can be used to lazily symbolicate exception traces into a model-level stack trace, similar to the model-level stack trace you get in TorchScript models. The debug_handles also enable getting the module hierarchy for a lite interpreter model, support for which was added to KinetoProfiler in previous diffs. Followup plan: 1. Enable scope callbacks such that lite interpreter can use them to profile only top-level ops. 2. Enable post-processing callbacks that take KinetoEvents and populate module hierarchy using debug handles. This will let us use KinetoProfiler for lite interpreter use cases on mobile. The aim is to use an RAII guard to similarly generate chrome traces for mobile use cases as well, although only for top-level ops. Test Plan: test_misc : RecordDebugHandles.Basic Imported from OSS Reviewed By: ilia-cher Differential Revision: D29935899 fbshipit-source-id: 4f06dc411b6b5fe0ffaebdd26d3274c96f8f389b	2021-08-13 21:40:14 -07:00
Kimish Patel	54f2eb6e7e	[Pytorch Profiler] Add support for adding module hierarchy to (#61792 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61792 KinetoEvent This PR adds module hierarchy information to events. What is module hierarchy information attached to events? During profiling a TorchScript module, when events are added, we ask JIT what is the module hierarchy associated with the node being executed. At the time of execution of that node, there might be multiple frames in the stack of interpreter. For each frame, we find corresponding node and the corresponding module hierarchy is queried. Module hierarchy corresponding to the node is associated with node's InlinedCallStack. InlinedCallStack of node tracks the path via which the node is inlined. Thus during the inlining process we annotate module information corresponding to the CallMethod nodes being inlined. With this PR, chrome trace will contain additional metadata: "Module Hierarchy". This can look like this: TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward It contains module instance, type name and the method name in the callstack. Test Plan: test_profiler Imported from OSS Reviewed By: raziel, ilia-cher Differential Revision: D29745442 fbshipit-source-id: dc8dfaf7c5b8ab256ff0b2ef1e5ec265ca366528	2021-08-13 21:39:10 -07:00
Don Jang	61b49c8e41	[JIT] Add a flag to rethrow caught exception in jit interpreter (#63073 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63073 It turned out that it's less than ideal to print out verbose stacktrace in exception messages in high-QPS services (see the related task) with a non-significant failure rate due to the truncation of long stacktrace which results in losing the original exception message thrown from native code. It is actually desirable to retain only the message of the original exception directly thrown from native code in such a usecase. This change adds a new flag `torch_jit_disable_exception_stacktrace` to the pytorch jit interpreter to suppress stacktrace in the messages of exception thrown from the interpreter. Reviewed By: Krovatkin Differential Revision: D30241792 fbshipit-source-id: c340225c69286663cbd857bd31ba6f1736b1ac4c	2021-08-13 08:44:24 -07:00
Shen Li	1022443168	Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: revert-hammer Differential Revision: D30279364 (`b004307252`) Original commit changeset: c1ed77dfe43a fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e	2021-08-12 11:45:01 -07:00
Zsolt Dollenstein	b004307252	[codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: manual inspection & sandcastle Reviewed By: zertosh Differential Revision: D30279364 fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a	2021-08-12 10:58:35 -07:00
Nikita Shulga	709ac6853a	Fix warnings (#62930 ) Summary: Add `-Wno-writable-strings` (which is clang's flavor of `-Wwrite-strings`) to the list of warnings ignored while compiling torch_python. Avoid unnecessary copies in range loops. Fix a number of signed-unsigned comparisons. Found while building locally on M1. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62930 Reviewed By: albanD Differential Revision: D30171981 Pulled By: malfet fbshipit-source-id: 25bd43dab5675f927ca707e32737ed178b04651e	2021-08-11 14:07:10 -07:00
Howard Cheng	fa22f6303f	[PyTorch] Add flop count for addmm (#61895 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61895 * Add FLOP count for addmm, should be `2mnk`. Share the same code path for `addmm` and `mm`. Test Plan: Imported from OSS `python test/test_profiler.py` Run a sample profile and check that FLOPS for `aten::addmm` is correct. `[chowar@devbig053.frc2 ~/local/pytorch/build] ninja bin/test_jit` `[chowar@devbig053.frc2 ~/local/pytorch/build] ./bin/test_jit --gtest_filter='ComputeFlopsTest'` Reviewed By: dskhudia Differential Revision: D29785671 fbshipit-source-id: d1512036202d7234a981bda897af1f75808ccbfe	2021-08-11 12:33:43 -07:00
Jacob Szwejbka	b746fed164	[Pytorch Edge] Move RuntimeCompatibilityInfo Factory Method (#63005 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63005 Realized I forgot to move the Runtime half of these functions to be within the struct. Test Plan: ci Reviewed By: pavithranrao Differential Revision: D30205521 fbshipit-source-id: ccd87d7d78450dd0dd23ba493bbb9d87be4640a5	2021-08-11 11:15:57 -07:00
tktrungna	2f5ac9c0ba	update test distributed (#62796 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62796 Fixes #62380 * update test functions to call wheel install folder {sitepackages}/torch instead of build/ folder * add symbolic link for shared libraries which are called by the tests (this is a bit hacky and should be fixed the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208). ### Test plan check if all ci workflows pass Test Plan: Imported from OSS Reviewed By: driazati Differential Revision: D30193142 Pulled By: tktrungna fbshipit-source-id: 1247f9eda1c11c763c31c7383c77545b1ead1a60	2021-08-10 16:29:47 -07:00
Howard Huang	4d0497034c	Remove process_group_agent and faulty_process_group_agent files (#62985 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62985 Remove the process_group_agent and faulty_process_group_agent code now that PROCESS_GROUP backend has been deprecated for RPC (https://github.com/pytorch/pytorch/issues/55615). Discussed with xush6528 that it was okay to remove ProcessGroupAgentTest and ProcessGroupAgentBench which depended on process_group_agent. Test Plan: CI tests Reviewed By: pritamdamania87 Differential Revision: D30195576 fbshipit-source-id: 8b4381cffadb868b19d481198015d0a67b205811	2021-08-10 15:57:39 -07:00
Will Constable	22e3cc21e5	Back out "Enable test_api IMethodTest in OSS" (#62893 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62893 Original commit changeset: 50eb3689cf84 Test Plan: Confirm pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test2 passes in OSS Reviewed By: seemethere, alanwaketan Differential Revision: D30159999 fbshipit-source-id: 74ff8975328409a3dc8222d3e2707a1bb0ab930c	2021-08-06 16:43:50 -07:00
Jiewen Tan	4b68801c69	Enable test_api IMethodTest in OSS (#62521 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62521 This diff did the following few things to enable the tests: 1. Exposed IMethod as TORCH_API. 2. Linked torch_deploy to test_api if USE_DEPLOY == 1. Test Plan: ./build/bin/test_api --gtest_filter=IMethodTest.* To be noted, one needs to run `python torch/csrc/deploy/example/generate_examples.py` before the above command. Reviewed By: ezyang Differential Revision: D30055372 Pulled By: alanwaketan fbshipit-source-id: 50eb3689cf84ed0f48be58cd109afcf61ecca508	2021-08-04 21:14:20 -07:00
Raghavan Raman	59dd12042e	[nnc] Removed const from all fields in IR. (#62336 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62336 This PR was generated by removing `const` for all types of nodes in NNC IR, and fixing compilation errors that were the result of this change. This is the first step in making all NNC mutations in-place. Test Plan: Imported from OSS Reviewed By: iramazanli Differential Revision: D30049829 Pulled By: navahgar fbshipit-source-id: ed14e2d2ca0559ffc0b92ac371f405579c85dd63	2021-08-03 11:44:36 -07:00
Jacob Szwejbka	474d7ec43b	[Pytorch Edge] Black Box Compatibility API (#61477 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61477 It would be nice if the compatibility api was just kinda plug and play with no care about the internals of the api at all. That's what this diff aims to provide. The general usage would be something like: < On the Client > RuntimeCompatibilityInfo runtime_info = get_runtime_compatibility_info(); . . . < On the Server > ModelCompatibilityInfo model_info = get_model_compatibility_info(<model_path>); bool compatible = is_compatible(runtime_info, model_info); Currently RuntimeCompatibilityInfo and ModelCompatibilityInfo are exactly the same, but it seemed feasible to me that they may end up diverging as more information is added to the api (such as a min supported bytecode version being exposed from the runtime). Test Plan: unit test and ci Reviewed By: dhruvbird, raziel Differential Revision: D29624080 fbshipit-source-id: 43c1ce15531f6f1a92f357f9cde4e6634e561700	2021-08-03 11:27:28 -07:00
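The client/server flow described in the commit above can be sketched in Python. This is only an illustration of the shape of the check: the fields `bytecode_version` and `operators`, and the comparison rules, are assumptions made here and are not taken from the commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeCompatibilityInfo:
    bytecode_version: int                # hypothetical field
    operators: frozenset = frozenset()   # hypothetical field


@dataclass(frozen=True)
class ModelCompatibilityInfo:
    bytecode_version: int                # hypothetical field
    operators: frozenset = frozenset()   # hypothetical field


def is_compatible(runtime_info, model_info):
    """Assumed rule: the runtime must be at least as new as the model's
    bytecode and must provide every operator the model uses."""
    if model_info.bytecode_version > runtime_info.bytecode_version:
        return False
    return model_info.operators <= runtime_info.operators


# The client reports runtime_info; the server pairs it with model_info.
runtime_info = RuntimeCompatibilityInfo(5, frozenset({"aten::add", "aten::mul"}))
model_info = ModelCompatibilityInfo(4, frozenset({"aten::add"}))
compatible = is_compatible(runtime_info, model_info)
```

Keeping two separate types, even though they start out identical, leaves room for the two sides to diverge as the commit anticipates.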
yanbing-j	c7a7c2b62f	Enable Gelu fp32/bf16 in CPU path using Mkldnn implementation (#58525 ) Summary: Enable Gelu bf16/fp32 in CPU path using Mkldnn implementation. User doesn't need to_mkldnn() explicitly. New Gelu fp32 performs better than original one. Add Gelu backward for https://github.com/pytorch/pytorch/pull/53615. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58525 Reviewed By: ejguan Differential Revision: D29940369 Pulled By: ezyang fbshipit-source-id: df9598262ec50e5d7f6e96490562aa1b116948bf	2021-08-03 06:52:23 -07:00
Hui Guo	3a592730d5	[nnc] Simplify i%100 to i if i is less than 100; fixed #52580 (#60693 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60693 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D29375938 Pulled By: huiguoo fbshipit-source-id: 1388729c5b93805cb156efa53e8823d5462885bf	2021-08-02 18:38:54 -07:00
Hui Guo	8f7ae77040	[nnc] Add context-sensitive simplification for div/mod (#60688 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60688 Test Plan: Imported from OSS Reviewed By: navahgar, ZolotukhinM Differential Revision: D29373313 Pulled By: huiguoo fbshipit-source-id: 90d7f2fbfce583b0ea3b0f1c7899e22b0210bd62	2021-08-02 18:37:39 -07:00
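The two NNC commits above simplify `div`/`mod` using what is known about a variable's range, for example rewriting `i % 100` to `i` when `i` is known to be less than 100. A minimal Python sketch of that idea on string expressions, assuming each variable has a known half-open range `[lo, hi)`; the function names and the `bounds` mapping are hypothetical and do not reflect the NNC simplifier's interface:

```python
def simplify_mod(var, divisor, bounds):
    """Rewrite `var % divisor` to `var` when 0 <= var < divisor is known."""
    lo, hi = bounds[var]  # half-open range [lo, hi)
    if lo >= 0 and hi <= divisor:
        return var
    return f"({var} % {divisor})"


def simplify_div(var, divisor, bounds):
    """Rewrite `var / divisor` (integer division) to `0` under the same condition."""
    lo, hi = bounds[var]
    if lo >= 0 and hi <= divisor:
        return "0"
    return f"({var} / {divisor})"


# Inside `for i in range(0, 100)`, the loop bounds supply the context.
loop_bounds = {"i": (0, 100), "j": (0, 250)}
```

With these bounds, `i % 100` and `i / 100` collapse to `i` and `0`, while the same expressions over `j` are left alone because `j` can reach 100 or more.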
Joel Schlosser	ee482edf0a	Callable activation function support for Transformer modules (C++) (#62342 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/60747 Enhances the C++ versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to support callables as their activation functions. The old way of specifying activation function still works as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62342 Reviewed By: malfet Differential Revision: D30022592 Pulled By: jbschlosser fbshipit-source-id: d3c62410b84b1bd8c5ed3a1b3a3cce55608390c4	2021-08-02 08:06:39 -07:00
Will Constable	bc787f2402	Fix setArgumentNames and make Script/Python consistent (#62442 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62442 For PythonMethodWrapper::setArgumentNames, make sure to use the correct method specified by method_name_ rather than using the parent model_ obj which itself _is_ callable, but that callable is not the right signature to extract. For Python vs Script, unify the behavior to avoid the 'self' parameter, so we only list the argument names to the unbound arguments which is what we need in practice. Test Plan: update unit test and it passes Reviewed By: alanwaketan Differential Revision: D29965283 fbshipit-source-id: a4e6a1d0f393f2a41c3afac32285548832da3fb4	2021-07-29 21:29:06 -07:00
Dhruv Matani	0b3f42fa4f	[PyTorch Edge] Add test for lite interpreter operator caching (#62306 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62306 Test to see if caching of operators works as expected. When caching operators during model load we look up using the operator name. This test ensures that even if there are multiple operators with the same name (in the same model), the caching distinguishes between the ones that have a different number of arguments specified during the call in the serialized bytecode. In this specific test, there's a model with 3 methods, 2 of which return a `float32` tensor and one which returns a tensor of `int64` dtype. Please see the comments in the diff for details. ghstack-source-id: 134634613 Test Plan: Test command: ``` cd fbsource/fbcode/ buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.OperatorCacheDifferentiatesDefaultArgs' ``` ``` cd fbsource/ buck test xplat/caffe2:test_lite_interpreter ``` Reviewed By: raziel Differential Revision: D29929116 fbshipit-source-id: 1d42bd3e6d33128631e970c477344564b0337325	2021-07-29 20:14:45 -07:00
Dhruv Matani	0bbdf0e1e3	[PyTorch Edge] Add test_lite_interpreter to fbsource xplat BUCK files (#62305 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62305 Currently, it's super time consuming to run a lite interpreter test from fbcode since it takes > 10 minutes to build. Recently, I haven't been able to do that either due to low disk space. Having this test available in fbsource/xplat/ is a great win for productivity since I can re-run it in ~2 minutes even after significant changes! I've had to disarm some tests that can only run in OSS or fbcode builds (since they need functionality that we don't include for on-device FB builds). They are disarmed using the macro `FB_XPLAT_BUILD`. ghstack-source-id: 134634611 Test Plan: New test! Reviewed By: raziel, JacobSzwejbka, cccclai Differential Revision: D29954943 fbshipit-source-id: e55eab14309472ef6bc9b0afe0af126c561dbdb1	2021-07-29 20:13:06 -07:00
Raghavan Raman	7b6d569a2b	[jit] Renamed prim::Concat as prim::VarConcat (#61983 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61983 Trial #2. The previous PR (https://github.com/pytorch/pytorch/pull/61498) was reverted because this caused a failure in `pytorch_linux_backward_compatibility_check_test`. Fixed that now by adding to the exception list in `check_backward_compatibility.py`. Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D29828830 Pulled By: navahgar fbshipit-source-id: 947a7b1622ff6e3e575c051b8f34a789e105bcee	2021-07-29 10:28:59 -07:00
guyang3532	4ed8858817	Exclude time of waiting in queue from gloo communication prof… (#61342 ) Summary: Background: The gloo communication implementation is as follows: 1. Construct communication workers and push them into a queue. 2. Initialize a thread pool in which each thread runs a loop that gets a worker from the queue and executes it. Issue: The recorded profiling time span starts at worker construction and ends when the worker finishes. It therefore includes the time the worker spends waiting in the queue, which makes multiple gloo communication time spans overlap with each other in the same thread in the timeline: ![image](https://user-images.githubusercontent.com/62738430/124867273-5bc95b80-dff0-11eb-8664-6e5d4166fc39.png) This is because while the next work item is waiting in the queue, the previous one has not finished. Solution: This PR delays the profiling start time of gloo communication from worker construction to the point where the worker is actually executed, so the profiling span does not include the time spent waiting in the queue. The implementation is as follows: 1. First, disable the original record function by passing 'nullptr' as the 'profilingTitle' argument of ProcessGroup::Work. 2. Construct a 'recordFunctionBeforeCallback_' and a 'recordFunctionEndCallback_' and save them as members of the worker. 3. When the worker is executed, invoke the 'recordFunctionBeforeCallback_'. 4. The 'recordFunctionEndCallback_' is invoked at finish, as before. After this modification, the gloo profiling spans in the timeline no longer overlap with each other: ![image](https://user-images.githubusercontent.com/62738430/124868716-bb286b00-dff2-11eb-9cf0-d0494a356d0c.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/61342 Reviewed By: albanD Differential Revision: D29811656 Pulled By: gdankel fbshipit-source-id: ff07e8906d90f21a072049998400b4a48791e441	2021-07-28 22:24:26 -07:00
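The effect of moving the profiling start from construction to execution can be shown with a small single-threaded simulation. This is a sketch under stated assumptions: `FakeClock`, `Work`, `simulate`, and `spans_overlap` are hypothetical names, the clock is a plain counter, and nothing here uses the gloo or profiler APIs.

```python
from collections import deque


class FakeClock:
    """Deterministic clock: time only advances while work runs."""
    def __init__(self):
        self.now = 0


class Work:
    def __init__(self, clock, cost):
        self.clock = clock
        self.cost = cost
        self.constructed_at = clock.now  # old span start: construction
        self.started_at = None
        self.finished_at = None

    def run(self):
        self.started_at = self.clock.now  # new span start: execution
        self.clock.now += self.cost
        self.finished_at = self.clock.now


def spans_overlap(spans):
    """True if any two consecutive (start, end) spans intersect."""
    ordered = sorted(spans)
    return any(nxt[0] < cur[1] for cur, nxt in zip(ordered, ordered[1:]))


def simulate(costs):
    clock = FakeClock()
    queue = deque(Work(clock, c) for c in costs)  # all enqueued at time 0
    done = []
    while queue:  # one worker thread draining the queue
        work = queue.popleft()
        work.run()
        done.append(work)
    from_construction = [(w.constructed_at, w.finished_at) for w in done]
    from_execution = [(w.started_at, w.finished_at) for w in done]
    return from_construction, from_execution


old_spans, new_spans = simulate([3, 2, 4])
```

Spans measured from construction all start at time 0 and therefore overlap, while spans measured from execution are back to back.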
Laurence Rouesnel	3bdee2bbed	[jit] Rewrote DFS graph iterator to remove unnecessary local state (#61326 ) (#61980 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61980 Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D29917766 Pulled By: laurencer fbshipit-source-id: 536c4806636fe9e709e8bffdefa9320127064dea	2021-07-27 11:50:20 -07:00
Pavithran Ramachandran	d0f430927b	[PyTorch][Edge] Serializing sub modules with same names (#61933 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61933 ### Issue: SubModules with same name are not serialized correctly in bytecode format while using `_save_for_mobile`. These submodules are not distinguished as different modules even though they have different forward, setstate, etc. if they have the same name. ### Fix: Mangler creates unique names so that modules and submodules that have same names can be uniquely identified while saving the module. iseeyuan rightly pointed out the underlying issue that mangler is not used in the process of saving bytecode and hence unique references for the submodules are not created. Please refer to the notebook to repro the issue: N777224 ### Diff: The above idea of fix is implemented. The mangled names are used in bytecode thereby the files in `code/` directory now have right reference to the `bytecode.pkl` Will this have backward compatibility? iseeyuan please feel free to correct or update this. Yes. This fix impacts only modules with same name sub modules which were not serialized correctly before. Existing modules should have correct references and `_load_for_mobile` must not see any change. To confirm this the existing test cases need to pass for the diff to be approved and shipped. ghstack-source-id: 134242696 Test Plan: ``` ~/fbsource/fbcode > buck test caffe2/test/cpp/jit:jit -- BackendTest.TestCompositeWithSetStates Downloaded 0/5 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules) Building: finished in 19.2 sec (100%) 17619/17619 jobs, 3/17619 updated Total time: 19.5 sec More details at https://www.internalfb.com/intern/buck/build/91542d50-25f2-434d-9e1a-b93117f4efe1 Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details. 
Running with tpx session id: de9e27cf-4c6c-4980-8bc5-b830b7c9c534 Trace available for this run at /tmp/tpx-20210719-161607.659665/trace.log Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425127206388 ✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (8.140) ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestCompositeWithSetStates (0.528) Summary Pass: 1 ListingSuccess: 1 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425127206388 ``` ``` ~/fbsource/fbcode > buck test caffe2/test/cpp/jit:jit -- BackendTest.TestConsistencyOfCompositeWithSetStates Building: finished in 4.7 sec (100%) 6787/6787 jobs, 0/6787 updated Total time: 5.0 sec More details at https://www.internalfb.com/intern/buck/build/63d6d871-1dd9-4c72-a63b-ed91900c4dc9 Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details. Running with tpx session id: 81023cd2-c1a2-498b-81b8-86383d73d23b Trace available for this run at /tmp/tpx-20210722-160818.436635/trace.log Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8725724325952153 ✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (7.867) ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestConsistencyOfCompositeWithSetStates (0.607) Summary Pass: 1 ListingSuccess: 1 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8725724325952153 ``` To check the `bytecode.pkl` using module inspector please check: N1007089 Reviewed By: iseeyuan Differential Revision: D29669831 fbshipit-source-id: 504dfcb5f7446be5e1c9bd31f0bd9c986ce1a647	2021-07-26 16:31:48 -07:00
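The fix in the commit above relies on a mangler that hands out unique names when two submodules share a name. A minimal Python sketch of that idea, assuming a counter-based suffix; the `NameMangler` class and the `__mangle_N` suffix format are hypothetical and are not the scheme PyTorch actually uses:

```python
class NameMangler:
    """Hands out a unique name each time a qualified name is registered."""

    def __init__(self):
        self._counts = {}

    def mangle(self, qualified_name):
        seen = self._counts.get(qualified_name, 0)
        self._counts[qualified_name] = seen + 1
        if seen == 0:
            # First occurrence keeps its original name.
            return qualified_name
        # Later occurrences get a numbered suffix (illustrative format).
        return f"{qualified_name}.__mangle_{seen}"


mangler = NameMangler()
names = [mangler.mangle("Composite.SubModule") for _ in range(3)]
```

Because each occurrence maps to a distinct name, serialized code and `bytecode.pkl` entries can refer to the right submodule even when the user-visible names collide.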
Kimish Patel	026cfe85b4	Fix InlinedCallStack annotation to account for module calling its own (#61791 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61791 methods from forward During inlining we attach InlinedCallStack to nodes being inlined. In the process we attach module information as well, such that if CallMethod is being inlined we know which class instance and class type the method belongs to. However, CallMethod can be calling a method of the same object to which the graph belongs. e.g.: ``` def forward(self, input): x = input + 10 return forward_impl_(x, input) ``` Here forward_impl_ is a method defined on the same class in which forward is defined. Existing module hierarchy annotation will mislabel this as an unknown instance since the method is not associated with the output of a GetAttr node (it would be if we had called self.conv.forward_impl_ for example). The change in this PR reconciles this by creating a placeholder name "SELF" for the module instance, indicating that you can traverse InlinedCallStack backwards to find the first node with name != SELF, which would be the name of the object. e.g.: TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward Test Plan: Add test Imported from OSS Reviewed By: larryliu0820 Differential Revision: D29745443 fbshipit-source-id: 1525e41df53913341c4c36a56772454782a0ba93	2021-07-26 15:00:57 -07:00
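The backwards traversal described in the commit above can be sketched in Python using the example call stack from the message. The tuple layout `(instance_name, type_name, method_name)` and the helper `resolve_instance` are assumptions for illustration, not the InlinedCallStack API:

```python
SELF = "SELF"

# Frames taken from the example string in the commit message, outermost first.
frames = [
    ("TOP", "ResNet", "forward"),
    (SELF, "ResNet", "_forward_impl"),
    ("layer1", "Sequential", "forward"),
    ("0", "BasicBlock", "forward"),
    ("conv1", "Conv2d", "forward"),
    (SELF, "Conv2d", "_conv_forward"),
]


def resolve_instance(frames, index):
    """Walk backwards from frames[index] to the first name that is not SELF."""
    for name, _type_name, _method_name in reversed(frames[: index + 1]):
        if name != SELF:
            return name
    raise ValueError("no named instance found in call stack")
```

For the last frame, `SELF(Conv2d)::_conv_forward`, the walk stops at `conv1`; for `SELF(ResNet)::_forward_impl` it stops at `TOP`.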
Richard Barnes	ee44d73e59	Modernize override (#61744 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61744 Test Plan: Sandcastle Reviewed By: malfet Differential Revision: D29717320 fbshipit-source-id: 6eea4295ee2e5572ab337620be412376fcc2f3cc	2021-07-23 23:04:46 -07:00
Nikita Shulga	a9b0a921d5	Disable `avoid-non-const-global-variables` lint check (#62008 ) Summary: As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH` All changes but the ones to `.clang-tidy` are generated using following script: ``` for i in `find . -type f -iname ".c" -or -iname "*.h"\|xargs grep cppcoreguidelines-avoid-non-const-global-variables\|cut -f1 -d:\|sort\|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008 Reviewed By: driazati, r-barnes Differential Revision: D29838584 Pulled By: malfet fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13	2021-07-22 18:04:40 -07:00
imaginary-person	9e53c823b8	Add AVX512 support in ATen & remove AVX support (#61903 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61903 ### Remaining Tasks - [ ] Collate results of benchmarks on two Intel Xeon machines (with & without CUDA, to check if CPU throttling causes issues with GPUs) - make graphs, including Roofline model plots (Intel Advisor can't make them with libgomp, though, but with Intel OpenMP). ### Summary 1. This draft PR produces binaries with 3 types of ATen kernels - default, AVX2, AVX512. Using the environment variable `ATEN_AVX512_256=TRUE` also results in 3 types of kernels, but the compiler can use 32 ymm registers for AVX2, instead of the default 16. ATen kernels for `CPU_CAPABILITY_AVX` have been removed. 2. `nansum` is not using AVX512 kernel right now, as it has poorer accuracy for Float16, than does AVX2 or DEFAULT, whose respective accuracies aren't very good either (#59415). It was more convenient to disable AVX512 dispatch for all dtypes of `nansum` for now. 3. On Windows, ATen Quantized AVX512 kernels are not being used, as quantization tests are flaky. If `--continue-through-failure` is used, then `test_compare_model_outputs_functional_static` fails. But if this test is skipped, `test_compare_model_outputs_conv_static` fails. If both these tests are skipped, then a third one fails. These are hard to debug right now due to not having access to a Windows machine with AVX512 support, so it was more convenient to disable AVX512 dispatch of all ATen Quantized kernels on Windows for now. 4. One test is currently being skipped - [`test_lstm` in `quantization.bc`](https://github.com/pytorch/pytorch/issues/59098) - It fails only on Cascade Lake machines, irrespective of the `ATEN_CPU_CAPABILITY` used, because FBGEMM uses `AVX512_VNNI` on machines that support it. The value of `reduce_range` should be used as `False` on such machines. 
The list of the changes is at https://gist.github.com/imaginary-person/4b4fda660534f0493bf9573d511a878d. Credits to ezyang for proposing `AVX512_256` - these use AVX2 intrinsics but benefit from 32 registers, instead of the 16 ymm registers that AVX2 uses. Credits to limo1996 for the initial proposal, and for optimizing `hsub_pd` & `hadd_pd`, which didn't have direct AVX512 equivalents, and are being used in some kernels. He also refactored `vec/functional.h` to remove duplicated code. Credits to quickwritereader for helping fix 4 failing complex multiplication & division tests. ### Testing 1. `vec_test_all_types` was modified to test basic AVX512 support, as tests already existed for AVX2. Only one test had to be modified, as it was hardcoded for AVX2. 2. `pytorch_linux_bionic_py3_8_gcc9_coverage_test1` & `pytorch_linux_bionic_py3_8_gcc9_coverage_test2` are now using `linux.2xlarge` instances, as they support AVX512. They were used for testing AVX512 kernels, as AVX512 kernels are being used by default in both of the CI checks. Windows CI checks had already been using machines with AVX512 support. ### Would the downclocking caused by AVX512 pose an issue? I think it's important to note that AVX2 causes downclocking as well, and the additional downclocking caused by AVX512 may not hamper performance on some Skylake machines & beyond, because of the double vector-size. I think that [this post with verifiable references is a must-read](https://community.intel.com/t5/Software-Tuning-Performance/Unexpected-power-vs-cores-profile-for-MKL-kernels-on-modern-Xeon/m-p/1133869/highlight/true#M6450). Also, AVX512 would _probably not_ hurt performance on a high-end machine, [but measurements are recommended](https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/). In case it does, `ATEN_AVX512_256=TRUE` can be used for building PyTorch, as AVX2 can then use 32 ymm registers instead of the default 16. 
[FBGEMM uses `AVX512_256` only on Xeon D processors](https://github.com/pytorch/FBGEMM/pull/209), which are said to have poor AVX512 performance. This [official data](https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-scalable-spec-update.pdf) is for the Intel Skylake family, and the first link helps understand its significance. Cascade Lake & Ice Lake SP Xeon processors are said to be even better when it comes to AVX512 performance. Here is the corresponding data for [Cascade Lake](https://cdrdv2.intel.com/v1/dl/getContent/338848) - ![CASCADE LAKE AVX2](https://user-images.githubusercontent.com/76181208/120666172-ffec3f80-c451-11eb-8ea1-8933ccc12a1b.PNG) ![CASCADE LAKE AVX512](https://user-images.githubusercontent.com/76181208/120666190-04b0f380-c452-11eb-9faa-38d233c874c8.PNG) The corresponding data isn't publicly available for Intel Xeon SP 3rd gen (Ice Lake SP), but [Intel mentioned that the 3rd gen has frequency improvements pertaining to AVX512](https://newsroom.intel.com/wp-content/uploads/sites/11/2021/04/3rd-Gen-Intel-Xeon-Scalable-Platform-Press-Presentation-281884.pdf). Ice Lake SP machines also have 48 KB L1D caches, so that's another reason for AVX512 performance to be better on them. ### Is PyTorch always faster with AVX512? No, but then PyTorch is not always faster with AVX2 either. Please refer to #60202. The benefit from vectorization is apparent with small tensors that fit in caches or in kernels that are more compute heavy. For instance, AVX512 or AVX2 would yield no benefit for adding two 64 MB tensors, but adding two 1 MB tensors would do well with AVX2, and even more so with AVX512. It seems that memory-bound computations, such as adding two 64 MB tensors, can be slow with vectorization (depending upon the number of threads used), as the effects of downclocking can then be observed. 
Original pull request: https://github.com/pytorch/pytorch/pull/56992 Reviewed By: soulitzer Differential Revision: D29266289 Pulled By: ezyang fbshipit-source-id: 2d5e8d1c2307252f22423bbc14f136c67c3e6184	2021-07-22 08:51:49 -07:00
Jiewen Tan	31beef009d	Fix IMethodTest.GetArgumentNames after D29648756 (#61985 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61985 Fix IMethodTest.GetArgumentNames after D29648756 (`641f6ef8a7`). ghstack-source-id: 134054637 Test Plan: buck test mode/dev caffe2/test/cpp/api:imethod -- IMethodTest.GetArgumentNames Reviewed By: suo Differential Revision: D29828807 fbshipit-source-id: b1411745b91e1b8c0ea0fd9e9666e22125dde333	2021-07-22 00:21:59 -07:00
Laurence Rouesnel	adb73d3dcf	Removed overhead from reshape() call if tensor doesn't need to be changed (#61466 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61466 ## Goal Per #55126 the performance of `reshape` is worse than `alias` in cases where they are performing the same operation (i.e. where reshape is returning a view) because `reshape` delegates to `view` and duplicates some of the operations (specifically `infer_size_dv` and `computeStride`). The goal of this pull-request is to reduce or remove the additional overhead that `reshape` has. ### Proposed Implementation Instead of using `view` we implement a private/internal operator (`_reshape_alias`) that `reshape` dispatches to which skips the relevant checks. This is functionally equivalent to `as_strided` however it is a lot simpler because it's specialized to this use-case, and importantly the `backward` implementation is a lot faster. Note that we have to dispatch (`reshape` is a composite operator) because `reshape` can return either a view or a copy of the Tensor depending on the parameters, and this complicates implementing a derivative/backward for `reshape`. ### Why not `as_strided`? Using `as_strided` directly slows down autograd. If we use a custom function equivalent to `_reshape_alias` but with a simpler backward function then `view` has the same performance as `reshape`. If we delegate to `as_strided` it is about 56% slower (and this holds against our custom function). This is also the reason we make an internal operator named `_reshape_alias` instead of exposing a new operator since this should only be used in the `reshape` case and it is effectively a more limited version of `view`, `alias`, and `as_strided`. 
## Benchmarks In a micro-benchmark for `backward` running: ```cpp // Setup at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); // Benchmark loop // `reshape(-1)` replaced with a call to view(-1) for view baseline x.pow(4).reshape(-1).mean().backward(); ``` I also benchmarked simple operations without gradients using: ```cpp // Setup at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); // Benchmark loop x.reshape(-1) // replaced with a call to view(-1) for view baseline ``` Baselined to `view`: * Original `reshape`: `+3.3%` (without gradients `+20.8%`) * Using `as_strided`: `+55.1%` (without gradients `+1.0%`) * Using custom `_reshape_view`: `-1.0%` (without gradients `+6.2%`) In absolute terms (note the percentages above were generated comparing between runs/tests rather than to a single baseline): * Original `view`: `53.66 us` (without gradients `582.78 ns`) * Original `reshape`: `55.46 us` (without gradients `704.24 ns`) * Using `as_strided`: `83.24 us` (without gradients `576.49 ns`) * Using custom `_reshape_view`: `53.13 us` (without gradients `536.01 ns`) Note that these benchmarks perform a backwards operation as well. When compared without using gradient computation at all, the performance differences are more pronounced as this takes up more of the time. 
### Original performance <details> <summary>Benchmark results</summary> ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f0e4d393160> x.pow(4).view(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 53.66 us IQR: 2.70 us (52.54 to 55.24) 884 measurements, 100 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f0e2ebd4fa0> x.pow(4).reshape(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 55.46 us IQR: 2.61 us (54.39 to 57.01) 889 measurements, 100 runs per measurement, 1 thread] 2276116 2286256 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f0e5b2e3e20> 2640 ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&) 1920 ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>) 1520 ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>) 1040 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long>&&) 980 ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&) 720 ???:__tls_get_addr 520 ???:at::shouldRunRecordFunction(bool) 520 ???:__memcpy_avx_unaligned_erms 200 ???:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10:: ... 
g>)>::call(c10::OperatorKernel, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) 100 ???:c10::TensorImpl::strides() const 100 ???:c10::TensorImpl::sizes() const 100 ???:at::(anonymous namespace)::manager() 77 /tmp/benchmark_utils_jit_build__1626465284__8a34e7ff-cd37-4a82-be28-7f19e081e771/timer_cpp_7815557938202456331/timer_src.cpp:main 40 ???:c10::TensorImpl::numel() const -77 /tmp/benchmark_utils_jit_build__1626465284__8a34e7ff-cd37-4a82-be28-7f19e081e771/timer_cpp_8055217880649990171/timer_src.cpp:main -260 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) Total: 10140 ``` ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f850dd66c10> x.view(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 582.78 ns IQR: 33.80 ns (573.80 to 607.61) 833 measurements, 10000 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f850de31e20> x.reshape(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 704.24 ns IQR: 24.42 ns (697.20 to 721.62) 679 measurements, 10000 runs per measurement, 1 thread] 56896 67036 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f84e1930bb0> 2640 ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&) 1920 ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>) 1520 ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>) 1040 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long>&&) 980 ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&) 720 ???:__tls_get_addr 520 ???:at::shouldRunRecordFunction(bool) 520 ???:__memcpy_avx_unaligned_erms 200 ???:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10:: ... 
g>)>::call(c10::OperatorKernel, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) 100 ???:c10::TensorImpl::strides() const 100 ???:c10::TensorImpl::sizes() const 100 ???:at::(anonymous namespace)::manager() 76 /tmp/benchmark_utils_jit_build__1626466038__15fbbac0-2072-4459-8f8e-08121a905b99/timer_cpp_547407365342278353/timer_src.cpp:main 40 ???:c10::TensorImpl::numel() const -76 /tmp/benchmark_utils_jit_build__1626466038__15fbbac0-2072-4459-8f8e-08121a905b99/timer_cpp_3457873755756181226/timer_src.cpp:main -260 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) Total: 10140 ``` </details> ### Using `as_strided` <details> <summary>Benchmark results</summary> ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f8b13bb5b50> x.pow(4).view(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 53.37 us IQR: 3.15 us (51.73 to 54.88) 936 measurements, 100 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f8af55f8490> x.pow(4).reshape(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 83.24 us IQR: 4.05 us (81.20 to 85.25) 609 measurements, 100 runs per measurement, 1 thread] 2267916 2525061 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f8af55f8e50> 31930 ???:_int_free 15940 ???:malloc 11595 ???:_int_malloc 10100 ???:torch::autograd::generated::details::as_strided_backward(at::Tensor, at::TensorGeometry, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 9360 ???:__tls_get_addr 8280 ???:free 8100 ???:torch::autograd::VariableType::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 4520 ???:c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::reset_() 4080 ???:operator new(unsigned long) ... 
-780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -920 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&) -1220 ???:torch::autograd::generated::ViewBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) -1520 ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>) -1580 ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -1680 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&) -2560 ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&) -2640 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) -4860 ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) Total: 257145 ``` ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f93176a0160> x.view(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 570.55 ns IQR: 32.69 ns (552.87 to 585.56) 874 measurements, 10000 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f92f8f29490> x.reshape(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 576.49 ns IQR: 37.95 ns (559.51 to 597.46) 861 measurements, 10000 runs per measurement, 1 thread] 56896 58556 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f932556ca60> 2140 ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>) 1940 ???:torch::autograd::VariableType::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 1880 ???:torch::ADInplaceOrView::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, 
c10::optional<long>) 1720 ???:at::_ops::as_strided::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 1520 ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>) 1400 ???:at::native::as_strided_tensorimpl(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 1260 ???:at::_ops::as_strided::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)'2 1260 ???:at::_ops::as_strided::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 980 ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&) ... -620 ???:at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, c10::ArrayRef<long ... ::ArrayRef<long>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) const -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2 -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -920 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&) -1520 ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>) -1580 ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -1680 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&) -1740 ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -2640 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) Total: 1660 ``` </details> ### Using custom function (`_reshape_alias`) <details> <summary>Benchmark results</summary> ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f16861d6b50> 
x.pow(4).view(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 53.50 us IQR: 2.64 us (52.32 to 54.96) 906 measurements, 100 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f1667b2ed60> x.pow(4).reshape(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 53.13 us IQR: 3.40 us (51.72 to 55.13) 914 measurements, 100 runs per measurement, 1 thread] 2269736 2273236 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f1693f8dc10> 5060 ???:torch::autograd::VariableType::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 2000 ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>) 1780 ???:torch::ADInplaceOrView::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1660 ???:at::_ops::_reshape_alias::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1600 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::ArrayRef<long> >(at::Tensor const&, c10::ArrayRef<long> const&, c10::ArrayRef<long> const&) 1520 ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>) 1240 ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)'2 1240 ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1220 ???:torch::autograd::generated::AliasToShapeBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) ... 
-780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2 -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -920 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&) -1220 ???:torch::autograd::generated::ViewBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) -1520 ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>) -1580 ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -1680 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&) -2640 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) -4860 ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) Total: 3500 ``` ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f5287adfb20> x.view(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 505.10 ns IQR: 20.04 ns (500.41 to 520.45) 944 measurements, 10000 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f526951b430> x.reshape(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 536.01 ns IQR: 17.81 ns (531.34 to 549.16) 916 measurements, 10000 runs per measurement, 1 thread] 56896 60376 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f5295896c10> 2000 ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>) 1860 ???:torch::autograd::VariableType::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1780 ???:torch::ADInplaceOrView::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1660 
???:at::_ops::_reshape_alias::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1600 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::ArrayRef<long> >(at::Tensor const&, c10::ArrayRef<long> const&, c10::ArrayRef<long> const&) 1520 ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>) 1240 ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)'2 1240 ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 980 ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&) ... -620 ???:at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, c10::ArrayRef<long ... ::ArrayRef<long>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) const -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2 -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -920 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&) -1520 ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>) -1580 ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -1680 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&) -1740 ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -2640 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) Total: 3480 ``` </details> Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D29792126 Pulled By: laurencer fbshipit-source-id: f0519b45b65f868aa3e8651679354558bd761dfd	2021-07-21 14:05:35 -07:00
Vitaly Fedyunin	33db828e52	Revert D29647586: [jit] Renamed prim::Concat as prim::VarConcat Test Plan: revert-hammer Differential Revision: D29647586 (`db11619901`) Original commit changeset: cdd34ea5a3c9 fbshipit-source-id: bab5ac4ed67a00ac151fe39463aa3fb56897d7f4	2021-07-21 08:28:26 -07:00
Raghavan Raman	db11619901	[jit] Renamed prim::Concat as prim::VarConcat (#61498 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61498 Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D29647586 Pulled By: navahgar fbshipit-source-id: cdd34ea5a3c986350a813be17e7d428844ea4cbf	2021-07-20 19:30:00 -07:00
Raghavan Raman	429908e540	[jit] Updated the concat common inputs elimination pass to use the variadic cat op instead of aten::cat (#60908 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60908 Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D29441865 Pulled By: navahgar fbshipit-source-id: 2ab08168102eff1f43667ca418bdd94bb2df562a	2021-07-20 19:29:57 -07:00
Raghavan Raman	53668f8bf6	[jit] Added an API to remove list mutations and replace with variadic cat until fixed point (#60776 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60776 Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D29406099 Pulled By: navahgar fbshipit-source-id: e2e69eb6ebff3bc6e25d80f46ce118e52f557fb6	2021-07-20 19:29:55 -07:00
Raghavan Raman	4dd04a8bbe	[jit] Handled cases when input list to cat is mutated after cat using AliasDb (#60774 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60774 Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D29406100 Pulled By: navahgar fbshipit-source-id: af6afca65881c18c51b482eb63898a0f1c94d591	2021-07-20 19:28:42 -07:00
Raghavan Raman	593e8f41ca	[jit] Fixed a bug in the pass that replaces cat with the variadic op (#61795 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61795 Test Plan: Imported from OSS Reviewed By: nikithamalgifb Differential Revision: D29748785 Pulled By: navahgar fbshipit-source-id: df5b84c35f007718c92a21a0b44a231e6d346918	2021-07-18 21:38:30 -07:00
Nikita Shulga	ee2f2ec9a5	Revert D29687143: [3/N] Nnapi Backend Delegate Preprocess: Basic OSS Test Test Plan: revert-hammer Differential Revision: D29687143 (`5798a00aa4`) Original commit changeset: 9ba9e57f7f85 fbshipit-source-id: 6a672c76a04366b35c492698ae5b39fd4dd1785f	2021-07-16 13:32:51 -07:00
Amy He	5798a00aa4	[3/N] Nnapi Backend Delegate Preprocess: Basic OSS Test (#61594 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61594 ### Summary: Added a unit test for the Nnapi delegate's preprocess() function. The function was previously tested locally, but now a basic test is added for OSS. See https://github.com/pytorch/pytorch/pull/61499 for preprocess implementation. See D29647123 for local testing. TODO: Add more comprehensive tests. Add tests for model execution, after the Nnapi delegate's initialization and execution is implemented T91991928. CMakeLists.txt: Added a library for the Nnapi delegate - Explicit linking of torch_python is necessary for the Nnapi delegate's use of pybind test_backends.py: Added a test for lowering to Nnapi - Based off https://github.com/pytorch/pytorch/blob/master/test/test_nnapi.py - Only differences are the loading of the nnapi backend library and the need to change dtype from float64 to float32 ### Test Plan: Running `python test/test_jit.py TestBackendsWithCompiler -v` succeeds. Also saved and examined the model file locally. Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D29687143 fbshipit-source-id: 9ba9e57f7f856e5ac15e13527f6178d613b32802	2021-07-16 11:00:38 -07:00
Bert Maher	b963607d50	[nnc] Insert alloc/free at global scope (#61725 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61725 Alloc/free inside a loop isn't really an optimization, and furthermore it breaks some attempted optimization in the llvm backend: we use alloca for small allocations, which is efficient since alloca is on the stack, but there's no corresponding free, so we leak tons of stack. I hit this while building an rfactor buffer inside a very deeply nested loop. ghstack-source-id: 133627310 Test Plan: Unit test which simulates use of a temp buffer in a deeply nested loop. Reviewed By: navahgar Differential Revision: D29533364 fbshipit-source-id: c321f4cb05304cfb9146afe32edc4567b623412e	2021-07-16 08:42:24 -07:00
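The idea in the commit above (allocate a temporary once at the outermost scope instead of inside a deeply nested loop) can be illustrated with a toy statement tree. This is only a sketch under stated assumptions: the tuple-based IR and the helpers `strip_allocs` and `hoist_allocs` are hypothetical and are not NNC's data structures. It also assumes the buffer size does not depend on any loop variable.

```python
from typing import List

# Toy IR (hypothetical): ("alloc", name), ("free", name),
# ("compute", name), or ("for", loop_var, body_list).


def strip_allocs(stmts: list, names: List[str]) -> list:
    """Remove alloc/free statements at any depth, recording buffer names."""
    out = []
    for s in stmts:
        if s[0] == "alloc":
            if s[1] not in names:
                names.append(s[1])
        elif s[0] == "free":
            continue
        elif s[0] == "for":
            out.append(("for", s[1], strip_allocs(s[2], names)))
        else:
            out.append(s)
    return out


def hoist_allocs(stmts: list) -> list:
    """Emit every alloc once at the top and every free once at the bottom."""
    names: List[str] = []
    body = strip_allocs(stmts, names)
    allocs = [("alloc", n) for n in names]
    frees = [("free", n) for n in reversed(names)]
    return allocs + body + frees


program = [
    ("for", "i", [
        ("for", "j", [
            ("alloc", "tmp"),
            ("compute", "tmp"),
            ("free", "tmp"),
        ]),
    ]),
]
hoisted = hoist_allocs(program)
```

After the rewrite, `tmp` is allocated once before the outer loop and freed once after it, so a stack-style allocation with no matching free is made once rather than on every iteration.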
Raghavan Raman	843c42ffd8	[nnc] Refactored test macros and updated compress buffer tests to use them (#61716 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61716 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D29715754 Pulled By: navahgar fbshipit-source-id: c400a58b7f393c0f93e5a25f118403124f8834b0	2021-07-15 21:17:14 -07:00
Raghavan Raman	d01837081d	[nnc] Cleaned up compress buffer tests to use BufHandle instead of Buf (#61715 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61715 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D29715755 Pulled By: navahgar fbshipit-source-id: 453adac8f5b13263c39d96b6b4086425a01bae54	2021-07-15 21:15:23 -07:00
Raghavan Raman	bd360ebe6f	[nnc] Added a new API to distribute loop and all its parents (#61293 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61293 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D29560008 Pulled By: navahgar fbshipit-source-id: e4e459184f20b1872bc242ba8626d0a6df29e810	2021-07-15 10:28:20 -07:00
Raghavan Raman	76f097466e	[nnc] Added a new API to compress all buffers in a given statement (#61087 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61087 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D29506677 Pulled By: navahgar fbshipit-source-id: 63583fd5a0e42c0096ddf08d5b96bc680ea8a44e	2021-07-15 10:28:18 -07:00
Raghavan Raman	2908d3eb45	[nnc] Modified the semantics of reorder in using permutation (#61085 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61085 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D29506679 Pulled By: navahgar fbshipit-source-id: f674aedff8175b9947404fd2164a0b4f57a71e93	2021-07-15 10:28:16 -07:00
Will Constable	a25e6370e5	Add IMethod interface Summary: Expose IMethod interface, which provides a unified interface to either script or python methods backed by torchscript or torchdeploy. IMethod provides a way to depend on a torch method without depending on a particular runtime implementation such as torchscript or python/deploy. Test Plan: add unit tests. Reviewed By: suo Differential Revision: D29463455 fbshipit-source-id: 903391d9af9fbdd8fcdb096c1a136ec6ac153b7c	2021-06-30 11:28:24 -07:00
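As a rough illustration of the design described above, the following Python sketch shows a caller that depends only on an abstract method interface, with two interchangeable backends behind it. Every name here (`ScriptMethod`, `PythonMethod`, `Doubler`, `run`) is a hypothetical stand-in. The real `IMethod` is a C++ interface whose exact signature is not shown in this log.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class IMethod(ABC):
    """Hypothetical runtime-agnostic handle to a callable model method."""

    @abstractmethod
    def __call__(self, *args: Any) -> Any:
        ...


class ScriptMethod(IMethod):
    """Stands in for a method executed by a compiled/scripted runtime."""

    def __init__(self, compiled: Callable[..., Any]) -> None:
        self._compiled = compiled

    def __call__(self, *args: Any) -> Any:
        return self._compiled(*args)


class PythonMethod(IMethod):
    """Stands in for a method looked up by name on a Python object."""

    def __init__(self, obj: Any, name: str) -> None:
        self._obj = obj
        self._name = name

    def __call__(self, *args: Any) -> Any:
        return getattr(self._obj, self._name)(*args)


class Doubler:
    def forward(self, x: int) -> int:
        return 2 * x


def run(method: IMethod, x: int) -> int:
    # The caller depends only on IMethod, not on either backend.
    return method(x)


script_result = run(ScriptMethod(lambda x: 2 * x), 21)
python_result = run(PythonMethod(Doubler(), "forward"), 21)
```

Both calls go through the same `run` function, which is the point of the abstraction: code that consumes a method does not need to know which runtime backs it.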
Bert Maher	93772792e3	[nnc] Get rid of fuser trigger counters (#57334 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57334 Here's a possibly controversial PR. These counters got in the way of generalizing the fuser tests to handle arbitrary devices, and I guess I'm just generally skeptical that they provide much value. While true that they let us observe whether fusion groups were created, we already have assertions based on the shape of the graph, and I'm not sure that I trust those any less than these counters. Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D29471484 Pulled By: bertmaher fbshipit-source-id: f6d76f6e72dbfb581acff1d834b0c74500941b57	2021-06-29 22:22:15 -07:00
Raghavan Raman	6d952dbaf0	[nnc] Fixed checking for loop carried dependence while fusing 2D reduction loops (#60609 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60609 Fixes #60310 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D29386144 Pulled By: navahgar fbshipit-source-id: 230df4f59d6196a250ea57ff649b117d096fcdbc	2021-06-29 14:17:01 -07:00
Mikhail Zolotukhin	3bfe15085d	[TensorExpr] Add a mechanism to register custom TS->NNC lowerings in TensorExprKernel. (#60804 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60804 The lowerings are stored as a map c10::Symbol -> std::function, and the signature of those functions matches the signature of `computeOperandValue`. Custom lowerings take priority over the standard ones, i.e. we can redefine how a given op is lowered. In general this feature is aimed at unblocking users whose models contain ops that are not yet supported by NNC: it lets them quickly add a custom lowering for a given op. Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D29409580 Pulled By: ZolotukhinM fbshipit-source-id: e8e8dc9d3cb9155cfbf5c08a4216ba1b5b791a60	2021-06-27 15:27:22 -07:00
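The lookup order described in the commit above, where custom lowerings are consulted before the standard ones, can be sketched as follows. This is a toy Python model rather than NNC's C++ API: `LoweringRegistry`, `register_custom`, and the `my::square` op are hypothetical, and plain strings stand in for `c10::Symbol` keys.

```python
from typing import Callable, Dict, List

# A lowering takes the op's input values and returns the computed value.
Lowering = Callable[[List[float]], float]


class LoweringRegistry:
    """Toy registry in which custom lowerings shadow the standard ones."""

    def __init__(self) -> None:
        self._standard: Dict[str, Lowering] = {
            "aten::add": lambda xs: xs[0] + xs[1],
            "aten::mul": lambda xs: xs[0] * xs[1],
        }
        self._custom: Dict[str, Lowering] = {}

    def register_custom(self, op: str, fn: Lowering) -> None:
        self._custom[op] = fn

    def lower(self, op: str, inputs: List[float]) -> float:
        # Custom lowerings are checked first, so they can redefine an op.
        fn = self._custom.get(op) or self._standard.get(op)
        if fn is None:
            raise KeyError(f"no lowering registered for {op}")
        return fn(inputs)


registry = LoweringRegistry()
# Add a lowering for an op the toy "standard" set does not cover.
registry.register_custom("my::square", lambda xs: xs[0] * xs[0])
# Redefine an existing op; the custom entry wins.
registry.register_custom("aten::add", lambda xs: xs[0] + xs[1] + 1.0)
```

Checking the custom map first covers both use cases named in the commit message: supplying a lowering for an unsupported op and overriding how a supported op is lowered.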
Xiong Wei	7e3a694b23	supports non-leaf inputs for autograd.backward() function (#60521 ) Summary: Close https://github.com/pytorch/pytorch/issues/60268 Pull Request resolved: https://github.com/pytorch/pytorch/pull/60521 Reviewed By: ngimel Differential Revision: D29393586 Pulled By: albanD fbshipit-source-id: 2dd2de427ecfecca8d544237bacf690e0b7c918c	2021-06-25 18:57:26 -07:00
Martin Yuan	d8c3d555e4	[Delegate] Support composite of lowered sub modules of the same backend (#59921 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59921 Test Plan: Imported from OSS Reviewed By: raziel Differential Revision: D29091143 Pulled By: iseeyuan fbshipit-source-id: 9ffcd18681917ece8ec73a34866c53701bdee1bc	2021-06-25 07:18:32 -07:00
Luca Wehrstedt	a016150163	Move torch/lib/c10d to torch/csrc/distributed/c10d (#60543 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60543 Since now c10d is part of libtorch, it would also be nice if the sources lived all in one place. ghstack-source-id: 132306292 Test Plan: It builds Reviewed By: cbalioglu Differential Revision: D29062002 fbshipit-source-id: d9e1301e9d73e1643fa0f0119cd2d618f1ad52e6	2021-06-24 12:38:51 -07:00
Raghavan Raman	d3a8505ee1	[jit] Added a pass to transform aten::cat ops to prim::Concat op with variable number of inputs (#59881 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59881 This pass is not included in the JIT flow or anywhere else at this point. The idea is, once this lands, everyone can use this to test their workflow with this transformation and once we are convinced this is useful and/or improves performance, we can include it in the appropriate workflow. Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D29277876 Pulled By: navahgar fbshipit-source-id: b5be7bdcc98dced59295bd7b8f6627619cb58d41	2021-06-24 01:27:41 -07:00
Hui Guo	d867340c7b	[nnc] Add LoopNest::getLoopAt to retrieve a specified inner For-stmt (#60569 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60569 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D29337767 Pulled By: huiguoo fbshipit-source-id: e3ae23c1b290739c03d1fa5d7da25de878eb1d4c	2021-06-23 15:53:29 -07:00
Hui Guo	c0d08dc10f	[NNC] Add tile transformation in loopnest (fixed #52785 ) (#57758 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57758 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D28260744 Pulled By: huiguoo fbshipit-source-id: 6b5591850aaf46455bf3c2d776fa930654839a63	2021-06-23 15:52:19 -07:00
Eli Uriegas	2dedd96dd2	cmake: Prefer CMAKE_CURRENT_SOURCE_DIR to TORCH_SRC_DIR (#60493 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60493 TORCH_SRC_DIR appears to be a bit bugged when it comes to identifying include directories so let's try and use CMAKE_CURRENT_SOURCE_DIR instead <details> <summary>Logs for builds with torchaudio</summary> ``` -- Building version 0.10.0a0+9e36281 running bdist_wheel running build running build_py copying torchaudio/version.py -> build/lib.linux-x86_64-3.6/torchaudio running build_ext -- Configuring done -- Generating done -- Build files have been written to: /home/eliuriegas/work/audio/build/temp.linux-x86_64-3.6 [1/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o -c ../../third_party/kaldi/submodule/src/base/kaldi-error.cc [2/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT 
third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o -c ../../third_party/kaldi/submodule/src/base/kaldi-math.cc [3/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o -c ../../third_party/kaldi/submodule/src/feat/feature-functions.cc [4/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o -c ../../third_party/kaldi/src/matrix/kaldi-matrix.cc [5/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src 
-I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o -c ../../third_party/kaldi/submodule/src/feat/resample.cc [6/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o -c ../../third_party/kaldi/src/matrix/kaldi-vector.cc [7/11] /usr/lib64/ccache/c++ -DINCLUDE_KALDI -DTORCH_API_INCLUDE_EXTENSION_H -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_torchaudio_EXPORTS -I../../ -I/tmp/tmp.GKeM3KKcFi/include/python3.6m -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT 
torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o -MF torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o.d -o torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o -c ../../torchaudio/csrc/kaldi.cpp [8/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o -c ../../third_party/kaldi/submodule/src/feat/pitch-functions.cc ../../third_party/kaldi/submodule/src/feat/pitch-functions.cc: In member function ‘void kaldi::OnlinePitchFeatureImpl::UpdateRemainder(const kaldi::VectorBase<float>&)’: ../../third_party/kaldi/submodule/src/feat/pitch-functions.cc:814:11: warning: unused variable ‘full_frame_length’ [-Wunused-variable] 814 \| int32 full_frame_length = opts_.NccfWindowSize() + nccf_last_lag_; \| ^~~~~~~~~~~~~~~~~ ../../third_party/kaldi/submodule/src/feat/pitch-functions.cc: In member function ‘void kaldi::OnlineProcessPitch::UpdateNormalizationStats(kaldi::int32)’: ../../third_party/kaldi/submodule/src/feat/pitch-functions.cc:1504:35: warning: comparison of integer expressions of different signedness: ‘std::vector<kaldi::OnlineProcessPitch::NormalizationStats>::size_type’ {aka ‘long unsigned int’} and ‘kaldi::int32’ {aka ‘int’} [-Wsign-compare] 1504 \| if (normalization_stats_.size() <= frame) \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~ [9/11] : && /usr/bin/cmake -E rm -f third_party/kaldi/libkaldi.a 
&& /usr/bin/ar qc third_party/kaldi/libkaldi.a third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o && /usr/bin/ranlib third_party/kaldi/libkaldi.a && : [10/11] : && /usr/lib64/ccache/c++ -fPIC -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -O3 -DNDEBUG -shared -Wl,-soname,_torchaudio.so -o torchaudio/csrc/_torchaudio.so torchaudio/csrc/CMakeFiles/_torchaudio.dir/pybind.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/lfilter.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/overdrive.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/utils.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o -Wl,-rpath,/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib: /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch_python.so third_party/kaldi/libkaldi.a /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch.so -Wl,--no-as-needed,"/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed /usr/local/lib/libbreakpad_client.a /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so -lpthread -Wl,--no-as-needed,"/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch.so" -Wl,--as-needed /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so && : [10/11] cd /home/eliuriegas/work/audio/build/temp.linux-x86_64-3.6 && /usr/bin/cmake -P cmake_install.cmake -- Install configuration: "Release" -- Installing: 
/home/eliuriegas/work/audio/build/lib.linux-x86_64-3.6/torchaudio/./_torchaudio.so -- Set runtime path of "/home/eliuriegas/work/audio/build/lib.linux-x86_64-3.6/torchaudio/./_torchaudio.so" to "" installing to build/bdist.linux-x86_64/wheel running install running install_lib creating build/bdist.linux-x86_64/wheel creating build/bdist.linux-x86_64/wheel/torchaudio copying build/lib.linux-x86_64-3.6/torchaudio/kaldi_io.py -> build/bdist.linux-x86_64/wheel/torchaudio copying build/lib.linux-x86_64-3.6/torchaudio/transforms.py -> build/bdist.linux-x86_64/wheel/torchaudio copying build/lib.linux-x86_64-3.6/torchaudio/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio creating build/bdist.linux-x86_64/wheel/torchaudio/compliance copying build/lib.linux-x86_64-3.6/torchaudio/compliance/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/compliance copying build/lib.linux-x86_64-3.6/torchaudio/compliance/kaldi.py -> build/bdist.linux-x86_64/wheel/torchaudio/compliance creating build/bdist.linux-x86_64/wheel/torchaudio/datasets copying build/lib.linux-x86_64-3.6/torchaudio/datasets/cmuarctic.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets copying build/lib.linux-x86_64-3.6/torchaudio/datasets/librispeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets copying build/lib.linux-x86_64-3.6/torchaudio/datasets/libritts.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets copying build/lib.linux-x86_64-3.6/torchaudio/datasets/vctk.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets copying build/lib.linux-x86_64-3.6/torchaudio/datasets/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets copying build/lib.linux-x86_64-3.6/torchaudio/datasets/commonvoice.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets copying build/lib.linux-x86_64-3.6/torchaudio/datasets/gtzan.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets copying build/lib.linux-x86_64-3.6/torchaudio/datasets/ljspeech.py -> 
build/bdist.linux-x86_64/wheel/torchaudio/datasets copying build/lib.linux-x86_64-3.6/torchaudio/datasets/speechcommands.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets copying build/lib.linux-x86_64-3.6/torchaudio/datasets/tedlium.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets copying build/lib.linux-x86_64-3.6/torchaudio/datasets/utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets copying build/lib.linux-x86_64-3.6/torchaudio/datasets/yesno.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets creating build/bdist.linux-x86_64/wheel/torchaudio/_internal copying build/lib.linux-x86_64-3.6/torchaudio/_internal/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal copying build/lib.linux-x86_64-3.6/torchaudio/_internal/fft.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal copying build/lib.linux-x86_64-3.6/torchaudio/_internal/module_utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal creating build/bdist.linux-x86_64/wheel/torchaudio/backend copying build/lib.linux-x86_64-3.6/torchaudio/backend/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend copying build/lib.linux-x86_64-3.6/torchaudio/backend/common.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend copying build/lib.linux-x86_64-3.6/torchaudio/backend/no_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend copying build/lib.linux-x86_64-3.6/torchaudio/backend/soundfile_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend copying build/lib.linux-x86_64-3.6/torchaudio/backend/sox_io_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend copying build/lib.linux-x86_64-3.6/torchaudio/backend/utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend creating build/bdist.linux-x86_64/wheel/torchaudio/extension copying build/lib.linux-x86_64-3.6/torchaudio/extension/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/extension copying 
build/lib.linux-x86_64-3.6/torchaudio/extension/extension.py -> build/bdist.linux-x86_64/wheel/torchaudio/extension creating build/bdist.linux-x86_64/wheel/torchaudio/models copying build/lib.linux-x86_64-3.6/torchaudio/models/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models copying build/lib.linux-x86_64-3.6/torchaudio/models/conv_tasnet.py -> build/bdist.linux-x86_64/wheel/torchaudio/models copying build/lib.linux-x86_64-3.6/torchaudio/models/deepspeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/models copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2letter.py -> build/bdist.linux-x86_64/wheel/torchaudio/models copying build/lib.linux-x86_64-3.6/torchaudio/models/wavernn.py -> build/bdist.linux-x86_64/wheel/torchaudio/models creating build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2 copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2 copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/components.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2 copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/model.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2 creating build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/import_fairseq.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/import_huggingface.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils creating build/bdist.linux-x86_64/wheel/torchaudio/sox_effects copying build/lib.linux-x86_64-3.6/torchaudio/sox_effects/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/sox_effects copying 
build/lib.linux-x86_64-3.6/torchaudio/sox_effects/sox_effects.py -> build/bdist.linux-x86_64/wheel/torchaudio/sox_effects creating build/bdist.linux-x86_64/wheel/torchaudio/utils copying build/lib.linux-x86_64-3.6/torchaudio/utils/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/utils copying build/lib.linux-x86_64-3.6/torchaudio/utils/sox_utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/utils creating build/bdist.linux-x86_64/wheel/torchaudio/functional copying build/lib.linux-x86_64-3.6/torchaudio/functional/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional copying build/lib.linux-x86_64-3.6/torchaudio/functional/filtering.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional copying build/lib.linux-x86_64-3.6/torchaudio/functional/functional.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional creating build/bdist.linux-x86_64/wheel/torchaudio/prototype copying build/lib.linux-x86_64-3.6/torchaudio/prototype/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/prototype copying build/lib.linux-x86_64-3.6/torchaudio/prototype/rnnt_loss.py -> build/bdist.linux-x86_64/wheel/torchaudio/prototype copying build/lib.linux-x86_64-3.6/torchaudio/version.py -> build/bdist.linux-x86_64/wheel/torchaudio copying build/lib.linux-x86_64-3.6/torchaudio/_torchaudio.so -> build/bdist.linux-x86_64/wheel/torchaudio running install_egg_info running egg_info writing torchaudio.egg-info/PKG-INFO writing dependency_links to torchaudio.egg-info/dependency_links.txt writing requirements to torchaudio.egg-info/requires.txt writing top-level names to torchaudio.egg-info/top_level.txt reading manifest file 'torchaudio.egg-info/SOURCES.txt' writing manifest file 'torchaudio.egg-info/SOURCES.txt' Copying torchaudio.egg-info to build/bdist.linux-x86_64/wheel/torchaudio-0.10.0a0+9e36281-py3.6.egg-info running install_scripts adding license file "LICENSE" (matched pattern "LICEN[CS]E*") creating 
build/bdist.linux-x86_64/wheel/torchaudio-0.10.0a0+9e36281.dist-info/WHEEL creating 'dist/torchaudio-0.10.0a0+9e36281-cp36-cp36m-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it adding 'torchaudio/__init__.py' adding 'torchaudio/_torchaudio.so' adding 'torchaudio/kaldi_io.py' adding 'torchaudio/transforms.py' adding 'torchaudio/version.py' adding 'torchaudio/_internal/__init__.py' adding 'torchaudio/_internal/fft.py' adding 'torchaudio/_internal/module_utils.py' adding 'torchaudio/backend/__init__.py' adding 'torchaudio/backend/common.py' adding 'torchaudio/backend/no_backend.py' adding 'torchaudio/backend/soundfile_backend.py' adding 'torchaudio/backend/sox_io_backend.py' adding 'torchaudio/backend/utils.py' adding 'torchaudio/compliance/__init__.py' adding 'torchaudio/compliance/kaldi.py' adding 'torchaudio/datasets/__init__.py' adding 'torchaudio/datasets/cmuarctic.py' adding 'torchaudio/datasets/commonvoice.py' adding 'torchaudio/datasets/gtzan.py' adding 'torchaudio/datasets/librispeech.py' adding 'torchaudio/datasets/libritts.py' adding 'torchaudio/datasets/ljspeech.py' adding 'torchaudio/datasets/speechcommands.py' adding 'torchaudio/datasets/tedlium.py' adding 'torchaudio/datasets/utils.py' adding 'torchaudio/datasets/vctk.py' adding 'torchaudio/datasets/yesno.py' adding 'torchaudio/extension/__init__.py' adding 'torchaudio/extension/extension.py' adding 'torchaudio/functional/__init__.py' adding 'torchaudio/functional/filtering.py' adding 'torchaudio/functional/functional.py' adding 'torchaudio/models/__init__.py' adding 'torchaudio/models/conv_tasnet.py' adding 'torchaudio/models/deepspeech.py' adding 'torchaudio/models/wav2letter.py' adding 'torchaudio/models/wavernn.py' adding 'torchaudio/models/wav2vec2/__init__.py' adding 'torchaudio/models/wav2vec2/components.py' adding 'torchaudio/models/wav2vec2/model.py' adding 'torchaudio/models/wav2vec2/utils/__init__.py' adding 'torchaudio/models/wav2vec2/utils/import_fairseq.py' adding 
'torchaudio/models/wav2vec2/utils/import_huggingface.py' adding 'torchaudio/prototype/__init__.py' adding 'torchaudio/prototype/rnnt_loss.py' adding 'torchaudio/sox_effects/__init__.py' adding 'torchaudio/sox_effects/sox_effects.py' adding 'torchaudio/utils/__init__.py' adding 'torchaudio/utils/sox_utils.py' adding 'torchaudio-0.10.0a0+9e36281.dist-info/LICENSE' adding 'torchaudio-0.10.0a0+9e36281.dist-info/METADATA' adding 'torchaudio-0.10.0a0+9e36281.dist-info/WHEEL' adding 'torchaudio-0.10.0a0+9e36281.dist-info/top_level.txt' adding 'torchaudio-0.10.0a0+9e36281.dist-info/RECORD' removing build/bdist.linux-x86_64/wheel ``` </details> Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Test Plan: Imported from OSS Reviewed By: walterddr Differential Revision: D29316372 Pulled By: seemethere fbshipit-source-id: 02be64df6197c0d4bad5a5bfb3cef336c11f53ed	2021-06-23 14:08:19 -07:00
Bert Maher	10e11dbdcd	Reland D29190420: [nnc][tests] Tests and benchmarks for computeSum (#60550 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60550 Original commit changeset: ed655497a981 Whatever gcc version OSS Bazel uses wasn't happy move-constructing the SimpleIREvaluator, so use a unique_ptr instead. Test Plan: CI. Hope that the gcc version used by OSS Bazel build is happier with this (it should be), since actually testing it locally is an intractable pain. Reviewed By: navahgar Differential Revision: D29333116 fbshipit-source-id: c3e4b5d8c91eb96a43ae5315a01ca0c0f4d4a99d	2021-06-23 10:50:03 -07:00
Anjali Chourdia	b14f19b6fe	Revert D29190420: [nnc][tests] Tests and benchmarks for computeSum Test Plan: revert-hammer Differential Revision: D29190420 (`21479ad20c`) Original commit changeset: 86246df82098 fbshipit-source-id: ed655497a981783da4c8f13e2d7fec104e3cb184	2021-06-23 06:59:37 -07:00
Bert Maher	21479ad20c	[nnc][tests] Tests and benchmarks for computeSum (#60160 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60160 Adds a few simple tests and benchmarks for the `computeSum` op (equivalent to `at::sum`). The benchmarks test 1D reduction and 2D row and column reduction. Performance is in the ballpark of aten (14-15 GB/s) on my skylake devserver for all cases, and occasionally better (e.g. 256k * 64 row reduction goes from 9 GB/s to 13). Results (on my skylake-avx512, with turbo disabled): ``` ------------------------------------------------------------------------------------------ Benchmark Time CPU Iterations UserCounters... ------------------------------------------------------------------------------------------ Reduce1D/Torch/16777216 4746995 ns 4746722 ns 150 BYTES=14.1379G/s Reduce1D/Naive/16777216 34063215 ns 34061388 ns 21 BYTES=1.97023G/s Reduce1D/NativeRfactor/16777216 5057175 ns 5057167 ns 139 BYTES=13.2701G/s Reduce1D/TeNaive/16777216 33868945 ns 33868851 ns 21 BYTES=1.98143G/s Reduce1D/TeSplitTail/16777216 33902786 ns 33900436 ns 21 BYTES=1.97959G/s Reduce1D/TeSplitMask/16777216 33922509 ns 33920604 ns 21 BYTES=1.97841G/s Reduce1D/TeRfactorV1/16777216 5141150 ns 5141002 ns 135 BYTES=13.0537G/s Reduce1D/Op/16777216 5140390 ns 5140091 ns 135 BYTES=13.056G/s Reduce2DCol/Torch/8/2097152 12824403 ns 12823563 ns 55 BYTES=5.8874G/s Reduce2DCol/Torch/64/262144 8306873 ns 8306743 ns 83 BYTES=8.20507G/s Reduce2DCol/Torch/4096/4096 7992364 ns 7992239 ns 87 BYTES=8.3988G/s Reduce2DCol/OpSchedule/8/2097152/0 4866144 ns 4865766 ns 138 BYTES=15.5161G/s Reduce2DCol/OpSchedule/64/262144/0 36668978 ns 36666415 ns 19 BYTES=1.85885G/s Reduce2DCol/OpSchedule/4096/4096/0 155862459 ns 155801266 ns 4 BYTES=430.839M/s Reduce2DCol/OpSchedule/8/2097152/1 8067683 ns 8061117 ns 85 BYTES=9.36563G/s Reduce2DCol/OpSchedule/64/262144/1 7496686 ns 7496562 ns 93 BYTES=9.09183G/s Reduce2DCol/OpSchedule/4096/4096/1 5262821 ns 5262186 ns 131 
BYTES=12.7562G/s Reduce2DCol/OpSchedule/8/2097152/2 6237899 ns 6237210 ns 109 BYTES=12.1044G/s Reduce2DCol/OpSchedule/64/262144/2 5258012 ns 5257655 ns 127 BYTES=12.9635G/s Reduce2DCol/OpSchedule/4096/4096/2 5231686 ns 5228241 ns 132 BYTES=12.839G/s Reduce2DCol/OpSchedule/8/2097152/3 11088573 ns 11087557 ns 62 BYTES=6.80921G/s Reduce2DCol/OpSchedule/64/262144/3 5338843 ns 5338326 ns 127 BYTES=12.7676G/s Reduce2DCol/OpSchedule/4096/4096/3 4311617 ns 4308102 ns 162 BYTES=15.5812G/s Reduce2DRow/Torch/8/2097152 4642244 ns 4641794 ns 151 BYTES=14.4575G/s Reduce2DRow/Torch/64/262144 4628311 ns 4628245 ns 151 BYTES=14.4999G/s Reduce2DRow/Torch/4096/4096 4894012 ns 4893316 ns 143 BYTES=13.7177G/s Reduce2DRow/Torch/262144/64 10469098 ns 10468027 ns 68 BYTES=6.51101G/s Reduce2DRow/Hand/262144/64 5554380 ns 5554059 ns 126 BYTES=12.2716G/s Reduce2DRow/OpSchedule/8/2097152/0 33890363 ns 33888931 ns 21 BYTES=1.98026G/s Reduce2DRow/OpSchedule/64/262144/0 33901317 ns 33899436 ns 21 BYTES=1.97965G/s Reduce2DRow/OpSchedule/4096/4096/0 33500358 ns 33498815 ns 21 BYTES=2.00381G/s Reduce2DRow/OpSchedule/262144/64/0 13132231 ns 13131049 ns 53 BYTES=5.19056G/s Reduce2DRow/OpSchedule/8/2097152/1 5200423 ns 5200025 ns 134 BYTES=12.9055G/s Reduce2DRow/OpSchedule/64/262144/1 5204428 ns 5204327 ns 133 BYTES=12.8949G/s Reduce2DRow/OpSchedule/4096/4096/1 8724355 ns 8723370 ns 80 BYTES=7.69488G/s Reduce2DRow/OpSchedule/262144/64/1 1811861280 ns 1811352083 ns 1 BYTES=37.6279M/s Reduce2DRow/OpSchedule/8/2097152/2 9169829 ns 9168946 ns 76 BYTES=7.31915G/s Reduce2DRow/OpSchedule/64/262144/2 9159901 ns 9158560 ns 76 BYTES=7.32747G/s Reduce2DRow/OpSchedule/4096/4096/2 9217398 ns 9215557 ns 76 BYTES=7.28391G/s Reduce2DRow/OpSchedule/262144/64/2 10820450 ns 10818998 ns 66 BYTES=6.29979G/s Reduce2DRow/OpSchedule/8/2097152/3 5227921 ns 5226544 ns 133 BYTES=12.84G/s Reduce2DRow/OpSchedule/64/262144/3 5194362 ns 5194082 ns 133 BYTES=12.9203G/s Reduce2DRow/OpSchedule/4096/4096/3 5196080 ns 5195349 ns 134 
BYTES=12.9203G/s Reduce2DRow/OpSchedule/262144/64/3 5235189 ns 5234728 ns 133 BYTES=13.0202G/s ``` ghstack-source-id: 131753875 Test Plan: these tests Reviewed By: navahgar Differential Revision: D29190420 fbshipit-source-id: 86246df82098da4f5493d6c4f34a40016d95a9f0	2021-06-22 23:04:09 -07:00
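The `BYTES` column in the benchmark output above can be reproduced from the problem size and the CPU time. The sketch below is a reading of the numbers, not taken from the benchmark source: it assumes 4-byte float elements, that the counter covers both the input and the output buffer, and that `G/s` means 1e9 bytes per second. The function name is illustrative.

```python
# Assumptions (inferred from the reported numbers, not from the benchmark code):
# 4-byte elements, input and output bytes both counted, decimal giga (1e9).
BYTES_PER_ELEMENT = 4


def throughput_gb_per_s(input_elements: int, output_elements: int, cpu_time_ns: float) -> float:
    """Return bytes moved per second, in units of 1e9 bytes/s."""
    total_bytes = (input_elements + output_elements) * BYTES_PER_ELEMENT
    seconds = cpu_time_ns * 1e-9
    return total_bytes / seconds / 1e9


# Reduce1D/Torch/16777216: 16777216 inputs reduced to 1 output, CPU time 4746722 ns.
reduce_1d = throughput_gb_per_s(16777216, 1, 4746722)

# Reduce2DRow/Torch/262144/64: 262144 * 64 inputs, 262144 outputs, CPU time 10468027 ns.
reduce_2d_row = throughput_gb_per_s(262144 * 64, 262144, 10468027)

print(f"{reduce_1d:.4f} G/s, {reduce_2d_row:.5f} G/s")
```

Under these assumptions the two values agree with the reported `BYTES=14.1379G/s` and `BYTES=6.51101G/s` rows.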
Jiakai Liu	b0c9762e2d	[pytorch][nnc] external function call to xnnpack ops (#59525 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59525 This PR added NNC external function call binding for two XNNPack ops: - prepacked::linear_clamp_run - prepacked::conv2d_clamp_run Both ops take two arguments: a regular input tensor and a prepacked context object that contains other parameters like weights/bias/etc. The prepacked context object's type is a custom class. NNC doesn't generate assembly code that reads the content of the prepacked object directly. It simply passes it into the XNNPack ops wrapper, so both NNC and the generated assembly code don't need to know the custom class type. At compilation time, we use a size-1 dummy tensor as the placeholder for the prepacked XNNPack context object. At runtime, we pass in the raw pointer of the XNNPack context object as if it were a regular tensor storage data pointer. Inside the external function call wrapper, we reinterpret_cast the raw pointer back to the custom class type before dispatching to the XNNPack ops. ghstack-source-id: 132135512 Test Plan: unit test Reviewed By: bertmaher Differential Revision: D28924934 fbshipit-source-id: 15326b35dc6c022f4c3f247a2037c361e06e80b4	2021-06-22 21:29:31 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	fca931d181	List striding with arbitrary step size (#58537 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58537 Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D28531721 Pulled By: tugsbayasgalan fbshipit-source-id: 8c8ed32ca00366603bfb5086e87dfa62736ff4b2	2021-06-22 11:25:23 -07:00
Michael Dagitses	91451369ed	require non-empty inputs to grad() calls in the API (#52016 ) Summary: The grad() function needs to return the updated values, and hence needs non-empty inputs to populate. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52016 Test Plan: Passes Python and C++ unit tests, and added new tests to catch this behavior. Fixes https://github.com/pytorch/pytorch/issues/47061 Reviewed By: albanD Differential Revision: D26406444 Pulled By: dagitses fbshipit-source-id: 023aeca9a40cd765c5bad6a1a2f8767a33b75a1a	2021-06-22 10:10:58 -07:00
Hariom Narang	9d1d799034	Added API to change logging levels for JIT (#58821 ) Summary: Description: - Before this, logging level could only be changed by changing the env variable "PYTORCH_JIT_LOG_LEVEL" - Can change the level from python now - Have not added stream configuration for now - Configuration is stored in a singleton class managing the options Issue Link: https://github.com/pytorch/pytorch/issues/54188 Gotchas: - Created separate functions `::torch::jit::get_jit_logging_levels/set_jit_logging_levels` instead of using the singleton class's method directly - This is because when running test cases, two different instances of the singleton are created for the test suite and the actual code (`jit_log.cpp`) - On using these methods directly, `is_enabled` calls the singleton in `jit_log.cpp` while we are setting the config using another singleton - See: https://stackoverflow.com/questions/55467246/my-singleton-can-be-called-multiple-times API: - To set the level: `torch._C._jit_set_logging_option("level")` - To get the level: `torch._C._jit_get_logging_option()` Testing: - UTs were added for C++ - A very simple UT was added for python to just check if the API is being called correctly - The API was checked by running trace in a sample python file - Set env variable to "" and used `_jit_set_logging_option` in python to set the variable to `>dead_code_elimination` - The error output had logs of form [DUMP..] [UPDATE...] etc Fixes https://github.com/pytorch/pytorch/issues/54188 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58821 Reviewed By: soulitzer Differential Revision: D29116712 Pulled By: ZolotukhinM fbshipit-source-id: 8f2861ee2bd567fb63b405953d035ca657a3200f	2021-06-21 16:10:49 -07:00
Thomas J. Fan	c16f87949f	ENH Adds nn.ReflectionPad3d (#59791 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/27655 This PR adds a C++ and Python version of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to better share code from the backward and forward pass. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791 Reviewed By: gchanan Differential Revision: D29242015 Pulled By: jbschlosser fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56	2021-06-21 10:53:14 -07:00
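Reflection padding mirrors the input around its edge elements without repeating the edge itself. The following is a minimal one-dimensional sketch of that index mapping in plain Python. It is not the structured-kernel implementation from the PR; a 3D version would apply the same mapping independently along each of the three padded axes. The helper names are illustrative.

```python
def reflect_index(i: int, size: int) -> int:
    """Map an index in [-(size-1), 2*(size-1)] onto [0, size) by mirroring at the edges."""
    if i < 0:
        i = -i                      # mirror around the first element
    if i >= size:
        i = 2 * (size - 1) - i      # mirror around the last element
    return i


def reflection_pad_1d(values, pad_left: int, pad_right: int):
    """Return a new list padded on both sides by reflecting the input."""
    size = len(values)
    if pad_left >= size or pad_right >= size:
        raise ValueError("padding must be smaller than the input length")
    return [values[reflect_index(i, size)] for i in range(-pad_left, size + pad_right)]


padded = reflection_pad_1d([1, 2, 3, 4], 2, 2)
print(padded)  # [3, 2, 1, 2, 3, 4, 3, 2]
```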
Raghavan Raman	d0c4ace00f	[jit] Added a transformation to move consumers of aten::cat to its inputs, in the fused subgraphs (#59580 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59580 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D28955318 Pulled By: navahgar fbshipit-source-id: 7504d5aea441920f4eb9234cdfa17077161ab13c	2021-06-18 14:32:07 -07:00
Luca Wehrstedt	08ce5eedf5	[reland] Move RPC agents to libtorch (#60170 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60170 Reland of #59939. Test Plan: CI Reviewed By: mrshenli Differential Revision: D29193234 fbshipit-source-id: ee2a90d6be961c10f91361512bdd4cadca43dd60	2021-06-18 05:15:09 -07:00
Bin Bao	3dc8112187	[NNC] Handle int64 indices and loop bounds (#59769 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59769 Allow loop bounds and tensor indices to be either int32 or int64, and avoid unnecessary cast ops. Test Plan: ``` build/bin/test_tensorexpr ``` Reviewed By: H-Huang Differential Revision: D29173970 Pulled By: desertfire fbshipit-source-id: 859a876ddb1b41535b2266089aa1222884295c78	2021-06-17 09:35:59 -07:00
Mikhail Zolotukhin	eb36f67dcc	[TensorExpr] Minor cleanup in TensorExprKernel::computeValue (#60041 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60041 Differential Revision: D29146709 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: 49ac919c18f669d7fda1a26c5a74e62ea752df4f	2021-06-17 01:23:24 -07:00
Mike Ruberry	f233274f30	Revert D28875276: Move RPC agents to libtorch Test Plan: revert-hammer Differential Revision: D28875276 (`fc50f91929`) Original commit changeset: f2f6970fd74d fbshipit-source-id: 3c52af652579733ebea8ddfb06576a0ce262bf78	2021-06-17 00:48:58 -07:00
Hao Lu	eda2ddb5b0	[ATen] Fix aten::to schema (#60001 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60001 Fix the aten::to schema to reflect that the output may alias input. Test Plan: Added new unit tests. Reviewed By: ezyang Differential Revision: D29121620 fbshipit-source-id: c29b6aa22d367ffedf06e47116bc46b3e188c39c	2021-06-15 20:04:20 -07:00
Brian Hirsh	27a3204982	generate C++ API for meta functions using at::meta:: (#58570 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58570 What the PR does Generate a fast-path `at::meta::{op}` API for calling meta functions without having to go through the dispatcher. This will be important for perf for external backends that want to use meta functions for shape checking (which seems likely to be what we end up doing for LazyTensorCore). Details In order to avoid naming collisions I had to make two small changes: - rename `MetaFunctions.h` template -> `NativeMetaFunctions.h` (this is the file that declares the impl() function for every structured operator). - rename the meta class: `at::meta::{op}::meta()` -> `at::meta::structured_{op}::meta()` I also deleted a few unnecessary includes, since any file that includes NativeFunctions.h will automatically include NativeMetaFunctions.h. Why I made the change This change isn't actually immediately used anywhere; I already started writing it because I thought it would be useful for structured composite ops, but that isn't actually true (see [comment](https://github.com/pytorch/pytorch/pull/58266#issuecomment-843213147)). The change feels useful and unambiguous though so I think it's safe to add. I added explicit tests for C++ meta function calls just to ensure that I wrote it correctly - which is actually how I hit the internal linkage issue in the PR below this in the stack. Test Plan: Imported from OSS Reviewed By: pbelevich Differential Revision: D28711299 Pulled By: bdhirsh fbshipit-source-id: d410d17358c2b406f0191398093f17308b3c6b9e	2021-06-15 16:54:46 -07:00
Luca Wehrstedt	fc50f91929	Move RPC agents to libtorch (#59939 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59939 Test Plan: CI Reviewed By: mrshenli Differential Revision: D28875276 fbshipit-source-id: f2f6970fd74de5f112636e78edaa4410c61d8c45	2021-06-15 16:20:53 -07:00
Raghavan Raman	b822928e33	[nnc] Removed setGPUBlockIndex and setGPUThreadIndex methods from LoopNest (#59495 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59495 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D28915960 Pulled By: navahgar fbshipit-source-id: 20a4032b031aba6e43d85433ade5f0680c65fbc0	2021-06-15 10:37:46 -07:00
Raghavan Raman	aa163aeff5	[nnc] Made several LoopNest APIs static (#59494 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59494 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D28915959 Pulled By: navahgar fbshipit-source-id: bf52e30d893f4d86812219b538a14307f347f10b	2021-06-15 10:36:31 -07:00
Luca Wehrstedt	a1780432fa	Move c10d to libtorch(_cuda) (#59563 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59563 ghstack-source-id: 131331264 Test Plan: CI Reviewed By: malfet Differential Revision: D28932239 fbshipit-source-id: 5df6cdfa5253b15cbbc97039fe672d6d97321e34	2021-06-15 02:01:31 -07:00
Raghavan Raman	b83ac0cc4e	[nnc] Added a check to vectorize only those loops that are normalized. (#59423 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59423 Test Plan: Imported from OSS Reviewed By: huiguoo Differential Revision: D28886979 Pulled By: navahgar fbshipit-source-id: edfc61feaf5efe22d4f367ac718b83b3d0f47cb3	2021-06-11 12:03:34 -07:00
Raghavan Raman	30e24b2d2b	[nnc] Modified vectorize API to return bool (#59422 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59422 Test Plan: Imported from OSS Reviewed By: huiguoo Differential Revision: D28886980 Pulled By: navahgar fbshipit-source-id: 58cc3ecd86564a312a132f8260d836b096505095	2021-06-11 12:02:19 -07:00
Rohan Varma	d433a55c94	Replace throw std::runtime_error with torch_check in torch/csrc/distributed (#59683 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59683 Replaces usages of throw std::runtime_error("foo") with the better torch_check(false, "foo") which allows C++ stacktraces to show up when TORCH_SHOW_CPP_STACKTRACES=1. This will hopefully provide much better debugging information when debugging crashes/flaky tests. ghstack-source-id: 131167210 Test Plan: CI Reviewed By: cbalioglu Differential Revision: D28981327 fbshipit-source-id: 677f569e28600263cab18759eb1b282e0391aa7b	2021-06-11 11:15:49 -07:00
Kimish Patel	2ce21b2e61	[Pytorch backend delegation] Preprocess to accept (#58873 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58873 BackendDebugInfoRecorder Prior to this PR: In order to generate debug handles corresponding to the graph being lowered, the backend's preprocess calls generate_debug_handles and gets a map of Node*-to-debug_handles. To facilitate this, to_backend owns a BackendDebugInfoRecorder and initializes a thread-local pointer to it. The generate_debug_handles function queries the thread-local pointer to see if there is a valid BackendDebugInfoRecorder for the context. If there is, it generates debug handles. After this PR: The signature of preprocess is changed such that backends have to register a preprocess that accepts an instance of BackendDebugInfoRecorder by reference. generate_debug_handles is no longer a free function but becomes part of the API of BackendDebugInfoRecorder. The backend's preprocess function now calls generate_debug_handles on BackendDebugInfoRecorder instead of the free function. Reason for this change: Using RAII to initialize a thread-local pointer results in a loose contract with backends, which may result in backends not storing debug information. Making it part of the API means backends have to be aware of BackendDebugInfoRecorder and must explicitly choose not to generate/store debug information if that is what they want. Test Plan: backend tests Imported from OSS Reviewed By: jbschlosser, raziel Differential Revision: D28648613 fbshipit-source-id: c9b7e7bf0f78e87023ea7bc08612cf893b08cb98	2021-06-11 10:16:00 -07:00
Mikhail Zolotukhin	daa35141e8	Reland: "[TensorExpr] Fix handling of 0-dim tensors." (#59508 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59508 An assert that was triggering in a previous version is now relaxed to take 0-dim tensors into account. Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D28918342 Pulled By: ZolotukhinM fbshipit-source-id: c09b62c9725d1603b0ec11fcc051e7c932af06ae	2021-06-08 22:48:17 -07:00
Jeffrey Wan	f52e202840	Add warning when accessing Tensor::grad() in the C++ API (#59362 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/35379 - Adds `retains_grad` attribute backed by cpp as a native function. The python bindings for the function are skipped to be consistent with `is_leaf`. - Tried writing it without native function, but the jit test `test_tensor_properties` seems to require that it be a native function (or alternatively maybe it could also work if we manually add a prim implementation?). - Python API now uses `retain_grad` implementation from cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/59362 Reviewed By: jbschlosser Differential Revision: D28969298 Pulled By: soulitzer fbshipit-source-id: 335f2be50b9fb870cd35dc72f7dadd6c8666cc02	2021-06-08 19:43:21 -07:00
Jeffrey Wan	1733d10399	Warn when backward() is called with create_graph=True (#59412 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/4661 - Add warnings in engine's `execute` function so it can be triggered through both cpp and python codepaths - Adds an RAII guard version of `c10::Warning::set_warnAlways` and replaces all prior usages of the set_warnAlways with the new one Pull Request resolved: https://github.com/pytorch/pytorch/pull/59412 Reviewed By: jbschlosser Differential Revision: D28969294 Pulled By: soulitzer fbshipit-source-id: b03369c926a3be18ce1cf363b39edd82a14245f0	2021-06-08 17:19:04 -07:00
Can Balioglu	cf408c3743	[1/n] [c10d] Introduce a new TCPStore constructor (#58328 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58328 This PR is part of a stack that addresses the GitHub issue #41614; it introduces a new `TCPStore` constructor that takes its optional parameters via a newly introduced `TCPStoreOptions` structure. This gives the API callers the flexibility to specify only the desired options while skipping the rest. The main motivation behind this change is the introduction of the `multiTenant` constructor option in the second PR of this stack. ghstack-source-id: 130676384 Test Plan: Run the existing tests since there are no behavioral changes. Reviewed By: H-Huang Differential Revision: D28417742 fbshipit-source-id: e6ac2a057f7ad1908581176ee6d2c2554c3c74a9	2021-06-05 07:50:02 -07:00
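The options-structure pattern described in the commit above lets callers set only the fields they care about and leave the rest at their defaults. Below is a rough Python analogue of that idea. `StoreOptions`, `Store`, and all field names and default values are hypothetical stand-ins for illustration and are not the actual C++ `TCPStoreOptions` definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreOptions:
    """Hypothetical options bundle; every field has a default."""
    port: int = 29500              # illustrative default
    is_server: bool = False
    timeout_seconds: float = 300.0
    multi_tenant: bool = False


class Store:
    """Hypothetical store whose constructor takes a host plus one options object."""

    def __init__(self, host: str, opts: Optional[StoreOptions] = None):
        self.host = host
        # Fall back to an all-defaults options object when none is given.
        self.opts = opts if opts is not None else StoreOptions()


# The caller names only the option it wants to change; the rest keep their defaults.
store = Store("localhost", StoreOptions(multi_tenant=True))
print(store.opts)
```

Adding a new option later only extends the options class, so existing call sites keep working unchanged.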
Nikita Shulga	ba3a90b55e	Revert D28819780: [TensorExpr] Fix handling of 0-dim tensors. Test Plan: revert-hammer Differential Revision: D28819780 Original commit changeset: f3feff35a1ce fbshipit-source-id: 1dca4ac9cea0b67e9f02800f6d5b3c7e4ae1d81a	2021-06-04 19:25:30 -07:00
David Reiss	a682ff7ef1	Add kMaxSupportedBytecodeVersion for Lite Interpreter (#59472 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59472 Previously, the lite interpreter would refuse to load any model with a version greater than kProducedBytecodeVersion. Now, we're able to independently advance the loading and saving code, so we can roll out changes without breaking forward compatibility. Test Plan: CI. Loaded a bytecode v5 model even with setting kProducedBytecodeVersion to v4. Reviewed By: raziel Differential Revision: D28904350 fbshipit-source-id: 598c22f0adf47d4ed3e976bcbebdf3959dacb1df	2021-06-04 17:55:02 -07:00
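The change above decouples the version the saving code writes from the newest version the loader accepts. A minimal sketch of the before and after checks follows. The constant values mirror the test plan's example (produce v4, load v5) and are illustrative only, as are the function names.

```python
# Illustrative values only: the writer emits v4 while the loader already understands v5.
PRODUCED_BYTECODE_VERSION = 4
MAX_SUPPORTED_BYTECODE_VERSION = 5


def can_load_before(model_version: int) -> bool:
    """Old rule: reject anything newer than what this build itself produces."""
    return model_version <= PRODUCED_BYTECODE_VERSION


def can_load_after(model_version: int) -> bool:
    """New rule: the loader has its own, independently advanced upper bound."""
    return model_version <= MAX_SUPPORTED_BYTECODE_VERSION


for version in (4, 5, 6):
    print(version, can_load_before(version), can_load_after(version))
```

With the separate bound, the loading side can be shipped first and the producing side bumped later, once deployed readers already accept the new version.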
Mikhail Zolotukhin	d60efd8207	[TensorExpr] Fix handling of 0-dim tensors. (#59279 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59279 There were some issues with how we handle 0-dim cases in lowerings and also in how we generate reductions in that special case. This PR fixes those issues and reenables a bunch of tests. Differential Revision: D28819780 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: f3feff35a1ce11821ada2f8d04ae9d4be10dc736	2021-06-04 13:58:15 -07:00
Jeffrey Wan	4ae5764d47	Add is_inference to native functions (#58729 ) Summary: Adds `is_inference` as a native function w/ manual cpp bindings. Also changes instances of `is_inference_tensor` to `is_inference` to be consistent with other properties such as `is_complex`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58729 Reviewed By: mruberry Differential Revision: D28874507 Pulled By: soulitzer fbshipit-source-id: 0fa6bcdc72a4ae444705e2e0f3c416c1b28dadc7	2021-06-04 08:59:11 -07:00
Luca Wehrstedt	8f4cfaa9db	Fix race condition in TP agent (#58753 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58753 TSAN was (rightfully!) detecting and complaining about a race due to the fact that upon init the TP agent exchanges the device maps between nodes using RPC requests (and by doing so it accesses the device maps) and then sets the reverse device maps (thus possibly modifying the set of devices). This resulted in a data race, i.e., simultaneously reading and writing the set of devices without synchronizing. One solution is to add a mutex around the devices, which works, but is "annoying". An alternative solution is to make the set of devices immutable (i.e., `const`). For that to work, we need to exchange the device maps without using RPC calls. We can do so using the process group that we need to create anyways. Since now there's a lot more logic in Python, I've moved (and restructured) all safety checks over there, and removed them from C++. ghstack-source-id: 130583775 Test Plan: Unit tests Reviewed By: mrshenli Differential Revision: D28603754 fbshipit-source-id: 88533e65d72d1eb806dc41bec8d55def5082e290	2021-06-04 06:53:42 -07:00
johnlu	db90533b9e	Make JIT not assume that the device is CUDA. (#54238 ) Summary: Decouple the JIT argument spec and shape analysis with CUDA. Pull Request resolved: https://github.com/pytorch/pytorch/pull/54238 Reviewed By: ngimel Differential Revision: D28802085 Pulled By: Krovatkin fbshipit-source-id: 4068c9460cdec2d80733f001ca90ea3f5e6d3a7e	2021-06-03 22:21:27 -07:00
Hui Guo	7c4ac9e3ee	[NNC] Fix loopnest.cache_accesses for reduce ops (fixed #59002 ) (#59136 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59136 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D28768598 Pulled By: huiguoo fbshipit-source-id: 99ab8430bc0ba395e2a041b03a7761de335ddda5	2021-06-03 21:04:14 -07:00
Bin Bao	add291cf66	[JIT] Add a phase to perform inplace<->functional conversion for activation operators (#57477 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57477 Currently the conversion only deals with activation operators. The legality check is somewhat strict for now. Test Plan: ``` python test/test_jit.py -k test_functional_to_inplace_activation python test/test_jit.py -k test_inplace_to_functional_activation ``` Reviewed By: mrshenli Differential Revision: D28155153 Pulled By: desertfire fbshipit-source-id: df092830c4dff3ce9578ff76285eb7a566b7d81b	2021-06-03 06:43:23 -07:00
Luca Wehrstedt	3a2149a4ce	[reland] Make TP agent use streams from Future when sending response (#59212 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59212 Reland of https://github.com/pytorch/pytorch/pull/58428 Until now, the TP agent expected the output of a remote function to be on the same streams as the inputs. In other words, it used the lazy stream context of the inputs to synchronize the output tensors. This was true in the most common case of a synchronous remote function. However it wasn't true for async functions, for fetching RRefs, ... The more generic way is to use the CUDA events held by the Future to perform this synchronization. (These events may be on the input streams, or they may not be!). ghstack-source-id: 130202842 Test Plan: CI Reviewed By: mrshenli Differential Revision: D28623885 fbshipit-source-id: 29333bcb75d077ab801eac92017d0e381e8f5569	2021-06-02 05:46:05 -07:00
Luca Wehrstedt	5ec169b4c3	[reland] Always use intrusive_ptr for Message (1 out of 2) (#59205 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59205 Reland of https://github.com/pytorch/pytorch/pull/58422 Similar to Future (which I tackled recently), Message is an ivalue type (a "custom class" one), and the natural way to represent it is inside an intrusive_ptr. However in the RPC code we had a mix of usages, often passing Message by value. This has undesirable consequences, as it could easily trigger a copy by accident, which I believe is why in many places we accepted _rvalue references_ to Message, in order to force the caller to move. In my experience this is non-idiomatic in C++ (normally a function signature specifies how the function consumes its arguments, and it's up to the caller to then decide whether to copy or move). By moving to intrusive_ptr everywhere I think we eliminate and simplify many of the problems above. In this PR I do half of the migration, by updating everything except the `toMessageImpl` methods, which will come in the next PR. ghstack-source-id: 130202849 Test Plan: CI Reviewed By: mrshenli Differential Revision: D28623891 fbshipit-source-id: c9aeea3440679a11741ca78c06b03c57cb815a5e	2021-06-02 05:44:49 -07:00
Joel Schlosser	ef32a29c97	Back out "[pytorch][PR] ENH Adds dtype to nn.functional.one_hot" (#59080 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59080 Original commit changeset: 3686579517cc Test Plan: None; reverting diff Reviewed By: albanD Differential Revision: D28746799 fbshipit-source-id: 75a7885ab0bf3abadde9a42b56d479f71f57c89c	2021-05-27 15:40:52 -07:00
Bert Maher	617b74aa35	[nnc] LLVMCodeGen for any target (#58713 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58713 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D28585722 Pulled By: bertmaher fbshipit-source-id: 82885b9780dc1a8610660a90969d8d2baad97920	2021-05-27 09:25:15 -07:00
Scott Wolchok	de22657e1c	[PyTorch] Replace RecordFunction shouldRun callback with atomic bools (#56504 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56504 Having callbacks registered but disabled via their `shouldRun` callback defeats the `shouldRunRecordFunction` optimization (no relation between the two things, despite the shared prefix on the names) that aims to skip `RecordFunction` construction. This diff attempts to safely rectify this issue: we drop support for `shouldRun` callbacks (this is bc-breaking; does anything use these externally? do I need to add the support back and just stop using it internally?), add support for enabling and disabling callbacks, and (for global callbacks) make doing so thread-safe. There is an interesting subtlety with `std::atomic` that came up: it is neither copyable nor movable, which precludes putting it into `std::vector`. I manually overrode this because the thread safety reasons it is neither copyable nor movable don't apply here; we already state that adding or removing callbacks (the operations that might copy/move an atomic) is not thread-safe and should be done at initialization time. ghstack-source-id: 129614296 Test Plan: Existing CI should cover correctness, right? Inspected perf report of a simple benchmark that runs nn.Linear in a loop on CUDA, where we internally have Kineto initialized and thus had a shouldRun observer previously; we are no longer going through the dispatcher's slow RecordFunction path or spending measurable time constructing RecordFunction instances. Reviewed By: ilia-cher Differential Revision: D27834944 fbshipit-source-id: 93db1bc0a28b5372f7307490c908457e7853fa92	2021-05-26 14:31:33 -07:00
Chen Lai	9ba9a16700	[PyTorch Edge] Use stream as backport_vi_to_vi-1 interface (#58790 ) Summary: Two main changes: 1. Change the arguments of the collection of backport_v{i}_to_v{i-1} functions from (reader, writer) to (input_model_stream, output_model_stream), so it's easier to backport a model in option 2. > 2) [Both format and content change] Use torch.jit.load() to load the stream, and save it to output_model_stream. 2. Fix an issue in the test `backportAllVersionCheck`. Previously it declared `std::ostringstream oss` and used `oss.clear()` to reset the stringstream. However, `clear()` only resets the error-state flags, not the stream content, which leads to a corrupted stream. As a mitigation, checks are added to guard against a corrupted stream in each iteration of the while loop. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58790 ghstack-source-id: 129929960 Test Plan: CI ``` buck test mode/dev //caffe2/test/cpp/jit:jit ``` Reviewed By: raziel, iseeyuan Differential Revision: D28620961 fbshipit-source-id: b0cbe0e88645ae278eb3999e2a84800702b5f985	2021-05-26 02:07:46 -07:00
Chen Lai	60af6e928a	[PyTorch Edge][Version] Fix torchscript model after backport (#58892 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58892 The torchscript model after backport misses the `constants` archive. Add it back, and extend the unit test to run torchscript part. ghstack-source-id: 129853819 Test Plan: ``` buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions' ``` Reviewed By: raziel, iseeyuan Differential Revision: D28664507 fbshipit-source-id: 5f98723231cc64ed203c062ee6f00d8adbdccf77	2021-05-25 15:36:56 -07:00
Kimish Patel	ede3f5421f	[Pytorch Delegated Backend] Save function name in debug info (#57481 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57481 This diff adds the function name to InlinedCallStack. Since we use InlinedCallStack for debug information in the lite interpreter as well as in delegate backends, where InlinedCallStack cannot be constructed from model source code, we need to save the function name. In the absence of a saved function name, Function* is used to get the name of the function; this is the case when JIT compiles code at runtime. When that is not possible, this diff introduces a way to obtain the function name. Test Plan: test_backend test_cs_debug_info_serialization Imported from OSS Differential Revision: D28159097 Reviewed By: raziel, ZolotukhinM Pulled By: kimishpatel fbshipit-source-id: deacaea3325e27273f92ae96cf0cd0789bbd6e72	2021-05-25 13:19:02 -07:00
Kimish Patel	813adf1076	[Pytorch Delegated Backend] Save operator name and function name in debug info (#57441 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57441 Previous diffs did not save the operator name in debug info. For delegated backends that only identify an op for profiling by its debug handle, the operator name should be stored as well. Furthermore, to complete the debug information, also serialize the function name. Test Plan: Existing lite interpreter and backend tests Imported from OSS Differential Revision: D28144581 Reviewed By: raziel Pulled By: kimishpatel fbshipit-source-id: 415210f147530a53b444b07f1d6ee699a3570d99	2021-05-25 13:17:54 -07:00
Raghavan Raman	dd7bbe1a63	[NNC] Make splitWithMask transform in-place (#58269 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58269 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D28427227 Pulled By: navahgar fbshipit-source-id: 4e38a436abcf4752fd7ef6ab3666876eec6ea5ba	2021-05-25 11:32:51 -07:00
Raghavan Raman	e2467cc43e	[NNC] Make splitWithTail transform in-place (#58268 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58268 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D28427228 Pulled By: navahgar fbshipit-source-id: 270b62c4e83739ad21dd68f375120e56881b394f	2021-05-25 11:31:14 -07:00
Adnios	09a8f22bf9	Add mish activation function (#58648 ) Summary: See issue: https://github.com/pytorch/pytorch/issues/58375 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58648 Reviewed By: gchanan Differential Revision: D28625390 Pulled By: jbschlosser fbshipit-source-id: 23ea2eb7d5b3dc89c6809ff6581b90ee742149f4	2021-05-25 10:36:21 -07:00
Zhengxu Chen	2b0ec9c3cf	Reapply "[jit] Implement ScriptProfile to collect instruction profiles." (#58783 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58783 This reverts commit `fc804b5def`. Test Plan: Imported from OSS Reviewed By: gmagogsfm Differential Revision: D28617037 Pulled By: zhxchen17 fbshipit-source-id: 645de2ede20500a5c218d6ec3c7faae94de37a14	2021-05-24 18:23:21 -07:00
Thomas J. Fan	a7f4f80903	ENH Adds dtype to nn.functional.one_hot (#58090 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/33046 Related to https://github.com/pytorch/pytorch/issues/53785 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58090 Reviewed By: zou3519 Differential Revision: D28640893 Pulled By: jbschlosser fbshipit-source-id: 3686579517ccc75beaa74f0f6d167f5e40a83fd2	2021-05-24 13:48:25 -07:00
Jacob Szwejbka	1c5f63d86d	[Pytorch Edge] Model Ops compatibility api (#57501 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57501 Add an api _get_model_ops_and_info to get root operators and versioning info of a model in both cxx and python, and the input can be from a file path or buffer. ghstack-source-id: 129620112 Test Plan: unit test. Reviewed By: xcheng16, raziel Differential Revision: D28162765 fbshipit-source-id: 4413c1e906b8a872e4a717d849da37347adbbea4	2021-05-24 12:00:06 -07:00
Kimish Patel	d6d726f781	[Pytorch Backend delegation] Add api for backend lowering to query debug handles and symbolicate exception callstack thrown from backend (#55462 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55462 The objective of this diff is to improve error reporting when exceptions are raised from a lowered backend. We would effectively like to get the same model-level stack trace that you would get without having lowered some module to a backend. For example: ``` class AA(nn.Module): def forward(self, x, y): return x + y class A(nn.Module): def __init__(...): self.AA0 = AA() def forward(self, x, y): return self.AA0.forward(x, y) + 3 class B(nn.Module): def forward(self, x): return x + 2 class C(nn.Module): def __init__(...): self.A0 = A() self.B0 = B() def forward(self, x, y): return self.A0.forward(x, y) + self.B0.forward(x) ``` If we then do C().forward(torch.rand((2,3)), torch.rand(14,2)) we will likely see an error stack like: ``` C++ exception with description "The following operation failed in the TorchScript interpreter. Traceback of TorchScript (most recent call last): File "<string>", line 3, in forward def forward(self, x, y): return self.A0.forward(x, y) + self.B0.forward(x) ~~~~~~~~~~~~~~~ <--- HERE File "<string>", line 3, in forward def forward(self, x, y): return self.AA0.forward(x, y) + 3 ~~~~~~~~~~~~~~~~ <--- HERE File "<string>", line 3, in forward def forward(self, x, y): return x + y ~~~~~ <--- HERE ``` We would like to see the same error stack if we lowered C.A0 to some backend. 
With this diff we get something like: ``` Module hierarchy:top(C).A0(backend_with_compiler_demoLoweredModule).AA0(AA) Traceback of TorchScript (most recent call last): File "<string>", line 3, in FunctionName_UNKNOWN def forward(self, x, y): return self.A0.forward(x, y) + self.B0.forward(x) ~~~~~~~~~~~~~~~ <--- HERE File "<string>", line 5, in FunctionName_UNKNOWN typed_inputs: List[Any] = [x, y, ] if self.__backend.is_available() : _0, = self.__backend.execute(self.__handles["forward"], typed_inputs) ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE assert isinstance(_0, Tensor) return _0 File "<string>", line 3, in FunctionName_UNKNOWN def forward(self, x, y): return self.AA0.forward(x, y) + 3 ~~~~~~~~~~~~~~~~ <--- HERE File "<string>", line 3, in FunctionName_UNKNOWN def forward(self, x, y): return x + y ~~~~~ <--- HERE ``` This is achieved in 3 parts: Part 1: A. BackendDebugInfoRecorder: During backend lowering, in `to_backend`, before calling the preprocess function corresponding to the backend. This will facilitate recording of debug info (such as source range + inlined callstack) for the lowered module. B. Instantiate WithBackendDebugInfoRecorder with BackendDebugInfoRecorder. This initializes thread local pointer to BackendDebugInfoRecorder. C. generate_debug_handles: In preprocess function, the backend will call generate_debug_handles for each method being lowered separately. generate_debug_handles takes `Graph` of the method being lowered and returns a map of Node*-to-debug_handles. Backend is responsible for storing debug handles appropriately so as to raise exception (and later profiling) using debug handles when the exception being raised corresponds to particular Node that was lowered. Inside generate_debug_handles, we will query the current BackendDebugHandleInfoRecorder, that is issuing debug handles. This debug handle manager will issue debug handles as well as record debug_handles-to-<source range, inlined callstack> map. D. 
Back in `to_backend`, once the preprocess function has finished lowering the module, we will call `stopRecord` on BackendDebugInfoRecorder. This will return the debug info map. This debug info is then stored inside the lowered module. Part 2: Serialization: During serialization for bytecode (lite interpreter), we will do two things: 1. Extract all the source ranges that are contained inside the debug_handles-to-<source range, inlined callstack> map for the lowered module. These will be the source ranges corresponding to debug handles, including those in the inlined callstack. Since we replaced the original module with the lowered module, we won't be serializing code for the original module and thus no source range. That is why the source ranges will have to be stored separately. We will lump all the source ranges for all the lowered modules in one single debug_pkl file. 2. Then we will serialize the debug_handles-to-<source range, inlined callstack> map. Now during deserialization we will be able to reconstruct the debug_handles-to-<source range, inlined callstack> map. Given that all debug_handles are unique, we would not need any module information. Test Plan: Tests are added in test_backend.cpp Imported from OSS Differential Revision: D27621330 Reviewed By: raziel Pulled By: kimishpatel fbshipit-source-id: 0650ec68cda0df0a945864658cab226a97ba1890	2021-05-22 08:33:07 -07:00
Xiaodong Wang	4c961beacb	Revert D28474878: Always use intrusive_ptr for Message (1 out of 2) Test Plan: revert-hammer Differential Revision: D28474878 (`4d704e607d`) Original commit changeset: 5b76d45e05f6 fbshipit-source-id: 677c5bc7f02dca23213f778eb0e626a2f6600f3b	2021-05-21 19:24:22 -07:00
Xiaodong Wang	b8a04e25ec	Revert D28474982: Make TP agent use streams from Future when sending response Test Plan: revert-hammer Differential Revision: D28474982 (`19a7472702`) Original commit changeset: c0034eb3f2a2 fbshipit-source-id: fb260c71e6c9dd5a2c44121fe4729a4f4418532b	2021-05-21 19:23:01 -07:00
Luca Wehrstedt	19a7472702	Make TP agent use streams from Future when sending response (#58428 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58428 Until now, the TP agent expected the output of a remote function to be on the same streams as the inputs. In other words, it used the lazy stream context of the inputs to synchronize the output tensors. This was true in the most common case of a synchronous remote function. However it wasn't true for async functions, for fetching RRefs, ... The more generic way is to use the CUDA events held by the Future to perform this synchronization. (These events may be on the input streams, or they may not be!). ghstack-source-id: 129567045 Test Plan: CI Reviewed By: mrshenli Differential Revision: D28474982 fbshipit-source-id: c0034eb3f2a2ea525efb63a31b839bc086060e7e	2021-05-21 13:15:35 -07:00
Luca Wehrstedt	4d704e607d	Always use intrusive_ptr for Message (1 out of 2) (#58422 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58422 Similar to Future (which I tackled recently), Message is an ivalue type (a "custom class" one), and the natural way to represent it is inside an intrusive_ptr. However in the RPC code we had a mix of usages, often passing Message by value. This has undesirable consequences, as it could easily trigger a copy by accident, which I believe is why in many places we accepted _rvalue references_ to Message, in order to force the caller to move. In my experience this is non-idiomatic in C++ (normally a function signature specifies how the function consumes its arguments, and it's up to the caller to then decide whether to copy or move). By moving to intrusive_ptr everywhere I think we eliminate and simplify many of the problems above. In this PR I do half of the migration, by updating everything except the `toMessageImpl` methods, which will come in the next PR. ghstack-source-id: 129567053 Test Plan: CI Reviewed By: mrshenli Differential Revision: D28474878 fbshipit-source-id: 5b76d45e05f6fa58c831e369c5c964d126187a6c	2021-05-21 13:15:24 -07:00
Edward Yang	fc804b5def	Revert D28133579: [jit] Implement ScriptProfile to collect instruction profiles. Test Plan: revert-hammer Differential Revision: D28133579 (`034a238bab`) Original commit changeset: e7e30e961513 fbshipit-source-id: 5a7756468b4f2eeed24d2abb7b52ab46d081a95e	2021-05-21 08:18:40 -07:00
Zhengxu Chen	034a238bab	[jit] Implement ScriptProfile to collect instruction profiles. (#57397 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57397 Introduces two main classes in C++ runtime: ScriptProfile is the implementation for enabling and disabling interpreter profiling in C++. This should be only used from Python, and we will add corresponding Python API in the next diff. InstructionSpan is a utility class to instrument execution of each single instruction. A start timestamp is recorded in the constructor, and an end timestamp is recorded in the destructor. During destruction, this will send runtime data to all enabled ScriptProfile instances. Test Plan: build/bin/test_jit --gtest_filter='ScriptProfileTest.Basic' Imported from OSS Reviewed By: gmagogsfm Differential Revision: D28133579 fbshipit-source-id: e7e30e96151367022793ab3ad323f01c51ad4a3b	2021-05-20 14:11:03 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	9db64e6e56	Revert "Striding for lists Part 2 (#49352 )" (#58523 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58523 This reverts commit `fee7e8b91d`. Test Plan: Imported from OSS Reviewed By: gmagogsfm Differential Revision: D28528023 Pulled By: tugsbayasgalan fbshipit-source-id: 9fa1d86f0c81fcc6fd3798e0d51a712a3c9b3952	2021-05-20 13:20:33 -07:00
Edvard Ghazaryan	ccad77aa22	Added OperatorMap for mapping Operator to any template <T> (#58060 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58060 Generic way to check if Operator belongs to predefined map, and if so via public method(s) access to map value. In general value can be anything for example Operator's schema. Test Plan: buck test caffe2/test/cpp/jit:jit -- OperatorMap Reviewed By: Krovatkin Differential Revision: D28357933 fbshipit-source-id: ba3248cf06c07f16aebafccb7ae71c1245afb083	2021-05-19 11:38:49 -07:00
Bert Maher	dcfc2050bd	VaryingShape<Strides>::isComplete() needs to consider whether each Stride is complete (#58510 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58510 In some case that I don't fully understand we're getting a stride that is: ``` {2:1, 1:1, 0:*} ``` (in this debug output, M:N means stride index M, stride value N). This shape should be considered incomplete, since we don't actually know the values of the stride, but VaryingShape::isComplete considers it complete because it only checks the presence of elements in the vector, not whether those elements are themselves complete. ghstack-source-id: 129279583 Test Plan: new unit test in test/cpp/jit To see the failure in the context of a real model: ``` ./fblearner/predictor/loadgen/download-requests.sh 272478342_0 10 ~/local/requests/272478342_0.recordio buck-out/gen/fblearner/predictor/loadgen/replay_model_requests --model_id=272478342_0 --replay_record_source=recordio:/data/users/bertrand/requests/272478342_0.recordio --remote_port=9119 --output_file=/data/users/bertrand/responses/272478342_0_actual.recordio --output_type=recordio buck-out/gen/fblearner/predictor/loadgen/replay_model_requests --model_id=272478342_0 --replay_record_source=recordio:/data/users/bertrand/requests/272478342_0.recordio --remote_port=9119 --output_file=/data/users/bertrand/responses/272478342_0_actual.recordio --output_type=recordio ``` Reviewed By: Krovatkin Differential Revision: D28520062 fbshipit-source-id: 3ca900337d86480a40fbd90349a698cbb2fa5f11	2021-05-18 21:45:46 -07:00
Raghavan Raman	4b859cbca1	[NNC] Do not optimize conditionals when the corresponding loop is not normalized (#57675 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57675 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D28231375 Pulled By: navahgar fbshipit-source-id: bcbcebca25577744c7190a0aa9fa376f76dea77d	2021-05-18 14:25:53 -07:00
Raghavan Raman	a71b99b50d	[NNC] Add a method to check if a loop is normalized (#57674 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57674 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D28231377 Pulled By: navahgar fbshipit-source-id: 3d92d532f1e1f78c9d94619980340622b73f99ec	2021-05-18 14:25:50 -07:00
Raghavan Raman	3fe72d30dc	[NNC] Optimize conditionals that correspond to the form generated for aten::cat op. (#57673 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57673 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D28231374 Pulled By: navahgar fbshipit-source-id: 1777a63df4e5ebed6d515683bd772a88be465b3a	2021-05-18 14:23:48 -07:00
Raghavan Raman	34d6618386	[NNC] Fixing a bug in simplifier (#58291 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58291 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D28435393 Pulled By: navahgar fbshipit-source-id: 517e47385a93a43d2ddf054382adc81c18484066	2021-05-18 01:28:33 -07:00
Elias Ellison	211bac53ef	[JIT] Add optimize_for_inference API (#58193 ) Summary: Freezing exists as a pass which partially evaluates your model and applies generic optimizations which should speed it up. Optimize for inference is a counterpart to these optimizations which runs build & server specific optimizations. The interaction with the existing `optimize_frozen_module` is not great; I guess we could just deprecate the API entirely? It was never officially released but just existed to document the `optimize_numerics` keyword. Eventually, I would like to add a way of adding example inputs but I didn't add that here because they are not being used at all yet. I also have not yet included a way to blacklist individual optimizations, and would like to wait until we move this to Beta and have a little more clarity on how everything will fit together. I also think blacklisting will be an uncommon use case for the current optimizations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58193 Reviewed By: bertmaher, navahgar Differential Revision: D28443714 Pulled By: eellison fbshipit-source-id: b032355bb2585720a6d2f00c89d0d9a7ef60e649	2021-05-15 15:50:14 -07:00


2136 Commits