pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Jiewen Tan	3b8245ab12	[LTC] Make ComputePostOrder accept const T pointers (#88773) Summary: Since `c10::ArrayRef` now supports `c10::ArrayRef<const T>`, let's restore `ComputePostOrder` to accept `const Node*` again, which is more suitable for the context of the given helpers. Test Plan: CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88773 Approved by: https://github.com/JackCaoG	2022-11-10 18:34:19 +00:00
kshitij12345	eb9b156019	[fix] MathBits: serialization (#88182) Fixes #81690 TODO: * [x] C++ Unpickler Fix (locally tested pickled in Python and unpickled in C++) * [x] C++ Pickler Fix (locally tested pickled in C++ and unpickled in Python) * [x] Do quant_tensor, sparse_tensor, etc. require similar changes? (Sparse and Quant don't need this) * [x] Add Comments * [x] How to make sure C++ and Python are in sync? (Functions in `pickler.h` help in getting and setting Tensor Metadata (math-bits for now) on a tensor. They are the only place which should handle this.) Notes: Quant Tensors don't support complex dtypes, and for float they segfault with `_neg_view`: https://github.com/pytorch/pytorch/issues/88484 Sparse Tensor: ```python >>> a = torch.tensor([[0, 2.], [3j, 0]]).to_sparse() >>> a.conj().is_conj() False >>> a._neg_view() Traceback (most recent call last): File "<stdin>", line 1, in <module> NotImplementedError: Cannot access storage of SparseTensorImpl ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88182 Approved by: https://github.com/ezyang, https://github.com/anjali411	2022-11-09 17:15:12 +00:00
jjsjann123	7b419e8513	[NVFuser] Upstream push 1026 (#87779) Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/ Codegen changes include: * codegen improvement: i. allow non-root trivial reductions, allow empty/no-op fusion ii. fixes vectorization checks and size calculation iii. bank conflict handle improvement iv. enables transpose scheduler * misc: i. CI tests failure fixes ii. cpp tests file clean up iii. trivial forwarding supports added in codegen runtime iv. added factory methods support in codegen Commits that are in this PR from the devel branch: ``` 7117a7e37ebec372d9e802fdfb8abb7786960f4a patching nvfuser conv cudnn test numerics mismatch (#2048) 65af1a4e7013f070df1ba33701f2d524de79d096 Inserting sync for redundant parallel types is already done at the (#2023) 6ac74d181689c8f135f60bfc1ec139d88941c98c Fix sync map (#2047) f5bca333355e2c0033523f3402de5b8aac602c00 Bank conflict checker improvements (#2032) d2ca7e3fd203537946be3f7b435303c60fa7f51e Minor update on cp.async code generation. (#1901) d36cf61f5570c9c992a748126287c4e7432228e0 Test file cleanup (#2040) 0b8e83f49c2ea9f04a4aad5061c1e7f4268474c6 Allow non-root trivial reductions (#2037) a2dfe40b27cd3f5c04207596f0a1818fbd5e5439 Fix vectorize size calculation (#2035) e040676a317fe34ea5875276270c7be88f6eaa56 Use withPredicate to replace setPredicate to maintain Exprs immutable (#2025) 197221b847ad5eb347d7ec1cf2706733aacbf97c removing ci workflow (#2034) 40e2703d00795526e7855860aa00b9ab7160755f Reduction rand like patch (#2031) bc772661cbdb3b711d8e9854ae9b8b7052e3e4a3 Add utility for checking bank conflict of shared memory (#2029) ddd1cf7695f3fb172a0e4bcb8e4004573617a037 Add back FusionReductionWithTrivialReduction_CUDA (#2030) fbd97e5ef15fa0f7573800e6fbb5743463fd9e57 Revert "Cleanup trivial reduction workarounds (#2006)" (#2024) bca20c1dfb8aa8d881fc7973e7579ce82bc6a894 Cleanup trivial reduction workarounds (#2006) e4b65850eee1d70084105bb6e1f290651adde23e Trivial forwarding (#1995) 1a0e355b5027ed0df501989194ee8f2be3fdd37a Fix contiguity analysis of predicates to match updated contiguity. (#1991) a4effa6a5f7066647519dc56e854f4c8a2efd2a7 Enable output allocation cache (#2010) 35440b7953ed8da164a5fb28f87d7fd760ac5e00 Patching bn inference (#2016) 0f9f0b4060dc8ca18dc65779cfd7e0776b6b38e8 Add matmul benchmark (#2007) 45045cd05ea268f510587321dbcc8d7c2977cdab Enable tests previously disabled due to an aliasing bug (#2005) 967aa77d2c8e360c7c01587522eec1c1d377c87e Contiguous indexing for View operations (#1990) a43cb20f48943595894e345865bc1eabf58a5b48 Make inlining even more modular (#2004) dc458358c0ac91dfaf4e6655a9b3fc206fc0c897 Test util cleanup (#2003) 3ca21ebe4d213f0070ffdfa4ae5d7f6cb0b8e870 More strict validation (#2000) a7a7d573310c4707a9f381831d3114210461af01 Fix build problem (#1999) fc235b064e27921fa9d6dbb9dc7055e5bae1c222 Just fixes comments (#1998) 482386c0509fee6edb2964c5ae72074791f3e43a cleanup (#1997) 4cbe0db6558a82c3097d281eec9c85ad2ea0893a Improve divisible split detection (#1970) 42ccc52bdc18bab0330f4b93ed1399164e2980c9 Minor build fix. (#1996) fcf8c091f72d46f3055975a35afd06263324ede6 Cleanup of lower_utils.cpp: Isolate out GpuLower usage (#1989) 15f2f6dba8cbf408ec93c344767c1862c30f7ecc Move ConcretizedBroadcastDomains to shared_ptr in GpuLower. (#1988) 8f1c7f52679a3ad6acfd419d28a2f4be4a7d89e2 Minor cleanup lower_unroll.cpp (#1994) 1d9858c80319ca7f0037db7de5f04e47f540d76c Minor cleanup (#1992) f262d9cab59f41c669f53799c6d4a6b9fc4267eb Add support for uniform RNG (#1986) eb1dad10c73f855eb1ecb20a8b1f7b6edb0c9ea3 Remove non-const functions, remove GpuLower instance on build, pass in ca_map. (#1987) 634820c5e3586c0fe44132c51179b3155be18072 Add support for some empty fusion (#1981) eabe8d844ad765ee4973faa4821d451ef71b83c3 Segment self mapping fusions (#1954) e96aacfd9cf9b3c6d08f120282762489bdf540c8 Enable Transpose operation (#1882) 425dce2777420248e9f08893765b5402644f4161 Add a null scheduler that helps segmenting away no-op schedules (#1835) 306d4a68f127dd1b854b749855e48ba23444ba60 Fix canScheduleCompileTime check of transpose scheduler (#1969) b1bd32cc1b2ae7bbd44701477bddbcfa6642a9be Minor fix (#1967) bd93578143c1763c1e00ba613a017f8130a6b989 Enable transpose scheduler (#1927) b7a206e93b4ac823c791c87f12859cf7af264a4c Move scheduler vectorize utilities into their own file (#1959) d9420e4ca090489bf210e68e9912bb059b895baf View scheduling (#1928) c668e13aea0cf21d40f95b48e0163b812712cdf2 Upstream push ci fixes (#1965) c40202bb40ce955955bb97b12762ef3b6b612997 Fix dump effective bandwidth (#1962) 93505bcbb90a7849bd67090fe5708d867e8909e4 WAR on index mapping when exact and permissive maps differ (#1960) 45e95fd1d3c773ee9b2a21d79624c279d269da9f Allow splitting inner-most ID to create virtual innermost ID in transpose scheduler (#1930) a3ecb339442131f87842eb56955e4f17c544e99f Improve the comments at the beginning of index_compute.h (#1946) f7bc3417cc2923a635042cc6cc361b2f344248d6 Remove unused variables (#1955) df3393adbb5cb0309d091f358cfa98706bd4d313 Some cleanup (#1957) 7d1d7c8724ab5a226fad0f5a80feeac04975a496 TVDomainGuard factory (#1953) 357ba224c0fb41ed3e4e8594d95599c973f4a0ca Fill allocation with nan on tests (#1956) 8eafc54685d406f5ac527bcbacc475fda4492d7a Fix detection of unmappable root domains (#1952) 90a51f282601ba8ebd4c84b9334efd7762a234bc Some indexing cleanups, Add eye support (#1940) ddc01e4e16428aec92f9c84d698f959b6436a971 Exclude unsupported data types (#1951) 992e17c0688fe690c51b50e81a75803621b7e6aa test the groups the same order as they are merged (#1949) 208262b75d1fed0597a0329d61d57bc8bcd7ff14 Move detection of self mapping IDs to IterDomainGraph from (#1941) ac4de38c6ee53b366e85fdfe408c3642d32b57df Merge pull request #1945 from csarofeen/master_merge_0828 631094891a96f715d8c9925fb73d41013ca7f2e3 Add full, full_like, zeros, zeros_like, ones, ones_like (#1943) aab10bce4541204c46b91ff0f0ed9878aec1bfc4 Merge remote-tracking branch 'upstream/viable/strict' into HEAD 4c254c063bb55887b45677e3812357556a7aa80d Fix arange when step is negative (#1942) 89330aa23aa804340b2406ab58899d816e3dc3d2 Tensor factories must set the output shape as its input (#1939) ``` RUN_TORCHBENCH: nvfuser Differential Revision: [D40869846](https://our.internmc.facebook.com/intern/diff/D40869846) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87779 Approved by: https://github.com/davidberard98	2022-11-04 20:04:34 +00:00
Digant Desai	47a542dc06	Nested profiling support for Linux-perf Profiler (#87904 ) Add a stack of start counter values, and attribute each disable to the last enable Differential Revision: [D40539212](https://our.internmc.facebook.com/intern/diff/D40539212/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87904 Approved by: https://github.com/SS-JIA	2022-11-02 14:51:53 +00:00
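The stack-based attribution described in the entry above (push a start value on each enable, pair each disable with the most recent enable) can be sketched roughly as follows. This is an illustrative Python model, not the C++ profiler code from the PR: `NestedCounterProfiler` and `read_counter` are hypothetical names, and the counter is assumed to be any callable returning a non-decreasing integer.

```python
from typing import Callable, List


class NestedCounterProfiler:
    """Toy model of nested enable/disable using a stack of start values.

    Hypothetical illustration only; this is not a PyTorch API.
    """

    def __init__(self, read_counter: Callable[[], int]) -> None:
        self._read_counter = read_counter
        self._start_values: List[int] = []

    def enable(self) -> None:
        # Remember the counter value at the start of this (possibly nested) region.
        self._start_values.append(self._read_counter())

    def disable(self) -> int:
        # Attribute this disable to the most recent enable that is still open.
        if not self._start_values:
            raise RuntimeError("disable() called without a matching enable()")
        start = self._start_values.pop()
        return self._read_counter() - start


# Fake counter readings so the example is deterministic.
ticks = iter([100, 130, 170, 250])
profiler = NestedCounterProfiler(lambda: next(ticks))

profiler.enable()                  # outer region starts at 100
profiler.enable()                  # inner region starts at 130
inner_delta = profiler.disable()   # inner region ends at 170
outer_delta = profiler.disable()   # outer region ends at 250
```

Because the outer region's delta is measured from its own start value, it includes the events counted inside the inner region.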
Digant Desai	ebdaeaaa8c	[edge profiler] Add e2e test for profiler event and chrometrace (#87877 ) * Runs an existing model and checks an aten op if it gets perf events generated in the chrometrace * Doesn't check for exact values since that's harder to do in a hardware independent way Differential Revision: [D40474957](https://our.internmc.facebook.com/intern/diff/D40474957/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87877 Approved by: https://github.com/SS-JIA	2022-11-02 14:49:54 +00:00
Digant Desai	bc1e9a07a3	[profiler] Add Performance events support in Kineto profiler (#87874) * Wiring to allow user to pass event names to profiler and reflect the count to the chrometrace * If not used, the runtime and size overhead should be negligible * For now, primary user will be KinetoEdgeCPUProfiler but the impl does not assume that * Not exposed to python yet Differential Revision: [D40238032](https://our.internmc.facebook.com/intern/diff/D40238032/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D40238032/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/87874 Approved by: https://github.com/SS-JIA	2022-11-02 14:43:17 +00:00
Nikita Shulga	e1c123d29a	Add UBSAN to ASAN (#88055) Add undefined behavior sanitizer to `USE_ASAN` option. Added `torch._C._crash_if_vptr_ubsan()` that only fails if vptr belongs to a wrong class after typecast. Deleted all ubsan suppressions, but disabled `ProtoTest::Basic` as it fails the above-mentioned vptr check. Fixes https://github.com/pytorch/pytorch/issues/88042 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88055 Approved by: https://github.com/ezyang	2022-11-01 17:59:35 +00:00
Han Qi (qihqi)	5c3666cb81	[codev] Make backport work with flatbuffer models (#88127 ) Summary: By adding flatbuffer as dependency of backport. Differential Revision: D40865452 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88127 Approved by: https://github.com/cccclai	2022-11-01 16:11:30 +00:00
Edward Z. Yang	1ff52225f1	Unify SymIntNode and SymFloatNode into SymNode (#87817) This refactor was prompted by challenges handling mixed int/float operations in C++. A previous version of this patch added overloads for each permutation of int/float and was unwieldy https://github.com/pytorch/pytorch/pull/87722/ This PR takes a different approach. The general outline of the patch is to combine the C++ types SymIntNode and SymFloatNode into a single type, SymNode. This is type erased; we no longer know statically at C++ if we have an int/float and have to test it with the is_int()/is_float() virtual methods. This has a number of knock on effects. - We no longer have C++ classes to bind to Python. Instead, we take an entirely new approach to our Python API, where we have a SymInt/SymFloat class defined entirely in Python, which hold a SymNode (which corresponds to the C++ SymNode). However, SymNode is not pybind11-bound; instead, it lives as-is in Python, and is wrapped into C++ SymNode using PythonSymNode when it goes into C++. This implies a userland rename. In principle, it is also possible for the canonical implementation of SymNode to be written in C++, and then bound to Python with pybind11 (we have this code, although it is commented out.) However, I did not implement this as we currently have no C++ implementations of SymNode. Because we do return SymInt/SymFloat from C++ bindings, the C++ binding code needs to know how to find these classes. Currently, this is done just by manually importing torch and getting the attributes. - Because SymInt/SymFloat are easy Python wrappers, __sym_dispatch__ now takes SymInt/SymFloat, rather than SymNode, bringing it in line with how __torch_dispatch__ works. Some miscellaneous improvements: - SymInt now has a constructor that takes SymNode. Note that this constructor is ambiguous if you pass in a subclass of SymNode, so an explicit downcast is necessary. This means toSymFloat/toSymInt are no more. This is a mild optimization as it means rvalue reference works automatically. - We uniformly use the caster for c10::SymInt/SymFloat, rather than going the long way via the SymIntNode/SymFloatNode. - Removed some unnecessary toSymInt/toSymFloat calls in normalize_* functions, pretty sure this doesn't do anything. - guard_int is now a free function, since to guard on an int you cannot assume the method exists. A function can handle both int and SymInt inputs. - We clean up the magic method definition code for SymInt/SymFloat/SymNode. ONLY the user classes (SymInt/SymFloat) get magic methods; SymNode gets plain methods; this is to help avoid confusion between the two types. Signed-off-by: Edward Z. Yang <ezyang@fb.com> cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87817 Approved by: https://github.com/albanD, https://github.com/anjali411	2022-10-27 20:56:02 +00:00
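The point about `guard_int` becoming a free function can be illustrated with a small sketch. The idea is that a plain `int` has no `guard_int` method, so a module-level function has to branch on the input type. `ToySymInt` below is a hypothetical stand-in written for this example, not `torch.SymInt`, and the free function is an assumption about the general shape rather than the code from the PR.

```python
class ToySymInt:
    """Hypothetical stand-in for a symbolic integer; not torch.SymInt."""

    def __init__(self, hint: int) -> None:
        self._hint = hint
        self.guard_count = 0

    def guard_int(self) -> int:
        # A real symbolic int would record a guard on its value here;
        # this toy version only counts how often it was asked.
        self.guard_count += 1
        return self._hint


def guard_int(value) -> int:
    """Free function that accepts either a plain int or a ToySymInt."""
    if isinstance(value, int):
        # Plain ints are already concrete and have no guard_int method.
        return value
    return value.guard_int()


plain = guard_int(7)
sym = ToySymInt(3)
symbolic = guard_int(sym)
```

Callers can then use one entry point regardless of whether a size is concrete or symbolic.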
PyTorch MergeBot	0c1dec375f	Revert "Back out "Revert D40198461: [pytorch][PR] Backport currently dont work with some models if:" (#87124 )" This reverts commit `a42fbfa0cb`. Reverted https://github.com/pytorch/pytorch/pull/87124 on behalf of https://github.com/ZainRizvi due to This is causing periodic jobs to fail	2022-10-21 16:03:00 +00:00
Han Qi (qihqi)	a42fbfa0cb	Back out "Revert D40198461: [pytorch][PR] Backport currently dont work with some models if:" (#87124) Summary: reland after fixing windows build failure for OVR. Notable change: ``` #if defined(FBCODE_CAFFE2) or defined(FB_XPLAT_BUILD) ``` changed to ``` #if defined(FBCODE_CAFFE2) || defined(FB_XPLAT_BUILD) ``` Apparently `-DFB_XPLAT_BUILD` wasn't getting picked up on Windows when `or` was used to connect the two conditions. Original commit changeset: 7a31fc4b455f Original Phabricator Diff: D40198461 Test Plan: waitforsandcastle Reviewed By: davidberard98, cccclai Differential Revision: D40290932 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87124 Approved by: https://github.com/gmagogsfm	2022-10-20 23:02:10 +00:00
Kurt Mohler	1dbc8ad3b7	Add `Warning` class and refactor C++ warnings to use it (#84101 ) Also adds `TORCH_WARN_WITH` and `TORCH_WARN_DEPRECATION` macros Part of #72948 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84101 Approved by: https://github.com/albanD	2022-10-18 20:02:42 +00:00
Nirav Mehta	fb614b1871	Enable UBSAN mode for test_jit (#85735 ) # Summary Run `test_jit` executable with UBSAN flag in order to catch errors that might cause internal breakage Pull Request resolved: https://github.com/pytorch/pytorch/pull/85735 Approved by: https://github.com/dagitses	2022-10-17 22:15:50 +00:00
Chengqi Deng	b43ae1c411	Add reference counter in FileStore (#85601 ) Fixes #67566. This diff added a reference counter in the FileStore object. The underlying file would be removed only if the reference counter became 0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85601 Approved by: https://github.com/H-Huang	2022-10-07 17:59:29 +00:00
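The rule stated in the entry above (remove the underlying file only when the reference counter reaches 0) can be sketched as follows. This is a simplified, single-process Python model: `RefCountedFile` is a hypothetical class, and keeping the counts in an in-memory `Counter` is an assumption made for brevity. The real `FileStore` is C++ and has to coordinate between processes, which this sketch does not attempt.

```python
import os
import tempfile
from collections import Counter


class RefCountedFile:
    """Toy handle that deletes its backing file when the last handle closes.

    Hypothetical illustration only; not the c10d FileStore implementation.
    """

    _refcounts = Counter()  # path -> number of open handles (in-process only)

    def __init__(self, path: str) -> None:
        self.path = path
        self._closed = False
        open(path, "a").close()  # create the file if it does not exist yet
        RefCountedFile._refcounts[path] += 1

    def close(self) -> None:
        if self._closed:
            return  # closing twice must not decrement twice
        self._closed = True
        RefCountedFile._refcounts[self.path] -= 1
        if RefCountedFile._refcounts[self.path] == 0:
            # Last reference is gone: now it is safe to remove the file.
            del RefCountedFile._refcounts[self.path]
            os.remove(self.path)


path = os.path.join(tempfile.mkdtemp(), "store")
first = RefCountedFile(path)
second = RefCountedFile(path)

first.close()
still_there = os.path.exists(path)   # second handle still holds a reference

second.close()
removed = not os.path.exists(path)   # last handle closed, file deleted
```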
Wang, Eikan	70c6a988d6	Fix the performance issue that the for-loop before ExternalCall could not be parallelized. (#85056) Currently, NNC only parallelizes the loop statement of the graph outputs. The logic could bypass some loop statements that could be parallelized. Take an example as follows and suppose the output of `ExternalCall` is also the output of NNC fusion group. Current [parallel logic](https://github.com/pytorch/pytorch/pull/85056/files#diff-9a11174c26e4b57ab73e819520122bc314467c72962f3a5b79e7400ea3c4bbe5L781-L785) only tries to parallelize the `ExternalCall` and bypasses `stmt1` and `stmt2`. ```c++ stmt1: For: stmt2: For: stmt3: ExternalCall ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85056 Approved by: https://github.com/frank-wei, https://github.com/bertmaher	2022-10-07 07:36:28 +00:00
Sahan Paliskara	936e93058b	Delete torch::deploy from pytorch core (#85953 ) As we have migrated torch::deploy over to https://github.com/pytorch/multipy, we can now delete it from pytorch core as ongoing development will happen there. This PR was created due to syncing issues with https://github.com/pytorch/pytorch/pull/85443 which is where the review history can be found. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85953 Approved by: https://github.com/seemethere, https://github.com/malfet	2022-10-06 07:20:16 +00:00
Edward Z. Yang	6cd9c447da	Make test_api compile on DEBUG mode with some compiler versions (#86092 ) The symbol seems to conflict under some compiler versions, giving an error like "relocation refers to global symbol which is defined in a discarded section". Simple enough to put it in an anonymous namespace, so why not. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/86092 Approved by: https://github.com/Chillee	2022-10-03 13:52:32 +00:00
lezcano	787028cadb	Implement col2im decomposition and fix im2col and add a few preconditions (#85541 ) As per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/85541 Approved by: https://github.com/jansel	2022-09-30 09:31:53 +00:00
Min Si	1ad0048b64	Refactor distributed to use absolute header path (#85780) Headers under torch/csrc/distributed may be referenced with relative path, e.g., "<c10d/...>". However, relative path cannot be gracefully handled by Meta internal build when the NCCL PG is hipified to support AMD/RCCL because the "hipified" header files are generated in other directories. Moreover, using absolute path for header inclusion is the state-of-the-art in most components in PyTorch. Thus, this patch refactors all header paths in torch/csrc/distributed to be absolute. See D39835774 for more details about Meta internal complication. How to test: commit 9e5d199 removes -I./torch/csrc/distributed in compile options. Thus use it to verify we don't miss any relative path use of torch/csrc/distributed headers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85780 Approved by: https://github.com/kumpera, https://github.com/huydhn	2022-09-30 05:13:50 +00:00
PyTorch MergeBot	a50d8864fc	Revert "Refactor distributed to use absolute header path (#85780)" This reverts commit `668082718a`. Reverted https://github.com/pytorch/pytorch/pull/85780 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but it breaks build due to a missing file <c10d/Store.hpp>	2022-09-30 02:04:29 +00:00
Min Si	668082718a	Refactor distributed to use absolute header path (#85780) Headers under torch/csrc/distributed may be referenced with relative path, e.g., "<c10d/...>". However, relative path cannot be gracefully handled by Meta internal build when the NCCL PG is hipified to support AMD/RCCL because the "hipified" header files are generated in other directories. Moreover, using absolute path for header inclusion is the state-of-the-art in most components in PyTorch. Thus, this patch refactors all header paths in torch/csrc/distributed to be absolute. See D39835774 for more details about Meta internal complication. How to test: commit 9e5d199 removes -I./torch/csrc/distributed in compile options. Thus use it to verify we don't miss any relative path use of torch/csrc/distributed headers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85780 Approved by: https://github.com/kumpera	2022-09-30 00:27:24 +00:00
Mikayla Gawarecki	afaee00fec	Add python `nested_tensor` and `as_nested_tensor` constructors in `torch.nested` (#85593 ) Remove `torch.nested_tensor` which has erroneous behavior wrt gradients (could be either leaf or not leaf). Introduce `torch.nested.nested_tensor` and `torch.nested.as_nested_tensor` in the vein of `torch.tensor` and `torch.as_tensor`. Done in nested `__init__.py` for now but can move to pybind in future (when we want to load from numpy/nested lists ). Discussed offline with @cpuhrsch and pybind constructor (https://github.com/pytorch/pytorch/pull/85536) was more gnarly than expected, so we can move to that when we do need loading from numpy etc. Differential Revision: [D39806622](https://our.internmc.facebook.com/intern/diff/D39806622) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85593 Approved by: https://github.com/drisspg, https://github.com/cpuhrsch	2022-09-28 20:15:02 +00:00
Wang, Eikan	45be74cc63	Optimize `to` if the datatype of the source tensor is the same as the dest datatype (#85140) The AMP inserts `_autocast_to_reduced_precision` and `_autocast_to_full_precision` automatically. The aten implementation provides a fast path to bypass the conversion if the tensor data type is already the reduced/full precision. But NNC always does the conversion, which could bring a >5% E2E performance regression. This PR addresses the performance issue the same way aten does. We do not pull `_autocast_to_reduced_precision` and `_autocast_to_full_precision` into the NNC fusion group, and instead fall back to aten to trigger its fast path, if the tensor data type is already the reduced/full precision. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85140 Approved by: https://github.com/frank-wei	2022-09-27 04:40:42 +00:00
Alex Beloi	a38e43e936	[perf][1/5] Replace IValue::toString()->string() with IValue::toStringRef() (#85437 ) Summary: `IValue::toString()` creates a `new c10::intrusive_ptr` (like `std::shared_ptr`) and `->string()` immediately accesses it, creating an atomic reference increment/decrement. We can skip both of these operations by calling `IValue::toStringRef()`. Test Plan: CI Reviewed By: jaybean-dev Differential Revision: D39605242 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85437 Approved by: https://github.com/jfix71	2022-09-23 23:36:57 +00:00
jjsjann123	0e582fbfcc	[NVFuser] Upstream push 0907 (#84626) Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/ Codegen changes include: - codegen improvement: i. improved view support on pointwise and transpose scheduler ii. grouped grid welford added for better outer-norm grid persistence in normalization - misc: i. new composite ops added: variance_mean, arange, ii. fixes misaligned address for transpose scheduler iii. refactor on separation of compilation API from execution API to prepare us for async compilation iv. double type support on expression evaluator v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN Commits that are in this PR from the devel branch: ``` 89330aa23aa804340b2406ab58899d816e3dc3d2 Tensor factories must set the output shape as its input (#1939) b2fd01ea9346712c6d6f623ca6addbc4888d008e arange support (#1933) 56c00fd3922dad7dfc57351ad7d780f0f2f8e4ed Double support on all expression evaluators (#1937) 371f28223e57fe3f6b5e50a0a45177e6a5c0785c Improve trivial reduction merge support (#1931) 1d0c26790e5647920b40d419d26815bbe310b3a6 Test `rand` in a fusion with zero tensor input (#1932) 0dab160fb2177d178eef3148c6a529e0855009e9 Fix softmax bwd sizes. (#1890) ef98f360f6d3e3e1cc662ecb65202d88150f128d Fix a bug (#1936) 63132a0c56508c550084b07fb76a3df865102d00 Propagate permissive mapping information into indexing pass (#1929) b4ac2c88d78078ee4d8b21c4fc51645b5710a282 Map IterationDomains through view operations. (#1919) c0a187a7619d7cf9dc920294e15461791e8d6d4d do not use deprecated functions (#1935) 88de85e758c5e4afb7b6e746573c0d9a53b4cea7 Upstream cherry pick fixes 0811 (#1934) b247dcf7c57dc6ac3f7a799b0a6beb7770536a74 Separate kernel compilation API from kernel execution API (#1914) b34e3b93ee1a8030730c14af3995dd95665af07d Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924) 14a53e6707f43bf760494c238a46386d69830822 Nullary RNGOp (#1892) 3c3c89e638f5172cafb0761f22bacd1fd695eec3 Misc fixes/tuning for transpose scheduler (#1912) 20cf109c8b44d48f61977e35bae94368985144ac Grouped grid welford (#1921) 6cf7eb024c9e53c358cbe56597e117bad56efefd Transpose scheduler small dim sizes better support (#1910) 9341ea9a5bf42f9b14ccad0c94edbc79fc5bb552 Disabled ViewPersistentShmoo sizes that results in NAN (#1922) 057237f66deeea816bb943d802a97c1b7e4414ab Fix CUDA driver error: misaligned address for transpose scheduler (#1918) 3fb3d80339e4f794767a53eb8fdd61e64cf404a2 Add variance_mean function using Welford (#1907) 98febf6aa3b8c6fe4fdfb2864cda9e5d30089262 Remove DisableOption::UnrollWithRng (#1913) ee8ef33a5591b534cf587d347af11e48ba7a15d4 Minor fix for the debug interface of using PTX directly (#1917) 6e8f953351f9dabfd1f991d8431cecb6c2ce684d Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916) 5eefa9a72385f6a4b145680a9dcc52d7e8293763 dopt is only available since nvrtc 11.7 (#1915) 2ec8fc711eafc72451eebf0f5e2a98a38bf3f6ef Kill computeAtBetween (#1911) d0d106a1d9af118d71673173674e875be35d259d Improve view support on pointwise and transpose scheduler (#1906) e71e1ecefe67219846070590bbed54bbc7416b79 Fix name clash of RNG with shared memory (#1904) 3381793a253689abf224febc73fd3fe2a0dbc921 Fix mutator and sameAs for expanded IterDomain (#1902) ``` RUN_TORCHBENCH: nvfuser Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84626 Approved by: https://github.com/malfet	2022-09-23 20:29:48 +00:00
Mike Iovine	63c1f2fef9	[Static Runtime] Fold linear prepack ops (#85289 ) Summary: Split `quantized_linear_unpacked_weight_v2` into `linear_prepack` and `quantized_linear` so that the prepacking operation may be eliminated by constant folding. Test Plan: Fixes a huge regression in an internal model: ``` Before 89.6141 ms. 99.0923%. fb::quantized_linear_unpacked_weight_v2 (12 nodes) After 0.806852 ms. 53.5365%. quantized::linear (12 nodes, out variant) (prepacking eliminated) ``` Differential Revision: D39622530 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85289 Approved by: https://github.com/davidberard98	2022-09-22 20:23:07 +00:00
Thomas Viehmann	e41d758e26	Handle implicit real->complex casting for backward of stack (#84993 ) Fixes: #75852 P.S.: Yay for the PyTorch foundation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84993 Approved by: https://github.com/soulitzer	2022-09-19 21:20:34 +00:00
Kevin Stephano	b8418e02eb	Create Cache for Fusion Reuse in NVFuser in Python Frontend for Primtorch (#85045 ) This PR does the following: - Replaces the `FusionOwner` with a `FusionCache` and `FusionInterface`. The `FusionCache` is a singleton that contains a cache of Fusions based on the `FusionDefinition`. It replaces the TorchScript graph caching that looked up a Fusion based on a stringified and canonicalized representation of the TorchScript graph with a prefix tree of statements in the `FusionDefinition`. The `FusionInterface` is an object that represents a Fusion in python. It can also query the cache based on id. - The ability to print out a mechanically derived definition, in python, for the user to use when debugging was added. - Replaces the python `examples` directory with true python tests under `test/test_nvfuser_frontend.py`. - Adds a set of C++ tests under the `test` directory to verify the `FusionCache`, `FusionDefinition`, and parts of the `RecordFunctor` child classes. - Adds a README file to explain how to use the Python Frontend While there are 3,000+ line edits, the bulk of the changes were repetitive line changes to the python bindings for each operation. An identical PR to #83267 to avoid tooling issues. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85045 Approved by: https://github.com/davidberard98	2022-09-17 10:52:54 +00:00
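The caching idea in the entry above, looking up a fusion by walking a prefix tree of the statements in its definition, can be sketched as a small trie. Everything here is an assumption for illustration: `PrefixTreeCache`, `lookup_or_insert`, the integer fusion ids, and the use of strings as statements are hypothetical, and the actual `FusionCache` is C++ code that may be organized quite differently.

```python
from typing import Dict, Hashable, Optional, Sequence, Tuple


class _TrieNode:
    def __init__(self) -> None:
        self.children: Dict[Hashable, "_TrieNode"] = {}
        self.fusion_id: Optional[int] = None  # set only on terminal nodes


class PrefixTreeCache:
    """Toy cache keyed by a sequence of definition statements.

    Hypothetical illustration only; not the nvfuser FusionCache.
    """

    def __init__(self) -> None:
        self._root = _TrieNode()
        self._next_id = 0

    def lookup_or_insert(self, statements: Sequence[Hashable]) -> Tuple[int, bool]:
        """Return (fusion_id, was_cache_hit) for the given statement sequence."""
        node = self._root
        for stmt in statements:
            child = node.children.get(stmt)
            if child is None:
                # First time this prefix is seen: extend the tree.
                child = _TrieNode()
                node.children[stmt] = child
            node = child
        if node.fusion_id is not None:
            return node.fusion_id, True
        node.fusion_id = self._next_id
        self._next_id += 1
        return node.fusion_id, False


cache = PrefixTreeCache()
first = cache.lookup_or_insert(("add(t0, t1)", "relu(t2)"))
again = cache.lookup_or_insert(("add(t0, t1)", "relu(t2)"))
other = cache.lookup_or_insert(("add(t0, t1)", "tanh(t2)"))
```

Definitions that share leading statements share tree nodes, so a lookup costs one step per statement rather than a comparison against a stringified whole graph.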
Steven Krawczyk	89525cbd69	Add variable_list support to ExtractVariables struct (#84583 ) This is required to unblock https://github.com/pytorch/xla/pull/3843, which lowers the einsum op for pytorch/xla. Because one method input parameter is a TensorList, we need to support TensorLists here so that we can support einsum gradients. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84583 Approved by: https://github.com/soulitzer	2022-09-16 01:26:22 +00:00
Edward Z. Yang	65158b8876	empty strided symint (#84830 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84830 Approved by: https://github.com/ezyang	2022-09-15 04:09:43 +00:00
Sergii Dymchenko	d05f07494a	Use angle brackets in include for internal clangtidy (#85032 ) This issue was found after importing https://github.com/pytorch/pytorch/pull/70978 into fbsource. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85032 Approved by: https://github.com/huydhn	2022-09-15 03:08:51 +00:00
Edward Z. Yang	ccade9410f	Don't detach when making views; force caller to detach (#84893 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84893 Approved by: https://github.com/soulitzer, https://github.com/SherlockNoMad	2022-09-14 22:32:45 +00:00
PyTorch MergeBot	94b67f4cd8	Revert "Create Cache for Fusion Reuse in NVFuser in Python Frontend for Primtorch (#83267 )" This reverts commit `ec916bf6af`. Reverted https://github.com/pytorch/pytorch/pull/83267 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2022-09-14 17:40:22 +00:00
Howard Huang	74ead61944	[2/N] [Dispatchable Collectives] Extract ProcessGroup::Work into a separate class and update references (#83680) ### Changes - Move ProcessGroup::Work into its own class and update all the references to it / header includes. #### Motivation In future PRs we will repurpose ProcessGroup to instead contain a list of Backends (ProcessGroupNCCL/Gloo/UCC) and perform dispatching to them based on tensor type. This change prevents a circular dependency with ProcessGroup depending on Backend and Backend depending on ProcessGroup::Work. Differential Revision: [D38839212](https://our.internmc.facebook.com/intern/diff/D38839212) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83680 Approved by: https://github.com/kwen2501	2022-09-14 13:05:58 +00:00
PyTorch MergeBot	8ca057eb71	Revert "Don't detach when making views; force caller to detach (#84893 )" This reverts commit `3bb8d6a93c`. Reverted https://github.com/pytorch/pytorch/pull/84893 on behalf of https://github.com/malfet due to Broke MPS, see `3bb8d6a93c`	2022-09-14 01:09:04 +00:00
Edward Z. Yang	3bb8d6a93c	Don't detach when making views; force caller to detach (#84893 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84893 Approved by: https://github.com/soulitzer, https://github.com/SherlockNoMad	2022-09-13 23:31:21 +00:00
Kevin Stephano	ec916bf6af	Create Cache for Fusion Reuse in NVFuser in Python Frontend for Primtorch (#83267 ) This PR does the following: - Replaces the `FusionOwner` with a `FusionCache` and `FusionInterface`. The `FusionCache` is a singleton that contains a cache of Fusions based on the `FusionDefinition`. It replaces the TorchScript graph caching that looked up a Fusion based on a stringified and canonicalized representation of the TorchScript graph with a prefix tree of statements in the `FusionDefinition`. The `FusionInterface` is an object that represents a Fusion in python. It can also query the cache based on id. - The ability to print out a mechanically derived definition, in python, for the user to use when debugging was added. - Replaces the python `examples` directory with true python tests under `test/test_nvfuser_frontend.py`. - Adds a set of C++ tests under the `test` directory to verify the `FusionCache`, `FusionDefinition`, and parts of the `RecordFunctor` child classes. - Adds a README file to explain how to use the Python Frontend While there are 3,000+ line edits, the bulk of the changes were repetitive line changes to the python bindings for each operation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83267 Approved by: https://github.com/jjsjann123, https://github.com/davidberard98	2022-09-13 23:28:39 +00:00
vfdev-5	d951165bd8	[C++ API] Added missing antialiasing path in interpolation C++ api (#84599 ) Description: Following https://github.com/pytorch/pytorch/pull/69318#issuecomment-1238433540 adding missing bicubic path for anti-alias flag to C++ frontend. - https://github.com/pytorch/pytorch/pull/70930 - added tests in pytorch/test/cpp/api/functional.cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/84599 Approved by: https://github.com/kit1980, https://github.com/malfet	2022-09-13 03:54:07 +00:00
Edward Z. Yang	9e5563dbb1	Delete SymIntArrayRef wrapper struct (#84837 ) Since we separated at::foo and at::foo_symint there is no benefit to trying to make initializer lists work in both cases. So we can get rid of the special different struct. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84837 Approved by: https://github.com/kit1980	2022-09-12 20:04:01 +00:00
PyTorch MergeBot	034f2db1fd	Revert "Delete SymIntArrayRef wrapper struct (#84837 )" This reverts commit `9c78f599e4`. Reverted https://github.com/pytorch/pytorch/pull/84837 on behalf of https://github.com/ZainRizvi due to The test test_post_localSGD_optimizer_step_reload in the X linux-bionic-cuda11.6-py3.10-gcc7 workflow has started consistently failing since this PR was submitted	2022-09-12 19:04:07 +00:00
Mikayla Gawarecki	e217b30b0f	Add `torch.nested` namespace (#84102 ) First step towards #83775 - only `to_padded_tensor` is moved to the nested namespace for now - following the schema used for `special`, `fft`, `linalg` and other namespaces, nested functions are registered in native_functions.yaml as `nested_{function_name}` and are bound to the desired Python name in `torch/nested/__init__.py`, and the desired C++ name in `torch/csrc/api/include/torch/nested.h`. ~~Question: should we keep the documentation for `Tensor.to_padded_tensor` or can this deleted since it is shared by `torch.nested.to_padded_tensor`?~~ [generated nested docs](https://docs-preview.pytorch.org/84102/nested.html?highlight=nested#module-torch.nested) Differential Revision: [D39361148](https://our.internmc.facebook.com/intern/diff/D39361148) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84102 Approved by: https://github.com/drisspg	2022-09-12 16:31:05 +00:00
Edward Z. Yang	9c78f599e4	Delete SymIntArrayRef wrapper struct (#84837 ) Since we separated at::foo and at::foo_symint there is no benefit to trying to make initializer lists work in both cases. So we can get rid of the special different struct. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84837 Approved by: https://github.com/kit1980	2022-09-12 16:28:20 +00:00
Edward Z. Yang	ad44670fa1	Back out "Revert D38984222: Don't introduce new overload for SymInt (#83628 )" (#84173 ) Also Back out "Revert D39075159: [acc_tensor] Use SymIntArrayRef for overloaded empty.memory_format's signature" Original commit changeset: dab4a9dba4fa Original commit changeset: dcaf16c037a9 Original Phabricator Diff: D38984222 Original Phabricator Diff: D39075159 Also update Metal registrations for C++ registration changes. Also update NNPI registration to account for tightened schema checking Differential Revision: [D39084762](https://our.internmc.facebook.com/intern/diff/D39084762/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39084762/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/84173 Approved by: https://github.com/Krovatkin	2022-08-29 18:01:07 +00:00
PyTorch MergeBot	c7edcd6968	Revert "Don't introduce new overload for SymInt (#83628 )" This reverts commit `9790d90e4b`. Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to Breaks internal builds, see D39076487	2022-08-27 01:23:17 +00:00
Peter Bell	b429a17545	Enable -Wunused-local-typedefs (#83708 ) I recently had a PR reverted because it triggered an unused-local-typedefs warning, so disabling these in the CMake build is counter-productive. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83708 Approved by: https://github.com/albanD	2022-08-26 15:45:47 +00:00
Edward Z. Yang	9790d90e4b	Don't introduce new overload for SymInt (#83628 ) Previously, we introduced new SymInt overloads for every function we wanted. This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented. This PR takes a simpler but more risky approach: just take the original function and change its ints to SymInts. This is BC-breaking in the following ways: * The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually. This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this. This is not BC-breaking in the following ways: * The user facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints (e.g., at::empty(IntArrayRef, ...)). To call with SymInts, you must call at::empty_symint instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed. * This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types), as long as you're not doing string equality (which you shouldn't be), these parse to the same underlying type. 
Structure of the PR: * The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it as if it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other: * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular: * When we do schema validation of C++ operator registration, we must compare against true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`). This is handled with cloneWithRealTypes before we check for schema differences. * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!) * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway. * Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where there is work to do. 
Finally, because the signature of `native::` API changed from int to SymInt, I need to find alternative APIs for people who were directly calling these functions to call. Typically, I insert a new dispatch call when perf doesn't matter, or use `at::compositeexplicitautograd` namespace to handle other cases. * The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK. * I change how unboxing logic works slightly. Previously, we interpreted the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it. * I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload) * I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.) * I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints. * I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. 
To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628 Approved by: https://github.com/albanD, https://github.com/bdhirsh	2022-08-26 01:35:40 +00:00
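The `symint` kwarg idea from the commit above (one schema type, two possible C++ spellings) can be sketched as below. This is only an illustration under stated assumptions: `cpp_argument_type` is a hypothetical helper, not the actual codegen function, and the list-type names are an assumed mapping chosen for the example.

```python
def cpp_argument_type(schema_type: str, *, symint: bool) -> str:
    """Map a schema type string to a C++ type name (illustrative subset only).

    `symint=True` keeps symbolic types; `symint=False` lowers them to plain ints.
    """
    is_list = schema_type.endswith("[]")
    base = schema_type[:-2] if is_list else schema_type

    if base == "SymInt":
        if is_list:
            # Assumed list spellings, for illustration only.
            return "c10::SymIntArrayRef" if symint else "at::IntArrayRef"
        return "c10::SymInt" if symint else "int64_t"
    if base == "int":
        # Plain ints are unaffected by the symint switch.
        return "at::IntArrayRef" if is_list else "int64_t"
    raise ValueError(f"unsupported schema type in this sketch: {schema_type}")
```

With a switch like this, each call site has to state explicitly which view of the schema it wants, which matches the commit's description of auditing all call sites.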
Xiang Gao	a4a55f5ea6	New TORCH_UCC_BLOCKING_WAIT env variable (#81791 ) Cherry-pick of https://github.com/facebookresearch/torch_ucc/pull/95. I recommend waiting until https://github.com/pytorch/pytorch/pull/81583 is merged first, so the CI is checking if this PR compiles correctly. Marking this as a draft for now, will change to "ready for review" once https://github.com/pytorch/pytorch/pull/81583 merged. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81791 Approved by: https://github.com/kwen2501	2022-08-25 21:33:17 +00:00
Nikolay Korovaiko	86e134ddf7	disable c10::SymIntNode tests on mobile (#84066 ) This fixes c++ tests' breaks where we were passing pointers and expected `is_symbolic` to return `true` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84066 Approved by: https://github.com/albanD	2022-08-25 17:28:23 +00:00
jjsjann123	b21a6ff639	[NVFuser] Upstream push 0811 (#83239 ) Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/ Code changes include: - codegen improvements: 1. double support in expression evaluator - bug fixes: 1. dropout fix - rework RNG to support broadcasted dropout (Fixes #82784) 2. expand fix - Patch expand+reduction, expand+view, rework view analysis and guard - scheduler: 1. manual transpose schedule example 2. WIP transpose scheduler Commits in this PR from the devel branch: ``` b7435afcd22c917713c2f41a7237bc26e1183f14 Transpose scheduler, step 1 (#1854) 8a45dbf72034684eb8e18b1835b533e90b68f184 Add an example on how to manually schedule transpose (#1889) 83dbf56a9554b2efbd5416461d938fff477b0b27 Patch dropout fix (#1898) 69d3519a532250719b1aa8341b50e067b181b42d Expand+Reduction, Expand+View support, rework View analysis and guards (#1883) 15091c488e96343bdc49e3990acbf238a3b3da51 Rework RNG to correctly support broadcasted dropout (#1888) aafe2d048aaac596e503596a41303423619f3954 Make ExpressionEvaluator support Double (#1885) ``` RUN_TORCHBENCH: nvfuser Differential Revision: [D38657074](https://our.internmc.facebook.com/intern/diff/D38657074) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83239 Approved by: https://github.com/davidberard98	2022-08-25 02:23:22 +00:00
PyTorch MergeBot	a7edf71360	Revert "Don't introduce new overload for SymInt (#83628 )" This reverts commit `8fae7027b3`. Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to breaking internal builds, see https://www.internalfb.com/diff/D38984222	2022-08-25 00:49:40 +00:00
Richard Barnes	67f0940cdd	Check all CUDA API calls for errors in test/ (#74921 ) (#83954 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74921 Test Plan: Sandcastle Reviewed By: ezyang, malfet, ngimel Differential Revision: D35194966 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83954 Approved by: https://github.com/ezyang	2022-08-24 20:12:25 +00:00
Larry Liu	a8a36c45a6	[frontend] Fix tensor list alias annotation (#84005 ) For issue https://github.com/pytorch/pytorch/issues/77920 and a retry of https://github.com/pytorch/pytorch/pull/83921 The current logic checks alias info before `[]` and after. If no alias info exists after `[]`, we overwrite the alias info before. This logic failed on arguments like `Tensor(a!)[]`, dropping the alias info before `[]` on the floor. This PR adds a new alias info if it's missing after `[]`. This way we can keep the alias info before `[]`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84005 Approved by: https://github.com/cccclai, https://github.com/bdhirsh	2022-08-24 19:50:19 +00:00
Nikolay Korovaiko	b842670aa5	logical ops (#83879 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83879 Approved by: https://github.com/ezyang	2022-08-24 17:49:57 +00:00
Nikolay Korovaiko	2b805e3520	add arithmetic ops (#83878 ) arithmetic ops tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/83878 Approved by: https://github.com/ezyang	2022-08-24 17:49:56 +00:00
Edward Z. Yang	8fae7027b3	Don't introduce new overload for SymInt (#83628 ) Previously, we introduced new SymInt overloads for every function we wanted. This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented. This PR takes a simpler but more risky approach: just take the original function and change its ints to SymInts. This is BC-breaking in the following ways: * The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually. This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this. This is not BC-breaking in the following ways: * The user facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints (e.g., at::empty(IntArrayRef, ...)). To call with SymInts, you must call at::empty_symint instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed. * This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types), as long as you're not doing string equality (which you shouldn't be), these parse to the same underlying type. 
Structure of the PR: * The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it as if it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other: * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular: * When we do schema validation of C++ operator registration, we must compare against true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`). This is handled with cloneWithRealTypes before we check for schema differences. * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!) * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway. * Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where there is work to do. 
Finally, because the signature of `native::` API changed from int to SymInt, I need to find alternative APIs for people who were directly calling these functions to call. Typically, I insert a new dispatch call when perf doesn't matter, or use `at::compositeexplicitautograd` namespace to handle other cases. * The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK. * I change how unboxing logic works slightly. Previously, we interpreted the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it. * I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload) * I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.) * I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints. * I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. 
To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628 Approved by: https://github.com/albanD, https://github.com/bdhirsh	2022-08-23 22:04:07 +00:00
Nikolay Korovaiko	fcb124406b	release the current symintnode in the move c-tor (#83789 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83789 Approved by: https://github.com/ezyang	2022-08-22 14:37:06 +00:00
Milad Mohammadi	72963bbae9	Update isDynamic api to align with is_symbolic API (#83415 ) Downstream #https://github.com/pytorch/xla/pull/3888 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83415 Approved by: https://github.com/Krovatkin	2022-08-18 22:53:19 +00:00
soulitzer	31fad3926a	Add option to run anomaly mode without nan checking (#83481 ) Fixes https://github.com/pytorch/pytorch/issues/83117 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83481 Approved by: https://github.com/albanD	2022-08-16 22:56:23 +00:00
richard	382ef1fda7	Autograd graphtask trim unnecessary edges (#82544 ) ### Introduction Removing unnecessary weight gradient calculation is very important for applications that need high-order derivatives during training. However, this is not supported by the current Autograd engine. For more detail: The backward function of a `matmul` operator (e.g., `linear` `addmm` `mm`), has two matmuls, one for `input gradient` and another for `weight gradient`. For a typical neural network (nn) with a few linear layers and activation functions, if the user calls `torch.autograd.grad()` to calculate the derivative of the nn output `y` w.r.t the nn input `x`, only the `input gradient` of the `matmul` operator is needed, and the `weight gradient` is discarded. However, the current PyTorch autograd engine will always calculate the `weight gradient` if `weight` requires gradient (the calculation of the high-order derivative is performed during training). The figure attached shows the autograd graph of the following code snippet:
```py
y = torch.nn.functional.linear(x, weight, bias)
y = y.pow(2)
# first order derivative
y__x, = torch.autograd.grad(y, x, grad_outputs=grad_outputs, create_graph=True)
# second order derivative
y__x__x, = torch.autograd.grad(y__x, x, grad_outputs=grad_outputs, create_graph=True)
```
The path with ❌ is not needed when calculating derivatives. <img width="50%" alt="image" src="https://user-images.githubusercontent.com/9999318/182018117-719c5a23-bcc6-4a63-8e8d-1bca3ebda2e3.png"> ### Issue Related issue: https://github.com/pytorch/pytorch/issues/56500 ### Method When calling `torch.autograd.grad`, `exec_info_` is created for each GraphTask, which allows filtering paths on the graph that are not needed. However, when the GraphTask calls into the node, the node still does not know whether the edges are needed or not. 
In the case of matmul, `weight.requires_grad is True` so the weight gradient is always calculated. Following https://github.com/pytorch/pytorch/issues/56500#issuecomment-825694656, this PR passes the graph task's thread_local `exec_info_` into the node, so it could trim unnecessary edges during `torch.autograd.grad` calls. ### Benchmark Benchmark script: https://gist.github.com/yueyericardo/24158433a2021c51eeef9c3e2722df99 Benchmark result: 6 hidden layers, batch size 10000, on A100

FP32 result

| hessian benchmark | FP32 (before) | FP32 (after) | FP32 (Functorch v0.1.1) |
| ----------------------------- | ------------- | ----------------- | ----------------------- |
| Linear + ReLU (no backward) | 55.658 ms | 29.392 ms (1.90X) | 29.547 ms (1.90X) |
| Linear + ReLU (with backward) | 81.173 ms | 54.917 ms (1.47X) | 68.988 ms (1.18X) |

TF32 result

| hessian benchmark | TF32 (before) | TF32 (after) | TF32 (Functorch v0.1.1) |
| ----------------------------- | ------------- | ----------------- | ----------------------- |
| Linear + ReLU (no backward) | 19.801 ms | 11.259 ms (1.76X) | 10.754 ms (1.84X) |
| Linear + ReLU (with backward) | 29.167 ms | 20.466 ms (1.42X) | 22.784 ms (1.28X) |

For the FP32 result, we could get a 1.9X speedup for hessian calculation, and a 1.47X speedup during training, which is even faster than the functorch `vmap(jacfwd(jacrev` implementation. (functorch has a performance regression on v0.2.0, https://github.com/pytorch/functorch/issues/989, so we are using v0.1.1 for the benchmark) @zou3519 does functorch also include similar optimizations during hessian calculation? If not, what do we need to do so that functorch could also benefit from this PR? ### Testing - [x] we need to figure out a way for unittest ### Thanks Thanks for the great blog: [How Computational Graphs are Executed in PyTorch | PyTorch](https://pytorch.org/blog/how-computational-graphs-are-executed-in-pytorch/) cc @zasdfgbnm @albanD Pull Request resolved: https://github.com/pytorch/pytorch/pull/82544 Approved by: https://github.com/soulitzer	2022-08-11 18:50:09 +00:00
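The trimming idea in the commit above can be illustrated with a toy backward function for a bias-free linear layer `y = x @ weight^T`. This is a pure-Python sketch, not the autograd engine: `linear_backward` and its `need_input_grad` / `need_weight_grad` flags are hypothetical stand-ins for the information the PR passes into the node through `exec_info_`.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def linear_backward(
    grad_out: Matrix,
    x: Matrix,
    weight: Matrix,
    *,
    need_input_grad: bool,
    need_weight_grad: bool,
) -> Tuple[Optional[Matrix], Optional[Matrix]]:
    """Backward of y = x @ weight^T, skipping whichever matmul is not needed."""
    # grad wrt x: (N, out) @ (out, in) -> (N, in)
    grad_x = matmul(grad_out, weight) if need_input_grad else None
    # grad wrt weight: (out, N) @ (N, in) -> (out, in)
    grad_w = matmul(transpose(grad_out), x) if need_weight_grad else None
    return grad_x, grad_w
```

When only the derivative with respect to the input is requested, passing `need_weight_grad=False` skips the second matmul entirely, which is the saving the benchmark in the commit message measures.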
Nikita Shulga	1b2a17b8f9	Build MacOS binaries with `-Werror` (#83049 ) Should prevent proliferating MPS warnings Fixes https://github.com/pytorch/pytorch/issues/82966 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83049 Approved by: https://github.com/albanD, https://github.com/ezyang	2022-08-10 17:29:44 +00:00
Nikita Shulga	62c8d30f9f	[BE] Add `append_cxx_flag_if_supported` macro (#82883 ) And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeLists.txt` Delete `-Wno-unknown-warning-option` to test that the conditions indeed work as expected Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883 Approved by: https://github.com/seemethere	2022-08-10 14:32:26 +00:00
PyTorch MergeBot	d3a1f17fc7	Revert "[BE] Add `append_cxx_flag_if_supported` macro (#82883 )" This reverts commit `d7e6aaa59b`. Reverted https://github.com/pytorch/pytorch/pull/82883 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2022-08-10 10:27:59 +00:00
David Chen	90821aab10	Add SOFT_ASSERT to gracefully recover from invariant violations (#82689 ) Summary: Implement SOFT_ASSERT that fails only in debug mode and only triggers a warning log in release mode. This allows us to gracefully handle some invariant violations when processing traces that don't necessarily need to crash the entire program. Test Plan: Added SOFT_ASSERT test in containers.cpp Differential Revision: D38327334 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82689 Approved by: https://github.com/robieta	2022-08-10 00:58:07 +00:00
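The behavior described for SOFT_ASSERT (hard failure in debug builds, warning in release builds) can be sketched in Python as follows. This is an analogy only: the real SOFT_ASSERT is a C++ macro, and the `soft_assert` function and its `debug` flag here are hypothetical.

```python
import warnings


def soft_assert(condition: bool, message: str, *, debug: bool) -> bool:
    """Return True if the invariant holds.

    On violation: raise in debug mode, otherwise warn and return False so the
    caller can recover instead of crashing.
    """
    if condition:
        return True
    if debug:
        raise AssertionError(message)
    warnings.warn(message, RuntimeWarning)
    return False
```

Returning the boolean lets a caller write `if not soft_assert(...): return` and skip the malformed record while continuing to process the rest of the trace.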
Han Qi (qihqi)	f9533560cc	Use flatbuffer of alternate namespace (#82952 ) Summary: Minimal change to make use of flatbuffer with fbsource namespace. Test Plan: existing unit tests Differential Revision: D38494999 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82952 Approved by: https://github.com/cccclai	2022-08-09 07:40:59 +00:00
Tugsbayasgalan Manlaibaatar	b4b60c2a2e	Get rid of ENABLE_UPGRADERS macro (#77574 ) Since it's been a while after we merged the upgrader design and we haven't encountered any issues, let's get rid of the macro for safe rollout Pull Request resolved: https://github.com/pytorch/pytorch/pull/77574 Approved by: https://github.com/gmagogsfm	2022-08-09 05:33:14 +00:00
Nikita Shulga	d7e6aaa59b	[BE] Add `append_cxx_flag_if_supported` macro (#82883 ) And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeLists.txt` Delete `-Wno-unknown-warning-option` to test that the conditions indeed work as expected Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883 Approved by: https://github.com/seemethere	2022-08-08 21:04:09 +00:00
Dave Bort	6e712823c5	Migrate remaining pytorch code to use new flatbuffer_loader.h APIs (#82620 ) This is the only file in pytorch core that refers to the deprecated flatbuffer_loader.h APIs. Move to the non-deprecated functions. Differential Revision: [D38330369](https://our.internmc.facebook.com/intern/diff/D38330369/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82620 Approved by: https://github.com/qihqi	2022-08-05 02:25:33 +00:00
Dave Bort	0810961d5f	Remove flatbuffer types/headers from flatbuffer_serializer[_jit].h (#82619 ) Hide the flatbuffers types and headers from the serialize APIs, and stop using the DEPRECATED functions from flatbuffer_loader.h. This required creating the new `DetachedBuffer` type to replace/hide `flatbuffers::DetachedBuffer`, a class that owns a span of custom-allocated memory. This is another step towards hiding the flatbuffers types and headers from the load/serialize APIs. Differential Revision: [D38292798](https://our.internmc.facebook.com/intern/diff/D38292798/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38292798/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/82619 Approved by: https://github.com/qihqi	2022-08-05 02:23:34 +00:00
Howard Huang	9d228fe517	[Small] Remove using c10d::ProcessGroup directive from c10d test (#82681 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82681 Approved by: https://github.com/awgu	2022-08-03 17:23:35 +00:00
Edward Z. Yang	df69660832	Revert "Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552 )"" (#82599 ) This reverts commit `532b8a9e00`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82599 Approved by: https://github.com/albanD	2022-08-02 19:37:02 +00:00
PyTorch MergeBot	532b8a9e00	Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552 )" This reverts commit `9465c0e0b5`. Reverted https://github.com/pytorch/pytorch/pull/82552 on behalf of https://github.com/zengk95 due to This seems to be breaking windows binary wheels	2022-08-01 20:25:35 +00:00
Edward Z. Yang	9465c0e0b5	Add a lint rule for torch/csrc/util/pybind.h include (#82552 ) We define specializations for pybind11 defined templates (in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently it is important that these specializations always be #include'd when making use of pybind11 templates whose behavior depends on these specializations, otherwise we can cause an ODR violation. The easiest way to ensure that all the specializations are always loaded is to designate a header (in this case, torch/csrc/util/pybind.h) that ensures the specializations are defined, and then add a lint to ensure this header is included whenever pybind11 headers are included. The existing grep linter didn't have enough knobs to do this conveniently, so I added some features. I'm open to suggestions for how to structure the features better. The main changes: - Added an --allowlist-pattern flag, which turns off the grep lint if some other line exists. This is used to stop the grep lint from complaining about pybind11 includes if the util include already exists. - Added --match-first-only flag, which lets grep only match against the first matching line. This is because, even if there are multiple includes that are problematic, I only need to fix one of them. We don't /really/ need this, but when I was running lintrunner -a to fixup the preexisting codebase it was annoying without this, as the lintrunner overall driver fails if there are multiple edits on the same file. I excluded any files that didn't otherwise have a dependency on torch/ATen, this was mostly caffe2 and the valgrind wrapper compat bindings. Note the grep replacement is kind of crappy, but clang-tidy lint cleaned it up in most cases. See also https://github.com/pybind/pybind11/issues/4099 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552 Approved by: https://github.com/albanD	2022-08-01 17:16:58 +00:00
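The two linter features described in the commit above can be sketched as a small function. This is an assumption-laden illustration, not the actual grep linter: `lint_lines` and its parameter names are hypothetical, chosen to mirror the `--allowlist-pattern` and `--match-first-only` flags.

```python
import re
from typing import List, Optional


def lint_lines(
    lines: List[str],
    pattern: str,
    allowlist_pattern: Optional[str] = None,
    match_first_only: bool = False,
) -> List[int]:
    """Return 1-based numbers of lines matching `pattern`.

    If any line matches `allowlist_pattern`, the lint is turned off for the
    whole file. With `match_first_only`, only the first hit is reported.
    """
    if allowlist_pattern is not None and any(
        re.search(allowlist_pattern, line) for line in lines
    ):
        return []

    hits: List[int] = []
    for number, line in enumerate(lines, start=1):
        if re.search(pattern, line):
            hits.append(number)
            if match_first_only:
                break
    return hits
```

Reporting only the first hit means an automatic fixer produces at most one edit per file, which sidesteps the multiple-edits failure the commit message mentions.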
Edward Z. Yang	50e8abbcad	Change SymIntNode into an intrusive pointer (#82548 ) This will make the pointer type a single word, which is important for packing it into an int64_t. This time, the diff doesn't segfault when you build in DEBUG mode; more details at https://github.com/pybind/pybind11/issues/4099 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82548 Approved by: https://github.com/albanD	2022-08-01 15:07:21 +00:00
PyTorch MergeBot	3b9cbb1738	Revert "Change SymIntNode into an intrusive pointer (#82432 )" This reverts commit `7be44f8158`. Reverted https://github.com/pytorch/pytorch/pull/82432 on behalf of https://github.com/ezyang due to segfaults on test but not caught in CI	2022-07-29 20:08:59 +00:00
Edward Z. Yang	7be44f8158	Change SymIntNode into an intrusive pointer (#82432 ) This will make the pointer type a single word, which is important for packing it into an int64_t Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82432 Approved by: https://github.com/albanD, https://github.com/Krovatkin	2022-07-29 17:32:54 +00:00
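The "single word" argument in the SymIntNode commits can be sketched as follows. This is a minimal illustration, not `c10::intrusive_ptr`: `RefCounted`, `IntrusivePtr`, `SymNode`, `pack_release` and `reclaim` are hypothetical names, and the sketch assumes the reference count lives inside the pointee so that the handle holds exactly one raw pointer and can round-trip through an `int64_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical base class: the reference count is stored in the object itself.
struct RefCounted {
  std::size_t refcount = 0;
  virtual ~RefCounted() = default;
};

// Hypothetical single-word smart pointer (a sketch, not c10::intrusive_ptr).
template <typename T>
class IntrusivePtr {
 public:
  IntrusivePtr() = default;
  explicit IntrusivePtr(T* p) : ptr_(p) { retain(); }
  IntrusivePtr(const IntrusivePtr& other) : ptr_(other.ptr_) { retain(); }
  IntrusivePtr& operator=(IntrusivePtr other) {  // copy-and-swap
    std::swap(ptr_, other.ptr_);
    return *this;
  }
  ~IntrusivePtr() { release(); }

  T* get() const { return ptr_; }
  std::size_t use_count() const { return ptr_ ? ptr_->refcount : 0; }

  // Give up ownership and return the pointer bits; the count is not changed.
  std::int64_t pack_release() {
    T* p = ptr_;
    ptr_ = nullptr;
    return static_cast<std::int64_t>(reinterpret_cast<std::intptr_t>(p));
  }

  // Take ownership back from bits produced by pack_release().
  static IntrusivePtr reclaim(std::int64_t bits) {
    assert(bits != 0);
    IntrusivePtr out;
    out.ptr_ = reinterpret_cast<T*>(static_cast<std::intptr_t>(bits));
    return out;
  }

 private:
  void retain() { if (ptr_) ++ptr_->refcount; }
  void release() { if (ptr_ && --ptr_->refcount == 0) delete ptr_; }
  T* ptr_ = nullptr;
};

// Hypothetical payload type standing in for a symbolic integer node.
struct SymNode : RefCounted {
  long value = 0;
};

// The handle is exactly one pointer wide, unlike a typical std::shared_ptr.
static_assert(sizeof(IntrusivePtr<SymNode>) == sizeof(void*), "one machine word");
```

A typical `std::shared_ptr` carries a second pointer to its control block, which is why it does not fit in a single integer slot the same way.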
Max Ren	727a327162	Back out "Back out "[profiling] Adding targets file for test_mobile_profiler"" (#82243 ) Summary: Originally reverted this diff D37116110 (`c9aa74a37f`) because ``` > /usr/local/bin/buck build //caffe2/test/cpp/lite_interpreter_runtime/... BUILD FAILED The rule //caffe2:backend_interface_libAndroid could not be found. Please check the spelling and whether it is one of the 1866 targets in /data/users/batanasov/fbsource/fbcode/caffe2/TARGETS. (52107 bytes) 1 similar targets in /data/users/batanasov/fbsource/fbcode/caffe2/TARGETS are: //caffe2:backend_interface_lib This error happened while trying to get dependency '//caffe2:backend_interface_libAndroid' of target '//caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profilerAndroid' At //caffe2:backend_interface_libAndroid (ovr_config//platform/linux:x86_64-fbcode) At //caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profilerAndroid (ovr_config//platform/linux:x86_64-fbcode) ``` The add test_mobile_profiler was not meant to be built with Android or other mobile platforms, so we are changing the test to a cpp_unittest Test Plan: ``` buck test //caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler Parsing buck files: finished in 0.9 sec Creating action graph: finished in 26.5 sec Downloaded 2/2 artifacts, 1.30 Mbytes, 0.0% cache miss (for updated rules) Building: finished in 16.5 sec (100%) 18451/18451 jobs, 3/18451 updated Total time: 44.0 sec More details at https://www.internalfb.com/intern/buck/build/8bee82c1-66a9-4fae-805f-e4ef5505d25d BUILD SUCCEEDED Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details. 
Running with tpx session id: 6904f989-5c17-4c5b-9a4f-ffb643dfcc43 Trace available for this run at /tmp/tpx-20220726-114727.001729-6904f989-5c17-4c5b-9a4f-ffb643dfcc43/trace.log RemoteExecution session id: reSessionID-6904f989-5c17-4c5b-9a4f-ffb643dfcc43-tpx Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425183404951 ✓ ListingSuccess: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler : 3 tests discovered (17.640) ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.Backend (0.206) ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.BackendMemoryEvents (0.271) ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.ModuleHierarchy (0.268) Summary Pass: 3 ListingSuccess: 1 Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425183404951 ``` Differential Revision: D38166171 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82243 Approved by: https://github.com/salilsdesai	2022-07-28 23:08:52 +00:00
Edward Z. Yang	fd5ac1e6b5	Rename SymbolicIntNode to SymIntNodeImpl (#82350 ) Done via ``` git grep -l 'SymbolicIntNode' \| xargs sed -i 's/SymbolicIntNode/SymIntNodeImpl/g' ``` Reasoning for the change: * Sym is shorter than Symbolic, and consistent with SymInt * You usually will deal in shared_ptr<...>, so we're going to reserve the shorter name (SymIntNode) for the shared pointer. But I don't want to update the Python name, so afterwards I ran ``` git grep -l _C.SymIntNodeImpl \| xargs sed -i 's/_C.SymIntNodeImpl/_C.SymIntNode/' ``` and manually fixed up the binding code Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82350 Approved by: https://github.com/Krovatkin	2022-07-28 18:27:45 +00:00
goldenxuett	c2ccf6e625	[JIT] Add backwards compatibility test for old NonDeterminism ops list in ir.cpp (#82257 ) - Added backwards compatibility test to ensure that every Op in the old Nondeterministic op list from ir.cpp has the tag nondeterministic_seeded. Note that the 3 ops marked "normal" were not actually real op signatures. (ie findOp with dispatcher returned a nullptr). These were changed to normal.Tensor_Tensor, normal.Tensor_float and normal.float_Tensor in the list since that is what matches the rest of their signatures Pull Request resolved: https://github.com/pytorch/pytorch/pull/82257 Approved by: https://github.com/davidberard98	2022-07-27 20:19:22 +00:00
goldenxuett	8d5951e7e8	[JIT] Add is_aliasing method to FunctionSchema (#82255 ) - Add is_aliasing method in function schema to be able to indicate if an argument has an alias_set attached to it. This is utilized in the integration with autograd (see next PR) - Tested in test_schema_info Pull Request resolved: https://github.com/pytorch/pytorch/pull/82255 Approved by: https://github.com/davidberard98	2022-07-27 20:19:21 +00:00
goldenxuett	67c22b6c07	[JIT] Modify is_nondeterministic to utilize tags in SchemaInfo for non-mobile contexts and integrate with ir.cpp (#82253 ) - Modified is_nondeterministic method in SchemaInfo class to utilize tags. - Modified isNonDeterministic method in ir.cpp to utilize SchemaInfo when a Node is an aten op. - Added an assert to ensure that if a node is an aten op kind, it has a schema. - Tested through verifying that all IR.cpp tests run, and through adding 2 custom determinism checks to test for the special dropout edge case and a general bernoulli case. Differential Revision: [D38179499](https://our.internmc.facebook.com/intern/diff/D38179499) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82253 Approved by: https://github.com/davidberard98	2022-07-27 20:19:19 +00:00
PyTorch MergeBot	e1bd244a14	Revert "[JIT] Modify is_nondeterministic to utilize tags in schemaInfo and integrate with ir.cpp (#81836 )" This reverts commit `fc3555ce4d`. Reverted https://github.com/pytorch/pytorch/pull/81836 on behalf of https://github.com/osalpekar due to Internal Mobile NNPACK custom_ops tests failing with Error: tags are not saved for Mobile	2022-07-26 19:11:49 +00:00
PyTorch MergeBot	cbeef2c541	Revert "[JIT] Add is_aliasing method to FunctionSchema (#81916 )" This reverts commit `eb2ea9a581`. Reverted https://github.com/pytorch/pytorch/pull/81916 on behalf of https://github.com/osalpekar due to Need to revert this to revert https://github.com/pytorch/pytorch/pull/81836 cleanly. That PR broke internal mobile custom_ops	2022-07-26 19:06:56 +00:00
PyTorch MergeBot	18dd7e55c9	Revert "[JIT] Add backwards compatibility test for old NonDeterminism ops list in ir.cpp (#82029 )" This reverts commit `7288ea4e1d`. Reverted https://github.com/pytorch/pytorch/pull/82029 on behalf of https://github.com/osalpekar due to Need to revert this to revert https://github.com/pytorch/pytorch/pull/81836 cleanly. That PR broke internal mobile custom_ops	2022-07-26 19:00:44 +00:00
Will Constable	4f34cd6d1e	Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032 ) Avoid exposing defines that conflict with google logging, since this blocks external usage of libtorch in certain cases. All the 'interesting' changes should be in these two files, and the rest should just be mechanical changes via sed. c10/util/logging_is_not_google_glog.h c10/util/logging_is_google_glog.h Fixes https://github.com/pytorch/pytorch/issues/81415 cc @miladm @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/82032 Approved by: https://github.com/soumith, https://github.com/miladm	2022-07-26 01:20:44 +00:00
goldenxuett	7288ea4e1d	[JIT] Add backwards compatibility test for old NonDeterminism ops list in ir.cpp (#82029 ) - Added backwards compatibility test to ensure that every Op in the old Nondeterministic op list from ir.cpp has the tag nondeterministic_seeded. Note that the 3 ops marked "normal" were not actually real op signatures. (ie findOp with dispatcher returned a nullptr). These were changed to normal.Tensor_Tensor, normal.Tensor_float and normal.float_Tensor in the list since that is what matches the rest of their signatures Pull Request resolved: https://github.com/pytorch/pytorch/pull/82029 Approved by: https://github.com/davidberard98	2022-07-25 15:44:34 +00:00
goldenxuett	eb2ea9a581	[JIT] Add is_aliasing method to FunctionSchema (#81916 ) - Add is_aliasing method in function schema to be able to indicate if an argument has an alias_set attached to it. This is utilized in the integration with autograd (see next PR) - Tested in test_schema_info Pull Request resolved: https://github.com/pytorch/pytorch/pull/81916 Approved by: https://github.com/davidberard98	2022-07-25 15:44:33 +00:00
goldenxuett	fc3555ce4d	[JIT] Modify is_nondeterministic to utilize tags in schemaInfo and integrate with ir.cpp (#81836 ) - Modified is_nondeterministic method in SchemaInfo class to utilize tags. - Modified isNonDeterministic method in ir.cpp to utilize SchemaInfo when a Node is an aten op. - Added an assert to ensure that if a node is an aten op kind, it has a schema. - Tested through verifying that all IR.cpp tests run, and through adding 2 custom determinism checks to test for the special dropout edge case and a general bernoulli case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81836 Approved by: https://github.com/davidberard98	2022-07-25 15:44:31 +00:00
goldenxuett	9a5fa15ea8	[JIT] Remove BatchNorm and InstanceNorm special cases from AliasDB and replace with SchemaInfo is_mutable checks (#81785 ) - Generalized AnalyzeImpl cases for batchNorm and InstanceNorm in alias_analysis.cpp using schema_info. - Tested by ensuring all aliasDB special case checks for batchNorm and instanceNorm pass as expected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81785 Approved by: https://github.com/davidberard98	2022-07-23 05:50:39 +00:00
Peter Bell	8d0cbce069	Lower randint default dtype to the C++ API (#81410 ) The default dtype for randint is currently handled with manual python binding code, this moves it into the `native_functions.yaml` declaration for API consistency. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81410 Approved by: https://github.com/albanD	2022-07-21 16:42:49 +00:00
goldenxuett	c9497886fd	[JIT] Modify is_mutable in FunctionSchema and SchemaInfo to have SchemaArgument parameter instead of index (#81784 ) - Modify the is_mutable(size_t index) overload to become is_mutable(const SchemaArgument& argument) due to cases where one might want to check the mutability of either input or output arguments. - Refactored all calls to the function to use this new overload - Tested through is_mutable() tests in test_schema_info.cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/81784 Approved by: https://github.com/davidberard98	2022-07-20 22:09:56 +00:00
goldenxuett	1ddbc5a7dc	[JIT] Remove has_side_effects functionality from SchemaInfo (#81575 ) - This removes all functionality from https://github.com/pytorch/pytorch/pull/81002 due to a realization that the side effects check doesn't affect any ops outside of JIT. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81575 Approved by: https://github.com/davidberard98	2022-07-19 22:33:19 +00:00
goldenxuett	a6e716cfed	[JIT] Add may_contains_alias function in SchemaInfo class (#81444 ) - Created may_contain_alias method in SchemaInfo which is a wrapper around FunctionSchema may_contain_alias that also accounts for argument values. This is done using similar logic to AliasDB using an internal understanding of wildcard sets and container object - Added a multitude of tests for various graph edge cases (inputs aliasing, outputs aliasing, multiple input wildcards, multiple container objects, etc...). Pull Request resolved: https://github.com/pytorch/pytorch/pull/81444 Approved by: https://github.com/davidberard98	2022-07-19 04:29:22 +00:00
goldenxuett	47cdab6601	[JIT] Fix double wildcard edge case for may_alias in SchemaInfo and improve formatting (#81439 ) - Create c10::AliasTypeSet type def of vector<TypePtr> to match alias_analysis.cpp formatting and improve readability. - Move canAliasTypeSetsAlias, mapTypeToAliasTypeSet, getAliasTypeSetContainedTypes, and getCorrectList to public in function_schema.h for use in SchemaInfo class. In the future it might be better to find a different home for most of these functions since they don't depend on functionSchema. - Created hash function for SchemaArgument - Add assert to ensure that there is only 1 input and 1 output with each alias set (excluding wildcard) - Fixed double wildcard input edge case for may_alias. (This is the case where if there is a schema with the form (Tensor(a) a, Tensor() b, Tensor() c) -> Tensor, and the argument values for 'a' and 'b' cause them to alias, then 'a' may also alias 'c'. - Added tests for double wildcard case in may_alias, mismatching types in may_alias, and the uniqueness internal assert. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81439 Approved by: https://github.com/davidberard98	2022-07-19 04:29:21 +00:00
Nikolay Korovaiko	4aac42cc98	[LT] Add a new backend interface [DUP of the original] (#81662 ) This is a dup of https://github.com/pytorch/pytorch/pull/76517, which is failing because Jiewen needs to re-sign the CLA. Summary: This commit introduces a new pair of BackendImplInterface methods: GetDefaultDeviceOrdinal and SetDefaultDeviceOrdinal. They allow a backend to specify its own default device, e.g., 1 for XLA and 0 for CUDA/CPU. Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.* ghstack-source-id: b4adfef49253e51bffbbf40d356188a92c98994d Pull Request resolved: https://github.com/pytorch/pytorch/pull/76517 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81662 Approved by: https://github.com/JackCaoG, https://github.com/wconstab	2022-07-19 01:15:22 +00:00
zhang, xiaobing	86b86202b5	fix torch.config can't respect USE_MKLDNN flag issue (#75001 ) Fixes https://github.com/pytorch/pytorch/issues/74949, which reports that torch.config can't respect USE_MKLDNN flag. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75001 Approved by: https://github.com/malfet	2022-07-17 15:00:48 +00:00
goldenxuett	e71f4e7958	[JIT] Implement may_contain_alias in FunctionSchema (#81352 ) - Created may_contain_alias method in FunctionSchema to publicize more detailed aliasing information about inputs and outputs of a schema. This method returns whether the first argument may contain an alias to the second argument (i.e., if the first argument is a list[Tensor], it can contain an alias to the second argument if the second argument is Tensor(*)), and vice versa if bidirectional = true. - The helper methods created for this are explained in more detail in function_schema.h. - Tested may_contain_alias methods for basic functionality, bidirectional functionality, wildcard functionality and dual container functionality in test_schema_info.cpp. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81352 Approved by: https://github.com/davidberard98, https://github.com/Gamrix	2022-07-15 21:57:37 +00:00
goldenxuett	42ee1608d3	[JIT] Add special cases batch_norm, instance_norm and dropout for SchemaInfo (#81007 ) - Added special cases for dropout in the is_non_deterministic() check and for batch_norm and instance_norm in the is_mutable() check in SchemaInfo(). - Added tests for the above special cases for dropout, batch_norm and instance_norm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81007 Approved by: https://github.com/davidberard98	2022-07-15 04:52:02 +00:00
goldenxuett	3b4964230e	[JIT] Add side effects checks for ops in SchemaInfo subclass (#81002 ) - Added has_side_effects method which returns whether a given op has side effects. Currently this is implemented with a hard-coded list of functions copied from ir.cpp in AliasDB, but this will eventually be implemented by returning whether a given schema has the has_side_effects tag. - Tested in test_schema_info.cpp with both an op with side effects and an op without side effects. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81002 Approved by: https://github.com/davidberard98	2022-07-13 00:39:30 +00:00
goldenxuett	14c28caed9	[JIT] Add determinism checks for ops in SchemaInfo subclass (#81000 ) - Added is_non_deterministic which returns whether a given op is non-deterministic. Currently this is implemented with a hard-coded list of non-deterministic functions copied from ir.cpp in AliasDB, but this will eventually be implemented by returning whether a given schema has the non_deterministic tag. - Tested is_non_deterministic method with a deterministic op and a non-deterministic op in test_schema_info.cpp. Note that the case for op "aten::dropout(Tensor input, float p, bool train) -> Tensor", which is deterministic whenever "train=false", is not accounted for in this PR and will be fixed in a later PR. Currently "aten::dropout(Tensor input, float p, bool train) -> Tensor" is always considered nondeterministic. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81000 Approved by: https://github.com/davidberard98	2022-07-13 00:35:42 +00:00
goldenxuett	50ba94f5cc	[JIT] Add aliasing checks in SchemaInfo with associated tests (#80984 ) - Created may_alias method in SchemaInfo to update the implementation of FunctionSchema::may_alias for aliasing cases due to inputs aliasing. - Created output_alias_map_ internal variable to check cases where outputs might alias due to inputs aliasing. This variable is updated in generateAliasMap(). - Added tests for various may_alias special cases (input - input, input - output, output - output) due to inputs aliasing causing other arguments to also alias. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80984 Approved by: https://github.com/davidberard98	2022-07-13 00:18:43 +00:00
goldenxuett	aa61fdb667	[JIT] Add argumentValue functions and is_mutable checks to SchemaInfo (#80972 ) - Created addArgumentValue/s methods in SchemaInfo to pass argument values into the subclass. These are used for more accurate mutation, aliasing and determinism checks which include special cases. - Added input_alias_map_ to keep track of which inputs alias each other. This is updated with the method generateAliasMap. - Implemented is_mutable methods in SchemaInfo which also give information based on argument values. For instance, if two inputs alias and one is mutable by the schema, then the other will also be mutable. - Tested Schema Info is_mutable implementation where inputs alias as mentioned above. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80972 Approved by: https://github.com/davidberard98	2022-07-13 00:16:41 +00:00
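The rule stated in this commit ("if two inputs alias and one is mutable by the schema, then the other will also be mutable") can be sketched with a toy model. `ToySchemaInfo`, `schema_mutable` and `alias_group` are hypothetical names for illustration and do not reflect the real `SchemaInfo` interface; the sketch assumes aliasing between argument values has already been resolved into group ids.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model; this is not the real SchemaInfo class.
struct ToySchemaInfo {
  // schema_mutable[i] is true when the schema itself marks input i as written to.
  std::vector<bool> schema_mutable;
  // alias_group[i] identifies the storage behind the value passed for input i;
  // equal ids mean the two argument values alias each other.
  std::vector<int> alias_group;

  // An input is mutable if it, or any input whose value aliases it, is
  // marked mutable by the schema.
  bool is_mutable(std::size_t index) const {
    assert(index < schema_mutable.size());
    assert(alias_group.size() == schema_mutable.size());
    for (std::size_t i = 0; i < schema_mutable.size(); ++i) {
      if (schema_mutable[i] && alias_group[i] == alias_group[index]) {
        return true;
      }
    }
    return false;
  }
};
```

With inputs 0 and 1 sharing a group and only input 0 marked mutable, `is_mutable(1)` also returns true, while an input in a separate group stays immutable.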
goldenxuett	e3a870986e	[JIT] Add may_alias in FunctionSchema with associated tests (#80918 ) - Created may_alias method in FunctionSchema to publicize aliasing information about inputs and outputs of a schema. - Tested may_alias methods for basic functionality, exceptions, and wildcard functionality. Cases where elements of a container alias another argument will be handled with a new may_contain_alias method which will be created in a later pr Pull Request resolved: https://github.com/pytorch/pytorch/pull/80918 Approved by: https://github.com/davidberard98	2022-07-12 18:07:23 +00:00
goldenxuett	b4e342928b	[JIT] Add mutability checks in FunctionSchema and create SchemaInfo subclass (#80734 ) - Added overloads to is_mutable method in FunctionSchema to tell whether an argument at index is mutable or an argument with name is mutable. - Created SchemaInfo subclass of FunctionSchema with constructors from FunctionSchema and from const char* signature. - Tested is_mutable method overloads in new test_schema_info.cpp file. Note that this pr is used to set up SchemaInfo. Implementation for SchemaInfo will be addressed in later commits Differential Revision: [D37651384](https://our.internmc.facebook.com/intern/diff/D37651384) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80734 Approved by: https://github.com/davidberard98	2022-07-11 19:13:06 +00:00
soulitzer	516f3198d6	Fix retains grad behavior after in-place (#79996 ) See this doc: https://docs.google.com/document/d/1KiRdnoj6B4cI3yl017hTbCqcOGO1gWIpUf20sldipHM/edit# Two issues are fixed: (1) one regarding hooks in general and (2) one regarding retains grad hooks. Python hooks, which rely on a different mechanism, are not discussed here: - Hooks in cpp in general - (fixed) new hooks registered to a newer version of the tensor no longer get applied to the grad_fn associated with the older version of the tensor when the first hook was ever registered - (unchanged) hooks registered to the older version of the tensor remain active on - Retains grad hooks - (fixed) now get moved to the latest grad_fn. NB: To the user, retains_grad is not considered hooks or expected to behave like hooks (which we consider properties of the grad_fn) vs retains_gradness which is a property of the tensor. - (not in this PR) Python hooks - (will fix) same issue as hooks in cpp where new hooks are being applied to the grad_fn associated with the older version of the tensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/79996 Approved by: https://github.com/albanD	2022-07-08 19:13:28 +00:00
Sergii Dymchenko	b0aaefb50f	Build example_allreduce only for GLOO (#81062 ) `example/allreduce.cpp` is GLOO-specific and will not compile with USE_GLOO=0 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81062 Approved by: https://github.com/malfet	2022-07-08 02:25:54 +00:00
Nikolay Korovaiko	8389ccbcd8	reinstate size and shape returning symints (#79560 ) This PR redirects `size` and `.shape` to call `sym_sizes` Pull Request resolved: https://github.com/pytorch/pytorch/pull/79560 Approved by: https://github.com/Chillee	2022-07-08 01:17:33 +00:00
Boyan Atanasov	b603860c1d	Back out "[profiling] Adding targets file for test_mobile_profiler" (#80789 ) Summary: Original commit changeset: 38314c83d223 Original Phabricator Diff: D37116110 (`c9aa74a37f`) Reviewed By: mcr229 Differential Revision: D37582906 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80789 Approved by: https://github.com/bochko, https://github.com/salilsdesai	2022-07-05 23:34:15 +00:00
Han Qi (qihqi)	c93ceef658	Wrap static initializers in ifdef (#80590 ) because some iOS projects enable -Wglobal-constructors and won't build otherwise. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80590 Approved by: https://github.com/cccclai	2022-07-01 04:42:17 +00:00
Max Ren	c9aa74a37f	[profiling] Adding targets file for test_mobile_profiler (#80351 ) Summary: Testing for successful recording of backend events. Testing checks that the trace file successfully adds the memory recording from the backend at execute. The record in the trace file looks like: ``` { "ph": "i", "cat": "cpu_instant_event", "s": "t", "name": "[memory]", "pid": 847267, "tid": 847267, "ts": 1655333276408215, "args": { "Device Type": 0, "Device Id": -1, "Addr": 108370615407104, "Bytes": 16384, "Total Allocated": 16384, "Total Reserved": 49152 } } ``` Test Plan: ``` buck test //caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler Parsing buck files: finished in 1.6 sec Creating action graph: finished in 30.9 sec Downloaded 0/5 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules) Building: finished in 37.9 sec (100%) 25314/25314 jobs, 5/25314 updated Total time: 01:10.5 min More details at https://www.internalfb.com/intern/buck/build/ef1c4324-13d3-494e-bce7-8004047d5f89 BUILD SUCCEEDED Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details. 
Running with tpx session id: 17f300d4-9a78-4302-9e9e-d7ab79ba1ff0 Trace available for this run at /tmp/tpx-20220615-165413.567757-17f300d4-9a78-4302-9e9e-d7ab79ba1ff0/trace.log RemoteExecution session id: reSessionID-17f300d4-9a78-4302-9e9e-d7ab79ba1ff0-tpx Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7881299443250383 ✓ ListingSuccess: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler : 3 tests discovered (37.049) ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.Backend (0.402) ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.ModuleHierarchy (0.487) ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.BackendMemoryEvents (0.280) Summary Pass: 3 ListingSuccess: 1 Finished test run: https://www.internalfb.com/intern/testinfra/testrun/7881299443250383 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users ``` Differential Revision: D37116110 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80351 Approved by: https://github.com/kimishpatel	2022-06-30 17:27:35 +00:00
Sergei Vorobev	a8b0988596	Fix //:module_test Conversion_MultiCUDA (#79926 ) Fixes #79871 Make `module.cpp` tests respect the change that was made in #78436 (no int types in autograd). Note that there is still a gap in the CMake test: it's unclear why it didn't fail CI before. As far as I can tell it should be executed, because it's included here: `79507d2a9d/test/cpp/api/CMakeLists.txt (L17)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/79926 Approved by: https://github.com/soulitzer	2022-06-21 23:32:18 +00:00
Nikolay Korovaiko	efc7343743	Revert "Revert "Put symint overloads on a different name"" (#79680 ) This relands https://github.com/pytorch/pytorch/pull/79281 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79680 Approved by: https://github.com/malfet	2022-06-21 07:06:33 +00:00
Han Qi (qihqi)	fed12ff680	[BE][flatbuffer] Remove code duplications and refactor (#79184 ) Summary: Remove code dup in import.cpp / export_modules.cpp such that 1. Only one copy of switching logic (detect flatbuffer / is_flatbuffer); 2. Move detection of includeness of flatbuffer to runtime (so no more macros) This also reverts the dependency of import.cpp -> flatbuffer_loader.cpp to flatbuffer_loader.cpp -> import.cpp. Differential Revision: D36926217 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79184 Approved by: https://github.com/zhxchen17	2022-06-20 16:37:38 +00:00
Nikita Shulga	4a4890cfb2	[BE] Use CamelCase for enum class members (#79772 ) Per many C++ code-style guides (for [example](https://google.github.io/styleguide/cppguide.html#Enumerator_Names)), members of an `enum` should be CamelCased, and only defines should be ALL_CAPS. Changes the `MemOverlap`, `MemOverlapStatus` and `CmpEvalResult` enum values. Also, `YES`, `NO`, `TRUE` and `FALSE` are often system defines. Fixes, among other things, the current iOS build regression, which manifests as follows (see [this](`6e90572bb9`)): ``` /Users/runner/work/pytorch/pytorch/aten/src/ATen/MemoryOverlap.h:19:29: error: expected identifier enum class MemOverlap { NO, YES, TOO_HARD }; ^ /Applications/Xcode_12.4.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.4.sdk/usr/include/objc/objc.h:89:13: note: expanded from macro 'YES' #define YES __objc_yes ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/79772 Approved by: https://github.com/drisspg, https://github.com/kulinseth	2022-06-17 05:53:57 +00:00
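The macro collision described in this commit can be reproduced in miniature. The sketch below is an illustration under assumptions: the two `#define` lines stand in for a platform header (the log above shows `objc.h` defining `YES`; the values used here are made up), the CamelCase enumerator names are a plausible spelling rather than a quote from the patch, and `overlap_name` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstring>

// Stand-ins for platform macros; the values are assumptions for illustration.
#define YES 1
#define NO 0

// With the macros above in scope, the ALL_CAPS spelling would be expanded by
// the preprocessor to `{ 0, 1, TOO_HARD }` and fail to compile:
//   enum class MemOverlap { NO, YES, TOO_HARD };

// CamelCase enumerators are not touched by the ALL_CAPS macros.
enum class MemOverlap { No, Yes, TooHard };

// Hypothetical helper that maps each enumerator to a printable name.
inline const char* overlap_name(MemOverlap m) {
  switch (m) {
    case MemOverlap::No:
      return "No";
    case MemOverlap::Yes:
      return "Yes";
    case MemOverlap::TooHard:
      return "TooHard";
  }
  return "unknown";
}
```

Because the preprocessor runs before the compiler sees scopes, `enum class` does not protect enumerator names from macros; only a different spelling does.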
PyTorch MergeBot	b9bb52d97b	Revert "Put symint overloads on a different name" This reverts commit `213a8fc992`. Reverted https://github.com/pytorch/pytorch/pull/79281 on behalf of https://github.com/bigfootjon due to Diff reverted internally	2022-06-15 17:15:21 +00:00
Nikolay Korovaiko	83e575c510	have a common interface to extract metadata from SizeNodes (#78088 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/78088 Approved by: https://github.com/JackCaoG, https://github.com/wconstab	2022-06-15 04:59:08 +00:00
John Clow	07a528cac7	Adding isDynamic Support to SizeNodes Pull Request resolved: https://github.com/pytorch/pytorch/pull/77917 Approved by: https://github.com/Krovatkin	2022-06-14 03:27:57 +00:00
David Berard	91a2e953e5	[JIT] Use signed integers in CalculatedNecessaryArgs x was underflowing: ``` size_t x = ... while (x >= 0) { x--; } ``` Changed the variables to ssize_t. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79331 Approved by: https://github.com/yuhc, https://github.com/tugsbayasgalan	2022-06-13 19:41:18 +00:00
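The underflow in this commit comes from `x >= 0` always being true for an unsigned type, so decrementing past zero wraps around instead of ending the loop. A minimal sketch of two common fixes follows; `last_nonzero_signed` and `last_nonzero_unsigned` are hypothetical helpers, and `std::ptrdiff_t` is used here as a portable signed type where the commit itself switched to `ssize_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Buggy shape from the commit message (do not use): with size_t the
// condition can never become false.
//   for (size_t x = v.size() - 1; x >= 0; x--) { ... }

// Fix 1: use a signed index, which can actually reach -1 and end the loop.
// Returns the index of the last non-zero element, or -1 if there is none.
std::ptrdiff_t last_nonzero_signed(const std::vector<int>& v) {
  for (std::ptrdiff_t x = static_cast<std::ptrdiff_t>(v.size()) - 1; x >= 0; --x) {
    if (v[static_cast<std::size_t>(x)] != 0) {
      return x;
    }
  }
  return -1;
}

// Fix 2: keep size_t, but compare before decrementing so zero is never
// decremented while the loop is still running.
std::ptrdiff_t last_nonzero_unsigned(const std::vector<int>& v) {
  for (std::size_t x = v.size(); x-- > 0;) {
    if (v[x] != 0) {
      return static_cast<std::ptrdiff_t>(x);
    }
  }
  return -1;
}
```

Both versions also handle the empty vector, where `v.size() - 1` in the buggy form would already wrap to a huge value before the first iteration.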
Edward Z. Yang	213a8fc992	Put symint overloads on a different name Due to implicit conversion shenanigans, having both IntArrayRef and SymIntArrayRef overloads makes {} ambiguous. While we could fix this by making a single unified type that accepts all the overloads we want, an easier fix was to just push the SymIntArrayRef overload to its own name. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/79281 Approved by: https://github.com/suo	2022-06-12 14:36:39 +00:00
Michael Suo	30fb2c4aba	[lint] autoformat test/cpp and torch/csrc Let's have some fun. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828 Approved by: https://github.com/ezyang	2022-06-11 21:11:16 +00:00
Michael Andreas Dagitses	ab2ca95dd1	turn on -Werror=unused-variable in our Bazel CPU build Summary: We also fix any existing issues. Note that we only do this for the CPU build because nvcc is considered a C++ toolchain but it does not have the same flag support. Adding flags to the GPU build will cause nvcc errors. Test Plan: Built locally, rely on CI to confirm. Reviewers: malfet Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/79156 Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD	2022-06-11 02:46:34 +00:00
Michael Andreas Dagitses	606b234336	turn on -Werror=unused-function in our Bazel CPU build Summary: We also fix any existing issues. Note that we only do this for the CPU build because nvcc is considered a C++ toolchain but it does not have the same flag support. Adding flags to the GPU build will cause nvcc errors. Test Plan: Built locally, rely on CI to confirm. Reviewers: malfet Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/79154 Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD	2022-06-10 22:11:54 +00:00
PyTorch MergeBot	bcd7a20953	Revert "turn on -Werror=unused-function in our Bazel CPU build" This reverts commit `67d313a032`. Reverted https://github.com/pytorch/pytorch/pull/79154 on behalf of https://github.com/malfet due to Breaks bazel build: `67d313a032`	2022-06-10 20:43:03 +00:00
Michael Andreas Dagitses	67d313a032	turn on -Werror=unused-function in our Bazel CPU build Summary: We also fix any existing issues. Note that we only do this for the CPU build because nvcc is considered a C++ toolchain but it does not have the same flag support. Adding flags to the GPU build will cause nvcc errors. Test Plan: Built locally, rely on CI to confirm. Reviewers: malfet Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/79154 Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD	2022-06-10 18:30:08 +00:00
Brian Hirsh	7b3a0ff87a	Port `index.Tensor` to structured kernels. Tracking issue: #55070 Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607 Approved by: https://github.com/bdhirsh	2022-06-10 17:27:47 +00:00
Michael Andreas Dagitses	f96d96a7fc	turn on -Werror=type-limits in our Bazel CPU build Summary: We also fix any existing issues. Test Plan: Built locally, rely on CI to confirm. Reviewers: malfet Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/79139 Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD	2022-06-10 10:04:08 +00:00
Nikita Shulga	3255ddeec9	Make `Wunused-local-typedef` a hard error (#77918 ) Only allow it for `libtorch_python` and tests Helps prevent regression like https://github.com/pytorch/pytorch/pull/76547#issuecomment-1132208232 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77918 Approved by: https://github.com/osalpekar, https://github.com/seemethere	2022-06-09 18:14:01 +00:00
Mark Harfouche	221755cc71	Link BLAS privately (#78883) We've had some users report that they are getting symbol collisions when linking to blas. I don't see a need to re-export the blas library symbols. I figured I would share here for other packagers to be able to benefit too. xref: https://github.com/conda-forge/pytorch-cpu-feedstock/pull/116 xref: https://github.com/conda-forge/openblas-feedstock/issues/134 Pull Request resolved: https://github.com/pytorch/pytorch/pull/78883 Approved by: https://github.com/ezyang	2022-06-09 17:02:06 +00:00
PyTorch MergeBot	4b82ef7928	Revert "Port `index.Tensor` to structured kernels." This reverts commit `cfd84125bd`. Reverted https://github.com/pytorch/pytorch/pull/69607 on behalf of https://github.com/zengk95 due to This is breaking mac trunk tests `cfd84125bd`	2022-06-08 20:16:10 +00:00
Brian Hirsh	cfd84125bd	Port `index.Tensor` to structured kernels. Tracking issue: #55070 Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607 Approved by: https://github.com/bdhirsh	2022-06-08 18:17:52 +00:00
dzdang	a56f4e23b9	[quant][core][better-engineering] Rename files in quantized directory to conform with non-quantized countertpart filenames Summary: Names of analogous files in quantized directory (previously snake case) were inconsistent with their non-quantized filename counterparts (pascal case). This is the first of a series of PRs that changes all files in quantized (and sub-directories) dir to have pascal case. `aten/src/ATen/native/quantized/qconv_unpack.cpp` has not been renamed yet because (for reasons currently unknown) after making the name change, `import torch` produces the below error (`qlinear_unpack.cpp` renaming also seems to fail some phabricator CI tests for similar reasons). We suspect that these may be undefined errors and will revisit naming these files in a future PR. ``` terminate called after throwing an instance of 'c10::Error' what(): Type c10::intrusive_ptr<ConvPackedParamsBase<2> > could not be converted to any of the known types. Exception raised from operator() at ../aten/src/ATen/core/jit_type.h:1735 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x55 (0x7f26745c0c65 in /data/users/dzdang/pytorch/torch/lib/libc10.so) frame #1: c10::detail::torchCheckFail(char const, char const, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xb1 (0x7f26745bdcd1 in /data/users/dzdang/pytorch/torch/lib/libc10.so) frame #2: <unknown function> + 0x1494e24 (0x7f2663b14e24 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so) frame #3: <unknown function> + 0xfed0bc (0x7f266366d0bc in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so) frame #4: c10::detail::infer_schema::make_function_schema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, 
c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x5a (0x7f266366d71a in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so) frame #5: c10::detail::infer_schema::make_function_schema(c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x7b (0x7f266366e06b in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so) frame #6: <unknown function> + 0x1493f32 (0x7f2663b13f32 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so) frame #7: <unknown function> + 0xe227dd (0x7f26634a27dd in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so) frame #8: <unknown function> + 0x14e0a (0x7f268c934e0a in /lib64/ld-linux-x86-64.so.2) ..........................truncated............. ``` Test Plan: ``` python test/test_quantization.py ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/77037 Approved by: https://github.com/jerryzh168	2022-06-07 13:47:08 +00:00
PyTorch MergeBot	6a4997e66a	[Profiler] Weaken ordering check during post processing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78563 The profiler assembles a call hierarchy by replaying recorded events. There is an assert to ensure that the events form a well structured tree; however, many of the inputs are from external sources and small differences (e.g. recording time in a lower precision) lead to traces which violate that assumption. For now this is acceptable; the post processing can handle resolving these discrepancies. As a result, I am relaxing the assert to only test event types where we expect the framework to be able to enforce these strong structural requirements. Differential Revision: [D36787787](https://our.internmc.facebook.com/intern/diff/D36787787/) Approved by: https://github.com/suo	2022-06-01 18:55:19 +00:00
PyTorch MergeBot	fca1f495c2	Revert "Port `index.Tensor` to structured kernels." This reverts commit `9fe6f1baf5`. Reverted https://github.com/pytorch/pytorch/pull/69607 on behalf of https://github.com/suo due to this broke master, see: `9fe6f1baf5`	2022-06-01 00:12:15 +00:00
PyTorch MergeBot	ceb93afe3f	Revert "Fix bug in flatbuffer deserialization" This reverts commit `7e72c96b10`. Reverted https://github.com/pytorch/pytorch/pull/78344 on behalf of https://github.com/tugsbayasgalan due to as we need to land it in fbcode asap	2022-05-31 23:34:04 +00:00
Brian Hirsh	9fe6f1baf5	Port `index.Tensor` to structured kernels. Tracking issue: #55070 Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607 Approved by: https://github.com/bdhirsh	2022-05-31 22:15:20 +00:00
Tugsbayasgalan Manlaibaatar	7e72c96b10	Fix bug in flatbuffer deserialization Pull Request resolved: https://github.com/pytorch/pytorch/pull/78344 Approved by: https://github.com/qihqi	2022-05-31 18:37:30 +00:00
Michael Suo	032d1ace1d	[ci] disable flaky MobileProfiler.Backend test This test is flaky, normally I'd disable using the disable bot but it doesn't support cpp. [skip ci] Pull Request resolved: https://github.com/pytorch/pytorch/pull/78320 Approved by: https://github.com/malfet	2022-05-26 03:22:55 +00:00
Shunting Zhang	26d9386f67	Make string serialization of C++ FunctionSchema consistent with torchgen.model.FunctionSchema Pull Request resolved: https://github.com/pytorch/pytorch/pull/77926 There is a discrepancy between the string representation of C++ FunctionSchema and torchgen.model.FunctionSchema. The latter will not add parentheses around the returned types if there is a single item, but the C++ FunctionSchema always adds the parentheses. Make them consistent so we can convert one type to the other via its string representation and parse method. Differential Revision: [D36535924](https://our.internmc.facebook.com/intern/diff/D36535924/) Approved by: https://github.com/bdhirsh	2022-05-24 19:39:26 +00:00
John Clow	c82fb7a67f	Adding support for upper and lower bound functions in SSA Pull Request resolved: https://github.com/pytorch/pytorch/pull/77389 Approved by: https://github.com/eellison	2022-05-20 23:58:40 +00:00
Nikolay Korovaiko	df1f9b9840	Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#77756 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/77756 Approved by: https://github.com/desertfire	2022-05-20 05:39:03 +00:00
PyTorch MergeBot	e9d660c331	Revert "Revert "Revert "Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836 )""" This reverts commit `acf7136a52`. Reverted https://github.com/pytorch/pytorch/pull/77719 on behalf of https://github.com/suo	2022-05-18 05:06:50 +00:00
Edward Z. Yang	acf7136a52	Revert "Revert "Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836 )"" This reverts commit `c35bd8d423`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77719 Approved by: https://github.com/Chillee, https://github.com/malfet	2022-05-18 03:25:43 +00:00
PyTorch MergeBot	c35bd8d423	Revert "Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836 )" This reverts commit `fc4c3c9bc7`. Reverted https://github.com/pytorch/pytorch/pull/76836 on behalf of https://github.com/suo	2022-05-18 02:45:25 +00:00
Han Qi (qihqi)	3822a472ef	Python function to extract information on mobile::Module from flatbuffer (#77624 ) Summary: Includes following refactor: 1. common loading on operator validation that is dup'd in pickle and flatbuffer loader moved to function.h/cpp 2. Allow loading of a function without wiring operator. This function will be used to implement get_bundled_input and friends for flatbuffer. Test Plan: contbuild & OSS CI, see `69fa49f123` Reviewed By: cccclai Differential Revision: D36348549 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77624 Approved by: https://github.com/cccclai	2022-05-18 00:42:57 +00:00
Nikolay Korovaiko	fc4c3c9bc7	Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836 ) LTC Tensors now create real IR (SizeNode) for sym_sizes() in LTCTensorImpl.cpp. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76836 Approved by: https://github.com/ezyang	2022-05-18 00:40:42 +00:00
Michael Suo	7f1e331b34	Make SymInt constructor explicit Since we plan to have a bunch of code that is sensitive to whether or not a SymInt contains a symbolic shape or not, it seems like a bad idea to have an implicit constructor. For example, code like: ``` sizes_and_strides_.stride_at_unchecked(dim) = 0; ``` would sail through, and the `0` would get implicitly promoted to a SymInt. This is a tradeoff though: it makes code that handles `SymInt`s more clunky as `int64_t`s and integer literals need to be explicitly wrapped in `SymInt` before being used. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77666 Approved by: https://github.com/ezyang	2022-05-17 22:28:35 +00:00
Bin Bao	25c6ebd12c	Revert "Revert "[LT] Codegen ReuseNode for supported ops"" Summary: Fixed a XLC build failure by generating an always-return-false default CanBeReused method. This reverts commit `3cade9d454`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77513 Approved by: https://github.com/alanwaketan	2022-05-16 20:14:42 +00:00
Wang, Eikan	e5a5cd149f	Simplify IfThenElse and CompareSelect within for-loop (#76793 ) Analyze the range to determine if a condition cannot be satisfied. Suppose the for-loop body contains `IfThenElse` or `CompareSelect` while the condition of the two statements depends on the for-loop index `Var`. In that case, we will analyze the range to check whether the condition could always be satisfied or not. If the condition is deterministic, simplify the logic. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76793 Approved by: https://github.com/huiguoo	2022-05-15 20:21:28 +00:00
PyTorch MergeBot	3cade9d454	Revert "[LT] Codegen ReuseNode for supported ops" This reverts commit `6066e5929f`. Reverted https://github.com/pytorch/pytorch/pull/76738 on behalf of https://github.com/malfet	2022-05-14 00:33:10 +00:00
Bin Bao	6066e5929f	[LT] Codegen ReuseNode for supported ops Summary: 1. Update the codegen script to add a TrieCache lookup (ReuseNode) before creating a new IR node. The following is an example generated code, ``` at::Tensor LazyNativeFunctions::add(const at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha) { ... torch::lazy::NodePtr node = torch::lazy::ReuseNode<AddTensor>(lazy_self->GetIrValue(), lazy_other->GetIrValue(), node_alpha); if (!node) { auto out_meta = at::meta::add(self, other, alpha); std::vector<Shape> shapes{Shape(out_meta.scalar_type(), out_meta.sizes().vec())}; TORCH_INTERNAL_ASSERT(shapes.size() == 1); if(symbolicShapeEnabled()){ std::vector<jit::IValue> inputs = { self, other, alpha }; char* schema_str = "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor"; applySymbolicShapesOnLT(schema_str, inputs, shapes); } node = torch::lazy::MakeNode<AddTensor>(lazy_self->GetIrValue(), lazy_other->GetIrValue(), node_alpha, std::move(shapes)); CacheNode(node); } ... } ``` 2. TrieCache lookup depends on each IR node subclass to provide its own comparison function. The following is an example generated code, ``` bool CanBeReused(const torch::lazy::Value& self, const torch::lazy::Value& other, const torch::lazy::Value& alpha) const { size_t i = 0; return (operand(i++) == self && operand(i++) == other && operand(i++) == alpha); } ``` 3. DeviceData is specially handled. 4. Non-codegen op changes are coming a separate PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76738 Approved by: https://github.com/JackCaoG, https://github.com/wconstab	2022-05-13 19:13:58 +00:00
yanbing-j	4f82f439d1	Enable BFloat16 ELU, SELU and CELU in CPU path (#62546 ) Enable BFloat16 ELU, SELU and CELU in CPU path. SELU and CELU will call ELU implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62546 Approved by: https://github.com/frank-wei	2022-05-12 16:56:57 +00:00
Xiang Gao	cc9d0f309e	lshift and rshift stop supporting floating types (#77146) Fixes #74358 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77146 Approved by: https://github.com/ngimel	2022-05-11 22:29:30 +00:00
Bin Bao	8f5cdc6d5d	Revert "Revert "[LT] Store OpKind for each IR subclass in a static field"" Summary: Re-land https://github.com/pytorch/pytorch/pull/76711 by fixing internal build errors. Generate class-level opkind as a static method instead of a static member. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77102 Approved by: https://github.com/wconstab, https://github.com/JackCaoG, https://github.com/antoniojkim	2022-05-11 12:27:05 +00:00
John Clow	26e2936edc	[JIT SSA] Added testing for the Cat Op in LazyTensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/76552 Approved by: https://github.com/Krovatkin	2022-05-09 22:11:14 +00:00
PyTorch MergeBot	7eaf4780ba	Revert "[LT] Store OpKind for each IR subclass in a static field" This reverts commit `ac37ddc795`. Reverted https://github.com/pytorch/pytorch/pull/76711 on behalf of https://github.com/malfet	2022-05-09 20:50:09 +00:00
Fuqiang Zhang	bd573389f6	[Bootcamp]Add option for flatbuffer loader to copy memory to individual tensors (#76986) Summary: Add option for flatbuffer loader to copy memory to individual tensors, to allow freeing memory without waiting for all tensor runs to complete. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76986 Approved by: https://github.com/qihqi	2022-05-09 17:29:30 +00:00
Bin Bao	ac37ddc795	[LT] Store OpKind for each IR subclass in a static field Summary: Currently OpKind is stored as an object field called op_ for each IR node, and one usage of op_ is to avoid dynamic_cast in NodeCast when we need to downcast a base-node pointer into a concrete sub-node pointer. As a result, we need to construct and pass in an op when downcasting nodes, and this becomes quite annoying when we start to implement the trie-based IR node reusing. More importantly, the op for each subclass should be unique for that subclass and thus making it a const static field is a more logical design. In this PR, we still keep the object-level op_ for easier XLA adoption. As future work, we can come back to remove op_, make the op() method virtual, and get rid of OpKind in all the node constructors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76711 Approved by: https://github.com/wconstab, https://github.com/JackCaoG	2022-05-06 19:14:46 +00:00
David Berard	6c615a21a0	[NVFuser] prep for on-by-default 1. fix tests that expected nvfuser off-by-default behavior 2. skip nvfuser if getExecutorMode() == false Pull Request resolved: https://github.com/pytorch/pytorch/pull/76937 Approved by: https://github.com/eellison	2022-05-06 18:18:53 +00:00
Bin Bao	f05710dd40	[LT] Add a trie data structure for caching IR nodes Summary: TrieCache provides a way to look up an IR node before we actually create it. If the lookup hits in TrieCache, we reuse the existing node and move the current pointer in TrieCache to point to that node; if the lookup misses, we create a new node and insert it into TrieCache. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76542 Approved by: https://github.com/wconstab, https://github.com/JackCaoG	2022-05-04 23:48:03 +00:00
Wang, Eikan	429a80dded	[NNC] Lowering function generates the output buffer with the specified stride (#76529) Summary: Pass stride information to lowering function to generate the output buffer with proper memory layout. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76529 Reviewed By: ZolotukhinM Differential Revision: D36116712 Pulled By: IvanKobzarev fbshipit-source-id: d3901f756b3710ecce172d6db3ecb0b7c12fb929 (cherry picked from commit b6cd53c91c01db36ea0e99167dc0ce0ae1d3aa23)	2022-05-04 20:04:22 +00:00
Bin Bao	f8a4780eb2	[LT] Move MakeNode into ir_builder.h Summary: Move MakeNode into ir_builder.h to avoid circular header reference later when introducing a trie cache for IR node lookup. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76482 Approved by: https://github.com/wconstab	2022-05-03 14:53:19 +00:00
Elias Ellison	e5a55af305	Reland reland Reland of https://github.com/pytorch/pytorch/pull/76397 and https://github.com/pytorch/pytorch/pull/76493 This time I'll get it right 😢 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76539 Approved by: https://github.com/davidberard98, https://github.com/osalpekar	2022-04-28 20:41:55 +00:00
PyTorch MergeBot	a5bc02aeb2	Revert "[JIT] Register decomp reland" This reverts commit `81b9cb741c`. Reverted https://github.com/pytorch/pytorch/pull/76397 on behalf of https://github.com/osalpekar	2022-04-28 03:33:29 +00:00
Antonio Kim	f3f327e103	Decouple LTC from TS Backend using Lazy IR Builder Next stage of breaking up https://github.com/pytorch/pytorch/pull/74710 IR builder class introduced to decouple the explicit usage of `TsNode` in core lazy tensors. Requires https://github.com/pytorch/pytorch/pull/75324 to be merged in first. Background - there are ~ 5 special ops used in lazy core but defined as :public {Backend}Node. (DeviceData, Expand, Scalar...) - we currently require all nodes derive from {Backend}Node, so that backends can make this assumption safely - it is hard to have shared 'IR classes' in core/ because they depend on 'Node' Motivation 1. avoid copy-paste of "special" node classes for each backend 2. in general decouple and remove all dependencies that LTC has on the TS backend Summary of changes - new 'IRBuilder' interface that knows how to make 5 special ops - move 'special' node classes to `ts_backend/` - implement TSIRBuilder that makes the special TS Nodes - new backend interface API to get the IRBuilder - update core code to call the builder CC: @wconstab @JackCaoG @henrytwo Partially Fixes #74628 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75433 Approved by: https://github.com/wconstab	2022-04-28 02:07:02 +00:00
Jiewen Tan	a28b132bc2	Revert D35860266: [pytorch][PR] Update torch::lazy::BackendDevice to have a new default ordinal Test Plan: revert-hammer Differential Revision: D35860266 (`f9d07ae644`) Original commit changeset: 554ebe16a068 Original Phabricator Diff: D35860266 (`f9d07ae644`) fbshipit-source-id: 325c54aa2e87e51134115213352b3d33a81b7edf (cherry picked from commit bbd74bf34a534d1b87aadff9790038e3dbbfa9c8)	2022-04-27 18:11:24 +00:00
Elias Ellison	81b9cb741c	[JIT] Register decomp reland Reland of https://github.com/pytorch/pytorch/pull/76252 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76397 Approved by: https://github.com/davidberard98	2022-04-26 23:17:18 +00:00
PyTorch MergeBot	2d72cb3373	Revert "[JIT] Allow registering Decompositions" This reverts commit `d9f0774f98`. Reverted https://github.com/pytorch/pytorch/pull/76252 on behalf of https://github.com/zengk95	2022-04-26 04:47:05 +00:00
Elias Ellison	d9f0774f98	[JIT] Allow registering Decompositions - Allow registering custom decompositions - Add easier API for invoking decompositions - Shorten API names (no users yet) I am doing these as one pr because they are fairly short/simple and because github first does not support ghstack yet. cc @Chillee @zou3519 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76252 Approved by: https://github.com/davidberard98	2022-04-26 03:00:35 +00:00
Nikolay Korovaiko	bb60cac25a	E2E SymInt example narrow_copy This roughly corresponds to Goal 3.2 in https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8YLw-jxEw/edit# Namely: It adds the following: * SymbolicIntNode interface * LazySymbolicIntNode implementation * Lazy `narrow_copy` implementation * Need add support for SymInt in codegen * Test (below) ```cpp TEST(LazyDynamicOpsTest, NarrowCopy) { auto x = torch::rand({5, 10, 10}).to(kLazy); const size_t Y_DIM = 3; const size_t X_DIM_INDEX = 2; auto y = torch::rand({Y_DIM}).to(kLazy); auto ly = torch::lazy::TryGetLtcTensor(y); auto dim_node = MakeNode<SizeNode>(ly->GetIrValue(), 0); auto lmn = new torch::lazy::SymbolicIntNode(dim_node); auto z = x.narrow_copy(X_DIM_INDEX, 0, lmn->toSymInt()); AllClose(z.cpu(), x.cpu().narrow_copy(X_DIM_INDEX, 0, Y_DIM)); } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/75759 Approved by: https://github.com/wconstab	2022-04-26 02:40:27 +00:00
Wonjoo Lee	f9d07ae644	Update torch::lazy::BackendDevice to have a new default ordinal (#76264 ) Summary: Fixes https://github.com/pytorch/xla/issues/3490. Updates `torch::lazy::BackendDevice` with changes below: 1. Remove the no-op string constructor. 2. Update default ordinal to `-1`. 3. Add a `is_valid` function to check if `ordinal` is valid/non-default (`ordinal >= 0`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/76264 Reviewed By: mrshenli Differential Revision: D35860266 Pulled By: alanwaketan fbshipit-source-id: 554ebe16a0683d37b00270c4f35163bf690bfe28 (cherry picked from commit b941d10e8545dfecfb34e4d5c24a29a1cc49bc4b)	2022-04-25 23:57:18 +00:00
zengk95	1d55518198	Revert "[nnc] Strides to Tensor (#72962 )" This reverts commit `939060925f`. Fixes https://github.com/pytorch/vision/issues/5873 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76332 Approved by: https://github.com/seemethere	2022-04-25 19:50:00 +00:00
Ivan Kobzarev	939060925f	[nnc] Strides to Tensor (#72962 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72962 Test Plan: Imported from OSS Reviewed By: ZolotukhinM, cpuhrsch Differential Revision: D34589306 Pulled By: IvanKobzarev fbshipit-source-id: ecee5249760ecc0c8b2edb1842b90218899bc944 (cherry picked from commit 9e310c4c67389da30da89126d838ffe3864aba6f)	2022-04-23 19:35:15 +00:00
Prem	7557407653	Added directory check before saving in C++ API Fixes #75177 Couldn't find any utility method to get directory name in pytorch repo, hence creating a function for that. Let me know if a new function is not needed. I also referred to [this](https://github.com/pytorch/pytorch/blob/master/c10/test/util/tempfile_test.cpp#L15) for the directory check. Also I am using TORCH_CHECK to show the error. This is highly verbose with the entire stack visible. Is there any alternative for the same so that it is easier to read? This could happen frequently, so a small and concise error would be more helpful here. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75681 Approved by: https://github.com/albanD	2022-04-22 20:04:41 +00:00
Wang, Eikan	ef0873327e	[NNC] Add utility functions to check channels-last contiguous (#75938 ) Summary: The `Buf` uses `std::vector<ExprHandle>` to represent its strides. The `ExprHandle` could be an immediate value or a mathematical expression with variables involved both for the static shape and dynamic shape. So it is hard to directly deduce the channels-last contiguous layout based on the numerical calculation. Hence, the utility functions of this PR are based on the pattern match to check whether the `Buf` is channels-last contiguous. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75938 Reviewed By: cpuhrsch Differential Revision: D35724091 Pulled By: ZolotukhinM fbshipit-source-id: f79ae21749d0aad8601f0434b52df88602ff09bf (cherry picked from commit 3712bbbe4bea57c5c1abe1eafde4b8778e13e0c4)	2022-04-22 06:42:39 -07:00
Antonio Kim	2c2c13d21b	Decouple Lazy Node Shape Cache (#75324 ) Summary: Next stage of breaking up https://github.com/pytorch/pytorch/pull/74710 Move shape cache implementation to the backend interface. Also, clean up some of the hashing logic in the base node class. CC: wconstab JackCaoG henrytwo Partially Fixes https://github.com/pytorch/pytorch/issues/74628 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75324 Reviewed By: anjali411 Differential Revision: D35730823 Pulled By: wconstab fbshipit-source-id: cf6fa326319b9324e5f422a78817b6fb5bf7e9b8 (cherry picked from commit faec5043df56639e2fd23de2d91ae796e4f3df70)	2022-04-21 17:27:05 -07:00
Nikolay Korovaiko	69e048b090	List of SymInt rebase on master Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/75115 Approved by: https://github.com/ezyang	2022-04-20 02:09:55 +00:00
Elias Ellison	f65eb09d6b	[JIT] Move Shape Function definition to python Moves jit shape function registration to python. Like jit decompositions, a script must be run after adding new definitions which serializes them in a c++ file. This was a request so that torch-mlir could define functions in python and upstream their shape functions. cc @silvasean @makslevental Pull Request resolved: https://github.com/pytorch/pytorch/pull/75546 Approved by: https://github.com/davidberard98	2022-04-19 20:59:44 +00:00
Taylor Robie	a5e338a826	[RecordFunction] More efficient machinery to determine which callbacks to run. (#75807) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75807 There is a tension in RecordFunction between two use cases: 1) In the normal eager path we don't run any callbacks, so we need to bail out of the profiling path as soon as possible to minimize eager overhead. 2) When profiling we want to determine which callbacks to run as efficiently as possible to minimize instrumentation overhead. The confounding factor in all of this is sampling callbacks because they change which callbacks will run on each call, even in steady state operation. This has traditionally been handled with a two stage procedure: first we flip a coin to determine if a sampled callback might run. If false (which it usually is), do nothing. This solves (1). If true, check to see if we need to build the full callback set or if it was a false positive. This procedure has two negative effects: * It forces us to rebuild the set of callbacks to run on every step when profiling * It leaks the sampling abstraction, requiring other parts of the code to bump certain values and forces RecordFunction to lazily initialize. This change introduces a multi-level cache which can (in the common case) quickly determine which callbacks will run, rather than if callbacks might run. This means that rather than call `shouldRunRecordFunction`, we can simply get the callbacks for an invocation and check if they are empty. (And completely removes the pre-sampling heuristic.) Another major benefit of the new cache structure is that it allows thread-safe registration and unregistration of global callbacks. It's worth briefly discussing how this maintains eager performance. 
In the standard eager case (only sampling callbacks registered) the cache first checks that the global callbacks haven't changed (atomic read), decrements a counter to see if a sampling callback fired, and then returns the active callbacks which is simply a SmallVector of pointer pairs and a couple POD values (scope, needs inputs/outputs/ids). The biggest cost according to perf is the SmallVector logic; we could consider adopting a hard limit on active callbacks; more than half a dozen callbacks running in a single step would be quite a lot. But the total cost relative to `PYTORCH_DISABLE_PER_OP_PROFILING` is only ~10ns, so debatable if it's worth it to switch to `std::array`. The primary change is in `record_function.cpp`, which has a more detailed description of the new cache structure. `record_function.h` has some minor changes to align with the new calling convention and the remaining files are simply changes to the call sites. Future work: * RecordFunction no longer needs to be lazily initialized. * We can deprecate the disable/reenable APIs, since we can now safely add and remove global callbacks. Test Plan: I tested eager mode performance using the overhead benchmark and found that the non-profiled path was unaffected. However the no-op observer dropped from 0.41us to 0.37us (0.25us if no observers are active) which is about 1/3rd reduction in the cost of the callback selection machinery. I also added several C++ unit tests, as the core RecordFunction machinery (especially sampling) was largely untested. Reviewed By: swolchok, davidberard98 Differential Revision: D35276158 fbshipit-source-id: 35135f444724fba4eb97c0ae7f3f710f0f9016fd (cherry picked from commit 9e359b87422c18f2a195185f32e7e85c82f956fd)	2022-04-19 20:46:16 +00:00
Han Qi	b34b192d6b	Reland "Make debug_pkl smaller by only emitting unique traces." (#73368) Summary: ## Original commit message: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368 debug_pkl file inside of pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source which is a stack trace, filename, and start, end numbers. Those are emitted in debug_pkl file as strings. Since many SourceRanges share the same Source, the string for the trace can be deduped. The newer format saves a set of unique traces in a tuple, then each SourceRange will save the offset of its trace w.r.t. position in that tuple (i.e. manually applying dictionary compression). The above helps with smaller file size. On loading, if we copy each trace to Source as string the runtime memory would still blow up. To mitigate this, we use SourceView directly instead of Source, which will take the reference of the string inside of Deserializer and make that into a string_view. This is safe because Deserializer is held by Unpickler by shared_ptr, and Unpickler is also held by shared_ptr by another Source object. That Source object will be alive during the model construction. Test Plan: ## Original Test plan unit test Took original file (312271638_930.predictor.disagg.local); loaded with `torch.jit.load` save again with `torch.jit.save`. 
Unzip both, look at contents: ``` [qihan@devvm5585.vll0 ~]$ du archive -h 4.0K archive/xl_model_weights 3.7M archive/extra 8.0K archive/code/__torch__/caffe2/torch/fb/model_transform/splitting 8.0K archive/code/__torch__/caffe2/torch/fb/model_transform 8.0K archive/code/__torch__/caffe2/torch/fb 8.0K archive/code/__torch__/caffe2/torch 8.0K archive/code/__torch__/caffe2 20M archive/code/__torch__/torch/fx/graph_module 20M archive/code/__torch__/torch/fx 8.0K archive/code/__torch__/torch/classes 20M archive/code/__torch__/torch 20M archive/code/__torch__ 20M archive/code 2.7M archive/constants 35M archive [qihan@devvm5585.vll0 ~]$ du resaved -h 4.0K resaved/extra 8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting 8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform 8.0K resaved/code/__torch__/caffe2/torch/fb 8.0K resaved/code/__torch__/caffe2/torch 8.0K resaved/code/__torch__/caffe2 1.3M resaved/code/__torch__/torch/fx/graph_module 1.3M resaved/code/__torch__/torch/fx 8.0K resaved/code/__torch__/torch/classes 1.4M resaved/code/__torch__/torch 1.4M resaved/code/__torch__ 1.4M resaved/code 2.7M resaved/constants 13M resaved [qihan@devvm5585.vll0 ~]$ ``` ## Additional test: `buck test mode/dev-tsan //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --exact 'caffe2/benchmarks/static_runtime:static_runtime_cpptest - StaticRuntime.to'` passes test jest.fbios.startup_cold_start.local.simulator f333356873 - Differential Revision: D35196883 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74869 Approved by: https://github.com/gmagogsfm	2022-04-18 22:34:21 +00:00
Han Qi	7d5c07830d	Add upgrader related logic to flatbuffer (#71451 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71451 title Test Plan: unittest Reviewed By: tugsbayasgalan Differential Revision: D33593056 fbshipit-source-id: c48d6ad50e6e2f757b68525dfe07693711b95840 (cherry picked from commit 8e09e20c1dafcdbdb45c2d1574da68a32e54a3a5)	2022-04-17 18:51:23 +00:00
Nikita Shulga	fe8eff3711	Revert "Add upgrader related logic to flatbuffer" This reverts commit `dfae96171a`.	2022-04-17 11:38:59 -07:00
Han Qi	dfae96171a	Add upgrader related logic to flatbuffer Summary: title Test Plan: unittest Differential Revision: D33593056 Pull Request resolved: https://github.com/pytorch/pytorch/pull/71451 Approved by: https://github.com/tugsbayasgalan	2022-04-16 02:04:48 +00:00
Raghavan Raman	c2d5f6a5a4	[nnc] Update bounds overlap analysis to identify non-overlaps even with symbolic bounds Pull Request resolved: https://github.com/pytorch/pytorch/pull/74658 Approved by: https://github.com/ZolotukhinM	2022-04-14 20:24:03 +00:00
Raghavan Raman	d8ad1a579f	[nnc] Fuse loops that have variable bounds Pull Request resolved: https://github.com/pytorch/pytorch/pull/74346 Approved by: https://github.com/ZolotukhinM	2022-04-14 20:24:03 +00:00
Jiewen Tan	ab0d9b18e9	[LT] Support Tensor.is_alias_of Summary: Tensor.is_alias_of relies on Storage to perform. However, LTCTensorImpl was not implemented with that in mind. This commit adds a fake storage to LazyTensor as a marker to mark LazyTensors that point to the same storage. The reason why it's not done at LTCTensorImpl is that LazyTensor maintains the view ops/alias logic in LazyTensor class instead of relying on TensorImpl to do the check. Test Plan: ./build/bin/test_lazy --gtest_filter=LazyOpsTest.IsAliasOf Pull Request resolved: https://github.com/pytorch/pytorch/pull/75246 Approved by: https://github.com/bdhirsh	2022-04-14 07:28:03 +00:00
Nikolay Korovaiko	ce842f43f2	Relanding shape cache (75400) (#75710 ) Summary: https://github.com/pytorch/pytorch/pull/75400 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75710 Reviewed By: malfet Differential Revision: D35598920 Pulled By: Krovatkin fbshipit-source-id: 2bbbb3d0c24214b5dbb4ca605e7daa94671f96b0 (cherry picked from commit 572f2f9df5bfd73cd7b83536f619bc86d820ccd8)	2022-04-13 17:17:30 +00:00
PyTorch MergeBot	db1801099b	Revert "Relanding shape cache (75400)" This reverts commit `89486821ed`. Reverted https://github.com/pytorch/pytorch/pull/75710 on behalf of https://github.com/malfet	2022-04-13 17:14:38 +00:00
Nikolay Korovaiko	89486821ed	Relanding shape cache (75400) https://github.com/pytorch/pytorch/pull/75400 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75710 Approved by: https://github.com/malfet	2022-04-13 07:28:32 +00:00
PyTorch MergeBot	c274f66268	Revert "Adding Caching of calculated Symbolic Shapes" This reverts commit `9a7bfaa929`. Reverted https://github.com/pytorch/pytorch/pull/75400 on behalf of https://github.com/mehtanirav	2022-04-12 21:53:31 +00:00
John Clow	9a7bfaa929	Adding Caching of calculated Symbolic Shapes Pull Request resolved: https://github.com/pytorch/pytorch/pull/75400 Approved by: https://github.com/eellison	2022-04-12 11:19:58 +00:00
Pavithran Ramachandran	6402e62454	Refactor flatbuffer jit code Pull Request resolved: https://github.com/pytorch/pytorch/pull/75239 Refactor flatbuffer_serializer to move JIT related code to a separate file. Differential Revision: [D35301020](https://our.internmc.facebook.com/intern/diff/D35301020/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35301020/)! Approved by: https://github.com/iseeyuan	2022-04-11 23:41:48 +00:00
John Clow	f281d83d77	Moving Remove Tensor Type Specializations to after custom passes This is to allow for Intel folks to use type information in their custom passes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/71748 Approved by: https://github.com/eellison	2022-04-11 22:12:01 +00:00
Yulv-git	ac2d2e3a3d	Fix some typos. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/75561 Approved by: https://github.com/albanD	2022-04-11 21:55:59 +00:00
Nikita Shulga	80ea6955af	Add cuda-11.3+clang9 build workflow (take 2) To be able to detect unused captures in GPU code lambdas (as gcc does not support this diagnostic) Remove unused opts lambda capture in `ProcessGroupMPI.cpp` and `Distributions.cu` Fix sign-compare in nvfuser benchmark and ignore signed unsigned comparison in nvfuser tests Fixes https://github.com/pytorch/pytorch/issues/75475 by aliasing CMAKE_CUDA_HOST_COMPILER to C_COMPILER when clang is used Pull Request resolved: https://github.com/pytorch/pytorch/pull/75293 Approved by: https://github.com/atalman, https://github.com/seemethere	2022-04-11 17:13:01 +00:00
PyTorch MergeBot	8fe43d76d5	Revert "Add cuda-11.3+clang9 build workflow" This reverts commit `709fcc862e`. Reverted https://github.com/pytorch/pytorch/pull/75293 on behalf of https://github.com/janeyx99	2022-04-11 15:24:59 +00:00
Nikita Shulga	709fcc862e	Add cuda-11.3+clang9 build workflow To be able to detect unused captures in GPU code lambdas (as gcc does not support this diagnostic) Remove unused opts lambda capture in `ProcessGroupMPI.cpp` and `Distributions.cu` Fix sign-compare in nvfuser benchmark and ignore signed unsigned comparison in nvfuser tests Fixes https://github.com/pytorch/pytorch/issues/75475 by aliasing CMAKE_CUDA_HOST_COMPILER to C_COMPILER when clang is used Pull Request resolved: https://github.com/pytorch/pytorch/pull/75293 Approved by: https://github.com/atalman, https://github.com/seemethere	2022-04-11 14:10:57 +00:00
Jiewen Tan	dc37090ec5	[LT] Support diagonal op (#75230 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75230 Op diagonal is a view op which we can't code-gen yet. Therefore, support it by making hand-written IR construction and lowering. Test Plan: ./build/bin/test_lazy --gtest_filter=LazyOpsTest.TestDiagonal* Reviewed By: wconstab Differential Revision: D35378316 Pulled By: alanwaketan fbshipit-source-id: 7958d00107aef20ac37aabcf2868346240977530 (cherry picked from commit 84155528fce484627c9688cfd92fd4aeb68219e5)	2022-04-08 19:49:42 +00:00
Nikolay Korovaiko	4a85145bbd	Ansley's rebase of DimensionNode onto master (#75352 ) Summary: Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/75352 Reviewed By: wconstab Differential Revision: D35455859 Pulled By: Krovatkin fbshipit-source-id: e24c81d63dc66d03b752cc8de5cb551d84b003ac (cherry picked from commit 4ad371cb4cc88860ce8ec398d82083f6759e3fcf)	2022-04-08 17:22:56 +00:00
John Clow	f1db3e465a	Adding integration of SSA into LazyTensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/75050 Approved by: https://github.com/Krovatkin	2022-04-07 19:49:41 +00:00
Pavithran Ramachandran	3001bda304	[PyTorchEdge] Backport from v9 flatbuffer to v8 pickle (#75201 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75201 In this diff: 1. Bump supported version to 9, which will serve as a placeholder for upcoming version bump to v9 for flatbuffer format migration. 2. Implements backport from v9 flatbuffer file to v8 pickle file. ghstack-source-id: 153225189 (Note: this ignores all push blocking failures!) Test Plan: fb: ``` cd ~/fbsource/fbcode/ && buck test -c fbcode.caffe2_enable_flatbuffer=1 caffe2/test/cpp/jit:jit -- LiteInterpreterTest.BackPortByteCodeModelAllVersions Parsing buck files: finished in 0.7 sec Downloaded 0/25 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules) Building: finished in 20.7 sec (100%) 21783/21783 jobs, 5/21783 updated cd ~/fbsource/fbcode/ && buck test caffe2/test/cpp/jit:jit -- FlatbufferTest.FlatbufferBackPortTest Parsing buck files: finished in 0.7 sec Building: finished in 4.5 sec (100%) 12972/53298 jobs, 0/53298 updated Total time: 5.3 sec More details at https://www.internalfb.com/intern/buck/build/b658d597-d358-4293-97cb-28e7612b96e8 BUILD SUCCEEDED Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details. 
Running with tpx session id: 35d5542d-6ee3-4c28-be10-1d822c7a6fef Trace available for this run at /tmp/tpx-20220308-090347.891303-35d5542d-6ee3-4c28-be10-1d822c7a6fef/trace.log RemoteExecution session id: reSessionID-35d5542d-6ee3-4c28-be10-1d822c7a6fef-tpx Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000 ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 490 tests discovered (22.838) ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.FlatbufferBackPortTest (0.289) Summary Pass: 1 ListingSuccess: 1 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000 ``` Reviewed By: iseeyuan Differential Revision: D34702597 fbshipit-source-id: 5c203c29d13360d7934ce6e57557739e7038c05e (cherry picked from commit 6189e08a2bd968fdab636f77cb6bd73d6c36beb2)	2022-04-07 19:43:57 +00:00
Wang, Eikan	252e1ccce6	Enable TE fuser to support user defined operator (#73073 ) Summary: PyTorch supports registering a custom operator by `TORCH_LIBRARY_FRAGMENT` / `TORCH_LIBRARY_IMPL` and `torch::jit::tensorexpr::getNNCLoweringRegistry` could insert a custom operator. But the conditional check in the TE fuser pass does not support custom operators. The `isSupported` of `tensorexpr_fuser` checks whether the `Node` is in `get_tensorexpr_elementwise_set()`, `supported_non_eltwise_set()`, `supported_misc_set` or `supported_reduction_set`. If a custom operator needs to be added to the TE fusion group, the check will block it. Taking the RN50 as an example, we can speed up the model by fusing the convolution and consecutive element-wise operator into a custom operator. The framework overhead becomes non-negligible when the computation becomes more efficient, especially for the latency mode and the tiny models. If the TE fuser allows adding the custom operator to the fusion group, then the entire RN50 model could be fused by TE as a single operator/function consisting of "ExternalCalls" and TE-IR. This could significantly reduce framework overhead, which in turn improves RN50 E2E performance. The same goes for other models. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73073 Reviewed By: pbelevich Differential Revision: D35453165 Pulled By: ZolotukhinM fbshipit-source-id: a764cf340b0b1e05fe230649cbe44f5786bdd37d (cherry picked from commit ee95aa4d36714540fbb216a338799e6a6bb966d5)	2022-04-07 04:36:39 +00:00
Martin Yuan	00c1e01ad0	Remove internal logic to handle bytecode version 3 (#57775 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57775 The minimum supported bytecode version is updated from 3 to 4. We no longer support version 3 bytecode models. Why? * There are hacky codes in operator loading, that performs differently on one operator on the global bytecode version 3. Instead operator related metadata should be passed (for example, in #56845). To allow future development, we remove the hacky way first. * The bytecode version was bumped from 3 to 4 more than half a year ago. Since all the production models are all bumped to version 4, it's not practical to keep and maintain version 3. The risk to deprecate version 3 is low. Test Plan: Imported from OSS Reviewed By: raziel Differential Revision: D28270791 Pulled By: cccclai fbshipit-source-id: 70b1bd6352fdaae5f8d2173b81578d77018c8e44 (cherry picked from commit 3e930fa381cd01f3705116795c6426df992372fc)	2022-04-07 01:45:52 +00:00
Pavithran Ramachandran	f984e50f39	Extend jit::load to work on flatbuffer file; Take 2 (#75256 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75256 ghstack-source-id: 153138970 Test Plan: CI Reviewed By: iseeyuan Differential Revision: D35399581 fbshipit-source-id: dafe9d301009d3f70986ed92bfe06d160ab90ba0 (cherry picked from commit ccc860fd07946de5aae12bc179a0b8bbba83b997)	2022-04-06 17:54:01 +00:00
John Clow	26dcec152c	Added support for SSA for ops not in a JIT graph Pull Request resolved: https://github.com/pytorch/pytorch/pull/74340 Approved by: https://github.com/eellison	2022-04-06 01:45:37 +00:00
Antonio Kim	e1b4117e30	Move shape and operand definitions to base node (#75223 ) Summary: First stage of breaking up https://github.com/pytorch/pytorch/pull/74710 Moves the shape and operand definitions from `TsNode` to the base `Node` CC: wconstab JackCaoG henrytwo Partially Fixes https://github.com/pytorch/pytorch/issues/74628 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75223 Reviewed By: zou3519 Differential Revision: D35410285 Pulled By: wconstab fbshipit-source-id: bb84d3fb636882cbe7e18af4b35ff2c0e22aaa58 (cherry picked from commit a4144c9a48379d8a9007cff845796608b597cce1)	2022-04-06 01:43:46 +00:00
Lu Fang	32e58c73c4	Back out "Extend jit::load to work on flatbuffer file" (#75244 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75244 Original commit changeset: d653a5af662a Original Phabricator Diff: D35060736 (`d9d34922a0`) Test Plan: Model loading test, verified that D35060736 (`d9d34922a0`) will cause the torch::save => torch::load failure. Reviewed By: yinghai, jianyuh Differential Revision: D35387009 fbshipit-source-id: 9d176992d402d57779e2af3d905b3c1538335298 (cherry picked from commit 6c8cc0d3b8a88b15e35702d70e18bbae8aa4628a)	2022-04-05 09:55:04 +00:00
Nikita Shulga	81d765ef1f	Fix sign-compare violations in cpp tests Prerequisite change for enabling `-Werror=sign-compare` across PyTorch repo Pull Request resolved: https://github.com/pytorch/pytorch/pull/75080 Approved by: https://github.com/atalman	2022-04-04 23:05:31 +00:00
Chen Lai	6efc5c1acf	Rewrite upgrader bytecode version from 3 to 4 (content unchanged) (#75120 ) Summary: update the upgrader models by hacking backport logic - copy everything in the model and only rewrite the bytecode version to 4 in D35265596 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75120 ghstack-source-id: 152823046 Test Plan: CI Reviewed By: qihqi Differential Revision: D35321154 fbshipit-source-id: 333158bd0fd9b4819b3b7cf47d80c285934adf3e (cherry picked from commit 74bb2da73a4d18f448b8486772643eac89eb759a)	2022-04-02 01:51:39 +00:00
Pavithran Ramachandran	d9d34922a0	Extend jit::load to work on flatbuffer file (#75022 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75022 Extending torch::jit::load to read flatbuffer file ghstack-source-id: 152820697 Test Plan: CI Reviewed By: iseeyuan Differential Revision: D35060736 fbshipit-source-id: d653a5af662a46107ff4fd70209fd2a0a4d40f20 (cherry picked from commit 109e14a54bd279011c8f9066e6c29e8e0b1fc4db)	2022-04-02 01:33:34 +00:00
Pavithran Ramachandran	7aaa75af05	Extending _get_bytecode_version to support flatbuffers format (#75021 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75021 Extending `_get_bytecode_version` to support flatbuffers. ghstack-source-id: 152771695 (Note: this ignores all push blocking failures!) Test Plan: ``` ~/fbsource/xplat] cd ~/fbsource/xplat/ && buck test //xplat/caffe2:test_lite_interpreter Building: finished in 0.8 sec (100%) 327/327 jobs, 0/327 updated Total time: 0.9 sec Testing: finished in 06:59.5 min (85 PASS/0 FAIL) BUILD SUCCEEDED RESULTS FOR //xplat/caffe2:test_lite_interpreter PASS 412.3s 85 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_interpreter TESTS PASSED ``` Reviewed By: iseeyuan Differential Revision: D34900498 fbshipit-source-id: 65743076d43a933c5381ec128d0268f22c0a8441 (cherry picked from commit 457c76c7d1df6050b941c56a8198162e2e4a3388)	2022-04-01 15:05:37 +00:00
Will Constable	b9e535a64a	Add non-eager registration to dispatch autogen (#74557 ) Summary: Previously, the torchscript backend would be (partially) initialized at startup. - the dispatcher registrations would be registered, - but other backend components would not be initialized until explicitly calling the backend init function With this change, the torchscript backend is not initialized until its explicit initialization function is called. This enables external backends to register their own backend instead of the torchscript backend to the same (Lazy) key. Lands a change contributed by antoniojkim via lazy_tensor_staging branch (https://github.com/pytorch/pytorch/issues/73973) Pull Request resolved: https://github.com/pytorch/pytorch/pull/74557 Reviewed By: bdhirsh Differential Revision: D35051464 Pulled By: wconstab fbshipit-source-id: 5a8b0851293e394f49427d1416ee571a8881fe9f (cherry picked from commit ef745a4a2c8d1d7f9510541a20f1f40625ce29de)	2022-04-01 03:42:53 +00:00
Will Constable	14affba799	Fix ir_metadata Python frames func and remove dead code (#74979 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74979 Reviewed By: alanwaketan Differential Revision: D35261641 Pulled By: wconstab fbshipit-source-id: e82b5f17d0043c4a3de72c16fb42fd02a85414fe (cherry picked from commit fc6c0a1654256871361a5ad08926bc39d74cd0c5)	2022-03-31 23:23:36 +00:00
Nikolay Korovaiko	5177f95d21	Introducing SymInt to Pytorch (for tracing size arithmetic) (master rebase) (#74861 ) Summary: This PR introduces `SymInt` type to Pytorch which will be used by LTC and AOTAutograd for tracing size arithmetic and tests. `SymInt` is a C++ union structure [int64_t, SymbolicIntNode*] that wraps around an int64_t field where the value of the field could be an index into a list of `shared_ptr<SymbolicIntNode>` or a real int. This PR doesn't add any support for actually tracing symbolic ints. i.e. data_ for now can only contain real ints. ``` Goal 1: just to show we can add a type to PyTorch core. (wraps int) LANDEABLE Finalize the naming - symint Want the name to be short Does invoke “size” - NO SInt/SymInt/SymbolicInt SInt could mean signed int sym_int or symint or SymInt (originally it was “int”; capitalized implies object semantics, whereas lowercase implies value semantics) JIT schema - symint C++ - symint ``` See more details here: https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8YLw-jxEw Pull Request resolved: https://github.com/pytorch/pytorch/pull/74861 Reviewed By: qihqi, ngimel Differential Revision: D35226230 Pulled By: Krovatkin fbshipit-source-id: 34acf342bd50fcaa4d8d5dd49c2fd6a98823a5b3 (cherry picked from commit 218643f63ef181cabb92d13a6e837eb64f2dda3c)	2022-03-31 21:59:59 +00:00
jjsjann123	873ced7cd0	Nvfuser code bump 030122 (#73627 ) Summary: Things changed in this PR that requires review: test/forward_backward_compatibility/check_forward_backward_compatibility.py Our previous function overload extension names were wrong and has been updated in this PR, hence the compatibility list updated. nvfuser code updates with bug fixes towards failures we encountered in OpInfoTests as well as failures reported by AOTAutograd team. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73627 Reviewed By: Chillee Differential Revision: D34765458 Pulled By: davidberard98 fbshipit-source-id: c81f3d6a1b723fb3a8ba419b7f82227f70440ca7 (cherry picked from commit b6a2c362c37051e44fac31687b2fe272f776551e)	2022-03-31 08:18:22 +00:00
Nikita Shulga	43313cbde3	Revert D34647822: [tensorexpr] Add support for aten::stack Test Plan: revert-hammer Differential Revision: D34647822 (`954c7e2a77`) Original commit changeset: 3b863c71886c Original Phabricator Diff: D34647822 (`954c7e2a77`) fbshipit-source-id: e9ce06c9c8d7caf0fbb2565f0d99035bad685793 (cherry picked from commit b2ff355e9dbaa4e940fb221254223984c3c8a215)	2022-03-31 04:25:43 +00:00
Nikita Shulga	320e5a8268	Revert D34808051: [tensorexpr] Enabled aten::stack in the fuser pass with static shapes Test Plan: revert-hammer Differential Revision: D34808051 Original commit changeset: 213e2ffdf87f Original Phabricator Diff: D34808051 fbshipit-source-id: b618daeb346f784e8ab9525040edcb4a30a39613 (cherry picked from commit e47b973cba5c95e9410f8aecdfd5619de6d4be7c)	2022-03-31 04:25:43 +00:00
Hui Guo	90c3699cc8	[tensorexpr] Enabled aten::stack in the fuser pass with static shapes (#74077 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74077 Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D34808051 Pulled By: huiguoo fbshipit-source-id: 213e2ffdf87fb1a74104037cea7ef25e4bfd4307 (cherry picked from commit ad9e84842e5b47eda845827d325b08ba361a8286)	2022-03-31 04:25:43 +00:00
Elias Ellison	2ef5611f31	Add comments for adding shape function and linting (#73570 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73570 Approved by: https://github.com/huiguoo Test Plan: contbuild & OSS CI, see `6d36bbde7e` Reviewed By: pbelevich Differential Revision: D35192688 Pulled By: atalman fbshipit-source-id: b12b80e6a6dd1adaa57a8facb6bb077989faa543 (cherry picked from commit e50478c02592597f12b8490ec5496f76c7d8b8cc)	2022-03-31 04:25:43 +00:00
Nikita Shulga	3036a0309d	[skip ci]Revert "Add comments for adding shape function and linting" This is a technical revert of `6d36bbde7e` to reconcile it with e50478c02592597f12b8490ec5496f76c7d8b8cc (which is the same + lint changes applied) Should be skipped during import	2022-03-30 21:21:28 -07:00
Hui Guo	954c7e2a77	[tensorexpr] Add support for aten::stack (#73801 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73801 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D34647822 Pulled By: huiguoo fbshipit-source-id: 3b863c71886c7c6616b16f5d3313079714c8b82a (cherry picked from commit c71778cf6a5724d26b671bf3ee0478add24990e8)	2022-03-30 21:25:15 +00:00
Dave Bort	f82b2d4a82	[PyTorchEdge] Make _load_parameters() handle flatbuffer inputs (#74580 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74580 Handle Flatbuffer-serialized parameters. Make `_load_parameters()` detect the input data format and use the correct deserializer to load the parameters. Also, rename `BytecodeDeserializer` to `IValueUnpickler` to make it clear that it unpickles an `IValue` and doesn't have anything to do with bytecode. ghstack-source-id: 152487890 Test Plan: New unit test shows a successful round trip from _save_parameters() to _load_parameters() using flatbuffers. ``` $ buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer Building: finished in 0.5 sec (100%) 346/346 jobs, 0/346 updated Total time: 0.6 sec Testing: finished in 0.5 sec (26 PASS/0 FAIL) BUILD SUCCEEDED RESULTS FOR //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer PASS <100ms 13 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_trainer PASS <100ms 13 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer TESTS PASSED ``` Reviewed By: qihqi Differential Revision: D34488913 fbshipit-source-id: 8d2c0b895699f3b336115d33bf96d49cbf9245d2 (cherry picked from commit 319345deff260826197f8cdf5ac03071b412c72f)	2022-03-30 20:39:58 +00:00
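The format detection that commit f82b2d4a82 above describes for `_load_parameters()` can be sketched in Python as follows. This is only an illustration under stated assumptions, not the PyTorch implementation: `detect_format` is a hypothetical helper, the pickle-based container is assumed to be a ZIP archive, and the 4-byte identifier `PTMF` is an assumed value used for illustration. What is standard is that a ZIP local file header starts with `PK\x03\x04` and that a FlatBuffers file identifier, when present, occupies bytes 4 through 7.

```python
# Signature of a ZIP local file header (assumed container for the pickle format).
ZIP_MAGIC = b"PK\x03\x04"

# Assumption: illustrative FlatBuffers file identifier, stored at byte offset 4.
FLATBUFFER_IDENTIFIER = b"PTMF"


def detect_format(data: bytes) -> str:
    """Return 'flatbuffer', 'zip', or 'unknown' based on the leading bytes."""
    if len(data) >= 8 and data[4:8] == FLATBUFFER_IDENTIFIER:
        return "flatbuffer"
    if data[:4] == ZIP_MAGIC:
        return "zip"
    return "unknown"
```

A loader built on this would dispatch to the matching deserializer and reject `unknown` input instead of guessing.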
Dave Bort	1659a267f9	[PyTorchEdge] Export flatbuffers from _save_parameters() (#74579 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74579 Now that we can convert a module to a flatbuffer, update `_save_parameters()` to optionally write to that format. Also, rename the internal `ScriptModuleSerializer` class to `IValuePickler` to make it more clear that a) it's pickle-specific, and b) it serializes IValues, not Modules. ghstack-source-id: 152487889 Test Plan: New unit test shows that we can produce Flatbuffer-formatted output. ``` $ buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer Building: finished in 0.5 sec (100%) 346/346 jobs, 0/346 updated Total time: 0.6 sec Testing: finished in 0.5 sec (26 PASS/0 FAIL) BUILD SUCCEEDED RESULTS FOR //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer PASS <100ms 13 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_trainer PASS <100ms 13 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer TESTS PASSED ``` A new test in later commit D34488913 tests the full round trip. Reviewed By: qihqi Differential Revision: D34408538 fbshipit-source-id: eea183c31b5e1b2b75a65f384d8a479223a4ae72 (cherry picked from commit de310a15422b65fb7e443f7005d287d9f5f586bc)	2022-03-30 20:39:58 +00:00
Elias Ellison	6d36bbde7e	Add comments for adding shape function and linting Pull Request resolved: https://github.com/pytorch/pytorch/pull/73570 Approved by: https://github.com/huiguoo	2022-03-29 23:02:22 +00:00
Elias Ellison	9c4a63787b	Add api for changing function executor settings, hook up execution with decomposition registry (#74186 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74186 Make the execution settings mutable on function_impl so that we can set it for running op decompositions. Add mapping to function objects and show example in test of executing op decompositions. Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D34938125 Pulled By: eellison fbshipit-source-id: adf108b2f6c1bd166910c6d7b94245661d67ce0d (cherry picked from commit 9957e33803002d9e71abe4ff802769270b6960d3)	2022-03-29 18:38:52 +00:00
Elias Ellison	0ecf1add1b	Introduce function-local settings for executor, expose in c++ (#74012 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74012 This allows setting an executor on a function. The first use case is to use decompositions in C++ without additional fusion passes etc which might not work with custom tensors like batched tensors/vmap. A subsequent use case might be taking advantage of invokees of JIT execution which guard on certain properties before invocation (such as complete shapes in AOT autograd, rank in lazy tensor). Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D34938124 Pulled By: eellison fbshipit-source-id: cf7a45416457942b872322cab47d871a8336bdb5 (cherry picked from commit 9c600eb9ad0f2173f003e511268e97584edae36d)	2022-03-29 18:38:52 +00:00
Elias Ellison	6694fdaccd	Clean up profiling mode and profiling executor strategy (#73875 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875 Previously we had a few settings: - getExecutor - which toggled between Profiling Executor and Legacy - getGraphOptimize - if true, overrides PE/Legacy to run with simple executor (no optimizations) and then... - getProfilingMode - which would set PE to 0 specializations. The last mode is redundant with getGraphOptimize, we should just remove it and use getGraphOptimize in these cases. It would lead to potentially invalid combinations of logic - what does it mean if getProfilingMode is true but getExecutor is set to false? This would lead to a bug in specialize_autograd_zero in this case, see: https://github.com/pytorch/pytorch/blob/master/torch%2Fcsrc%2Fjit%2Fpasses%2Fspecialize_autogradzero.cpp#L93. The tests here are failing but get fixed with the PR above it, so I'll squash for landing. Test Plan: Imported from OSS Reviewed By: cpuhrsch Differential Revision: D34938130 Pulled By: eellison fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b (cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)	2022-03-29 18:38:51 +00:00
Kurt Mohler	5375b2e994	Resolve `int[]?` arguments to new OptionalIntArrayRef class This PR uses the `OptionalArrayRef` template class that was drafted in #64084. Fixes #44409 Pull Request resolved: https://github.com/pytorch/pytorch/pull/70864 Approved by: https://github.com/ezyang	2022-03-26 01:45:50 +00:00
Pavithran Ramachandran	fc2cf3d26f	Back out "Revert D34805092: Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support `only pickle` and `pickle + flatbuffer` for migration" (#74594 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74594 Extending `_save_for_mobile` and `_load_for_mobile` to support flatbuffer format with an additional optional argument, which defaults to pickle. Adding new binary target with suffix `_pickle_and_flatbuffer` to help migration. Size test in D34909502 shows the size has regressed by ~40K but after removing pickle and comparing lite_predictors we have ~120K size measure that we will achieve when deprecating pickle and moving to flatbuffer BEFORE: ```lang=mermaid graph TD; torch_core-->torch_mobile_deserialize; torch_mobile_core-->torch_mobile_deserialize; jit_module_saving-->torch_core; jit_module_saving-->torch_mobile_core; torch_mobile_deserialize-->caffe2_serialize; torch_mobile_deserialize-->torch_mobile_module; caffe2_serialize-->miniz; flatbuffer_loader-->mobile_bytecode; flatbuffer_serializer-->mobile_bytecode; mobile_bytecode-->flatbuffer_2.0; flatbuffer_loader-->torch_mobile_module; flatbuffer_serializer-->torch_mobile_module; ``` AFTER: ```lang=mermaid graph TD; torch_core-->torch_mobile_deserialize; torch_mobile_core-->torch_mobile_deserialize; jit_module_saving-->torch_core; jit_module_saving-->torch_mobile_core; torch_mobile_deserialize-->caffe2_serialize; torch_mobile_deserialize-->torch_mobile_module; caffe2_serialize-->miniz; flatbuffer_loader-->mobile_bytecode; flatbuffer_serializer-->mobile_bytecode; mobile_bytecode-->flatbuffer_2.0; torch_mobile_deserialize_pickle_and_flatbuffer-->\|new\| flatbuffer_loader; torch_mobile_deserialize_pickle_and_flatbuffer-->\|new\| torch_mobile_deserialize; torch_mobile_core_pickle_and_flatbuffer-->\|new\| torch_mobile_deserialize_pickle_and_flatbuffer; 
torch_core_pickle_and_flatbuffer-->\|new\| torch_mobile_deserialize_pickle_and_flatbuffer; jit_module_saving_pickle_and_flatbuffer-->\|new\| torch_core_pickle_and_flatbuffer; jit_module_saving_pickle_and_flatbuffer-->\|new\| torch_mobile_core_pickle_and_flatbuffer; flatbuffer_serializer-->torch_mobile_module; jit_module_saving_pickle_and_flatbuffer-->\|new\|jit_module_saving; jit_module_saving_pickle_and_flatbuffer-->\|new\|flatbuffer_serializer; flatbuffer_loader-->torch_mobile_module; ``` Original commit changeset: 780dfb6fd6ba Original Phabricator Diff: D34805092 (`284b2b7135`) ghstack-source-id: 152044801 (Note: this ignores all push blocking failures!) Test Plan: CI ``` ~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test -c fbcode.caffe2_enable_flatbuffer=1 //caffe2/test/cpp/jit:jit -- FlatbufferTest.ExtraFiles Parsing buck files: finished in 0.9 sec Building: finished in 5.3 sec (100%) 12992/54304 jobs, 0/54304 updated Total time: 6.2 sec More details at https://www.internalfb.com/intern/buck/build/2b387fff-f813-4cfa-b53f-eb2378630d4e BUILD SUCCEEDED Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details. 
Running with tpx session id: f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d Trace available for this run at /tmp/tpx-20220323-134108.766518-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d/trace.log RemoteExecution session id: reSessionID-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d-tpx Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693 ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 486 tests discovered (19.122) ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.ExtraFiles (0.187) Summary Pass: 1 ListingSuccess: 1 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693 ``` Similar Build Deps Dags ``` [pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops_pickle_and_flatbuffer, //xplat/caffe2:torch_mobile_deserialize_pickle_and_flatbuffer)' --output-format dot-compact \| pastry P486770901: https://www.internalfb.com/intern/paste/P486770901/ [pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops, //xplat/caffe2:torch_mobile_deserialize)' --output-format dot-compact \| pastry P486771278: https://www.internalfb.com/intern/paste/P486771278/ ``` pickle_and_flatbuffer: https://www.internalfb.com/intern/dgw/graph/?build_id=P486770901 pickle: https://www.internalfb.com/intern/dgw/graph/?build_id=P486771278 Reviewed By: iseeyuan Differential Revision: D35067157 fbshipit-source-id: 9044259c17a2e0da79bd6aedb28efbdfd57e23e0 (cherry picked from commit f738069ec3a72e79da56172741d027de514e9e5f)	2022-03-24 21:51:05 +00:00
Will Constable	3547f20872	Land remaining parts of Torchscript Lazy Tensor backend (#74111 ) Summary: Also enables bazel build to run lazy codegen. Bazel (oss) build feeds off the same filelists as cmake/buck (build_variables.bzl), so enabling it is easier than keeping it disabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74111 Test Plan: Run CI and verify test_lazy_ops is running via OSS cmake builds Reviewed By: bdhirsh Differential Revision: D34772403 fbshipit-source-id: 8a63f58b9536e6ac1be530667932176ef2549496 (cherry picked from commit e807ffb1918853d10b924fdc24f85ee5b1a39021)	2022-03-22 23:14:03 +00:00
Nikita Shulga	c53b3ed20f	Revert D34805092: Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support `only pickle` and `pickle + flatbuffer` for migration Test Plan: revert-hammer Differential Revision: D34805092 (`284b2b7135`) Original commit changeset: 57f3fc81d68f Original Phabricator Diff: D34805092 (`284b2b7135`) fbshipit-source-id: 780dfb6fd6ba5f9348f24a2fb3c57971b7155541 (cherry picked from commit bebeb8b84e11c34cbde4857d0e1c291731a7c781)	2022-03-22 22:45:50 +00:00
Pavithran Ramachandran	284b2b7135	Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support `only pickle` and `pickle + flatbuffer` for migration (#74209 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74209 Extending `_save_for_mobile` and `_load_for_mobile` to support the flatbuffer format with an additional optional argument that defaults to pickle. Adding new binary targets with the suffix `_pickle_and_flatbuffer` to help migration. The size test in D34909502 shows the size has regressed by ~40K, but after removing pickle and comparing lite_predictors we have a ~120K size measure that we will achieve when deprecating pickle and moving to flatbuffer. BEFORE: ```lang=mermaid graph TD; torch_core-->torch_mobile_deserialize; torch_mobile_core-->torch_mobile_deserialize; jit_module_saving-->torch_core; jit_module_saving-->torch_mobile_core; torch_mobile_deserialize-->caffe2_serialize; torch_mobile_deserialize-->torch_mobile_module; caffe2_serialize-->miniz; flatbuffer_loader-->mobile_bytecode; flatbuffer_serializer-->mobile_bytecode; mobile_bytecode-->flatbuffer_2.0; flatbuffer_loader-->torch_mobile_module; flatbuffer_serializer-->torch_mobile_module; ``` AFTER: ```lang=mermaid graph TD; torch_core-->torch_mobile_deserialize; torch_mobile_core-->torch_mobile_deserialize; jit_module_saving-->torch_core; jit_module_saving-->torch_mobile_core; torch_mobile_deserialize-->caffe2_serialize; torch_mobile_deserialize-->torch_mobile_module; caffe2_serialize-->miniz; flatbuffer_loader-->mobile_bytecode; flatbuffer_serializer-->mobile_bytecode; mobile_bytecode-->flatbuffer_2.0; torch_mobile_deserialize_pickle_and_flatbuffer-->\|new\| flatbuffer_loader; torch_mobile_deserialize_pickle_and_flatbuffer-->\|new\| torch_mobile_deserialize; torch_mobile_core_pickle_and_flatbuffer-->\|new\| torch_mobile_deserialize_pickle_and_flatbuffer; torch_core_pickle_and_flatbuffer-->\|new\| 
torch_mobile_deserialize_pickle_and_flatbuffer; jit_module_saving_pickle_and_flatbuffer-->\|new\| torch_core_pickle_and_flatbuffer; jit_module_saving_pickle_and_flatbuffer-->\|new\| torch_mobile_core_pickle_and_flatbuffer; flatbuffer_serializer-->torch_mobile_module; jit_module_saving_pickle_and_flatbuffer-->\|new\|jit_module_saving; jit_module_saving_pickle_and_flatbuffer-->\|new\|flatbuffer_serializer; flatbuffer_loader-->torch_mobile_module; ``` ghstack-source-id: 151744258 Test Plan: Similar Build Deps Dags ``` [pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops_pickle_and_flatbuffer, //xplat/caffe2:torch_mobile_deserialize_pickle_and_flatbuffer)' --output-format dot-compact \| pastry P486770901: https://www.internalfb.com/intern/paste/P486770901/ [pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops, //xplat/caffe2:torch_mobile_deserialize)' --output-format dot-compact \| pastry P486771278: https://www.internalfb.com/intern/paste/P486771278/ ``` pickle_and_flatbuffer: https://www.internalfb.com/intern/dgw/graph/?build_id=P486770901 pickle: https://www.internalfb.com/intern/dgw/graph/?build_id=P486771278 Reviewed By: iseeyuan Differential Revision: D34805092 fbshipit-source-id: 57f3fc81d68fce941a050c35bd8e6f05951183b3 (cherry picked from commit 671ae4ed29e65b86ffe507a503548d3e86ab0ea4)	2022-03-22 20:00:53 +00:00
Han Qi	4b4f652f79	[3/5] Put JIT source inside flatbuffer (#74245 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74245 title Test Plan: unittest Reviewed By: iseeyuan Differential Revision: D34881612 fbshipit-source-id: 7037982e9267ad72b86e91cd5f2d92426d71dd56 (cherry picked from commit 88f34eb55b2bee6ef8ef27188e075fa2b8767fdf)	2022-03-17 18:46:47 +00:00
Will Constable	d67a265881	Sync lazy_tensor_staging to master (#74311 ) Summary: This merges changes that have already been reviewed/landed onto lazy_tensor_staging branch. It combines changes from multiple PRs into one diff. updated from lazy_tensor_staging on 3/16 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74311 Test Plan: Run CI to ensure compilation on various platforms Run unit tests on lazy_tensor_staging branch with source version of all these diffs Reviewed By: desertfire Differential Revision: D34929235 fbshipit-source-id: babbc3bbeabc5b8107ee9284ed7765887a148622 (cherry picked from commit d91577a6557343ec536f6859e4808ec1a8a9b685)	2022-03-17 16:08:57 +00:00
Will Constable	44a8d4d998	Add lazy tensor unit tests, disabled (#74309 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74309 Since the test file is large, it can be landed on its own and then switched on in the diff that actually builds lazy tensor code. Test Plan: verify CI passes Reviewed By: desertfire Differential Revision: D34928619 fbshipit-source-id: cd556155326f7fb55b3f29031f80bc36c936d565 (cherry picked from commit 60945adbefb6a8d19f89e330f8b344d076b13bfc)	2022-03-17 15:31:26 +00:00
Will Constable	72b1194464	Run lazy tensor codegen in generate_code.py (#73996 ) Summary: Hooks into the existing autograd codegen script (generate_code.py) to take advantage of its integrations into buck/cmake/bazel. Adds a new option (--gen_lazy_ts_backend) to generate_code.py, calling this from the CMake OSS build and the fbcode build, but not from other internal xplat/ovrsource builds (these could be opted in later). Bazel support is added in a later diff. Includes one generated file (torch/csrc/lazy/generated/LazyIr.h) in a unit test (test/cpp/lazy/test_ir.cpp) to partially verify the generator is working, but does not compile the remaining output sources from the generator yet, as they depend on other files not yet landed from the lazy_tensor_staging branch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73996 Test Plan: OSS/internal CI - verify all builds are working and test_ir.cpp compiles LazyIr.h Reviewed By: ezyang Differential Revision: D34408536 fbshipit-source-id: 8af0aea3b95d81eccafc17d64390d70ddd176515 (cherry picked from commit f930612f2bad61c76eb02d85cfbec9f33a1459dc)	2022-03-17 15:31:26 +00:00
Han Qi	ded82ad7c7	Create method to map JIT module to (source, constant) and back. (#74119 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74119 Implemented a function to generate source as ExtraFilesMap and constants. Wrote a function to construct a JIT module given an (ivalue, source, constant) triple. Test Plan: unittest Reviewed By: pavithranrao Differential Revision: D34803945 fbshipit-source-id: 2edc798407fe68294cb4c3c7516f5bd143df88c3 (cherry picked from commit 35e54e166b8f0f5cfe8f08c07866b59ae61ee79d)	2022-03-15 18:30:08 +00:00
Taylor Robie	0b1f3bd158	[Profiler] Prefer TSC to wall clock when available (#73855 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73855 Calling the clock is one of the most expensive parts of profiling. We can reduce the profiling overhead by using `rdtsc` instead. The tradeoff is that we have to measure and convert. (shift and scale) Test Plan: I added a cpp unit test with very aggressive anti-flake measures. I also ran the overhead benchmark (9 replicates) with `--stressTestKineto` (0.94 -> 0.89 us) and `--stressTestKineto --kinetoProfileMemory` (1.27 -> 1.17 us) Reviewed By: chaekit Differential Revision: D34231071 fbshipit-source-id: e3b3dd7580d93bcc783e87c7f2fc726cb74f4df8 (cherry picked from commit e8be9f8160793c6ee35d5af02bca3e01703e377d)	2022-03-13 18:29:06 +00:00
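The "measure and convert (shift and scale)" step mentioned above can be illustrated with a small sketch. This is not the profiler's implementation: the function name `make_tick_converter` and the calibration numbers are hypothetical, and plain Python integers stand in for `rdtsc` readings. The idea is to sample the tick counter and the wall clock at two points, then map any later tick count to nanoseconds by shifting to the first sample and scaling by the measured ratio.

```python
def make_tick_converter(t0_ticks, t0_ns, t1_ticks, t1_ns):
    """Build a ticks -> nanoseconds converter from two calibration samples.

    Each sample pairs a raw tick-counter reading with the wall-clock time
    (in ns) observed at the same moment. All names here are hypothetical.
    """
    if t1_ticks <= t0_ticks:
        raise ValueError("calibration samples must have increasing tick counts")
    d_ticks = t1_ticks - t0_ticks
    d_ns = t1_ns - t0_ns

    def to_ns(ticks):
        # Shift so the first sample is the origin, scale ticks to ns using
        # the measured ratio, then shift back onto the wall-clock axis.
        return t0_ns + (ticks - t0_ticks) * d_ns // d_ticks

    return to_ns


# Synthetic calibration: 3000 ticks elapsed while 1000 ns passed.
convert = make_tick_converter(1000, 5000, 4000, 6000)
midpoint_ns = convert(2500)
```

Integer arithmetic is used so the conversion is exact for the synthetic values; a real profiler would also have to consider counter frequency stability, which this sketch ignores.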
Taylor Robie	5a58820f01	[Profiler] Specialized AppendOnlyQueue (#73409 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73409 We can do better than `vector` or `deque`, and it's sufficiently important to the hot path to justify a custom container. (This is part of the larger queue refactor, but this is a standalone drop-in replacement so we don't need to wait.) Test Plan: It's a pretty simple container type, so I just added a few cpp tests for emplace and read back. I also ran the overhead benchmark (replicates=9) with both `--stressTestKineto` (0.99 -> 0.94 us) and `--stressTestKineto --kinetoProfileMemory` (1.36 -> 1.27 us). Reviewed By: swolchok Differential Revision: D34231072 fbshipit-source-id: ed57299729d444d59cf843a0d38a3ee2240eeec1 (cherry picked from commit 43907948f3a8d2137244e7bb59f43999bd660917)	2022-03-11 19:47:40 +00:00
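As a rough illustration of why an append-only container can beat a general-purpose `vector` or `deque`, the sketch below stores items in fixed-size blocks, so appending never moves existing elements and reading back is a simple walk over the blocks. It is an assumption-laden Python analogue, not the C++ container from this commit; the class layout and `block_size` parameter are hypothetical.

```python
class AppendOnlyQueue:
    """Append-only container backed by a list of fixed-size blocks (sketch)."""

    def __init__(self, block_size=1024):
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self._block_size = block_size
        self._blocks = []  # each block is a list holding up to block_size items
        self._size = 0

    def append(self, item):
        # Start a new block only when there is none yet or the last one is
        # full; existing blocks are never copied or resized.
        if not self._blocks or len(self._blocks[-1]) == self._block_size:
            self._blocks.append([])
        self._blocks[-1].append(item)
        self._size += 1

    def __len__(self):
        return self._size

    def __iter__(self):
        # Read back in insertion order, block by block.
        for block in self._blocks:
            yield from block


queue = AppendOnlyQueue(block_size=4)
for value in range(10):
    queue.append(value)
read_back = list(queue)
```

With a block size of 4, ten appends occupy three blocks, and iteration returns the items in the order they were added.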
David Dang	abfaef0aec	[Quant][core] Merged conv packed params and linear packed params (#73486 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73486 conv and linear packed params were previously defined in <ATen/native/quantized/cpu/conv_packed_params.h> and <ATen/native/quantized/cpu/packed_params.h>. These two files have been merged into one, which has been relocated to <ATen/native/quantized/cpu/packed_params.h>. Differential Revision: D34513286 Test Plan: Imported from OSS Reviewed By: dagitses Pulled By: dzdang fbshipit-source-id: 813845af7ea9449e316ab7822efe7460f0bd0d88 (cherry picked from commit 2f627561f27f81977ff73b8863c5e9e719dc4c60)	2022-03-11 15:18:45 +00:00
Ivan Kobzarev	519e226b66	[tensorexp] ExternalCall2 without memcpy (#72225 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72225 Test Plan: Imported from OSS Reviewed By: dagitses Differential Revision: D33960933 Pulled By: IvanKobzarev fbshipit-source-id: fc73a3de9e5150919e3806516065b4a6c8316000 (cherry picked from commit f637842c341e0ba94906a0c8a1efc81691dc512c)	2022-03-09 21:19:26 +00:00
Han Qi	0723639b60	Revert D34455360: Multisect successfully blamed D34455360 for test failures Summary: This diff is reverting D34455360 (`61d6c43864`) D34455360 (`61d6c43864`) is making the following tests to fail and this revert diff is either the revert of the blame diff or the revert of the stack of diffs that need to be reverted to revert the blame diff Tests affected: - https://www.internalfb.com/intern/test/562950004334605/ Multisect link: https://www.internalfb.com/intern/testinfra/multisect/756170 Test Plan: NA Reviewed By: zhxchen17 Differential Revision: D34596156 fbshipit-source-id: a465bca0094db3caf6130c80f1ed49eea981359b (cherry picked from commit ef5e5578c64ce9827570757fb016aafa9c782c6a)	2022-03-08 23:18:54 +00:00
Elias Ellison	52ccbf4494	Lock thread/block computation (#73800 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73800 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D34647281 Pulled By: eellison fbshipit-source-id: adbdaf24191c4c1b85e0b62564388f2481002ed2 (cherry picked from commit 6cf38015cc14691518b1b5cb7d636e80eb3684fc)	2022-03-04 22:32:08 +00:00
Dave Bort	7b51629c53	[PyTorchEdge] Add getFileFormat() so we can differentiate Zip/Pickle from Flatbuffer (#73707 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73707 Add a helper function to detect the file format from the first bytes of a data file or stream. This will be necessary during the migration from Pickle-serialized modules to Flatbuffer-serialized modules. ghstack-source-id: 150384317 Test Plan: Existing tests for ZIP+Pickle continue to pass. New unit tests pass: ``` cd xplat && buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_interpreter Building: finished in 26.6 sec (100%) 3180/3180 jobs, 571/3180 updated Total time: 32.2 sec Testing: finished in 07:08.3 min (89 PASS/0 FAIL) BUILD SUCCEEDED RESULTS FOR //xplat/caffe2:test_lite_interpreter //xplat/caffe2:test_lite_trainer PASS 421.1s 81 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_interpreter PASS 103ms 8 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_trainer TESTS PASSED ``` Reviewed By: iseeyuan Differential Revision: D34527859 fbshipit-source-id: ff2d1eabc2f8be1de2e44709c878e2d1a373f0df (cherry picked from commit 5c394848346ab9e374c9e7eed479ad70ed09a7ae)	2022-03-04 19:35:41 +00:00
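Detecting a format "from the first bytes" can be sketched as a magic-number check. This is a hypothetical Python analogue, not the C++ `getFileFormat()` added by this commit. Two facts are relied on: ZIP local file headers start with `PK\x03\x04`, and FlatBuffers places an optional 4-byte file identifier at byte offset 4. The identifier value `PTMF` is an assumption about the mobile bytecode schema and should be checked against the actual schema.

```python
ZIP_MAGIC = b"PK\x03\x04"        # ZIP local file header signature
FLATBUFFER_IDENTIFIER = b"PTMF"  # assumed file identifier; verify against the schema


def get_file_format(data):
    """Classify a byte buffer as 'zip', 'flatbuffer', or 'unknown' (sketch)."""
    header = bytes(data[:8])
    if header.startswith(ZIP_MAGIC):
        return "zip"
    # FlatBuffers: bytes 0-3 hold the root table offset, bytes 4-7 the identifier.
    if len(header) >= 8 and header[4:8] == FLATBUFFER_IDENTIFIER:
        return "flatbuffer"
    return "unknown"


zip_result = get_file_format(b"PK\x03\x04" + b"\x00" * 16)
flatbuffer_result = get_file_format(b"\x10\x00\x00\x00PTMF" + b"\x00" * 16)
short_result = get_file_format(b"abc")
```

Only the first eight bytes are inspected, so the same check works on a stream after peeking at its head without reading the whole file.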
Han Qi	61d6c43864	Make debug_pkl smaller by only emitting unique traces. (#73368 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368 The debug_pkl file inside PyTorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start and end numbers. Those are emitted in the debug_pkl file as strings. Since many SourceRanges share the same source, the string for the trace can be deduped. The newer format saves a set of unique traces in a tuple, then each SourceRange saves the offset of its trace w.r.t. its position in that tuple (i.e. manually applying dictionary compression). The above helps with smaller file size. On loading, if we copy each trace to Source as a string, the runtime memory would still blow up. To mitigate this, we use SourceView directly instead of Source, which takes a reference to the string inside the Deserializer and makes it into a string_view. This is safe because the Deserializer is held by the Unpickler via shared_ptr, and the Unpickler is also held via shared_ptr by another Source object. That Source object will be alive during the model construction. Test Plan: unit test Took original file (312271638_930.predictor.disagg.local); loaded with `torch.jit.load`, saved again with `torch.jit.save`. 
Unzip both, look at contents: ``` [qihan@devvm5585.vll0 ~]$ du archive -h 4.0K archive/xl_model_weights 3.7M archive/extra 8.0K archive/code/__torch__/caffe2/torch/fb/model_transform/splitting 8.0K archive/code/__torch__/caffe2/torch/fb/model_transform 8.0K archive/code/__torch__/caffe2/torch/fb 8.0K archive/code/__torch__/caffe2/torch 8.0K archive/code/__torch__/caffe2 20M archive/code/__torch__/torch/fx/graph_module 20M archive/code/__torch__/torch/fx 8.0K archive/code/__torch__/torch/classes 20M archive/code/__torch__/torch 20M archive/code/__torch__ 20M archive/code 2.7M archive/constants 35M archive [qihan@devvm5585.vll0 ~]$ du resaved -h 4.0K resaved/extra 8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting 8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform 8.0K resaved/code/__torch__/caffe2/torch/fb 8.0K resaved/code/__torch__/caffe2/torch 8.0K resaved/code/__torch__/caffe2 1.3M resaved/code/__torch__/torch/fx/graph_module 1.3M resaved/code/__torch__/torch/fx 8.0K resaved/code/__torch__/torch/classes 1.4M resaved/code/__torch__/torch 1.4M resaved/code/__torch__ 1.4M resaved/code 2.7M resaved/constants 13M resaved [qihan@devvm5585.vll0 ~]$ ``` Reviewed By: gmagogsfm Differential Revision: D34455360 fbshipit-source-id: 8cc716f9bba7183746b1b4ecc33a2de34ac503b9 (cherry picked from commit f1a04730fc9ac8fdab6c8e4c44cb5529e42090e4)	2022-03-02 08:37:08 +00:00
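The dictionary compression described in this commit message can be sketched as follows. This is a minimal illustration, not the actual debug_pkl layout: the function names and the `(trace, start, end)` tuple shape are assumptions. Each distinct trace string is stored once in a tuple, and every source range keeps only the index of its trace.

```python
def compress_ranges(ranges):
    """Deduplicate trace strings across (trace, start, end) records (sketch).

    Returns a tuple of unique traces plus compact records that refer to a
    trace by its index in that tuple.
    """
    unique = []
    index_of = {}
    compact = []
    for trace, start, end in ranges:
        if trace not in index_of:
            index_of[trace] = len(unique)
            unique.append(trace)
        compact.append((index_of[trace], start, end))
    return tuple(unique), compact


def expand_ranges(unique, compact):
    """Rebuild the original records from the deduplicated form."""
    return [(unique[index], start, end) for index, start, end in compact]


ranges = [("trace A", 0, 4), ("trace B", 5, 9), ("trace A", 10, 12)]
unique_traces, compact_ranges = compress_ranges(ranges)
restored = expand_ranges(unique_traces, compact_ranges)
```

The saving grows with how often traces repeat: a long trace shared by many ranges is written once, and each range costs only a small integer.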
Mengwei Liu	9ce9803abe	[PyTorch] Add codegen unboxing ability (#69881 ) Summary: RFC: https://github.com/pytorch/rfcs/pull/40 This PR (re)introduces python codegen for unboxing wrappers. Given an entry of `native_functions.yaml` the codegen should be able to generate the corresponding C++ code to convert ivalues from the stack to their proper types. To trigger the codegen, run ``` tools/jit/gen_unboxing.py -d cg/torch/share/ATen ``` Merged changes on CI test. In https://github.com/pytorch/pytorch/issues/71782 I added an e2e test for static dispatch + codegen unboxing. The test exports a mobile model of mobilenetv2, then loads and runs it on a new binary for the lite interpreter: `test/mobile/custom_build/lite_predictor.cpp`. ## Lite predictor build specifics 1. Codegen: `gen.py` generates `RegisterCPU.cpp` and `RegisterSchema.cpp`. Now with this PR, once `static_dispatch` mode is enabled, `gen.py` will not generate `TORCH_LIBRARY` API calls in those cpp files, hence avoiding interaction with the dispatcher. Once `USE_LIGHTWEIGHT_DISPATCH` is turned on, `cmake/Codegen.cmake` calls `gen_unboxing.py` which generates `UnboxingFunctions.h`, `UnboxingFunctions_[0-4].cpp` and `RegisterCodegenUnboxedKernels_[0-4].cpp`. 2. Build: `USE_LIGHTWEIGHT_DISPATCH` adds generated sources into `all_cpu_cpp` in `aten/src/ATen/CMakeLists.txt`. All other files remain unchanged. In reality all the `Operators_[0-4].cpp` are not necessary but we can rely on the linker to strip them off. ## Current CI job test coverage update Created a new CI job `linux-xenial-py3-clang5-mobile-lightweight-dispatch-build` that enables the following build options: * `USE_LIGHTWEIGHT_DISPATCH=1` * `BUILD_LITE_INTERPRETER=1` * `STATIC_DISPATCH_BACKEND=CPU` This job triggers `test/mobile/lightweight_dispatch/build.sh` and builds `libtorch`. Then the script runs C++ tests written in `test_lightweight_dispatch.cpp` and `test_codegen_unboxing.cpp`. 
Recent commits added tests to cover as many C++ argument types as possible: in `build.sh` we installed the PyTorch Python API so that we can export test models in `tests_setup.py`. Then we run the C++ test binary to run these models on a lightweight-dispatch-enabled runtime. Pull Request resolved: https://github.com/pytorch/pytorch/pull/69881 Reviewed By: iseeyuan Differential Revision: D33692299 Pulled By: larryliu0820 fbshipit-source-id: 211e59f2364100703359b4a3d2ab48ca5155a023 (cherry picked from commit 58e1c9a25e3d1b5b656282cf3ac2f548d98d530b)	2022-03-01 23:28:13 +00:00
Elias Ellison	d3d74e9040	Allow custom registration of shape functions (#73270 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73270 Together with open registration of NNC lowerings, this should make it possible to add support for custom operators, including internal fb-ops. Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D34451275 Pulled By: eellison fbshipit-source-id: ae8ae2deb93caa6770e738217461e65853897b55 (cherry picked from commit ea6b7e8a6d8f970a20e68d02eefc5c951e32aa07)	2022-02-28 17:44:45 +00:00
Pavithran Ramachandran	62eb7d64cf	[PyTorchEdge] Extend flatbuffer to support extra files map (#72951 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72951 Extend flatbuffer to support an extra files map. The flatbuffer schema has extra files. Users can write extra files by providing a `map<string, string>`, which becomes part of the flatbuffer model asset and can be loaded back similarly to pickle. ghstack-source-id: 149622799 Test Plan: fb: ```[pavithran@devvm5216.vll0 ~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test caffe2/test/cpp/jit:jit -- FlatbufferTest.ExtraFiles Parsing buck files: finished in 0.7 sec Downloaded 0/8 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules) Building: finished in 20.0 sec (100%) 22343/22343 jobs, 4/22343 updated Total time: 20.7 sec More details at https://www.internalfb.com/intern/buck/build/7dba5034-d623-4a1e-afa1-b0e809df7066 BUILD SUCCEEDED Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details. Running with tpx session id: 9c1ac1e0-a8c0-4a62-95df-8f49695aa7d1 Trace available for this run at /tmp/tpx-20220216-144630.207992/trace.log RemoteExecution session id: reSessionID-9c1ac1e0-a8c0-4a62-95df-8f49695aa7d1-tpx Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7318349470518809 ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 468 tests discovered (17.211) ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.ExtraFiles (0.169) Summary Pass: 1 ListingSuccess: 1 If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users Finished test run: https://www.internalfb.com/intern/testinfra/testrun/7318349470518809```` Reviewed By: iseeyuan Differential Revision: D34286346 fbshipit-source-id: 4e09ab25b8ed6af6f8923db3aab046c255f13bb8 (cherry picked from commit ce8d88e22a360b25253d8a75f428d523fa88a79a)	2022-02-24 19:39:32 +00:00
Jacob Szwejbka	faacb8ab36	[Pytorch Edge] Lean Runtime Test Summary: As far as I can tell, there's no CI that actually runs the lean_runtime. This should add it, I think. (Is this directory covered by CI?) Next up is to create some tests for min_runtime_lib. (Note: this ignores all push blocking failures!) Test Plan: buck test :lean_runtime_delegate_flatbuffer_test Reviewed By: iseeyuan Differential Revision: D34255148 fbshipit-source-id: b44693220e93869edd984bbcd17d33db4007a4ea (cherry picked from commit 0a4a6b5bd2b4a1f8cce8bc1c4a22dad9539631c1)	2022-02-24 18:40:47 +00:00
Alban Desmaison	3bd1507ff2	Revert D33994011: Make debug_pkl smaller by only emitting unique traces. Test Plan: revert-hammer Differential Revision: D33994011 (`3d37f5b052`) Original commit changeset: 8e6224c6e942 Original Phabricator Diff: D33994011 (`3d37f5b052`) fbshipit-source-id: 885e739efa1081382e1fcf9c6cccba92c57e9f7a (cherry picked from commit a6d98c85a736c2eb321a6f38005dd0f5dc43eb87)	2022-02-24 16:38:55 +00:00
Han Qi	3d37f5b052	Make debug_pkl smaller by only emitting unique traces. (#72596 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72596 The debug_pkl file inside PyTorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start and end numbers. Those are emitted in the debug_pkl file as strings. Since many SourceRanges share the same source, the string for the trace can be deduped. The newer format saves a set of unique traces in a tuple, then each SourceRange saves the offset of its trace w.r.t. its position in that tuple (i.e. manually applying dictionary compression). The above helps with smaller file size. On loading, if we copy each trace to Source as a string, the runtime memory would still blow up. To mitigate this, we use SourceView directly instead of Source, which takes a reference to the string inside the Deserializer and makes it into a string_view. This is safe because the Deserializer is held by the Unpickler via shared_ptr, and the Unpickler is also held via shared_ptr by another Source object. That Source object will be alive during the model construction. Test Plan: unit test Took original file (312271638_930.predictor.disagg.local); loaded with `torch.jit.load`, saved again with `torch.jit.save`. 
Unzip both, look at contents: ``` [qihan@devvm5585.vll0 ~]$ du archive -h 4.0K archive/xl_model_weights 3.7M archive/extra 8.0K archive/code/__torch__/caffe2/torch/fb/model_transform/splitting 8.0K archive/code/__torch__/caffe2/torch/fb/model_transform 8.0K archive/code/__torch__/caffe2/torch/fb 8.0K archive/code/__torch__/caffe2/torch 8.0K archive/code/__torch__/caffe2 20M archive/code/__torch__/torch/fx/graph_module 20M archive/code/__torch__/torch/fx 8.0K archive/code/__torch__/torch/classes 20M archive/code/__torch__/torch 20M archive/code/__torch__ 20M archive/code 2.7M archive/constants 35M archive [qihan@devvm5585.vll0 ~]$ du resaved -h 4.0K resaved/extra 8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting 8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform 8.0K resaved/code/__torch__/caffe2/torch/fb 8.0K resaved/code/__torch__/caffe2/torch 8.0K resaved/code/__torch__/caffe2 1.3M resaved/code/__torch__/torch/fx/graph_module 1.3M resaved/code/__torch__/torch/fx 8.0K resaved/code/__torch__/torch/classes 1.4M resaved/code/__torch__/torch 1.4M resaved/code/__torch__ 1.4M resaved/code 2.7M resaved/constants 13M resaved [qihan@devvm5585.vll0 ~]$ ``` Reviewed By: JasonHanwen Differential Revision: D33994011 fbshipit-source-id: 8e6224c6e942e91c3403f686c8f0937d1002ed41 (cherry picked from commit a7014dd4029308c95007f362a57c31796d686647)	2022-02-24 09:31:16 +00:00
Hui Guo	5eb5b61221	[tensorexpre] Add typecast when src and dest buf types are different in PlacementAllocate (#71934 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71934 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D33826700 Pulled By: huiguoo fbshipit-source-id: 9fb29a43ab5983586a6bfde3a34d7e2f2120ab0a (cherry picked from commit 2bee018691ec888cb1ec761528951f5745d7ef79)	2022-02-23 19:36:50 +00:00

2231 Commits