Commit Graph

2136 Commits

Author SHA1 Message Date
Edward Z. Yang
9790d90e4b Don't introduce new overload for SymInt (#83628)
Previously, we introduced new SymInt overloads for every function we wanted.  This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.

This PR takes a simpler but more risky approach: just take the original function and change its ints to SymInts.

This is BC-breaking in the following ways:

* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code-generated registrations in PyTorch do not change, as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually, as sketched below. This will definitely break XLA; see the companion PR https://github.com/pytorch/xla/pull/3914. Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.
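
A hedged sketch of what a manual registration might now look like for a backend that handles only concrete shapes (`expect_int` is an assumed name for the conversion helper; the PR only states that converting back to `int64_t` becomes manual):

```cpp
#include <ATen/ATen.h>

// Hypothetical backend kernel: the registered signature now receives
// c10::SymInt, and converting back to concrete int64_t is the kernel's job.
at::Tensor my_backend_narrow(const at::Tensor& self, int64_t dim,
                             c10::SymInt start, c10::SymInt length) {
  // expect_int() is an assumption for "give me the concrete integer".
  return at::narrow(self, dim, start.expect_int(), length.expect_int());
}
```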

This is not BC-breaking in the following ways:

* The user-facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints (e.g., `at::empty(IntArrayRef, ...)`). To call with SymInts, you must call `at::empty_symint` instead (see the sketch after this list). This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints, so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types); as long as you're not doing string equality (which you shouldn't be), these parse to the same underlying type.
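
To make the calling convention concrete, here is a minimal sketch (the dtype arguments are illustrative, not taken from the PR):

```cpp
#include <ATen/ATen.h>

void demo(c10::SymInt n) {
  // Unchanged user-facing binding: plain ints under the original name.
  at::Tensor a = at::empty({4, 4}, at::kFloat);
  // Symbolic sizes must go through the explicit *_symint name.
  at::Tensor b = at::empty_symint({n, n}, at::kFloat);
}
```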

Structure of the PR:

* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to either c10::SymInt or int64_t, controlled by a `symint` kwarg which I added; I then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
  * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
    * When we do schema validation of C++ operator registration, we must compare against the true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`). This is handled with cloneWithRealTypes before we check for schema differences.
    * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
  * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where there is work to do. Finally, because the signature of the `native::` API changed from int to SymInt, I needed to find alternative APIs for the people who were directly calling these functions. Typically, I insert a new dispatch call when perf doesn't matter, or use the `at::compositeexplicitautograd` namespace to handle other cases.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I change how unboxing logic works slightly. Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real `empty` overload).
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh
2022-08-26 01:35:40 +00:00
Xiang Gao
a4a55f5ea6 New TORCH_UCC_BLOCKING_WAIT env variable (#81791)
Cherry-pick of https://github.com/facebookresearch/torch_ucc/pull/95.

I recommend waiting until https://github.com/pytorch/pytorch/pull/81583 is merged first, so the CI is checking if this PR compiles correctly.

Marking this as a draft for now; will change to "ready for review" once https://github.com/pytorch/pytorch/pull/81583 is merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81791
Approved by: https://github.com/kwen2501
2022-08-25 21:33:17 +00:00
Nikolay Korovaiko
86e134ddf7 disable c10::SymIntNode tests on mobile (#84066)
This fixes C++ test breakages where we were passing pointers and expecting `is_symbolic` to return `true`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84066
Approved by: https://github.com/albanD
2022-08-25 17:28:23 +00:00
jjsjann123
b21a6ff639 [NVFuser] Upstream push 0811 (#83239)
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Code changes includes:

- codegen improvements:
  1. double support in expression evaluator
- bug fixes:
  1. dropout fix - rework RNG to support broadcasted dropout (Fixes #82784)
  2. expand fix - Patch expand+reduction, expand+view, rework view analysis and guard
- scheduler:
  1. manual transpose schedule example
  2. WIP transpose scheduler

Commits in this PR from the devel branch:

```
b7435afcd22c917713c2f41a7237bc26e1183f14 Transpose scheduler, step 1 (#1854)
8a45dbf72034684eb8e18b1835b533e90b68f184 Add an example on how to manually schedule transpose (#1889)
83dbf56a9554b2efbd5416461d938fff477b0b27 Patch dropout fix (#1898)
69d3519a532250719b1aa8341b50e067b181b42d Expand+Reduction, Expand+View support, rework View analysis and guards (#1883)
15091c488e96343bdc49e3990acbf238a3b3da51 Rework RNG to correctly support broadcasted dropout (#1888)
aafe2d048aaac596e503596a41303423619f3954 Make ExpressionEvaluator support Double (#1885)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D38657074](https://our.internmc.facebook.com/intern/diff/D38657074)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83239
Approved by: https://github.com/davidberard98
2022-08-25 02:23:22 +00:00
PyTorch MergeBot
a7edf71360 Revert "Don't introduce new overload for SymInt (#83628)"
This reverts commit 8fae7027b3.

Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to breaking internal builds, see https://www.internalfb.com/diff/D38984222
2022-08-25 00:49:40 +00:00
Richard Barnes
67f0940cdd Check all CUDA API calls for errors in test/ (#74921) (#83954)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74921

Test Plan: Sandcastle

Reviewed By: ezyang, malfet, ngimel

Differential Revision: D35194966

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83954
Approved by: https://github.com/ezyang
2022-08-24 20:12:25 +00:00
Larry Liu
a8a36c45a6 [frontend] Fix tensor list alias annotation (#84005)
For issue https://github.com/pytorch/pytorch/issues/77920 and a retry of https://github.com/pytorch/pytorch/pull/83921

The current logic checks alias info before `[]` and after. If no alias info exists after `[]`, we overwrite the alias info from before it. This logic failed on arguments like `Tensor(a!)[]`, dropping the alias info before `[]` on the floor. This PR adds a new alias info if it's missing after `[]`; this way we can keep the alias info before `[]`.
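
A hedged illustration of the case (accessor usage is sketched from the C++ schema API; the annotation before `[]` describes the contained Tensor elements):

```cpp
#include <torch/csrc/jit/frontend/function_schema_parser.h>

void demo() {
  // Before this fix, the (a!) written before [] was silently dropped.
  auto schema = torch::jit::parseSchema("my::op(Tensor(a!)[] self) -> ()");
  const c10::AliasInfo* info = schema.arguments()[0].alias_info();
  // With the fix, the contained Tensor type keeps its write-alias info.
  bool elem_is_write = info && !info->containedTypes().empty() &&
      info->containedTypes()[0].isWrite();
  (void)elem_is_write;
}
```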
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84005
Approved by: https://github.com/cccclai, https://github.com/bdhirsh
2022-08-24 19:50:19 +00:00
Nikolay Korovaiko
b842670aa5 logical ops (#83879)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83879
Approved by: https://github.com/ezyang
2022-08-24 17:49:57 +00:00
Nikolay Korovaiko
2b805e3520 add arithmetic ops (#83878)
arithmetic ops tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83878
Approved by: https://github.com/ezyang
2022-08-24 17:49:56 +00:00
Edward Z. Yang
8fae7027b3 Don't introduce new overload for SymInt (#83628)
Previously, we introduced new SymInt overloads for every function we wanted.  This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.

This PR takes a simpler but more risky approach: just take the original function and change its ints to SymInts.

This is BC-breaking in the following ways:

* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code-generated registrations in PyTorch do not change, as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually. This will definitely break XLA; see the companion PR https://github.com/pytorch/xla/pull/3914. Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.

This is not BC-breaking in the following ways:

* The user-facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints (e.g., `at::empty(IntArrayRef, ...)`). To call with SymInts, you must call `at::empty_symint` instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints, so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types); as long as you're not doing string equality (which you shouldn't be), these parse to the same underlying type.

Structure of the PR:

* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to either c10::SymInt or int64_t, controlled by a `symint` kwarg which I added; I then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
  * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
    * When we do schema validation of C++ operator registration, we must compare against the true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`). This is handled with cloneWithRealTypes before we check for schema differences.
    * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
  * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where there is work to do. Finally, because the signature of the `native::` API changed from int to SymInt, I needed to find alternative APIs for the people who were directly calling these functions. Typically, I insert a new dispatch call when perf doesn't matter, or use the `at::compositeexplicitautograd` namespace to handle other cases.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I change how unboxing logic works slightly. Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real `empty` overload).
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh
2022-08-23 22:04:07 +00:00
Nikolay Korovaiko
fcb124406b release the current symintnode in the move c-tor (#83789)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83789
Approved by: https://github.com/ezyang
2022-08-22 14:37:06 +00:00
Milad Mohammadi
72963bbae9 Update isDynamic api to align with is_symbolic API (#83415)
Downstream: https://github.com/pytorch/xla/pull/3888

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83415
Approved by: https://github.com/Krovatkin
2022-08-18 22:53:19 +00:00
soulitzer
31fad3926a Add option to run anomaly mode without nan checking (#83481)
Fixes https://github.com/pytorch/pytorch/issues/83117

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83481
Approved by: https://github.com/albanD
2022-08-16 22:56:23 +00:00
richard
382ef1fda7 Autograd graphtask trim unnecessary edges (#82544)
### Introduction
<!-- What did you change and why was it needed? -->

Removing unnecessary weight gradient calculation is very important for applications that need high-order derivatives during training. However, this is not supported by the current Autograd engine.

For more detail: the backward function of a `matmul` operator (e.g., `linear`, `addmm`, `mm`) has two matmuls, one for the `input gradient` and another for the `weight gradient`. For a typical neural network (nn) with a few linear layers and activation functions, if the user calls `torch.autograd.grad()` to calculate the derivative of the nn output `y` w.r.t. the nn input `x`, only the `input gradient` of the `matmul` operator is needed, and the `weight gradient` is discarded. However, the current PyTorch autograd engine will always calculate the `weight gradient` if `weight` requires gradient (the calculation of the high-order derivative is performed during training).

The figure attached shows the autograd graph of the following code snippet:
```py
y = torch.nn.functional.linear(x, weight, bias)
y = y.pow(2)
# first order derivative
y__x, = torch.autograd.grad(y, x, grad_outputs=grad_outputs, create_graph=True)
# second order derivative
y__x__x, = torch.autograd.grad(y__x, x, grad_outputs=grad_outputs, create_graph=True)
```
The path marked in the figure is not needed when calculating these derivatives.

<img width="50%" alt="image" src="https://user-images.githubusercontent.com/9999318/182018117-719c5a23-bcc6-4a63-8e8d-1bca3ebda2e3.png">

### Issue
<!-- Link to Issue ticket or RFP -->
Related issue: https://github.com/pytorch/pytorch/issues/56500

### Method
When calling `torch.autograd.grad`, `exec_info_` is created for each GraphTask, which allows filtering paths on the graph that are not needed. However, when the GraphTask calls into the node, the node still does not know whether the edges are needed or not. In the case of matmul, `weight.requires_grad is True` so the weight gradient is always calculated.

Following https://github.com/pytorch/pytorch/issues/56500#issuecomment-825694656, this PR passes the graph task's thread_local `exec_info_` into the node, so it could trim unnecessary edges during `torch.autograd.grad` calls.
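
Conceptually (a sketch with illustrative names, not the engine's actual code), the node-level check looks like:

```cpp
#include <torch/csrc/autograd/function.h>
#include <unordered_map>

// Minimal stand-in for the graph task's per-node execution record.
struct ExecInfoSketch {
  bool needed = false;
  bool should_execute() const { return needed; }
};

// Skip producing a gradient for an output edge whose target node is not
// on any path to a requested input.
bool should_compute_output(
    const torch::autograd::Node& fn, size_t i,
    const std::unordered_map<torch::autograd::Node*, ExecInfoSketch>& exec_info) {
  const auto& edge = fn.next_edge(i);
  if (!edge.is_valid()) return false;
  auto it = exec_info.find(edge.function.get());
  return it != exec_info.end() && it->second.should_execute();
}
```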

### Benchmark
Benchmark script: https://gist.github.com/yueyericardo/24158433a2021c51eeef9c3e2722df99

Benchmark result:
6 hidden layers, batch size 10000, on A100

FP32 result
| hessian benchmark             | FP32 (before) | FP32 (After)      | FP32 (Functorch v0.1.1) |
| ----------------------------- | ------------- | ----------------- | ----------------------- |
| Linear + ReLU (no backward)   | 55.658 ms     | 29.392 ms (1.90X) | 29.547 ms (1.90X)       |
| Linear + ReLU (with backward) | 81.173 ms     | 54.917 ms (1.47X) | 68.988 ms (1.18X)       |

TF32 result
| hessian benchmark             | TF32 (before) | TF32 (after)      | TF32 (Functorch v0.1.1) |
| ----------------------------- | ------------- | ----------------- | ----------------------- |
| Linear + ReLU (no backward)   | 19.801 ms     | 11.259 ms (1.76X) | 10.754 ms (1.84X)       |
| Linear + ReLU (with backward) | 29.167 ms     | 20.466 ms (1.42X) | 22.784 ms (1.28X)       |

For the FP32 result, we get a 1.9X speedup for hessian calculation and a 1.47X speedup during training, which is even faster than functorch's `vmap(jacfwd(jacrev))` implementation. (functorch has a performance regression in v0.2.0, https://github.com/pytorch/functorch/issues/989, so we are using v0.1.1 for the benchmark.)

@zou3519 does functorch also includes similar optimizations during hessian calculation? If not, what do we need to do so the functorch could also benefit from this PR?

### Testing
<!-- How did you test your change? -->

- [x] we need to figure out a way to unit-test this

### Thanks
Thanks for the great blog: [How Computational Graphs are Executed in PyTorch | PyTorch](https://pytorch.org/blog/how-computational-graphs-are-executed-in-pytorch/)

cc @zasdfgbnm @albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82544
Approved by: https://github.com/soulitzer
2022-08-11 18:50:09 +00:00
Nikita Shulga
1b2a17b8f9 Build MacOS binaries with -Werror (#83049)
Should prevent proliferating MPS warnings
Fixes https://github.com/pytorch/pytorch/issues/82966
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83049
Approved by: https://github.com/albanD, https://github.com/ezyang
2022-08-10 17:29:44 +00:00
Nikita Shulga
62c8d30f9f [BE] Add append_cxx_flag_if_supported macro (#82883)
And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on

Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeLists.txt`

Delete `-Wno-unknown-warning-option` to test that the conditions are indeed working as expected
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883
Approved by: https://github.com/seemethere
2022-08-10 14:32:26 +00:00
PyTorch MergeBot
d3a1f17fc7 Revert "[BE] Add append_cxx_flag_if_supported macro (#82883)"
This reverts commit d7e6aaa59b.

Reverted https://github.com/pytorch/pytorch/pull/82883 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-08-10 10:27:59 +00:00
David Chen
90821aab10 Add SOFT_ASSERT to gracefully recover from invariant violations (#82689)
Summary: Implement SOFT_ASSERT, which fails in debug mode but only triggers a warning log in release mode. This allows us to gracefully handle some of the invariant violations when processing traces that don't necessarily need to crash the entire program.
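
A minimal sketch of the idea (the actual macro body is not reproduced here; the logging call is an assumption):

```cpp
#include <c10/util/Exception.h>
#include <c10/util/Logging.h>

// Sketch: hard failure in debug builds, warn-and-continue in release builds.
inline bool softAssertSketch(bool cond, const char* msg) {
#ifdef NDEBUG
  if (!cond) {
    LOG(WARNING) << "SOFT_ASSERT failed: " << msg;
  }
  return cond;
#else
  TORCH_INTERNAL_ASSERT(cond, msg);
  return true;
#endif
}
```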

Test Plan: Added SOFT_ASSERT test in containers.cpp

Differential Revision: D38327334

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82689
Approved by: https://github.com/robieta
2022-08-10 00:58:07 +00:00
Han Qi (qihqi)
f9533560cc Use flatbuffer of alternate namespace (#82952)
Summary: Minimal change to make use of flatbuffer with the fbsource namespace.

Test Plan: existing unit tests

Differential Revision: D38494999

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82952
Approved by: https://github.com/cccclai
2022-08-09 07:40:59 +00:00
Tugsbayasgalan Manlaibaatar
b4b60c2a2e Get rid of ENABLE_UPGRADERS macro (#77574)
Since it's been a while since we merged the upgrader design and we haven't encountered any issues, let's get rid of the macro for safe rollout
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77574
Approved by: https://github.com/gmagogsfm
2022-08-09 05:33:14 +00:00
Nikita Shulga
d7e6aaa59b [BE] Add append_cxx_flag_if_supported macro (#82883)
And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on

Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeLists.txt`

Delete `-Wno-unknown-warning-option` to test that the conditions are indeed working as expected
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883
Approved by: https://github.com/seemethere
2022-08-08 21:04:09 +00:00
Dave Bort
6e712823c5 Migrate remaining pytorch code to use new flatbuffer_loader.h APIs (#82620)
This is the only file in pytorch core that refers to the deprecated flatbuffer_loader.h APIs. Move to the non-deprecated functions.

Differential Revision: [D38330369](https://our.internmc.facebook.com/intern/diff/D38330369/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82620
Approved by: https://github.com/qihqi
2022-08-05 02:25:33 +00:00
Dave Bort
0810961d5f Remove flatbuffer types/headers from flatbuffer_serializer[_jit].h (#82619)
Hide the flatbuffers types and headers from the serialize APIs, and stop using the DEPRECATED functions from flatbuffer_loader.h.

This required creating the new `DetachedBuffer` type to replace/hide `flatbuffers::DetachedBuffer`, a class that owns a span of custom-allocated memory.

This is another step towards hiding the flatbuffers types and headers from the load/serialize APIs.

Differential Revision: [D38292798](https://our.internmc.facebook.com/intern/diff/D38292798/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38292798/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82619
Approved by: https://github.com/qihqi
2022-08-05 02:23:34 +00:00
Howard Huang
9d228fe517 [Small] Remove using c10d::ProcessGroup directive from c10d test (#82681)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82681
Approved by: https://github.com/awgu
2022-08-03 17:23:35 +00:00
Edward Z. Yang
df69660832 Revert "Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552)"" (#82599)
This reverts commit 532b8a9e00.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82599
Approved by: https://github.com/albanD
2022-08-02 19:37:02 +00:00
PyTorch MergeBot
532b8a9e00 Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552)"
This reverts commit 9465c0e0b5.

Reverted https://github.com/pytorch/pytorch/pull/82552 on behalf of https://github.com/zengk95 due to This seems to be breaking windows binary wheels
2022-08-01 20:25:35 +00:00
Edward Z. Yang
9465c0e0b5 Add a lint rule for torch/csrc/util/pybind.h include (#82552)
We define specializations for pybind11 defined templates
(in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently
it is important that these specializations *always* be #include'd
when making use of pybind11 templates whose behavior depends on
these specializations; otherwise we can cause an ODR violation.

The easiest way to ensure that all the specializations are always
loaded is to designate a header (in this case, torch/csrc/util/pybind.h)
that ensures the specializations are defined, and then add a lint
to ensure this header is included whenever pybind11 headers are
included.
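
In effect, the lint enforces an include discipline like this (illustrative):

```cpp
// Flagged by the lint: a raw pybind11 include, which can leave the
// PYBIND11_DECLARE_HOLDER_TYPE specializations out of this TU.
//   #include <pybind11/pybind11.h>

// Accepted: the designated header pulls in pybind11 together with the
// specializations, so every TU agrees on the template definitions.
#include <torch/csrc/util/pybind.h>
```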

The existing grep linter didn't have enough knobs to do this
conveniently, so I added some features.  I'm open to suggestions
for how to structure the features better.  The main changes:

- Added an --allowlist-pattern flag, which turns off the grep lint
  if some other line exists.  This is used to stop the grep
  lint from complaining about pybind11 includes if the util
  include already exists.

- Added --match-first-only flag, which lets grep only match against
  the first matching line.  This is because, even if there are multiple
  includes that are problematic, I only need to fix one of them.
  We don't /really/ need this, but when I was running lintrunner -a
  to fixup the preexisting codebase it was annoying without this,
  as the lintrunner overall driver fails if there are multiple edits
  on the same file.

I excluded any files that didn't otherwise have a dependency on
torch/ATen, this was mostly caffe2 and the valgrind wrapper compat
bindings.

Note the grep replacement is kind of crappy, but clang-tidy lint
cleaned it up in most cases.

See also https://github.com/pybind/pybind11/issues/4099

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552
Approved by: https://github.com/albanD
2022-08-01 17:16:58 +00:00
Edward Z. Yang
50e8abbcad Change SymIntNode into an intrusive pointer (#82548)
This will make the pointer type a single word, which is important
for packing it into an int64_t
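
The size claim can be stated as a compile-time check (a sketch; the actual bit-packing is not shown):

```cpp
#include <c10/util/intrusive_ptr.h>
#include <cstdint>

// An intrusive_ptr holds just one raw pointer (the refcount lives in the
// pointee), so on a 64-bit platform it fits in an int64_t payload.
static_assert(
    sizeof(c10::intrusive_ptr<c10::intrusive_ptr_target>) == sizeof(int64_t),
    "intrusive_ptr must be a single word to pack into an int64_t");
```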

This time, this diff doesn't segfault when you build with DEBUG mode; more details at https://github.com/pybind/pybind11/issues/4099

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82548
Approved by: https://github.com/albanD
2022-08-01 15:07:21 +00:00
PyTorch MergeBot
3b9cbb1738 Revert "Change SymIntNode into an intrusive pointer (#82432)"
This reverts commit 7be44f8158.

Reverted https://github.com/pytorch/pytorch/pull/82432 on behalf of https://github.com/ezyang due to segfaults on test but not caught in CI
2022-07-29 20:08:59 +00:00
Edward Z. Yang
7be44f8158 Change SymIntNode into an intrusive pointer (#82432)
This will make the pointer type a single word, which is important
for packing it into an int64_t

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82432
Approved by: https://github.com/albanD, https://github.com/Krovatkin
2022-07-29 17:32:54 +00:00
Max Ren
727a327162 Back out "Back out "[profiling] Adding targets file for test_mobile_profiler"" (#82243)
Summary:
Originally reverted this diff D37116110 (c9aa74a37f) because

```
> /usr/local/bin/buck build //caffe2/test/cpp/lite_interpreter_runtime/...

BUILD FAILED
The rule //caffe2:backend_interface_libAndroid could not be found.
Please check the spelling and whether it is one of the 1866 targets in /data/users/batanasov/fbsource/fbcode/caffe2/TARGETS. (52107 bytes)
1 similar targets in /data/users/batanasov/fbsource/fbcode/caffe2/TARGETS are:
  //caffe2:backend_interface_lib

This error happened while trying to get dependency '//caffe2:backend_interface_libAndroid' of target '//caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profilerAndroid'
    At //caffe2:backend_interface_libAndroid (ovr_config//platform/linux:x86_64-fbcode)
    At //caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profilerAndroid (ovr_config//platform/linux:x86_64-fbcode)
```

The added test_mobile_profiler was not meant to be built for Android or other mobile platforms, so we are changing the test to a cpp_unittest

Test Plan:
```
buck test //caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler
Parsing buck files: finished in 0.9 sec
Creating action graph: finished in 26.5 sec
Downloaded 2/2 artifacts, 1.30 Mbytes, 0.0% cache miss (for updated rules)
Building: finished in 16.5 sec (100%) 18451/18451 jobs, 3/18451 updated
  Total time: 44.0 sec
More details at https://www.internalfb.com/intern/buck/build/8bee82c1-66a9-4fae-805f-e4ef5505d25d
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 6904f989-5c17-4c5b-9a4f-ffb643dfcc43
Trace available for this run at /tmp/tpx-20220726-114727.001729-6904f989-5c17-4c5b-9a4f-ffb643dfcc43/trace.log
RemoteExecution session id: reSessionID-6904f989-5c17-4c5b-9a4f-ffb643dfcc43-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425183404951
    ✓ ListingSuccess: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler : 3 tests discovered (17.640)
    ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.Backend (0.206)
    ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.BackendMemoryEvents (0.271)
    ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.ModuleHierarchy (0.268)
Summary
  Pass: 3
  ListingSuccess: 1
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425183404951
```

Differential Revision: D38166171

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82243
Approved by: https://github.com/salilsdesai
2022-07-28 23:08:52 +00:00
Edward Z. Yang
fd5ac1e6b5 Rename SymbolicIntNode to SymIntNodeImpl (#82350)
Done via

```
git grep -l 'SymbolicIntNode' | xargs sed -i 's/SymbolicIntNode/SymIntNodeImpl/g'
```

Reasoning for the change:

* Sym is shorter than Symbolic, and consistent with SymInt
* You usually will deal in shared_ptr<...>, so we're going to
  reserve the shorter name (SymIntNode) for the shared pointer.

But I don't want to update the Python name, so afterwards I ran

```
 git grep -l _C.SymIntNodeImpl | xargs sed -i 's/_C.SymIntNodeImpl/_C.SymIntNode/'
```

and manually fixed up the binding code

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82350
Approved by: https://github.com/Krovatkin
2022-07-28 18:27:45 +00:00
goldenxuett
c2ccf6e625 [JIT] Add backwards compatibility test for old NonDeterminism ops list in ir.cpp (#82257)
- Added backwards compatibility test to ensure that every Op in the old Nondeterministic op list from ir.cpp has the tag nondeterministic_seeded.
**Note that the 3 ops marked "normal" were not actually real op signatures (i.e., findOp with the dispatcher returned a nullptr). These were changed to normal.Tensor_Tensor, normal.Tensor_float and normal.float_Tensor in the list, since that is what matches the rest of their signatures.**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82257
Approved by: https://github.com/davidberard98
2022-07-27 20:19:22 +00:00
goldenxuett
8d5951e7e8 [JIT] Add is_aliasing method to FunctionSchema (#82255)
- Add an is_aliasing method in FunctionSchema to indicate whether an argument has an alias_set attached to it. This is utilized in the integration with autograd (see the next PR)
- Tested in test_schema_info
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82255
Approved by: https://github.com/davidberard98
2022-07-27 20:19:21 +00:00
goldenxuett
67c22b6c07 [JIT] Modify is_nondeterministic to utilize tags in SchemaInfo for non-mobile contexts and integrate with ir.cpp (#82253)
- Modified is_nondeterministic method in SchemaInfo class to utilize tags.
- Modified isNonDeterministic method in ir.cpp to utilize SchemaInfo when a Node is an aten op.
- Added an assert to ensure that if a node is an aten op kind, it has a schema.
- Tested through verifying that all IR.cpp tests run, and through adding 2 custom determinism checks to test for the special dropout edge case and a general bernoulli case.

Differential Revision: [D38179499](https://our.internmc.facebook.com/intern/diff/D38179499)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82253
Approved by: https://github.com/davidberard98
2022-07-27 20:19:19 +00:00
PyTorch MergeBot
e1bd244a14 Revert "[JIT] Modify is_nondeterministic to utilize tags in schemaInfo and integrate with ir.cpp (#81836)"
This reverts commit fc3555ce4d.

Reverted https://github.com/pytorch/pytorch/pull/81836 on behalf of https://github.com/osalpekar due to Internal Mobile NNPACK custom_ops tests failing with Error: tags are not saved for Mobile
2022-07-26 19:11:49 +00:00
PyTorch MergeBot
cbeef2c541 Revert "[JIT] Add is_aliasing method to FunctionSchema (#81916)"
This reverts commit eb2ea9a581.

Reverted https://github.com/pytorch/pytorch/pull/81916 on behalf of https://github.com/osalpekar due to Need to revert this to revert https://github.com/pytorch/pytorch/pull/81836 cleanly. That PR broke internal mobile custom_ops
2022-07-26 19:06:56 +00:00
PyTorch MergeBot
18dd7e55c9 Revert "[JIT] Add backwards compatibility test for old NonDeterminism ops list in ir.cpp (#82029)"
This reverts commit 7288ea4e1d.

Reverted https://github.com/pytorch/pytorch/pull/82029 on behalf of https://github.com/osalpekar due to Need to revert this to revert https://github.com/pytorch/pytorch/pull/81836 cleanly. That PR broke internal mobile custom_ops
2022-07-26 19:00:44 +00:00
Will Constable
4f34cd6d1e Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032)
Avoid exposing defines that conflict with google logging, since this blocks external usage of libtorch in certain cases.

All the 'interesting' changes should be in these two files, and the rest should just be mechanical changes via sed.
c10/util/logging_is_not_google_glog.h
c10/util/logging_is_google_glog.h
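
The mechanical part looks like this (illustrative):

```cpp
#include <c10/util/Logging.h>

void check_sizes(int64_t a, int64_t b) {
  // Previously CHECK_EQ(a, b) and DCHECK_GT(a, 0); the TORCH_ prefix
  // avoids colliding with google logging's identically named macros.
  TORCH_CHECK_EQ(a, b);
  TORCH_DCHECK_GT(a, 0);
}
```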

Fixes https://github.com/pytorch/pytorch/issues/81415

cc @miladm @malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82032
Approved by: https://github.com/soumith, https://github.com/miladm
2022-07-26 01:20:44 +00:00
goldenxuett
7288ea4e1d [JIT] Add backwards compatibility test for old NonDeterminism ops list in ir.cpp (#82029)
- Added backwards compatibility test to ensure that every Op in the old Nondeterministic op list from ir.cpp has the tag nondeterministic_seeded.
**Note that the 3 ops marked "normal" were not actually real op signatures (i.e., findOp with the dispatcher returned a nullptr). These were changed to normal.Tensor_Tensor, normal.Tensor_float and normal.float_Tensor in the list, since that is what matches the rest of their signatures.**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82029
Approved by: https://github.com/davidberard98
2022-07-25 15:44:34 +00:00
goldenxuett
eb2ea9a581 [JIT] Add is_aliasing method to FunctionSchema (#81916)
- Add an is_aliasing method in FunctionSchema to indicate whether an argument has an alias_set attached to it. This is utilized in the integration with autograd (see the next PR)
- Tested in test_schema_info
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81916
Approved by: https://github.com/davidberard98
2022-07-25 15:44:33 +00:00
goldenxuett
fc3555ce4d [JIT] Modify is_nondeterministic to utilize tags in schemaInfo and integrate with ir.cpp (#81836)
- Modified is_nondeterministic method in SchemaInfo class to utilize tags.
- Modified isNonDeterministic method in ir.cpp to utilize SchemaInfo when a Node is an aten op.
- Added an assert to ensure that if a node is an aten op kind, it has a schema.
- Tested through verifying that all IR.cpp tests run, and through adding 2 custom determinism checks to test for the special dropout edge case and a general bernoulli case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81836
Approved by: https://github.com/davidberard98
2022-07-25 15:44:31 +00:00
goldenxuett
9a5fa15ea8 [JIT] Remove BatchNorm and InstanceNorm special cases from AliasDB and replace with SchemaInfo is_mutable checks (#81785)
- Generalized AnalyzeImpl cases for batchNorm and InstanceNorm in alias_analysis.cpp using schema_info.
- Tested by ensuring all aliasDB special case checks for batchNorm and instanceNorm pass as expected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81785
Approved by: https://github.com/davidberard98
2022-07-23 05:50:39 +00:00
Peter Bell
8d0cbce069 Lower randint default dtype to the C++ API (#81410)
The default dtype for randint is currently handled with manual python
binding code; this moves it into the `native_functions.yaml` declaration
for API consistency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81410
Approved by: https://github.com/albanD
2022-07-21 16:42:49 +00:00
goldenxuett
c9497886fd [JIT] Modify is_mutable in FunctionSchema and SchemaInfo to have SchemaArgument parameter instead of index (#81784)
- Modify the is_mutable(size_t index) overload to become is_mutable(const SchemaArgument& argument), due to cases where one might want to check the mutability of either input or output arguments (see the sketch after this list).
- Refactored all calls to the function to use this new overload
- Tested through is_mutable() tests in test_schema_info.cpp
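
For illustration, a hedged sketch of the new call shape:

```cpp
#include <torch/csrc/jit/frontend/function_schema_parser.h>

void demo() {
  auto schema = torch::jit::parseSchema("my::op(Tensor(a!) self) -> Tensor(a!)");
  // Old: schema.is_mutable(0) was ambiguous between inputs and outputs.
  // New: the SchemaArgument says which side index 0 refers to.
  bool input_mut = schema.is_mutable({c10::SchemaArgType::input, 0});
  bool output_mut = schema.is_mutable({c10::SchemaArgType::output, 0});
  (void)input_mut;
  (void)output_mut;
}
```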
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81784
Approved by: https://github.com/davidberard98
2022-07-20 22:09:56 +00:00
goldenxuett
1ddbc5a7dc [JIT] Remove has_side_effects functionality from SchemaInfo (#81575)
- This removes all functionality from https://github.com/pytorch/pytorch/pull/81002 due to a realization that the side effects check doesn't affect any ops outside of JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81575
Approved by: https://github.com/davidberard98
2022-07-19 22:33:19 +00:00
goldenxuett
a6e716cfed [JIT] Add may_contains_alias function in SchemaInfo class (#81444)
- Created may_contain_alias method in SchemaInfo, which is a wrapper around FunctionSchema's may_contain_alias that also accounts for argument values. This is done using logic similar to AliasDB's, with an internal understanding of wildcard sets and container objects
- Added a multitude of tests for various graph edge cases (inputs aliasing, outputs aliasing, multiple input wildcards, multiple container objects, etc...).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81444
Approved by: https://github.com/davidberard98
2022-07-19 04:29:22 +00:00
goldenxuett
47cdab6601 [JIT] Fix double wildcard edge case for may_alias in SchemaInfo and improve formatting (#81439)
- Create a c10::AliasTypeSet typedef of vector<TypePtr> to match alias_analysis.cpp formatting and improve readability.
- Move canAliasTypeSetsAlias, mapTypeToAliasTypeSet, getAliasTypeSetContainedTypes, and getCorrectList to public in function_schema.h for use in SchemaInfo class.
**In the future it might be better to find a different home for most of these functions, since they don't depend on FunctionSchema.**
- Created hash function for SchemaArgument
- Add assert to ensure that there is only 1 input and 1 output with each alias set (excluding wildcard)
- Fixed the double wildcard input edge case for may_alias. (This is the case where, if there is a schema with the form (Tensor(a) a, Tensor(*) b, Tensor(*) c) -> Tensor, and the argument values for 'a' and 'b' cause them to alias, then 'a' may also alias 'c'.)
- Added tests for double wildcard case in may_alias, mismatching types in may_alias, and the uniqueness internal assert.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81439
Approved by: https://github.com/davidberard98
2022-07-19 04:29:21 +00:00
Nikolay Korovaiko
4aac42cc98 [LT] Add a new backend interface [DUP of the original] (#81662)
This is a dup of https://github.com/pytorch/pytorch/pull/76517, which is failing because Jiewen needs to re-sign the CLA.

Summary:
This commit introduces a new set of BackendImplInterface: GetDefaultDeviceOrdinal
and SetDefaultDeviceOrdinal. It allows backends to specify their own default
device ordinal, e.g., 1 for XLA and 0 for CUDA/CPU.

Test Plan:
./build/bin/test_lazy --gtest_filter=BackendDeviceTest.*

ghstack-source-id: b4adfef49253e51bffbbf40d356188a92c98994d
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76517

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81662
Approved by: https://github.com/JackCaoG, https://github.com/wconstab
2022-07-19 01:15:22 +00:00
zhang, xiaobing
86b86202b5 fix torch.config can't respect USE_MKLDNN flag issue (#75001)
Fixes https://github.com/pytorch/pytorch/issues/74949, which reports that torch.config can't respect USE_MKLDNN flag.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75001
Approved by: https://github.com/malfet
2022-07-17 15:00:48 +00:00
goldenxuett
e71f4e7958 [JIT] Implement may_contain_alias in FunctionSchema (#81352)
- Created may_contain_alias method in FunctionSchema to publicize more detailed aliasing information about the inputs and outputs of a schema. This method returns whether the first argument may contain an alias to the second argument (i.e., if the first argument is a list[Tensor], it can contain an alias to the second argument if the second argument is Tensor(*)), and vice versa if bidirectional = true. A usage sketch follows this list.
- The created helper methods are explained in more detail in function_schema.h
- Tested may_contain_alias methods for basic functionality, bidirectional functionality, wildcard functionality, and dual container functionality in test_schema_info.cpp.
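
A hedged usage sketch (assuming the SchemaArgument-style call used elsewhere in this series):

```cpp
#include <torch/csrc/jit/frontend/function_schema_parser.h>

void demo() {
  auto schema = torch::jit::parseSchema("my::op(Tensor[] list, Tensor(*) t) -> ()");
  // May the Tensor[] at input 0 contain an alias of the wildcard Tensor at
  // input 1? With bidirectional=true the reverse direction is checked too.
  bool maybe = schema.may_contain_alias(
      {c10::SchemaArgType::input, 0},
      {c10::SchemaArgType::input, 1},
      /*bidirectional=*/true);
  (void)maybe;
}
```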
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81352
Approved by: https://github.com/davidberard98, https://github.com/Gamrix
2022-07-15 21:57:37 +00:00
goldenxuett
42ee1608d3 [JIT] Add special cases batch_norm, instance_norm and dropout for SchemaInfo (#81007)
- Added special cases for detach in the is_non_deterministic() check, and for batch_norm and instance_norm in the is_mutable() check, in SchemaInfo.
- Added tests for the above special cases for detach, batch_norm and instance_norm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81007
Approved by: https://github.com/davidberard98
2022-07-15 04:52:02 +00:00
goldenxuett
3b4964230e [JIT] Add side effects checks for ops in SchemaInfo subclass (#81002)
- Added has_side_effects method, which returns whether a given op has side effects. Currently this is implemented with a hard-coded list of functions copied from ir.cpp in AliasDB, but it will eventually be implemented by returning whether a given schema has the has_side_effects tag.
- Tested in test_schema_info.cpp with both an op with side effects and an op without side effects.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81002
Approved by: https://github.com/davidberard98
2022-07-13 00:39:30 +00:00
goldenxuett
14c28caed9 [JIT] Add determinism checks for ops in SchemaInfo subclass (#81000)
- Added is_non_deterministic, which returns whether a given op is non-deterministic. Currently this is implemented with a hard-coded list of non-deterministic functions copied from ir.cpp in AliasDB, but it will eventually be implemented by returning whether a given schema has the non_deterministic tag.
- Tested is_non_deterministic method with a deterministic op and a non deterministic op in test_schema_info.cpp

**Note that the case for the op "aten::dropout(Tensor input, float p, bool train) -> Tensor", which is deterministic whenever "train=false", is not accounted for in this PR and will be fixed in a later PR. Currently "aten::dropout(Tensor input, float p, bool train) -> Tensor" is always considered nondeterministic.**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81000
Approved by: https://github.com/davidberard98
2022-07-13 00:35:42 +00:00
goldenxuett
50ba94f5cc [JIT] Add aliasing checks in SchemaInfo with associated tests (#80984)
- Created may_alias method in SchemaInfo to update the implementation of FunctionSchema::may_alias for aliasing cases due to inputs aliasing.
- Created output_alias_map_ internal variable to check cases where outputs might alias due to inputs aliasing. This variable is updated in generateAliasMap().
- Added tests for various may_alias special cases (input - input, input - output, output - output) due to inputs aliasing causing other arguments to also alias.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80984
Approved by: https://github.com/davidberard98
2022-07-13 00:18:43 +00:00
goldenxuett
aa61fdb667 [JIT] Add argumentValue functions and is_mutable checks to SchemaInfo (#80972)
- Created addArgumentValue/s methods in SchemaInfo to pass argument values into the subclass. These are used for more accurate mutation, aliasing and determinism checks which include special cases.
- Added input_alias_map_ to keep track of which inputs alias each other. This is updated with the method generateAliasMap.
- Implemented is_mutable methods in SchemaInfo which also give information based on argument values. For instance, if two inputs alias and one is mutable by the schema, then the other will also be mutable.
- Tested Schema Info is_mutable implementation where inputs alias as mentioned above.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80972
Approved by: https://github.com/davidberard98
2022-07-13 00:16:41 +00:00
goldenxuett
e3a870986e [JIT] Add may_alias in FunctionSchema with associated tests (#80918)
- Created may_alias method in FunctionSchema to publicize aliasing information about inputs and outputs of a schema.
- Tested may_alias methods for basic functionality, exceptions, and wildcard functionality.

**Cases where elements of a container alias another argument will be handled with a new may_contain_alias method, which will be created in a later PR**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80918
Approved by: https://github.com/davidberard98
2022-07-12 18:07:23 +00:00
goldenxuett
b4e342928b [JIT] Add mutability checks in FunctionSchema and create SchemaInfo subclass (#80734)
- Added overloads to the is_mutable method in FunctionSchema to tell whether the argument at a given index is mutable, or whether an argument with a given name is mutable.
- Created SchemaInfo subclass of FunctionSchema with constructors from FunctionSchema and from const char* signature.
- Tested is_mutable method overloads in new test_schema_info.cpp file.

**Note that this PR is used to set up SchemaInfo. Implementation for SchemaInfo will be addressed in later commits**

Differential Revision: [D37651384](https://our.internmc.facebook.com/intern/diff/D37651384)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80734
Approved by: https://github.com/davidberard98
2022-07-11 19:13:06 +00:00
soulitzer
516f3198d6 Fix retains grad behavior after in-place (#79996)
See this doc: https://docs.google.com/document/d/1KiRdnoj6B4cI3yl017hTbCqcOGO1gWIpUf20sldipHM/edit#

Two issues are fixed, (1) regarding hooks in general and (2) regarding retains-grad hooks; Python hooks, which rely on a different mechanism, are not discussed here:
- Hooks in cpp in general
  - (fixed) new hooks registered to a newer version of the tensor no longer get applied to the grad_fn
    associated with the older version of the tensor when the first hook was ever registered
  - (unchanged) hooks registered to the older version of the tensor remain active on it
- Retains-grad hooks
  - (fixed) now get moved to the latest grad_fn. NB: to the user, retains_grad is not considered a hook
    or expected to behave like hooks (which we consider properties of the grad_fn), whereas retains-gradness
    is a property of the tensor.
- (not in this PR) Python hooks
  - (will fix) same issue as hooks in cpp, where new hooks are being applied to the grad_fn associated
    with the older version of the tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79996
Approved by: https://github.com/albanD
2022-07-08 19:13:28 +00:00
Sergii Dymchenko
b0aaefb50f Build example_allreduce only for GLOO (#81062)
`example/allreduce.cpp` is GLOO-specific and will not compile with USE_GLOO=0

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81062
Approved by: https://github.com/malfet
2022-07-08 02:25:54 +00:00
Nikolay Korovaiko
8389ccbcd8 reinstate size and shape returning symints (#79560)
This PR redirects `size` and `.shape` to call `sym_sizes`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79560
Approved by: https://github.com/Chillee
2022-07-08 01:17:33 +00:00
Boyan Atanasov
b603860c1d Back out "[profiling] Adding targets file for test_mobile_profiler" (#80789)
Summary:
Original commit changeset: 38314c83d223

Original Phabricator Diff: D37116110 (c9aa74a37f)

Reviewed By: mcr229

Differential Revision: D37582906

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80789
Approved by: https://github.com/bochko, https://github.com/salilsdesai
2022-07-05 23:34:15 +00:00
Han Qi (qihqi)
c93ceef658 Wrap static initializers in ifdef (#80590)
because on iOS some projects have -Wglobal-constructors and it won't build otherwise.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80590
Approved by: https://github.com/cccclai
2022-07-01 04:42:17 +00:00
Max Ren
c9aa74a37f [profiling] Adding targets file for test_mobile_profiler (#80351)
Summary:
Tests successful recording of backend events. The test checks that the trace file successfully includes the memory recording from the backend at execute time. The record in the trace file looks like:

```
{
    "ph": "i", "cat": "cpu_instant_event", "s": "t", "name": "[memory]",
    "pid": 847267, "tid": 847267,
    "ts": 1655333276408215,
    "args": {
      "Device Type": 0, "Device Id": -1, "Addr": 108370615407104, "Bytes": 16384, "Total Allocated": 16384, "Total Reserved": 49152
    }
  }
```

Test Plan:
```
buck test //caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler
Parsing buck files: finished in 1.6 sec
Creating action graph: finished in 30.9 sec
Downloaded 0/5 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 37.9 sec (100%) 25314/25314 jobs, 5/25314 updated
  Total time: 01:10.5 min
More details at https://www.internalfb.com/intern/buck/build/ef1c4324-13d3-494e-bce7-8004047d5f89
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 17f300d4-9a78-4302-9e9e-d7ab79ba1ff0
Trace available for this run at /tmp/tpx-20220615-165413.567757-17f300d4-9a78-4302-9e9e-d7ab79ba1ff0/trace.log
RemoteExecution session id: reSessionID-17f300d4-9a78-4302-9e9e-d7ab79ba1ff0-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7881299443250383
    ✓ ListingSuccess: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler : 3 tests discovered (37.049)
    ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.Backend (0.402)
    ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.ModuleHierarchy (0.487)
    ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.BackendMemoryEvents (0.280)
Summary
  Pass: 3
  ListingSuccess: 1
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/7881299443250383
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
```

Differential Revision: D37116110

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80351
Approved by: https://github.com/kimishpatel
2022-06-30 17:27:35 +00:00
Sergei Vorobev
a8b0988596 Fix //:module_test Conversion_MultiCUDA (#79926)
Fixes #79871

Make `module.cpp` tests respect the change that was made in #78436 (no int types in autograd).

Note that there is still a gap in the CMake test -- it's unclear why it didn't fail CI before.

As far as I can tell it should be executed, because it's included here: 79507d2a9d/test/cpp/api/CMakeLists.txt (L17)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79926
Approved by: https://github.com/soulitzer
2022-06-21 23:32:18 +00:00
Nikolay Korovaiko
efc7343743 Revert "Revert "Put symint overloads on a different name"" (#79680)
This relands https://github.com/pytorch/pytorch/pull/79281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79680
Approved by: https://github.com/malfet
2022-06-21 07:06:33 +00:00
Han Qi (qihqi)
fed12ff680 [BE][flatbuffer] Remove code duplications and refactor (#79184)
Summary:
Remove code dup in import.cpp / export_modules.cpp such that
1. Only one copy of switching logic (detect flatbuffer / is_flatbuffer);
2. Detection of whether flatbuffer is included moves to runtime (so no more macros)

This also reverses the dependency: import.cpp -> flatbuffer_loader.cpp becomes flatbuffer_loader.cpp -> import.cpp.

Differential Revision: D36926217

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79184
Approved by: https://github.com/zhxchen17
2022-06-20 16:37:38 +00:00
Nikita Shulga
4a4890cfb2 [BE] Use CamelCase for enum class members (#79772)
Per many C++ code-style guides (for [example](https://google.github.io/styleguide/cppguide.html#Enumerator_Names)), members of an `enum` should be CamelCase,
and only defines should be ALL_CAPS

Changes `MemOverlap`, `MemOverlapStatus` and `CmpEvalResult` enum values

Also, `YES`, `NO`, `TRUE` and `FALSE` are often system defines

Fixes, among other things, the current iOS build regression, which manifests as follows (see [this](6e90572bb9)):
```
/Users/runner/work/pytorch/pytorch/aten/src/ATen/MemoryOverlap.h:19:29: error: expected identifier
enum class MemOverlap { NO, YES, TOO_HARD };
                            ^
/Applications/Xcode_12.4.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.4.sdk/usr/include/objc/objc.h:89:13: note: expanded from macro 'YES'
#define YES __objc_yes
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79772
Approved by: https://github.com/drisspg, https://github.com/kulinseth
2022-06-17 05:53:57 +00:00
PyTorch MergeBot
b9bb52d97b Revert "Put symint overloads on a different name"
This reverts commit 213a8fc992.

Reverted https://github.com/pytorch/pytorch/pull/79281 on behalf of https://github.com/bigfootjon due to Diff reverted internally
2022-06-15 17:15:21 +00:00
Nikolay Korovaiko
83e575c510 have a common interface to extract metadata from SizeNodes (#78088)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78088
Approved by: https://github.com/JackCaoG, https://github.com/wconstab
2022-06-15 04:59:08 +00:00
John Clow
07a528cac7 Adding isDynamic Support to SizeNodes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77917

Approved by: https://github.com/Krovatkin
2022-06-14 03:27:57 +00:00
David Berard
91a2e953e5 [JIT] Use signed integers in CalculatedNecessaryArgs
x was underflowing:
```
size_t x = ...
// x is unsigned, so (x >= 0) is always true; when x reaches 0,
// x-- wraps around to SIZE_MAX and the loop never terminates.
while (x >= 0) {
  x--;
}
```

Changed the variables to ssize_t.
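A sketch of the fix under the same shape (the initializer `num_args` is hypothetical):
```
ssize_t x = static_cast<ssize_t>(num_args);  // signed, so the condition can fail
while (x >= 0) {
  x--;  // reaches -1 and exits instead of wrapping to SIZE_MAX
}
```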

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79331

Approved by: https://github.com/yuhc, https://github.com/tugsbayasgalan
2022-06-13 19:41:18 +00:00
Edward Z. Yang
213a8fc992 Put symint overloads on a different name
Due to implicit conversion shenanigans, having both IntArrayRef
and SymIntArrayRef overloads makes {} ambiguous.  While we could
fix this by making a single unified type that accepts all the overloads
we want, an easier fix was to just push the SymIntArrayRef overload
to its own name.
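A hedged sketch of the ambiguity (hypothetical overload set; the comment marks the line that fails to compile):
```
#include <c10/core/SymIntArrayRef.h>
#include <c10/util/ArrayRef.h>

void resize(c10::IntArrayRef sizes) {}
void resize(c10::SymIntArrayRef sizes) {}

int main() {
  resize({});  // error: ambiguous -- {} converts equally well to both overloads
}
```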

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79281

Approved by: https://github.com/suo
2022-06-12 14:36:39 +00:00
Michael Suo
30fb2c4aba [lint] autoformat test/cpp and torch/csrc
Let's have some fun.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828

Approved by: https://github.com/ezyang
2022-06-11 21:11:16 +00:00
Michael Andreas Dagitses
ab2ca95dd1 turn on -Werror=unused-variable in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79156

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-11 02:46:34 +00:00
Michael Andreas Dagitses
606b234336 turn on -Werror=unused-function in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79154

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-10 22:11:54 +00:00
PyTorch MergeBot
bcd7a20953 Revert "turn on -Werror=unused-function in our Bazel CPU build"
This reverts commit 67d313a032.

Reverted https://github.com/pytorch/pytorch/pull/79154 on behalf of https://github.com/malfet due to Breaks bazel build: 67d313a032
2022-06-10 20:43:03 +00:00
Michael Andreas Dagitses
67d313a032 turn on -Werror=unused-function in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79154

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-10 18:30:08 +00:00
Brian Hirsh
7b3a0ff87a Port index.Tensor to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607

Approved by: https://github.com/bdhirsh
2022-06-10 17:27:47 +00:00
Michael Andreas Dagitses
f96d96a7fc turn on -Werror=type-limits in our Bazel CPU build
Summary:
We also fix any existing issues.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79139

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-10 10:04:08 +00:00
Nikita Shulga
3255ddeec9 Make Wunused-local-typedef a hard error (#77918)
Only allow it for `libtorch_python` and tests
Helps prevent regression like https://github.com/pytorch/pytorch/pull/76547#issuecomment-1132208232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77918
Approved by: https://github.com/osalpekar, https://github.com/seemethere
2022-06-09 18:14:01 +00:00
Mark Harfouche
221755cc71 Link BLAS privately (#78883)
We've had some users report that they are getting symbol collisions when linking to BLAS.

I don't see a need to re-export the blas library symbols.

I figured I would share here for other packagers to be able to benefit too.

xref: https://github.com/conda-forge/pytorch-cpu-feedstock/pull/116
xref: https://github.com/conda-forge/openblas-feedstock/issues/134
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78883
Approved by: https://github.com/ezyang
2022-06-09 17:02:06 +00:00
PyTorch MergeBot
4b82ef7928 Revert "Port index.Tensor to structured kernels."
This reverts commit cfd84125bd.

Reverted https://github.com/pytorch/pytorch/pull/69607 on behalf of https://github.com/zengk95 due to This is breaking mac trunk tests cfd84125bd
2022-06-08 20:16:10 +00:00
Brian Hirsh
cfd84125bd Port index.Tensor to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607

Approved by: https://github.com/bdhirsh
2022-06-08 18:17:52 +00:00
dzdang
a56f4e23b9 [quant][core][better-engineering] Rename files in quantized directory to conform with non-quantized counterpart filenames
Summary:
Names of analogous files in the quantized directory (previously snake case) were inconsistent with
their non-quantized filename counterparts (pascal case). This is the first of a series of PRs that renames
all files in the quantized directory (and its sub-directories) to pascal case.

`aten/src/ATen/native/quantized/qconv_unpack.cpp` has not been renamed yet
because (for reasons currently unknown) after making the name change, `import torch` produces the error below (renaming `qlinear_unpack.cpp` also seems to fail some Phabricator CI tests for similar reasons). We suspect that these may be undefined errors and will revisit renaming these files in a future PR.

```
terminate called after throwing an instance of 'c10::Error'
  what():  Type c10::intrusive_ptr<ConvPackedParamsBase<2> > could not be converted to any of the known types.
Exception raised from operator() at ../aten/src/ATen/core/jit_type.h:1735 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x55 (0x7f26745c0c65 in /data/users/dzdang/pytorch/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xb1 (0x7f26745bdcd1 in /data/users/dzdang/pytorch/torch/lib/libc10.so)
frame #2: <unknown function> + 0x1494e24 (0x7f2663b14e24 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0xfed0bc (0x7f266366d0bc in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #4: c10::detail::infer_schema::make_function_schema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x5a (0x7f266366d71a in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #5: c10::detail::infer_schema::make_function_schema(c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x7b (0x7f266366e06b in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x1493f32 (0x7f2663b13f32 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xe227dd (0x7f26634a27dd in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x14e0a (0x7f268c934e0a in /lib64/ld-linux-x86-64.so.2)
..........................truncated.............
```

Test Plan:
```
python test/test_quantization.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77037

Approved by: https://github.com/jerryzh168
2022-06-07 13:47:08 +00:00
PyTorch MergeBot
6a4997e66a [Profiler] Weaken ordering check during post processing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78563

The profiler assembles a call hierarchy by replaying recorded events. There is an assert to ensure that the events form a well structured tree; however, many of the inputs are from external sources, and small differences (e.g. recording time at a lower precision) lead to traces which violate that assumption. For now this is acceptable; the post processing can handle resolving these discrepancies. As a result, I am relaxing the assert to only test event types where we expect the framework to be able to enforce these strong structural requirements.

Differential Revision: [D36787787](https://our.internmc.facebook.com/intern/diff/D36787787/)

Approved by: https://github.com/suo
2022-06-01 18:55:19 +00:00
PyTorch MergeBot
fca1f495c2 Revert "Port index.Tensor to structured kernels."
This reverts commit 9fe6f1baf5.

Reverted https://github.com/pytorch/pytorch/pull/69607 on behalf of https://github.com/suo due to this broke master, see: 9fe6f1baf5
2022-06-01 00:12:15 +00:00
PyTorch MergeBot
ceb93afe3f Revert "Fix bug in flatbuffer deserialization"
This reverts commit 7e72c96b10.

Reverted https://github.com/pytorch/pytorch/pull/78344 on behalf of https://github.com/tugsbayasgalan due to as we need to land it in fbcode asap
2022-05-31 23:34:04 +00:00
Brian Hirsh
9fe6f1baf5 Port index.Tensor to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607

Approved by: https://github.com/bdhirsh
2022-05-31 22:15:20 +00:00
Tugsbayasgalan Manlaibaatar
7e72c96b10 Fix bug in flatbuffer deserialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78344

Approved by: https://github.com/qihqi
2022-05-31 18:37:30 +00:00
Michael Suo
032d1ace1d [ci] disable flaky MobileProfiler.Backend test
This test is flaky, normally I'd disable using the disable bot but it
doesn't support cpp.

[skip ci]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78320

Approved by: https://github.com/malfet
2022-05-26 03:22:55 +00:00
Shunting Zhang
26d9386f67 Make string serialization of C++ FunctionSchema consistent with torchgen.model.FunctionSchema
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77926

There is a discrepancy between the string representation of the C++ FunctionSchema and torchgen.model.FunctionSchema.
The latter does not add parentheses around the returned types if there is a single item,
but the C++ FunctionSchema always adds the parentheses.

Make them consistent so we can convert one type to the other via its string representation and parse method.
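A sketch of the round-trip this enables (assumes `torch::jit::parseSchema`; the schema string is illustrative):
```
#include <torch/csrc/jit/frontend/function_schema_parser.h>

// With consistent rendering, parseSchema(toString(s)) is stable even for a
// single, unparenthesized return type:
auto s = torch::jit::parseSchema(
    "add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor");
auto s2 = torch::jit::parseSchema(toString(s));  // previously rendered "-> (Tensor)"
```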

Differential Revision: [D36535924](https://our.internmc.facebook.com/intern/diff/D36535924/)

Approved by: https://github.com/bdhirsh
2022-05-24 19:39:26 +00:00
John Clow
c82fb7a67f Adding support for upper and lower bound functions in SSA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77389

Approved by: https://github.com/eellison
2022-05-20 23:58:40 +00:00
Nikolay Korovaiko
df1f9b9840 Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#77756)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77756
Approved by: https://github.com/desertfire
2022-05-20 05:39:03 +00:00
PyTorch MergeBot
e9d660c331 Revert "Revert "Revert "Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)"""
This reverts commit acf7136a52.

Reverted https://github.com/pytorch/pytorch/pull/77719 on behalf of https://github.com/suo
2022-05-18 05:06:50 +00:00
Edward Z. Yang
acf7136a52 Revert "Revert "Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)""
This reverts commit c35bd8d423.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77719

Approved by: https://github.com/Chillee, https://github.com/malfet
2022-05-18 03:25:43 +00:00
PyTorch MergeBot
c35bd8d423 Revert "Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)"
This reverts commit fc4c3c9bc7.

Reverted https://github.com/pytorch/pytorch/pull/76836 on behalf of https://github.com/suo
2022-05-18 02:45:25 +00:00
Han Qi (qihqi)
3822a472ef Python function to extract information on mobile::Module from flatbuffer (#77624)
Summary:
Includes the following refactoring:
1. Common loading logic for operator validation that was duplicated in the pickle and
   flatbuffer loaders is moved to function.h/cpp;
2. Allow loading of a function without wiring up operators.

This function will be used to implement get_bundled_input and friends
for flatbuffer.

Test Plan: contbuild & OSS CI, see 69fa49f123

Reviewed By: cccclai

Differential Revision: D36348549

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77624
Approved by: https://github.com/cccclai
2022-05-18 00:42:57 +00:00
Nikolay Korovaiko
fc4c3c9bc7 Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)
LTC Tensors now create real IR (SizeNode) for sym_sizes() in LTCTensorImpl.cpp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76836
Approved by: https://github.com/ezyang
2022-05-18 00:40:42 +00:00
Michael Suo
7f1e331b34 Make SymInt constructor explicit
Since we plan to have a bunch of code that is sensitive to whether or
not a SymInt contains a symbolic shape or not, it seems like a bad idea
to have an implicit constructor.

For example, code like:
```
sizes_and_strides_.stride_at_unchecked(dim) = 0;
```

would sail through, and the `0` would get implicitly promoted to a
SymInt.

This is a tradeoff though: it makes code that handles `SymInt`s more
clunky as `int64_t`s and integer literals need to be explicitly wrapped
in `SymInt` before being used.
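A hedged illustration of the tradeoff at a call site:
```
c10::SymInt a{5};        // OK: explicit construction
// c10::SymInt b = 5;    // no longer compiles once the constructor is explicit
// Literals must now be wrapped, e.g.:
// sizes_and_strides_.stride_at_unchecked(dim) = c10::SymInt(0);
```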

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77666

Approved by: https://github.com/ezyang
2022-05-17 22:28:35 +00:00
Bin Bao
25c6ebd12c Revert "Revert "[LT] Codegen ReuseNode for supported ops""
Summary: Fixed a XLC build failure by generating an always-return-false
default CanBeReused method.

This reverts commit 3cade9d454.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77513

Approved by: https://github.com/alanwaketan
2022-05-16 20:14:42 +00:00
Wang, Eikan
e5a5cd149f Simplify IfThenElse and CompareSelect within for-loop (#76793)
Analyze the range to determine if a condition cannot be satisfied. Suppose the for-loop body contains `IfThenElse` or `CompareSelect` while the condition of the two statements depends on the for-loop index `Var`. In that case, we will analyze the range to check whether the condition could always be satisfied or not. If the condition is deterministic, simplify the logic.
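A minimal illustration of the range analysis in plain C++ rather than TE IR (arrays hypothetical):
```
int a[10] = {0};
int b[10];
for (int i = 0; i < 10; i++) {
  // The loop index is provably in [0, 10), so (i < 20) is always true and
  // the simplifier can replace the select with its true branch:
  b[i] = (i < 20) ? a[i] : 0;  // simplifies to b[i] = a[i];
}
```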
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76793
Approved by: https://github.com/huiguoo
2022-05-15 20:21:28 +00:00
PyTorch MergeBot
3cade9d454 Revert "[LT] Codegen ReuseNode for supported ops"
This reverts commit 6066e5929f.

Reverted https://github.com/pytorch/pytorch/pull/76738 on behalf of https://github.com/malfet
2022-05-14 00:33:10 +00:00
Bin Bao
6066e5929f [LT] Codegen ReuseNode for supported ops
Summary:
1. Update the codegen script to add a TrieCache lookup (ReuseNode)
before creating a new IR node. The following is an example generated
code,

```
    at::Tensor LazyNativeFunctions::add(const at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha) {
        ...
        torch::lazy::NodePtr node = torch::lazy::ReuseNode<AddTensor>(lazy_self->GetIrValue(), lazy_other->GetIrValue(), node_alpha);
        if (!node) {
            auto out_meta = at::meta::add(self, other, alpha);
            std::vector<Shape> shapes{Shape(out_meta.scalar_type(), out_meta.sizes().vec())};
            TORCH_INTERNAL_ASSERT(shapes.size() == 1);
            if(symbolicShapeEnabled()){
                std::vector<jit::IValue> inputs = { self, other, alpha };
                char* schema_str = "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor";
                applySymbolicShapesOnLT(schema_str, inputs, shapes);
            }

            node = torch::lazy::MakeNode<AddTensor>(lazy_self->GetIrValue(), lazy_other->GetIrValue(), node_alpha, std::move(shapes));
            CacheNode(node);
        }
        ...
    }
```
2. TrieCache lookup depends on each IR node subclass to provide its own
comparison function. The following is an example generated code,

```
  bool CanBeReused(const torch::lazy::Value& self, const torch::lazy::Value& other, const torch::lazy::Value& alpha) const {
    size_t i = 0;
    return (operand(i++) == self &&
        operand(i++) == other &&
        operand(i++) == alpha);
  }
```

3. DeviceData is specially handled.

4. Non-codegen op changes are coming a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76738

Approved by: https://github.com/JackCaoG, https://github.com/wconstab
2022-05-13 19:13:58 +00:00
yanbing-j
4f82f439d1 Enable BFloat16 ELU, SELU and CELU in CPU path (#62546)
Enable BFloat16 ELU, SELU and CELU in CPU path. SELU and CELU will call ELU implementation.
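A usage sketch in the C++ API (a minimal example, assuming a CPU build with this change):
```
#include <torch/torch.h>

auto x = torch::randn({8}, torch::dtype(torch::kBFloat16));
auto y = torch::selu(x);                 // routed through the shared ELU CPU kernel
auto z = torch::celu(x, /*alpha=*/1.0);  // likewise
```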

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62546
Approved by: https://github.com/frank-wei
2022-05-12 16:56:57 +00:00
Xiang Gao
cc9d0f309e lshift and rshift stop support floating types (#77146)
Fixes #74358

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77146
Approved by: https://github.com/ngimel
2022-05-11 22:29:30 +00:00
Bin Bao
8f5cdc6d5d Revert "Revert "[LT] Store OpKind for each IR subclass in a static field""
Summary: Re-land https://github.com/pytorch/pytorch/pull/76711 by
fixing internal build errors.
Generate class-level opkind as a static method instead of a static
member.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77102

Approved by: https://github.com/wconstab, https://github.com/JackCaoG, https://github.com/antoniojkim
2022-05-11 12:27:05 +00:00
John Clow
26e2936edc [JIT SSA] Added testing for the Cat Op in LazyTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76552

Approved by: https://github.com/Krovatkin
2022-05-09 22:11:14 +00:00
PyTorch MergeBot
7eaf4780ba Revert "[LT] Store OpKind for each IR subclass in a static field"
This reverts commit ac37ddc795.

Reverted https://github.com/pytorch/pytorch/pull/76711 on behalf of https://github.com/malfet
2022-05-09 20:50:09 +00:00
Fuqiang Zhang
bd573389f6 [Bootcamp] Add option for flatbuffer loader to copy memory to individual tensors (#76986)
Summary: Add option for flatbuffer loader to copy memory to individual tensors, to allow freeing memory without waiting for all tensor runs to complete.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76986
Approved by: https://github.com/qihqi
2022-05-09 17:29:30 +00:00
Bin Bao
ac37ddc795 [LT] Store OpKind for each IR subclass in a static field
Summary: Currently OpKind is stored as an object field called op_ for each IR
node, and one usage of op_ is to avoid dynamic_cast in NodeCast when we
need to downcast a base-node pointer into a concrete sub-node pointer.
As a result, we need to construct and pass in an op when downcasting
nodes, and this becomes quite annoying when we start to implement the
trie-based IR node reusing. More importantly, the op for each subclass
should be unique to that subclass, and thus making it a const static field
is a more logical design.

In this PR, we still keep the object-level op_ for easier XLA adoption. As
future work, we can come back to remove op_, make the op() method
virtual, and get rid of OpKind in all the node constructors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76711

Approved by: https://github.com/wconstab, https://github.com/JackCaoG
2022-05-06 19:14:46 +00:00
David Berard
6c615a21a0 [NVFuser] prep for on-by-default
1. fix tests that expected nvfuser off-by-default behavior
2. skip nvfuser if getExecutorMode() == false

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76937

Approved by: https://github.com/eellison
2022-05-06 18:18:53 +00:00
Bin Bao
f05710dd40 [LT] Add a trie data structure for caching IR nodes
Summary: TrieCache provides a way to look up an IR node before we
actually create it. If the lookup hits in TrieCache, we reuse the
existing node and move the current pointer in TrieCache to point to that
node; if the lookup misses, we create a new node and insert it into TrieCache.
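A distilled sketch of the lookup-before-create pattern (cf. the generated code in the ReuseNode commit earlier in this log; operands `a`, `b` hypothetical):
```
torch::lazy::NodePtr node = torch::lazy::ReuseNode<AddTensor>(a, b);
if (!node) {
  node = torch::lazy::MakeNode<AddTensor>(a, b);  // miss: create a new node...
  CacheNode(node);                                // ...and record it in the trie
}
```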

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76542

Approved by: https://github.com/wconstab, https://github.com/JackCaoG
2022-05-04 23:48:03 +00:00
Wang, Eikan
429a80dded [NNC] Lowering function generates the output buffer with the specified stride (#76529)
Summary:
Pass stride information to the lowering function to generate the output buffer with the proper memory layout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76529

Reviewed By: ZolotukhinM

Differential Revision: D36116712

Pulled By: IvanKobzarev

fbshipit-source-id: d3901f756b3710ecce172d6db3ecb0b7c12fb929
(cherry picked from commit b6cd53c91c01db36ea0e99167dc0ce0ae1d3aa23)
2022-05-04 20:04:22 +00:00
Bin Bao
f8a4780eb2 [LT] Move MakeNode into ir_builder.h
Summary: Move MakeNode into ir_builder.h to avoid circular header
reference later when introducing a trie cache for IR node lookup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76482

Approved by: https://github.com/wconstab
2022-05-03 14:53:19 +00:00
Elias Ellison
e5a55af305 Reland reland
Reland of https://github.com/pytorch/pytorch/pull/76397 and https://github.com/pytorch/pytorch/pull/76493

This time I'll get it right 😢
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76539
Approved by: https://github.com/davidberard98, https://github.com/osalpekar
2022-04-28 20:41:55 +00:00
PyTorch MergeBot
a5bc02aeb2 Revert "[JIT] Register decomp reland"
This reverts commit 81b9cb741c.

Reverted https://github.com/pytorch/pytorch/pull/76397 on behalf of https://github.com/osalpekar
2022-04-28 03:33:29 +00:00
Antonio Kim
f3f327e103 Decouple LTC from TS Backend using Lazy IR Builder
Next stage of breaking up https://github.com/pytorch/pytorch/pull/74710

IR builder class introduced to decouple the explicit usage of `TsNode` in core lazy tensors.

Requires https://github.com/pytorch/pytorch/pull/75324 to be merged in first.

**Background**
- there are ~5 special ops used in lazy core but defined as `: public {Backend}Node` (DeviceData, Expand, Scalar, ...)
- we currently require all nodes derive from {Backend}Node, so that backends can make this assumption safely
- it is hard to have shared 'IR classes' in core/ because they depend on 'Node'

**Motivation**

1. avoid copy-paste of "special" node classes for each backend
2. in general decouple and remove all dependencies that LTC has on the TS backend

**Summary of changes**
- new 'IRBuilder' interface that knows how to make 5 special ops
- move 'special' node classes to `ts_backend/`
- implement TSIRBuilder that makes the special TS Nodes
- new backend interface API to get the IRBuilder
- update core code to call the builder

CC: @wconstab @JackCaoG @henrytwo

Partially Fixes #74628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75433
Approved by: https://github.com/wconstab
2022-04-28 02:07:02 +00:00
Jiewen Tan
a28b132bc2 Revert D35860266: [pytorch][PR] Update torch::lazy::BackendDevice to have a new default ordinal
Test Plan: revert-hammer

Differential Revision:
D35860266 (f9d07ae644)

Original commit changeset: 554ebe16a068

Original Phabricator Diff: D35860266 (f9d07ae644)

fbshipit-source-id: 325c54aa2e87e51134115213352b3d33a81b7edf
(cherry picked from commit bbd74bf34a534d1b87aadff9790038e3dbbfa9c8)
2022-04-27 18:11:24 +00:00
Elias Ellison
81b9cb741c [JIT] Register decomp reland
Reland of https://github.com/pytorch/pytorch/pull/76252
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76397
Approved by: https://github.com/davidberard98
2022-04-26 23:17:18 +00:00
PyTorch MergeBot
2d72cb3373 Revert "[JIT] Allow registering Decompositions"
This reverts commit d9f0774f98.

Reverted https://github.com/pytorch/pytorch/pull/76252 on behalf of https://github.com/zengk95
2022-04-26 04:47:05 +00:00
Elias Ellison
d9f0774f98 [JIT] Allow registering Decompositions
- Allow registering custom decompositions
- Add easier API for invoking decompositions
- Shorten API names (no users yet)

I am doing these as one pr because they are fairly short/simple and because github first does not support ghstack yet.

cc @Chillee @zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76252
Approved by: https://github.com/davidberard98
2022-04-26 03:00:35 +00:00
Nikolay Korovaiko
bb60cac25a E2E SymInt example narrow_copy
This **roughly** corresponds to Goal 3.2 in https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8YLw-jxEw/edit#

Namely:

It adds the following:

* SymbolicIntNode interface
* LazySymbolicIntNode implementation
* Lazy `narrow_copy` implementation
* Support for SymInt in codegen
* Test (below)

```cpp
TEST(LazyDynamicOpsTest, NarrowCopy) {
  auto x = torch::rand({5, 10, 10}).to(kLazy);
  const size_t Y_DIM = 3;
  const size_t X_DIM_INDEX = 2;
  auto y = torch::rand({Y_DIM}).to(kLazy);
  auto ly = torch::lazy::TryGetLtcTensor(y);
  auto dim_node = MakeNode<SizeNode>(ly->GetIrValue(), 0);
  auto lmn = new torch::lazy::SymbolicIntNode(dim_node);
  auto z = x.narrow_copy(X_DIM_INDEX, 0, lmn->toSymInt());
  AllClose(z.cpu(), x.cpu().narrow_copy(X_DIM_INDEX, 0, Y_DIM));
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75759
Approved by: https://github.com/wconstab
2022-04-26 02:40:27 +00:00
Wonjoo Lee
f9d07ae644 Update torch::lazy::BackendDevice to have a new default ordinal (#76264)
Summary:
Fixes https://github.com/pytorch/xla/issues/3490. Updates `torch::lazy::BackendDevice` with changes below:

1. Remove the no-op string constructor.
2. Update default ordinal to `-1`.
3. Add an `is_valid` function to check if `ordinal` is valid/non-default (`ordinal >= 0`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76264

Reviewed By: mrshenli

Differential Revision: D35860266

Pulled By: alanwaketan

fbshipit-source-id: 554ebe16a0683d37b00270c4f35163bf690bfe28
(cherry picked from commit b941d10e8545dfecfb34e4d5c24a29a1cc49bc4b)
2022-04-25 23:57:18 +00:00
zengk95
1d55518198 Revert "[nnc] Strides to Tensor (#72962)"
This reverts commit 939060925f.

Fixes https://github.com/pytorch/vision/issues/5873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76332
Approved by: https://github.com/seemethere
2022-04-25 19:50:00 +00:00
Ivan Kobzarev
939060925f [nnc] Strides to Tensor (#72962)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72962

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM, cpuhrsch

Differential Revision: D34589306

Pulled By: IvanKobzarev

fbshipit-source-id: ecee5249760ecc0c8b2edb1842b90218899bc944
(cherry picked from commit 9e310c4c67389da30da89126d838ffe3864aba6f)
2022-04-23 19:35:15 +00:00
Prem
7557407653 Added directory check before saving in C++ API
Fixes #75177

Couldn't find any utility method to get a directory name in the pytorch repo, hence creating a function for that.
Let me know if a new function is not needed.

I also referred to [this](https://github.com/pytorch/pytorch/blob/master/c10/test/util/tempfile_test.cpp#L15) for the directory check.

Also, I am using TORCH_CHECK to show the error. This is highly verbose, with the entire stack visible. Is there any alternative so that it is easier to read? This could happen frequently, so a small and concise error would be more helpful here.
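A hedged illustration of the failure mode being guarded (path hypothetical):
```
#include <torch/torch.h>

std::vector<torch::Tensor> v{torch::ones({2, 2})};
// If "/no/such/dir" does not exist, this now fails up front with a TORCH_CHECK
// error naming the missing directory, instead of erroring deep inside the serializer.
torch::save(v, "/no/such/dir/tensor.pt");
```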
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75681
Approved by: https://github.com/albanD
2022-04-22 20:04:41 +00:00
Wang, Eikan
ef0873327e [NNC] Add utility functions to check channels-last contiguous (#75938)
Summary:
The `Buf` uses `std::vector<ExprHandle>` to represent its strides. The `ExprHandle` could be an immediate value or a mathematical expression with variables involved both for the static shape and dynamic shape. So it is hard to directly deduce the channels-last contiguous layout based on the numerical calculation. Hence, the utility functions of this PR are based on the pattern match to check whether the `Buf` is channels-last contiguous.
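For intuition, a numeric-only sketch of the property being matched (the PR's utilities work on symbolic `ExprHandle` strides via pattern matching, not numeric evaluation):
```
#include <cstdint>
#include <vector>

// Channels-last contiguity for logical NCHW sizes {N, C, H, W} means
// strides {C*H*W, 1, C*W, C}.
bool isChannelsLastContiguous(const std::vector<int64_t>& sizes,
                              const std::vector<int64_t>& strides) {
  if (sizes.size() != 4 || strides.size() != 4) return false;
  const int64_t c = sizes[1], h = sizes[2], w = sizes[3];
  return strides[1] == 1 && strides[3] == c &&
         strides[2] == c * w && strides[0] == c * w * h;
}
```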

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75938

Reviewed By: cpuhrsch

Differential Revision: D35724091

Pulled By: ZolotukhinM

fbshipit-source-id: f79ae21749d0aad8601f0434b52df88602ff09bf
(cherry picked from commit 3712bbbe4bea57c5c1abe1eafde4b8778e13e0c4)
2022-04-22 06:42:39 -07:00
Antonio Kim
2c2c13d21b Decouple Lazy Node Shape Cache (#75324)
Summary:
Next stage of breaking up https://github.com/pytorch/pytorch/pull/74710

Move shape cache implementation to the backend interface. Also, clean up some of the hashing logic in the base node class.

CC: wconstab JackCaoG henrytwo

Partially Fixes https://github.com/pytorch/pytorch/issues/74628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75324

Reviewed By: anjali411

Differential Revision: D35730823

Pulled By: wconstab

fbshipit-source-id: cf6fa326319b9324e5f422a78817b6fb5bf7e9b8
(cherry picked from commit faec5043df56639e2fd23de2d91ae796e4f3df70)
2022-04-21 17:27:05 -07:00
Nikolay Korovaiko
69e048b090 List of SymInt rebase on master
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75115
Approved by: https://github.com/ezyang
2022-04-20 02:09:55 +00:00
Elias Ellison
f65eb09d6b [JIT] Move Shape Function definition to python
Moves jit shape function registration to Python. Like jit decompositions, a script must be run after adding new definitions, which serializes them in a C++ file.

This was a request so that torch-mlir could define functions in Python and upstream their shape functions. cc @silvasean @makslevental
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75546
Approved by: https://github.com/davidberard98
2022-04-19 20:59:44 +00:00
Taylor Robie
a5e338a826 [RecordFunction] More efficient machinery to determine which callbacks to run. (#75807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75807

There is a tension in RecordFunction between two use cases:
1) In the normal eager path we don't run any callbacks, so we need to bail out of the profiling path as soon as possible to minimize eager overhead.
2) When profiling we want to determine which callbacks to run as efficiently as possible to minimize instrumentation overhead.

The confounding factor in all of this is sampling callbacks because they change which callbacks will run on each call, even in steady state operation. This has traditionally been handled with a two stage procedure: first we flip a coin to determine if a sampled callback *might* run. If false (which it usually is), do nothing. This solves (1). If true, check to see if we need to build the full callback set or if it was a false positive. This procedure has two negative effects:
* It forces us to rebuild the set of callbacks to run on every step when profiling
* It leaks the sampling abstraction, requiring other parts of the code to bump certain values and forces RecordFunction to lazily initialize.

This change introduces a multi-level cache which can (in the common case) quickly determine which callbacks *will* run, rather than if callbacks *might* run. This means that rather than call `shouldRunRecordFunction`, we can simply get the callbacks for an invocation and check if they are empty. (And completely removes the pre-sampling heuristic.) Another major benefit of the new cache structure is that it allows thread-safe registration and unregistration of global callbacks.

It's worth briefly discussing how this maintains eager performance. In the standard eager case (only sampling callbacks registered) the cache first checks that the global callbacks haven't changed (atomic read), decrements a counter to see if a sampling callback fired, and then returns the active callbacks which is simply a SmallVector of pointer pairs and a couple POD values (scope, needs inputs/outputs/ids). The biggest cost according to perf is the SmallVector logic; we could consider adopting a hard limit on active callbacks; more than half a dozen callbacks *running* in a single step would be quite a lot. But the total cost relative to `PYTORCH_DISABLE_PER_OP_PROFILING` is only ~10ns, so debatable if it's worth it to switch to `std::array`.

The primary change is in `record_function.cpp`, which has a more detailed description of the new cache structure. `record_function.h` has some minor changes to align with the new calling convention and the remaining files are simply changes to the call sites.
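A hedged sketch of the resulting call-site shape (simplified; names follow the convention described above, not the exact code):
```
// Fetch the resolved callback set for this scope; in steady-state eager mode
// this is cheap and usually empty, so profiling is skipped entirely.
auto step_callbacks = at::getStepCallbacks(at::RecordScope::FUNCTION);
if (!step_callbacks.empty()) {
  at::RecordFunction guard(std::move(step_callbacks));
  guard.before("aten::add");
  // ... run the op under the guard ...
}
```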

Future work:
  * RecordFunction no longer needs to be lazily initialized.
  * We can deprecate the disable/reenable APIs, since we can now safely add and remove global callbacks.

Test Plan:
I tested eager mode performance using the overhead benchmark and found that the non-profiled path was unaffected. However the no-op observer dropped from 0.41us to 0.37us (0.25us if no observers are active) which is about 1/3rd reduction in the cost of the callback selection machinery.

I also added several C++ unit tests, as the core RecordFunction machinery (especially sampling) was largely untested.

Reviewed By: swolchok, davidberard98

Differential Revision: D35276158

fbshipit-source-id: 35135f444724fba4eb97c0ae7f3f710f0f9016fd
(cherry picked from commit 9e359b87422c18f2a195185f32e7e85c82f956fd)
2022-04-19 20:46:16 +00:00
Han Qi
b34b192d6b Reland "Make debug_pkl smaller by only emitting unique traces." (#73368)
Summary:
## Original commit message:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368

The debug_pkl file inside pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start/end numbers. Those are emitted in the debug_pkl file as strings.
Since many SourceRanges share the same Source, the string for a trace can be deduped.
The newer format saves a set of unique traces in a tuple, then each SourceRange saves the offset of its trace w.r.t. its position in that tuple (i.e. manually applying dictionary compression).
The above helps with smaller file size. On loading, if we copied each trace to Source as a string, the runtime memory would still blow up.
To mitigate this, we use SourceView directly instead of Source, which takes a reference to the string inside the Deserializer and makes it into a string_view. This is safe because the Deserializer is held by the Unpickler via shared_ptr, and the Unpickler is also held via shared_ptr by another Source object. That Source object will be alive during model construction.

Test Plan:
## Original Test plan
unit test

Took original file (312271638_930.predictor.disagg.local); loaded with `torch.jit.load` save again with `torch.jit.save`. Unzip both, look at contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K    archive/xl_model_weights
3.7M    archive/extra
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K    archive/code/__torch__/caffe2/torch/fb
8.0K    archive/code/__torch__/caffe2/torch
8.0K    archive/code/__torch__/caffe2
20M     archive/code/__torch__/torch/fx/graph_module
20M     archive/code/__torch__/torch/fx
8.0K    archive/code/__torch__/torch/classes
20M     archive/code/__torch__/torch
20M     archive/code/__torch__
20M     archive/code
2.7M    archive/constants
35M     archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K    resaved/extra
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K    resaved/code/__torch__/caffe2/torch/fb
8.0K    resaved/code/__torch__/caffe2/torch
8.0K    resaved/code/__torch__/caffe2
1.3M    resaved/code/__torch__/torch/fx/graph_module
1.3M    resaved/code/__torch__/torch/fx
8.0K    resaved/code/__torch__/torch/classes
1.4M    resaved/code/__torch__/torch
1.4M    resaved/code/__torch__
1.4M    resaved/code
2.7M    resaved/constants
13M     resaved
[qihan@devvm5585.vll0 ~]$
```
## Additional test:
`buck test mode/dev-tsan //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --exact 'caffe2/benchmarks/static_runtime:static_runtime_cpptest - StaticRuntime.to'` passes

 test jest.fbios.startup_cold_start.local.simulator f333356873 -

Differential Revision: D35196883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74869
Approved by: https://github.com/gmagogsfm
2022-04-18 22:34:21 +00:00
Han Qi
7d5c07830d Add upgrader related logic to flatbuffer (#71451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71451

title

Test Plan: unittest

Reviewed By: tugsbayasgalan

Differential Revision: D33593056

fbshipit-source-id: c48d6ad50e6e2f757b68525dfe07693711b95840
(cherry picked from commit 8e09e20c1dafcdbdb45c2d1574da68a32e54a3a5)
2022-04-17 18:51:23 +00:00
Nikita Shulga
fe8eff3711 Revert "Add upgrader related logic to flatbuffer"
This reverts commit dfae96171a.
2022-04-17 11:38:59 -07:00
Han Qi
dfae96171a Add upgrader related logic to flatbuffer
Summary: title

Test Plan: unittest

Differential Revision: D33593056

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71451
Approved by: https://github.com/tugsbayasgalan
2022-04-16 02:04:48 +00:00
Raghavan Raman
c2d5f6a5a4 [nnc] Update bounds overlap analysis to identify non-overlaps even with symbolic bounds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74658

Approved by: https://github.com/ZolotukhinM
2022-04-14 20:24:03 +00:00
Raghavan Raman
d8ad1a579f [nnc] Fuse loops that have variable bounds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74346

Approved by: https://github.com/ZolotukhinM
2022-04-14 20:24:03 +00:00
Jiewen Tan
ab0d9b18e9 [LT] Support Tensor.is_alias_of
Summary:
Tensor.is_alias_of relies on Storage to work. However, LTCTensorImpl was
not implemented with that in mind. This commit adds a fake storage to LazyTensor
as a marker to mark LazyTensors that point to the same storage. The reason
why it's not done in LTCTensorImpl is that LazyTensor maintains the view-op/alias
logic in the LazyTensor class instead of relying on TensorImpl to do the check.
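A hedged usage sketch (assumes an initialized lazy backend, per the test plan's LazyOpsTest.IsAliasOf):
```
auto t = torch::rand({2, 3}).to(torch::kLazy);
auto v = t.view({6});             // a lazy view/alias of t
bool aliased = t.is_alias_of(v);  // now true: both map to the same fake storage
```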

Test Plan:
./build/bin/test_lazy --gtest_filter=LazyOpsTest.IsAliasOf

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75246

Approved by: https://github.com/bdhirsh
2022-04-14 07:28:03 +00:00
Nikolay Korovaiko
ce842f43f2 Relanding shape cache (75400) (#75710)
Summary:
https://github.com/pytorch/pytorch/pull/75400

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75710

Reviewed By: malfet

Differential Revision: D35598920

Pulled By: Krovatkin

fbshipit-source-id: 2bbbb3d0c24214b5dbb4ca605e7daa94671f96b0
(cherry picked from commit 572f2f9df5bfd73cd7b83536f619bc86d820ccd8)
2022-04-13 17:17:30 +00:00
PyTorch MergeBot
db1801099b Revert "Relanding shape cache (75400)"
This reverts commit 89486821ed.

Reverted https://github.com/pytorch/pytorch/pull/75710 on behalf of https://github.com/malfet
2022-04-13 17:14:38 +00:00
Nikolay Korovaiko
89486821ed Relanding shape cache (75400)
https://github.com/pytorch/pytorch/pull/75400
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75710
Approved by: https://github.com/malfet
2022-04-13 07:28:32 +00:00
PyTorch MergeBot
c274f66268 Revert "Adding Caching of calculated Symbolic Shapes"
This reverts commit 9a7bfaa929.

Reverted https://github.com/pytorch/pytorch/pull/75400 on behalf of https://github.com/mehtanirav
2022-04-12 21:53:31 +00:00
John Clow
9a7bfaa929 Adding Caching of calculated Symbolic Shapes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75400

Approved by: https://github.com/eellison
2022-04-12 11:19:58 +00:00
Pavithran Ramachandran
6402e62454 Refactor flatbuffer jit code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75239

Refactor flatbuffer_serializer to move JIT-related code to a separate file.

Differential Revision: [D35301020](https://our.internmc.facebook.com/intern/diff/D35301020/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35301020/)!

Approved by: https://github.com/iseeyuan
2022-04-11 23:41:48 +00:00
John Clow
f281d83d77 Moving Remove Tensor Type Specializations to after custom passes
This is to allow Intel folks to use type information in their custom passes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71748

Approved by: https://github.com/eellison
2022-04-11 22:12:01 +00:00
Yulv-git
ac2d2e3a3d Fix some typos.
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75561
Approved by: https://github.com/albanD
2022-04-11 21:55:59 +00:00
Nikita Shulga
80ea6955af Add cuda-11.3+clang9 build workflow (take 2)
To be able to detect unused captures in GPU code lambdas (as gcc does not support this diagnostic)

Remove unused opts lambda capture in `ProcessGroupMPI.cpp` and `Distributions.cu`

Fix sign-compare in nvfuser benchmark and ignore signed unsigned comparison in nvfuser tests
Fixes https://github.com/pytorch/pytorch/issues/75475 by aliasing CMAKE_CUDA_HOST_COMPILER to C_COMPILER when clang is used
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75293
Approved by: https://github.com/atalman, https://github.com/seemethere
2022-04-11 17:13:01 +00:00
PyTorch MergeBot
8fe43d76d5 Revert "Add cuda-11.3+clang9 build workflow"
This reverts commit 709fcc862e.

Reverted https://github.com/pytorch/pytorch/pull/75293 on behalf of https://github.com/janeyx99
2022-04-11 15:24:59 +00:00
Nikita Shulga
709fcc862e Add cuda-11.3+clang9 build workflow
To be able to detect unused captures in GPU code lambdas (as gcc does not support this diagnostic)

Remove unused opts lambda capture in `ProcessGroupMPI.cpp` and `Distributions.cu`

Fix sign-compare in nvfuser benchmark and ignore signed unsigned comparison in nvfuser tests
Fixes https://github.com/pytorch/pytorch/issues/75475 by aliasing CMAKE_CUDA_HOST_COMPILER to C_COMPILER when clang is used
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75293
Approved by: https://github.com/atalman, https://github.com/seemethere
2022-04-11 14:10:57 +00:00
Jiewen Tan
dc37090ec5 [LT] Support diagonal op (#75230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75230

Op diagonal is a view op which we can't code-gen yet. Therefore, we support
it with hand-written IR construction and lowering.

Test Plan: ./build/bin/test_lazy --gtest_filter=LazyOpsTest.TestDiagonal*

Reviewed By: wconstab

Differential Revision: D35378316

Pulled By: alanwaketan

fbshipit-source-id: 7958d00107aef20ac37aabcf2868346240977530
(cherry picked from commit 84155528fce484627c9688cfd92fd4aeb68219e5)
2022-04-08 19:49:42 +00:00
Nikolay Korovaiko
4a85145bbd Ansley's rebase of DimensionNode onto master (#75352)
Summary:
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75352

Reviewed By: wconstab

Differential Revision: D35455859

Pulled By: Krovatkin

fbshipit-source-id: e24c81d63dc66d03b752cc8de5cb551d84b003ac
(cherry picked from commit 4ad371cb4cc88860ce8ec398d82083f6759e3fcf)
2022-04-08 17:22:56 +00:00
John Clow
f1db3e465a Adding integration of SSA into LazyTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75050

Approved by: https://github.com/Krovatkin
2022-04-07 19:49:41 +00:00
Pavithran Ramachandran
3001bda304 [PyTorchEdge] Backport from v9 flatbuffer to v8 pickle (#75201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75201

In this diff:
1. Bump supported version to 9, which will serve as a placeholder for upcoming version bump to v9 for flatbuffer format migration.
2. Implements backport from v9 flatbuffer file to v8 pickle file.
ghstack-source-id: 153225189

(Note: this ignores all push blocking failures!)

Test Plan:
fb:
```
cd ~/fbsource/fbcode/ && buck test  -c fbcode.caffe2_enable_flatbuffer=1 caffe2/test/cpp/jit:jit -- LiteInterpreterTest.BackPortByteCodeModelAllVersions
Parsing buck files: finished in 0.7 sec
Downloaded 0/25 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 20.7 sec (100%) 21783/21783 jobs, 5/21783 updated

cd ~/fbsource/fbcode/ && buck test caffe2/test/cpp/jit:jit -- FlatbufferTest.FlatbufferBackPortTest
Parsing buck files: finished in 0.7 sec
Building: finished in 4.5 sec (100%) 12972/53298 jobs, 0/53298 updated
  Total time: 5.3 sec
More details at https://www.internalfb.com/intern/buck/build/b658d597-d358-4293-97cb-28e7612b96e8
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 35d5542d-6ee3-4c28-be10-1d822c7a6fef
Trace available for this run at /tmp/tpx-20220308-090347.891303-35d5542d-6ee3-4c28-be10-1d822c7a6fef/trace.log
RemoteExecution session id: reSessionID-35d5542d-6ee3-4c28-be10-1d822c7a6fef-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 490 tests discovered (22.838)
    ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.FlatbufferBackPortTest (0.289)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000
```

Reviewed By: iseeyuan

Differential Revision: D34702597

fbshipit-source-id: 5c203c29d13360d7934ce6e57557739e7038c05e
(cherry picked from commit 6189e08a2bd968fdab636f77cb6bd73d6c36beb2)
2022-04-07 19:43:57 +00:00
Wang, Eikan
252e1ccce6 Enable TE fuser to support user defined operator (#73073)
Summary:
PyTorch supports registering a custom operator via `TORCH_LIBRARY_FRAGMENT` / `TORCH_LIBRARY_IMPL`, and `torch::jit::tensorexpr::getNNCLoweringRegistry` allows inserting a lowering for a custom operator. But the TE fuser pass's conditional check does not support custom operators: the `isSupported` check of `tensorexpr_fuser` tests whether the `Node` is in `get_tensorexpr_elementwise_set()`, `supported_non_eltwise_set()`, `supported_misc_set` or `supported_reduction_set`, so a custom operator that needs to be added to a TE fusion group is blocked by the check.

Taking RN50 as an example, we can speed up the model by fusing the convolution and consecutive element-wise operators into a custom operator. The framework overhead becomes non-negligible when the computation becomes more efficient, especially for latency mode and tiny models. If the TE fuser allows adding the custom operator to the fusion group, then the entire RN50 model can be fused by TE as a single operator/function consisting of "ExternalCalls" and TE-IR. This could significantly reduce framework overhead, which in turn improves RN50 E2E performance. The same goes for other models.
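A hedged sketch of the registration surface named in the summary (operator name and schema hypothetical; the exact lowering-registry insert signature may differ):
```
#include <torch/library.h>

// Register a custom fused op so the JIT sees it as a first-class operator:
TORCH_LIBRARY_FRAGMENT(myops, m) {
  m.def("conv2d_relu(Tensor input, Tensor weight) -> Tensor");
}

// Then register an NNC lowering for it, roughly:
//   torch::jit::tensorexpr::getNNCLoweringRegistry().insert(<schema>, <lowering>);
// so that (with this PR's check changes) the TE fuser can pull it into a fusion group.
```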

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73073

Reviewed By: pbelevich

Differential Revision: D35453165

Pulled By: ZolotukhinM

fbshipit-source-id: a764cf340b0b1e05fe230649cbe44f5786bdd37d
(cherry picked from commit ee95aa4d36714540fbb216a338799e6a6bb966d5)
2022-04-07 04:36:39 +00:00
Martin Yuan
00c1e01ad0 Remove internal logic to handle bytecode version 3 (#57775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57775

The minimum supported bytecode version is updated from 3 to 4. We no longer support version 3 bytecode models.

Why?
* There is hacky code in operator loading that behaves differently for one operator when the global bytecode version is 3. Instead, operator-related metadata should be passed (for example, in #56845). To allow future development, we remove the hacky path first.
* The bytecode version was bumped from 3 to 4 more than half a year ago. Since all the production models have been bumped to version 4, it's not practical to keep and maintain version 3. The risk of deprecating version 3 is low.

Test Plan: Imported from OSS

Reviewed By: raziel

Differential Revision: D28270791

Pulled By: cccclai

fbshipit-source-id: 70b1bd6352fdaae5f8d2173b81578d77018c8e44
(cherry picked from commit 3e930fa381cd01f3705116795c6426df992372fc)
2022-04-07 01:45:52 +00:00
Pavithran Ramachandran
f984e50f39 Extend jit::load to work on flatbuffer file; Take 2 (#75256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75256

ghstack-source-id: 153138970

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D35399581

fbshipit-source-id: dafe9d301009d3f70986ed92bfe06d160ab90ba0
(cherry picked from commit ccc860fd07946de5aae12bc179a0b8bbba83b997)
2022-04-06 17:54:01 +00:00
John Clow
26dcec152c Added support for SSA for ops not in a JIT graph
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74340

Approved by: https://github.com/eellison
2022-04-06 01:45:37 +00:00
Antonio Kim
e1b4117e30 Move shape and operand definitions to base node (#75223)
Summary:
First stage of breaking up https://github.com/pytorch/pytorch/pull/74710

Moves the shape and operand definitions from `TsNode` to the base `Node`

CC: wconstab JackCaoG henrytwo

Partially Fixes https://github.com/pytorch/pytorch/issues/74628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75223

Reviewed By: zou3519

Differential Revision: D35410285

Pulled By: wconstab

fbshipit-source-id: bb84d3fb636882cbe7e18af4b35ff2c0e22aaa58
(cherry picked from commit a4144c9a48379d8a9007cff845796608b597cce1)
2022-04-06 01:43:46 +00:00
Lu Fang
32e58c73c4 Back out "Extend jit::load to work on flatbuffer file" (#75244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75244

Original commit changeset: d653a5af662a

Original Phabricator Diff: D35060736 (d9d34922a0)

Test Plan: Model loading test, verified that D35060736 (d9d34922a0) will cause the torch::save => torch::load failure.

Reviewed By: yinghai, jianyuh

Differential Revision: D35387009

fbshipit-source-id: 9d176992d402d57779e2af3d905b3c1538335298
(cherry picked from commit 6c8cc0d3b8a88b15e35702d70e18bbae8aa4628a)
2022-04-05 09:55:04 +00:00
Nikita Shulga
81d765ef1f Fix sign-compare violations in cpp tests
Prerequisite change for enabling `-Werror=sign-compare` across PyTorch repo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75080

Approved by: https://github.com/atalman
2022-04-04 23:05:31 +00:00
Chen Lai
6efc5c1acf Rewrite upgrader bytecode version from 3 to 4 (content unchanged) (#75120)
Summary:
Update the upgrader models by hacking the backport logic: copy everything in the model and only rewrite the bytecode version to 4 (in D35265596).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75120

ghstack-source-id: 152823046

Test Plan: CI

Reviewed By: qihqi

Differential Revision: D35321154

fbshipit-source-id: 333158bd0fd9b4819b3b7cf47d80c285934adf3e
(cherry picked from commit 74bb2da73a4d18f448b8486772643eac89eb759a)
2022-04-02 01:51:39 +00:00
Pavithran Ramachandran
d9d34922a0 Extend jit::load to work on flatbuffer file (#75022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75022

Extending torch::jit::load to read flatbuffer file
ghstack-source-id: 152820697
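A usage sketch (file name hypothetical; the format is detected from the file contents):
```
#include <torch/script.h>

// After this change, torch::jit::load() accepts either a pickle-based .pt
// archive or a flatbuffer-serialized module:
torch::jit::Module m = torch::jit::load("model.ff");
```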

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D35060736

fbshipit-source-id: d653a5af662a46107ff4fd70209fd2a0a4d40f20
(cherry picked from commit 109e14a54bd279011c8f9066e6c29e8e0b1fc4db)
2022-04-02 01:33:34 +00:00
Pavithran Ramachandran
7aaa75af05 Extending _get_bytecode_version to support flatbuffers format (#75021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75021

Extending `_get_bytecode_version` to support flatbuffers.
ghstack-source-id: 152771695

(Note: this ignores all push blocking failures!)

Test Plan:
```
~/fbsource/xplat] cd ~/fbsource/xplat/ && buck test //xplat/caffe2:test_lite_interpreter
Building: finished in 0.8 sec (100%) 327/327 jobs, 0/327 updated
  Total time: 0.9 sec
Testing: finished in 06:59.5 min (85 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_interpreter
PASS    412.3s 85 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_interpreter
TESTS PASSED
```

Reviewed By: iseeyuan

Differential Revision: D34900498

fbshipit-source-id: 65743076d43a933c5381ec128d0268f22c0a8441
(cherry picked from commit 457c76c7d1df6050b941c56a8198162e2e4a3388)
2022-04-01 15:05:37 +00:00
Will Constable
b9e535a64a Add non-eager registration to dispatch autogen (#74557)
Summary:
Previously, the torchscript backend would be (partially) initialized at startup.
- the dispatcher registrations would be registered,
- but other backend components would not be initialized until explicitly calling
  the backend init function

With this change, the torchscript backend is not initialized until its explicit
initialization function is called.

This enables external backends to register their own backend instead of the torchscript
backend to the same (Lazy) key.

Lands a change contributed by antoniojkim via lazy_tensor_staging branch (https://github.com/pytorch/pytorch/issues/73973)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74557

Reviewed By: bdhirsh

Differential Revision: D35051464

Pulled By: wconstab

fbshipit-source-id: 5a8b0851293e394f49427d1416ee571a8881fe9f
(cherry picked from commit ef745a4a2c8d1d7f9510541a20f1f40625ce29de)
2022-04-01 03:42:53 +00:00
Will Constable
14affba799 Fix ir_metadata Python frames func and remove dead code (#74979)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74979

Reviewed By: alanwaketan

Differential Revision: D35261641

Pulled By: wconstab

fbshipit-source-id: e82b5f17d0043c4a3de72c16fb42fd02a85414fe
(cherry picked from commit fc6c0a1654256871361a5ad08926bc39d74cd0c5)
2022-03-31 23:23:36 +00:00
Nikolay Korovaiko
5177f95d21 Introducing SymInt to Pytorch (for tracing size arithmetic) (master rebase) (#74861)
Summary:
This PR introduces `SymInt` type to Pytorch which will be used by LTC and AOTAutograd for tracing size arithmetic and tests.
`SymInt` is a C++ union structure [int64_t, SymbolicIntNode*] that wraps an int64_t field, where the value of the field can be either an index into a list of `shared_ptr<SymbolicIntNode>` or a real int.
This PR doesn't add any support for actually tracing symbolic ints, i.e. data_ for now can only contain real ints.
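A rough sketch of the described layout (simplified; not the exact class):
```
#include <cstdint>

class SymInt {
 public:
  explicit SymInt(int64_t d) : data_(d) {}
  int64_t data() const { return data_; }
  // With tracing support, data_ could instead encode an index into a table
  // of shared_ptr<SymbolicIntNode>; for now it always holds a real int.
 private:
  int64_t data_;
};
```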

```
Goal 1: just to show we can add a type to PyTorch core. (wraps int) LANDEABLE
Finalize the naming - symint
Want the name to be short
Does invoke “size” - NO
SInt/SymInt/SymbolicInt
SInt could mean signed int
sym_int or symint or SymInt (originally it was “int”; capitalized implies object semantics, whereas lowercase implies value semantics)
JIT schema - symint
C++ - symint
```

See more details here: https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8YLw-jxEw

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74861

Reviewed By: qihqi, ngimel

Differential Revision: D35226230

Pulled By: Krovatkin

fbshipit-source-id: 34acf342bd50fcaa4d8d5dd49c2fd6a98823a5b3
(cherry picked from commit 218643f63ef181cabb92d13a6e837eb64f2dda3c)
2022-03-31 21:59:59 +00:00
jjsjann123
873ced7cd0 Nvfuser code bump 030122 (#73627)
Summary:
Things changed in this PR that require review:

test/forward_backward_compatibility/check_forward_backward_compatibility.py

Our previous function overload extension names were wrong and have been updated in this PR, hence the compatibility-list update.

nvfuser code updates with bug fixes for failures we encountered in OpInfo tests as well as failures reported by the AOTAutograd team.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73627

Reviewed By: Chillee

Differential Revision: D34765458

Pulled By: davidberard98

fbshipit-source-id: c81f3d6a1b723fb3a8ba419b7f82227f70440ca7
(cherry picked from commit b6a2c362c37051e44fac31687b2fe272f776551e)
2022-03-31 08:18:22 +00:00
Nikita Shulga
43313cbde3 Revert D34647822: [tensorexpr] Add support for aten::stack
Test Plan: revert-hammer

Differential Revision:
D34647822 (954c7e2a77)

Original commit changeset: 3b863c71886c

Original Phabricator Diff: D34647822 (954c7e2a77)

fbshipit-source-id: e9ce06c9c8d7caf0fbb2565f0d99035bad685793
(cherry picked from commit b2ff355e9dbaa4e940fb221254223984c3c8a215)
2022-03-31 04:25:43 +00:00
Nikita Shulga
320e5a8268 Revert D34808051: [tensorexpr] Enabled aten::stack in the fuser pass with static shapes
Test Plan: revert-hammer

Differential Revision:
D34808051

Original commit changeset: 213e2ffdf87f

Original Phabricator Diff: D34808051

fbshipit-source-id: b618daeb346f784e8ab9525040edcb4a30a39613
(cherry picked from commit e47b973cba5c95e9410f8aecdfd5619de6d4be7c)
2022-03-31 04:25:43 +00:00
Hui Guo
90c3699cc8 [tensorexpr] Enabled aten::stack in the fuser pass with static shapes (#74077)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74077

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34808051

Pulled By: huiguoo

fbshipit-source-id: 213e2ffdf87fb1a74104037cea7ef25e4bfd4307
(cherry picked from commit ad9e84842e5b47eda845827d325b08ba361a8286)
2022-03-31 04:25:43 +00:00
Elias Ellison
2ef5611f31 Add comments for adding shape function and linting (#73570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73570

Approved by: https://github.com/huiguoo

Test Plan: contbuild & OSS CI, see 6d36bbde7e

Reviewed By: pbelevich

Differential Revision: D35192688

Pulled By: atalman

fbshipit-source-id: b12b80e6a6dd1adaa57a8facb6bb077989faa543
(cherry picked from commit e50478c02592597f12b8490ec5496f76c7d8b8cc)
2022-03-31 04:25:43 +00:00
Nikita Shulga
3036a0309d [skip ci]Revert "Add comments for adding shape function and linting"
This is a technical revert of 6d36bbde7e to reconcile it with e50478c02592597f12b8490ec5496f76c7d8b8cc (which is the same + lint changes applied)

Should be skipped during import
2022-03-30 21:21:28 -07:00
Hui Guo
954c7e2a77 [tensorexpr] Add support for aten::stack (#73801)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73801

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D34647822

Pulled By: huiguoo

fbshipit-source-id: 3b863c71886c7c6616b16f5d3313079714c8b82a
(cherry picked from commit c71778cf6a5724d26b671bf3ee0478add24990e8)
2022-03-30 21:25:15 +00:00
Dave Bort
f82b2d4a82 [PyTorchEdge] Make _load_parameters() handle flatbuffer inputs (#74580)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74580

Handle Flatbuffer-serialized parameters.

Make `_load_parameters()` detect the input data format and use the correct deserializer to load the parameters.

Also, rename `BytecodeDeserializer` to `IValueUnpickler` to make it clear that it unpickles an `IValue` and doesn't have anything to do with bytecode.
ghstack-source-id: 152487890

Test Plan:
New unit test shows a successful round trip from _save_parameters() to _load_parameters() using flatbuffers.

```
$ buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
Building: finished in 0.5 sec (100%) 346/346 jobs, 0/346 updated
  Total time: 0.6 sec
Testing: finished in 0.5 sec (26 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
PASS    <100ms 13 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer
PASS    <100ms 13 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
TESTS PASSED
```

Reviewed By: qihqi

Differential Revision: D34488913

fbshipit-source-id: 8d2c0b895699f3b336115d33bf96d49cbf9245d2
(cherry picked from commit 319345deff260826197f8cdf5ac03071b412c72f)
2022-03-30 20:39:58 +00:00
Dave Bort
1659a267f9 [PyTorchEdge] Export flatbuffers from _save_parameters() (#74579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74579

Now that we can convert a module to a flatbuffer, update `_save_parameters()` to optionally write to that format.

Also, rename the internal `ScriptModuleSerializer` class to `IValuePickler` to make it more clear that a) it's pickle-specific, and b) it serializes IValues, not Modules.
ghstack-source-id: 152487889

Test Plan:
New unit test shows that we can produce Flatbuffer-formatted output.

```
$ buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
Building: finished in 0.5 sec (100%) 346/346 jobs, 0/346 updated
  Total time: 0.6 sec
Testing: finished in 0.5 sec (26 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
PASS    <100ms 13 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer
PASS    <100ms 13 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
TESTS PASSED
```

A new test in later commit D34488913 tests the full round trip.

Reviewed By: qihqi

Differential Revision: D34408538

fbshipit-source-id: eea183c31b5e1b2b75a65f384d8a479223a4ae72
(cherry picked from commit de310a15422b65fb7e443f7005d287d9f5f586bc)
2022-03-30 20:39:58 +00:00
Elias Ellison
6d36bbde7e Add comments for adding shape function and linting
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73570

Approved by: https://github.com/huiguoo
2022-03-29 23:02:22 +00:00
Elias Ellison
9c4a63787b Add api for changing function executor settings, hook up execution with decomposition registry (#74186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74186

Make the execution settings mutable on function_impl so that we can set them for running op decompositions. Add a mapping to function objects, and show an example in a test of executing op decompositions.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34938125

Pulled By: eellison

fbshipit-source-id: adf108b2f6c1bd166910c6d7b94245661d67ce0d
(cherry picked from commit 9957e33803002d9e71abe4ff802769270b6960d3)
2022-03-29 18:38:52 +00:00
Elias Ellison
0ecf1add1b Introduce function-local settings for executor, expose in c++ (#74012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74012

This allows setting an executor on a function. The first use case is using decompositions in C++ without additional fusion passes, etc., which might not work with custom tensors like batched tensors/vmap. A subsequent use case might be taking advantage of JIT execution entry points which guard on certain properties before invocation (such as complete shapes in AOT autograd, or rank in lazy tensor).

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34938124

Pulled By: eellison

fbshipit-source-id: cf7a45416457942b872322cab47d871a8336bdb5
(cherry picked from commit 9c600eb9ad0f2173f003e511268e97584edae36d)
2022-03-29 18:38:52 +00:00
Elias Ellison
6694fdaccd Clean up profiling mode and profiling executor strategy (#73875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875

Previously we had a few settings:
- getExecutor - which toggled between the Profiling Executor and the Legacy executor
- getGraphOptimize - if true, overrides PE/Legacy to run with the simple executor (no optimizations)
and then...
- getProfilingMode - which would set PE to 0 specializations.

The last mode is redundant with getGraphOptimize; we should just remove it and use getGraphOptimize in these cases. It could lead to potentially invalid combinations of logic - what does it mean if getProfilingMode is true but getExecutor is set to false? This would lead to a bug in specialize_autograd_zero in this case, see: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/passes/specialize_autogradzero.cpp#L93.

The tests here are failing but get fixed with the PR above it, so I'll squash for landing.

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D34938130

Pulled By: eellison

fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
2022-03-29 18:38:51 +00:00
Kurt Mohler
5375b2e994 Resolve int[]? arguments to new OptionalIntArrayRef class
This PR uses the `OptionalArrayRef` template class that was drafted in #64084.
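
A hedged sketch of how an `int[]?` schema argument can be consumed through this class, assuming `c10::OptionalArrayRef<int64_t>` (the type this PR resolves `int[]?` to):

```cpp
#include <c10/util/OptionalArrayRef.h>
#include <cstdint>

// Computes the number of elements implied by an optional size list;
// returns -1 for the `None` case of `int[]?`.
int64_t numelFromSize(c10::OptionalArrayRef<int64_t> size) {
  if (!size.has_value()) {
    return -1; // caller passed None
  }
  int64_t n = 1;
  for (int64_t d : *size) {
    n *= d;
  }
  return n;
}
```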

Fixes #44409
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70864
Approved by: https://github.com/ezyang
2022-03-26 01:45:50 +00:00
Pavithran Ramachandran
fc2cf3d26f Back out "Revert D34805092: Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support only pickle and pickle + flatbuffer for migration" (#74594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74594

Extending `_save_for_mobile` and `_load_for_mobile` to support the flatbuffer format, with an additional optional argument which is set to pick pickle by default.

Adding a new binary target with the suffix `_pickle_and_flatbuffer` to help migration.

The size test in D34909502 shows the size has regressed by ~40K, but after removing pickle and comparing lite_predictors we get the ~120K measurement that we will achieve when deprecating pickle and moving to flatbuffer.

**BEFORE:**

```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    flatbuffer_loader-->torch_mobile_module;
    flatbuffer_serializer-->torch_mobile_module;
```

**AFTER:**
```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| flatbuffer_loader;
    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| torch_mobile_deserialize;
    torch_mobile_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;
    torch_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;

    jit_module_saving_pickle_and_flatbuffer-->|new| torch_core_pickle_and_flatbuffer;
    jit_module_saving_pickle_and_flatbuffer-->|new| torch_mobile_core_pickle_and_flatbuffer;

    flatbuffer_serializer-->torch_mobile_module;

    jit_module_saving_pickle_and_flatbuffer-->|new|jit_module_saving;
    jit_module_saving_pickle_and_flatbuffer-->|new|flatbuffer_serializer;

    flatbuffer_loader-->torch_mobile_module;
```

Original commit changeset: 780dfb6fd6ba

Original Phabricator Diff: D34805092 (284b2b7135)
ghstack-source-id: 152044801

(Note: this ignores all push blocking failures!)

Test Plan:
CI

```
~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test -c fbcode.caffe2_enable_flatbuffer=1 //caffe2/test/cpp/jit:jit  -- FlatbufferTest.ExtraFiles
Parsing buck files: finished in 0.9 sec
Building: finished in 5.3 sec (100%) 12992/54304 jobs, 0/54304 updated
  Total time: 6.2 sec
More details at https://www.internalfb.com/intern/buck/build/2b387fff-f813-4cfa-b53f-eb2378630d4e
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d
Trace available for this run at /tmp/tpx-20220323-134108.766518-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d/trace.log
RemoteExecution session id: reSessionID-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 486 tests discovered (19.122)
    ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.ExtraFiles (0.187)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693
```

Similar Build Deps Dags

```
[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops_pickle_and_flatbuffer, //xplat/caffe2:torch_mobile_deserialize_pickle_and_flatbuffer)' --output-format dot-compact  | pastry
P486770901: https://www.internalfb.com/intern/paste/P486770901/

[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops, //xplat/caffe2:torch_mobile_deserialize)' --output-format dot-compact  | pastry
P486771278: https://www.internalfb.com/intern/paste/P486771278/
```

pickle_and_flatbuffer: https://www.internalfb.com/intern/dgw/graph/?build_id=P486770901
pickle: https://www.internalfb.com/intern/dgw/graph/?build_id=P486771278

Reviewed By: iseeyuan

Differential Revision: D35067157

fbshipit-source-id: 9044259c17a2e0da79bd6aedb28efbdfd57e23e0
(cherry picked from commit f738069ec3a72e79da56172741d027de514e9e5f)
2022-03-24 21:51:05 +00:00
Will Constable
3547f20872 Land remaining parts of Torchscript Lazy Tensor backend (#74111)
Summary:
Also enables the bazel build to run lazy codegen. The bazel (OSS) build feeds off the same filelists as cmake/buck (build_variables.bzl), so enabling it is easier than keeping it disabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74111

Test Plan: Run CI and verify test_lazy_ops is running via OSS cmake builds

Reviewed By: bdhirsh

Differential Revision: D34772403

fbshipit-source-id: 8a63f58b9536e6ac1be530667932176ef2549496
(cherry picked from commit e807ffb1918853d10b924fdc24f85ee5b1a39021)
2022-03-22 23:14:03 +00:00
Nikita Shulga
c53b3ed20f Revert D34805092: Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support only pickle and pickle + flatbuffer for migration
Test Plan: revert-hammer

Differential Revision:
D34805092 (284b2b7135)

Original commit changeset: 57f3fc81d68f

Original Phabricator Diff: D34805092 (284b2b7135)

fbshipit-source-id: 780dfb6fd6ba5f9348f24a2fb3c57971b7155541
(cherry picked from commit bebeb8b84e11c34cbde4857d0e1c291731a7c781)
2022-03-22 22:45:50 +00:00
Pavithran Ramachandran
284b2b7135 Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support only pickle and pickle + flatbuffer for migration (#74209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74209

Extending `_save_for_mobile` and `_load_for_mobile` to support the flatbuffer format, with an additional optional argument which is set to pick pickle by default.

Adding a new binary target with the suffix `_pickle_and_flatbuffer` to help migration.

The size test in D34909502 shows the size has regressed by ~40K, but after removing pickle and comparing lite_predictors we get the ~120K measurement that we will achieve when deprecating pickle and moving to flatbuffer.

**BEFORE:**

```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    flatbuffer_loader-->torch_mobile_module;
    flatbuffer_serializer-->torch_mobile_module;
```

**AFTER:**
```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| flatbuffer_loader;
    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| torch_mobile_deserialize;
    torch_mobile_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;
    torch_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;

    jit_module_saving_pickle_and_flatbuffer-->|new| torch_core_pickle_and_flatbuffer;
    jit_module_saving_pickle_and_flatbuffer-->|new| torch_mobile_core_pickle_and_flatbuffer;

    flatbuffer_serializer-->torch_mobile_module;

    jit_module_saving_pickle_and_flatbuffer-->|new|jit_module_saving;
    jit_module_saving_pickle_and_flatbuffer-->|new|flatbuffer_serializer;

    flatbuffer_loader-->torch_mobile_module;
```
ghstack-source-id: 151744258

Test Plan:
Similar Build Deps Dags

```
[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops_pickle_and_flatbuffer, //xplat/caffe2:torch_mobile_deserialize_pickle_and_flatbuffer)' --output-format dot-compact  | pastry
P486770901: https://www.internalfb.com/intern/paste/P486770901/

[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops, //xplat/caffe2:torch_mobile_deserialize)' --output-format dot-compact  | pastry
P486771278: https://www.internalfb.com/intern/paste/P486771278/
```

pickle_and_flatbuffer: https://www.internalfb.com/intern/dgw/graph/?build_id=P486770901
pickle: https://www.internalfb.com/intern/dgw/graph/?build_id=P486771278

Reviewed By: iseeyuan

Differential Revision: D34805092

fbshipit-source-id: 57f3fc81d68fce941a050c35bd8e6f05951183b3
(cherry picked from commit 671ae4ed29e65b86ffe507a503548d3e86ab0ea4)
2022-03-22 20:00:53 +00:00
Han Qi
4b4f652f79 [3/5] Put JIT source inside flatbuffer (#74245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74245

title

Test Plan: unittest

Reviewed By: iseeyuan

Differential Revision: D34881612

fbshipit-source-id: 7037982e9267ad72b86e91cd5f2d92426d71dd56
(cherry picked from commit 88f34eb55b2bee6ef8ef27188e075fa2b8767fdf)
2022-03-17 18:46:47 +00:00
Will Constable
d67a265881 Sync lazy_tensor_staging to master (#74311)
Summary:
This merges changes that have already been reviewed/landed onto lazy_tensor_staging branch.  It combines changes from multiple PRs into one diff.

updated from lazy_tensor_staging on 3/16

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74311

Test Plan:
Run CI to ensure compilation on various platforms
Run unit tests on lazy_tensor_staging branch with source version of all these diffs

Reviewed By: desertfire

Differential Revision: D34929235

fbshipit-source-id: babbc3bbeabc5b8107ee9284ed7765887a148622
(cherry picked from commit d91577a6557343ec536f6859e4808ec1a8a9b685)
2022-03-17 16:08:57 +00:00
Will Constable
44a8d4d998 Add lazy tensor unit tests, disabled (#74309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74309

Since the test file is large, it can be landed on its own and then switched on
in the diff that actually builds lazy tensor code.

Test Plan: verify CI passes

Reviewed By: desertfire

Differential Revision: D34928619

fbshipit-source-id: cd556155326f7fb55b3f29031f80bc36c936d565
(cherry picked from commit 60945adbefb6a8d19f89e330f8b344d076b13bfc)
2022-03-17 15:31:26 +00:00
Will Constable
72b1194464 Run lazy tensor codegen in generate_code.py (#73996)
Summary:
Hooks into existing autograd codegen script (generate_code.py) to take advantage of its integrations into buck/cmake/bazel.

Adds a new option (--gen_lazy_ts_backend) to generate_code.py, calling it from the CMake OSS build and the fbcode build, but not from other internal xplat/ovrsource builds (these could be opted in later).

Bazel support is added in a later diff.

Includes one generated file (torch/csrc/lazy/generated/LazyIr.h) in a unit test (test/cpp/lazy/test_ir.cpp) to partially verify the generator is working, but does not compile the remaining output sources from the generator yet as they depend on other files not yet landed from lazy_tensor_staging branch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73996

Test Plan: OSS/internal CI - verify all builds are working and test_ir.cpp compiles LazyIr.h

Reviewed By: ezyang

Differential Revision: D34408536

fbshipit-source-id: 8af0aea3b95d81eccafc17d64390d70ddd176515
(cherry picked from commit f930612f2bad61c76eb02d85cfbec9f33a1459dc)
2022-03-17 15:31:26 +00:00
Han Qi
ded82ad7c7 Create method to map JIT module to (source, constant) and back. (#74119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74119

  Implemented a function to generate source as an ExtraFilesMap plus constants.

  Wrote a function to construct a JIT module given an (ivalue, source, constant) triple.

Test Plan: unittest

Reviewed By: pavithranrao

Differential Revision: D34803945

fbshipit-source-id: 2edc798407fe68294cb4c3c7516f5bd143df88c3
(cherry picked from commit 35e54e166b8f0f5cfe8f08c07866b59ae61ee79d)
2022-03-15 18:30:08 +00:00
Taylor Robie
0b1f3bd158 [Profiler] Prefer TSC to wall clock when available (#73855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73855

Calling the clock is one of the most expensive parts of profiling. We can reduce the profiling overhead by using `rdtsc` instead. The tradeoff is that we have to measure and convert (shift and scale).
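
A minimal sketch of the shift-and-scale conversion, assuming x86 and illustrative names (the profiler's real calibration is more careful):

```cpp
#include <chrono>
#include <cstdint>
#include <x86intrin.h> // __rdtsc, x86 only

struct TscConverter {
  uint64_t tsc0 = 0;      // TSC reading at calibration start (the "shift")
  int64_t ns0 = 0;        // wall-clock ns at calibration start
  double ns_per_tick = 0; // linear fit slope (the "scale")

  static int64_t wall_ns() {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               std::chrono::steady_clock::now().time_since_epoch())
        .count();
  }

  void calibrate() {
    tsc0 = __rdtsc();
    ns0 = wall_ns();
    uint64_t t1 = tsc0;
    int64_t n1 = ns0;
    while (n1 - ns0 < 10'000'000) { // ~10 ms measurement window
      t1 = __rdtsc();
      n1 = wall_ns();
    }
    ns_per_tick = double(n1 - ns0) / double(t1 - tsc0);
  }

  // Convert a raw TSC sample taken during profiling back to wall-clock ns.
  int64_t to_ns(uint64_t tsc) const {
    return ns0 + int64_t(double(tsc - tsc0) * ns_per_tick);
  }
};
```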

Test Plan: I added a cpp unit test with *very* aggressive anti-flake measures. I also ran the overhead benchmark (9 replicates) with `--stressTestKineto` (0.94 -> 0.89 us) and `--stressTestKineto --kinetoProfileMemory` (1.27 -> 1.17 us)

Reviewed By: chaekit

Differential Revision: D34231071

fbshipit-source-id: e3b3dd7580d93bcc783e87c7f2fc726cb74f4df8
(cherry picked from commit e8be9f8160793c6ee35d5af02bca3e01703e377d)
2022-03-13 18:29:06 +00:00
Taylor Robie
5a58820f01 [Profiler] Specialized AppendOnlyQueue (#73409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73409

We can do better than `vector` or `deque`, and it's sufficiently important to the hot path to justify a custom container. (This is part of the larger queue refactor, but this is a standalone drop-in replacement so we don't need to wait.)
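
A minimal sketch of the idea behind such a container, with illustrative names: fixed-size blocks chained together, so appends never relocate existing elements (unlike `vector` growth) and the common case is just an index bump in the tail block:

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <utility>

template <typename T, size_t kBlockSize = 1024>
class AppendOnlyQueueSketch {
  struct Block {
    std::array<T, kBlockSize> data; // requires default-constructible T
    size_t used = 0;
    std::unique_ptr<Block> next;
  };

 public:
  AppendOnlyQueueSketch()
      : head_(std::make_unique<Block>()), tail_(head_.get()) {}

  // Hot path: usually just an index bump and a write into the tail block.
  template <class... Args>
  T& emplace_back(Args&&... args) {
    if (tail_->used == kBlockSize) { // rare: allocate a fresh block
      tail_->next = std::make_unique<Block>();
      tail_ = tail_->next.get();
    }
    T& slot = tail_->data[tail_->used++];
    slot = T(std::forward<Args>(args)...); // requires move-assignable T
    return slot;
  }

 private:
  std::unique_ptr<Block> head_;
  Block* tail_;
};
```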

Test Plan: It's a pretty simple container type, so I just added a few cpp tests for emplace and read back. I also ran the overhead benchmark (replicates=9) with both `--stressTestKineto` (0.99 -> 0.94 us) and `--stressTestKineto --kinetoProfileMemory` (1.36 -> 1.27 us).

Reviewed By: swolchok

Differential Revision: D34231072

fbshipit-source-id: ed57299729d444d59cf843a0d38a3ee2240eeec1
(cherry picked from commit 43907948f3a8d2137244e7bb59f43999bd660917)
2022-03-11 19:47:40 +00:00
David Dang
abfaef0aec [Quant][core] Merged conv packed params and linear packed params (#73486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73486

conv and linear packed params were previously defined in `ATen/native/quantized/cpu/conv_packed_params.h` and `ATen/native/quantized/cpu/packed_params.h`. These two files have been merged into one and relocated to `ATen/native/quantized/cpu/packed_params.h`.

Differential Revision: D34513286

Test Plan: Imported from OSS

Reviewed By: dagitses

Pulled By: dzdang

fbshipit-source-id: 813845af7ea9449e316ab7822efe7460f0bd0d88
(cherry picked from commit 2f627561f27f81977ff73b8863c5e9e719dc4c60)
2022-03-11 15:18:45 +00:00
Ivan Kobzarev
519e226b66 [tensorexp] ExternalCall2 without memcpy (#72225)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72225

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D33960933

Pulled By: IvanKobzarev

fbshipit-source-id: fc73a3de9e5150919e3806516065b4a6c8316000
(cherry picked from commit f637842c341e0ba94906a0c8a1efc81691dc512c)
2022-03-09 21:19:26 +00:00
Han Qi
0723639b60 Revert D34455360: Multisect successfully blamed D34455360 for test failures
Summary:
This diff is reverting D34455360 (61d6c43864).
D34455360 (61d6c43864) is making the following tests fail, and this revert diff is either the revert of the blame diff or the revert of the stack of diffs that need to be reverted in order to revert the blame diff.

Tests affected:
- https://www.internalfb.com/intern/test/562950004334605/

Multisect link:
https://www.internalfb.com/intern/testinfra/multisect/756170

Test Plan: NA

Reviewed By: zhxchen17

Differential Revision: D34596156

fbshipit-source-id: a465bca0094db3caf6130c80f1ed49eea981359b
(cherry picked from commit ef5e5578c64ce9827570757fb016aafa9c782c6a)
2022-03-08 23:18:54 +00:00
Elias Ellison
52ccbf4494 Lock thread/block computation (#73800)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73800

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D34647281

Pulled By: eellison

fbshipit-source-id: adbdaf24191c4c1b85e0b62564388f2481002ed2
(cherry picked from commit 6cf38015cc14691518b1b5cb7d636e80eb3684fc)
2022-03-04 22:32:08 +00:00
Dave Bort
7b51629c53 [PyTorchEdge] Add getFileFormat() so we can differentiate Zip/Pickle from Flatbuffer (#73707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73707

Add a helper function to detect the file format from the first bytes of a data file or stream. This will be necessary during the migration from Pickle-serialized modules to Flatbuffer-serialized modules.
ghstack-source-id: 150384317
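
A minimal sketch of the detection, assuming ZIP's `PK\x03\x04` local-file magic for the pickle container and a 4-byte flatbuffer file identifier at offset 4 (the exact identifier string is an assumption here):

```cpp
#include <array>
#include <cstring>
#include <istream>

enum class FileFormat { ZipFile, FlatbufferFile, Unknown };

FileFormat sniffFileFormat(std::istream& in) {
  std::array<char, 8> header{};
  in.read(header.data(), header.size());
  in.clear();
  in.seekg(0); // rewind so the chosen deserializer can re-read from the start
  if (std::memcmp(header.data(), "PK\x03\x04", 4) == 0) {
    return FileFormat::ZipFile; // ZIP container => pickle-based module
  }
  if (std::memcmp(header.data() + 4, "PTMF", 4) == 0) { // assumed identifier
    return FileFormat::FlatbufferFile;
  }
  return FileFormat::Unknown;
}
```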

Test Plan:
Existing tests for ZIP+Pickle continue to pass.

New unit tests pass:
```
cd xplat && buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_interpreter

Building: finished in 26.6 sec (100%) 3180/3180 jobs, 571/3180 updated
  Total time: 32.2 sec
Testing: finished in 07:08.3 min (89 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_interpreter //xplat/caffe2:test_lite_trainer
PASS    421.1s 81 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_interpreter
PASS     103ms  8 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer
TESTS PASSED
```

Reviewed By: iseeyuan

Differential Revision: D34527859

fbshipit-source-id: ff2d1eabc2f8be1de2e44709c878e2d1a373f0df
(cherry picked from commit 5c394848346ab9e374c9e7eed479ad70ed09a7ae)
2022-03-04 19:35:41 +00:00
Han Qi
61d6c43864 Make debug_pkl smaller by only emitting unique traces. (#73368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368

The debug_pkl file inside of PyTorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start/end numbers. Those are emitted into the debug_pkl file as strings.
Since many SourceRanges share the same source, the string for a trace can be deduped.
The newer format saves a set of unique traces in a tuple, and each SourceRange then saves the offset of its trace w.r.t. its position in that tuple (i.e., manually applying dictionary compression).
The above helps with smaller file size. On loading, if we copied each trace into Source as a string, runtime memory would still blow up.
To mitigate this, we use SourceView directly instead of Source, which takes a reference to the string inside the Deserializer and turns it into a string_view. This is safe because the Deserializer is held by the Unpickler via shared_ptr, and the Unpickler is in turn held via shared_ptr by another Source object. That Source object stays alive during model construction.
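
A minimal sketch of the dictionary-compression step described above (illustrative types, not the serializer's actual ones):

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Interns each distinct trace string once; SourceRanges then serialize only
// the offset of their trace within `unique`, which is written out as a tuple.
struct TraceTable {
  std::vector<std::string> unique;
  std::unordered_map<std::string, size_t> offset_of;

  size_t intern(const std::string& trace) {
    auto it = offset_of.find(trace);
    if (it != offset_of.end()) {
      return it->second; // duplicate trace: reuse its offset
    }
    unique.push_back(trace);
    offset_of.emplace(trace, unique.size() - 1);
    return unique.size() - 1;
  }
};
```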

Test Plan:
unit test

Took original file (312271638_930.predictor.disagg.local); loaded with `torch.jit.load` save again with `torch.jit.save`. Unzip both, look at contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K    archive/xl_model_weights
3.7M    archive/extra
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K    archive/code/__torch__/caffe2/torch/fb
8.0K    archive/code/__torch__/caffe2/torch
8.0K    archive/code/__torch__/caffe2
20M     archive/code/__torch__/torch/fx/graph_module
20M     archive/code/__torch__/torch/fx
8.0K    archive/code/__torch__/torch/classes
20M     archive/code/__torch__/torch
20M     archive/code/__torch__
20M     archive/code
2.7M    archive/constants
35M     archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K    resaved/extra
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K    resaved/code/__torch__/caffe2/torch/fb
8.0K    resaved/code/__torch__/caffe2/torch
8.0K    resaved/code/__torch__/caffe2
1.3M    resaved/code/__torch__/torch/fx/graph_module
1.3M    resaved/code/__torch__/torch/fx
8.0K    resaved/code/__torch__/torch/classes
1.4M    resaved/code/__torch__/torch
1.4M    resaved/code/__torch__
1.4M    resaved/code
2.7M    resaved/constants
13M     resaved
[qihan@devvm5585.vll0 ~]$
```

Reviewed By: gmagogsfm

Differential Revision: D34455360

fbshipit-source-id: 8cc716f9bba7183746b1b4ecc33a2de34ac503b9
(cherry picked from commit f1a04730fc9ac8fdab6c8e4c44cb5529e42090e4)
2022-03-02 08:37:08 +00:00
Mengwei Liu
9ce9803abe [PyTorch] Add codegen unboxing ability (#69881)
Summary:
RFC: https://github.com/pytorch/rfcs/pull/40

This PR (re)introduces python codegen for unboxing wrappers. Given an entry of `native_functions.yaml` the codegen should be able to generate the corresponding C++ code to convert ivalues from the stack to their proper types. To trigger the codegen, run
```
tools/jit/gen_unboxing.py -d cg/torch/share/ATen
```
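
A hedged sketch of the shape of a generated unboxing wrapper, using an example schema (the generated code's exact structure and helper names may differ):

```cpp
#include <ATen/ATen.h>
#include <ATen/core/stack.h>

// For `add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor`:
// pop boxed IValues (last argument first), convert each to its C++ type,
// call the unboxed kernel, and push the boxed result back onto the stack.
void add_Tensor_unboxed(torch::jit::Stack& stack) {
  at::Scalar alpha = torch::jit::pop(stack).toScalar();
  at::Tensor other = torch::jit::pop(stack).toTensor();
  at::Tensor self = torch::jit::pop(stack).toTensor();
  torch::jit::push(stack, at::add(self, other, alpha));
}
```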

Merged changes on CI test. In https://github.com/pytorch/pytorch/issues/71782 I added an e2e test for static dispatch + codegen unboxing. The test exports a mobile model of mobilenetv2, then loads and runs it on a new binary for the lite interpreter: `test/mobile/custom_build/lite_predictor.cpp`.

## Lite predictor build specifics

1. Codegen: `gen.py` generates `RegisterCPU.cpp` and `RegisterSchema.cpp`. Now with this PR, once `static_dispatch` mode is enabled, `gen.py` will not generate `TORCH_LIBRARY` API calls in those cpp files, hence avoiding interaction with the dispatcher. Once `USE_LIGHTWEIGHT_DISPATCH` is turned on, `cmake/Codegen.cmake` calls `gen_unboxing.py`, which generates `UnboxingFunctions.h`, `UnboxingFunctions_[0-4].cpp` and `RegisterCodegenUnboxedKernels_[0-4].cpp`.
2. Build: `USE_LIGHTWEIGHT_DISPATCH` adds the generated sources into `all_cpu_cpp` in `aten/src/ATen/CMakeLists.txt`. All other files remain unchanged. In reality all the `Operators_[0-4].cpp` are not necessary, but we can rely on the linker to strip them off.

## Current CI job test coverage update

Created a new CI job `linux-xenial-py3-clang5-mobile-lightweight-dispatch-build` that enables the following build options:
* `USE_LIGHTWEIGHT_DISPATCH=1`
* `BUILD_LITE_INTERPRETER=1`
* `STATIC_DISPATCH_BACKEND=CPU`

This job triggers `test/mobile/lightweight_dispatch/build.sh` and builds `libtorch`. Then the script runs C++ tests written in `test_lightweight_dispatch.cpp` and `test_codegen_unboxing.cpp`. Recent commits added tests to cover as many C++ argument types as possible: in `build.sh` we installed the PyTorch Python API so that we can export test models in `tests_setup.py`. Then we run the C++ test binary to run these models on the lightweight-dispatch-enabled runtime.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69881

Reviewed By: iseeyuan

Differential Revision: D33692299

Pulled By: larryliu0820

fbshipit-source-id: 211e59f2364100703359b4a3d2ab48ca5155a023
(cherry picked from commit 58e1c9a25e3d1b5b656282cf3ac2f548d98d530b)
2022-03-01 23:28:13 +00:00
Elias Ellison
d3d74e9040 Allow custom registration of shape functions (#73270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73270

Together with open registration of NNC lowerings, this should make it possible to add support for custom operators, including internal fb-ops.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D34451275

Pulled By: eellison

fbshipit-source-id: ae8ae2deb93caa6770e738217461e65853897b55
(cherry picked from commit ea6b7e8a6d8f970a20e68d02eefc5c951e32aa07)
2022-02-28 17:44:45 +00:00