Add codegen infrastructure to generate IR nodes for non-native ops.
The proposed change is to add a `non_native` key to the `{backend}_native_functions.yaml` file that contains schema definitions similar to what is found in `native_functions.yaml`. e.g.
```
non_native:
...
- func: expand(Tensor input, int[] size, bool is_scalar_expand) -> Tensor
...
```
these definitions are parsed into a `LazyIrSchema` that can be used for generating IR nodes using `GenLazyIR`.
Fixes#74628
CC: @wconstab @desertfire @henrytwo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76535
Approved by: https://github.com/wconstab
Summary: The `create_if_missing` parameter is optional, and defaults to `None`.
Test Plan:
Confirmed that Pyre no longer complains about calling `SwitchWorkspace` with a
single string argument.
Differential Revision: D36366987
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77464
Approved by: https://github.com/voznesenskym
Summary: Currently OpKind is stored as an object field called op_ for each IR
node, and one usage of op_ is to avoid dynamic_cast in NodeCast when we
need to downcast a base-node pointer into a concrete sub-node pointer.
As a result, we need to construct and pass in an op when downcasting
nodes, and this becomes quite anonnying when we start to implement the
trie-based IR node reusing. More importantly, the op for each subclass
should be unique for that subclass and thus making it a const static field
is a more logical design.
In this PR, we still keep the object-level op_ for easier XLA adoption. As
furture work, we can come back to remove op_, make the op() method
virtual, and get rid of OpKind in all the node constructors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76711
Approved by: https://github.com/wconstab, https://github.com/JackCaoG
Re-landing #68111/#74596
## Description
v0.5 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).
On the basis of #50256, the below improvements are included:
* The [v0.5 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.5) of the oneDNN Graph API is used
* The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.
### User API:
The optimization pass is disabled by default. Users could enable it by:
```
torch.jit.enable_onednn_fusion(True)
```
`torch.jit.freeze` should be used after tracing (recommended) or scripting a model.
### Performance:
[pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:
* SkyLake 8180 (1 socket of 28 cores):

* SkyLake 8180 (single thread):

* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops
### Directory structure of the integration code
Fuser-related code is placed under:
```
torch/csrc/jit/codegen/onednn/
```
Optimization pass registration is done in:
```
torch/csrc/jit/passes/onednn_graph_fuser.h
```
CMake for the integration code is in:
```
caffe2/CMakeLists.txt
cmake/public/mkldnn.cmake
cmake/Modules/FindMKLDNN.cmake
```
## Limitations
* In this PR, we only support Pytorch-oneDNN-Graph integration on Linux platform. Support on Windows and MacOS will be enabled as a next step.
* We have only optimized the inference use-case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76622
Approved by: https://github.com/eellison
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73284
Some important ops won't support optional type until opset 16,
so we can't fully test things end-to-end, but I believe this should
be all that's needed. Once ONNX Runtime supports opset 16,
we can do more testing and fix any remaining bugs.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D34625646
Pulled By: malfet
fbshipit-source-id: 537fcbc1e9d87686cc61f5bd66a997e99cec287b
Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: neginraoof <neginmr@utexas.edu>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
(cherry picked from commit 822e79f31ae54d73407f34f166b654f4ba115ea5)
This functionality does not seem to be used
and there are some requests to update dependency.
Add `third_party` to torch_cpu include directories if compiling with
Caffe2 support, as `caffe2/quantization/server/conv_dnnlowp_op.cc` depends on `third_party/fbgemm/src/RefImplementations.h`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
Summary:
Add an observer to `PTMCoreMLExecutor` so we can inspect OOMs in production to help with T115554493.
The behaviour of the logger is as such:
1. Each time a model is compiled, there is a chance we publish all logs to QPL. This is determined by the randomly generated `_model_load_id` and `_sample_thresh`.
2. If we are publishing all logs, then every `_sample_every` inferences will be logged via QPL.
3. Every QPL log will collect memory metrics before and after model compilation/inference
4. If memory pressure is not normal (remaining mem < 400 MB) before or after compilation/inference, then that compilation/inference will be logged to QPL no matter what.
Previous diff got reverted due to OSS CI failures. Fixed those failures in this iteration.
Test Plan:
We can test in pytorch playground and inspect the QPL logs through Flipper:
```
arc focus2 -b pp-ios -a ModelRunner -a //xplat/caffe2/c10:c10Apple -a //xplat/caffe2:torch_mobile_coreApple -a //xplat/caffe2/fb/dynamic_pytorch:dynamic_pytorch_implApple -a //xplat/caffe2:coreml_delegateApple -a ModelRunnerDevOps -a //xplat/caffe2:torch_mobile_all_opsApple -a coreml_memory_observer -a //xplat/perflogger:perfloggerApple -fd --force-with-wrong-xcode
```
To check results in Hive/Scuba, test in instagram:
```
arc focus2 -b igios-no-extensions -a //fbobjc/Apps/Instagram/AppLibraries/Core/QPL/IGPerformanceLogging:IGPerformanceLogging -a //xplat/caffe2/c10:c10Apple -a //xplat/caffe2:torch_mobile_coreApple -a //xplat/caffe2/fb/dynamic_pytorch:dynamic_pytorch_implApple -a //xplat/caffe2:coreml_delegateApple -a //xplat/caffe2:torch_mobile_all_opsApple -a //xplat/perflogger:perfloggerApple -a coreml_memory_observerApple -c pt.enable_qpl=1 --force-with-wrong-xcode
```
Note that we need to change `_sample_thresh` to ensure logs show up.
Reviewed By: kimishpatel
Differential Revision: D35970823
fbshipit-source-id: 2cb4d73931f35a0fa7f362b9fb44b9f0a00aeb82
(cherry picked from commit 1e965acf72958fc59d0cd0b4958e54955cc1adf2)
Summary:
X-link: https://github.com/facebook/CacheLib/pull/137
Fix
error: anonymous non-C-compatible type given name for linkage purposes by alias declaration; add a tag name here [-Werror,-Wnon-c-typedef-for-linkage]
Test Plan:
```name=before
buck-out/v2/gen/fbcode/d839c731f5505c62/datasec/messaging/auth/utils/__MessagingAuthUtils__/headers/datasec/messaging/auth/utils/MessagingAuthLogger.h:83:25: error: anonymous non-C-compatible type given name for linkage purposes by alias declaration; add a tag name here [-Werror,-Wnon-c-typedef-for-linkage]
using LogInfo = struct {
```
CI
Reviewed By: philippv
Differential Revision: D36043476
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76610
Approved by: https://github.com/philippv, https://github.com/osalpekar
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76366
caffe2 is not currently being built for XROS.
Test Plan: CI
Reviewed By: kimishpatel
Differential Revision: D35923922
fbshipit-source-id: 260dacadf0bd5b6bab7833a4ce81e896d280b053
(cherry picked from commit 8370b8dd2519d55a79fa8d45e7951ca8dc0b21a8)
Fixes#75177
Couldn't find any utility method to get directory name in pytorch repo, hence creating a function for that.
Let me know if a new function is not needed.
I also referred [this](https://github.com/pytorch/pytorch/blob/master/c10/test/util/tempfile_test.cpp#L15) for directory check.
Also I am using TORCH_CHECK to show the error. This is highly verbose with the entire stack visible. Is there any alternative for the same so that it is easier to read? This could happen a frequently, so small and concise error would be more helpful here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75681
Approved by: https://github.com/albanD
Summary:
`C10_UNUSED` somehow triggers segfault in some versions of NVCC that looks like something as follows:
```
caffe2\caffe2\operators\piecewise_linear_transform_op.h(65): internal error: assertion failed: gen_variable_decl: declared_type is NULL (cp_gen_be.c, line 22209 in gen_variable_decl)
1 catastrophic error detected in the compilation of "caffe2/caffe2/operators/piecewise_linear_transform_op.cu".
Compilation aborted.
nvcc error : 'cudafe++' died with status 0xC0000409
```
Fixes regression introduced by https://github.com/pytorch/pytorch/pull/75538 / D35747333 (f6c275f55d)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76204
Test Plan: CI
Reviewed By: EscapeZero, atalman
Differential Revision: D35831451
fbshipit-source-id: f744d4688c9fd324f8f54b27781a3def97778d1e
(cherry picked from commit fd64655aa4b6890d205273ec76b416ee3b524783)
Summary:
[Comment](https://github.com/pytorch/pytorch/pull/62445/files#r680132022) claims, it got added for consistency with top level CMakeLists.txt, but `-Wno-unused-variable` is not mentioned there.
Modify violations in 50+ files that were added in the interim by either removing unused variables, or decorating the code with `C10_UNUSED` if local variable is likely used to extend object lifetime until the end of the block.
Caused preventable revert in https://github.com/pytorch/pytorch/pull/72633#issuecomment-1092300787
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75538
Reviewed By: anjali411
Differential Revision: D35747333
Pulled By: malfet
fbshipit-source-id: 3fc5828e44a4c05ba0e89e92613e6ebbdb260626
(cherry picked from commit c179fba21cfa2a0093fad50ccad5a22dd7cff52c)
Summary:
caffe2/utils/math.h depends on protobuf which is not needed for pytorch, so we want to get rid of it.
It turns out maskrcnn ops only need the StorageOrder enum from math.h, so just copy it.
Test Plan:
buck build //xplat/caffe2/fb/custom_ops/maskrcnn:maskrcnnAndroid#android-armv7,shared
buck build //xplat/caffe2/fb/custom_ops/maskrcnn:maskrcnn_aabbAndroid#android-armv7,shared
buck build //xplat/caffe2/fb/custom_ops/maskrcnn:maskrcnn_int8Android#android-armv7,shared
Differential Revision: D35624743
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75760
Approved by: https://github.com/malfet
This PR introduces 3 BC changes:
First, this PR propagates `BUILD_CAFFE2` flag to `libtorch` and `libtorch_python`, which is necessary for non-caffe2 ONNX runtimes when using `ONNX_ATEN_FALLBACK` operator export type.
Second, as a complement of https://github.com/pytorch/pytorch/pull/68490, this PR refactors Caffe2's Aten ops symbolics to consider not only the `operator_export_type` (aka `ONNX_ATEN_FALLBACK`) to emit Caffe2 Aten ops, but also whether `BUILD_CAFFE2` (which is called `torch.onnx._CAFFE2_ATEN_FALLBACK` in python binding) is set.
Lastly, it renames `onnx::ATen` to `aten::ATen` for ONNX spec consistency in a BC fashion.
ONNX doesn't have `ATen` op on its spec, but PyTorch ONNX converter emits them. Non-Caffe2 backend engines would be mislead by such operator's name/domain. A non-ideal workaround would be to have Aten ops handled based on its name and ignore the (non-complaint) domain. Moreover, users could incorrectly file bugs to either ONNX or ONNX Runtime when they inspect the model and notice the presence of an unspecified ONNX operator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73954
Approved by: https://github.com/BowenBao, https://github.com/malfet, https://github.com/garymm, https://github.com/jiafatom
Fixes https://github.com/pytorch/pytorch/issues/69674
The fix is Back Compatible with any Caffe2 build. It simply tries to use `onnxptimizer` module when `onnx.optimizer` is not available.
`onnx.optimizer` does not exist since ONNX 1.9 (April 2021) as the code was moved to a different [repo](https://github.com/onnx/onnxoptimizer)
If both `onnx<1.9` and `onnxoptimizer` are not found, the current fallback behavior is maintained (no ONNX optimization happens). Otherwise, the ONNX optimization pass will run from whatever module it is found.
This PR does not require or enforce a direct package dependency to work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75718
Approved by: https://github.com/BowenBao, https://github.com/malfet
This is enabled on some of our internal builds, is a common source
of fbcode only errors and apparently we are relatively clean on it.
Signed-off-by: Edward Z. Yang <ezyangfb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74996
Approved by: https://github.com/malfet