This commit speeds up the ONNX export of large models by making the following changes:
- Remove unnecessary memcpy in `GetGraphProtoSize`
- In `export.cpp`, pass around a pointer to the ModelProto instead of the ModelProto itself.
The shape inference function is still the slowest part of the export for these models with large weights, taking up 50% or more of the export time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103278
Approved by: https://github.com/BowenBao, https://github.com/thiagocrepaldi
A number of OSS PRs were reverted because of new signed-unsigned comparison warnings, which are treated as errors in some internal builds.
I'm not sure how those selective rules are applied, but this PR removes `-Wno-sign-compare` from the PyTorch codebase.
The only tricky part in this PR is making sure that non-ASCII character detection works for both signed and unsigned chars here:
6e3d51b08a/torch/csrc/jit/serialization/python_print.cpp (L926)
Exclude several files from sign-compare checks if flash attention is used, due to a violation in CUTLASS, to be fixed by https://github.com/NVIDIA/cutlass/pull/869
Do not try to fix sign-compare violations in the caffe2 codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96723
Approved by: https://github.com/albanD
Fixes #82589
Why:
1. **full_check** works in the `onnx::checker::check_model` function by turning on **strict_mode** in `onnx::shape_inference::InferShapes()`, which I think was the intention of this part of the code (see the sketch after this list).
2. **strict_mode** catches failed shape/type inference (an invalid ONNX model from the ONNX perspective), and ONNX Runtime can't run these invalid models, as it actually relies on ONNX shape/type inference to optimize the ONNX graph. Why don't we set it to True by default? Some existing users run ONNX models on other platforms, such as caffe2, which don't need a valid ONNX model to run.
3. This PR doesn't change the original behavior of `check_onnx_proto`, but adds a warning message for models that can't pass strict shape/type inference, saying that these models would fail on ONNX Runtime.
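For reference, the same strict check can be reproduced from Python with the `onnx` package (a minimal sketch; the model path is illustrative):
```
import onnx

model = onnx.load("model.onnx")
# full_check=True makes the checker also run strict shape/type inference,
# roughly onnx.shape_inference.infer_shapes(model, strict_mode=True).
onnx.checker.check_model(model, full_check=True)
```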
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83186
Approved by: https://github.com/justinchuby, https://github.com/thiagocrepaldi, https://github.com/jcwchen, https://github.com/BowenBao
As we live in a C++17 world
This is a functional no-op, just
- `s/namespace at { namespace native {/namespace at::native {/`
- `s/namespace torch { namespace jit {/namespace torch::jit {/`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92100
Approved by: https://github.com/izaitsevfb
Introduce `_jit_pass_onnx_assign_node_and_value_names` to parse and assign
scoped names for nodes and values in the exported ONNX graph.
Module layer information is obtained from the `ONNXScopeName` captured in the
`scope` attribute of nodes. For nodes, the processed ONNX node name is stored
in the `onnx_name` attribute. For values, the processed ONNX output name is
stored as `debugName`.
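As an illustration, the scoped names become visible when inspecting the exported model (a hedged sketch; the toy module and the exact name format are illustrative):
```
import io

import onnx
import torch

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU())
buf = io.BytesIO()
torch.onnx.export(model, torch.randn(1, 4), buf)

proto = onnx.load_model_from_string(buf.getvalue())
# With this pass, node names carry module scope information rather than
# bare op names, e.g. something like "/0/Gemm" depending on the version.
print([node.name for node in proto.graph.node])
```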
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82040
Approved by: https://github.com/AllenTiTaiWang, https://github.com/justinchuby, https://github.com/abock
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73284
Some important ops won't support optional type until opset 16,
so we can't fully test things end-to-end, but I believe this should
be all that's needed. Once ONNX Runtime supports opset 16,
we can do more testing and fix any remaining bugs.
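A hedged sketch of the kind of model this targets (the module is illustrative, and end-to-end runs still need opset 16 support in the runtime):
```
from typing import Optional

import torch

class M(torch.nn.Module):
    def forward(self, x, y: Optional[torch.Tensor] = None):
        # Optional[Tensor] maps to the ONNX Optional type from opset 16.
        if y is not None:
            return x + y
        return x

scripted = torch.jit.script(M())
torch.onnx.export(scripted, (torch.randn(2, 3), None), "optional.onnx",
                  opset_version=16)
```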
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D34625646
Pulled By: malfet
fbshipit-source-id: 537fcbc1e9d87686cc61f5bd66a997e99cec287b
Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: neginraoof <neginmr@utexas.edu>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
(cherry picked from commit 822e79f31ae54d73407f34f166b654f4ba115ea5)
This PR introduces 3 BC changes:
First, this PR propagates the `BUILD_CAFFE2` flag to `libtorch` and `libtorch_python`, which is necessary for non-Caffe2 ONNX runtimes when using the `ONNX_ATEN_FALLBACK` operator export type.
Second, as a complement to https://github.com/pytorch/pytorch/pull/68490, this PR refactors Caffe2's ATen op symbolics to consider not only the `operator_export_type` (aka `ONNX_ATEN_FALLBACK`) when emitting Caffe2 ATen ops, but also whether `BUILD_CAFFE2` (exposed as `torch.onnx._CAFFE2_ATEN_FALLBACK` in the Python binding) is set.
Lastly, it renames `onnx::ATen` to `aten::ATen` for ONNX spec consistency in a BC fashion.
ONNX doesn't have an `ATen` op in its spec, but the PyTorch ONNX converter emits them. Non-Caffe2 backend engines would be misled by such an operator's name/domain. A non-ideal workaround would be to handle ATen ops based on their name and ignore the (non-compliant) domain. Moreover, users could incorrectly file bugs against either ONNX or ONNX Runtime when they inspect the model and notice the presence of an unspecified ONNX operator.
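A hedged sketch of how the renamed ops show up in an exported proto (the module is hypothetical, and the dedicated aten domain string is my expectation after this change; verify against your build):
```
import io

import onnx
import torch

model = ModelWithAtenFallbackOp()  # hypothetical module that hits an ATen fallback
buf = io.BytesIO()
torch.onnx.export(
    model, torch.randn(1, 3), buf,
    operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK,
)

proto = onnx.load_model_from_string(buf.getvalue())
# Fallback nodes should now be "ATen" ops in a dedicated aten domain
# instead of masquerading in the default onnx domain.
for node in proto.graph.node:
    if node.op_type == "ATen":
        print(node.name, node.domain)
```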
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73954
Approved by: https://github.com/BowenBao, https://github.com/malfet, https://github.com/garymm, https://github.com/jiafatom
Summary:
Add an ONNX exporter logging facility, supporting both the C++ and Python logging APIs. Logging can be turned on/off, and the logging output stream can be set to either `stdout` or `stderr` (see the sketch after the list below).
A few other changes:
* When an exception is raised in passes, the current IR graph being processed is logged.
* When an exception is raised from `_jit_pass_onnx` (the pass that converts nodes from the `ATen` namespace to `ONNX`), both the ATen IR graph and the ONNX IR graph under construction are logged.
* The exception message for ConstantFolding is truncated to avoid being too verbose.
* Update the final printed IR graph with the node name from the ONNX ModelProto as a node attribute. A Torch IR Node does not have a name, so adding this to the printed IR graph helps debugging.
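A minimal sketch of toggling the facility from Python (assuming the `torch.onnx.enable_log`/`disable_log` entry points exposed by this change; exact names may differ by version):
```
import torch

torch.onnx.enable_log()   # dump exporter diagnostics during export
torch.onnx.export(torch.nn.Linear(2, 2), torch.randn(1, 2), "model.onnx")
torch.onnx.disable_log()  # turn the diagnostics back off
```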
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71342
Reviewed By: msaroufim
Differential Revision: D34433473
Pulled By: malfet
fbshipit-source-id: 4b137dfd6a33eb681a5f2612f19aadf5dfe3d84a
(cherry picked from commit 67a8ebed5192c266f604bdcca931df6fe589699f)
Do not assign `dim_param` for sequence tensor types:
the tensors in a sequence can differ in dimension sizes,
so a dimension with neither `dim_value` nor `dim_param` set
is used to denote an unknown dimension.
Create and assign `dim_param` for normal tensor types.
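For reference, the three dimension states can be seen when building a value info by hand with the `onnx` helper API (a small sketch; names are illustrative):
```
from onnx import TensorProto, helper

# An int sets dim_value, a string sets dim_param, and None leaves the
# dimension unknown (neither field set) -- the state used for sequence tensors.
vi = helper.make_tensor_value_info("y", TensorProto.FLOAT, [3, "batch", None])
print(vi.type.tensor_type.shape)
```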
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70651
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68490
The use of ATen as a fallback operator during ONNX conversion is important for increasing operator coverage, or even for providing more efficient implementations than some ONNX ops.
Currently this feature is available through `OperatorExportTypes.ONNX_ATEN_FALLBACK`,
but it also performs changes to the graph that are runnable only by Caffe2.
This PR restricts Caffe2-specific graph transformations for the `ONNX_ATEN_FALLBACK`
operator export type to builds where PyTorch has Caffe2 support (aka BUILD_CAFFE2=1 at build time).
The first version of this PR introduced a new operator export type, `ONNX_ATEN__STRICT_FALLBACK`,
which is essentially the same as `ONNX_ATEN_FALLBACK` but without the Caffe2 transformations.
It was preferred not to introduce a new operator export type, but to refine the existing ATen fallback one.
## BC-breaking note
### The global constant `torch.onnx.PYTORCH_ONNX_CAFFE2_BUNDLE` is removed in favor of a less visible `torch.onnx._CAFFE2_ATEN_FALLBACK`.
`PYTORCH_ONNX_CAFFE2_BUNDLE` is really dead code: a flag that is always set to False.
One alternative would be fixing it, but #66658 disables Caffe2 build by default.
Making a Caffe2 feature a private one seems to make more sense for future deprecation.
### The method `torch.onnx.export` now defaults to ONNX when `operator_export_type` is not specified.
Previously, `torch.onnx.export`'s `operator_export_type` was intended to default to `ONNX_ATEN_FALLBACK` when `PYTORCH_ONNX_CAFFE2_BUNDLE` was set, but that could never happen because `PYTORCH_ONNX_CAFFE2_BUNDLE` is always undefined.
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Test Plan: Imported from OSS
Reviewed By: jansel
Differential Revision: D32483781
Pulled By: malfet
fbshipit-source-id: e9b447db9466b369e77d747188685495aec3f124
(cherry picked from commit 5fb1eb1b19)
Summary:
- PyTorch and ONNX both support BFloat16; add it to the exporter to unblock some mixed-precision training models (see the sketch after this list).
- Support the PyTorch TNLG model using BFloat16 tensors for the inputs/outputs of the layers that run on the NPU.
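A hedged sketch of the now-unblocked path (the module is illustrative; ONNX gained BFloat16 support around opset 13, so a recent opset is assumed):
```
import torch

model = torch.nn.Linear(8, 8).to(torch.bfloat16)
x = torch.randn(1, 8, dtype=torch.bfloat16)
# BFloat16 inputs/outputs map to the ONNX BFLOAT16 tensor element type.
torch.onnx.export(model, x, "bf16_model.onnx", opset_version=13)
```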
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66788
Reviewed By: jansel
Differential Revision: D32283510
Pulled By: malfet
fbshipit-source-id: 150d69b1465b2b917dd6554505eca58042c1262a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67803
* Addresses comments from #63589
[ONNX] remove torch::onnx::PRODUCER_VERSION (#67107)
Use constants from version.h instead.
This simplifies things since we no longer have to update
PRODUCER_VERSION for each release.
Also add TORCH_VERSION to version.h so that a string is available for
this purpose.
[ONNX] Set `ir_version` based on opset_version. (#67128)
This increases the odds that the exported ONNX model will be usable.
Before this change, we were setting the IR version to a value which may
be higher than what the model consumer supports.
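Both fields are easy to confirm on an exported model (a small sketch; the path is illustrative):
```
import onnx

proto = onnx.load("model.onnx")
# producer_version now tracks TORCH_VERSION, and ir_version is derived from
# the requested opset instead of whatever the installed onnx defaults to.
print(proto.producer_name, proto.producer_version, proto.ir_version)
print([(op.domain, op.version) for op in proto.opset_import])
```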
Also some minor clean-up in the test code:
* Fix string replacement.
* Use a temporary file so as to not leave files around in the test
current working directory.
Test Plan: Imported from OSS
Reviewed By: msaroufim
Differential Revision: D32181306
Pulled By: malfet
fbshipit-source-id: 02f136d34ef8f664ade0bc1985a584f0e8c2b663
Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65966
ghstack-source-id: 141594521
Support exporting "interface methods" from a submodule to a mobile module. "Interface methods" are methods that might be dynamically called in a module and therefore need to be exported anyway, like virtual functions in C++.
Before this change, the export algorithm was a simple iteration through all toplevel methods. Now that we have indirect calls, we need to recursively walk the call graph to find all potentially used methods. This means the order in which we export methods might break in old runtimes, so to guarantee forward compatibility we export toplevel methods first and extra methods after; in this order, toplevel methods will always be found first.
NOTE that interface method export is disabled by default in this diff. We need to call `torch._C._enable_mobile_interface_call_export` to actually enable it.
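A hedged sketch of the shape of module this affects (the interface and modules are illustrative, following the enabling call named above):
```
import torch

@torch.jit.interface
class Impl(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pass

class MyImpl(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x)

class Wrapper(torch.nn.Module):
    impl: Impl  # dynamically dispatched -- an "interface call"

    def __init__(self):
        super().__init__()
        self.impl = MyImpl()

    def forward(self, x):
        return self.impl.forward(x)

torch._C._enable_mobile_interface_call_export()
torch.jit.script(Wrapper())._save_for_lite_interpreter("wrapper.ptl")
```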
Test Plan: buck test mode/dev //caffe2/test:jit -- --exact 'caffe2/test:jit - test_export_opnames_interface (jit.test_misc.TestMisc)'
Reviewed By: qihqi, iseeyuan
Differential Revision: D31326155
fbshipit-source-id: 5be7234cca07691f62648a85133b6db65e427b53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66140
* Add a new argument to the export API to enable users to specify `nn.Module` classes that they wish to be exported as local functions in the ONNX model (see the sketch after this list).
* Refactor `torch/csrc/jit/serialization/export.cpp`, and remove redundant `EncoderBase` class.
* ~~Contains changes from #63268~~
* Depends on #63716 to update onnx submodule.
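A hedged sketch of the new argument (the module is illustrative; ONNX local functions need opset 15 or later):
```
import torch

class Block(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.lin(x))

model = torch.nn.Sequential(Block(), Block())
# Each Block instance becomes a call to one ONNX local function
# instead of being inlined into the main graph.
torch.onnx.export(model, torch.randn(1, 4), "blocks.onnx",
                  opset_version=15,
                  export_modules_as_functions={Block})
```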
Test Plan: Imported from OSS
Reviewed By: jansel
Differential Revision: D31424098
fbshipit-source-id: c949d0b01c206c30b4182c2dd1a5b90e32b7a0d3
Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64382
* The `use_external_data_format` parameter is used for large models that cannot otherwise be exported because of the 2GB protobuf limit.
* When `use_external_data_format` is set to True, the model is exported in the ONNX external data format, in which case some of the model parameters are stored in external binary files and not in the ONNX model file itself.
* This PR marks the parameter DEPRECATED and checks the model proto size in code instead of relying on the user: if the size is larger than 2GB, then `use_external_data_format = True` is applied automatically (see the sketch after this list).
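With this change the call site no longer needs the flag (a sketch; the model and path are illustrative):
```
import torch

big_model = torch.nn.Linear(4, 4)  # stand-in for a model whose weights exceed 2GB
# No use_external_data_format needed: the exporter measures the proto size and
# switches to external data files alongside "big.onnx" automatically.
torch.onnx.export(big_model, torch.randn(1, 4), "big.onnx")
```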
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D30905265
Pulled By: malfet
fbshipit-source-id: 82b4e17bfa6a8de2bfd700a5282c12f6835603cb
Co-authored-by: hwangdeyu <dejack953@outlook.com>
Summary:
* Minor: spelling, grammar.
* Add calls to `GRAPH_DUMP()` where they were missing.
* Add or expand a few comments.
* Move a few comments to seemingly more appropriate spots.
* In canonicalize_graph_fuser_ops.cpp inline `runnableInputs()` since it
was only called in one place and had a misleading comment and
confusing name.
* In `PeepholeOptimizeImpl::optimizeBlock()`, set `changed = true;` when
removing `aten::is_complex`. Pretty sure its absence was a bug.
* Delete unused `_jit_pass_remove_inplace_ops` and its
implementation `RemoveInplaceOps()`.
* In `preprocessCaffe2Ops()`, remove redundant check for nested optional
types. It was already checked in `checkONNXCompatibility()`.
* In `EncoderBase::AddAttribute`, log the unexpected attribute kind.
I don't remember the repro case now but I did hit this error at some
point and this additional logging made it easier to understand.
* In `fuseConvBatchNorm()` in eval_peephole.cpp, consistently use
camelCase instead of snake_case for local variables.
* Add curly braces around the bodies of if and loops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60390
Reviewed By: Krovatkin
Differential Revision: D29523283
Pulled By: SplitInfinity
fbshipit-source-id: 4e16c5648616f53da07d68dab7fdf252e06a0752
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60244
Do the 2GB size check for protocol buffer serialization at a later time, to avoid false alarms in cases like shape inference where no serialization actually happens.
Test Plan: Imported from OSS
Reviewed By: zou3519, ZolotukhinM
Differential Revision: D29494910
Pulled By: SplitInfinity
fbshipit-source-id: 4c36d26de9a94e5d6cf78f332d4dffc46588ebf0
Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os


def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files


def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])


def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)


if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Handle sequence output shape and type inference.
This PR fixes the value type of sequence outputs. Prior to this, all sequence-type model outputs were unfolded in exported ONNX models.
This PR also enables shape inference for sequence outputs to represent the dynamic shape of these values.
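A hedged sketch of a model with a sequence output (the module is illustrative; the output now stays a single sequence-typed value instead of being unfolded):
```
import io
from typing import List

import onnx
import torch

class M(torch.nn.Module):
    def forward(self, x) -> List[torch.Tensor]:
        return [x, x + 1]

buf = io.BytesIO()
torch.onnx.export(torch.jit.script(M()), torch.randn(2, 3), buf)

proto = onnx.load_model_from_string(buf.getvalue())
# The single graph output should carry a sequence type, not a tensor type.
print(proto.graph.output[0].type.WhichOneof("value"))
```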
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46542
Reviewed By: ezyang
Differential Revision: D24924236
Pulled By: bzinodev
fbshipit-source-id: 506e70a38cfe31069191d7f40fc6375239c6aafe
Summary:
* Support propagating `dim_param` in ONNX by encoding it as a `ShapeSymbol` in the `SymbolicShape` of outputs. If export is called with `dynamic_axes` provided, shape inference will start with these axes set as dynamic (see the sketch after this list).
* Add a new test file `test_pytorch_onnx_shape_inference.py`, reusing all test cases from `test_pytorch_onnx_onnxruntime.py` but focusing on validating the shape of every node in the graph. This is not yet enabled in the CI, since there are still quite a few existing issues and corner cases to fix. The test defaults to running only at opset 12.
* Bug fixes, e.g. for div, _len, and the peephole.cpp passes for PackPadded and LogSoftmaxCrossEntropy.
* This PR depends on existing PRs such as #44332.
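A sketch of the `dynamic_axes` entry point (the names are illustrative):
```
import torch

model = torch.nn.Linear(4, 2)
# "batch" becomes a dim_param on the input, and shape inference propagates
# the symbol to downstream values instead of fixing the dimension to 8.
torch.onnx.export(model, torch.randn(8, 4), "dyn.onnx",
                  input_names=["inp"], output_names=["out"],
                  dynamic_axes={"inp": {0: "batch"}, "out": {0: "batch"}})
```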
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44920
Reviewed By: eellison
Differential Revision: D23958398
Pulled By: bzinodev
fbshipit-source-id: 00479d9bd19c867d526769a15ba97ec16d56e51d
Summary:
Optimize the export_onnx API to reduce string and model proto exchange in export.cpp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44332
Reviewed By: bwasti, eellison
Differential Revision: D23880129
Pulled By: bzinodev
fbshipit-source-id: 1d216d8f710f356cbba2334fb21ea15a89dd16fa
Summary:
* Support sequence type (de)serialization, enabling ONNX shape inference on sequence nodes.
* Fix shape inference with block inputs/outputs, e.g. Loop and If nodes.
* Fix bugs in symbolics discovered by the coverage of ONNX shape inference.
* Improve debuggability: added more JIT logs. For simplicity, the default log level, when JIT logging is enabled, will not dump IR graphs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43929
Reviewed By: albanD
Differential Revision: D23674604
Pulled By: bzinodev
fbshipit-source-id: ab6aacb16d0e3b9a4708845bce27c6d65e567ba7
Summary:
The conversion from a torch operator to an ONNX operator often requires the input rank/dtype/shape to be known. Previously, the conversion depended on the tracer to provide this info, leaving a gap in the conversion of scripted modules.
We are extending the export with support from ONNX shape inference. If enabled, ONNX shape inference will be called whenever an ONNX node is created. This is the first PR, introducing the initial look of the feature. More and more cases will be supported following this PR.
* Added pass to run onnx shape inference on a given node. The node has to have namespace `onnx`.
* Moved helper functions from `export.cpp` to a common place for re-use.
* This feature is currently experimental, and can be turned on through the flag `onnx_shape_inference` in the internal API `torch.onnx._export` (see the sketch after this list).
* Currently skipping ONNX Sequence ops, If/Loop and ConstantOfShape due to limitations. Support will be added in the future.
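A hedged sketch of the experimental switch, using the internal API named above (internal and subject to change):
```
import torch

model = torch.nn.Linear(4, 2)
# onnx_shape_inference is an experimental flag on the internal _export API;
# when True, ONNX shape inference runs as each onnx node is created.
torch.onnx._export(model, torch.randn(1, 4), "linear.onnx",
                   onnx_shape_inference=True)
```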
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40628
Reviewed By: mrshenli
Differential Revision: D22709746
Pulled By: bzinodev
fbshipit-source-id: b52aeeae00667e66e0b0c1144022f7af9a8b2948
Summary:
Previously, large tensor data in attributes and subgraphs was not stored externally, so ONNX was unable to serialize models in cases where the total size summed up to >= 2GB. This PR enables that.
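At this point the external-data path is opt-in via the export flag (a sketch assuming the `use_external_data_format` flag of this era; the automatic switch came later):
```
import torch

model = torch.nn.Linear(4, 4)  # stand-in for a model with >2GB of tensor data
# Weights -- now including large tensors in attributes and subgraphs -- are
# written to side files next to the .onnx file.
torch.onnx.export(model, torch.randn(1, 4), "huge.onnx",
                  use_external_data_format=True)
```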
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38793
Reviewed By: hl475
Differential Revision: D22111092
Pulled By: houseroad
fbshipit-source-id: 355234e50825d576754de33c86a9690161caaeaf