### Description
This PR fixes a bug with getting module attributes during `torch.onnx.export` when `export_modules_as_functions` is used. With this fix, we can compare the LLaMA-2 models produced by the TorchScript exporter and the [Dynamo exporter](https://github.com/pytorch/pytorch/issues/104903).
### Context
When exporting LLaMA-2 from Hugging Face with `export_modules_as_functions`, the `Embedding` object does not have the `freeze` attribute.
```
File "/home/kvaishnavi/.local/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 662, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/home/kvaishnavi/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/kvaishnavi/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1558, in _call_impl
args_result = hook(self, args)
File "/home/kvaishnavi/.local/lib/python3.8/site-packages/torch/onnx/utils.py", line 1394, in _track_module_attributes_forward_pre_hook
setattr(module, attr_name, _get_module_attributes(module))
File "/home/kvaishnavi/.local/lib/python3.8/site-packages/torch/onnx/utils.py", line 1474, in _get_module_attributes
return {k: getattr(module, k) for k in annotations}
File "/home/kvaishnavi/.local/lib/python3.8/site-packages/torch/onnx/utils.py", line 1474, in <dictcomp>
return {k: getattr(module, k) for k in annotations}
File "/home/kvaishnavi/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1696, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Embedding' object has no attribute 'freeze'
```
To get around this issue, we can skip adding the keys in the dictionary when the object does not have the attribute.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109759
Approved by: https://github.com/BowenBao
Fixes#68972
Relands #107246
To avoid causing Meta-internal CI failures, this PR avoids always asserting that the default dtype is float in the `TestCase.setUp/tearDown` methods. Instead, the assert is only done if `TestCase._default_dtype_check_enabled == True`. `_default_dtype_check_enabled` is set to True in the `if __name__ == "__main__":` blocks of all the relevant test files that have required changes for this issue
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108088
Approved by: https://github.com/ezyang
This reworks the DORT backend factory function to support the options kwarg of torch.compile, and defines a concrete OrtBackendOptions type that can be used to influence the backend.
Caching is also implemented in order to reuse backends with equal options.
Wrapping the backend in auto_autograd also becomes an option, which allows `OrtBackend` to always be returned as the callable for torch.compile; wrapping happens internally if opted into (True by default).
Lastly, expose options for configuring preferred execution providers (will be attempted first), whether or not to attempt to infer an ORT EP from a torch found device in the graph or inputs, and finally the default/fallback EPs.
### Demo
The following demo runs `Gelu` through `torch.compile(backend="onnxrt")` using various backend options through a dictionary form and a strongly typed form. It additionally exports the model through both the ONNX TorchScript exporter and the new TorchDynamo exporter.
```python
import math
import onnx.inliner
import onnxruntime
import torch
import torch.onnx
torch.manual_seed(0)
class Gelu(torch.nn.Module):
def forward(self, x):
return x * (0.5 * torch.erf(math.sqrt(0.5) * x) + 1.0)
@torch.compile(
backend="onnxrt",
options={
"preferred_execution_providers": [
"NotARealEP",
"CPUExecutionProvider",
],
"export_options": torch.onnx.ExportOptions(dynamic_shapes=True),
},
)
def dort_gelu(x):
return Gelu()(x)
ort_session_options = onnxruntime.SessionOptions()
ort_session_options.log_severity_level = 0
dort_gelu2 = torch.compile(
Gelu(),
backend="onnxrt",
options=torch.onnx._OrtBackendOptions(
preferred_execution_providers=[
"NotARealEP",
"CPUExecutionProvider",
],
export_options=torch.onnx.ExportOptions(dynamic_shapes=True),
ort_session_options=ort_session_options,
),
)
x = torch.randn(10)
torch.onnx.export(Gelu(), (x,), "gelu_ts.onnx")
export_output = torch.onnx.dynamo_export(Gelu(), x)
export_output.save("gelu_dynamo.onnx")
inlined_model = onnx.inliner.inline_local_functions(export_output.model_proto)
onnx.save_model(inlined_model, "gelu_dynamo_inlined.onnx")
print("Torch Eager:")
print(Gelu()(x))
print("DORT:")
print(dort_gelu(x))
print(dort_gelu2(x))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107973
Approved by: https://github.com/BowenBao
Cap opset version at 17 for torch.onnx.export and suggest users to use the dynamo exporter. Warn users instead of failing hard because we should still allow users to create custom symbolic functions for opset>17.
Also updates the default opset version by running `tools/onnx/update_default_opset_version.py`.
Fixes#107801Fixes#107446
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107829
Approved by: https://github.com/BowenBao
Previously, the top level GraphProto is hardcoded with name "torch_jit", and the subgraphs "torch_jit_{count}". It does not offer any insight to the graph, but rather encodes the graph producer as jit (torchscript). This is no longer true now that the graph can also be produced from dynamo.
As a naive first step, this PR re-purposes the name, to "main_graph", and "sub_graph_{count}" respectively. More delicate processing can be done to name the subgraphs with respect to their parent node or module. This can be done as follow ups.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107408
Approved by: https://github.com/justinchuby, https://github.com/titaiwangms
1. Add a list of HF models to CI tests. The PR intends to build them from Config, but some of them are not supported with Config. NOTE: Loaded from pre-trained model could potentially hit [uint8/bool conflict](https://github.com/huggingface/transformers/issues/21013) when a newer version of transformers is used.
- Dolly has torch.fx.Node in OnnxFunction attribute, which is currently not supported.
- Falcon and MPT has unsupported user coding to Dynamo.
2. Only update GPT2 exporting with real tensor to Config, as FakeMode rises unequal input errors between PyTorch and ORT. The reason is that [non-persistent buffer is not supported](https://github.com/pytorch/pytorch/issues/107211)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107247
Approved by: https://github.com/wschin, https://github.com/BowenBao
This updates ruff to 0.285 which is faster, better, and have fixes a bunch of false negatives with regards to fstrings.
I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
Issue list:
* Unsupported FX nodes: {'call_function': ['aten.embedding_renorm.default', ~~'aten._embedding_bag_forward_only.default'~~]}.
* aten._embedding_bag.default not captured by test. Hence this test is not reflecting the pattern seen in model from onnxbench. Update: need validation again, unsure if this is still the case.
* `padding_idx` is always emitted for `aten._embedding_bag` and `aten._embedding_bag_forward_only`. This overload is unsupported by Torchlib.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105862
Approved by: https://github.com/justinchuby
1. Update xfail reasons in fx runtime
2. Enable bloom-560m in runtime test. However, it's blocked by the unsupported constant tensor case. The previous error was because the when the model loads with external data, it surpasses 2GB, and couldn't be inlined. The fix is to inline the model it self, and then replace the original one. Pointing ORT to the path allows it to load with external data into model in runtime.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107257
Approved by: https://github.com/justinchuby
This updates ruff to 0.285 which is faster, better, and have fixes a bunch of false negatives with regards to fstrings.
I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
Generate diagnostic reports to monitor the internal stages of the export process. This tool aids in unblocking model exports and debugging the exporter.
#### Settings
~~1. Choose if you want to produce a .sarif file and specify its location.~~
1. Updated: saving .sarif file should be done by `export_output.save_sarif_log(dst)`, similar to saving exported onnx model `export_output.save(model_dst)`.
2. Customize diagnostic options:
- Set the desired verbosity for diagnostics.
- Treat warnings as errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106741
Approved by: https://github.com/titaiwangms, https://github.com/justinchuby, https://github.com/malfet
Summary
- The 'dynamo_export' diagnostics leverages the PT2 artifact logger to handle the verbosity
level of logs that are recorded in each SARIF log diagnostic. In addition to SARIF log,
terminal logging is by default disabled. Terminal logging can be activated by setting
the environment variable `TORCH_LOGS="onnx_diagnostics"`. When the environment variable
is set, it also fixes logging level to `logging.DEBUG`, overriding the verbosity level
specified in the diagnostic options.
See `torch/_logging/__init__.py` for more on PT2 logging.
- Replaces 'with_additional_message' with 'Logger.log' like apis.
- Introduce 'LazyString', adopted from 'torch._dynamo.utils', to skip
evaluation if the message will not be logged into diagnostic.
- Introduce 'log_source_exception' for easier exception logging.
- Introduce 'log_section' for easier markdown title logging.
- Updated all existing code to use new api.
- Removed 'arg_format_too_verbose' diagnostic.
- Rename legacy diagnostic classes for TorchScript Onnx Exporter to avoid
confusion.
Follow ups
- The 'dynamo_export' diagnostic now will not capture python stack
information at point of diagnostic creation. This will be added back in
follow up PRs for debug level logging.
- There is type mismatch due to subclassing 'Diagnostic' and 'DiagnosticContext'
for 'dynamo_export' to incorporate with PT2 logging. Follow up PR will
attempt to fix it.
- More docstrings with examples.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106592
Approved by: https://github.com/titaiwangms
Port fix from https://github.com/huggingface/safetensors/pull/318 into ONNX exporter until it is merged
* This add support for safetensors to be loaded within a FakeTensorMode, which results in creating `torch.empty((shape,), dtype=)`. This is done through a monkeypatch for the in-progress https://github.com/huggingface/safetensors/pull/318
* Adds a test for the HF bloom model (bigscience/bloom-560m)
* This PR also fixes existing fake tensor unit tests by moving the `torch.onnx.dynamo_export` to be inside the `enable_fake_mode()` context. Although calling `torch.onnx._dynamo_export` works for several models, the right way of using fake mode is calling the exporter within the context manager.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106930
Approved by: https://github.com/BowenBao
Fix#106057, except **Attribute dtype mismatch. E.g., alpha of aten.add.Tensor. -> Attribute: alpha INT vs FLOAT**.
Summarized the change
* Fill in defaults of attribute when `param_schema` is applied. This relaxes the matching on default attributes.
* Fill in None to optional input when `param_schema` is applied.
* Keep extra kwargs in attributes to make matching strictly.
* Allow input to be None when its dtype is `optiona[INPUT]`
The change comes with the guarantee from torchlib that attribute would never be None. For example, if `memory_format` is needed. The function should specify like this:
```python
@torch_op("aten::clone")
def aten_clone(
self: TTensor, memory_format: str = "" # pylint: disable=unused-argument
) -> TTensor:
"""clone(Tensor self, *, MemoryFormat? memory_format=None) -> Tensor"""
return op.Identity(self)
```
Previous to this PR, opSchema matching didn't strictly guard the number of inputs/attributes to allow nearest match, which introduces the bug of dispatching `aten::div.Tensor` to `aten::div.default` disregarding the fact that `aten::div.Tensor` has an extra attibute `rounding_mode`. This PR fixes the issue with the new logic to perfect/nearest match. Particularly, strictly restrict the qualification of being nearest match candidate.
For each ONNX variants, we check these step by step:
1. Check if the function signature of inputs number is the same as the inputs.
2. Check if the function signature of attribute names is the same set of inputs.
If either of the above two criteria fails to meet, the ONNX variant is not a perfect match, nor a nearest match candidate (match_score=None)
3. Check if input dtype matches
4. Check if attribute dtype matches
If 3 and 4 are met, then this is a perfect match, otherwise, it's still considered a candidate of nearest match with a matching score.
## Case Study
### Optional Input
The dispatcher recognizes optional inputs. However, the input can't be ignored. None must be provided.
```python
# Perfect match is found
inputs = (Tensor([2, 3]), None)
aten_op(X: TTensor, Y: Optional[INT64]):
...
```
Real Case: aten::convolution
NOTE: There is/will not optional attribute in torchlib.
### Different attributes
If an attribute is provided with value, it's a must to match the attribute in function signature.
```python
# Not perfect match, nor nearest match
inputs = (Tensor([2, 3]),)
attributes = {"a":1, "b":2}
aten_op(X: TTensor, a: int):
...
```
Real Case: aten::div and aten::div.Tensor_mode
### Default attribute
Default attribute will fill in the value into inputs/attributes
```python
# Perfect match is found
inputs = (Tensor([2, 3]),)
attributes = {}
aten_op(X: TTensor, a: int = 3):
...
```
Real case: aten::clone
### Ignore attribute with None value
The attributes with None value will be ignored in matching.
```python
# Perfect match
inputs = (Tensor([2, 3]),)
attributes = {"a": None}
aten_op(X: TTensor):
...
# Not perfect match, but eligible for nearest match
inputs = (Tensor([2, 3]),)
attributes = {"a": None}
aten_op(X: TTensor, a: int = 3):
...
```
Real case: aten::div and aten::div.Tensor_mode
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106478
Approved by: https://github.com/thiagocrepaldi, https://github.com/BowenBao