pytorch/torch/csrc/utils
vasiliy 382fbcc1e4 add the torch.float8_e8m0fnu dtype to PyTorch (#147466)
Summary:

Continuing the work from https://github.com/pytorch/pytorch/pull/146427

Adds the `torch.float8_e8m0fnu` dtype to PyTorch, as detailed in
https://github.com/pytorch/pytorch/issues/146414. Please see the issue for a detailed definition of the format. Example of basic functionality:

```python
import torch

# round trip
x0 = torch.randn(4, 4, dtype=torch.float32)
x1 = x0.to(torch.float8_e8m0fnu)  # cast uses round-to-nearest-even (RNE)
x2 = x1.to(torch.float32)  # each element becomes 2 ** exponent

# creation with empty
x0 = torch.empty(4, 4, dtype=torch.float8_e8m0fnu)

# printing
print(x0)
```
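
For readers without the issue at hand, the `# 2 ** exponent` comment above can be illustrated with a minimal pure-Python sketch of the decode step. It assumes the usual e8m0 layout (8 exponent bits, no sign or mantissa bits, bias 127, `0xFF` reserved for NaN); the linked issue remains the authoritative definition. The helper names `e8m0_decode` and `e8m0_encode_exact` are hypothetical and are not part of this PR or of PyTorch, and the encoder deliberately handles only exact powers of two, so it does not reproduce the RNE rounding done by the real cast.

```python
import math

# Assumed e8m0 parameters (see the linked issue for the authoritative definition).
E8M0_BIAS = 127
E8M0_NAN = 0xFF


def e8m0_decode(bits: int) -> float:
    """Hypothetical helper: decode one e8m0 byte to a Python float."""
    if not 0 <= bits <= 0xFF:
        raise ValueError("e8m0 values are a single byte")
    if bits == E8M0_NAN:
        return math.nan
    # No sign and no mantissa bits: the value is a pure power of two.
    return math.ldexp(1.0, bits - E8M0_BIAS)


def e8m0_encode_exact(x: float) -> int:
    """Hypothetical helper: encode x only if it is exactly a representable power of two.

    Rounding of other inputs is out of scope for this sketch.
    """
    if math.isnan(x):
        return E8M0_NAN
    mantissa, exp = math.frexp(x)  # x == mantissa * 2 ** exp, with 0.5 <= |mantissa| < 1
    if mantissa != 0.5:
        raise ValueError("not a positive power of two")
    bits = (exp - 1) + E8M0_BIAS
    if not 0 <= bits < E8M0_NAN:
        raise ValueError("exponent out of range for e8m0")
    return bits


print(e8m0_decode(127), e8m0_decode(128), e8m0_decode(126))
```

Under these assumptions the format has no encoding for zero, negative values, or infinity, which is why the sketch rejects them instead of picking a fallback.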

Done in this PR:
* numerical correctness
* op coverage (except for `torch._scaled_mm`): create tensor, cast to/from float32
* printing a tensor works

For future PRs:
* performance optimizations for casting
* `torch._scaled_mm` support
* PT2 support
* various cleanups (detailed in comments with issue numbers)

Test Plan:

```
pytest test/quantization/core/experimental/test_float8.py -s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147466
Approved by: https://github.com/drisspg
2025-02-20 13:55:42 +00:00
..
byte_order.cpp Move complex<Half> from Half.h to complex.h (#140565) 2024-11-18 15:56:21 +00:00
byte_order.h
cpp_stacktraces.cpp [Reland][Environment Variable][4/N] Use thread-safe getenv functions (#140593) 2025-01-28 20:51:49 +00:00
cpp_stacktraces.h
cuda_enabled.h
device_lazy_init.cpp [8/N] Fix extra warnings brought by clang-tidy-17 (#139151) 2024-10-30 14:20:08 +00:00
device_lazy_init.h expose extra torch_python apis (#144746) 2025-01-16 20:50:31 +00:00
disable_torch_function.cpp [8/N] Fix extra warnings brought by clang-tidy-17 (#139151) 2024-10-30 14:20:08 +00:00
disable_torch_function.h
generated_serialization_types.h [export] Generate printers/parsers for serialization enum values. (#147126) 2025-02-14 02:14:35 +00:00
init.cpp [8/N] Fix extra warnings brought by clang-tidy-17 (#139151) 2024-10-30 14:20:08 +00:00
init.h
invalid_arguments.cpp Enable more readability-redundant checks (#143963) 2024-12-30 14:49:33 +00:00
invalid_arguments.h
nested.cpp
nested.h
numpy_stub.h
object_ptr.cpp Expose several APIs to public (torch python APIs) (#144525) 2025-01-15 14:34:45 +00:00
object_ptr.h dynamo tracing perf: c++ strip_function_call: 49.12 -> 47.77 (#143063) 2024-12-22 06:38:46 +00:00
out_types.cpp [3/N] Apply bugprone-unchecked-optional-access (#142442) 2024-12-11 01:39:10 +00:00
out_types.h
pybind.cpp [3/N] Apply bugprone-unchecked-optional-access (#142442) 2024-12-11 01:39:10 +00:00
pybind.h Use Wextra-semi (#140236) 2024-11-13 02:15:16 +00:00
pycfunction_helpers.h
pyobject_preservation.cpp
pyobject_preservation.h
python_arg_parser.cpp Use std::string_view (#145906) 2025-01-30 03:14:27 +00:00
python_arg_parser.h [18/N] Fix extra warnings brought by clang-tidy-17 (#144014) 2025-01-08 17:21:55 +00:00
python_compat.h Enable readability-redundant-declaration (#143982) 2024-12-31 00:20:10 +00:00
python_dispatch.cpp
python_dispatch.h
python_numbers.h
python_raii.h
python_scalars.h add the torch.float8_e8m0fnu dtype to PyTorch (#147466) 2025-02-20 13:55:42 +00:00
python_strings.h [3/N] Replace c10::sv with std::sv (#139861) 2024-11-07 20:03:57 +00:00
python_stub.h
python_symnode.cpp
python_symnode.h Use Wextra-semi (#140236) 2024-11-13 02:15:16 +00:00
python_torch_function_mode.h
python_tuples.h [8/N] Fix extra warnings brought by clang-tidy-17 (#139151) 2024-10-30 14:20:08 +00:00
pythoncapi_compat.h
schema_info.cpp [1/N] Replace c10::sv with std::sv (#139453) 2024-11-01 05:39:37 +00:00
schema_info.h [1/N] Replace c10::sv with std::sv (#139453) 2024-11-01 05:39:37 +00:00
six.h
structseq.cpp
structseq.h
tensor_apply.cpp
tensor_apply.h
tensor_dtypes.cpp
tensor_dtypes.h
tensor_flatten.cpp
tensor_flatten.h
tensor_layouts.cpp
tensor_layouts.h
tensor_list.cpp
tensor_list.h
tensor_memoryformats.cpp
tensor_memoryformats.h
tensor_new.cpp Let tensor_a.new_tensor() be on tensor_a.device by default (#144958) 2025-01-24 22:12:31 +00:00
tensor_new.h Expose several APIs to public (torch python APIs) (#144525) 2025-01-15 14:34:45 +00:00
tensor_numpy.cpp [10/N] Fix extra warnings brought by clang-tidy-17 (#139385) 2024-11-04 00:47:19 +00:00
tensor_numpy.h
tensor_qschemes.cpp
tensor_qschemes.h
tensor_types.cpp [4/N] Remove unnecessary once flag usage (#146783) 2025-02-11 13:55:06 +00:00
tensor_types.h
throughput_benchmark-inl.h Fix Throughputbenchmark issue (#144669) 2025-01-26 03:37:20 +00:00
throughput_benchmark.cpp [19/N] Fix extra warnings brought by clang-tidy-17 (#144448) 2025-01-09 15:58:05 +00:00
throughput_benchmark.h Enable more readability-redundant checks (#143963) 2024-12-30 14:49:33 +00:00
torch_dispatch_mode.h [1/N] Apply bugprone-unchecked-optional-access (#140679) 2024-11-20 04:04:41 +00:00
variadic.cpp
variadic.h
verbose.cpp
verbose.h