pytorch/torch
angelayi cbf274d4a7 [aoti] Add packaging solution (#129895)
In this PR, I added support for packaging the AOTI-generated files into a zipfile and for loading that zipfile in Python.

`compile_so` takes the path to the package, a device, and a desired .so output location; it compiles the package into a .so and saves it to that location.
`load_package` takes the path to the package and a device, calls `_extract_so`, and then creates a callable that runs the compiled model.
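For illustration, here is a minimal sketch of how the two entry points might be called, based only on the description above (the exact import path and keyword names are assumptions, not necessarily the final API):

```
from torch._inductor.package import compile_so, load_package

# Hypothetical usage based on the description above; argument names are illustrative.
# Compile the packaged artifacts into a shared library at a chosen location.
so_path = compile_so("my_path.pt2", device="cuda", so_path="my_model.so")

# Load the package and get back a callable that runs the compiled model.
compiled_model = load_package("my_path.pt2", device="cuda")
```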

The zipfile generated looks like the following:
```
|- version
|- archive_format
|- data
   |- aotinductor
      |- cbtnafqaqrhvwztv7xudlal4xs6sofxa5oxccyuaqtrt6aozaklx.cubin  # CUDA cubin files generated by AOTI
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe.cpp  # AOTI generated cpp file
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe_compile_flags  # Flags for compiling the .o
      |- c6qqtnpgwfi3dv5nb76ai773kt45ezoxfwdmd7q37lvq6fs2tnoi.o  # AOTI saved const.o
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe_linker_flags  # Flags for linking the files to form the .so
   |- constants
      |- constants.pt  # Constants saved using torch.save, can be loaded using mmap
```
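Because the package is an ordinary zip archive, its contents can be inspected with Python's standard `zipfile` module (illustrative snippet; `my_path.pt2` refers to the output path used in the workflow below):

```
import zipfile

# Illustrative: list the entries of a generated package to check the layout above.
with zipfile.ZipFile("my_path.pt2") as archive:
    for name in archive.namelist():
        print(name)
```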

The workflow is something like:
```
with torch.no_grad():
    ep = torch.export.export(
        model,
        example_inputs,
        dynamic_shapes=dynamic_shapes,
        strict=False,
    )
    gm = ep.module()
    package_path = torch._inductor.aot_compile(
        gm,
        example_inputs,
        options={
            "aot_inductor.output_path": "my_path.pt2",  # or a directory
            "aot_inductor.package": True,
        },
    )
compiled_model = torch._inductor.package.load_package(package_path, device)
return compiled_model
```
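The returned callable can then be invoked like the original model, e.g. (illustrative, reusing the export-time example inputs):

```
with torch.no_grad():
    # Run the loaded, compiled model on the same example inputs used for export.
    outputs = compiled_model(*example_inputs)
```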

I tried turning on mmap-based loading of the weights by default, but ran into some issues with it, so that is left as a TODO.
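For context, a sketch of what mmap-based loading of the saved constants could look like once `constants.pt` is extracted from the package, using `torch.load`'s `mmap` flag (this is just the intended follow-up, not what this PR does):

```
import torch

# Illustrative only: constants.pt is written with torch.save, so once extracted
# from the package it can in principle be memory-mapped on load.
constants = torch.load("constants.pt", mmap=True, weights_only=True)
```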

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129895
Approved by: https://github.com/malfet
2024-07-17 13:56:58 +00:00
_awaits
_C Invalidate StorageImpl instances when tensor is overwritten with cudagraphs (#125264) 2024-07-16 14:29:29 +00:00
_C_flatbuffer
_custom_op Revert "Tighten torch.library.infer_schema input types (#130705)" 2024-07-16 12:57:11 +00:00
_decomp Revert "Add decompositions for copy variants of view ops (#128416)" 2024-07-11 22:09:23 +00:00
_dispatch
_dynamo [3.13, dynamo] support TO_BOOL (#130565) 2024-07-17 09:47:58 +00:00
_export [Fix]: Convert operator that does specialization to its symbolic counterpart (#129578) 2024-07-16 17:19:57 +00:00
_functorch Propagate buffer and parameter indices through AOT (#130393) 2024-07-16 22:12:38 +00:00
_higher_order_ops Revert "Renamed mask_fn to mask_mod (#130818)" 2024-07-17 13:47:08 +00:00
_inductor [aoti] Add packaging solution (#129895) 2024-07-17 13:56:58 +00:00
_lazy [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199) 2024-07-11 17:30:28 +00:00
_library Revert "Tighten torch.library.infer_schema input types (#130705)" 2024-07-16 12:57:11 +00:00
_logging On advice of James March, log pid instead of tid (#130679) 2024-07-17 02:00:10 +00:00
_numpy Make hashing a SymInt raise an error again (#130548) 2024-07-16 18:30:30 +00:00
_prims Revert "Add decompositions for copy variants of view ops (#128416)" 2024-07-11 22:09:23 +00:00
_prims_common [BE][Easy] fix ruff rule needless-bool (SIM103) (#130206) 2024-07-14 08:17:52 +00:00
_refs [BE][Easy] fix ruff rule needless-bool (SIM103) (#130206) 2024-07-14 08:17:52 +00:00
_strobelight
_subclasses [BE][Easy] fix ruff rule needless-bool (SIM103) (#130206) 2024-07-14 08:17:52 +00:00
_vendor
amp Revert "[MPS] Add support for autocast in MPS (#99272)" 2024-07-02 12:29:51 +00:00
ao Rename generate_numeric_debug_handle to numeric_debugger (#130590) 2024-07-15 22:42:27 +00:00
autograd [autograd] Support GradientEdge as output for torch.autograd.grad (#127766) 2024-07-16 21:46:19 +00:00
backends [cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343) 2024-06-30 19:22:16 +00:00
compiler
contrib
cpu [inductor][cpp] BF16 AMX micro-gemm support (#127195) 2024-06-21 07:21:47 +00:00
csrc [pytorch][counters] WaitCounter cleanup (#130664) 2024-07-17 04:42:35 +00:00
cuda [ROCm] Return correct AMDSMI socket_power metric (#130331) 2024-07-17 01:58:58 +00:00
distributed Revert "[PT-D] Relaxed contract to allow Sequence[nn.Module] (#127773)" 2024-07-16 23:48:09 +00:00
distributions [BE]: Update mypy to 1.10.0 (#127717) 2024-06-13 15:57:13 +00:00
export [export] add non-strict training IR (#130062) 2024-07-16 17:08:00 +00:00
fft
func
futures
fx [Export] Support aten.full.default and aten.full_like.default (#130639) 2024-07-16 16:50:04 +00:00
jit [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199) 2024-07-11 17:30:28 +00:00
legacy
lib
linalg
masked [BE] update type annotations for basic utilities in torch/__init__.py (#129001) 2024-06-24 18:04:38 +00:00
monitor
mps Add support in Python API for the recommended max working set size. (#128289) 2024-06-12 16:03:57 +00:00
mtia [MTIA] Fix synchronize API (#128714) 2024-06-17 21:58:46 +00:00
multiprocessing Enable sharing meta tensors between processes (#129520) 2024-07-04 20:29:48 +00:00
nested [NJT] throw an exception if nested_tensor_from_jagged is fx-traced without being fx.wrapped (#130702) 2024-07-16 19:21:10 +00:00
nn Revert "Renamed mask_fn to mask_mod (#130818)" 2024-07-17 13:47:08 +00:00
onnx Revert "[ONNX] Remove beartype usage (#130484)" 2024-07-16 18:41:51 +00:00
optim fix the use of initial learning rate in the OneCycleLR example (#130306) 2024-07-09 18:58:07 +00:00
package [BE] enforce style for empty lines in import segments (#129751) 2024-06-29 14:15:24 +00:00
profiler [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199) 2024-07-11 17:30:28 +00:00
quantization
signal
sparse Enable UFMT on all of torch/sparse (#130545) 2024-07-15 22:35:52 +00:00
special
testing Revert "Use inductor TestCase for distributed tests (#129494)" 2024-07-17 00:32:48 +00:00
utils Keep zero check be compatible with different sympy versions (#130729) 2024-07-16 08:39:00 +00:00
xpu
__config__.py
__future__.py
__init__.py Make hashing a SymInt raise an error again (#130548) 2024-07-16 18:30:30 +00:00
_appdirs.py
_classes.py
_compile.py
_custom_ops.py Revert "Tighten torch.library.infer_schema input types (#130705)" 2024-07-16 12:57:11 +00:00
_deploy.py
_guards.py [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199) 2024-07-11 17:30:28 +00:00
_jit_internal.py [torchscript] Add logging for model id. (#130118) 2024-07-09 22:24:16 +00:00
_linalg_utils.py [BE] enable UFMT for torch/nn/*.py (#128593) 2024-06-23 16:05:13 +00:00
_lobpcg.py [BE] enable UFMT for top-level files torch/*.py (#127707) 2024-06-12 20:15:05 +00:00
_lowrank.py [BE] enable UFMT for torch/nn/*.py (#128593) 2024-06-23 16:05:13 +00:00
_meta_registrations.py Update error message in meta__convert_weight_to_int4pack (#130707) 2024-07-16 00:44:35 +00:00
_namedtensor_internals.py
_ops.py [HOP] add HOP x torch_dispatch interaction (#130606) 2024-07-12 21:51:36 +00:00
_python_dispatcher.py
_size_docs.py
_sources.py
_storage_docs.py
_streambase.py
_tensor_docs.py
_tensor_str.py fix tensor print behavior for XPU (#130523) 2024-07-17 02:03:32 +00:00
_tensor.py [easy] Small rendering fix in Tensor.module_load doc (#130489) 2024-07-12 22:12:53 +00:00
_torch_docs.py Introduce the concept of Accelerators to PyTorch doc (#129363) 2024-07-15 14:24:46 +00:00
_utils_internal.py [torchscript] Add logging for model id. (#130118) 2024-07-09 22:24:16 +00:00
_utils.py Remove dependency on private _compat_pickle in CPython (#129509) 2024-06-26 14:20:27 +00:00
_VF.py
_vmap_internals.py [BE] enable UFMT for torch/nn/*.py (#128593) 2024-06-23 16:05:13 +00:00
_weights_only_unpickler.py Add torch.serialization.safe_globals context manager (#127939) 2024-07-12 20:38:43 +00:00
abi-check.cpp
CMakeLists.txt [BE] [CMake] Remove AT_CORE_STATIC_WINDOWS option (#130409) 2024-07-10 15:50:47 +00:00
custom_class_detail.h [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301) 2024-07-08 07:03:53 +00:00
custom_class.h [2/N] Fix some violations of unused-function and unused-variable checks in torch_cpu (#129878) 2024-07-04 00:39:28 +00:00
extension.h
functional.py [BE] enable UFMT for torch/nn/*.py (#128593) 2024-06-23 16:05:13 +00:00
hub.py [BE] enable UFMT for torch/nn/*.py (#128593) 2024-06-23 16:05:13 +00:00
library.h [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301) 2024-07-08 07:03:53 +00:00
library.py [custom_ops] expose torch.library.register_torch_dispatch (#130261) 2024-07-12 14:13:01 +00:00
overrides.py [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199) 2024-07-11 17:30:28 +00:00
py.typed
quasirandom.py [BE] enable UFMT for top-level files torch/*.py (#127707) 2024-06-12 20:15:05 +00:00
random.py [BE] enable UFMT for top-level files torch/*.py (#127707) 2024-06-12 20:15:05 +00:00
README.txt
return_types.py [BE] enable UFMT for top-level files torch/*.py (#127707) 2024-06-12 20:15:05 +00:00
script.h
serialization.py Add torch.serialization.safe_globals context manager (#127939) 2024-07-12 20:38:43 +00:00
storage.py typing: storage (#130669) 2024-07-16 14:31:35 +00:00
torch_version.py [BE] enable UFMT for top-level files torch/*.py (#127707) 2024-06-12 20:15:05 +00:00
types.py typing fake_tensor.py (#128041) 2024-07-13 06:07:40 +00:00
version.py.tpl

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.