pytorch/torch
angelayi cbf274d4a7 [aoti] Add packaging solution (#129895)
In this PR, I added support for packaging the AOTI-generated files into a zipfile and for loading that zipfile in Python.

`compile_so` takes the path to the package, a device, and a desired .so output location; it compiles the package into a .so and saves it to that location.
`load_package` takes the path to the package and a device, calls `_extract_so`, and then creates a callable that runs the compiled model.
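For illustration, here is a minimal sketch of how the two entry points might be called, based only on the description above (the exact import path and keyword names are assumptions, not necessarily the final API):

```
from torch._inductor.package import compile_so, load_package

# Hypothetical usage based on the description above; argument names are illustrative.
# Compile the packaged artifacts into a shared library at a chosen location.
so_path = compile_so("my_path.pt2", device="cuda", so_path="my_model.so")

# Load the package and get back a callable that runs the compiled model.
compiled_model = load_package("my_path.pt2", device="cuda")
```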

The zipfile generated looks like the following:
```
|- version
|- archive_format
|- data
   |- aotinductor
      |- cbtnafqaqrhvwztv7xudlal4xs6sofxa5oxccyuaqtrt6aozaklx.cubin  # CUDA cubin files generated by AOTI
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe.cpp  # AOTI generated cpp file
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe_compile_flags  # Flags for compiling the .o
      |- c6qqtnpgwfi3dv5nb76ai773kt45ezoxfwdmd7q37lvq6fs2tnoi.o  # AOTI saved const.o
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe_linker_flags  # Flags for linking the files to form the .so
   |- constants
      |- constants.pt  # Constants saved using torch.save, can be loaded using mmap
```
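Because the package is an ordinary zip archive, its contents can be inspected with Python's standard `zipfile` module (illustrative snippet; `my_path.pt2` refers to the output path used in the workflow below):

```
import zipfile

# Illustrative: list the entries of a generated package to check the layout above.
with zipfile.ZipFile("my_path.pt2") as archive:
    for name in archive.namelist():
        print(name)
```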

The workflow is something like:
```
with torch.no_grad():
    ep = torch.export.export(
        model,
        example_inputs,
        dynamic_shapes=dynamic_shapes,
        strict=False,
    )
    gm = ep.module()
    package_path = torch._inductor.aot_compile(
        gm,
        example_inputs,
        options={
            "aot_inductor.output_path": "my_path.pt2",  # or a directory
            "aot_inductor.package": True,
        },
    )
compiled_model = torch._inductor.package.load_package(package_path, device)
return compiled_model
```
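The returned callable can then be invoked like the original model, e.g. (illustrative, reusing the export-time example inputs):

```
with torch.no_grad():
    # Run the loaded, compiled model on the same example inputs used for export.
    outputs = compiled_model(*example_inputs)
```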

I tried turning on mmap-based loading of the weights by default, but ran into some issues with it, so that is left as a TODO.
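For context, a sketch of what mmap-based loading of the saved constants could look like once `constants.pt` is extracted from the package, using `torch.load`'s `mmap` flag (this is just the intended follow-up, not what this PR does):

```
import torch

# Illustrative only: constants.pt is written with torch.save, so once extracted
# from the package it can in principle be memory-mapped on load.
constants = torch.load("constants.pt", mmap=True, weights_only=True)
```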

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129895
Approved by: https://github.com/malfet
2024-07-17 13:56:58 +00:00
_awaits
_C Invalidate StorageImpl instances when tensor is overwritten with cudagraphs (#125264) 2024-07-16 14:29:29 +00:00
_C_flatbuffer
_custom_op Revert "Tighten torch.library.infer_schema input types (#130705)" 2024-07-16 12:57:11 +00:00
_decomp Revert "Add decompositions for copy variants of view ops (#128416)" 2024-07-11 22:09:23 +00:00
_dispatch
_dynamo [3.13, dynamo] support TO_BOOL (#130565) 2024-07-17 09:47:58 +00:00
_export [Fix]: Convert operator that does specialization to its symbolic counterpart (#129578) 2024-07-16 17:19:57 +00:00
_functorch Propagate buffer and parameter indices through AOT (#130393) 2024-07-16 22:12:38 +00:00
_higher_order_ops Revert "Renamed mask_fn to mask_mod (#130818)" 2024-07-17 13:47:08 +00:00
_inductor [aoti] Add packaging solution (#129895) 2024-07-17 13:56:58 +00:00
_lazy [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199) 2024-07-11 17:30:28 +00:00
_library Revert "Tighten torch.library.infer_schema input types (#130705)" 2024-07-16 12:57:11 +00:00
_logging On advice of James March, log pid instead of tid (#130679) 2024-07-17 02:00:10 +00:00
_numpy Make hashing a SymInt raise an error again (#130548) 2024-07-16 18:30:30 +00:00
_prims Revert "Add decompositions for copy variants of view ops (#128416)" 2024-07-11 22:09:23 +00:00
_prims_common [BE][Easy] fix ruff rule needless-bool (SIM103) (#130206) 2024-07-14 08:17:52 +00:00
_refs [BE][Easy] fix ruff rule needless-bool (SIM103) (#130206) 2024-07-14 08:17:52 +00:00
_strobelight
_subclasses [BE][Easy] fix ruff rule needless-bool (SIM103) (#130206) 2024-07-14 08:17:52 +00:00
_vendor
amp Revert "[MPS] Add support for autocast in MPS (#99272)" 2024-07-02 12:29:51 +00:00
ao Rename generate_numeric_debug_handle to numeric_debugger (#130590) 2024-07-15 22:42:27 +00:00
autograd [autograd] Support GradientEdge as output for torch.autograd.grad (#127766) 2024-07-16 21:46:19 +00:00
backends [cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343) 2024-06-30 19:22:16 +00:00
compiler
contrib
cpu [inductor][cpp] BF16 AMX micro-gemm support (#127195) 2024-06-21 07:21:47 +00:00
csrc [pytorch][counters] WaitCounter cleanup (#130664) 2024-07-17 04:42:35 +00:00
cuda [ROCm] Return correct AMDSMI socket_power metric (#130331) 2024-07-17 01:58:58 +00:00
distributed Revert "[PT-D] Relaxed contract to allow Sequence[nn.Module] (#127773)" 2024-07-16 23:48:09 +00:00
distributions [BE]: Update mypy to 1.10.0 (#127717) 2024-06-13 15:57:13 +00:00
export [export] add non-strict training IR (#130062) 2024-07-16 17:08:00 +00:00
fft
func
futures
fx [Export] Support aten.full.default and aten.full_like.default (#130639) 2024-07-16 16:50:04 +00:00
jit [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199) 2024-07-11 17:30:28 +00:00
legacy
lib
linalg
masked [BE] update type annotations for basic utilities in torch/__init__.py (#129001) 2024-06-24 18:04:38 +00:00
monitor
mps Add support in Python API for the recommended max working set size. (#128289) 2024-06-12 16:03:57 +00:00
mtia [MTIA] Fix synchronize API (#128714) 2024-06-17 21:58:46 +00:00
multiprocessing Enable sharing meta tensors between processes (#129520) 2024-07-04 20:29:48 +00:00
nested [NJT] throw an exception if nested_tensor_from_jagged is fx-traced without being fx.wrapped (#130702) 2024-07-16 19:21:10 +00:00
nn Revert "Renamed mask_fn to mask_mod (#130818)" 2024-07-17 13:47:08 +00:00
onnx Revert "[ONNX] Remove beartype usage (#130484)" 2024-07-16 18:41:51 +00:00
optim fix the use of initial learning rate in the OneCycleLR example (#130306) 2024-07-09 18:58:07 +00:00
package [BE] enforce style for empty lines in import segments (#129751) 2024-06-29 14:15:24 +00:00
profiler [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199) 2024-07-11 17:30:28 +00:00
quantization
signal
sparse Enable UFMT on all of torch/sparse (#130545) 2024-07-15 22:35:52 +00:00
special
testing Revert "Use inductor TestCase for distributed tests (#129494)" 2024-07-17 00:32:48 +00:00
utils Keep zero check be compatible with different sympy versions (#130729) 2024-07-16 08:39:00 +00:00
xpu
__config__.py
__future__.py
__init__.py Make hashing a SymInt raise an error again (#130548) 2024-07-16 18:30:30 +00:00
_appdirs.py
_classes.py
_compile.py
_custom_ops.py Revert "Tighten torch.library.infer_schema input types (#130705)" 2024-07-16 12:57:11 +00:00
_deploy.py
_guards.py [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199) 2024-07-11 17:30:28 +00:00
_jit_internal.py [torchscript] Add logging for model id. (#130118) 2024-07-09 22:24:16 +00:00
_linalg_utils.py [BE] enable UFMT for torch/nn/*.py (#128593) 2024-06-23 16:05:13 +00:00
_lobpcg.py [BE] enable UFMT for top-level files torch/*.py (#127707) 2024-06-12 20:15:05 +00:00
_lowrank.py [BE] enable UFMT for torch/nn/*.py (#128593) 2024-06-23 16:05:13 +00:00
_meta_registrations.py Update error message in meta__convert_weight_to_int4pack (#130707) 2024-07-16 00:44:35 +00:00
_namedtensor_internals.py
_ops.py [HOP] add HOP x torch_dispatch interaction (#130606) 2024-07-12 21:51:36 +00:00
_python_dispatcher.py
_size_docs.py
_sources.py
_storage_docs.py
_streambase.py
_tensor_docs.py
_tensor_str.py fix tensor print behavior for XPU (#130523) 2024-07-17 02:03:32 +00:00
_tensor.py [easy] Small rendering fix in Tensor.module_load doc (#130489) 2024-07-12 22:12:53 +00:00
_torch_docs.py Introduce the concept of Accelerators to PyTorch doc (#129363) 2024-07-15 14:24:46 +00:00
_utils_internal.py [torchscript] Add logging for model id. (#130118) 2024-07-09 22:24:16 +00:00
_utils.py Remove dependency on private _compat_pickle in CPython (#129509) 2024-06-26 14:20:27 +00:00
_VF.py
_vmap_internals.py [BE] enable UFMT for torch/nn/*.py (#128593) 2024-06-23 16:05:13 +00:00
_weights_only_unpickler.py Add torch.serialization.safe_globals context manager (#127939) 2024-07-12 20:38:43 +00:00
abi-check.cpp
CMakeLists.txt [BE] [CMake] Remove AT_CORE_STATIC_WINDOWS option (#130409) 2024-07-10 15:50:47 +00:00
custom_class_detail.h [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301) 2024-07-08 07:03:53 +00:00
custom_class.h [2/N] Fix some violations of unused-function and unused-variable checks in torch_cpu (#129878) 2024-07-04 00:39:28 +00:00
extension.h
functional.py [BE] enable UFMT for torch/nn/*.py (#128593) 2024-06-23 16:05:13 +00:00
hub.py [BE] enable UFMT for torch/nn/*.py (#128593) 2024-06-23 16:05:13 +00:00
library.h [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301) 2024-07-08 07:03:53 +00:00
library.py [custom_ops] expose torch.library.register_torch_dispatch (#130261) 2024-07-12 14:13:01 +00:00
overrides.py [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199) 2024-07-11 17:30:28 +00:00
py.typed
quasirandom.py [BE] enable UFMT for top-level files torch/*.py (#127707) 2024-06-12 20:15:05 +00:00
random.py [BE] enable UFMT for top-level files torch/*.py (#127707) 2024-06-12 20:15:05 +00:00
README.txt
return_types.py [BE] enable UFMT for top-level files torch/*.py (#127707) 2024-06-12 20:15:05 +00:00
script.h
serialization.py Add torch.serialization.safe_globals context manager (#127939) 2024-07-12 20:38:43 +00:00
storage.py typing: storage (#130669) 2024-07-16 14:31:35 +00:00
torch_version.py [BE] enable UFMT for top-level files torch/*.py (#127707) 2024-06-12 20:15:05 +00:00
types.py typing fake_tensor.py (#128041) 2024-07-13 06:07:40 +00:00
version.py.tpl

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.