pytorch/torch/testing/_internal
Aaron Orenstein 524fe784ec BundledAutotuneCache (take 2) (#137902)
Summary:
Add a cache that combines the individual autotune caches into a single cached bundle. We still rely on the individual autotune caches - on a bundle cache hit we copy the individual results into the local caches so they can be retrieved later.
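For illustration, the bundling idea can be sketched roughly as below. The class and method names here are hypothetical stand-ins, not the actual inductor classes: many per-kernel autotune results are written under one bundle key, and a bundle hit fans the entries back out into the local per-kernel cache.

```python
class LocalAutotuneCache:
    """Stand-in for the per-kernel local autotune cache."""
    def __init__(self):
        self._entries = {}

    def put(self, key, value):
        self._entries[key] = value

    def get(self, key):
        return self._entries.get(key)


class BundledAutotuneCache:
    """Stores many autotune results under a single bundle key."""
    def __init__(self, local_cache):
        self._bundles = {}
        self._local = local_cache

    def put_bundle(self, bundle_key, entries):
        # One remote write for the whole compilation instead of one
        # write per kernel.
        self._bundles[bundle_key] = dict(entries)

    def get_bundle(self, bundle_key):
        bundle = self._bundles.get(bundle_key)
        if bundle is not None:
            # On a hit, copy the individual results back into the local
            # cache so later per-kernel lookups still succeed.
            for key, value in bundle.items():
                self._local.put(key, value)
        return bundle
```

In this sketch a warm run does one `get_bundle` lookup and then serves every per-kernel query locally, which is the mechanism behind the hit/miss counts shown in the metrics below.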

Attempt 2 of #134959 (D60677499).

Various configs:
env: TORCHINDUCTOR_BUNDLED_AUTOTUNE_REMOTE_CACHE
config: bundled_autotune_remote_cache
jk: pytorch/remote_cache:bundled_autotune_remote_cache_version
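As one way to toggle the feature (a config fragment, assuming the env var above follows the usual inductor convention of a truthy value enabling it):

```shell
# Enable the bundled autotune remote cache for this run
export TORCHINDUCTOR_BUNDLED_AUTOTUNE_REMOTE_CACHE=1
```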

Test Plan:
unit tests

Manually tested w/ EMU:
```
cd fbcode/accelerators/workloads/models/emu_flash/v1p4
make build_benchmark_model && make save_model_to_path
make test_pt2_latency
```

- on a cold run we got 0 hits and 40 misses; on a warm run we got 40 hits and 0 misses.
- perf seems a little better - for 8 runs:
  - no bundled cache averaged 14m11s
  - bundled cache averaged 14m6s
  - 125ms saved per cache entry seems reasonable
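The 125ms figure above is just the averaged run-time delta spread over the 40 cache entries:

```python
# 8-run averages from the measurements above, in seconds
no_bundle = 14 * 60 + 11   # 14m11s without the bundled cache
bundled = 14 * 60 + 6      # 14m6s with the bundled cache
entries = 40               # cache entries per warm run

saved_per_entry_ms = (no_bundle - bundled) / entries * 1000
print(saved_per_entry_ms)  # 125.0
```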

Cache Metrics for a sample run:
no bundled cache:
```
INFO: Cache Metrics:
  FbMemcacheRemoteKernelCache: {hit: 2256, miss: 0, put: 0, exception: 0}
  FbRemoteAutotuneCache: {hit: 0, miss: 0, put: 7, exception: 0}
  FbRemoteFxGraphCache: {hit: 40, miss: 0, put: 0, exception: 0}
  LocalAutotuneCache: {hit: 878, miss: 0, put: 7, exception: 0}
  backend:MemcacheCache: {hit: 2256, miss: 0, put: 7, exception: 0}
  backend:_LocalAutotuneCacheBackend: {hit: 878, miss: 0, put: 7, exception: 0}
  backend:_ManifoldCache: {hit: 40, miss: 0, put: 0, exception: 0}
```
bundled cache:
```
INFO: Cache Metrics:
  FbMemcacheRemoteKernelCache: {hit: 2258, miss: 0, put: 0, exception: 0}
  FbRemoteAutotuneCache: {hit: 0, miss: 0, put: 8, exception: 0}
  FbRemoteBundledAutotuneCache: {hit: 40, miss: 0, put: 0, exception: 0} <<<<<<
  FbRemoteFxGraphCache: {hit: 40, miss: 0, put: 0, exception: 0}
  LocalAutotuneCache: {hit: 878, miss: 0, put: 886, exception: 0}
  backend:MemcacheCache: {hit: 2258, miss: 0, put: 8, exception: 0}
  backend:_LocalAutotuneCacheBackend: {hit: 878, miss: 0, put: 886, exception: 0}
  backend:_ManifoldCache: {hit: 80, miss: 0, put: 0, exception: 0}
```

Differential Revision: D64336043

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137902
Approved by: https://github.com/oulgen
2024-10-15 18:39:47 +00:00
codegen
data
distributed Fixed error string assertion in test_invalid_devices (#137772) 2024-10-13 18:10:07 +00:00
generated
opinfo [CPU] Expand torch.special.i1 to Half and BF16 (#137899) 2024-10-15 17:00:58 +00:00
optests [aotd] Fix rrelu compilation (#136008) 2024-09-25 11:26:19 +00:00
test_module
__init__.py
autocast_test_lists.py Add _addmm_activation to lower precision cast policy on AutocastCPU (#135936) 2024-09-18 16:31:27 +00:00
autograd_function_db.py
check_kernel_launches.py
common_cuda.py BundledAutotuneCache (take 2) (#137902) 2024-10-15 18:39:47 +00:00
common_device_type.py [Inductor UT] Generalize newly introduced inductor UTs for intel GPU (Part 3) (#136947) 2024-10-12 13:21:20 +00:00
common_dist_composable.py
common_distributed.py Revert "[Distributed] Fix extra context on device 0 (#135273)" 2024-10-10 23:47:25 +00:00
common_dtype.py [redo] Fp8 support for item() with cuda, index_select, and fill_ cpu (#137341) 2024-10-07 00:58:51 +00:00
common_fsdp.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00
common_jit.py
common_methods_invocations.py [CPU] Expand torch.special.i1 to Half and BF16 (#137899) 2024-10-15 17:00:58 +00:00
common_mkldnn.py
common_modules.py Revert "Validate input types for torch.nn.Linear and torch.nn.Bilinear (#135596)" 2024-09-13 18:06:56 +00:00
common_nn.py
common_optimizers.py Add Support for Tracking Parameter Names (named_parameters) in Optimizer State Dict (#134107) 2024-10-14 19:24:44 +00:00
common_pruning.py
common_quantization.py Change to export_for_training in XNNPACK tests (#137238) 2024-10-03 21:28:05 +00:00
common_quantized.py
common_subclass.py Fix wrapper subclass serialization with custom sizes / strides (#137030) 2024-10-02 18:55:03 +00:00
common_utils.py Unify cpp_extension build directory removal (#136059) 2024-10-03 06:22:11 +00:00
composite_compliance.py Ensure noncontiguous tensor creation tests offsetting (#136396) 2024-10-02 00:40:43 +00:00
custom_op_db.py
custom_tensor.py
dist_utils.py
dynamo_test_failures.py
hop_db.py [FlexAttention] Add Better error message for cpu tensors (#136673) 2024-09-26 16:40:21 +00:00
hypothesis_utils.py
inductor_utils.py [Inductor UT] Generalize newly introduced inductor UTs for intel GPU (Part 3) (#136947) 2024-10-12 13:21:20 +00:00
jit_metaprogramming_utils.py
jit_utils.py
logging_tensor.py
logging_utils.py
quantization_torch_package_models.py
static_module.py
torchbind_impls.py
triton_utils.py
two_tensor.py