Summary: Add a cache to combine individual autotune caches into a single cached bundle. We still rely on the individual autotune caches - on a cache hit we copy the individual results into the local caches so they can be retrieved later. Attempt 2 of #134959 (D60677499).

Various configs:
- env: TORCHINDUCTOR_BUNDLED_AUTOTUNE_REMOTE_CACHE
- config: bundled_autotune_remote_cache
- jk: pytorch/remote_cache:bundled_autotune_remote_cache_version

Test Plan: unit tests

Manually tested w/ EMU:
```
cd fbcode/accelerators/workloads/models/emu_flash/v1p4
make build_benchmark_model && make save_model_to_path
make test_pt2_latency
```

- On a cold run we got 0 hits and 40 misses. On a warm run we got 40 hits and 0 misses.
- Perf seems a little better - for 8 runs:
  - no bundled cache averaged 14m11s
  - bundled cache averaged 14m6s
  - 125ms saved per cache entry seems reasonable

Cache Metrics for a sample run:

no bundled cache:
```
INFO: Cache Metrics:
FbMemcacheRemoteKernelCache: {hit: 2256, miss: 0, put: 0, exception: 0}
FbRemoteAutotuneCache: {hit: 0, miss: 0, put: 7, exception: 0}
FbRemoteFxGraphCache: {hit: 40, miss: 0, put: 0, exception: 0}
LocalAutotuneCache: {hit: 878, miss: 0, put: 7, exception: 0}
backend:MemcacheCache: {hit: 2256, miss: 0, put: 7, exception: 0}
backend:_LocalAutotuneCacheBackend: {hit: 878, miss: 0, put: 7, exception: 0}
backend:_ManifoldCache: {hit: 40, miss: 0, put: 0, exception: 0}
```

bundled cache:
```
INFO: Cache Metrics:
FbMemcacheRemoteKernelCache: {hit: 2258, miss: 0, put: 0, exception: 0}
FbRemoteAutotuneCache: {hit: 0, miss: 0, put: 8, exception: 0}
FbRemoteBundledAutotuneCache: {hit: 40, miss: 0, put: 0, exception: 0} <<<<<<
FbRemoteFxGraphCache: {hit: 40, miss: 0, put: 0, exception: 0}
LocalAutotuneCache: {hit: 878, miss: 0, put: 886, exception: 0}
backend:MemcacheCache: {hit: 2258, miss: 0, put: 8, exception: 0}
backend:_LocalAutotuneCacheBackend: {hit: 878, miss: 0, put: 886, exception: 0}
backend:_ManifoldCache: {hit: 80, miss: 0, put: 0, exception: 0}
```

Differential Revision: D64336043

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137902

Approved by: https://github.com/oulgen
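For reference, a minimal sketch of how one might opt in to the bundled cache using the knobs listed above. The env var TORCHINDUCTOR_BUNDLED_AUTOTUNE_REMOTE_CACHE and the config name bundled_autotune_remote_cache come from this commit; the exact import path (`torch._inductor.config`), the use of a max-autotune compile, and the example function are assumptions for illustration, not part of the diff.

```python
# A minimal sketch, assuming a PyTorch build that includes this change.
import os

# Option 1: opt in via the environment. Set this before importing torch,
# since inductor reads env-backed config at import time (assumption).
os.environ["TORCHINDUCTOR_BUNDLED_AUTOTUNE_REMOTE_CACHE"] = "1"

import torch
import torch._inductor.config as inductor_config  # import path is an assumption

# Option 2: flip the config knob directly before compiling.
inductor_config.bundled_autotune_remote_cache = True

# Autotuning is what populates these caches, so compile with max-autotune.
@torch.compile(mode="max-autotune")
def f(x):
    return (x @ x.T).relu()

f(torch.randn(64, 64, device="cuda"))  # requires a GPU for Triton autotuning
# On a warm run, the per-kernel autotune results come back as one bundled
# remote entry and are copied into the local caches, per the Summary above.
```

Note how the warm-run metrics read: the 40 FbRemoteBundledAutotuneCache hits match the 40 FbRemoteFxGraphCache hits, suggesting one bundle per compiled graph, while LocalAutotuneCache shows 886 puts as the bundles are unpacked back into the individual local caches.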
| Name |
|---|
| codegen |
| data |
| distributed |
| generated |
| opinfo |
| optests |
| test_module |
| __init__.py |
| autocast_test_lists.py |
| autograd_function_db.py |
| check_kernel_launches.py |
| common_cuda.py |
| common_device_type.py |
| common_dist_composable.py |
| common_distributed.py |
| common_dtype.py |
| common_fsdp.py |
| common_jit.py |
| common_methods_invocations.py |
| common_mkldnn.py |
| common_modules.py |
| common_nn.py |
| common_optimizers.py |
| common_pruning.py |
| common_quantization.py |
| common_quantized.py |
| common_subclass.py |
| common_utils.py |
| composite_compliance.py |
| custom_op_db.py |
| custom_tensor.py |
| dist_utils.py |
| dynamo_test_failures.py |
| hop_db.py |
| hypothesis_utils.py |
| inductor_utils.py |
| jit_metaprogramming_utils.py |
| jit_utils.py |
| logging_tensor.py |
| logging_utils.py |
| quantization_torch_package_models.py |
| static_module.py |
| torchbind_impls.py |
| triton_utils.py |
| two_tensor.py |