pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Mikayla Gawarecki 8e483654cb Add config.save.use_pinned_memory_for_d2h to serialization config (#143342)

This was benchmarked with two separate scripts on my A100:

(A) Save state_dict of llama3-style model on CUDA to disk with ``torch.save``
(B) Save `ModuleList` of 10 `nn.Linear(10000, 10000)` on CUDA to disk with `torch.save`

Timings are an average of 5 runs, and the benchmark scripts and results are attached.

Under both scenarios, we see a ~2x speedup in ``torch.save`` time with (``compute_crc32=False`` and ``use_pinned_memory_for_d2h=True``) compared to the baseline of the current defaults (``compute_crc32=True`` and ``use_pinned_memory_for_d2h=False``).

(A) Save state_dict of llama3-style model on CUDA to disk with ``torch.save`` [[script](https://gist.github.com/mikaylagawarecki/d3a86ea1bb08045d1a839976808d7432)][[results](https://gist.github.com/mikaylagawarecki/f61a4714e5cff703146a1fcb7e0c755c)]

| | use_pinned_memory_for_d2h=False (Default) | use_pinned_memory_for_d2h=True |
|-|-|-|
| `compute_crc32=True` (Default) | 28.54s | 20.76s |
| `compute_crc32=False` | 22.57s | 14.51s |

(B) Save `ModuleList` of 10 `nn.Linear(10000, 10000)` on CUDA to disk with `torch.save` [[script](https://gist.github.com/mikaylagawarecki/ecbc505436bdd4b5190ef1b3430c12b6)][[results](https://gist.github.com/mikaylagawarecki/4e686bcf030b57de8c3ca74d8f5a88f7)]

| | use_pinned_memory_for_d2h=False (Default) | use_pinned_memory_for_d2h=True |
|-|-|-|
| `compute_crc32=True` (Default) | 8.38s | 5.53s |
| `compute_crc32=False` | 6.94s | 3.99s |

Trace of (A) with `use_pinned_memory_for_d2h=True`, `compute_crc32=False`

![Screenshot 2024-12-16 at 7 32 33 PM](https://github.com/user-attachments/assets/80b87a8c-5a70-4eb9-ad66-7abc4aa7cc25)

Baseline trace of (A) with `use_pinned_memory_for_d2h=False`, `compute_crc32=True`

![Screenshot 2024-12-16 at 7 38 20 PM](https://github.com/user-attachments/assets/13fa12d1-8f5f-424c-adc4-275b67012927)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143342
Approved by: https://github.com/albanD
ghstack dependencies: #143324

2024-12-20 21:01:18 +00:00
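
As a quick check of the "~2x" figure in the commit message, the speedups can be recomputed from the two benchmark tables. This is a minimal sketch: the dictionary layout and the `speedup` helper are illustrative names, not part of the PR, and the timings are copied from the tables as reported.

```python
# Timings in seconds, copied from the benchmark tables in the commit message.
# Inner keys are (compute_crc32, use_pinned_memory_for_d2h).
timings = {
    "A": {  # llama3-style state_dict
        (True, False): 28.54,   # current defaults (baseline)
        (True, True): 20.76,
        (False, False): 22.57,
        (False, True): 14.51,
    },
    "B": {  # ModuleList of 10 nn.Linear layers
        (True, False): 8.38,    # current defaults (baseline)
        (True, True): 5.53,
        (False, False): 6.94,
        (False, True): 3.99,
    },
}

BASELINE = (True, False)


def speedup(table, config, baseline=BASELINE):
    """Return how many times faster `config` is than `baseline`."""
    return table[baseline] / table[config]


for name, table in timings.items():
    for config in table:
        if config == BASELINE:
            continue
        crc32, pinned = config
        print(
            f"({name}) compute_crc32={crc32}, "
            f"use_pinned_memory_for_d2h={pinned}: "
            f"{speedup(table, config):.2f}x"
        )
```

With both settings changed, the ratio is about 1.97x for (A) and 2.10x for (B); enabling pinned memory alone, with CRC32 still computed, gives about 1.37x and 1.52x.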
..
amp_examples.rst	Update document for autocast on CPU (#135299)	2024-09-13 09:11:47 +00:00
autograd.rst	Fix unexpected inference_mode interaction with torch.autograd.functional.jacobian (#130307)	2024-08-25 22:14:02 +00:00
broadcasting.rst
cpu_threading_runtimes.svg
cpu_threading_torchscript_inference.rst
cpu_threading_torchscript_inference.svg
cuda.rst	[PyTorch Pinned Allocator] Add support of background thread to process events (#135524)	2024-09-17 21:08:10 +00:00
custom_operators.rst	Redirect the custom ops landing page :D (#139634)	2024-11-04 22:25:15 +00:00
ddp.rst
extending.func.rst
extending.rst	[doc] fix grammar in "Extending Torch" (#140209)	2024-11-13 05:34:43 +00:00
faq.rst
fsdp.rst
get_start_xpu.rst	update get start xpu (#137479)	2024-10-16 17:36:29 +00:00
gradcheck.rst
hip.rst	[ROCm] set hipblas workspace (#138791)	2024-10-29 01:37:55 +00:00
large_scale_deployments.rst
modules.rst	Fix to modules.rst: indent line with activation functions (#139667)	2024-11-08 01:12:52 +00:00
mps.rst
multiprocessing.rst
numerical_accuracy.rst	Add option to configure reduced precision math backend for SDPA (#135964)	2024-09-24 07:11:38 +00:00
randomness.rst	Fix typo in Reproducibility docs (#141341)	2024-11-26 16:53:26 +00:00
serialization.rst	Add config.save.use_pinned_memory_for_d2h to serialization config (#143342)	2024-12-20 21:01:18 +00:00
windows.rst