This PR contains two fixes to improve IPC tensor release performance when using torchft's BabyProcessGroupNCCL:
1. Release the IpcMutex when deleting the `ExpandableSegments` object, to avoid synchronizing under the lock.
2. Release the GIL in the WorkNCCL destructor, since the shared tensor will be destructed there.
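The second fix can be pictured with a minimal sketch, assuming CPython's C API (this is not the actual WorkNCCL code): the destructor drops the GIL, if the current thread holds it, before tearing down state whose release may block on IPC/allocator locks.
```
// Minimal sketch, assuming CPython's C API; the real WorkNCCL destructor
// differs, but the idea is the same: don't hold the GIL across a release
// path that can block on IPC/allocator locks.
#include <Python.h>

struct WorkLike {
  ~WorkLike() {
    PyThreadState* saved = nullptr;
    if (PyGILState_Check()) {        // only drop the GIL if we actually hold it
      saved = PyEval_SaveThread();   // releases the GIL
    }
    // ... destruct shared/IPC tensors here; this may synchronize ...
    if (saved != nullptr) {
      PyEval_RestoreThread(saved);   // re-acquire the GIL before returning
    }
  }
};
```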
Test plan:
Run with torchft + torchtitan
```
REPLICA_GROUP_ID=0 NGPU=2 CUDA_VISIBLE_DEVICES=0,1 CONFIG_FILE=./torchtitan/models/llama/train_configs/llama3_8b.toml ./run_train.sh --training.data_parallel_shard_degree=2 --fault_tolerance.enable --fault_tolerance.group_size=2 --fault_tolerance.replica_id=0 --metrics.log_freq=1 --training.seq_len 4096
...
[rank0]:[titan] 2025-03-07 17:51:31,387 - root - INFO - step: 61 loss: 7.4825 memory: 79.73GiB(83.89%) tps: 317 tflops: 16.34 mfu: 1.65%
```
Check py-spy to verify there is no bottleneck on the IPC lock when creating new shared tensors.


Pull Request resolved: https://github.com/pytorch/pytorch/pull/148805
Approved by: https://github.com/Skylion007, https://github.com/fegin, https://github.com/zdevito
Summary:
LLVM has a warning `-Wunused-value` which we treat as an error because it's so often diagnostic of a code issue. Unused values often indicate a programming mistake, but can also just be unnecessary cruft that harms readability and performance.
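A tiny, generic illustration (not code from this diff) of the kind of statement the warning catches:
```
// Compile with: clang++ -Wunused-value unused_value.cpp
int main() {
  int x = 0;
  x + 1;          // warning: expression result unused [-Wunused-value]
  (void)(x + 1);  // casting to void documents an intentional discard
  return x;
}
```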
For questions/comments, contact r-barnes.
- If you approve of this diff, please use the "Accept & Ship" button :-)
Test Plan: Sandcastle
Differential Revision: D69945678
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147555
Approved by: https://github.com/Skylion007, https://github.com/eqy
We switch lintrunner to pytorch-linux-jammy-cuda11.8-cudnn8-py3.9-linter for checking CUDA C++ sources. This also involves a Dockerfile change due to a missing libiomp installation, plus some other clang-tidy fixes triggered by the switch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110502
Approved by: https://github.com/malfet
Part of #91395
Also modifies how `StorageImpl`s are stored in JIT static runtime's `MemoryPlanner`, which used to `std::move` `StorageImpl`s into a vector. Since `StorageImpl` can no longer be moved, `MemoryPlanner` now holds a malloc'd buffer into which new `StorageImpl`s are constructed with placement new.
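A minimal sketch of that pattern, using a simplified stand-in type rather than the real `StorageImpl`/`MemoryPlanner`: non-movable objects are constructed directly into a malloc'd buffer with placement new and destroyed manually before the buffer is freed.
```
// Simplified sketch of the described MemoryPlanner change, not the actual code.
#include <cstdlib>
#include <new>

struct Fixed {                 // stand-in for a non-movable StorageImpl
  explicit Fixed(int v) : value(v) {}
  Fixed(Fixed&&) = delete;     // cannot be moved into a std::vector
  int value;
};

int main() {
  const size_t n = 4;
  auto* buf = static_cast<Fixed*>(std::malloc(n * sizeof(Fixed)));
  for (size_t i = 0; i < n; ++i) {
    new (&buf[i]) Fixed(static_cast<int>(i));  // placement new, no move needed
  }
  // Objects in a raw buffer must be destroyed manually before freeing it.
  for (size_t i = 0; i < n; ++i) {
    buf[i].~Fixed();
  }
  std::free(buf);
  return 0;
}
```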
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93342
Approved by: https://github.com/ezyang
Summary:
This is part 1 of the effort to support `share_memory_()` in the C++ ATen library.
This allows C++ code to replace a tensor's storage in place with a shared-memory-based one.
For now, fd-based shm is the only supported implementation, to simplify memory management in general.
This first part intentionally avoids public API changes (to `TensorBase`, see comments in `StorageUtil.h`) so that we can get the core features usable outside pt/csrc first. Adding the API to `Tensor` or `TensorBase` would involve more distracting changes and make the change harder to review.
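For background on what "fd-based shm" means here, a plain POSIX sketch (this is not the new ATen helper or its API): a shared-memory object is created as a file descriptor, sized, and mapped; the fd can be passed to another process so both map the same pages.
```
// Plain POSIX illustration of fd-based shared memory; not ATen code.
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstring>
#include <cstdio>

int main() {
  const size_t bytes = 1024;
  // Create an fd-backed shared memory object; another process can open the
  // same name (or receive the fd) and map the same pages.
  int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
  if (fd < 0) { perror("shm_open"); return 1; }
  if (ftruncate(fd, bytes) != 0) { perror("ftruncate"); return 1; }
  void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (p == MAP_FAILED) { perror("mmap"); return 1; }
  std::memcpy(p, "hello", 6);   // visible to any other process mapping the fd
  munmap(p, bytes);
  close(fd);
  shm_unlink("/demo_shm");
  return 0;
}
```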
Test Plan:
```
buck test caffe2:StorageUtils_test
```
Differential Revision: D43467616
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95228
Approved by: https://github.com/ezyang
See the discussion here for context: https://pytorch.slack.com/archives/GEEQ2K4MD/p1663672716533319?thread_ts=1662155536.133099&cid=GEEQ2K4MD. Opening a PR as suggested by @albanD.
Currently PyTorch holds the GIL when copying Tensors into shared memory. For certain workloads it would be nice to be able to copy different tensors into shared memory in parallel, but with the GIL being held the copies can't truly run in parallel.
Here's a short example of this:
```
import torch
import time
from multiprocessing.pool import ThreadPool

tensors = []
for i in range(64):
    for j in range(8):
        t = torch.ones(128, 480, 640).type(torch.uint8) * i
        tensors.append(t)
print("Done generating input tensors")

with ThreadPool(processes=8) as pool:
    futures = []
    before = time.time()
    for t in tensors:
        future = pool.apply_async(t.share_memory_)
        futures.append(future)
    for f in futures:
        f.get()
    elapsed = time.time() - before
    print("ELAPSED TIME", elapsed)
```
With this diff, I get:
```
~$ python repro.py
Done generating input tensors
ELAPSED TIME 3.561321258544922
~$
```
Previously, I would get:
```
~$ python repro.py
Done generating input tensors
ELAPSED TIME 16.305657386779785
~$
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85389
Approved by: https://github.com/albanD