pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
clr	d80ae738c9	compile_worker: Make a timer class (#166465 ) This subclass allows us to trigger an action after we haven't seen any activity for a certain amount of seconds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166465 Approved by: https://github.com/masnesral	2025-10-31 22:39:31 +00:00
Jovian Anthony Jaison	94b91a8763	[redone][pytorch] Moving torch.compile worker process logs to a dedicated rank based log directory (#160352 ) Summary: Writing torch.compile worked logs to dedicated_log_rank{RANK} if we're running on mast. ref: D79456310 (got reverted because of linter) Testing: Refer differential Revision: D79917440 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160352 Approved by: https://github.com/masnesral	2025-08-12 16:49:08 +00:00
PyTorch MergeBot	cb4b29b754	Revert "[pytorch] Moving torch.compile worker process logs to a dedicated rank based log directory (#159874 )" This reverts commit `9fd5b5f735`. Reverted https://github.com/pytorch/pytorch/pull/159874 on behalf of https://github.com/malfet due to Broke lint ([comment](https://github.com/pytorch/pytorch/pull/159874#issuecomment-3161896978))	2025-08-06 23:21:29 +00:00
Jovian Anthony Jaison	9fd5b5f735	[pytorch] Moving torch.compile worker process logs to a dedicated rank based log directory (#159874 ) Summary: Writing torch.compile worked logs to dedicated_log_rank{RANK} if we're running on mast. Test Plan: See: D79456310 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159874 Approved by: https://github.com/c00w	2025-08-06 22:33:04 +00:00
Sam Larsen	7a41f20794	[inductor] Quiesce Triton compile worker pool after each dynamo compile (#156187 ) For internal usages, keeping the Triton compile worker pool active for the lifetime of the process has caused some challenges, e.g., it slows down and muddies profiling due to the huge number of threads on a box: N threads = 8 ranks * 32 subprocs * M threads started by torch. Also, each subproc can use more than 1GB each. This PR adds the functionality to shutdown worker subprocs after each dynamo compile when using the SubprocPool implementation. The idea is to leave the main sidecar process running, but signal it to tear down its internal ProcessPoolExecutor when compile is finished. Restarting the ProcessPoolExecutor is relatively fast, e.g., 500ms because the ProcessPoolExecutor forks from the sidecar. Changes: * Do not start the ProcessPoolExecutor automatically when compile_fx is imported. Instead, start the sidecar process only. The sidecar process imports torch, so is still slow to start. * Introduce wakeup() and quiesce() calls to the implementation to start and stop the ProcessPoolExecutor. * Add a context manager to automatically quiesce() at the end of dynamo compilation. * Signal a wakeup() in compile_fx only when we have cuda devices. * Add a killswitch so we can turn of quiescing. Testing: For correctness, the stacked change at https://github.com/pytorch/pytorch/pull/156534 enables the feature for OSS so it's exercised in CI. For performance, because of recent compile-time variance (see https://github.com/pytorch/pytorch/issues/152566), it's pretty hard to glean whether there's a regression.... * Training: https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Tue%2C%2017%20Jun%202025%2021%3A32%3A04%20GMT&stopTime=Tue%2C%2024%20Jun%202025%2021%3A32%3A04%20GMT&granularity=hour&mode=training&dtype=amp&deviceName=cuda%20(h100)&lBranch=gh/masnesral/210/head&lCommit=1b7315031c3bfad66a1a01700167a9ca1a2ae5f1&rBranch=main&rCommit=eab45643f22e58ee12d95d8b0162d51ca0a50801 * Inference: https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Tue%2C%2017%20Jun%202025%2021%3A32%3A04%20GMT&stopTime=Tue%2C%2024%20Jun%202025%2021%3A32%3A04%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(h100)&lBranch=gh/masnesral/210/head&lCommit=1b7315031c3bfad66a1a01700167a9ca1a2ae5f1&rBranch=main&rCommit=eab45643f22e58ee12d95d8b0162d51ca0a50801 The wins (mostly for inference) don't make sense, but I'm also skeptical of the losses (mostly for training). I can't repro any of the slowdowns locally. Furthermore, check out the benchmarking results for the stacked diff, which actually enables the quiescing functionality for OSS. That should only slow down compile since there can only be overhead to stop and start the workers. But the results are somehow better: * Training: https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Tue%2C%2017%20Jun%202025%2021%3A32%3A04%20GMT&stopTime=Tue%2C%2024%20Jun%202025%2021%3A32%3A04%20GMT&granularity=hour&mode=training&dtype=amp&deviceName=cuda%20(h100)&lBranch=gh/masnesral/214/head&lCommit=41943253882a019b8ceafcd2bf4cd6acbe0cbca9&rBranch=main&rCommit=eab45643f22e58ee12d95d8b0162d51ca0a50801 * Inference: https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Tue%2C%2017%20Jun%202025%2021%3A32%3A04%20GMT&stopTime=Tue%2C%2024%20Jun%202025%2021%3A32%3A04%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(h100)&lBranch=gh/masnesral/214/head&lCommit=41943253882a019b8ceafcd2bf4cd6acbe0cbca9&rBranch=main&rCommit=eab45643f22e58ee12d95d8b0162d51ca0a50801 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156187 Approved by: https://github.com/aorenste, https://github.com/jansel	2025-07-08 22:53:13 +00:00
Xu Han	5ad759ca33	[inductor] calibration inductor windows uts (2/N) (#134358 ) skip unsupported UTs of `test\inductor\test_compile_worker.py`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134358 Approved by: https://github.com/jansel	2024-08-24 04:08:59 +00:00
Xuehai Pan	134bc4fc34	[BE][Easy][12/19] enforce style for empty lines in import segments in `test/i*/` (#129763 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129763 Approved by: https://github.com/jansel	2024-07-18 07:49:19 +00:00
PyTorch MergeBot	b732b52f1e	Revert "[BE][Easy][12/19] enforce style for empty lines in import segments in `test/i*/` (#129763 )" This reverts commit `aecc746fcc`. Reverted https://github.com/pytorch/pytorch/pull/129763 on behalf of https://github.com/XuehaiPan due to need reland after rerunning lintrunner on main ([comment](https://github.com/pytorch/pytorch/pull/129763#issuecomment-2235736732))	2024-07-18 06:39:58 +00:00
Xuehai Pan	aecc746fcc	[BE][Easy][12/19] enforce style for empty lines in import segments in `test/i*/` (#129763 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129763 Approved by: https://github.com/jansel	2024-07-18 05:13:41 +00:00
Sam Larsen	24443fe16a	[inductor] parallel compile: Print traceback detail when there's an exception in a sub-process (#128775 ) Summary: We lose traceback info when an exception occurs in a subprocess because Python traceback objects don't pickle. In the subprocess-based parallel compile, we _are_ logging an exception in the subprocess, but a) those messages are easy to miss because they're not in the traceback output, and b) it seems that logging in the subproc is swallowed by default in internal builds. This PR captures the traceback in the subprocess and makes it available in the exception thrown in the main process. Users now see failures that look like this: ``` ... File "/home/slarsen/.conda/envs/pytorch-3.10_3/lib/python3.10/concurrent/futures/_base.py", line 458, in result return self.__get_result() File "/home/slarsen/.conda/envs/pytorch-3.10_3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result raise self._exception torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised: SubprocException: An exception occurred in a subprocess: Traceback (most recent call last): File "/data/users/slarsen/pytorch-3.10_3/torch/_inductor/compile_worker/subproc_pool.py", line 270, in do_job result = SubprocMain.foo() File "/data/users/slarsen/pytorch-3.10_3/torch/_inductor/compile_worker/subproc_pool.py", line 263, in foo SubprocMain.bar() File "/data/users/slarsen/pytorch-3.10_3/torch/_inductor/compile_worker/subproc_pool.py", line 260, in bar SubprocMain.baz() File "/data/users/slarsen/pytorch-3.10_3/torch/_inductor/compile_worker/subproc_pool.py", line 257, in baz raise Exception("an error occurred") Exception: an error occurred ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/128775 Approved by: https://github.com/jansel	2024-06-17 15:10:47 +00:00
Sam Larsen	94c0dcbe1d	[inductor] Parallel compile: handle crashes in subprocesses (#128757 ) Summary: If any subprocess in the pool crashes, we get a BrokenProcessPool exception and the whole pool becomes unusable. Handle crashes by recreating the pool. Test Plan: * New unit test * Started a long-running test (`test/inductor/test_torchinductor.py`), periodically killed subprocess manually, made sure the test run recovers and makes progress. Pull Request resolved: https://github.com/pytorch/pytorch/pull/128757 Approved by: https://github.com/jansel	2024-06-15 17:35:04 +00:00
Sam Larsen	6c81856dca	[inductor] Add a subprocess-based parallel compile (#126816 ) Summary: Adds a "safe" parallel compile implementation that a) Popens a sub-process with an entry point we control, and b) Uses a ProcessPoolExecutor in that sub-processes to perform parallel compiles. This change essentially squashes these two implementations from jansel, but removes the "thread-based" approach since benchmarking revealed that compile-time performance was poor compared to the existing impl: https://github.com/pytorch/pytorch/pull/124682 https://github.com/pytorch/pytorch/pull/122941 This PR adds the implementation, but defaults to the existing "fork". I'll submit a separate change to enable. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126816 Approved by: https://github.com/jansel	2024-05-29 09:40:21 +00:00

12 Commits