Summary: Following the implementation of the updated ATen backend for MTIA and the diffs enabling in-tree view ops (D75266206, D75385411), we can remove the custom logic in the reducer that handled MTIA view operations.
Test Plan:
CI
Reviewed By: egienvalue
Differential Revision: D77843212
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157882
Approved by: https://github.com/albanD, https://github.com/andyanwang
Differential Revision: D72437251
Enable rebuilding of the bucket order when find_unused_parameters=True.
It should never be worse than not rebuilding the bucket order when find_unused_parameters=True:
1. for cases where the bucket order in the first iteration is the same as the parameter order, rebuilding the bucket order changes nothing
2. for cases where the bucket order in the first iteration differs from the parameter order, there are two sub-cases:
a. the bucket order does not change after the 1st iteration even though the graph is dynamic and there are unused parameters; in this case, rebuilding the bucket order yields a performance gain
b. the bucket order changes after the 1st iteration due to the dynamic graph; in this case, neither the parameter order nor the 1st-iteration bucket order is ideal, so whether we rebuild the bucket order does not matter
Enabling bucket order rebuilding when find_unused_parameters=True therefore helps case 2.a without hurting cases 1 and 2.b; a minimal setup sketch follows.
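As a concrete reference, here is a minimal sketch of the DDP setup this flag affects; the toy model and torchrun-based rendezvous are illustrative assumptions, and only `find_unused_parameters=True` is from this PR's scope.
```
# Minimal DDP sketch; the toy model and rendezvous are illustrative
# assumptions, not part of this PR. Run with torchrun on GPU machines.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")  # torchrun provides rank/world size
    local_rank = int(os.environ["LOCAL_RANK"])
    model = torch.nn.Linear(16, 16).cuda(local_rank)
    # With this change, the reducer can rebuild its bucket order after the
    # first iteration even when find_unused_parameters=True.
    ddp = DDP(model, device_ids=[local_rank], find_unused_parameters=True)
    ddp(torch.randn(8, 16, device=f"cuda:{local_rank}")).sum().backward()

if __name__ == "__main__":
    main()
```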
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153404
Approved by: https://github.com/rohan-varma, https://github.com/fegin
The C++ Reducer is silently incorrect under compiled autograd (CA): its implementation no-ops the collective. I'm guessing it was no-op'd because, with DDP + the python reducer, the C++ reducer is still being initialized.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152735
Approved by: https://github.com/fegin
ghstack dependencies: #153300, #152689
Summary: Add an option that allows skipping the allreduce of unused parameters. This can significantly improve training throughput when the model has a large number of unused parameters.
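As a hedged illustration of the new option (the keyword name `skip_all_reduce_unused_params` below is an assumption inferred from this summary, not confirmed by the commit message):
```
# Hedged sketch: `skip_all_reduce_unused_params` is an assumed keyword
# name, not confirmed by this commit message.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("gloo")  # torchrun provides rank/world size
ddp = DDP(
    torch.nn.Linear(4, 4),
    find_unused_parameters=True,
    skip_all_reduce_unused_params=True,  # assumed flag: skip allreduce of unused params
)
```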
Test Plan: unit tests, CI
Differential Revision: D72282069
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151503
Approved by: https://github.com/mrshenli
This gives DDP applications NVLink SHARP with zero-copy on H100+ platforms.
It uses fewer SMs and reduces memory contention between the NCCL kernels and compute kernels.
Added env `DDP_DISABLE_COMM_MEM` as a back-out option:
```
An environment variable to disable comm-optimized memory pool.
Default is 0, which means comm-optimized memory pool is enabled.
Users can set it to 1 in case of seeing regression or OOM (because this
comm MemPool may not share space with regular compute MemPool).
```
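A minimal back-out sketch; the assumption here is that the variable should be set before DDP construction so the reducer reads the intended value:
```
# Back-out sketch: disable the comm-optimized memory pool. Assumption:
# the variable is read at DDP construction time, so set it early.
import os
os.environ["DDP_DISABLE_COMM_MEM"] = "1"  # 1 = disabled; default 0 = enabled
```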
Differential Revision: [D69297766](https://our.internmc.facebook.com/intern/diff/D69297766)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146589
Approved by: https://github.com/syed-ahmed, https://github.com/c-p-i-o, https://github.com/fduwjj
Fix a bug in the `_update_process_group` DDP API where we didn't correctly reset `local_used_map_` and a few other variables. This resulted in errors like `Encountered gradient which is undefined, but still allreduced by...`
Added a unit test as well that reproduced the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128092
Approved by: https://github.com/awgu, https://github.com/fegin
Fixes #69031, #42793
This PR fixes the bug introduced in #54981 where parameters used within a `no_sync` scope are not respected when `find_unused_parameters` is set to `True`. The `local_used_map_` and `numGradHooksTriggeredMap_` variables should be updated regardless of the `no_sync` state.
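For context, a minimal sketch of the gradient-accumulation pattern this fix concerns (the toy model and step counts are illustrative assumptions):
```
# Sketch of the no_sync pattern affected by this fix: gradients produced
# inside no_sync() must still update local_used_map_ and
# numGradHooksTriggeredMap_ when find_unused_parameters=True.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("gloo")  # torchrun provides rank/world size
ddp = DDP(torch.nn.Linear(8, 8), find_unused_parameters=True)

for step in range(4):
    inputs = torch.randn(2, 8)
    if step % 2 == 0:
        with ddp.no_sync():               # accumulate locally, no allreduce
            ddp(inputs).sum().backward()
    else:
        ddp(inputs).sum().backward()      # gradients sync on this step
```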
Tested and verified with fairseq2 and wav2vec2 ASR finetuning recipe. All gradients are correctly synced across workers as expected after applying this fix.
Co-authored-by: Kaushik Ram Sadagopan <kaushikram2811@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124193
Approved by: https://github.com/rohan-varma
**Summary**
The reducer of `DistributedDataParallel` is implemented in C++, and it is not easy to trace the allreduces launched by the reducer. This PR modifies `DistributedDataParallel` to launch one allreduce per gradient when `compiled_autograd` is enabled. The change allows us to use `compiled_autograd` to trace the allreduces, which can later be optimized (fused) in Inductor.
**Key Logic**
1. If `ddp_python_hook` is True, we assume `compiled_autograd` is used. `DistributedDataParallel` registers `compiled_accum_grad_hook` for all parameters.
2. In the first forward() call, if `DistributedDataParallel` is not compiled, all `compiled_accum_grad_hook`s are deregistered. If `DistributedDataParallel` is compiled, all `compiled_accum_grad_hook`s will be compiled by `compiled_autograd`.
3. `compiled_accum_grad_hook` launches an allreduce to reduce the gradient of the parameter (a simplified sketch follows).
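A simplified, hedged sketch of one allreduce per gradient, using the public `register_post_accumulate_grad_hook` API for illustration; the actual `compiled_accum_grad_hook` registration lives inside `DistributedDataParallel`, not in user code.
```
# Illustrative only: launch one allreduce per parameter gradient via a
# post-accumulate-grad hook, mirroring the behavior described above.
import torch
import torch.distributed as dist

dist.init_process_group("gloo")  # torchrun provides rank/world size
model = torch.nn.Linear(8, 8)

def allreduce_grad(param: torch.Tensor) -> None:
    # Average this parameter's freshly accumulated gradient across ranks.
    dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
    param.grad.div_(dist.get_world_size())

for p in model.parameters():
    if p.requires_grad:
        p.register_post_accumulate_grad_hook(allreduce_grad)

model(torch.randn(2, 8)).sum().backward()  # hooks fire once per parameter
```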
**Bucketing**
The compiled backward is slow because there is no bucketing for the allreduces. We rely on Inductor to bucket the allreduces.
The bucketing is done in a separate PR.
Differential Revision: [D49428482](https://our.internmc.facebook.com/intern/diff/D49428482/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110662
Approved by: https://github.com/wconstab
https://github.com/pytorch/pytorch/pull/113580 introduced the `DDP._update_process_group` API. However, the implementation did not correctly reset all of the necessary state in the reducer. In particular, if an error occurred during backward, DDP would end up in an incorrect state.
To address this, this PR enhances the unit test to cover this case and fixes the resetting of the Reducer state.
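For context, a hedged sketch of the API's intended use; the recovery flow and new-group construction below are illustrative assumptions, and `_update_process_group` is a private API.
```
# Hedged sketch: swap the process group underneath an existing DDP
# instance, e.g., after replacing a failed rank. Illustrative only.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("gloo")    # torchrun provides rank/world size
ddp = DDP(torch.nn.Linear(4, 4))

new_pg = dist.new_group()          # illustrative: rebuild the group somehow
ddp._update_process_group(new_pg)  # reducer state (e.g., local_used_map_) must reset
```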
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114194
Approved by: https://github.com/rohan-varma
Summary:
The getCvar* functions allow us to accept multiple environment variables for the same value. This lets us deprecate some variables in favor of others while still allowing users to rely on the old variables for a transition period.
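A Python rendition of the lookup pattern described above; the real getCvar* helpers are C++ in c10d, and the newest-name-first precedence and deprecation warning below are assumptions for illustration.
```
# Illustrative Python version of the multi-env-var lookup; assumption:
# names are ordered newest-first and the first one set wins, with a
# warning when a deprecated (older) name is used.
import os
import warnings

def get_cvar_string(names, default):
    for i, name in enumerate(names):
        value = os.environ.get(name)
        if value is not None:
            if i > 0:  # an older, deprecated name was used
                warnings.warn(f"{name} is deprecated; use {names[0]} instead")
            return value
    return default

nccl_debug = get_cvar_string(["TORCH_NCCL_DEBUG", "NCCL_DEBUG"], "")
```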
Test Plan: OSS CI
Reviewed By: fduwjj, XilunWu
Differential Revision: D51225487
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113797
Approved by: https://github.com/fduwjj