2208 Commits

Author SHA1 Message Date
Tristan Rice
ba214ab56c TCPStore: soft fail bind when agent store active (#147465)
This makes it easier to roll out `TORCHELASTIC_USE_AGENT_STORE` by opportunistically swallowing bind errors when the agent store is enabled and the port matches `MASTER_PORT`.

This should be very safe: if the store is somehow not up while the envs are set, the TCPStore client connections will fail to connect, so we end up with a slightly different error message but identical success/failure behavior.
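
The soft-fail rule described above can be sketched roughly as follows. This is a minimal illustration using plain sockets, not the actual TCPStore code: `agent_store_active` and `bind_store_socket` are hypothetical names, and the assumption that the flag is enabled by the string `"True"` is mine, not something this message states.

```python
import os
import socket


def agent_store_active(port: int) -> bool:
    """True when the agent store flag is set and `port` equals MASTER_PORT."""
    # Assumption: the flag is considered enabled when its value is "True".
    enabled = os.environ.get("TORCHELASTIC_USE_AGENT_STORE", "") == "True"
    return enabled and os.environ.get("MASTER_PORT") == str(port)


def bind_store_socket(port: int, host: str = "127.0.0.1"):
    """Return a listening socket, or None if the bind error was swallowed."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
        return sock
    except OSError:
        sock.close()
        if agent_store_active(port):
            # Assume the agent's store already owns the port. Clients will
            # connect to it, or fail to connect if it is not actually up.
            return None
        raise
```

With the flag unset, or with a port other than `MASTER_PORT`, the bind error still propagates, so the only behavior that changes is the opportunistic case.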

This also pybinds `c10d::SocketError` into Python so we can assert on the error type in tests.

https://docs.google.com/document/d/1CzOn_N53AiFxWGgbyMWSnd2elCJd4lZ-ajPg2lzcxoM/edit?tab=t.0#heading=h.2j2f5dimrdau

Test plan:

```
pytest test/distributed/test_store.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147465
Approved by: https://github.com/fduwjj
2025-02-21 03:02:26 +00:00
cyy
15635b14ce [4/N] Remove unnecessary once flag usage (#146783)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146783
Approved by: https://github.com/albanD
2025-02-11 13:55:06 +00:00
Ke Wen
30cbf13544 [PGNCCL] Associate tensor allocation support with NCCL version (#146842)
This is a forward fix to #146589.
For NCCL versions lower than 2.19, the previous PR would hit `RuntimeError: NCCL mem allocator is not supported in this NCCL version`.
This PR gates the support by checking the link-time NCCL version via `ncclGetVersion`.
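
As a rough illustration of this kind of gate, the sketch below compares an integer version code against a 2.19 threshold. It assumes the `NCCL_VERSION(X, Y, Z)` encoding from `nccl.h` as I understand it; `mem_allocator_supported` and `MEM_ALLOCATOR_MIN_VERSION` are hypothetical names, and the argument stands in for the integer that `ncclGetVersion` reports.

```python
def nccl_version_code(major: int, minor: int, patch: int) -> int:
    """Mirror of the NCCL_VERSION(X, Y, Z) encoding (assumed from nccl.h)."""
    if major <= 2 and minor <= 8:
        return major * 1000 + minor * 100 + patch
    return major * 10000 + minor * 100 + patch


# Hypothetical threshold taken from the message above: 2.19 and newer.
MEM_ALLOCATOR_MIN_VERSION = nccl_version_code(2, 19, 0)


def mem_allocator_supported(linked_version_code: int) -> bool:
    """`linked_version_code` stands in for the value ncclGetVersion reports."""
    return linked_version_code >= MEM_ALLOCATOR_MIN_VERSION
```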

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146842
Approved by: https://github.com/XilunWu, https://github.com/wconstab, https://github.com/fduwjj
ghstack dependencies: #146589
2025-02-11 02:52:52 +00:00
Yifu Wang
97f6480cf5 Fix an issue where functional collectives don't force fx stride on inputs when compiled (#146467)
Fixes https://github.com/pytorch/pytorch/issues/146416

Also added contiguity checks in the C++ functional collective ops to prevent striding issues introduced during compilation from manifesting as silent correctness issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146467
Approved by: https://github.com/Chillee, https://github.com/lw, https://github.com/shunting314
2025-02-10 19:15:49 +00:00
Ke Wen
effc545274 [DDP] Use NCCL allocated memory for gradient bucket (#146589)
So that NVLink SHARP comes with zero-copy on H100+ platforms, for DDP applications.
Less SM usage, less memory contention between NCCL kernel and compute kernels.

Added env `DDP_DISABLE_COMM_MEM` as a back-out option:
```
An environment variable to disable comm-optimized memory pool.
Default is 0, which means comm-optimized memory pool is enabled.
Users can set it to 1 in case of seeing regression or OOM (because this
comm MemPool may not share space with regular compute MemPool).
```

Differential Revision: [D69297766](https://our.internmc.facebook.com/intern/diff/D69297766)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146589
Approved by: https://github.com/syed-ahmed, https://github.com/c-p-i-o, https://github.com/fduwjj
2025-02-10 05:23:11 +00:00
Dingming Wu
fa34128435 revert PTD's change that leads to signature mismatch of printNcclCommProxyTrace (#146453)
Summary: D68801098 introduced this function signature mismatch issue for printNcclCommProxyTrace. Revert it so that trunk build can pass.

Test Plan:
With the change, build of APS model using rcclexp can now pass:
`sh scripts/ltian/run_jobs/fb_fm_v2/run_fb_fm_v2_job.sh -h T20_GTT_MI300X -n 16 -b 1024 -t [2024-12-06] -d ai_infra_ngs -e ai_infra_training_rnd_tc -x 0`

Reviewed By: c-p-i-o

Differential Revision: D69149588

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146453
Approved by: https://github.com/c-p-i-o
2025-02-07 22:43:52 +00:00
Tristan Rice
68631f6e87 PyWork: preserve Python reference counting when used in functional collectives (#146376)
@fegin  found an issue where torchft is not compatible with functional collectives.

Found in https://github.com/pytorch/torchtitan/pull/806

The root cause is that PyProcessGroup/PyWork are not compatible with functional collectives due to a nasty ownership bug.

PyWork relies on a pybind trampoline to propagate requests to Python. Unfortunately, the way pybind works is that the Python object owns the C++ object, rather than there being some form of shared ownership. As a result, the PyWork Python object is collected when it is returned to C++ from the PyProcessGroup, while the C++ PyWork object still exists. When the PyWork object is then used, this causes a deadlock because the corresponding Python object no longer exists.

To solve this, we introduce a new `PyWorkHolder` class which holds a reference to the `py::object` as well as the trampoline class. This resolves any dependency issues since we can now hold ownership in C++ to both the Python and C++ objects.
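
The ownership problem and the holder fix can be mimicked in pure Python as an analogy. This is only a sketch: `NativeWork` and `WorkHolder` are hypothetical stand-ins (a weak reference plays the role of the non-owning C++ side), and where the real bug deadlocks, this sketch raises an error so the failure is visible.

```python
import weakref


class PyWork:
    """Stands in for the Python-side work object that implements wait()."""

    def wait(self) -> bool:
        return True


class NativeWork:
    """Stands in for the C++ object: it can reach the Python object but does not own it."""

    def __init__(self, py_work: PyWork) -> None:
        self._py_ref = weakref.ref(py_work)

    def wait(self) -> bool:
        py_work = self._py_ref()
        if py_work is None:
            raise RuntimeError("Python half of the work object was collected")
        return py_work.wait()


class WorkHolder:
    """Owns both halves, so neither can disappear while the holder is alive."""

    def __init__(self, py_work: PyWork) -> None:
        self._py_work = py_work  # strong reference keeps the Python half alive
        self._native = NativeWork(py_work)

    def wait(self) -> bool:
        return self._native.wait()


def make_unowned() -> NativeWork:
    return NativeWork(PyWork())  # nothing keeps the PyWork alive after return


def make_held() -> WorkHolder:
    return WorkHolder(PyWork())
```

Calling `wait()` on the result of `make_held()` succeeds, while the result of `make_unowned()` fails once the Python half has been collected.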

To make this cleaner, we introduce a `WORK_OVERRIDE` macro, which is a patched version of `PYBIND11_OVERRIDE` that returns a `PyWorkHolder` rather than just a `PyWork`, and use it for all collectives in PyProcessGroup.

Test plan:

```
cd pytorch
pytest test/distributed/test_c10d_functional_native.py
```

```
cd torchft
pytest torchft/process_group_test.py -k functional -v -x -s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146376
Approved by: https://github.com/yifuwang
2025-02-07 18:07:53 +00:00
cyy
25aa7ca62d Cleanup CallOnce.h (#146700)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146700
Approved by: https://github.com/albanD
2025-02-07 16:44:45 +00:00
cyy
fa0592b568 Remove some NOLINT (#146610)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146610
Approved by: https://github.com/Skylion007, https://github.com/malfet
2025-02-07 01:50:06 +00:00
cyy
f397c72697 Remove NOLINTNEXTLINE (#146238)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146238
Approved by: https://github.com/albanD
2025-02-04 02:45:32 +00:00
PyTorch MergeBot
00dc5b10f6 Revert "[Environment Variable][7/N] Use thread-safe getenv functions (#140211)"
This reverts commit 2fd1b6b361.

Reverted https://github.com/pytorch/pytorch/pull/140211 on behalf of https://github.com/atalman due to Breaks executorch tests ([comment](https://github.com/pytorch/pytorch/pull/140211#issuecomment-2632202864))
2025-02-03 22:04:28 +00:00
cyy
2fd1b6b361 [Environment Variable][7/N] Use thread-safe getenv functions (#140211)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140211
Approved by: https://github.com/ezyang, https://github.com/eqy
2025-02-01 12:33:41 +00:00
fduwjj
eb029fba13 [c10d][NCCL] Implement ncclCommInitRankScalable (merging #136789) (#144794)
Try to land https://github.com/pytorch/pytorch/pull/136789/files on our end and fix any remaining issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144794
Approved by: https://github.com/kwen2501, https://github.com/eqy, https://github.com/atalman
2025-01-31 22:39:56 +00:00
Yifu Wang
c70362fac8 [AsyncMM] re-enable and adapt to cutlass 3.6.0 (#144011)
[D68734067](https://our.internmc.facebook.com/intern/diff/D68734067)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144011
Approved by: https://github.com/Skylion007, https://github.com/drisspg
2025-01-31 00:48:51 +00:00
Ke Wen
9fdc20809a [PGNCCL] Simplify support macro definition (#145964)
- Promotes usage of `NCCL_VERSION_CODE >= NCCL_VERSION(X, Y, Z)`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145964
Approved by: https://github.com/fduwjj, https://github.com/shuqiangzhang
ghstack dependencies: #145893
2025-01-30 23:26:32 +00:00
Ke Wen
51ee9b154e [c10d] Add NCCL memory allocator (#145675)
This PR implements a small UI improvement over #133603.

It prepares an NCCL memory allocator in torch cpp and then pybinds it out, so that users can use it directly.

UI:
```
pool = torch.cuda.MemPool(backend.mem_allocator)
with torch.cuda.use_mem_pool(pool):
    tensor = torch.arange(1024 * 1024 * 2, device=device)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145675
Approved by: https://github.com/syed-ahmed, https://github.com/wconstab
2025-01-30 18:19:00 +00:00
PyTorch MergeBot
5fa28bbe40 Revert "[c10d] Add NCCL memory allocator (#145675)"
This reverts commit 18a7a04c4a.

Reverted https://github.com/pytorch/pytorch/pull/145675 on behalf of https://github.com/ZainRizvi due to Sorry but this still fails internally. See D68866823 for details ([comment](https://github.com/pytorch/pytorch/pull/145675#issuecomment-2624900562))
2025-01-30 16:01:52 +00:00
Ke Wen
25ca05eebf [PGNCCL] Correct some ifdef's (#145893)
`create` function supporting `ncclConfig_t` should be wrapped inside `NCCL_HAS_CONFIG` instead of `NCCL_HAS_COMM_NONBLOCKING`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145893
Approved by: https://github.com/c-p-i-o
2025-01-30 01:05:21 +00:00
Ke Wen
18a7a04c4a [c10d] Add NCCL memory allocator (#145675)
This PR implements a small UI improvement over #133603.

It prepares an NCCL memory allocator in torch cpp and then pybinds it out, so that users can use it directly.

UI:
```
pool = torch.cuda.MemPool(backend.mem_allocator)
with torch.cuda.use_mem_pool(pool):
    tensor = torch.arange(1024 * 1024 * 2, device=device)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145675
Approved by: https://github.com/syed-ahmed, https://github.com/wconstab
2025-01-29 23:20:22 +00:00
PyTorch MergeBot
6371c25b91 Revert "[c10d] Add NCCL memory allocator (#145675)"
This reverts commit 9fd6722fc9.

Reverted https://github.com/pytorch/pytorch/pull/145675 on behalf of https://github.com/ZainRizvi due to This fails to build internally, can you please take a look at D68831004 for more details? ([comment](https://github.com/pytorch/pytorch/pull/145675#issuecomment-2622515425))
2025-01-29 18:30:30 +00:00
PyTorch MergeBot
284f217011 Revert "[Environment Variable][7/N] Use thread-safe getenv functions (#140211)"
This reverts commit 97b3b73f3e.

Reverted https://github.com/pytorch/pytorch/pull/140211 on behalf of https://github.com/ZainRizvi due to Sorry but this is failing internally. @eqy @ezyang can you please help this get remerged? See D68779772. ([comment](https://github.com/pytorch/pytorch/pull/140211#issuecomment-2622504898))
2025-01-29 18:24:29 +00:00
Ke Wen
9fd6722fc9 [c10d] Add NCCL memory allocator (#145675)
This PR implements a small UI improvement over #133603.

It prepares an NCCL memory allocator in torch cpp and then pybinds it out, so that users can use it directly.

UI:
```
pool = torch.cuda.MemPool(backend.mem_allocator)
with torch.cuda.use_mem_pool(pool):
    tensor = torch.arange(1024 * 1024 * 2, device=device)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145675
Approved by: https://github.com/syed-ahmed, https://github.com/wconstab
2025-01-29 02:48:56 +00:00
fduwjj
4f949f282d [c10d][ez] Remove goto in PGNCCL and make linter happy for PGNCCL and NCCLUtils (#145855)
While working on PGNCCL I found that the code triggers some lint warnings so this PR is to address them or add lint suppressor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145855
Approved by: https://github.com/c-p-i-o, https://github.com/kwen2501
2025-01-28 21:19:49 +00:00
cyyever
ef28df5c9e [Reland][Environment Variable][4/N] Use thread-safe getenv functions (#140593)
Reland of #137843 , after checking the code again.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140593
Approved by: https://github.com/albanD

Co-authored-by: albanD <desmaison.alban@gmail.com>
2025-01-28 20:51:49 +00:00
cyyever
97b3b73f3e [Environment Variable][7/N] Use thread-safe getenv functions (#140211)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140211
Approved by: https://github.com/ezyang, https://github.com/eqy
2025-01-28 15:21:12 +00:00
Chirag Pandya
bdf6dfa17d [chore][ez] change alloc buffer size from 4000 to 4096 (#145759)
Summary:
Allocations typically happen as a power of 2 anyway.
Change the default alloc size to 4096 to eke out a bit more perf.

Test:
unit tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145759
Approved by: https://github.com/XilunWu, https://github.com/fduwjj
ghstack dependencies: #145756, #145757
2025-01-28 09:14:07 +00:00
Chirag Pandya
78f02bf07c [bug] handle case when remote peer closes connection (#145757)
Summary:
In the case where the remote peer closes the connection, nread returns 0. In
this case, we still want to free up the allocated buffer.
Also, reorder the if so that the likely success case (nread > 0) is at
the top of the function with an early return.
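
The control flow described above might look roughly like this. It is a sketch under stated assumptions, not the actual read callback: `BufferPool`, `on_read`, `handle`, and `close` are hypothetical names, and the pool only counts outstanding buffers so that a leak is easy to see.

```python
from typing import Callable


class BufferPool:
    """Tracks outstanding buffers so leaks are easy to spot."""

    def __init__(self, size: int = 4096) -> None:
        self.size = size
        self.outstanding = 0

    def alloc(self) -> bytearray:
        self.outstanding += 1
        return bytearray(self.size)

    def free(self, buf: bytearray) -> None:
        self.outstanding -= 1


def on_read(
    nread: int,
    buf: bytearray,
    pool: BufferPool,
    handle: Callable[[bytes], None],
    close: Callable[[], None],
) -> None:
    if nread > 0:
        # Likely path first: hand the payload on, release the buffer, return.
        handle(bytes(buf[:nread]))
        pool.free(buf)
        return
    # nread == 0 (peer closed, per the message above) or nread < 0 (error):
    # the buffer was still allocated, so it must still be released.
    pool.free(buf)
    close()
```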

Test Plan:
unit tests

Differential Revision: [D68733192](https://our.internmc.facebook.com/intern/diff/D68733192)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145757
Approved by: https://github.com/XilunWu, https://github.com/fduwjj
ghstack dependencies: #145756
2025-01-28 03:06:38 +00:00
Yifu Wang
db33d23aa8 [SymmetricMemory] fix an issue where rendezvous is performed with wrong device context when torch.cuda.set_device() is not called (#144886)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144886
Approved by: https://github.com/awgu
2025-01-28 01:43:37 +00:00
Chirag Pandya
3ce68dc61e [c10d] Flush file in file recorder (#145458)
Summary:
Flushing file to hopefully prevent file corruptions as reported in
https://github.com/pytorch/pytorch/pull/145125

Test Plan:
Couldn't get file corruption to occur in my tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145458
Approved by: https://github.com/kwen2501
2025-01-27 23:15:52 +00:00
Chirag Pandya
5534c270db [chore] fix new linter (#145756)
Summary:
Fix new linter that's complaining when I made changes to this file:
class 'LibUVStoreDaemon' defines a non-default destructor but does not
define a copy constructor, a copy assignment operator, a move
constructor or a move assignment operator

Test Plan:
make lint passes

Differential Revision: [D68733191](https://our.internmc.facebook.com/intern/diff/D68733191)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145756
Approved by: https://github.com/XilunWu, https://github.com/Skylion007, https://github.com/fduwjj
2025-01-27 22:48:12 +00:00
Shuqiang Zhang
c0861d092c [PGNCCL] Add an API to get the status/error code at the PG level (#144498)
Summary:
This PR is basically a replacement of
https://github.com/pytorch/pytorch/pull/140087, which caused some perf
drop due to frequent TCPStore checks in the watchdog thread. The fix is to move the
TCPStore check to the monitoring thread.

If unhealthy, the user should be able to get the type of error, e.g.,
timeout, NCCL error, or remote error.

This API is applied to PG level, compared to the
work.get_future_result() API which is applied to Work Level.
Error detection at PG level is much more convenient for users to handle
the PG failure as a whole, e.g, restarting the PG.

Error handling at the work level is still useful for users to attach
work specific context and debug the RC of the specific failing
work/collective

Note it is critical for all ranks in the PG to be notified about an
error as soon as it occurs, so we introduce an errorType of
REMOTE_ERROR, which is 'broadcasted' from a src rank (which detects a
local error) to all other ranks in the PG, the broadcast is done through
TCPStore currently
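
The propagation scheme can be sketched as follows. This is a toy model, not the ProcessGroupNCCL API: a plain dict stands in for the shared TCPStore, and `ProcessGroupStatus`, `ERROR_KEY`, `report_local_error`, and `monitor_tick` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Optional


class ErrorType(Enum):
    SUCCESS = 0
    TIMEOUT = 1
    COMM_ERROR = 2  # stands in for an NCCL error
    REMOTE_ERROR = 3


ERROR_KEY = "pg_error_src_rank"  # hypothetical store key


class ProcessGroupStatus:
    def __init__(self, rank: int, store: Dict[str, str]) -> None:
        self.rank = rank
        self.store = store  # a dict stands in for the shared TCPStore
        self.error = ErrorType.SUCCESS

    def report_local_error(self, error: ErrorType) -> None:
        """Record a locally detected error and publish it for the other ranks."""
        self.error = error
        self.store.setdefault(ERROR_KEY, str(self.rank))

    def monitor_tick(self) -> None:
        """One iteration of the monitoring loop: look for an error from a peer."""
        src: Optional[str] = self.store.get(ERROR_KEY)
        if src is None or int(src) == self.rank:
            return
        if self.error is ErrorType.SUCCESS:
            self.error = ErrorType.REMOTE_ERROR

    def get_error(self) -> ErrorType:
        return self.error
```

The source rank keeps its specific local error type, while every other rank only learns that some peer failed.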

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144498
Approved by: https://github.com/kwen2501
2025-01-24 16:47:32 +00:00
cyy
6a35d9aaa4 Enable clang-tidy on torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp (#143806)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143806
Approved by: https://github.com/kwen2501
2025-01-24 12:22:13 +00:00
PyTorch MergeBot
6a2b4db0a1 Revert "Enable clang-tidy on torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp (#143806)"
This reverts commit 42f4fda2eb.

Reverted https://github.com/pytorch/pytorch/pull/143806 on behalf of https://github.com/huydhn due to Lots of builds fail after this land, so maybe a landrace ([comment](https://github.com/pytorch/pytorch/pull/143806#issuecomment-2611275836))
2025-01-24 00:17:34 +00:00
Chirag Pandya
f8a4f16634 [c10d] fix memory leak on shutdown (#145507)
Summary:
Fix memory leak on shutdown when socket is closed.
We still need to free the buffer to make valgrind happy.

Test Plan:
Use `mtiavm`.
Repro steps provided by cristianlume.

on window 1:
```
vm ssh --vm=0 -- $(buck run @//neteng/ai/rdma_gen/mode/owl //neteng/ai/rdma_gen:rdma_gen --emit-shell) --rdma_mode=mtiav1 --num_ranks=2
```
on window 2:
```
vm ssh --vm=1 -- $(buck run @//neteng/ai/rdma_gen/mode/owl //neteng/ai/rdma_gen:rdma_gen --emit-shell) --rdma_mode=mtiav1 --num_ranks=2 --rank=1 --store_host=172.16.1.1
```

without the fix:
```
==8766==ERROR: LeakSanitizer: detected memory leaks
```
With fix, no leak

Differential Revision: D68566104

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145507
Approved by: https://github.com/XilunWu, https://github.com/d4l3k
2025-01-23 23:36:15 +00:00
cyy
42f4fda2eb Enable clang-tidy on torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp (#143806)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143806
Approved by: https://github.com/kwen2501
2025-01-23 22:47:18 +00:00
cyy
29f52e3972 [2/N] Remove unnecessary once flag usage (#145057)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145057
Approved by: https://github.com/albanD
2025-01-23 09:48:46 +00:00
Chirag Pandya
5e6451ea78 [c10] catch c10 error and log message (#145413)
Summary:
Explicitly catch c10 error and log the error message only.

The standard exception `e.what()` below ends up logging the stack trace that is confusing users.
See S477887 for details.

Test Plan:
tested locally.
```
buck test caffe2/test/cpp/c10d:TCPStoreTest
buck2 daemon constraint mismatch: Version mismatch; killing daemon...
Starting new buck2 daemon...
Connected to new buck2 daemon.
File changed: fbcode//caffe2/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp
File changed: fbsource//xplat/caffe2/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp
Watchman fresh instance: new mergebase, cleared graph state, cleared dep files
Soft Error: source_directory_includes_subpackage: Directory `v2.17.1-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.17.1-1/src/tests`.
Soft Error: source_directory_includes_subpackage: Directory `v2.18.3-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.18.3-1/src/tests`.
Soft Error: source_directory_includes_subpackage: Directory `v2.19.3-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.19.3-1/src/tests`.
Buck UI: https://www.internalfb.com/buck2/dbd34fa4-50ed-4eeb-800d-688f5a7bec68
Test UI: https://www.internalfb.com/intern/testinfra/testrun/281475375994918
Network: Up: 1.5GiB  Down: 4.7GiB  (reSessionID-d6b0568e-2347-4375-a2d9-2d03ca0c2161)
Loading targets.   Remaining      0/3024                                                                                                                                 69199 dirs read, 687558 targets declared
Analyzing targets. Remaining      0/31483                                                                                                                                1481904 actions, 1719048 artifacts declared
Executing actions. Remaining      0/250391                                                                                                                               77:11:29.7s exec time total
Command: test.     Finished 2031 local, 45445 remote, 51473 cache (52% hit)                                                                                              20:16:36.9s exec time cached (26%)
Time elapsed: 7:32.7s
Tests finished: Pass 8. Fail 0. Fatal 0. Skip 0. Build failure 0
```

Differential Revision: D68516080

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145413
Approved by: https://github.com/fduwjj
2025-01-23 03:45:47 +00:00
cyy
843627b7b1 Remove unnecessary once flag usage (#143255)
Static variables in C++11 are guaranteed to be initialised exactly once, as mentioned [here](https://en.cppreference.com/w/cpp/language/storage_duration)
```
If multiple threads attempt to initialize the same static local variable concurrently,
the initialization occurs exactly once
(similar behavior can be obtained for arbitrary functions with std::call_once.
Usual implementations of this feature use variants
of the double-checked locking pattern,
which reduces runtime overhead for already-initialized local statics
 to a single non-atomic boolean comparison.
```
Given that a static c10::once_flag was used before, why not just use the associated function to initialise the related static variables? That is the motivation behind this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143255
Approved by: https://github.com/albanD
2025-01-16 02:36:11 +00:00
fduwjj
ae7df51232 [c10d] Fix CudaEventCache for dangling references (#144496)
As reported in https://github.com/pytorch/pytorch/issues/143470, we have dangling references in `CudaEventCache`, so we want to fix them.
1. We add a unit test to repro the problem described in the issue.
2. Instead of converting variables to shared pointers as suggested in the issue, we make the cache itself a shared pointer. So if the thread that creates the cache dies before all events get recycled, the cache is still there until the last CudaEvent gets deleted. (Thanks to @kwen2501 for the suggestion.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144496
Approved by: https://github.com/kwen2501
2025-01-15 05:11:48 +00:00
fduwjj
e0bbff6019 [c10d][ez] Add comments to the end of Macro for better readability (#144789)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144789
Approved by: https://github.com/c-p-i-o
2025-01-15 05:06:41 +00:00
lzhang2
1800f5f461 Enable coalescing path on XPU and dispatch to XPU tensor barrier if XCCL backend is specified. (#143735)
**Motivation:**

- Enable coalescing path on XPU for `batch_isend_irecv`.
- If the XCCL backend is specified, construct an XPU tensor to ensure `barrier` dispatches to the XCCL backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143735
Approved by: https://github.com/kwen2501
2025-01-14 08:37:48 +00:00
Daulet Askarov
21cbee5d9b Drop unused num_elements variable (#144723)
Summary:
With the recent enforcement of unused variable as an error in D67329035, certain tests like
https://www.internalfb.com/intern/test/562950135258426?ref_report_id=0
can't build citing:
```
Action failed: fbcode//caffe2:libtorch_cuda (cfg:linux-x86_64-fbcode-platform010-clang17-no-san#2a7259832b2f5c67) (cxx_compile torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp (pic))
Remote command returned non-zero exit code 1
Remote action, reproduce with: `frecli cas download-action a95a6625d2b071a782a7a8ea2882f4adccf103b023df5ccb596f48c506101754:145`
Stdout: <empty>
Stderr:
fbcode/caffe2/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:3757:16: error: unused variable 'num_elements' [-Werror,-Wunused-variable]
 3757 |         size_t num_elements = output.numel();
      |                ^~~~~~~~~~~~
1 error generated.
```
This causes Sandcastle to turn off these tests, decreasing protection from other bad diffs. Clean up the unused variable to unblock.

Test Plan:
```
buck2 build --config hpc_comms.use_ncclx=dev --flagfile fbcode//mode/opt fbcode//ftar:ftar_py_e2e_test
```

https://www.internalfb.com/buck2/888dfc68-07eb-4ba1-add5-b38c12d52b33

Reviewed By: c-p-i-o

Differential Revision: D68126236

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144723
Approved by: https://github.com/fduwjj, https://github.com/Skylion007

Co-authored-by: Daulet Askarov <dauleta@meta.com>
2025-01-14 08:29:01 +00:00
cyy
9a841f9321 Enable bugprone-unchecked-optional-access (#144226)
We can actually enable bugprone-unchecked-optional-access without the risk of hang.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144226
Approved by: https://github.com/albanD
2025-01-10 03:16:56 +00:00
Richard Barnes
3e7e435bb1 [codemod] Remove unused-variable in caffe2/aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp +2 (#144371)
Summary:
LLVM-15 has a warning `-Wunused-variable` which we treat as an error because it's so often diagnostic of a code issue. Unused variables can compromise readability or, worse, performance.

This diff either (a) removes an unused variable and, possibly, its associated code or (b) qualifies the variable with `[[maybe_unused]]`.

 - If you approve of this diff, please use the "Accept & Ship" button :-)

Test Plan: Sandcastle

Reviewed By: palmje

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144371
Approved by: https://github.com/Skylion007
2025-01-09 21:49:17 +00:00
cyy
b0be30dd79 [19/N] Fix extra warnings brought by clang-tidy-17 (#144448)
Apply more clang-tidy fixes. There was a bug introduced by #144014 due to incorrect namespace concatenation which is reverted here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144448
Approved by: https://github.com/albanD
2025-01-09 15:58:05 +00:00
PyTorch MergeBot
778d953951 Revert "[AsyncMM] re-enable and prepare for cutlass 3.5.1 update (#144011)"
This reverts commit 24ac87392b.

Reverted https://github.com/pytorch/pytorch/pull/144011 on behalf of https://github.com/malfet due to Not sure what is going on, but lots of builds are failing ([comment](https://github.com/pytorch/pytorch/pull/144011#issuecomment-2574317669))
2025-01-07 03:24:01 +00:00
Yifu Wang
24ac87392b [AsyncMM] re-enable and prepare for cutlass 3.5.1 update (#144011)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144011
Approved by: https://github.com/Skylion007, https://github.com/drisspg
2025-01-07 02:15:42 +00:00
Dingming Wu
a881954b0c [PTD] Dump rcclexp proxy trace in pytorch (#143678)
Summary:
Dump the active proxyOp status per rank and per communicator when the WatchDog times out or aborts.

Added
`#if defined(USE_ROCM) && defined(NCCL_COMM_DUMP)` guard in the print function, so only rcclexp users will see this dump in console.

These are the changes on the PTD side.

Test Plan:
Job with A2A hang due to receiver failing to post receive operations https://fburl.com/mlhub/95vg12r3
 {F1971449692}

Reviewed By: c-p-i-o

Differential Revision: D67036093

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143678
Approved by: https://github.com/c-p-i-o
2025-01-04 10:20:47 +00:00
Driss Guessous
a8c98ce175 [cutlass-3] Update third-party/cutlass-3 from 3.4 to 3.5.1 (#143515)
# Summary:

This also makes updates to different repositories throughout FB code to roll any updates needed for this new release.

I was not able to get AsyncMM.cu to build (still trying); Yifu suggested that I just skip it for now.

Test Plan:
Have run various build commands to try and expose errors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143515
Approved by: https://github.com/eqy, https://github.com/Skylion007
2025-01-02 18:45:11 +00:00
cyy
dca443835e Enable more readability-redundant checks (#143963)
They help simplify code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143963
Approved by: https://github.com/albanD
2024-12-30 14:49:33 +00:00