pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Tristan Rice	ba214ab56c	TCPStore: soft fail bind when agent store active (#147465 ) This makes it easier to roll out `TORCHELASTIC_USE_AGENT_STORE` by opportunistically swallowing bind errors when the agent store is enabled and the port matches `MASTER_PORT`. This should be very safe as if the store is somehow not up and the envs are set, the TCPStore client connections will fail to connect so we end up with a slightly different error message but success/failure behavior is identical. This also pybinds `c10d::SocketError` into Python so we can assert on the error type in tests. https://docs.google.com/document/d/1CzOn_N53AiFxWGgbyMWSnd2elCJd4lZ-ajPg2lzcxoM/edit?tab=t.0#heading=h.2j2f5dimrdau Test plan: ``` pytest test/distributed/test_store.py ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/147465 Approved by: https://github.com/fduwjj	2025-02-21 03:02:26 +00:00
cyy	15635b14ce	[4/N] Remove unnecessary once flag usage (#146783 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/146783 Approved by: https://github.com/albanD	2025-02-11 13:55:06 +00:00
Ke Wen	30cbf13544	[PGNCCL] Associate tensor allocation support with NCCL version (#146842 ) This is a forward fix to #146589. For NCCL version lower than 2.19, previous PR would see `RuntimeError: NCCL mem allocator is not supported in this NCCL version`. This PR gates the support by checking link-time NCCL version via `ncclGetVersion`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146842 Approved by: https://github.com/XilunWu, https://github.com/wconstab, https://github.com/fduwjj ghstack dependencies: #146589	2025-02-11 02:52:52 +00:00
Yifu Wang	97f6480cf5	Fix an issue where functional collectives don't force fx stride on inputs when compiled (#146467 ) Fixes https://github.com/pytorch/pytorch/issues/146416 Also added contiguity checks in the C++ functional collective ops to prevent striding issues introduced during compilation manifest as silent correctness issues. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146467 Approved by: https://github.com/Chillee, https://github.com/lw, https://github.com/shunting314	2025-02-10 19:15:49 +00:00
Ke Wen	effc545274	[DDP] Use NCCL allocated memory for gradient bucket (#146589 ) So that NVLink SHARP comes with zero-copy on H100+ platforms, for DDP applications. Less SM usage, less memory contention between NCCL kernel and compute kernels. Added env `DDP_DISABLE_COMM_MEM` as a back-out option: ``` An environment variable to disable comm-optimized memory pool. Default is 0, which means comm-optimized memory pool is enabled. Users can set it to 1 in case of seeing regression or OOM (because this comm MemPool may not share space with regular compute MemPool). ``` Differential Revision: [D69297766](https://our.internmc.facebook.com/intern/diff/D69297766) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146589 Approved by: https://github.com/syed-ahmed, https://github.com/c-p-i-o, https://github.com/fduwjj	2025-02-10 05:23:11 +00:00
Dingming Wu	fa34128435	revert PTD's change that leads to signature mismatch of printNcclCommProxyTrace (#146453 ) Summary: D68801098 introduced this function signature mismatch issue for printNcclCommProxyTrace. Revert it so that trunk build can pass. Test Plan: With the change, build of APS model using rcclexp can now pass: `sh scripts/ltian/run_jobs/fb_fm_v2/run_fb_fm_v2_job.sh -h T20_GTT_MI300X -n 16 -b 1024 -t [2024-12-06] -d ai_infra_ngs -e ai_infra_training_rnd_tc -x 0` Reviewed By: c-p-i-o Differential Revision: D69149588 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146453 Approved by: https://github.com/c-p-i-o	2025-02-07 22:43:52 +00:00
Tristan Rice	68631f6e87	PyWork: preserve Python reference counting when used in functional collectives (#146376 ) @fegin found an issue where torchft is not compatible with functional collectives. Found in https://github.com/pytorch/torchtitan/pull/806 The root cause is because PyProcessGroup/PyWork are not compatible with functional collectives due to a nasty ownership bug. PyWork relies on a pybind trampoline to propagate requests to Python unfortunately the way Pybind works is that the Python object owns the C++ object rather than some form of shared ownership. Thus what happens is that the PyWork Python object will collected when returned to C++ from the PyProcessGroup but the C++ PyWork object still exists. When the PyWork object is used, this causes a deadlock as the corresponding Python object no longer exists To solve this, we introduce a new `PyWorkHolder` class which holds a reference to the `py::object` as well as the trampoline class. This resolves any dependency issues since we can now hold ownership in C++ to both the Python and C++ objects. To make this cleaner we introduce a `WORK_OVERRIDE` macro which is a patched version of `PYBIND11_OVERRIDE` that returns a `PyWorkHolder` rather than just `PyWork` and use for all collectives in PyProcessGroup. Test plan: ``` cd pytorch pytest test/distributed/test_c10d_functional_native.py ``` ``` cd torchft pytest torchft/process_group_test.py -k functional -v -x -s ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146376 Approved by: https://github.com/yifuwang	2025-02-07 18:07:53 +00:00
cyy	25aa7ca62d	Cleanup CallOnce.h (#146700 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/146700 Approved by: https://github.com/albanD	2025-02-07 16:44:45 +00:00
cyy	fa0592b568	Remove some NOLINT (#146610 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/146610 Approved by: https://github.com/Skylion007, https://github.com/malfet	2025-02-07 01:50:06 +00:00
cyy	f397c72697	Remove NOLINTNEXTLINE (#146238 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/146238 Approved by: https://github.com/albanD	2025-02-04 02:45:32 +00:00
PyTorch MergeBot	00dc5b10f6	Revert "[Environment Variable][7/N] Use thread-safe getenv functions (#140211 )" This reverts commit `2fd1b6b361`. Reverted https://github.com/pytorch/pytorch/pull/140211 on behalf of https://github.com/atalman due to Breaks executorch tests ([comment](https://github.com/pytorch/pytorch/pull/140211#issuecomment-2632202864))	2025-02-03 22:04:28 +00:00
cyy	2fd1b6b361	[Environment Variable][7/N] Use thread-safe getenv functions (#140211 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140211 Approved by: https://github.com/ezyang, https://github.com/eqy	2025-02-01 12:33:41 +00:00
fduwjj	eb029fba13	[c10d][NCCL] Implement ncclCommInitRankScalable (merging #136789 ) (#144794 ) Try to land https://github.com/pytorch/pytorch/pull/136789/files on our end and fix any remaining issues. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144794 Approved by: https://github.com/kwen2501, https://github.com/eqy, https://github.com/atalman	2025-01-31 22:39:56 +00:00
Yifu Wang	c70362fac8	[AsyncMM] re-enable and adapt to cutlass 3.6.0 (#144011 ) [D68734067](https://our.internmc.facebook.com/intern/diff/D68734067) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144011 Approved by: https://github.com/Skylion007, https://github.com/drisspg	2025-01-31 00:48:51 +00:00
Ke Wen	9fdc20809a	[PGNCCL] Simplify support macro definition (#145964 ) - Promotes usage of `NCCL_VERSION_CODE >= NCCL_VERSION(X, Y, Z)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/145964 Approved by: https://github.com/fduwjj, https://github.com/shuqiangzhang ghstack dependencies: #145893	2025-01-30 23:26:32 +00:00
Ke Wen	51ee9b154e	[c10d] Add NCCL memory allocator (#145675 ) This PR implements a small UI improvement over #133603. It prepares a NCCL memory allocator in torch cpp and then pybind's it out, so that user can directly use it. UI: ``` pool = torch.cuda.MemPool(backend.mem_allocator) with torch.cuda.use_mem_pool(pool): tensor = torch.arange(1024 * 1024 * 2, device=device) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/145675 Approved by: https://github.com/syed-ahmed, https://github.com/wconstab	2025-01-30 18:19:00 +00:00
PyTorch MergeBot	5fa28bbe40	Revert "[c10d] Add NCCL memory allocator (#145675 )" This reverts commit `18a7a04c4a`. Reverted https://github.com/pytorch/pytorch/pull/145675 on behalf of https://github.com/ZainRizvi due to Sorry but this still fails internally. See D68866823 for details ([comment](https://github.com/pytorch/pytorch/pull/145675#issuecomment-2624900562))	2025-01-30 16:01:52 +00:00
Ke Wen	25ca05eebf	[PGNCCL] Correct some ifdef's (#145893 ) `create` function supporting `ncclConfig_t` should be wrapped inside `NCCL_HAS_CONFIG` instead of `NCCL_HAS_COMM_NONBLOCKING` Pull Request resolved: https://github.com/pytorch/pytorch/pull/145893 Approved by: https://github.com/c-p-i-o	2025-01-30 01:05:21 +00:00
Ke Wen	18a7a04c4a	[c10d] Add NCCL memory allocator (#145675 ) This PR implements a small UI improvement over #133603. It prepares a NCCL memory allocator in torch cpp and then pybind's it out, so that user can directly use it. UI: ``` pool = torch.cuda.MemPool(backend.mem_allocator) with torch.cuda.use_mem_pool(pool): tensor = torch.arange(1024 * 1024 * 2, device=device) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/145675 Approved by: https://github.com/syed-ahmed, https://github.com/wconstab	2025-01-29 23:20:22 +00:00
PyTorch MergeBot	6371c25b91	Revert "[c10d] Add NCCL memory allocator (#145675 )" This reverts commit `9fd6722fc9`. Reverted https://github.com/pytorch/pytorch/pull/145675 on behalf of https://github.com/ZainRizvi due to This fails to build internally, can you please take a look at D68831004 for more details? ([comment](https://github.com/pytorch/pytorch/pull/145675#issuecomment-2622515425))	2025-01-29 18:30:30 +00:00
PyTorch MergeBot	284f217011	Revert "[Environment Variable][7/N] Use thread-safe getenv functions (#140211 )" This reverts commit `97b3b73f3e`. Reverted https://github.com/pytorch/pytorch/pull/140211 on behalf of https://github.com/ZainRizvi due to Sorry but this is failing internally. @eqy @ezyang can you please help this get remerged? See D68779772. ([comment](https://github.com/pytorch/pytorch/pull/140211#issuecomment-2622504898))	2025-01-29 18:24:29 +00:00
Ke Wen	9fd6722fc9	[c10d] Add NCCL memory allocator (#145675 ) This PR implements a small UI improvement over #133603. It prepares a NCCL memory allocator in torch cpp and then pybind's it out, so that user can directly use it. UI: ``` pool = torch.cuda.MemPool(backend.mem_allocator) with torch.cuda.use_mem_pool(pool): tensor = torch.arange(1024 * 1024 * 2, device=device) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/145675 Approved by: https://github.com/syed-ahmed, https://github.com/wconstab	2025-01-29 02:48:56 +00:00
fduwjj	4f949f282d	[c10d][ez] Remove goto in PGNCCL and make linter happy for PGNCCL and NCCLUtils (#145855 ) While working on PGNCCL I found that the code triggers some lint warnings so this PR is to address them or add lint suppressor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145855 Approved by: https://github.com/c-p-i-o, https://github.com/kwen2501	2025-01-28 21:19:49 +00:00
cyyever	ef28df5c9e	[Reland][Environment Variable][4/N] Use thread-safe getenv functions (#140593 ) Reland of #137843 , after checking the code again. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140593 Approved by: https://github.com/albanD Co-authored-by: albanD <desmaison.alban@gmail.com>	2025-01-28 20:51:49 +00:00
cyyever	97b3b73f3e	[Environment Variable][7/N] Use thread-safe getenv functions (#140211 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140211 Approved by: https://github.com/ezyang, https://github.com/eqy	2025-01-28 15:21:12 +00:00
Chirag Pandya	bdf6dfa17d	[chore][ez] change alloc buffer size from 4000 to 4096 (#145759 ) Summary: Allocations typically happen as a power of 2 anyway. Change the default alloc size to 4096 so eek out a bit more perf. Test: unit tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/145759 Approved by: https://github.com/XilunWu, https://github.com/fduwjj ghstack dependencies: #145756, #145757	2025-01-28 09:14:07 +00:00
Chirag Pandya	78f02bf07c	[bug] handle case when remote peer closes connection (#145757 ) Summary: In the case where remote peer closes the connection, nread returns 0. In this case, we still want to free up the allocated buffer. Also, reorder the if so that the likely success cases (nread > 0) is at the top of the function with an early return. Test Plan: unit tests Differential Revision: [D68733192](https://our.internmc.facebook.com/intern/diff/D68733192) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145757 Approved by: https://github.com/XilunWu, https://github.com/fduwjj ghstack dependencies: #145756	2025-01-28 03:06:38 +00:00
Yifu Wang	db33d23aa8	[SymmetricMemory] fix an issue where rendezvous is performed with wrong device context when torch.cuda.set_device() is not callled (#144886 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144886 Approved by: https://github.com/awgu	2025-01-28 01:43:37 +00:00
Chirag Pandya	3ce68dc61e	[c10d] Flush file in file recorder (#145458 ) Summary: Flushing file to hopefully prevent file corruptions as reported in https://github.com/pytorch/pytorch/pull/145125 Test Plan: Couldn't get file corruption to occur in my tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145458 Approved by: https://github.com/kwen2501	2025-01-27 23:15:52 +00:00
Chirag Pandya	5534c270db	[chore] fix new linter (#145756 ) Summary: Fix new linter that's complaining when I made changes to this file: class 'LibUVStoreDaemon' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator Test Plan: make lint passes Differential Revision: [D68733191](https://our.internmc.facebook.com/intern/diff/D68733191) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145756 Approved by: https://github.com/XilunWu, https://github.com/Skylion007, https://github.com/fduwjj	2025-01-27 22:48:12 +00:00
Shuqiang Zhang	c0861d092c	[PGNCCL] Add an API to get the status/error code at the PG level (#144498 ) Summary: This PR is basically a replacement of https://github.com/pytorch/pytorch/pull/140087, which caused some perf drop due to frequent TCPStore check in watchdog thread. The fix is to move the tcpstore check in monitoring thread If unhealthy, the user should be able to get the type of errors, e.g., timeout,nccl error or remote error. This API is applied to PG level, compared to the work.get_future_result() API which is applied to Work Level. Error detection at PG level is much more convenient for users to handle the PG failure as a whole, e.g, restarting the PG. Error handling at the work level is still useful for users to attach work specific context and debug the RC of the specific failing work/collective Note it is critical for all ranks in the PG to be notified about an error as soon as it occurs, so we introduce an errorType of REMOTE_ERROR, which is 'broadcasted' from a src rank (which detects a local error) to all other ranks in the PG, the broadcast is done through TCPStore currently Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/144498 Approved by: https://github.com/kwen2501	2025-01-24 16:47:32 +00:00
cyy	6a35d9aaa4	Enable clang-tidy on torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp (#143806 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/143806 Approved by: https://github.com/kwen2501	2025-01-24 12:22:13 +00:00
PyTorch MergeBot	6a2b4db0a1	Revert "Enable clang-tidy on torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp (#143806 )" This reverts commit `42f4fda2eb`. Reverted https://github.com/pytorch/pytorch/pull/143806 on behalf of https://github.com/huydhn due to Lots of builds fail after this land, so maybe a landrace ([comment](https://github.com/pytorch/pytorch/pull/143806#issuecomment-2611275836))	2025-01-24 00:17:34 +00:00
Chirag Pandya	f8a4f16634	[c10d] fix memory leak on shutdown (#145507 ) Summary: Fix memory leak on shutdown when socket is closed. We still need to free the buffer to make valgrind happy. Test Plan: Use `mtiavm`. Repro steps provided by cristianlume. on window 1: ``` vm ssh --vm=0 -- $(buck run @//neteng/ai/rdma_gen/mode/owl //neteng/ai/rdma_gen:rdma_gen --emit-shell) --rdma_mode=mtiav1 --num_ranks=2 ``` on window 2: ``` vm ssh --vm=1 -- $(buck run @//neteng/ai/rdma_gen/mode/owl //neteng/ai/rdma_gen:rdma_gen --emit-shell) --rdma_mode=mtiav1 --num_ranks=2 --rank=1 --store_host=172.16.1.1 ``` without the fix: ``` ==8766==ERROR: LeakSanitizer: detected memory leaks ``` With fix, no leak Differential Revision: D68566104 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145507 Approved by: https://github.com/XilunWu, https://github.com/d4l3k	2025-01-23 23:36:15 +00:00
cyy	42f4fda2eb	Enable clang-tidy on torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp (#143806 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/143806 Approved by: https://github.com/kwen2501	2025-01-23 22:47:18 +00:00
cyy	29f52e3972	[2/N] Remove unnecessary once flag usage (#145057 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/145057 Approved by: https://github.com/albanD	2025-01-23 09:48:46 +00:00
Chirag Pandya	5e6451ea78	[c10] catch c10 error and log message (#145413 ) Summary: Explicitly catch c10 error and log the error message only. The standard exception `e.what()` below ends up logging the stack trace that is confusing users. See S477887 for details. Test Plan: tested locally. ``` buck test caffe2/test/cpp/c10d:TCPStoreTest buck2 daemon constraint mismatch: Version mismatch; killing daemon... Starting new buck2 daemon... Connected to new buck2 daemon. File changed: fbcode//caffe2/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp File changed: fbsource//xplat/caffe2/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp Watchman fresh instance: new mergebase, cleared graph state, cleared dep files Soft Error: source_directory_includes_subpackage: Directory `v2.17.1-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.17.1-1/src/tests`. Soft Error: source_directory_includes_subpackage: Directory `v2.18.3-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.18.3-1/src/tests`. Soft Error: source_directory_includes_subpackage: Directory `v2.19.3-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.19.3-1/src/tests`. Buck UI: https://www.internalfb.com/buck2/dbd34fa4-50ed-4eeb-800d-688f5a7bec68 Test UI: https://www.internalfb.com/intern/testinfra/testrun/281475375994918 Network: Up: 1.5GiB Down: 4.7GiB (reSessionID-d6b0568e-2347-4375-a2d9-2d03ca0c2161) Loading targets. Remaining 0/3024 69199 dirs read, 687558 targets declared Analyzing targets. Remaining 0/31483 1481904 actions, 1719048 artifacts declared Executing actions. Remaining 0/250391 77:11:29.7s exec time total Command: test. Finished 2031 local, 45445 remote, 51473 cache (52% hit) 20:16:36.9s exec time cached (26%) Time elapsed: 7:32.7s Tests finished: Pass 8. Fail 0. Fatal 0. Skip 0. Build failure 0 ``` Differential Revision: D68516080 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145413 Approved by: https://github.com/fduwjj	2025-01-23 03:45:47 +00:00
cyy	843627b7b1	Remove unnecessary once flag usage (#143255 ) Static variables in C++11 is guaranteed to be initialised exactly once, as mentioned [here](https://en.cppreference.com/w/cpp/language/storage_duration) ``` If multiple threads attempt to initialize the same static local variable concurrently, the initialization occurs exactly once (similar behavior can be obtained for arbitrary functions with std::call_once. Usual implementations of this feature use variants of the double-checked locking pattern, which reduces runtime overhead for already-initialized local statics to a single non-atomic boolean comparison. ``` Given that static c10::once_flag is used before, why not just use the associated function to initialised the related static variables? That is the motivation behind this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143255 Approved by: https://github.com/albanD	2025-01-16 02:36:11 +00:00
fduwjj	ae7df51232	[c10d] Fix CudaEventCache for dangling references (#144496 ) Reported in https://github.com/pytorch/pytorch/issues/143470, we have a dangling references in `CudaEventCache`. So we want to fix it. 1. We add a unit test to repro the issue mentioned in the issue. 2. Instead of converting variables to shared pointers as suggested in the issue, we then make the cache itself a shared pointer. So if the thread creates the cache dies before all events get recycled, the cache is still there until the last CudaEvent get deleted. (thanks for the suggestion from @kwen2501 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144496 Approved by: https://github.com/kwen2501	2025-01-15 05:11:48 +00:00
fduwjj	e0bbff6019	[c10d][ez] Add comments to the end of Macro for better readability (#144789 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144789 Approved by: https://github.com/c-p-i-o	2025-01-15 05:06:41 +00:00
lzhang2	1800f5f461	Enable coalescing path on XPU and dispatch to XPU tensor barrier if XCCL backend is specified. (#143735 ) Motivation: - Enable coalescing path on XPU for `batch_isend_irecv`. - If XCCL backend is specified, then construct a XPU tensor to ensure `barrier` dispatch to XCCL backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143735 Approved by: https://github.com/kwen2501	2025-01-14 08:37:48 +00:00
Daulet Askarov	21cbee5d9b	Drop unused num_elements variable (#144723 ) Summary: With the recent enforcement of unused variable as an error in D67329035, certain tests like https://www.internalfb.com/intern/test/562950135258426?ref_report_id=0 can't build citing: ``` Action failed: fbcode//caffe2:libtorch_cuda (cfg:linux-x86_64-fbcode-platform010-clang17-no-san#2a7259832b2f5c67) (cxx_compile torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp (pic)) Remote command returned non-zero exit code 1 Remote action, reproduce with: `frecli cas download-action a95a6625d2b071a782a7a8ea2882f4adccf103b023df5ccb596f48c506101754:145` Stdout: <empty> Stderr: fbcode/caffe2/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:3757:16: error: unused variable 'num_elements' [-Werror,-Wunused-variable] 3757 \| size_t num_elements = output.numel(); \| ^~~~~~~~~~~~ 1 error generated. ``` This causes Sandcastle to turn off these tests, decreasing protection from other bad diffs. Clean up the unused variable to unblock. Test Plan: ``` buck2 build --config hpc_comms.use_ncclx=dev --flagfile fbcode//mode/opt fbcode//ftar:ftar_py_e2e_test ``` https://www.internalfb.com/buck2/888dfc68-07eb-4ba1-add5-b38c12d52b33 Reviewed By: c-p-i-o Differential Revision: D68126236 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144723 Approved by: https://github.com/fduwjj, https://github.com/Skylion007 Co-authored-by: Daulet Askarov <dauleta@meta.com>	2025-01-14 08:29:01 +00:00
cyy	9a841f9321	Enable bugprone-unchecked-optional-access (#144226 ) We can actually enable bugprone-unchecked-optional-access without the risk of hang. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144226 Approved by: https://github.com/albanD	2025-01-10 03:16:56 +00:00
Richard Barnes	3e7e435bb1	[codemod] Remove unused-variable in caffe2/aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp +2 (#144371 ) Summary: LLVM-15 has a warning `-Wunused-variable` which we treat as an error because it's so often diagnostic of a code issue. Unused variables can compromise readability or, worse, performance. This diff either (a) removes an unused variable and, possibly, it's associated code or (b) qualifies the variable with `[[maybe_unused]]`. - If you approve of this diff, please use the "Accept & Ship" button :-) Test Plan: Sandcastle Reviewed By: palmje Pull Request resolved: https://github.com/pytorch/pytorch/pull/144371 Approved by: https://github.com/Skylion007	2025-01-09 21:49:17 +00:00
cyy	b0be30dd79	[19/N] Fix extra warnings brought by clang-tidy-17 (#144448 ) Apply more clang-tidy fixes. There was a bug introduced by #144014 due to incorrect namespace concatenation which is reverted here. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144448 Approved by: https://github.com/albanD	2025-01-09 15:58:05 +00:00
PyTorch MergeBot	778d953951	Revert "[AsyncMM] re-enable and prepare for cutlass 3.5.1 update (#144011 )" This reverts commit `24ac87392b`. Reverted https://github.com/pytorch/pytorch/pull/144011 on behalf of https://github.com/malfet due to Not sure what is going on, but lots of builds are failing ([comment](https://github.com/pytorch/pytorch/pull/144011#issuecomment-2574317669))	2025-01-07 03:24:01 +00:00
Yifu Wang	24ac87392b	[AsyncMM] re-enable and prepare for cutlass 3.5.1 update (#144011 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144011 Approved by: https://github.com/Skylion007, https://github.com/drisspg	2025-01-07 02:15:42 +00:00
Dingming Wu	a881954b0c	[PTD] Dump rcclexp proxy trace in pytorch (#143678 ) Summary: Dump the active proxyOp status per rank and per communicator when WatchDog timeout or aborts. Added `#if defined(USE_ROCM) && defined(NCCL_COMM_DUMP)` guard in the print function, so only rcclexp users will see this dump in console. This is the changes of the PTD. Test Plan: Job with A2A hang due to receiver failing to post receive operations https://fburl.com/mlhub/95vg12r3 {F1971449692} Reviewed By: c-p-i-o Differential Revision: D67036093 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143678 Approved by: https://github.com/c-p-i-o	2025-01-04 10:20:47 +00:00
Driss Guessous	a8c98ce175	[cutlass-3] Update third-party/cutlass-3 from 3.4 to 3.5.1 (#143515 ) # Summary: This also makes updates to different repositories throughout FB code to roll any updates needed for this new release. I was not able to get AsyncMM.cu to build (still trying) Yfiu suggested that I just skip it for now Test Plan: Have run various build commands to try and expose errors Pull Request resolved: https://github.com/pytorch/pytorch/pull/143515 Approved by: https://github.com/eqy, https://github.com/Skylion007	2025-01-02 18:45:11 +00:00
cyy	dca443835e	Enable more readability-redundant checks (#143963 ) They are helpful to simplifying code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143963 Approved by: https://github.com/albanD	2024-12-30 14:49:33 +00:00

1 2 3 4 5 ...

2208 Commits