pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	66478d0cf7	Revert "[compiled autograd] directly use python Logger class in cpp (#137953 )" This reverts commit `af91661368`. Reverted https://github.com/pytorch/pytorch/pull/137953 on behalf of https://github.com/clee2000 due to breaking builds internally D64479234, I think it makes the build size of a package too large? The logs link to a wiki with instructions of what to do ([comment](https://github.com/pytorch/pytorch/pull/137953#issuecomment-2420086928))	2024-10-17 17:19:36 +00:00
PyTorch MergeBot	375dcb960f	Revert "Avoid some dangling reference warnings (#132535 )" This reverts commit `f3d7a02716`. Reverted https://github.com/pytorch/pytorch/pull/132535 on behalf of https://github.com/clee2000 due to broke some internal builds D64479234 ([comment](https://github.com/pytorch/pytorch/pull/132535#issuecomment-2419983509))	2024-10-17 16:23:36 +00:00
homorunner	a040c4a260	Use std::move on stringstream to prevent unnecessary copy. (#138065 ) - Takes advantage of C++20's improved handling of move semantics for std::basic_stringbuf. - Reduces unnecessary copying and improves memory efficiency, especially for long formatted strings. Benchmark(proof of concept): https://quick-bench.com/q/qohAu0ARH3vSDyKVsoKEfXOO6BI Pull Request resolved: https://github.com/pytorch/pytorch/pull/138065 Approved by: https://github.com/Skylion007	2024-10-16 21:35:10 +00:00
fduwjj	b72ff35f22	[c10d][ez] Add more inline comments to CUDAEventCache code (#138079 ) Address @kwen2501 's feedback in https://github.com/pytorch/pytorch/pull/138048, add more inline comments to the code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138079 Approved by: https://github.com/kwen2501 ghstack dependencies: #138040, #138048, #138059	2024-10-16 20:43:28 +00:00
Chirag Pandya	f95ddf0b31	[c10d] record world size in log (#138044 ) Summary: Record the world size in log and scuba table. This helps us quickly figure out if there are missing flight recorder files form ranks. Test Plan: Ran locally and noted that size was logged to scuba Differential Revision: D64442949 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138044 Approved by: https://github.com/Skylion007	2024-10-16 20:14:02 +00:00
PyTorch MergeBot	dd32a32cb6	Revert "Expose option to disable CRC-32 computation during `torch.save` (#137735 )" This reverts commit `534fa96f2d`. Reverted https://github.com/pytorch/pytorch/pull/137735 on behalf of https://github.com/clee2000 due to failing internally D64438525, probably needs gating ([comment](https://github.com/pytorch/pytorch/pull/137735#issuecomment-2417412264))	2024-10-16 17:03:06 +00:00
PyTorch MergeBot	361f42bc42	Revert "[compiled autograd] Compiled autograd configs in TLS (#137821 )" This reverts commit `9aba0b91c8`. Reverted https://github.com/pytorch/pytorch/pull/137821 on behalf of https://github.com/wdvr due to Reverting this for now, it is failing test_public_bindings in trunk ([comment](https://github.com/pytorch/pytorch/pull/137821#issuecomment-2417351788))	2024-10-16 16:38:29 +00:00
William Wen	4c8718d8e7	[dynamo] add torch.compiler.set_stance (#137504 ) Attempt # 2 at https://github.com/pytorch/pytorch/pull/132926 to implement https://github.com/pytorch/pytorch/issues/123771. Implement a new `torch.compiler.set_stance` function that can force `torch.compile` regions to run eagerly. See added tests for usage examples. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137504 Approved by: https://github.com/yf225, https://github.com/jansel	2024-10-16 16:18:25 +00:00
fduwjj	960c3bff98	[c10d] Refactor CUDAEventCache Create to use deque rather than stack (#138048 ) We used a LIFO stack to store the CudaEvent in the cache. ,Somehow we like FIFO deque better so aside from improving the readability of the code, we use a deque instead. As @wconstab pointed out, both methods are equally correct because the moment we put the event into stack/deque, the event is already ready for reuse, this change mostly is a preference change not trying to fix anything. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138048 Approved by: https://github.com/kwen2501 ghstack dependencies: #138040	2024-10-16 14:44:39 +00:00
Isuru Fernando	f3d7a02716	Avoid some dangling reference warnings (#132535 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132535 Approved by: https://github.com/aaronenyeshi	2024-10-16 13:41:12 +00:00
Simon Fan	9aba0b91c8	[compiled autograd] Compiled autograd configs in TLS (#137821 ) Multithreaded doesn't work yet, this adds python side TLS only for the python side state Pull Request resolved: https://github.com/pytorch/pytorch/pull/137821 Approved by: https://github.com/jansel, https://github.com/yf225 ghstack dependencies: #137953	2024-10-16 09:28:32 +00:00
Simon Fan	af91661368	[compiled autograd] directly use python Logger class in cpp (#137953 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137953 Approved by: https://github.com/jansel, https://github.com/yf225	2024-10-16 09:28:32 +00:00
cyyever	deaf0418b2	[2/N] Fix clang-tidy warnings in torch/csrc/api/ (#136998 ) Follows #134545 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136998 Approved by: https://github.com/ezyang	2024-10-16 07:50:59 +00:00
Shuqiang Zhang	f4158558aa	[c10d] disable watchdog thread in blockingWait mode (#138001 ) Summary: Blocking wait mode is not widely used, probably useful in debugging. in blockingWait mode, we don't need to enable the watchdog thread to check the timeout or nccl error because the main thread would throw an exception if error happens and it is obvious to user which work fails and its user's responsibility to handle the exception. Test Plan: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/138001 Approved by: https://github.com/fduwjj, https://github.com/c-p-i-o ghstack dependencies: #137799	2024-10-16 07:42:22 +00:00
fduwjj	084657e012	[c10d] Fix data corruption bug after CUDAEventCache is enabled (#138040 ) Here is why we see using `CUDAEventCache` cause crash and data corruption. 1. The deleter is doing its job and append the job the stack. 2. In create, instead of getting a reference, we are getting a copy of eventsArray_[i] (which is a std::vector). This is bad because we didn't really remove the element from the stack. While we thought we already pop up the last one from the stack, but it turns out the last one is still in the stack; we end up reusing the same event again and again. What's worse, since we keep adding new events to the stack, this will eventually explode the stack and a crash happens. Fix is easy, just get a reference. Local torchtitan run see a non-Nan loss. Also we want to use a deque instead of a stack, and refactor the code a bit to make it more readable. (in a separate PR) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138040 Approved by: https://github.com/kwen2501, https://github.com/shuqiangzhang	2024-10-16 05:20:29 +00:00
Yifu Wang	80f3ee41dc	[SymmetricMemory] fix incorrect numel caculations that are using int as std::accumulate's accumulator (#138038 ) Fixes https://github.com/pytorch/pytorch/pull/137567 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138038 Approved by: https://github.com/weifengpy	2024-10-16 03:34:26 +00:00
Shuqiang Zhang	a1b22e369b	[c10d] add an API to get the future result(success or failure) of a collective and customize error handling (#137799 ) Summary: This PR is trying to let users to know what exact collective call from the python thread is failing, and customize their own error handling function, instead of watchdog thread crashing everything. This is potentially very useful in fault tolerant training, in which we can have in-process restart. E.g., when an nccl error is detected, users can potentially abort comms, re-init comms and go back to the previous check pointed step and try again, instead of crashing the whole job. This is to allow users to check the status of each collective call, using the ivalue::future libs in PT core. This also allows users to attach its customized failure handling functions by: work.get_future_result().then(erro_handling_func) Note that the above call is also non-blocking for CPU thread Test Plan: Added a new test: test_get_future_result to verify the workResutl is correctly propagated to the users Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/137799 Approved by: https://github.com/fduwjj, https://github.com/wconstab	2024-10-16 00:20:09 +00:00
PyTorch MergeBot	4557f6e339	Revert "[Dynamo] Disable torch function compilation during guard execution and in compiled bytecode (#137669 )" This reverts commit `bf0b670598`. Reverted https://github.com/pytorch/pytorch/pull/137669 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing test_public_bindings in trunk, maybe a landrace ([comment](https://github.com/pytorch/pytorch/pull/137669#issuecomment-2415331274))	2024-10-15 23:22:58 +00:00
Yifu Wang	91bc9dc2c9	[SymmetricMemory] implement timeout for barrier(), put_signal() and wait_signal() (#137643 ) Suggested by @lw for better safety/reliability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137643 Approved by: https://github.com/weifengpy, https://github.com/lw	2024-10-15 21:35:14 +00:00
Tristan Rice	aef4317ec8	[c10d] socket: retry connection timeout failures (#138003 ) This will retry connection timeout failures up to the timeout duration. Under heavy load the server may not be able to immediately accept the connection. In such a case we do want to retry the connection rather than fall back to ipv4 for the remaining of the connection timeout. The connection timeout here is not the same as the c10d timeout which appears to be higher. We could adjust the linux timeout directly but using the c10d retry loop keeps things more consistent and gives us things like exponential backoff, logs, etc. Example failure: ``` socket.cpp:752] [c10d] The client socket has failed to connect to [...]:29400 (errno: 110 - Connection timed out). socket.cpp:752] [c10d] The IPv4 network addresses of (..., 29400) cannot be retrieved (gai error: -2 - Name or service not known). ... repeats ipv4 connection failure ``` From Linux man page: https://man7.org/linux/man-pages/man2/connect.2.html ``` ETIMEDOUT Timeout while attempting connection. The server may be too busy to accept new connections. Note that for IP sockets the timeout may be very long when syncookies are enabled on the server. ``` Test plan: CI for backwards compatibility Pull Request resolved: https://github.com/pytorch/pytorch/pull/138003 Approved by: https://github.com/c-p-i-o, https://github.com/fduwjj, https://github.com/rsdcastro	2024-10-15 21:17:05 +00:00
Michael Lazos	bf0b670598	[Dynamo] Disable torch function compilation during guard execution and in compiled bytecode (#137669 ) Fixes https://github.com/pytorch/pytorch/issues/114369 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137669 Approved by: https://github.com/anijain2305	2024-10-15 20:52:58 +00:00
Ke Wen	35fc24fbed	[PGNCCL] Fix bugs in non-blocking mode (#137741 ) ### Fix 1: Throw async error during init wait Previously we just busy wait for `ncclSuccess`, if the nonblocking init encountered error, we never report that. Added detection of async error via `ncclGetAsyncError`. ### Fix 2: Add wait after comm split ``` // After calling ncclCommSplit in non-blocking mode, we should wait for the // source communicator to be out of ncclInProgress state. // Reason 1: // it's unsafe to call new operations on the parent comm while it's in // ncclInProgress state. // Reason 2: // as of NCCL 2.23, the ptr value of child comm will not be filled until the // state of parent comm is ncclSuccess. This may change in the future. See: // https://github.com/NVIDIA/nccl/issues/1472 ``` This wait does not mean the child comm is ready for use, neither does it block till that point. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137741 Approved by: https://github.com/shuqiangzhang	2024-10-15 20:35:39 +00:00
Mikayla Gawarecki	534fa96f2d	Expose option to disable CRC-32 computation during `torch.save` (#137735 ) Option only works in open source, not internal Pull Request resolved: https://github.com/pytorch/pytorch/pull/137735 Approved by: https://github.com/albanD	2024-10-15 19:30:02 +00:00
PyTorch MergeBot	cd292908e5	Revert "Make c10::string_view an alias of std::string_view (#130417 )" This reverts commit `c48fe89011`. Reverted https://github.com/pytorch/pytorch/pull/130417 on behalf of https://github.com/clee2000 due to breaking some internal tests, probably usages of string_view that need to be changed? ([comment](https://github.com/pytorch/pytorch/pull/130417#issuecomment-2414775064))	2024-10-15 18:55:09 +00:00
PyTorch MergeBot	d4d687ffb2	Revert "Make Context to be Device-agnostic Step by Step (1/N) (#136519 )" This reverts commit `4a8e49389c`. Reverted https://github.com/pytorch/pytorch/pull/136519 on behalf of https://github.com/clee2000 due to breaking internal tests related to MITA, @ezyang has a forward fix? ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2414588302))	2024-10-15 17:19:16 +00:00
PyTorch MergeBot	9af4e0d2aa	Revert "Make Context to be Device-agnostic Step by Step (2/N) (#136526 )" This reverts commit `a6eb020522`. Reverted https://github.com/pytorch/pytorch/pull/136526 on behalf of https://github.com/clee2000 due to breaking internal tests related to MITA, @ezyang has a forward fix? ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2414588302))	2024-10-15 17:19:15 +00:00
Richard Barnes	b7f798caa4	Use C10_UNUSED instead of (void)X (#137239 ) Summary: Auto-generated with ``` buck run //scripts/rbarnes/regex_multiline_replacer:regex_multiline_replacer -- --find '^(\sfor\s\()(const.\n)\s\(void\)[A-Za-z]+;\s//\sSuppress.\s\n(.)' --replace '\1C10_UNUSED \2\3' `find caffe2/ -regex ".\.\(cpp\\|h\)"` ``` Differential Revision: D33432600 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137239 Approved by: https://github.com/Skylion007	2024-10-15 14:32:59 +00:00
Xiaodong Wang	5141ade8e3	[AMD] Do not skip 0-byte send/recv (#137952 ) Summary: With https://github.com/ROCm/rccl/pull/1376, we can remove this hack now and we have verified that we no longer run into hang Test Plan: https://www.internalfb.com/mlhub/pipelines/runs/mast/aps-xdwang-900def406a?job_attempt=0&version=1&env=PRODUCTION Differential Revision: D64370817 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137952 Approved by: https://github.com/eqy	2024-10-15 09:35:03 +00:00
cyy	8c860aef0d	[Reland][Environment Variable][3/N] Use thread-safe getenv functions (#137942 ) Reland of #137328, which was reverted due to reverting a dependent PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137942 Approved by: https://github.com/eqy	2024-10-15 07:47:24 +00:00
cyy	70206499f1	[3/N] Fix extra warnings brought by clang-tidy-17 (#137552 ) Follows #137459 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137552 Approved by: https://github.com/ezyang	2024-10-15 02:33:44 +00:00
FFFrog	a6eb020522	Make Context to be Device-agnostic Step by Step (2/N) (#136526 ) ---- - add new method(getDefaultGenerator, getNewGenerator) into AcceleratorHooksInterface Pull Request resolved: https://github.com/pytorch/pytorch/pull/136526 Approved by: https://github.com/ezyang, https://github.com/EikanWang	2024-10-15 01:53:28 +00:00
Shivam Raikundalia	aef3591998	[Profiler] Add Test for Clear on Fork (#137511 ) Summary: Tests Fix Clear On Fork by forking a process after a profile has already been done. Afterwards we check that all the PID/TID are as expected. Test Plan: Ran buck2 test 'fbcode//mode/dev' fbcode//caffe2/test:profiler -- --exact 'caffe2/test:profiler - test_forked_process (profiler.test_profiler.TestProfiler)' Differential Revision: D63992036 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137511 Approved by: https://github.com/sanrise, https://github.com/aaronenyeshi	2024-10-14 23:20:33 +00:00
PyTorch MergeBot	df0c2f5cae	Revert "[Environment Variable][3/N] Use thread-safe getenv wrapper (#137328 )" This reverts commit `25ac5652d0`. Reverted https://github.com/pytorch/pytorch/pull/137328 on behalf of https://github.com/clee2000 due to need to revert this in order to revert #133896, please rebase and reland, sorry for the churn ([comment](https://github.com/pytorch/pytorch/pull/137328#issuecomment-2412143739))	2024-10-14 20:22:26 +00:00
Joel Schlosser	19918a1863	Fix autograd.Function + NJT when an output grad is None (#136875 ) For `autograd.Function`, the engine will try to allocate correctly-shaped zeros for `None` grads (i.e. in the case where the output isn't used downstream). It determines the shape of these zeros from the `VariableInfo` entry, which is derived from the forward output shape. For the NJT forward output case, the size info stored will contain a nested int, and calling `zeros()` with this size throws: ``` RuntimeError: .../build/aten/src/ATen/RegisterCPU.cpp:5260: SymIntArrayRef expected to contain only concrete integers ``` This PR fixes this by storing the full tensor in the `VariableInfo` for the nested case and calling `zeros_like()` to allocate correctly-shaped zeros. This is pretty inefficient; ideally we would want to save just the NJT shape and be able to construct zeros from it, but this requires factory function support for nested ints (WIP). So this is a short-term fix until we have that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136875 Approved by: https://github.com/soulitzer, https://github.com/huydhn	2024-10-14 19:31:50 +00:00
PyTorch MergeBot	f8a5b7170a	Revert "Fix autograd.Function + NJT when an output grad is None (#136875 )" This reverts commit `76ab1ab665`. Reverted https://github.com/pytorch/pytorch/pull/136875 on behalf of https://github.com/jbschlosser due to Caused memory leak ([comment](https://github.com/pytorch/pytorch/pull/136875#issuecomment-2411665776))	2024-10-14 16:00:44 +00:00
cyy	c48fe89011	Make c10::string_view an alias of std::string_view (#130417 ) In order to facilitate the mitigation from c10::string_view to std::string_view, the old c10::string_view was renamed to c10::string_view_ext. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130417 Approved by: https://github.com/ezyang	2024-10-14 09:28:04 +00:00
GarfieldHan	b361cd01f1	profiler: Fix undefined reference to `unwind_c` in `unwind_entry` while LTO is enabled (#137862 ) With LTO(Link Time Optimization) enabled in CFLAGS, some compiler will optimize and strip the unwind_c function, which is caused by compiler that couldn’t resolve reference correctly, thus breaking the build with undefined reference in unwind_entry. Add an attribute to avoid this bad situation. Fixes #121282 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137862 Approved by: https://github.com/Skylion007 Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2024-10-13 19:04:58 +00:00
FFFrog	4a8e49389c	Make Context to be Device-agnostic Step by Step (1/N) (#136519 ) ---- - make init to be device-agnostic and move it to AcceleratorHooksInterface - refactoring context related to device initialization Pull Request resolved: https://github.com/pytorch/pytorch/pull/136519 Approved by: https://github.com/ezyang, https://github.com/EikanWang, https://github.com/guangyey	2024-10-13 12:38:02 +00:00
Yuxin Wu	08576b254b	Fix logging in socket.cpp (#137745 ) Formatter shall avoid throwing exceptions as much as possible. Fixes https://github.com/pytorch/pytorch/pull/128673#discussion_r1796226656 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137745 Approved by: https://github.com/d4l3k, https://github.com/Skylion007	2024-10-13 10:38:10 +00:00
Michael Au-Yeung	b3af359cba	Log WorkNCCL exception string to C10dLogger (#137736 ) Summary: In WorkNCCL::handleException, log to c10d logger with `strings["work_nccl_exception"]`. Test Plan: Test run job to verify NCCL exception is logged. Differential Revision: D62603322 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137736 Approved by: https://github.com/c-p-i-o, https://github.com/fduwjj	2024-10-13 07:33:05 +00:00
Gufan Yin	fba2c0a23a	Fix comment in ProcessGroupGloo (#137746 ) Summary: Algorithm caching was removed in 2018 D13111781 Test Plan: CI Differential Revision: D64214527 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137746 Approved by: https://github.com/Skylion007, https://github.com/wz337	2024-10-12 01:04:41 +00:00
Andrew Gu	e269a5cb09	[TCPStore] Throw value error if passing `world_size=0` to TCPStore (#137792 ) This fixes https://github.com/pytorch/pytorch/issues/137577. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137792 Approved by: https://github.com/fegin, https://github.com/H-Huang ghstack dependencies: #137713, #137721	2024-10-11 23:42:57 +00:00
cyyever	25ac5652d0	[Environment Variable][3/N] Use thread-safe getenv wrapper (#137328 ) Follows #124485 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137328 Approved by: https://github.com/eqy	2024-10-11 23:23:57 +00:00
Shivam Raikundalia	8486d3df69	[Profiler] Hide ProfilerStep Alignment behind Experimental Config (#137668 ) Summary: Aligning ProfilerStep# annotation can be useful for visual purposes but it affects downstream tools like HTA to misreport how long each step took. For this reason, lets give users the option to turn on this alignment manually but also turn it off by default Test Plan: Alignment off: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Oct_09_16_11_48.2543945.pt.trace.json.gz&bucket=gpu_traces Alignment on: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Oct_09_16_08_27.2518391.pt.trace.json.gz&bucket=gpu_traces Differential Revision: D64146115 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137668 Approved by: https://github.com/aaronenyeshi	2024-10-11 22:57:05 +00:00
Ryan Landay	513563eb09	Fix stack named "queue" in Util::ComputePostOrder (#130526 ) This function computes a topological sort using a non-recursive implementation of DFS. Upon first reading, I thought it was using Kahn’s algorithm because it uses a variable called `queue`, but upon closer reading, I noticed this variable is actually used as a stack. This pull request improves readability by renaming the stack and changing it from `std::vector` to `std::stack`. Note: this also changes the backing store from an `std::vector` to an `std::deque`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130526 Approved by: https://github.com/alanwaketan, https://github.com/malfet	2024-10-11 20:21:07 +00:00
Andrew Gu	bdb42e7c94	[PGNCCL] Added some missing spaces in barrier msg (#137721 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137721 Approved by: https://github.com/kwen2501 ghstack dependencies: #137713	2024-10-11 15:17:25 +00:00
Ke Wen	fe148024fe	[c10d][experimental] Add _abort_process_group (#132291 ) Thanks @eqy for reminding me of this RFC: https://github.com/pytorch/pytorch/issues/119797 This PR is meant to: - provide a way to abort multiple PGs without deadlocking each other. - provide a possibility to manually handle comm errors or timeouts (and potentially recovery of such). One can find an example from: https://github.com/NVIDIA/nccl/issues/1013 ## How is it different from `destroy_process_group`? `destroy_process_group` is meant for normal exit, while `_abort_process_group` is meant for bailout upon hangs or failures. Similar to `ncclCommDestroy` vs `ncclCommAbort`. ## What's new in `_abort_process_group`? It added support for "group abort" semantic. The "group abort" semantic is capable of aborting multiple NCCL comms concurrently, avoiding deadlock in otherwise serialized `ncclCommAbort` executions. Details are in the [RFC](https://github.com/pytorch/pytorch/issues/119797) targeting [the hang issue in multi-comm case](https://github.com/NVIDIA/nccl/issues/1013). `Group abort` semantic is added in NCCL 2.22. ## What's next? Ideally, the watchdog's behavior should support "group abort" too. But this is hard to implement today due to a lack of "global view" by each PG's individual watchdog. A big semi-big refactor may be needed to "uplift" the watchdogs to a global level or consolidate them into one (i.e. one dog watching multiple PGs). In any case, it may not be a bad idea to experiment the "group abort" feature with a manual API first and then extend to the automatic mode (watchdog). Pull Request resolved: https://github.com/pytorch/pytorch/pull/132291 Approved by: https://github.com/eqy	2024-10-11 05:04:17 +00:00
Richard Barnes	a919742149	c10::optional -> std::optional in PyTorch (#137333 ) Test Plan: Sandcastle Differential Revision: D63876535 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137333 Approved by: https://github.com/Skylion007, https://github.com/albanD	2024-10-11 00:16:10 +00:00
PyTorch MergeBot	b55ff476bd	Revert "[Distributed] Fix extra context on device 0 (#135273 )" This reverts commit `cdd8fa98c7`. Reverted https://github.com/pytorch/pytorch/pull/135273 on behalf of https://github.com/PaliC due to broken tests on trunk ([comment](https://github.com/pytorch/pytorch/pull/137161#issuecomment-2406236337))	2024-10-10 23:47:25 +00:00
PyTorch MergeBot	079f909263	Revert "Make Context to be Device-agnostic Step by Step (1/N) (#136519 )" This reverts commit `be0b75256a`. Reverted https://github.com/pytorch/pytorch/pull/136519 on behalf of https://github.com/jovianjaison due to this pr is causing errors internally ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2405781093))	2024-10-10 18:32:17 +00:00

1 2 3 4 5 ...

14527 Commits