pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	afa1eda901	Revert "[PGNCCL] Launch kernel on current stream & remove `record_stream` entirely (#148590 )" This reverts commit `ef6296e7f2`. Reverted https://github.com/pytorch/pytorch/pull/148590 on behalf of https://github.com/izaitsevfb due to reverted internally, see D71292427 ([comment](https://github.com/pytorch/pytorch/pull/148590#issuecomment-2731114626))	2025-03-17 22:43:15 +00:00
Joel Schlosser	5e1b715dda	BC fix for AOTIModelPackageLoader() constructor defaults (#149082 ) The default value for `run_single_threaded` was wrongly specified in the .cpp file instead of the header, breaking C++-side instantiation of `AOTIModelPackageLoader` with no arguments. This PR fixes this and adds a test for the use case of running with `AOTIModelPackageLoader` instead of `AOTIModelContainerRunner` on the C++ side. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149082 Approved by: https://github.com/desertfire	2025-03-13 18:40:53 +00:00
Bin Bao	b9803a5c81	[AOTI] Re-enable AOTI cpp unit test (#149085 ) Summary: test_inductor_aoti was removed by accident previously. Add it back. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149085 Approved by: https://github.com/jbschlosser	2025-03-13 16:00:38 +00:00
Ke Wen	ef6296e7f2	[PGNCCL] Launch kernel on current stream & remove `record_stream` entirely (#148590 ) This PR has multiple changes to `ProcessGroupNCCL` (which unfortunately are related): 1. When async_op=False, we directly launch the collective on the "current" stream, instead of on a trampoline stream and then joining back. - Resolves #147729 - Resolves #146881 - Also saves two event syncs (which have overhead in the case of HIP) and one pybind when we call `work.wait()` in distributed_c10d.py on behalf of the user. 2. Entirely remove `record_stream` and use CPU-side stashing for managing tensor lifetime against recycling. - Resolves #147168 3. Remove tensor lifetime management when async_op=False; only use it when async_op=True. 4. To guard against the user not calling `work.wait()`, we ask the watchdog to unstash tensors after detecting completion of collectives, to prevent us from holding references to tensors forever. This is a safety net rather than a service guarantee; see discussion [here](https://github.com/pytorch/pytorch/issues/147168#issuecomment-2660142460). 5. Profiles in async_op=False mode will look different: collective kernels will show up on the same line as compute kernels. Joint work with @cenzhaometa who wants to remove the event sync overhead. Cc: @ngimel @awgu @Aidyn-A @skyw @wconstab @leonardo0lyj Differential Revision: [D70937982](https://our.internmc.facebook.com/intern/diff/D70937982) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148590 Approved by: https://github.com/eqy, https://github.com/Aidyn-A, https://github.com/fduwjj	2025-03-11 18:36:12 +00:00
PyTorch MergeBot	a95eb0c0a7	Revert "[PGNCCL] Launch kernel on current stream & remove `record_stream` entirely (#148590 )" This reverts commit `2149f6c684`. Reverted https://github.com/pytorch/pytorch/pull/148590 on behalf of https://github.com/ZainRizvi due to Breaking internally, see D70873275. Discussed reverting this with Ke. To validate your fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/148590#issuecomment-2712001270))	2025-03-10 22:38:40 +00:00
Ke Wen	2149f6c684	[PGNCCL] Launch kernel on current stream & remove `record_stream` entirely (#148590 ) This PR has multiple changes to `ProcessGroupNCCL` (which unfortunately are related): 1. When async_op=False, we directly launch the collective on the "current" stream, instead of on a trampoline stream and then joining back. - Resolves #147729 - Resolves #146881 - Also saves two event syncs (which have overhead in the case of HIP) and one pybind when we call `work.wait()` in distributed_c10d.py on behalf of the user. 2. Entirely remove `record_stream` and use CPU-side stashing for managing tensor lifetime against recycling. - Resolves #147168 3. Remove tensor lifetime management when async_op=False; only use it when async_op=True. 4. To guard against the user not calling `work.wait()`, we ask the watchdog to unstash tensors after detecting completion of collectives, to prevent us from holding references to tensors forever. This is a safety net rather than a service guarantee; see discussion [here](https://github.com/pytorch/pytorch/issues/147168#issuecomment-2660142460). 5. Profiles in async_op=False mode will look different: collective kernels will show up on the same line as compute kernels. Joint work with @cenzhaometa who wants to remove the event sync overhead. Cc: @ngimel @awgu @Aidyn-A @skyw @wconstab @leonardo0lyj Differential Revision: [D70835197](https://our.internmc.facebook.com/intern/diff/D70835197) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148590 Approved by: https://github.com/eqy, https://github.com/Aidyn-A, https://github.com/fduwjj	2025-03-09 07:32:23 +00:00
PyTorch MergeBot	9cb25f0ea2	Revert "[PGNCCL] Launch kernel on current stream & remove `record_stream` entirely (#148590 )" This reverts commit `17dbeb11db`. Reverted https://github.com/pytorch/pytorch/pull/148590 on behalf of https://github.com/janeyx99 due to PR break backward compat test ([comment](https://github.com/pytorch/pytorch/pull/148590#issuecomment-2708641172))	2025-03-09 03:01:55 +00:00
Ke Wen	17dbeb11db	[PGNCCL] Launch kernel on current stream & remove `record_stream` entirely (#148590 ) This PR has multiple changes to `ProcessGroupNCCL` (which unfortunately are related): 1. When async_op=False, we directly launch the collective on the "current" stream, instead of on a trampoline stream and then joining back. - Resolves #147729 - Resolves #146881 - Also saves two event syncs (which have overhead in the case of HIP) and one pybind when we call `work.wait()` in distributed_c10d.py on behalf of the user. 2. Entirely remove `record_stream` and use CPU-side stashing for managing tensor lifetime against recycling. - Resolves #147168 3. Remove tensor lifetime management when async_op=False; only use it when async_op=True. 4. To guard against the user not calling `work.wait()`, we ask the watchdog to unstash tensors after detecting completion of collectives, to prevent us from holding references to tensors forever. This is a safety net rather than a service guarantee; see discussion [here](https://github.com/pytorch/pytorch/issues/147168#issuecomment-2660142460). 5. Profiles in async_op=False mode will look different: collective kernels will show up on the same line as compute kernels. Joint work with @cenzhaometa who wants to remove the event sync overhead. Cc: @ngimel @awgu @Aidyn-A @skyw @wconstab @leonardo0lyj Differential Revision: [D70835197](https://our.internmc.facebook.com/intern/diff/D70835197) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148590 Approved by: https://github.com/eqy, https://github.com/Aidyn-A, https://github.com/fduwjj	2025-03-08 20:00:12 +00:00
rpsilva	4abff4b271	Introduce cache clearing APIs for the lazy graph executor (#144489 ) This PR introduces two new methods to the LazyGraphExecutor class: - ClearComputationCache(): Allows clearing the entire computation cache. - RemoveFromComputationCache(hash): Enables removal of specific cache entries based on their hash. The main objective is to expose cache management functionality for debugging cache hits and misses across different computations. For instance: - Reset the cache state in tests, allowing reuse of the same computation client to evaluate cache logic consistently. - Selectively remove cache entries to analyze the impact on subsequent computations. - Improve observability into the cache behavior, aiding in the investigation of cache-related issues or optimizations. On the XLA lazy graph executor, we want to run a series of tests that modify some parts of the HLO module proto of the computation, and we need a means to ensure that the hash is agnostic to some elements (OpMetadata in the XLA proto data). With these APIs, it is easy to parameterize the test, clear the cache, and validate that the resulting hash is the same between runs. Otherwise, we'd need to hardcode the resulting serialized hash. Another motivation is that users could also clear some computation hashes for added flexibility in their applications, by introducing their own custom strategies for maintaining the cache (without relying on the default LRU). Pull Request resolved: https://github.com/pytorch/pytorch/pull/144489 Approved by: https://github.com/wconstab	2025-01-29 17:38:01 +00:00
Shuqiang Zhang	c0861d092c	[PGNCCL] Add an API to get the status/error code at the PG level (#144498 ) Summary: This PR is basically a replacement of https://github.com/pytorch/pytorch/pull/140087, which caused some perf drop due to frequent TCPStore checks in the watchdog thread. The fix is to move the TCPStore check into the monitoring thread. If the PG is unhealthy, the user should be able to get the type of error, e.g., timeout, NCCL error, or remote error. This API applies at the PG level, compared to the work.get_future_result() API, which applies at the Work level. Error detection at the PG level is much more convenient for users who want to handle a PG failure as a whole, e.g., by restarting the PG. Error handling at the work level is still useful for users to attach work-specific context and debug the root cause of a specific failing work/collective. Note that it is critical for all ranks in the PG to be notified about an error as soon as it occurs, so we introduce an errorType of REMOTE_ERROR, which is 'broadcast' from a src rank (which detects a local error) to all other ranks in the PG; the broadcast is currently done through TCPStore. Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/144498 Approved by: https://github.com/kwen2501	2025-01-24 16:47:32 +00:00
fduwjj	ae7df51232	[c10d] Fix CudaEventCache for dangling references (#144496 ) As reported in https://github.com/pytorch/pytorch/issues/143470, we have dangling references in `CudaEventCache`, and this PR fixes them. 1. We add a unit test to repro the issue. 2. Instead of converting variables to shared pointers as suggested in the issue, we make the cache itself a shared pointer. So if the thread that creates the cache dies before all events get recycled, the cache stays alive until the last CudaEvent gets deleted. (Thanks to @kwen2501 for the suggestion.) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144496 Approved by: https://github.com/kwen2501	2025-01-15 05:11:48 +00:00
PyTorch MergeBot	b80ecc4457	Revert "Fix poison child process issue when calling getAccelerator() (#144368 )" This reverts commit `2583d831d4`. Reverted https://github.com/pytorch/pytorch/pull/144368 on behalf of https://github.com/clee2000 due to breaking internal tests D68023262, probably the same problem as noted in the issue this PR mentions above ([comment](https://github.com/pytorch/pytorch/pull/144368#issuecomment-2584848568))	2025-01-10 23:36:43 +00:00
Yu, Guangye	2583d831d4	Fix poison child process issue when calling getAccelerator() (#144368 ) # Motivation Fix https://github.com/pytorch/pytorch/issues/144152 # Solution - Align `at::globalContext()::hasXXX` to determine if accelerator XXX is built with PyTorch or an extension already registered to PyTorch. - Define `at::hasXXX` to determine if accelerator XXX is available at runtime. - Use `at::globalContext()::hasXXX` in `getAccelerator` rather than `at::hasXXX` to avoid initializing the XXX runtime (which can poison child processes) while detecting the current accelerator. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144368 Approved by: https://github.com/albanD, https://github.com/atalman, https://github.com/gujinghui	2025-01-10 09:28:27 +00:00
Aaron Gokaslan	bbec35f028	[BE]: Replace clone detach with detach clone to be more efficient (#144469 ) Follow up to #144270 and fix some vulkan code Pull Request resolved: https://github.com/pytorch/pytorch/pull/144469 Approved by: https://github.com/awgu	2025-01-09 18:28:39 +00:00
Jithun Nair	1365ae859c	[ROCm][CI] upgrade CI to ROCm 6.3 (#142152 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142152 Approved by: https://github.com/jeffdaily, https://github.com/pruthvistony Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-01-09 17:14:16 +00:00
Will Feng	bf7009d839	[rpc] Fix unit test after c10::nullopt removal (#143690 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143690 Approved by: https://github.com/yifuwang, https://github.com/c-p-i-o, https://github.com/XilunWu	2024-12-20 23:36:07 +00:00
Richard Barnes	f9da639950	[codemod] Fix a few unused-variable issues in pytorch (#143517 ) Summary: LLVM-15 has a warning `-Wunused-variable` which we treat as an error because it's so often diagnostic of a code issue. Unused variables can compromise readability or, worse, performance. This diff either (a) removes an unused variable and, possibly, its associated code or (b) qualifies the variable with `[[maybe_unused]]`. - If you approve of this diff, please use the "Accept & Ship" button :-) Test Plan: Sandcastle Reviewed By: palmje Pull Request resolved: https://github.com/pytorch/pytorch/pull/143517 Approved by: https://github.com/mhorowitz	2024-12-19 00:18:08 +00:00
James	d4ed5941db	Fix floating point literals in IRPrinter (#142119 ) Fixes #114035 This is a recreation of #140002 with approval from its author. Original description: >when v larger than 1e16, the format will be error. example: v is 1.2e17, the output is 1.2e17.f, it have two point '.' Pull Request resolved: https://github.com/pytorch/pytorch/pull/142119 Approved by: https://github.com/jgong5, https://github.com/malfet	2024-12-18 21:59:48 +00:00
cyy	075905b7bd	[14/N] Fix extra warnings brought by clang-tidy-17 (#141644 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/141644 Approved by: https://github.com/ezyang Co-authored-by: Eli Uriegas <1700823+seemethere@users.noreply.github.com>	2024-12-13 06:22:13 +00:00
Richard Barnes	82ce888273	c10::string_view -> std::string_view in more places (#142517 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142517 Approved by: https://github.com/malfet	2024-12-12 19:45:59 +00:00
PyTorch MergeBot	2f0fe82f6d	Revert "[14/N] Fix extra warnings brought by clang-tidy-17 (#141644 )" This reverts commit `24a5a2ef25`. Reverted https://github.com/pytorch/pytorch/pull/141644 on behalf of https://github.com/clee2000 due to failing internally D67112938 ([comment](https://github.com/pytorch/pytorch/pull/141644#issuecomment-2539602023))	2024-12-12 17:43:36 +00:00
Richard Barnes	7667235a23	c10::optional -> std::optional (#142514 ) Fixes issues introduced in https://github.com/pytorch/pytorch/pull/141348 and https://github.com/pytorch/pytorch/pull/139578 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142514 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2024-12-12 17:23:46 +00:00
cyy	24a5a2ef25	[14/N] Fix extra warnings brought by clang-tidy-17 (#141644 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/141644 Approved by: https://github.com/ezyang	2024-12-11 18:40:42 +00:00
Richard Barnes	7e41717a26	c10::string_view -> std::string_view in caffe2/jit (#142383 ) Test Plan: Sandcastle Differential Revision: D66939979 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142383 Approved by: https://github.com/malfet	2024-12-10 15:42:28 +00:00
Mu-Chu Lee	d3d1a78774	[AOTInductor] Add standalone test for compilation from ExportedProgram (#142327 ) Summary: Provide a standalone path to compile and run an ExportedProgram in C. Test Plan: (1) Generate a compiled model from an ExportedProgram ``` python generate_lowered_cpu.py --input-path /tmp/$USER/ep.pt --output-path /tmp/$USER/final.pt ``` (2) Compile a standalone test runner ``` TORCH_ROOT_DIR=/data/users/$USER/pytorch sh standalone_compile.sh standalone_test.cpp standalone_test.out ``` (3) Run the test for the compiled model from step (1) ``` LD_LIBRARY_PATH=/data/users/$USER/pytorch/build/lib ./standalone_test.out /tmp/$USER/final.pt ``` Differential Revision: D66872380 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142327 Approved by: https://github.com/hl475	2024-12-10 06:50:09 +00:00
Bin Bao	1cb2ebd740	[AOTI] Fix #140546 and support AOTI package load for Intel GPU. (#140664 ) Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #140686 * __->__ #140664 * #140269 * #140268 * #135320 * #135318 * #139026 Fix #140546 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140664 Approved by: https://github.com/desertfire, https://github.com/EikanWang ghstack dependencies: #140268, #140269 Co-authored-by: Bin Bao <binbao@meta.com>	2024-12-10 05:05:08 +00:00
PyTorch MergeBot	6fcb294e18	Revert "[AOTI] Fix #140546 and support AOTI package load for Intel GPU. (#140664 )" This reverts commit `91d30546a4`. Reverted https://github.com/pytorch/pytorch/pull/140664 on behalf of https://github.com/clee2000 due to breaks forward compatibility? D66937097 ([comment](https://github.com/pytorch/pytorch/pull/140269#issuecomment-2528828555))	2024-12-09 17:33:28 +00:00
xinan.lin	91d30546a4	[AOTI] Fix #140546 and support AOTI package load for Intel GPU. (#140664 ) Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #140686 * __->__ #140664 * #140269 * #140268 * #135320 * #135318 * #139026 Fix #140546 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140664 Approved by: https://github.com/desertfire, https://github.com/EikanWang ghstack dependencies: #140268, #140269	2024-12-07 19:22:04 +00:00
rzou	215f5d77b5	[functional autograd] Refactor validate_outputs into a functional variant (#141348 ) Today, validate_outputs is stateful (it depends on the autograd graph). This PR refactors it into a stateless form that just depends on InputMetadata. Test Plan: - new unittest Pull Request resolved: https://github.com/pytorch/pytorch/pull/141348 Approved by: https://github.com/soulitzer ghstack dependencies: #141278	2024-12-04 18:06:31 +00:00
Nikita Shulga	38bbe37187	Enable CI on SM89 (#140305 ) Uses EC2 G6 instances, based on NVIDIA L4, added to the scale config in https://github.com/pytorch/test-infra/pull/5376 To enable more balanced sharding, had to push `148ae19935` Added `@xfailIfSM89` to the following tests: - test_fp8_pattern_2 - test_original_aten_preserved_split_addmm - test_sparse_semi_structured_scaled_mm - test_sparse_semi_structured_scaled_mm_fp8 - test_sparse_fp8fp8_mm Increased tolerance to 2e-4 for `RNNTest.BidirectionalMultilayerGRU_CPU_vs_CUDA` Skipped the following inductor tests (they either flakily OOM or time out): - test_reduction_fn_std_float64 - test_reduction_fn_var_mean_float64 - test_multi_output_unbacked_custom_op Pull Request resolved: https://github.com/pytorch/pytorch/pull/140305 Approved by: https://github.com/wdvr, https://github.com/ZainRizvi	2024-12-03 04:49:46 +00:00
Ke Wen	ad39a2fc46	[1/N] Decouple Flight Recorder from NCCL utils (#141648 ) Part of the effort to make Flight Recorder device agnostic. Step 1: Move it out of NCCLUtils. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141648 Approved by: https://github.com/fduwjj	2024-11-27 18:29:42 +00:00
fduwjj	5b4c864672	[c10d] Enable CudaEventCache by default and add multi device support (#140975 ) We added `CudaEventCache` in https://github.com/pytorch/pytorch/pull/133727; this feature tries to reuse CudaEvents so that we don't call destroy on a CudaEvent, which has caused hangs in the past. We have already run a bunch of tests, including testing on TorchTitan and internal workloads. So far no errors or crashes have been found, so we decided to roll it out to all OSS users. Internal workloads are not affected by this PR because of some internal gating. We also observed some multi-device use cases in OSS, so we want to bring back the multi-device support originally proposed in https://github.com/pytorch/pytorch/pull/122732/files. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140975 Approved by: https://github.com/eqy, https://github.com/kwen2501	2024-11-26 18:42:45 +00:00
cyy	263d8f7a94	[8/N] Don't skip ASAN on some tests (#140081 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140081 Approved by: https://github.com/ezyang	2024-11-09 01:00:13 +00:00
Bin Bao	1868fc63d8	[AOTI] Update C++ runner API to take a const vector (#139955 ) Summary: Tighten the AOTIModelContainerRunner::run interface to take a const vector of at::Tensor, which 1) makes it clear that the runner will not modify the input tensor vector; 2) runner will be able to take a temp vector of tensors as the input. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139955 Approved by: https://github.com/chenyang78	2024-11-08 16:59:10 +00:00
Ke Wen	e474f0de82	[PGNCCL] Slimming watchdog loop (#139834 ) - Refactored traceback code into `work.printTraceback()`. cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @shuqiangzhang - Refactored desync debug code into `class DesyncDebugger`. - Moved occurrences of `futureWorkResult_->markCompleted` into `checkAndSetException` and `checkTimeout`, respectively. cc @shuqiangzhang - Modularized dump signal broadcast code into `ProcessGroupNCCL::broadcastDumpSignal`. cc @fduwjj @c-p-i-o Pull Request resolved: https://github.com/pytorch/pytorch/pull/139834 Approved by: https://github.com/shuqiangzhang	2024-11-07 17:22:44 +00:00
PyTorch MergeBot	7e02386303	Revert "[2/N] Replace c10::sv with std::sv (#139456 )" This reverts commit `028c5d3426`. Reverted https://github.com/pytorch/pytorch/pull/139456 on behalf of https://github.com/ZainRizvi due to Sorry but this breaks internally. @ezyang can you please help get this landed? See D65546398 for more details ([comment](https://github.com/pytorch/pytorch/pull/139456#issuecomment-2462768891))	2024-11-07 17:00:59 +00:00
cyy	028c5d3426	[2/N] Replace c10::sv with std::sv (#139456 ) Follows #139453 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139456 Approved by: https://github.com/ezyang	2024-11-06 01:50:38 +00:00
cyyever	46d0b635b9	[CMake] Remove pthread linking (#134436 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/134436 Approved by: https://github.com/r-barnes	2024-10-29 23:14:40 +00:00
Richard Barnes	068f7e7a78	torch::optional -> std::optional (#138987 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138987 Approved by: https://github.com/Skylion007	2024-10-28 19:09:46 +00:00
Richard Barnes	42994234a6	std::value/std::type -> std::_v/std::_t (#138746 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138746 Approved by: https://github.com/cyyever, https://github.com/malfet	2024-10-26 20:59:24 +00:00
Mwiza Kunda	22d2e2d9a0	Set RUNPATH so installed tests can find the required shared libraries (#136627 ) This change fixes the RUNPATH of installed c++ tests so that the linker can find the shared libraries they depend on. For example, currently: ```bash venv/lib/python3.10/site-packages/torch $ ./bin/test_lazy ./bin/test_lazy: error while loading shared libraries: libtorch.so: cannot open shared object file: No such file or directory ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/136627 Approved by: https://github.com/malfet	2024-10-25 09:38:08 +00:00
Angela Yi	51f6b946ae	[torchbind] Add generic __deepcopy__ method (#137613 ) Summary: Added a generic `__deepcopy__` method which will use the torchbind object's existing `__getattr__` and `__setattr__` to copy the torchbind object. This will later be used in [D64124825](https://www.internalfb.com/diff/D64124825) Differential Revision: D64124826 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137613 Approved by: https://github.com/ydwu4, https://github.com/zou3519	2024-10-24 22:14:55 +00:00
FFFrog	af0bc75460	Remove deprecated alias macro(1/3) (#137556 ) Detailed Descriptions: - Remove AT_ERROR Macro Pull Request resolved: https://github.com/pytorch/pytorch/pull/137556 Approved by: https://github.com/ezyang	2024-10-21 17:32:32 +00:00
Richard Barnes	fddabc6e0b	C10_UNUSED to [[maybe_unused]] (#6357 ) (#138364 ) Summary: Pull Request resolved: https://github.com/pytorch/executorch/pull/6357 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138364 Approved by: https://github.com/Skylion007, https://github.com/eqy	2024-10-19 13:17:43 +00:00
Edward Yang	b14269dcfb	Make Context to be Device-agnostic Step by Step (1/N) (#136519 ) (#138155 ) Summary: - make init to be device-agnostic and move it to AcceleratorHooksInterface - refactoring context related to device initialization Original pull request: https://github.com/pytorch/pytorch/pull/136519 Test Plan: contbuild & OSS CI, see `4a8e49389c` Reviewed By: malfet Differential Revision: D64471142 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138155 Approved by: https://github.com/malfet, https://github.com/bobrenjc93	2024-10-17 20:58:56 +00:00
Shivam Raikundalia	dfb5ac05cc	[Record Function] Add Kwargs only USER_SCOPE Macro (#138020 ) Summary: Add a macro such that users can easily add a USER annotation with kwargs only Test Plan: Will use D63801503 to test this E2E. Added unit test as well that makes sure that the kwargs get recorded correctly Differential Revision: D64420328 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138020 Approved by: https://github.com/davidberard98, https://github.com/aaronenyeshi	2024-10-17 18:48:49 +00:00
fduwjj	7e704c2073	[c10d] Add unit test for CUDAEventCache to ensure caching is working (#138059 ) We created a simple test to validate that the cache is indeed working, including when the cache is used up. I reverted the fix in (https://github.com/pytorch/pytorch/pull/138040) and the test indeed failed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138059 Approved by: https://github.com/kwen2501 ghstack dependencies: #138040, #138048	2024-10-16 17:34:57 +00:00
Shuqiang Zhang	f4158558aa	[c10d] disable watchdog thread in blockingWait mode (#138001 ) Summary: Blocking wait mode is not widely used; it is probably most useful for debugging. In blockingWait mode, we don't need to enable the watchdog thread to check for timeouts or NCCL errors, because the main thread throws an exception if an error happens. It is then obvious to the user which work failed, and it is the user's responsibility to handle the exception. Test Plan: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/138001 Approved by: https://github.com/fduwjj, https://github.com/c-p-i-o ghstack dependencies: #137799	2024-10-16 07:42:22 +00:00
PyTorch MergeBot	d4d687ffb2	Revert "Make Context to be Device-agnostic Step by Step (1/N) (#136519 )" This reverts commit `4a8e49389c`. Reverted https://github.com/pytorch/pytorch/pull/136519 on behalf of https://github.com/clee2000 due to breaking internal tests related to MITA, @ezyang has a forward fix? ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2414588302))	2024-10-15 17:19:16 +00:00
Richard Barnes	b7f798caa4	Use C10_UNUSED instead of (void)X (#137239 ) Summary: Auto-generated with ``` buck run //scripts/rbarnes/regex_multiline_replacer:regex_multiline_replacer -- --find '^(\sfor\s$)(const.\n)\s\(void$[A-Za-z]+;\s//\sSuppress.\s\n(.)' --replace '\1C10_UNUSED \2\3' `find caffe2/ -regex ".\.$cpp\\|h$"` ``` Differential Revision: D33432600 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137239 Approved by: https://github.com/Skylion007	2024-10-15 14:32:59 +00:00


2329 Commits