Commit Graph

2383 Commits

Author SHA1 Message Date
fduwjj
7e704c2073 [c10d] Add unit test for CUDAEventCache to ensure caching is working (#138059)
We created a simple test to validate that the cache is indeed working, including the case when the cache is used up. Reverting the fix from (https://github.com/pytorch/pytorch/pull/138040) makes the test fail, as expected.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138059
Approved by: https://github.com/kwen2501
ghstack dependencies: #138040, #138048
2024-10-16 17:34:57 +00:00
Shuqiang Zhang
f4158558aa [c10d] disable watchdog thread in blockingWait mode (#138001)
Summary:
Blocking-wait mode is not widely used; it is mostly useful for debugging.
In blockingWait mode, we don't need the watchdog thread to
check for timeouts or NCCL errors, because the main thread throws an
exception if an error happens. It is then obvious to the user which work
failed, and it is the user's responsibility to handle the exception.
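
For illustration, a minimal sketch of user-side handling in blocking-wait mode (assuming a multi-rank NCCL job launched via torchrun; the TORCH_NCCL_BLOCKING_WAIT environment variable enables the mode):

```python
import os
os.environ["TORCH_NCCL_BLOCKING_WAIT"] = "1"  # must be set before PG init

import torch
import torch.distributed as dist

dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
t = torch.ones(4, device="cuda")
try:
    # In blocking-wait mode the collective raises in this thread on
    # NCCL error or timeout, instead of relying on the watchdog.
    dist.all_reduce(t)
except RuntimeError as e:
    print(f"rank {rank}: collective failed: {e}")  # user handles it
dist.destroy_process_group()
```
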
Test Plan:
CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138001
Approved by: https://github.com/fduwjj, https://github.com/c-p-i-o
ghstack dependencies: #137799
2024-10-16 07:42:22 +00:00
PyTorch MergeBot
d4d687ffb2 Revert "Make Context to be Device-agnostic Step by Step (1/N) (#136519)"
This reverts commit 4a8e49389c.

Reverted https://github.com/pytorch/pytorch/pull/136519 on behalf of https://github.com/clee2000 due to breaking internal tests related to MITA, @ezyang has a forward fix? ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2414588302))
2024-10-15 17:19:16 +00:00
Richard Barnes
b7f798caa4 Use C10_UNUSED instead of (void)X (#137239)
Summary:
Auto-generated with
```
buck run //scripts/rbarnes/regex_multiline_replacer:regex_multiline_replacer -- --find '^(\s*for\s*\()(const.*\n)\s*\(void\)[A-Za-z]+;\s*//\s*Suppress.*\s*\n(.*)'  --replace '\1C10_UNUSED \2\3' `find caffe2/ -regex ".*\.\(cpp\|h\)"`
```

Differential Revision: D33432600

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137239
Approved by: https://github.com/Skylion007
2024-10-15 14:32:59 +00:00
Catherine Lee
8ac06467d4 Forward fix test (#137910)
Summary: Add back a deleted file to fix the test

It was removed in https://github.com/pytorch/pytorch/pull/137404

Test Plan: `buck2 build --flagfile fbcode//mode/opt fbcode//caffe2/test/cpp/c10d:ProcessGroupGlooAsyncTest` succeeded

Differential Revision: D64341074

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137910
Approved by: https://github.com/Skylion007, https://github.com/huydhn, https://github.com/kit1980
2024-10-14 22:07:29 +00:00
FFFrog
4a8e49389c Make Context to be Device-agnostic Step by Step (1/N) (#136519)
----

- make init device-agnostic and move it to AcceleratorHooksInterface
- refactor context code related to device initialization

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136519
Approved by: https://github.com/ezyang, https://github.com/EikanWang, https://github.com/guangyey
2024-10-13 12:38:02 +00:00
Nichols A. Romero
bd63ec4f45 [ROCm] LoadHIP CMake cleanup (#137112)
Should help mitigate issues reported here: https://github.com/pytorch/pytorch/issues/128313

While working on https://github.com/pytorch/pytorch/pull/136700, we realized that some of the ROCm CMake can be streamlined.

This PR does not fix any bugs or provide any new functionality. Strictly clean-up.

The remaining `${ROCM_ROCTX_LIB}` will be removed when we transition to the rocprofiler-sdk (to be done in a separate PR).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137112
Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily
2024-10-13 00:06:41 +00:00
PyTorch MergeBot
079f909263 Revert "Make Context to be Device-agnostic Step by Step (1/N) (#136519)"
This reverts commit be0b75256a.

Reverted https://github.com/pytorch/pytorch/pull/136519 on behalf of https://github.com/jovianjaison due to this pr is causing errors internally ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2405781093))
2024-10-10 18:32:17 +00:00
cyy
94e12f97dc [Distributed] [16/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d (#137404)
Follows #137072

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137404
Approved by: https://github.com/Skylion007
2024-10-10 18:05:34 +00:00
sanshang
249152475d fix sequence number for group (#134578)
Summary:
Fix the sequence number in the execution trace dump so that collective/p2p ops can be matched with their waits during execution trace replay.

`ProcessGroupNCCL` has two sequence number counters, `seqCollective_` and `seqP2P_`.
b18ba9419e/torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp (L1188-L1191)
However, `WorkNCCL` only has one sequence number member, `seq_`. b18ba9419e/torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp (L387)
We need to match collectives and p2p ops with their waits separately.
29b5a462dc

Depend on: https://github.com/pytorch/pytorch/pull/135132

Test Plan: buck2 run mode/dev-nosan kineto/libkineto/fb/integration_tests:pytorch_execution_trace_integration_test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134578
Approved by: https://github.com/kwen2501, https://github.com/c-p-i-o
2024-10-10 04:24:06 +00:00
Shuqiang Zhang
47a515d260 [c10d] simplify barrier implementation and further decouple CPU/GPU synchronization (#137516)
Summary:
Barrier is essentially intended to block the CPU thread (instead of GPU
streams). Before, we used two stream synchronizations (1. the current stream
blocked by the NCCL stream's end event, 2. the CPU thread blocked on the current
stream). This is unnecessary, as we already have CPU-thread-blocking
logic in wait(). Also, adding a barrier-specific code block to the general
GPU synchronize() API is intrusive and confusing.

This PR cleans this.

Test Plan:
CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137516
Approved by: https://github.com/fduwjj, https://github.com/kwen2501
2024-10-09 23:55:28 +00:00
FFFrog
be0b75256a Make Context to be Device-agnostic Step by Step (1/N) (#136519)
- make init device-agnostic and move it to AcceleratorHooksInterface
- refactor context code related to device initialization

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136519
Approved by: https://github.com/ezyang, https://github.com/EikanWang, https://github.com/guangyey
2024-10-09 02:13:36 +00:00
cyy
6327a71880 [Environment Variable][2/N] Use thread-safe setenv wrapper (#124485)
This follows #119449 to make setenv thread-safe.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124485
Approved by: https://github.com/eqy
2024-10-04 07:30:51 +00:00
abhishek-fujitsu
63d6908da0 fix build error with gcc 12+ (#137092)
Fixes #127920

This commit addresses a build failure occurring with GCC 12 and above due to the -Werror=nonnull flag. The error manifests in the test_api target.

**Issue:**
When building with GCC 12+, the following error occurs:
```
error: argument 1 null where non-null expected [-Werror=nonnull]
  431 |             __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
      |             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This change ensures that:
1. The flag is only added for GCC 12 or higher
2. The flag is only added if it's supported by the compiler
3. The flag is added specifically to the test_api target, not globally

By disabling this specific error, we allow the build to proceed while maintaining other compiler warnings.

**Test Plan:**
- Verified successful build with GCC 12 and above
- Ensured no regression in builds with earlier GCC versions and other compilers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137092
Approved by: https://github.com/malfet
2024-10-02 00:37:15 +00:00
fduwjj
911a43f930 [TCPStore] Remove deprecated constructor (#136004)
While looking at the TCPStore code again, I found it confusing that we still keep the deprecated constructor for TCPStore in cpp while we no longer expose it in Python via pybind. I checked both internal and external usage: all use cases in cpp (aside from a unit test fixed in this PR) have already moved to using options. So let's remove this legacy constructor to avoid confusion.

Differential Revision: [D62653634](https://our.internmc.facebook.com/intern/diff/D62653634)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136004
Approved by: https://github.com/Skylion007, https://github.com/XilunWu
2024-09-14 04:25:47 +00:00
PyTorch MergeBot
564d00f364 Revert "Fix clang-tidy warnings in Caffe2 code (#134935)"
This reverts commit 7cfd23636c.

Reverted https://github.com/pytorch/pytorch/pull/134935 on behalf of https://github.com/izaitsevfb due to breaks internal builds, caffe2 is still used internally ([comment](https://github.com/pytorch/pytorch/pull/134935#issuecomment-2349368152))
2024-09-13 16:42:37 +00:00
cyy
7cfd23636c Fix clang-tidy warnings in Caffe2 code (#134935)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134935
Approved by: https://github.com/ezyang
2024-09-12 03:27:09 +00:00
PyTorch MergeBot
c044deb9ce Revert "c10d/logging: add C10D_LOCK_GUARD (#134131)"
This reverts commit f33bcbe5fd.

Reverted https://github.com/pytorch/pytorch/pull/134131 on behalf of https://github.com/kit1980 due to See D61985186 ([comment](https://github.com/pytorch/pytorch/pull/134131#issuecomment-2327556381))
2024-09-03 22:35:14 +00:00
Bin Bao
310eb6d8c6 [AOTI] Fix test_aoti_inference CPU build issue (#134675)
Summary: Fixes https://github.com/pytorch/pytorch/issues/130311. We need to guard CUDA-only code in test_aoti_inference with macros so that it won't fail on CPU-only platforms.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134675
Approved by: https://github.com/atalman, https://github.com/chunyuan-w
2024-08-28 17:42:19 +00:00
Tristan Rice
f33bcbe5fd c10d/logging: add C10D_LOCK_GUARD (#134131)
This adds logs if we can't acquire locks in NCCLUtils and ProcessGroupNCCL for 30s.

This is motivated by some deadlocks we're seeing, where it's unclear whether they are in NCCL or on the PyTorch side of things.

This required replacing most `std::mutex` with `std::timed_mutex` and `std::condition_variable_any` as appropriate.
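
A minimal Python sketch of the technique (timed lock acquisition plus a warning log), not the actual C10D_LOCK_GUARD macro:

```python
import logging
import threading

LOCK_TIMEOUT_S = 30  # the 30s threshold described above

def lock_guard(lock: threading.Lock, name: str) -> None:
    """Acquire `lock`, logging each time it cannot be acquired in time."""
    while not lock.acquire(timeout=LOCK_TIMEOUT_S):
        # The C++ change swaps std::mutex for std::timed_mutex to get
        # exactly this try-with-timeout behavior.
        logging.warning("possible deadlock: waited %ss for lock %r",
                        LOCK_TIMEOUT_S, name)
```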

Test plan:

existing CI for regressions

will add unit tests on `C10D_LOCK_GUARD`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134131
Approved by: https://github.com/c-p-i-o, https://github.com/fduwjj
2024-08-28 01:40:42 +00:00
PyTorch MergeBot
1c4780e69a Revert "c10d/logging: add C10D_LOCK_GUARD (#134131)"
This reverts commit 4c28a0eb0b.

Reverted https://github.com/pytorch/pytorch/pull/134131 on behalf of https://github.com/ZainRizvi due to Sorry but this causes formatting errors internally which make it fail to build. See D61759282 ([comment](https://github.com/pytorch/pytorch/pull/134131#issuecomment-2310455878))
2024-08-26 15:19:27 +00:00
Sheng Fu
519342962d Pass process group info into NcclWork (#134269)
Summary: Pass process group info into NcclWork

Test Plan: buck2 run mode/dev-nosan kineto/libkineto/fb/integration_tests:pytorch_execution_trace_integration_test

Differential Revision: D61677160

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134269
Approved by: https://github.com/wconstab
2024-08-24 01:04:43 +00:00
Tristan Rice
4c28a0eb0b c10d/logging: add C10D_LOCK_GUARD (#134131)
This adds logs if we can't acquire locks in NCCLUtils and ProcessGroupNCCL for 30s.

This is motivated by some deadlocks we're seeing, where it's unclear whether they are in NCCL or on the PyTorch side of things.

This required replacing most `std::mutex` with `std::timed_mutex` and `std::condition_variable_any` as appropriate.

Test plan:

existing CI for regressions

will add unit tests on `C10D_LOCK_GUARD`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134131
Approved by: https://github.com/c-p-i-o, https://github.com/fduwjj
2024-08-24 00:27:39 +00:00
Xuehai Pan
758a0a88a2 [BE][Easy] enable ruff rule PIE790: unnecessary pass statement (#133200)
This PR removes unnecessary `pass` statements. This is semantically safe because the bytecode for the Python code does not change.

Note that if there is a docstring in the function, an empty function does not need a `pass` statement as a placeholder.
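
For example, a minimal before/after illustration of the rule:

```python
# Flagged by PIE790: `pass` is redundant because the docstring already
# serves as the function body.
def todo():
    """Placeholder for a future feature."""
    pass

# After the fix: same bytecode, one statement shorter.
def todo():
    """Placeholder for a future feature."""
```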

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133200
Approved by: https://github.com/malfet, https://github.com/eqy, https://github.com/kit1980
2024-08-15 15:50:19 +00:00
cyy
c2eeda5da0 [structural binding][12/N] Replace std::tie with structural binding (#131031)
Follows #130830
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131031
Approved by: https://github.com/ezyang
2024-08-14 00:51:34 +00:00
Chirag Pandya
40767e8468 [BE] rename testHelperPrefix test (#132916)
Summary:
Re-enable testHelperPrefix test that was erroneously disabled in CI.
Fixes #50701

Test Plan:
Test passes locally:
```
❯ ./TCPStoreTest --gtest_filter=TCPStoreTest.testHelperPrefix
Running main() from
/data/users/cpio/pytorch/third_party/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = TCPStoreTest.testHelperPrefix
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TCPStoreTest
[ RUN      ] TCPStoreTest.testHelperPrefix
[W807 12:01:31.531576727 socket.cpp:462] [c10d] waitForInput: poll for
socket SocketImpl(fd=6, addr=[localhost]:37984,
remote=[localhost]:37171) returned 0, likely a timeout
[W807 12:01:31.531663710 socket.cpp:487] [c10d] waitForInput: socket
SocketImpl(fd=6, addr=[localhost]:37984, remote=[localhost]:37171) timed
out after 100ms
[       OK ] TCPStoreTest.testHelperPrefix (314 ms)
[----------] 1 test from TCPStoreTest (314 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (314 ms total)
[  PASSED  ] 1 test.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132916
Approved by: https://github.com/Skylion007
2024-08-08 20:54:52 +00:00
Prachi Gupta
c326533999 [ROCm][Inductor] Enable AOT Inductor CPP UTs for ROCm (#131521)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131521
Approved by: https://github.com/jataylo, https://github.com/pruthvistony, https://github.com/malfet
2024-08-08 19:49:56 +00:00
Shangdi Yu
21906ddaba [AOTI] Fix complex64 not defined (#132810)
Partially fixes #122980

- change cpp type mapping for complex64 to std::complex<float>
- add `aoti_torch_item_complex64` and `aoti_torch_scalar_to_tensor_complex64`.
- add `expensiveCopyToTensor()` to convert `ArrayRefTensor<T>` type to `AtenTensorHandle` type.

- if we want to fully fix #122980, we still need to let ArrayRef and MiniArrayRef take the number of elements in the underlying storage into account. See more details in https://github.com/pytorch/pytorch/pull/132347 (#132347 broke some internal tests, so we need more work before landing it).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132810
Approved by: https://github.com/desertfire
2024-08-08 18:08:23 +00:00
Matthew Hoffman
258f47fc0b Add padding_side to pad_sequence with "left" and "right" options ("right" as default) (#131884)
Fixes #10536

Reattempt of #61467. Thank you so much to @mskoh52 for your excellent work!

As I was trying to create a more efficient LLM data collator, I realized that `pad_sequence` only supports right padding, even though left padding is a very common format for LLMs, like Llama and Mistral.

The proposed alternative implementation was to use multiple flips, which tends to be 1.5x-2x slower. Instead we can add a [`padding_side` parameter as there is for Hugging Face tokenizers](9d6c0641c4/src/transformers/tokenization_utils_base.py (L1565)), which requires only a very small change in the C++ code.

Here are the benchmarks of the new implementation!

`float32`:

![eaaa95ef-9384-45d2-be56-6898bc1d3514](https://github.com/user-attachments/assets/3b0eb309-e5a0-4a4d-97bb-4e3298783dbb)

`bool`:

![892f32da-8d9a-492b-9507-18d3f0a41e8e](https://github.com/user-attachments/assets/6824ea15-7d4e-4b89-95f0-8546635f0c2e)

Code:

```python
from __future__ import annotations

import random
import time
from typing import Literal

import numpy as np
import torch

def pad_sequence_with_flips(
    sequences: list[torch.Tensor],
    batch_first: bool = False,
    padding_value: int | float | bool = 0.0,
    padding_side: Literal["left", "right"] | str = "left",
) -> torch.Tensor:
    if padding_side == 'right':
        padded_sequence = torch._C._nn.pad_sequence([t.flatten() for t in sequences], batch_first=batch_first, padding_value=padding_value)
    elif padding_side=='left':
        padded_sequence = torch._C._nn.pad_sequence([t.flatten().flip(0) for t in sequences], batch_first=batch_first, padding_value=padding_value)  # pyright: ignore[reportArgumentType]
        padded_sequence = padded_sequence.flip(int(batch_first))
    else:
        raise ValueError(f"padding_side should be either 'right' or 'left', but got {padding_side}")

    return padded_sequence

sequence_lengths: list[int] = []

flip_left_pad_times: list[float] = []
flip_left_pad_times_std: list[float] = []

left_pad_times: list[float] = []
left_pad_times_std: list[float] = []

RUNS_PER_LOOP: int = 100

for i in range(1, 7):
    sequence_length = i * int(1e6) // 6
    sequence_lengths.append(sequence_length)

    sequences = [torch.randint(0, 2, (random.randint(1, sequence_length),), dtype=torch.bool) for _ in range(64)]

    inner_left_pad_times: list[float] = []
    inner_right_pad_times: list[float] = []

    inner_flip_left_pad_times: list[float] = []
    inner_flip_right_pad_times: list[float] = []

    for _ in range(RUNS_PER_LOOP):

        start = time.perf_counter()
        torch._C._nn.pad_sequence(sequences, batch_first=True, padding_value=False, padding_side="left")
        end = time.perf_counter()
        inner_left_pad_times.append(end - start)

        start = time.perf_counter()
        pad_sequence_with_flips(sequences, batch_first=True, padding_value=False, padding_side="left")
        end = time.perf_counter()
        inner_flip_left_pad_times.append(end - start)

    left_pad_times.append(sum(inner_left_pad_times) / len(inner_left_pad_times))
    left_pad_times_std.append(np.std(inner_left_pad_times))

    flip_left_pad_times.append(sum(inner_flip_left_pad_times) / len(inner_flip_left_pad_times))
    flip_left_pad_times_std.append(np.std(inner_flip_left_pad_times))

    print(f"Sequence Length: {sequence_length}, Left Pad Time: {left_pad_times[-1]}, Left with Flips Pad Time: {flip_left_pad_times[-1]}")

import matplotlib.pyplot as plt

plt.plot(sequence_lengths, left_pad_times, label="new pad_sequence left")
plt.scatter(sequence_lengths, left_pad_times)
plt.errorbar(sequence_lengths, left_pad_times, yerr=left_pad_times_std, linestyle='None', marker='^')

plt.plot(sequence_lengths, flip_left_pad_times, label="old pad_sequence left (2 flips)")
plt.scatter(sequence_lengths, flip_left_pad_times)
plt.errorbar(sequence_lengths, flip_left_pad_times, yerr=flip_left_pad_times_std, linestyle='None', marker='^')

plt.xlabel("Sequence Length")
plt.ylabel("Time (s)")
plt.legend(loc="upper right")

# Sequence Length: 166666, Left Pad Time: 0.06147645162009212, Left with Flips Pad Time: 0.09842291727001794
# Sequence Length: 333333, Left Pad Time: 0.08933195920990329, Left with Flips Pad Time: 0.15597836187991562
# Sequence Length: 500000, Left Pad Time: 0.08863158334006585, Left with Flips Pad Time: 0.15224887342999863
# Sequence Length: 666666, Left Pad Time: 0.10524682551997103, Left with Flips Pad Time: 0.18177212480995877
# Sequence Length: 833333, Left Pad Time: 0.11801802741003485, Left with Flips Pad Time: 0.20821274195001024
# Sequence Length: 1000000, Left Pad Time: 0.131894061660023, Left with Flips Pad Time: 0.23223503091008751
```

Co-authored-by: mskoh52 <mskoh52@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131884
Approved by: https://github.com/ezyang
2024-08-07 15:53:07 +00:00
Ivan Zaitsev
841cadd555 Fix discrepancies from 129973 (#132545)
#129973 ([D59132793](https://www.internalfb.com/diff/D59132793)) was exported missing changes in `test/cpp/jit/CMakeLists.txt`; this PR remediates that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132545
Approved by: https://github.com/kit1980
2024-08-03 01:57:49 +00:00
Oguz Ulgen
221350e3a4 Add None return type to init -- tests (#132352)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132352
Approved by: https://github.com/ezyang
ghstack dependencies: #132335, #132351
2024-08-01 15:44:51 +00:00
Xuehai Pan
548c460bf1 [BE][Easy][7/19] enforce style for empty lines in import segments in test/[a-c]*/ and test/[q-z]*/ (#129758)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129758
Approved by: https://github.com/ezyang
2024-07-31 10:54:03 +00:00
cyy
89da94594e [11/N] Fix clang-tidy warnings in jit (#132131)
Follows #132122

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132131
Approved by: https://github.com/Skylion007
2024-07-31 03:45:52 +00:00
cyy
73d0f484b3 [structural binding][11/N] Replace std::tie with structural binding (#130830)
Follows  #130784

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130830
Approved by: https://github.com/janeyx99
2024-07-18 00:45:06 +00:00
cyy
168e41009b [structural binding][10/N] Replace std::tie with structural binding (#130784)
Follows  #130404

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130784
Approved by: https://github.com/malfet
2024-07-16 10:28:14 +00:00
cyy
28f6ae2718 [9/N] Replace c10::optional with std::optional (#130674)
Follows  #130509

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130674
Approved by: https://github.com/Skylion007
2024-07-15 00:48:43 +00:00
Yidi Wu
0bf9a091ec [torchbind] add tracing_mode support (#129586)
Sometimes it can be difficult to write a fake class, e.g. when the original implementation uses third-party libraries, or when users are certain that the class is safe to trace with the real object.

This PR allows users to specify their intention by implementing a "safe_to_trace_with_real_obj" method on their script class.

Test Plan:
`pytest test/export/test_torchbind.py -k safe`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129586
Approved by: https://github.com/zou3519
2024-07-12 18:01:47 +00:00
cyy
fb5888c719 Remove unused type traits in torch/csrc/utils (#128799)
Follows  #127852

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128799
Approved by: https://github.com/ezyang
2024-06-27 23:51:18 +00:00
Shivam Raikundalia
1d0efedc85 [Profiler] Add TSC Clock Callback to CUPTI (#125036)
Summary:
Right now we use the default clock for CUPTI, which is neither monotonic nor particularly fast. We have already added the Kineto side of the implementation here: https://www.internalfb.com/diff/D56525885

This diff only adds the compile flags such that the TSC format is used and sets the converter using a libkineto call in the profiler

Test Plan:
Obtained following trace using resnet test:
https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Apr_25_11_03_18.3862943.pt.trace.json.gz&bucket=gpu_traces

TBD: Add benchmarks

Differential Revision: D56584521

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125036
Approved by: https://github.com/aaronenyeshi
2024-06-27 21:07:43 +00:00
Yidi Wu
b9697eacd3 [torchbind] support tensor ops inside of __obj_flatten__ (#129605)
As titled. Previously, __obj_flatten__ could run under a fake tensor mode, e.g. in process_input of aot_autograd, which is surrounded by a fake tensor mode. This caused the tensor ops inside __obj_flatten__ to run under fake tensor mode. However, the tensors inside a script object are real tensors, so the fake tensor mode errors out, saying that we need to first fakify all the tensors (because allow_non_fake_inputs is set to True).

In this PR, we disable all the dispatch modes when running to_fake_obj.

Note that the output of `__obj_flatten__` will be fakified and filled inside of the corresponding FakeScriptObject. So during tracing, we'll be using a FakeScriptObject that has fake tensor contents.

Test Plan:
Add a new test: pytest test/export/test_torchbind.py -k test_compile_tensor_op_in_tensor_flatten

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129605
Approved by: https://github.com/angelayi
2024-06-27 03:07:31 +00:00
Tristan Rice
0298560ca2 TCPStore: improve connect and retry logic (#129261)
We've been facing issues where TCPStore can successfully connect but then fail in the validate() function, due to resets from listen backlog queue overflow (when combined with reset enabled) as well as long init times.

This PR does a few things:
* Retry the connect and validate steps up to the specified timeout.
* Use exponential backoff with jitter for the retry logic instead of a fixed 1s sleep (see the sketch after this list).
* Eliminate the `sleep(std::chrono::milliseconds(numWorkers))` on init which can add significant delays to startup. This is no longer necessary per @XilunWu https://github.com/pytorch/pytorch/pull/116141
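
A generic Python sketch of the retry shape described above (constants are illustrative, not the values used in the C++ implementation):

```python
import random
import time

def connect_with_retry(connect, timeout_s: float,
                       base_ms: float = 50.0, cap_ms: float = 1000.0):
    """Retry `connect` until it succeeds or `timeout_s` elapses,
    sleeping with capped exponential backoff plus jitter in between."""
    start = time.monotonic()
    attempt = 0
    while True:
        try:
            return connect()
        except OSError:
            if time.monotonic() - start > timeout_s:
                raise
        # Full jitter: sleep a random amount up to the capped exponential.
        backoff_ms = min(cap_ms, base_ms * (2 ** attempt))
        time.sleep(random.uniform(0.0, backoff_ms) / 1000.0)
        attempt += 1
```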

Test plan:

```
python test/distributed/test_store.py -v
./build/bin/BackoffTest
```

Will do internal testing with some large scale jobs to ensure TCPStore works correctly.

At 4k scale: 4x improvement

```
tristanr@devvm4382 ~/pt_tests [SIGABRT]> time TORCH_SHOW_CPP_STACKTRACES=1 python tcpstore_large_test.py                                                                                                   (pytorch-3.10)
started 0
init 0
set 0
joined all

________________________________________________________
Executed in    1.98 secs    fish           external
   usr time    0.93 secs   91.00 micros    0.93 secs
   sys time    1.98 secs  954.00 micros    1.97 secs

tristanr@devvm4382 ~/pt_tests> conda activate torchdrive-3.10                                                                                                                                              (pytorch-3.10)
tristanr@devvm4382 ~/pt_tests> time TORCH_SHOW_CPP_STACKTRACES=1 python tcpstore_large_test.py                                                                                                          (torchdrive-3.10)
started 0
init 0
set 0
joined all

________________________________________________________
Executed in    8.20 secs    fish           external
   usr time    2.15 secs    0.00 micros    2.15 secs
   sys time    2.76 secs  843.00 micros    2.76 secs
```

```py
import time
import os
import threading
from multiprocessing import Pool

WORLD_SIZE = 10000

import torch.distributed as dist

def run(rank):
    should_log = rank % (WORLD_SIZE // 10) == 0
    if should_log:
        print(f"started {rank}")
    store = dist.TCPStore(
        host_name="devvm4382.nao0.facebook.com",
        port=29500,
        world_size=WORLD_SIZE,
        is_master=rank == 0,
        use_libuv=True,
    )
    if should_log:
        print(f"init {rank}")
    store.set(f"key{rank}", "1234")
    if should_log:
        print(f"set {rank}")
    del store

def noop(rank):
    pass

print("starting pool")
with Pool(WORLD_SIZE) as pool:
    pool.map(noop, range(WORLD_SIZE), 1)
    print("pool hot")
    start = time.time()
    pool.map(run, range(WORLD_SIZE), 1)
    print("run finished", time.time()-start)
```

```
tristanr@devvm4382 ~/pt_tests> python tcpstore_large_test.py                                                                                                                                (pytorch-3.10)
starting pool
pool hot
started 0
[W624 16:58:09.086081750 TCPStore.cpp:343] [c10d] Starting store with 10000 workers but somaxconn is 4096.This might cause instability during bootstrap, consider increasing it.
started 1000
init 1000
set 1000
started 2000
init 2000
set 2000
started 3000
init 3000
set 3000
started 4000
init 4000
set 4000
started 5000
init 5000
set 5000
started 6000
init 6000
set 6000
started 7000
init 7000
set 7000
started 8000
init 8000
set 8000
started 9000
init 9000
set 9000
init 0
set 0
run finished 0.705092191696167
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129261
Approved by: https://github.com/rsdcastro, https://github.com/wconstab, https://github.com/kurman, https://github.com/XilunWu, https://github.com/c-p-i-o
2024-06-25 19:24:22 +00:00
Yidi Wu
b22f0f5f51 [torchbind] fix bug of mutating FakeScriptObjects twice in aot_export (#128844)
This PR does two things:
1. It duplicates the fake script object, because aot_export traces the program twice; the mutations from the first trace would otherwise make the result of the second trace wrong.
2. It also adds a new test for methods that return constant outputs. Before this PR, there was no meta["val"] for these nodes because fx won't track these constants. We still need to preserve these constant-returning operators in the graph, because torchbind objects are stateful and deleting them would remove the implicit state mutation inside the object.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128844
Approved by: https://github.com/angelayi
2024-06-24 23:14:34 +00:00
cyy
e4c32d14a8 [3/N] Remove inclusion of c10/util/string_utils.h (#128504)
Follows #128372

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128504
Approved by: https://github.com/malfet
2024-06-15 06:38:40 +00:00
Kiuk Chung
8629939a51 [torch/c10] Add C10_UBSAN_ENABLED macro and use it to disable SymInt_… (#127967)
Adds `C10_UBSAN_ENABLED` macro and use it to disable `SymIntTest::Overflows` (fails under `signed-integer-overflow` UBSAN check).

Also cleans up UBSAN guard in `jit/test_misc.cpp` to use `C10_UBSAN_ENABLED`  and the existing `C10_ASAN_ENABLED` instead of locally defining `HAS_ASANUBSAN`.

> NOTE: This should fix `SymIntTest::Overflows` failing under ubsan in fbcode too...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127967
Approved by: https://github.com/atalman, https://github.com/d4l3k, https://github.com/malfet
2024-06-14 16:01:12 +00:00
cyy
b054470db2 Remove unused functions (#127881)
Some unused functions detected by g++ warnings can be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127881
Approved by: https://github.com/zou3519
2024-06-05 05:21:24 +00:00
cyy
e7cb43a2d2 Check unused variables in tests (#127498)
Enables unused variable checks in CMake.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127498
Approved by: https://github.com/ezyang
2024-06-04 05:35:25 +00:00
cyy
8629f9b3f2 Remove more unused variables in tests (#127510)
Follows #127379

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127510
Approved by: https://github.com/Skylion007, https://github.com/r-barnes
2024-05-31 03:39:45 +00:00
Richard Barnes
3f5b59eef4 [codemod] c10::optional -> std::optional in caffe2/aten/src/ATen/DeviceGuard.h +117 (#126901)
Summary:
Generated with
```
fbgs -f '.*\.(cpp|cxx|cc|h|hpp|cu|cuh)$' c10::optional -l | perl -pe 's/^fbsource.fbcode.//' | grep -v executorch | xargs -n 50 perl -pi -e 's/c10::optional/std::optional/g'
```

 - If you approve of this diff, please use the "Accept & Ship" button :-)

(117 files modified.)

Test Plan: Sandcastle

Reviewed By: palmje

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126901
Approved by: https://github.com/Skylion007, https://github.com/eqy
2024-05-24 00:26:15 +00:00
cyy
95e5c994f9 [Submodule] Clear USE_QNNPACK build option (#126941)
Following the removal of the QNNPACK third-party module in #126657, we can clear more build system code. third_party/neon2sse was also removed because it is not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126941
Approved by: https://github.com/ezyang
2024-05-24 00:12:56 +00:00
Chirag Pandya
a83e745356 [BE] split seq_id to collective_seq_id and p2p_seq_id (#125727)
Summary:
Split out `seq_id` into `collective_seq_id` and `p2p_seq_id`. The main idea here is that collectives that go to all machines should have identical `collective_seq_id`s, which makes it easier to spot when one of the machines isn't handling a collective operation.
Next, we can attempt to match up p2p operations to ensure that the sender(s)/receiver(s) are in sync.

Resolves issue: https://github.com/pytorch/pytorch/issues/125173

Test Plan:
Unit tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125727
Approved by: https://github.com/zdevito
2024-05-21 03:26:49 +00:00
David Berard
cb3b8cd0d3 Use object identity for deepcopy memo (#126126)
Copy of #126089, with some additional fixes & tests

Partial fix for #125635: previously, the deepcopy implementation would group together any tensors with any aliasing relationship and assign them to the same tensor. This was sort of good if you have two tensors `b = a.detach()`, because then if you deepcopy `list = [a, b]` to `list2 = list.deepcopy()`, then writes to `list2[0]` will also modify `list2[1]`. But for the most part, it's bad; (1) if you have `b = a.as_strided((4, 4), (16, 1), 16)`, then it'll make `b == a` in the deepcopied implementation, which is completely wrong; and (2) even if you have `b = a.detach()`, these are still initially two different tensors which become the same tensor after the old deepcopy implementation.

The new implementation only groups together tensors that have the same identity. This is a partial fix, but it's more reasonable. What changes:
* (becomes more correct): different views of the same base tensor will no longer all become equal after deepcopying
* (still kind of wrong): views won't actually alias each other after deepcopying.
* (arguably a minor regression): equivalent views of the same tensor will no longer be copied to the same tensor - so they won't alias.

BC breaking: C++ deepcopy interface changes from accepting `IValue::HashAliasedIValueMap memo` to accepting `IValue::HashIdentityIValueMap memo`. If there are objections, we can keep the old API. However, it seems likely that users generally won't try to deepcopy from C++.
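
The memo-keying distinction is the same one Python's copy.deepcopy makes: objects are memoized by identity (id), so two references to the same object deepcopy to one object, while two equal-but-distinct objects deepcopy to two. A minimal Python illustration of that semantics (an analogy for the C++ change, not the patched code path itself):

```python
import copy

x = [1, 2, 3]
y = [1, 2, 3]   # equal to x, but a different object
alias = x       # another reference to the very same object as x

copied = copy.deepcopy([x, y, alias])
print(copied[0] is copied[2])  # True: same identity -> one copy (memo hit)
print(copied[0] is copied[1])  # False: equal values, distinct identities
```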

Differential Revision: [D57406306](https://our.internmc.facebook.com/intern/diff/D57406306)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126126
Approved by: https://github.com/ezyang
2024-05-17 00:06:26 +00:00
Richard Barnes
ed327876f5 [codemod] c10:optional -> std::optional (#126135)
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```

`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
ydwu4
461ffaaaf3 [dynamo] support torchbind object input (#124978)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124978
Approved by: https://github.com/jansel
2024-05-07 03:02:00 +00:00
Zhirui Dai
3411d54811 fix loading optimizer options from archive (#125215)
This PR makes libtorch behave the same as PyTorch when loading optimizer state from an archive. In PyTorch, the options of parameter groups are loaded from the archive; this is currently missing in libtorch.
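
For reference, a sketch of the Python-side behavior that libtorch now matches (parameter-group options such as the learning rate round-trip through the state dict):

```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
opt.param_groups[0]["lr"] = 0.01      # tweak an option after construction
state = opt.state_dict()              # options are captured per param group

opt2 = torch.optim.SGD(model.parameters(), lr=0.1)
opt2.load_state_dict(state)           # options are restored, not just state
print(opt2.param_groups[0]["lr"])     # 0.01
```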

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125215
Approved by: https://github.com/janeyx99
2024-05-06 23:58:40 +00:00
Shuqiang Zhang
bfd5bb0c44 [c10d] only PG0 should dump when monitoring thread timed out (#125356)
Summary:
We found that some dumps are missing when the monitoring thread times out.
This is likely because multiple PGs could still dump the same records
at the same time, so we should allow only PG0 to actually dump.
Test Plan:
 unit test
python test/run_test.py --cpp --verbose -i cpp/ProcessGroupNCCLErrorsTest

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125356
Approved by: https://github.com/c-p-i-o
2024-05-04 00:43:20 +00:00
Stefan-Alin Pahontu
bebefcf845 Driver folder check (#117548)
Added an extra check for driver folders in Libtorch, as the stat struct does not recognize driver folders; with this, torch.save works for them as well (e.g. saving model.pt directly under C:).

Fixes [#111121](https://github.com/pytorch/pytorch/issues/111121) and #105488

Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117548
Approved by: https://github.com/malfet
2024-05-03 09:10:11 +00:00
Wes Bland
6f5f405b05 [ncclx] Rename NCCL-EXP to NCCLX (#125238)
Reviewed By: kryanchun

Differential Revision: D56534548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125238
Approved by: https://github.com/kwen2501
2024-05-01 23:29:55 +00:00
PyTorch MergeBot
724c7491d0 Revert " [Distributed] [7/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d (#124987)"
This reverts commit b3fd94d15e.

Reverted https://github.com/pytorch/pytorch/pull/124987 on behalf of https://github.com/ezyang due to broke downstream extensions ([comment](https://github.com/pytorch/pytorch/pull/124987#issuecomment-2083956511))
2024-04-30 00:37:53 +00:00
cyy
b3fd94d15e [Distributed] [7/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d (#124987)
This PR continues to clean up clang-tidy warnings in torch/csrc/distributed/c10d, following #124701. In addition, a libfmt dependency is added in the CMake code to enable using it in the headers. libfmt has to be added as a private dependency of torch_cuda and torch_hip because they include torch/csrc/distributed/c10d/Utils.hpp, which uses libfmt.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124987
Approved by: https://github.com/malfet
2024-04-27 07:22:27 +00:00
Kurt Mohler
abcb42cdd2 Avoid COW materialize in various places (1) (#124984)
Most, not all, of these cases were found automatically with `git grep -n '^\s*\<const\>.*\*.*=.*\<data_ptr\>'`

Part of #97856

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124984
Approved by: https://github.com/Skylion007
2024-04-26 19:06:28 +00:00
Shivam Raikundalia
63d4dc5a80 Remove TMP_LIBKINETO_NANOSECOND flag from Compilation (#124734)
Summary: Now that we have reached nanosecond granularity, we can remove the temporary guards that were previously required for nanosecond precision.

Test Plan: Regression should cover this change

Reviewed By: aaronenyeshi

Differential Revision: D56444570

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124734
Approved by: https://github.com/aaronenyeshi
2024-04-26 06:57:03 +00:00
Bin Bao
b2fd224f27 [AOTI] Add more ABI-compatiblity unit test (#123900)
Summary: Follows https://github.com/pytorch/pytorch/pull/123848 and tests more c10 util functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123900
Approved by: https://github.com/chenyang78
2024-04-23 16:06:40 +00:00
Bin Bao
4946638f06 [AOTI] Add ABI-compatiblity tests (#123848)
Summary: In AOTInductor-generated CPU model code, there can be direct references to some aten/c10 utility functions and data structures, e.g. at::vec and c10::Half. These are performance critical, so it doesn't make sense to create a C shim for them. Instead, we make sure they are implemented in a header-only way, and use this set of tests to guard future changes.

There are more header files to be updated, but we will do it in other followup PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123848
Approved by: https://github.com/jansel
ghstack dependencies: #123847
2024-04-19 00:51:24 +00:00
Xuehai Pan
93e249969b [BE] enable ruff rule RSE and remove useless parentheses in raise statements (#124261)
Remove useless parentheses in `raise` statements if the exception type is raised with no argument.
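
For example:

```python
# Flagged by RSE102: the parentheses add nothing when the exception
# type is raised with no arguments.
def before():
    raise NotImplementedError()

# After the fix: same behavior, cleaner.
def after():
    raise NotImplementedError
```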

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124261
Approved by: https://github.com/albanD
2024-04-17 19:29:34 +00:00
Shivam Raikundalia
3ebbeb75fd [Profiler] Make Kineto traces export ns granularity for finer timestamps (#122425) (#123650)
Summary:

Kineto traces use microsecond-level granularity because Chrome tracing defaults to that precision. Fix by adding a preprocessor flag to TARGETS and BUCK files. Also remove any unnecessary ns-to-us conversions made in the profiler itself.

This diff contains profiler changes only. Libkineto changes are found in D54964435.

Test Plan:
Check JSON and chrome tracing to make sure values are as expected. Tracing with flags enabled should have ns precision. Tracings without flags should be the same as master.
Zoomer: https://www.internalfb.com/intern/zoomer/?profiling_run_fbid=796886748550189
Ran key_averages() to make sure the FunctionEvent code is working as expected:
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls

                                          ProfilerStep*         0.74%       3.976ms        64.40%     346.613ms      69.323ms       0.000us         0.00%      61.710ms      12.342ms             5
                      Optimizer.zero_grad#SGD.zero_grad         0.76%       4.109ms         0.76%       4.109ms     821.743us       0.000us         0.00%       0.000us       0.000us             5
                                          ## forward ##         6.89%      37.057ms        27.19%     146.320ms      29.264ms       0.000us         0.00%      58.708ms      11.742ms             5
                                           aten::conv2d         0.22%       1.176ms         7.74%      41.658ms     157.199us       0.000us         0.00%      27.550ms     103.962us           265
                                      aten::convolution         0.79%       4.273ms         7.52%      40.482ms     152.762us       0.000us         0.00%      27.550ms     103.962us           265
                                     aten::_convolution         0.69%       3.688ms         6.73%      36.209ms     136.637us       0.000us         0.00%      27.550ms     103.962us           265
                                aten::cudnn_convolution         6.04%      32.520ms         6.04%      32.520ms     122.719us      27.550ms         8.44%      27.550ms     103.962us           265
                                             aten::add_         2.42%      13.045ms         2.42%      13.045ms      30.694us      12.700ms         3.89%      12.700ms      29.882us           425
                                       aten::batch_norm         0.19%       1.027ms         8.12%      43.717ms     164.971us       0.000us         0.00%      16.744ms      63.185us           265
                           aten::_batch_norm_impl_index         0.31%       1.646ms         7.93%      42.691ms     161.096us       0.000us         0.00%      16.744ms      63.185us           265
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------

Differential Revision: D55925068

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123650
Approved by: https://github.com/aaronenyeshi
2024-04-11 04:29:20 +00:00
PyTorch MergeBot
c66d503194 Revert "[Profiler][submodule] Make Kineto traces export ns granularity for finer timestamps (#122425)"
This reverts commit 6f7dd2f84a.

Reverted https://github.com/pytorch/pytorch/pull/122425 on behalf of https://github.com/malfet due to Breaks ROCM builds ([comment](https://github.com/pytorch/pytorch/pull/122425#issuecomment-2041129241))
2024-04-06 16:19:00 +00:00
Shivam Raikundalia
6f7dd2f84a [Profiler][submodule] Make Kineto traces export ns granularity for finer timestamps (#122425)
Summary:
Kineto traces use microsecond-level granularity because Chrome tracing defaults to that precision. Fix by adding a preprocessor flag to TARGETS and BUCK files. Also remove any unnecessary ns-to-us conversions made in the profiler itself.

This diff contains profiler changes only. Libkineto changes are found in D54964435.

Test Plan:
Check JSON and chrome tracing to make sure values are as expected. Tracing with flags enabled should have ns precision. Tracings without flags should be the same as master.
Tracing with flags enabled: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Mar_18_14_37_22.4155151.pt.trace.json.gz&bucket=gpu_traces
Tracing without flags enabled: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Mar_18_14_39_15.4166047.pt.trace.json.gz&bucket=gpu_traces
Tracing on main: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Mar_18_14_42_43.4177559.pt.trace.json.gz&bucket=gpu_traces

Ran key_averages() to make sure the FunctionEvent code is working as expected:
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls

                                          ProfilerStep*         0.74%       3.976ms        64.40%     346.613ms      69.323ms       0.000us         0.00%      61.710ms      12.342ms             5
                      Optimizer.zero_grad#SGD.zero_grad         0.76%       4.109ms         0.76%       4.109ms     821.743us       0.000us         0.00%       0.000us       0.000us             5
                                          ## forward ##         6.89%      37.057ms        27.19%     146.320ms      29.264ms       0.000us         0.00%      58.708ms      11.742ms             5
                                           aten::conv2d         0.22%       1.176ms         7.74%      41.658ms     157.199us       0.000us         0.00%      27.550ms     103.962us           265
                                      aten::convolution         0.79%       4.273ms         7.52%      40.482ms     152.762us       0.000us         0.00%      27.550ms     103.962us           265
                                     aten::_convolution         0.69%       3.688ms         6.73%      36.209ms     136.637us       0.000us         0.00%      27.550ms     103.962us           265
                                aten::cudnn_convolution         6.04%      32.520ms         6.04%      32.520ms     122.719us      27.550ms         8.44%      27.550ms     103.962us           265
                                             aten::add_         2.42%      13.045ms         2.42%      13.045ms      30.694us      12.700ms         3.89%      12.700ms      29.882us           425
                                       aten::batch_norm         0.19%       1.027ms         8.12%      43.717ms     164.971us       0.000us         0.00%      16.744ms      63.185us           265
                           aten::_batch_norm_impl_index         0.31%       1.646ms         7.93%      42.691ms     161.096us       0.000us         0.00%      16.744ms      63.185us           265
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------

Differential Revision: D55087993

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122425
Approved by: https://github.com/aaronenyeshi
2024-04-06 06:04:28 +00:00
Arun Pa
f71e368969 UFMT formatting on test/autograd test/ao test/cpp test/backends (#123369)
Partially addresses #123062

Ran lintrunner on
- test/_test_bazel.py
- test/ao
- test/autograd test/backends test/benchmark_utils test/conftest.py test/bottleneck_test test/cpp

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123369
Approved by: https://github.com/huydhn
2024-04-05 18:51:38 +00:00
Chun Cai
691054eeef Fix error message of autograd (#123154)
This PR updates the error message in autograd when an input tensor does not have `requires_grad` set. The original message does not contain the index info, making it hard for users to debug.
The error message style is consistent with that on lines 105-109.
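
For illustration, a small repro of the error path being improved (treat the printed message as indicative; the exact wording comes from this PR):

```python
import torch

a = torch.randn(3, requires_grad=True)
b = torch.randn(3)  # requires_grad defaults to False
out = (a * b).sum()
try:
    torch.autograd.grad(out, (a, b))
except RuntimeError as e:
    # With this PR the message identifies which differentiated tensor
    # (by index) does not require grad.
    print(e)
```
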
Co-authored-by: Jeffrey Wan <soulitzer@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123154
Approved by: https://github.com/soulitzer
2024-04-03 19:07:21 +00:00
ydwu4
c77352b5cc Add torch._library.register_fake_class to fakify torchBind class (#122622)
This PR only adds the abstract class registration logic, without touching existing tests, so they still trace with real script objects. The added tests are only for the registration APIs and their error messages.

Our design is that the abstract implementation should be in Python. This is much better in terms of usability. But this also has implications for custom ops that take script objects as input, which is detailed later in this stack.
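
A sketch of the registration pattern (the class name and methods are illustrative, loosely modeled on the TensorQueue test class mentioned elsewhere in this log):

```python
import torch

@torch._library.register_fake_class("_TorchScriptTesting::_TensorQueue")
class FakeTensorQueue:
    def __init__(self, queue):
        self.queue = queue

    @classmethod
    def __obj_unflatten__(cls, flattened_ctx):
        # Rebuild the fake object from the flattened (name, value) context
        # produced by the real object's __obj_flatten__.
        return cls(**dict(flattened_ctx))

    def push(self, x):
        self.queue.append(x)

    def pop(self):
        return self.queue.pop(0)
```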

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122622
Approved by: https://github.com/zou3519
ghstack dependencies: #122619, #122620, #122621
2024-04-02 23:52:17 +00:00
ydwu4
46c7235406 add tensor queue example (#122621)
This PR adds a tensor queue example for later use. It doesn't touch any existing logic. It refactors the tests a little bit to avoid importing the library in unittest setUp.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122621
Approved by: https://github.com/zou3519
ghstack dependencies: #122619, #122620
2024-04-02 23:52:17 +00:00
blegouix
ccfc87b199 include scheduler_on_plateau in optim.h (#121722)
Fixes #121593
Co-authored-by: Jane Xu <janeyx@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121722
Approved by: https://github.com/albanD
2024-03-27 19:45:25 +00:00
Mu-Chu Lee
2367d0dacd [AOTInductor] Add tensor_constantX to pass constant buffer update's check (#122562) (#122690)
Summary:

During tracing, some constants (tensor_constant{idx}) are being generated internally.
Those constants are neither parameters nor buffers, and users have zero control over them.

To accommodate this, we should allow users to omit those internally generated constants while still being able to update the constants in the model.

Test Plan:
Included in commit.
```
build/bin/test_aot_inductor
```

Reviewed By: zoranzhao

Differential Revision: D55354548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122690
Approved by: https://github.com/khabinov
2024-03-26 23:25:15 +00:00
PyTorch MergeBot
55f36d1ada Revert "[AOTInductor] Add tensor_constantX to pass constant buffer update's check (#122562)"
This reverts commit 57a3d00b06.

Reverted https://github.com/pytorch/pytorch/pull/122562 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/122562#issuecomment-2019262415))
2024-03-26 02:18:19 +00:00
Mu-Chu Lee
57a3d00b06 [AOTInductor] Add tensor_constantX to pass constant buffer update's check (#122562)
Summary:
During tracing, some constants (tensor_constant{idx}) are being generated internally.
Those constants are neither parameters nor buffers, and users have zero control over them.

To accommodate this, we should allow users to omit those internally generated constants while still being able to update the constants in the model.

Test Plan:
Included in commit.
```
build/bin/test_aot_inductor
```

Differential Revision: D55286634

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122562
Approved by: https://github.com/chenyang78, https://github.com/khabinov
2024-03-25 22:05:20 +00:00
David Berard
2d9cee20a2 [jit] AliasDB type hash - don't always return 0 (#121874)
This hash was missing an assignment, so for almost all types it was returning "0".

c10::flat_hash_map turns out to have really bad behavior with a terrible hash like this, nearly exponential in memory usage.
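
The failure mode generalizes: with a constant hash, every key lands in one bucket and lookups degrade to linear scans. A quick Python demonstration of the idea (Python's set degrades quadratically in time here; the commit reports near-exponential memory growth in c10::flat_hash_map):

```python
import time

class BadHash:
    """Every instance hashes to 0, like the buggy AliasDB type hash."""
    def __init__(self, v):
        self.v = v
    def __hash__(self):
        return 0  # the missing assignment made (almost) all hashes 0
    def __eq__(self, other):
        return self.v == other.v

for n in (1_000, 2_000, 4_000):
    start = time.perf_counter()
    s = {BadHash(i) for i in range(n)}
    print(n, f"{time.perf_counter() - start:.3f}s")  # ~4x per doubling
```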

Differential Revision: [D54916424](https://our.internmc.facebook.com/intern/diff/D54916424)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121874
Approved by: https://github.com/eellison
2024-03-14 23:16:08 +00:00
angelayi
e8836759d0 [export] Add effect token to export (#121424)
Following the creation of effect tokens (https://github.com/pytorch/pytorch/pull/120296), we now want to add support for these tokens in export, because the calling/returning convention has changed. The inputs are now `(tokens, params, buffers, constants, user_inputs)` and the outputs are `(tokens, buffer_mutations, user_mutations, user_outputs)`. The graph looks something like:
```
graph():
    %arg0_1 : [num_users=1] = placeholder[target=arg0_1]
    %attr : [num_users=2] = placeholder[target=attr]
    %arg1_1 : [num_users=2] = placeholder[target=arg1_1]
    %with_effects : [num_users=2] = call_function[target=torch._higher_order_ops.effects.with_effects](args = (%arg0_1, _TorchScriptTesting.takes_foo.default, %attr, %arg1_1), kwargs = {})
    %getitem : [num_users=1] = call_function[target=operator.getitem](args = (%with_effects, 0), kwargs = {})
    %getitem_1 : [num_users=1] = call_function[target=operator.getitem](args = (%with_effects, 1), kwargs = {})
    %with_effects_1 : [num_users=2] = call_function[target=torch._higher_order_ops.effects.with_effects](args = (%getitem, _TorchScriptTesting.takes_foo.default, %attr, %getitem_1), kwargs = {})
    %getitem_2 : [num_users=1] = call_function[target=operator.getitem](args = (%with_effects_1, 0), kwargs = {})
    %getitem_3 : [num_users=1] = call_function[target=operator.getitem](args = (%with_effects_1, 1), kwargs = {})
    %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%arg1_1, %getitem_3), kwargs = {})
    return (getitem_2, add)
```

During unlifting, we will first remove the tokens and with_effects calls using the `remove_effect_tokens` pass (cc @SherlockNoMad on the pass to remove tokens). This is so that the calling convention won't change when retracing. The graph after unlifting looks something like:
```
graph():
    %attr_1 : [num_users=2] = get_attr[target=attr]
    %arg1_1 : [num_users=2] = placeholder[target=arg1_1]
    %takes_foo_default_1 : [num_users=1] = call_function[target=torch.ops._TorchScriptTesting.takes_foo.default](args = (%attr_1, %arg1_1), kwargs = {})
    %takes_foo_default : [num_users=1] = call_function[target=torch.ops._TorchScriptTesting.takes_foo.default](args = (%attr_1, %takes_foo_default_1), kwargs = {})
    %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%arg1_1, %takes_foo_default), kwargs = {})
    return (add,)
```

Serialization support will be added in a followup.
Note: tokens only affect custom ops that take in ScriptObjects, not ScriptObject methods yet.

Differential Revision: [D54639390](https://our.internmc.facebook.com/intern/diff/D54639390)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121424
Approved by: https://github.com/tugsbayasgalan
2024-03-09 02:43:26 +00:00
Will Constable
f85d3a022c [C10D] Fix pointToPoint op Flight Recording (#120270)
Fix and test issues with both coalesced and individual send/recv ops

Considered an alternate approach and then ditched it
 - alternate approach: #119757
 - reason ditched: prefer recording individual collective events inside
   coalescing region instead of just the event at the end of the region,
   which also would not have tensor sizes or opnames without additional
   state variables added

Another approach also ditched
- record events on workEnqueue instead of initWork
- reason ditched: too messy to get input/output shapes tagged on
  recording when recording in workEnqueue.  Adding the info onto the
  Work obj would be possible, but adds to overhead of copying Works
  which we do on every collective. We can get info off the input/output
  tensors directly in initWork, but we don't want to keep refs to those
  tensors alive while the work is Enqueued, so we'd have to specifically
  copy size lists or something.

This PR instead avoids creating a work inside pointToPoint when
coalescing is active. Instead, only at endCoalescing() is a work finally
initialized and enqueued. But it adds a record() call inside
pointToPoint() instead of creating a work during coalescing. This
record() call picks up tensor shapes and op names.

It ALSO changes initWork to accept a 'record' argument. This defaults to
false, and should only be set to true if the caller ensures the work
will be enqueued by workEnqueue, ensuring its cuda events are live when
used by flight recorder's update_state().

The testing uncovered some odd pre-existing behaviors and leaves them
alone for now. We could change some of these:
- seq starts off at 1, not 0, for the first op (but this is inconsistent)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120270
Approved by: https://github.com/shuqiangzhang
ghstack dependencies: #120724
2024-02-29 01:03:31 +00:00
Shuqiang Zhang
39f0a5ecc9 [c10d] simplify the dump timeout logic and unify the async call (#120331)
Summary:
The current dump timeout logic is a bit cumbersome as it needs two time
values: (1) the timeout and (2) the wake-up time. In theory the caller just
needs to wait at most the timeout value for the dump and then declare the
dump either successful or not. We also unify the async calls using std::async
instead of a customized async launch function for each operation.
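
A minimal sketch of the wait-up-to-timeout pattern described above; the dump function, timeout value, and messages are illustrative stand-ins, not the c10d code:
```
#include <chrono>
#include <future>
#include <iostream>

// Hypothetical stand-in for the debug-info dump; returns true on success.
bool dumpDebugInfo() { return true; }

int main() {
  // Launch the dump asynchronously, then wait at most `timeout` for it.
  std::future<bool> fut = std::async(std::launch::async, dumpDebugInfo);
  const auto timeout = std::chrono::seconds(30);
  if (fut.wait_for(timeout) == std::future_status::ready && fut.get()) {
    std::cout << "dump succeeded\n";
  } else {
    std::cout << "dump timed out or failed\n";
  }
  return 0;
}
```
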
Test Plan:
Unit tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120331
Approved by: https://github.com/wconstab
2024-02-23 19:46:40 +00:00
cyy
1aad5c98b4 [structural binding][5/N] Replace std::tie with structural binding (#120142)
This PR follows https://github.com/pytorch/pytorch/pull/119774, continuing the work of cleaning up std::tie.
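
For illustration, here is the pattern this series of PRs applies; the function and variable names below are made up, not taken from the PR:
```
#include <string>
#include <tuple>

std::tuple<int, std::string> lookup() { return {42, "answer"}; }

int main() {
  // Before: variables must be declared first, then bound via std::tie.
  int id_old;
  std::string name_old;
  std::tie(id_old, name_old) = lookup();

  // After (C++17): a structured binding declares and binds in one step.
  auto [id, name] = lookup();
  return (id == id_old && name == name_old) ? 0 : 1;
}
```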

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120142
Approved by: https://github.com/albanD
2024-02-21 22:32:55 +00:00
soulitzer
312ce35c1f Rename singleton int to nested int (#119661)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119661
Approved by: https://github.com/ezyang
2024-02-16 19:21:17 +00:00
cyy
47a2e6b6b8 Fix C++20 build (#112333)
Currently the C++20 build fails because of an incorrect template initialization order. This PR adjusts the order of these classes and a constructor to address the issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112333
Approved by: https://github.com/albanD
2024-02-13 05:10:19 +00:00
Sahdev Zala
110919c984 Check QNNPACK support for the platform before running test (#119139)
Do not run test ConstantPropagation.CustomClassesCanBePropagated on a platform where QNNPACK is not supported.

For example, this test fails on M1 Mac because QNNPACK is not supported on M1 Mac:
[----------] 1 test from ConstantPropagation
[ RUN      ] ConstantPropagation.CustomClassesCanBePropagated
unknown file: Failure
as described in more detail in issue #88613.

After the PR, test passes successfully as below:
[----------] 1 test from ConstantPropagation
[ RUN      ] ConstantPropagation.CustomClassesCanBePropagated
[       OK ] ConstantPropagation.CustomClassesCanBePropagated (0 ms)
[----------] 1 test from ConstantPropagation (0 ms total)

Fixes #88613

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119139
Approved by: https://github.com/jcaip
2024-02-12 20:21:07 +00:00
suo
82248f0b1c [export] improve FakeTensor serialization (#119531)
Recently we made it possible to serialize ExportedPrograms with fake parameters/buffers/etc.

The serialization regime was kind of whacky; basically we serialized a stub and reassembled the FakeTensor using metadata that we had stashed elsewhere in the Graph state.

This was bad for a few reasons:
- Storing the metadata separately from the actual serialized object caused situations where you could have one but not the other. An example case is a FakeTensor contained inside a TorchBind object: there was no obvious place to store the metadata for this. This actually happens; TensorQueue in fbgemm does this.
- It created an annoying cycle: we had to deserialize the Graph's tensor metadata in order to deserialize (potentially faked) constants, but we needed constants in order to deserialize the Graph.

This fixes all that. The basic idea is to patch the reducer function for FakeTensor at serialization time, and serialize a copy of the FakeTensor metadata. We already are policing BC for the TensorMeta schema struct so it's not a net increase in the BC surface.

As a bonus, I fixed a weird bug with torchbind tracing where we were accidentally reinterpreting a torch.ScriptObject as a torch.ScriptModule (which was the root cause of some weird behavior @bahuang was seeing last week).

Differential Revision: [D53601251](https://our.internmc.facebook.com/intern/diff/D53601251/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119531
Approved by: https://github.com/zhxchen17
2024-02-12 19:28:08 +00:00
Ke Wen
b2043c0543 [c10d] PGNCCL refactor part 2: Simplify ProcessGroupNCCL into single-device style (#119421)
Part 2 and last part of #118674:
Introduce actual "single-device" code change to ProcessGroupNCCL.

assert size == 1 and test refactor have been done in #119099.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119421
Approved by: https://github.com/shuqiangzhang
2024-02-12 18:45:49 +00:00
PyTorch MergeBot
0342b227e5 Revert "[c10d] PGNCCL refactor part 2: Simplify ProcessGroupNCCL into single-device style (#119421)"
This reverts commit f3e7d80993.

Reverted https://github.com/pytorch/pytorch/pull/119421 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/119421#issuecomment-1938169747))
2024-02-12 07:34:20 +00:00
Ke Wen
f3e7d80993 [c10d] PGNCCL refactor part 2: Simplify ProcessGroupNCCL into single-device style (#119421)
Part 2 and last part of #118674:
Introduce actual "single-device" code change to ProcessGroupNCCL.

assert size == 1 and test refactor have been done in #119099.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119421
Approved by: https://github.com/shuqiangzhang
2024-02-09 20:23:20 +00:00
Ke Wen
029a16c41f [c10d] PGNCCL refactor part 1: adds assert size==1 (#119099)
Breaking #118674 into multiple smaller PRs.
This is the first one.
It adds `assert size==1` to PGNCCL, and refactors some old tests written in multi-device style (which would otherwise fail at the assert).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119099
Approved by: https://github.com/wconstab, https://github.com/XilunWu
2024-02-07 22:29:29 +00:00
PyTorch MergeBot
9d46fe603d Revert "[c10d] PGNCCL refactor part 1: adds assert size==1 (#119099)"
This reverts commit 4ab852b6c5.

Reverted https://github.com/pytorch/pytorch/pull/119099 on behalf of https://github.com/atalman due to Breaks internal tests ([comment](https://github.com/pytorch/pytorch/pull/119099#issuecomment-1930839754))
2024-02-06 22:14:36 +00:00
Ke Wen
4ab852b6c5 [c10d] PGNCCL refactor part 1: adds assert size==1 (#119099)
Breaking #118674 into multiple smaller PRs.
This is the first one.
It adds `assert size==1` to PGNCCL, and refactors some old tests written in multi-device style (which would otherwise fail at the assert).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119099
Approved by: https://github.com/wconstab
2024-02-06 06:59:47 +00:00
lancerts
576383c2eb Add torch check for dtype within bilinear (#118900)
Fixes https://github.com/pytorch/pytorch/issues/117237
Short-term fix: when the dtypes do not match, this is now reflected in the torch check.

@ezyang: a cpp test case is added
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118900
Approved by: https://github.com/ezyang, https://github.com/malfet
2024-02-03 00:02:00 +00:00
Mu-Chu Lee
2b48891e62 [AOTInductor] Add Runtime Constant-folding for AOTInductor (#118765)
Summary:
Add Runtime Constant-folding for AOTInductor.
This also include the invocation of constant folding at load time.

The constant folding lowering is a 2-step process.
First, we split the graph into two modules; one of them is the constant module, which does not depend on any input, so the whole module can be inferred (constant-folded) once and reused. The constant module is lowered and codegen-ed as usual and cached (let's call this the constant code). The constant code reuses the whole lowering/profiling/etc. process; the only difference is that we do not generate any headers or initialization for the constant code.
Second, after handling the constant module, we take care of the main module (the part that depends on the user input). Compared with a normal lowering, the main module takes in one additional component: the constant code. The additional step here is that we inject the constant code into the codegen-ed main module and create the caller for the main module to consume the result of the constant module.

Test Plan: Unit tests included in commit.

Differential Revision: D53274382

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118765
Approved by: https://github.com/chenyang78
2024-02-01 04:54:25 +00:00
Boyuan Feng
b369888bec Replace constraints with dynamic_shapes in caffe2/test/cpp & torchrec/distributed/tests/test_pt2 (#118026)
Summary: `constraints` argument for `torch.export` has been deprecated in favor of the `dynamic_shapes` argument. This PR updates the use of the deprecated API in `caffe2/test/cpp` and `torchrec/distributed/test/test_pt2`.

Test Plan: CI

Differential Revision: D52977354

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118026
Approved by: https://github.com/chenyang78
2024-01-23 22:15:15 +00:00
suo
4057d005ff Initial torchbind support in PT2 (#117697)
This PR adds the bare minimum functionality to get torchbind working in an e2e testable way on PT2.

It implements:
* ProxyTensor support
* Simple torch.export support (proxytensor-only path, e.g. non-strict).
* add some tests exercising the path.

Because all this is not fully baked, I hide the functionality behind a feature flag (`enable_torchbind_tracing()`) so it does not affect regular users for now.

Still on the agenda:
* Dynamo support
* Actual FakeMode support
* Mutability support

Hoping to get this first bit in as a standalone, as it will unblock some more extensive experimentation/testing going on internally.

Differential Revision: [D51825372](https://our.internmc.facebook.com/intern/diff/D51825372/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117697
Approved by: https://github.com/SherlockNoMad
2024-01-19 06:28:20 +00:00
Ke Wen
c16e6e4cf7 [ProcessGroup] Make watchdog check work queue more frequently (#117297)
Today the watchdog's sleep interval is 1 s. That's long compared to the speed of modern GPU links (or network links).

Take DDP and Ampere for example:

DDP's bucket size = 25 MB
Ampere's NVLink speed = 250 GB/s

25 MB / 250 GB/s = 0.1 ms, so the current 1 s interval is far too coarse.
We are updating the interval to 100 ms.

Update:
The exact figure, 25 MB / 250 GB/s = 0.1 ms, would justify an even shorter interval,
but let's first see how 100 ms goes before making the checking more aggressive.
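
Written out, the corrected arithmetic:
```
\[
\frac{25\,\mathrm{MB}}{250\,\mathrm{GB/s}}
  = \frac{25 \times 10^{6}\,\mathrm{B}}{250 \times 10^{9}\,\mathrm{B/s}}
  = 10^{-4}\,\mathrm{s}
  = 0.1\,\mathrm{ms}
\]
```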

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117297
Approved by: https://github.com/fduwjj
2024-01-19 02:33:31 +00:00
Ke Wen
6d96beb6be [c10d] Remove health check (#117699)
https://github.com/pytorch/pytorch/pull/114916 and https://github.com/pytorch/pytorch/pull/116222 added support for eager NCCL comm init (performed as soon as `init_process_group` is called).

If any user cares about the time difference and want to see NCCL init errors early, they can use eager init now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117699
Approved by: https://github.com/wconstab
2024-01-18 02:14:49 +00:00
fduwjj
ca4df16fdd [c10d] Make DebugInfoWriter Singleton across all PG objects (#116489)
Previously, the writer was registered to each NCCL PG (backend), so every PG had its own writer instance. If a customized writer is used while multiple sub-PGs are in play, the user has to register the writer for every backend, which is bad UX. Furthermore, the debug info is global, so it does not make sense to have a writer per instance. We even have a static mutex in `dumpDebuggingInfo` to serialize the writes, which makes it even more obvious that the writer can be a singleton, so that we have only one writer instance for all PG instances.

Although the rationale is clear, the implementation may vary a lot, so this PR is an RFC for now to see whether this implementation makes sense.
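
A hedged sketch of the singleton-writer idea; the class shape and default path are illustrative assumptions, not the actual c10d API:
```
#include <fstream>
#include <mutex>
#include <string>

class DebugInfoWriter {
 public:
  static DebugInfoWriter& get() {
    static DebugInfoWriter instance;  // one writer shared by every PG
    return instance;
  }
  void write(const std::string& blob) {
    std::lock_guard<std::mutex> lock(mutex_);  // serialize concurrent dumps
    std::ofstream(path_, std::ios::binary) << blob;
  }

 private:
  DebugInfoWriter() = default;
  std::mutex mutex_;
  std::string path_ = "/tmp/nccl_debug_info";  // hypothetical default location
};
```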

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116489
Approved by: https://github.com/kwen2501
2024-01-03 03:42:54 +00:00
Bin Bao
feafbcf437 [AOTI][refactor] Refactor model runner API (#116047)
Summary: 1) make the proxy executor a private member; 2) use std::string instead of char*

Differential Revision: [D52301106](https://our.internmc.facebook.com/intern/diff/D52301106)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116047
Approved by: https://github.com/khabinov
2023-12-21 01:05:37 +00:00
Xilun Wu
0b0b9b3275 [c10d][libuv] add partial read test for libuv backend and fix an error which only happens when partially reading a buffer (#116141)
**Test Plan**
1. build pytorch
2. execute `TORCH_CPP_LOG_LEVEL=INFO build/bin/TCPStoreTest --gtest_filter=TCPStoreTest.testLibUVPartialRead` from the pytorch root directory.

Without the change, the test fails; with the change, it passes (output screenshots omitted).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116141
Approved by: https://github.com/wconstab
2023-12-20 18:37:55 +00:00
Bin Bao
fabf9433e7 [AOTI][refactor] Organize model runner files (#116022)
Summary: Move runner util files into a subdirectory and put AOTIModelContainerRunnerCpu into a separate file

Differential Revision: [D52300693](https://our.internmc.facebook.com/intern/diff/D52300693)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116022
Approved by: https://github.com/khabinov
2023-12-20 15:35:34 +00:00
Nikita Shulga
d7caef7996 [CI] Update clang-format (#116002)
To 17.0.6, built using https://github.com/pytorch/test-infra/blob/main/.github/workflows/clang-tidy-linux.yml

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116002
Approved by: https://github.com/suo
2023-12-18 14:58:46 +00:00
Mu-Chu Lee
c285ca7916 [AOTInductor] Add updaing constant buffer to active buffer. (#116001)
Summary:
Refactor the inactive constant buffer update to also allow updating the
active buffer.

Test Plan:
Existing test to test inactive buffer updates.
UpdateConstantsCuda in cpp test for active buffer updates.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116001
Approved by: https://github.com/chenyang78
2023-12-18 11:49:03 +00:00
soulitzer
4d8ad4fb82 Move SingletonSymNodeImpl from c10 to aten (#114895)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114895
Approved by: https://github.com/jbschlosser
2023-12-13 20:01:18 +00:00
Wongboo
68f74dd162 Add python and C++ support for LPPool3d (#114199)
Add Python and C++ support for LPPool3d. Fixes #114114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114199
Approved by: https://github.com/mikaylagawarecki
2023-12-08 18:18:44 +00:00
Will Constable
7562b45454 Reland "[C10D] Use future for flight recorder dump (#115176)" (#115332)
Replaces the "always sleep 30 sec before abort" with "wait up to 30 sec
for the future to complete then abort". The difference in this case is
the abort happens as soon as the dump finishes up to a maximum, instead
of always waiting the maximum.

Allows multiple calls to dump, which will be serialized.

Renames tryWriteDebugInfo to launchAsyncDebugDump in the spirit of the
change to support more than one launch and to always launch rather than
only launching on the first call.

Adds a test for dumping on timeout.

This reverts commit ac7d14baad.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115332
Approved by: https://github.com/fduwjj
2023-12-07 21:20:58 +00:00
Shaltiel Shmidman
ee8b33f7d5 Fixed crash when calling pad_packed_tensor when packed with cuda tensors and ensure_sorted=false due to indexing with tensors on different devices (#115028)
Fixes #115027

Fix in csrc as done in the python code [here](https://github.com/pytorch/pytorch/blob/main/torch/nn/utils/rnn.py#L338).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115028
Approved by: https://github.com/drisspg
2023-12-07 18:09:18 +00:00
suo
686a3e0bf0 [pytorch][PR] introduce WeakHashRef (#115216)
We would like weak dictionaries that have `torch.ScriptObject` keys. Similar to tensors, we need to override the behavior of the ref to do the right thing under comparison.

This change also makes it so that WeakIdKeyDictionary works with a pluggable ref_type.

Differential Revision: [D51828205](https://our.internmc.facebook.com/intern/diff/D51828205/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115216
Approved by: https://github.com/albanD
2023-12-07 17:48:11 +00:00
PyTorch MergeBot
ac7d14baad Revert "[C10D] Use future for flight recorder dump (#115176)"
This reverts commit 0e07e3dbe4.

Reverted https://github.com/pytorch/pytorch/pull/115176 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but the test_timeout_dumps is failing in trunk 0e07e3dbe4 ([comment](https://github.com/pytorch/pytorch/pull/115176#issuecomment-1844076455))
2023-12-07 02:09:58 +00:00
Will Constable
0e07e3dbe4 [C10D] Use future for flight recorder dump (#115176)
Replaces the "always sleep 30 sec before abort" with "wait up to 30 sec
for the future to complete then abort".  The difference in this case is
the abort happens as soon as the dump finishes up to a maximum, instead
of always waiting the maximum.

Allows multiple calls to dump, which will be serialized.

Renames `tryWriteDebugInfo` to `launchAsyncDebugDump` in the spirit of the
change to support more than one launch and to always launch rather than
only launching on the first call.

Adds a test for dumping on timeout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115176
Approved by: https://github.com/zdevito
2023-12-06 23:42:19 +00:00
Mu-Chu Lee
80527c0cf2 [AOTInductor] Double buffering for Weights (#114446)
Summary:
This adds a function to the model container that does weight swapping with double buffering.

There are two parts to double buffering:
a) write constants into the inactive buffer;
b) swap the active buffer.

For (a), we write the constants into the buffer that is currently not in use, and store the information in both the constants map and the corresponding constant array to read from.
For (b), we take the lock, activate the constant map/constant array that was inactive, and flag the one that was in use as inactive.
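
A hedged sketch of this scheme; the class, names, and value types are illustrative assumptions, not the AOTInductor container API:
```
#include <array>
#include <mutex>
#include <string>
#include <unordered_map>

class ConstantDoubleBuffer {
 public:
  // (a) Write a constant into the buffer that is NOT currently in use.
  void writeInactive(const std::string& name, float value) {
    std::lock_guard<std::mutex> lock(mutex_);
    buffers_[1 - active_][name] = value;
  }
  // (b) Under the lock, flip which buffer is active.
  void swapActive() {
    std::lock_guard<std::mutex> lock(mutex_);
    active_ = 1 - active_;
  }
  float read(const std::string& name) {
    std::lock_guard<std::mutex> lock(mutex_);
    return buffers_[active_].at(name);
  }

 private:
  std::mutex mutex_;
  int active_ = 0;  // index of the buffer currently serving inference
  std::array<std::unordered_map<std::string, float>, 2> buffers_;
};
```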

Test Plan:
test/cpp/aot_inductor/test.cpp

Differential Revision: [D51543732](https://our.internmc.facebook.com/intern/diff/D51543732)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114446
Approved by: https://github.com/chenyang78, https://github.com/eellison
2023-12-05 22:31:56 +00:00
FFFrog
541591dd79 Add the appropriate check on div_value to the cpp frontend (#114671)
Fixes #114334

As the title stated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114671
Approved by: https://github.com/mikaylagawarecki
2023-12-04 01:28:11 +00:00
Chip Turner
9cc040fef6 Switch env variable use in test harnesses to the non-deprecated names to fix warnings (#114880)
Previously:

```
[W Utils.hpp:133] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[W Utils.hpp:133] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
```

With this PR, those warnings disappear.  They were introduced in #114077

This change was generated with this sed script, applied with `sed -i -f /tmp/x **/*.{py,hpp,cpp,cc}` and hand inspected.

```
s/\bNCCL_BLOCKING_WAIT\b/TORCH_NCCL_BLOCKING_WAIT/g
s/\bNCCL_ENABLE_TIMING\b/TORCH_NCCL_ENABLE_TIMING/g
s/\bNCCL_DESYNC_DEBUG\b/TORCH_NCCL_DESYNC_DEBUG/g
s/\bNCCL_ASYNC_ERROR_HANDLING\b/TORCH_NCCL_ASYNC_ERROR_HANDLING/g
s/\bENABLE_NCCL_HEALTH_CHECK\b/TORCH_ENABLE_NCCL_HEALTH_CHECK/g
s/\bNCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK\b/TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK/g
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114880
Approved by: https://github.com/kwen2501
2023-12-01 20:08:23 +00:00
Scott Wolchok
165f4f6ccf [PyTorch] Redirect c10::optional to std::optional (#101995)
We have C++17 now!

I am intentionally dropping the `c10::optional<c10::ArrayRef>` size optimization. It was intended to improve dispatch, but thanks to D34602980 / #70864 we don't use `optional<ArrayRef>` in function arguments anymore anyway.

Differential Revision: [D46079028](https://our.internmc.facebook.com/intern/diff/D46079028/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101995
Approved by: https://github.com/malfet, https://github.com/Skylion007, https://github.com/ezyang
2023-11-30 02:46:41 +00:00
Chip Turner
066e072524 Retry #112889 (Opportunistically use ncclCommSplit when creating new NCCL groups) (#114385)
- [c10d] (retry) Opportunistically use `ncclCommSplit` when creating new NCCL groups (#112889)
- Guard use of `split_from` with a `hasattr` check for cases when NCCL (or RCCL) lacks `ncclCommSplit`

Fixes cause of revert of original PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114385
Approved by: https://github.com/huydhn
2023-11-23 07:00:00 +00:00
PyTorch MergeBot
b927a4e2ca Revert "Opportunistically use ncclCommSplit when creating new NCCL groups (#112889)"
This reverts commit 64a5372e6c.

Reverted https://github.com/pytorch/pytorch/pull/112889 on behalf of https://github.com/huydhn due to Sorry for reverting you change, but it is failing ROCm distributed jobs in trunk 4d07428ede ([comment](https://github.com/pytorch/pytorch/pull/112889#issuecomment-1823214376))
2023-11-22 17:43:51 +00:00
Chip Turner
64a5372e6c Opportunistically use ncclCommSplit when creating new NCCL groups (#112889)
Currently `ncclCommInitRankConfig` is always used when creating new
communicator groups. This is wasteful, as it creates non-shared pairs
of endpoint queues and costs time to re-establish communication.

This change is transparent and opportunistic; when `dist.new_group` is
called, it will use the existing, healthy world process group to
select the right ranks to include in the process group.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112889
Approved by: https://github.com/kwen2501
2023-11-21 21:03:52 +00:00
Pavan Balaji
8f8722e3f1 [nccl-pg] Avoid using NCCL_ prefix for non-NCCL env variables (#114077)
The NCCL_ prefix should only be used for the NCCL library's environment variables. We currently use a few environment variables in PyTorch with the NCCL_ prefix that the NCCL library does not understand.

This patch renames such environment variables to use the TORCH_NCCL_ prefix instead.  We still maintain the old NCCL_ variables, but throw a warning when they are used.

The following env changes have been made:

`NCCL_BLOCKING_WAIT` -> `TORCH_NCCL_BLOCKING_WAIT`
`NCCL_ENABLE_TIMING` -> `TORCH_NCCL_ENABLE_TIMING`
`NCCL_DESYNC_DEBUG` -> `TORCH_NCCL_DESYNC_DEBUG`
`NCCL_ASYNC_ERROR_HANDLING` -> `TORCH_NCCL_ASYNC_ERROR_HANDLING`
`ENABLE_NCCL_HEALTH_CHECK` -> `TORCH_ENABLE_NCCL_HEALTH_CHECK`
`NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK` -> `TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114077
Approved by: https://github.com/fduwjj
2023-11-21 07:23:42 +00:00
cyy
226384b460 [2/N] Cleanup header inclusions in torch_cpu by iwyu (#109964)
Further cleaning up of torch_cpu header inclusions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109964
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2023-11-19 20:56:32 +00:00
Pavan Balaji
958f3b0df6 [nccl-pg] Migrate to getCvar* functions for env variable checking (#113797)
Summary:
The getCvar* functions allow us to provide multiple environment variables for the same value. This lets us deprecate some variables in favor of others, while still allowing users to use the old names for some time.
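
A hedged sketch of the lookup pattern (the helper shape here is an assumption; the real getCvar* functions also warn when a deprecated name is used):
```
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>

// Check each candidate variable in priority order (newest name first),
// falling back to the default when none is set.
std::string getCvarString(const std::vector<std::string>& names,
                          const std::string& deflt) {
  for (const auto& name : names) {
    if (const char* val = std::getenv(name.c_str())) {
      return val;
    }
  }
  return deflt;
}

int main() {
  std::string v = getCvarString(
      {"TORCH_NCCL_BLOCKING_WAIT", "NCCL_BLOCKING_WAIT"}, "0");
  std::cout << "blocking wait: " << v << "\n";
}
```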

Test Plan: OSS CI

Reviewed By: fduwjj, XilunWu

Differential Revision: D51225487

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113797
Approved by: https://github.com/fduwjj
2023-11-19 03:48:58 +00:00
Andrzej Kotlowski
0885c58296 Add Bfloat16 scalar support to gloo backend (#113557)
Support for bfloat16 scalars was missing. When I use the gloo backend
`torch.distributed.init_process_group(backend='gloo')`
and run
`torch.nn.parallel.DistributedDataParallel(model)`
and _model_ has bfloat16 features, I receive the following error:
`RuntimeError: Invalid scalar type`

This change fixes this issue.
c10::BFloat16 defines conversions from/to float, so calculations are done in float for bfloat16.
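
To illustrate the compute-in-float idea, here is a toy bfloat16 type; c10::BFloat16 provides these conversions for real, and this stand-in is only a sketch:
```
#include <cstdint>
#include <cstring>
#include <iostream>

struct BF16 {
  uint16_t bits = 0;
  BF16() = default;
  explicit BF16(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    bits = static_cast<uint16_t>(u >> 16);  // keep sign, exponent, top mantissa bits
  }
  explicit operator float() const {
    uint32_t u = static_cast<uint32_t>(bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
  }
};

// The reduction does its arithmetic in float, as the gloo fix does.
BF16 sum(BF16 a, BF16 b) { return BF16(float(a) + float(b)); }

int main() {
  std::cout << float(sum(BF16(1.5f), BF16(2.25f))) << "\n";  // prints 3.75
}
```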

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113557
Approved by: https://github.com/XilunWu, https://github.com/jgong5
2023-11-17 21:16:54 +00:00
fduwjj
015fd2eb41 [NCCL PG] Add dumping flight recorder in the NCCL watchdog timeout (#113678)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113678
Approved by: https://github.com/XilunWu
ghstack dependencies: #113503
2023-11-17 07:00:41 +00:00
Mu-Chu Lee
eddce3c054 [AOTInductor] Rename model_runner to model_container_runner (#111324)
Summary:
We rename model_runner to model_container_runner to prepare for
adding tests of a pure model without the container.

Test Plan:
The commit itself is a test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111324
Approved by: https://github.com/desertfire, https://github.com/chenyang78
2023-11-16 19:14:22 +00:00
fduwjj
5fb1d8f18a [NCCL PG] Enable storing nccl traces into storage and make it configurable (#113503)
This PR enables storing the NCCL flight recorder to storage and makes it configurable by letting users register their own way of storing the debug info. We will then provide users a script to parse and process the dumped blobs offline.

One thing this PR does not try to resolve is deciding where to dump the debug info. I will send a follow-up PR to address that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113503
Approved by: https://github.com/zdevito
2023-11-16 07:44:15 +00:00
Aleksei Nikiforov
b526aae95a test_lazy: skip HashTest.Scalar (#112747)
Fixes #99883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112747
Approved by: https://github.com/huydhn
2023-11-16 01:22:58 +00:00
fduwjj
f9114193bd [NCCL PG] ADD a separate monitoring thread to ensure we collect debug info and check watchdog heartbeat (#112518)
This PR has the following goals:
1. Detect an unhealthy NCCL watchdog thread by implementing a heartbeat. The NCCL watchdog can sometimes hang for several reasons, such as nccl/cuda API bugs or unexpected blocking behaviors. This is the last resort to ensure that we don't silently let the training job run for hours.
2. Sometimes the process gets stuck destroying the NCCL PG; this PR ensures that we eventually abort it after some time (by default 2 mins).
3. Once heartbeat cannot be heard, we dump debug information (for now, we just use the flight recorder implemented in https://github.com/pytorch/pytorch/pull/110960/files) to disk. (How and where to dump the debug info will be addressed in the following PR).
4. Finally, we initiate std::abort via `LOG(FATAL)` to kill the process.

To further clarify what this PR is trying to solve, we first list the four cases a NCCL PG can end up in:
- case 1: ncclwatchdog gets stuck (maybe in some blocking API) and the heartbeat monitor kills it during its regular monitoring loop.
- case 2: ncclwatchdog times out and the desync report or destroy kicks in (let's call it shutdown), but this shutdown takes so long that the heartbeat monitor decides it has to kill the process anyway.
- case 3: ncclwatchdog aborts the process (heartbeat monitor not involved)
- case 4: program exits cleanly (heartbeat monitor not involved)

As we can see, this PR tries to address cases one and two, and we also want to ensure that adding one more monitor thread does not interfere with what we currently do in cases three and four. That's why we added two flags, `terminateHeartbeatMonitorThread_` and `collectiveDebugInfoMode_`.

For cases three and four, either `monitorWakeUpCV_` is woken up in the destructor or `terminateHeartbeatMonitorThread_` is set to true, so the monitor thread will just exit ASAP.

For case one, both `terminateHeartbeatMonitorThread_` and `collectiveDebugInfoMode_` will still be false when the monitor thread sees there is no heartbeat, so it will directly kill the process. For case two, either `terminateHeartbeatMonitorThread_` or `collectiveDebugInfoMode_` will be true, and the monitor thread will wait extra time before killing the process.
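
A hedged sketch of the heartbeat-monitor pattern; the variable names echo the PR, but this is a toy illustration, not the ProcessGroupNCCL implementation:
```
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <mutex>
#include <thread>

std::atomic<uint64_t> heartbeat{0};         // bumped by the watchdog loop
std::atomic<bool> terminateMonitor{false};  // set on clean shutdown
std::mutex mtx;
std::condition_variable monitorWakeUpCV;

void heartbeatMonitor() {
  uint64_t last = heartbeat.load();
  std::unique_lock<std::mutex> lock(mtx);
  for (;;) {
    // Sleep up to the check interval; a clean shutdown wakes us early.
    if (monitorWakeUpCV.wait_for(lock, std::chrono::seconds(2),
                                 [] { return terminateMonitor.load(); })) {
      return;  // cases 3/4: destructor signalled us, exit quietly
    }
    uint64_t now = heartbeat.load();
    if (now == last) {
      // cases 1/2: the watchdog made no progress; dump debug info, then die.
      std::cerr << "watchdog heartbeat missed; aborting\n";
      std::abort();
    }
    last = now;
  }
}

int main() {
  std::thread monitor(heartbeatMonitor);
  for (int i = 0; i < 3; ++i) {  // simulate a healthy watchdog
    std::this_thread::sleep_for(std::chrono::seconds(1));
    ++heartbeat;
  }
  {
    std::lock_guard<std::mutex> g(mtx);
    terminateMonitor = true;
  }
  monitorWakeUpCV.notify_all();
  monitor.join();
}
```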

Differential Revision: [D51146305](https://our.internmc.facebook.com/intern/diff/D51146305)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112518
Approved by: https://github.com/kwen2501, https://github.com/wconstab
2023-11-10 04:41:14 +00:00
Nikita Shulga
88920b26be [Cmake] Check that gcc-9.4 or newer is used (#112858)
As this is the oldest gcc that is fully compatible with C++17 standard.
- Replace a number of conditional version checks with the simpler `if(CMAKE_COMPILER_IS_GNUCXX)` or `append_cxx_flag_if_supported`.
- As the `-Wsuggest-override` condition was hidden behind an incorrect guard, add missing `override` keywords to `torch::autograd::PyFunctionTensorPostAccGradHooks::apply_with_saved`, `caffe2::python::TensorFeeder::Feed`, and `caffe2::NetObserverReporterPrint::report`

Fixes https://github.com/pytorch/pytorch/issues/101839

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112858
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-11-06 17:19:53 +00:00
PyTorch MergeBot
679ca510b0 Revert "[Cmake] Check that gcc-9.4 or newer is used (#112858)"
This reverts commit ad894cd072.

Reverted https://github.com/pytorch/pytorch/pull/112858 on behalf of https://github.com/PaliC due to breaking internal tests (check diff for test page) ([comment](https://github.com/pytorch/pytorch/pull/112858#issuecomment-1795485009))
2023-11-06 16:56:09 +00:00
Nikita Shulga
ad894cd072 [Cmake] Check that gcc-9.4 or newer is used (#112858)
As this is the oldest gcc that is fully compatible with C++17 standard.
- Replace a number of conditional version checks with the simpler `if(CMAKE_COMPILER_IS_GNUCXX)` or `append_cxx_flag_if_supported`.
- As the `-Wsuggest-override` condition was hidden behind an incorrect guard, add missing `override` keywords to `torch::autograd::PyFunctionTensorPostAccGradHooks::apply_with_saved`, `caffe2::python::TensorFeeder::Feed`, and `caffe2::NetObserverReporterPrint::report`

Fixes https://github.com/pytorch/pytorch/issues/101839

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112858
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-11-04 05:40:08 +00:00
Nikita Shulga
8665a51baf Initialize logging facility when running ProcessGroupNCCLTest (#112809)
If code is compiled without `glog`, there is no way to control log levels other than explicitly calling `c10::initLogging()`

Test plan: Run `TORCH_CPP_LOG_LEVEL=0 ./bin/ProcessGroupNCCLTest` and observe extra log messages

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112809
Approved by: https://github.com/fduwjj
2023-11-03 02:26:13 +00:00
Pritam Damania
e66ec5843f [RESUBMIT] Cleanup error reporting for ProcessGroupNCCL (#112419)
Continuing some of the work from https://github.com/pytorch/pytorch/pull/108191, I realized the majority of errors raised from ProcessGroupNCCL were just generic RuntimeErrors.

In this PR, I've added appropriate error types to all the exceptions raised from ProcessGroupNCCL.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112419
Approved by: https://github.com/fduwjj
2023-10-31 05:58:21 +00:00
Aaron Enye Shi
63c089b09d [c10] Move profiler clock to libc10 for timestamps (#111972)
Summary:
Move the profiler's Approximate Clock from libtorch to libc10. The main reason is to allow c10 features to get time.

The clock uses TSC when available, for performance. The CUDA Caching Allocator's memory-snapshot implementation will add timestamps to memory events with this same clock in a subsequent diff.

Test Plan: CI

Differential Revision: D50601935

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111972
Approved by: https://github.com/davidberard98
2023-10-27 16:18:40 +00:00
PyTorch MergeBot
abe172e268 Revert "Cleanup error reporting for ProcessGroupNCCL (#111979)"
This reverts commit b29c658265.

Reverted https://github.com/pytorch/pytorch/pull/111979 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing multigpu test in trunk b29c658265 ([comment](https://github.com/pytorch/pytorch/pull/111979#issuecomment-1781919184))
2023-10-26 21:29:40 +00:00
angelayi
b126adcdee [aotinductor] Pass TorchIR to AOTInductor (#110020)
Updates `_export.aot_compile` to pass a torch IR graph to inductor, allowing inductor to now run the pre_grad_passes, and reuse more of inductor's code.
Also updates the API to only return the `so_path`, and not return the exported program. The pytree call spec is now serialized and placed inside of the generated model code. When calling the model, because there is no C++ pytree implementation linked yet, we can access the call specs through `get_call_spec()` and call pytree flatten/unflatten in Python.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110020
Approved by: https://github.com/desertfire
2023-10-26 15:54:31 +00:00
Jeff Daily
28c0b07d19 [ROCm] remove HCC references (#111975)
- rename `__HIP_PLATFORM_HCC__` to `__HIP_PLATFORM_AMD__`
- rename `HIP_HCC_FLAGS` to `HIP_CLANG_FLAGS`
- rename `PYTORCH_HIP_HCC_LIBRARIES` to `PYTORCH_HIP_LIBRARIES`
- workaround in tools/amd_build/build_amd.py until submodules are updated

These symbols have had a long deprecation cycle and will finally be removed in ROCm 6.0.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111975
Approved by: https://github.com/ezyang, https://github.com/hongxiayang
2023-10-26 02:39:10 +00:00
Pritam Damania
b29c658265 Cleanup error reporting for ProcessGroupNCCL (#111979)
Continuing some of the work from https://github.com/pytorch/pytorch/pull/108191, I realized the majority of errors raised from ProcessGroupNCCL were just generic RuntimeErrors.

In this PR, I've added appropriate error types to all the exceptions raised from ProcessGroupNCCL.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111979
Approved by: https://github.com/fduwjj
2023-10-26 01:39:54 +00:00
cyy
f9cc7f6a1c Enable Wno-unused-private-field,Wunused-lambda-capture and fix CUDA warnings (#110856)
This PR enables Wno-unused-private-field,Wunused-lambda-capture  and some CUDA warnings were fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110856
Approved by: https://github.com/albanD, https://github.com/malfet
2023-10-25 03:39:05 +00:00
zdevito
babb6c6ac4 nccl flight recorder (#110960)
Keep a buffer of the last 16384 nccl work actions, including the stack
trace that launched the event.

When torch._C._distributed_c10d._dump_nccl_trace() is called, it can dump
these to a pickled archive.

For each action we get:
process_group_id, seq_id, collective_name, size_of_first_tensor, stack trace

state - issued, started, completed (based on cuda events and queried if
necessary when the dump is requested)

I tested that it is possible to query event state when the streams are
otherwise stuck.
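
An illustrative sketch of a fixed-capacity ring buffer like the one described; the field names mirror the list above, but the types and class are hypothetical, not PyTorch's implementation:
```
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct WorkEntry {
  int64_t pg_id = 0;
  int64_t seq_id = 0;
  std::string collective_name;
  int64_t size_of_first_tensor = 0;
  std::string stack_trace;
  enum class State { Issued, Started, Completed } state = State::Issued;
};

class FlightRecorder {
 public:
  explicit FlightRecorder(std::size_t capacity = 16384) : buf_(capacity) {}
  void record(WorkEntry e) {
    buf_[next_ % buf_.size()] = std::move(e);  // overwrites the oldest entry
    ++next_;
  }

 private:
  std::vector<WorkEntry> buf_;
  std::size_t next_ = 0;  // total entries ever recorded
};

int main() {
  FlightRecorder rec;
  rec.record({0, 1, "allreduce", 4096, "<trace>", WorkEntry::State::Issued});
}
```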

Differential Revision: [D50138956](https://our.internmc.facebook.com/intern/diff/D50138956)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110960
Approved by: https://github.com/wconstab
2023-10-24 07:12:21 +00:00
Kazuaki Ishizaki
deb800ee81 Fix typo under test directory (#111304)
This PR fixes typos in comments under the `test` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111304
Approved by: https://github.com/Skylion007
2023-10-16 23:06:06 +00:00
Yang Chen
6748a14a71 [aot_inductor] add a test with AOTInductor + TorchScript (#111124)
This test may serve as a reference for using AOTInductor with
TorchScript.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111124
Approved by: https://github.com/jansel
2023-10-12 19:29:07 +00:00
Kurt Mohler
0f924cdee3 Fix functional::smooth_l1_loss signatures to not override beta (#109798)
This splits `nn::functional::smooth_l1_loss` into two different signatures in order to keep backward compatibility for calling the function like `smooth_l1_loss(input, target, /*reduction=*/..., /*beta=*/...)`
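
A hedged sketch of the overload split, with free scalar functions standing in for the real torch::nn::functional API (which operates on tensors):
```
#include <cmath>
#include <iostream>

enum class Reduction { None, Mean, Sum };

// Positional signature kept for backward compatibility:
// smooth_l1_loss(input, target, /*reduction=*/..., /*beta=*/...).
double smooth_l1_loss(double input, double target,
                      Reduction /*reduction*/, double beta) {
  const double d = std::abs(input - target);
  return d < beta ? 0.5 * d * d / beta : d - 0.5 * beta;
}

// Options-style signature; its default beta can no longer clobber a beta
// passed positionally, because the two call forms resolve to different
// overloads.
struct SmoothL1LossFuncOptions {
  Reduction reduction = Reduction::Mean;
  double beta = 1.0;
};

double smooth_l1_loss(double input, double target,
                      const SmoothL1LossFuncOptions& opts = {}) {
  return smooth_l1_loss(input, target, opts.reduction, opts.beta);
}

int main() {
  std::cout << smooth_l1_loss(2.0, 0.5, Reduction::Mean, 0.5) << "\n";  // 1.25
  std::cout << smooth_l1_loss(2.0, 0.5) << "\n";                        // 1.0
}
```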

Fixes #70163

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109798
Approved by: https://github.com/mikaylagawarecki
2023-10-11 21:37:37 +00:00
Bin Bao
86619c9c9d [aotinductor] Add both cpu and cuda tests for the AOTInductor cpp test (#110920)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110920
Approved by: https://github.com/chenyang78
ghstack dependencies: #110652, #110891
2023-10-11 15:58:28 +00:00
Bin Bao
3058700f7f [aotinductor] Add AOTIModelRunner as a utility class (#110891)
Summary: Introduce a utility class AOTIModelRunner to take care of running an AOTInductor-compiled model. It does things like dlopen a model, initialize the model container, set up inputs and outputs, and destroy the model container.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110891
Approved by: https://github.com/chenyang78
ghstack dependencies: #110652
2023-10-11 15:58:28 +00:00
Bin Bao
b17c247eb1 [aotinductor] Update the cpp test example (#110652)
Summary: store inputs and outputs in Python, and load them back to run the compiled model in C++ and compare the outputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110652
Approved by: https://github.com/chenyang78
2023-10-11 15:58:28 +00:00
Will Constable
ca03f36233 Change ProcessGroupNCCL default timeout to 10 min (#110947)
Avoid changing default for other backends as CPU backend (GLOO) may need
longer timeouts.

Motivated by trying to save cluster time when encountering collective
hangs. Collectives should generally time out within seconds, so 30
minutes (or 10 minutes) provides ample headroom for edge cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110947
Approved by: https://github.com/xw285cornell, https://github.com/fduwjj
2023-10-10 22:28:39 +00:00
sunghyunjun
b5268456f9 Fix optimize_for_inference to support modules that don't have a forward method (#110013)
Fixes #108662

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110013
Approved by: https://github.com/davidberard98
2023-10-02 20:13:44 +00:00
cyy
a81d083b1c [Reland] Add -Wdeprecated and related fixes (#110019)
This is a reland of PRs https://github.com/pytorch/pytorch/pull/108626 and #109564. We fixed the iOS build failure by changing
```
((CHECK) ? (EXPR) : ([] { assert(!#CHECK); }(), (EXPR)))
```
to
```
((CHECK) ? (EXPR) : ([] { assert(false); }(), (EXPR)))
```
in TR2_OPTIONAL_ASSERTED_EXPRESSION, since the former syntax was invalid on Apple Clang. In any case, we can apply this simple fix while hoping that c10::optional will be replaced by std::optional soon.
We also enabled -Wdeprecated on c10.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110019
Approved by: https://github.com/clee2000
2023-09-28 03:34:29 +00:00
hongxyan
0511df0ee9 [ROCM] enable skipped test_api cpp tests (#109817)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109817
Approved by: https://github.com/jithunnair-amd, https://github.com/malfet
2023-09-27 16:52:46 +00:00
Bin Bao
4bf1cd6961 [aotinductor] Rename aot_runtime to aoti_runtime (#110007)
Summary: Make the naming more explicit

Differential Revision: D49593528

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110007
Approved by: https://github.com/houseroad
2023-09-26 00:46:54 +00:00
cyy
265acd4bea Clean up CMake target linking (#109959)
This PR cleans up more CMake target linking.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109959
Approved by: https://github.com/ezyang
2023-09-25 01:37:14 +00:00
cyy
dee100945e [2/N] Move c10::variant to std::variant (#109723)
This PR moves most of c10::variant calls to std::variant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109723
Approved by: https://github.com/ezyang
2023-09-24 02:47:43 +00:00