Scott Wolchok
605f2d802a
[PyTorch] Remove unnecessary include of c10/util/Exception.h in irange.h ( #136202 )
...
Manually audited and can't figure out why this would be needed.
Differential Revision: [D62879500](https://our.internmc.facebook.com/intern/diff/D62879500/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136202
Approved by: https://github.com/malfet
2024-09-18 16:57:15 +00:00
Banit Agrawal
a575ce0dc6
[PyTorch Pinned Allocator] Add support of background thread to process events ( #135524 )
...
Summary: Currently we process events in the regular allocation path, calling cudaEventQuery to check on them, and this path can take locks in the libcuda driver. Processing events in the allocation path is not strictly necessary: we can move it to a background thread that keeps processing events regularly and puts freed blocks back on the free list.
Differential Revision: D62396585
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135524
Approved by: https://github.com/zyan0
2024-09-17 21:08:10 +00:00
Banit Agrawal
48d18fbd4c
[PyTorch CUDA Allocator] Allow reuse of non-split blocks with better rounding ( #136174 )
...
Summary:
This diff adds an option to round the non-split blocks in the caching allocator so that they can be reused without causing lots of fragmentation for large memory segments.
For example, if we specify the max_split memory size as 400MB, then all allocations larger than 400MB will not be split. Say we allocated some 1024MB blocks that are cached in the allocator. If we request a new 500MB block, we round it to the nearest power-2 division, which is 512MB, and add the default kLargeBuffer of 20MB, giving 532MB; since 532MB is less than the existing 1024MB block, the 1024MB block will not be used for this allocation and a new 512MB block will be created instead. This diff exposes the rounding buffer as a configurable option, max_non_split_rounding_size: if 512MB + max_non_split_rounding_size is greater than or equal to 1024MB, we reuse the existing 1024MB block rather than creating a new 512MB block with cudaMalloc. This lets us pre-allocate some large blocks and reuse them as much as possible, so we don't stall on calls to cudaMalloc.
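The reuse decision above can be sketched in a few lines (an illustrative model, not the actual allocator code; `next_pow2` stands in for the allocator's power-2-division rounding and all names are hypothetical):

```python
MB = 1024 * 1024

def next_pow2(n):
    # Stand-in for the allocator's power-2-division rounding.
    p = 1
    while p < n:
        p *= 2
    return p

def can_reuse_cached_block(request_size, cached_block_size, max_non_split_rounding):
    # A cached non-split block is reused only if its size does not exceed
    # the rounded request plus the configurable rounding allowance.
    rounded = next_pow2(request_size)
    return cached_block_size <= rounded + max_non_split_rounding

# 500MB request, default ~20MB allowance: 512MB + 20MB < 1024MB, so a new
# 512MB block would be allocated instead of reusing the cached 1024MB block.
print(can_reuse_cached_block(500 * MB, 1024 * MB, 20 * MB))   # False
# With the rounding size raised to 512MB, the cached 1024MB block is reused.
print(can_reuse_cached_block(500 * MB, 1024 * MB, 512 * MB))  # True
```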
Differential Revision: D62758758
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136174
Approved by: https://github.com/zyan0
2024-09-17 19:08:44 +00:00
Andrii Grynenko
a141c6bb0d
[pytorch][monitoring] Dynamic backend for WaitCounter ( #135967 )
...
Summary: This implements a default backend proxy that tries to look up a backend via dlsym. This enables dynamically loading a module with a backend implementation without statically linking it into the application.
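The dlsym-style lookup can be illustrated in Python with ctypes (a hedged sketch: `CDLL(None)` behaves like `dlopen(NULL)` on POSIX and searches the already-loaded objects; the symbol names here are made up, not the actual monitoring API):

```python
import ctypes

def resolve_backend(symbol_name):
    # Like the dlsym-based proxy: search the objects already loaded into
    # the process for a backend entry point. Return None if it is absent,
    # in which case the caller would fall back to a no-op backend.
    try:
        handle = ctypes.CDLL(None)  # dlopen(NULL): the global symbol namespace
        return getattr(handle, symbol_name)
    except (OSError, AttributeError):
        return None

# On POSIX, libc's strlen is loaded in any CPython process, so it resolves;
# an unknown symbol falls through to None and the no-op backend is used.
print(resolve_backend("strlen") is not None)
print(resolve_backend("definitely_not_a_real_backend_symbol_123") is None)
```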
Differential Revision: D62549295
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135967
Approved by: https://github.com/c-p-i-o
2024-09-15 18:07:49 +00:00
Jessica Vandebon
baff86dafb
[MTIA tensor] allow shallow copy between CPU and MTIA tensors ( #135871 )
...
Reviewed By: egienvalue, hanzlfs
Differential Revision: D61662214
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135871
Approved by: https://github.com/egienvalue , https://github.com/nautsimon
2024-09-13 22:13:58 +00:00
Yu, Guangye
e6b68359d7
Fix xpu memory stats error ( #135818 )
...
# Motivation
fix https://github.com/pytorch/pytorch/issues/135726
After merging two free blocks, I made the mistake of using the wrong size when decreasing the active memory size: it should be the original block size rather than the merged block size.
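The accounting fix can be shown with a toy model (illustrative Python, not the actual XPU allocator code):

```python
def free_and_merge(block_size, free_neighbor_sizes, active_bytes):
    # Merging the freed block with adjacent free blocks grows the block,
    # but active memory must be decremented by the ORIGINAL block size;
    # subtracting the merged size (the bug) over-counts the decrease.
    merged_size = block_size + sum(free_neighbor_sizes)
    active_bytes -= block_size      # correct: original size
    # active_bytes -= merged_size   # the bug this PR fixes
    return active_bytes, merged_size

# Freeing a 512-byte block next to a 256-byte free neighbor: active memory
# drops by 512 bytes, while the merged free block is 768 bytes.
print(free_and_merge(512, [256], 1024))  # (512, 768)
```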
# Additional Context
Add a UT to guard this scenario.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135818
Approved by: https://github.com/EikanWang
2024-09-13 02:41:21 +00:00
cyy
f5f1d0a753
Fix build warnings for torch_python ( #134981 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134981
Approved by: https://github.com/ezyang
2024-09-12 03:59:34 +00:00
Yu, Guangye
b53d97c7be
[Intel GPU] Add XPU memory-related APIs ( #129919 )
...
# Motivation
According to https://github.com/pytorch/pytorch/issues/116322 , we will help unify the device allocator. So we first introduce a simple xpu device allocator with only the key functionality, and expect to add the memory-statistics functionality after the unification.
However, some memory-statistics APIs listed in https://github.com/pytorch/pytorch/issues/127929 have now been requested, and we need more time to unify the device allocator. To improve the user experience, we want to support these memory-statistics APIs before the unification.
# Additional Context
Fixes : #127929
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129919
Approved by: https://github.com/dvrogozh , https://github.com/abhilash1910 , https://github.com/gujinghui , https://github.com/EikanWang , https://github.com/albanD
ghstack dependencies: #130923
2024-09-07 11:15:17 +00:00
Yu, Guangye
6c1da66407
[Reland] Refactor caching device allocator utils ( #130923 )
...
# Motivation
Following [[RFC] Intel GPU Runtime Upstreaming for Allocator ](https://github.com/pytorch/pytorch/issues/116322 ), this PR aims to refactor the caching device allocator utils to improve code reuse.
This is the first PR; we will prepare follow-up PRs that continue refactoring the device caching allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923
Approved by: https://github.com/EikanWang , https://github.com/gujinghui , https://github.com/albanD , https://github.com/eqy
2024-09-07 11:14:17 +00:00
Haibo Chen
e162414963
add instrumentation of CCA stats for reserved and allocated memory size ( #135231 )
...
As titled
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135231
Approved by: https://github.com/c-p-i-o
2024-09-06 02:48:56 +00:00
Kulin Seth
144fde4fd2
[MPS] Add support for autocast in MPS ( #99272 )
...
Fixes https://github.com/pytorch/pytorch/issues/88415
Need to run inductor/test_cpu_select_algorithm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272
Approved by: https://github.com/malfet
Co-authored-by: Siddharth Kotapati <skotapati@apple.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Co-authored-by: Roy Hvaara <roy@lightyear.no>
2024-09-05 23:23:17 +00:00
Scott Wolchok
a5d70cf545
[PyTorch] Add isfinite to BFloat16-math.h ( #135052 )
...
Missing function from <cmath>.
Differential Revision: [D62148884](https://our.internmc.facebook.com/intern/diff/D62148884/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135052
Approved by: https://github.com/PaliC , https://github.com/albanD
ghstack dependencies: #135031
2024-09-05 21:50:36 +00:00
Scott Wolchok
7fe819d917
[PyTorch] Fix -Wshadow -Werror build in BFloat16-inl.h ( #135031 )
...
`float_t` is required to exist in C99 math.h, which causes -Wshadow to fire. Fortunately, we don't need the alias.
Differential Revision: [D62135908](https://our.internmc.facebook.com/intern/diff/D62135908/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135031
Approved by: https://github.com/albanD
2024-09-05 21:48:21 +00:00
PyTorch MergeBot
e55c0f59e5
Revert "[Reland] Refactor caching device allocator utils ( #130923 )"
...
This reverts commit 9809080b9e .
Reverted https://github.com/pytorch/pytorch/pull/130923 on behalf of https://github.com/kit1980 due to breaking internal builds - Error: Relocation overflow has occured ([comment](https://github.com/pytorch/pytorch/pull/130923#issuecomment-2332640961 ))
2024-09-05 21:16:14 +00:00
Yutao Xu
c7328dff7f
Enhance the stability of the complex divide code ( #134647 )
...
In C++, when a floating-point literal (e.g., 3.14) is compared with a variable of type float, the literal is by default interpreted as a double.
```c++
float f = 3.14f;
if (f == 3.14) {  // f is promoted to double; float(3.14) != double(3.14)
// Do something
}
```
If a device does not support double, an error will occur.
This PR addresses the issue of complex64 errors on machines that do not support double operations.
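The mismatch can be reproduced in Python by rounding to float32 precision (a standard-library sketch; `to_float32` mimics what storing into a C++ `float` does):

```python
import struct

def to_float32(x):
    # Round a Python double to the nearest float32, as assigning to a
    # C++ `float` would.
    return struct.unpack("f", struct.pack("f", x))[0]

f = to_float32(3.14)
# Comparing the float32 value against the double value 3.14 fails,
# just like `f == 3.14` in the C++ snippet above:
print(f == 3.14)              # False
# Suffixing the literal (3.14f in C++) rounds both sides the same way:
print(f == to_float32(3.14))  # True
```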
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134647
Approved by: https://github.com/EikanWang , https://github.com/albanD
2024-09-05 08:36:37 +00:00
rzou
d7b57c4d63
Fix tensor.data access under inference_mode and compile ( #134878 )
...
Fixes https://github.com/pytorch/pytorch/issues/134798
In the regular Tensor case, when you call Tensor.data, there's a check
for whether inference mode is active. If it is, we don't set the
version counter. We replicate this check for Tensor subclasses (the bug
was that we were trying to set the version counter on a FakeTensor in
inference_mode).
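The replicated check amounts to the following (a hypothetical Python model of the version-counter logic, not the actual C++/subclass code):

```python
def access_data(tensor, inference_mode_active):
    # Mirror of the regular-Tensor path for .data: bump the version
    # counter only when inference mode is NOT active; an inference-mode
    # FakeTensor has no version counter to set.
    view = dict(tensor)
    if not inference_mode_active:
        view["version"] = tensor.get("version", 0) + 1
    return view

t = {"version": 3}
print(access_data(t, inference_mode_active=False)["version"])  # 4
print(access_data(t, inference_mode_active=True)["version"])   # 3
```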
Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134878
Approved by: https://github.com/bdhirsh
2024-09-04 17:55:41 +00:00
FFFrog
5690f003a6
C10_DIAGNOSTIC_PUSH_AND_IGNORED_IF_DEFINED and C10_DIAGNOST should be used in pairs ( #135004 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135004
Approved by: https://github.com/aaronenyeshi
2024-09-04 13:14:23 +00:00
Yu, Guangye
9809080b9e
[Reland] Refactor caching device allocator utils ( #130923 )
...
# Motivation
Following [[RFC] Intel GPU Runtime Upstreaming for Allocator ](https://github.com/pytorch/pytorch/issues/116322 ), this PR aims to refactor the caching device allocator utils to improve code reuse.
This is the first PR; we will prepare follow-up PRs that continue refactoring the device caching allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923
Approved by: https://github.com/EikanWang , https://github.com/gujinghui , https://github.com/albanD , https://github.com/eqy
2024-09-04 05:31:08 +00:00
Haibo Chen
2e0b114c06
add a new Gauge API with an empty backend to PyTorch core ( #134883 )
...
Summary:
The current use case is to continuously measure the total allocated and reserved CUDA memory size from CUDACachingAllocator and export their distribution (min, max, p90, etc.) over time as a timeseries.
The current callback-based API does not work because the backend decides when a measurement is taken, so data points between two measurements may not be recorded; the distribution (e.g. max) will therefore not be accurate.
This new API otherwise closely follows the design of the existing WaitCounter API.
It is not quite a synchronous version of DynamicCounter, as summing multiple data points does not make sense for my use case.
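The difference between a push-style gauge and a pull-style callback can be sketched as follows (illustrative only, not the actual WaitCounter/Gauge API):

```python
class Gauge:
    # Push-style: every recorded point enters the distribution, so
    # min/max/percentiles over the window are exact.
    def __init__(self):
        self.points = []

    def record(self, value):
        self.points.append(value)

g = Gauge()
for allocated_bytes in (100, 900, 150):  # a short-lived 900-byte spike
    g.record(allocated_bytes)
print(max(g.points))  # 900: the spike is captured

# A callback-based backend that happens to poll only before and after the
# spike would sample (100, 150) and report a max of 150, missing the spike.
```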
Test Plan: CI
Differential Revision: D61837528
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134883
Approved by: https://github.com/c-p-i-o
2024-09-03 17:08:47 +00:00
chilli
db193d1e29
add msg to _assert_async ( #134813 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134813
Approved by: https://github.com/ezyang , https://github.com/eqy , https://github.com/albanD
2024-09-03 06:33:18 +00:00
zdevito
d91b49dbaa
expandable_segments <-> other allocator options ( #134338 )
...
Previously setting garbage_collection_threshold or max_split_size_mb along with expandable_segments:True could cause the allocator to hit assert failures when running nearly out of memory. This PR ensures garbage_collection and max_split freeing do not accidentally try to release expandable segments.
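The guard amounts to filtering expandable segments out of the release path (an illustrative sketch; the field names are hypothetical, not the allocator's actual data structures):

```python
def releasable_blocks(blocks):
    # garbage_collection_threshold / max_split_size_mb freeing must skip
    # blocks that belong to an expandable segment; those are unmapped by
    # a dedicated path, and releasing them here trips allocator asserts.
    return [b for b in blocks if not b["expandable_segment"]]

blocks = [
    {"size": 1 << 20, "expandable_segment": False},
    {"size": 1 << 30, "expandable_segment": True},
]
print(len(releasable_blocks(blocks)))  # 1: only the ordinary block
```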
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134338
Approved by: https://github.com/ezyang
2024-08-29 18:43:59 +00:00
Syed Tousif Ahmed
4655eb3ee2
Uses MemPoolContext to route allocations from CUDACachingAllocator ( #134685 )
...
Re-open of https://github.com/pytorch/pytorch/pull/133599 that was mistakenly closed by issuing `ghstack land`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134685
Approved by: https://github.com/ezyang
2024-08-29 03:56:31 +00:00
PyTorch MergeBot
2c88a923a7
Revert "Refactor caching device allocator utils ( #130923 )"
...
This reverts commit c45ca8092d .
Reverted https://github.com/pytorch/pytorch/pull/130923 on behalf of https://github.com/ZainRizvi due to Sorry but this appears to be causing internal tests to fail with errors like `error: no type named 'DeviceStats' in namespace 'xxx::xxx:xxxAllocator'; did you mean 'DeviceStatus'?` ([comment](https://github.com/pytorch/pytorch/pull/130923#issuecomment-2315730155 ))
2024-08-28 15:56:08 +00:00
Yu, Guangye
c45ca8092d
Refactor caching device allocator utils ( #130923 )
...
# Motivation
Following [[RFC] Intel GPU Runtime Upstreaming for Allocator ](https://github.com/pytorch/pytorch/issues/116322 ), this PR aims to refactor the caching device allocator utils to improve code reuse.
This is the first PR; we will prepare follow-up PRs that continue refactoring the device caching allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923
Approved by: https://github.com/EikanWang , https://github.com/gujinghui , https://github.com/albanD , https://github.com/eqy
2024-08-28 01:35:23 +00:00
Yuanhao Ji
44dadf2506
[Fix] Check name when registering privateuse1 backend ( #134071 )
...
Do some checks when registering a privateuse1 backend to avoid using in-tree device names.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134071
Approved by: https://github.com/albanD
2024-08-27 20:28:30 +00:00
Yifu Wang
78d69bfe11
[SymmetricMemory] introduce multicast support, multimem_all_reduce_ and multimem_one_shot_all_reduce ( #133424 )
...
### Summary
- Added multicast support to SymmetricMemory. If the CUDA runtime and driver have multicast support, SymmetricMemory associates all peer buffers with a multicast object and exposes the multicast virtual address.
- Implemented `multimem_all_reduce_` and `multimem_one_shot_all_reduce` based on the multicast support. The two variants show different performance characteristics for different message sizes. We plan to use Inductor for collective algo selection (and the required symmetric memory buffer allocation).
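Numerically, a one-shot all-reduce over symmetric buffers amounts to each rank reading every peer's buffer directly and reducing locally (a toy model; the real kernels use multicast loads over NVLink):

```python
def one_shot_all_reduce(peer_buffers):
    # Every rank sees the same list of peer buffers (the symmetric-memory
    # idea) and produces the elementwise sum in a single pass.
    return [sum(vals) for vals in zip(*peer_buffers)]

# 3 ranks, 4 elements each:
buffers = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
print(one_shot_all_reduce(buffers))  # [111, 222, 333, 444]
```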
### Benchmark
8xH100 (non-standard version with HBM2e at 650W). NVSwitch V3 with NVLS support.


Differential Revision: [D61682507](https://our.internmc.facebook.com/intern/diff/D61682507 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133424
Approved by: https://github.com/yf225 , https://github.com/weifengpy
2024-08-23 20:09:20 +00:00
PyTorch MergeBot
cedfac20c7
Revert "[SymmetricMemory] introduce multicast support, multimem_all_reduce_ and multimem_one_shot_all_reduce ( #133424 )"
...
This reverts commit 66d3eb783c .
Reverted https://github.com/pytorch/pytorch/pull/133424 on behalf of https://github.com/jeanschmidt due to Broke internal ADS builds, see D61611517 ([comment](https://github.com/pytorch/pytorch/pull/133424#issuecomment-2304676328 ))
2024-08-22 13:29:27 +00:00
David Berard
84b3f1900a
C++ network flow implementation in c10 ( #132188 )
...
The functorch partitioners use network flow to split the joint graph into a forward and backward graph. Internally, we've found that upgrading to networkx 2.8.8 (from 2.5) results in some hard-to-debug failures (internal reference: https://fburl.com/workplace/jrqwagdm ). And I'm told that there's interest to remove the python dependency.
So this PR introduces a C++ implementation that mirrors the API provided by networkx. We'll need to add python bindings and do some additional testing to verify correctness.
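The core primitive the partitioners need is max-flow/min-cut; a compact Edmonds-Karp sketch of what such an implementation computes (an illustration of the algorithm, not the c10 code or the networkx API):

```python
from collections import deque

def max_flow(n, edges, s, t):
    # Edmonds-Karp: repeatedly find a shortest augmenting path with BFS
    # and push the bottleneck capacity along it.
    cap = [[0] * n for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] += c
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if cap[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow  # no augmenting path left
        bottleneck, v = float("inf"), t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck  # consume forward capacity
            cap[v][u] += bottleneck  # add residual capacity
            v = u
        flow += bottleneck

# s=0, t=3: both 0->1->3 and 0->2->3 carry 2 units, so max flow is 4.
print(max_flow(4, [(0, 1, 3), (0, 2, 2), (1, 3, 2), (2, 3, 3)], 0, 3))  # 4
```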
Differential Revision: [D61550977](https://our.internmc.facebook.com/intern/diff/D61550977 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132188
Approved by: https://github.com/Chillee
2024-08-21 18:40:54 +00:00
Yifu Wang
66d3eb783c
[SymmetricMemory] introduce multicast support, multimem_all_reduce_ and multimem_one_shot_all_reduce ( #133424 )
...
### Summary
- Added multicast support to SymmetricMemory. If the CUDA runtime and driver have multicast support, SymmetricMemory associates all peer buffers with a multicast object and exposes the multicast virtual address.
- Implemented `multimem_all_reduce_` and `multimem_one_shot_all_reduce` based on the multicast support. The two variants show different performance characteristics for different message sizes. We plan to use Inductor for collective algo selection (and the required symmetric memory buffer allocation).
### Benchmark
8xH100 (non-standard version with HBM2e at 650W). NVSwitch V3 with NVLS support.


Pull Request resolved: https://github.com/pytorch/pytorch/pull/133424
Approved by: https://github.com/yf225 , https://github.com/weifengpy
2024-08-21 05:11:21 +00:00
Zhengxu Chen
517aee5369
[torchscript] Add a sampled logging integration point. ( #133484 )
...
Test Plan:
test script:
```
def test_zhxchen17(self):
    from libfb.py.pyinit import initFacebook
    initFacebook()

    class M(torch.nn.Module):
        def forward(self, x):
            return torch.add(x, x)

    def tmptmp(x, y):
        return torch.mul(x, y)

    m = M()
    n = torch.jit.script(m)
    print(n(torch.tensor(1)))
    print(torch.jit.script(tmptmp)(torch.tensor(1), torch.tensor(2)))
```
```
I0802 12:01:23.932929 4079081 init.cc:407] Logging to scuba: run __torch__.caffe2.test.export.test_export.M.forward sample rate: 1000000
```
Differential Revision: D60920867
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133484
Approved by: https://github.com/davidberard98
2024-08-19 18:04:45 +00:00
Yu, Guangye
fbd020fce6
Add new prop to _XpuDevicePropertie for triton gemm optimization ( #131738 )
...
# Motivation
This PR aims to add new properties to `_XpuDevicePropertie` for triton gemm optimization.
# Additional Context
`ext_oneapi_supports_cl_extension` is not an ABI-neutral API. It depends on compiler 2025.0. For more details, see https://github.com/intel/llvm/pull/13212
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131738
Approved by: https://github.com/gujinghui
2024-08-18 08:32:30 +00:00
PyTorch MergeBot
a0cb54ab46
Revert "C++ network flow implementation in c10 ( #132188 )"
...
This reverts commit e6272acaec .
Reverted https://github.com/pytorch/pytorch/pull/132188 on behalf of https://github.com/izaitsevfb due to breaks aps models and builds internally ([comment](https://github.com/pytorch/pytorch/pull/132188#issuecomment-2294120234 ))
2024-08-16 19:48:54 +00:00
David Berard
e6272acaec
C++ network flow implementation in c10 ( #132188 )
...
The functorch partitioners use network flow to split the joint graph into a forward and backward graph. Internally, we've found that upgrading to networkx 2.8.8 (from 2.5) results in some hard-to-debug failures (internal reference: https://fburl.com/workplace/jrqwagdm ). And I'm told that there's interest to remove the python dependency.
So this PR introduces a C++ implementation that mirrors the API provided by networkx. We'll need to add python bindings and do some additional testing to verify correctness.
Differential Revision: [D61284135](https://our.internmc.facebook.com/intern/diff/D61284135 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132188
Approved by: https://github.com/Chillee
2024-08-15 07:32:51 +00:00
Guilherme Leobas
c518b50c4c
Remove functorch dispatch keys in legacyExtractDispatchKey ( #133018 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133018
Approved by: https://github.com/zou3519
2024-08-13 15:32:01 +00:00
Yuanhao Ji
343071cd96
Fix privateuse1 backend name case ( #132980 )
...
### Problem
`get_privateuse1_backend(bool lower_case)` always returns a lower case name and `lower_case` is not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132980
Approved by: https://github.com/albanD
2024-08-10 07:39:54 +00:00
Yu, Guangye
9c5e0d47fe
Add xpu_cmake_macros.h to xpu build ( #132847 )
...
# Motivation
fix https://github.com/pytorch/pytorch/issues/132971
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132847
Approved by: https://github.com/EikanWang
2024-08-08 08:06:49 +00:00
zdevito
fb6b001cde
Disable expandable segments IPC in fbcode, because some jobs
...
seem to be failing. (#132890 )
https://fb.workplace.com/groups/1405155842844877/permalink/8867182216642165/
Differential Revision: [D60912371](https://our.internmc.facebook.com/intern/diff/D60912371/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132890
Approved by: https://github.com/eqy , https://github.com/ezyang
2024-08-08 01:42:32 +00:00
cyy
6b12dc0224
[Reland] [11/N] Use std::nullopt and std::optional ( #132622 )
...
Reland of #132396 , which was reverted due to dependency reversion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132622
Approved by: https://github.com/ezyang
2024-08-05 20:36:33 +00:00
PyTorch MergeBot
2764bee942
Revert "[MPS] Add support for autocast in MPS ( #99272 )"
...
This reverts commit 6919e8baab .
Reverted https://github.com/pytorch/pytorch/pull/99272 on behalf of https://github.com/clee2000 due to Broke test/inductor/test_cpu_select_algorithm.py::TestSelectAlgorithmCPU::test_quantized_linear_amx_batch_size_3_in_features_128_out_features_64_bias_False_cpu on sm86 jobs [GH job link](https://github.com/pytorch/pytorch/actions/runs/10252979157/job/28367091621 ) [HUD commit link](6919e8baab ) Not caught on PR due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/99272#issuecomment-2269808857 ))
2024-08-05 19:59:04 +00:00
zdevito
8d9c3a71f6
Support IPC for Expandable Segments ( #130890 )
...
This reapplication commit is the same as before except it resolves a build error in an internal build where `handle` was shadowed.
Differential Revision: [D60547506](https://our.internmc.facebook.com/intern/diff/D60547506 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130890
Approved by: https://github.com/dsjohns2
2024-08-05 18:48:13 +00:00
Kulin Seth
6919e8baab
[MPS] Add support for autocast in MPS ( #99272 )
...
Fixes https://github.com/pytorch/pytorch/issues/88415
Co-authored-by: Siddharth Kotapati <skotapati@apple.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272
Approved by: https://github.com/malfet
2024-08-05 17:02:30 +00:00
Aleksei Nikiforov
14edd986b3
Fix missing include file ( #132647 )
...
This error only appears with newer gcc releases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132647
Approved by: https://github.com/Skylion007
2024-08-05 15:49:49 +00:00
PyTorch MergeBot
00097f3458
Revert "C++ network flow implementation in c10 ( #132188 )"
...
This reverts commit dccce77935 .
Reverted https://github.com/pytorch/pytorch/pull/132188 on behalf of https://github.com/ZainRizvi due to Sorry but this appears to be failing internal tests. Please see D60702564 to investigate ([comment](https://github.com/pytorch/pytorch/pull/132188#issuecomment-2267098420 ))
2024-08-03 18:44:28 +00:00
David Berard
dccce77935
C++ network flow implementation in c10 ( #132188 )
...
The functorch partitioners use network flow to split the joint graph into a forward and backward graph. Internally, we've found that upgrading to networkx 2.8.8 (from 2.5) results in some hard-to-debug failures (internal reference: https://fburl.com/workplace/jrqwagdm ). And I'm told that there's interest to remove the python dependency.
So this PR introduces a C++ implementation that mirrors the API provided by networkx. We'll need to add python bindings and do some additional testing to verify correctness.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132188
Approved by: https://github.com/Chillee
2024-08-02 20:30:59 +00:00
PyTorch MergeBot
e4e3575fb0
Revert "[11/N] Use std::nullopt and std::optional ( #132396 )"
...
This reverts commit d7d6190493 .
Reverted https://github.com/pytorch/pytorch/pull/132396 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR has a dependency on another PR (https://github.com/pytorch/pytorch/pull/128898 ) that has to be reverted ([comment](https://github.com/pytorch/pytorch/pull/132396#issuecomment-2265952528 ))
2024-08-02 18:49:42 +00:00
Aleksei Nikiforov
df781343e2
Link libc10 to pthreads ( #132484 )
...
It gets linked as a transitive dependency of `libmkl` on x86_64, but it must be specified explicitly on s390x.
The linking issue only appears when using gcc-13 with the gold linker.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132484
Approved by: https://github.com/malfet
2024-08-02 18:03:44 +00:00
soulitzer
82b6480b0a
Update SavedTensorHooks TLS stack to use SafePyObject ( #131700 )
...
Previously, we had to manually manage refcounting when updating the TLS saved-variable stack. With this PR, it is handled automatically by the SafePyObject.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131700
Approved by: https://github.com/albanD
2024-08-02 16:27:16 +00:00
Andrii Grynenko
fca2dba7ca
[pytorch][counters] Pybind for WaitCounter ( #132357 )
...
Summary:
Basic pybind integration for WaitCounter providing a guard API.
Also fixes the broken copy/move constructors in WaitGuard (they weren't really exercised by the macro-based C++ API).
Test Plan: unit test
Differential Revision: D60557660
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132357
Approved by: https://github.com/jamesperng , https://github.com/asiab4
2024-08-02 16:08:10 +00:00
cyy
b9cb1abf65
[12/N] Use std::optional ( #132361 )
...
Follows #132396
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132361
Approved by: https://github.com/eqy
2024-08-02 13:46:46 +00:00
cyy
35d14d22a0
Fix some issues detected by static analysis tools ( #131989 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131989
Approved by: https://github.com/ezyang
2024-08-02 04:18:57 +00:00