Scott Wolchok
605f2d802a
[PyTorch] Remove unnecessary include of c10/util/Exception.h in irange.h ( #136202 )
...
Manually audited and can't figure out why this would be needed.
Differential Revision: [D62879500](https://our.internmc.facebook.com/intern/diff/D62879500/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136202
Approved by: https://github.com/malfet
2024-09-18 16:57:15 +00:00
Banit Agrawal
a575ce0dc6
[PyTorch Pinned Allocator] Add support of background thread to process events ( #135524 )
...
Summary: Currently we process events in the regular allocation path, calling cudaEventQuery to check on them, and this path can take locks in the libcuda driver. Processing events in the allocation path is not strictly necessary: we can move it to a background thread that keeps processing events regularly and puts freed blocks back on the free list.
Differential Revision: D62396585
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135524
Approved by: https://github.com/zyan0
2024-09-17 21:08:10 +00:00
Banit Agrawal
48d18fbd4c
[PyTorch CUDA Allocator] Allow reuse of non-split blocks with better rounding ( #136174 )
...
Summary:
This diff adds an option to round the non-split blocks in the caching allocator so that they can be reused without causing lots of fragmentation for large memory segments.
For example, if we specify the max_split memory size as 400MB, then all allocations larger than 400MB will not be split. Say we allocated some 1024MB blocks that are cached in the allocator. If we request a new 500MB block, we round it to the nearest power-2 division, which is 512MB, and add the default kLargeBuffer of 20MB, giving 532MB; since 532MB is less than the existing 1024MB block, the 1024MB block will not be used for this allocation and a new 512MB block will be created instead. This diff exposes the rounding buffer as a configurable option, max_non_split_rounding_size: if 512MB + max_non_split_rounding_size is greater than or equal to 1024MB, we reuse the existing 1024MB block rather than creating a new 512MB block with cudaMalloc. This lets us pre-allocate some large blocks and reuse them as much as possible, so we don't stall on calls to cudaMalloc.
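The reuse decision above can be sketched in a few lines (an illustrative model, not the actual allocator code; `next_pow2` stands in for the allocator's power-2-division rounding and all names are hypothetical):

```python
MB = 1024 * 1024

def next_pow2(n):
    # Stand-in for the allocator's power-2-division rounding.
    p = 1
    while p < n:
        p *= 2
    return p

def can_reuse_cached_block(request_size, cached_block_size, max_non_split_rounding):
    # A cached non-split block is reused only if its size does not exceed
    # the rounded request plus the configurable rounding allowance.
    rounded = next_pow2(request_size)
    return cached_block_size <= rounded + max_non_split_rounding

# 500MB request, default ~20MB allowance: 512MB + 20MB < 1024MB, so a new
# 512MB block would be allocated instead of reusing the cached 1024MB block.
print(can_reuse_cached_block(500 * MB, 1024 * MB, 20 * MB))   # False
# With the rounding size raised to 512MB, the cached 1024MB block is reused.
print(can_reuse_cached_block(500 * MB, 1024 * MB, 512 * MB))  # True
```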
Differential Revision: D62758758
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136174
Approved by: https://github.com/zyan0
2024-09-17 19:08:44 +00:00
Andrii Grynenko
a141c6bb0d
[pytorch][monitoring] Dynamic backend for WaitCounter ( #135967 )
...
Summary: This implements a default backend proxy that tries to look up a backend via dlsym. This enables dynamically loading a module with a backend implementation without statically linking it into the application.
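The dlsym-style lookup can be illustrated in Python with ctypes (a hedged sketch: `CDLL(None)` behaves like `dlopen(NULL)` on POSIX and searches the already-loaded objects; the symbol names here are made up, not the actual monitoring API):

```python
import ctypes

def resolve_backend(symbol_name):
    # Like the dlsym-based proxy: search the objects already loaded into
    # the process for a backend entry point. Return None if it is absent,
    # in which case the caller would fall back to a no-op backend.
    try:
        handle = ctypes.CDLL(None)  # dlopen(NULL): the global symbol namespace
        return getattr(handle, symbol_name)
    except (OSError, AttributeError):
        return None

# On POSIX, libc's strlen is loaded in any CPython process, so it resolves;
# an unknown symbol falls through to None and the no-op backend is used.
print(resolve_backend("strlen") is not None)
print(resolve_backend("definitely_not_a_real_backend_symbol_123") is None)
```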
Differential Revision: D62549295
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135967
Approved by: https://github.com/c-p-i-o
2024-09-15 18:07:49 +00:00
Jessica Vandebon
baff86dafb
[MTIA tensor] allow shallow copy between CPU and MTIA tensors ( #135871 )
...
Reviewed By: egienvalue, hanzlfs
Differential Revision: D61662214
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135871
Approved by: https://github.com/egienvalue , https://github.com/nautsimon
2024-09-13 22:13:58 +00:00
Yu, Guangye
e6b68359d7
Fix xpu memory stats error ( #135818 )
...
# Motivation
fix https://github.com/pytorch/pytorch/issues/135726
After merging two free blocks, I made the mistake of using the wrong size when decreasing the active memory size: it should be the original block size rather than the merged block size.
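The accounting fix can be shown with a toy model (illustrative Python, not the actual XPU allocator code):

```python
def free_and_merge(block_size, free_neighbor_sizes, active_bytes):
    # Merging the freed block with adjacent free blocks grows the block,
    # but active memory must be decremented by the ORIGINAL block size;
    # subtracting the merged size (the bug) over-counts the decrease.
    merged_size = block_size + sum(free_neighbor_sizes)
    active_bytes -= block_size      # correct: original size
    # active_bytes -= merged_size   # the bug this PR fixes
    return active_bytes, merged_size

# Freeing a 512-byte block next to a 256-byte free neighbor: active memory
# drops by 512 bytes, while the merged free block is 768 bytes.
print(free_and_merge(512, [256], 1024))  # (512, 768)
```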
# Additional Context
Add a UT to guard this scenario.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135818
Approved by: https://github.com/EikanWang
2024-09-13 02:41:21 +00:00
cyy
f5f1d0a753
Fix build warnings for torch_python ( #134981 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134981
Approved by: https://github.com/ezyang
2024-09-12 03:59:34 +00:00
Yu, Guangye
b53d97c7be
[Intel GPU] Add XPU memory-related APIs ( #129919 )
...
# Motivation
According to https://github.com/pytorch/pytorch/issues/116322 , we will help unify the device allocator. So we first introduce a simple xpu device allocator with only the key functionality, and expect to add the memory-statistics functionality after the unification.
However, some memory-statistics APIs listed in https://github.com/pytorch/pytorch/issues/127929 have now been requested, and we need more time to unify the device allocator. To improve the user experience, we want to support these memory-statistics APIs before the unification.
# Additional Context
Fixes : #127929
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129919
Approved by: https://github.com/dvrogozh , https://github.com/abhilash1910 , https://github.com/gujinghui , https://github.com/EikanWang , https://github.com/albanD
ghstack dependencies: #130923
2024-09-07 11:15:17 +00:00
Yu, Guangye
6c1da66407
[Reland] Refactor caching device allocator utils ( #130923 )
...
# Motivation
Following [[RFC] Intel GPU Runtime Upstreaming for Allocator ](https://github.com/pytorch/pytorch/issues/116322 ), this PR aims to refactor the caching device allocator utils to improve code reuse.
This is the first PR; we will prepare follow-up PRs that continue refactoring the device caching allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923
Approved by: https://github.com/EikanWang , https://github.com/gujinghui , https://github.com/albanD , https://github.com/eqy
2024-09-07 11:14:17 +00:00
Haibo Chen
e162414963
add instrumentation of CCA stats for reserved and allocated memory size ( #135231 )
...
As titled
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135231
Approved by: https://github.com/c-p-i-o
2024-09-06 02:48:56 +00:00
Kulin Seth
144fde4fd2
[MPS] Add support for autocast in MPS ( #99272 )
...
Fixes https://github.com/pytorch/pytorch/issues/88415
Need to run inductor/test_cpu_select_algorithm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272
Approved by: https://github.com/malfet
Co-authored-by: Siddharth Kotapati <skotapati@apple.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Co-authored-by: Roy Hvaara <roy@lightyear.no>
2024-09-05 23:23:17 +00:00
Scott Wolchok
a5d70cf545
[PyTorch] Add isfinite to BFloat16-math.h ( #135052 )
...
Missing function from <cmath>.
Differential Revision: [D62148884](https://our.internmc.facebook.com/intern/diff/D62148884/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135052
Approved by: https://github.com/PaliC , https://github.com/albanD
ghstack dependencies: #135031
2024-09-05 21:50:36 +00:00
Scott Wolchok
7fe819d917
[PyTorch] Fix -Wshadow -Werror build in BFloat16-inl.h ( #135031 )
...
`float_t` is required to exist in C99 math.h, which causes -Wshadow to fire. Fortunately, we don't need the alias.
Differential Revision: [D62135908](https://our.internmc.facebook.com/intern/diff/D62135908/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135031
Approved by: https://github.com/albanD
2024-09-05 21:48:21 +00:00
PyTorch MergeBot
e55c0f59e5
Revert "[Reland] Refactor caching device allocator utils ( #130923 )"
...
This reverts commit 9809080b9e .
Reverted https://github.com/pytorch/pytorch/pull/130923 on behalf of https://github.com/kit1980 due to breaking internal builds - Error: Relocation overflow has occured ([comment](https://github.com/pytorch/pytorch/pull/130923#issuecomment-2332640961 ))
2024-09-05 21:16:14 +00:00
Yutao Xu
c7328dff7f
Enhance the stability of the complex divide code ( #134647 )
...
In C++, when a floating-point literal (e.g., 3.14) is compared with a variable of type float, the literal is by default interpreted as a double.
```c++
float f = 3.14f;
if (f == 3.14) {  // f is promoted to double; float(3.14) != double(3.14)
// Do something
}
```
If a device does not support double, an error will occur.
This PR addresses the issue of complex64 errors on machines that do not support double operations.
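The mismatch can be reproduced in Python by rounding to float32 precision (a standard-library sketch; `to_float32` mimics what storing into a C++ `float` does):

```python
import struct

def to_float32(x):
    # Round a Python double to the nearest float32, as assigning to a
    # C++ `float` would.
    return struct.unpack("f", struct.pack("f", x))[0]

f = to_float32(3.14)
# Comparing the float32 value against the double value 3.14 fails,
# just like `f == 3.14` in the C++ snippet above:
print(f == 3.14)              # False
# Suffixing the literal (3.14f in C++) rounds both sides the same way:
print(f == to_float32(3.14))  # True
```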
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134647
Approved by: https://github.com/EikanWang , https://github.com/albanD
2024-09-05 08:36:37 +00:00
rzou
d7b57c4d63
Fix tensor.data access under inference_mode and compile ( #134878 )
...
Fixes https://github.com/pytorch/pytorch/issues/134798
In the regular Tensor case, when you call Tensor.data, there's a check
for whether inference mode is active. If it is, we don't set the
version counter. We replicate this check for Tensor subclasses (the bug
was that we were trying to set the version counter on a FakeTensor in
inference_mode).
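The replicated check amounts to the following (a hypothetical Python model of the version-counter logic, not the actual C++/subclass code):

```python
def access_data(tensor, inference_mode_active):
    # Mirror of the regular-Tensor path for .data: bump the version
    # counter only when inference mode is NOT active; an inference-mode
    # FakeTensor has no version counter to set.
    view = dict(tensor)
    if not inference_mode_active:
        view["version"] = tensor.get("version", 0) + 1
    return view

t = {"version": 3}
print(access_data(t, inference_mode_active=False)["version"])  # 4
print(access_data(t, inference_mode_active=True)["version"])   # 3
```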
Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134878
Approved by: https://github.com/bdhirsh
2024-09-04 17:55:41 +00:00
FFFrog
5690f003a6
C10_DIAGNOSTIC_PUSH_AND_IGNORED_IF_DEFINED and C10_DIAGNOST should be used in pairs ( #135004 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135004
Approved by: https://github.com/aaronenyeshi
2024-09-04 13:14:23 +00:00
Yu, Guangye
9809080b9e
[Reland] Refactor caching device allocator utils ( #130923 )
...
# Motivation
Following [[RFC] Intel GPU Runtime Upstreaming for Allocator ](https://github.com/pytorch/pytorch/issues/116322 ), this PR aims to refactor the caching device allocator utils to improve code reuse.
This is the first PR; we will prepare follow-up PRs that continue refactoring the device caching allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923
Approved by: https://github.com/EikanWang , https://github.com/gujinghui , https://github.com/albanD , https://github.com/eqy
2024-09-04 05:31:08 +00:00
Haibo Chen
2e0b114c06
add a new Gauge API with an empty backend to PyTorch core ( #134883 )
...
Summary:
The current use case is to continuously measure the total allocated and reserved CUDA memory size from CUDACachingAllocator and export their distribution (min, max, p90, etc.) over time as a timeseries.
The current callback-based API does not work because the backend decides when a measurement is taken, so data points between two measurements may not be recorded; the distribution (e.g. max) will therefore not be accurate.
This new API otherwise closely follows the design of the existing WaitCounter API.
It is not quite a synchronous version of DynamicCounter, as summing multiple data points does not make sense for my use case.
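The difference between a push-style gauge and a pull-style callback can be sketched as follows (illustrative only, not the actual WaitCounter/Gauge API):

```python
class Gauge:
    # Push-style: every recorded point enters the distribution, so
    # min/max/percentiles over the window are exact.
    def __init__(self):
        self.points = []

    def record(self, value):
        self.points.append(value)

g = Gauge()
for allocated_bytes in (100, 900, 150):  # a short-lived 900-byte spike
    g.record(allocated_bytes)
print(max(g.points))  # 900: the spike is captured

# A callback-based backend that happens to poll only before and after the
# spike would sample (100, 150) and report a max of 150, missing the spike.
```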
Test Plan: CI
Differential Revision: D61837528
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134883
Approved by: https://github.com/c-p-i-o
2024-09-03 17:08:47 +00:00
chilli
db193d1e29
add msg to _assert_async ( #134813 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134813
Approved by: https://github.com/ezyang , https://github.com/eqy , https://github.com/albanD
2024-09-03 06:33:18 +00:00
zdevito
d91b49dbaa
expandable_segments <-> other allocator options ( #134338 )
...
Previously setting garbage_collection_threshold or max_split_size_mb along with expandable_segments:True could cause the allocator to hit assert failures when running nearly out of memory. This PR ensures garbage_collection and max_split freeing do not accidentally try to release expandable segments.
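The guard amounts to filtering expandable segments out of the release path (an illustrative sketch; the field names are hypothetical, not the allocator's actual data structures):

```python
def releasable_blocks(blocks):
    # garbage_collection_threshold / max_split_size_mb freeing must skip
    # blocks that belong to an expandable segment; those are unmapped by
    # a dedicated path, and releasing them here trips allocator asserts.
    return [b for b in blocks if not b["expandable_segment"]]

blocks = [
    {"size": 1 << 20, "expandable_segment": False},
    {"size": 1 << 30, "expandable_segment": True},
]
print(len(releasable_blocks(blocks)))  # 1: only the ordinary block
```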
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134338
Approved by: https://github.com/ezyang
2024-08-29 18:43:59 +00:00
Syed Tousif Ahmed
4655eb3ee2
Uses MemPoolContext to route allocations from CUDACachingAllocator ( #134685 )
...
Re-open of https://github.com/pytorch/pytorch/pull/133599 that was mistakenly closed by issuing `ghstack land`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134685
Approved by: https://github.com/ezyang
2024-08-29 03:56:31 +00:00
PyTorch MergeBot
2c88a923a7
Revert "Refactor caching device allocator utils ( #130923 )"
...
This reverts commit c45ca8092d .
Reverted https://github.com/pytorch/pytorch/pull/130923 on behalf of https://github.com/ZainRizvi due to Sorry but this appears to be causing internal tests to fail with errors like `error: no type named 'DeviceStats' in namespace 'xxx::xxx:xxxAllocator'; did you mean 'DeviceStatus'?` ([comment](https://github.com/pytorch/pytorch/pull/130923#issuecomment-2315730155 ))
2024-08-28 15:56:08 +00:00
Yu, Guangye
c45ca8092d
Refactor caching device allocator utils ( #130923 )
...
# Motivation
Following [[RFC] Intel GPU Runtime Upstreaming for Allocator ](https://github.com/pytorch/pytorch/issues/116322 ), this PR aims to refactor the caching device allocator utils to improve code reuse.
This is the first PR; we will prepare follow-up PRs that continue refactoring the device caching allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923
Approved by: https://github.com/EikanWang , https://github.com/gujinghui , https://github.com/albanD , https://github.com/eqy
2024-08-28 01:35:23 +00:00
Yuanhao Ji
44dadf2506
[Fix] Check name when registering privateuse1 backend ( #134071 )
...
Do some checks when registering a privateuse1 backend to avoid using in-tree device names.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134071
Approved by: https://github.com/albanD
2024-08-27 20:28:30 +00:00
Yifu Wang
78d69bfe11
[SymmetricMemory] introduce multicast support, multimem_all_reduce_ and multimem_one_shot_all_reduce ( #133424 )
...
### Summary
- Added multicast support to SymmetricMemory. If the CUDA runtime and driver have multicast support, SymmetricMemory associates all peer buffers with a multicast object and exposes the multicast virtual address.
- Implemented `multimem_all_reduce_` and `multimem_one_shot_all_reduce` based on the multicast support. The two variants show different performance characteristics for different message sizes. We plan to use Inductor for collective algo selection (and the required symmetric memory buffer allocation).
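Numerically, a one-shot all-reduce over symmetric buffers amounts to each rank reading every peer's buffer directly and reducing locally (a toy model; the real kernels use multicast loads over NVLink):

```python
def one_shot_all_reduce(peer_buffers):
    # Every rank sees the same list of peer buffers (the symmetric-memory
    # idea) and produces the elementwise sum in a single pass.
    return [sum(vals) for vals in zip(*peer_buffers)]

# 3 ranks, 4 elements each:
buffers = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
print(one_shot_all_reduce(buffers))  # [111, 222, 333, 444]
```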
### Benchmark
8xH100 (non-standard version with HBM2e at 650W). NVSwitch V3 with NVLS support.


Differential Revision: [D61682507](https://our.internmc.facebook.com/intern/diff/D61682507 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133424
Approved by: https://github.com/yf225 , https://github.com/weifengpy
2024-08-23 20:09:20 +00:00
PyTorch MergeBot
cedfac20c7
Revert "[SymmetricMemory] introduce multicast support, multimem_all_reduce_ and multimem_one_shot_all_reduce ( #133424 )"
...
This reverts commit 66d3eb783c .
Reverted https://github.com/pytorch/pytorch/pull/133424 on behalf of https://github.com/jeanschmidt due to Broke internal ADS builds, see D61611517 ([comment](https://github.com/pytorch/pytorch/pull/133424#issuecomment-2304676328 ))
2024-08-22 13:29:27 +00:00
David Berard
84b3f1900a
C++ network flow implementation in c10 ( #132188 )
...
The functorch partitioners use network flow to split the joint graph into a forward and backward graph. Internally, we've found that upgrading to networkx 2.8.8 (from 2.5) results in some hard-to-debug failures (internal reference: https://fburl.com/workplace/jrqwagdm ). And I'm told that there's interest to remove the python dependency.
So this PR introduces a C++ implementation that mirrors the API provided by networkx. We'll need to add python bindings and do some additional testing to verify correctness.
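The core primitive the partitioners need is max-flow/min-cut; a compact Edmonds-Karp sketch of what such an implementation computes (an illustration of the algorithm, not the c10 code or the networkx API):

```python
from collections import deque

def max_flow(n, edges, s, t):
    # Edmonds-Karp: repeatedly find a shortest augmenting path with BFS
    # and push the bottleneck capacity along it.
    cap = [[0] * n for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] += c
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if cap[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow  # no augmenting path left
        bottleneck, v = float("inf"), t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck  # consume forward capacity
            cap[v][u] += bottleneck  # add residual capacity
            v = u
        flow += bottleneck

# s=0, t=3: both 0->1->3 and 0->2->3 carry 2 units, so max flow is 4.
print(max_flow(4, [(0, 1, 3), (0, 2, 2), (1, 3, 2), (2, 3, 3)], 0, 3))  # 4
```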
Differential Revision: [D61550977](https://our.internmc.facebook.com/intern/diff/D61550977 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132188
Approved by: https://github.com/Chillee
2024-08-21 18:40:54 +00:00
Yifu Wang
66d3eb783c
[SymmetricMemory] introduce multicast support, multimem_all_reduce_ and multimem_one_shot_all_reduce ( #133424 )
...
### Summary
- Added multicast support to SymmetricMemory. If the CUDA runtime and driver have multicast support, SymmetricMemory associates all peer buffers with a multicast object and exposes the multicast virtual address.
- Implemented `multimem_all_reduce_` and `multimem_one_shot_all_reduce` based on the multicast support. The two variants show different performance characteristics for different message sizes. We plan to use Inductor for collective algo selection (and the required symmetric memory buffer allocation).
### Benchmark
8xH100 (non-standard version with HBM2e at 650W). NVSwitch V3 with NVLS support.


Pull Request resolved: https://github.com/pytorch/pytorch/pull/133424
Approved by: https://github.com/yf225 , https://github.com/weifengpy
2024-08-21 05:11:21 +00:00
Zhengxu Chen
517aee5369
[torchscript] Add a sampled logging integration point. ( #133484 )
...
Test Plan:
test script:
```
def test_zhxchen17(self):
    from libfb.py.pyinit import initFacebook
    initFacebook()

    class M(torch.nn.Module):
        def forward(self, x):
            return torch.add(x, x)

    def tmptmp(x, y):
        return torch.mul(x, y)

    m = M()
    n = torch.jit.script(m)
    print(n(torch.tensor(1)))
    print(torch.jit.script(tmptmp)(torch.tensor(1), torch.tensor(2)))
```
```
I0802 12:01:23.932929 4079081 init.cc:407] Logging to scuba: run __torch__.caffe2.test.export.test_export.M.forward sample rate: 1000000
```
Differential Revision: D60920867
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133484
Approved by: https://github.com/davidberard98
2024-08-19 18:04:45 +00:00
Yu, Guangye
fbd020fce6
Add new prop to _XpuDevicePropertie for triton gemm optimization ( #131738 )
...
# Motivation
This PR aims to add new properties to `_XpuDevicePropertie` for triton gemm optimization.
# Additional Context
`ext_oneapi_supports_cl_extension` is not an ABI-neutral API. It depends on compiler 2025.0. For more details, see https://github.com/intel/llvm/pull/13212
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131738
Approved by: https://github.com/gujinghui
2024-08-18 08:32:30 +00:00
PyTorch MergeBot
a0cb54ab46
Revert "C++ network flow implementation in c10 ( #132188 )"
...
This reverts commit e6272acaec .
Reverted https://github.com/pytorch/pytorch/pull/132188 on behalf of https://github.com/izaitsevfb due to breaks aps models and builds internally ([comment](https://github.com/pytorch/pytorch/pull/132188#issuecomment-2294120234 ))
2024-08-16 19:48:54 +00:00
David Berard
e6272acaec
C++ network flow implementation in c10 ( #132188 )
...
The functorch partitioners use network flow to split the joint graph into a forward and backward graph. Internally, we've found that upgrading to networkx 2.8.8 (from 2.5) results in some hard-to-debug failures (internal reference: https://fburl.com/workplace/jrqwagdm ). And I'm told that there's interest to remove the python dependency.
So this PR introduces a C++ implementation that mirrors the API provided by networkx. We'll need to add python bindings and do some additional testing to verify correctness.
Differential Revision: [D61284135](https://our.internmc.facebook.com/intern/diff/D61284135 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132188
Approved by: https://github.com/Chillee
2024-08-15 07:32:51 +00:00
Guilherme Leobas
c518b50c4c
Remove functorch dispatch keys in legacyExtractDispatchKey ( #133018 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133018
Approved by: https://github.com/zou3519
2024-08-13 15:32:01 +00:00
Yuanhao Ji
343071cd96
Fix privateuse1 backend name case ( #132980 )
...
### Problem
`get_privateuse1_backend(bool lower_case)` always returns a lower case name and `lower_case` is not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132980
Approved by: https://github.com/albanD
2024-08-10 07:39:54 +00:00
Yu, Guangye
9c5e0d47fe
Add xpu_cmake_macros.h to xpu build ( #132847 )
...
# Motivation
fix https://github.com/pytorch/pytorch/issues/132971
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132847
Approved by: https://github.com/EikanWang
2024-08-08 08:06:49 +00:00
zdevito
fb6b001cde
Disable expandable segments IPC in fbcode, because some jobs
...
seem to be failing. (#132890 )
https://fb.workplace.com/groups/1405155842844877/permalink/8867182216642165/
Differential Revision: [D60912371](https://our.internmc.facebook.com/intern/diff/D60912371/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132890
Approved by: https://github.com/eqy , https://github.com/ezyang
2024-08-08 01:42:32 +00:00
cyy
6b12dc0224
[Reland] [11/N] Use std::nullopt and std::optional ( #132622 )
...
Reland of #132396 , which was reverted due to dependency reversion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132622
Approved by: https://github.com/ezyang
2024-08-05 20:36:33 +00:00
PyTorch MergeBot
2764bee942
Revert "[MPS] Add support for autocast in MPS ( #99272 )"
...
This reverts commit 6919e8baab .
Reverted https://github.com/pytorch/pytorch/pull/99272 on behalf of https://github.com/clee2000 due to Broke test/inductor/test_cpu_select_algorithm.py::TestSelectAlgorithmCPU::test_quantized_linear_amx_batch_size_3_in_features_128_out_features_64_bias_False_cpu on sm86 jobs [GH job link](https://github.com/pytorch/pytorch/actions/runs/10252979157/job/28367091621 ) [HUD commit link](6919e8baab ) Not caught on PR due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/99272#issuecomment-2269808857 ))
2024-08-05 19:59:04 +00:00
zdevito
8d9c3a71f6
Support IPC for Expandable Segments ( #130890 )
...
This reapplication commit is the same as before except it resolves a build error in an internal build where `handle` was shadowed.
Differential Revision: [D60547506](https://our.internmc.facebook.com/intern/diff/D60547506 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130890
Approved by: https://github.com/dsjohns2
2024-08-05 18:48:13 +00:00
Kulin Seth
6919e8baab
[MPS] Add support for autocast in MPS ( #99272 )
...
Fixes https://github.com/pytorch/pytorch/issues/88415
Co-authored-by: Siddharth Kotapati <skotapati@apple.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272
Approved by: https://github.com/malfet
2024-08-05 17:02:30 +00:00
Aleksei Nikiforov
14edd986b3
Fix missing include file ( #132647 )
...
This error only appears with newer gcc releases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132647
Approved by: https://github.com/Skylion007
2024-08-05 15:49:49 +00:00
PyTorch MergeBot
00097f3458
Revert "C++ network flow implementation in c10 ( #132188 )"
...
This reverts commit dccce77935 .
Reverted https://github.com/pytorch/pytorch/pull/132188 on behalf of https://github.com/ZainRizvi due to Sorry but this appears to be failing internal tests. Please see D60702564 to investigate ([comment](https://github.com/pytorch/pytorch/pull/132188#issuecomment-2267098420 ))
2024-08-03 18:44:28 +00:00
David Berard
dccce77935
C++ network flow implementation in c10 ( #132188 )
...
The functorch partitioners use network flow to split the joint graph into a forward and backward graph. Internally, we've found that upgrading to networkx 2.8.8 (from 2.5) results in some hard-to-debug failures (internal reference: https://fburl.com/workplace/jrqwagdm ). And I'm told that there's interest to remove the python dependency.
So this PR introduces a C++ implementation that mirrors the API provided by networkx. We'll need to add python bindings and do some additional testing to verify correctness.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132188
Approved by: https://github.com/Chillee
2024-08-02 20:30:59 +00:00
PyTorch MergeBot
e4e3575fb0
Revert "[11/N] Use std::nullopt and std::optional ( #132396 )"
...
This reverts commit d7d6190493 .
Reverted https://github.com/pytorch/pytorch/pull/132396 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR has a dependency on another PR (https://github.com/pytorch/pytorch/pull/128898 ) that has to be reverted ([comment](https://github.com/pytorch/pytorch/pull/132396#issuecomment-2265952528 ))
2024-08-02 18:49:42 +00:00
Aleksei Nikiforov
df781343e2
Link libc10 to pthreads ( #132484 )
...
It gets linked as a transitive dependency of `libmkl` on x86_64, but it must be specified explicitly on s390x.
The linking issue only appears when using gcc-13 with the gold linker.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132484
Approved by: https://github.com/malfet
2024-08-02 18:03:44 +00:00
soulitzer
82b6480b0a
Update SavedTensorHooks TLS stack to use SafePyObject ( #131700 )
...
Previously, we had to manually manage refcounting when updating the TLS saved-variable stack. With this PR, it is handled automatically by the SafePyObject.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131700
Approved by: https://github.com/albanD
2024-08-02 16:27:16 +00:00
Andrii Grynenko
fca2dba7ca
[pytorch][counters] Pybind for WaitCounter ( #132357 )
...
Summary:
Basic pybind integration for WaitCounter providing a guard API.
Also fixes the broken copy/move constructors in WaitGuard (they weren't really exercised by the macro-based C++ API).
Test Plan: unit test
Differential Revision: D60557660
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132357
Approved by: https://github.com/jamesperng , https://github.com/asiab4
2024-08-02 16:08:10 +00:00
cyy
b9cb1abf65
[12/N] Use std::optional ( #132361 )
...
Follows #132396
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132361
Approved by: https://github.com/eqy
2024-08-02 13:46:46 +00:00
cyy
35d14d22a0
Fix some issues detected by static analysis tools ( #131989 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131989
Approved by: https://github.com/ezyang
2024-08-02 04:18:57 +00:00