cyy
f95c71867e
[9/N] Fix extra warnings brought by clang-tidy-17 ( #139286 )
...
Fixes #ISSUE_NUMBER
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139286
Approved by: https://github.com/ezyang
2024-10-31 05:20:31 +00:00
cyyever
456c87c8a2
[8/N] Fix extra warnings brought by clang-tidy-17 ( #139151 )
...
Fixes #ISSUE_NUMBER
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139151
Approved by: https://github.com/ezyang
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2024-10-30 14:20:08 +00:00
Edward Yang
b14269dcfb
Make Context to be Device-agnostic Step by Step (1/N) ( #136519 ) ( #138155 )
...
Summary:
- make init device-agnostic and move it to AcceleratorHooksInterface
- refactor the Context code related to device initialization
Original pull request: https://github.com/pytorch/pytorch/pull/136519
Test Plan: contbuild & OSS CI, see 4a8e49389c
Reviewed By: malfet
Differential Revision: D64471142
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138155
Approved by: https://github.com/malfet , https://github.com/bobrenjc93
2024-10-17 20:58:56 +00:00
PyTorch MergeBot
d4d687ffb2
Revert "Make Context to be Device-agnostic Step by Step (1/N) ( #136519 )"
...
This reverts commit 4a8e49389c .
Reverted https://github.com/pytorch/pytorch/pull/136519 on behalf of https://github.com/clee2000 due to breaking internal tests related to MITA, @ezyang has a forward fix? ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2414588302 ))
2024-10-15 17:19:16 +00:00
FFFrog
4a8e49389c
Make Context to be Device-agnostic Step by Step (1/N) ( #136519 )
...
----
- make init device-agnostic and move it to AcceleratorHooksInterface
- refactor the Context code related to device initialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136519
Approved by: https://github.com/ezyang , https://github.com/EikanWang , https://github.com/guangyey
2024-10-13 12:38:02 +00:00
PyTorch MergeBot
079f909263
Revert "Make Context to be Device-agnostic Step by Step (1/N) ( #136519 )"
...
This reverts commit be0b75256a .
Reverted https://github.com/pytorch/pytorch/pull/136519 on behalf of https://github.com/jovianjaison due to this pr is causing errors internally ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2405781093 ))
2024-10-10 18:32:17 +00:00
Jin Zhou
5516ac5c21
[ROCm] Tunableop record untuned ( #128813 )
...
When TunableOp is enabled, it is easy to run out of memory because the application itself usually needs a large amount of GPU memory, for example when running an LLM for inference. So we need an offline mode to tune the GEMMs. This PR provides an offline mode for TunableOp (see the sketch after this list):
- record untuned GEMMs to a file.
- add a Python API named tune_gemm_in_file that reads the untuned file and tunes the GEMMs in it.
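A minimal sketch of the offline flow described above. `tune_gemm_in_file` is the API this PR names; the environment-variable names and the recorded filename are assumptions modelled on existing TunableOp knobs, not confirmed by this log.
```python
import os

# Assumed knobs: enable TunableOp but defer tuning, recording untuned GEMMs
# to a file instead (variable names are assumptions).
os.environ["PYTORCH_TUNABLEOP_ENABLED"] = "1"
os.environ["PYTORCH_TUNABLEOP_TUNING"] = "0"
os.environ["PYTORCH_TUNABLEOP_RECORD_UNTUNED"] = "1"

import torch

# Run the memory-hungry workload; untuned GEMM shapes are written out.
a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
_ = a @ b

# Later, in a separate lightweight process, tune everything that was recorded
# (the filename below is illustrative).
torch.cuda.tunable.tune_gemm_in_file("tunableop_untuned0.csv")
```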
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128813
Approved by: https://github.com/jeffdaily , https://github.com/hongxiayang , https://github.com/naromero77amd
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2024-10-09 21:59:03 +00:00
FFFrog
be0b75256a
Make Context to be Device-agnostic Step by Step (1/N) ( #136519 )
...
- make init device-agnostic and move it to AcceleratorHooksInterface
- refactor the Context code related to device initialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136519
Approved by: https://github.com/ezyang , https://github.com/EikanWang , https://github.com/guangyey
2024-10-09 02:13:36 +00:00
eellison
8893881867
Invalidate StorageImpl instances when tensor is overwritten with cudagraphs ( #125264 )
...
Fixes #104435
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125264
Approved by: https://github.com/ezyang
Co-authored-by: eellison <elias.ellison@gmail.com>
2024-10-09 00:05:52 +00:00
cyy
a2396b2dd8
[2/N] Fix extra warnings brought by clang-tidy-17 ( #137459 )
...
Follows #137407
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137459
Approved by: https://github.com/Skylion007
2024-10-08 19:05:02 +00:00
albanD
88e54de219
More nogil unsafe API fix ( #137142 )
...
Covers the PyDict APIs and confirms no update is needed for the PyModule ones.
The rest was already covered in https://github.com/pytorch/pytorch/pull/136899
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137142
Approved by: https://github.com/eqy , https://github.com/Skylion007
2024-10-04 21:56:34 +00:00
Jeff Daily
c7b0d4b148
raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING ( #131114 )
...
raw_alloc is used by cudnn, miopen, thrust, and tunableop. Without this PR, the env var for disabling the caching allocator will only partially work.
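For context, a small sketch of how this env var is typically used for debugging; with this change it also covers the raw_alloc path used by cuDNN, MIOpen, thrust, and TunableOp.
```python
import os

# Must be set before the first CUDA allocation; every allocation then maps to
# a direct cudaMalloc/cudaFree instead of going through the caching allocator.
os.environ["PYTORCH_NO_CUDA_MEMORY_CACHING"] = "1"

import torch

x = torch.empty(1 << 20, device="cuda")  # direct cudaMalloc
del x                                    # memory returned to the driver immediately
```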
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131114
Approved by: https://github.com/eqy , https://github.com/houseroad , https://github.com/albanD
Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
2024-10-04 15:36:29 +00:00
PyTorch MergeBot
0d1701f310
Revert "raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING ( #131114 )"
...
This reverts commit 7001907480 .
Reverted https://github.com/pytorch/pytorch/pull/131114 on behalf of https://github.com/PaliC due to failing internal builds ([comment](https://github.com/pytorch/pytorch/pull/131114#issuecomment-2390615007 ))
2024-10-03 06:22:55 +00:00
Jeff Daily
7001907480
raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING ( #131114 )
...
raw_alloc is used by cudnn, miopen, thrust, and tunableop. Without this PR, the env var for disabling the caching allocator will only partially work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131114
Approved by: https://github.com/eqy , https://github.com/houseroad , https://github.com/albanD
Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
2024-10-02 16:27:15 +00:00
Jack Taylor
a15774563b
[ROCm] Enable ROCm support for inductor's dynamic_rblock_scaling ( #129663 )
...
As of ROCm 6.1 [hipDeviceProp_t::regsPerMultiprocessor](https://rocm.docs.amd.com/projects/HIP/en/latest/doxygen/html/structhip_device_prop__t.html#a7390d5b180d63978c81aa971060270b4 ) is now available allowing us to enable this attribute on ROCm.
```
>>> torch.cuda.get_device_properties(0)
_CudaDeviceProperties(name='AMD Instinct MI250X/MI250', major=9, minor=0, gcnArchName='gfx90a:sramecc+:xnack-', total_memory=65520MB, multi_processor_count=104)
>>> torch.cuda.get_device_properties(0).regs_per_multiprocessor
65536
```
With https://github.com/triton-lang/triton/pull/3962 we can extract n_regs and n_spills from a Triton binary with the AMD backend, allowing us to enable inductor's dynamic_rblock_scaling on ROCm, initially implemented in https://github.com/pytorch/pytorch/pull/115094 (see the sketch after the list below).
Leaving this in draft until the following PRs have landed:
- https://github.com/pytorch/pytorch/pull/129361 to bump the triton commit pin
- https://github.com/pytorch/pytorch/pull/128449 to allow us to grab warp_size from device properties instead of hard coding 64 on ROCm.
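As a rough illustration of why this property matters for rblock scaling, a hedged occupancy estimate follows; the per-thread register count and the heuristic itself are illustrative, not inductor's actual logic.
```python
import torch

props = torch.cuda.get_device_properties(0)
regs_per_sm = getattr(props, "regs_per_multiprocessor", None)  # absent on older builds
if regs_per_sm is not None:
    n_regs_per_thread = 40                       # would come from the compiled Triton kernel
    warp_size = getattr(props, "warp_size", 32)  # 64 on most AMD GPUs
    # Crude bound on resident warps per SM before register pressure limits occupancy.
    max_warps = regs_per_sm // (n_regs_per_thread * warp_size)
    print(f"~{max_warps} warps/SM before register pressure limits occupancy")
```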
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129663
Approved by: https://github.com/jansel , https://github.com/shunting314
2024-09-13 16:45:39 +00:00
Yu, Guangye
6c1da66407
[Reland] Refactor caching device allocator utils ( #130923 )
...
# Motivation
Following [[RFC] Intel GPU Runtime Upstreaming for Allocator ](https://github.com/pytorch/pytorch/issues/116322 ), this PR aims to refactor the caching device allocator utils to improve code reuse.
This is the first PR; follow-up PRs will continue refactoring the device caching allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923
Approved by: https://github.com/EikanWang , https://github.com/gujinghui , https://github.com/albanD , https://github.com/eqy
2024-09-07 11:14:17 +00:00
PyTorch MergeBot
e55c0f59e5
Revert "[Reland] Refactor caching device allocator utils ( #130923 )"
...
This reverts commit 9809080b9e .
Reverted https://github.com/pytorch/pytorch/pull/130923 on behalf of https://github.com/kit1980 due to breaking internal builds - Error: Relocation overflow has occured ([comment](https://github.com/pytorch/pytorch/pull/130923#issuecomment-2332640961 ))
2024-09-05 21:16:14 +00:00
Yu, Guangye
9809080b9e
[Reland] Refactor caching device allocator utils ( #130923 )
...
# Motivation
Following [[RFC] Intel GPU Runtime Upstreaming for Allocator ](https://github.com/pytorch/pytorch/issues/116322 ), this PR aims to refactor the caching device allocator utils to improve code reuse.
This is the first PR; follow-up PRs will continue refactoring the device caching allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923
Approved by: https://github.com/EikanWang , https://github.com/gujinghui , https://github.com/albanD , https://github.com/eqy
2024-09-04 05:31:08 +00:00
Natalia Gimelshein
c25b64a057
expose host_emptyCache to python, fix a bug in freeing cudaHostRegistered memory ( #134919 )
...
Fixes #ISSUE_NUMBER
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134919
Approved by: https://github.com/eqy
2024-09-01 09:07:25 +00:00
Natalia Gimelshein
29b7852dc1
drop gil in couple places (leads to deadlocks) ( #134910 )
...
Per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134910
Approved by: https://github.com/eqy
2024-09-01 00:05:53 +00:00
Syed Tousif Ahmed
4655eb3ee2
Uses MemPoolContext to route allocations from CUDACachingAllocator ( #134685 )
...
Re-open of https://github.com/pytorch/pytorch/pull/133599 that was mistakenly closed by issuing `ghstack land`
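For orientation, a hedged sketch of what routing allocations through a pool looks like from Python; `torch.cuda.MemPool` and `torch.cuda.use_mem_pool` are the wrappers around this mechanism in newer releases, and their exact names and signatures should be treated as assumptions here.
```python
import torch

pool = torch.cuda.MemPool()              # assumed Python-facing wrapper
with torch.cuda.use_mem_pool(pool):      # allocations inside are routed to `pool`
    t = torch.empty(1 << 20, device="cuda")
# Allocations made outside the context go back to the default allocator pool.
```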
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134685
Approved by: https://github.com/ezyang
2024-08-29 03:56:31 +00:00
PyTorch MergeBot
2c88a923a7
Revert "Refactor caching device allocator utils ( #130923 )"
...
This reverts commit c45ca8092d .
Reverted https://github.com/pytorch/pytorch/pull/130923 on behalf of https://github.com/ZainRizvi due to Sorry but this appears to be causing internal tests to fail with errors like `error: no type named 'DeviceStats' in namespace 'xxx::xxx:xxxAllocator'; did you mean 'DeviceStatus'?` ([comment](https://github.com/pytorch/pytorch/pull/130923#issuecomment-2315730155 ))
2024-08-28 15:56:08 +00:00
Yu, Guangye
c45ca8092d
Refactor caching device allocator utils ( #130923 )
...
# Motivation
Following [[RFC] Intel GPU Runtime Upstreaming for Allocator ](https://github.com/pytorch/pytorch/issues/116322 ), this PR aims to refactor the caching device allocator utils to improve code reuse.
This is the first PR; follow-up PRs will continue refactoring the device caching allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923
Approved by: https://github.com/EikanWang , https://github.com/gujinghui , https://github.com/albanD , https://github.com/eqy
2024-08-28 01:35:23 +00:00
Jesse Cai
255cd75a97
[sparse] Add cuSPARSELt as a backend ( #128534 )
...
Summary:
This PR adds cuSPARSELt as a backend to PyTorch.
It is now possible to check whether cuSPARSELt is available, and which version it is, with
```
torch.backends.cusparselt.is_available()
torch.backends.cusparselt.version()
```
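One plausible way to use the new checks is to gate a 2:4 semi-structured sparse path on cuSPARSELt availability; the weight pattern and shapes below are illustrative assumptions.
```python
import torch
from torch.sparse import to_sparse_semi_structured

if torch.backends.cusparselt.is_available():
    print("cuSPARSELt version:", torch.backends.cusparselt.version())
    # Weight with a valid 2:4 pattern (two nonzeros in every group of four).
    w = torch.Tensor([0, 0, 1, 1]).tile((128, 32)).half().cuda()
    w_sparse = to_sparse_semi_structured(w)
    x = torch.rand(128, 128).half().cuda()
    y = w_sparse @ x   # dispatched to the semi-structured sparse kernel
```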
Test Plan:
```
python test/test_sparse_semi_structured.py -k test_cusparselt_backend
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128534
Approved by: https://github.com/cpuhrsch , https://github.com/eqy , https://github.com/syed-ahmed
2024-08-21 22:06:07 +00:00
Mikayla Gawarecki
018e48c337
[Reland] Add wrappers for synchronous GPUDirect Storage APIs ( #133489 )
...
Reland #130633
USE_CUFILE turned off by default in this version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133489
Approved by: https://github.com/albanD
2024-08-15 17:11:52 +00:00
Jez Ng
260e7cb143
Make CUDA device properties's __repr__ output actually printable ( #132863 )
...
Previously we would write the UUID bytes directly, leading to 'invalid
UTF-8 sequence' errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132863
Approved by: https://github.com/Skylion007 , https://github.com/eqy
2024-08-07 21:08:43 +00:00
Nicolas Macchioni
527f104a69
add L2 cache size to device properties ( #132819 )
...
Fixes #ISSUE_NUMBER
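A small probe for the new property; the attribute name is assumed from the PR title, and `getattr` keeps it safe on builds without the change.
```python
import torch

props = torch.cuda.get_device_properties(0)
print(getattr(props, "L2_cache_size", "L2_cache_size not exposed in this build"))
```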
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132819
Approved by: https://github.com/eellison
2024-08-07 04:55:06 +00:00
PyTorch MergeBot
e191b83462
Revert "Add wrappers for synchronous GPUDirect Storage APIs ( #130633 )"
...
This reverts commit 709ddf7a9d .
Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to still failing internally D60265673 ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2253239607 ))
2024-07-26 18:08:20 +00:00
Mikayla Gawarecki
709ddf7a9d
Add wrappers for synchronous GPUDirect Storage APIs ( #130633 )
...
Based in part on https://github.com/NVIDIA/apex/pull/1774
Differential Revision: [D60155434](https://our.internmc.facebook.com/intern/diff/D60155434 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633
Approved by: https://github.com/albanD
2024-07-25 22:23:38 +00:00
Aaron Enye Shi
fddb1bcdea
[CCA][Memory Snapshot] Move user_defined annotations to Native Caching Allocator ( #130964 )
...
Summary: Instead of embedding the user_defined TraceEntry inside device_traces, which causes issues when some threads do not have the proper device id set, save these entries in an external_annotations field backed by a RingBuffer<AnnotationEntry> called annotation_buffer and owned by the NativeCachingAllocator.
Test Plan: CI, resnet run, and FBR model.
Differential Revision: D59703213
Pulled By: aaronenyeshi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130964
Approved by: https://github.com/zdevito
2024-07-25 14:06:52 +00:00
PyTorch MergeBot
e4b5645f83
Revert "Add wrappers for synchronous GPUDirect Storage APIs ( #130633 )"
...
This reverts commit 5b5e0698a5 .
Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to breaking a lot of jobs and build rules internally D60085885, possibly needs to update some bazel build? ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2245806738 ))
2024-07-23 17:19:34 +00:00
Mikayla Gawarecki
5b5e0698a5
Add wrappers for synchronous GPUDirect Storage APIs ( #130633 )
...
Based in part on https://github.com/NVIDIA/apex/pull/1774
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633
Approved by: https://github.com/albanD
2024-07-22 14:51:24 +00:00
PyTorch MergeBot
7c299b46ca
Revert "Invalidate StorageImpl instances when tensor is overwritten with cudagraphs ( #125264 )"
...
This reverts commit 8390843eba .
Reverted https://github.com/pytorch/pytorch/pull/125264 on behalf of https://github.com/izaitsevfb due to breaks internal tests ([comment](https://github.com/pytorch/pytorch/pull/125264#issuecomment-2240516202 ))
2024-07-19 22:58:51 +00:00
PyTorch MergeBot
5f981388ec
Revert "[ROCm] Enable ROCm support for inductor's dynamic_rblock_scaling ( #129663 )"
...
This reverts commit d7a78ec8b9 .
Reverted https://github.com/pytorch/pytorch/pull/129663 on behalf of https://github.com/atalman due to Breaks internal builds ([comment](https://github.com/pytorch/pytorch/pull/129663#issuecomment-2240011143 ))
2024-07-19 19:46:26 +00:00
Jack Taylor
d7a78ec8b9
[ROCm] Enable ROCm support for inductor's dynamic_rblock_scaling ( #129663 )
...
As of ROCm 6.1 [hipDeviceProp_t::regsPerMultiprocessor](https://rocm.docs.amd.com/projects/HIP/en/latest/doxygen/html/structhip_device_prop__t.html#a7390d5b180d63978c81aa971060270b4 ) is now available allowing us to enable this attribute on ROCm.
```
>>> torch.cuda.get_device_properties(0)
_CudaDeviceProperties(name='AMD Instinct MI250X/MI250', major=9, minor=0, gcnArchName='gfx90a:sramecc+:xnack-', total_memory=65520MB, multi_processor_count=104)
>>> torch.cuda.get_device_properties(0).regs_per_multiprocessor
65536
```
With https://github.com/triton-lang/triton/pull/3962 we can extract n_regs and n_spills from a Triton binary with the AMD backend, allowing us to enable inductor's dynamic_rblock_scaling on ROCm, initially implemented in https://github.com/pytorch/pytorch/pull/115094
Leaving this in draft until the following PRs have landed:
- https://github.com/pytorch/pytorch/pull/129361 to bump the triton commit pin
- https://github.com/pytorch/pytorch/pull/128449 to allow us to grab warp_size from device properties instead of hard coding 64 on ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129663
Approved by: https://github.com/jansel , https://github.com/shunting314
2024-07-19 09:45:03 +00:00
Syed Tousif Ahmed
38b7d89aa4
Uses context pointer for deleter to enable multiple CUDAPluggableAllocator usage ( #130472 )
...
We should be able to create multiple CUDAPluggableAllocators in the same PyTorch program (see https://github.com/pytorch/pytorch/issues/124807 , https://github.com/pytorch/pytorch/pull/125722 for context). When mixing CUDAPluggableAllocators in the same PyTorch program, we need to make sure that the deleter passed in through the CUDAPluggableAllocator gets "attached" to the data_ptr and persists until program exit (when it is called to free the memory).
Currently, CUDAPluggableAllocator maintains a global `current_custom_allocator`. When creating the `DataPtr`, `raw_deleter` attaches `custom_raw_deleter` to the DataPtr, which calls `current_custom_allocator->raw_delete(...)`. This approach is fine when using only one allocator; however, for the multiple-allocator use case, the DataPtr would use the deleter of whatever is in `current_custom_allocator`. For example, if allocation 1 was done with `cudaMalloc` and allocation 2 was done with `ncclMemAlloc`, and `current_custom_allocator` is currently pointing to the CUDAPluggableAllocator with `ncclMemAlloc`, then when cleaning up allocation 1 we'd be using `ncclMemFree` instead of `cudaFree`.
In this PR, we solve the above problem by remembering the `free_fn_` via a deleter context, so there is no need to go through an allocator object to find the deleter.
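A hedged sketch of the setup this enables. `CUDAPluggableAllocator` and `change_current_allocator` are the existing Python entry points; the .so path and exported symbol names below are placeholders.
```python
import torch

# Pair an allocation function with the free function that must release its
# pointers (e.g. ncclMemAlloc/ncclMemFree compiled into a small shared library).
nccl_alloc = torch.cuda.memory.CUDAPluggableAllocator(
    "./libnccl_alloc.so", "nccl_mem_alloc", "nccl_mem_free")

torch.cuda.memory.change_current_allocator(nccl_alloc)
t = torch.empty(1 << 20, device="cuda")
# With the deleter context from this PR, t's storage remembers nccl_mem_free,
# so it is freed correctly even if a different pluggable allocator becomes
# current before t is released.
```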
CC: @zdevito @ptrblck @eqy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130472
Approved by: https://github.com/eqy , https://github.com/ezyang
2024-07-18 11:33:21 +00:00
Yu, Guangye
f2552dcc3d
refactor cached tensor more generic ( #129359 )
...
# Motivation
Solves https://github.com/pytorch/pytorch/issues/129027 by refactoring the cached tensor to be generic.
# Additional Context
No API name change; it only decouples the feature from the CUDA build option.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129359
Approved by: https://github.com/eqy , https://github.com/EikanWang , https://github.com/albanD
2024-07-17 03:00:08 +00:00
Isuru Fernando
8390843eba
Invalidate StorageImpl instances when tensor is overwritten with cudagraphs ( #125264 )
...
Fixes #104435
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125264
Approved by: https://github.com/ezyang
2024-07-16 14:29:29 +00:00
PyTorch MergeBot
78799e82b0
Revert "Invalidate StorageImpl instances when tensor is overwritten with cudagraphs ( #125264 )"
...
This reverts commit 1bc390c5f5 .
Reverted https://github.com/pytorch/pytorch/pull/125264 on behalf of https://github.com/jithunnair-amd due to test test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times is failing https://github.com/pytorch/pytorch/actions/runs/9933628108/job/27477785946 1bc390c5f5 . Test was introduced by fa5f572748 which is before the merge base ([comment](https://github.com/pytorch/pytorch/pull/125264#issuecomment-2229508737 ))
2024-07-15 21:59:46 +00:00
Isuru Fernando
1bc390c5f5
Invalidate StorageImpl instances when tensor is overwritten with cudagraphs ( #125264 )
...
Fixes #104435
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125264
Approved by: https://github.com/ezyang
2024-07-15 04:16:17 +00:00
Ramana Cherukuri
f6a0be5023
Add warpSize to Device properties ( #128449 )
...
Adding warp_size to CudaDeviceProperties.
```
>>> import torch
>>> prop = torch.cuda.get_device_properties(torch.cuda.current_device())
>>> prop.warp_size
64
```
@jeffdaily @pruthvistony @jithunnair-amd @ROCmSupport
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128449
Approved by: https://github.com/eqy , https://github.com/jataylo , https://github.com/jithunnair-amd , https://github.com/malfet
2024-07-01 09:13:32 +00:00
Jeff Daily
169b4ca07e
add uuid in cudaDeviceProperties ( #125083 )
...
Replaces #99967 .
Fixes #99903 .
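A small probe for the new field; `getattr` keeps it safe on builds that predate the property.
```python
import torch

props = torch.cuda.get_device_properties(0)
print(getattr(props, "uuid", "uuid not exposed in this build"))
```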
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125083
Approved by: https://github.com/pruthvistony , https://github.com/albanD , https://github.com/eqy , https://github.com/malfet
2024-06-27 23:53:13 +00:00
Aaron Enye Shi
f42d5b6dca
[Memory Snapshot] Make recordAnnotations callback initialize lazily ( #129242 )
...
Summary: Make the recordAnnotations RecordFunction callback initialize lazily when memory history recording starts. This will help reduce the impact on the Time To First Batch metric.
Test Plan: CI and ran locally.
Differential Revision: D58875576
Pulled By: aaronenyeshi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129242
Approved by: https://github.com/zdevito
2024-06-22 04:05:55 +00:00
Aaron Enye Shi
b5d541609d
[Memory Snapshot] Add recordAnnotations to capture record_function annotations ( #129072 )
...
Summary:
Add new traceEvents into Memory Snapshot for record_function annotations. These will capture both the profiler's step annotations and user annotations.
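A small sketch of producing a snapshot that contains such annotations; `_record_memory_history` and `_dump_snapshot` are the existing (private) snapshot helpers, and `record_function` supplies the user annotation.
```python
import torch
from torch.profiler import record_function

torch.cuda.memory._record_memory_history(max_entries=100000)
with record_function("## forward ##"):          # shows up as a user annotation
    x = torch.randn(1024, 1024, device="cuda")
    y = x @ x
torch.cuda.memory._dump_snapshot("snapshot.pickle")
```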
Test Plan: CI
Pulled By: aaronenyeshi
Differential Revision: D55941362
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129072
Approved by: https://github.com/zdevito
2024-06-19 18:05:41 +00:00
Jeff Daily
0e7bd7fedd
[ROCm] TunableOp improvements ( #124362 )
...
- use less memory; smaller default hipblaslt workspace size
- options to avoid cache effects
- icache flush option
- rotating buffers during tuning
- python APIs (see the sketch after this list)
- unit tests
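A hedged sketch of the Python APIs mentioned above, using the `torch.cuda.tunable` module; the specific setter names are assumptions based on the module as it exists today.
```python
import torch

torch.cuda.tunable.enable(True)                     # turn TunableOp on
torch.cuda.tunable.tuning_enable(True)              # allow online tuning
torch.cuda.tunable.set_max_tuning_iterations(100)   # bound time spent per GEMM
torch.cuda.tunable.set_filename("my_tunableop_results.csv")

a = torch.randn(2048, 2048, device="cuda", dtype=torch.float16)
b = torch.randn(2048, 2048, device="cuda", dtype=torch.float16)
_ = a @ b                                           # first call triggers tuning

torch.cuda.tunable.write_file()                     # persist tuned results
```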
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124362
Approved by: https://github.com/xw285cornell
2024-06-03 22:30:11 +00:00
PyTorch MergeBot
718bb9016f
Revert "[Memory Snapshot] Add recordAnnotations to capture record_function annotations ( #124179 )"
...
This reverts commit 187aeaeabf .
Reverted https://github.com/pytorch/pytorch/pull/124179 on behalf of https://github.com/clee2000 due to test_tensorexpr.py::TestTensorExprFuser::test_simple_add is causing a segfault https://github.com/pytorch/pytorch/actions/runs/9097383783/job/25007155440 187aeaeabf , test was skipped due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/124179#issuecomment-2112948246 ))
2024-05-15 16:11:47 +00:00
Aaron Enye Shi
187aeaeabf
[Memory Snapshot] Add recordAnnotations to capture record_function annotations ( #124179 )
...
Summary: Add new traceEvents into Memory Snapshot for record_function annotations. These will capture both the profiler's step annotations and user annotations.
Test Plan: CI
New Snapshot Generated:
devvm2184.cco0.facebook.com.Apr_19_13_27_14.3072800.snapshot.pickle
Snippet of the snapshot's device_traces showing `ProfilerStep#0` and `## forward ##` annotations:
```
[[{'action': 'user_defined',
'addr': 0,
'size': 0,
'stream': 0,
'time_us': 1713558427168556,
'frames': [{'name': 'START', 'filename': 'ProfilerStep#0', 'line': 0}]},
{'action': 'user_defined',
'addr': 0,
'size': 0,
'stream': 0,
'time_us': 1713558427168738,
'frames': [{'name': 'END', 'filename': 'ProfilerStep#0', 'line': 0}]},
{'action': 'user_defined',
'addr': 0,
'size': 0,
'stream': 0,
'time_us': 1713558427168865,
'frames': [{'name': 'START', 'filename': 'ProfilerStep#1', 'line': 0}]},
{'action': 'user_defined',
'addr': 0,
'size': 0,
'stream': 0,
'time_us': 1713558427168920,
'frames': [{'name': 'START', 'filename': '## forward ##', 'line': 0}]},
{'action': 'alloc',
'addr': 140166073581568,
'size': 3211264,
'stream': 0,
'time_us': 1713558427172978,
'frames': [{'name': '_conv_forward',
'filename': '/mnt/xarfuse/uid-416185/235d4caf-seed-nspid4026531836_cgpid32884718-ns-4026531840/torch/nn/modules/conv
```
Differential Revision: D55941362
Pulled By: aaronenyeshi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124179
Approved by: https://github.com/zdevito
2024-05-15 14:19:40 +00:00
Richard Barnes
ed327876f5
[codemod] c10:optional -> std::optional ( #126135 )
...
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```
`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007 , https://github.com/malfet , https://github.com/albanD , https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
PyTorch MergeBot
6fd745255e
Revert "add uuid in cudaDeviceProperties ( #125083 )"
...
This reverts commit 3f36145db2 .
Reverted https://github.com/pytorch/pytorch/pull/125083 on behalf of https://github.com/izaitsevfb due to Fails internal builds with: no member named 'uuid' in 'hipDeviceProp_t' ([comment](https://github.com/pytorch/pytorch/pull/125083#issuecomment-2103315320 ))
2024-05-09 19:52:45 +00:00
Jeff Daily
3f36145db2
add uuid in cudaDeviceProperties ( #125083 )
...
Replaces #99967 .
Fixes #99903 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125083
Approved by: https://github.com/pruthvistony , https://github.com/albanD , https://github.com/eqy
2024-05-08 19:15:55 +00:00