2606 Commits

Author SHA1 Message Date
PyTorch MergeBot
df0c2f5cae Revert "[Environment Variable][3/N] Use thread-safe getenv wrapper (#137328)"
This reverts commit 25ac5652d0.

Reverted https://github.com/pytorch/pytorch/pull/137328 on behalf of https://github.com/clee2000 due to need to revert this in order to revert #133896, please rebase and reland, sorry for the churn ([comment](https://github.com/pytorch/pytorch/pull/137328#issuecomment-2412143739))
2024-10-14 20:22:26 +00:00
cyy
c48fe89011 Make c10::string_view an alias of std::string_view (#130417)
In order to facilitate the migration from c10::string_view to std::string_view, the old c10::string_view was renamed to c10::string_view_ext.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130417
Approved by: https://github.com/ezyang
2024-10-14 09:28:04 +00:00
yanbing-j
561f07fae7 Warn users of mkldnn device usage (#137553)
In https://github.com/pytorch/pytorch/issues/136831, a user creates a tensor on the mkldnn device. However, mkldnn is no longer used as a device type; only the mkldnn layout is used.

We plan to remove the mkldnn device related code in a future release. This PR warns users not to use the mkldnn device.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137553
Approved by: https://github.com/jgong5, https://github.com/ezyang
2024-10-12 13:42:12 +00:00
cyyever
25ac5652d0 [Environment Variable][3/N] Use thread-safe getenv wrapper (#137328)
Follows #124485

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137328
Approved by: https://github.com/eqy
2024-10-11 23:23:57 +00:00
eellison
8893881867 Invalidate StorageImpl instances when tensor is overwritten with cudagraphs (#125264)
Fixes #104435

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125264
Approved by: https://github.com/ezyang

Co-authored-by: eellison <elias.ellison@gmail.com>
2024-10-09 00:05:52 +00:00
cyy
a2396b2dd8 [2/N] Fix extra warnings brought by clang-tidy-17 (#137459)
Follows #137407

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137459
Approved by: https://github.com/Skylion007
2024-10-08 19:05:02 +00:00
PyTorch MergeBot
7e8dace0de Revert "[ROCm] remove caffe2 from hipify (#137157)"
This reverts commit 40d8260745.

Reverted https://github.com/pytorch/pytorch/pull/137157 on behalf of https://github.com/xw285cornell due to this is breaking internal where we still use caffe2 ([comment](https://github.com/pytorch/pytorch/pull/137157#issuecomment-2400466131))
2024-10-08 17:45:45 +00:00
Bin Bao
c04b35a5ae [AOTI] Add standalone version of TORCH_CHECK (#136873)
Summary: In the standalone mode, TORCH_CHECK throws std::runtime_error, instead of c10::Error. The goal is to cut dependency on libtorch. Specifically, AOTI generates CPU code which may call ATen vectorization ops and we need to make sure those ops are self-contained.

Differential Revision: [D63911928](https://our.internmc.facebook.com/intern/diff/D63911928)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136873
Approved by: https://github.com/albanD, https://github.com/chenyang78
2024-10-08 15:30:01 +00:00
PyTorch MergeBot
796c3c3415 Revert "Disallow FakeTensor.data_ptr access in eager mode (#137221)"
This reverts commit 7e13e7dd7e.

Reverted https://github.com/pytorch/pytorch/pull/137221 on behalf of https://github.com/jovianjaison due to failing internal tests ([comment](https://github.com/pytorch/pytorch/pull/137221#issuecomment-2397957081))
2024-10-07 21:46:13 +00:00
cyy
0c0d8c8ff0 [1/N] Fix extra warnings brought by clang-tidy-17 (#137407)
Before we can use clang-tidy-17
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137407
Approved by: https://github.com/Skylion007, https://github.com/aaronenyeshi
2024-10-07 17:53:59 +00:00
Jeff Daily
40d8260745 [ROCm] remove caffe2 from hipify (#137157)
- Remove all "MasqueradingAsCUDA" files and classes.
- Do not rename "CUDA" classes to "HIP".

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137157
Approved by: https://github.com/eqy
2024-10-05 12:48:54 +00:00
Jeff Daily
c7b0d4b148 raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING (#131114)
raw_alloc is used by cudnn, miopen, thrust, and tunableop.  Without this PR, the env var for disabling the caching allocator will only partially work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131114
Approved by: https://github.com/eqy, https://github.com/houseroad, https://github.com/albanD

Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
2024-10-04 15:36:29 +00:00
rzou
7e13e7dd7e Disallow FakeTensor.data_ptr access in eager mode (#137221)
Previously we raised a deprecation warning (beginning PyTorch 2.4). Now
that we are on 2.6, we're completing the deprecation and disallowing
this behavior.

Test Plan:
- tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137221
Approved by: https://github.com/albanD, https://github.com/eellison
2024-10-03 23:47:55 +00:00
Scott Wolchok
c8a7da305b [PyTorch] Add attribute version of C10_ALWAYS_INLINE (#136445)
Sometimes (such as on a lambda), you need `__attribute__((always_inline))` but not `inline`.

Differential Revision: [D63266917](https://our.internmc.facebook.com/intern/diff/D63266917/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136445
Approved by: https://github.com/malfet
2024-10-03 18:18:37 +00:00
PyTorch MergeBot
0d1701f310 Revert "raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING (#131114)"
This reverts commit 7001907480.

Reverted https://github.com/pytorch/pytorch/pull/131114 on behalf of https://github.com/PaliC due to failing internal builds ([comment](https://github.com/pytorch/pytorch/pull/131114#issuecomment-2390615007))
2024-10-03 06:22:55 +00:00
Jeff Daily
7001907480 raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING (#131114)
raw_alloc is used by cudnn, miopen, thrust, and tunableop.  Without this PR, the env var for disabling the caching allocator will only partially work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131114
Approved by: https://github.com/eqy, https://github.com/houseroad, https://github.com/albanD

Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
2024-10-02 16:27:15 +00:00
cyy
47a78daf91 [Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449)
This PR is the beginning of attempts to wrap thread-unsafe getenv and set_env functions inside a RW mutex.
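
The wrapping idea can be modeled in a few lines. This is a minimal Python sketch, not the c10 code: it assumes a single `threading.Lock` in place of the reader-writer mutex (the Python standard library has none), and the names `get_env` and `set_env` here are hypothetical stand-ins.

```python
import os
import threading
from typing import Optional

# One process-wide lock guards every read and write of the environment.
# Assumption: a plain mutex stands in for the reader-writer mutex in the PR.
_env_lock = threading.Lock()


def get_env(name: str) -> Optional[str]:
    """Return the value of `name`, or None if it is unset."""
    with _env_lock:
        return os.environ.get(name)


def set_env(name: str, value: str, overwrite: bool = True) -> None:
    """Set `name` to `value`; keep an existing value if overwrite is False."""
    with _env_lock:
        if overwrite or name not in os.environ:
            os.environ[name] = value
```

The wrapper only helps if every caller goes through it; a direct environment access elsewhere bypasses the lock.
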
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119449
Approved by: https://github.com/malfet, https://github.com/albanD, https://github.com/eqy
2024-10-01 06:24:30 +00:00
PyTorch MergeBot
2ef1454189 Revert "Add int1 to int7 dtypes (#136301)"
This reverts commit bfa16a161d.

Reverted https://github.com/pytorch/pytorch/pull/136301 on behalf of https://github.com/PaliC due to causing internal failures ([comment](https://github.com/pytorch/pytorch/pull/136301#issuecomment-2384119600))
2024-09-30 20:50:49 +00:00
Jerry Zhang
bfa16a161d Add int1 to int7 dtypes (#136301)
Summary:
Similar to https://github.com/pytorch/pytorch/pull/117208, we want to add int1 to int7 for edge use cases
for weight quantization (https://www.internalfb.com/diff/D62464487)

Test Plan:
python test/test_quantization.py -k test_uint4_int4_dtype

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136301
Approved by: https://github.com/ezyang
2024-09-28 02:08:33 +00:00
Yifu Wang
d55eef5c59 [SymmetricMemory] improve multicast initialization/fallback logic (#136577)
Fixes https://github.com/pytorch/pytorch/issues/136494

Currently, CUDASymmetricMemory::rendezvous() initializes a multicast address if multicast support is present. However, if we believe multicast support is present but cuMulticastCreate still fails for some reason, we do not fall back gracefully.

- In addition to CUDART and driver version check, query CU_DEVICE_ATTRIBUTE_MULTICAST_SUPPORTED to determine multicast support for a rank/device.
- Before initializing multicast for a block, ensure all ranks/devices have multicast support.
- This is unlikely, but if cuMulticastCreate still fails on rank 0, print the corresponding driver error message as a warning, and gracefully skip multicast initialization for the block.
- Introduced an environment variable (TORCH_SYMM_MEM_DISABLE_MULTICAST) to allow users to explicitly disable multicast support as a workaround.
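
The decision order in the list above can be sketched as follows. This is a Python model of the control flow only, not the CUDA code: `init_multicast` and `create_multicast` are hypothetical names, and treating the value "1" as "disabled" is an assumption about how the environment variable is read.

```python
import os
import warnings
from typing import Callable, Mapping, Optional, Sequence

DISABLE_VAR = "TORCH_SYMM_MEM_DISABLE_MULTICAST"


def init_multicast(
    rank_supports: Sequence[bool],
    create_multicast: Callable[[], object],
    env: Mapping[str, str] = os.environ,
) -> Optional[object]:
    """Return a multicast handle, or None when multicast is skipped."""
    # Assumption: the value "1" means the user disabled multicast.
    if env.get(DISABLE_VAR, "0") == "1":
        return None
    # Every rank/device must report support before anything is created.
    if not all(rank_supports):
        return None
    try:
        return create_multicast()
    except RuntimeError as err:
        # Creation can still fail; warn and continue without multicast.
        warnings.warn(f"multicast creation failed, skipping: {err}")
        return None
```
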

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136577
Approved by: https://github.com/Chillee, https://github.com/eqy
2024-09-27 20:04:21 +00:00
Nichols A. Romero
e8f1dd6ba0 Fix hardcoded ROCm paths in Caffe2Targets.cmake (#136283)
Fixes #131701

Use CMake imported targets more consistently to eliminate hardcoded paths.

Here are the relevant new sections of Caffe2Targets.cmake:
```
set_target_properties(c10_hip PROPERTIES
  INTERFACE_INCLUDE_DIRECTORIES "${_IMPORT_PREFIX}/include"
  INTERFACE_LINK_LIBRARIES "c10;hip::amdhip64"
)
```

```
set_target_properties(torch_hip PROPERTIES
  INTERFACE_COMPILE_DEFINITIONS "USE_C10D_NCCL"
  INTERFACE_COMPILE_OPTIONS "-fPIC;-D__HIP_PLATFORM_AMD__=1;-DCUDA_HAS_FP16=1;-DUSE_ROCM;-D__HIP_NO_HALF_OPERATORS__=1;-D__HIP_NO_HALF_CONVERSIONS__=1;-DTORCH_HIP_VERSION=602;-Wno-shift-count-negative;-Wno-shift-count-overflow;-Wno-duplicate-decl-specifier;-DCAFFE2_USE_MIOPEN;-DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_HIP;-std=c++17;-DHIPBLAS_V2;-DHIP_NEW_TYPE_ENUMS"
  INTERFACE_INCLUDE_DIRECTORIES "${_IMPORT_PREFIX}/include"
  INTERFACE_LINK_LIBRARIES "c10_hip;torch_cpu_library;hip::amdhip64;MIOpen;hiprtc::hiprtc;roc::hipblaslt;roc::hipblas;hip::hipfft;hip::hiprand;roc::hipsparse;roc::hipsolver"
)
```

The HIPCUB dependency was not actually used, so it is removed here; its imported target had undesirable side effects.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136283
Approved by: https://github.com/jeffdaily, https://github.com/Skylion007, https://github.com/jithunnair-amd, https://github.com/atalman
2024-09-26 00:34:43 +00:00
Yifu Wang
da1560c49f [SymmetricMemory] add support for cuStreamWriteValue32 (#136488)
cuStreamWriteValue efficiently combines the issuing of a system-level fence with the update of a single memory location. It is highly suitable for inter-stream progress sharing (e.g., all_gather_with_progress).

Exposing it via SymmetricMemory allows users to more easily implement efficient progress-aware matmuls in triton ([xformers example](https://github.com/facebookresearch/xformers/blob/main/xformers/ops/_triton/sequence_parallel_fused_kernels.py)).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136488
Approved by: https://github.com/eqy, https://github.com/Chillee
2024-09-24 20:56:29 +00:00
Scott Wolchok
605f2d802a [PyTorch] Remove unnecessary include of c10/util/Exception.h in irange.h (#136202)
Manually audited and can't figure out why this would be needed.

Differential Revision: [D62879500](https://our.internmc.facebook.com/intern/diff/D62879500/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136202
Approved by: https://github.com/malfet
2024-09-18 16:57:15 +00:00
Banit Agrawal
a575ce0dc6 [PyTorch Pinned Allocator] Add support of background thread to process events (#135524)
Summary: Currently we process events in the regular allocation path, where we call cudaEventQuery to check on the events; this path can take some locks in the libcuda driver. Processing events in the allocation path is not strictly necessary: we can move it to a background thread that keeps processing events regularly and puts freed blocks on the free list.
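
A toy model of this split is sketched below. It is Python rather than the allocator's C++, the class and method names are hypothetical, and it assumes events complete in the order they were recorded; `is_done` stands in for a cudaEventQuery-style poll.

```python
import threading
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple


class PinnedBlockRecycler:
    """Toy model: a background thread returns finished blocks to a free list."""

    def __init__(self, poll_interval_s: float = 0.001) -> None:
        self._lock = threading.Lock()
        self._pending: Deque[Tuple[Callable[[], bool], str]] = deque()
        self.free_list: List[str] = []
        self._stop = threading.Event()
        self._interval = poll_interval_s
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def record(self, is_done: Callable[[], bool], block: str) -> None:
        """Called on free: queue the block until its event completes."""
        with self._lock:
            self._pending.append((is_done, block))

    def allocate(self) -> Optional[str]:
        """Allocation path: only looks at the free list, never polls events."""
        with self._lock:
            return self.free_list.pop() if self.free_list else None

    def _process_events(self) -> None:
        # Assumption: events finish in order, so stop at the first pending one.
        with self._lock:
            while self._pending and self._pending[0][0]():
                _, block = self._pending.popleft()
                self.free_list.append(block)

    def _run(self) -> None:
        # wait() returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            self._process_events()
        self._process_events()  # final drain before the thread exits

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```
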

Differential Revision: D62396585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135524
Approved by: https://github.com/zyan0
2024-09-17 21:08:10 +00:00
Banit Agrawal
48d18fbd4c [PyTorch CUDA Allocator] Allow reuse of non-split blocks with better rounding (#136174)
Summary:
This diff adds an option to round the non-split blocks in the caching allocator so that they can be reused without causing lots of fragmentation for large memory segments.

For example, if we specify a max_split memory size of 400MB, then allocations larger than 400MB will not be split. Let's say we allocated some 1024MB blocks and these are cached in the allocator. If we request a new 500MB block, we round it to the nearest power-2 division, which is 512MB, and add the default kLargeBuffer of 20MB, giving 532MB. Since 532MB is less than the existing 1024MB block, the 1024MB block will not be used for this allocation; instead, a new 512MB block will be created. In this diff, we make the kLargeBuffer used for rounding configurable and expose it as an option: if 512MB + max_non_split_rounding_size is greater than 1024MB, we use the 1024MB block and do not create a new 512MB block with cudaMalloc. This option is added so that we can pre-allocate some large blocks and reuse them as much as possible without stalling on cudaMalloc calls.
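
The arithmetic in this example can be checked with a short Python sketch. The helper names are hypothetical, and rounding up to the next power of two is a simplified stand-in for the allocator's power-2 division rounding; it gives the same 512MB for the 500MB request used here.

```python
MB = 1024 * 1024


def round_up_pow2(size: int) -> int:
    """Round up to the next power of two (simplified rounding, size >= 1)."""
    return 1 << (size - 1).bit_length()


def can_reuse(cached: int, rounded: int, rounding_slack: int) -> bool:
    """A cached non-split block is reused only if the request fits in it
    and the block is smaller than the request plus the allowed slack."""
    return rounded <= cached < rounded + rounding_slack


request = round_up_pow2(500 * MB)  # 512MB
# Default slack (kLargeBuffer, 20MB): 512 + 20 = 532 < 1024, so no reuse.
default_reuse = can_reuse(1024 * MB, request, 20 * MB)
# Larger configured slack (600MB here): 512 + 600 = 1112 > 1024, so reuse.
tuned_reuse = can_reuse(1024 * MB, request, 600 * MB)
```

The 600MB slack is only an illustrative value for max_non_split_rounding_size; the point is that any slack above 512MB lets the cached 1024MB block serve the request.
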

Differential Revision: D62758758

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136174
Approved by: https://github.com/zyan0
2024-09-17 19:08:44 +00:00
Andrii Grynenko
a141c6bb0d [pytorch][monitoring] Dynamic backend for WaitCounter (#135967)
Summary: This implements a default backend proxy that tries to look up a backend via dlsym. What this enables is dynamically loading a module with a backend implementation without having it statically linked with the application.
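
The lookup-with-fallback pattern can be illustrated from Python with `ctypes`, which resolves names through the platform's dynamic loader. This is only a sketch of the idea, not the WaitCounter code: the symbol name below is invented, and `ctypes.CDLL(None)` (the process-wide namespace) is a POSIX behavior, so other platforms simply take the fallback.

```python
import ctypes
from typing import Any, Optional


def find_symbol(name: str) -> Optional[Any]:
    """Look up `name` among symbols already loaded into this process."""
    try:
        process = ctypes.CDLL(None)  # POSIX: handle for the global namespace
        return getattr(process, name)  # resolved by the dynamic loader
    except (AttributeError, OSError, TypeError):
        return None  # symbol absent, or platform without this lookup


def get_backend_factory() -> Any:
    """Use a dynamically provided backend if present, else a no-op."""
    # "hypothetical_wait_counter_backend_factory" is an invented symbol name.
    found = find_symbol("hypothetical_wait_counter_backend_factory")
    return found if found is not None else (lambda: None)
```

Because the lookup happens at run time, a module that exports the symbol can be loaded later without the application being linked against it.
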

Differential Revision: D62549295

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135967
Approved by: https://github.com/c-p-i-o
2024-09-15 18:07:49 +00:00
Jessica Vandebon
baff86dafb [MTIA tensor] allow shallow copy between CPU and MTIA tensors (#135871)
Reviewed By: egienvalue, hanzlfs

Differential Revision: D61662214

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135871
Approved by: https://github.com/egienvalue, https://github.com/nautsimon
2024-09-13 22:13:58 +00:00
Yu, Guangye
e6b68359d7 Fix xpu memory stats error (#135818)
# Motivation
fix https://github.com/pytorch/pytorch/issues/135726
After merging two free blocks, I made a stupid mistake of ignoring the correct size to decrease the active memory size, which should be the original block size instead of the merged block size.
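
The accounting error is easy to reproduce in a small model. The sketch below is Python with hypothetical `Block` and `Stats` types, not the XPU allocator; it only merges with a free right-hand neighbour, which is enough to show why the size must be captured before the merge.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    size: int
    allocated: bool
    next: Optional["Block"] = None


class Stats:
    def __init__(self) -> None:
        self.active_bytes = 0


def free_block(block: Block, stats: Stats) -> None:
    """Free `block`, merge it with a free right-hand neighbour, update stats."""
    original_size = block.size  # capture before merging changes block.size
    block.allocated = False
    neighbour = block.next
    if neighbour is not None and not neighbour.allocated:
        block.size += neighbour.size
        block.next = neighbour.next
    # Subtract what was actually active, not the merged size.
    stats.active_bytes -= original_size
```

Subtracting `block.size` after the merge would have removed 768 bytes for a 512-byte block next to a free 256-byte one, driving the counter negative.
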

# Additional Context
Add a UT to guard this scenario.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135818
Approved by: https://github.com/EikanWang
2024-09-13 02:41:21 +00:00
cyy
f5f1d0a753 Fix build warnings for torch_python (#134981)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134981
Approved by: https://github.com/ezyang
2024-09-12 03:59:34 +00:00
Yu, Guangye
b53d97c7be [Intel GPU] Add XPU memory-related APIs (#129919)
# Motivation
According to https://github.com/pytorch/pytorch/issues/116322, we will help unify the device allocator. So we first introduced a simple XPU device allocator with only the key functionality, expecting to add memory statistics-related functionality after the unification.
However, some memory statistics-related APIs listed in https://github.com/pytorch/pytorch/issues/127929 have now been requested, and unifying the device allocator will take more time. To improve the user experience, we plan to support these memory statistics-related APIs before the unification.

# Additional Context
Fixes: #127929

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129919
Approved by: https://github.com/dvrogozh, https://github.com/abhilash1910, https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/albanD
ghstack dependencies: #130923
2024-09-07 11:15:17 +00:00
Yu, Guangye
6c1da66407 [Reland] Refactor caching device allocator utils (#130923)
# Motivation
Following [[RFC] Intel GPU Runtime Upstreaming for Allocator](https://github.com/pytorch/pytorch/issues/116322), this PR refactors the caching device allocator utils to improve code reuse.
This is the first PR; follow-up PRs will continue refactoring the device caching allocator.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD, https://github.com/eqy
2024-09-07 11:14:17 +00:00
Haibo Chen
e162414963 add instrumentation of CCA stats for reserved and allocated memory size (#135231)
As titled
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135231
Approved by: https://github.com/c-p-i-o
2024-09-06 02:48:56 +00:00
Kulin Seth
144fde4fd2 [MPS] Add support for autocast in MPS (#99272)
Fixes https://github.com/pytorch/pytorch/issues/88415

Need to run inductor/test_cpu_select_algorithm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272
Approved by: https://github.com/malfet

Co-authored-by: Siddharth Kotapati <skotapati@apple.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Co-authored-by: Roy Hvaara <roy@lightyear.no>
2024-09-05 23:23:17 +00:00
Scott Wolchok
a5d70cf545 [PyTorch] Add isfinite to BFloat16-math.h (#135052)
Missing function from <cmath>.

Differential Revision: [D62148884](https://our.internmc.facebook.com/intern/diff/D62148884/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135052
Approved by: https://github.com/PaliC, https://github.com/albanD
ghstack dependencies: #135031
2024-09-05 21:50:36 +00:00
Scott Wolchok
7fe819d917 [PyTorch] Fix -Wshadow -Werror build in BFloat16-inl.h (#135031)
`float_t` is required to exist in C99 math.h, which causes -Wshadow to fire. We don't need the alias, fortunately.

Differential Revision: [D62135908](https://our.internmc.facebook.com/intern/diff/D62135908/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135031
Approved by: https://github.com/albanD
2024-09-05 21:48:21 +00:00
PyTorch MergeBot
e55c0f59e5 Revert "[Reland] Refactor caching device allocator utils (#130923)"
This reverts commit 9809080b9e.

Reverted https://github.com/pytorch/pytorch/pull/130923 on behalf of https://github.com/kit1980 due to breaking internal builds - Error: Relocation overflow has occured ([comment](https://github.com/pytorch/pytorch/pull/130923#issuecomment-2332640961))
2024-09-05 21:16:14 +00:00
Yutao Xu
c7328dff7f Enhance the stability of the complex divide code (#134647)
In C++, when a floating-point literal (e.g., 3.14) is compared with a variable of type float, the literal is by default interpreted as a double.
```c++
float f = 3.14f;
if (f == 3.14) {
    // Do something
}
```
If a device does not support double, an error will occur.
This PR addresses the issue of complex64 errors on machines that do not support double operations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134647
Approved by: https://github.com/EikanWang, https://github.com/albanD
2024-09-05 08:36:37 +00:00
rzou
d7b57c4d63 Fix tensor.data access under inference_mode and compile (#134878)
Fixes https://github.com/pytorch/pytorch/issues/134798

In the regular Tensor case, when you call Tensor.data, there's a check
for if inference mode is active. If it is active, then we don't set the
version counter. We replicate this check for Tensor Subclasses (the bug
was we were trying to set the version counter on a FakeTensor in
inference_mode).

Test Plan:
- new test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134878
Approved by: https://github.com/bdhirsh
2024-09-04 17:55:41 +00:00
FFFrog
5690f003a6 C10_DIAGNOSTIC_PUSH_AND_IGNORED_IF_DEFINED and C10_DIAGNOST should be used in pairs (#135004)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135004
Approved by: https://github.com/aaronenyeshi
2024-09-04 13:14:23 +00:00
Yu, Guangye
9809080b9e [Reland] Refactor caching device allocator utils (#130923)
# Motivation
Following [[RFC] Intel GPU Runtime Upstreaming for Allocator](https://github.com/pytorch/pytorch/issues/116322), this PR refactors the caching device allocator utils to improve code reuse.
This is the first PR; follow-up PRs will continue refactoring the device caching allocator.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD, https://github.com/eqy
2024-09-04 05:31:08 +00:00
Haibo Chen
2e0b114c06 add a new Gauge API with an empty backend to PyTorch core (#134883)
Summary:
The current use case is to continuously measure the total allocated and reserved CUDA memory size from CUDACachingAllocator, and export their distribution (min, max, p90 etc) over time as timeseries.

The current callback-based API does not work because the backend decides when the measurement is taken, so data points between two measurements may not be recorded. The distribution (e.g. max) as such will not be accurate.

This new API otherwise closely follows the design of the existing WaitCounter API.

This is not quite a synchronous version of DynamicCounter, as summing multiple data points does not make sense for my use case.
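
The difference from a callback-based API can be shown with a small model. This Python sketch uses hypothetical class names and is not the c10 interface: the caller pushes every data point, so a backend that tracks min or max never misses a value between two polls.

```python
from typing import List, Protocol


class GaugeBackend(Protocol):
    def record(self, value: int) -> None: ...


class MinMaxBackend:
    """Example backend that sees every data point, so min and max are exact."""

    def __init__(self) -> None:
        self.values: List[int] = []

    def record(self, value: int) -> None:
        self.values.append(value)


class Gauge:
    def __init__(self, name: str, backends: List[GaugeBackend]) -> None:
        self.name = name
        self._backends = backends  # an empty list models the empty backend

    def record(self, value: int) -> None:
        # Pushed synchronously at the call site; nothing is summed or sampled.
        for backend in self._backends:
            backend.record(value)
```

With no backends registered, `record` does nothing, which matches the idea of shipping the API with an empty backend first.
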

Test Plan: CI

Differential Revision: D61837528

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134883
Approved by: https://github.com/c-p-i-o
2024-09-03 17:08:47 +00:00
chilli
db193d1e29 add msg to _assert_async (#134813)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134813
Approved by: https://github.com/ezyang, https://github.com/eqy, https://github.com/albanD
2024-09-03 06:33:18 +00:00
zdevito
d91b49dbaa expandable_segments <-> other allocator options (#134338)
Previously, setting garbage_collection_threshold or max_split_size_mb along with expandable_segments:True could cause the allocator to hit assert failures when running nearly out of memory. This PR ensures garbage_collection and max_split freeing do not accidentally try to release expandable segments.
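
For reference, a combination like the one described is normally requested through the `PYTORCH_CUDA_ALLOC_CONF` environment variable as comma-separated `key:value` pairs. The snippet below is a sketch that only builds and sets that string; the numeric values are arbitrary examples, and the variable has to be set before the CUDA allocator is first used.

```python
import os

# Example values only; pick thresholds that suit the workload.
options = {
    "expandable_segments": "True",
    "max_split_size_mb": "512",
    "garbage_collection_threshold": "0.8",
}

# The allocator config is a comma-separated list of key:value pairs.
alloc_conf = ",".join(f"{key}:{value}" for key, value in options.items())
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = alloc_conf
```
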

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134338
Approved by: https://github.com/ezyang
2024-08-29 18:43:59 +00:00
Syed Tousif Ahmed
4655eb3ee2 Uses MemPoolContext to route allocations from CUDACachingAllocator (#134685)
Re-open of https://github.com/pytorch/pytorch/pull/133599 that was mistakenly closed by issuing `ghstack land`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134685
Approved by: https://github.com/ezyang
2024-08-29 03:56:31 +00:00
PyTorch MergeBot
2c88a923a7 Revert "Refactor caching device allocator utils (#130923)"
This reverts commit c45ca8092d.

Reverted https://github.com/pytorch/pytorch/pull/130923 on behalf of https://github.com/ZainRizvi due to Sorry but this appears to be causing internal tests to fail with errors like `error: no type named 'DeviceStats' in namespace 'xxx::xxx:xxxAllocator'; did you mean 'DeviceStatus'?` ([comment](https://github.com/pytorch/pytorch/pull/130923#issuecomment-2315730155))
2024-08-28 15:56:08 +00:00
Yu, Guangye
c45ca8092d Refactor caching device allocator utils (#130923)
# Motivation
Following [[RFC] Intel GPU Runtime Upstreaming for Allocator](https://github.com/pytorch/pytorch/issues/116322), this PR refactors the caching device allocator utils to improve code reuse.
This is the first PR; follow-up PRs will continue refactoring the device caching allocator.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD, https://github.com/eqy
2024-08-28 01:35:23 +00:00
Yuanhao Ji
44dadf2506 [Fix] Check name when registering privateuse1 backend (#134071)
Do some checks when registering a privateuse1 backend to avoid using in-tree device names.
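
A check of this kind might look like the sketch below. It is a Python illustration with a hypothetical function name, and the set of reserved names is an illustrative subset chosen for the example, not the list the PR actually uses.

```python
# Assumption: an illustrative subset of in-tree device names, not the real list.
IN_TREE_DEVICE_NAMES = {"cpu", "cuda", "xpu", "mps", "meta"}


def check_backend_name(name: str) -> str:
    """Return `name` if it is safe to register, else raise ValueError."""
    if name.lower() in IN_TREE_DEVICE_NAMES:
        raise ValueError(
            f"'{name}' is an in-tree device name and cannot be used "
            "for a privateuse1 backend"
        )
    return name
```
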

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134071
Approved by: https://github.com/albanD
2024-08-27 20:28:30 +00:00
Yifu Wang
78d69bfe11 [SymmetricMemory] introduce multicast support, multimem_all_reduce_ and multimem_one_shot_all_reduce (#133424)
### Summary
- Added multicast support to SymmetricMemory. If the CUDA runtime and CUDA driver have multicast support, SymmetricMemory associates all peer buffers with a multicast object and exposes the multicast virtual address.
- Implemented `multimem_all_reduce_` and `multimem_one_shot_all_reduce` based on the multicast support. The two variants show different performance characteristics for different message sizes. We plan to use Inductor for collective algo selection (and required symmetric memory buffer allocation).

### Benchmark

8xH100 (non-standard version with HBM2e at 650W). NVSwitch V3 with NVLS support.

![image](https://github.com/user-attachments/assets/4998a16b-c2c0-4797-9dd0-1da2303df947)

![image](https://github.com/user-attachments/assets/278ad361-52cb-4864-82c6-bb67e8d0a3fe)

Differential Revision: [D61682507](https://our.internmc.facebook.com/intern/diff/D61682507)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133424
Approved by: https://github.com/yf225, https://github.com/weifengpy
2024-08-23 20:09:20 +00:00
PyTorch MergeBot
cedfac20c7 Revert "[SymmetricMemory] introduce multicast support, multimem_all_reduce_ and multimem_one_shot_all_reduce (#133424)"
This reverts commit 66d3eb783c.

Reverted https://github.com/pytorch/pytorch/pull/133424 on behalf of https://github.com/jeanschmidt due to Broke internal ADS builds, see D61611517 ([comment](https://github.com/pytorch/pytorch/pull/133424#issuecomment-2304676328))
2024-08-22 13:29:27 +00:00
David Berard
84b3f1900a C++ network flow implementation in c10 (#132188)
The functorch partitioners use network flow to split the joint graph into a forward and backward graph. Internally, we've found that upgrading to networkx 2.8.8 (from 2.5) results in some hard-to-debug failures (internal reference: https://fburl.com/workplace/jrqwagdm). I'm also told that there's interest in removing the Python dependency.

So this PR introduces a C++ implementation that mirrors the API provided by networkx. We'll need to add python bindings and do some additional testing to verify correctness.
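
For readers unfamiliar with the operation being ported, here is a minimal min-cut sketch in Python. It is not the c10 implementation: the PR does not say which algorithm it uses, so Edmonds-Karp is an assumption, and only the return shape, `(cut_value, (reachable, non_reachable))`, follows networkx's `minimum_cut`. Capacities are assumed finite.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable


def minimum_cut(
    edges: Iterable[Tuple[Node, Node, float]], source: Node, sink: Node
) -> Tuple[float, Tuple[Set[Node], Set[Node]]]:
    """Return (cut_value, (reachable, non_reachable)) for a directed graph."""
    residual: Dict[Node, Dict[Node, float]] = defaultdict(lambda: defaultdict(float))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0.0  # make sure the reverse edge exists

    def bfs_parents() -> Dict[Node, Node]:
        # Breadth-first search over edges that still have spare capacity.
        parents = {source: source}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    cut_value = 0.0
    while True:
        parents = bfs_parents()
        if sink not in parents:
            break  # no augmenting path left: the flow is maximal
        path = []
        v = sink
        while v != source:
            path.append((parents[v], v))
            v = parents[v]
        bottleneck = min(residual[a][b] for a, b in path)
        for a, b in path:
            residual[a][b] -= bottleneck
            residual[b][a] += bottleneck
        cut_value += bottleneck

    reachable = set(parents)  # nodes still reachable from the source
    return cut_value, (reachable, set(residual) - reachable)
```

In a partitioner, the two returned node sets would correspond to the two sides of the split; mapping them onto forward and backward graphs is outside this sketch.
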

Differential Revision: [D61550977](https://our.internmc.facebook.com/intern/diff/D61550977)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132188
Approved by: https://github.com/Chillee
2024-08-21 18:40:54 +00:00
Yifu Wang
66d3eb783c [SymmetricMemory] introduce multicast support, multimem_all_reduce_ and multimem_one_shot_all_reduce (#133424)
### Summary
- Added multicast support to SymmetricMemory. If the CUDA runtime and CUDA driver have multicast support, SymmetricMemory associates all peer buffers with a multicast object and exposes the multicast virtual address.
- Implemented `multimem_all_reduce_` and `multimem_one_shot_all_reduce` based on the multicast support. The two variants show different performance characteristics for different message sizes. We plan to use Inductor for collective algo selection (and required symmetric memory buffer allocation).

### Benchmark

8xH100 (non-standard version with HBM2e at 650W). NVSwitch V3 with NVLS support.

![image](https://github.com/user-attachments/assets/4998a16b-c2c0-4797-9dd0-1da2303df947)

![image](https://github.com/user-attachments/assets/278ad361-52cb-4864-82c6-bb67e8d0a3fe)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133424
Approved by: https://github.com/yf225, https://github.com/weifengpy
2024-08-21 05:11:21 +00:00
Zhengxu Chen
517aee5369 [torchscript] Add a sampled logging integration point. (#133484)
Test Plan:
test script:
```
    def test_zhxchen17(self):
        from libfb.py.pyinit import initFacebook

        initFacebook()

        class M(torch.nn.Module):
            def forward(self, x):
                return torch.add(x, x)

        def tmptmp(x, y):
            return torch.mul(x, y)

        m = M()
        n = torch.jit.script(m)
        print(n(torch.tensor(1)))
        print(torch.jit.script(tmptmp)(torch.tensor(1), torch.tensor(2)))
```

```
I0802 12:01:23.932929 4079081 init.cc:407] Logging to scuba: run __torch__.caffe2.test.export.test_export.M.forward sample rate: 1000000
```

Differential Revision: D60920867

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133484
Approved by: https://github.com/davidberard98
2024-08-19 18:04:45 +00:00
Yu, Guangye
fbd020fce6 Add new prop to _XpuDevicePropertie for triton gemm optimization (#131738)
# Motivation
This PR aims to add new properties to `_XpuDevicePropertie` for triton gemm optimization.

# Additional Context
`ext_oneapi_supports_cl_extension` is not an ABI-neutral API. It depends on compiler 2025.0. For more details, see https://github.com/intel/llvm/pull/13212

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131738
Approved by: https://github.com/gujinghui
2024-08-18 08:32:30 +00:00
PyTorch MergeBot
a0cb54ab46 Revert "C++ network flow implementation in c10 (#132188)"
This reverts commit e6272acaec.

Reverted https://github.com/pytorch/pytorch/pull/132188 on behalf of https://github.com/izaitsevfb due to breaks aps models and builds internally ([comment](https://github.com/pytorch/pytorch/pull/132188#issuecomment-2294120234))
2024-08-16 19:48:54 +00:00
David Berard
e6272acaec C++ network flow implementation in c10 (#132188)
The functorch partitioners use network flow to split the joint graph into a forward and backward graph. Internally, we've found that upgrading to networkx 2.8.8 (from 2.5) results in some hard-to-debug failures (internal reference: https://fburl.com/workplace/jrqwagdm). I'm also told that there's interest in removing the Python dependency.

So this PR introduces a C++ implementation that mirrors the API provided by networkx. We'll need to add python bindings and do some additional testing to verify correctness.

Differential Revision: [D61284135](https://our.internmc.facebook.com/intern/diff/D61284135)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132188
Approved by: https://github.com/Chillee
2024-08-15 07:32:51 +00:00
Guilherme Leobas
c518b50c4c Remove functorch dispatch keys in legacyExtractDispatchKey (#133018)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133018
Approved by: https://github.com/zou3519
2024-08-13 15:32:01 +00:00
Yuanhao Ji
343071cd96 Fix privateuse1 backend name case (#132980)
### Problem

`get_privateuse1_backend(bool lower_case)` always returns a lower case name and `lower_case` is not used.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132980
Approved by: https://github.com/albanD
2024-08-10 07:39:54 +00:00
Yu, Guangye
9c5e0d47fe Add xpu_cmake_macros.h to xpu build (#132847)
# Motivation

fix https://github.com/pytorch/pytorch/issues/132971

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132847
Approved by: https://github.com/EikanWang
2024-08-08 08:06:49 +00:00
zdevito
fb6b001cde Disable expandable segments IPC in fbcode, because some jobs seem to be failing. (#132890)

https://fb.workplace.com/groups/1405155842844877/permalink/8867182216642165/

Differential Revision: [D60912371](https://our.internmc.facebook.com/intern/diff/D60912371/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132890
Approved by: https://github.com/eqy, https://github.com/ezyang
2024-08-08 01:42:32 +00:00
cyy
6b12dc0224 [Reland] [11/N] Use std::nullopt and std::optional (#132622)
Reland of #132396, which was reverted due to dependency reversion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132622
Approved by: https://github.com/ezyang
2024-08-05 20:36:33 +00:00
PyTorch MergeBot
2764bee942 Revert "[MPS] Add support for autocast in MPS (#99272)"
This reverts commit 6919e8baab.

Reverted https://github.com/pytorch/pytorch/pull/99272 on behalf of https://github.com/clee2000 due to Broke test/inductor/test_cpu_select_algorithm.py::TestSelectAlgorithmCPU::test_quantized_linear_amx_batch_size_3_in_features_128_out_features_64_bias_False_cpu on sm86 jobs [GH job link](https://github.com/pytorch/pytorch/actions/runs/10252979157/job/28367091621) [HUD commit link](6919e8baab) Not caught on PR due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/99272#issuecomment-2269808857))
2024-08-05 19:59:04 +00:00
zdevito
8d9c3a71f6 Support IPC for Expandable Segments (#130890)
This reapplication commit is the same as before except it resolves a build error in an internal build where `handle` was shadowed.

Differential Revision: [D60547506](https://our.internmc.facebook.com/intern/diff/D60547506)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130890
Approved by: https://github.com/dsjohns2
2024-08-05 18:48:13 +00:00
Kulin Seth
6919e8baab [MPS] Add support for autocast in MPS (#99272)
Fixes https://github.com/pytorch/pytorch/issues/88415

Co-authored-by: Siddharth Kotapati <skotapati@apple.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272
Approved by: https://github.com/malfet
2024-08-05 17:02:30 +00:00
Aleksei Nikiforov
14edd986b3 Fix missing include file (#132647)
This error only appears with newer gcc releases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132647
Approved by: https://github.com/Skylion007
2024-08-05 15:49:49 +00:00
PyTorch MergeBot
00097f3458 Revert "C++ network flow implementation in c10 (#132188)"
This reverts commit dccce77935.

Reverted https://github.com/pytorch/pytorch/pull/132188 on behalf of https://github.com/ZainRizvi due to Sorry but this appears to be failing internal tests. Please see D60702564 to investigate ([comment](https://github.com/pytorch/pytorch/pull/132188#issuecomment-2267098420))
2024-08-03 18:44:28 +00:00
David Berard
dccce77935 C++ network flow implementation in c10 (#132188)
The functorch partitioners use network flow to split the joint graph into a forward and backward graph. Internally, we've found that upgrading to networkx 2.8.8 (from 2.5) results in some hard-to-debug failures (internal reference: https://fburl.com/workplace/jrqwagdm). I'm also told that there is interest in removing the Python dependency.

So this PR introduces a C++ implementation that mirrors the API provided by networkx. We'll need to add python bindings and do some additional testing to verify correctness.
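
For readers unfamiliar with the technique, here is a small max-flow sketch. It is an assumption-laden illustration, not the PR's code: the commit message does not say which algorithm is used, so this uses Edmonds-Karp (BFS augmenting paths) on a dense capacity matrix, and `max_flow_sketch` is a hypothetical name. By the max-flow min-cut theorem, the returned value equals the capacity of a minimum cut between `source` and `sink`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <queue>
#include <vector>

// Hypothetical helper, not the c10 API. `cap[u][v]` is the capacity of edge
// u -> v; the matrix is modified in place and ends up holding residual
// capacities.
inline int64_t max_flow_sketch(std::vector<std::vector<int64_t>>& cap,
                               int source, int sink) {
  const int n = static_cast<int>(cap.size());
  assert(source >= 0 && source < n && sink >= 0 && sink < n);
  assert(source != sink);
  int64_t total = 0;
  while (true) {
    // BFS over edges with remaining capacity to find a shortest augmenting path.
    std::vector<int> parent(n, -1);
    parent[source] = source;
    std::queue<int> frontier;
    frontier.push(source);
    while (!frontier.empty() && parent[sink] == -1) {
      const int u = frontier.front();
      frontier.pop();
      for (int v = 0; v < n; ++v) {
        if (parent[v] == -1 && cap[u][v] > 0) {
          parent[v] = u;
          frontier.push(v);
        }
      }
    }
    if (parent[sink] == -1) {
      break;  // no augmenting path left, so the flow is maximal
    }
    // Find the bottleneck along the path, then push that much flow.
    int64_t bottleneck = std::numeric_limits<int64_t>::max();
    for (int v = sink; v != source; v = parent[v]) {
      bottleneck = std::min(bottleneck, cap[parent[v]][v]);
    }
    for (int v = sink; v != source; v = parent[v]) {
      cap[parent[v]][v] -= bottleneck;
      cap[v][parent[v]] += bottleneck;  // residual (reverse) edge
    }
    total += bottleneck;
  }
  return total;
}
```

A dense matrix keeps the sketch short; a real partitioner graph would more likely use adjacency lists.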
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132188
Approved by: https://github.com/Chillee
2024-08-02 20:30:59 +00:00
PyTorch MergeBot
e4e3575fb0 Revert "[11/N] Use std::nullopt and std::optional (#132396)"
This reverts commit d7d6190493.

Reverted https://github.com/pytorch/pytorch/pull/132396 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR has a dependency on another PR (https://github.com/pytorch/pytorch/pull/128898) that has to be reverted ([comment](https://github.com/pytorch/pytorch/pull/132396#issuecomment-2265952528))
2024-08-02 18:49:42 +00:00
Aleksei Nikiforov
df781343e2 Link libc10 to pthreads (#132484)
It gets linked as a transitive dependency of `libmkl` on x86_64, but it must be specified explicitly on s390x.

Linking issue only appears when using gcc-13 with gold linker.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132484
Approved by: https://github.com/malfet
2024-08-02 18:03:44 +00:00
soulitzer
82b6480b0a Update SavedTensorHooks TLS stack to use SafePyObject (#131700)
Previously, we had to manually manage refcounting when updating the TLS saved variable stack. With this PR, refcounting should be handled automatically by the SafePyObject.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131700
Approved by: https://github.com/albanD
2024-08-02 16:27:16 +00:00
Andrii Grynenko
fca2dba7ca [pytorch][counters] Pybind for WaitCounter (#132357)
Summary:
Basic pybind integration for WaitCounter providing a guard API.
Also fixes broken copy/move constructor in WaitGuard (it wasn't really used with the macro-based C++ API).
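
The copy/move problem can be illustrated with a move-only scope guard. This is a sketch under assumptions: `ScopeGuardSketch` is hypothetical and is not WaitCounter's `WaitGuard`. It only shows why a guard that fires a callback on destruction must disarm the moved-from object, otherwise the callback would run twice.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical guard: runs `on_exit` exactly once, when the owning guard dies.
class ScopeGuardSketch {
 public:
  explicit ScopeGuardSketch(std::function<void()> on_exit)
      : on_exit_(std::move(on_exit)) {
    assert(static_cast<bool>(on_exit_));
  }

  // Copying would make two guards fire the same callback, so it is disabled.
  ScopeGuardSketch(const ScopeGuardSketch&) = delete;
  ScopeGuardSketch& operator=(const ScopeGuardSketch&) = delete;

  // Moving transfers the callback and disarms the source.
  ScopeGuardSketch(ScopeGuardSketch&& other)
      : on_exit_(std::move(other.on_exit_)) {
    other.on_exit_ = nullptr;
  }
  ScopeGuardSketch& operator=(ScopeGuardSketch&& other) {
    if (this != &other) {
      fire();  // finish the guard we currently hold before taking over
      on_exit_ = std::move(other.on_exit_);
      other.on_exit_ = nullptr;
    }
    return *this;
  }

  ~ScopeGuardSketch() { fire(); }

 private:
  void fire() {
    if (on_exit_) {
      on_exit_();
      on_exit_ = nullptr;
    }
  }

  std::function<void()> on_exit_;
};
```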

Test Plan: unit test

Differential Revision: D60557660

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132357
Approved by: https://github.com/jamesperng, https://github.com/asiab4
2024-08-02 16:08:10 +00:00
cyy
b9cb1abf65 [12/N] Use std::optional (#132361)
Follows #132396

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132361
Approved by: https://github.com/eqy
2024-08-02 13:46:46 +00:00
cyy
35d14d22a0 Fix some issues detected by static analysis tools (#131989)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131989
Approved by: https://github.com/ezyang
2024-08-02 04:18:57 +00:00
Yu, Guangye
92bebb46fa Support XPU ABI=0 build (#130110)
# Motivation
This PR intends to support ABI=0 build for XPU backend.

# Additional Context
The major change is adding the compilation option `-D__INTEL_PREVIEW_BREAKING_CHANGES` for the host compiler (gcc) and `-fpreview-breaking-changes` for the XPU device kernel compiler (icpx). The reasons are:
- We use gcc to compile host code and link the SYCL runtime, so we need to pass `-D__INTEL_PREVIEW_BREAKING_CHANGES` to tell the host compiler to invoke the ABI-neutral API included in SYCL.
- We use icpx to compile device kernel code and link the SYCL runtime, so we need to pass `-fpreview-breaking-changes` to tell the device kernel compiler to build ABI-neutral code.
- `libsycl-preview.so` is an ABI-neutral library, but `libsycl.so` is not.

This PR depends on https://github.com/pytorch/pytorch/pull/131643.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130110
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD
2024-08-01 21:42:14 +00:00
cyy
d7d6190493 [11/N] Use std::nullopt and std::optional (#132396)
Follows #132364
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132396
Approved by: https://github.com/ezyang
2024-08-01 14:46:33 +00:00
Xu Han
6c1f1563e1 [inductor] fix UndefinedTensorImpl singleton can't export on Windows. (#132326)
This PR fixes the issue where `UndefinedTensorImpl::_singleton` cannot be exported on Windows.
Snapshot:
<img width="1346" alt="image" src="https://github.com/user-attachments/assets/b34256ac-a0ae-473b-89e6-10d755eaad24">

The reason is that MSVC can't export a class static data member with external linkage; see https://learn.microsoft.com/en-us/cpp/cpp/using-dllimport-and-dllexport-in-cpp-classes?view=msvc-170#_pluslang_using_dllimport_and_dllexport_in_c2b2bselectivememberimportexport

On Windows, I use a different singleton implementation to avoid the issue.
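
One common way to avoid exporting a class static data member is to keep the instance in a function-local static and hand it out through an accessor. The sketch below assumes that pattern; `UndefinedImplSketch` is hypothetical, and the PR's actual implementation may differ.

```cpp
// Hypothetical type standing in for the real tensor implementation class.
class UndefinedImplSketch {
 public:
  // Only this function would need to be exported; the instance itself lives
  // inside it and is initialized on first use.
  static UndefinedImplSketch* singleton() {
    static UndefinedImplSketch instance;
    return &instance;
  }

  UndefinedImplSketch(const UndefinedImplSketch&) = delete;
  UndefinedImplSketch& operator=(const UndefinedImplSketch&) = delete;

 private:
  UndefinedImplSketch() = default;
};
```

Every caller gets the same address, and since C++11 the initialization of a function-local static is thread-safe.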

With this PR, cpp_wrapper starts to work on Windows.
<img width="1916" alt="image" src="https://github.com/user-attachments/assets/c1d7d7e7-64ca-4c6d-9fb7-e3b91e675b58">

As a next step, I will enable the cpp_wrapper unit tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132326
Approved by: https://github.com/jgong5, https://github.com/desertfire
2024-08-01 13:37:12 +00:00
cyy
043e41f4f4 [10/N] Use std::nullopt and std::make_optional (#132364)
Follows #130674
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132364
Approved by: https://github.com/ezyang
2024-08-01 07:02:35 +00:00
Dan Zimmerman
90fa64bd7e [torch][take2] Implement BFloat16 __hip_bfloat16 overloads (#132234)
Summary:
In D60024830 I attempted to define these overloads, but gated the implementation on the wrong macros. Namely I used `__CUDACC__` instead of `__HIPCC__` (facepalm).

It might be worth merging this with the nvidia case via typedefs (e.g. `typedef __hip_bfloat16 __gpu_bfloat16` and `typedef __nv_bfloat16 __gpu_bfloat16`), but that seems like an entirely new paradigm for torch, so I'll punt that change to the future so we can focus on supporting `BFloat16(__hip_bfloat16)` here

Test Plan: CI

Differential Revision: D60362079

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132234
Approved by: https://github.com/houseroad
2024-08-01 04:25:46 +00:00
Syed Tousif Ahmed
7c89ec0f7c Implements torch.cuda.MemPool() API (#131152)
In this PR:
- Pool id creation logic is refactored and moved to a MemPool class. `graph_pool_handle()` API now uses `torch.cuda.MemPool()` to get a unique id for a pool. Existing tests should cover this change.
- MemPool holds a pointer to a CUDAAllocator as proposed in https://github.com/pytorch/pytorch/issues/124807#issuecomment-2077506997. Tests are added to show usage with CUDAPluggableAllocator.
- MemPoolContext API makes a mempool active. Tests are added to show usage of this API. This API will be used in CUDACachingAllocator to route allocations to a user provided allocator. See draft here: https://github.com/pytorch/pytorch/pull/125722/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131152
Approved by: https://github.com/eqy, https://github.com/ezyang
2024-08-01 01:29:30 +00:00
Andrii Grynenko
cb4c107d70 [pytorch][counters] DynamicCounter (#132166)
Summary:
Implement a callback-based dynamic counter with pluggable backends.
The backend API and integration is similar to WaitCounter. Note that this counter should only be used with C++ callbacks, since making it safe to be used for GIL-requiring callbacks would be pretty challenging and may defeat the whole purpose of this counter (since the duration of the callback can no longer be guaranteed).

Test Plan: unit test

Differential Revision: D60464055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132166
Approved by: https://github.com/asiab4
2024-07-31 19:52:51 +00:00
PyTorch MergeBot
dc38646c58 Revert "[pytorch][counters] Pybind for WaitCounter (#132167)"
This reverts commit 2c7bd61afa.

Reverted https://github.com/pytorch/pytorch/pull/132167 on behalf of https://github.com/clee2000 due to broke test_public_bindings.py::TestPublicBindings::test_correct_module_names [GH job link](https://github.com/pytorch/pytorch/actions/runs/10183687967/job/28172929836) [HUD commit link](2c7bd61afa) not tested on PR due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/132167#issuecomment-2261328275))
2024-07-31 19:51:56 +00:00
Andrii Grynenko
2c7bd61afa [pytorch][counters] Pybind for WaitCounter (#132167)
Summary:
Basic pybind integration for WaitCounter providing a guard API.
Also fixes broken copy/move constructor in WaitGuard (it wasn't really used with the macro-based C++ API).

Test Plan: unit test

Reviewed By: asiab4

Differential Revision: D60463979

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132167
Approved by: https://github.com/asiab4
2024-07-31 16:04:40 +00:00
Xu Han
39a3c98aa6 [inductor] fix scalar miss constuctor for long type. (#132117)
Fix the `long` to `c10::Scalar` conversion issue.

![image](https://github.com/user-attachments/assets/fc44a170-e293-4688-a185-d189484f6638)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132117
Approved by: https://github.com/jgong5, https://github.com/desertfire
2024-07-31 15:40:48 +00:00
Dan Zimmerman
dad125a64b Address clang-tidy nits in BFloat16 (#132203)
Summary: In https://github.com/pytorch/pytorch/pull/131359 I forgot to amend with clang-tidy fixes before merging. This addresses that.

Test Plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132203
Approved by: https://github.com/houseroad
2024-07-31 13:41:56 +00:00
Roy Berger
93facac02c [NeuralNetInference] Bring up iOS builds (#131917)
Summary: Mirror Android setup to static link & use lite interpreter on iOS

Test Plan: CI

Reviewed By: EscapeZero

Differential Revision: D60156611

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131917
Approved by: https://github.com/cccclai
2024-07-30 23:01:09 +00:00
eellison
2a4d9aa548 Disable expandable segments checkpointing internally (#132048)
Differential Revision: [D60388286](https://our.internmc.facebook.com/intern/diff/D60388286)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132048
Approved by: https://github.com/ezyang, https://github.com/eqy
2024-07-30 00:26:39 +00:00
pruthvistony
9d497887b8 Changes to support clang-19 (#131905)
Co-authored-by: pruthvistony <pruthvigithub@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131905
Approved by: https://github.com/jeffdaily, https://github.com/Skylion007
2024-07-29 12:38:23 +00:00
albanD
466ea8ce54 Add fallback() to torch.library (#131707)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131707
Approved by: https://github.com/zou3519
2024-07-27 18:02:35 +00:00
cyy
f83ef69b84 Fix typo in assignment operators (#131890)
Most typos were introduced in #131077
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131890
Approved by: https://github.com/Skylion007
2024-07-27 11:13:42 +00:00
Dan Zimmerman
535c17efb3 [torch] Implement c10::BFloat16 ctor from __hip_bfloat16 (#131359)
Summary: Pretty straightforward. ROCm 6.2.0 changed the `__hip_bfloat16` API (see [this PR](481912a1fd)), so we gate the implementation on the `__BF16_HOST_DEVICE__` macro to support both older and newer versions of ROCm.

Test Plan: CI

Differential Revision: D60024830

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131359
Approved by: https://github.com/houseroad
2024-07-26 14:30:49 +00:00
PyTorch MergeBot
49a8e061b6 Revert "Support IPC for Expandable Segments (#130890)"
This reverts commit 0e71a88f9b.

Reverted https://github.com/pytorch/pytorch/pull/130890 on behalf of https://github.com/zdevito due to some internal tests show shutdown issues with the change to the table that holds ipc handles ([comment](https://github.com/pytorch/pytorch/pull/130890#issuecomment-2250767280))
2024-07-25 15:54:57 +00:00
Aaron Enye Shi
fddb1bcdea [CCA][Memory Snapshot] Move user_defined annotations to Native Caching Allocator (#130964)
Summary: Instead of embedding the user_defined TraceEntry inside of device_traces, which causes issues when some threads may not have the proper device id set, save them into an external_annotations field by using a RingBuffer<AnnotationEntry> called annotation_buffer owned by the NativeCachingAllocator.
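
As a rough illustration of the data structure named above, here is a fixed-capacity ring buffer that overwrites its oldest entry when full. It is a sketch under assumptions: `RingBufferSketch` and `AnnotationEntrySketch` are hypothetical, the entry fields are invented for the example, and synchronization is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry type; the real AnnotationEntry fields are not shown in
// the commit message.
struct AnnotationEntrySketch {
  int device;
  std::string name;
};

template <typename T>
class RingBufferSketch {
 public:
  explicit RingBufferSketch(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
    data_.reserve(capacity);
  }

  // Appends a value, overwriting the oldest one once the buffer is full.
  void push(T value) {
    if (data_.size() < capacity_) {
      data_.push_back(std::move(value));
    } else {
      data_[next_] = std::move(value);
    }
    next_ = (next_ + 1) % capacity_;
  }

  // Returns the stored entries ordered from oldest to newest.
  std::vector<T> snapshot() const {
    std::vector<T> out;
    out.reserve(data_.size());
    const std::size_t start = data_.size() < capacity_ ? 0 : next_;
    for (std::size_t i = 0; i < data_.size(); ++i) {
      out.push_back(data_[(start + i) % data_.size()]);
    }
    return out;
  }

 private:
  std::size_t capacity_;
  std::size_t next_ = 0;  // slot the next push will write to
  std::vector<T> data_;
};
```

Because the buffer is owned by one object rather than stored per device, entries can be recorded even when the calling thread has no device id set, which is the problem the summary describes.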

Test Plan: CI, resnet run, and FBR model.

Differential Revision: D59703213

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130964
Approved by: https://github.com/zdevito
2024-07-25 14:06:52 +00:00
cyy
35bb0d3638 Fix unsigned type bug in CUDACachingAllocator.cpp (#131464)
`curr_block->size` and `block_state.size` are both `size_t`, so as soon as they are not equal, a split happens. According to the comment, it is better to use `>`.
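
The sketch below shows one way this class of bug can arise with unsigned sizes. It is an assumption for illustration only: the function names are hypothetical and the buggy expression is not copied from `CUDACachingAllocator.cpp`.

```cpp
#include <cstddef>

// Hypothetical predicate with the bug: unsigned subtraction wraps around, so
// the result is positive whenever the two sizes differ, even if the current
// block is smaller than the wanted size.
inline bool should_split_buggy(std::size_t curr_size, std::size_t wanted_size) {
  return curr_size - wanted_size > 0;
}

// Hypothetical fixed predicate: split only when the block is strictly larger.
inline bool should_split_fixed(std::size_t curr_size, std::size_t wanted_size) {
  return curr_size > wanted_size;
}
```

For sizes 4 and 8, the first predicate reports a split while the second does not; for equal sizes both report no split.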
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131464
Approved by: https://github.com/eqy, https://github.com/ezyang
2024-07-25 04:48:05 +00:00
Andrii Grynenko
b98b3127f7 [easy][pytorch][counters] Move WaitCounter in c10/util (#131021)
Summary: Since the WaitCounter frontend itself has minimal dependencies, it's fine to move it into c10. Specific backends can be registered/linked separately.

Test Plan: unit test

Reviewed By: jamesperng, asiab4, c-p-i-o

Differential Revision: D59842868

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131021
Approved by: https://github.com/asiab4
2024-07-24 18:38:33 +00:00
zdevito
0e71a88f9b Support IPC for Expandable Segments (#130890)
This reapplication commit is the same as before except it resolves a build error in an internal build where `handle` was shadowed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130890
Approved by: https://github.com/dsjohns2
2024-07-24 15:45:40 +00:00
cyyever
451462dbff [1/N] Add missing constructors or assignment operators (#131077)
Just mark them as deleted in most cases.
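
A short sketch of what marking the special members as deleted looks like. `FileHandleSketch` is a hypothetical class used only for illustration and does not come from the PR:

```cpp
#include <type_traits>

// Hypothetical resource owner: spelling out all copy/move operations makes
// the ownership rules explicit instead of relying on implicit generation.
class FileHandleSketch {
 public:
  explicit FileHandleSketch(int fd) : fd_(fd) {}
  ~FileHandleSketch() = default;

  FileHandleSketch(const FileHandleSketch&) = delete;
  FileHandleSketch& operator=(const FileHandleSketch&) = delete;
  FileHandleSketch(FileHandleSketch&&) = delete;
  FileHandleSketch& operator=(FileHandleSketch&&) = delete;

  int fd() const { return fd_; }

 private:
  int fd_;
};

// The compiler now rejects accidental copies and moves at compile time.
static_assert(!std::is_copy_constructible<FileHandleSketch>::value,
              "copying is disabled");
static_assert(!std::is_move_assignable<FileHandleSketch>::value,
              "move assignment is disabled");
```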
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131077
Approved by: https://github.com/ezyang
2024-07-24 12:09:39 +00:00
PyTorch MergeBot
5f0b65bee7 Revert "Replace manual parsing of "TMPDIR", "TMP", "TEMP" and "TEMPDIR" with std::filesystem::temp_directory_path() (#130842)"
This reverts commit d33804f8b6.

Reverted https://github.com/pytorch/pytorch/pull/130842 on behalf of https://github.com/clee2000 due to breaking some builds internally D60085710, I'm not sure what the logs mean but I think it's something about build size ([comment](https://github.com/pytorch/pytorch/pull/130842#issuecomment-2245799309))
2024-07-23 17:15:06 +00:00
PyTorch MergeBot
1e86387871 Revert "Support IPC for Expandable Segments (#130890)"
This reverts commit 32c2f84e34.

Reverted https://github.com/pytorch/pytorch/pull/130890 on behalf of https://github.com/zdevito due to variable shadowing broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/130890#issuecomment-2245456085))
2024-07-23 14:46:28 +00:00
Yifu Wang
d33804f8b6 Replace manual parsing of "TMPDIR", "TMP", "TEMP" and "TEMPDIR" with std::filesystem::temp_directory_path() (#130842)
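
A minimal sketch of the standard-library call named in the title, assuming C++17. `temp_dir_sketch` is a hypothetical wrapper, and falling back to the current directory on error is an assumption made for this example, not the PR's behavior:

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical wrapper: asks the standard library for the temp directory
// instead of reading the environment variables by hand.
inline std::string temp_dir_sketch() {
  std::error_code ec;
  const std::filesystem::path dir = std::filesystem::temp_directory_path(ec);
  if (ec) {
    return ".";  // assumed fallback for this sketch only
  }
  return dir.string();
}
```

The `std::error_code` overload is used so that a missing or invalid temp directory does not throw.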
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130842
Approved by: https://github.com/fegin
2024-07-22 21:49:33 +00:00
zdevito
32c2f84e34 Support IPC for Expandable Segments (#130890)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130890
Approved by: https://github.com/dsjohns2
ghstack dependencies: #130888, #130889
2024-07-22 16:15:01 +00:00
cyy
1d1d074072 [3/N] Fix Wunused-parameter warnings (#131271)
Follows #131170

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131271
Approved by: https://github.com/ezyang
2024-07-20 23:31:03 +00:00