Commit Graph

1288 Commits

Author SHA1 Message Date
cyy
3907f36808 Turn some variables and functions into static (#136847)
Re-check some files and mark variables and functions into static and fix other warnings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136847
Approved by: https://github.com/ezyang
2024-10-29 17:01:56 +00:00
cyy
e201460f8a [2/N] Fix Wextra-semi warnings (#139142)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139142
Approved by: https://github.com/ezyang
2024-10-29 08:14:37 +00:00
cyy
383d9e3de6 [4/N] Fix cppcoreguidelines-special-member-functions warnings (#139027)
Follows #138796
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139027
Approved by: https://github.com/ezyang
2024-10-29 00:18:18 +00:00
cyy
f4f0f2995d Fix Wextra-semi warnings (#139000)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139000
Approved by: https://github.com/ezyang
2024-10-28 21:48:51 +00:00
Yu, Guangye
40c098f731 Introduce a device-agnostic runtime API design (#132204)
# Motivation
According to [[RFC]A device-agnostic Python runtime API design for stream-based accelerators](https://github.com/pytorch/pytorch/issues/128403), this PR intends to introduce a device-agnostic runtime API design.
I personally prefer the **Simple Version** APIs that no longer accept the device type as an input argument. It means we will leverage `getAccelerator` to fetch the current accelerator. And it is flexible to expand these APIs to handle multiple types of accelerator scenarios. The design does **NOT** break the previous design philosophies.
I also believe that namespace torch.accelerator is better. It lets users know that the APIs they are calling are running on an accelerator rather than CPU. This is important. Meanwhile, we can follow a simple API design principle:
1. Device-agnostic APIs should be placed under the torch.accelerator namespace and not accept a device_type optional parameter.
2. Device-specific APIs should be placed under device-specific submodules.
3. APIS required by both CPU and accelerators should be placed under the torch namespace and accept a device_type optional parameter.

Also, I list the pros and cons of **Simple Version** here:
Pros:
- `torch.accelerator.foo` will have the same input argument as `torch.xxx.foo`, bringing a better user experience;
- more concise, facilitate the developer to write a device-agnostic code.

Cons:
- no obvious drawbacks.

# Additional Context
I list the new APIs here:
```python
torch.accelerator.is_available() -> bool:
torch.accelerator.current_accelerator() -> torch.device:
torch.accelerator.device_count() -> int:
torch.accelerator.current_device_idx() -> int:
torch.accelerator.set_device_idx(device: Union[torch.device, str, int, None]) -> None:
torch.accelerator.current_stream(device: Union[torch.device, str, int, None]) -> torch.Stream:
torch.accelerator.set_stream(stream: torch.Stream) -> None:
torch.accelerator.synchronize(device: Union[torch.device, str, int, None]) -> None:
```
According to the discussion with Alban, we decide to change the API name `set_device` to `set_device_idx` and `current_device` to `current_device_idx` for more explicit. And will submit other PR to support device and stream context manager.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132204
Approved by: https://github.com/EikanWang, https://github.com/abhilash1910, https://github.com/gujinghui, https://github.com/albanD
2024-10-27 10:37:09 +00:00
Richard Barnes
42994234a6 std::value/std::type -> std::_v/std::_t (#138746)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138746
Approved by: https://github.com/cyyever, https://github.com/malfet
2024-10-26 20:59:24 +00:00
Richard Barnes
69af467d4f Eliminate c10::value_or_else (#138818)
Test Plan: Sandcastle

Differential Revision: D64857418

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138818
Approved by: https://github.com/malfet, https://github.com/Skylion007
2024-10-25 17:59:01 +00:00
Richard Barnes
8f62832189 c10::nullopt -> std::nullopt (#138701)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138701
Approved by: https://github.com/Skylion007, https://github.com/malfet
2024-10-24 15:03:32 +00:00
Xu Han
96b30dcb25 [Windows][cpu] mkl use mimalloc as allocator on Windows (#138419)
We did a lot of optimization for PyTorch Windows, and we got good progress of it. But still some models have performance gap between PyTorch Windows and PyTorch Linux. Ref: https://pytorch.org/blog/performance-boost-windows/#conclusion
From the blog conclusion, we found the `ResNet50` is typical case of it.

Let's focus on the `ResNet50`, and collect the profiling log:
```cmd
(nightly) D:\xu_git\dnnl_cb>python test_script_resnet50.py
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                             Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                  model_inference         3.91%     682.427ms       100.00%       17.448s       17.448s             1
                     aten::conv2d         0.18%      30.906ms        64.79%       11.305s       2.133ms          5300
                aten::convolution         0.45%      78.031ms        64.62%       11.275s       2.127ms          5300
               aten::_convolution         0.30%      51.670ms        64.17%       11.196s       2.113ms          5300
         aten::mkldnn_convolution        63.58%       11.093s        63.87%       11.145s       2.103ms          5300
                 aten::batch_norm         0.13%      23.536ms        20.10%        3.506s     661.580us          5300
     aten::_batch_norm_impl_index         0.28%      49.486ms        19.96%        3.483s     657.139us          5300
          aten::native_batch_norm        19.26%        3.360s        19.64%        3.427s     646.615us          5300
                 aten::max_pool2d         0.01%       1.038ms         5.84%        1.018s      10.181ms           100
    aten::max_pool2d_with_indices         5.83%        1.017s         5.83%        1.017s      10.171ms           100
                       aten::add_         3.38%     588.907ms         3.38%     588.907ms      85.349us          6900
                      aten::relu_         0.35%      60.358ms         1.67%     292.155ms      59.624us          4900
                 aten::clamp_min_         1.33%     231.797ms         1.33%     231.797ms      47.306us          4900
                      aten::empty         0.46%      80.195ms         0.46%      80.195ms       1.513us         53000
                     aten::linear         0.01%     927.300us         0.23%      39.353ms     393.532us           100
                      aten::addmm         0.20%      35.379ms         0.21%      37.016ms     370.155us           100
                 aten::empty_like         0.12%      20.455ms         0.17%      29.976ms       5.656us          5300
                aten::as_strided_         0.11%      18.830ms         0.11%      18.830ms       3.553us          5300
        aten::adaptive_avg_pool2d         0.00%     419.900us         0.08%      14.265ms     142.647us           100
                       aten::mean         0.01%       1.737ms         0.08%      13.845ms     138.448us           100
                        aten::sum         0.05%       8.113ms         0.05%       8.648ms      86.479us           100
                    aten::resize_         0.03%       5.182ms         0.03%       5.182ms       0.978us          5300
                       aten::div_         0.01%       1.445ms         0.02%       3.460ms      34.600us           100
                         aten::to         0.00%     337.000us         0.01%       2.015ms      20.154us           100
                   aten::_to_copy         0.01%     977.500us         0.01%       1.678ms      16.784us           100
                      aten::copy_         0.01%       1.474ms         0.01%       1.474ms       7.371us           200
                          aten::t         0.00%     775.900us         0.01%       1.410ms      14.104us           100
                    aten::flatten         0.00%     420.900us         0.01%       1.311ms      13.106us           100
                       aten::view         0.01%     889.700us         0.01%     889.700us       8.897us           100
                  aten::transpose         0.00%     410.700us         0.00%     634.500us       6.345us           100
                     aten::expand         0.00%     496.800us         0.00%     566.800us       5.668us           100
                      aten::fill_         0.00%     534.800us         0.00%     534.800us       5.348us           100
                 aten::as_strided         0.00%     293.800us         0.00%     293.800us       1.469us           200
              aten::empty_strided         0.00%     241.700us         0.00%     241.700us       2.417us           100
               aten::resolve_conj         0.00%      54.800us         0.00%      54.800us       0.274us           200
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 17.448s

Execution time: 20.02380895614624
```
We found the major kernel consume CPU resource is `aten::mkldnn_convolution`. It was dispatched to `MKLDNN`.
Acturally, we had optimized memory allocation via integrated mimalloc to pytorch C10 module. It helps PyTorch Windows boost a lot, but it does not cover `MKL` and `MKLDNN`'s intermediary temporary memory.
We still have potential to improve PyTorch Windows performance via optimize `MKL` and `MKLDNN`'s intermediary temporary memory.

So, I discussed with Intel MKL team, and get a method to register high performance memory allocation API to MKL, and it would help MKL to boost memory performance. Please check the online document: https://www.intel.com/content/www/us/en/docs/onemkl/developer-guide-windows/2023-0/redefining-memory-functions.html

This PR is optimize MKL memory alloction performance on Windows, via register mi_malloc to MKL. PR Changes:
1. Add cmake option: `USE_MIMALLOC_ON_MKL`, It is sub-option of `USE_MIMALLOC`.
2. Wrap and export mi_malloc APIs in C10, when `USE_MIMALLOC_ON_MKL` is `ON`.
3. Add MklAllocationHelp.cpp to register allocation APIs to MKL, when `USE_MIMALLOC_ON_MKL` is `ON`.

For `oneDNN`, it is still tracking in this proposal: https://github.com/oneapi-src/oneDNN/issues/1898

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138419
Approved by: https://github.com/jgong5, https://github.com/ezyang
2024-10-24 05:29:47 +00:00
Richard Barnes
dbd6ada8c3 Clean up a c10::optional and fix documentation (#138700)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138700
Approved by: https://github.com/Skylion007
2024-10-23 20:42:28 +00:00
cyy
38d3c27849 [1/N] Enable cppcoreguidelines-special-member-functions (#137405)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137405
Approved by: https://github.com/ezyang
2024-10-23 00:16:53 +00:00
PyTorch MergeBot
fc9093c3d2 Revert "Remove C10_DEPRECATED (#138406)"
This reverts commit 70ec86d754.

Reverted https://github.com/pytorch/pytorch/pull/138406 on behalf of https://github.com/wdvr due to failing internal tests - see D64714374 ([comment](https://github.com/pytorch/pytorch/pull/138406#issuecomment-2429912896))
2024-10-22 18:00:41 +00:00
Richard Barnes
70ec86d754 Remove C10_DEPRECATED (#138406)
Looking in the code I see
```
// NB: __cplusplus doesn't work for MSVC, so for now MSVC always uses
// the "__declspec(deprecated)" implementation and not the C++14
// "[[deprecated]]" attribute. We tried enabling "[[deprecated]]" for C++14 on
// MSVC, but ran into issues with some older MSVC versions.
```
But looking at the [MSVC C++ support table](https://learn.microsoft.com/en-us/cpp/overview/visual-cpp-language-conformance?view=msvc-170) I see that the `[[deprecated]]` attribute is supported as of MSVC 2015 and that the vast majority of C++17 features became supported in MSVC 2015 _or later_.

Since PyTorch is C++17 now, I infer that PyTorch must not support versions of MSVC earlier than MSVC 2015, so the versions of MSVC supported by PyTorch must support `[[deprecated]]`.

Therefore, since we are finished deprecating old MSVCs we can deprecate `C10_DEPRECATED`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138406
Approved by: https://github.com/cyyever, https://github.com/malfet
2024-10-21 20:57:27 +00:00
cyy
a05b64a38f [5/N] Fix extra warnings brought by clang-tidy-17 (#138403)
Follows #137983
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138403
Approved by: https://github.com/ezyang
2024-10-21 02:59:54 +00:00
Richard Barnes
fddabc6e0b C10_UNUSED to [[maybe_unused]] (#6357) (#138364)
Summary: Pull Request resolved: https://github.com/pytorch/executorch/pull/6357

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138364
Approved by: https://github.com/Skylion007, https://github.com/eqy
2024-10-19 13:17:43 +00:00
Richard Barnes
542f7c8383 Eliminate C10_NODISCARD (#138336)
Test Plan: Sandcastle

Reviewed By: swolchok

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138336
Approved by: https://github.com/Skylion007
2024-10-19 02:54:06 +00:00
Jerry Zhang
6d8c9be54b [reland] Add int1 to int7 dtypes (#137928)
Summary:
Similar to https://github.com/pytorch/pytorch/pull/117208, we want to add int1 to int7 for edge use cases
for weight quantization

Test Plan:
python test/test_quantization.py -k test_uint4_int4_dtype

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D64344944](https://our.internmc.facebook.com/intern/diff/D64344944)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137928
Approved by: https://github.com/malfet
2024-10-18 02:02:08 +00:00
cyy
8c860aef0d [Reland][Environment Variable][3/N] Use thread-safe getenv functions (#137942)
Reland of #137328, which was reverted due to reverting a dependent PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137942
Approved by: https://github.com/eqy
2024-10-15 07:47:24 +00:00
PyTorch MergeBot
df0c2f5cae Revert "[Environment Variable][3/N] Use thread-safe getenv wrapper (#137328)"
This reverts commit 25ac5652d0.

Reverted https://github.com/pytorch/pytorch/pull/137328 on behalf of https://github.com/clee2000 due to need to revert this in order to revert #133896, please rebase and reland, sorry for the churn ([comment](https://github.com/pytorch/pytorch/pull/137328#issuecomment-2412143739))
2024-10-14 20:22:26 +00:00
yanbing-j
561f07fae7 Warn users of mkldnn device usage (#137553)
In https://github.com/pytorch/pytorch/issues/136831, user will use mkldnn device to generate tensor, while mkldnn device is no longer used as device type, and only mkldnn layout is used.

We plan to remove mkldnn device related code in the future release. This PR is to warn users not to use mkldnn device.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137553
Approved by: https://github.com/jgong5, https://github.com/ezyang
2024-10-12 13:42:12 +00:00
cyyever
25ac5652d0 [Environment Variable][3/N] Use thread-safe getenv wrapper (#137328)
Follows #124485

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137328
Approved by: https://github.com/eqy
2024-10-11 23:23:57 +00:00
eellison
8893881867 Invalidate StorageImpl instances when tensor is overwritten with cudagraphs (#125264)
Fixes #104435

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125264
Approved by: https://github.com/ezyang

Co-authored-by: eellison <elias.ellison@gmail.com>
2024-10-09 00:05:52 +00:00
cyy
a2396b2dd8 [2/N] Fix extra warnings brought by clang-tidy-17 (#137459)
Follows #137407

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137459
Approved by: https://github.com/Skylion007
2024-10-08 19:05:02 +00:00
PyTorch MergeBot
7e8dace0de Revert "[ROCm] remove caffe2 from hipify (#137157)"
This reverts commit 40d8260745.

Reverted https://github.com/pytorch/pytorch/pull/137157 on behalf of https://github.com/xw285cornell due to this is breaking internal where we still use caffe2 ([comment](https://github.com/pytorch/pytorch/pull/137157#issuecomment-2400466131))
2024-10-08 17:45:45 +00:00
PyTorch MergeBot
796c3c3415 Revert "Disallow FakeTensor.data_ptr access in eager mode (#137221)"
This reverts commit 7e13e7dd7e.

Reverted https://github.com/pytorch/pytorch/pull/137221 on behalf of https://github.com/jovianjaison due to failing internal tests ([comment](https://github.com/pytorch/pytorch/pull/137221#issuecomment-2397957081))
2024-10-07 21:46:13 +00:00
Jeff Daily
40d8260745 [ROCm] remove caffe2 from hipify (#137157)
- Remove all "MasqueradingAsCUDA" files and classes.
- Do not rename "CUDA" classes to "HIP".

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137157
Approved by: https://github.com/eqy
2024-10-05 12:48:54 +00:00
rzou
7e13e7dd7e Disallow FakeTensor.data_ptr access in eager mode (#137221)
Previously we raised a deprecation warning (beginning PyTorch 2.4). Now
that we are on 2.6, we're completing the deprecation and disallowing
this behavior.

Test Plan:
- tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137221
Approved by: https://github.com/albanD, https://github.com/eellison
2024-10-03 23:47:55 +00:00
PyTorch MergeBot
2ef1454189 Revert "Add int1 to int7 dtypes (#136301)"
This reverts commit bfa16a161d.

Reverted https://github.com/pytorch/pytorch/pull/136301 on behalf of https://github.com/PaliC due to causing internal failures ([comment](https://github.com/pytorch/pytorch/pull/136301#issuecomment-2384119600))
2024-09-30 20:50:49 +00:00
Jerry Zhang
bfa16a161d Add int1 to int7 dtypes (#136301)
Summary:
Similar to https://github.com/pytorch/pytorch/pull/117208, we want to add int1 to int7 for edge use cases
for weight quantization (https://www.internalfb.com/diff/D62464487)

Test Plan:
python test/test_quantization.py -k test_uint4_int4_dtype

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136301
Approved by: https://github.com/ezyang
2024-09-28 02:08:33 +00:00
Jessica Vandebon
baff86dafb [MTIA tensor] allow shallow copy between CPU and MTIA tensors (#135871)
Reviewed By: egienvalue, hanzlfs

Differential Revision: D61662214

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135871
Approved by: https://github.com/egienvalue, https://github.com/nautsimon
2024-09-13 22:13:58 +00:00
Yu, Guangye
6c1da66407 [Reland] Refactor caching device allocator utils (#130923)
# Motivation
Following [[RFC] Intel GPU Runtime Upstreaming for Allocator ](https://github.com/pytorch/pytorch/issues/116322), this PR aims to refactor caching device allocator utils to improve code reuse usage.
This is the first PR, we could prepare some follow-up PRs continuing to refactor the device caching allocator.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD, https://github.com/eqy
2024-09-07 11:14:17 +00:00
Kulin Seth
144fde4fd2 [MPS] Add support for autocast in MPS (#99272)
Fixes https://github.com/pytorch/pytorch/issues/88415

Need to run inductor/test_cpu_select_algorithm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272
Approved by: https://github.com/malfet

Co-authored-by: Siddharth Kotapati <skotapati@apple.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Co-authored-by: Roy Hvaara <roy@lightyear.no>
2024-09-05 23:23:17 +00:00
PyTorch MergeBot
e55c0f59e5 Revert "[Reland] Refactor caching device allocator utils (#130923)"
This reverts commit 9809080b9e.

Reverted https://github.com/pytorch/pytorch/pull/130923 on behalf of https://github.com/kit1980 due to breaking internal builds - Error: Relocation overflow has occured ([comment](https://github.com/pytorch/pytorch/pull/130923#issuecomment-2332640961))
2024-09-05 21:16:14 +00:00
rzou
d7b57c4d63 Fix tensor.data access under inference_mode and compile (#134878)
Fixes https://github.com/pytorch/pytorch/issues/134798

In the regular Tensor case, when you call Tensor.data, there's a check
for if inference mode is active. If it is active, then we don't set the
version counter. We replicate this check for Tensor Subclasses (the bug
was we were trying to set the version counter on a FakeTensor in
inference_mode).

Test Plan:
- new test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134878
Approved by: https://github.com/bdhirsh
2024-09-04 17:55:41 +00:00
FFFrog
5690f003a6 C10_DIAGNOSTIC_PUSH_AND_IGNORED_IF_DEFINED and C10_DIAGNOST should be used in pairs (#135004)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135004
Approved by: https://github.com/aaronenyeshi
2024-09-04 13:14:23 +00:00
Yu, Guangye
9809080b9e [Reland] Refactor caching device allocator utils (#130923)
# Motivation
Following [[RFC] Intel GPU Runtime Upstreaming for Allocator ](https://github.com/pytorch/pytorch/issues/116322), this PR aims to refactor caching device allocator utils to improve code reuse usage.
This is the first PR, we could prepare some follow-up PRs continuing to refactor the device caching allocator.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD, https://github.com/eqy
2024-09-04 05:31:08 +00:00
PyTorch MergeBot
2c88a923a7 Revert "Refactor caching device allocator utils (#130923)"
This reverts commit c45ca8092d.

Reverted https://github.com/pytorch/pytorch/pull/130923 on behalf of https://github.com/ZainRizvi due to Sorry but this appears to be causing internal tests to fail with errors like `error: no type named 'DeviceStats' in namespace 'xxx::xxx:xxxAllocator'; did you mean 'DeviceStatus'?` ([comment](https://github.com/pytorch/pytorch/pull/130923#issuecomment-2315730155))
2024-08-28 15:56:08 +00:00
Yu, Guangye
c45ca8092d Refactor caching device allocator utils (#130923)
# Motivation
Following [[RFC] Intel GPU Runtime Upstreaming for Allocator ](https://github.com/pytorch/pytorch/issues/116322), this PR aims to refactor caching device allocator utils to improve code reuse usage.
This is the first PR, we could prepare some follow-up PRs continuing to refactor the device caching allocator.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD, https://github.com/eqy
2024-08-28 01:35:23 +00:00
Yuanhao Ji
44dadf2506 [Fix] Check name when registering privateuse1 backend (#134071)
do some checks when registering privateuse1 backend to avoid using in-tree deivce names

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134071
Approved by: https://github.com/albanD
2024-08-27 20:28:30 +00:00
Guilherme Leobas
c518b50c4c Remove functorch dispatch keys in legacyExtractDispatchKey (#133018)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133018
Approved by: https://github.com/zou3519
2024-08-13 15:32:01 +00:00
Yuanhao Ji
343071cd96 Fix privateuse1 backend name case (#132980)
### Problem

`get_privateuse1_backend(bool lower_case)` always returns a lower case name and `lower_case` is not used.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132980
Approved by: https://github.com/albanD
2024-08-10 07:39:54 +00:00
PyTorch MergeBot
2764bee942 Revert "[MPS] Add support for autocast in MPS (#99272)"
This reverts commit 6919e8baab.

Reverted https://github.com/pytorch/pytorch/pull/99272 on behalf of https://github.com/clee2000 due to Broke test/inductor/test_cpu_select_algorithm.py::TestSelectAlgorithmCPU::test_quantized_linear_amx_batch_size_3_in_features_128_out_features_64_bias_False_cpu on sm86 jobs [GH job link](https://github.com/pytorch/pytorch/actions/runs/10252979157/job/28367091621) [HUD commit link](6919e8baab) Not caught on PR due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/99272#issuecomment-2269808857))
2024-08-05 19:59:04 +00:00
Kulin Seth
6919e8baab [MPS] Add support for autocast in MPS (#99272)
Fixes https://github.com/pytorch/pytorch/issues/88415

Co-authored-by: Siddharth Kotapati <skotapati@apple.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272
Approved by: https://github.com/malfet
2024-08-05 17:02:30 +00:00
soulitzer
82b6480b0a Update SavedTensorHooks TLS stack to use SafePyObject (#131700)
Previously, we must manually manage refcounting when updating the TLS saved variable stack. With this PR, things should be handled automatically by the SafePyObject.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131700
Approved by: https://github.com/albanD
2024-08-02 16:27:16 +00:00
cyy
b9cb1abf65 [12/N] Use std::optional (#132361)
Follows #132396

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132361
Approved by: https://github.com/eqy
2024-08-02 13:46:46 +00:00
Xu Han
6c1f1563e1 [inductor] fix UndefinedTensorImpl singleton can't export on Windows. (#132326)
This PR fix the `UndefinedTensorImpl::_singleton` can't export on Windows issue.
Snapshot:
<img width="1346" alt="image" src="https://github.com/user-attachments/assets/b34256ac-a0ae-473b-89e6-10d755eaad24">

The reason is MSVC can't export class static data to external linkage, ref: https://learn.microsoft.com/en-us/cpp/cpp/using-dllimport-and-dllexport-in-cpp-classes?view=msvc-170#_pluslang_using_dllimport_and_dllexport_in_c2b2bselectivememberimportexport

I use another singleton implenmentation to avoid the issue, for Windows.

Since this PR, cpp_wrapper on Windows would start to work.
<img width="1916" alt="image" src="https://github.com/user-attachments/assets/c1d7d7e7-64ca-4c6d-9fb7-e3b91e675b58">

Next step, I will enable the cpp_wrapper UTs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132326
Approved by: https://github.com/jgong5, https://github.com/desertfire
2024-08-01 13:37:12 +00:00
Xu Han
39a3c98aa6 [inductor] fix scalar miss constuctor for long type. (#132117)
Fix `long` to `c10::scalar` convert issue.

![image](https://github.com/user-attachments/assets/fc44a170-e293-4688-a185-d189484f6638)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132117
Approved by: https://github.com/jgong5, https://github.com/desertfire
2024-07-31 15:40:48 +00:00
albanD
466ea8ce54 Add fallback() to torch.library (#131707)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131707
Approved by: https://github.com/zou3519
2024-07-27 18:02:35 +00:00
cyy
f83ef69b84 Fix typo in assignment operators (#131890)
Most typos were introduced in #131077
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131890
Approved by: https://github.com/Skylion007
2024-07-27 11:13:42 +00:00
cyyever
451462dbff [1/N] Add missing constructors or assignment operators (#131077)
Just mark them as deleted in most cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131077
Approved by: https://github.com/ezyang
2024-07-24 12:09:39 +00:00