pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
cyy	3907f36808	Turn some variables and functions into static (#136847 ) Re-check some files and mark variables and functions into static and fix other warnings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136847 Approved by: https://github.com/ezyang	2024-10-29 17:01:56 +00:00
cyy	e201460f8a	[2/N] Fix Wextra-semi warnings (#139142 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139142 Approved by: https://github.com/ezyang	2024-10-29 08:14:37 +00:00
cyy	383d9e3de6	[4/N] Fix cppcoreguidelines-special-member-functions warnings (#139027 ) Follows #138796 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139027 Approved by: https://github.com/ezyang	2024-10-29 00:18:18 +00:00
cyy	f4f0f2995d	Fix Wextra-semi warnings (#139000 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139000 Approved by: https://github.com/ezyang	2024-10-28 21:48:51 +00:00
Yu, Guangye	40c098f731	Introduce a device-agnostic runtime API design (#132204 ) # Motivation According to [[RFC]A device-agnostic Python runtime API design for stream-based accelerators](https://github.com/pytorch/pytorch/issues/128403), this PR intends to introduce a device-agnostic runtime API design. I personally prefer the Simple Version APIs that no longer accept the device type as an input argument. It means we will leverage `getAccelerator` to fetch the current accelerator. And it is flexible to expand these APIs to handle multiple types of accelerator scenarios. The design does NOT break the previous design philosophies. I also believe that namespace torch.accelerator is better. It lets users know that the APIs they are calling are running on an accelerator rather than CPU. This is important. Meanwhile, we can follow a simple API design principle: 1. Device-agnostic APIs should be placed under the torch.accelerator namespace and not accept a device_type optional parameter. 2. Device-specific APIs should be placed under device-specific submodules. 3. APIS required by both CPU and accelerators should be placed under the torch namespace and accept a device_type optional parameter. Also, I list the pros and cons of Simple Version here: Pros: - `torch.accelerator.foo` will have the same input argument as `torch.xxx.foo`, bringing a better user experience; - more concise, facilitate the developer to write a device-agnostic code. Cons: - no obvious drawbacks. # Additional Context I list the new APIs here: ```python torch.accelerator.is_available() -> bool: torch.accelerator.current_accelerator() -> torch.device: torch.accelerator.device_count() -> int: torch.accelerator.current_device_idx() -> int: torch.accelerator.set_device_idx(device: Union[torch.device, str, int, None]) -> None: torch.accelerator.current_stream(device: Union[torch.device, str, int, None]) -> torch.Stream: torch.accelerator.set_stream(stream: torch.Stream) -> None: torch.accelerator.synchronize(device: Union[torch.device, str, int, None]) -> None: ``` According to the discussion with Alban, we decide to change the API name `set_device` to `set_device_idx` and `current_device` to `current_device_idx` for more explicit. And will submit other PR to support device and stream context manager. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132204 Approved by: https://github.com/EikanWang, https://github.com/abhilash1910, https://github.com/gujinghui, https://github.com/albanD	2024-10-27 10:37:09 +00:00
Richard Barnes	42994234a6	std::value/std::type -> std::_v/std::_t (#138746 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138746 Approved by: https://github.com/cyyever, https://github.com/malfet	2024-10-26 20:59:24 +00:00
Richard Barnes	69af467d4f	Eliminate c10::value_or_else (#138818 ) Test Plan: Sandcastle Differential Revision: D64857418 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138818 Approved by: https://github.com/malfet, https://github.com/Skylion007	2024-10-25 17:59:01 +00:00
Richard Barnes	8f62832189	c10::nullopt -> std::nullopt (#138701 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138701 Approved by: https://github.com/Skylion007, https://github.com/malfet	2024-10-24 15:03:32 +00:00
Xu Han	96b30dcb25	[Windows][cpu] mkl use mimalloc as allocator on Windows (#138419 ) We did a lot of optimization for PyTorch Windows, and we got good progress of it. But still some models have performance gap between PyTorch Windows and PyTorch Linux. Ref: https://pytorch.org/blog/performance-boost-windows/#conclusion From the blog conclusion, we found the `ResNet50` is typical case of it. Let's focus on the `ResNet50`, and collect the profiling log: ```cmd (nightly) D:\xu_git\dnnl_cb>python test_script_resnet50.py --------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls --------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ model_inference 3.91% 682.427ms 100.00% 17.448s 17.448s 1 aten::conv2d 0.18% 30.906ms 64.79% 11.305s 2.133ms 5300 aten::convolution 0.45% 78.031ms 64.62% 11.275s 2.127ms 5300 aten::_convolution 0.30% 51.670ms 64.17% 11.196s 2.113ms 5300 aten::mkldnn_convolution 63.58% 11.093s 63.87% 11.145s 2.103ms 5300 aten::batch_norm 0.13% 23.536ms 20.10% 3.506s 661.580us 5300 aten::_batch_norm_impl_index 0.28% 49.486ms 19.96% 3.483s 657.139us 5300 aten::native_batch_norm 19.26% 3.360s 19.64% 3.427s 646.615us 5300 aten::max_pool2d 0.01% 1.038ms 5.84% 1.018s 10.181ms 100 aten::max_pool2d_with_indices 5.83% 1.017s 5.83% 1.017s 10.171ms 100 aten::add_ 3.38% 588.907ms 3.38% 588.907ms 85.349us 6900 aten::relu_ 0.35% 60.358ms 1.67% 292.155ms 59.624us 4900 aten::clamp_min_ 1.33% 231.797ms 1.33% 231.797ms 47.306us 4900 aten::empty 0.46% 80.195ms 0.46% 80.195ms 1.513us 53000 aten::linear 0.01% 927.300us 0.23% 39.353ms 393.532us 100 aten::addmm 0.20% 35.379ms 0.21% 37.016ms 370.155us 100 aten::empty_like 0.12% 20.455ms 0.17% 29.976ms 5.656us 5300 aten::as_strided_ 0.11% 18.830ms 0.11% 18.830ms 3.553us 5300 aten::adaptive_avg_pool2d 0.00% 419.900us 0.08% 14.265ms 142.647us 100 aten::mean 0.01% 1.737ms 0.08% 13.845ms 138.448us 100 aten::sum 0.05% 8.113ms 0.05% 8.648ms 86.479us 100 aten::resize_ 0.03% 5.182ms 0.03% 5.182ms 0.978us 5300 aten::div_ 0.01% 1.445ms 0.02% 3.460ms 34.600us 100 aten::to 0.00% 337.000us 0.01% 2.015ms 20.154us 100 aten::_to_copy 0.01% 977.500us 0.01% 1.678ms 16.784us 100 aten::copy_ 0.01% 1.474ms 0.01% 1.474ms 7.371us 200 aten::t 0.00% 775.900us 0.01% 1.410ms 14.104us 100 aten::flatten 0.00% 420.900us 0.01% 1.311ms 13.106us 100 aten::view 0.01% 889.700us 0.01% 889.700us 8.897us 100 aten::transpose 0.00% 410.700us 0.00% 634.500us 6.345us 100 aten::expand 0.00% 496.800us 0.00% 566.800us 5.668us 100 aten::fill_ 0.00% 534.800us 0.00% 534.800us 5.348us 100 aten::as_strided 0.00% 293.800us 0.00% 293.800us 1.469us 200 aten::empty_strided 0.00% 241.700us 0.00% 241.700us 2.417us 100 aten::resolve_conj 0.00% 54.800us 0.00% 54.800us 0.274us 200 --------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 17.448s Execution time: 20.02380895614624 ``` We found the major kernel consume CPU resource is `aten::mkldnn_convolution`. It was dispatched to `MKLDNN`. Acturally, we had optimized memory allocation via integrated mimalloc to pytorch C10 module. It helps PyTorch Windows boost a lot, but it does not cover `MKL` and `MKLDNN`'s intermediary temporary memory. We still have potential to improve PyTorch Windows performance via optimize `MKL` and `MKLDNN`'s intermediary temporary memory. So, I discussed with Intel MKL team, and get a method to register high performance memory allocation API to MKL, and it would help MKL to boost memory performance. Please check the online document: https://www.intel.com/content/www/us/en/docs/onemkl/developer-guide-windows/2023-0/redefining-memory-functions.html This PR is optimize MKL memory alloction performance on Windows, via register mi_malloc to MKL. PR Changes: 1. Add cmake option: `USE_MIMALLOC_ON_MKL`, It is sub-option of `USE_MIMALLOC`. 2. Wrap and export mi_malloc APIs in C10, when `USE_MIMALLOC_ON_MKL` is `ON`. 3. Add MklAllocationHelp.cpp to register allocation APIs to MKL, when `USE_MIMALLOC_ON_MKL` is `ON`. For `oneDNN`, it is still tracking in this proposal: https://github.com/oneapi-src/oneDNN/issues/1898 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138419 Approved by: https://github.com/jgong5, https://github.com/ezyang	2024-10-24 05:29:47 +00:00
Richard Barnes	dbd6ada8c3	Clean up a c10::optional and fix documentation (#138700 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/138700 Approved by: https://github.com/Skylion007	2024-10-23 20:42:28 +00:00
cyy	38d3c27849	[1/N] Enable cppcoreguidelines-special-member-functions (#137405 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/137405 Approved by: https://github.com/ezyang	2024-10-23 00:16:53 +00:00
PyTorch MergeBot	fc9093c3d2	Revert "Remove C10_DEPRECATED (#138406 )" This reverts commit `70ec86d754`. Reverted https://github.com/pytorch/pytorch/pull/138406 on behalf of https://github.com/wdvr due to failing internal tests - see D64714374 ([comment](https://github.com/pytorch/pytorch/pull/138406#issuecomment-2429912896))	2024-10-22 18:00:41 +00:00
Richard Barnes	70ec86d754	Remove C10_DEPRECATED (#138406 ) Looking in the code I see ``` // NB: __cplusplus doesn't work for MSVC, so for now MSVC always uses // the "__declspec(deprecated)" implementation and not the C++14 // "[[deprecated]]" attribute. We tried enabling "[[deprecated]]" for C++14 on // MSVC, but ran into issues with some older MSVC versions. ``` But looking at the [MSVC C++ support table](https://learn.microsoft.com/en-us/cpp/overview/visual-cpp-language-conformance?view=msvc-170) I see that the `[[deprecated]]` attribute is supported as of MSVC 2015 and that the vast majority of C++17 features became supported in MSVC 2015 _or later_. Since PyTorch is C++17 now, I infer that PyTorch must not support versions of MSVC earlier than MSVC 2015, so the versions of MSVC supported by PyTorch must support `[[deprecated]]`. Therefore, since we are finished deprecating old MSVCs we can deprecate `C10_DEPRECATED`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138406 Approved by: https://github.com/cyyever, https://github.com/malfet	2024-10-21 20:57:27 +00:00
cyy	a05b64a38f	[5/N] Fix extra warnings brought by clang-tidy-17 (#138403 ) Follows #137983 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138403 Approved by: https://github.com/ezyang	2024-10-21 02:59:54 +00:00
Richard Barnes	fddabc6e0b	C10_UNUSED to [[maybe_unused]] (#6357 ) (#138364 ) Summary: Pull Request resolved: https://github.com/pytorch/executorch/pull/6357 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138364 Approved by: https://github.com/Skylion007, https://github.com/eqy	2024-10-19 13:17:43 +00:00
Richard Barnes	542f7c8383	Eliminate C10_NODISCARD (#138336 ) Test Plan: Sandcastle Reviewed By: swolchok Pull Request resolved: https://github.com/pytorch/pytorch/pull/138336 Approved by: https://github.com/Skylion007	2024-10-19 02:54:06 +00:00
Jerry Zhang	6d8c9be54b	[reland] Add int1 to int7 dtypes (#137928 ) Summary: Similar to https://github.com/pytorch/pytorch/pull/117208, we want to add int1 to int7 for edge use cases for weight quantization Test Plan: python test/test_quantization.py -k test_uint4_int4_dtype Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D64344944](https://our.internmc.facebook.com/intern/diff/D64344944) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137928 Approved by: https://github.com/malfet	2024-10-18 02:02:08 +00:00
cyy	8c860aef0d	[Reland][Environment Variable][3/N] Use thread-safe getenv functions (#137942 ) Reland of #137328, which was reverted due to reverting a dependent PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137942 Approved by: https://github.com/eqy	2024-10-15 07:47:24 +00:00
PyTorch MergeBot	df0c2f5cae	Revert "[Environment Variable][3/N] Use thread-safe getenv wrapper (#137328 )" This reverts commit `25ac5652d0`. Reverted https://github.com/pytorch/pytorch/pull/137328 on behalf of https://github.com/clee2000 due to need to revert this in order to revert #133896, please rebase and reland, sorry for the churn ([comment](https://github.com/pytorch/pytorch/pull/137328#issuecomment-2412143739))	2024-10-14 20:22:26 +00:00
yanbing-j	561f07fae7	Warn users of mkldnn device usage (#137553 ) In https://github.com/pytorch/pytorch/issues/136831, user will use mkldnn device to generate tensor, while mkldnn device is no longer used as device type, and only mkldnn layout is used. We plan to remove mkldnn device related code in the future release. This PR is to warn users not to use mkldnn device. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137553 Approved by: https://github.com/jgong5, https://github.com/ezyang	2024-10-12 13:42:12 +00:00
cyyever	25ac5652d0	[Environment Variable][3/N] Use thread-safe getenv wrapper (#137328 ) Follows #124485 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137328 Approved by: https://github.com/eqy	2024-10-11 23:23:57 +00:00
eellison	8893881867	Invalidate StorageImpl instances when tensor is overwritten with cudagraphs (#125264 ) Fixes #104435 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125264 Approved by: https://github.com/ezyang Co-authored-by: eellison <elias.ellison@gmail.com>	2024-10-09 00:05:52 +00:00
cyy	a2396b2dd8	[2/N] Fix extra warnings brought by clang-tidy-17 (#137459 ) Follows #137407 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137459 Approved by: https://github.com/Skylion007	2024-10-08 19:05:02 +00:00
PyTorch MergeBot	7e8dace0de	Revert "[ROCm] remove caffe2 from hipify (#137157 )" This reverts commit `40d8260745`. Reverted https://github.com/pytorch/pytorch/pull/137157 on behalf of https://github.com/xw285cornell due to this is breaking internal where we still use caffe2 ([comment](https://github.com/pytorch/pytorch/pull/137157#issuecomment-2400466131))	2024-10-08 17:45:45 +00:00
PyTorch MergeBot	796c3c3415	Revert "Disallow FakeTensor.data_ptr access in eager mode (#137221 )" This reverts commit `7e13e7dd7e`. Reverted https://github.com/pytorch/pytorch/pull/137221 on behalf of https://github.com/jovianjaison due to failing internal tests ([comment](https://github.com/pytorch/pytorch/pull/137221#issuecomment-2397957081))	2024-10-07 21:46:13 +00:00
Jeff Daily	40d8260745	[ROCm] remove caffe2 from hipify (#137157 ) - Remove all "MasqueradingAsCUDA" files and classes. - Do not rename "CUDA" classes to "HIP". Pull Request resolved: https://github.com/pytorch/pytorch/pull/137157 Approved by: https://github.com/eqy	2024-10-05 12:48:54 +00:00
rzou	7e13e7dd7e	Disallow FakeTensor.data_ptr access in eager mode (#137221 ) Previously we raised a deprecation warning (beginning PyTorch 2.4). Now that we are on 2.6, we're completing the deprecation and disallowing this behavior. Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/137221 Approved by: https://github.com/albanD, https://github.com/eellison	2024-10-03 23:47:55 +00:00
PyTorch MergeBot	2ef1454189	Revert "Add int1 to int7 dtypes (#136301 )" This reverts commit `bfa16a161d`. Reverted https://github.com/pytorch/pytorch/pull/136301 on behalf of https://github.com/PaliC due to causing internal failures ([comment](https://github.com/pytorch/pytorch/pull/136301#issuecomment-2384119600))	2024-09-30 20:50:49 +00:00
Jerry Zhang	bfa16a161d	Add int1 to int7 dtypes (#136301 ) Summary: Similar to https://github.com/pytorch/pytorch/pull/117208, we want to add int1 to int7 for edge use cases for weight quantization (https://www.internalfb.com/diff/D62464487) Test Plan: python test/test_quantization.py -k test_uint4_int4_dtype Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/136301 Approved by: https://github.com/ezyang	2024-09-28 02:08:33 +00:00
Jessica Vandebon	baff86dafb	[MTIA tensor] allow shallow copy between CPU and MTIA tensors (#135871 ) Reviewed By: egienvalue, hanzlfs Differential Revision: D61662214 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135871 Approved by: https://github.com/egienvalue, https://github.com/nautsimon	2024-09-13 22:13:58 +00:00
Yu, Guangye	6c1da66407	[Reland] Refactor caching device allocator utils (#130923 ) # Motivation Following [[RFC] Intel GPU Runtime Upstreaming for Allocator ](https://github.com/pytorch/pytorch/issues/116322), this PR aims to refactor caching device allocator utils to improve code reuse usage. This is the first PR, we could prepare some follow-up PRs continuing to refactor the device caching allocator. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923 Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD, https://github.com/eqy	2024-09-07 11:14:17 +00:00
Kulin Seth	144fde4fd2	[MPS] Add support for autocast in MPS (#99272 ) Fixes https://github.com/pytorch/pytorch/issues/88415 Need to run inductor/test_cpu_select_algorithm Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272 Approved by: https://github.com/malfet Co-authored-by: Siddharth Kotapati <skotapati@apple.com> Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Co-authored-by: Roy Hvaara <roy@lightyear.no>	2024-09-05 23:23:17 +00:00
PyTorch MergeBot	e55c0f59e5	Revert "[Reland] Refactor caching device allocator utils (#130923 )" This reverts commit `9809080b9e`. Reverted https://github.com/pytorch/pytorch/pull/130923 on behalf of https://github.com/kit1980 due to breaking internal builds - Error: Relocation overflow has occured ([comment](https://github.com/pytorch/pytorch/pull/130923#issuecomment-2332640961))	2024-09-05 21:16:14 +00:00
rzou	d7b57c4d63	Fix tensor.data access under inference_mode and compile (#134878 ) Fixes https://github.com/pytorch/pytorch/issues/134798 In the regular Tensor case, when you call Tensor.data, there's a check for if inference mode is active. If it is active, then we don't set the version counter. We replicate this check for Tensor Subclasses (the bug was we were trying to set the version counter on a FakeTensor in inference_mode). Test Plan: - new test Pull Request resolved: https://github.com/pytorch/pytorch/pull/134878 Approved by: https://github.com/bdhirsh	2024-09-04 17:55:41 +00:00
FFFrog	5690f003a6	C10_DIAGNOSTIC_PUSH_AND_IGNORED_IF_DEFINED and C10_DIAGNOST should be used in pairs (#135004 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135004 Approved by: https://github.com/aaronenyeshi	2024-09-04 13:14:23 +00:00
Yu, Guangye	9809080b9e	[Reland] Refactor caching device allocator utils (#130923 ) # Motivation Following [[RFC] Intel GPU Runtime Upstreaming for Allocator ](https://github.com/pytorch/pytorch/issues/116322), this PR aims to refactor caching device allocator utils to improve code reuse usage. This is the first PR, we could prepare some follow-up PRs continuing to refactor the device caching allocator. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923 Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD, https://github.com/eqy	2024-09-04 05:31:08 +00:00
PyTorch MergeBot	2c88a923a7	Revert "Refactor caching device allocator utils (#130923 )" This reverts commit `c45ca8092d`. Reverted https://github.com/pytorch/pytorch/pull/130923 on behalf of https://github.com/ZainRizvi due to Sorry but this appears to be causing internal tests to fail with errors like `error: no type named 'DeviceStats' in namespace 'xxx::xxx:xxxAllocator'; did you mean 'DeviceStatus'?` ([comment](https://github.com/pytorch/pytorch/pull/130923#issuecomment-2315730155))	2024-08-28 15:56:08 +00:00
Yu, Guangye	c45ca8092d	Refactor caching device allocator utils (#130923 ) # Motivation Following [[RFC] Intel GPU Runtime Upstreaming for Allocator ](https://github.com/pytorch/pytorch/issues/116322), this PR aims to refactor caching device allocator utils to improve code reuse usage. This is the first PR, we could prepare some follow-up PRs continuing to refactor the device caching allocator. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923 Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD, https://github.com/eqy	2024-08-28 01:35:23 +00:00
Yuanhao Ji	44dadf2506	[Fix] Check name when registering privateuse1 backend (#134071 ) do some checks when registering privateuse1 backend to avoid using in-tree deivce names Pull Request resolved: https://github.com/pytorch/pytorch/pull/134071 Approved by: https://github.com/albanD	2024-08-27 20:28:30 +00:00
Guilherme Leobas	c518b50c4c	Remove functorch dispatch keys in `legacyExtractDispatchKey` (#133018 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133018 Approved by: https://github.com/zou3519	2024-08-13 15:32:01 +00:00
Yuanhao Ji	343071cd96	Fix privateuse1 backend name case (#132980 ) ### Problem `get_privateuse1_backend(bool lower_case)` always returns a lower case name and `lower_case` is not used. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132980 Approved by: https://github.com/albanD	2024-08-10 07:39:54 +00:00
PyTorch MergeBot	2764bee942	Revert "[MPS] Add support for autocast in MPS (#99272 )" This reverts commit `6919e8baab`. Reverted https://github.com/pytorch/pytorch/pull/99272 on behalf of https://github.com/clee2000 due to Broke test/inductor/test_cpu_select_algorithm.py::TestSelectAlgorithmCPU::test_quantized_linear_amx_batch_size_3_in_features_128_out_features_64_bias_False_cpu on sm86 jobs [GH job link](https://github.com/pytorch/pytorch/actions/runs/10252979157/job/28367091621) [HUD commit link](`6919e8baab`) Not caught on PR due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/99272#issuecomment-2269808857))	2024-08-05 19:59:04 +00:00
Kulin Seth	6919e8baab	[MPS] Add support for autocast in MPS (#99272 ) Fixes https://github.com/pytorch/pytorch/issues/88415 Co-authored-by: Siddharth Kotapati <skotapati@apple.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272 Approved by: https://github.com/malfet	2024-08-05 17:02:30 +00:00
soulitzer	82b6480b0a	Update SavedTensorHooks TLS stack to use SafePyObject (#131700 ) Previously, we must manually manage refcounting when updating the TLS saved variable stack. With this PR, things should be handled automatically by the SafePyObject. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131700 Approved by: https://github.com/albanD	2024-08-02 16:27:16 +00:00
cyy	b9cb1abf65	[12/N] Use std::optional (#132361 ) Follows #132396 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132361 Approved by: https://github.com/eqy	2024-08-02 13:46:46 +00:00
Xu Han	6c1f1563e1	[inductor] fix UndefinedTensorImpl singleton can't export on Windows. (#132326 ) This PR fix the `UndefinedTensorImpl::_singleton` can't export on Windows issue. Snapshot: <img width="1346" alt="image" src="https://github.com/user-attachments/assets/b34256ac-a0ae-473b-89e6-10d755eaad24"> The reason is MSVC can't export class static data to external linkage, ref: https://learn.microsoft.com/en-us/cpp/cpp/using-dllimport-and-dllexport-in-cpp-classes?view=msvc-170#_pluslang_using_dllimport_and_dllexport_in_c2b2bselectivememberimportexport I use another singleton implenmentation to avoid the issue, for Windows. Since this PR, cpp_wrapper on Windows would start to work. <img width="1916" alt="image" src="https://github.com/user-attachments/assets/c1d7d7e7-64ca-4c6d-9fb7-e3b91e675b58"> Next step, I will enable the cpp_wrapper UTs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132326 Approved by: https://github.com/jgong5, https://github.com/desertfire	2024-08-01 13:37:12 +00:00
Xu Han	39a3c98aa6	[inductor] fix scalar miss constuctor for long type. (#132117 ) Fix `long` to `c10::scalar` convert issue. ![image](https://github.com/user-attachments/assets/fc44a170-e293-4688-a185-d189484f6638) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132117 Approved by: https://github.com/jgong5, https://github.com/desertfire	2024-07-31 15:40:48 +00:00
albanD	466ea8ce54	Add fallback() to torch.library (#131707 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131707 Approved by: https://github.com/zou3519	2024-07-27 18:02:35 +00:00
cyy	f83ef69b84	Fix typo in assignment operators (#131890 ) Most typos were introduced in #131077 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131890 Approved by: https://github.com/Skylion007	2024-07-27 11:13:42 +00:00
cyyever	451462dbff	[1/N] Add missing constructors or assignment operators (#131077 ) Just mark them as deleted in most cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131077 Approved by: https://github.com/ezyang	2024-07-24 12:09:39 +00:00

1 2 3 4 5 ...

1288 Commits