pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Nikita Shulga	3a3638be50	[BE] Enable Scalar.h compilation on 32-bit system (#142235 ) By hiding ambiguous Scalar(long long) constructor behind `std::enable_if_t<sizeof(void *) == 8>` Followup after https://github.com/pytorch/pytorch/pull/141244 Test Plan: Run `printf "#include <c10/core/Scalar.h>\n c10::Scalar x(3);" \| gcc -x c++ -std=c++17 -I. -Ibuild - -c` on ARMv7 system. Before this change it failed with: ``` In file included from <stdin>:1: ./c10/core/Scalar.h:83:3: error: ‘c10::Scalar::Scalar(long long int)’ cannot be overloaded with ‘c10::Scalar::Scalar(int64_t)’ 83 \| Scalar(long long vv) : Scalar(vv, true) {} \| ^~~~~~ ./c10/core/Scalar.h:50:3: note: previous declaration ‘c10::Scalar::Scalar(int64_t)’ 50 \| Scalar(type vv) : Scalar(vv, true) {} \| ^~~~~~ ./c10/core/ScalarType.h:288:3: note: in expansion of macro ‘DEFINE_IMPLICIT_CTOR’ 288 \| _(int64_t, Long) \ \| ^ ./c10/core/Scalar.h:52:3: note: in expansion of macro ‘AT_FORALL_SCALAR_TYPES_AND7’ 52 \| AT_FORALL_SCALAR_TYPES_AND7( \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/142235 Approved by: https://github.com/Skylion007	2024-12-07 01:05:55 +00:00
Brian Hirsh	20912ba582	fix incorrect c10::SymFloat::sqrt (#141728 ) Fixes the silent correctness issue for SDPA in https://github.com/pytorch/pytorch/issues/141710 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141728 Approved by: https://github.com/Skylion007, https://github.com/ezyang, https://github.com/drisspg ghstack dependencies: #141725	2024-12-03 23:34:16 +00:00
Edward Z. Yang	5212ec3879	Add admonition about as_float_unchecked() (#141742 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/141742 Approved by: https://github.com/bdhirsh	2024-11-28 06:25:18 +00:00
Umberto Valleriani	3becdaf8a7	[c10] Fix static_assert for 32-bit systems (#141244 ) The `__ANDROID__` macro was used as a proxy to check whether compilation is targeting a 32-bit or 64-bit system, causing build failures on non-Android 32-bit Linux targets like ARMv7. This modification adjusts the check to fail if and only if int64_t and long are not the same on 64-bit systems, i.e. on systems where `sizeof(void*) == 8`. Like I said in the issue #141043 , I'm not sure whether a different `Scalar` constructor should be defined in the 32-bit case. My code does not break but I'm not sure other people's code won't. Fixes #141043 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141244 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2024-11-28 03:11:52 +00:00
cyy	45ed7c13fa	Remove unneeded std::make_optional (#141567 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/141567 Approved by: https://github.com/albanD	2024-11-28 00:05:21 +00:00
Richard Barnes	fca0f34b83	Switch c10::string_view to std::string_view (#139635 ) Shortens `string_view_starts_with` to `starts_with`. Adds some missing headers. Isolates `c10_string_view` to use with `get_fully_qualified_name`. Test Plan: Sandcastle Reviewed By: ezyang Differential Revision: D64833558 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139635 Approved by: https://github.com/Skylion007, https://github.com/ezyang	2024-11-27 01:41:18 +00:00
cyy	2f082e1e56	[13/N] Fix extra warnings brought by clang-tidy-17 (#140897 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140897 Approved by: https://github.com/ezyang	2024-11-27 00:35:19 +00:00
cyy	1bdb92cbff	[2/N] Use thread-safe strerror (#141011 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/141011 Approved by: https://github.com/ezyang	2024-11-22 07:02:30 +00:00
Scott Wolchok	f630799587	move c10::overflows to its own header (#140564 ) Working on moving `complex<Half>` to complex.h instead of Half.h; this depends on complex and isn't used particularly widely. Differential Revision: [D65888038](https://our.internmc.facebook.com/intern/diff/D65888038/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140564 Approved by: https://github.com/ezyang, https://github.com/Skylion007, https://github.com/malfet	2024-11-18 15:56:21 +00:00
fan.mo	43edb94f8a	[Quantization][PrivateUse1] Adding more support QuantizedPrivateuse1 backends (#139860 ) Here are some explanations of this PR. 1. Changes in `aten/src/ATen/core/Tensor.cpp` and `c10/core/DispatchKey.cpp`: Support the toString method for the `QuantizedPrivateUse1` backend, so that PyTorch prints the correct backend string for it. 2. Add header `DispatchStub.h` in `aten/src/ATen/native/quantized/IndexKernel.h`: If this header is not included, we can't use `masked_fill_kernel_quantized_stub` even if we include the `IndexKernel.h` header; compilation fails with an error. 3. Add multiple `TORCH_API`s in `aten/src/ATen/native/quantized/AffineQuantizer.h`: these functions are useful for other privateuse1 backends that support quantization functions; if these `TORCH_API`s are missing, an error is thrown at runtime (undefined symbol). Pull Request resolved: https://github.com/pytorch/pytorch/pull/139860 Approved by: https://github.com/bdhirsh	2024-11-18 05:09:59 +00:00
Bob Ren	602ae9cbcf	Specialize symfloats during equality checks (#140830 ) Fixes `PYTORCH_OPINFO_SAMPLE_INPUT_INDEX=0 python test/inductor/test_torchinductor_opinfo.py TestInductorOpInfoCPU.test_comprehensive_nn_functional_local_response_norm_cpu_float32` when `specialize_float=False` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140830 Approved by: https://github.com/ezyang	2024-11-17 06:35:22 +00:00
sdp	83b6d91d08	[Intel GPU] Add NestedTensorXPU to parseDispatchKey and codegen (#140461 ) Add `NestedTensorXPU` dispatch key. ``` >>> nt = torch.nested.nested_tensor([]).to("xpu") >>> nt nested_tensor([ ], device='xpu:0') >>> nt.is_xpu True ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140461 Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/ezyang	2024-11-14 18:54:41 +00:00
cyy	7624d625c0	[Reland][7/N] Fix Wextra-semi warning (#140342 ) Reland of #140225 to fix a change in FBCODE_CAFFE2 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140342 Approved by: https://github.com/kit1980	2024-11-12 18:55:31 +00:00
PyTorch MergeBot	dbb55b448b	Revert "[7/N] Fix Wextra-semi warning (#140225 )" This reverts commit `ffb979032d`. Reverted https://github.com/pytorch/pytorch/pull/140225 on behalf of https://github.com/kit1980 due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/140225#issuecomment-2469312229))	2024-11-12 00:02:06 +00:00
Bob Ren	a96aadf0a0	fix specialization logic in Scalar.h (#140280 ) Fixes `test/inductor/test_torchinductor_opinfo.py TestInductorOpInfoCUDA.test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float64` when `specialize_float=False` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140280 Approved by: https://github.com/ezyang	2024-11-11 23:51:15 +00:00
cyy	ffb979032d	[7/N] Fix Wextra-semi warning (#140225 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140225 Approved by: https://github.com/ezyang	2024-11-10 14:28:10 +00:00
cyy	d558c1a047	Enable cppcoreguidelines-special-member-functions (#139132 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139132 Approved by: https://github.com/sraikund16	2024-11-06 13:42:20 +00:00
PyTorch MergeBot	10d7729333	Revert "Enable cppcoreguidelines-special-member-functions (#139132 )" This reverts commit `a9b4989c72`. Reverted https://github.com/pytorch/pytorch/pull/139132 on behalf of https://github.com/ZainRizvi due to Sorry but this fails on trunk. See inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm [GH job link](https://github.com/pytorch/pytorch/actions/runs/11699366379/job/32591132460) [HUD commit link](`22e89ea2aa`) ([comment](https://github.com/pytorch/pytorch/pull/139132#issuecomment-2459743145))	2024-11-06 13:27:42 +00:00
cyy	a9b4989c72	Enable cppcoreguidelines-special-member-functions (#139132 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139132 Approved by: https://github.com/sraikund16	2024-11-06 07:59:09 +00:00
cyy	a2bc2e38f9	Use clang-tidy 17 (#139678 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139678 Approved by: https://github.com/Skylion007	2024-11-05 16:00:25 +00:00
cyy	3179eb15ae	[1/N] Remove usage of C array (#139567 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139567 Approved by: https://github.com/Skylion007, https://github.com/ezyang	2024-11-04 04:52:46 +00:00
cyy	3907f36808	Turn some variables and functions into static (#136847 ) Re-check some files, mark variables and functions as static, and fix other warnings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136847 Approved by: https://github.com/ezyang	2024-10-29 17:01:56 +00:00
cyy	e201460f8a	[2/N] Fix Wextra-semi warnings (#139142 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139142 Approved by: https://github.com/ezyang	2024-10-29 08:14:37 +00:00
cyy	383d9e3de6	[4/N] Fix cppcoreguidelines-special-member-functions warnings (#139027 ) Follows #138796 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139027 Approved by: https://github.com/ezyang	2024-10-29 00:18:18 +00:00
cyy	f4f0f2995d	Fix Wextra-semi warnings (#139000 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139000 Approved by: https://github.com/ezyang	2024-10-28 21:48:51 +00:00
Yu, Guangye	40c098f731	Introduce a device-agnostic runtime API design (#132204 ) # Motivation According to [[RFC]A device-agnostic Python runtime API design for stream-based accelerators](https://github.com/pytorch/pytorch/issues/128403), this PR intends to introduce a device-agnostic runtime API design. I personally prefer the Simple Version APIs that no longer accept the device type as an input argument. It means we will leverage `getAccelerator` to fetch the current accelerator. And it is flexible to expand these APIs to handle multiple types of accelerator scenarios. The design does NOT break the previous design philosophies. I also believe that namespace torch.accelerator is better. It lets users know that the APIs they are calling are running on an accelerator rather than CPU. This is important. Meanwhile, we can follow a simple API design principle: 1. Device-agnostic APIs should be placed under the torch.accelerator namespace and not accept a device_type optional parameter. 2. Device-specific APIs should be placed under device-specific submodules. 3. APIs required by both CPU and accelerators should be placed under the torch namespace and accept a device_type optional parameter. Also, I list the pros and cons of Simple Version here: Pros: - `torch.accelerator.foo` will have the same input argument as `torch.xxx.foo`, bringing a better user experience; - more concise, making it easier for developers to write device-agnostic code. Cons: - no obvious drawbacks. # Additional Context I list the new APIs here: ```python torch.accelerator.is_available() -> bool: torch.accelerator.current_accelerator() -> torch.device: torch.accelerator.device_count() -> int: torch.accelerator.current_device_idx() -> int: torch.accelerator.set_device_idx(device: Union[torch.device, str, int, None]) -> None: torch.accelerator.current_stream(device: Union[torch.device, str, int, None]) -> torch.Stream: torch.accelerator.set_stream(stream: torch.Stream) -> None: torch.accelerator.synchronize(device: Union[torch.device, str, int, None]) -> None: ``` According to the discussion with Alban, we decided to change the API name `set_device` to `set_device_idx` and `current_device` to `current_device_idx` to be more explicit. We will submit another PR to support device and stream context managers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132204 Approved by: https://github.com/EikanWang, https://github.com/abhilash1910, https://github.com/gujinghui, https://github.com/albanD	2024-10-27 10:37:09 +00:00
Richard Barnes	42994234a6	std::value/std::type -> std::_v/std::_t (#138746 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138746 Approved by: https://github.com/cyyever, https://github.com/malfet	2024-10-26 20:59:24 +00:00
Richard Barnes	69af467d4f	Eliminate c10::value_or_else (#138818 ) Test Plan: Sandcastle Differential Revision: D64857418 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138818 Approved by: https://github.com/malfet, https://github.com/Skylion007	2024-10-25 17:59:01 +00:00
Richard Barnes	8f62832189	c10::nullopt -> std::nullopt (#138701 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138701 Approved by: https://github.com/Skylion007, https://github.com/malfet	2024-10-24 15:03:32 +00:00
Xu Han	96b30dcb25	[Windows][cpu] mkl use mimalloc as allocator on Windows (#138419 ) We have done a lot of optimization for PyTorch on Windows and made good progress. But some models still have a performance gap between PyTorch Windows and PyTorch Linux. Ref: https://pytorch.org/blog/performance-boost-windows/#conclusion From the blog's conclusion, `ResNet50` is a typical case. Let's focus on `ResNet50` and collect the profiling log: ```cmd (nightly) D:\xu_git\dnnl_cb>python test_script_resnet50.py --------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls --------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ model_inference 3.91% 682.427ms 100.00% 17.448s 17.448s 1 aten::conv2d 0.18% 30.906ms 64.79% 11.305s 2.133ms 5300 aten::convolution 0.45% 78.031ms 64.62% 11.275s 2.127ms 5300 aten::_convolution 0.30% 51.670ms 64.17% 11.196s 2.113ms 5300 aten::mkldnn_convolution 63.58% 11.093s 63.87% 11.145s 2.103ms 5300 aten::batch_norm 0.13% 23.536ms 20.10% 3.506s 661.580us 5300 aten::_batch_norm_impl_index 0.28% 49.486ms 19.96% 3.483s 657.139us 5300 aten::native_batch_norm 19.26% 3.360s 19.64% 3.427s 646.615us 5300 aten::max_pool2d 0.01% 1.038ms 5.84% 1.018s 10.181ms 100 aten::max_pool2d_with_indices 5.83% 1.017s 5.83% 1.017s 10.171ms 100 aten::add_ 3.38% 588.907ms 3.38% 588.907ms 85.349us 6900 aten::relu_ 0.35% 60.358ms 1.67% 292.155ms 59.624us 4900 aten::clamp_min_ 1.33% 231.797ms 1.33% 231.797ms 47.306us 4900 aten::empty 0.46% 80.195ms 0.46% 80.195ms 1.513us 53000 aten::linear 0.01% 927.300us 0.23% 39.353ms 393.532us 100 aten::addmm 0.20% 35.379ms 0.21% 37.016ms 370.155us 100 aten::empty_like 0.12% 20.455ms 0.17% 29.976ms 5.656us 5300 aten::as_strided_ 0.11% 18.830ms 0.11% 18.830ms 3.553us 5300 aten::adaptive_avg_pool2d 0.00% 419.900us 0.08% 14.265ms 142.647us 100 aten::mean 0.01% 1.737ms 0.08% 13.845ms 138.448us 100 aten::sum 0.05% 8.113ms 0.05% 8.648ms 86.479us 100 aten::resize_ 0.03% 5.182ms 0.03% 5.182ms 0.978us 5300 aten::div_ 0.01% 1.445ms 0.02% 3.460ms 34.600us 100 aten::to 0.00% 337.000us 0.01% 2.015ms 20.154us 100 aten::_to_copy 0.01% 977.500us 0.01% 1.678ms 16.784us 100 aten::copy_ 0.01% 1.474ms 0.01% 1.474ms 7.371us 200 aten::t 0.00% 775.900us 0.01% 1.410ms 14.104us 100 aten::flatten 0.00% 420.900us 0.01% 1.311ms 13.106us 100 aten::view 0.01% 889.700us 0.01% 889.700us 8.897us 100 aten::transpose 0.00% 410.700us 0.00% 634.500us 6.345us 100 aten::expand 0.00% 496.800us 0.00% 566.800us 5.668us 100 aten::fill_ 0.00% 534.800us 0.00% 534.800us 5.348us 100 aten::as_strided 0.00% 293.800us 0.00% 293.800us 1.469us 200 aten::empty_strided 0.00% 241.700us 0.00% 241.700us 2.417us 100 aten::resolve_conj 0.00% 54.800us 0.00% 54.800us 0.274us 200 --------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 17.448s Execution time: 20.02380895614624 ``` We found that the kernel consuming the most CPU time is `aten::mkldnn_convolution`, which is dispatched to `MKLDNN`. We had already optimized memory allocation by integrating mimalloc into the PyTorch C10 module. That helps PyTorch on Windows a lot, but it does not cover the intermediate temporary memory of `MKL` and `MKLDNN`, so there is still potential to improve PyTorch Windows performance by optimizing that memory. I discussed this with the Intel MKL team and got a method to register a high-performance memory allocation API with MKL, which helps MKL improve memory performance. Please check the online document: https://www.intel.com/content/www/us/en/docs/onemkl/developer-guide-windows/2023-0/redefining-memory-functions.html This PR optimizes MKL memory allocation performance on Windows by registering mi_malloc with MKL. PR Changes: 1. Add cmake option `USE_MIMALLOC_ON_MKL`; it is a sub-option of `USE_MIMALLOC`. 2. Wrap and export mi_malloc APIs in C10, when `USE_MIMALLOC_ON_MKL` is `ON`. 3. Add MklAllocationHelp.cpp to register allocation APIs to MKL, when `USE_MIMALLOC_ON_MKL` is `ON`. For `oneDNN`, this is still being tracked in this proposal: https://github.com/oneapi-src/oneDNN/issues/1898 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138419 Approved by: https://github.com/jgong5, https://github.com/ezyang	2024-10-24 05:29:47 +00:00
Richard Barnes	dbd6ada8c3	Clean up a c10::optional and fix documentation (#138700 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/138700 Approved by: https://github.com/Skylion007	2024-10-23 20:42:28 +00:00
cyy	38d3c27849	[1/N] Enable cppcoreguidelines-special-member-functions (#137405 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/137405 Approved by: https://github.com/ezyang	2024-10-23 00:16:53 +00:00
PyTorch MergeBot	fc9093c3d2	Revert "Remove C10_DEPRECATED (#138406 )" This reverts commit `70ec86d754`. Reverted https://github.com/pytorch/pytorch/pull/138406 on behalf of https://github.com/wdvr due to failing internal tests - see D64714374 ([comment](https://github.com/pytorch/pytorch/pull/138406#issuecomment-2429912896))	2024-10-22 18:00:41 +00:00
Richard Barnes	70ec86d754	Remove C10_DEPRECATED (#138406 ) Looking in the code I see ``` // NB: __cplusplus doesn't work for MSVC, so for now MSVC always uses // the "__declspec(deprecated)" implementation and not the C++14 // "[[deprecated]]" attribute. We tried enabling "[[deprecated]]" for C++14 on // MSVC, but ran into issues with some older MSVC versions. ``` But looking at the [MSVC C++ support table](https://learn.microsoft.com/en-us/cpp/overview/visual-cpp-language-conformance?view=msvc-170) I see that the `[[deprecated]]` attribute is supported as of MSVC 2015 and that the vast majority of C++17 features became supported in MSVC 2015 _or later_. Since PyTorch is C++17 now, I infer that PyTorch must not support versions of MSVC earlier than MSVC 2015, so the versions of MSVC supported by PyTorch must support `[[deprecated]]`. Therefore, since we are finished deprecating old MSVCs we can deprecate `C10_DEPRECATED`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138406 Approved by: https://github.com/cyyever, https://github.com/malfet	2024-10-21 20:57:27 +00:00
cyy	a05b64a38f	[5/N] Fix extra warnings brought by clang-tidy-17 (#138403 ) Follows #137983 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138403 Approved by: https://github.com/ezyang	2024-10-21 02:59:54 +00:00
Richard Barnes	fddabc6e0b	C10_UNUSED to [[maybe_unused]] (#6357 ) (#138364 ) Summary: Pull Request resolved: https://github.com/pytorch/executorch/pull/6357 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138364 Approved by: https://github.com/Skylion007, https://github.com/eqy	2024-10-19 13:17:43 +00:00
Richard Barnes	542f7c8383	Eliminate C10_NODISCARD (#138336 ) Test Plan: Sandcastle Reviewed By: swolchok Pull Request resolved: https://github.com/pytorch/pytorch/pull/138336 Approved by: https://github.com/Skylion007	2024-10-19 02:54:06 +00:00
Jerry Zhang	6d8c9be54b	[reland] Add int1 to int7 dtypes (#137928 ) Summary: Similar to https://github.com/pytorch/pytorch/pull/117208, we want to add int1 to int7 for edge use cases for weight quantization Test Plan: python test/test_quantization.py -k test_uint4_int4_dtype Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D64344944](https://our.internmc.facebook.com/intern/diff/D64344944) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137928 Approved by: https://github.com/malfet	2024-10-18 02:02:08 +00:00
cyy	8c860aef0d	[Reland][Environment Variable][3/N] Use thread-safe getenv functions (#137942 ) Reland of #137328, which was reverted due to reverting a dependent PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137942 Approved by: https://github.com/eqy	2024-10-15 07:47:24 +00:00
PyTorch MergeBot	df0c2f5cae	Revert "[Environment Variable][3/N] Use thread-safe getenv wrapper (#137328 )" This reverts commit `25ac5652d0`. Reverted https://github.com/pytorch/pytorch/pull/137328 on behalf of https://github.com/clee2000 due to need to revert this in order to revert #133896, please rebase and reland, sorry for the churn ([comment](https://github.com/pytorch/pytorch/pull/137328#issuecomment-2412143739))	2024-10-14 20:22:26 +00:00
yanbing-j	561f07fae7	Warn users of mkldnn device usage (#137553 ) In https://github.com/pytorch/pytorch/issues/136831, a user used the mkldnn device to create a tensor, although mkldnn is no longer used as a device type and only the mkldnn layout is used. We plan to remove mkldnn device-related code in a future release. This PR warns users not to use the mkldnn device. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137553 Approved by: https://github.com/jgong5, https://github.com/ezyang	2024-10-12 13:42:12 +00:00
cyyever	25ac5652d0	[Environment Variable][3/N] Use thread-safe getenv wrapper (#137328 ) Follows #124485 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137328 Approved by: https://github.com/eqy	2024-10-11 23:23:57 +00:00
eellison	8893881867	Invalidate StorageImpl instances when tensor is overwritten with cudagraphs (#125264 ) Fixes #104435 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125264 Approved by: https://github.com/ezyang Co-authored-by: eellison <elias.ellison@gmail.com>	2024-10-09 00:05:52 +00:00
cyy	a2396b2dd8	[2/N] Fix extra warnings brought by clang-tidy-17 (#137459 ) Follows #137407 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137459 Approved by: https://github.com/Skylion007	2024-10-08 19:05:02 +00:00
PyTorch MergeBot	7e8dace0de	Revert "[ROCm] remove caffe2 from hipify (#137157 )" This reverts commit `40d8260745`. Reverted https://github.com/pytorch/pytorch/pull/137157 on behalf of https://github.com/xw285cornell due to this is breaking internal where we still use caffe2 ([comment](https://github.com/pytorch/pytorch/pull/137157#issuecomment-2400466131))	2024-10-08 17:45:45 +00:00
PyTorch MergeBot	796c3c3415	Revert "Disallow FakeTensor.data_ptr access in eager mode (#137221 )" This reverts commit `7e13e7dd7e`. Reverted https://github.com/pytorch/pytorch/pull/137221 on behalf of https://github.com/jovianjaison due to failing internal tests ([comment](https://github.com/pytorch/pytorch/pull/137221#issuecomment-2397957081))	2024-10-07 21:46:13 +00:00
Jeff Daily	40d8260745	[ROCm] remove caffe2 from hipify (#137157 ) - Remove all "MasqueradingAsCUDA" files and classes. - Do not rename "CUDA" classes to "HIP". Pull Request resolved: https://github.com/pytorch/pytorch/pull/137157 Approved by: https://github.com/eqy	2024-10-05 12:48:54 +00:00
rzou	7e13e7dd7e	Disallow FakeTensor.data_ptr access in eager mode (#137221 ) Previously we raised a deprecation warning (beginning PyTorch 2.4). Now that we are on 2.6, we're completing the deprecation and disallowing this behavior. Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/137221 Approved by: https://github.com/albanD, https://github.com/eellison	2024-10-03 23:47:55 +00:00
PyTorch MergeBot	2ef1454189	Revert "Add int1 to int7 dtypes (#136301 )" This reverts commit `bfa16a161d`. Reverted https://github.com/pytorch/pytorch/pull/136301 on behalf of https://github.com/PaliC due to causing internal failures ([comment](https://github.com/pytorch/pytorch/pull/136301#issuecomment-2384119600))	2024-09-30 20:50:49 +00:00
Jerry Zhang	bfa16a161d	Add int1 to int7 dtypes (#136301 ) Summary: Similar to https://github.com/pytorch/pytorch/pull/117208, we want to add int1 to int7 for edge use cases for weight quantization (https://www.internalfb.com/diff/D62464487) Test Plan: python test/test_quantization.py -k test_uint4_int4_dtype Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/136301 Approved by: https://github.com/ezyang	2024-09-28 02:08:33 +00:00
Jessica Vandebon	baff86dafb	[MTIA tensor] allow shallow copy between CPU and MTIA tensors (#135871 ) Reviewed By: egienvalue, hanzlfs Differential Revision: D61662214 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135871 Approved by: https://github.com/egienvalue, https://github.com/nautsimon	2024-09-13 22:13:58 +00:00
Yu, Guangye	6c1da66407	[Reland] Refactor caching device allocator utils (#130923 ) # Motivation Following [[RFC] Intel GPU Runtime Upstreaming for Allocator](https://github.com/pytorch/pytorch/issues/116322), this PR aims to refactor caching device allocator utils to improve code reuse. This is the first PR; we could prepare some follow-up PRs that continue to refactor the device caching allocator. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923 Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD, https://github.com/eqy	2024-09-07 11:14:17 +00:00
Kulin Seth	144fde4fd2	[MPS] Add support for autocast in MPS (#99272 ) Fixes https://github.com/pytorch/pytorch/issues/88415 Need to run inductor/test_cpu_select_algorithm Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272 Approved by: https://github.com/malfet Co-authored-by: Siddharth Kotapati <skotapati@apple.com> Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Co-authored-by: Roy Hvaara <roy@lightyear.no>	2024-09-05 23:23:17 +00:00
PyTorch MergeBot	e55c0f59e5	Revert "[Reland] Refactor caching device allocator utils (#130923 )" This reverts commit `9809080b9e`. Reverted https://github.com/pytorch/pytorch/pull/130923 on behalf of https://github.com/kit1980 due to breaking internal builds - Error: Relocation overflow has occured ([comment](https://github.com/pytorch/pytorch/pull/130923#issuecomment-2332640961))	2024-09-05 21:16:14 +00:00
rzou	d7b57c4d63	Fix tensor.data access under inference_mode and compile (#134878 ) Fixes https://github.com/pytorch/pytorch/issues/134798 In the regular Tensor case, when you call Tensor.data, there's a check for if inference mode is active. If it is active, then we don't set the version counter. We replicate this check for Tensor Subclasses (the bug was we were trying to set the version counter on a FakeTensor in inference_mode). Test Plan: - new test Pull Request resolved: https://github.com/pytorch/pytorch/pull/134878 Approved by: https://github.com/bdhirsh	2024-09-04 17:55:41 +00:00
FFFrog	5690f003a6	C10_DIAGNOSTIC_PUSH_AND_IGNORED_IF_DEFINED and C10_DIAGNOST should be used in pairs (#135004 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135004 Approved by: https://github.com/aaronenyeshi	2024-09-04 13:14:23 +00:00
Yu, Guangye	9809080b9e	[Reland] Refactor caching device allocator utils (#130923 ) # Motivation Following [[RFC] Intel GPU Runtime Upstreaming for Allocator](https://github.com/pytorch/pytorch/issues/116322), this PR aims to refactor caching device allocator utils to improve code reuse. This is the first PR; we could prepare some follow-up PRs that continue to refactor the device caching allocator. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923 Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD, https://github.com/eqy	2024-09-04 05:31:08 +00:00
PyTorch MergeBot	2c88a923a7	Revert "Refactor caching device allocator utils (#130923 )" This reverts commit `c45ca8092d`. Reverted https://github.com/pytorch/pytorch/pull/130923 on behalf of https://github.com/ZainRizvi due to Sorry but this appears to be causing internal tests to fail with errors like `error: no type named 'DeviceStats' in namespace 'xxx::xxx:xxxAllocator'; did you mean 'DeviceStatus'?` ([comment](https://github.com/pytorch/pytorch/pull/130923#issuecomment-2315730155))	2024-08-28 15:56:08 +00:00
Yu, Guangye	c45ca8092d	Refactor caching device allocator utils (#130923 ) # Motivation Following [[RFC] Intel GPU Runtime Upstreaming for Allocator](https://github.com/pytorch/pytorch/issues/116322), this PR aims to refactor caching device allocator utils to improve code reuse. This is the first PR; we could prepare some follow-up PRs that continue to refactor the device caching allocator. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130923 Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD, https://github.com/eqy	2024-08-28 01:35:23 +00:00
Yuanhao Ji	44dadf2506	[Fix] Check name when registering privateuse1 backend (#134071 ) Do some checks when registering a privateuse1 backend to avoid using in-tree device names. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134071 Approved by: https://github.com/albanD	2024-08-27 20:28:30 +00:00
Guilherme Leobas	c518b50c4c	Remove functorch dispatch keys in `legacyExtractDispatchKey` (#133018 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133018 Approved by: https://github.com/zou3519	2024-08-13 15:32:01 +00:00
Yuanhao Ji	343071cd96	Fix privateuse1 backend name case (#132980 ) ### Problem `get_privateuse1_backend(bool lower_case)` always returns a lower case name and `lower_case` is not used. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132980 Approved by: https://github.com/albanD	2024-08-10 07:39:54 +00:00
PyTorch MergeBot	2764bee942	Revert "[MPS] Add support for autocast in MPS (#99272 )" This reverts commit `6919e8baab`. Reverted https://github.com/pytorch/pytorch/pull/99272 on behalf of https://github.com/clee2000 due to Broke test/inductor/test_cpu_select_algorithm.py::TestSelectAlgorithmCPU::test_quantized_linear_amx_batch_size_3_in_features_128_out_features_64_bias_False_cpu on sm86 jobs [GH job link](https://github.com/pytorch/pytorch/actions/runs/10252979157/job/28367091621) [HUD commit link](`6919e8baab`) Not caught on PR due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/99272#issuecomment-2269808857))	2024-08-05 19:59:04 +00:00
Kulin Seth	6919e8baab	[MPS] Add support for autocast in MPS (#99272 ) Fixes https://github.com/pytorch/pytorch/issues/88415 Co-authored-by: Siddharth Kotapati <skotapati@apple.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272 Approved by: https://github.com/malfet	2024-08-05 17:02:30 +00:00
soulitzer	82b6480b0a	Update SavedTensorHooks TLS stack to use SafePyObject (#131700 ) Previously, we must manually manage refcounting when updating the TLS saved variable stack. With this PR, things should be handled automatically by the SafePyObject. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131700 Approved by: https://github.com/albanD	2024-08-02 16:27:16 +00:00
cyy	b9cb1abf65	[12/N] Use std::optional (#132361 ) Follows #132396 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132361 Approved by: https://github.com/eqy	2024-08-02 13:46:46 +00:00
Xu Han	6c1f1563e1	[inductor] fix UndefinedTensorImpl singleton can't export on Windows. (#132326 ) This PR fixes the issue that `UndefinedTensorImpl::_singleton` can't be exported on Windows. The reason is that MSVC can't export class static data to external linkage, ref: https://learn.microsoft.com/en-us/cpp/cpp/using-dllimport-and-dllexport-in-cpp-classes?view=msvc-170#_pluslang_using_dllimport_and_dllexport_in_c2b2bselectivememberimportexport I use another singleton implementation on Windows to avoid the issue. With this PR, cpp_wrapper on Windows starts to work. Next step, I will enable the cpp_wrapper UTs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132326 Approved by: https://github.com/jgong5, https://github.com/desertfire	2024-08-01 13:37:12 +00:00
Xu Han	39a3c98aa6	[inductor] fix scalar miss constuctor for long type. (#132117 ) Fix the `long` to `c10::Scalar` conversion issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132117 Approved by: https://github.com/jgong5, https://github.com/desertfire	2024-07-31 15:40:48 +00:00
albanD	466ea8ce54	Add fallback() to torch.library (#131707 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131707 Approved by: https://github.com/zou3519	2024-07-27 18:02:35 +00:00
cyy	f83ef69b84	Fix typo in assignment operators (#131890 ) Most typos were introduced in #131077 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131890 Approved by: https://github.com/Skylion007	2024-07-27 11:13:42 +00:00
cyyever	451462dbff	[1/N] Add missing constructors or assignment operators (#131077 ) Just mark them as deleted in most cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131077 Approved by: https://github.com/ezyang	2024-07-24 12:09:39 +00:00
cyy	1d1d074072	[3/N] Fix Wunused-parameter warnings (#131271 ) Follows #131170 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131271 Approved by: https://github.com/ezyang	2024-07-20 23:31:03 +00:00
PyTorch MergeBot	7c299b46ca	Revert "Invalidate StorageImpl instances when tensor is overwritten with cudagraphs (#125264 )" This reverts commit `8390843eba`. Reverted https://github.com/pytorch/pytorch/pull/125264 on behalf of https://github.com/izaitsevfb due to breaks internal tests ([comment](https://github.com/pytorch/pytorch/pull/125264#issuecomment-2240516202))	2024-07-19 22:58:51 +00:00
cyy	feef057691	[1/N] Fix Wunused-parameter warnings (#130924 ) Before we can turn Wunused-parameter into an error Pull Request resolved: https://github.com/pytorch/pytorch/pull/130924 Approved by: https://github.com/ezyang	2024-07-19 06:14:51 +00:00
Isuru Fernando	8390843eba	Invalidate StorageImpl instances when tensor is overwritten with cudagraphs (#125264 ) Fixes #104435 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125264 Approved by: https://github.com/ezyang	2024-07-16 14:29:29 +00:00
PyTorch MergeBot	78799e82b0	Revert "Invalidate StorageImpl instances when tensor is overwritten with cudagraphs (#125264 )" This reverts commit `1bc390c5f5`. Reverted https://github.com/pytorch/pytorch/pull/125264 on behalf of https://github.com/jithunnair-amd due to test test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times is failing https://github.com/pytorch/pytorch/actions/runs/9933628108/job/27477785946 `1bc390c5f5`. Test was introduced by `fa5f572748` which is before the merge base ([comment](https://github.com/pytorch/pytorch/pull/125264#issuecomment-2229508737))	2024-07-15 21:59:46 +00:00
Isuru Fernando	1bc390c5f5	Invalidate StorageImpl instances when tensor is overwritten with cudagraphs (#125264 ) Fixes #104435 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125264 Approved by: https://github.com/ezyang	2024-07-15 04:16:17 +00:00
cyy	28f6ae2718	[9/N] Replace c10::optional with std::optional (#130674 ) Follows #130509 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130674 Approved by: https://github.com/Skylion007	2024-07-15 00:48:43 +00:00
Bertrand Thia	43b98fa521	Add debug repr to SymNode (#129925 ) Fixes #129403 Create a separate printing function to debug SymNode, since we can't easily change `__repr__`, which is used by GraphModule.recompile() to create a pythonic version of a graph. This is my first contribution, please let me know if there is anything that I should look into in further detail. Thank you for your guidance! 🙏 I hope to contribute more in the future! @aorenste Pull Request resolved: https://github.com/pytorch/pytorch/pull/129925 Approved by: https://github.com/aorenste	2024-07-12 18:31:23 +00:00
Valentin Andrei	b139b5090f	[pytorch] Name threads in thread pools for better debugging (#130270 ) Threads inside the thread pools are not named, so they inherit the main process name or the name of the first thread. In our case if we set `pt_main_thread` as the thread name when a thread does `import torch`, this name will be inherited by all the threads in the created pools. This PR names the threads in the pools I was able to find. There are other pools created, like OpenMP ones and we need to follow-up on those. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130270 Approved by: https://github.com/d4l3k, https://github.com/albanD	2024-07-09 08:03:47 +00:00
cyy	f4dcf2ae93	[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301 Approved by: https://github.com/ezyang, https://github.com/r-barnes	2024-07-08 07:03:53 +00:00
Edward Z. Yang	64a04d2225	Make sparse empty constructors specialize instead of fail on symbolic inputs (#129983 ) Exercised in https://github.com/pytorch/pytorch/pull/128327 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/129983 Approved by: https://github.com/anijain2305	2024-07-03 13:27:19 +00:00
PyTorch MergeBot	07450e9713	Revert "[MPS] Add support for autocast in MPS (#99272 )" This reverts commit `6240cfd5c7`. Reverted https://github.com/pytorch/pytorch/pull/99272 on behalf of https://github.com/jeanschmidt due to introduced breakages in trunk ([comment](https://github.com/pytorch/pytorch/pull/99272#issuecomment-2203033719))	2024-07-02 12:29:51 +00:00
Kulin Seth	6240cfd5c7	[MPS] Add support for autocast in MPS (#99272 ) Fixes https://github.com/pytorch/pytorch/issues/88415 Co-authored-by: Siddharth Kotapati <skotapati@apple.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272 Approved by: https://github.com/malfet	2024-07-02 01:49:52 +00:00
Klein Shen	0e6bb7f1ce	[caffe2][be] migrate gloabl static initializer (#128784 ) Summary: The Caffe2 lib has 200+ uses of global static initializers, which are paper cuts for startup perf. Details in this post https://fb.workplace.com/groups/arglassesperf/permalink/623909116287154. This diff migrates StorageImpl.cpp. Additional Context: https://fb.workplace.com/groups/arglassesperf/permalink/623909116287154 Test Plan: CI Differential Revision: D58639283 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128784 Approved by: https://github.com/aaronenyeshi	2024-06-25 15:30:49 +00:00
rzou	856541c701	[custom_op] support default dtype values (#129189 ) This PR: - moves some of the dtype-string utilities into ScalarType.{h, cpp} - adds a new utility to get a mapping from dtype name to the C++ dtype - the parser now checks if the string is a dtype name; if it is, then it pulls the C++ dtype from the mapping. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/129189 Approved by: https://github.com/albanD ghstack dependencies: #129177, #129178, #129179	2024-06-23 00:13:23 +00:00
Aaron Enye Shi	f42d5b6dca	[Memory Snapshot] Make recordAnnotations callback initialize lazily (#129242 ) Summary: Make the recordAnnotations' Record function callback lazily initialize when record memory history starts. This will help reduce the impact on Time To First Batch metric. Test Plan: CI and ran locally. Differential Revision: D58875576 Pulled By: aaronenyeshi Pull Request resolved: https://github.com/pytorch/pytorch/pull/129242 Approved by: https://github.com/zdevito	2024-06-22 04:05:55 +00:00
rzou	08b616281f	[custom ops] Switch out references from old landing page to new landing page (#129178 ) Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/129178 Approved by: https://github.com/albanD ghstack dependencies: #129177	2024-06-21 13:31:40 +00:00
Aaron Enye Shi	b5d541609d	[Memory Snapshot] Add recordAnnotations to capture record_function annotations (#129072 ) Summary: Add new traceEvents into Memory Snapshot for record_function annotations. These will capture both the profiler's step annotation as well as user annotations. Test Plan: CI Pulled By: aaronenyeshi Differential Revision: D55941362 Pull Request resolved: https://github.com/pytorch/pytorch/pull/129072 Approved by: https://github.com/zdevito	2024-06-19 18:05:41 +00:00
PyTorch MergeBot	846bb30e13	Revert "[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301 )" This reverts commit `bd72e28314`. Reverted https://github.com/pytorch/pytorch/pull/128301 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it fails XLA build `bd72e28314`. Please rebase your PR before relanding because I think the failure is hidden by an unrelated broken trunk XLA failure from your current base commit ([comment](https://github.com/pytorch/pytorch/pull/128301#issuecomment-2169035822))	2024-06-15 01:58:20 +00:00
cyy	bd72e28314	[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301 Approved by: https://github.com/ezyang	2024-06-14 23:21:01 +00:00
Edward Z. Yang	3964a3ec73	Complete revamp of float/promotion sympy handling (#126905 ) At a high level, the idea behind this PR is: * Make it clearer what the promotion and int/float rules for various Sympy operations are. Operators that previously were polymorphic over int/float are now split into separate operators for clarity. We never do mixed int/float addition/multiplication etc in sympy, instead, we always promote to the appropriate operator. (However, equality is currently not done correctly.) * Enforce strict typing on ValueRanges: if you have a ValueRange for a float, the lower and upper MUST be floats, and so forth for integers. The story begins in torch/utils/_sympy/functions.py. Here, I make some changes to how we represent certain operations in sympy expressions: * FloorDiv now only supports integer inputs; to do float floor division, do a truediv and then a trunc. Additionally, we remove the divide out addition by gcd optimization, because sympy gcd is over fields and is willing to generate rationals (but rationals are bad for ValueRange strict typing). * ModularIndexing, LShift, RShift now assert they are given integer inputs. * Mod only supports integer inputs; eventually we will support FloatMod (left for later work, when we build out Sympy support for floating operations). Unfortunately, I couldn't assert integer inputs here, because of a bad interaction with sympy's inequality solver that is used by the offline solver * TrueDiv is split into FloatTrueDiv and IntTrueDiv. This allows for us to eventually generate accurate code for Python semantics IntTrueDiv, which is written in a special way to preserve precision when the inputs are >= 2*53 beyond what first coercing the integer to floats and then doing true division. Trunc is split to TruncToFloat and TruncToInt. * Round is updated to return a float, not an int, making it consistent with the round op handler in Inductor. To get Python-style conversion to int, we call TruncToInt on the result. 
* RoundDecimal updated to consistently only ever return a float * Add ToFloat for explicit coercion to float (required so we can enforce strict ValueRanges typing) In torch/__init__.py, we modify SymInt and SymFloat to appropriately call into new bindings that route to these refined sympy operations. Also, we modify `torch.sym_min` and `torch.sym_max` to have promotion semantics (if one argument is a float, the return result is always a float), making them inconsistent with builtins.min/max, but possible to do type analysis without runtime information. We also need to introduce some new op handlers in torch/_inductor/ops_handler.py: * `to_int` for truncation to int64, directly corresponding to TruncToInt; this can be implemented by trunc and dtype, but with a dedicated handler it is more convenient for roundtripping in Sympy * `int_truediv` for Python-style integer true division, which has higher precision than casting to floats and then running `truediv` These changes have consequences. First, we need to make some administrative changes: * Actually wire up these Sympy functions from SymInt/SymFloat in torch/fx/experimental/sym_node.py, including the new promotion rules (promote2) * Add support for new Sympy functions in torch/utils/_sympy/interp.py, torch/utils/_sympy/reference.py * In particular, in torch.utils._sympy.reference, we have a strong preference to NOT do nontrivial compute, instead, everything in ops handler should map to a singular sympy function * TODO: I chose to roundtrip mod back to our Mod function, but I think I'm going to have to deal with the C/Python inconsistency this to fix tests here * Add printer support for the Sympy functions in torch/_inductor/codegen/common.py, torch/_inductor/codegen/cpp_utils.py, torch/_inductor/codegen/triton.py. `int_truediv` and mixed precision equality is currently not implemented soundly, so we will lose precision in codegen for large values. 
TODO: The additions here are not exhaustive yet * Update ValueRanges logic to use new sympy functions in torch/utils/_sympy/value_ranges.py. In general, we prefer to use the new Sympy function rather than try to roll things by hand, which is what was done previously for many VR analysis functions. In torch/fx/experimental/symbolic_shapes.py we need to make some symbolic reasoning adjustments: * Avoid generation of rational subexpressions by removing simplification of `x // y` into `floor(x / y)`. This simplification then triggers an addition simplification rule `(x + y) / c --> x / c + y / c` which is bad because x / c is a rational number now * `_assert_bound_is_rational` is no more, we no longer generate rational bounds * Don't intersect non-int value ranges with the `int_range` * Support more sympy Functions for guard SYMPY_INTERP * Assert the type of value range is consistent with the variable type The new asserts uncovered necessary bug fixes: * torch/_inductor/codegen/cpp.py, torch/_inductor/select_algorithm.py, torch/_inductor/sizevars.py - Ensure Wild/Symbol manually allocated in Inductor is marked `is_integer` so it's accepted to build expressions * torch/_inductor/utils.py - make sure you actually pass in sympy.Expr to these functions * torch/_inductor/ir.py - make_contiguous_strides_for takes int/SymInt, not sympy.Expr! * torch/export/dynamic_shapes.py - don't use infinity to represent int ranges, instead use sys.maxsize - 1 Because of the removal of some symbolic reasoning that produced rationals, some of our symbolic reasoning has gotten worse and we are unable to simplify some guards. Check the TODO at test/test_proxy_tensor.py Reland notes. 
This requires this internal fbcode diff https://www.internalfb.com/phabricator/paste/view/P1403322587 but I cannot prepare the diff codev due to https://fb.workplace.com/groups/osssupport/posts/26343544518600814/ It also requires this Executorch PR https://github.com/pytorch/executorch/pull/3911 but the ET PR can be landed prior to this landing. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/126905 Approved by: https://github.com/xadupre, https://github.com/lezcano	2024-06-09 06:20:25 +00:00
PyTorch MergeBot	ac51f782fe	Revert "Complete revamp of float/promotion sympy handling (#126905 )" This reverts commit `2f7cfecd86`. Reverted https://github.com/pytorch/pytorch/pull/126905 on behalf of https://github.com/atalman due to Sorry need to revert - failing internally ([comment](https://github.com/pytorch/pytorch/pull/126905#issuecomment-2155118778))	2024-06-07 16:01:46 +00:00
Edward Z. Yang	2f7cfecd86	Complete revamp of float/promotion sympy handling (#126905 ) At a high level, the idea behind this PR is: * Make it clearer what the promotion and int/float rules for various Sympy operations are. Operators that previously were polymorphic over int/float are now split into separate operators for clarity. We never do mixed int/float addition/multiplication etc in sympy, instead, we always promote to the appropriate operator. (However, equality is currently not done correctly.) * Enforce strict typing on ValueRanges: if you have a ValueRange for a float, the lower and upper MUST be floats, and so forth for integers. The story begins in torch/utils/_sympy/functions.py. Here, I make some changes to how we represent certain operations in sympy expressions: * FloorDiv now only supports integer inputs; to do float floor division, do a truediv and then a trunc. Additionally, we remove the divide out addition by gcd optimization, because sympy gcd is over fields and is willing to generate rationals (but rationals are bad for ValueRange strict typing). * ModularIndexing, LShift, RShift now assert they are given integer inputs. * Mod only supports integer inputs; eventually we will support FloatMod (left for later work, when we build out Sympy support for floating operations). Unfortunately, I couldn't assert integer inputs here, because of a bad interaction with sympy's inequality solver that is used by the offline solver * TrueDiv is split into FloatTrueDiv and IntTrueDiv. This allows for us to eventually generate accurate code for Python semantics IntTrueDiv, which is written in a special way to preserve precision when the inputs are >= 2*53 beyond what first coercing the integer to floats and then doing true division. Trunc is split to TruncToFloat and TruncToInt. * Round is updated to return a float, not an int, making it consistent with the round op handler in Inductor. To get Python-style conversion to int, we call TruncToInt on the result. 
* RoundDecimal updated to consistently only ever return a float * Add ToFloat for explicit coercion to float (required so we can enforce strict ValueRanges typing) In torch/__init__.py, we modify SymInt and SymFloat to appropriately call into new bindings that route to these refined sympy operations. Also, we modify `torch.sym_min` and `torch.sym_max` to have promotion semantics (if one argument is a float, the return result is always a float), making them inconsistent with builtins.min/max, but possible to do type analysis without runtime information. We also need to introduce some new op handlers in torch/_inductor/ops_handler.py: * `to_int` for truncation to int64, directly corresponding to TruncToInt; this can be implemented by trunc and dtype, but with a dedicated handler it is more convenient for roundtripping in Sympy * `int_truediv` for Python-style integer true division, which has higher precision than casting to floats and then running `truediv` These changes have consequences. First, we need to make some administrative changes: * Actually wire up these Sympy functions from SymInt/SymFloat in torch/fx/experimental/sym_node.py, including the new promotion rules (promote2) * Add support for new Sympy functions in torch/utils/_sympy/interp.py, torch/utils/_sympy/reference.py * In particular, in torch.utils._sympy.reference, we have a strong preference to NOT do nontrivial compute, instead, everything in ops handler should map to a singular sympy function * TODO: I chose to roundtrip mod back to our Mod function, but I think I'm going to have to deal with the C/Python inconsistency this to fix tests here * Add printer support for the Sympy functions in torch/_inductor/codegen/common.py, torch/_inductor/codegen/cpp_utils.py, torch/_inductor/codegen/triton.py. `int_truediv` and mixed precision equality is currently not implemented soundly, so we will lose precision in codegen for large values. 
TODO: The additions here are not exhaustive yet * Update ValueRanges logic to use new sympy functions in torch/utils/_sympy/value_ranges.py. In general, we prefer to use the new Sympy function rather than try to roll things by hand, which is what was done previously for many VR analysis functions. In torch/fx/experimental/symbolic_shapes.py we need to make some symbolic reasoning adjustments: * Avoid generation of rational subexpressions by removing simplification of `x // y` into `floor(x / y)`. This simplification then triggers an addition simplification rule `(x + y) / c --> x / c + y / c` which is bad because x / c is a rational number now * `_assert_bound_is_rational` is no more, we no longer generate rational bounds * Don't intersect non-int value ranges with the `int_range` * Support more sympy Functions for guard SYMPY_INTERP * Assert the type of value range is consistent with the variable type The new asserts uncovered necessary bug fixes: * torch/_inductor/codegen/cpp.py, torch/_inductor/select_algorithm.py, torch/_inductor/sizevars.py - Ensure Wild/Symbol manually allocated in Inductor is marked `is_integer` so it's accepted to build expressions * torch/_inductor/utils.py - make sure you actually pass in sympy.Expr to these functions * torch/_inductor/ir.py - make_contiguous_strides_for takes int/SymInt, not sympy.Expr! * torch/export/dynamic_shapes.py - don't use infinity to represent int ranges, instead use sys.maxsize - 1 Because of the removal of some symbolic reasoning that produced rationals, some of our symbolic reasoning has gotten worse and we are unable to simplify some guards. Check the TODO at test/test_proxy_tensor.py Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/126905 Approved by: https://github.com/xadupre, https://github.com/lezcano	2024-06-06 02:29:45 +00:00
PyTorch MergeBot	d5cb5d623a	Revert "Complete revamp of float/promotion sympy handling (#126905 )" This reverts commit `fb696ef3aa`. Reverted https://github.com/pytorch/pytorch/pull/126905 on behalf of https://github.com/ezyang due to internal user reported ceiling equality simplification problem, I have a plan ([comment](https://github.com/pytorch/pytorch/pull/126905#issuecomment-2148805840))	2024-06-05 03:57:58 +00:00
Edward Z. Yang	fb696ef3aa	Complete revamp of float/promotion sympy handling (#126905 ) At a high level, the idea behind this PR is: * Make it clearer what the promotion and int/float rules for various Sympy operations are. Operators that previously were polymorphic over int/float are now split into separate operators for clarity. We never do mixed int/float addition/multiplication etc in sympy, instead, we always promote to the appropriate operator. (However, equality is currently not done correctly.) * Enforce strict typing on ValueRanges: if you have a ValueRange for a float, the lower and upper MUST be floats, and so forth for integers. The story begins in torch/utils/_sympy/functions.py. Here, I make some changes to how we represent certain operations in sympy expressions: * FloorDiv now only supports integer inputs; to do float floor division, do a truediv and then a trunc. Additionally, we remove the divide out addition by gcd optimization, because sympy gcd is over fields and is willing to generate rationals (but rationals are bad for ValueRange strict typing). * ModularIndexing, LShift, RShift now assert they are given integer inputs. * Mod only supports integer inputs; eventually we will support FloatMod (left for later work, when we build out Sympy support for floating operations). Unfortunately, I couldn't assert integer inputs here, because of a bad interaction with sympy's inequality solver that is used by the offline solver * TrueDiv is split into FloatTrueDiv and IntTrueDiv. This allows for us to eventually generate accurate code for Python semantics IntTrueDiv, which is written in a special way to preserve precision when the inputs are >= 2*53 beyond what first coercing the integer to floats and then doing true division. Trunc is split to TruncToFloat and TruncToInt. * Round is updated to return a float, not an int, making it consistent with the round op handler in Inductor. To get Python-style conversion to int, we call TruncToInt on the result. 
* RoundDecimal updated to consistently only ever return a float * Add ToFloat for explicit coercion to float (required so we can enforce strict ValueRanges typing) In torch/__init__.py, we modify SymInt and SymFloat to appropriately call into new bindings that route to these refined sympy operations. Also, we modify `torch.sym_min` and `torch.sym_max` to have promotion semantics (if one argument is a float, the return result is always a float), making them inconsistent with builtins.min/max, but possible to do type analysis without runtime information. We also need to introduce some new op handlers in torch/_inductor/ops_handler.py: * `to_int` for truncation to int64, directly corresponding to TruncToInt; this can be implemented by trunc and dtype, but with a dedicated handler it is more convenient for roundtripping in Sympy * `int_truediv` for Python-style integer true division, which has higher precision than casting to floats and then running `truediv` These changes have consequences. First, we need to make some administrative changes: * Actually wire up these Sympy functions from SymInt/SymFloat in torch/fx/experimental/sym_node.py, including the new promotion rules (promote2) * Add support for new Sympy functions in torch/utils/_sympy/interp.py, torch/utils/_sympy/reference.py * In particular, in torch.utils._sympy.reference, we have a strong preference to NOT do nontrivial compute, instead, everything in ops handler should map to a singular sympy function * TODO: I chose to roundtrip mod back to our Mod function, but I think I'm going to have to deal with the C/Python inconsistency this to fix tests here * Add printer support for the Sympy functions in torch/_inductor/codegen/common.py, torch/_inductor/codegen/cpp_utils.py, torch/_inductor/codegen/triton.py. `int_truediv` and mixed precision equality is currently not implemented soundly, so we will lose precision in codegen for large values. 
TODO: The additions here are not exhaustive yet * Update ValueRanges logic to use new sympy functions in torch/utils/_sympy/value_ranges.py. In general, we prefer to use the new Sympy function rather than try to roll things by hand, which is what was done previously for many VR analysis functions. In torch/fx/experimental/symbolic_shapes.py we need to make some symbolic reasoning adjustments: * Avoid generation of rational subexpressions by removing simplification of `x // y` into `floor(x / y)`. This simplification then triggers an addition simplification rule `(x + y) / c --> x / c + y / c` which is bad because x / c is a rational number now * `_assert_bound_is_rational` is no more, we no longer generate rational bounds * Don't intersect non-int value ranges with the `int_range` * Support more sympy Functions for guard SYMPY_INTERP * Assert the type of value range is consistent with the variable type The new asserts uncovered necessary bug fixes: * torch/_inductor/codegen/cpp.py, torch/_inductor/select_algorithm.py, torch/_inductor/sizevars.py - Ensure Wild/Symbol manually allocated in Inductor is marked `is_integer` so it's accepted to build expressions * torch/_inductor/utils.py - make sure you actually pass in sympy.Expr to these functions * torch/_inductor/ir.py - make_contiguous_strides_for takes int/SymInt, not sympy.Expr! * torch/export/dynamic_shapes.py - don't use infinity to represent int ranges, instead use sys.maxsize - 1 Because of the removal of some symbolic reasoning that produced rationals, some of our symbolic reasoning has gotten worse and we are unable to simplify some guards. Check the TODO at test/test_proxy_tensor.py Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/126905 Approved by: https://github.com/xadupre, https://github.com/lezcano	2024-06-04 11:47:32 +00:00
cyy	288df042c5	[1/N] Change static functions in headers to inline (#127727 ) So that it may fix some tricky linking issues. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127727 Approved by: https://github.com/ezyang	2024-06-03 04:34:36 +00:00
Shan19900305	e62925930f	Clear dest impl extra_meta_ info when shallow_copy_from src impl to dest impl. (#127616 ) `tensorA.data = tensorB` calls the shallow_copy_from function to copy tensorB's metadata and storage to tensorA. If tensorB's extra_meta_ is nullptr, then tensorA's old extra_meta_ is kept in tensorA. This contaminates the new metadata in tensorA. @ezyang @bdhirsh Pull Request resolved: https://github.com/pytorch/pytorch/pull/127616 Approved by: https://github.com/ezyang	2024-06-01 06:54:32 +00:00
rzou	c9beea13ac	Rewrite existing links to custom ops gdocs with the landing page (#127423 ) NB: these links will be live after the docs build happens, which is once a day. Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/127423 Approved by: https://github.com/jansel, https://github.com/williamwen42 ghstack dependencies: #127291, #127292, #127400	2024-05-30 14:54:29 +00:00
Nikita Shulga	194950c0ca	Default TreadPool size to number of physical cores (#125963 ) TODO: Some benchmarks Pull Request resolved: https://github.com/pytorch/pytorch/pull/125963 Approved by: https://github.com/janeyx99, https://github.com/Skylion007, https://github.com/gajjanag, https://github.com/jgong5	2024-05-24 16:06:48 +00:00


1359 Commits