Triton 2.2 and later have a bug where allowing TF32 generation for a GPU that does not support TF32 causes code generation errors. Patch around this problem by:
1. Adding a function to `torch.cuda` that determines whether CUDA hardware is capable of using the TF32 format.
2. Using that function to explicitly disable TF32 generation when calling Triton, where needed.
To demonstrate that this fix works, try running `test/inductor/test_max_autotune.py` without this fix on a GPU with CUDA compute capability < 8 (e.g. any pre-Ampere NVIDIA GPU).
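A minimal sketch of the kind of capability check this adds (the actual helper name and location in the PR may differ):
```python
import torch

def tf32_is_supported() -> bool:
    # Hypothetical helper: TF32 requires an NVIDIA GPU with compute
    # capability >= 8.0 (Ampere or newer).
    if not torch.cuda.is_available():
        return False
    major, _ = torch.cuda.get_device_capability()
    return major >= 8
```
Inductor can then disable TF32 in the Triton kernels it generates whenever this check returns False.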
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145684
Approved by: https://github.com/eqy
This PR
* makes changes to the workflow files and scripts so we can run CI workflows on the MI300 runners
* skips or fixes several tests that fail on MI300, as observed in https://github.com/pytorch/pytorch/pull/140989
Skipped due to the unsupported Float8_e4m3fn data type on MI300 (the test code needs to be updated to use data types supported by MI300):
- distributed.tensor.parallel.test_micro_pipeline_tp.py::MicroPipelineTPTest::test_fuse_all_gather_scaled_matmul_A_dims_\*_gather_dim_\* (24 tests across inductor/distributed configs)
- distributed.tensor.parallel.test_micro_pipeline_tp.py::test_fuse_scaled_matmul_reduce_scatter_A_dims_\*_scatter_dim_\* (12 tests across inductor/distributed configs)
- inductor.test_loop_ordering::LoopOrderingTest::test_fp8_cast_and_t
- inductor.test_loop_ordering::LoopOrderingTest::test_fp8_pattern_2
Skipped due to AssertionError on MI300:
- inductor.test_mkldnn_pattern_matcher.py::test_qconv2d_int8_mixed_bf16
- distributed._tools.test_sac_ilp::TestSACILP::test_sac_ilp_case1
Skipped:
- test_cuda.py::TestCudaMallocAsync::test_clock_speed
- test_cuda.py::TestCudaMallocAsync::test_power_draw
- test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_cumsum_cuda
Skipped flaky tests on MI300:
- distributed.test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_stress_cuda
- inductor.test_cpu_repro::CPUReproTests::test_lstm_packed_unbatched_False* (256 tests)
Fixed:
- test_matmul_cuda.py::TestFP8MatmulCudaCUDA::test_float8_basics_cuda
Features:
- inductor/test_fp8.py - declares a new function that converts FP8 data types to ROCm-supported FP8 data types. It keeps test names identical for CUDA and ROCm and allows enabling the Inductor FP8 tests on CPU (a hedged sketch of such a helper follows below)
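A hedged sketch of what such a helper could look like (the name and exact dtype mapping are illustrative, not necessarily what the PR uses):
```python
import torch

def convert_fp8_dtype_for_rocm(dtype: torch.dtype) -> torch.dtype:
    # Illustrative mapping: on ROCm (MI300), use the *_fnuz FP8 variants;
    # on CUDA and CPU builds, keep the original dtype so test names match.
    if torch.version.hip is None:
        return dtype
    return {
        torch.float8_e4m3fn: torch.float8_e4m3fnuz,
        torch.float8_e5m2: torch.float8_e5m2fnuz,
    }.get(dtype, dtype)
```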
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143673
Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/pruthvistony
Co-authored-by: saienduri <saimanas.enduri@amd.com>
Co-authored-by: Jithun Nair <jithun.nair@amd.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
For #143486
Interestingly enough, changing the indexing type seems to degrade performance when a larger width is not needed, even at small sizes, so this is made a template parameter rather than forcing all cases to 64-bit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143696
Approved by: https://github.com/malfet
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Fixes #143071
Operations performed on tensors with `requires_grad=True` such as
```python
import torch
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3
```
and
```python
x = torch.tensor(2.0, requires_grad=True)
y = torch.pow(x, 3)
```
are valid operations.
In contrast, an operation using `numpy` such as
```python
import numpy as np
x = torch.tensor(2.0, requires_grad=True)
y = np.pow(x, 3)
# > RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
```
leads to an error.
However, an operation that uses `math` like
```python
import math
x = torch.tensor(2.0, requires_grad=True)
y = math.pow(x, 3)
```
does not cause an error, and `y` is no longer a tensor with a gradient!
This represents a [footgun](https://en.wiktionary.org/wiki/footgun#Noun) for some users, like myself when training small, custom, non-neural network models.
To prevent this undesired behavior in the future, I added a warning when converting tensors with `requires_grad=True` to scalars. Now, when using `math.pow` on a tensor, we get a single warning:
```python
x = torch.tensor(2.0, requires_grad=True)
y = math.pow(x, 3)
# > UserWarning: Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior.
# Consider using tensor.detach() first.
```
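For example, either of the following avoids the surprise (illustrative):
```python
y = x ** 3                    # stays a tensor and keeps requires_grad
z = math.pow(x.detach(), 3)   # explicit opt-out: plain Python float, no warning
```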
Please let me know if you have any questions 👍
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143261
Approved by: https://github.com/albanD
Step required for performance in #143122
Adds support for a CPU scalar as `tensor2` in `addcmul`. For example:
```
import torch
a = torch.rand(2, 2, device="cuda")
b = torch.tensor(1e-3)
torch.add(a, b)
torch.addcmul(a, a, b) # used to fail, now works
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143264
Approved by: https://github.com/janeyx99
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
Fixes a bug introduced in https://github.com/pytorch/pytorch/pull/137267
While the test ensures the finalizer ran so that things are cleared, the objects were not properly collected by the GC due to the faulty tp_clear implementation. So, even though the finalizer ran, the object was still alive.
Fix this by giving tp_clear the same treatment as tp_traverse and tp_dealloc on Tensor: make it a single function that handles the full subclass hierarchy in one place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143203
Approved by: https://github.com/ezyang, https://github.com/colesbury
ghstack dependencies: #143202
On ROCm, hipification converts std::min to ::min, but ::min does not return the right result. This impacts the index_add_ operation on a large tensor: we end up picking the large value instead of the maximum supported block size (128), which leads to the GPU accessing memory out of bounds.
While we wait for ::min to be fixed, we can compare with the `<` operator instead of relying on ::min.
Example code with the failure:
```
import torch

D = 6144
hidden_states = torch.zeros([16384, D], device="cuda:0", dtype=torch.bfloat16)
index = torch.randint(0, 16384, (1, 32, 16384), device="cuda:0", dtype=torch.int64)
output = torch.empty([1, 32, 16384, D], device="cuda:0", dtype=torch.bfloat16)
hidden_states.index_add_(0, index.view(-1), output.view(-1, D))
```
```
Traceback (most recent call last):
RuntimeError: HIP error: invalid configuration argument
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139087
Approved by: https://github.com/jeffdaily, https://github.com/pruthvistony
Related to #107302
The breakages are caused by backward incompatibility between NumPy 1 and NumPy 2.
This PR fixes all the corresponding test failures in `test_torch.py`.
1. The dtype of the return value of `np.percentile` when passed a `torch.float32` tensor.
NumPy 1: returns `np.float64`.
NumPy 2: returns `np.float32`.
Solution: enforce `np.float64` with `.astype(np.float64)`.
2. The type of the `np.gradient()` result when it returns multiple arrays.
NumPy 1: a list of arrays.
NumPy 2: a tuple of arrays.
Solution: cast the tuple to a list (a sketch of both workarounds follows below).
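A minimal sketch of the two workarounds (illustrative, not the exact test code):
```python
import numpy as np
import torch

t = torch.rand(10, dtype=torch.float32)

# 1. Under NumPy 2, np.percentile on float32 input returns float32
#    (NumPy 1 returned float64); cast so the comparison dtype is fixed.
p = np.percentile(t.numpy(), 50).astype(np.float64)

# 2. Under NumPy 2, np.gradient returns a tuple of arrays for multi-dim input
#    (NumPy 1 returned a list); cast to a list so existing assertions still hold.
grads = list(np.gradient(np.arange(9.0).reshape(3, 3)))
```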
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137740
Approved by: https://github.com/ezyang
Fixes https://github.com/pytorch/pytorch/issues/136358
The bug here is that the Tensor object is actually backed by two classes: `Tensor` from `_tensor.py` and `TensorBase` from C++.
Before this PR, they had the following GC methods:
Tensor:
- tp_clear: subtype_clear
- tp_traverse: THPVariable_subclass_traverse
- tp_dealloc: THPVariable_subclass_dealloc
TensorBase:
- tp_clear: THPVariable_clear
- tp_traverse: THPFunction_traverse (a fake function that just raises an error)
- tp_dealloc: object_dealloc
The problem is that when clear is called on a Tensor, subtype_clear clears the things owned by the `Tensor` type, in particular its `__dict__` attribute, before delegating to the TensorBase clear, where we detect that resurrection needs to happen and skip the clearing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137267
Approved by: https://github.com/ezyang, https://github.com/kshitij12345
- Set the new tolerances to roughly N * eps(bfloat16), where N is the inner dimension of the matmul; this should be a comfortable upper bound.
Logic behind the choice of tolerance:
The maximum error of summing a series of N numbers in bfloat16 should be `N * epsilon(bfloat16)`; I confirmed by sampling different random seeds that the maximum observed error does not exceed this value and is usually much less.
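As a rough illustration of the bound (not the exact code in the PR):
```python
import torch

# The worst-case rounding error of accumulating N bfloat16 values grows
# roughly like N * eps, so scale the absolute tolerance with the inner
# (reduction) dimension of the matmul.
N = 1000                                    # example inner dimension
eps_bf16 = torch.finfo(torch.bfloat16).eps  # 2**-7 == 0.0078125
atol = N * eps_bf16
```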
Fixes test failures on Arm® Neoverse™ V1 (not raised as an issue because this hardware type is not currently covered by the linux-aarch64 workflow):
```
Traceback (most recent call last):
File "/var/lib/jenkins/workspace/test/test_torch.py", line 2478, in test_cdist_large
self.assertEqual(expected, actual)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3885, in assertEqual
raise error_metas.pop()[0].to_error(
AssertionError: Tensor-likes are not close!
Mismatched elements: 134118 / 1000000 (13.4%)
Greatest absolute difference: 0.03829193115234375 at index (291, 726) (up to 0.005 allowed)
Greatest relative difference: 0.03519868478178978 at index (291, 726) (up to 1.3e-06 allowed)
```
@malfet @jondea
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136315
Approved by: https://github.com/albanD
Summary:
There are some issues with dim order creation; T194410923 has a detailed illustration.
One of the reasons is that the `is_contiguous` function may produce an ambiguous memory format result (some tensors can be both channels_last and contiguous at the same time), and dim order generation relies on the underlying memory format result as a shortcut.
To mitigate the issue, dim order uses the shortcut if and only if the tensor belongs to a single memory format; otherwise, we recalculate it. A small example of the ambiguity is shown below.
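For instance, on a 4-D tensor with a single non-trivial dimension, both contiguity checks return true (illustrative):
```python
import torch

# With only one non-trivial dimension, the strides satisfy both the
# contiguous and the channels_last layouts at the same time, so a
# memory-format-based shortcut for dim order is ambiguous.
x = torch.empty(1, 3, 1, 1)
print(x.is_contiguous())                                   # True
print(x.is_contiguous(memory_format=torch.channels_last))  # True
```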
Test Plan: CI
Differential Revision: D60056793
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131366
Approved by: https://github.com/ezyang
This PR re-implements pin memory, aiming to get rid of the optional `device` argument and make all related APIs device-agnostic. We add two new abstract APIs in [AcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/detail/AcceleratorHooksInterface.h#L12) and redefine pin memory as: "Pin memory is always pinned for the current accelerator device". In detail, pin_memory/is_pinned use [getAcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.h#L61) to get the appropriate device and invoke the corresponding overridden interfaces, instead of using BackendSelect and then dispatching to CUDA or other specific backends' implementations.
Note: new backends that want to implement and use pin memory just need to inherit from AcceleratorHooksInterface and override the `isPinnedPtr` and `getPinnedMemoryAllocator` methods.
Additional context: to avoid BC-breaking changes, this PR preserves the `device` arg of the related APIs and throws a deprecation warning if it is passed. A follow-up PR will update all PyTorch callers (`Tensor.is_pinned()`, `Tensor.pin_memory()`, ...) not to pass this arg. In the future, the `device` arg will be removed entirely.
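From the Python side the calls stay the same and simply pin for whichever accelerator is current; an illustrative usage on a CUDA (or other accelerator) build:
```python
import torch

x = torch.randn(4)
# pin_memory()/is_pinned() now resolve the current accelerator through its
# hooks interface instead of dispatching through BackendSelect.
pinned = x.pin_memory()      # pinned for the current accelerator (e.g. CUDA)
print(pinned.is_pinned())    # True
# Passing an explicit device= still works for now but emits a deprecation warning.
```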
Relates #124908
Relates #14560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126376
Approved by: https://github.com/albanD
Extend constant folding to dynamic-shape nodes; only pointwise ops and some other restricted ops are supported.
We support dynamic shapes by limiting constant folding to ops that are guaranteed to produce uniform values (full, pointwise ops, and views) and by running these operators on tensors of shape 1. This also eliminates the potential memory overhead of constant folding. A rough illustration follows.
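A rough illustration of the shape-1 trick (not the actual Inductor code):
```python
import torch

# A subgraph like torch.full((s0, s1), 2.0) * 3 has a uniform value, so it can
# be folded by evaluating the same pointwise ops on a shape-(1,) tensor and
# expanding to the runtime (dynamic) shape afterwards.
folded = torch.full((1,), 2.0) * 3   # evaluated once, tiny memory footprint
s0, s1 = 8, 16                       # stand-ins for dynamic dimensions
result = folded.expand(s0, s1)       # same uniform value at the full size
```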
Taken over from https://github.com/pytorch/pytorch/pull/128937
joint work with @imzhuhl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129686
Approved by: https://github.com/Chillee
ghstack dependencies: #130367