pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Aaron Orenstein	6bcda3a21a	dynamo tracing perf: cache on import_source: 52.9 -> 52.58 (#143058 ) See #143056 for overall docs. This PR: add cache to `InstructionTranslatorBase.import_source()` Pull Request resolved: https://github.com/pytorch/pytorch/pull/143058 Approved by: https://github.com/jansel ghstack dependencies: #143066, #143056	2024-12-13 18:20:48 +00:00
Xuehai Pan	d47a80246a	[dynamo][pytree][3/N] make CXX pytree traceable: `tree_map` / `tree_map_` (#137399 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137399 Approved by: https://github.com/jansel ghstack dependencies: #137398	2024-12-12 18:05:25 +00:00
Xuehai Pan	7edeb1005a	[dynamo][pytree][2/N] make CXX pytree traceable: `tree_flatten` / `tree_unflatten` / `tree_structure` (#137398 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137398 Approved by: https://github.com/jansel	2024-12-12 18:05:25 +00:00
Blaine Burton Rister	520ba556cd	[Inductor] Refactor "r" reduction prefix to {"r0_", "r1_"}. (#142020 ) Preparatory refactor for https://github.com/pytorch/pytorch/pull/137243. # Feature This PR changes the `RINDEX` / `"r"` symbol type to `(R0_INDEX, R1_INDEX)` and `("r0_", "r1_")`, respectively. This allows the relevant code to support 2D (often ND) reductions. Unlike the parent PR, this one does not change the tiling algorithm, so `"r1_"` is never used. However, it prepares other parts of the system to handle `"r1_"` once we start using it. This should significantly reduce the chances of hitting merge conflicts, making the parent PR much easier to land. The only change to the generated triton code is to rename `"rindex"` -> `"r0_index"`, `"RBLOCK"` -> `"R0_BLOCK"`, etc. To maintain compatibility with existing codegen, this also generates aliases to the old reduction variables like `rindex = r0_index`. If we generated 2D reductions (which this PR will not do), the aliases would be more complicated and would collapse 2D multi-indices to linear indices. See some example kernels in the parent PR. These aliases can be eliminated by the Triton compiler, and should not impact the final machine code running on the GPU. See the perf testing in the parent PR which confirms the aliases do not impact perf. # Test plan The existing CI provides good coverage. This PR modifies the expected code in a few places, renaming reduction variables from `r.` to `r0_.`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142020 Approved by: https://github.com/jansel Co-authored-by: Jason Ansel <jansel@meta.com>	2024-12-12 17:22:20 +00:00
Colin L. Rice	d68403df3b	filelock: Make waitcounter variant to use (#139816 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139816 Approved by: https://github.com/ezyang	2024-12-12 01:18:34 +00:00
Jane Xu	be27dbf2b8	Enable CPP/CUDAExtension with py_limited_api for python agnosticism (#138088 ) Getting tested with ao, but now there is a real test I added. ## What does this PR do? We want to allow custom PyTorch extensions to be able to build one wheel for multiple Python versions, in other words, achieve python agnosticism. It turns out that there is such a way that setuptools/Python provides already! Namely, if the user promises to use only the Python limited API in their extension, they can pass in `py_limited_api` to their Extension class and to the bdist_wheel command (with a min python version) in order to build 1 wheel that will suffice across multiple Python versions. Sounds lovely! Why don't people do that already with PyTorch? Well 2 things. This workflow is hardly documented (even searching for python agnostic specifically does not reveal many answers) so I'd expect that people simply don't know about it. But even if they did, _PyTorch_ custom Extensions would still not work because we always link torch_python, which does not abide by py_limited_api rules. So this is where this PR comes in! We respect when the user specifies py_limited_api and skip linking torch_python under that condition, allowing users to enroll in the provided functionality I just described. ## How do I know this PR works? I manually tested my silly little ultra_norm locally (with `import python_agnostic`) and wrote a test case for the extension showing that - torch_python doesn't show up in the ldd tree - no Py- symbols show up It may be a little confusing that our test case is actually python-free (more clean than python-agnostic) but it is sufficient (and not necessary) towards showing that this change works. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138088 Approved by: https://github.com/ezyang, https://github.com/albanD	2024-12-11 18:22:55 +00:00
PyTorch MergeBot	2374d460d0	Revert "filelock: Make waitcounter variant to use (#139816 )" This reverts commit `237c4b559c`. Reverted https://github.com/pytorch/pytorch/pull/139816 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else. The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/139816#issuecomment-2536616808))	2024-12-11 17:26:46 +00:00
Jane Xu	47a571e166	Document that load_inline requires having a compiler installed (#137521 ) Prompted by this forum q: https://discuss.pytorch.org/t/are-the-requirements-for-using-torch-utils-cpp-extension-with-cuda-documented-anywhere/211222 Would be curious to know if we could get more precise. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137521 Approved by: https://github.com/zou3519	2024-12-11 03:47:54 +00:00
Colin L. Rice	237c4b559c	filelock: Make waitcounter variant to use (#139816 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139816 Approved by: https://github.com/ezyang	2024-12-10 23:02:59 +00:00
Alexander Grund	67cf126cf8	Disable PIP version check in collect_env (#142308 ) Disables version check which might require users to reach out to PyPI, reference: https://pip.pypa.io/en/latest/cli/pip/#cmdoption-disable-pip-version-check Switches pip to be used directly as a python module (`python3 -mpip`) instead of relying on `pip3` or `pip` Pull Request resolved: https://github.com/pytorch/pytorch/pull/142308 Approved by: https://github.com/seemethere	2024-12-10 19:16:36 +00:00
Alex Denisov	539286a67b	Inductor annotations (#130429 ) Add NVTX annotations around training phases and buffer computations RFC/discussion: https://dev-discuss.pytorch.org/t/rfc-performance-profiling-at-scale-with-details-nvtx-annotations/2224 <img width="2160" alt="Screenshot 2024-07-10 at 11 48 04" src="https://github.com/pytorch/pytorch/assets/1175576/9ade139c-d393-473f-9b68-6c25da367dc4"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/130429 Approved by: https://github.com/aorenste, https://github.com/eellison, https://github.com/albanD Co-authored-by: Cedric GESTES <cedric.gestes@flex.ai>	2024-12-10 08:53:39 +00:00
Alex Kiefer	2f1191fb6a	Corrected metadata variable names (#142342 ) Fixes #142341 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142342 Approved by: https://github.com/janeyx99	2024-12-10 02:24:31 +00:00
Fabian Keller	5e8e1d725a	Remove some unused type ignores (round 1) (#142325 ) Over time, a large number of the existing type ignores have become irrelevant/unused/dead as a result of improvements in annotations and type checking. Having these `# type: ignore` linger around is not ideal for two reasons: - They are verbose/ugly syntactically. - They could hide genuine bugs in the future, if a refactoring would actually introduce a bug but it gets hidden by the ignore. I'm counting over 1500 unused ignores already. This is a first PR that removes some of them. Note that I haven't touched type ignores that looked "conditional" like the import challenge mentioned in https://github.com/pytorch/pytorch/pull/60006#issuecomment-2480604728. I will address these at a later point, and eventually would enable `warn_unused_ignores = True` in the mypy configuration as discussed in that comment to prevent accumulating more dead ignores going forward. This PR should have no effect on runtime at all. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142325 Approved by: https://github.com/Skylion007, https://github.com/janeyx99	2024-12-09 18:23:46 +00:00
drisspg	75e72e1408	Adding lowering to persistent-tma device kernel for _scaled_mm (#142045 ) # Summary This PR adds an alternative triton lowering for _scaled_mm. This uses an updated mm template that utilizes persistent scheduling + TMAs on A and B matrices. Limitations: * This implementation does not work with Bias values: `0602676c8d/torch/_inductor/kernel/mm_scaled.py (L106)` Plan is to remove this workaround and enforce that both scaling + bias is properly done as epilogues onto the existing templates * K dim must be 32 or greater for these to take effect * Gated by a config flag ( currently defaults to Off, maybe should be on) ## Testing We don't have any tests exercising this code in CI/CD but I updated the relevant tests in test_fp8 and they are all green: <img width="1680" alt="Screenshot 2024-12-05 at 7 24 07 PM" src="https://github.com/user-attachments/assets/9c520541-d97a-416f-9af7-e68b366ec90f"> ## Follow Ups * Work to update the base mm triton templates and utilize the same template from mm/addmm/scaled_mm w/ respective epilogues * Tuning on Persistent kernel configs. I found ones that work for my problem shapes but need to do some more NCU work ### Some profiling code I was using Code I am using to iterate w/
```Python
import torch
from dataclasses import dataclass
from jsonargparse import CLI
import logging
from pathlib import Path
from transformer_nuggets.utils.benchmark import ProfileConfig, profile_function
from torchao.float8.inference import (
    addmm_float8_unwrapped_inference,
    preprocess_data,
    Float8MMConfig,
)
from transformer_nuggets.fp8.fp8_matmul import (
    matmul_persistent,
    matmul_tma_persistent,
    matmul_device_tma_persistent,
)
from enum import Enum

logging.getLogger("transformer_nuggets").setLevel(logging.INFO)


class FP8Kernel(Enum):
    PERSISTENT = "Persistent"
    PERSISTENT_TMA = "Persistent-TMA"
    DEVICE_TMA = "Device-TMA"
    SCALED_MM = "Scaled-MM"


class ScalingStrategy(Enum):
    PER_TENSOR = "PerTensor"
    PER_ROW = "PerRow"


@dataclass(frozen=True)
class ExperimentConfig:
    M: int
    K: int
    N: int
    scaling_strategy: ScalingStrategy
    fp8_kernel: FP8Kernel
    compile: bool


def get_fp8_matmul(
    A: torch.Tensor,
    B: torch.Tensor,
    scaling_strategy: ScalingStrategy,
    fp8_kernel: FP8Kernel,
):
    A_fp8 = A.to(torch.float8_e4m3fn)
    B_fp8 = B.to(torch.float8_e4m3fn)
    A_fp8, B_fp8 = preprocess_data(A_fp8, B_fp8, Float8MMConfig(use_fast_accum=True))

    if scaling_strategy == ScalingStrategy.PER_TENSOR:
        a_scale = torch.tensor(1, device="cuda", dtype=torch.float32)
        b_scale = torch.tensor(1, device="cuda", dtype=torch.float32)
    elif scaling_strategy == ScalingStrategy.PER_ROW:
        a_scale = torch.ones((A_fp8.size(0), 1), device="cuda", dtype=torch.float32)
        b_scale = torch.ones((B_fp8.size(1), 1), device="cuda", dtype=torch.float32).T
    else:
        raise ValueError(f"Invalid scaling strategy: {scaling_strategy}")
    assert fp8_kernel == FP8Kernel.SCALED_MM
    return lambda: addmm_float8_unwrapped_inference(
        A_fp8, a_scale, B_fp8, b_scale, output_dtype=torch.bfloat16, use_fast_accum=True
    )


def run_matmul(config: ExperimentConfig):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    A = torch.randn(config.M, config.K, device=device, dtype=torch.bfloat16)
    B = torch.randn(config.K, config.N, device=device, dtype=torch.bfloat16)

    fp8_matmul = get_fp8_matmul(A, B, config.scaling_strategy, config.fp8_kernel)

    if config.compile and config.fp8_kernel == FP8Kernel.SCALED_MM:
        fp8_matmul = torch.compile(fp8_matmul, mode="max-autotune-no-cudagraphs")

    _ = fp8_matmul()

    return


def main():
    torch.random.manual_seed(123)

    # Define your experiment configuration here
    config = ExperimentConfig(
        M=8192,
        K=8192,
        N=8192,
        scaling_strategy=ScalingStrategy.PER_TENSOR,
        fp8_kernel=FP8Kernel.SCALED_MM,
        compile=True,
    )

    run_matmul(config)


if __name__ == "__main__":
    CLI(main)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142045 Approved by: https://github.com/eellison	2024-12-09 01:48:40 +00:00
Xuehai Pan	0bd7b7ae58	Add version check for C++ pytree availability (#142299 ) Resolves #142256 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142299 Approved by: https://github.com/jansel, https://github.com/weifengpy	2024-12-08 06:27:32 +00:00
Michael Diggin	18ef3a09cd	Add option in data loader for out of order data (#141833 ) Fixes #105203 Facing a similar problem to the linked issue, where variable sized input data can mean that a handful of slow to process samples hold up smaller and faster to process samples from being used. This also leads to lower GPU utilization. In certain cases, e.g. evaluation epochs, inference pipelines or other cases where reproducibility isn't important, this can bring significant speed ups. This PR adds an `allow_out_of_order` bool input to the `DataLoader` class, defaulting to `false`, which when set to `true` will return data from workers in whatever order they are ready/processed in, rather than in strict index order. Instead of storing data that was returned out of order, it is passed directly to the main thread and the entry in `_task_info` is deleted. The main changes are to check that an entry in `_task_info` does exist, and to only increase `self._rcvd_idx` when the lowest index remaining gets returned. Two tests are added to test this for iterable type datasets and index type datasets. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141833 Approved by: https://github.com/andrewkho	2024-12-06 19:55:58 +00:00
Laith Sakka	6183c90e99	Avoid recursion in FloorDiv constructor (#142057 ) Addresses https://github.com/pytorch/pytorch/issues/141215 and the max recursion issue in it; also optimizes perf by avoiding the construction of a lot of sympy expressions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142057 Approved by: https://github.com/ezyang	2024-12-05 14:25:28 +00:00
drisspg	3fdc74ae29	Fix dumb typo (#142079 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142079 Approved by: https://github.com/jainapurva, https://github.com/soulitzer	2024-12-05 00:43:49 +00:00
drisspg	0582b32f6c	Enable Extension Support (#142028 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142028 Approved by: https://github.com/ezyang, https://github.com/eqy	2024-12-04 15:54:06 +00:00
Henry Tsang	30d907c6fb	When serializing treespec context, support enum as well (#141525 ) Following https://github.com/pytorch/pytorch/pull/102716, per @angelayi's suggestion. Note that in general enum as an input is not supported. repro:
```
import enum
from enum import auto

import torch


class TestEnum(enum.Enum):
    A = auto()
    B = auto()

    @staticmethod
    def from_string(s):
        return TestEnum[s.upper()]


class M(torch.nn.Module):
    def forward(self, x, en):
        return x.clone()


# The second input is a dict keyed by an enum member.
input1 = (
    torch.rand(10, device="cuda"),
    {TestEnum.A: torch.rand(10, device="cuda")},
)
inputs = [input1]
model = M().cuda()

_ = model(*input1)
ep = torch.export.export(model, input1, strict=False)
path = torch._inductor.aot_compile(ep.module(), input1)
```
Differential Revision: D66269157 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141525 Approved by: https://github.com/angelayi	2024-12-04 03:08:50 +00:00
dan_the_3rd	9125e9119c	Fix memory leak in `ModuleTracker` (#141960 ) Thanks @drisspg and @albanD for finding the fix. TEST PLAN
```
import gc
import torch
import torch.nn as nn
from torch.utils.module_tracker import ModuleTracker


class MyModel(nn.Module):
    def forward(self, x):
        return x * x


print(f"torch=={torch.__version__}")
m = MyModel()
m.cuda()
m.to(torch.bfloat16)
mt = ModuleTracker()
for i in range(1000):
    if i % 100 == 0:
        gc.collect()
        print("memory_allocated:", torch.cuda.memory_allocated())
    x = torch.randn([128, 256], device="cuda", dtype=torch.bfloat16, requires_grad=True)
    with mt:
        m(x)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141960 Approved by: https://github.com/albanD	2024-12-03 18:36:15 +00:00
rzou	ac600fdce6	Type exposed_in decorator (#141894 ) Test Plan: - lintrunner Pull Request resolved: https://github.com/pytorch/pytorch/pull/141894 Approved by: https://github.com/albanD	2024-12-03 16:28:17 +00:00
Xuehai Pan	78543e6002	[dynamo][pytree][1/N] make CXX pytree traceable: `tree_iter` / `tree_leaves` (#137397 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137397 Approved by: https://github.com/jansel	2024-12-03 11:17:39 +00:00
PyTorch MergeBot	9012e7a62f	Revert "[dynamo][pytree][1/N] make CXX pytree traceable: `tree_iter` / `tree_leaves` (#137397 )" This reverts commit `07850bb2c1`. Reverted https://github.com/pytorch/pytorch/pull/137397 on behalf of https://github.com/atalman due to Failing internal test ([comment](https://github.com/pytorch/pytorch/pull/137397#issuecomment-2511934283))	2024-12-02 16:05:14 +00:00
Jason Ansel	b2fe1b9409	[inductor] Fix 3d tiling (#141709 ) Fixes #141121 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141709 Approved by: https://github.com/eellison	2024-12-01 19:47:41 +00:00
PyTorch MergeBot	b33f770574	Revert "[inductor] Fix 3d tiling (#141709 )" This reverts commit `ca9bfa1a38`. Reverted https://github.com/pytorch/pytorch/pull/141709 on behalf of https://github.com/huydhn due to Sorry for reverting your change but there is one failed test showing up in trunk. It was missed by target determination ([comment](https://github.com/pytorch/pytorch/pull/141709#issuecomment-2505213481))	2024-11-28 03:55:31 +00:00
Jason Ansel	ca9bfa1a38	[inductor] Fix 3d tiling (#141709 ) Fixes #141121 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141709 Approved by: https://github.com/eellison	2024-11-28 01:34:28 +00:00
Mark Saroufim	e24190709f	[BE] Remove Model Dump utility (#141540 ) So I found this utility by accident, trying to find how many html files we have in the repo so I could convert them to markdown. Turns out we package some html and js files in pytorch to visualize torchscript models. This seems kinda strange, probably shouldn't be in core, so I removed the tests I could find. Maybe some internal tests will break, but considering torchscript is being superseded it might make sense to do this. Last time there was a meaningful update to the test for this file was about 2 years ago by @digantdesai; since then it's a bunch of routine upgrades. It seems like this package is unused: https://github.com/search?type=code&auto_enroll=true&q=torch.utils.model_dump&p=1 I skimmed through 5 pages of these and the only time this shows up in code search is when someone is either cloning pytorch or checking in their venv into github. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141540 Approved by: https://github.com/malfet	2024-11-27 22:52:55 +00:00
Xuehai Pan	07850bb2c1	[dynamo][pytree][1/N] make CXX pytree traceable: `tree_iter` / `tree_leaves` (#137397 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137397 Approved by: https://github.com/jansel ghstack dependencies: #141360	2024-11-27 00:21:58 +00:00
William Wen	6fa4356451	handle sympy.oo in bitwise_and/or value_ranges (#141522 ) An internal test is failing due to not handling `sympy.oo` properly in bitwise_and/or value_ranges: [T208684142](https://www.internalfb.com/intern/tasks/?t=208684142). I don't know how to repro this - seems like this requires inductor to trigger as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141522 Approved by: https://github.com/ezyang ghstack dependencies: #138777	2024-11-26 20:01:31 +00:00
Isuru Fernando	44186a0a4e	Move Sympy printers to torch/utils/_sympy/printers.py (#140597 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140597 Approved by: https://github.com/ezyang, https://github.com/anijain2305	2024-11-26 18:11:00 +00:00
Florian (Feuermagier)	4fa72168ea	FlopCounterMode: Decompose ops for inference mode (#138508 ) Fixes #126268 I've basically followed @ezyang's suggestion (I think) to use `func.decompose(...)`. Since `__torch_dispatch__` won't be called a second time for the same op, I've added a second `TorchDispatchMode` (`_DecomposedCounterMode`) that simply dispatches to the parent flop counter. Using `self` as the inner context manager is not possible, since the second call to `__enter__` would re-initialize the counter's tracking state. Let me know if there's something wrong with this implementation, since I'm quite unsure how the decomposition thing actually works :D Pull Request resolved: https://github.com/pytorch/pytorch/pull/138508 Approved by: https://github.com/ezyang Co-authored-by: Edward Z. Yang <ezyang@meta.com>	2024-11-25 16:53:10 +00:00
Nikita Shulga	2398e758d2	Fix access to `_msvccompiler` from newer distutils (#141363 ) Newer versions of distutils no longer import `_msvccompiler` upon init (on Windows platform, that was not the case on other platforms even before 74), but it's still accessible if one chooses to import it directly. Test plan:
```
% python -c 'from setuptools import distutils; print(distutils.__version__, hasattr(distutils, "_msvccompiler")); from distutils import _msvccompiler; import setuptools; print(setuptools.__version__, _msvccompiler.__file__)'
3.10.9 False
65.5.0 /usr/local/fbcode/platform010/Python3.10.framework/Versions/3.10/lib/python3.10/site-packages/setuptools/_distutils/_msvccompiler.py
```
and
```
% python -c 'from setuptools import distutils; print(distutils.__version__, hasattr(distutils, "_msvccompiler")); from distutils import _msvccompiler; import setuptools; print(setuptools.__version__, _msvccompiler.__file__)'
3.13.0 False
75.6.0 /Users/malfet/py312-venv/lib/python3.13/site-packages/setuptools/_distutils/_msvccompiler.py
```
Gave up trying to appease the linker, so rewrote it as the following function:
```python
def _get_vc_env(vc_arch: str) -> dict[str, str]:
    try:
        from setuptools import distutils  # type: ignore[import]

        return distutils._msvccompiler._get_vc_env(vc_arch)  # type: ignore[no-any-return]
    except AttributeError:
        # Newer distutils no longer imports _msvccompiler on init; import it directly.
        from setuptools._distutils import _msvccompiler  # type: ignore[import]

        return _msvccompiler._get_vc_env(vc_arch)  # type: ignore[no-any-return]
```
This PR also undoes the setuptools version restriction introduced by https://github.com/pytorch/pytorch/pull/136489, as the premise for the restriction is incorrect. Fixes https://github.com/pytorch/pytorch/issues/141319 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141363 Approved by: https://github.com/huydhn, https://github.com/atalman	2024-11-25 01:50:47 +00:00
William Wen	ee7eaad5c3	[dynamo] add SymNode bitwise and/or (#138777 ) Fixes [T203472723](https://www.internalfb.com/intern/tasks/?t=203472723) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138777 Approved by: https://github.com/ezyang	2024-11-22 23:36:16 +00:00
PyTorch MergeBot	f23621ec56	Revert "Move Sympy printers to torch/utils/_sympy/printers.py (#140597 )" This reverts commit `c25b201583`. Reverted https://github.com/pytorch/pytorch/pull/140597 on behalf of https://github.com/huydhn due to Trunk is sad again after this lands, this looks like a landrace this time, so please do a rebase ([comment](https://github.com/pytorch/pytorch/pull/140597#issuecomment-2494052978))	2024-11-22 15:43:39 +00:00
Isuru Fernando	c25b201583	Move Sympy printers to torch/utils/_sympy/printers.py (#140597 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140597 Approved by: https://github.com/ezyang, https://github.com/anijain2305	2024-11-22 02:04:36 +00:00
Laith Sakka	e39955e82f	Avoid some max constructor optimizations when known not needed. (#139741 ) Summary: around 10% with 1K nodes, more than that with 2K features. 414.5735 -> 333 (20%) This targets optimizing patterns like this
```
sym_max: "Sym(Max(u31 + u32, u33 + u34))" = torch.sym_max(sym_sum_6, sym_sum_7); sym_sum_6 = sym_sum_7 = None
sym_max_1: "Sym(Max(u31 + u32, u33 + u34, u35 + u36))" = torch.sym_max(sym_max, sym_sum_8); sym_max = sym_sum_8 = None
sym_max_2: "Sym(Max(u31 + u32, u33 + u34, u35 + u36, u37 + u38))" = torch.sym_max(sym_max_1, sym_sum_9); sym_max_1 = sym_sum_9 = None
sym_max_3: "Sym(Max(u31 + u32, u33 + u34, u35 + u36, u37 + u38, u39 + u40))" = torch.sym_max(sym_max_2, sym_sum_10); sym_max_2 = sym_sum_10 = None
sym_max_4: "Sym(Max(u31 + u32, u33 + u34, u35 + u36, u37 + u38, u39 + u40, u41 + u42))" = torch.sym_max(sym_max_3, sym_sum_11); sym_max_3 = sym_sum_11 = None
sym_max_5: "Sym(Max(u31 + u32, u33 + u34, u35 + u36, u37 + u38, u39 + u40, u41 + u42, u43 + u44))" = torch.sym_max(sym_max_4, sym_sum_12); sym_max_4 = sym_sum_12 = None
sym_max_6: "Sym(Max(u31 + u32, u33 + u34, u35 + u36, u37 + u38, u39 + u40, u41 + u42, u43 + u44, u45 + u46))" = torch.sym_max(sym_max_5, sym_sum_13); sym_max_5 = sym_sum_13 = None
sym_max_7: "Sym(Max(u31 + u32, u33 + u34, u35 + u36, u37 + u38, u39 + u40, u41 + u42, u43 + u44, u45 + u46, u47 + u48))" = torch.sym_max(sym_max_6, sym_sum_14); sym_max_6 = sym_sum_14 = None
sym_max_8: "Sym(Max(u31 + u32, u33 + u34, u35 + u36, u37 + u38, u39 + u40, u41 + u42, u43 + u44, u45 + u46, u47 + u48, u49 + u50))" = torch.sym_max(sym_max_7, sym_sum_15); sym_max_7 = sym_sum_15 = sym_max_8 = None
```
<img width="496" alt="Screenshot 2024-11-05 at 11 00 35 AM" src="https://github.com/user-attachments/assets/455c06a3-e1bf-43cb-b880-9470ae6fb07f"> <img width="511" alt="Screenshot 2024-11-05 at 11 00 57 AM" src="https://github.com/user-attachments/assets/ff0d4236-9b5c-4a9a-8520-47b005bb3cb0"> Differential Revision: D65354971 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139741 Approved by: https://github.com/ezyang	2024-11-21 16:50:52 +00:00
Colin L. Rice	1d6ca50c5b	config: Throw if justknobs value is not a boolean (#139488 ) This helps avoid an issue where someone uses a mutable type that justknobs does not support within the code, and then it gets overridden to a different type. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139488 Approved by: https://github.com/ezyang	2024-11-20 23:52:17 +00:00
PyTorch MergeBot	701e06b643	Revert "Move Sympy printers to torch/utils/_sympy/printers.py (#140597 )" This reverts commit `aefcdb3c9f`. Reverted https://github.com/pytorch/pytorch/pull/140597 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think it fails inductor/test_padding in trunk. This is a target determination miss and that failed test was not run in your PR ([comment](https://github.com/pytorch/pytorch/pull/140597#issuecomment-2489641453))	2024-11-20 22:13:57 +00:00
Isuru Fernando	aefcdb3c9f	Move Sympy printers to torch/utils/_sympy/printers.py (#140597 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140597 Approved by: https://github.com/ezyang, https://github.com/anijain2305	2024-11-20 20:26:49 +00:00
Dmitry Nikolaev	c9db2c6328	[ROCm] cudagraph explicit sync only after capture_begin() (#138722 ) hipGraphExecDestroy doesn't immediately free memory since rocm6.2. They wait for the next sync point in order to free the memory; this is to ensure that all hipGraphLaunch calls are finished before we release any memory. We need to ensure all async operations finish before deleting the object. The capture_dev_ variable is used to save the device number when the capture_begin() method is called, but a CUDAGraph can be created and destroyed without calling capture_begin(). `capture_dev_ = UNDEFINED_DEVICE;` makes it possible to detect such a case and skip the sync. Tests impacted: test_cuda.py::TestCuda::test_graph_make_graphed_callables_* distributed/test_c10d_nccl.py::ProcessGroupNCCLTest::test_allreduce_in_cudagraph Pull Request resolved: https://github.com/pytorch/pytorch/pull/138722 Approved by: https://github.com/malfet, https://github.com/eqy, https://github.com/jeffdaily	2024-11-20 19:37:22 +00:00
Aaron Gokaslan	12e95aa4ee	[BE]: Apply PERF401 autofixes from ruff (#140980 ) * Automatically applies ruff rule 401. Turns loops into equivalent list comprehensions which are faster and do not leak the scope of the loop variables. * list comprehensions not only often have better typing, but are 50+% faster than for loops on overhead. They also preserve length information etc and are better for the interpreter to optimize. * Manually went back and made mypy happy after the change. * Also fixed style lints in files covered by flake8 but not by pyfmt Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980 Approved by: https://github.com/justinchuby, https://github.com/malfet	2024-11-20 17:52:07 +00:00
Laith Sakka	8d708090c0	Optimize increment summations [Latest Nov 15] (#140822 ) Summary: wins on torchrec benchmark, for 2K nodes it saves 40 seconds; with the recent sympy changes (https://www.internalfb.com/diff/D65883538) we save around 13 seconds (with the max opt on). ``` buck2 run fbcode//mode/opt fbcode//torchrec/distributed/tests:pt2_compile_benchmark -- --num-features=200 ``` This diff optimizes the construction of expressions of the form a+b+c... (all unique symbols), which are very common in torchrec models. How: expressions of the form a+b+c are not optimized by add; the only needed optimization is sorting them. If we have a+b+c and we are adding (d) to it, we can do a binary search to find the position of (d) and avoid optimizing the new expression by passing the new order. Extensions: 1. support constant terms. 2. support 10a+10b+.. (this will give even more wins; will extend the support in a second PR) Differential Revision: D66008482 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140822 Approved by: https://github.com/ezyang	2024-11-20 16:48:20 +00:00
Colin L. Rice	241d2259d3	torch/config: fix mock behaviour (#140779 ) Mock only replaces the value that was removed, if after deletion, it does not see the attribute. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140779 Approved by: https://github.com/ezyang	2024-11-20 02:57:16 +00:00
PyTorch MergeBot	727f1a6da9	Revert "FlopCounterMode: Decompose ops for inference mode (#138508 )" This reverts commit `f915409c26`. Reverted https://github.com/pytorch/pytorch/pull/138508 on behalf of https://github.com/jamesjwu due to Failing internal jobs ([comment](https://github.com/pytorch/pytorch/pull/138508#issuecomment-2484310587))	2024-11-18 22:59:36 +00:00
Aki	9c818c880f	[torchgen] Improve schema parsing with regex for numeric ranges (#140210 ) Replaces the hardcoded string replacement for numeric ranges with a more robust regex pattern that handles any combination of positive and negative numbers in default value ranges. Fixes #135470 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140210 Approved by: https://github.com/ezyang	2024-11-14 23:28:27 +00:00
PyTorch MergeBot	c1fe6be202	Revert "[dynamo] add SymNode bitwise and/or (#138777 )" This reverts commit `c98ef0279e`. Reverted https://github.com/pytorch/pytorch/pull/138777 on behalf of https://github.com/ezyang due to triggering AssertionError: Guard check failed: 14/2: name 'BitwiseFn_bitwise_or' is not defined ([comment](https://github.com/pytorch/pytorch/pull/138777#issuecomment-2477477776))	2024-11-14 21:52:40 +00:00
Oguz Ulgen	65518fd9ef	Turn on triton bundler in OSS (#140600 ) It's been enabled internally, let's also push it out to OSS. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140600 Approved by: https://github.com/masnesral	2024-11-14 20:02:15 +00:00
William Wen	c98ef0279e	[dynamo] add SymNode bitwise and/or (#138777 ) Fixes [T203472723](https://www.internalfb.com/intern/tasks/?t=203472723) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138777 Approved by: https://github.com/ezyang	2024-11-13 18:31:06 +00:00
Kefei Lu	d2d1258b1b	Speed up AMD AOT Inductor lowering by memoizing hipify trie to regex logic (#140156 ) Summary: AMD lowering duration is 1.55x longer than H100. Profiling shows hipification related functions took 22% of overall lowering time. This diff cuts that time by safely memoizing the trie to regex logic. The trick is to incrementally build a state of the trie during the trie construction. The state is the hash of all the words added to the trie. Differential Revision: D65659445 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140156 Approved by: https://github.com/ColinPeppler Co-authored-by: Kefei Lu <kefeilu@meta.com>	2024-11-09 04:28:58 +00:00
Florian (Feuermagier)	f915409c26	FlopCounterMode: Decompose ops for inference mode (#138508 ) Fixes #126268 I've basically followed @ezyang's suggestion (I think) to use `func.decompose(...)`. Since `__torch_dispatch__` won't be called a second time for the same op, I've added a second `TorchDispatchMode` (`_DecomposedCounterMode`) that simply dispatches to the parent flop counter. Using `self` as the inner context manager is not possible, since the second call to `__enter__` would re-initialize the counter's tracking state. Let me know if there's something wrong with this implementation, since I'm quite unsure how the decomposition thing actually works :D Pull Request resolved: https://github.com/pytorch/pytorch/pull/138508 Approved by: https://github.com/ezyang	2024-11-09 03:13:53 +00:00
Gabriel Ferns	2037ea3e15	Add type annotations to Configs (#139833 ) Summary: Adds types to Configs, and fixes a bug in options that was caused by the lack of types. fixes: https://github.com/pytorch/pytorch/issues/139822 Configs are used by many modules so not sure which label to put. Types also allow https://github.com/pytorch/pytorch/pull/139736 to fuzz configs Pull Request resolved: https://github.com/pytorch/pytorch/pull/139833 Approved by: https://github.com/c00w	2024-11-07 03:49:09 +00:00
Colin L. Rice	2a857e940d	config: Add env_name_default and env_name_force to Config (#138956 ) This allows Configs to handle setting their defaults (or overriding themselves) via environment variables. The environment variables are resolved at install time (which is usually import time). This is done 1) to avoid any race conditions between threads etc..., but 2) to help encourage people to just go modify the configs directly, vs overriding environment variables to change pytorch behaviour. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138956 Approved by: https://github.com/ezyang ghstack dependencies: #138766	2024-11-06 21:20:42 +00:00
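A rough sketch of how install-time environment resolution might work for the commit above. Everything here is an assumption made for illustration: the `Config` dataclass shape, the `Entry` class, the `"1"`-means-true parsing, and the precedence order (forced env value, then user override, then env default, then built-in default) are not taken from the PyTorch source.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

_UNSET = object()  # sentinel for "no value provided"


@dataclass
class Config:
    """Hypothetical declaration of a boolean config option."""
    default: bool
    env_name_default: Optional[str] = None
    env_name_force: Optional[str] = None


class Entry:
    """Hypothetical resolved entry; reads the environment exactly once."""

    def __init__(self, cfg: Config, environ: Mapping[str, str]):
        # "Install time": environment variables are consulted here and
        # never again, so later changes to the environment are ignored.
        self.default = cfg.default
        if cfg.env_name_default and cfg.env_name_default in environ:
            self.default = environ[cfg.env_name_default] == "1"
        self.forced = _UNSET
        if cfg.env_name_force and cfg.env_name_force in environ:
            self.forced = environ[cfg.env_name_force] == "1"
        self.user = _UNSET  # set by code that modifies the config directly

    def get(self) -> bool:
        # Assumed precedence: forced env > user override > (env) default.
        if self.forced is not _UNSET:
            return self.forced
        if self.user is not _UNSET:
            return self.user
        return self.default
```

Resolving once at install time, as the commit message describes, means there is no window in which two threads could observe different values because the environment changed between reads.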
Huy Do	c19c384690	Fix torch.load (torch.utils.benchmark) after #137602 (#139810 ) After #137602, the default `weights_only` has been set to True. This test is failing in trunk slow jobs atm benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_collect_callgrind [GH job link](https://github.com/pytorch/pytorch/actions/runs/11672436111/job/32502454946) [HUD commit link](`1aa71be56c`) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139810 Approved by: https://github.com/kit1980	2024-11-06 03:08:29 +00:00
Aaron Orenstein	51a3d6dbc3	Fix existing lint issues in ir.py (#139237 ) - Remove stale mypy "type: ignores" - Made ir.py pass the rest of the lints Pull Request resolved: https://github.com/pytorch/pytorch/pull/139237 Approved by: https://github.com/Skylion007	2024-11-05 06:06:12 +00:00
Jason Ansel	ed30fa74ab	[inductor] sympy.Integer([01]) -> sympy.S.(Zero\|One) (#139523 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139523 Approved by: https://github.com/ezyang ghstack dependencies: #139364, #139365, #139370, #139452	2024-11-04 04:28:40 +00:00
Edward Z. Yang	585dbfa583	Profile guided optimization for automatic_dynamic (#139001 ) Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I open a new PR. This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001 Approved by: https://github.com/oulgen	2024-11-03 06:29:57 +00:00
PyTorch MergeBot	92d7f29e59	Revert "Profile guided optimization for automatic_dynamic (#139001 )" This reverts commit `f6be44c74e`. Reverted https://github.com/pytorch/pytorch/pull/139001 on behalf of https://github.com/ezyang due to more fbcode errors ([comment](https://github.com/pytorch/pytorch/pull/139001#issuecomment-2452985581))	2024-11-02 13:11:04 +00:00
Edward Z. Yang	f6be44c74e	Profile guided optimization for automatic_dynamic (#139001 ) Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I open a new PR. This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Differential Revision: [D65065497](https://our.internmc.facebook.com/intern/diff/D65065497) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001 Approved by: https://github.com/oulgen	2024-11-02 11:50:11 +00:00
PyTorch MergeBot	98e11b0021	Revert "[inductor] sympy.Integer([01]) -> sympy.S.(Zero\|One) (#139523 )" This reverts commit `c53beab377`. Reverted https://github.com/pytorch/pytorch/pull/139523 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing lots of internal tests in D65345157 ([comment](https://github.com/pytorch/pytorch/pull/139364#issuecomment-2452897337))	2024-11-02 06:49:10 +00:00
PyTorch MergeBot	8d1eaa3da6	Revert "Profile guided optimization for automatic_dynamic (#139001 )" This reverts commit `a6630bcf87`. Reverted https://github.com/pytorch/pytorch/pull/139001 on behalf of https://github.com/ezyang due to internal code triggers import cycle ([comment](https://github.com/pytorch/pytorch/pull/139001#issuecomment-2452833882))	2024-11-02 03:38:15 +00:00
Jason Ansel	c53beab377	[inductor] sympy.Integer([01]) -> sympy.S.(Zero\|One) (#139523 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139523 Approved by: https://github.com/ezyang ghstack dependencies: #139364, #139365, #139370, #139452	2024-11-02 03:04:22 +00:00
Edward Z. Yang	a6630bcf87	Profile guided optimization for automatic_dynamic (#139001 ) Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I open a new PR. This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Differential Revision: [D65065497](https://our.internmc.facebook.com/intern/diff/D65065497) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001 Approved by: https://github.com/oulgen	2024-11-01 21:43:25 +00:00
Colin L. Rice	abc5d59dcb	config: create Config objects with JK support (#138766 ) This teaches install_config_module (and the underlying code) to understand Config objects. Additionally we've added a JK option to this which resolves the JK. This config gets stored within the _ConfigEntry class and is evaluated when __getattr__ is called. If justknobs is set, it'll call justknobs_check to see the result. Due to preceding work, basically everything works correctly here and we had to update a couple of tests, and modify the getattr behaviour. Note that we are updating the justknob_check function to support a default option, to make default work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138766 Approved by: https://github.com/ezyang	2024-11-01 19:20:37 +00:00
Xuehai Pan	9bbe4a67ad	[dynamo] support `maxlen` for `collections.deque` (#138194 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138194 Approved by: https://github.com/jansel, https://github.com/malfet	2024-10-30 10:08:02 +00:00
Colin L. Rice	a0e095dd9f	config: Modify install_config_module to use a layered approach (#138758 ) This modifies the config system, to use a single mapping of config -> ConfigEntry and to store the default and user values within them. We could have used multiple dicts (i.e. user_override and default), but as we add more fields (justknobs in this PR, perhaps testing and env variables later), it quickly becomes painful. There are a couple design decisions we could change. 1) All configs we save store the resolved value - not the default and user override separately 2) All configs we load, apply the resolved value as a user override. This means that certain complexities of default behaviour and deletion (as well as JK), will change if you save + load a config. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138758 Approved by: https://github.com/ezyang	2024-10-29 23:19:36 +00:00
Jeff Daily	7c7b2d89ba	[ROCm] set hipblas workspace (#138791 ) Fixes #138532. This brings hipblas behavior in line with cublas behavior with respect to setting the workspace to an allocation from the caching allocator as well as the env var HIPBLAS_WORKSPACE_CONFIG. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138791 Approved by: https://github.com/naromero77amd, https://github.com/eqy, https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2024-10-29 01:37:55 +00:00
Edward Z. Yang	91ded0576d	Add sym_log2 (#137980 ) Internal xref: https://fb.workplace.com/groups/1075192433118967/permalink/1515595595745313/ Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/137980 Approved by: https://github.com/bobrenjc93	2024-10-28 17:03:14 +00:00
银河渡舟	4d8090cabb	Avoid file encoding issues when loading cpp extensions (#138565 ) I've found that when using `torch.utils.cpp_extension.load` on my Windows system, decoding errors occur when my .cpp/.cu files contain certain non-English characters. `test.py`: ```py from torch.utils.cpp_extension import load my_lib = load(name='my_cuda_kernel', sources=['my_cuda_kernel.cu'], extra_cuda_cflags=['-O2', '-std=c++17']) # ...... ``` `my_cuda_kernel.cu`: ```cpp #include <torch/types.h> #include <torch/extension.h> // 向量化 <------ some chinese characters // ...... ``` Errors will be reported as: ``` Traceback (most recent call last): File "E:\test\test.py", line 8, in <module> my_lib = load( ^^^^^ File "C:\Users\XXX\AppData\Roaming\Python\Python311\site-packages\torch\utils\cpp_extension.py", line 1314, in load return _jit_compile( ^^^^^^^^^^^^^ File "C:\Users\XXX\AppData\Roaming\Python\Python311\site-packages\torch\utils\cpp_extension.py", line 1680, in _jit_compile version = JIT_EXTENSION_VERSIONER.bump_version_if_changed( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\XXX\AppData\Roaming\Python\Python311\site-packages\torch\utils\_cpp_extension_versioner.py", line 46, in bump_version_if_changed hash_value = hash_source_files(hash_value, source_files) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\XXX\AppData\Roaming\Python\Python311\site-packages\torch\utils\_cpp_extension_versioner.py", line 17, in hash_source_files hash_value = update_hash(hash_value, file.read()) ^^^^^^^^^^^ UnicodeDecodeError: 'gbk' codec can't decode byte 0x96 in position 141: illegal multibyte sequence ``` The issue lies in the fact that the `open()` function in Python is platform-dependent, which can cause decoding errors when a file contains characters that are not supported by the default encoding. 
PyTorch uses file contents to generate a hash string: `60c1433041/torch/utils/_cpp_extension_versioner.py (L16-L17)` On my Windows system the default encoding is `gbk` but all of my cpp files are in `utf-8`. There is a simple solution to this problem I think: just change the file reading mode to binary mode, which can avoid issues related to file encoding. It works perfectly on my computer. ```diff - with open(filename) as file: + with open(filename, 'rb') as file: hash_value = update_hash(hash_value, file.read()) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/138565 Approved by: https://github.com/malfet, https://github.com/janeyx99 Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2024-10-28 14:06:34 +00:00
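The effect of the fix in the commit above can be illustrated with a small standalone sketch. The function name matches the one in the traceback, but the body here is a simplified stand-in that uses `hashlib.sha256`; it is an assumption for illustration, not the actual `_cpp_extension_versioner.py` implementation.

```python
import hashlib


def hash_source_files(paths):
    """Hash file contents without decoding them (simplified stand-in)."""
    digest = hashlib.sha256()
    for path in paths:
        # Binary mode reads raw bytes, so the platform's default text
        # encoding (e.g. gbk on some Windows setups) never gets involved.
        with open(path, "rb") as f:
            digest.update(f.read())
    return digest.hexdigest()
```

Because the hash only needs to detect whether the sources changed, raw bytes are sufficient; decoding to text adds a failure mode without adding information.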
PyTorch MergeBot	2487a834a4	Revert "Add sym_log2 (#137980 )" This reverts commit `5d450d7fac`. Reverted https://github.com/pytorch/pytorch/pull/137980 on behalf of https://github.com/jeanschmidt due to lint broke from this onwards on main ([comment](https://github.com/pytorch/pytorch/pull/137980#issuecomment-2441570186))	2024-10-28 13:21:08 +00:00
Edward Z. Yang	8274dadac5	Make OpaqueUnaryFn pickleable (#138395 ) Fixes https://github.com/pytorch/pytorch/issues/138070 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/138395 Approved by: https://github.com/XuehaiPan, https://github.com/bobrenjc93	2024-10-28 13:10:04 +00:00
Edward Z. Yang	5d450d7fac	Add sym_log2 (#137980 ) Internal xref: https://fb.workplace.com/groups/1075192433118967/permalink/1515595595745313/ Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/137980 Approved by: https://github.com/bobrenjc93	2024-10-28 03:09:11 +00:00
Aaron Gokaslan	49ed365b22	[BE]: Update Typeguard to TypeIs for better type inference (#133814 ) Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814 Approved by: https://github.com/ezyang	2024-10-26 15:07:13 +00:00
Irem Yuksel	b021486405	Enable Windows Arm64 (#133088 ) This PR enables Pytorch for Windows on Arm64 - CPU only. Currently, there aren't any checks in place to build and test for Windows on Arm64, but we're working to implement those as soon as possible. We recommend using [Arm Performance Libraries (APL)](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries) as a BLAS option, which is introduced in this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133088 Approved by: https://github.com/malfet Co-authored-by: cristian panaite <panaite.cristian2000@gmail.com> Co-authored-by: Stefan-Alin Pahontu <56953855+alinpahontu2912@users.noreply.github.com> Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com>	2024-10-24 16:10:44 +00:00
Laith Sakka	ed313a5ca2	Introduce torch.sym_add, variadic add (#138660 ) Tested internally here: https://www.internalfb.com/diff/D64057744 This is a reland after previous internal failures. main change is ``` if min is None and max is None: torch._check_is_size(size) return ``` Partially addresses https://github.com/pytorch/pytorch/issues/128150 When you have big sums of values, we end up computing long chains of binary addition in our FX graph representation. Not only is this ugly, it also is quadratic, as the sympy.Add constructor is O(N) in number of arguments. Instead, ensure that we maintain the summation as a single FX node so we can do the entire addition all in one go. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/138660 Approved by: https://github.com/ezyang, https://github.com/bobrenjc93	2024-10-23 17:42:41 +00:00
Colin L. Rice	bb8bc7d6b3	config: simplify most of the config handling and fix some bugs (#138377 ) This PR combines a number of cleanups in one PR. If any of the specific cleanups don't seem to make sense, let me know and I can remove them. Cleanups - This PR adds a set of test suites for the config module code, which handles basically all the APIs and ways it is used. Please let me know if you see anything critical that is not tested that I missed. This test suite is primarily used as the regression test suite for later changes in this diff. Note that there is some dynamo specific testing of the config module, but it isn't as verbose. - I removed all internal usage of shallow_copy_dict. Those usages could all use the deep copy, and did not depend on the reference behavior of certain config values that shallow_copy_dict allows. - I removed shallow copy semantics for configuration with a deprecation warning. I think this requires a release note, so hopefully I did that correctly. Let me know if we want to continue to expose shallow copy value semantics, but I just can't find a case where I expect anyone would want it. It also complicated later internal changes to the API (i.e. breaking apart various layers of the config changes). - I fixed what I believe is a bug in how hashes are calculated on configs. In particular, if you got the hash, then made a config change, and then got the hash again, it would not update the hash. @oulgen, please let me know if I'm misunderstanding this behavior and it is desired. - I switched our multiple implementations of iterating through the dictionary to a single one. This is primarily to make later changes easier, but it also makes it clear how inconsistent our various config ignoring options are. Let me know if people would be interested in me unifying the various options for ignoring config values. 
- I updated the test patcher (not the performance critical one, just the normal one), to use __setattr__ and __getattr__ to remove direct API access to the underlying config fetcher. For release notes, Not sure exactly how to communicate this, but something like "ConfigModule.to_dict, and ConfigModule.shallow_copy_dict no longer retain their shallow copy semantics, which allowed reference values objects to be modified. If you wish to modify the config object, call load_config explicitly". Pull Request resolved: https://github.com/pytorch/pytorch/pull/138377 Approved by: https://github.com/ezyang, https://github.com/jansel, https://github.com/jovianjaison	2024-10-22 13:40:26 +00:00
PyTorch MergeBot	32d4582e02	Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814 )" This reverts commit `16caa8c1b3`. Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/jeanschmidt due to checking if this will solve inductor errors ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2427565425))	2024-10-21 19:40:58 +00:00
Aaron Gokaslan	16caa8c1b3	[BE]: Update Typeguard to TypeIs for better type inference (#133814 ) Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814 Approved by: https://github.com/ezyang	2024-10-21 17:20:06 +00:00
Isuru Fernando	4f45a052ad	Fix try_solve for s1*s2 == 0 when both symbols are unknown (#137919 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137919 Approved by: https://github.com/ezyang	2024-10-20 23:33:08 +00:00
Tom Ritchford	c0582fd0f8	Remove unused Python variables in torch/[b-z]* (#136963 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136963 Approved by: https://github.com/ezyang	2024-10-19 16:45:22 +00:00
Bob Ren	38ea487338	Re-raise in _run_sympy_handler to reduce log spew (#138356 ) Fixes: https://github.com/pytorch/pytorch/issues/138069 I tested this by running `python test/inductor/test_torchinductor_dynamic_shapes.py DynamicShapesCpuTests.test_builtins_round_float_ndigits_pos_dynamic_shapes_cpu` before and after the change and verifying no more log spew. I'm uncertain whether it makes sense to add a test for this PR. Question for reviewers: is there a standard paradigm for testing these log-spew-based fixes? Happy to add a test if someone can point me in the right direction. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138356 Approved by: https://github.com/ezyang	2024-10-19 16:02:45 +00:00
PyTorch MergeBot	e8b1409dcf	Revert "[user triton] typing triton_kernel_wrap.py (#138230 )" This reverts commit `2f61b69603`. Reverted https://github.com/pytorch/pytorch/pull/138230 on behalf of https://github.com/wdvr due to Reverting this, as it started failing tests on main ([comment](https://github.com/pytorch/pytorch/pull/138230#issuecomment-2423354596))	2024-10-18 23:12:29 +00:00
David Berard	2f61b69603	[user triton] typing triton_kernel_wrap.py (#138230 ) Remove `# mypy: allow-untyped-defs` from triton_kernel_wrap.py, and fixed all the mypy errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138230 Approved by: https://github.com/oulgen, https://github.com/Skylion007	2024-10-18 19:29:31 +00:00
Jing Xu	14e6624473	Update wmic command used in collect_env.py to its counterpart in powershell due to its deprecation (#138297 ) As title. `wmic` is deprecated in Windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138297 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2024-10-18 07:03:17 +00:00
ur4t	0b168ceb6d	Collect Nvidia libraries with collect_env.py (#138076 ) Collect Nvidia libraries to diagnose issues like #133548. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138076 Approved by: https://github.com/malfet	2024-10-18 05:05:00 +00:00
Xiaodong Wang	b14c9b7250	[AMD] Hipify torchaudio_decoder (#138181 ) Summary: X-link: https://github.com/pytorch/audio/pull/3843 Continue to hipify more torchaudio targets. Test Plan: CI buck build mode/opt-amd-gpu pytorch/audio/src/... Differential Revision: D64298970 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138181 Approved by: https://github.com/houseroad	2024-10-17 23:37:37 +00:00
Adnan Akhundov	809ff3b274	Add host-side Triton TMA support to Dynamo (#137677 ) This adds Dynamo tracing support for the host-side Triton TMA API (see `create_2d_tma_descriptor` calls on the host in the [Triton tutorial](https://triton-lang.org/main/getting-started/tutorials/09-persistent-matmul.html#sphx-glr-getting-started-tutorials-09-persistent-matmul-py)). A few notes: - Here we assume the availability of the host-side TMA API added to upstream Triton in https://github.com/triton-lang/triton/pull/4498. As of the time of writing, this is not a part of the PT2 OSS Triton pin (although back-ported internally). OSS Triton pin update should be done in December 2024. - To capture the chain of calls `t.data_ptr() --> create_{1d,2d}_tma_descriptor(ptr, ...) --> kernel[grid](tma_desc, ...)`, we add three new variable trackers: `DataPtrVariable`, `CreateTMADescriptorVariable` (for the function), `TMADescriptorVariable` (for TMA descriptor object). This is to maintain the path back from the Triton kernel to the Tensor from which the TMA descriptor has been created. - The newly introduced variables have `reconstruct` methods used in case of graph breaks. - The `tma_descriptor_metadata` extracted from the captured `create_{1d,2d}_tma_descriptor` calls is propagated through the HOPs in Dynamo and AOTAutograd to be used by the downstream compiler (e.g., Inductor). See the unit tests for what the captured HOP arguments look like. - In the Dynamo-captured fx graph, we replace the TMA descriptor arguments of the Triton kernel by the underlying Tensors, to be able to track the input/output relationships in terms of Tensors. - In the Triton kernel mutation analysis pass (in AOTAutograd), we use the `tt.experimental_descriptor_store` TTIR op to detect mutations of the underlying tensors via TMA descriptors, so that downstream AOTAutograd can perform functionalizations as required. - JIT Inductor and AOT Inductor support will be implemented in follow-up PRs. 
Differential Revision: [D64404928](https://our.internmc.facebook.com/intern/diff/D64404928) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137677 Approved by: https://github.com/zou3519	2024-10-16 02:18:48 +00:00
Isuru Fernando	08ce3aac62	Cache some ValueRanges (#137438 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137438 Approved by: https://github.com/ezyang	2024-10-13 19:23:34 +00:00
xangma	fe8d66d9a6	Faster Faster BatchSampler (#137423 ) Builds upon #76951. Benchmarking code is the same as in #76950. AMD Ryzen Threadripper PRO 3995WX: ``` batch_size drop_last origin new speedup ------------ ----------- -------- ------ --------- 4 True 0.94 0.5706 64.74% 4 False 0.9745 0.9468 2.93% 8 True 0.7423 0.3715 99.82% 8 False 0.7974 0.5666 40.73% 64 True 0.5394 0.2085 158.76% 64 False 0.6083 0.2697 125.51% 640 True 0.5448 0.1985 174.41% 640 False 0.7085 0.2308 206.91% 6400 True 0.5554 0.2028 173.88% 6400 False 0.7711 0.2109 265.60% 64000 True 0.556 0.2091 165.82% 64000 False 0.7803 0.2078 275.58% ``` When `drop_last == True`, it uses `zip` to speed things up. When `drop_last == False`, it uses `itertools` to speed things up. `itertools` was the fastest way I could find that deals with the last batch if it is smaller than `batch_size`. I have a pure python method too, but it is slower when `batch_size` is 4 or 8, so I have committed the `itertools` version for now. Happy to chat further about this change :-) I understand you may not want to introduce the `itertools` package into [sampler.py](https://github.com/pytorch/pytorch/blob/main/torch/utils/data/sampler.py). Pull Request resolved: https://github.com/pytorch/pytorch/pull/137423 Approved by: https://github.com/Skylion007	2024-10-13 09:36:03 +00:00
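The two batching strategies mentioned in the commit above can be sketched as standalone generators. These are illustrative functions with hypothetical names, not the `BatchSampler` code itself; in particular, the commit only says `itertools` is used for the `drop_last == False` case, and `itertools.islice` is an assumption here.

```python
import itertools


def batches_drop_last(sampler, batch_size):
    """Yield full batches only; a trailing partial batch is discarded."""
    it = iter(sampler)
    # zip over batch_size references to the SAME iterator: each output
    # tuple pulls batch_size consecutive items, and zip stops as soon as
    # the iterator cannot fill a whole tuple.
    for batch in zip(*[it] * batch_size):
        yield list(batch)


def batches_keep_last(sampler, batch_size):
    """Yield all batches, including a final one smaller than batch_size."""
    it = iter(sampler)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch
```

For example, with `range(10)` and a batch size of 4, the first generator yields two batches of four indices, while the second additionally yields the remaining two indices as a short final batch.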
Xiaodong Wang	eea1f79a1d	[AMD] use rccl.h instead of rccl/rccl.h (#135472 ) Summary: We hipify NCCLUtils.h from nccl.h to rccl/rccl.h. This follows the format of the rocm rpm suite (the header is in include/rccl/rccl.h), however the source code is just src/rccl.h. Using the rccl/rccl.h will make us find the rpm's header but not the src code's header. Test Plan: buck run mode/opt-amd-gpu -c hpc_comms.use_rccl=develop -c fbcode.split-dwarf=True --config rccl.build_rdma_core=true --config rccl.adhoc_brcm=true //aps_models/ads/icvr:icvr_launcher -- mode=local_ctr_cvr_cmf_rep_1000x_v1_no_atom data_loader.dataset.table_ds=[2024-09-04] data_loader.dataset.batch_size=512 max_ind_range=10 w/o this diff, it'll show 2.18 nccl version Differential Revision: D62371434 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135472 Approved by: https://github.com/jeffdaily, https://github.com/cenzhaometa	2024-10-10 08:55:57 +00:00
Edward Z. Yang	d9f4a7d3f9	Simplify find_localzeros (#133325 ) Instead of doing an N^2 connected thing, only do simplifications for binary max/min, and for very simple situations. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Differential Revision: [D64135230](https://our.internmc.facebook.com/intern/diff/D64135230) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133325 Approved by: https://github.com/albanD	2024-10-10 00:52:50 +00:00
PyTorch MergeBot	16a2c2cfd4	Revert "Introduce torch.sym_sum (#136429 )" This reverts commit `90bed32b98`. Reverted https://github.com/pytorch/pytorch/pull/136429 on behalf of https://github.com/ezyang due to fails internal stuff ([comment](https://github.com/pytorch/pytorch/pull/136429#issuecomment-2403335147))	2024-10-09 20:08:01 +00:00
Bob Ren	36133f39db	Tensorify compute on Python scalars (#136674 ) Signed-off-by: Bob Ren <bobrenfb.com> Commandeered from https://github.com/pytorch/pytorch/pull/130228 as I'm helping @ezyang w/ shipping dynamic float arguments in PT2. This starts with supporting torch.ops.aten.mul. I'll stack on top support for other operators in subsequent PRs to keep this scoped to the mechanics of the fx pass. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136674 Approved by: https://github.com/ezyang	2024-10-09 18:51:41 +00:00
Edward Z. Yang	1aac1ffce1	Don't generate implicit value ranges for missing symbols. (#136667 ) Instead, call back to a missing handler when needed. This greatly speeds things up when the value ranges dict is large. The missing handler is needed because nested ints don't have VRs, but symbolic sizes involving them occasionally show up in compute. ``` TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="s11" TORCH_LOGS=dynamic PYTORCH_TEST_WITH_DYNAMO=1 python test/test_nestedtensor.py TestNestedTensorAutogradCPU.test_dropout_backward_jagged_cpu ``` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/136667 Approved by: https://github.com/isuruf ghstack dependencies: #136429	2024-10-08 18:12:57 +00:00
Edward Z. Yang	90bed32b98	Introduce torch.sym_sum (#136429 ) Partially addresses https://github.com/pytorch/pytorch/issues/128150 When you have big sums of values, we end up computing long chains of binary addition in our FX graph representation. Not only is this ugly, it also is quadratic, as the sympy.Add constructor is O(N) in number of arguments. Instead, ensure that we maintain the summation as a single FX node so we can do the entire addition all in one go. update_hint_regression benchmark, before and after: ``` update_hint_regression,compile_time_instruction_count,2648328980 update_hint_regression,compile_time_instruction_count,2563748678 ``` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/136429 Approved by: https://github.com/isuruf	2024-10-08 18:12:57 +00:00
PyTorch MergeBot	7e8dace0de	Revert "[ROCm] remove caffe2 from hipify (#137157 )" This reverts commit `40d8260745`. Reverted https://github.com/pytorch/pytorch/pull/137157 on behalf of https://github.com/xw285cornell due to this is breaking internal where we still use caffe2 ([comment](https://github.com/pytorch/pytorch/pull/137157#issuecomment-2400466131))	2024-10-08 17:45:45 +00:00
Jeff Daily	40d8260745	[ROCm] remove caffe2 from hipify (#137157 ) - Remove all "MasqueradingAsCUDA" files and classes. - Do not rename "CUDA" classes to "HIP". Pull Request resolved: https://github.com/pytorch/pytorch/pull/137157 Approved by: https://github.com/eqy	2024-10-05 12:48:54 +00:00
Michal Gallus	79562f3af8	[ROCm] Modify hipify script to work with Windows paths (#135360 ) This change modifies the `hipify_python.py` script to properly detect all directories, `include` and `ignore` paths during the hipification process on Windows, by changing the path syntax convention to a UNIX-like one. Since in many places the script assumes a UNIX-like convention by using paths with forward slashes `/`, I decided to accommodate it by converting Windows paths to UNIX-like ones. By doing so, the number of changes to the file is limited. Moreover, this early-on unification allows the rest of the code to have a battle-tested Linux-like behaviour. Another option would be to use the `Path` object from `pathlib` to represent all paths in the script; however, it would impact a broader share of the code and would hence require a more meticulous evaluation in terms of non-altered logic and edge cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135360 Approved by: https://github.com/jeffdaily, https://github.com/jithunnair-amd	2024-10-04 23:43:43 +00:00
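The early path normalization described in the commit above might look roughly like this. Both helper names are hypothetical and the prefix-matching logic is a simplified assumption; the real `hipify_python.py` matching rules are not shown in the commit message.

```python
def to_unix_style(path: str) -> str:
    """Convert backslash separators to forward slashes (hypothetical helper)."""
    return path.replace("\\", "/")


def is_ignored(path: str, ignore_prefixes) -> bool:
    """Check a path against forward-slash ignore prefixes (simplified)."""
    # Normalize once, up front, so the comparison below can keep
    # assuming the UNIX-like convention on every platform.
    normalized = to_unix_style(path)
    return any(normalized.startswith(prefix) for prefix in ignore_prefixes)
```

Normalizing at the boundary keeps the change small, which is the trade-off the commit message names when it sets aside a wider `pathlib` rewrite.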
Jeff Daily	c7b0d4b148	raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING (#131114 ) raw_alloc is used by cudnn, miopen, thrust, and tunableop. Without this PR, the env var for disabling the caching allocator will only partially work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131114 Approved by: https://github.com/eqy, https://github.com/houseroad, https://github.com/albanD Co-authored-by: Nichols A. Romero <nick.romero@amd.com>	2024-10-04 15:36:29 +00:00
PyTorch MergeBot	0d1701f310	Revert "raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING (#131114 )" This reverts commit `7001907480`. Reverted https://github.com/pytorch/pytorch/pull/131114 on behalf of https://github.com/PaliC due to failing internal builds ([comment](https://github.com/pytorch/pytorch/pull/131114#issuecomment-2390615007))	2024-10-03 06:22:55 +00:00
Jeff Daily	7001907480	raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING (#131114 ) raw_alloc is used by cudnn, miopen, thrust, and tunableop. Without this PR, the env var for disabling the caching allocator will only partially work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131114 Approved by: https://github.com/eqy, https://github.com/houseroad, https://github.com/albanD Co-authored-by: Nichols A. Romero <nick.romero@amd.com>	2024-10-02 16:27:15 +00:00
PyTorch MergeBot	7303716005	Revert "Simplify find_localzeros (#133325 )" This reverts commit `99f90c379e`. Reverted https://github.com/pytorch/pytorch/pull/133325 on behalf of https://github.com/ezyang due to https://fb.workplace.com/groups/gpuinference/permalink/2921405651341417/ ([comment](https://github.com/pytorch/pytorch/pull/133325#issuecomment-2385832600))	2024-10-01 13:25:03 +00:00
Edward Z. Yang	cc8f1cddd4	Turn on type-checking in torch.fx.experimental.symbolic_shapes (#136972 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/136972 Approved by: https://github.com/Skylion007 ghstack dependencies: #136934, #136935	2024-10-01 13:22:10 +00:00
PyTorch MergeBot	8982906502	Revert "Turn on type-checking in torch.fx.experimental.symbolic_shapes (#136972 )" This reverts commit `3ff2d93d9f`. Reverted https://github.com/pytorch/pytorch/pull/136972 on behalf of https://github.com/ezyang due to need to back out for merge conflict ([comment](https://github.com/pytorch/pytorch/pull/136972#issuecomment-2384182244))	2024-09-30 21:35:08 +00:00
Jez Ng	71aac59e93	Add Triton CPU as an Inductor backend (#133408 ) The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend. Differential Revision: [D63298968](https://our.internmc.facebook.com/intern/diff/D63298968) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133408 Approved by: https://github.com/jansel, https://github.com/blaine-rister, https://github.com/malfet	2024-09-30 20:24:52 +00:00
Edward Z. Yang	3ff2d93d9f	Turn on type-checking in torch.fx.experimental.symbolic_shapes (#136972 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/136972 Approved by: https://github.com/Skylion007 ghstack dependencies: #136917, #136934, #136935	2024-09-30 18:04:36 +00:00
Edward Z. Yang	99f90c379e	Simplify find_localzeros (#133325 ) Instead of doing an N^2 connected thing, only do simplifications for binary max/min, and for very simple situations. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/133325 Approved by: https://github.com/albanD	2024-09-28 02:38:31 +00:00
albanD	e4571e7025	Add abi flags to cpp_extension cache folder (#136890 ) This is to avoid cache confusion between normal vs pydebug vs nogil builds in cpp extensions which can lead to catastrophic ABI issues. This is rare today for people to run both normal and pydebug on the same machine, but we expect quite a few people will run normal and nogil on the same machine going forward. This is tested locally by running each version alternatively. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136890 Approved by: https://github.com/colesbury	2024-09-28 00:49:56 +00:00
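The idea in #136890 of keying the extension cache on the interpreter's ABI flags can be sketched as below. This is an illustrative assumption about the shape of such a key, not the actual folder naming used by `torch.utils.cpp_extension`; `cache_subdir` is a hypothetical helper.

```python
import sys
from typing import Tuple


def cache_subdir(version: Tuple[int, int], abiflags: str) -> str:
    """Hypothetical helper: build a cache folder name that differs per ABI."""
    major, minor = version
    return f"py{major}{minor}{abiflags}"


# sys.abiflags is not defined on every platform, so fall back to "".
current = cache_subdir(sys.version_info[:2], getattr(sys, "abiflags", ""))

# Assumed example values: "" for a normal build, "t" for free-threaded,
# "d" for a pydebug build.
normal = cache_subdir((3, 13), "")
nogil = cache_subdir((3, 13), "t")
pydebug = cache_subdir((3, 13), "d")
```

Because the three builds map to three different folders, an extension compiled under one ABI is never loaded by an interpreter with another.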
PyTorch MergeBot	36428f91e9	Revert "Add Triton CPU as an Inductor backend (#133408 )" This reverts commit `31c0467594`. Reverted https://github.com/pytorch/pytorch/pull/133408 on behalf of https://github.com/int3 due to internal tests failing ([comment](https://github.com/pytorch/pytorch/pull/133408#issuecomment-2379692517))	2024-09-27 16:54:27 +00:00
Jez Ng	31c0467594	Add Triton CPU as an Inductor backend (#133408 ) The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend. Differential Revision: [D63298968](https://our.internmc.facebook.com/intern/diff/D63298968) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133408 Approved by: https://github.com/jansel, https://github.com/blaine-rister, https://github.com/malfet	2024-09-26 15:35:26 +00:00
Ramana Sundararaman	be4b7e8131	Param fixes in docstring (#136097 ) Fixes wrong param names in docstrings. cc: @kit1980 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136097 Approved by: https://github.com/ezyang	2024-09-21 18:56:34 +00:00
Bob Ren	7f9c06462f	fix mypi in utils/_sympy/functions.py (#136339 ) Signed-off-by: Bob Ren <bobren@fb.com> It turns out that older versions of Python, in particular 3.8, show errors that 3.12 doesn't. For posterity these are the steps I took to reproduce: ``` conda create -n py38 python=3.8 conda activate py38 pip install -r requirements.txt lintrunner init dmypy restart && lintrunner --all-files --take MYPY ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/136339 Approved by: https://github.com/Skylion007 ghstack dependencies: #136205	2024-09-20 18:39:16 +00:00
Bob Ren	8d9c42735a	Type _sympy/functions.py [1/n] (#136205 ) Signed-off-by: Bob Ren <bobren@fb.com> I was chatting with @jamesjwu about strategies to learn the code and he suggested adding types to some files. This stack of PRs adds types to _sympy/functions.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/136205 Approved by: https://github.com/Skylion007, https://github.com/jamesjwu	2024-09-19 17:15:53 +00:00
Igor Sugak	bce52d0b60	[CODEMOD][caffe2] use npt.NDArray instead of np.ndarray in type annotations (#136288 ) Summary: To facilitate PSS-2 upgrade, this uses `npt.NDArray` instead of `np.ndarray` in type annotations. In Numpy-1.19 (PSS-1) it's an alias to `np.ndarray` -- a noop. In Numpy-1.24, `npt.NDArray` is a proper generic type, and without this change uses of `np.ndarray` generate this Pyre type error: ```counterexample Invalid type parameters [24]: Generic type `np.ndarray` expects 2 type parameters. ``` Test Plan: Sandcastle plus visual inspection Differential Revision: D62977370 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136288 Approved by: https://github.com/kit1980	2024-09-19 12:40:36 +00:00
Aaron Gokaslan	31715be72a	[BE]: Update mypy to 1.11.2 (#133816 ) Updates mypy to 1.11.1 to improve type inference Pull Request resolved: https://github.com/pytorch/pytorch/pull/133816 Approved by: https://github.com/ezyang	2024-09-16 19:44:11 +00:00
PyTorch MergeBot	d0cebedb31	Revert "Add Triton CPU as an Inductor backend (#133408 )" This reverts commit `e498b02b47`. Reverted https://github.com/pytorch/pytorch/pull/133408 on behalf of https://github.com/jeanschmidt due to Broke internal signals, see D62737208 for more details ([comment](https://github.com/pytorch/pytorch/pull/133408#issuecomment-2353623816))	2024-09-16 18:33:33 +00:00
PyTorch MergeBot	3117f2cf67	Revert "[BE]: Update mypy to 1.11.2 (#133816 )" This reverts commit `55299cfc22`. Reverted https://github.com/pytorch/pytorch/pull/133816 on behalf of https://github.com/jeanschmidt due to seems to have broken https://github.com/pytorch/pytorch/actions/runs/10865710499/job/30155699792 on main ([comment](https://github.com/pytorch/pytorch/pull/133816#issuecomment-2352377684))	2024-09-16 09:11:16 +00:00
Bob Ren	a5eb43d8b4	Add TensorReferenceAnalysis and some tests (#135886 ) Split out and modified from https://github.com/pytorch/pytorch/pull/130228. There were a bunch of subtle bugs eg. sometimes we need to use torch.ops.aten.{operator}.Tensor vs other times using torch.ops.aten.{operator}.default. Or in the case of pow we need to use Tensor_Tensor. I figured it'd be easier to split out adding TensorReferenceAnalysis and add some tests and do the actual integration in a separate diff. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135886 Approved by: https://github.com/ezyang	2024-09-14 23:09:40 +00:00
Jez Ng	e498b02b47	Add Triton CPU as an Inductor backend (#133408 ) The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133408 Approved by: https://github.com/jansel	2024-09-14 21:45:19 +00:00
Aaron Gokaslan	55299cfc22	[BE]: Update mypy to 1.11.2 (#133816 ) Updates mypy to 1.11.1 to improve type inference Pull Request resolved: https://github.com/pytorch/pytorch/pull/133816 Approved by: https://github.com/ezyang	2024-09-14 21:40:36 +00:00
Isuru Fernando	8c738c9270	Improve performance of sympy_generic_le (#135622 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135622 Approved by: https://github.com/ezyang ghstack dependencies: #135621	2024-09-11 16:20:03 +00:00
xinan.lin	67735d1ee8	[Inductor] Generalize `is_cuda` to specific device_type to make cpp_wrapper mode be extensible (#134693 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134693 Approved by: https://github.com/ezyang, https://github.com/EikanWang, https://github.com/jansel	2024-09-10 10:11:13 +00:00
Michael Lazos	041960a1ce	[Dynamo] Automatically in-graph traceable tensor subclass ctors (#135151 ) Fixes https://github.com/pytorch/pytorch/issues/114389 Previously, dynamo would attempt to trace through the `__init__` of traceable tensor subclasses. Since their constructors are AOT dispatcher traceable by definition, dynamo should automatically put these in the graph like we do for any other tensors. Not doing this is difficult because dynamo would need to apply mutations post tensor subclass creation in the graph. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135151 Approved by: https://github.com/bdhirsh	2024-09-06 12:23:38 +00:00
Avik Chaudhuri	8bfd4916d6	fast path for sympy gcd in floordiv (#134880 ) Summary: Re-implementation of https://github.com/pytorch/pytorch/pull/134150, which was reverted because of some internal tests hanging (case B). The original motivation was to get some other internal test unstuck (case A). The root cause is that sympy.gcd is very clever but can also blow up in some cases. This PR introduces a fast path with an appropriate fallback to sympy.gcd that ensures that both cases A and B go through. Test Plan: See the included test for specific examples. Also https://fb.workplace.com/groups/1075192433118967/posts/1491493248155548/?comment_id=1491938994777640&reply_comment_id=1492622821375924 Differential Revision: D62043315 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134880 Approved by: https://github.com/ezyang	2024-09-04 14:56:49 +00:00
Edward Z. Yang	6c5669903f	Fix Invalid NaN comparison due to infinity-zero multiply on latest sympy (#135044 ) Fixes https://github.com/pytorch/pytorch/issues/133735 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/135044 Approved by: https://github.com/zou3519	2024-09-04 14:13:09 +00:00
Pian Pawakapan	0c7856973b	[export] enumerate unsupported sympy.Functions (#134271 ) (#134598 ) Summary: There's 2 concepts of unsupported sympy.Functions in symbolic_shapes: 1) unsupported by the export solver, meaning the solver doesn't know how to provide useful fixes for those functions 2) unsupported by the sympy interpreter - meaning we can't reify them into FX nodes because the functions aren't present in PythonReferenceAnalysis This splits the current call into a call for each version, with the Export solver the only user of 1). For 1), we enumerate the functions in _sympy/functions.py, and subtract the functions we know we can support. For 2) there's only 3 functions we've seen pop up in test cases. cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 Differential Revision: D61863394 Pulled By: pianpwk Pull Request resolved: https://github.com/pytorch/pytorch/pull/134598 Approved by: https://github.com/angelayi	2024-08-28 00:34:38 +00:00
PyTorch MergeBot	5b392d22c6	Revert "fix stuck floordiv (#134150 )" This reverts commit `92c4771853`. Reverted https://github.com/pytorch/pytorch/pull/134150 on behalf of https://github.com/anijain2305 due to compile time regression internal ([comment](https://github.com/pytorch/pytorch/pull/134150#issuecomment-2313230404))	2024-08-27 18:23:44 +00:00
PyTorch MergeBot	141a9c7204	Revert "[export] enumerate unsupported sympy.Functions (#134271 )" This reverts commit `ddd71e3479`. Reverted https://github.com/pytorch/pytorch/pull/134271 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/134271#issuecomment-2311353460))	2024-08-27 00:45:00 +00:00
Pian Pawakapan	ddd71e3479	[export] enumerate unsupported sympy.Functions (#134271 ) There's 2 concepts of unsupported sympy.Functions in symbolic_shapes: 1) unsupported by the export solver, meaning the solver doesn't know how to provide useful fixes for those functions 2) unsupported by the sympy interpreter - meaning we can't reify them into FX nodes because the functions aren't present in PythonReferenceAnalysis This splits the current call into a call for each version, with the Export solver the only user of 1). For 1), we enumerate the functions in _sympy/functions.py, and subtract the functions we know we can support. For 2) there's only 3 functions we've seen pop up in test cases. Differential Revision: D61677956 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134271 Approved by: https://github.com/avikchaudhuri	2024-08-26 22:44:12 +00:00
soulitzer	a23dae22d5	Update AC pass use_reentrant message (#134472 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134472 Approved by: https://github.com/albanD	2024-08-26 21:57:38 +00:00
albanD	2588b5e51a	Move module_tracker to logging for confused hierarchy (#134467 ) Fixes https://github.com/pytorch/pytorch/issues/134242 Make sure to never raise an error when confused. Logs for confusion can be enabled with `TORCH_LOGS="torch.utils.module_tracker"` or the usual python systems. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134467 Approved by: https://github.com/malfet	2024-08-26 19:39:08 +00:00
Avik Chaudhuri	92c4771853	fix stuck floordiv (#134150 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/134133 Test Plan: Tested on the small repro in the linked issue with different lengths N (replacing 100), recording N vs. time taken in nanoseconds: 10 127268319 20 220839662 30 325463125 40 429259441 50 553136055 60 670799769 70 999170514 80 899014103 90 997168902 100 1168202035 110 1388556619 120 1457488235 130 1609816470 140 2177889877 150 1917560313 160 2121096113 170 2428502334 180 4117450755 190 4003068224 So N ~ 200 takes ~5s. Previously even smaller N would go for >1 min. Didn't add a perf test because ezyang is planning to build a benchmark. Also tested on https://www.internalfb.com/diff/D61560171, which now gets past the stuck point. Differential Revision: D61619660 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134150 Approved by: https://github.com/ezyang	2024-08-26 07:27:59 +00:00
Edward Z. Yang	326db8af4c	Replace sympy Min/Max with reimplementations (#133319 ) Sympy's implementation of Min/Max displays asymptotically bad behavior on `TORCH_COMPILE_CPROFILE=1 python torchrec/distributed/tests/test_pt2_multiprocess.py TestPt2Train.test_compile_multiprocess`. Evidence profile: ![image](https://github.com/user-attachments/assets/142301e9-3a18-4370-b9db-19b32ece7ee8) On this test case, we spend 42% of all time compiling the network on ShapeEnv.replace, which in turn spends all of its time in xreplace. The problem appears to be the find_localzeros call. By vendoring the implementations of Min/Max, we can potentially reduce the cost of this operation. The implementation is copy-pasted from sympy/functions/elementary/miscellaneous.py but with some adjustments: * I deleted logic related to differentiation, evalf and heaviside, as it's not relevant to PyTorch reasoning * There's some massaging to appease PyTorch's linters, including a lot of noqa and type: ignore (which I could potentially refactor away with substantive changes, but that's better as its own change) * I deleted the second loop iteration for is_connected, as an attempt at initial optimization (this also simplifies the port, since I can omit some code). I'll comment at that point what the exact difference is. Before this change, the test in question takes 100s with 40 features; after this change, it takes only 69s. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/133319 Approved by: https://github.com/Skylion007	2024-08-25 05:05:59 +00:00
Aaron Orenstein	d95aedf5fd	[BE] typing for decorators - fx/_compatibility (part 1) (#134202 ) Part of #134054. This corresponds to the pytorch mypy changes from D61493706. Updating takes so long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change. So landing these 'type: ignore' for pytorch in advance of them actually being needed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202 Approved by: https://github.com/Skylion007	2024-08-22 17:07:33 +00:00
PyTorch MergeBot	2db28a9611	Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814 )" This reverts commit `bce0caba78`. Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/ezyang due to root cause of internal failures not addressed ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2302466444))	2024-08-21 16:13:34 +00:00
blazej-smorawski	585c049fa3	Fix `Extension` attribute name in `CppExtension` example (#134046 ) Hi! It seems there's a typo in `CppExtension` example. I think it should say `extra_link_args` instead of `extra_link_flags`. Not that I spent a few hours debugging missing kernels inside a library's fatbin or anything :D. Please see `Extension` definition inside setuptools: `ebddeb36f7/setuptools/_distutils/extension.py (L62)` Thanks! Błażej Pull Request resolved: https://github.com/pytorch/pytorch/pull/134046 Approved by: https://github.com/soulitzer	2024-08-21 13:58:16 +00:00
Aaron Gokaslan	afaa5fcecb	[BE][Ez]: FURB142,FURB92 misc preview fixes (#133880 ) Fixes some miscellaneous code quality issues with some refurb rules that have not been enabled yet. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133880 Approved by: https://github.com/soulitzer, https://github.com/malfet	2024-08-21 13:54:51 +00:00
Aaron Gokaslan	bce0caba78	[BE]: Update Typeguard to TypeIs for better type inference (#133814 ) Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814 Approved by: https://github.com/ezyang	2024-08-20 17:19:57 +00:00
cyy	c3d02fa390	[Reland2] Update NVTX to NVTX3 (#109843 ) Another attempt to update NVTX to NVTX3. We now avoid changing NVTX header inclusion of existing code. The advantage of NVTX3 over NVTX is that it is a header-only library, so linking with NVTX3 can greatly simplify our CMake and other build scripts for finding libraries in user environments. In addition, NVTX is indeed still present in the latest CUDA versions, but it is no longer a compiled library: it is now a header-only library. That's why there isn't a .lib file anymore. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109843 Approved by: https://github.com/peterbell10, https://github.com/eqy Co-authored-by: Ivan Zaitsev <108101595+izaitsevfb@users.noreply.github.com>	2024-08-20 16:33:26 +00:00
Aaron Orenstein	187d55018a	[BE] Fix MYPY issues (#133872 ) Fix some mypy issues that have crept in to the trunk. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133872 Approved by: https://github.com/oulgen, https://github.com/Skylion007	2024-08-20 16:12:04 +00:00
PyTorch MergeBot	42097f0ec1	Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814 )" This reverts commit `cf60fe53a8`. Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/jeanschmidt due to Broke 12k internal signals/jobs, @ezyang please help get those changes merged. More details check D61488368 ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2298210309))	2024-08-20 08:02:49 +00:00
Michael Lazos	f147349568	Fix DeviceContext bug (#133729 ) Fixes https://github.com/pytorch/pytorch/issues/133666 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133729 Approved by: https://github.com/bdhirsh ghstack dependencies: #133130	2024-08-20 07:14:37 +00:00
Ahmad Sarvmeily	9a998d98f1	Fix edge case in inductor triton clean script (#130837 ) The regex in the script is too restrictive, as it excludes examples with parentheses in args, like the following: ``` triton_poi_fused_add_0.run(arg0_1.item(), arg1_1.item(), buf0, 1, grid=grid(1), stream=streamNone) ^ ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/130837 Approved by: https://github.com/Chillee	2024-08-19 23:46:11 +00:00
Aaron Gokaslan	cf60fe53a8	[BE]: Update Typeguard to TypeIs for better type inference (#133814 ) Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814 Approved by: https://github.com/ezyang	2024-08-18 19:10:16 +00:00
Oguz Ulgen	30fbf5b19c	Remove AMD restrictions on triton hashing (#133616 ) Summary: When we added these functions, AMD's triton checkout was very old, it appears to have caught up. Remove restrictions. Test Plan: unit tests Differential Revision: D61351473 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133616 Approved by: https://github.com/mxz297, https://github.com/nmacchioni, https://github.com/eellison	2024-08-16 08:02:48 +00:00
Josh Fromm	f347174d61	Hipify Pytorch3D (#133343 ) Summary: X-link: https://github.com/fairinternal/pytorch3d/pull/45 X-link: https://github.com/facebookresearch/pytorch3d/pull/1851 Very minor change to extend hipification to a missing hipcub constant. This is needed to hipify some of the kernels in pytorch3d. Differential Revision: D61171993 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133343 Approved by: https://github.com/houseroad	2024-08-15 23:39:07 +00:00
Xuehai Pan	758a0a88a2	[BE][Easy] enable `ruff` rule `PIE790`: unnecessary `pass` statement (#133200 ) This PR removes unnecessary `pass` statements. This is semantically safe because the bytecode for the Python code does not change. Note that if there is a docstring in the function, an empty function does not need a `pass` statement as a placeholder. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133200 Approved by: https://github.com/malfet, https://github.com/eqy, https://github.com/kit1980	2024-08-15 15:50:19 +00:00
Nicolas Macchioni	cf81180007	allow `SubConfigProxy` of arbitrary depth (#133418 ) Before, having arbitrary depth nested configs like ``` class Foo: foo: List[int] = [1, 2, 3] class Bar: bar: str = "1" class Baz: baz: int = 1 ``` would cause problems beyond the first layer. For example, if we tried ``` from torch._inductor import config as inductor_config print(inductor_config.Foo) print(repr(inductor_config.Foo.foo)) print(inductor_config.Foo.Bar) print(repr(inductor_config.Foo.Bar.bar)) print(inductor_config.Foo.Bar.Baz) print(repr(inductor_config.Foo.Bar.Baz.baz)) ``` we would get some output like ``` <torch.utils._config_module.SubConfigProxy object at 0x7fac65de00a0> [1, 2, 3] ... AttributeError: torch._inductor.config.Foo.Bar does not exist ``` Obviously, this is not what we want. With these changes, we get the right values ``` <torch.utils._config_module.SubConfigProxy object at 0x7f840d05bf40> [1, 2, 3] <torch.utils._config_module.SubConfigProxy object at 0x7f840cedc940> '1' <torch.utils._config_module.SubConfigProxy object at 0x7f840cedc100> 1 ``` Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/133418 Approved by: https://github.com/oulgen	2024-08-14 18:43:00 +00:00
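One way to get the arbitrary-depth behaviour described in #133418 is to let a proxy return another proxy whenever the requested name is only a prefix of deeper keys. The sketch below assumes a flat dict keyed by dotted names; `NestedProxy` and `flat_config` are hypothetical and are not the actual `torch.utils._config_module` implementation.

```python
from typing import Any, Dict


class NestedProxy:
    """Hypothetical proxy over a flat dict keyed by dotted names."""

    def __init__(self, config: Dict[str, Any], prefix: str) -> None:
        self._config = config
        self._prefix = prefix

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        key = self._prefix + name
        if key in self._config:
            return self._config[key]
        nested_prefix = key + "."
        # The name is a sub-config if any stored key lives underneath it.
        if any(k.startswith(nested_prefix) for k in self._config):
            return NestedProxy(self._config, nested_prefix)
        raise AttributeError(f"{key} does not exist")


# Flattened version of the Foo / Bar / Baz example from the commit message.
flat_config = {"Foo.foo": [1, 2, 3], "Foo.Bar.bar": "1", "Foo.Bar.Baz.baz": 1}
root = NestedProxy(flat_config, "")
deep_value = root.Foo.Bar.Baz.baz
```

Because each level carries its own full prefix, lookups keep working at any depth rather than only at the first layer.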
Aaron Enye Shi	dadb20a9d6	[Memory Snapshot][Viz] Add Allocator Settings Tab (#132518 ) Summary: Since we have been storing the allocator settings in the snapshot files for a while now (since https://github.com/pytorch/pytorch/pull/119404), we can expose this to users with a new tab in the visualizer. Test Plan: Ran it locally: ![image](https://github.com/user-attachments/assets/5f79ccd0-fe1c-4e42-bb58-106d8f3cccd6) Differential Revision: D60673548 Pulled By: aaronenyeshi Pull Request resolved: https://github.com/pytorch/pytorch/pull/132518 Approved by: https://github.com/tianfengfrank, https://github.com/zdevito	2024-08-13 17:35:12 +00:00
Aaron Enye Shi	3128640c31	[Memory Snapshot][Viz] Show event timestamps if collected (#132523 ) Summary: Since we've been capturing timestamps for a while (since https://github.com/pytorch/pytorch/pull/112266), we can surface this into the UI. This can be useful to correlate with timing of other events. Test Plan: Ran it locally. ![image](https://github.com/user-attachments/assets/8b3922e8-1ae2-4b09-aa13-20b2b8237064) Differential Revision: D60673800 Pulled By: aaronenyeshi Pull Request resolved: https://github.com/pytorch/pytorch/pull/132523 Approved by: https://github.com/tianfengfrank, https://github.com/zdevito	2024-08-12 16:12:04 +00:00
PyTorch MergeBot	e9eb8795bb	Revert "[Memory Snapshot][Viz] Show event timestamps if collected (#132523 )" This reverts commit `27c44c884e`. Reverted https://github.com/pytorch/pytorch/pull/132523 on behalf of https://github.com/clee2000 due to broke some tests on mac ex export/test_retraceability.py::RetraceExportTestExport::test_disable_forced_specializations_ok_retraceability [GH job link](https://github.com/pytorch/pytorch/actions/runs/10344621336/job/28630686528) [HUD commit link](`27c44c884e`) Possibly a landrace since I see that some of the failing tests ran on the PR ([comment](https://github.com/pytorch/pytorch/pull/132523#issuecomment-2284312426))	2024-08-12 15:42:07 +00:00
Aaron Enye Shi	27c44c884e	[Memory Snapshot][Viz] Show event timestamps if collected (#132523 ) Summary: Since we've been capturing timestamps for a while (since https://github.com/pytorch/pytorch/pull/112266), we can surface this into the UI. This can be useful to correlate with timing of other events. Test Plan: Ran it locally. ![image](https://github.com/user-attachments/assets/8b3922e8-1ae2-4b09-aa13-20b2b8237064) Differential Revision: D60673800 Pulled By: aaronenyeshi Pull Request resolved: https://github.com/pytorch/pytorch/pull/132523 Approved by: https://github.com/tianfengfrank, https://github.com/zdevito	2024-08-12 01:48:23 +00:00
PyTorch MergeBot	7f08b73980	Revert "[Memory Snapshot][Viz] Show event timestamps if collected (#132523 )" This reverts commit `456909e5d3`. Reverted https://github.com/pytorch/pytorch/pull/132523 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/132523#issuecomment-2282925079))	2024-08-11 23:33:37 +00:00
Aaron Enye Shi	456909e5d3	[Memory Snapshot][Viz] Show event timestamps if collected (#132523 ) Summary: Since we've been capturing timestamps for a while (since https://github.com/pytorch/pytorch/pull/112266), we can surface this into the UI. This can be useful to correlate with timing of other events. Test Plan: Ran it locally. ![image](https://github.com/user-attachments/assets/8b3922e8-1ae2-4b09-aa13-20b2b8237064) Differential Revision: D60673800 Pulled By: aaronenyeshi Pull Request resolved: https://github.com/pytorch/pytorch/pull/132523 Approved by: https://github.com/tianfengfrank, https://github.com/zdevito	2024-08-11 23:27:48 +00:00
Syed Tousif Ahmed	42cd397a0e	Loads .pyd instead of .so in MemPool test for windows (#132749 ) Fixes #132650 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132749 Approved by: https://github.com/albanD	2024-08-08 14:29:56 +00:00
PyTorch MergeBot	123d9ec5bf	Revert "Loads .pyd instead of .so in MemPool test for windows (#132749 )" This reverts commit `37ab0f3385`. Reverted https://github.com/pytorch/pytorch/pull/132749 on behalf of https://github.com/syed-ahmed due to Seems like periodic is still failing: `7c79e89bc5` ([comment](https://github.com/pytorch/pytorch/pull/132749#issuecomment-2274041302))	2024-08-07 18:08:44 +00:00
Syed Tousif Ahmed	37ab0f3385	Loads .pyd instead of .so in MemPool test for windows (#132749 ) Fixes #132650 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132749 Approved by: https://github.com/albanD	2024-08-07 09:58:52 +00:00
Li Yu (ads)	94155ce31b	[Torch] Support meta device in checkpoint (#132684 ) Summary: ## Why utils.checkpoint doesn't support meta device: ``` File "/Users/lyu1/torchdev/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 490, in checkpoint next(gen) File "/Users/lyu1/torchdev/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 1359, in _checkpoint_without_reentrant_generator device_module = _get_device_module(device) File "/Users/lyu1/torchdev/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 98, in _get_device_module device_module = getattr(torch, device) File "/Users/lyu1/torchdev/lib/python3.9/site-packages/torch/__init__.py", line 1938, in __getattr__ raise AttributeError(f"module '{__name__}' has no attribute '{name}'") AttributeError: module 'torch' has no attribute 'meta' ``` This blocks us from running a model with checkpoint enabled in meta mode. ## What This diff handles the case of meta device in checkpoint.py. (In checkpoint.py, the device module is mainly used when preserve_rng_state=true, which doesn't apply to the meta case. So a more elegant fix might be to set preserve_rng_state=false when detecting that args are on the meta device. But I didn't find where to do this check in a minimal way. Let me know if you have ideas.) Test Plan: Tested with a toy model which has checkpoint on its module: P1513716944 Differential Revision: D60749427 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132684 Approved by: https://github.com/kit1980	2024-08-06 20:45:50 +00:00
Edward Z. Yang	345bea01dc	Refactor thunkify to return proper thunk abstraction (#132407 ) This is superior to lru_cache because (1) it's more explicit and (2) it doesn't leak the original function after it's been forced. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/132407 Approved by: https://github.com/albanD	2024-08-06 02:35:45 +00:00
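The two properties claimed in #132407 (explicitness, and not retaining the original function once forced) can be illustrated with a small thunk class. This is a sketch of the general technique under assumed names; it is not the abstraction that the PR adds.

```python
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class Thunk(Generic[T]):
    """Hypothetical lazy value: computes once, then drops the function."""

    def __init__(self, f: Callable[[], T]) -> None:
        self._f: Optional[Callable[[], T]] = f
        self._forced = False
        self._value: Optional[T] = None

    def force(self) -> T:
        if not self._forced:
            assert self._f is not None
            self._value = self._f()
            self._forced = True
            # Release the closure so anything it captured can be collected.
            self._f = None
        return self._value  # type: ignore[return-value]


calls: List[int] = []


def expensive() -> int:
    calls.append(1)
    return 42


thunk = Thunk(expensive)
first = thunk.force()
second = thunk.force()
```

Unlike a function wrapped in `functools.lru_cache`, which keeps a reference to the wrapped callable for as long as the wrapper lives, the thunk holds only the computed value after the first `force()`.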
Brian Hirsh	26c6786109	return_and_correct_aliasing: skip dispatcher when swapping storage (#132524 ) `return_and_correct_aliasing` is used by FunctionalTensor today to ensure that when we call view/inplace ops, the input and output `FunctionalTensors` share the same storage. This was previously done with a dispatcher call to `aten.set_`. In this PR I swap it out with a util that just manually does the storage swap. Benefits: (1) we know this is safe in the specific way it is used by FunctionalTensor: avoiding the extra assertions in `aten.set_` is necessary to avoid some unbacked symint errors (2) this should improve compile times a bit Pull Request resolved: https://github.com/pytorch/pytorch/pull/132524 Approved by: https://github.com/ezyang ghstack dependencies: #132243, #132337, #132322	2024-08-06 00:44:35 +00:00
David Berard	1962f9475f	[NJT][flop counter] attention: if offsets are fake, use max seqlen (#132356 ) The flop counter is used by the partitioner, in which case the tensors passed in can be fake. The flop computations for nested attention use the offsets to determine the actual amount of compute that will be done. But when the offsets are fake, we end up with unbacked symints (from `(offsets[1:] - offsets[:-1]).to_list()`). If we find that the offsets are fake or functional tensors, then use the max sequence length instead. Repro: https://gist.github.com/davidberard98/903fb3e586edb6d1d466786e1a610eba Differential Revision: [D60597463](https://our.internmc.facebook.com/intern/diff/D60597463) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132356 Approved by: https://github.com/soulitzer	2024-08-02 20:42:29 +00:00
Michael Lazos	93979e7063	Skip frame if torch dispatch mode enabled (#131828 ) Fixes https://github.com/pytorch/pytorch/issues/105929 We now skip frames if a dispatch mode is enabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131828 Approved by: https://github.com/bdhirsh, https://github.com/anijain2305	2024-08-01 19:06:20 +00:00
Xuehai Pan	30293319a8	[BE][Easy][19/19] enforce style for empty lines in import segments in `torch/[o-z]*/` (#129771 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129771 Approved by: https://github.com/justinchuby, https://github.com/janeyx99	2024-08-01 17:07:14 +00:00
Oguz Ulgen	72d2dba992	Add None return type to init (#132335 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132335 Approved by: https://github.com/albanD	2024-08-01 15:26:45 +00:00
Ruichen Sun	14108c1677	Fix error handling in _triton.py (#132006 ) On Windows, _triton.py creates a confusing error ("RuntimeError: Should never be installed") as triton is not supported on Windows. This is not caught in the current PyTorch exception handling. This pull request adds new exception handling for the runtime error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132006 Approved by: https://github.com/oulgen	2024-07-29 15:02:25 +00:00
PyTorch MergeBot	945bf78894	Revert "[BE] typing for decorators - fx/_compatibility (#131568 )" This reverts commit `193f62fde9`. Reverted https://github.com/pytorch/pytorch/pull/131568 on behalf of https://github.com/clee2000 due to same as https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359 but I clicked the wrong link by accident. This is where it actually starts ([comment](https://github.com/pytorch/pytorch/pull/131568#issuecomment-2254330781))	2024-07-28 03:43:39 +00:00
PyTorch MergeBot	d3c17fea90	Revert "[BE] typing for decorators - _library/custom_ops (#131578 )" This reverts commit `c65b197b85`. Reverted https://github.com/pytorch/pytorch/pull/131578 on behalf of https://github.com/clee2000 due to breaking lint internally D60265575 ([comment](https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359))	2024-07-28 03:29:32 +00:00
PyTorch MergeBot	5ced63a005	Revert "[BE] typing for decorators - utils/flop_counter (#131580 )" This reverts commit `81c26ba5ae`. Reverted https://github.com/pytorch/pytorch/pull/131580 on behalf of https://github.com/clee2000 due to breaking lint internally D60265575 ([comment](https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359))	2024-07-28 03:29:31 +00:00
rzou	a3cdbd8189	[FlopCounterMode] Fix register_flop_formula (#131777 ) Previously, FlopCounterMode would ignore any custom ops registered through `register_flop_formula`. The problem was: - register_flop_formula(target) requires target to be an OpOverloadPacket. - register_flop_formula used register_decomposition to populate its registry - register_decomposition decomposes the OpOverloadPacket into OpOverload before putting it into the registry - FlopCounterMode ignores OpOverloads in its registry (it assumes the registry is a dictionary mapping OpOverloadPacket to flop formula). register_decomposition is too heavy of a hammer, plus this isn't a decomposition, so I changed the registration mechanism. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/131777 Approved by: https://github.com/Chillee	2024-07-26 18:44:50 +00:00
eellison	5f2c80d16d	Add inductor OrderedSet (#130003 ) Implemented by extending `collections.abc.MutableSet` and backing it with a dictionary, which is ordered. From collections.abc.MutableSet: ``` A mutable set is a finite, iterable container. This class provides concrete generic implementations of all methods except for __contains__, __iter__, __len__, add(), and discard(). ``` In addition to implementing those methods I also had to define some methods of Python's set which were not implemented in MutableSet. I reused the test from my Python's lib. A few tests didn't pass because of edge-case behavior that is not necessary to reimplement: - support self-referencing repr - erroring when a member's `__eq__` function would modify the set itself - MutableSet supports Iterables as inputs, but not sequences (pretty rare..) - Some specifics of exact equivalent type errors being thrown - [The protocol for automatic conversion to immutable](https://docs.python.org/2/library/sets.html#protocol-for-automatic-conversion-to-immutable) Pull Request resolved: https://github.com/pytorch/pytorch/pull/130003 Approved by: https://github.com/aorenste	2024-07-26 18:16:57 +00:00
Aaron Orenstein	81c26ba5ae	[BE] typing for decorators - utils/flop_counter (#131580 ) See #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131580 Approved by: https://github.com/oulgen, https://github.com/zou3519 ghstack dependencies: #131568, #131569, #131570, #131571, #131572, #131573, #131574, #131575, #131576, #131577, #131578, #131579	2024-07-26 04:59:58 +00:00
Aaron Orenstein	c65b197b85	[BE] typing for decorators - _library/custom_ops (#131578 ) See #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131578 Approved by: https://github.com/oulgen, https://github.com/zou3519 ghstack dependencies: #131568, #131569, #131570, #131571, #131572, #131573, #131574, #131575, #131576, #131577	2024-07-25 22:24:19 +00:00
Aaron Orenstein	193f62fde9	[BE] typing for decorators - fx/_compatibility (#131568 ) See #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131568 Approved by: https://github.com/justinchuby, https://github.com/oulgen, https://github.com/zou3519	2024-07-25 22:24:19 +00:00
Oguz Ulgen	e0f1bf14a4	Fully type torch/utils/_config_module.py (#131676 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131676 Approved by: https://github.com/zou3519	2024-07-24 19:36:09 +00:00
rzou	480ae51f85	[pytree] Only import optree if it's used (#131478 ) torch.utils._pytree imports optree if it's available. Instead, we change it to import optree only if it gets used. The motivation for this is better isolation. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/131478 Approved by: https://github.com/albanD	2024-07-24 00:10:49 +00:00
Aaron Orenstein	5a0068cc69	[BE] mypy: disallow untyped decorators (#131428 ) Untyped decorators strip the types from their decorated function so even if the underlying function is fully typed then callers to it don't get any benefit from type annotations. Step 1 - Enable the error and override in all the offending files. #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428 Approved by: https://github.com/justinchuby, https://github.com/oulgen	2024-07-23 21:50:55 +00:00
Aaron Orenstein	f3562e2cdc	backport dataclass(slots=True) (#131014 ) Python 3.10 adds `@dataclass(slots=True)` to auto-build the `__slots__` for a dataclass. This is really useful but we can't use it until 3.10 becomes our minimum version. Copied the code for that functionality from python into a new decorator and ported it to use 3.8 syntax (removed use of `match`). Usage: ``` @dataclass_slots @dataclass class X: pass ``` is the same as (in py3.10): ``` @dataclass(slots=True) class X: pass ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/131014 Approved by: https://github.com/oulgen, https://github.com/eellison	2024-07-21 19:26:31 +00:00
Xuehai Pan	1439bd3c9c	[Easy][pytree] enable CXX pytree under `torch::deploy` (#130144 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/130144 Approved by: https://github.com/zou3519 ghstack dependencies: #130895, #130139	2024-07-21 07:36:22 +00:00
Xuehai Pan	d2bd9acabd	[BE] bump `optree` version to 0.12.1 (#130139 ) 0.12.0 Major Updates: - Add context manager to temporarily set the dictionary sorting mode - Add accessor APIs - Use `stable` tag for `pybind11` for Python 3.13 support - Fix potential segmentation fault for pickling support 0.12.1 Updates: - Fix warning regression during import when launched with strict warning filters Closes #130155 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130139 Approved by: https://github.com/zou3519 ghstack dependencies: #130895	2024-07-20 02:41:10 +00:00
Xuehai Pan	f0075c179b	Pin `sympy >= 1.13.0` (#130895 ) ------ The opposite of #130836. Pin `sympy >= 1.13.0` for Python >= 3.9 and `sympy == 1.12.1` for Python 3.8. - #130836 See the PR description of #130836 for more details. `sympy` 1.13.0 introduces some breaking changes which break our tests. More specifically: - Ref [Backwards compatibility breaks and deprecations](https://github.com/sympy/sympy/wiki/release-notes-for-1.13.0#backwards-compatibility-breaks-and-deprecations) > BREAKING CHANGE: Float and Integer/Rational no longer compare equal with a == b. From now on Float(2.0) != Integer(2). Previously expressions involving Float would compare unequal e.g. x*2.0 != x*2 but an individual Float would compare equal to an Integer. In SymPy 1.7 a Float will always compare unequal to an Integer even if they have the same "value". Use sympy.numbers.int_valued(number) to test if a number is a concrete number with no decimal part. ([#25614](https://github.com/sympy/sympy/pull/25614) by [@smichr](https://github.com/smichr)) `sympy >= 1.13.0` is required to enable Python 3.13 support. This should be part of #130689. - #130689 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130895 Approved by: https://github.com/ezyang	2024-07-20 00:59:24 +00:00
eellison	16aaff7783	Fix mm pad regression - more conservative estimation of plannable inputs (#128909 ) - More conservative estimation of plannable inputs - Consider constant_pad_nd as pointwise node in concat lowering - Use aten.cat instead of constant_pad_nd when padding just a single dimension because it can be memory-planned away Pull Request resolved: https://github.com/pytorch/pytorch/pull/128909 Approved by: https://github.com/Chillee	2024-07-18 16:42:30 +00:00
Yu, Guangye	096dc444ce	Keep zero check be compatible with different sympy versions (#130729 ) # Motivation I found a difference between sympy 1.12 and 1.13. ```python # for 1.12 >>> import sympy >>> a = sympy.Number(0.0) >>> a == 0 True ``` ```python # for 1.13 >>> import sympy >>> a = sympy.Number(0.0) >>> a == 0 False ``` This difference affects the result of [safe_mul](`6beec34b1c/torch/utils/_sympy/value_ranges.py (L521-L528)`), producing an incorrect result when `a = sympy.Number(0.0)` and `b = inf`: the result is `nan` with sympy 1.13 (the expected result is 0). ```python def safe_mul(a, b): # Make unknown() * wrap(0.0) == wrap(0.0) if a == 0.0: return a elif b == 0.0: return b else: return a * b ``` Across sympy versions, `sympy.Number(0)` consistently compares equal to 0.0. ```python >>> import sympy >>> a = sympy.Number(0) >>> a == 0.0 True # for different sympy versions ``` So, use 0.0 when checking for zero in safe_mul to stay compatible with different sympy versions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130729 Approved by: https://github.com/lezcano, https://github.com/EikanWang	2024-07-16 08:39:00 +00:00
PyTorch MergeBot	074a5c0c9b	Revert "[BE] bump `optree` version to 0.12.1 (#130139 )" This reverts commit `8fcb156e8b`. Reverted https://github.com/pytorch/pytorch/pull/130139 on behalf of https://github.com/clee2000 due to broke inductor/test_torchinductor_codegen_dynamic_shapes.py and test_sympy_utils.py `8fcb156e8b` ([comment](https://github.com/pytorch/pytorch/pull/130139#issuecomment-2229248447))	2024-07-15 19:42:11 +00:00
Xuehai Pan	8fcb156e8b	[BE] bump `optree` version to 0.12.1 (#130139 ) 0.12.0 Major Updates: - Add context manager to temporarily set the dictionary sorting mode - Add accessor APIs - Use `stable` tag for `pybind11` for Python 3.13 support - Fix potential segmentation fault for pickling support 0.12.1 Updates: - Fix warning regression during import when launched with strict warning filters Closes #130155 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130139 Approved by: https://github.com/zou3519	2024-07-15 17:27:07 +00:00
Xuehai Pan	4d7bf72d93	[BE][Easy] fix ruff rule needless-bool (SIM103) (#130206 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/130206 Approved by: https://github.com/malfet	2024-07-14 08:17:52 +00:00
Aaron Orenstein	567482973d	typing fake_tensor.py (#128041 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/128041 Approved by: https://github.com/eellison ghstack dependencies: #129182	2024-07-13 06:07:40 +00:00
Aaron Orenstein	634b62f111	typing proxy_tensor.py (#129182 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129182 Approved by: https://github.com/Chillee	2024-07-12 23:17:09 +00:00
Xuehai Pan	973037be6a	[BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): `list()` / `tuple()` / `dict()` (#130199 ) This PR changes the empty collection factory call to Python literals: - `list()` -> `[]` - `tuple()` -> `()` - `dict()` -> `{}` The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary: ```bash $ python3 -m dis - <<EOS import collections d1 = {} d2 = dict() dict = collections.OrderedDict d3 = dict() EOS ``` ```text 0 0 RESUME 0 1 2 LOAD_CONST 0 (0) 4 LOAD_CONST 1 (None) 6 IMPORT_NAME 0 (collections) 8 STORE_NAME 0 (collections) 3 10 BUILD_MAP 0 12 STORE_NAME 1 (d1) 4 14 PUSH_NULL 16 LOAD_NAME 2 (dict) 18 CALL 0 26 STORE_NAME 3 (d2) 6 28 LOAD_NAME 0 (collections) 30 LOAD_ATTR 8 (OrderedDict) 50 STORE_NAME 2 (dict) 7 52 PUSH_NULL 54 LOAD_NAME 2 (dict) 56 CALL 0 64 STORE_NAME 5 (d3) 66 RETURN_CONST 1 (None) ``` The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above). Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199 Approved by: https://github.com/malfet	2024-07-11 17:30:28 +00:00
Sam Larsen	358da54be5	[inductor] Better messaging when triton version is too old (#130403 ) Summary: If triton is available, but we can't import triton.compiler.compiler.triton_key, then we see some annoying behavior: 1) If we don't actually need to compile triton, the subprocess pool will still spew error messages about the import failure; it's unclear to users if this is an actual problem. 2) If we do need to compile triton, we a) see the error messages from above and b) get a vanilla import exception without the helpful "RuntimeError: Cannot find a working triton installation ..." Test Plan: Ran with and without torch.compile for a) recent version of triton, b) triton 2.2, and c) no triton. In all cases, verified expected output (success or meaningful error message) Pull Request resolved: https://github.com/pytorch/pytorch/pull/130403 Approved by: https://github.com/eellison	2024-07-10 23:45:50 +00:00
Pian Pawakapan	1b3b4c2fb9	[runtime asserts] deduplicate runtime asserts & CSE (#128599 ) (#130380 ) original PR: https://github.com/pytorch/pytorch/pull/128599 (re-created after revert + poisoned diff train) Summary: This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example: ``` z = torch.cat([x, x], dim=0) # 2*s0 w = z.repeat(y.shape[0]) # 2*s0*s1 _w = w.shape[0] s0 = x.shape[0] s1 = y.shape[0] _w0 = 2 * s0 _w = _w0 * s1 ``` Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example: ``` torch.sym_constrain_range_for_size(n, min=2, max=16) torch.sym_constrain_range(n, min=4, max=20) torch._check(n >= 0) torch._check(n >= 3) torch._check(n <= 14) torch.sym_constrain_range_for_size(n) torch._check(n >= 4) torch._check(n <= 14) ``` Test Plan: contbuild & OSS CI, see `940e4477ab` Original Phabricator Test Plan: Imported from GitHub, without a `Test Plan:` line. Differential Revision: D59543603 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130380 Approved by: https://github.com/izaitsevfb	2024-07-10 19:23:37 +00:00
PyTorch MergeBot	9c9744c3ac	Revert "[runtime asserts] deduplicate runtime asserts & CSE (#128599 )" This reverts commit `940e4477ab`. Reverted https://github.com/pytorch/pytorch/pull/128599 on behalf of https://github.com/izaitsevfb due to breaking internal APS tests, see D59498864 ([comment](https://github.com/pytorch/pytorch/pull/128599#issuecomment-2218724762))	2024-07-09 21:03:49 +00:00
Jerry Mannil	42f647219a	[ROCm] Add int4 support (#129710 ) - Add AMD support for int4 kernel - Only supports CDNA2 and CDNA3 gpus for now - Uses `mfma_f32_16x16x16bf16` instruction for matrix multiply - Uses `v_and_or_b32` instruction and `__hfma2` intrinsic for unpacking bf16 values - Enable hipify for `__nv_bfloat16` and `__nv_bfloat162` data types - Enable int4 unit tests for CDNA2 and CDNA3 AMD gpus - Fix torchscript issues due to hipify for `__nv_bfloat16` type - TorchScript has its own implementation for bfloat16 type - Implemented in `__nv_bfloat16` structure at [resource_strings.h](https://github.com/pytorch/pytorch/blob/main/torch/csrc/jit/codegen/fuser/cuda/resource_strings.h) - So, we shouldn't hipify any reference of `__nv_bfloat16` in the torchscript implementation - Hence moved the `__nv_bfloat16` direct references in `codegen.cpp` and `cuda_codegen.cpp` to `resource_strings.h` which is already exempted from hipify Fixes #124699 Fixes pytorch-labs/gpt-fast/issues/154 Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/129710 Approved by: https://github.com/malfet	2024-07-09 19:49:12 +00:00
PyTorch MergeBot	d7b7f8b79f	Revert "[ROCm] Add int4 support (#129710 )" This reverts commit `d0ad13fa42`. Reverted https://github.com/pytorch/pytorch/pull/129710 on behalf of https://github.com/jeffdaily due to original ROCm PR did not have ciflow/rocm, missed signal ([comment](https://github.com/pytorch/pytorch/pull/129710#issuecomment-2214558368))	2024-07-08 16:07:53 +00:00
Jerry Mannil	d0ad13fa42	[ROCm] Add int4 support (#129710 ) Add AMD support for int4 kernel using mfma_f32_16x16x16bf16 instruction. Only supports CDNA2 and CDNA3 gpus for now. Fixes #124699 Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/129710 Approved by: https://github.com/malfet	2024-07-07 23:54:22 +00:00
Pian Pawakapan	940e4477ab	[runtime asserts] deduplicate runtime asserts & CSE (#128599 ) This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example: ``` z = torch.cat([x, x], dim=0) # 2*s0 w = z.repeat(y.shape[0]) # 2*s0*s1 _w = w.shape[0] # something with _w ... # turns into -> s0 = x.shape[0] s1 = y.shape[0] _w0 = 2 * s0 _w = _w0 * s1 ``` Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example: ``` torch.sym_constrain_range_for_size(n, min=2, max=16) torch.sym_constrain_range(n, min=4, max=20) torch._check(n >= 0) torch._check(n >= 3) torch._check(n <= 14) # turns into torch.sym_constrain_range_for_size(n) torch._check(n >= 4) torch._check(n <= 14) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599 Approved by: https://github.com/ezyang	2024-07-07 20:10:14 +00:00
PyTorch MergeBot	963f430d13	Revert "[runtime asserts] deduplicate runtime asserts & CSE (#128599 )" This reverts commit `0267b2ddcb`. Reverted https://github.com/pytorch/pytorch/pull/128599 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause a landrace and fails inductor/test_cudagraph_trees in trunk `0267b2ddcb` ([comment](https://github.com/pytorch/pytorch/pull/128599#issuecomment-2211690518))	2024-07-06 07:20:05 +00:00
Pian Pawakapan	0267b2ddcb	[runtime asserts] deduplicate runtime asserts & CSE (#128599 ) This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example: ``` z = torch.cat([x, x], dim=0) # 2*s0 w = z.repeat(y.shape[0]) # 2*s0*s1 _w = w.shape[0] # something with _w ... # turns into -> s0 = x.shape[0] s1 = y.shape[0] _w0 = 2 * s0 _w = _w0 * s1 ``` Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example: ``` torch.sym_constrain_range_for_size(n, min=2, max=16) torch.sym_constrain_range(n, min=4, max=20) torch._check(n >= 0) torch._check(n >= 3) torch._check(n <= 14) # turns into torch.sym_constrain_range_for_size(n) torch._check(n >= 4) torch._check(n <= 14) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599 Approved by: https://github.com/ezyang	2024-07-06 03:44:49 +00:00
Edward Z. Yang	8af58f66bb	Fix typo in floordiv solver code that affects flipped relation (#129888 ) Fixes https://github.com/pytorch/pytorch/issues/123535 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/129888 Approved by: https://github.com/lezcano	2024-07-03 04:47:32 +00:00
PyTorch MergeBot	c22e66896f	Revert "Fix typo in floordiv solver code that affects flipped relation (#129888 )" This reverts commit `3c6c3b9448`. Reverted https://github.com/pytorch/pytorch/pull/129888 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the updated test starts to fail flakily in trunk somehow, so I am reverting the change to see if it helps ([comment](https://github.com/pytorch/pytorch/pull/129888#issuecomment-2204442653))	2024-07-02 21:16:59 +00:00
Xuehai Pan	f1df13f023	[BE][Easy] Fix `PYI001`: unprefixed-type-param in `torch/utils/data/datapipes` (#129885 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129885 Approved by: https://github.com/ezyang	2024-07-02 14:56:27 +00:00
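
The "Add inductor OrderedSet" entry above (5f2c80d16d) describes extending `collections.abc.MutableSet` and backing it with a dictionary. A minimal sketch of that idea follows; it is not the code from the PR. The class name `OrderedSetSketch` and its `_dict` attribute are hypothetical, and it implements only the five abstract methods the `MutableSet` docstring lists, leaving out the extra `set` methods the commit message mentions.

```python
from collections.abc import Iterable, Iterator, MutableSet
from typing import Dict, Optional, TypeVar

T = TypeVar("T")


class OrderedSetSketch(MutableSet[T]):
    """Insertion-ordered set backed by a dict (hypothetical name, not PyTorch's class)."""

    def __init__(self, iterable: Optional[Iterable[T]] = None) -> None:
        # dict keys keep insertion order, which is what makes the set ordered.
        self._dict: Dict[T, None] = (
            dict.fromkeys(iterable) if iterable is not None else {}
        )

    # The five abstract methods that MutableSet leaves to the subclass.
    def __contains__(self, item: object) -> bool:
        return item in self._dict

    def __iter__(self) -> Iterator[T]:
        return iter(self._dict)

    def __len__(self) -> int:
        return len(self._dict)

    def add(self, item: T) -> None:
        self._dict[item] = None

    def discard(self, item: T) -> None:
        self._dict.pop(item, None)

    def __repr__(self) -> str:
        return f"{type(self).__name__}({list(self._dict)})"


s = OrderedSetSketch([3, 1, 2, 1])  # the duplicate 1 is dropped, order is kept
s.add(0)
s.discard(1)
print(list(s))  # [3, 2, 0]
```

Operators such as `|` and `&` come from the `MutableSet` mixins, which rebuild the result through the constructor, so they preserve iteration order in this sketch as well.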

2310 Commits