Commit Graph

263 Commits

linhaifeng
369f2d6951 [3/N] fix typo in other folders (#166606)
fix typo in other folders

#166374
#166126

_typos.toml
```toml
[files]
extend-exclude = ["tools/linter/dictionary.txt"]
[default.extend-words]
nd = "nd"
arange = "arange"
Nd = "Nd"
GLOBALs = "GLOBALs"
hte = "hte"
iy = "iy"
PN = "PN"
Dout = "Dout"
optin = "optin"
gam = "gam"
PTD = "PTD"
Sur = "Sur"
nin = "nin"
tme = "tme"
inpt = "inpt"
mis = "mis"
Raison = "Raison"
ouput = "ouput"
nto = "nto"
Onwer = "Onwer"
callibrate = "callibrate"
ser = "ser"
Metdata = "Metdata"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166606
Approved by: https://github.com/ezyang
2025-10-30 10:30:40 +00:00
Natalia Gimelshein
37c6087334 Add split-K control to cuBLAS reduced-precision settings (#164766)
## Summary
- add a CuBLASReductionOption enum so the CUDA context can track reduced-precision and split-K options
- extend the Python bindings, backend helpers, and docs to accept an optional allow_splitk argument for fp16/bf16 matmul controls (see the sketch below)
- update cuBLAS/cuBLASLt call sites plus dynamo guards and tests to respect the new combinations
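
A minimal sketch of the resulting control surface, for illustration; the public boolean knob below exists today, while the `allow_splitk` keyword on the private setter is an assumption based on this summary, not a verified signature:

```python
import torch

# Existing public knob: allow reduced-precision reductions in fp16 matmuls.
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = True

# With the CuBLASReductionOption described above, the underlying private
# binding could take the extra split-K flag (hypothetical signature):
# torch._C._set_cublas_allow_fp16_reduced_precision_reduction(True, allow_splitk=False)
```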

## Testing
- python test/test_cuda.py TestCuda.test_cublas_allow_fp16_reduced_precision_reduction_get_set -v *(fails: ModuleNotFoundError: No module named 'psutil')*

------
https://chatgpt.com/codex/tasks/task_e_68e404623178832f8a3e1d34e1e175da

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164766
Approved by: https://github.com/malfet, https://github.com/albanD
2025-10-08 18:48:45 +00:00
Yuanyuan Chen
5103ecc5d8 [1/N] Fix clang-tidy readability checks (#164561)
Thoroughly check all `.cpp` files, except `jit` files, for readability.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164561
Approved by: https://github.com/Skylion007
2025-10-04 09:40:38 +00:00
Lakshay Garg
f006aee601 Speed up FP precision lookup (#164044)
This commit simplifies the precision lookup and setting logic
by reducing the number of branches and using a custom hash
function. Fixes #161822. The issue described in #163709 still
persists. This is meant as a short-term fix.
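
A rough Python illustration of the key-packing idea behind such a custom hash (hypothetical; the actual code is C++). The revert below shows why packing with a 32-bit shift breaks where `size_t` is only 32 bits wide:

```python
def pack_precision_key(backend_id: int, op_id: int) -> int:
    # Pack two small ids into a single lookup key. The C++ analogue,
    # (k1 << 32) | k2 on size_t, overflows on platforms with a 32-bit
    # size_t -- the -Wshift-count-overflow error cited in the revert.
    return (backend_id << 32) | op_id

assert pack_precision_key(1, 2) == 0x100000002
```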

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164044
Approved by: https://github.com/ngimel, https://github.com/eqy
2025-10-03 21:35:20 +00:00
PyTorch MergeBot
2a7c486750 Revert "Speed up FP precision lookup (#164044)"
This reverts commit 723ba21393.

Reverted https://github.com/pytorch/pytorch/pull/164044 on behalf of https://github.com/yangw-dev due to a broken internal build ([comment](https://github.com/pytorch/pytorch/pull/164044#issuecomment-3363016702)):
```
In file included from xplat/caffe2/aten/src/ATen/DeviceAccelerator.cpp:1:
xplat/caffe2/aten/src/ATen/Context.h:502:38: error: shift count >= width of type [-Werror,-Wshift-count-overflow]
  502 | return std::hash<size_t>{}((k1 << 32) | k2);
```
2025-10-02 21:00:44 +00:00
Lakshay Garg
723ba21393 Speed up FP precision lookup (#164044)
This commit simplifies the precision lookup and setting logic
by reducing the number of branches and using a custom hash
function. Fixes #161822. The issue described in #163709 still
persists. This is meant as a short-term fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164044
Approved by: https://github.com/ngimel, https://github.com/eqy
2025-10-02 00:59:19 +00:00
Animesh Jain
991e3d0d16 [dynamo][guards] Revert introduction of different types of lambda_guards (#163385)
Given the issue in
https://fb.workplace.com/groups/260102303573409/permalink/787294574187510/,
it might be a better idea to just speed up `_realize_dict` and keep the
changes very local. So reverting this PR as well, to return to a clean
slate.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163385
Approved by: https://github.com/jansel
2025-09-27 18:20:48 +00:00
Karhou Tam
39df24fe04 [Code Clean] Replace std::runtime_error with TORCH_CHECK (#163610)
Including:
- `torch/csrc/instruction_counter`
- `torch/csrc/lazy`
- `torch/csrc/monitor`
- `torch/csrc/profiler`
- `torch/csrc/dynamo`

Fixes part of #148114

Due to a personal mistake with PR #163317, this PR does the same thing, **and PR #163317 has already been approved by @albanD.**

I'm so sorry about that. Hope you won't mind, @albanD. 🥹

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163610
Approved by: https://github.com/albanD, https://github.com/Skylion007
2025-09-26 04:52:48 +00:00
PyTorch MergeBot
32ad29b72a Revert "[dynamo][guards] Fail on an unknown framelocals to dict conversion (#162695)"
This reverts commit a8432bcaad.

Reverted https://github.com/pytorch/pytorch/pull/162695 on behalf of https://github.com/anijain2305 due to internal failure at https://fburl.com/workplace/qiitdlp6 ([comment](https://github.com/pytorch/pytorch/pull/162695#issuecomment-3310757225))
2025-09-19 06:18:27 +00:00
PyTorch MergeBot
1302637a23 Revert "[dynamo][guards] Do not construct entire framelocals dict for LAMBDA_GUARD (#162525)"
This reverts commit 5f630d28d7.

Reverted https://github.com/pytorch/pytorch/pull/162525 on behalf of https://github.com/anijain2305 due to internal tests fail ([comment](https://github.com/pytorch/pytorch/pull/162525#issuecomment-3310748980))
2025-09-19 06:15:28 +00:00
joshuamarkovic
559e8d1c20 [doc]: Small typos (#162982)
Small typo fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162982
Approved by: https://github.com/ezyang, https://github.com/zou3519
2025-09-16 17:42:19 +00:00
Isuru Fernando
79d2418b5a [inductor] Add FLOAT_IS_NAN and COMPLEX_IS_NAN guards (#162537)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162537
Approved by: https://github.com/anijain2305, https://github.com/mlazos
ghstack dependencies: #162528
2025-09-12 04:32:46 +00:00
Isuru Fernando
5dd84559a5 [dynamo] Add DUAL_LEVEL_MATCH C++ guard (#162528)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162528
Approved by: https://github.com/anijain2305
2025-09-12 04:32:46 +00:00
Animesh Jain
a8432bcaad [dynamo][guards] Fail on an unknown framelocals to dict conversion (#162695)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162695
Approved by: https://github.com/williamwen42
ghstack dependencies: #162694
2025-09-11 15:01:00 +00:00
Animesh Jain
a3a40cb741 [dynamo][guards] Do not construct framelocals to dict on GlobalsGuardAccessor (#162694)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162694
Approved by: https://github.com/williamwen42
2025-09-11 15:01:00 +00:00
Animesh Jain
5f630d28d7 [dynamo][guards] Do not construct entire framelocals dict for LAMBDA_GUARD (#162525)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162525
Approved by: https://github.com/williamwen42
ghstack dependencies: #162509
2025-09-10 18:52:15 +00:00
Animesh Jain
a67e798cb7 [dynamo][guards] Prevent framelocals to dict conversion for not required LAMBDA_GUARD (#162509)
This is a smaller PR to reduce framelocals-to-dict conversions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162509
Approved by: https://github.com/williamwen42
2025-09-10 18:52:15 +00:00
Animesh Jain
4d5b3f2d5a [dynamo][guards] Install dict watchers for recursive dict tag optimization (#159796)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159796
Approved by: https://github.com/jansel
2025-08-12 09:49:11 +00:00
Markus Hoehnerbach
e167c7d0f3 [inductor] allocate non-blocking copy destinations in pinned memory (#155121) (#158758)
Fixes #155121
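
As context, a small sketch of why pinned destinations matter for non-blocking copies, using only standard public PyTorch APIs (the inductor allocation change itself is internal):

```python
import torch

src = torch.randn(1 << 20, device="cuda")

# With a pageable destination, a non_blocking device-to-host copy silently
# degrades to a synchronous one; a pinned destination lets it overlap compute.
dst = torch.empty(1 << 20, pin_memory=True)
dst.copy_(src, non_blocking=True)

torch.cuda.synchronize()  # ensure the copy finished before reading dst
```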

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158758
Approved by: https://github.com/EikanWang, https://github.com/eellison
2025-08-07 17:07:26 +00:00
PyTorch MergeBot
83ba3f1101 Revert "[inductor] allocate non-blocking copy destinations in pinned memory (#155121) (#158758)"
This reverts commit 6085bf7565.

Reverted https://github.com/pytorch/pytorch/pull/158758 on behalf of https://github.com/davidberard98 due to I need to revert #158462 (it causes device-side asserts), and this PR causes a merge conflict in the test file. Sorry about that! ([comment](https://github.com/pytorch/pytorch/pull/158758#issuecomment-3152490371))
2025-08-04 21:47:11 +00:00
Markus Hoehnerbach
6085bf7565 [inductor] allocate non-blocking copy destinations in pinned memory (#155121) (#158758)
Fixes #155121

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158758
Approved by: https://github.com/EikanWang, https://github.com/eellison
2025-08-04 21:22:11 +00:00
Animesh Jain
53e47af0f7 [dynamo][guards] Read the attr name from GetAttrGuardAccessor (#159754)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159754
Approved by: https://github.com/jansel
ghstack dependencies: #159752
2025-08-04 16:51:27 +00:00
Animesh Jain
66ad881fc7 [dynamo][guards][refactor] Simplify type extraction from GuardManager (#159752)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159752
Approved by: https://github.com/jansel
2025-08-04 16:51:27 +00:00
Animesh Jain
64cbaa876c [dynamo][guards] Make class members go through obj.__class__.__dict__ (#159534)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159534
Approved by: https://github.com/jansel
2025-08-04 05:12:44 +00:00
Animesh Jain
4516c59f5f [dynamo][source] Add special source for __code__ and __closure__ (#159722)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159722
Approved by: https://github.com/jansel
2025-08-04 05:02:05 +00:00
PyTorch MergeBot
805a102beb Revert "[dynamo][guards] Make class members go through obj.__class__.__dict__ (#159534)"
This reverts commit 1616777cd2.

Reverted https://github.com/pytorch/pytorch/pull/159534 on behalf of https://github.com/malfet due to Broke some inductor test and lint among other things, see 9c18901bfd/1 ([comment](https://github.com/pytorch/pytorch/pull/159534#issuecomment-3146983186))
2025-08-03 04:58:32 +00:00
Animesh Jain
1616777cd2 [dynamo][guards] Make class members go through obj.__class__.__dict__ (#159534)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159534
Approved by: https://github.com/jansel
ghstack dependencies: #159186
2025-08-02 18:04:35 +00:00
Animesh Jain
7eb5fdb358 [dynamo][guards] Recursive dict tag optimization (#159183)
Design doc here - https://docs.google.com/document/d/1W29DrWID5miGWlZXspsQVN5U0zydE3kjZpziOXrhuaY/edit?tab=t.0#bookmark=id.sba04iw9sp68

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159183
Approved by: https://github.com/jansel
2025-07-30 06:01:32 +00:00
Animesh Jain
f7d6e9f500 [dynamo][guards] More small guard optimizations (#159345)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159345
Approved by: https://github.com/williamwen42
ghstack dependencies: #159288
2025-07-29 18:36:49 +00:00
anwang
c55e72bea1 [Re-land][Inductor] Support native Inductor as backend for MTIA (#159211)
The previous [diff/PR](https://github.com/pytorch/pytorch/pull/158526) was reverted due to this docstring lint error:
![docstring lint error](https://github.com/user-attachments/assets/216b1720-4002-48da-b5f3-32b5d48aaa54)
I didn't add the docstring because I thought I wasn't supposed to add a docstring for an EXISTING function.

So this diff/PR is an exact copy of the previous one, except for adding the docstring.

-------------
This diff/PR includes the changes to support native Inductor integration for MTIA. The goal is to support `torch.compile(backend="inductor")` for MTIA. Inductor should generate code (Triton kernels + Python wrapper code) similar to CUDA, and the Triton kernels can be launched eagerly.

The changes include:
- Add MTIA device interfaces used by Dynamo and Inductor, including APIs on device, stream, event, etc.
- Add required torch.mtia APIs, like is_bf16_supported, memory_allocated, set_stream_by_id, etc.
- MTIA-specific codegen logic, for example, loading the MTIA dynamic_library.
- Other necessary changes to integrate with Inductor codegen, following other devices like CUDA, XPU.
- Integrate with the [empty_strided_mtia](https://www.internalfb.com/code/fbsource/[0d017d3a4a1bdff7253f9c66a9f38e77bd62166b]/fbcode/caffe2/aten/src/ATen/native/mtia/EmptyTensor.cpp?lines=49%2C63%2C71%2C74%2C78) API that we’ve added for the new MTIA ATen backend.
- A change in Inductor runtime to avoid re-initialize MTIADriver.
- BUCK changes to include ATen-mtia in Inductor, and to use -USE_MTIA preprocessor flag.
- Update `test_mnist_e2e.py` to cover native Inductor as backend, using the `--use_native_inductor` flag.
- Add a personal script (`scripts/anwang/run_native_inductor_script.py`) for testing purposes.

Note:
- This approach (option 3) aims to provide a PyTorch-native approach to Inductor integration for MTIA, minimizing onboarding overhead. The downside is that it doesn't leverage MTIA-specific graph optimizations and is limited by eager-launch overhead.
- MTIA will support another approach (option 2) to provide the best performance, based on WrapperFxCodegen. We should be able to reuse the fundamental changes of this diff for option 2, like the device interfaces, stream/event APIs, etc., especially as WrapperFxCodegen inherits from PythonWrapperCodegen.

Internal:
References:
- [post for context](https://fb.workplace.com/groups/mtiasw/permalink/1718377262384606/)
- [Inductor integration discussion(option 1/2/3)](https://docs.google.com/document/d/1p6363OXtVIRv1hPoaKlRSK3j-iir3QIbDd5bjyqCNig/edit?tab=t.0#heading=h.7s4ns6wcnhmb)
- [Project design doc(option 3)](https://docs.google.com/document/d/1jXUmhgoV9WvkMf-bcY3Od_kK9K_RDOdgHdt1LoQ5Tc4/edit?tab=t.0#heading=h.y43gwdqlv46w)
- [early prototyping diff](https://www.internalfb.com/diff/D75110196)
- [MPS integration PR](https://github.com/pytorch/pytorch/pull/153959)
- [empty_strided_xpu PR](https://github.com/pytorch/pytorch/pull/126678)

Differential Revision: [D79040806](https://our.internmc.facebook.com/intern/diff/D79040806/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159211
Approved by: https://github.com/eellison, https://github.com/blaine-rister, https://github.com/jansel
2025-07-29 17:03:24 +00:00
PyTorch MergeBot
fe0ff12dab Revert "[Inductor] Support native Inductor as backend for MTIA (#158526)"
This reverts commit cd68559d04.

Reverted https://github.com/pytorch/pytorch/pull/158526 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/158526#issuecomment-3122186057))
2025-07-26 17:58:00 +00:00
anwang
cd68559d04 [Inductor] Support native Inductor as backend for MTIA (#158526)
This diff/PR includes the changes to support native Inductor integration for MTIA. The goal is to support `torch.compile(backend="inductor")` for MTIA. Inductor should generate code (Triton kernels + Python wrapper code) similar to CUDA, and the Triton kernels can be launched eagerly.

The changes include:
- Add MTIA device interfaces used by Dynamo and Inductor, including APIs on device, stream, event, etc.
- Add required torch.mtia APIs, like is_bf16_supported, memory_allocated, set_stream_by_id, etc.
- MTIA-specific codegen logic, for example, loading the MTIA dynamic_library.
- Other necessary changes to integrate with Inductor codegen, following other devices like CUDA, XPU.
- Integrate with the [empty_strided_mtia](https://www.internalfb.com/code/fbsource/[0d017d3a4a1bdff7253f9c66a9f38e77bd62166b]/fbcode/caffe2/aten/src/ATen/native/mtia/EmptyTensor.cpp?lines=49%2C63%2C71%2C74%2C78) API that we’ve added for the new MTIA ATen backend.
- A change in Inductor runtime to avoid re-initialize MTIADriver.
- BUCK changes to include ATen-mtia in Inductor, and to use -USE_MTIA preprocessor flag.
- Update `test_mnist_e2e.py` to cover native Inductor as backend, using the `--use_native_inductor` flag.
- Add a personal script (`scripts/anwang/run_native_inductor_script.py`) for testing purposes.

Note:
- This approach (option 3) aims to provide a PyTorch-native approach to Inductor integration for MTIA, minimizing onboarding overhead. The downside is that it doesn't leverage MTIA-specific graph optimizations and is limited by eager-launch overhead.
- MTIA will support another approach (option 2) to provide the best performance, based on WrapperFxCodegen. We should be able to reuse the fundamental changes of this diff for option 2, like the device interfaces, stream/event APIs, etc., especially as WrapperFxCodegen inherits from PythonWrapperCodegen.

Internal:
References:
- [post for context](https://fb.workplace.com/groups/mtiasw/permalink/1718377262384606/)
- [Inductor integration discussion(option 1/2/3)](https://docs.google.com/document/d/1p6363OXtVIRv1hPoaKlRSK3j-iir3QIbDd5bjyqCNig/edit?tab=t.0#heading=h.7s4ns6wcnhmb)
- [Project design doc(option 3)](https://docs.google.com/document/d/1jXUmhgoV9WvkMf-bcY3Od_kK9K_RDOdgHdt1LoQ5Tc4/edit?tab=t.0#heading=h.y43gwdqlv46w)
- [early prototyping diff](https://www.internalfb.com/diff/D75110196)
- [MPS integration PR](https://github.com/pytorch/pytorch/pull/153959)
- [empty_strided_xpu PR](https://github.com/pytorch/pytorch/pull/126678)

Differential Revision: [D78458745](https://our.internmc.facebook.com/intern/diff/D78458745/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158526
Approved by: https://github.com/blaine-rister, https://github.com/jansel, https://github.com/eellison
2025-07-26 08:16:34 +00:00
Animesh Jain
659f8fb115 [dynamo][guards] Add some relational guard helpers (#159077)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159077
Approved by: https://github.com/jansel
ghstack dependencies: #158995
2025-07-25 06:28:10 +00:00
Animesh Jain
05a748d287 [dynamo][guards] Expand is_immutable_object to have None (#158995)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158995
Approved by: https://github.com/Lucaskabela, https://github.com/jansel
2025-07-25 06:12:05 +00:00
Animesh Jain
1b456c580d [dynamo][guards] Add type info of the guarded value in guard managers (#158765)
The tlparse output looks like this:

![tlparse guard manager type info](https://github.com/user-attachments/assets/04c4e6b1-34a3-4d9d-8304-6eb6d9a94980)

This will aid in reading guards.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158765
Approved by: https://github.com/Lucaskabela, https://github.com/StrongerXi
2025-07-23 16:59:15 +00:00
cyy
1b91954b9f Suppress volatile type error (#158435)
Fixes
```
/var/lib/jenkins/workspace/torch/csrc/dynamo/guards.cpp:5320:10:
error: compound assignment to object of volatile-qualified type 'volatile char' is deprecated [-Werror,-Wdeprecated-volatile]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158435
Approved by: https://github.com/janeyx99
2025-07-17 22:21:04 +00:00
Animesh Jain
2179afd714 [easy][guards] Add developer comment for posterity (#158471)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158471
Approved by: https://github.com/StrongerXi
2025-07-17 01:17:04 +00:00
Animesh Jain
cc0faeb80f [dynamo][guards] Instruction count for guard eval for development work (#158214)
It's turned off by default; the code is even hidden behind a preprocessor define flag. It will be used only for development work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158214
Approved by: https://github.com/StrongerXi
ghstack dependencies: #158215
2025-07-15 20:29:23 +00:00
Guilherme Leobas
e7167dbacf [Set] Support sets in VariableBuilder (#153150)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153150
Approved by: https://github.com/zou3519
2025-07-04 00:45:03 +00:00
Animesh Jain
bb476310a4 [dynamo][guards] Stash root guard manager pointer in the LeafGuard (#157325)
Preparing to simplify the recompilation reason codebase. This PR was 95% done by using AI tools.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157325
Approved by: https://github.com/jansel
2025-07-02 00:42:43 +00:00
zhxchen17
0f9c1b374f [dynamo] Ensure global state guard is preserved across serialization. (#157285)
Currently, every time we construct a GLOBAL_STATE guard, we create a fresh guard based on the current global state. For precompile, we want to create a GLOBAL_STATE guard based on external sources, e.g. serialized global state. This also covers the normal case where we just pass in the global state guard from Python.
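
For intuition, a minimal sketch of the existing behavior using the current binding (module path and semantics as seen in dynamo tests; the serialization hooks this PR adds are not shown):

```python
import torch
from torch._C._dynamo import guards

g = guards.GlobalStateGuard()  # snapshots the current global state
assert g.check()               # the snapshot still matches

torch.set_grad_enabled(False)  # mutate a piece of guarded global state
assert not g.check()           # the snapshot no longer matches
```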

Differential Revision: [D77400988](https://our.internmc.facebook.com/intern/diff/D77400988/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157285
Approved by: https://github.com/jansel
2025-07-01 15:46:34 +00:00
haozhe.zhu
53e0b9c393 refine fp32 precision api (#125888)
Based on the [conversation](https://github.com/pytorch/pytorch/issues/121791), we plan to drop "highest, high, medium" as the way to represent fp32 internal computation data types. Instead, we will use the algorithm name directly.

### Design Choice: Directly use algorithm names like "TF32", "BF16".
#### Pros
 - The names are more informative: 'tf32' says more than a simple "high".
 - Easier to extend to new algorithms like `tf32x3`.
#### Cons
 - "HIGHEST, HIGH, MEDIUM" indicated the relative precision between different algorithms; however, we can cover that in documentation.

### We provide a layered structure for backends/operators.
('f32' is short for 'fp32_precision')
![image](https://github.com/user-attachments/assets/f89143e5-d6a1-4865-9351-9a50439f5067)

### We provide 4 fp32 compute precision values that can be set:
 - **"ieee"**: Not allowed to use any other internal computation data type.
 - **"tf32"**: Allowed to use tf32 as an internal computation data type.
 - **"bf16"**: Allowed to use bf16 as an internal computation data type.
 - **"none"**: Precision is not set and can be overridden by its parent node.

### Overriding Precision Settings
A child node is overridden by its parent node if it is set to the default ("none").
The current default settings are:
```
backend = generic, op = all, precision setting = none
    backend = cuda, op = all, precision setting = none
        backend = cuda, op = conv, precision setting = tf32
        backend = cuda, op = rnn, precision setting = tf32
        backend = cuda, op = matmul, precision setting = none
    backend = mkldnn, op = all, precision setting = none
        backend = mkldnn, op = conv, precision setting = none
        backend = mkldnn, op = rnn, precision setting = none
        backend = mkldnn, op = matmul, precision setting = none
```
 - If the user sets `torch.backends.mkldnn.fp32_precision="bf16"`, its child nodes `torch.backends.mkldnn.matmul.fp32_precision` / `torch.backends.mkldnn.conv.fp32_precision` / `torch.backends.mkldnn.rnn.fp32_precision` will also be overridden to "bf16", as sketched below.
 - If the user sets `torch.backends.fp32_precision="bf16"`, `torch.backends.mkldnn.fp32_precision` and its child nodes will also be overridden to "bf16".
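
Following the names in this message, a minimal sketch of the override behavior (assuming the settings are exposed as attributes under `torch.backends`, per the bullets above):

```python
import torch

# Set a backend-wide default; child nodes still at "none" inherit it.
torch.backends.mkldnn.fp32_precision = "bf16"
assert torch.backends.mkldnn.matmul.fp32_precision == "bf16"

# A child pinned to an explicit value is not overridden by its parent.
torch.backends.mkldnn.conv.fp32_precision = "ieee"
assert torch.backends.mkldnn.conv.fp32_precision == "ieee"
```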

### Backward Compatibility
Since the new API allows more fine-grained control, there can be conflicts. For example, the previous `torch.backends.cudnn.allow_tf32` is not expressive enough to represent the combined status `torch.backends.cudnn.rnn.fp32_precision="ieee"` and `torch.backends.cudnn.conv.fp32_precision="tf32"`. Therefore, our goals for backward compatibility are:
 - If the user only uses the previous APIs, they work as before.
 - If the user uses the **new** API to change the status to one that is **un-representable** by the old API and then reads the status through the **old** API, we raise a RuntimeError and point the user to the documentation.

### Test Plan
```
python test/test_cuda.py -k test_fp32_precision_with_tf32
python test/test_cuda.py -k test_fp32_precision_with_float32_matmul_precision
python test/test_cuda.py -k test_invalid_status_for_legacy_api
python test/test_mkldnn.py -k test_mlkdnn_get_set
python test/test_mkldnn.py -k test_generic_precision
python test/test_mkldnn.py -k test_invalid
python test/test_mkldnn.py -k test_default_use_parent
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125888
Approved by: https://github.com/jgong5, https://github.com/albanD

Co-authored-by: Jiang, Yanbing <yanbing.jiang@intel.com>
2025-06-26 10:32:20 +00:00
Yuanyuan Chen
07bb097698 Fix clang-tidy bugprone* warnings (#148529)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148529
Approved by: https://github.com/ezyang
2025-06-23 23:09:56 +00:00
Xuehai Pan
5b210bb3a6 [BE][9/16] fix typos in torch/ (torch/csrc/) (#156319)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156319
Approved by: https://github.com/albanD
ghstack dependencies: #156313, #156314, #156315, #156316, #156317
2025-06-23 02:57:50 +00:00
PyTorch MergeBot
1d3bca40ed Revert "[BE][9/16] fix typos in torch/ (torch/csrc/) (#156319)"
This reverts commit a23ccaa847.

Reverted https://github.com/pytorch/pytorch/pull/156319 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912) [HUD commit link](c95f7fa874) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213))
2025-06-22 12:31:56 +00:00
Xuehai Pan
a23ccaa847 [BE][9/16] fix typos in torch/ (torch/csrc/) (#156319)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156319
Approved by: https://github.com/albanD
ghstack dependencies: #156313, #156314, #156315, #156316, #156317
2025-06-22 08:43:49 +00:00
karthickai
10c3e6ec43 [inductor][dynamo] Include operator name in size/stride/alignment assertion (#152353)
Fixes #151930

This PR updates the `assert_size_stride` and `assert_alignment` functions in [guards.cpp](https://github.com/pytorch/pytorch/blob/main/torch/csrc/dynamo/guards.cpp) to accept an optional `op_name` argument and include it in the error messages (see the sketch below).

The corresponding type stubs in [guards.pyi](https://github.com/pytorch/pytorch/blob/main/torch/_C/_dynamo/guards.pyi) are updated to match the new function arg.

In [inductor/ir.py](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/ir.py), the operator name is extracted from the FX graph and passed into the `codegen_size_asserts` and `codegen_alignment_asserts` functions, so that generated assertions in Triton code include the op name for better debugging.

Added unit tests inside [test_torchinductor.py](https://github.com/pytorch/pytorch/blob/main/test/inductor/test_torchinductor.py).
- Verified both successful and failing assertion cases include the operator name.
- Verified that generated Triton code contains the op name inside the asserts.
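
A hedged sketch of the extended assertion: `assert_size_stride` is the private helper already used by inductor-generated code, and the trailing op-name argument follows this PR's description rather than a verified signature:

```python
import torch
from torch._C._dynamo.guards import assert_size_stride

buf = torch.empty(4, 8)

# Passes silently: sizes and strides match.
assert_size_stride(buf, (4, 8), (8, 1), "aten.mm.default")

# On a mismatch, the error message now names the operator, so a failing
# size/stride assert in generated Triton code can be traced back to its op.
```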

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152353
Approved by: https://github.com/jansel, https://github.com/shunting314
2025-06-03 19:21:15 +00:00
Animesh Jain
635b73e697 [dynamo][guards] Flush cache to more accurately measure guard overhead (#154764)
We observed that guard overhead at runtime, measured via profiler traces, was
higher than what this profiling function reported at compile time. After
investigation, we found that f_locals were already in the cache, which made
the guard overhead appear much smaller when profiling during compilation. To
be more realistic, we flush the cache here.

Profiling the guard overhead during compilation (in addition to at
runtime) allows faster iteration time, and logging in tlparse and
internal databases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154764
Approved by: https://github.com/zou3519, https://github.com/jansel, https://github.com/StrongerXi
2025-06-03 11:50:57 +00:00
PyTorch MergeBot
b86aaaae0b Revert "[dynamo][guards] Flush cache to more accurately measure guard overhead (#154764)"
This reverts commit 7dee899130.

Reverted https://github.com/pytorch/pytorch/pull/154764 on behalf of https://github.com/seemethere due to This fails internal tests see [fburl.com/diff/67gyp7gp](https://fburl.com/diff/67gyp7gp) ([comment](https://github.com/pytorch/pytorch/pull/154769#issuecomment-2933629894))
2025-06-03 06:13:49 +00:00
Animesh Jain
7dee899130 [dynamo][guards] Flush cache to more accurately measure guard overhead (#154764)
We observed that guard overhead at runtime, measured via profiler traces, was
higher than what this profiling function reported at compile time. After
investigation, we found that f_locals were already in the cache, which made
the guard overhead appear much smaller when profiling during compilation. To
be more realistic, we flush the cache here.

Profiling the guard overhead during compilation (in addition to at
runtime) allows faster iteration time, and logging in tlparse and
internal databases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154764
Approved by: https://github.com/zou3519, https://github.com/jansel, https://github.com/StrongerXi
ghstack dependencies: #154769
2025-06-02 23:01:58 +00:00