pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	03b7ec9237	Revert "create a new torch.cuda.memory_usage_in_bytes api (#140719 )" This reverts commit `9febc47637`. Reverted https://github.com/pytorch/pytorch/pull/140719 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but the test is flaky on ROCm ([comment](https://github.com/pytorch/pytorch/pull/140719#issuecomment-2479832082))	2024-11-15 20:05:32 +00:00
Yu Guo	9febc47637	create a new torch.cuda.memory_usage_in_bytes api (#140719 ) Summary: the current torch.cuda.memory_usage returns the memory utilization, more specifically, percent of time over the past sample period global memory being read/written for Nvidia. see more details in https://github.com/pytorch/pytorch/issues/140638 Test Plan: added a new unittest Differential Revision: D65928031 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140719 Approved by: https://github.com/xw285cornell, https://github.com/hongxiayang	2024-11-15 05:59:40 +00:00
Masaki Kozuki	71d8bb7ede	implement `torch._foreach_rsqrt` (#134574 ) Related: - #133367 c Pull Request resolved: https://github.com/pytorch/pytorch/pull/134574 Approved by: https://github.com/eqy, https://github.com/janeyx99	2024-11-12 15:34:35 +00:00
Jiang, Yanbing	f77eb07662	Split int4wo weight packing (#139611 ) Fixes https://github.com/pytorch/ao/issues/1117. This PR is to separate int4wo weight packing between CPU and other devices, to help implement `INT4CPULayout` in torchao based on https://github.com/pytorch/ao/issues/1117#issuecomment-2451252756. Now, for CPU, the input `weight` of `_convert_weight_to_int4pack_for_cpu` is [n, k] int32, output is [n, k / 2] uint8. The input packed weight of `_weight_int4pack_mm_for_cpu` is [n, k / 2] uint8. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139611 Approved by: https://github.com/jerryzh168	2024-11-12 10:12:50 +00:00
Haifeng Jin	2af5172774	fix dynamo tracking numpy 2 ops (#138686 ) Fixes #136559 As we upgrade to NumPy 2, torch falsely filtered out `numpy.random` as unsupported in dynamo tracking. This PR changes the filtering rules to include them while keeping behavior with numpy 1 unchanged. Before this PR, the following tests failed: ``` PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/dynamo/test_functions.py -k FunctionTests.test_numpy_random PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/dynamo/test_unspec.py -k UnspecTests.test_to_tensor PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/test_fake_tensor.py -k FakeTensorTest.test_export_numpy PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/test_fake_tensor.py -k PropagateRealTensorsFakeTensorTest.test_export_numpy_propagate_real_tensors ``` With this PR, the supported/unsupported ops in NumPy 1 are not changed. For NumPy 2, only the `numpy.random` ops that are already supported with NumPy 1 are added to the supported list. I used the following scripts to check the differences before and after the change for both NumPy 1 & 2. The output is empty for NumPy 1 since there is no change. The output is a list of `numpy.random` ops that are considered supported for NumPy 2. 
```py from torch._dynamo import trace_rules import numpy as np def new_numpy_function_ids(): unsupported_funcs = {"seed", "ranf", "get_bit_generator", "RandomState", "set_bit_generator", "sample"} def is_supported(k, v, mod): if not callable(v): return False if not getattr(v, "__module__", None): return True if v.__module__ == mod.__name__: return True if v.__module__ == "numpy.random.mtrand" and mod.__name__== "numpy.random" and k not in unsupported_funcs: return True return False rv = {} for mod in trace_rules.NP_SUPPORTED_MODULES: for k, v in mod.__dict__.items(): if is_supported(k, v, mod): rv[id(v)] = f"{mod.__name__}.{k}" return rv def old_numpy_function_ids(): rv = {} for mod in trace_rules.NP_SUPPORTED_MODULES: rv.update( { id(v): f"{mod.__name__}.{k}" for k, v in mod.__dict__.items() if callable(v) and (getattr(v, "__module__", None) or mod.__name__) == mod.__name__ } ) return rv rv1 = set(old_numpy_function_ids().values()) rv2 = set(new_numpy_function_ids().values()) for v in (rv1 - rv2): print(v) print("****") for v in (rv2 - rv1): print(v) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/138686 Approved by: https://github.com/williamwen42	2024-11-08 23:38:53 +00:00
Thomas Bohnstingl	d1c26b0781	Improvements for associative_scan - slicing of xs (#138858 ) In this PR, the combine_fn is consistently called with a slice along the scan dim. It implements part of https://github.com/pytorch/pytorch/pull/136966 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138858 Approved by: https://github.com/ydwu4	2024-11-05 23:38:21 +00:00
CaoE	9e14d86573	[Inductor][CPP] Add oneDNN BRGEMM config for Half cpp gemm template (#136255 ) `kernel_micro_gemm` generated using BRGEMM: ``` template <bool accum> inline void kernel_micro_gemm( const half* __restrict__ A, const half* __restrict__ B, float* __restrict__ C, int64_t M, int64_t N, int64_t K, int64_t lda, int64_t ldb, int64_t ldc ) { at::native::cpublas::brgemm( M, N, K, lda, ldb, ldc, 1.f, accum ? 1.f : 0.f, A, B, C); } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/136255 Approved by: https://github.com/jgong5, https://github.com/jansel	2024-11-05 05:33:29 +00:00
Yuanhao Ji	b46e1fc141	[Dynamo] Fix graph break when `tensor.split()` is called within a device context manager (#139270 ) Fixes: #139183 Note: this case can also be reproduced on cpu Pull Request resolved: https://github.com/pytorch/pytorch/pull/139270 Approved by: https://github.com/ezyang Co-authored-by: Vincent Moens <vincentmoens@gmail.com>	2024-11-02 23:55:51 +00:00
PyTorch MergeBot	b617d4813c	Revert "fix dynamo tracking numpy 2 ops (#138686 )" This reverts commit `124eac255e`. Reverted https://github.com/pytorch/pytorch/pull/138686 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but I am seeing inductor failure with hf_BigBird number of graph breaks after it lands ([comment](https://github.com/pytorch/pytorch/pull/138686#issuecomment-2452718164))	2024-11-01 23:34:06 +00:00
Haifeng Jin	124eac255e	fix dynamo tracking numpy 2 ops (#138686 ) Fixes #136559 As we upgrade to NumPy 2, torch falsely filtered out `numpy.random` as unsupported in dynamo tracking. This PR changes the filtering rules to include them while keeping behavior with numpy 1 unchanged. Before this PR, the following tests failed: ``` PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/dynamo/test_functions.py -k FunctionTests.test_numpy_random PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/dynamo/test_unspec.py -k UnspecTests.test_to_tensor PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/test_fake_tensor.py -k FakeTensorTest.test_export_numpy PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/test_fake_tensor.py -k PropagateRealTensorsFakeTensorTest.test_export_numpy_propagate_real_tensors ``` With this PR, the supported/unsupported ops in NumPy 1 are not changed. For NumPy 2, only the `numpy.random` ops that are already supported with NumPy 1 are added to the supported list. I used the following scripts to check the differences before and after the change for both NumPy 1 & 2. The output is empty for NumPy 1 since there is no change. The output is a list of `numpy.random` ops that are considered supported for NumPy 2. 
```py from torch._dynamo import trace_rules import numpy as np def new_numpy_function_ids(): unsupported_funcs = {"seed", "ranf", "get_bit_generator", "RandomState", "set_bit_generator", "sample"} def is_supported(k, v, mod): if not callable(v): return False if not getattr(v, "__module__", None): return True if v.__module__ == mod.__name__: return True if v.__module__ == "numpy.random.mtrand" and mod.__name__== "numpy.random" and k not in unsupported_funcs: return True return False rv = {} for mod in trace_rules.NP_SUPPORTED_MODULES: for k, v in mod.__dict__.items(): if is_supported(k, v, mod): rv[id(v)] = f"{mod.__name__}.{k}" return rv def old_numpy_function_ids(): rv = {} for mod in trace_rules.NP_SUPPORTED_MODULES: rv.update( { id(v): f"{mod.__name__}.{k}" for k, v in mod.__dict__.items() if callable(v) and (getattr(v, "__module__", None) or mod.__name__) == mod.__name__ } ) return rv rv1 = set(old_numpy_function_ids().values()) rv2 = set(new_numpy_function_ids().values()) for v in (rv1 - rv2): print(v) print("****") for v in (rv2 - rv1): print(v) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/138686 Approved by: https://github.com/lezcano, https://github.com/williamwen42	2024-11-01 19:51:40 +00:00
Ma Jian	ded83d2b16	support torch._utils._flatten_dense_tensors/_unflatten_dense_tensors … (#139023 ) Fixes #138897 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139023 Approved by: https://github.com/ezyang	2024-10-28 21:59:07 +00:00
Edward Z. Yang	192385e261	Add sym_sum to TorchInGraphFunctionVariable (#138848 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/138848 Approved by: https://github.com/Skylion007	2024-10-27 20:04:35 +00:00
Animesh Jain	4dd4d38ca9	[hierarchical-compilation][hop] Introduce invoke_subgraph (#137538 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137538 Approved by: https://github.com/zou3519	2024-10-22 15:33:34 +00:00
Animesh Jain	0a2407b93c	[dynamo] Support omegaconf DictConfig (#138378 ) Fixes https://github.com/pytorch/pytorch/issues/138224 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138378 Approved by: https://github.com/jansel ghstack dependencies: #138359	2024-10-20 02:43:17 +00:00
Xuehai Pan	1d6932937e	[dynamo] fix `NamedTupleVariable` for PyStructSequence (`torch.return_types.`) support (#137776 ) PyStructSequence is the C API equivalent for `collections.namedtuple` in Python. But they have different constructors: ```python tuple = NamedTupleType(*args) tuple = NamedTupleType._make(args) tuple = StructSequenceType(args) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/137776 Approved by: https://github.com/jansel	2024-10-13 06:46:41 +00:00
Aditya Tewari	575f260229	Extend vectorization with SVE(ARM) with Torch Compile (Inductor) (#134672 ) Motivation Enable SVE vectorization with `torch.compile` Extends PR: #119571 * This PR enables vectorization for codegen part using SVE-256 (vec length) * The changes can be extended to other SVE vec lengths I've done some comparisons against existing NEON implementation with SVE vectorization enabled route for `torch.compile` Test results are for 8 cores on ARM Neoverse_V1 (screenshot of results: https://github.com/user-attachments/assets/6961fbea-8285-4ca3-b92e-934a2db50ee2) It's worth mentioning, for standalone `SiLU op` there's a `~1.8x` speedup with `torch.compile` Pull Request resolved: https://github.com/pytorch/pytorch/pull/134672 Approved by: https://github.com/jgong5, https://github.com/malfet	2024-10-10 13:20:40 +00:00
Michael Lazos	d5785d4295	[Dynamo] Handle torch function subclass/mode dispatch on generic tensor methods (#137119 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137119 Approved by: https://github.com/williamwen42, https://github.com/anijain2305 ghstack dependencies: #137114, #137115, #137116, #137117, #137120, #137227	2024-10-09 02:29:40 +00:00
Michael Lazos	e41dffbedd	[Dynamo] Trace enter/exit of TorchFunctionModes (#135422 ) (#137114 ) This PR implements tracing of with contexts with TorchFunction modes which have the default enter/exit behavior (ie pushing/popping the mode) Typically the bytecode for a context manager looks like this during a graph break: 1. graph call 2. enter context 3. unsupported code 4. exit context 5. resume call resume fn structure: 1. enter context 2. jump ... 3. exit context The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack). So for torch function modes the structure of our output code is this: 1. graph call 2. mutate tf mode stack to replay mutations 4. unsupported code 5. on exception restore stack 6. resume function Then our resume fn looks like this: 1. no-op enter torch function mode 2. jump 3. exit tf mode To implement the no-op enter of the torch function mode I added torch function mode in polyfill which no-op enters, but normally exits. This is needed because we still want to trace the with context in the resume function, and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context). Separately from the bytecode, dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally at a graph break, we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly. 
However once again, in the torch function mode case, in the event of a graph break we do not want to perform any exit side effects because we want to preserve the state of the mode stack as is so that we will properly update the stack with bytecode mentioned in the first section. If we exited here, dynamo would pop the mode off of the symbolic stack, and not update the true python torch function mode stack with the suffix bytecode. All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly. Approved by: https://github.com/williamwen42 ghstack dependencies: #134732, #133137, #135443, #135444 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137114 Approved by: https://github.com/yanboliang	2024-10-09 02:29:40 +00:00
PyTorch MergeBot	d34b617bb9	Revert "[Dynamo] Trace enter/exit of TorchFunctionModes (#135422 ) (#137114 )" This reverts commit `51bc839b94`. Reverted https://github.com/pytorch/pytorch/pull/137114 on behalf of https://github.com/huydhn due to The top of the stack has been reverted but it leaves trunk in a broken state, so I try to revert the rest of the stack ([comment](https://github.com/pytorch/pytorch/pull/137114#issuecomment-2400765603))	2024-10-08 20:33:17 +00:00
PyTorch MergeBot	c88c0e6c65	Revert "[Dynamo] Handle torch function subclass/mode dispatch on generic tensor methods (#137119 )" This reverts commit `d255b34c0a`. Reverted https://github.com/pytorch/pytorch/pull/137119 on behalf of https://github.com/malfet due to Need to revert to be able to revert https://github.com/pytorch/pytorch/pull/136910 ([comment](https://github.com/pytorch/pytorch/pull/137119#issuecomment-2400401262))	2024-10-08 17:09:26 +00:00
Michael Lazos	d255b34c0a	[Dynamo] Handle torch function subclass/mode dispatch on generic tensor methods (#137119 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137119 Approved by: https://github.com/williamwen42 ghstack dependencies: #137114, #137115, #137116, #137117, #137120, #137227	2024-10-07 18:55:26 +00:00
Michael Lazos	51bc839b94	[Dynamo] Trace enter/exit of TorchFunctionModes (#135422 ) (#137114 ) This PR implements tracing of with contexts with TorchFunction modes which have the default enter/exit behavior (ie pushing/popping the mode) Typically the bytecode for a context manager looks like this during a graph break: 1. graph call 2. enter context 3. unsupported code 4. exit context 5. resume call resume fn structure: 1. enter context 2. jump ... 3. exit context The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack). So for torch function modes the structure of our output code is this: 1. graph call 2. mutate tf mode stack to replay mutations 4. unsupported code 5. on exception restore stack 6. resume function Then our resume fn looks like this: 1. no-op enter torch function mode 2. jump 3. exit tf mode To implement the no-op enter of the torch function mode I added torch function mode in polyfill which no-op enters, but normally exits. This is needed because we still want to trace the with context in the resume function, and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context). Separately from the bytecode, dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally at a graph break, we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly. 
However once again, in the torch function mode case, in the event of a graph break we do not want to perform any exit side effects because we want to preserve the state of the mode stack as is so that we will properly update the stack with bytecode mentioned in the first section. If we exited here, dynamo would pop the mode off of the symbolic stack, and not update the true python torch function mode stack with the suffix bytecode. All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly. Approved by: https://github.com/williamwen42 ghstack dependencies: #134732, #133137, #135443, #135444 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137114 Approved by: https://github.com/yanboliang	2024-10-07 18:55:26 +00:00
Jeff Daily	c7b0d4b148	raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING (#131114 ) raw_alloc is used by cudnn, miopen, thrust, and tunableop. Without this PR, the env var for disabling the caching allocator will only partially work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131114 Approved by: https://github.com/eqy, https://github.com/houseroad, https://github.com/albanD Co-authored-by: Nichols A. Romero <nick.romero@amd.com>	2024-10-04 15:36:29 +00:00
PyTorch MergeBot	0d1701f310	Revert "raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING (#131114 )" This reverts commit `7001907480`. Reverted https://github.com/pytorch/pytorch/pull/131114 on behalf of https://github.com/PaliC due to failing internal builds ([comment](https://github.com/pytorch/pytorch/pull/131114#issuecomment-2390615007))	2024-10-03 06:22:55 +00:00
Jeff Daily	7001907480	raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING (#131114 ) raw_alloc is used by cudnn, miopen, thrust, and tunableop. Without this PR, the env var for disabling the caching allocator will only partially work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131114 Approved by: https://github.com/eqy, https://github.com/houseroad, https://github.com/albanD Co-authored-by: Nichols A. Romero <nick.romero@amd.com>	2024-10-02 16:27:15 +00:00
Animesh Jain	289df45cee	Revert "[Dynamo] Trace enter/exit of TorchFunctionModes (#135422 )" (#136590 ) This reverts commit `7743149b2b`. Reverts * https://github.com/pytorch/pytorch/pull/135503 * https://github.com/pytorch/pytorch/pull/135502 * https://github.com/pytorch/pytorch/pull/135422 This passes this test. Earlier, the getitem would stay like a getitem in the Fx graph. But now the fake tensor propagations fails saying that .item is called. It seems that torch function is not getting triggered while fake tensor propagation. ``` import torch from torch.nn.attention.flex_attention import BlockMask, _mask_mod_signature, _score_mod_signature, flex_attention from torch._inductor.lowering import make_pointwise, register_lowering from torch._inductor.virtualized import ops from torch.nn.attention.flex_attention import create_block_mask torch.set_default_device('cuda') flex_attention = torch.compile(flex_attention, dynamic=False) prefix_lengths = torch.arange(8) def prefix_lm(b, h, q, kv): return prefix_lengths[b] >= kv mask = create_block_mask(prefix_lm, 8, None, 512, 512, _compile=True) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/136590 Approved by: https://github.com/Chillee	2024-09-25 21:10:43 +00:00
Jianyu Huang	0a35986cdb	Add option to configure reduced precision math backend for SDPA (#135964 ) Summary: Address https://github.com/pytorch/pytorch/issues/135778 by adding a global flag to configure whether using high precision or low precision for math backend of SDPA. Test Plan: buck2 run mode/opt //scripts/feikou/llm:run_attn_kernels Differential Revision: D62625515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135964 Approved by: https://github.com/jbschlosser	2024-09-24 07:11:38 +00:00
Xuehai Pan	9961aaa601	[dynamo] simplify implementation for `functools.reduce` (#133778 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133778 Approved by: https://github.com/jansel, https://github.com/anijain2305	2024-09-16 04:53:06 +00:00
Michael Lazos	1b9daeb240	[Dynamo] Trace enter/exit of TorchFunctionModes (#135422 ) This PR implements tracing of with contexts with TorchFunction modes which have the default enter/exit behavior (ie pushing/popping the mode) Typically the bytecode for a context manager looks like this during a graph break: 1. graph call 2. enter context 3. unsupported code 4. exit context 5. resume call resume fn structure: 1. enter context 2. jump ... 3. exit context The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack). So for torch function modes the structure of our output code is this: 1. graph call 2. mutate tf mode stack to replay mutations 4. unsupported code 5. on exception restore stack 6. resume function Then our resume fn looks like this: 1. no-op enter torch function mode 2. jump 3. exit tf mode To implement the no-op enter of the torch function mode I added torch function mode in polyfill which no-op enters, but normally exits. This is needed because we still want to trace the with context in the resume function, and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context). Separately from the bytecode, dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally at a graph break, we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly. 
However once again, in the torch function mode case, in the event of a graph break we do not want to perform any exit side effects because we want to preserve the state of the mode stack as is so that we will properly update the stack with bytecode mentioned in the first section. If we exited here, dynamo would pop the mode off of the symbolic stack, and not update the true python torch function mode stack with the suffix bytecode. All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135422 Approved by: https://github.com/williamwen42 ghstack dependencies: #134732, #133137, #135443, #135444	2024-09-14 18:52:22 +00:00
Michael Lazos	5c5c33ac32	[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 ) This PR adds initial tracing for torch function modes. Details: In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call. This does not yet support tracing enter/exit of a torch function mode/ tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do other torch.* context managers. Previously landed: https://github.com/pytorch/pytorch/pull/133135 https://github.com/pytorch/pytorch/pull/133136 https://github.com/pytorch/pytorch/pull/133134 https://github.com/pytorch/pytorch/pull/133133 https://github.com/pytorch/pytorch/pull/133132 https://github.com/pytorch/pytorch/pull/133131 https://github.com/pytorch/pytorch/pull/133729 https://github.com/pytorch/pytorch/pull/133130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #134732	2024-09-14 18:52:22 +00:00
PyTorch MergeBot	8c8a3086a7	Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 )" This reverts commit `4528777e03`. Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/mlazos due to broke python test/quantization/pt2e/test_numeric_debugger.py TestNumericDebugger.test_re_export_preserve_handle modified yesterday ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2350937008))	2024-09-14 10:02:55 +00:00
PyTorch MergeBot	f3180f0088	Revert "[Dynamo] Trace enter/exit of TorchFunctionModes (#135422 )" This reverts commit `7743149b2b`. Reverted https://github.com/pytorch/pytorch/pull/135422 on behalf of https://github.com/mlazos due to broke python test/quantization/pt2e/test_numeric_debugger.py TestNumericDebugger.test_re_export_preserve_handle modified yesterday ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2350937008))	2024-09-14 10:02:55 +00:00
Michael Lazos	7743149b2b	[Dynamo] Trace enter/exit of TorchFunctionModes (#135422 ) This PR implements tracing of with contexts with TorchFunction modes which have the default enter/exit behavior (ie pushing/popping the mode) Typically the bytecode for a context manager looks like this during a graph break: 1. graph call 2. enter context 3. unsupported code 4. exit context 5. resume call resume fn structure: 1. enter context 2. jump ... 3. exit context The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack). So for torch function modes the structure of our output code is this: 1. graph call 2. mutate tf mode stack to replay mutations 4. unsupported code 5. on exception restore stack 6. resume function Then our resume fn looks like this: 1. no-op enter torch function mode 2. jump 3. exit tf mode To implement the no-op enter of the torch function mode I added torch function mode in polyfill which no-op enters, but normally exits. This is needed because we still want to trace the with context in the resume function, and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context). Separately from the bytecode, dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally at a graph break, we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly. 
However once again, in the torch function mode case, in the event of a graph break we do not want to perform any exit side effects because we want to preserve the state of the mode stack as is so that we will properly update the stack with bytecode mentioned in the first section. If we exited here, dynamo would pop the mode off of the symbolic stack, and not update the true python torch function mode stack with the suffix bytecode. All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135422 Approved by: https://github.com/williamwen42 ghstack dependencies: #134732, #133137, #135443, #135444	2024-09-14 02:41:08 +00:00
Michael Lazos	4528777e03	[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 ) This PR adds initial tracing for torch function modes. Details: In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call. This does not yet support tracing enter/exit of a torch function mode/ tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do other torch.* context managers. Previously landed: https://github.com/pytorch/pytorch/pull/133135 https://github.com/pytorch/pytorch/pull/133136 https://github.com/pytorch/pytorch/pull/133134 https://github.com/pytorch/pytorch/pull/133133 https://github.com/pytorch/pytorch/pull/133132 https://github.com/pytorch/pytorch/pull/133131 https://github.com/pytorch/pytorch/pull/133729 https://github.com/pytorch/pytorch/pull/133130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #134732	2024-09-14 02:40:43 +00:00
PyTorch MergeBot	eb7dd91dd1	Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 )" This reverts commit `fafdd588f2`. Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/albanD due to Broke tests on main ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2348886378))	2024-09-13 12:52:58 +00:00
PyTorch MergeBot	ac169795a9	Revert "[Dynamo] Trace enter/exit of TorchFunctionModes (#135422 )" This reverts commit `2af3b8ffd8`. Reverted https://github.com/pytorch/pytorch/pull/135422 on behalf of https://github.com/albanD due to Broke tests on main ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2348886378))	2024-09-13 12:52:57 +00:00
Michael Lazos	2af3b8ffd8	[Dynamo] Trace enter/exit of TorchFunctionModes (#135422 ) This PR implements tracing of with contexts with TorchFunction modes which have the default enter/exit behavior (ie pushing/popping the mode) Typically the bytecode for a context manager looks like this during a graph break: 1. graph call 2. enter context 3. unsupported code 4. exit context 5. resume call resume fn structure: 1. enter context 2. jump ... 3. exit context The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack). So for torch function modes the structure of our output code is this: 1. graph call 2. mutate tf mode stack to replay mutations 4. unsupported code 5. on exception restore stack 6. resume function Then our resume fn looks like this: 1. no-op enter torch function mode 2. jump 3. exit tf mode To implement the no-op enter of the torch function mode I added torch function mode in polyfill which no-op enters, but normally exits. This is needed because we still want to trace the with context in the resume function, and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context). Separately from the bytecode, dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally at a graph break, we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly. 
However, once again, in the torch function mode case, in the event of a graph break we do not want to perform any exit side effects, because we want to preserve the state of the mode stack as is so that we will properly update the stack with the bytecode mentioned in the first section. If we exited here, dynamo would pop the mode off of the symbolic stack, and not update the true python torch function mode stack with the suffix bytecode. All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135422 Approved by: https://github.com/williamwen42 ghstack dependencies: #134732, #133137, #135443, #135444	2024-09-13 08:41:24 +00:00
Michael Lazos	fafdd588f2	[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 ) This PR adds initial tracing for torch function modes. Details: In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call. This does not yet support tracing enter/exit of a torch function mode/ tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do other torch.* context managers. Previously landed: https://github.com/pytorch/pytorch/pull/133135 https://github.com/pytorch/pytorch/pull/133136 https://github.com/pytorch/pytorch/pull/133134 https://github.com/pytorch/pytorch/pull/133133 https://github.com/pytorch/pytorch/pull/133132 https://github.com/pytorch/pytorch/pull/133131 https://github.com/pytorch/pytorch/pull/133729 https://github.com/pytorch/pytorch/pull/133130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #134732	2024-09-13 08:41:00 +00:00
Joel Schlosser	525bec804c	NJT <-> padded dense conversions (#125947 ) This PR: * Implements the pre-existing `nt.to_padded_tensor(padding_val)` ATen op via the FBGEMM kernel + appropriate view gymnastics (since that kernel only handles 2D values) * Introduces a new `_nested_from_padded_tensor` op for the reverse conversion, implemented via the reverse FBGEMM kernel + view gymnastics * Note: there is currently no public API for this; design booted to a future PR TODO: * ~~Propagate min / max sequence length via the new factory function `_nested_from_padded_tensor`~~ * ~~Verify that Inductor does computation fusion via test logic~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/125947 Approved by: https://github.com/soulitzer	2024-09-12 17:54:25 +00:00
PyTorch MergeBot	183c32fd3b	Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 )" This reverts commit `0d15122092`. Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/clee2000 due to something in this stack broke functorch/test_control_flow.py::TestControlFlow::test_scan_simple_graph [GH job link](https://github.com/pytorch/pytorch/actions/runs/10804912306/job/29980571390) [HUD commit link](`444b52ff40`), newly added test yesterday ([comment](https://github.com/pytorch/pytorch/pull/133137#issuecomment-2344054339))	2024-09-11 15:57:00 +00:00
Michael Lazos	0d15122092	[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 ) This PR adds initial tracing for torch function modes. Details: In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call. This does not yet support tracing enter/exit of a torch function mode/ tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do other torch.* context managers. Previously landed: https://github.com/pytorch/pytorch/pull/133135 https://github.com/pytorch/pytorch/pull/133136 https://github.com/pytorch/pytorch/pull/133134 https://github.com/pytorch/pytorch/pull/133133 https://github.com/pytorch/pytorch/pull/133132 https://github.com/pytorch/pytorch/pull/133131 https://github.com/pytorch/pytorch/pull/133729 https://github.com/pytorch/pytorch/pull/133130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #134732	2024-09-11 04:18:22 +00:00
Thomas Bohnstingl	e889252493	Implementation of scan (#134102 ) This operation is intended as the counterpart to `associative_scan`, but it can also operate with non-associative functions. @ydwu4 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134102 Approved by: https://github.com/ydwu4	2024-09-10 04:51:16 +00:00
Yanbo Liang	d81731615f	[Dynamo] Adding CallFunctionNoArgsSource and CallFunctionNoArgsGuardAccessor to support torch.cuda.current_device() (#135425 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135425 Approved by: https://github.com/anijain2305	2024-09-09 22:46:00 +00:00
PyTorch MergeBot	70a65a8bd5	Revert "NJT <-> padded dense conversions (#125947 )" This reverts commit `09a5e88bef`. Reverted https://github.com/pytorch/pytorch/pull/125947 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing dynamo test `09a5e88bef`, maybe a land race ([comment](https://github.com/pytorch/pytorch/pull/125947#issuecomment-2339228570))	2024-09-09 22:01:09 +00:00
Joel Schlosser	09a5e88bef	NJT <-> padded dense conversions (#125947 ) This PR: * Implements the pre-existing `nt.to_padded_tensor(padding_val)` ATen op via the FBGEMM kernel + appropriate view gymnastics (since that kernel only handles 2D values) * Introduces a new `_nested_from_padded_tensor` op for the reverse conversion, implemented via the reverse FBGEMM kernel + view gymnastics * Note: there is currently no public API for this; design booted to a future PR TODO: * ~~Propagate min / max sequence length via the new factory function `_nested_from_padded_tensor`~~ * ~~Verify that Inductor does computation fusion via test logic~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/125947 Approved by: https://github.com/soulitzer	2024-09-09 19:37:32 +00:00
Wanchao Liang	cfc227ad43	[reland][dtensor] move DTensor to public namespace (#134203 ) reland of https://github.com/pytorch/pytorch/pull/133113 I have to create a new PR because the previously reverted PR could be neither rebased nor imported successfully :( ---- Moving DTensor to be in the public namespace, to formally add the documentation page that includes all the public APIs. This includes: * many path renames and path import fixes * a dedicated doc page without too much content yet (adding in the next PRs) * To preserve BC for users still using torch.distributed._tensor, I added a shim script to redirect old path calls to the new module. BC preservation is evidenced by the fact that all DTensor tests still work without changing the public imports, so it's safe to land the changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134203 Approved by: https://github.com/tianyu-l	2024-09-08 17:08:40 +00:00
sanchitintel	43dcb4bb61	Revise CPU vectorization ISA support API (#135075 ) Revising (mostly renaming) CPU vectorization ISA support API (non-frontend-user-facing). Also added AVX512_BF16 ISA detection API. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135075 Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/ezyang	2024-09-05 12:14:56 +00:00
rzou	2276940f8c	Make Dynamo inline through torch._library.custom_ops.autograd (#135066 ) Fixes https://github.com/pytorch/pytorch/issues/135057 The bug was: when Dynamo graph breaks in the forward and Compiled Autograd uses Dynamo to introspect the backward, we end up running into an "Unsupported: inlining through SKIPFILES" error. The solution is to mark the entirety of this module as inlineable. Test Plan: - new test Pull Request resolved: https://github.com/pytorch/pytorch/pull/135066 Approved by: https://github.com/bdhirsh, https://github.com/williamwen42, https://github.com/yanboliang	2024-09-04 21:48:28 +00:00
Xuehai Pan	eed0d76682	[dynamo][itertools] refactor `itertools.islice` to use polyfill (#133876 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133876 Approved by: https://github.com/jansel ghstack dependencies: #133864, #133894	2024-08-31 10:08:07 +00:00
Xuehai Pan	ec660c383e	[dynamo] reduce overhead for `PolyfilledFunctionVariable.call_function` (#134842 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134842 Approved by: https://github.com/jansel	2024-08-31 09:12:46 +00:00
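
Note on 2af3b8ffd8 (#135422, reverted by ac169795a9): the enter-once / exit-once sequence that commit message describes can be sketched in plain Python. This is only an illustration under stated assumptions, not Dynamo's implementation: `mode_stack`, `SketchMode`, `NoOpEnter`, `eager` and `split_at_graph_break` are hypothetical names, and a plain list stands in for the real torch function mode stack.

```python
# Hypothetical stand-in for the global torch function mode stack.
mode_stack = []


class SketchMode:
    """Hypothetical mode with default behavior: push on enter, pop on exit."""

    def __enter__(self):
        mode_stack.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        mode_stack.pop()
        return False  # never swallow exceptions


class NoOpEnter:
    """Wraps a mode that is already on the stack.

    Entering does nothing; exiting pops as usual. This mirrors the
    "no-op enters, but normally exits" polyfill idea from the message.
    """

    def __init__(self, mode):
        self.mode = mode

    def __enter__(self):
        return self.mode

    def __exit__(self, exc_type, exc, tb):
        return self.mode.__exit__(exc_type, exc, tb)


def eager(mode):
    """Reference behavior: an ordinary `with` block."""
    with mode:
        depth_inside = len(mode_stack)
    return depth_inside


def split_at_graph_break(mode):
    """Assumed shape of the output code around a graph break."""
    # Steps 1-2: after the graph call, replay the stack mutation that
    # was recorded during tracing (the only real "enter").
    mode_stack.append(mode)
    try:
        # Step 3: the unsupported code sees the mode exactly once.
        depth_inside = len(mode_stack)
    except BaseException:
        # Step 4: on exception, restore the stack.
        mode_stack.pop()
        raise
    # Step 5: resume function. It must not push again, but it still
    # has to run the exit instructions of the `with` block.
    with NoOpEnter(mode):
        assert len(mode_stack) == depth_inside
    return depth_inside
```

In this sketch both paths observe a stack depth of 1 inside the context and leave the stack empty afterwards, which is the "matches the semantics of eager" property the message claims.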

262 Commits