pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	e448f32944	Revert "[BE] typing for decorators - signal/windows/windows (#131582 )" This reverts commit `8689d377f9`. Reverted https://github.com/pytorch/pytorch/pull/131582 on behalf of https://github.com/clee2000 due to breaking lint internally D60265575 ([comment](https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359))	2024-07-28 03:29:31 +00:00
PyTorch MergeBot	d90f6b45c0	Revert "[inductor] Add type hints to functions in mkldnn_fusion.py (#131820 )" This reverts commit `fb3ddafbcf`. Reverted https://github.com/pytorch/pytorch/pull/131820 on behalf of https://github.com/clee2000 due to reverting this to revert something else, only action you should need to do is to rebase and merge again, sorry for the churn ([comment](https://github.com/pytorch/pytorch/pull/131820#issuecomment-2254327833))	2024-07-28 03:26:14 +00:00
PyTorch MergeBot	8f5cf46405	Revert "Fix public API tests (#131386 )" This reverts commit `91fcfd8760`. Reverted https://github.com/pytorch/pytorch/pull/131386 on behalf of https://github.com/clee2000 due to reverting this to revert something else, only action you should need to do is to rebase and merge again, sorry for the churn ([comment](https://github.com/pytorch/pytorch/pull/131386#issuecomment-2254327487))	2024-07-28 03:23:04 +00:00
cyy	7be0ce51b6	Fix handle serialization error (#131871 ) It is a bug to try to serialise std::string in the C API. Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/131871 Approved by: https://github.com/Skylion007	2024-07-28 00:33:20 +00:00
Aaron Orenstein	3e0ccb3a9f	Fixing fake tensor SymInt caching (#131966 ) Summary: Some tests are failing because of a weird interaction between the symbolic sizes and the `set()` - back it out for now. Differential Revision: D60320595 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131966 Approved by: https://github.com/oulgen	2024-07-27 22:43:57 +00:00
Shuo Ding	d07a125af2	[Inductor] supporting pointwise intermediate nodes in B2B-GEMM (#131685 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131685 Approved by: https://github.com/eellison	2024-07-27 20:11:20 +00:00
Xuehai Pan	14158d892a	[BE][tests] show local variables on failure in tests (#131151 ) ------ As per the title, add argument `--locals` for `unittest` and `--showlocals --tb=long` for `pytest` in CI. Some failures cannot be reproduced on the local machine but exist on cloud CI. This change allows us to investigate the test failure more easily. Example output: https://github.com/pytorch/pytorch/actions/runs/9961546996/job/27523888353?pr=130710#step:20:3361 ```text /opt/conda/envs/py_3.8/lib/python3.8/site-packages/sympy/core/function.py:307: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ cls = FloorDiv, base = -1.00000000000000, divisor = -1.00000000000000 @classmethod def eval(cls, base, divisor): # python test/test_dynamic_shapes.py -k TestDimConstraints.test_dim_constraints_solve_full # Assert triggered by inequality solver # assert base.is_integer, base # assert divisor.is_integer, divisor # We don't provide the same error message as in Python because SymPy # makes it difficult to check the types. 
if divisor.is_zero: raise ZeroDivisionError("division by zero") if base in (int_oo, -int_oo, sympy.oo, -sympy.oo) and divisor in ( int_oo, -int_oo, sympy.oo, -sympy.oo, ): return sympy.nan if base is sympy.nan or divisor is sympy.nan: return sympy.nan if base.is_zero: return sympy.S.Zero if base.is_integer and divisor == 1: return base if base.is_integer and divisor == -1: return sympy.Mul(base, -1) if ( isinstance(base, sympy.Number) and isinstance(divisor, sympy.Number) and ( base in (int_oo, -int_oo, sympy.oo, -sympy.oo) or divisor in (int_oo, -int_oo, sympy.oo, -sympy.oo) ) ): r = float(base) / float(divisor) if r == math.inf: return int_oo elif r == -math.inf: return -int_oo elif math.isnan(r): return sympy.nan else: return sympy.Integer(math.floor(r)) if isinstance(base, sympy.Integer) and isinstance(divisor, sympy.Integer): return sympy.Integer(int(base) // int(divisor)) if isinstance(base, FloorDiv): return FloorDiv(base.args[0], base.args[1] * divisor) # Expands (x + y) // b into x // b + y // b. # This only works if floor is an identity, i.e. x / b is an integer. for term in sympy.Add.make_args(base): quotient = term / divisor if quotient.is_integer and isinstance(divisor, sympy.Integer): # NB: this is correct even if the divisor is not an integer, but it # creates rational expressions that cause problems with dynamic # shapes. 
return FloorDiv(base - term, divisor) + quotient try: gcd = sympy.gcd(base, divisor) if gcd != 1: > return FloorDiv( sympy.simplify(base / gcd), sympy.simplify(divisor / gcd) ) base = -1.00000000000000 cls = FloorDiv divisor = -1.00000000000000 gcd = 1.00000000000000 quotient = 1.00000000000000 term = -1.00000000000000 /opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/utils/_sympy/functions.py:159: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ args = (FloorDiv, -1.00000000000000, -1.00000000000000), kwargs = {} @wraps(func) def wrapper(*args, **kwargs): try: > retval = cfunc(*args, **kwargs) E RecursionError: maximum recursion depth exceeded in comparison E E To execute this test, run the following from the base repo dir: E python test/test_sympy_utils.py -k TestValueRanges.test_binary_ref_fn_floordiv_dtype_float E E This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 args = (FloorDiv, -1.00000000000000, -1.00000000000000) cfunc = <functools._lru_cache_wrapper object at 0x7fc5303173a0> func = <function Function.__new__ at 0x7fc530317280> kwargs = {} ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/131151 Approved by: https://github.com/ezyang	2024-07-27 19:39:40 +00:00
albanD	466ea8ce54	Add fallback() to torch.library (#131707 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131707 Approved by: https://github.com/zou3519	2024-07-27 18:02:35 +00:00
cyy	8e5a367311	[5/N] Fix clang-tidy warnings in jit (#131969 ) Follows #131903 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131969 Approved by: https://github.com/ezyang	2024-07-27 17:54:20 +00:00
Xuehai Pan	918ece4f4d	[BE][Easy][11/19] enforce style for empty lines in import segments in `test/dy*/` (#129762 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129762 Approved by: https://github.com/anijain2305	2024-07-27 17:43:53 +00:00
Angela Yi	ae9f17a821	[aoti] Rename OSS DynamicArg and OpKernel (#131862 ) Summary: Fixing P1495466240 which I think is due to the fact that internal also has an "OpKernel" in the same namespace, using thrift instead of json. Test Plan: https://www.internalfb.com/intern/testinfra/testrun/4785074844896831 Differential Revision: D60273354 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131862 Approved by: https://github.com/desertfire	2024-07-27 17:34:50 +00:00
PyTorch MergeBot	8cdfdb41bc	Revert "[NestedTensor] Integrate the layer normalization operator along the jagged dimension into NestedTensor (#131519 )" This reverts commit `f862f45730`. Reverted https://github.com/pytorch/pytorch/pull/131519 on behalf of https://github.com/atalman due to broke CI: test_nestedtensor.py::TestNestedTensorSubclassCPU::test_layer_norm_with_lengths_requires_grad_False_components_require_grad_False_cpu_float32 [GH job link](https://github.com/pytorch/pytorch/actions/runs/10121747545/job/27996722731) [HUD commit link](`f862f45730`) ([comment](https://github.com/pytorch/pytorch/pull/131519#issuecomment-2254167994))	2024-07-27 14:45:47 +00:00
Nikita Shulga	07389163f0	[C10][BE] Use range loop (#131922 ) Non-functional change that iterates over entries in `getCollectiveTraceJson` and uses `C10_UNUSED` rather than the `(void)i;` trick Pull Request resolved: https://github.com/pytorch/pytorch/pull/131922 Approved by: https://github.com/XilunWu	2024-07-27 11:26:27 +00:00
cyy	f83ef69b84	Fix typo in assignment operators (#131890 ) Most typos were introduced in #131077 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131890 Approved by: https://github.com/Skylion007	2024-07-27 11:13:42 +00:00
cyy	c82441e07a	Fix std::optional checking bug (#131874 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/131874 Approved by: https://github.com/Skylion007	2024-07-27 11:08:10 +00:00
Yifu Wang	93a4671746	Add out_dtypes to fused_all_gather_scaled_matmul's args (#131831 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131831 Approved by: https://github.com/weifengpy ghstack dependencies: #131410	2024-07-27 11:07:43 +00:00
Yifu Wang	12cd040edd	[micro_pipeline_tp] exclude simple overlappable collectives as micro-pipeline TP candidates when reorder_for_compute_comm_overlap is enabled (#131410 ) When a collective can be hidden through either simple overlapping or micro-pipeline TP, we prefer simple overlapping to avoid the overhead associated with decomposition. If `reorder_for_compute_comm_overlap` is enabled, we identify collectives that can be hidden through simple overlapping and exclude them from micro-pipeline TP candidates. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131410 Approved by: https://github.com/weifengpy	2024-07-27 11:07:43 +00:00
Animesh Jain	36d24925c6	[inline_inbuilt_nn_modules][inductor-cpu] More skips for dynamic shapes when inlining enabled (#131948 ) The issue is tracked here - https://github.com/pytorch/pytorch/issues/131929 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131948 Approved by: https://github.com/eellison, https://github.com/leslie-fang-intel ghstack dependencies: #131744, #131928	2024-07-27 10:03:49 +00:00
Will Feng	aee6bcdba4	[Traceable FSDP2][Inductor] Apply compute/comm reordering passes to achieve overlap (#131614 ) This PR enables the Inductor compute/comm reordering passes to Traceable FSDP2 to achieve overlap. Note that the overlap is not maximally optimized yet and the follow-up work will be done in subsequent PRs. Test commands: - `pytest -rA test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc` - `pytest -rA test/distributed/_composable/fsdp/test_fully_shard_compile.py::TestFullyShardCompile::test_transformer_backend_inductor` - `pytest -rA test/distributed/_composable/fsdp/test_fully_shard_compile.py::TestFullyShardCompile::test_nested_fully_shard_backend_inductor` Pull Request resolved: https://github.com/pytorch/pytorch/pull/131614 Approved by: https://github.com/yifuwang ghstack dependencies: #131510	2024-07-27 08:39:58 +00:00
Will Feng	9e06572704	[Traceable FSDP2][Inductor] Create grouped nodes for FSDP2 all-gather code block and reduce-scatter code block (after Buffer/Operation split) (#131510 ) This PR creates these `GroupedSchedulerNode`s: - One for each all-gather code block (cast + copy-in + all-gather) - One for each all-gather-wait code block (all-gather-wait + copy-out) - One for each reduce-scatter code block (copy-in + reduce-scatter) - One for each reduce-scatter-wait code block (reduce-scatter-wait) This serves two goals: - Prevent outside ops from being fused into these op groups, in order to have more predictable memory usage. - Make it easier to specify the dependency e.g. from `i+1` all-gather group node to the `i` all-gather-wait group node, to enforce FSDP2 comm ordering (i.e. "serialization of comms"). The actual "reorder-for-FSDP-compute-comm-overlap" PR will come next. Test commands: - `pytest -rA test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc` - `pytest -rA test/distributed/_composable/fsdp/test_fully_shard_compile.py::TestFullyShardCompile::test_transformer_backend_inductor` - `pytest -rA test/distributed/_composable/fsdp/test_fully_shard_compile.py::TestFullyShardCompile::test_nested_fully_shard_backend_inductor` Pull Request resolved: https://github.com/pytorch/pytorch/pull/131510 Approved by: https://github.com/yifuwang	2024-07-27 08:39:58 +00:00
cyy	99e13e68e9	[4/N] Fix clang-tidy warnings in jit (#131903 ) Follows #131830 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131903 Approved by: https://github.com/Skylion007	2024-07-27 08:08:14 +00:00
Janani Sriram	f862f45730	[NestedTensor] Integrate the layer normalization operator along the jagged dimension into NestedTensor (#131519 ) Modify the existing `layer normalization` operator in PyTorch, invoked by `torch.layer_norm`, to allow for reductions along the jagged dimension of a nested tensor. The function originally had a basic implementation for reducing along 1 non-ragged dimension. This diff, which uses the `aten` padding operator, enables PyTorch users to invoke `torch.nn.functional.layer_norm` on a nested tensor when reducing along the ragged dimension, e.g. `*` in a `(B, *, M)` or `(B, *, M, N)` nested tensor. Write unit tests based on the `softmax` jagged operator to verify the accuracy of the ragged reduction implementation for `torch.nn.functional.layer_norm`. Add unit tests to verify error handling for unsupported features. Note that this implementation is limited to nested tensors with `ragged_idx == 1`, i.e. the ragged dimension is not transposed. The layer normalization operator also requires an operation on a 2-dimensional layer; for nested tensors with 4 or more dimensions, I flatten the extra dimensions, then unflatten them after performing layer normalization. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131519 Approved by: https://github.com/davidberard98 ghstack dependencies: #131518	2024-07-27 07:09:10 +00:00
Janani Sriram	bcf5c68c18	[NestedTensor] Integrate the softmax operator along the jagged dimension into NestedTensor (#131518 ) Modify the existing `softmax` operator in PyTorch, invoked by `torch.softmax`, to allow for reductions along the jagged dimension of a nested tensor. The function originally had a basic implementation for reducing along 1 non-ragged dimension. This diff, which uses the aten padding operator, enables PyTorch users to invoke `torch.softmax` on a nested tensor when reducing along the ragged dimension, e.g. `*` in a `(B, *, M)` nested tensor. Write unit tests based on the `sum` and `mean` jagged operators to verify the accuracy of the ragged reduction implementation for `torch.softmax`. Add unit tests to verify error handling for unsupported features in `NestedTensor` `torch.softmax`. Note that this implementation is limited to nested tensors with `ragged_idx == 1`, i.e. the ragged dimension is not transposed. In addition, the `softmax` operator is required to take in as input an integer for the reduction dimension `dim`, requiring new unit tests heavily inspired by the `sum` and `mean` jagged operator unit tests. `Softmax` also allows for reducing along the batch dimension. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131518 Approved by: https://github.com/davidberard98	2024-07-27 07:09:10 +00:00
Avik Chaudhuri	c49e857d32	[pt] immutable accessors in graph signature (#131940 ) Summary: splitting PT part of D60253955 Test Plan: existing tests Differential Revision: D60296909 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131940 Approved by: https://github.com/angelayi, https://github.com/zhxchen17	2024-07-27 05:32:53 +00:00
Oguz Ulgen	96c1862e0b	Remove mypy ignore from torch/_dynamo/variables/__init__.py (#131784 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131784 Approved by: https://github.com/aorenste, https://github.com/zou3519, https://github.com/Skylion007	2024-07-27 05:07:33 +00:00
drisspg	1bfe7eb7e6	Update how we do sdpa testing (#131743 ) ## Motivation This refactor aligns our testing methodology with the Flash Attention upstream repository while addressing several key issues: 1. Standardized comparison: We now compare fused kernels against float64 references, using the maximum of a calculated tolerance (based on same-precision math implementation) or standard float32 `atol`. 2. Reduced redundancy: Utilizing the same tensors for both same-precision math and fused kernel runs eliminates duplication. 3. Improved maintainability: The new approach simplifies tolerance adjustments across all affected tests. 4. Consistency: Standardizing tensor comparisons ensures a more uniform and reliable testing suite. These changes collectively simplify our testing code, improve its maintainability, and provide a more robust framework for validating our attention mechanisms. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131743 Approved by: https://github.com/jainapurva, https://github.com/jbschlosser	2024-07-27 03:58:49 +00:00
Vishwa Raj Singh	bcdba9f91d	Added hpu backend support in fsdp utils (#127757 ) In fsdp init_utils, adding support for hpu backend device on _get_device API. Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/127757 Approved by: https://github.com/wconstab, https://github.com/jgong5, https://github.com/awgu	2024-07-27 03:30:59 +00:00
Xu Han	28fd2e905d	[inductor] enhance cpp_builder lint check. (#131752 ) enhance cpp_builder `mypy` check. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131752 Approved by: https://github.com/jgong5, https://github.com/jansel	2024-07-27 02:46:27 +00:00
Xu Han	a90b8b967a	[inductor] enable windows inductor UTs (#131767 ) Changes: 1. Add a `skipIfWindows` function. 2. Fix `fresh_inductor_cache` raising an error on Windows, because loaded modules can't be deleted. 3. Disable some UTs that do not pass on Windows. 4. Enable test_torchinductor in Windows CI. I have tested that these pass on my dev machine. TODO: review and fix the skipped cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131767 Approved by: https://github.com/jgong5, https://github.com/jansel	2024-07-27 02:46:03 +00:00
Avik Chaudhuri	3768faec2f	carry cond in data-dependent error (#131932 ) Test Plan: existing Differential Revision: D60302877 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131932 Approved by: https://github.com/zhxchen17	2024-07-27 02:13:04 +00:00
Xu Han	9606d61e0c	[reland][inductor] switch AotCodeCompiler to new cpp_builder (#130127 ) Changes: 1. Switch `AotCodeCompiler` to the new cpp_builder. 2. Only use `deprecated_cpp_compile_command` for `fb_code`, because I no longer have access to the Meta internal environment to debug it. 3. Add `TODO` comments so that a Meta employee can help continue this work. 4. Given item 3, only `deprecated_cpp_compile_command` for `fb_code` remains to be fixed, so remove `validate_new_cpp_commands`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130127 Approved by: https://github.com/jgong5, https://github.com/jansel	2024-07-27 01:46:13 +00:00
Matthew Hoffman	fdf1451bfa	Add `__all__` to torch.optim to define public interface (#131959 ) There was a regression in the public interface for `torch.optim` introduced in #125452 when `torch/optim/__init__.pyi` was merged into `torch/optim/__init__.py`. [The import aliases were not preserved and so now `pyright` thinks that these classes are not publicly exported from `torch/optim/__init__.py`.](https://github.com/pytorch/pytorch/pull/125452/files#diff-941595c1e1aa06bec94578499dd3654532a5183d0bc1bcd94d1f33b47e0d0adfL1-L15) ``` error: "SGD" is not exported from module "torch.optim" ``` Adding these classes/modules to `__all__` fixes this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131959 Approved by: https://github.com/ezyang	2024-07-27 01:03:25 +00:00
Sergii Dymchenko	8458980bbf	Move benchmarks/dynamo/huggingface configuration to YAML (#131724 ) Similar to https://github.com/pytorch/pytorch/pull/120299 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131724 Approved by: https://github.com/shunting314	2024-07-27 00:55:04 +00:00
Zain Rizvi	ef8d118c67	Sync with changes to test-infra's scale-config.yml (#131955 ) This synchronizes lf-canary-scale-config and lf-scale-config with the one in test-infra. This really needs some automatic validation to prevent it from drifting out of sync over and over again (coming soon...) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131955 Approved by: https://github.com/malfet	2024-07-27 00:25:40 +00:00
Nikita Shulga	8b04edcac1	Delete unused yml files (#131298 ) To be landed at least 3 days after the previous commit Pull Request resolved: https://github.com/pytorch/pytorch/pull/131298 Approved by: https://github.com/ZainRizvi ghstack dependencies: #130762	2024-07-27 00:21:22 +00:00
Zain Rizvi	1e00f055a4	Move distributed experimental jobs back to the amazon2 for now (#131963 ) Something about the new Amazon2023 AMI is making some distributed tests fail. Moving them back to the old AMI until the issue is fixed. These particular jobs are causing this test to fail: https://github.com/pytorch/pytorch/issues/129539 More details in https://github.com/pytorch/pytorch/issues/131962 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131963 Approved by: https://github.com/clee2000	2024-07-26 23:44:56 +00:00
Joel Schlosser	91fcfd8760	Fix public API tests (#131386 ) This PR fixes a bug in `test_correct_module_names` introduced in #130497. It also addresses post-fix test failures in: * `torch/ao/quantization/__init__.py` - set the correct `__module__` for several public API helpers * `torch/library.py` - add `register_vmap` to `__all__` * `torch/nn/attention/flex_attention.py` - make `round_up_to_multiple` private by prepending an underscore * `torch/storage.py` - introduce `__all__` to avoid `Self` being re-exported as a public API * `torch/distributed/pipelining/schedules.py` - add `ZeroBubbleAlgorithm` to `__all__` Pull Request resolved: https://github.com/pytorch/pytorch/pull/131386 Approved by: https://github.com/albanD	2024-07-26 23:38:43 +00:00
Shangdi Yu	02b922900b	[aoti] Fix float16 and bfloat16 for generated GPU code (#131437 ) Fixes #131333 Summary: - Add header to define `float16` and `bfloat16` as `at::Half` and `at::BFloat16`. - change `float16` and `bfloat16` to `float` before passing to kernel. code generated before: ```cpp ..... half var_1; AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_item_float16(convert_arrayref_tensor_to_tensor(arg1_1), &var_1)); .... ``` code generated now: ```cpp typedef at::Half half; typedef at::BFloat16 bfloat16; ..... half var_1_tmp; AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_item_float16(convert_arrayref_tensor_to_tensor(arg1_1), &var_1_tmp)); float var_1 = float(var_1_tmp); .... ``` Test plan: `TORCHINDUCTOR_ABI_COMPATIBLE=1 TORCHINDUCTOR_CPP_WRAPPER=1 python test/inductor/test_torchinductor.py -k GPUTests.test_unspec_inputs_cuda` Work in progress. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131437 Approved by: https://github.com/desertfire	2024-07-26 23:36:11 +00:00
Bin Bao	0272934238	[Inductor][CPU] Fix an InvalidVecISA issue on CI (#131812 ) Summary: CPU CI nodes failed to find valid VecISA because importing torch under the default pytorch directory will fail with the following msg, so switch cwd to a tmp directory. ``` Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/var/lib/jenkins/workspace/torch/__init__.py", line 66, in <module> from torch.torch_version import __version__ as __version__ File "/var/lib/jenkins/workspace/torch/torch_version.py", line 4, in <module> from torch.version import __version__ as internal_version ModuleNotFoundError: No module named 'torch.version' ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/131812 Approved by: https://github.com/eellison, https://github.com/malfet	2024-07-26 22:31:44 +00:00
Sergii Dymchenko	5489ff8e94	Use Mermaid for the diagram in torch/ao/quantization/fx/README.md (#131412 ) preview `3a0efcdfa3/torch/ao/quantization/fx/README.md` Pull Request resolved: https://github.com/pytorch/pytorch/pull/131412 Approved by: https://github.com/jerryzh168	2024-07-26 22:01:21 +00:00
Peter Bell	16cd1aaa1d	[inductor] Improve sort kernel perf (#131719 ) Closes #129507 This makes two changes to the sort kernel: 1. Use int16 for the indices since we only operate on small dims anyway 2. Instead of passing an explicit mask, we pass the rnumel and imply the mask from that which saves an additional reduction in the sort kernel's inner loop. In my benchmarks, this gives enough of a perf improvement to bump up the max rblock to 512. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131719 Approved by: https://github.com/eellison	2024-07-26 21:56:47 +00:00
Luca Wehrstedt	b90bc66766	Enable FlashAttention on Windows (#131906 ) Let's just give this a try. Reland of https://github.com/pytorch/pytorch/pull/131875. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131906 Approved by: https://github.com/drisspg	2024-07-26 21:41:56 +00:00
rzou	d73b55d64b	Support meta tensors as inputs to the triton_kernel_wrapper HOPs (#131896 ) We automatically generate FakeTensor support for them (the FakeTensor kernel for a triton kernel is "return None"). The same thing should apply to the meta kernel. Tests: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/131896 Approved by: https://github.com/oulgen	2024-07-26 21:41:03 +00:00
Animesh Jain	fb98cd33f1	[inline_inbuilt_nn_modules][inductor-cpu] Skip test_quantized_linear_amx (#131928 ) The issue is tracked here - https://github.com/pytorch/pytorch/issues/131929 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131928 Approved by: https://github.com/eellison ghstack dependencies: #131744	2024-07-26 21:28:17 +00:00
Shunting Zhang	c8626a4e1f	[BE] add a list of inductor test files to skip resetting dynamo (#131551 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131551 Approved by: https://github.com/zou3519	2024-07-26 21:08:15 +00:00
Catherine Lee	fde577702d	[TD] More synonyms for filepath (#131838 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/131838 Approved by: https://github.com/PaliC, https://github.com/ZainRizvi	2024-07-26 21:02:42 +00:00
Zain Rizvi	1bda3a3135	Migrate nightly.yml workflow & docs to Amazon 2023 (#131821 ) A continuation of the migration started in - https://github.com/pytorch/pytorch/pull/131250 Migrates nightly jobs and the linux-docs job in pull.yml To preserve reusability, I'm switching to a new format here that allows one to only specify the runner prefix instead of the full runner name, allowing multiple jobs to continue using the same base runner type like how they did before Validation: - Nightly builds passed in the prev commit: https://github.com/pytorch/pytorch/actions/runs/10102118461/job/27937632823?pr=131821 - Latest commit only updated the docs job in pull.yml, and that has already passed: https://github.com/pytorch/pytorch/actions/runs/10114635537/job/27974392472?pr=131821 The other in-progress jobs are irrelevant Pull Request resolved: https://github.com/pytorch/pytorch/pull/131821 Approved by: https://github.com/atalman, https://github.com/seemethere	2024-07-26 20:54:43 +00:00
James Wu	0e6df1e0fb	Disable remote cache on test (#131908 ) Summary: Fixes test internally Test Plan: buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/inductor:cudagraph_trees -- --exact 'caffe2/test/inductor:cudagraph_trees - test_cache_hit_forward_miss_backward (caffe2.test.inductor.test_cudagraph_trees.CudaGraphTreeTests)' Passes Differential Revision: D60293177 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131908 Approved by: https://github.com/clee2000	2024-07-26 20:19:02 +00:00
Brian Hirsh	071ac38141	fast-path FakeTensor detach (#131899 ) Fixes https://github.com/pytorch/pytorch/issues/128281, see investigation at https://github.com/pytorch/pytorch/issues/128281#issuecomment-2252976926. benchmark: ``` python benchmarks/dynamo/huggingface.py --performance --timing --explain --backend aot_eager --device cuda --training --float32 --only BertForMaskedLM ``` time before: ``` TIMING: entire_frame_compile:30.85435 backend_compile:23.98599 total_wall_time:30.85435 ``` time after: ``` TIMING: entire_frame_compile:24.35898 backend_compile:18.15235 total_wall_time:24.35898 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/131899 Approved by: https://github.com/ezyang, https://github.com/zou3519, https://github.com/albanD	2024-07-26 20:16:08 +00:00
Catherine Lee	2ec8312a28	Add rerun_disabled_tests for inductor (#131681 ) Test in prod? This also turns on mem leak check Briefly checked that ``` python3 ".github/scripts/filter_test_configs.py" \ --workflow "inductor" \ --job-name "cuda12.1-py3.10-gcc9-sm86 / build" \ --test-matrix "{ include: [ { config: "inductor", shard: 1, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "inductor", shard: 2, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "inductor_distributed", shard: 1, num_shards: 1, runner: "linux.g5.12xlarge.nvidia.gpu" }, { config: "inductor_huggingface", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "inductor_timm", shard: 1, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "inductor_timm", shard: 2, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "inductor_torchbench", shard: 1, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "inductor_torchbench", shard: 2, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "dynamic_inductor_huggingface", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "dynamic_inductor_timm", shard: 1, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "dynamic_inductor_timm", shard: 2, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "dynamic_inductor_torchbench", shard: 1, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "dynamic_inductor_torchbench", shard: 2, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "aot_inductor_huggingface", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "aot_inductor_timm", shard: 1, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "aot_inductor_timm", shard: 2, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "aot_inductor_torchbench", shard: 1, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: 
"aot_inductor_torchbench", shard: 2, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "inductor_cpp_wrapper_abi_compatible", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" }, ]} " \ --selected-test-configs "" \ --pr-number "${PR_NUMBER}" \ --tag "${TAG}" \ --event-name "schedule" \ --schedule "29 8 * * *" \ --branch "${HEAD_BRANCH}" ``` has rerun disabled tests option in the test matrix I don't think all these things need to run but I'm not sure which ones (probably just inductor?) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131681 Approved by: https://github.com/zou3519	2024-07-26 20:05:24 +00:00

76242 Commits