Commit Graph

905 Commits

Author SHA1 Message Date
Bin Bao
df7e43e5d4 [AOTI] Fix aot_inductor_package test errors (#148279)
Summary: Fix fbcode test failures introduced by https://github.com/pytorch/pytorch/pull/147975. Make sure script.ld is copied to the build-time directory.

Differential Revision: D70454149

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148279
Approved by: https://github.com/zoranzhao
2025-03-05 05:22:48 +00:00
Xuehai Pan
754fb834db [BE][CI] bump ruff to 0.9.0: string quote styles (#144569)
Reference: https://docs.astral.sh/ruff/formatter/#f-string-formatting

- Change the outer quotes to double quotes for nested f-strings

```diff
- f'{", ".join(args)}'
+ f"{', '.join(args)}"
```

- Change the inner quotes to double quotes for triple f-strings

```diff
  string = """
-     {', '.join(args)}
+     {", ".join(args)}
  """
```

- Join implicitly concatenated strings

```diff
- string = "short string " "short string " f"{var}"
+ string = f"short string short string {var}"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144569
Approved by: https://github.com/Skylion007
ghstack dependencies: #146509
2025-02-24 19:56:09 +00:00
atalman
4ece056791 Nccl update to 2.25.1 for cuda 12.4-12.8 (#146073)
Should resolve: https://github.com/pytorch/pytorch/issues/144768
We use one common nccl version for cuda builds 12.4-12.8 : ``NCCL_VERSION=v2.25.1-1``
For CUDA 11.8 we use legacy ``NCCL_VERSION=v2.21.1-1``
We use pinned version of NCCL rather then submodule.
Move nccl location from ``third_party/nccl/nccl`` to ``third_party/nccl``

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146073
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/kwen2501, https://github.com/fduwjj
2025-02-19 03:52:26 +00:00
PyTorch MergeBot
7622e29a37 Revert "Nccl update to 2.25.1 for cuda 12.4-12.8 (#146073)"
This reverts commit eecee5863e.

Reverted https://github.com/pytorch/pytorch/pull/146073 on behalf of https://github.com/atalman due to breaks Locally building benchmarks ([comment](https://github.com/pytorch/pytorch/pull/146073#issuecomment-2667054179))
2025-02-18 22:23:35 +00:00
FFFrog
b10ba0a46c Unify all sympy versions to avoid conflicts within PyTorch (#147197)
As the title stated.

There are some tiny diffrences between 1.13.1 and 1.13.3:
1.13.1:
2e489cf4b1/sympy/core/numbers.py (L1591)

1.13.3:
b4ce69ad5d/sympy/core/numbers.py (L1591)

**Previous PR:**
https://github.com/pytorch/pytorch/pull/143908

**ISSUE Related:**
https://github.com/pytorch/pytorch/issues/147144
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147197
Approved by: https://github.com/malfet
2025-02-18 10:51:43 +00:00
atalman
eecee5863e Nccl update to 2.25.1 for cuda 12.4-12.8 (#146073)
Should resolve: https://github.com/pytorch/pytorch/issues/144768
We use one common nccl version for cuda builds 12.4-12.8 : ``NCCL_VERSION=v2.25.1-1``
For CUDA 11.8 we use legacy ``NCCL_VERSION=v2.21.1-1``
We use pinned version of NCCL rather then submodule.
Move nccl location from ``third_party/nccl/nccl`` to ``third_party/nccl``

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146073
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/kwen2501, https://github.com/fduwjj
2025-02-14 21:23:19 +00:00
PyTorch MergeBot
e06ee4aa9f Revert "Nccl update to 2.25.1 for cuda 12.4-12.8 (#146073)"
This reverts commit 06f4a5c0e5.

Reverted https://github.com/pytorch/pytorch/pull/146073 on behalf of https://github.com/atalman due to breaks macos builds: ModuleNotFoundError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package ([comment](https://github.com/pytorch/pytorch/pull/146073#issuecomment-2659802389))
2025-02-14 16:44:46 +00:00
atalman
06f4a5c0e5 Nccl update to 2.25.1 for cuda 12.4-12.8 (#146073)
Should resolve: https://github.com/pytorch/pytorch/issues/144768
We use one common nccl version for cuda builds 12.4-12.8 : ``NCCL_VERSION=v2.25.1-1``
For CUDA 11.8 we use legacy ``NCCL_VERSION=v2.21.1-1``
We use pinned version of NCCL rather then submodule.
Move nccl location from ``third_party/nccl/nccl`` to ``third_party/nccl``

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146073
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/kwen2501, https://github.com/fduwjj
2025-02-14 15:29:59 +00:00
Nikita Shulga
0d5f0a81c5 [CMake] Find HomeBrew OpenMP on MacOS (#145870)
Either via `OMP_PREFIX` envvar or by searching in `/opt/homebrew/opt/libomp` folder

Modify libomp bundling logic in setup.py to change absolute path to libomp.dylib to a relative one if necessary
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145870
Approved by: https://github.com/Skylion007, https://github.com/atalman
ghstack dependencies: #145871
2025-01-30 03:19:51 +00:00
bglass@quansight.com
40ccb7a86d cpp_wrapper: Move #includes to per-device header files (#145932)
Summary:
This prepares us for the next PR in the stack, where we introduce pre-compiled per-device header files to save compilation time.

Reland https://github.com/pytorch/pytorch/pull/143909 after merge conflicts.

Co-authored-by: Benjamin Glass <[bglass@quansight.com](mailto:bglass@quansight.com)>

Differential Revision: D68656960

Pulled By: benjaminglass1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145932
Approved by: https://github.com/yushangdi, https://github.com/benjaminglass1

Co-authored-by: bglass@quansight.com <bglass@quansight.com>
2025-01-29 21:08:45 +00:00
albanD
29ddf9a63e Document dispatch trace build flag (#145517)
Ok, the build flag seems to have been broken for a while since the function it calls doesn't exist anymore.
Repurposed it to enable dispatcher printing (which requires a full (and slow) debug build otherwise).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145517
Approved by: https://github.com/bdhirsh
2025-01-24 03:19:39 +00:00
Nikhil Gupta
41b38f755c Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392)" (#145505)
https://github.com/pytorch/pytorch/pull/134124 was reverted by https://github.com/pytorch/pytorch/pull/145392 due to KleidiAI clone issue.

1. This reverts commit 0940eb6d44 (https://github.com/pytorch/pytorch/pull/145392 )and Fixes KleidiAI mirror issue.
2. KleidiAI is now cloned from github mirror instead of arm gitlab

Change-Id: I7d6eee7214cd117d3057d615936fcc3ee6052fa2

Fixes https://github.com/pytorch/pytorch/issues/145273

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145505
Approved by: https://github.com/malfet
2025-01-23 18:50:59 +00:00
albanD
0940eb6d44 Reverting the PR adding Kleidiai-based int4 kernels (#145392)
Mitigation for https://github.com/pytorch/pytorch/issues/145273
Reverting https://github.com/pytorch/pytorch/pull/134124 and https://github.com/pytorch/pytorch/pull/144074

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145392
Approved by: https://github.com/ZainRizvi, https://github.com/malfet, https://github.com/atalman, https://github.com/digantdesai
2025-01-22 20:11:49 +00:00
Nikita Shulga
dc9b77cc55 [MPS] Support includes in metal objects (#145087)
Useful for code reuse for Metal shader build both for eager mode and MPSInductor, but it requires one to implement `_cpp_embed_headers` tool that, as name suggests, would preprocess and embeds the for shader to be used in dynamic compilation.
Test using:
 -  `TestMetalLibrary.test_metal_include`
 - Moving `i0`/`i1` implementation to `c10/util/metal_special_math.h` and call it from `SpecialOps.metal` shader, which now looks much more compact:
 ```metal
template <typename T, typename Tout = T>
void kernel
i0(constant T* input,
   device Tout* output,
   uint index [[thread_position_in_grid]]) {
  output[index] = c10::i0(static_cast<Tout>(input[index]));
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145087
Approved by: https://github.com/dcci
ghstack dependencies: #145023
2025-01-18 05:35:22 +00:00
atalman
a215e174a1 [BE] Remove conda from scripts and build files Part 2 (#145015)
Continuation of https://github.com/pytorch/pytorch/pull/144870

Remove conda logic from scripts:

1. Remove conda build from triton build script
2. Remove conda checks from setup.py
3. Remove conda from release scripts
4. Script read_conda_versions.sh is not used (checked via git grep)

Related to: https://github.com/pytorch/pytorch/issues/138506
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145015
Approved by: https://github.com/malfet, https://github.com/Skylion007
2025-01-17 16:26:24 +00:00
PyTorch MergeBot
94c0f15302 Revert "cpp_wrapper: Move #includes to per-device header files (#143909)"
This reverts commit d62b3979da.

Reverted https://github.com/pytorch/pytorch/pull/143909 on behalf of https://github.com/kit1980 due to breaking internal builds because of removal of torch‎/_inductor‎/codegen‎/aoti_runtime‎/implementation.cpp‎ ([comment](https://github.com/pytorch/pytorch/pull/143909#issuecomment-2597188669))
2025-01-17 00:36:38 +00:00
Benjamin Glass
d62b3979da cpp_wrapper: Move #includes to per-device header files (#143909)
This prepares us for the next PR in the stack, where we introduce pre-compiled per-device header files to save compilation time.

Differential Revision: [D67938955](https://our.internmc.facebook.com/intern/diff/D67938955)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143909
Approved by: https://github.com/desertfire
2025-01-15 21:14:02 +00:00
atalman
e14c36d3f4 Set maximum supported version of Python as 3.13 (#144396)
Same as https://github.com/pytorch/pytorch/pull/119743 Required for Release 2.6.0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144396
Approved by: https://github.com/Skylion007, https://github.com/albanD, https://github.com/malfet
2025-01-08 16:16:10 +00:00
Nikhil Gupta
94737e8a2a [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)
Description:
1. Quantize Linear Layer Weights to 4-bits:
Quantize the weights of the Linear layer to 4 bits, using symmetric quantization.
Pack two 4-bit weights into one uint8 container.
Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32.

2. Prepare Quantized Weights, Scales, and Optional Bias:
After quantizing, obtain the quantized_weights, scales, and groupsize.
If the original Linear layer has a bias, prepare it as well.

3. Pack the Weights Efficiently:
Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias.
```python
packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features)
```
Input parameters should include:
in_features and out_features (the same as the Linear layer’s corresponding parameters).

4. Perform Dynamic Quantized Matrix Multiplication:
Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights.
```python
output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights,  groupsize, in_features, out_features)
```
Inputs required include:
The input tensor, packed_weights , groupsize, and the in_features and out_features.

API Usage: https://github.com/pytorch/pytorch/issues/143289

Model Perf :
7B Transformer model:
Prefill : 340 t/s
Decode  : 40  t/s
2B Transformer model
Prefill : 747 t/s
Decode  : 80  t/s

Tests:
python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight
Ran 1 test in 0.016s

OK

python test/test_linalg.py -k test__dyn_quant_matmul_4bit
Ran 8 tests in 0.077s

OK

python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit
Ran 8 tests in 11.454s

Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124
Approved by: https://github.com/digantdesai, https://github.com/malfet
2024-12-20 19:32:03 +00:00
PyTorch MergeBot
8136daff5a Revert "[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)"
This reverts commit 4b82251011.

Reverted https://github.com/pytorch/pytorch/pull/134124 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it breaks lots of internal build ([comment](https://github.com/pytorch/pytorch/pull/134124#issuecomment-2555953189))
2024-12-19 23:33:17 +00:00
Nikhil Gupta
4b82251011 [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)
Description:
1. Quantize Linear Layer Weights to 4-bits:
Quantize the weights of the Linear layer to 4 bits, using symmetric quantization.
Pack two 4-bit weights into one uint8 container.
Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32.

2. Prepare Quantized Weights, Scales, and Optional Bias:
After quantizing, obtain the quantized_weights, scales, and groupsize.
If the original Linear layer has a bias, prepare it as well.

3. Pack the Weights Efficiently:
Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias.
```python
packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features)
```
Input parameters should include:
in_features and out_features (the same as the Linear layer’s corresponding parameters).

4. Perform Dynamic Quantized Matrix Multiplication:
Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights.
```python
output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights,  groupsize, in_features, out_features)
```
Inputs required include:
The input tensor, packed_weights , groupsize, and the in_features and out_features.

API Usage: https://github.com/pytorch/pytorch/issues/143289

Model Perf :
7B Transformer model:
Prefill : 340 t/s
Decode  : 40  t/s
2B Transformer model
Prefill : 747 t/s
Decode  : 80  t/s

Tests:
python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight
Ran 1 test in 0.016s

OK

python test/test_linalg.py -k test__dyn_quant_matmul_4bit
Ran 8 tests in 0.077s

OK

python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit
Ran 8 tests in 11.454s

Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124
Approved by: https://github.com/digantdesai, https://github.com/malfet
2024-12-19 18:51:26 +00:00
PyTorch MergeBot
14fe1f7190 Revert "[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)"
This reverts commit d3ff2d42c2.

Reverted https://github.com/pytorch/pytorch/pull/134124 on behalf of https://github.com/malfet due to This broke S390 builds, includes cpuinfo unconditionally ([comment](https://github.com/pytorch/pytorch/pull/134124#issuecomment-2552560208))
2024-12-19 01:05:11 +00:00
Nikhil Gupta
d3ff2d42c2 [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)
Description:
1. Quantize Linear Layer Weights to 4-bits:
Quantize the weights of the Linear layer to 4 bits, using symmetric quantization.
Pack two 4-bit weights into one uint8 container.
Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32.

2. Prepare Quantized Weights, Scales, and Optional Bias:
After quantizing, obtain the quantized_weights, scales, and groupsize.
If the original Linear layer has a bias, prepare it as well.

3. Pack the Weights Efficiently:
Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias.
```python
packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features)
```
Input parameters should include:
in_features and out_features (the same as the Linear layer’s corresponding parameters).

4. Perform Dynamic Quantized Matrix Multiplication:
Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights.
```python
output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights,  groupsize, in_features, out_features)
```
Inputs required include:
The input tensor, packed_weights , groupsize, and the in_features and out_features.

API Usage: https://github.com/pytorch/pytorch/issues/143289

Model Perf :
7B Transformer model:
Prefill : 340 t/s
Decode  : 40  t/s
2B Transformer model
Prefill : 747 t/s
Decode  : 80  t/s

Tests:
python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight
Ran 1 test in 0.016s

OK

python test/test_linalg.py -k test__dyn_quant_matmul_4bit
Ran 8 tests in 0.077s

OK

python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit
Ran 8 tests in 11.454s

Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124
Approved by: https://github.com/digantdesai, https://github.com/malfet
2024-12-18 22:30:07 +00:00
Xinya Zhang
424156c26c [ROCm] Update to AOTriton 0.8b (#140172)
Notable new features for SDPA operators on AMD systems from AOTriton 0.8b:

1. Nestedtensor support;
2. MQA/GQA support;
3. Restore Efficient attention support for causal=True and seqlen_q != seqlen_k cases;
    + The kernel should use top-left alignment, bottom right alignment will be added later
4. Move gfx1100 (RX7900/W7800/W7900) out of experimental support status.
   However, users are strongly recommended to update to ROCM 6.2.4, notably for
   its firmware updates.

Related unit tests are enabled as well.

Notable related changes from AOTriton 0.8b:

1. AOTriton 0.8b moves the GPU kernel out of libaotriton.so to a separate directory `aotriton.images`;
2. LZMA replaces ZSTD as GPU kernel compression algorithm for better compression ratio: aotriton0.8b (.so + aotriton.images take 350MB) compared to aotriton0.7b .so: 800MB
3. The compression cannot be disabled now, and `liblzma` is hard run-time dependency.
    + Should not be a problem, since `lzma` is part of Python Standard Library

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140172
Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
2024-12-06 21:45:18 +00:00
Zhengxu Chen
1a7da6e7e9 [export] Add test to enforce consistency between synced thrift and generated thrift from schema.py (#141989)
Summary:
In this diff we implement a way to ensure the internal thrift schema from cfgr (configerator/structs/caffe2/torch/export/schema.thrift) and the schema in OSS (torch/_export/serde/schema.thrift) are in sync, by adding a unittest to reflect on the type names and fields from each schema and compare them field by field.

When we detect new fields/types from torch/_export/serde/schema.thrift, there'll be a test failure on the trunk and the error message hints people to add the missing field/type to the thrift schema from cfgr, so that they are always in sync in practice.

Test Plan: buck test mode/opt caffe2/test:test_export -- -r test_thrift_schema_in_sync

Differential Revision: D66716834

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141989
Approved by: https://github.com/yiming0416
2024-12-06 18:42:20 +00:00
xinan.lin
4742080ed9 [AOTI XPU] Enable Cpp wraper for Intel GPU. (#135318)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135318
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/guangyey, https://github.com/desertfire
2024-11-26 11:51:32 +00:00
Nikita Shulga
a2ac96cae0 [BE] Rectify some references to caffe2 (#140204)
- Rename `tools.build_pytorch_libs.build_caffe2` to `tools.build_pytorch_libs.build_pytorch`
- Delete number of `if BUILD_CAFFE2` conditions

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140204
Approved by: https://github.com/huydhn, https://github.com/r-barnes, https://github.com/atalman
2024-11-09 14:14:20 +00:00
Yu, Guangye
46bca8a4b6 Export XPU oneDNN header to the public (#139177)
# Motivation
Export oneDNN header to the public, for example, the third-party extension now could use `GpuStreamManager` to manage `dnnl::stream` to submit oneDNN kernel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139177
Approved by: https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/malfet
2024-11-01 02:36:16 +00:00
Piotr Bialecki
bd88d40e5f [Submodule] update submodule onnx==1.17.0 (#139128)
Follow-up PR of: https://github.com/pytorch/pytorch/pull/138719

CC @malfet @ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139128
Approved by: https://github.com/malfet
2024-10-31 02:50:00 +00:00
Nikita Shulga
5c49db98b4 [EZ] Update minversion to 3.9.0 (#139085)
Fixes https://github.com/pytorch/pytorch/issues/138979

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139085
Approved by: https://github.com/kit1980, https://github.com/huydhn, https://github.com/seemethere, https://github.com/Skylion007
2024-10-28 18:04:29 +00:00
Aaron Gokaslan
49ed365b22 [BE]: Update Typeguard to TypeIs for better type inference (#133814)
Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814
Approved by: https://github.com/ezyang
2024-10-26 15:07:13 +00:00
Scott Wolchok
a3de067975 [PyTorch] Use 128-bit vectors for ARM64 (#137426)
The correct vector length for ARM64 is 128 bits (16
bytes). We were previously using double this, apparently just because
that would be the same length as AVX2.

Differential Revision: [D63984039](https://our.internmc.facebook.com/intern/diff/D63984039/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137426
Approved by: https://github.com/jgong5, https://github.com/malfet
ghstack dependencies: #138486, #138542, #138655, #138716, #138744
2024-10-26 00:20:35 +00:00
PyTorch MergeBot
32d4582e02 Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814)"
This reverts commit 16caa8c1b3.

Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/jeanschmidt due to checking if this will solve inductor errors ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2427565425))
2024-10-21 19:40:58 +00:00
Aaron Gokaslan
16caa8c1b3 [BE]: Update Typeguard to TypeIs for better type inference (#133814)
Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814
Approved by: https://github.com/ezyang
2024-10-21 17:20:06 +00:00
PyTorch MergeBot
d1027c2be6 Revert "Update sympy version constraint to 1.13.3 (#138338)"
This reverts commit d8279ad9d1.

Reverted https://github.com/pytorch/pytorch/pull/138338 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but I think a bunch of inductor tests and test_dynamic_shapes are failing in trunk after this lands d8279ad9d1 ([comment](https://github.com/pytorch/pytorch/pull/138338#issuecomment-2424487225))
2024-10-20 03:19:02 +00:00
Jeongseok (JS) Lee
d8279ad9d1 Update sympy version constraint to 1.13.3 (#138338)
`simpy` was pinned to version 1.13.1 due to test failures with version 1.13.2 on Windows and mac, as reported in https://github.com/pytorch/pytorch/pull/133235. Now that a newer version, 1.13.3, has been released, this PR aims to verify if the test failure has been resolved and also allow building with newer versions for packaging purposes (e.g., https://github.com/conda-forge/pytorch-cpu-feedstock/pull/277#discussion_r1806721862).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138338
Approved by: https://github.com/Skylion007, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2024-10-20 00:20:02 +00:00
Jeongseok Lee
3cfd244495 Add USE_SYSTEM_NVTX option (#138287)
## Summary

We are currently [updating](https://github.com/conda-forge/pytorch-cpu-feedstock/pull/277) the [`conda-forge::pytorch`](https://anaconda.org/conda-forge/pytorch) package to version 2.5.0. This update includes a new dependency, the third_party/NVTX submodule. However, like other package management frameworks (e.g., apt), conda-forge prefers using system-installed packages instead of vendor-provided third-party packages.

This pull request aims to add an option, `USE_SYSTEM_NVTX`, to select whether to use the vendored nvtx or the system-installed one, with the default being the vendored one (which is the current behavior).

## Test Plan

The `USE_SYSTEM_NVTX` option is tested by building the `conda-forge::pytorch` package with the change applied as a [patch](cd1d2464dd/recipe/patches/0005-Use-system-nvtx3.patch).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138287
Approved by: https://github.com/albanD
2024-10-19 04:26:01 +00:00
Yu, Guangye
8cda774a03 Add torch.xpu.get_arch_list and torch.xpu.get_gencode_flags for XPU (#137773)
# Motivation
Add `torch.xpu.get_arch_list()` and `torch.xpu.get_gencode_flags()` methods that return architecture list and AOT flags to preserve what flags PyTorch XPU was built with.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137773
Approved by: https://github.com/EikanWang, https://github.com/albanD
2024-10-18 02:28:08 +00:00
atalman
6016b8a9be Remove CI/CD python 3.8 requirements (#137893)
Python 3.8 is deprecated from CI/CD. No reason have these pins
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137893
Approved by: https://github.com/Skylion007, https://github.com/huydhn, https://github.com/albanD, https://github.com/kit1980
2024-10-14 20:28:48 +00:00
Xuehai Pan
59cdd8ddf1 Bump optree version to 0.13.0 to enable Python 3.13 and Python 3.13t support (#137396)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137396
Approved by: https://github.com/albanD
2024-10-08 06:49:04 +00:00
maajidkhann
5a6ddbcc3b Extending the Pytorch vec backend for SVE (ARM) (#119571)
**Motivation:**
In Pytorch, Aten vectorization supports multiple platforms, including x86 and Arm, as well as multiple data types. It provides a generic implementation of Vector (Vec) type that allows the programmer to write code packing various primitives (such as floats) within 256bit & 512bits registers. It can be extended to support other ISAs easily by adding more VecISA sub-classes.

**Reference Link:** https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/cpu/vec

**This PR:**

* Our goal with this contribution is to add support for SVE backend for Vec in the Aten vectorization for CPU backend which can be benefitted by any ARM architecture supported CPU's that supports SVE.

* More about SVE ISA for ARM: [https://developer.arm.com/Architectures/Scalable Vector Extensions](https://developer.arm.com/Architectures/Scalable%20Vector%20Extensions)

* We are using the ARM C Language Extensions for SVE (https://developer.arm.com/documentation/102699/0100/Optimizing-with-intrinsics ) to accelerate performance for various operators in the SVE backend for Vec.

* Currently we are adding support only for SVE ISA with the vector length of 256 bits (SVE 256). In future, we plan to extend this SVE support for other vector lengths as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119571
Approved by: https://github.com/malfet, https://github.com/snadampal

Co-authored-by: Divya Kotadiya <divya.kotadiya@fujitsu.com>
2024-09-18 18:59:10 +00:00
angelayi
cd9ee49a69 [aoti] Add cpp loader (#135374)
* Added a cpp loader, AOTIModelPackageLoader, which can load the .pt2, build the .so, and create a runner. The python-facing API is that users can directly call the `run` function, whereas in cpp users can directly access the `runner_` if they are more familiar with that. I couldn't figure out how to bind the `get_runner()` function to python...
* Added a new config, `aot_inductor.package_cpp_only` which will **not** package the so. This means that whenever the package is loaded, we will need to build the so. This is turned off by default so that new environments do not need to rebuild their so. The `package_cpp_only` is a feature which torchchat intends to use to provide flexibility to users.
* Added a new config, `aot_inductor.metadata` which stores user-provided metadata, serialized to the pt2 as a json file. It also stores the device used when exporting, "cuda" or "cpu", so that during load time, we can use that data to determine which AOTIModelContainerRunner to use. The metadata can be accessed through `loader.get_metadata()`. TODO is to move this metadata to the toplevel `package_aoti` function so that we can remove the metadata as a config.
* Separated out `package_aoti` as a standalone function, instead of it automatically being called in inductor. This is to prepare for the case where users will compile multiple models, and want to bundle it in one package. The specific use case is in torchchat, where we want to package the separately-exported encoder and decoder layers. An example of how to use this is in `test_multiple_methods`.
* `load_package` will load a singular model, given the model name.
* The loader doesn't support windows for now, I think I need to add some more casing to make the build commands work on windows?

Differential Revision: [D62329906](https://our.internmc.facebook.com/intern/diff/D62329906)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135374
Approved by: https://github.com/desertfire, https://github.com/malfet
2024-09-11 03:00:01 +00:00
Moritz Marseu
e000cf0ad9 Fix license metadata in setup.py (#129219)
Package metadata in setup.py lists license as BSD-3 which is not a valid SPDX id. The correct id would be BSD-3-Clause.

Specifying an SPDX id is beneficial to license compliance scanning.

*Taking up #129123 from my personal account.*
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129219
Approved by: https://github.com/malfet, https://github.com/kit1980
2024-09-04 00:21:22 +00:00
fduwjj
bdfa94b787 [RFC] Make fr trace script a console scripts (#134729)
We want to make fr analyzer script a command after users `pip install torch`, that's why we want to mimic what torchrun is doing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134729
Approved by: https://github.com/c-p-i-o, https://github.com/malfet
ghstack dependencies: #134528, #134780
2024-08-30 18:17:06 +00:00
Zitong Zhan
90c821814e SparseCsrCUDA: cuDSS backend for linalg.solve (#129856)
This PR switches to cuDSS library and has the same purpose of #127692, which is to add Sparse CSR tensor support to linalg.solve.
Fixes #69538

Minimum example of usage:
```
import torch

if __name__ == '__main__':
    spd = torch.rand(4, 3)
    A = spd.T @ spd
    b = torch.rand(3).to(torch.float64).cuda()
    A = A.to_sparse_csr().to(torch.float64).cuda()

    x = torch.linalg.solve(A, b)
    print((A @ x - b).norm())

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129856
Approved by: https://github.com/amjames, https://github.com/lezcano, https://github.com/huydhn

Co-authored-by: Zihang Fang <zhfang1108@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
2024-08-22 07:57:30 +00:00
PyTorch MergeBot
2db28a9611 Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814)"
This reverts commit bce0caba78.

Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/ezyang due to root cause of internal failures not addressed ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2302466444))
2024-08-21 16:13:34 +00:00
Aaron Gokaslan
bce0caba78 [BE]: Update Typeguard to TypeIs for better type inference (#133814)
Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814
Approved by: https://github.com/ezyang
2024-08-20 17:19:57 +00:00
cyy
c3d02fa390 [Reland2] Update NVTX to NVTX3 (#109843)
Another attempt to update NVTX to NVTX3. We now avoid changing NVTX header inclusion of existing code.  The advantage of NVTX3 over NVTX is that it is a header-only library so that linking with NVTX3 can greatly simplify our CMake and other building scripts for finding libraries in user environments. In addition, NVTX are indeed still present in the latest CUDA versions, but they're no longer a compiled library: It's now a header-only library. That's why there isn't a .lib file anymore.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109843
Approved by: https://github.com/peterbell10, https://github.com/eqy

Co-authored-by: Ivan Zaitsev <108101595+izaitsevfb@users.noreply.github.com>
2024-08-20 16:33:26 +00:00
Christophe Bornet
d6368985af [BE]: Fix setuptools not installed with Python 3.12 (#133561)
setuptools is not installed correctly for Python 3.12.
See https://github.com/python-poetry/poetry/issues/9630#issuecomment-2291114885

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133561
Approved by: https://github.com/Skylion007

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2024-08-17 17:42:04 +00:00
Mikayla Gawarecki
018e48c337 [Reland] Add wrappers for synchronous GPUDirect Storage APIs (#133489)
Reland #130633

USE_CUFILE turned off by default in this version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133489
Approved by: https://github.com/albanD
2024-08-15 17:11:52 +00:00
cyy
e76f0e0646 Remove QNNPACK reference from setup.py (#133177)
QNNPACK has been removed from third party
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133177
Approved by: https://github.com/albanD
2024-08-13 01:19:12 +00:00
Catherine Lee
4f0d5f6551 Pin sympy to 1.13.1 (#133235)
Sympy 1.13.2 release yesterday, and it results in test failures on windows and mac

454713fe9d/1

Hopefully these are the places it needs to get pinned
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133235
Approved by: https://github.com/atalman, https://github.com/ZainRizvi
2024-08-12 20:10:09 +00:00
cyy
05e8e87a69 [Submodule] Remove foxi (#132976)
It is not used after removal of Caffe2 code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132976
Approved by: https://github.com/ezyang
2024-08-09 03:46:52 +00:00
PyTorch MergeBot
e191b83462 Revert "Add wrappers for synchronous GPUDirect Storage APIs (#130633)"
This reverts commit 709ddf7a9d.

Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to still failing internally D60265673 ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2253239607))
2024-07-26 18:08:20 +00:00
Mikayla Gawarecki
709ddf7a9d Add wrappers for synchronous GPUDirect Storage APIs (#130633)
Based in part on https://github.com/NVIDIA/apex/pull/1774

Differential Revision: [D60155434](https://our.internmc.facebook.com/intern/diff/D60155434)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633
Approved by: https://github.com/albanD
2024-07-25 22:23:38 +00:00
PyTorch MergeBot
e4b5645f83 Revert "Add wrappers for synchronous GPUDirect Storage APIs (#130633)"
This reverts commit 5b5e0698a5.

Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to breaking a lot of jobs and build rules internally D60085885, possibly needs to update some bazel build? ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2245806738))
2024-07-23 17:19:34 +00:00
Mikayla Gawarecki
5b5e0698a5 Add wrappers for synchronous GPUDirect Storage APIs (#130633)
Based in part on https://github.com/NVIDIA/apex/pull/1774

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633
Approved by: https://github.com/albanD
2024-07-22 14:51:24 +00:00
Xuehai Pan
d2bd9acabd [BE] bump optree version to 0.12.1 (#130139)
0.12.0 Major Updates:

- Add context manager to temporarily set the dictionary sorting mode
- Add accessor APIs
- Use `stable` tag for `pybind11` for Python 3.13 support
- Fix potential segmentation fault for pickling support

0.12.1 Updates:

- Fix warning regression during import when launch with strict warning filters

Closes #130155
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130139
Approved by: https://github.com/zou3519
ghstack dependencies: #130895
2024-07-20 02:41:10 +00:00
Xuehai Pan
f0075c179b Pin sympy >= 1.13.0 (#130895)
------

The opposite of #130836. Pin `sympy >= 1.13.0` for Python >= 3.9 and `sympy == 1.12.1` for Python 3.8.

- #130836

See the PR description of #130836 for more details.

`sympy` 1.13.0 introduces some breaking changes which break our tests. More specifically:

- Ref [Backwards compatibility breaks and deprecations](https://github.com/sympy/sympy/wiki/release-notes-for-1.13.0#backwards-compatibility-breaks-and-deprecations)

> BREAKING CHANGE: Float and Integer/Rational no longer compare equal with a == b. From now on Float(2.0) != Integer(2). Previously expressions involving Float would compare unequal e.g. x*2.0 != x*2 but an individual Float would compare equal to an Integer. In SymPy 1.7 a Float will always compare unequal to an Integer even if they have the same "value". Use sympy.numbers.int_valued(number) to test if a number is a concrete number with no decimal part. ([#25614](https://github.com/sympy/sympy/pull/25614) by [@smichr](https://github.com/smichr))

`sympy >= 1.13.0` is required to enable Python 3.13 support. This should be part of #130689.

- #130689

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130895
Approved by: https://github.com/ezyang
2024-07-20 00:59:24 +00:00
Xuehai Pan
a3abfa5cb5 [BE][Easy][1/19] enforce style for empty lines in import segments (#129752)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129752
Approved by: https://github.com/ezyang, https://github.com/malfet
2024-07-16 00:42:56 +00:00
PyTorch MergeBot
074a5c0c9b Revert "[BE] bump optree version to 0.12.1 (#130139)"
This reverts commit 8fcb156e8b.

Reverted https://github.com/pytorch/pytorch/pull/130139 on behalf of https://github.com/clee2000 due to broke inductor/test_torchinductor_codegen_dynamic_shapes.py and test_sympy_utils.py 8fcb156e8b ([comment](https://github.com/pytorch/pytorch/pull/130139#issuecomment-2229248447))
2024-07-15 19:42:11 +00:00
Xuehai Pan
8fcb156e8b [BE] bump optree version to 0.12.1 (#130139)
0.12.0 Major Updates:

- Add context manager to temporarily set the dictionary sorting mode
- Add accessor APIs
- Use `stable` tag for `pybind11` for Python 3.13 support
- Fix potential segmentation fault for pickling support

0.12.1 Updates:

- Fix warning regression during import when launch with strict warning filters

Closes #130155
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130139
Approved by: https://github.com/zou3519
2024-07-15 17:27:07 +00:00
Catherine Lee
df50452279 Pin optree==0.11.0 on windows CI (#130155)
Fixes #ISSUE_NUMBER

doctests
test_testing

Failing run has 0.12.0 https://github.com/pytorch/pytorch/actions/runs/9804335516/job/27072891998
Succeeding run has 0.11.0 https://github.com/pytorch/pytorch/actions/runs/9798330845/job/27057359554

It is already pinned for mac and linux
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130155
Approved by: https://github.com/huydhn, https://github.com/atalman
2024-07-05 20:28:58 +00:00
PaliC
3d56673b24 [Split Build][BE] remove extraneous .py, .a, and .so files (#130053)
Removes extraneous .a, .so, and .py files from the split build. From here we can also clean up the builder script which produces the binary to do this. That pr is https://github.com/pytorch/builder/pull/1912

Verification:

The built wheel with BUILD_LIBTORCH_WHL=1 has the following files only (with .a, .so, and .py extensions)

```
sahanp@devgpu086 ~/p/dist (viable/strict)> pwd                                                                                                                                                                                                                            (pytorch-3.10)
/home/sahanp/pytorch/dist
sahanp@devgpu086 ~/p/dist (viable/strict)> find . -type f \( -name "*.py" -o -name "*.a" -o -name "*.so" \)                                                                                                                                                               (pytorch-3.10)
./torch/__init__.py
./torch/lib/libbackend_with_compiler.so
./torch/lib/libc10.so
./torch/lib/libjitbackend_test.so
./torch/lib/libtorch.so
./torch/lib/libtorch_cpu.so
./torch/lib/libtorch_global_deps.so
./torch/lib/libtorchbind_test.so
sahanp@devgpu086 ~/p/dist (viable/strict)>
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130053
Approved by: https://github.com/atalman
2024-07-05 19:05:32 +00:00
Randolf Scholz
22a06869f2 include jit/*.pyi (#129654)
Fixes #108781, see https://github.com/pytorch/pytorch/pull/108782#issuecomment-1927321532

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129654
Approved by: https://github.com/ezyang
2024-06-28 12:40:11 +00:00
Xu Han
424068d0d2 [Windows] remove mkl shared library dependency. (#129493)
# Background
I have fixed pytorch Windows missing mkl shared library dependency issue: https://github.com/pytorch/pytorch/issues/124009
The solution is change torch_cpu module static link mkl library:
1. pytorch static link mkl PR: https://github.com/pytorch/pytorch/pull/124925
2. builder install mkl static library: https://github.com/pytorch/builder/pull/1790

Double confirmed current build is using mkl static link: https://github.com/pytorch/pytorch/issues/124009#issuecomment-2160941802

# Goal
Remove setup.py `install_requires` will install mkl shared lib on pytorch Windows. It is not required now, due to we have static linked it.
It will reduce the pytorch install network traffic and avoid install useless mkl shared library package.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129493
Approved by: https://github.com/malfet
2024-06-28 11:42:21 +00:00
Nikita Shulga
816e8a3f21 [MacOS] Improve libomp packaging (#129473)
Instead of replacing `@rpath/libomp.dylib` with `@loadper_path/libomp.dylib`, keep it in place and add `@loadper_path` as new rpath

This should prevent double-loading of OpenMP runtime, because in case of `@rpath` loader is allowed to reuse other libraries, but `loadper_path` directive forces it to load it from the location relative to the executable

Test plan:
- Prepare the environment
```shell
conda create -n py310-cf python=3.10 numpy pip -c conda-forge
conda activate py310-cf
pip install torch --index-url https://download.pytorch.org/whl/test/cpu
```
- Verify that OpenMP is loaded twice and than crashes
```shell
KMP_VERSION=true python -c "import numpy as np; import torch; print(torch.__version__, torch.backends.openmp.is_available()); print(torch.rand(300, 300).abs().max())"
```
output:
```
LLVM OMP version: 5.0.20140926
LLVM OMP library type: performance
LLVM OMP link type: dynamic
LLVM OMP build time: no_timestamp
LLVM OMP build compiler: Clang 16.0
LLVM OMP alternative compiler support: yes
LLVM OMP API version: 5.0 (201611)
LLVM OMP dynamic error checking: no
LLVM OMP thread affinity support: no
LLVM OMP version: 5.0.20140926
LLVM OMP library type: performance
LLVM OMP link type: dynamic
LLVM OMP build time: no_timestamp
LLVM OMP build compiler: Clang 12.0
LLVM OMP alternative compiler support: yes
LLVM OMP API version: 5.0 (201611)
LLVM OMP dynamic error checking: no
LLVM OMP thread affinity support: no
2.4.0 True
zsh: segmentation fault  KMP_VERSION=true python -c
```
- Install artifact from this PR and make sure it passes the same test
```shell
python -mpip install ~/Downloads/torch-2.5.0.dev20240625-cp310-none-macosx_11_0_arm64.whl
KMP_VERSION=true python -c "import numpy as np; import torch; print(torch.__version__, torch.backends.openmp.is_available()); print(torch.rand(300, 300).abs().max())"
```
output
```
LLVM OMP version: 5.0.20140926
LLVM OMP library type: performance
LLVM OMP link type: dynamic
LLVM OMP build time: no_timestamp
LLVM OMP build compiler: Clang 16.0
LLVM OMP alternative compiler support: yes
LLVM OMP API version: 5.0 (201611)
LLVM OMP dynamic error checking: no
LLVM OMP thread affinity support: no
2.5.0.dev20240625 True
tensor(1.0000)
```
- Make sure it still uses bundled OpenMP if none is available in the environment
```
conda uninstall numpy -c conda-forge
KMP_VERSION=true python -c "from ctypes import cdll, c_char_p, c_uint32; import torch; from ctypes import cdll, c_char_p, c_uint32; libdyld = cdll.LoadLibrary('libSystem.dylib'); libdyld._dyld_image_count.restype = c_uint32; libdyld._dyld_get_image_name.restype = c_char_p; libdyld._dyld_get_image_name.argtypes = [c_uint32]; print(torch.rand(300, 300).abs().max()); libs = [libdyld._dyld_get_image_name(i).decode('ascii') for i in range(libdyld._dyld_image_count())]; print([l for l in libs if 'libomp.dylib' in l])"
```

Fixes https://github.com/pytorch/pytorch/issues/124497 and https://github.com/pytorch/pytorch/issues/126385
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129473
Approved by: https://github.com/atalman
2024-06-25 19:12:34 +00:00
PaliC
b0044e2e18 [Split Build] Support nightly release (#129011)
This PR adds the split build to our binaries workflow. Validation for the workflow is done using the PR above in conjunction with https://github.com/pytorch/builder/pull/1876.

Test Workflow: Check CI in the workflow above
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129011
Approved by: https://github.com/atalman
2024-06-22 05:45:14 +00:00
cyy
479ce5e2f4 Remove outdated CUDA code from CMake (#128801)
It's possible to simplify some CUDA handling logic in CMake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128801
Approved by: https://github.com/r-barnes, https://github.com/malfet
2024-06-21 15:00:00 +00:00
PaliC
7d33ff59ba [Split Build]Use same package (#127934)
This PR removes the second separate package we were using for the libtorch wheel.
In terms of testing that this works we will look use the PRs above this in the stack.

As for sanity checking these are the wheels that are produced by running
```
python setup.py clean && BUILD_LIBTORCH_WHL=1 with-proxy python setup.py bdist_whee
l && BUILD_PYTHON_ONLY=1 with-proxy python setup.py bdist_wheel --cmake
```

```
sahanp@devgpu086 ~/pytorch ((5f15e171…))> ls -al dist/                                                        (pytorch-3.10)
total 677236
drwxr-xr-x 1 sahanp users       188 Jun  4 12:19 ./
drwxr-xr-x 1 sahanp users      1696 Jun  4 12:59 ../
-rw-r--r-- 1 sahanp users  81405742 Jun  4 12:19 torch-2.4.0a0+gitca0a73c-cp310-cp310-linux_x86_64.whl
-rw-r--r-- 1 sahanp users 612076919 Jun  4 12:19 libtorch-2.4.0a0+gitca0a73c-py3-none-any.whl
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127934
Approved by: https://github.com/atalman
2024-06-19 15:57:21 +00:00
albanD
8bcebc8dae Add runtime dependency on setuptools for cpp_extensions (#127921)
As per title since this was removed from the builtin python binary in 3.12 and we use it `torch.utils.cpp_extension.*`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127921
Approved by: https://github.com/Skylion007
2024-06-05 23:59:38 +00:00
cyy
d44daebdbc [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-31 01:20:45 +00:00
PyTorch MergeBot
67739d8c6f Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)"
This reverts commit 699db7988d.

Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2138496995))
2024-05-30 01:16:57 +00:00
PaliC
9257a0698b [Split Build] Load dependencies from libtorch in __init__.py (#126826)
This PR makes it such that we search for a libtorch wheel when initializing pytorch in order to find the necessary shared libraries.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126826
Approved by: https://github.com/huydhn, https://github.com/atalman, https://github.com/ZainRizvi
2024-05-29 22:03:50 +00:00
cyy
699db7988d [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-29 11:58:03 +00:00
PaliC
a25b28a753 [Split Build] Add option to create libtorch wheel and use it to build pytorch as a separate wheel (#126328)
Creates an option to just build the libtorch portion of pytorch such that we have the necessary .so files.  Then it builds a torch package using the libtorch wheel. These options are enabled using ` BUILD_LIBTORCH_WHL` and `BUILD_PYTHON_ONLY`.

We run

```
 BUILD_LIBTORCH_WHL=1 python setup.py install
python setup.py clean
BUILD_PYTHON_ONLY=1 python setup.py install
```

to produce

```
sahanp@devgpu086 ~/pytorch (detached HEAD|REBASE-i 3/5)> ls /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/torch/lib/                                                                                                                (pytorch-3.10)
libshm.so*  libtorch_global_deps.so*  libtorch_python.so*
sahanp@devgpu086 ~/pytorch (detached HEAD|REBASE-i 3/5)> ldd build/lib/libtorch_python.so                                                                                                                                                                (pytorch-3.10)
        linux-vdso.so.1 (0x00007ffdc2d37000)
        libtorch.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libtorch.so (0x00007f539fe99000)
        libshm.so => /home/sahanp/pytorch/build/lib/libshm.so (0x00007f539fe90000)
        libcudnn.so.8 => /usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn.so.8 (0x00007f539e800000)
        libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007f539e400000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f539e000000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f539fda5000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f539ebe5000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f539dc00000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f539fea0000)
        libtorch_cpu.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libtorch_cpu.so (0x00007f5392400000)
        libtorch_cuda.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libtorch_cuda.so (0x00007f5380000000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f539fd9e000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f539fd99000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f539fd94000)
        libc10.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libc10.so (0x00007f539eb07000)
        libmkl_intel_lp64.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_intel_lp64.so.2 (0x00007f537ec00000)
        libmkl_gnu_thread.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_gnu_thread.so.2 (0x00007f537ce00000)
        libmkl_core.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_core.so.2 (0x00007f5378800000)
        libomp.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/libomp.so (0x00007f539e707000)
        libcupti.so.12 => /usr/local/cuda/lib64/libcupti.so.12 (0x00007f5377e00000)
        libcudart.so.12 => /usr/local/cuda/lib64/libcudart.so.12 (0x00007f5377a00000)
        libc10_cuda.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libc10_cuda.so (0x00007f539ea6a000)
        libcusparse.so.12 => /usr/local/cuda/lib64/libcusparse.so.12 (0x00007f5368400000)
        libcufft.so.11 => /usr/local/cuda/lib64/libcufft.so.11 (0x00007f535ee00000)
        libcusolver.so.11 => /usr/local/cuda/lib64/libcusolver.so.11 (0x00007f534c800000)
        libcurand.so.10 => /usr/local/cuda/lib64/libcurand.so.10 (0x00007f5346200000)
        libcublas.so.12 => /usr/local/cuda/lib64/libcublas.so.12 (0x00007f533f800000)
        libcublasLt.so.12 => /usr/local/cuda/lib64/libcublasLt.so.12 (0x00007f531e800000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007f539ea63000)
        libnvJitLink.so.12 => /usr/local/cuda/lib64/libnvJitLink.so.12 (0x00007f531b800000)
sahanp@devgpu086 ~/pytorch (detached HEAD|REBASE-i 3/5)> ldd build/lib/libtorch_global_deps.so                                                                                                                                                           (pytorch-3.10)
        linux-vdso.so.1 (0x00007ffc265df000)
        libmkl_intel_lp64.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_intel_lp64.so.2 (0x00007fa93fc00000)
        libmkl_gnu_thread.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_gnu_thread.so.2 (0x00007fa93de00000)
        libmkl_core.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_core.so.2 (0x00007fa939800000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fa940f05000)
        libcudart.so.12 => /usr/local/cuda/lib64/libcudart.so.12 (0x00007fa939400000)
        libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007fa939000000)
        libgomp.so.1 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libgomp.so.1 (0x00007fa93fb07000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fa938c00000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fa940efe000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa940ef9000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fa940ff5000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fa940ef2000)
        libstdc++.so.6 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libstdc++.so.6 (0x00007fa93921d000)
        libgcc_s.so.1 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libgcc_s.so.1 (0x00007fa93faec000)
        ```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126328
Approved by: https://github.com/atalman
2024-05-29 04:33:56 +00:00
PyTorch MergeBot
cdbb2c9acc Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)"
This reverts commit 4fdbaa794f.

Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2136428735))
2024-05-29 03:02:35 +00:00
cyy
4fdbaa794f [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-27 03:54:03 +00:00
cyy
7428fd19fe Remove outdated options from setup.py (#125988)
Since the recent removal of Caffe2 files.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125988
Approved by: https://github.com/ezyang
2024-05-21 18:48:23 +00:00
cyy
4ed93d6e0c [Submodule] Remove zstd dependency (#126485)
After searching in the codebase, it seems that zstd is not in use now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126485
Approved by: https://github.com/ezyang
2024-05-17 12:49:23 +00:00
FEI
b950217f19 Support third-party devices emit a range for each autograd operator (#125822)
Fixes #125752

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125822
Approved by: https://github.com/aaronenyeshi
2024-05-15 05:06:24 +00:00
Richard Barnes
b9e7b35912 Remove caffe2 from more build files (#125898)
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125898
Approved by: https://github.com/Skylion007
2024-05-13 18:37:59 +00:00
PyTorch MergeBot
33fae4fcf4 Revert "Use recursive blob for package data (#119257)"
This reverts commit f20e3ae0c3.

Reverted https://github.com/pytorch/pytorch/pull/119257 on behalf of https://github.com/malfet due to This likely caused https://github.com/pytorch/pytorch/issues/124941, not sure why warning about recursive grep was ignored ([comment](https://github.com/pytorch/pytorch/pull/119257#issuecomment-2078312309))
2024-04-25 23:08:22 +00:00
Florian
7ad6dc2cf3 [Profiler][PrivateUse1] Profiler support PrivateUse1 key (#124818)
Summary:
1.Package public headers of kineto if USE_KINETO so that they can be used by PrivateUse1 user.
2.Add PrivateUse1 key to ActivityType.
3. Support PrivateUse1 key in function deviceTypeFromActivity and _supported_activities.
4. Fix some bugs when processing profiler results.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124818
Approved by: https://github.com/aaronenyeshi
2024-04-24 18:52:08 +00:00
Timmy Xiao
f20e3ae0c3 Use recursive blob for package data (#119257)
setup.py now supports recursive glob for package data

I only added `.cpp`, `.h`, and `.yaml` files. Not sure if you want to include BAZEL or other files in package_data.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119257
Approved by: https://github.com/zou3519
2024-04-20 06:33:39 +00:00
PyTorch MergeBot
36f6928a37 Revert "[Profiler][PrivateUse1] Profiler support PrivateUse1 key (#120556)"
This reverts commit 41613a0803.

Reverted https://github.com/pytorch/pytorch/pull/120556 on behalf of https://github.com/aaronenyeshi due to Breaks GPU Chrome trace UI ([comment](https://github.com/pytorch/pytorch/pull/120556#issuecomment-2061578951))
2024-04-17 15:38:14 +00:00
Xuehai Pan
2e48f7b044 [pytree] add tree_iter function (#123913)
- Add a new `tree_iter` function.
- Bump `optree` version to `0.11.0` for C++ version of `tree_iter`.

This PR is split from #120300.

- #120300

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123913
Approved by: https://github.com/zou3519
2024-04-16 06:02:08 +00:00
Florian
41613a0803 [Profiler][PrivateUse1] Profiler support PrivateUse1 key (#120556)
Summary:
1.Package public headers of kineto if USE_KINETO so that they can be used by PrivateUse1 user.
2.Add PrivateUse1 key to ActivityType.
3. Support PrivateUse1 key in function deviceTypeFromActivity and _supported_activities.
4. Fix some bugs when processing profiler results.
Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Aaron Shi <enye.shi@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120556
Approved by: https://github.com/aaronenyeshi
2024-04-12 14:28:19 +00:00
Aidyn-A
a6080f79e9 [Build] Add linker script optimization (#121975)
This PR adds a linker script optimization based on prioritized symbols that can be extracted from the profiles of popular workloads. The present linker script was generated to target ARM+CUDA and later can be extended if necessary. The reason we target ARM is shown below:

> PyTorch and other applications that access more than 24x 2MB code regions in quick succession can result in performance bottlenecks in the CPU front-end.  The link-time optimization improves executable code locality and improve performance. We recommend turning on the optimization always for PyTorch and other application that behaves similarly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121975
Approved by: https://github.com/ptrblck, https://github.com/atalman
2024-04-09 20:22:25 +00:00
Nikita Shulga
5b0ce8f334 [Wheel] Change libtorch_cpu OpenMP search path (#123417)
To prevent delocate from double-packing it, which makes Torch wheels
unusable with torch.compile out of the box

Fixes https://github.com/pytorch/pytorch/issues/122705

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123417
Approved by: https://github.com/atalman
2024-04-05 13:02:38 +00:00
Lei,zhenyuan
15bd81bfaf expose transformer header in cmake and wheel (#122586)
expose transformer header in cmake and wheel, some utils functions are used in nested transformer development on IPEX side
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122586
Approved by: https://github.com/drisspg, https://github.com/Neilblaze, https://github.com/gujinghui
2024-04-03 02:27:40 +00:00
dujinhang
9990d1bc22 Add 'profiler/python' to the package.' (#121892)
Fixes #ISSUE_NUMBER
expose the `py_symbolize` interface for use.
thank you
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121892
Approved by: https://github.com/zdevito
2024-03-16 11:11:26 +00:00
Bin Bao
bd19d6d822 [AOTI] Use torchgen to generate C shim functions (#120513)
Summary: The current C shim layer manually implements a C interface for a handful of ops. Obviously that's not scalable if we want to extend it to cover all aten ops. This new torchgen script automatically generates C shim interfaces for CPU and CUDA backends. The interface follows the same parameter passing rules as the current C shim layer, such as

* Use plain C data types to pass parameters
* Use AtenTensorHandle to pass at::Tensor
* Use pointer type to pass optional parameter
* Use pointer+length to pass list
* Use device_type+device_index to pass device
* When a parameter is a pointer of pointer, e.g. AtenTensorHandle**, the script generates either a list of optional values or an optional list of values

https://gist.github.com/desertfire/83701532b126c6d34dae6ba68a1b074a is an example of the generated torch/csrc/inductor/aoti_torch/generated/c_shim_cuda.cpp file. The current version doesn't generate C shim wrappers for all aten ops, and probably generates more wrappers than needed on the other hand, but it should serve as a good basis.

This PR by itself won't change AOTI codegen and thus won't introduce any FC breakage. The actual wrapper codegen changes will come in another PR with some version control flag to avoid FC breakage.

Differential Revision: [D54258087](https://our.internmc.facebook.com/intern/diff/D54258087)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120513
Approved by: https://github.com/jansel
2024-03-05 04:28:44 +00:00
atalman
cb812c9832 Add windows constraint to mkl package in wheel (#121014)
Follow up on: https://github.com/pytorch/pytorch/pull/102604
Address this comment: https://github.com/pytorch/pytorch/pull/102604#discussion_r1419944305

Whl metadata for all wheels published to pypi must match, otherwise poetry install will fail see this comment:
https://github.com/pytorch/pytorch/issues/88049#issuecomment-1302555269

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121014
Approved by: https://github.com/malfet
2024-03-04 20:54:26 +00:00
Kurman Karabukaev
b0cfa96e82 [Torchelastic][Logging] Pluggable logsspecs using python entrypoints and option to specify one by name. (#120942)
Summary:
Expose an option to users to specify name of the LogsSpec implementation to use.
- Has to be defined in entrypoints under `torchrun.logs_specs` group.
- Must implement LogsSpec defined in prior PR/diff.

Test Plan: unit test+local tests

Reviewed By: ezyang

Differential Revision: D54180838

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120942
Approved by: https://github.com/ezyang
2024-03-02 08:07:52 +00:00
Lei,zhenyuan
eee040c939 expose nested header to wheel (#120603)
expose nested header to pytorch wheel, help with developers for reuse pytorch nested tensor related utils header inside wheel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120603
Approved by: https://github.com/jbschlosser, https://github.com/gujinghui
2024-03-01 09:59:45 +00:00
Yu, Guangye
df40847486 Add xpu header to include/ATen/xpu (#120786)
# Motivation
Add xpu header file to `include/ATen/xpu` to make them public.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120786
Approved by: https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/jgong5, https://github.com/albanD
2024-02-28 16:22:14 +00:00
Jeff Daily
0e6eee3c89 [ROCm] TunableOp (#114894)
Some operations, such as GEMMs, could be implemented using more than one library or more than one technique. For example, a GEMM could be implemented for CUDA or ROCm using either the blas or blasLt libraries. Further, ROCm's rocblas and hipblaslt libraries allow the user to query for all possible algorithms and then choose one. How does one know which implementation is the fastest and should be chosen? That's what TunableOp provides.

See the README.md for additional details.

TunableOp was ported from onnxruntime starting from commit 08dce54266.  The content was significantly modified and reorganized for use within PyTorch.  The files copied and their approximate new names or source content location within aten/src/ATen/cuda/tunable include the following:

- onnxruntime/core/framework/tunable.h -> Tunable.h
- onnxruntime/core/framework/tuning_context.h -> Tunable.h
- onnxruntime/core/framework/tuning_context_impl.h -> Tunable.cpp
- onnxruntime/core/providers/rocm/tunable/gemm_common.h -> GemmCommon.h
- onnxruntime/core/providers/rocm/tunable/gemm_hipblaslt.h -> GemmHipblaslt.h
- onnxruntime/core/providers/rocm/tunable/gemm_rocblas.h -> GemmRocblas.h
- onnxruntime/core/providers/rocm/tunable/gemm_tunable.cuh -> TunableGemm.h
- onnxruntime/core/providers/rocm/tunable/rocm_tuning_context.cc -> Tunable.cpp
- onnxruntime/core/providers/rocm/tunable/util.h -> StreamTimer.h
- onnxruntime/core/providers/rocm/tunable/util.cc -> StreamTimer.cpp

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114894
Approved by: https://github.com/xw285cornell, https://github.com/jianyuh
2024-02-14 19:03:49 +00:00
Aaron Gokaslan
f9200c8608 [BE][Ez]: FURB129: remove unneeded readlines() (#119796)
Applies a refurb rule to remove any readlines() in a for loop iteration as it just creates a temporary list in memory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119796
Approved by: https://github.com/ezyang
2024-02-13 21:21:22 +00:00
Nikita Shulga
60148f1761 [EZ] Set maximum supported version of Python as 3.12 (#119743)
Doesn't really affect anything other than metadata on PyPI website
Otherwise programming languages tab on https://pypi.org/project/torch/2.2.0/ shows supported version 3.8 to 3.10:
<img width="239" alt="image" src="https://github.com/pytorch/pytorch/assets/2453524/e17f9982-8833-4cd8-b8d8-b2f1cb538548">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119743
Approved by: https://github.com/kit1980, https://github.com/Skylion007
2024-02-13 06:56:32 +00:00