Commit Graph

785 Commits

Author SHA1 Message Date
albanD
8bcebc8dae Add runtime dependency on setuptools for cpp_extensions (#127921)
As per title: setuptools no longer ships with the Python binary as of 3.12, and we use it in `torch.utils.cpp_extension.*`.
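A minimal packaging sketch of the change described (the `setup()` call below is illustrative, not PyTorch's actual setup.py):

```
# hypothetical sketch: declaring setuptools as a runtime dependency so that
# `import setuptools` keeps working on Python 3.12+, where it is no longer bundled
from setuptools import setup

setup(
    name="torch",
    version="0.0.0",  # placeholder version for illustration
    install_requires=[
        "setuptools",  # used at runtime by torch.utils.cpp_extension
    ],
)
```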

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127921
Approved by: https://github.com/Skylion007
2024-06-05 23:59:38 +00:00
cyy
d44daebdbc [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-31 01:20:45 +00:00
PyTorch MergeBot
67739d8c6f Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)"
This reverts commit 699db7988d.

Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2138496995))
2024-05-30 01:16:57 +00:00
PaliC
9257a0698b [Split Build] Load dependencies from libtorch in __init__.py (#126826)
This PR makes it such that we search for a libtorch wheel when initializing pytorch in order to find the necessary shared libraries.
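A rough sketch of the kind of lookup described here (the `libtorch` package name, library names, and paths are assumptions for illustration, not the exact implementation):

```
# hypothetical sketch: find an installed libtorch package and preload its
# shared libraries before torch's own extension modules are imported
import ctypes
import importlib.util
import os

spec = importlib.util.find_spec("libtorch")  # assumed wheel/package name
if spec is not None and spec.origin is not None:
    lib_dir = os.path.join(os.path.dirname(spec.origin), "lib")
    for name in ("libc10.so", "libtorch_cpu.so", "libtorch.so"):
        path = os.path.join(lib_dir, name)
        if os.path.exists(path):
            # RTLD_GLOBAL makes the symbols visible to later-loaded extensions
            ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
```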

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126826
Approved by: https://github.com/huydhn, https://github.com/atalman, https://github.com/ZainRizvi
2024-05-29 22:03:50 +00:00
cyy
699db7988d [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-29 11:58:03 +00:00
PaliC
a25b28a753 [Split Build] Add option to create libtorch wheel and use it to build pytorch as a separate wheel (#126328)
Creates an option to build just the libtorch portion of pytorch so that we have the necessary .so files. It then builds a torch package using the libtorch wheel. These options are enabled using `BUILD_LIBTORCH_WHL` and `BUILD_PYTHON_ONLY`.

We run

```
BUILD_LIBTORCH_WHL=1 python setup.py install
python setup.py clean
BUILD_PYTHON_ONLY=1 python setup.py install
```

to produce

```
sahanp@devgpu086 ~/pytorch (detached HEAD|REBASE-i 3/5)> ls /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/torch/lib/
libshm.so*  libtorch_global_deps.so*  libtorch_python.so*
sahanp@devgpu086 ~/pytorch (detached HEAD|REBASE-i 3/5)> ldd build/lib/libtorch_python.so
        linux-vdso.so.1 (0x00007ffdc2d37000)
        libtorch.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libtorch.so (0x00007f539fe99000)
        libshm.so => /home/sahanp/pytorch/build/lib/libshm.so (0x00007f539fe90000)
        libcudnn.so.8 => /usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn.so.8 (0x00007f539e800000)
        libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007f539e400000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f539e000000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f539fda5000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f539ebe5000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f539dc00000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f539fea0000)
        libtorch_cpu.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libtorch_cpu.so (0x00007f5392400000)
        libtorch_cuda.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libtorch_cuda.so (0x00007f5380000000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f539fd9e000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f539fd99000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f539fd94000)
        libc10.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libc10.so (0x00007f539eb07000)
        libmkl_intel_lp64.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_intel_lp64.so.2 (0x00007f537ec00000)
        libmkl_gnu_thread.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_gnu_thread.so.2 (0x00007f537ce00000)
        libmkl_core.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_core.so.2 (0x00007f5378800000)
        libomp.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/libomp.so (0x00007f539e707000)
        libcupti.so.12 => /usr/local/cuda/lib64/libcupti.so.12 (0x00007f5377e00000)
        libcudart.so.12 => /usr/local/cuda/lib64/libcudart.so.12 (0x00007f5377a00000)
        libc10_cuda.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libc10_cuda.so (0x00007f539ea6a000)
        libcusparse.so.12 => /usr/local/cuda/lib64/libcusparse.so.12 (0x00007f5368400000)
        libcufft.so.11 => /usr/local/cuda/lib64/libcufft.so.11 (0x00007f535ee00000)
        libcusolver.so.11 => /usr/local/cuda/lib64/libcusolver.so.11 (0x00007f534c800000)
        libcurand.so.10 => /usr/local/cuda/lib64/libcurand.so.10 (0x00007f5346200000)
        libcublas.so.12 => /usr/local/cuda/lib64/libcublas.so.12 (0x00007f533f800000)
        libcublasLt.so.12 => /usr/local/cuda/lib64/libcublasLt.so.12 (0x00007f531e800000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007f539ea63000)
        libnvJitLink.so.12 => /usr/local/cuda/lib64/libnvJitLink.so.12 (0x00007f531b800000)
sahanp@devgpu086 ~/pytorch (detached HEAD|REBASE-i 3/5)> ldd build/lib/libtorch_global_deps.so
        linux-vdso.so.1 (0x00007ffc265df000)
        libmkl_intel_lp64.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_intel_lp64.so.2 (0x00007fa93fc00000)
        libmkl_gnu_thread.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_gnu_thread.so.2 (0x00007fa93de00000)
        libmkl_core.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_core.so.2 (0x00007fa939800000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fa940f05000)
        libcudart.so.12 => /usr/local/cuda/lib64/libcudart.so.12 (0x00007fa939400000)
        libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007fa939000000)
        libgomp.so.1 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libgomp.so.1 (0x00007fa93fb07000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fa938c00000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fa940efe000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa940ef9000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fa940ff5000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fa940ef2000)
        libstdc++.so.6 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libstdc++.so.6 (0x00007fa93921d000)
        libgcc_s.so.1 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libgcc_s.so.1 (0x00007fa93faec000)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126328
Approved by: https://github.com/atalman
2024-05-29 04:33:56 +00:00
PyTorch MergeBot
cdbb2c9acc Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)"
This reverts commit 4fdbaa794f.

Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2136428735))
2024-05-29 03:02:35 +00:00
cyy
4fdbaa794f [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-27 03:54:03 +00:00
cyy
7428fd19fe Remove outdated options from setup.py (#125988)
These options are obsolete since the recent removal of the Caffe2 files.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125988
Approved by: https://github.com/ezyang
2024-05-21 18:48:23 +00:00
cyy
4ed93d6e0c [Submodule] Remove zstd dependency (#126485)
After searching the codebase, it seems that zstd is no longer in use.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126485
Approved by: https://github.com/ezyang
2024-05-17 12:49:23 +00:00
FEI
b950217f19 Support third-party devices emitting a range for each autograd operator (#125822)
Fixes #125752

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125822
Approved by: https://github.com/aaronenyeshi
2024-05-15 05:06:24 +00:00
Richard Barnes
b9e7b35912 Remove caffe2 from more build files (#125898)
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125898
Approved by: https://github.com/Skylion007
2024-05-13 18:37:59 +00:00
PyTorch MergeBot
33fae4fcf4 Revert "Use recursive blob for package data (#119257)"
This reverts commit f20e3ae0c3.

Reverted https://github.com/pytorch/pytorch/pull/119257 on behalf of https://github.com/malfet due to This likely caused https://github.com/pytorch/pytorch/issues/124941, not sure why warning about recursive grep was ignored ([comment](https://github.com/pytorch/pytorch/pull/119257#issuecomment-2078312309))
2024-04-25 23:08:22 +00:00
Florian
7ad6dc2cf3 [Profiler][PrivateUse1] Profiler support PrivateUse1 key (#124818)
Summary:
1. Package public headers of kineto if USE_KINETO so that they can be used by PrivateUse1 users.
2. Add PrivateUse1 key to ActivityType.
3. Support PrivateUse1 key in the functions deviceTypeFromActivity and _supported_activities.
4. Fix some bugs when processing profiler results.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124818
Approved by: https://github.com/aaronenyeshi
2024-04-24 18:52:08 +00:00
Timmy Xiao
f20e3ae0c3 Use recursive blob for package data (#119257)
setup.py now supports recursive glob for package data

I only added `.cpp`, `.h`, and `.yaml` files. Not sure if you want to include BAZEL or other files in package_data.
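A sketch of the intent (whether a bare setuptools install expands `**` in package_data depends on its version, so treat this as illustrative rather than the PR's exact mechanism):

```
# hypothetical sketch: recursive "**" globs pick up files in nested
# directories with a single pattern per extension
from setuptools import setup

setup(
    name="example",
    packages=["example"],
    package_data={
        "example": ["**/*.cpp", "**/*.h", "**/*.yaml"],
    },
)
```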
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119257
Approved by: https://github.com/zou3519
2024-04-20 06:33:39 +00:00
PyTorch MergeBot
36f6928a37 Revert "[Profiler][PrivateUse1] Profiler support PrivateUse1 key (#120556)"
This reverts commit 41613a0803.

Reverted https://github.com/pytorch/pytorch/pull/120556 on behalf of https://github.com/aaronenyeshi due to Breaks GPU Chrome trace UI ([comment](https://github.com/pytorch/pytorch/pull/120556#issuecomment-2061578951))
2024-04-17 15:38:14 +00:00
Xuehai Pan
2e48f7b044 [pytree] add tree_iter function (#123913)
- Add a new `tree_iter` function.
- Bump `optree` version to `0.11.0` for C++ version of `tree_iter`.

This PR is split from #120300.

- #120300
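
A quick usage sketch, assuming the function is exposed through the private `torch.utils._pytree` module as in this PR:

```
import torch.utils._pytree as pytree

tree = {"a": [1, 2], "b": (3, {"c": 4})}
# tree_iter yields the leaves lazily instead of materializing the whole
# list up front the way tree_leaves does
assert list(pytree.tree_iter(tree)) == [1, 2, 3, 4]
```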

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123913
Approved by: https://github.com/zou3519
2024-04-16 06:02:08 +00:00
Florian
41613a0803 [Profiler][PrivateUse1] Profiler support PrivateUse1 key (#120556)
Summary:
1. Package public headers of kineto if USE_KINETO so that they can be used by PrivateUse1 users.
2. Add PrivateUse1 key to ActivityType.
3. Support PrivateUse1 key in the functions deviceTypeFromActivity and _supported_activities.
4. Fix some bugs when processing profiler results.
Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Aaron Shi <enye.shi@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120556
Approved by: https://github.com/aaronenyeshi
2024-04-12 14:28:19 +00:00
Aidyn-A
a6080f79e9 [Build] Add linker script optimization (#121975)
This PR adds a linker script optimization based on prioritized symbols that can be extracted from the profiles of popular workloads. The present linker script was generated to target ARM+CUDA and can later be extended if necessary. The reason we target ARM is shown below:

> PyTorch and other applications that access more than 24x 2MB code regions in quick succession can run into performance bottlenecks in the CPU front-end.  The link-time optimization improves executable code locality and improves performance. We recommend always turning on the optimization for PyTorch and other applications that behave similarly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121975
Approved by: https://github.com/ptrblck, https://github.com/atalman
2024-04-09 20:22:25 +00:00
Nikita Shulga
5b0ce8f334 [Wheel] Change libtorch_cpu OpenMP search path (#123417)
To prevent delocate from double-packing it, which makes Torch wheels
unusable with torch.compile out of the box

Fixes https://github.com/pytorch/pytorch/issues/122705

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123417
Approved by: https://github.com/atalman
2024-04-05 13:02:38 +00:00
Lei,zhenyuan
15bd81bfaf expose transformer header in cmake and wheel (#122586)
Expose the transformer headers in CMake and the wheel; some utility functions are used in nested transformer development on the IPEX side.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122586
Approved by: https://github.com/drisspg, https://github.com/Neilblaze, https://github.com/gujinghui
2024-04-03 02:27:40 +00:00
dujinhang
9990d1bc22 Add 'profiler/python' to the package.' (#121892)
Fixes #ISSUE_NUMBER
Expose the `py_symbolize` interface for use.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121892
Approved by: https://github.com/zdevito
2024-03-16 11:11:26 +00:00
Bin Bao
bd19d6d822 [AOTI] Use torchgen to generate C shim functions (#120513)
Summary: The current C shim layer manually implements a C interface for a handful of ops. Obviously that's not scalable if we want to extend it to cover all aten ops. This new torchgen script automatically generates C shim interfaces for CPU and CUDA backends. The interface follows the same parameter passing rules as the current C shim layer, such as

* Use plain C data types to pass parameters
* Use AtenTensorHandle to pass at::Tensor
* Use pointer type to pass optional parameter
* Use pointer+length to pass list
* Use device_type+device_index to pass device
* When a parameter is a pointer of pointer, e.g. AtenTensorHandle**, the script generates either a list of optional values or an optional list of values

https://gist.github.com/desertfire/83701532b126c6d34dae6ba68a1b074a is an example of the generated torch/csrc/inductor/aoti_torch/generated/c_shim_cuda.cpp file. The current version doesn't generate C shim wrappers for all aten ops, and on the other hand probably generates more wrappers than needed, but it should serve as a good basis.

This PR by itself won't change AOTI codegen and thus won't introduce any FC breakage. The actual wrapper codegen changes will come in another PR with some version control flag to avoid FC breakage.

Differential Revision: [D54258087](https://our.internmc.facebook.com/intern/diff/D54258087)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120513
Approved by: https://github.com/jansel
2024-03-05 04:28:44 +00:00
atalman
cb812c9832 Add windows constraint to mkl package in wheel (#121014)
Follow up on: https://github.com/pytorch/pytorch/pull/102604
Address this comment: https://github.com/pytorch/pytorch/pull/102604#discussion_r1419944305

Wheel metadata for all wheels published to PyPI must match, otherwise poetry install will fail; see this comment:
https://github.com/pytorch/pytorch/issues/88049#issuecomment-1302555269
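A sketch of the kind of environment-marker constraint being described (the exact version bounds here are illustrative, not necessarily the ones in the PR):

```
# hypothetical sketch: a PEP 508 environment marker restricts the mkl
# requirement to Windows while keeping the metadata identical across wheels
from setuptools import setup

setup(
    name="torch",
    version="0.0.0",  # placeholder version for illustration
    install_requires=[
        'mkl>=2021.1.2,<=2021.4.0; platform_system == "Windows"',
    ],
)
```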

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121014
Approved by: https://github.com/malfet
2024-03-04 20:54:26 +00:00
Kurman Karabukaev
b0cfa96e82 [Torchelastic][Logging] Pluggable logsspecs using python entrypoints and option to specify one by name. (#120942)
Summary:
Expose an option for users to specify the name of the LogsSpecs implementation to use (a packaging sketch follows below).
- It has to be defined in entry points under the `torchrun.logs_specs` group.
- It must implement the LogsSpecs interface defined in the prior PR/diff.
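
A packaging sketch of registering a custom implementation under that group (the package, module, and class names are hypothetical):

```
# hypothetical sketch: a third-party package advertising a LogsSpecs
# implementation through the "torchrun.logs_specs" entry-point group,
# selectable by its name ("my_spec") at launch time
from setuptools import setup

setup(
    name="my-torchrun-logs",
    version="0.1.0",
    py_modules=["my_logs"],
    entry_points={
        "torchrun.logs_specs": [
            "my_spec = my_logs:MyLogsSpecs",
        ],
    },
)
```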

Test Plan: unit test+local tests

Reviewed By: ezyang

Differential Revision: D54180838

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120942
Approved by: https://github.com/ezyang
2024-03-02 08:07:52 +00:00
Lei,zhenyuan
eee040c939 expose nested header to wheel (#120603)
Expose the nested tensor headers in the pytorch wheel, helping developers reuse pytorch's nested-tensor utility headers from the wheel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120603
Approved by: https://github.com/jbschlosser, https://github.com/gujinghui
2024-03-01 09:59:45 +00:00
Yu, Guangye
df40847486 Add xpu header to include/ATen/xpu (#120786)
# Motivation
Add xpu header files to `include/ATen/xpu` to make them public.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120786
Approved by: https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/jgong5, https://github.com/albanD
2024-02-28 16:22:14 +00:00
Jeff Daily
0e6eee3c89 [ROCm] TunableOp (#114894)
Some operations, such as GEMMs, could be implemented using more than one library or more than one technique. For example, a GEMM could be implemented for CUDA or ROCm using either the blas or blasLt libraries. Further, ROCm's rocblas and hipblaslt libraries allow the user to query for all possible algorithms and then choose one. How does one know which implementation is the fastest and should be chosen? That's what TunableOp provides.

See the README.md for additional details.
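A minimal usage sketch, assuming the environment-variable toggle described in the README (treat the exact knob name as an assumption here):

```
import os

# hedged sketch: enable TunableOp before any kernels run
os.environ["PYTORCH_TUNABLEOP_ENABLED"] = "1"

import torch

a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")
# on first use, the candidate GEMM implementations are timed and the
# fastest one is cached for subsequent calls with the same shape
c = a @ b
```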

TunableOp was ported from onnxruntime starting from commit 08dce54266.  The content was significantly modified and reorganized for use within PyTorch.  The files copied and their approximate new names or source content location within aten/src/ATen/cuda/tunable include the following:

- onnxruntime/core/framework/tunable.h -> Tunable.h
- onnxruntime/core/framework/tuning_context.h -> Tunable.h
- onnxruntime/core/framework/tuning_context_impl.h -> Tunable.cpp
- onnxruntime/core/providers/rocm/tunable/gemm_common.h -> GemmCommon.h
- onnxruntime/core/providers/rocm/tunable/gemm_hipblaslt.h -> GemmHipblaslt.h
- onnxruntime/core/providers/rocm/tunable/gemm_rocblas.h -> GemmRocblas.h
- onnxruntime/core/providers/rocm/tunable/gemm_tunable.cuh -> TunableGemm.h
- onnxruntime/core/providers/rocm/tunable/rocm_tuning_context.cc -> Tunable.cpp
- onnxruntime/core/providers/rocm/tunable/util.h -> StreamTimer.h
- onnxruntime/core/providers/rocm/tunable/util.cc -> StreamTimer.cpp

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114894
Approved by: https://github.com/xw285cornell, https://github.com/jianyuh
2024-02-14 19:03:49 +00:00
Aaron Gokaslan
f9200c8608 [BE][Ez]: FURB129: remove unneeded readlines() (#119796)
Applies a refurb rule to remove any readlines() call in a for-loop iteration, since it just creates a temporary list in memory.
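The pattern in question, sketched (assuming a local example.txt):

```
# before: readlines() materializes every line as a list in memory
with open("example.txt") as f:
    for line in f.readlines():
        print(line.rstrip())

# after: iterating the file object streams the lines lazily
with open("example.txt") as f:
    for line in f:
        print(line.rstrip())
```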

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119796
Approved by: https://github.com/ezyang
2024-02-13 21:21:22 +00:00
Nikita Shulga
60148f1761 [EZ] Set maximum supported version of Python as 3.12 (#119743)
Doesn't really affect anything other than metadata on the PyPI website.
Otherwise the programming-languages tab on https://pypi.org/project/torch/2.2.0/ shows supported versions 3.8 to 3.10:
<img width="239" alt="image" src="https://github.com/pytorch/pytorch/assets/2453524/e17f9982-8833-4cd8-b8d8-b2f1cb538548">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119743
Approved by: https://github.com/kit1980, https://github.com/Skylion007
2024-02-13 06:56:32 +00:00
Yu, Guangye
a205e7bf56 [3/4] Intel GPU Runtime Upstreaming for Device (#116850)
# Motivation
As mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842) and following [[1/4] Intel GPU Runtime Upstreaming for Device](https://github.com/pytorch/pytorch/pull/116019), this third PR covers the changes under `libtorch_python`.

# Design
This PR primarily offers device-related APIs in the Python frontend (a usage sketch follows below), including
- `torch.xpu.is_available`
- `torch.xpu.device_count`
- `torch.xpu.current_device`
- `torch.xpu.set_device`
- `torch.xpu.device`
- `torch.xpu.device_of`
- `torch.xpu.get_device_name`
- `torch.xpu.get_device_capability`
- `torch.xpu.get_device_properties`
- ====================
- `torch.xpu._DeviceGuard`
- `torch.xpu._is_compiled`
- `torch.xpu._get_device`

# Additional Context
We will implement the support of lazy initialization in the next PR.
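A usage sketch of the frontend APIs listed above, guarded so it is a no-op on builds without XPU support:

```
import torch

# hedged sketch exercising the device APIs introduced in this PR
if torch.xpu._is_compiled() and torch.xpu.is_available():
    print("device count:", torch.xpu.device_count())
    torch.xpu.set_device(0)
    print("current device:", torch.xpu.current_device())
    print("device name:", torch.xpu.get_device_name(0))
```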

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116850
Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/gujinghui, https://github.com/malfet
2024-02-01 12:31:26 +00:00
Zhengxu Chen
2d37a046e7 [export] Enforce serialization BC/FC with updater script. (#118424)
Summary:
This diff implements a mechanism for safely updating the torch.export serialization schema, aka schema.py, which is the API surface with the strongest compatibility guarantee.

The diff consists of 3 changes:
- Added a script to "build" or "materialize" schema.py into a platform-neutral format (yaml), which serves as the committed form of the serialization schema.
- Added a unittest comparing schema.py against schema.yaml, which forces developers to run the updater script whenever the two files diverge.
- Added a checker inside the updater script, so that every compatible change results in a minor version bump, and every incompatible change results in a major version bump.

torch.export's serialization BC/FC policy is (tentatively) documented here: https://docs.google.com/document/d/1EN7JrHbOPDhbpLDtiYG4_BPUs7PttpXlbZ27FuwKhxg/edit#heading=h.pup7ir8rqjhx , we will update the

As noted in the code doc, people should be able to run the following command to update schema properly from now on:

```
    python scripts/export/update_schema.py --prefix <path_to_torch_development_directory>
or
    buck run caffe2:export_update_schema -- --prefix /data/users/$USER/fbsource/fbcode/caffe2/
```

Test Plan:
buck test mode/opt caffe2/test:test_export -- -r test_schema
buck run caffe2:update_export_schema -- --prefix /data/users/$USER/fbsource/fbcode/caffe2/

Differential Revision: D52971020

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118424
Approved by: https://github.com/angelayi
2024-01-31 05:37:58 +00:00
feifan
3c77a3ed03 export ATen/native/sparse/*.h (#118274)
Fixes #ISSUE_NUMBER

We are trying to adapt `SparsePrivateUse1` in our code. However, I found that `sparse_stub` has not been exposed yet, which makes it impossible for me to implement the stub and register it. I hope that the header files in this directory can be exposed. @albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118274
Approved by: https://github.com/ezyang
2024-01-25 22:47:39 +00:00
mantaionut
6784594532 Fix sparse windows on CPU with MKL (#102604)
Fix https://github.com/pytorch/pytorch/issues/97352.
This PR changes the way linking to Intel MKL is done and updates MKL on Windows to mkl-2021.4.0.
Both conda and pip provide MKL packages against which you can link dynamically: mkl-devel contains the static libraries, and mkl contains the DLLs needed at runtime. Starting with 2021.4.0, MKL DLLs and static libraries carry the version in their names (for MKL 2023 we have mkl_core.2.dll and for 2021.4.0 we have mkl_core.1.dll), so it is possible to have multiple versions installed and everything works properly.
For the wheel build I added a dependency on the MKL wheel, for conda a dependency on the conda MKL package, and for libtorch I copied the MKL binaries into libtorch.
In order to test this PR I had to use a custom builder: https://github.com/pytorch/builder/pull/1467

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102604
Approved by: https://github.com/IvanYashchuk, https://github.com/malfet
2024-01-23 17:41:18 +00:00
Nikita Shulga
c4eab49ded [MacOS] Embed libomp.dylib/omp.h into MacOS wheel (#114816)
To keep MacOS wheels on par with what we do on x86.
`omp.h` is included as well, since it is needed for `torch.compile` on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114816
Approved by: https://github.com/atalman
2024-01-19 21:21:33 +00:00
Yu, Guangye
50049cfaa0 [1/4] Intel GPU Runtime Upstreaming for Device (#116019)
# Motivation
As mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), the first runtime component we would like to upstream is `Device`, which contains the device management functions of Intel GPU's runtime. To facilitate code review, we split the code changes into 4 PRs. This is one of the 4 PRs and covers the changes under `c10`.

# Design
The Intel GPU device is a wrapper around a sycl device on which kernels can be executed. In our design, we maintain a sycl device pool containing all the GPU devices of the current machine, with its status managed by PyTorch. Thread safety is taken into account in this design. The corresponding C++ files related to `Device` are placed in the c10/xpu folder, and we provide c10 device runtime APIs such as
  - `c10::xpu::device_count`
  - `c10::xpu::set_device`
  - ...

# Additional Context
In our plan, 4 PRs should be submitted to PyTorch for `Device`:
1. for c10
2. for aten
3. for python frontend
4. for lazy initialization shared with CUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116019
Approved by: https://github.com/gujinghui, https://github.com/jgong5, https://github.com/EikanWang, https://github.com/malfet
2024-01-12 07:36:25 +00:00
Edward Yang
b4a35632f9 Add function to materialize COW storages (#117053)
Summary: From Kurt Mohler, see https://github.com/pytorch/pytorch/pull/113396 (manually imported due to ghimport problems)

Test Plan: sandcastle, OSS CI

Differential Revision: D52610522

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117053
Approved by: https://github.com/malfet, https://github.com/kurtamohler
2024-01-10 15:34:16 +00:00
PyTorch MergeBot
9ac0e6971a Revert "[1/4] Intel GPU Runtime Upstreaming for Device (#116019)"
This reverts commit b4cebe2c34.

Reverted https://github.com/pytorch/pytorch/pull/116019 on behalf of https://github.com/malfet due to Broke internal and periodic buck builds, see https://github.com/pytorch/pytorch/actions/runs/7414664129/job/20176215868 ([comment](https://github.com/pytorch/pytorch/pull/116019#issuecomment-1879030285))
2024-01-05 17:36:39 +00:00
Yu, Guangye
b4cebe2c34 [1/4] Intel GPU Runtime Upstreaming for Device (#116019)
# Motivation
As mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), the first runtime component we would like to upstream is `Device`, which contains the device management functions of Intel GPU's runtime. To facilitate code review, we split the code changes into 4 PRs. This is one of the 4 PRs and covers the changes under `c10`.

# Design
The Intel GPU device is a wrapper around a sycl device on which kernels can be executed. In our design, we maintain a sycl device pool containing all the GPU devices of the current machine, with its status managed by PyTorch. Thread safety is taken into account in this design. The corresponding C++ files related to `Device` are placed in the c10/xpu folder, and we provide c10 device runtime APIs such as
  - `c10::xpu::device_count`
  - `c10::xpu::set_device`
  - ...

# Additional Context
In our plan, 4 PRs should be submitted to PyTorch for `Device`:
1. for c10
2. for aten
3. for python frontend
4. for lazy initialization shared with CUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116019
Approved by: https://github.com/gujinghui, https://github.com/jgong5, https://github.com/EikanWang, https://github.com/malfet
2024-01-04 17:35:04 +00:00
Bin Bao
fabf9433e7 [AOTI][refactor] Organize model runner files (#116022)
Summary: Move runner util files into a subdirectory and put AOTIModelContainerRunnerCpu into a separate file

Differential Revision: [D52300693](https://our.internmc.facebook.com/intern/diff/D52300693)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116022
Approved by: https://github.com/khabinov
2023-12-20 15:35:34 +00:00
hongxyan
66a76516bf [ROCm] Disabling Kernel Asserts for ROCm by default - fix and clean up and refactoring (#114660)
Related to #103973  #110532 #108404 #94891

**Context:**
As commented in 6ae0554d11/cmake/Dependencies.cmake (L1198)
Kernel asserts are enabled by default for CUDA and disabled for ROCm.
However, this was somewhat broken, and kernel asserts were still enabled for ROCm.

Disabling kernel asserts is also needed for users who do not have PCIe atomics support. These community users have verified that disabling kernel asserts on the PyTorch/ROCm platform fixed their pytorch workflows, such as torch.sum scripts and stable-diffusion (see the related issues).

**Changes:**

This pull request serves the following purposes:
* Refactor and clean up the logic,  make it simpler for ROCm to enable and disable Kernel Asserts
* Fix the bug that Kernel Asserts for ROCm was not disabled by default.

Specifically,
- Renamed `TORCH_DISABLE_GPU_ASSERTS` to `C10_USE_ROCM_KERNEL_ASSERT` for the following reasons:
(1) This variable only applies to ROCm.
(2) The new name is more aligned with the `#define CUDA_KERNEL_ASSERT` function.
(3) With `USE_` in front of the name, we can easily control it with an environment variable to turn this feature on and off during the build (e.g. `USE_ROCM_KERNEL_ASSERT=1 python setup.py develop` will enable kernel asserts for the ROCm build).
- Got rid of `ROCM_FORCE_ENABLE_GPU_ASSERTS` to simplify the logic and make it easier to understand and maintain
- Added `#cmakedefine` to carry the CMake variable over to C++

**Tests:**
(1) Build in default mode and verify that USE_ROCM_KERNEL_ASSERT is OFF (0) and kernel asserts are disabled:

```
python setup.py develop
```
Verify CMakeCache.txt has correct value.
```
/xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt
USE_ROCM_KERNEL_ASSERT:BOOL=0
```
Tested the following code in a ROCm build and a CUDA build, expecting different return codes.

```
subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
```
This piece of code is adapted from the unit test below, to get around the fact that this unit test is currently skipped for ROCm. (We will look into enabling this unit test in the future.)

```
python test/test_cuda_expandable_segments.py -k test_fixed_cuda_assert_async
```

Ran the following script, expecting r == 0 since CUDA_KERNEL_ASSERT is defined as nothing:
```
>>> import sys
>>> import subprocess
>>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
>>> r
0
```

(2) Enable the kernel assert by building with USE_ROCM_KERNEL_ASSERT=1, or USE_ROCM_KERNEL_ASSERT=ON
```
USE_ROCM_KERNEL_ASSERT=1 python setup.py develop
```

Verify `USE_ROCM_KERNEL_ASSERT` is `1`
```
/xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt
USE_ROCM_KERNEL_ASSERT:BOOL=1
```

Run the assert test, expecting a return code not equal to 0.

```
>>> import sys
>>> import subprocess
>>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
>>>/xxxx/pytorch/aten/src/ATen/native/hip/TensorCompare.hip:108: _assert_async_cuda_kernel: Device-side assertion `input[0] != 0' failed.
:0:rocdevice.cpp            :2690: 2435301199202 us: [pid:206019 tid:0x7f6cf0a77700] Callback: Queue 0x7f64e8400000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016

>>> r
-6
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114660
Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/jithunnair-amd
2023-12-13 15:44:53 +00:00
PyTorch MergeBot
ee96399bb4 Revert "[Reland2] Update NVTX to NVTX3 (#109843)"
This reverts commit dcb486232d.

Reverted https://github.com/pytorch/pytorch/pull/109843 on behalf of https://github.com/atalman due to Diff broke internal builds and tests ([comment](https://github.com/pytorch/pytorch/pull/109843#issuecomment-1841105398))
2023-12-05 16:10:20 +00:00
cyyever
dcb486232d [Reland2] Update NVTX to NVTX3 (#109843)
Another attempt to update NVTX to NVTX3. We now avoid changing NVTX header inclusion of existing code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109843
Approved by: https://github.com/peterbell10
2023-12-04 19:02:07 +00:00
Nikita Shulga
1fce51037e Add profiler/unwind to the package (#114981)
Needed by `torch/csrc/profiler/combined_traceback.h`
Fixes https://github.com/pytorch/pytorch/issues/114978

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114981
Approved by: https://github.com/atalman
2023-12-01 23:55:01 +00:00
Nikita Shulga
a3bbf9ce3e [BE][RelEng] Remove dynamo extra (#114720)
As all dynamo dependencies are part of the default requirements, see
```
% curl -s https://pypi.org/pypi/torch/2.1.1/json | jq '.info.requires_dist'
[
  "filelock",
  "typing-extensions",
  "sympy",
  "networkx",
  "jinja2",
  "fsspec",
  "nvidia-cuda-nvrtc-cu12 (==12.1.105) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cuda-runtime-cu12 (==12.1.105) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cuda-cupti-cu12 (==12.1.105) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cudnn-cu12 (==8.9.2.26) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cublas-cu12 (==12.1.3.1) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cufft-cu12 (==11.0.2.54) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-curand-cu12 (==10.3.2.106) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cusolver-cu12 (==11.4.5.107) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cusparse-cu12 (==12.1.0.106) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-nccl-cu12 (==2.18.1) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-nvtx-cu12 (==12.1.105) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "triton (==2.1.0) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "jinja2 ; extra == 'dynamo'",
  "opt-einsum (>=3.3) ; extra == 'opt-einsum'"
]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114720
Approved by: https://github.com/kit1980, https://github.com/huydhn
2023-11-29 15:08:27 +00:00
Philip Meier
2aa486de9b vendor packaging.version (#114108)
Fixes #113940. This vendors the relevant parts of `packaging==23.2.0` to have access to `Version` and `InvalidVersion` without taking a runtime dependency on `setuptools` or `packaging`.

I didn't find any vendoring policy so I put it under `torch._vendor.packaging`. While I have only vendored the files we need, I have not touched or trimmed the files otherwise.
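A small usage sketch against the vendored module path named above (the exact submodule layout is an assumption):

```
# hedged sketch: parse and compare versions via the vendored copy
from torch._vendor.packaging.version import InvalidVersion, Version

assert Version("2.1.0") < Version("2.2.0a0")
try:
    Version("not a version")
except InvalidVersion:
    print("rejected malformed version string")
```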

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114108
Approved by: https://github.com/malfet, https://github.com/albanD
2023-11-21 11:51:23 +00:00
albanD
25fb88cf23 Add all 3.12 binary build for wheel. Let's see how it goes. V2 (#112882)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112882
Approved by: https://github.com/malfet, https://github.com/sammcj
2023-11-16 18:20:12 +00:00
Nikita Shulga
7bd066ab48 Package pybind11/eigen/ (#113055)
This directory was added in the pybind11 2.11 release, see https://github.com/pybind/pybind11/tree/v2.11.0/include/pybind11/eigen

Fixes https://github.com/pytorch/pytorch/issues/112841

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113055
Approved by: https://github.com/Skylion007, https://github.com/seemethere
2023-11-07 04:27:43 +00:00
jjsjann123
39c09d4da6 Revert "Revert "Nvfuser code removal (#111093)"" (#111604)
This reverts commit 715dfced72.

The original PR #111093 was reverted due to a broken internal build.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111604
Approved by: https://github.com/davidberard98
2023-10-23 18:32:41 +00:00
albanD
236472b32a Allow to specify specific files for debug info (#111748)
Building with `USE_CUSTOM_DEBINFO=torch/csrc/Module.cpp python setup.py develop`, for example, will provide debug info only for this file.
This makes it possible to enable debug symbols very quickly from a non-debug build by doing a clean and then develop (as long as you have ccache), and avoids very large binaries that take a very long time to load in gdb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111748
Approved by: https://github.com/drisspg, https://github.com/ezyang, https://github.com/malfet
2023-10-23 14:00:54 +00:00