pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	d1027c2be6	Revert "Update sympy version constraint to 1.13.3 (#138338 )" This reverts commit `d8279ad9d1`. Reverted https://github.com/pytorch/pytorch/pull/138338 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but I think a bunch of inductor tests and test_dynamic_shapes are failing in trunk after this lands `d8279ad9d1` ([comment](https://github.com/pytorch/pytorch/pull/138338#issuecomment-2424487225))	2024-10-20 03:19:02 +00:00
Jeongseok (JS) Lee	d8279ad9d1	Update sympy version constraint to 1.13.3 (#138338 ) `simpy` was pinned to version 1.13.1 due to test failures with version 1.13.2 on Windows and mac, as reported in https://github.com/pytorch/pytorch/pull/133235. Now that a newer version, 1.13.3, has been released, this PR aims to verify if the test failure has been resolved and also allow building with newer versions for packaging purposes (e.g., https://github.com/conda-forge/pytorch-cpu-feedstock/pull/277#discussion_r1806721862). Pull Request resolved: https://github.com/pytorch/pytorch/pull/138338 Approved by: https://github.com/Skylion007, https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2024-10-20 00:20:02 +00:00
Jeongseok Lee	3cfd244495	Add USE_SYSTEM_NVTX option (#138287 ) ## Summary We are currently [updating](https://github.com/conda-forge/pytorch-cpu-feedstock/pull/277) the [`conda-forge::pytorch`](https://anaconda.org/conda-forge/pytorch) package to version 2.5.0. This update includes a new dependency, the third_party/NVTX submodule. However, like other package management frameworks (e.g., apt), conda-forge prefers using system-installed packages instead of vendor-provided third-party packages. This pull request aims to add an option, `USE_SYSTEM_NVTX`, to select whether to use the vendored nvtx or the system-installed one, with the default being the vendored one (which is the current behavior). ## Test Plan The `USE_SYSTEM_NVTX` option is tested by building the `conda-forge::pytorch` package with the change applied as a [patch](`cd1d2464dd/recipe/patches/0005-Use-system-nvtx3.patch`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/138287 Approved by: https://github.com/albanD	2024-10-19 04:26:01 +00:00
Yu, Guangye	8cda774a03	Add torch.xpu.get_arch_list and torch.xpu.get_gencode_flags for XPU (#137773 ) # Motivation Add `torch.xpu.get_arch_list()` and `torch.xpu.get_gencode_flags()` methods that return architecture list and AOT flags to preserve what flags PyTorch XPU was built with. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137773 Approved by: https://github.com/EikanWang, https://github.com/albanD	2024-10-18 02:28:08 +00:00
atalman	6016b8a9be	Remove CI/CD python 3.8 requirements (#137893 ) Python 3.8 is deprecated from CI/CD. No reason have these pins Pull Request resolved: https://github.com/pytorch/pytorch/pull/137893 Approved by: https://github.com/Skylion007, https://github.com/huydhn, https://github.com/albanD, https://github.com/kit1980	2024-10-14 20:28:48 +00:00
Xuehai Pan	59cdd8ddf1	Bump optree version to 0.13.0 to enable Python 3.13 and Python 3.13t support (#137396 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137396 Approved by: https://github.com/albanD	2024-10-08 06:49:04 +00:00
maajidkhann	5a6ddbcc3b	Extending the Pytorch vec backend for SVE (ARM) (#119571 ) Motivation: In Pytorch, Aten vectorization supports multiple platforms, including x86 and Arm, as well as multiple data types. It provides a generic implementation of Vector (Vec) type that allows the programmer to write code packing various primitives (such as floats) within 256bit & 512bits registers. It can be extended to support other ISAs easily by adding more VecISA sub-classes. Reference Link: https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/cpu/vec This PR: * Our goal with this contribution is to add support for SVE backend for Vec in the Aten vectorization for CPU backend which can be benefitted by any ARM architecture supported CPU's that supports SVE. * More about SVE ISA for ARM: [https://developer.arm.com/Architectures/Scalable Vector Extensions](https://developer.arm.com/Architectures/Scalable%20Vector%20Extensions) * We are using the ARM C Language Extensions for SVE (https://developer.arm.com/documentation/102699/0100/Optimizing-with-intrinsics ) to accelerate performance for various operators in the SVE backend for Vec. * Currently we are adding support only for SVE ISA with the vector length of 256 bits (SVE 256). In future, we plan to extend this SVE support for other vector lengths as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119571 Approved by: https://github.com/malfet, https://github.com/snadampal Co-authored-by: Divya Kotadiya <divya.kotadiya@fujitsu.com>	2024-09-18 18:59:10 +00:00
angelayi	cd9ee49a69	[aoti] Add cpp loader (#135374 ) * Added a cpp loader, AOTIModelPackageLoader, which can load the .pt2, build the .so, and create a runner. The python-facing API is that users can directly call the `run` function, whereas in cpp users can directly access the `runner_` if they are more familiar with that. I couldn't figure out how to bind the `get_runner()` function to python... * Added a new config, `aot_inductor.package_cpp_only` which will not package the so. This means that whenever the package is loaded, we will need to build the so. This is turned off by default so that new environments do not need to rebuild their so. The `package_cpp_only` is a feature which torchchat intends to use to provide flexibility to users. * Added a new config, `aot_inductor.metadata` which stores user-provided metadata, serialized to the pt2 as a json file. It also stores the device used when exporting, "cuda" or "cpu", so that during load time, we can use that data to determine which AOTIModelContainerRunner to use. The metadata can be accessed through `loader.get_metadata()`. TODO is to move this metadata to the toplevel `package_aoti` function so that we can remove the metadata as a config. * Separated out `package_aoti` as a standalone function, instead of it automatically being called in inductor. This is to prepare for the case where users will compile multiple models, and want to bundle it in one package. The specific use case is in torchchat, where we want to package the separately-exported encoder and decoder layers. An example of how to use this is in `test_multiple_methods`. * `load_package` will load a singular model, given the model name. * The loader doesn't support windows for now, I think I need to add some more casing to make the build commands work on windows? Differential Revision: [D62329906](https://our.internmc.facebook.com/intern/diff/D62329906) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135374 Approved by: https://github.com/desertfire, https://github.com/malfet	2024-09-11 03:00:01 +00:00
Moritz Marseu	e000cf0ad9	Fix license metadata in setup.py (#129219 ) Package metadata in setup.py lists license as BSD-3 which is not a valid SPDX id. The correct id would be BSD-3-Clause. Specifying an SPDX id is beneficial to license compliance scanning. Taking up #129123 from my personal account. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129219 Approved by: https://github.com/malfet, https://github.com/kit1980	2024-09-04 00:21:22 +00:00
fduwjj	bdfa94b787	[RFC] Make fr trace script a console scripts (#134729 ) We want to make fr analyzer script a command after users `pip install torch`, that's why we want to mimic what torchrun is doing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134729 Approved by: https://github.com/c-p-i-o, https://github.com/malfet ghstack dependencies: #134528, #134780	2024-08-30 18:17:06 +00:00
Zitong Zhan	90c821814e	SparseCsrCUDA: cuDSS backend for linalg.solve (#129856 ) This PR switches to cuDSS library and has the same purpose of #127692, which is to add Sparse CSR tensor support to linalg.solve. Fixes #69538 Minimum example of usage: ``` import torch if __name__ == '__main__': spd = torch.rand(4, 3) A = spd.T @ spd b = torch.rand(3).to(torch.float64).cuda() A = A.to_sparse_csr().to(torch.float64).cuda() x = torch.linalg.solve(A, b) print((A @ x - b).norm()) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129856 Approved by: https://github.com/amjames, https://github.com/lezcano, https://github.com/huydhn Co-authored-by: Zihang Fang <zhfang1108@gmail.com> Co-authored-by: Huy Do <huydhn@gmail.com>	2024-08-22 07:57:30 +00:00
PyTorch MergeBot	2db28a9611	Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814 )" This reverts commit `bce0caba78`. Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/ezyang due to root cause of internal failures not addressed ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2302466444))	2024-08-21 16:13:34 +00:00
Aaron Gokaslan	bce0caba78	[BE]: Update Typeguard to TypeIs for better type inference (#133814 ) Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814 Approved by: https://github.com/ezyang	2024-08-20 17:19:57 +00:00
cyy	c3d02fa390	[Reland2] Update NVTX to NVTX3 (#109843 ) Another attempt to update NVTX to NVTX3. We now avoid changing NVTX header inclusion of existing code. The advantage of NVTX3 over NVTX is that it is a header-only library so that linking with NVTX3 can greatly simplify our CMake and other building scripts for finding libraries in user environments. In addition, NVTX are indeed still present in the latest CUDA versions, but they're no longer a compiled library: It's now a header-only library. That's why there isn't a .lib file anymore. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109843 Approved by: https://github.com/peterbell10, https://github.com/eqy Co-authored-by: Ivan Zaitsev <108101595+izaitsevfb@users.noreply.github.com>	2024-08-20 16:33:26 +00:00
Christophe Bornet	d6368985af	[BE]: Fix setuptools not installed with Python 3.12 (#133561 ) setuptools is not installed correctly for Python 3.12. See https://github.com/python-poetry/poetry/issues/9630#issuecomment-2291114885 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133561 Approved by: https://github.com/Skylion007 Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2024-08-17 17:42:04 +00:00
Mikayla Gawarecki	018e48c337	[Reland] Add wrappers for synchronous GPUDirect Storage APIs (#133489 ) Reland #130633 USE_CUFILE turned off by default in this version Pull Request resolved: https://github.com/pytorch/pytorch/pull/133489 Approved by: https://github.com/albanD	2024-08-15 17:11:52 +00:00
cyy	e76f0e0646	Remove QNNPACK reference from setup.py (#133177 ) QNNPACK has been removed from third party Pull Request resolved: https://github.com/pytorch/pytorch/pull/133177 Approved by: https://github.com/albanD	2024-08-13 01:19:12 +00:00
Catherine Lee	4f0d5f6551	Pin sympy to 1.13.1 (#133235 ) Sympy 1.13.2 release yesterday, and it results in test failures on windows and mac `454713fe9d/1` Hopefully these are the places it needs to get pinned Pull Request resolved: https://github.com/pytorch/pytorch/pull/133235 Approved by: https://github.com/atalman, https://github.com/ZainRizvi	2024-08-12 20:10:09 +00:00
cyy	05e8e87a69	[Submodule] Remove foxi (#132976 ) It is not used after removal of Caffe2 code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132976 Approved by: https://github.com/ezyang	2024-08-09 03:46:52 +00:00
PyTorch MergeBot	e191b83462	Revert "Add wrappers for synchronous GPUDirect Storage APIs (#130633 )" This reverts commit `709ddf7a9d`. Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to still failing internally D60265673 ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2253239607))	2024-07-26 18:08:20 +00:00
Mikayla Gawarecki	709ddf7a9d	Add wrappers for synchronous GPUDirect Storage APIs (#130633 ) Based in part on https://github.com/NVIDIA/apex/pull/1774 Differential Revision: [D60155434](https://our.internmc.facebook.com/intern/diff/D60155434) Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633 Approved by: https://github.com/albanD	2024-07-25 22:23:38 +00:00
PyTorch MergeBot	e4b5645f83	Revert "Add wrappers for synchronous GPUDirect Storage APIs (#130633 )" This reverts commit `5b5e0698a5`. Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to breaking a lot of jobs and build rules internally D60085885, possibly needs to update some bazel build? ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2245806738))	2024-07-23 17:19:34 +00:00
Mikayla Gawarecki	5b5e0698a5	Add wrappers for synchronous GPUDirect Storage APIs (#130633 ) Based in part on https://github.com/NVIDIA/apex/pull/1774 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633 Approved by: https://github.com/albanD	2024-07-22 14:51:24 +00:00
Xuehai Pan	d2bd9acabd	[BE] bump `optree` version to 0.12.1 (#130139 ) 0.12.0 Major Updates: - Add context manager to temporarily set the dictionary sorting mode - Add accessor APIs - Use `stable` tag for `pybind11` for Python 3.13 support - Fix potential segmentation fault for pickling support 0.12.1 Updates: - Fix warning regression during import when launch with strict warning filters Closes #130155 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130139 Approved by: https://github.com/zou3519 ghstack dependencies: #130895	2024-07-20 02:41:10 +00:00
Xuehai Pan	f0075c179b	Pin `sympy >= 1.13.0` (#130895 ) ------ The opposite of #130836. Pin `sympy >= 1.13.0` for Python >= 3.9 and `sympy == 1.12.1` for Python 3.8. - #130836 See the PR description of #130836 for more details. `sympy` 1.13.0 introduces some breaking changes which break our tests. More specifically: - Ref [Backwards compatibility breaks and deprecations](https://github.com/sympy/sympy/wiki/release-notes-for-1.13.0#backwards-compatibility-breaks-and-deprecations) > BREAKING CHANGE: Float and Integer/Rational no longer compare equal with a == b. From now on Float(2.0) != Integer(2). Previously expressions involving Float would compare unequal e.g. x2.0 != x2 but an individual Float would compare equal to an Integer. In SymPy 1.7 a Float will always compare unequal to an Integer even if they have the same "value". Use sympy.numbers.int_valued(number) to test if a number is a concrete number with no decimal part. ([#25614](https://github.com/sympy/sympy/pull/25614) by [@smichr](https://github.com/smichr)) `sympy >= 1.13.0` is required to enable Python 3.13 support. This should be part of #130689. - #130689 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130895 Approved by: https://github.com/ezyang	2024-07-20 00:59:24 +00:00
Xuehai Pan	a3abfa5cb5	[BE][Easy][1/19] enforce style for empty lines in import segments (#129752 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129752 Approved by: https://github.com/ezyang, https://github.com/malfet	2024-07-16 00:42:56 +00:00
PyTorch MergeBot	074a5c0c9b	Revert "[BE] bump `optree` version to 0.12.1 (#130139 )" This reverts commit `8fcb156e8b`. Reverted https://github.com/pytorch/pytorch/pull/130139 on behalf of https://github.com/clee2000 due to broke inductor/test_torchinductor_codegen_dynamic_shapes.py and test_sympy_utils.py `8fcb156e8b` ([comment](https://github.com/pytorch/pytorch/pull/130139#issuecomment-2229248447))	2024-07-15 19:42:11 +00:00
Xuehai Pan	8fcb156e8b	[BE] bump `optree` version to 0.12.1 (#130139 ) 0.12.0 Major Updates: - Add context manager to temporarily set the dictionary sorting mode - Add accessor APIs - Use `stable` tag for `pybind11` for Python 3.13 support - Fix potential segmentation fault for pickling support 0.12.1 Updates: - Fix warning regression during import when launch with strict warning filters Closes #130155 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130139 Approved by: https://github.com/zou3519	2024-07-15 17:27:07 +00:00
Catherine Lee	df50452279	Pin optree==0.11.0 on windows CI (#130155 ) Fixes #ISSUE_NUMBER doctests test_testing Failing run has 0.12.0 https://github.com/pytorch/pytorch/actions/runs/9804335516/job/27072891998 Succeeding run has 0.11.0 https://github.com/pytorch/pytorch/actions/runs/9798330845/job/27057359554 It is already pinned for mac and linux Pull Request resolved: https://github.com/pytorch/pytorch/pull/130155 Approved by: https://github.com/huydhn, https://github.com/atalman	2024-07-05 20:28:58 +00:00
PaliC	3d56673b24	[Split Build][BE] remove extraneous .py, .a, and .so files (#130053 ) Removes extraneous .a, .so, and .py files from the split build. From here we can also clean up the builder script which produces the binary to do this. That pr is https://github.com/pytorch/builder/pull/1912 Verification: The built wheel with BUILD_LIBTORCH_WHL=1 has the following files only (with .a, .so, and .py extensions) ``` sahanp@devgpu086 ~/p/dist (viable/strict)> pwd (pytorch-3.10) /home/sahanp/pytorch/dist sahanp@devgpu086 ~/p/dist (viable/strict)> find . -type f $ -name ".py" -o -name ".a" -o -name "*.so" $ (pytorch-3.10) ./torch/__init__.py ./torch/lib/libbackend_with_compiler.so ./torch/lib/libc10.so ./torch/lib/libjitbackend_test.so ./torch/lib/libtorch.so ./torch/lib/libtorch_cpu.so ./torch/lib/libtorch_global_deps.so ./torch/lib/libtorchbind_test.so sahanp@devgpu086 ~/p/dist (viable/strict)> ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/130053 Approved by: https://github.com/atalman	2024-07-05 19:05:32 +00:00
Randolf Scholz	22a06869f2	include jit/*.pyi (#129654 ) Fixes #108781, see https://github.com/pytorch/pytorch/pull/108782#issuecomment-1927321532 Pull Request resolved: https://github.com/pytorch/pytorch/pull/129654 Approved by: https://github.com/ezyang	2024-06-28 12:40:11 +00:00
Xu Han	424068d0d2	[Windows] remove mkl shared library dependency. (#129493 ) # Background I have fixed pytorch Windows missing mkl shared library dependency issue: https://github.com/pytorch/pytorch/issues/124009 The solution is change torch_cpu module static link mkl library: 1. pytorch static link mkl PR: https://github.com/pytorch/pytorch/pull/124925 2. builder install mkl static library: https://github.com/pytorch/builder/pull/1790 Double confirmed current build is using mkl static link: https://github.com/pytorch/pytorch/issues/124009#issuecomment-2160941802 # Goal Remove setup.py `install_requires` will install mkl shared lib on pytorch Windows. It is not required now, due to we have static linked it. It will reduce the pytorch install network traffic and avoid install useless mkl shared library package. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129493 Approved by: https://github.com/malfet	2024-06-28 11:42:21 +00:00
Nikita Shulga	816e8a3f21	[MacOS] Improve libomp packaging (#129473 ) Instead of replacing `@rpath/libomp.dylib` with `@loadper_path/libomp.dylib`, keep it in place and add `@loadper_path` as new rpath This should prevent double-loading of OpenMP runtime, because in case of `@rpath` loader is allowed to reuse other libraries, but `loadper_path` directive forces it to load it from the location relative to the executable Test plan: - Prepare the environment ```shell conda create -n py310-cf python=3.10 numpy pip -c conda-forge conda activate py310-cf pip install torch --index-url https://download.pytorch.org/whl/test/cpu ``` - Verify that OpenMP is loaded twice and than crashes ```shell KMP_VERSION=true python -c "import numpy as np; import torch; print(torch.__version__, torch.backends.openmp.is_available()); print(torch.rand(300, 300).abs().max())" ``` output: ``` LLVM OMP version: 5.0.20140926 LLVM OMP library type: performance LLVM OMP link type: dynamic LLVM OMP build time: no_timestamp LLVM OMP build compiler: Clang 16.0 LLVM OMP alternative compiler support: yes LLVM OMP API version: 5.0 (201611) LLVM OMP dynamic error checking: no LLVM OMP thread affinity support: no LLVM OMP version: 5.0.20140926 LLVM OMP library type: performance LLVM OMP link type: dynamic LLVM OMP build time: no_timestamp LLVM OMP build compiler: Clang 12.0 LLVM OMP alternative compiler support: yes LLVM OMP API version: 5.0 (201611) LLVM OMP dynamic error checking: no LLVM OMP thread affinity support: no 2.4.0 True zsh: segmentation fault KMP_VERSION=true python -c ``` - Install artifact from this PR and make sure it passes the same test ```shell python -mpip install ~/Downloads/torch-2.5.0.dev20240625-cp310-none-macosx_11_0_arm64.whl KMP_VERSION=true python -c "import numpy as np; import torch; print(torch.__version__, torch.backends.openmp.is_available()); print(torch.rand(300, 300).abs().max())" ``` output ``` LLVM OMP version: 5.0.20140926 LLVM OMP library type: performance LLVM OMP link type: dynamic LLVM OMP build time: no_timestamp LLVM OMP build compiler: Clang 16.0 LLVM OMP alternative compiler support: yes LLVM OMP API version: 5.0 (201611) LLVM OMP dynamic error checking: no LLVM OMP thread affinity support: no 2.5.0.dev20240625 True tensor(1.0000) ``` - Make sure it still uses bundled OpenMP if none is available in the environment ``` conda uninstall numpy -c conda-forge KMP_VERSION=true python -c "from ctypes import cdll, c_char_p, c_uint32; import torch; from ctypes import cdll, c_char_p, c_uint32; libdyld = cdll.LoadLibrary('libSystem.dylib'); libdyld._dyld_image_count.restype = c_uint32; libdyld._dyld_get_image_name.restype = c_char_p; libdyld._dyld_get_image_name.argtypes = [c_uint32]; print(torch.rand(300, 300).abs().max()); libs = [libdyld._dyld_get_image_name(i).decode('ascii') for i in range(libdyld._dyld_image_count())]; print([l for l in libs if 'libomp.dylib' in l])" ``` Fixes https://github.com/pytorch/pytorch/issues/124497 and https://github.com/pytorch/pytorch/issues/126385 Pull Request resolved: https://github.com/pytorch/pytorch/pull/129473 Approved by: https://github.com/atalman	2024-06-25 19:12:34 +00:00
PaliC	b0044e2e18	[Split Build] Support nightly release (#129011 ) This PR adds the split build to our binaries workflow. Validation for the workflow is done using the PR above in conjunction with https://github.com/pytorch/builder/pull/1876. Test Workflow: Check CI in the workflow above Pull Request resolved: https://github.com/pytorch/pytorch/pull/129011 Approved by: https://github.com/atalman	2024-06-22 05:45:14 +00:00
cyy	479ce5e2f4	Remove outdated CUDA code from CMake (#128801 ) It's possible to simplify some CUDA handling logic in CMake. Pull Request resolved: https://github.com/pytorch/pytorch/pull/128801 Approved by: https://github.com/r-barnes, https://github.com/malfet	2024-06-21 15:00:00 +00:00
PaliC	7d33ff59ba	[Split Build]Use same package (#127934 ) This PR removes the second separate package we were using for the libtorch wheel. In terms of testing that this works we will look use the PRs above this in the stack. As for sanity checking these are the wheels that are produced by running ``` python setup.py clean && BUILD_LIBTORCH_WHL=1 with-proxy python setup.py bdist_whee l && BUILD_PYTHON_ONLY=1 with-proxy python setup.py bdist_wheel --cmake ``` ``` sahanp@devgpu086 ~/pytorch ((5f15e171…))> ls -al dist/ (pytorch-3.10) total 677236 drwxr-xr-x 1 sahanp users 188 Jun 4 12:19 ./ drwxr-xr-x 1 sahanp users 1696 Jun 4 12:59 ../ -rw-r--r-- 1 sahanp users 81405742 Jun 4 12:19 torch-2.4.0a0+gitca0a73c-cp310-cp310-linux_x86_64.whl -rw-r--r-- 1 sahanp users 612076919 Jun 4 12:19 libtorch-2.4.0a0+gitca0a73c-py3-none-any.whl ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/127934 Approved by: https://github.com/atalman	2024-06-19 15:57:21 +00:00
albanD	8bcebc8dae	Add runtime dependency on setuptools for cpp_extensions (#127921 ) As per title since this was removed from the builtin python binary in 3.12 and we use it `torch.utils.cpp_extension.*`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127921 Approved by: https://github.com/Skylion007	2024-06-05 23:59:38 +00:00
cyy	d44daebdbc	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-31 01:20:45 +00:00
PyTorch MergeBot	67739d8c6f	Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 )" This reverts commit `699db7988d`. Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2138496995))	2024-05-30 01:16:57 +00:00
PaliC	9257a0698b	[Split Build] Load dependencies from libtorch in __init__.py (#126826 ) This PR makes it such that we search for a libtorch wheel when initializing pytorch in order to find the necessary shared libraries. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126826 Approved by: https://github.com/huydhn, https://github.com/atalman, https://github.com/ZainRizvi	2024-05-29 22:03:50 +00:00
cyy	699db7988d	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-29 11:58:03 +00:00
PaliC	a25b28a753	[Split Build] Add option to create libtorch wheel and use it to build pytorch as a separate wheel (#126328 ) Creates an option to just build the libtorch portion of pytorch such that we have the necessary .so files. Then it builds a torch package using the libtorch wheel. These options are enabled using ` BUILD_LIBTORCH_WHL` and `BUILD_PYTHON_ONLY`. We run ``` BUILD_LIBTORCH_WHL=1 python setup.py install python setup.py clean BUILD_PYTHON_ONLY=1 python setup.py install ``` to produce ``` sahanp@devgpu086 ~/pytorch (detached HEAD\|REBASE-i 3/5)> ls /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/torch/lib/ (pytorch-3.10) libshm.so* libtorch_global_deps.so* libtorch_python.so* sahanp@devgpu086 ~/pytorch (detached HEAD\|REBASE-i 3/5)> ldd build/lib/libtorch_python.so (pytorch-3.10) linux-vdso.so.1 (0x00007ffdc2d37000) libtorch.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libtorch.so (0x00007f539fe99000) libshm.so => /home/sahanp/pytorch/build/lib/libshm.so (0x00007f539fe90000) libcudnn.so.8 => /usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn.so.8 (0x00007f539e800000) libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007f539e400000) libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f539e000000) libm.so.6 => /lib64/libm.so.6 (0x00007f539fda5000) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f539ebe5000) libc.so.6 => /lib64/libc.so.6 (0x00007f539dc00000) /lib64/ld-linux-x86-64.so.2 (0x00007f539fea0000) libtorch_cpu.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libtorch_cpu.so (0x00007f5392400000) libtorch_cuda.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libtorch_cuda.so (0x00007f5380000000) librt.so.1 => /lib64/librt.so.1 (0x00007f539fd9e000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f539fd99000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f539fd94000) libc10.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libc10.so (0x00007f539eb07000) libmkl_intel_lp64.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_intel_lp64.so.2 (0x00007f537ec00000) libmkl_gnu_thread.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_gnu_thread.so.2 (0x00007f537ce00000) libmkl_core.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_core.so.2 (0x00007f5378800000) libomp.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/libomp.so (0x00007f539e707000) libcupti.so.12 => /usr/local/cuda/lib64/libcupti.so.12 (0x00007f5377e00000) libcudart.so.12 => /usr/local/cuda/lib64/libcudart.so.12 (0x00007f5377a00000) libc10_cuda.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libc10_cuda.so (0x00007f539ea6a000) libcusparse.so.12 => /usr/local/cuda/lib64/libcusparse.so.12 (0x00007f5368400000) libcufft.so.11 => /usr/local/cuda/lib64/libcufft.so.11 (0x00007f535ee00000) libcusolver.so.11 => /usr/local/cuda/lib64/libcusolver.so.11 (0x00007f534c800000) libcurand.so.10 => /usr/local/cuda/lib64/libcurand.so.10 (0x00007f5346200000) libcublas.so.12 => /usr/local/cuda/lib64/libcublas.so.12 (0x00007f533f800000) libcublasLt.so.12 => /usr/local/cuda/lib64/libcublasLt.so.12 (0x00007f531e800000) libutil.so.1 => /lib64/libutil.so.1 (0x00007f539ea63000) libnvJitLink.so.12 => /usr/local/cuda/lib64/libnvJitLink.so.12 (0x00007f531b800000) sahanp@devgpu086 ~/pytorch (detached HEAD\|REBASE-i 3/5)> ldd build/lib/libtorch_global_deps.so (pytorch-3.10) linux-vdso.so.1 (0x00007ffc265df000) libmkl_intel_lp64.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_intel_lp64.so.2 (0x00007fa93fc00000) libmkl_gnu_thread.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_gnu_thread.so.2 (0x00007fa93de00000) libmkl_core.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_core.so.2 (0x00007fa939800000) libm.so.6 => /lib64/libm.so.6 (0x00007fa940f05000) libcudart.so.12 => /usr/local/cuda/lib64/libcudart.so.12 (0x00007fa939400000) libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007fa939000000) libgomp.so.1 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libgomp.so.1 (0x00007fa93fb07000) libc.so.6 => /lib64/libc.so.6 (0x00007fa938c00000) libdl.so.2 => /lib64/libdl.so.2 (0x00007fa940efe000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa940ef9000) /lib64/ld-linux-x86-64.so.2 (0x00007fa940ff5000) librt.so.1 => /lib64/librt.so.1 (0x00007fa940ef2000) libstdc++.so.6 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libstdc++.so.6 (0x00007fa93921d000) libgcc_s.so.1 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libgcc_s.so.1 (0x00007fa93faec000) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/126328 Approved by: https://github.com/atalman	2024-05-29 04:33:56 +00:00
PyTorch MergeBot	cdbb2c9acc	Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 )" This reverts commit `4fdbaa794f`. Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2136428735))	2024-05-29 03:02:35 +00:00
cyy	4fdbaa794f	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-27 03:54:03 +00:00
cyy	7428fd19fe	Remove outdated options from setup.py (#125988 ) Since the recent removal of Caffe2 files. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125988 Approved by: https://github.com/ezyang	2024-05-21 18:48:23 +00:00
cyy	4ed93d6e0c	[Submodule] Remove zstd dependency (#126485 ) After searching in the codebase, it seems that zstd is not in use now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126485 Approved by: https://github.com/ezyang	2024-05-17 12:49:23 +00:00
FEI	b950217f19	Support third-party devices emit a range for each autograd operator (#125822 ) Fixes #125752 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125822 Approved by: https://github.com/aaronenyeshi	2024-05-15 05:06:24 +00:00
Richard Barnes	b9e7b35912	Remove caffe2 from more build files (#125898 ) Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/125898 Approved by: https://github.com/Skylion007	2024-05-13 18:37:59 +00:00
PyTorch MergeBot	33fae4fcf4	Revert "Use recursive blob for package data (#119257 )" This reverts commit `f20e3ae0c3`. Reverted https://github.com/pytorch/pytorch/pull/119257 on behalf of https://github.com/malfet due to This likely caused https://github.com/pytorch/pytorch/issues/124941, not sure why warning about recursive grep was ignored ([comment](https://github.com/pytorch/pytorch/pull/119257#issuecomment-2078312309))	2024-04-25 23:08:22 +00:00
Florian	7ad6dc2cf3	[Profiler][PrivateUse1] Profiler support PrivateUse1 key (#124818 ) Summary: 1.Package public headers of kineto if USE_KINETO so that they can be used by PrivateUse1 user. 2.Add PrivateUse1 key to ActivityType. 3. Support PrivateUse1 key in function deviceTypeFromActivity and _supported_activities. 4. Fix some bugs when processing profiler results. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124818 Approved by: https://github.com/aaronenyeshi	2024-04-24 18:52:08 +00:00
Timmy Xiao	f20e3ae0c3	Use recursive blob for package data (#119257 ) setup.py now supports recursive glob for package data I only added `.cpp`, `.h`, and `.yaml` files. Not sure if you want to include BAZEL or other files in package_data. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119257 Approved by: https://github.com/zou3519	2024-04-20 06:33:39 +00:00
PyTorch MergeBot	36f6928a37	Revert "[Profiler][PrivateUse1] Profiler support PrivateUse1 key (#120556 )" This reverts commit `41613a0803`. Reverted https://github.com/pytorch/pytorch/pull/120556 on behalf of https://github.com/aaronenyeshi due to Breaks GPU Chrome trace UI ([comment](https://github.com/pytorch/pytorch/pull/120556#issuecomment-2061578951))	2024-04-17 15:38:14 +00:00
Xuehai Pan	2e48f7b044	[pytree] add `tree_iter` function (#123913 ) - Add a new `tree_iter` function. - Bump `optree` version to `0.11.0` for C++ version of `tree_iter`. This PR is split from #120300. - #120300 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123913 Approved by: https://github.com/zou3519	2024-04-16 06:02:08 +00:00
Florian	41613a0803	[Profiler][PrivateUse1] Profiler support PrivateUse1 key (#120556 ) Summary: 1.Package public headers of kineto if USE_KINETO so that they can be used by PrivateUse1 user. 2.Add PrivateUse1 key to ActivityType. 3. Support PrivateUse1 key in function deviceTypeFromActivity and _supported_activities. 4. Fix some bugs when processing profiler results. Co-authored-by: albanD <desmaison.alban@gmail.com> Co-authored-by: Aaron Shi <enye.shi@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120556 Approved by: https://github.com/aaronenyeshi	2024-04-12 14:28:19 +00:00
Aidyn-A	a6080f79e9	[Build] Add linker script optimization (#121975 ) This PR adds a linker script optimization based on prioritized symbols that can be extracted from the profiles of popular workloads. The present linker script was generated to target ARM+CUDA and later can be extended if necessary. The reason we target ARM is shown below: > PyTorch and other applications that access more than 24x 2MB code regions in quick succession can result in performance bottlenecks in the CPU front-end. The link-time optimization improves executable code locality and improve performance. We recommend turning on the optimization always for PyTorch and other application that behaves similarly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121975 Approved by: https://github.com/ptrblck, https://github.com/atalman	2024-04-09 20:22:25 +00:00
Nikita Shulga	5b0ce8f334	[Wheel] Change libtorch_cpu OpenMP search path (#123417 ) To prevent delocate from double-packing it, which makes Torch wheels unusable with torch.compile out of the box Fixes https://github.com/pytorch/pytorch/issues/122705 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123417 Approved by: https://github.com/atalman	2024-04-05 13:02:38 +00:00
Lei,zhenyuan	15bd81bfaf	expose transformer header in cmake and wheel (#122586 ) expose transformer header in cmake and wheel, some utils functions are used in nested transformer development on IPEX side Pull Request resolved: https://github.com/pytorch/pytorch/pull/122586 Approved by: https://github.com/drisspg, https://github.com/Neilblaze, https://github.com/gujinghui	2024-04-03 02:27:40 +00:00
dujinhang	9990d1bc22	Add 'profiler/python' to the package.' (#121892 ) Fixes #ISSUE_NUMBER expose the `py_symbolize` interface for use. thank you Pull Request resolved: https://github.com/pytorch/pytorch/pull/121892 Approved by: https://github.com/zdevito	2024-03-16 11:11:26 +00:00
Bin Bao	bd19d6d822	[AOTI] Use torchgen to generate C shim functions (#120513 ) Summary: The current C shim layer manually implements a C interface for a handful of ops. Obviously that's not scalable if we want to extend it to cover all aten ops. This new torchgen script automatically generates C shim interfaces for CPU and CUDA backends. The interface follows the same parameter passing rules as the current C shim layer, such as * Use plain C data types to pass parameters * Use AtenTensorHandle to pass at::Tensor * Use pointer type to pass optional parameter * Use pointer+length to pass list * Use device_type+device_index to pass device * When a parameter is a pointer of pointer, e.g. AtenTensorHandle**, the script generates either a list of optional values or an optional list of values https://gist.github.com/desertfire/83701532b126c6d34dae6ba68a1b074a is an example of the generated torch/csrc/inductor/aoti_torch/generated/c_shim_cuda.cpp file. The current version doesn't generate C shim wrappers for all aten ops, and probably generates more wrappers than needed on the other hand, but it should serve as a good basis. This PR by itself won't change AOTI codegen and thus won't introduce any FC breakage. The actual wrapper codegen changes will come in another PR with some version control flag to avoid FC breakage. Differential Revision: [D54258087](https://our.internmc.facebook.com/intern/diff/D54258087) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120513 Approved by: https://github.com/jansel	2024-03-05 04:28:44 +00:00
atalman	cb812c9832	Add windows constraint to mkl package in wheel (#121014 ) Follow up on: https://github.com/pytorch/pytorch/pull/102604 Address this comment: https://github.com/pytorch/pytorch/pull/102604#discussion_r1419944305 Whl metadata for all wheels published to pypi must match, otherwise poetry install will fail see this comment: https://github.com/pytorch/pytorch/issues/88049#issuecomment-1302555269 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121014 Approved by: https://github.com/malfet	2024-03-04 20:54:26 +00:00
Kurman Karabukaev	b0cfa96e82	[Torchelastic][Logging] Pluggable logsspecs using python entrypoints and option to specify one by name. (#120942 ) Summary: Expose an option to users to specify name of the LogsSpec implementation to use. - Has to be defined in entrypoints under `torchrun.logs_specs` group. - Must implement LogsSpec defined in prior PR/diff. Test Plan: unit test+local tests Reviewed By: ezyang Differential Revision: D54180838 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120942 Approved by: https://github.com/ezyang	2024-03-02 08:07:52 +00:00
Lei,zhenyuan	eee040c939	expose nested header to wheel (#120603 ) expose nested header to pytorch wheel, help with developers for reuse pytorch nested tensor related utils header inside wheel Pull Request resolved: https://github.com/pytorch/pytorch/pull/120603 Approved by: https://github.com/jbschlosser, https://github.com/gujinghui	2024-03-01 09:59:45 +00:00
Yu, Guangye	df40847486	Add xpu header to include/ATen/xpu (#120786 ) # Motivation Add xpu header file to `include/ATen/xpu` to make them public. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120786 Approved by: https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/jgong5, https://github.com/albanD	2024-02-28 16:22:14 +00:00
Jeff Daily	0e6eee3c89	[ROCm] TunableOp (#114894 ) Some operations, such as GEMMs, could be implemented using more than one library or more than one technique. For example, a GEMM could be implemented for CUDA or ROCm using either the blas or blasLt libraries. Further, ROCm's rocblas and hipblaslt libraries allow the user to query for all possible algorithms and then choose one. How does one know which implementation is the fastest and should be chosen? That's what TunableOp provides. See the README.md for additional details. TunableOp was ported from onnxruntime starting from commit `08dce54266`. The content was significantly modified and reorganized for use within PyTorch. The files copied and their approximate new names or source content location within aten/src/ATen/cuda/tunable include the following: - onnxruntime/core/framework/tunable.h -> Tunable.h - onnxruntime/core/framework/tuning_context.h -> Tunable.h - onnxruntime/core/framework/tuning_context_impl.h -> Tunable.cpp - onnxruntime/core/providers/rocm/tunable/gemm_common.h -> GemmCommon.h - onnxruntime/core/providers/rocm/tunable/gemm_hipblaslt.h -> GemmHipblaslt.h - onnxruntime/core/providers/rocm/tunable/gemm_rocblas.h -> GemmRocblas.h - onnxruntime/core/providers/rocm/tunable/gemm_tunable.cuh -> TunableGemm.h - onnxruntime/core/providers/rocm/tunable/rocm_tuning_context.cc -> Tunable.cpp - onnxruntime/core/providers/rocm/tunable/util.h -> StreamTimer.h - onnxruntime/core/providers/rocm/tunable/util.cc -> StreamTimer.cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/114894 Approved by: https://github.com/xw285cornell, https://github.com/jianyuh	2024-02-14 19:03:49 +00:00
Aaron Gokaslan	f9200c8608	[BE][Ez]: FURB129: remove unneeded readlines() (#119796 ) Applies a refurb rule to remove any readlines() in a for loop iteration as it just creates a temporary list in memory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119796 Approved by: https://github.com/ezyang	2024-02-13 21:21:22 +00:00
Nikita Shulga	60148f1761	[EZ] Set maximum supported version of Python as 3.12 (#119743 ) Doesn't really affect anything other than metadata on PyPI website Otherwise programming languages tab on https://pypi.org/project/torch/2.2.0/ shows supported version 3.8 to 3.10: <img width="239" alt="image" src="https://github.com/pytorch/pytorch/assets/2453524/e17f9982-8833-4cd8-b8d8-b2f1cb538548"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/119743 Approved by: https://github.com/kit1980, https://github.com/Skylion007	2024-02-13 06:56:32 +00:00
Yu, Guangye	a205e7bf56	[3/4] Intel GPU Runtime Upstreaming for Device (#116850 ) # Motivation According to [[1/4] Intel GPU Runtime Upstreaming for Device](https://github.com/pytorch/pytorch/pull/116019), As mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), this third PR covers the changes under `libtorch_python`. # Design This PR primarily offers device-related APIs in python frontend, including - `torch.xpu.is_available` - `torch.xpu.device_count` - `torch.xpu.current_device` - `torch.xpu.set_device` - `torch.xpu.device` - `torch.xpu.device_of` - `torch.xpu.get_device_name` - `torch.xpu.get_device_capability` - `torch.xpu.get_device_properties` - ==================== - `torch.xpu._DeviceGuard` - `torch.xpu._is_compiled` - `torch.xpu._get_device` # Additional Context We will implement the support of lazy initialization in the next PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116850 Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/gujinghui, https://github.com/malfet	2024-02-01 12:31:26 +00:00
Zhengxu Chen	2d37a046e7	[export] Enforce serialization BC/FC with updater script. (#118424 ) Summary: This diff implements a mechanism for safely update torch.export serialization schema, aka schema.py, which is the API surface having the strongest compatibility guarantee. The diff is consist of 3 changes: - Added a script to "build" or "materialize" schema.py into a platform neutral format (yaml), which serves as the committed form of the seialization schema. - Added unittest to compare against schema.py and schema.yaml, so that it forces developers to execute the updater script when there is mismatch between two files. - Added a checker inside the updater script, so that all the compatible change will result in a minor version bump, and all the incompatible changes will result in a major version bump. torch.export's serialization BC/FC policy is (tentatively) documented here: https://docs.google.com/document/d/1EN7JrHbOPDhbpLDtiYG4_BPUs7PttpXlbZ27FuwKhxg/edit#heading=h.pup7ir8rqjhx , we will update the As noted in the code doc, people should be able to run the following command to update schema properly from now on: ``` python scripts/export/update_schema.py --prefix <path_to_torch_development_diretory> or buck run caffe2:export_update_schema -- --prefix /data/users/$USER/fbsource/fbcode/caffe2/ ``` Test Plan: buck test mode/opt caffe2/test:test_export -- -r test_schema buck run caffe2:update_export_schema -- --prefix /data/users/$USER/fbsource/fbcode/caffe2/ Differential Revision: D52971020 Pull Request resolved: https://github.com/pytorch/pytorch/pull/118424 Approved by: https://github.com/angelayi	2024-01-31 05:37:58 +00:00
feifan	3c77a3ed03	export ATen/native/sparse/*.h (#118274 ) Fixes #ISSUE_NUMBER We are trying to adapt `SparsePrivateUse1` in our code. However, I found that `sparse_stup` has not been exposed yet, which makes it impossible for me to implement stup and register. I hope that the header files in this directory can be exposed. @albanD Pull Request resolved: https://github.com/pytorch/pytorch/pull/118274 Approved by: https://github.com/ezyang	2024-01-25 22:47:39 +00:00
mantaionut	6784594532	Fix sparse windows on CPU with MKL (#102604 ) Fix https://github.com/pytorch/pytorch/issues/97352. This PR changes the way the linking to intel MKL is done and updating MKL on Windows to mkl-2021.4.0 . There are for both conda and pip packages MKL version with which you can link dynamically. mkl-devel contains the static versions of the dlls and MKL contains the needed dlls for the runtime. MKL dlls and static libs starting with 2021.4.0 have the version in their names( for MKL 2023 we have mkl_core.2.dll and for 2021.4.0 we have mkl_core.1.dll) so its possible to have multiple versions installed and it will work properly. For the wheel build, I added dependency for whell MKL and on conda a dependecy for the conda MKL and on libtorch I copied the MKL binaries in libtorch. In order to test this PR I have to use custom builder https://github.com/pytorch/builder/pull/1467 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102604 Approved by: https://github.com/IvanYashchuk, https://github.com/malfet	2024-01-23 17:41:18 +00:00
Nikita Shulga	c4eab49ded	[MacOS] Embed libomp.dylib/omp.h into MacOS wheel (#114816 ) To keep them on par with what we do on x86 And `omp.h` as it is needed for `torch.compile` on CPU Pull Request resolved: https://github.com/pytorch/pytorch/pull/114816 Approved by: https://github.com/atalman	2024-01-19 21:21:33 +00:00
Yu, Guangye	50049cfaa0	[1/4] Intel GPU Runtime Upstreaming for Device (#116019 ) # Motivation As mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), The first runtime component we would like to upstream is `Device` which contains the device management functions of Intel GPU's runtime. To facilitate the code review, we split the code changes into 4 PRs. This is one of the 4 PRs and covers the changes under `c10`. # Design Intel GPU device is a wrapper of sycl device on which kernels can be executed. In our design, we will maintain a sycl device pool containing all the GPU devices of the current machine, and manage the status of the device pool by PyTorch. The thread local safe is considered in this design. The corresponding C++ files related to `Device` will be placed in c10/xpu folder. And we provide the c10 device runtime APIs, like - `c10::xpu::device_count` - `c10::xpu::set_device` - ... # Additional Context In our plan, 4 PRs should be submitted to PyTorch for `Device`: 1. for c10 2. for aten 3. for python frontend 4. for lazy initialization shared with CUDA Pull Request resolved: https://github.com/pytorch/pytorch/pull/116019 Approved by: https://github.com/gujinghui, https://github.com/jgong5, https://github.com/EikanWang, https://github.com/malfet	2024-01-12 07:36:25 +00:00
Edward Yang	b4a35632f9	Add function to materialize COW storages (#117053 ) Summary: From Kurt Mohler, see https://github.com/pytorch/pytorch/pull/113396 (manually imported due to ghimport problems) Test Plan: sandcastle, OSS CI Differential Revision: D52610522 Pull Request resolved: https://github.com/pytorch/pytorch/pull/117053 Approved by: https://github.com/malfet, https://github.com/kurtamohler	2024-01-10 15:34:16 +00:00
PyTorch MergeBot	9ac0e6971a	Revert "[1/4] Intel GPU Runtime Upstreaming for Device (#116019 )" This reverts commit `b4cebe2c34`. Reverted https://github.com/pytorch/pytorch/pull/116019 on behalf of https://github.com/malfet due to Broke internal and periodic buck builds, see https://github.com/pytorch/pytorch/actions/runs/7414664129/job/20176215868 ([comment](https://github.com/pytorch/pytorch/pull/116019#issuecomment-1879030285))	2024-01-05 17:36:39 +00:00
Yu, Guangye	b4cebe2c34	[1/4] Intel GPU Runtime Upstreaming for Device (#116019 ) # Motivation As mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), The first runtime component we would like to upstream is `Device` which contains the device management functions of Intel GPU's runtime. To facilitate the code review, we split the code changes into 4 PRs. This is one of the 4 PRs and covers the changes under `c10`. # Design Intel GPU device is a wrapper of sycl device on which kernels can be executed. In our design, we will maintain a sycl device pool containing all the GPU devices of the current machine, and manage the status of the device pool by PyTorch. The thread local safe is considered in this design. The corresponding C++ files related to `Device` will be placed in c10/xpu folder. And we provide the c10 device runtime APIs, like - `c10::xpu::device_count` - `c10::xpu::set_device` - ... # Additional Context In our plan, 4 PRs should be submitted to PyTorch for `Device`: 1. for c10 2. for aten 3. for python frontend 4. for lazy initialization shared with CUDA Pull Request resolved: https://github.com/pytorch/pytorch/pull/116019 Approved by: https://github.com/gujinghui, https://github.com/jgong5, https://github.com/EikanWang, https://github.com/malfet	2024-01-04 17:35:04 +00:00
Bin Bao	fabf9433e7	[AOTI][refactor] Organize model runner files (#116022 ) Summary: Move runner util files into a subdirectory and put AOTIModelContainerRunnerCpu into a separate file Differential Revision: [D52300693](https://our.internmc.facebook.com/intern/diff/D52300693) Pull Request resolved: https://github.com/pytorch/pytorch/pull/116022 Approved by: https://github.com/khabinov	2023-12-20 15:35:34 +00:00
hongxyan	66a76516bf	[ROCm] Disabling Kernel Asserts for ROCm by default - fix and clean up and refactoring (#114660 ) Related to #103973 #110532 #108404 #94891 Context: As commented in `6ae0554d11/cmake/Dependencies.cmake (L1198)` Kernel asserts are enabled by default for CUDA and disabled for ROCm. However it is somewhat broken, and Kernel assert was still enabled for ROCm. Disabling kernel assert is also needed for users who do not have PCIe atomics support. These community users have verified that disabling the kernel assert in PyTorch/ROCm platform fixed their pytorch workflow, like torch.sum script, stable-diffusion. (see the related issues) Changes: This pull request serves the following purposes: * Refactor and clean up the logic, make it simpler for ROCm to enable and disable Kernel Asserts * Fix the bug that Kernel Asserts for ROCm was not disabled by default. Specifically, - Renamed `TORCH_DISABLE_GPU_ASSERTS` to `C10_USE_ROCM_KERNEL_ASSERT` for the following reasons: (1) This variable only applies to ROCm. (2) The new name is more align with #define CUDA_KERNEL_ASSERT function. (3) With USE_ in front of the name, we can easily control it with environment variable to turn on and off this feature during build (e.g. `USE_ROCM_KERNEL_ASSERT=1 python setup.py develop` will enable kernel assert for ROCm build). - Get rid of the `ROCM_FORCE_ENABLE_GPU_ASSERTS' to simplify the logic and make it easier to understand and maintain - Added `#cmakedefine` to carry over the CMake variable to C++ Tests: (1) build with default mode and verify that USE_ROCM_KERNEL_ASSERT is OFF(0), and kernel assert is disabled: ``` python setup.py develop ``` Verify CMakeCache.txt has correct value. ``` /xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt USE_ROCM_KERNEL_ASSERT:BOOL=0 ``` Tested the following code in ROCm build and CUDA build, and expected the return code differently. ``` subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"]) ``` This piece of code is adapted from below unit test to get around the limitation that this unit test now was skipped for ROCm. (We will check to enable this unit test in the future) ``` python test/test_cuda_expandable_segments.py -k test_fixed_cuda_assert_async ``` Ran the following script, expecting r ==0 since the CUDA_KERNEL_ASSERT is defined as nothing: ``` >> import sys >>> import subprocess >>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"]) >>> r 0 ``` (2) Enable the kernel assert by building with USE_ROCM_KERNEL_ASSERT=1, or USE_ROCM_KERNEL_ASSERT=ON ``` USE_ROCM_KERNEL_ASSERT=1 python setup.py develop ``` Verify `USE_ROCM_KERNEL_ASSERT` is `1` ``` /xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt USE_ROCM_KERNEL_ASSERT:BOOL=1 ``` Run the assert test, and expected return code not equal to 0. ``` >> import sys >>> import subprocess >>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"]) >>>/xxxx/pytorch/aten/src/ATen/native/hip/TensorCompare.hip:108: _assert_async_cuda_kernel: Device-side assertion `input[0] != 0' failed. :0:rocdevice.cpp :2690: 2435301199202 us: [pid:206019 tid:0x7f6cf0a77700] Callback: Queue 0x7f64e8400000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016 >>> r -6 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/114660 Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/jithunnair-amd	2023-12-13 15:44:53 +00:00
PyTorch MergeBot	ee96399bb4	Revert "[Reland2] Update NVTX to NVTX3 (#109843 )" This reverts commit `dcb486232d`. Reverted https://github.com/pytorch/pytorch/pull/109843 on behalf of https://github.com/atalman due to Diff broke internal builds and tests ([comment](https://github.com/pytorch/pytorch/pull/109843#issuecomment-1841105398))	2023-12-05 16:10:20 +00:00
cyyever	dcb486232d	[Reland2] Update NVTX to NVTX3 (#109843 ) Another attempt to update NVTX to NVTX3. We now avoid changing NVTX header inclusion of existing code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109843 Approved by: https://github.com/peterbell10	2023-12-04 19:02:07 +00:00
Nikita Shulga	1fce51037e	Add `profiler/unwind` to the package (#114981 ) Needed by `torch/csrc/profiler/combined_traceback.h` Fixes https://github.com/pytorch/pytorch/issues/114978 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114981 Approved by: https://github.com/atalman	2023-12-01 23:55:01 +00:00
Nikita Shulga	a3bbf9ce3e	[BE][RelEng] Remove `dynamo` extra (#114720 ) As all dynamo dependencies are part of the default requirements, see ``` % curl -s https://pypi.org/pypi/torch/2.1.1/json \| jq '.info.requires_dist' [ "filelock", "typing-extensions", "sympy", "networkx", "jinja2", "fsspec", "nvidia-cuda-nvrtc-cu12 (==12.1.105) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"", "nvidia-cuda-runtime-cu12 (==12.1.105) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"", "nvidia-cuda-cupti-cu12 (==12.1.105) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"", "nvidia-cudnn-cu12 (==8.9.2.26) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"", "nvidia-cublas-cu12 (==12.1.3.1) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"", "nvidia-cufft-cu12 (==11.0.2.54) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"", "nvidia-curand-cu12 (==10.3.2.106) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"", "nvidia-cusolver-cu12 (==11.4.5.107) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"", "nvidia-cusparse-cu12 (==12.1.0.106) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"", "nvidia-nccl-cu12 (==2.18.1) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"", "nvidia-nvtx-cu12 (==12.1.105) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"", "triton (==2.1.0) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"", "jinja2 ; extra == 'dynamo'", "opt-einsum (>=3.3) ; extra == 'opt-einsum'" ] ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/114720 Approved by: https://github.com/kit1980, https://github.com/huydhn	2023-11-29 15:08:27 +00:00
Philip Meier	2aa486de9b	vendor packaging.version (#114108 ) Fixes #113940. This vendors the relevant parts of [`packaging==23.2.0`]() to have access to `Version` and `InvalidVersion` without taking a runtime dependency on `setuptools` or `packaging`. I didn't find any vendoring policy so I put it under `torch._vendor.packaging`. While I have only vendored the files we need, I have not touched or trimmed the files otherwise. Pull Request resolved: https://github.com/pytorch/pytorch/pull/114108 Approved by: https://github.com/malfet, https://github.com/albanD	2023-11-21 11:51:23 +00:00
albanD	25fb88cf23	Add all 3.12 binary build for wheel. Let's see how it goes. V2 (#112882 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/112882 Approved by: https://github.com/malfet, https://github.com/sammcj	2023-11-16 18:20:12 +00:00
Nikita Shulga	7bd066ab48	Package `pybind11/eigen/` (#113055 ) Which was added for eigen 2.11 release, see https://github.com/pybind/pybind11/tree/v2.11.0/include/pybind11/eigen Fixes https://github.com/pytorch/pytorch/issues/112841 Pull Request resolved: https://github.com/pytorch/pytorch/pull/113055 Approved by: https://github.com/Skylion007, https://github.com/seemethere	2023-11-07 04:27:43 +00:00
jjsjann123	39c09d4da6	Revert "Revert "Nvfuser code removal (#111093 )"" (#111604 ) This reverts commit `715dfced72`. The original PR #111093 is reverted due to broken internal build. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111604 Approved by: https://github.com/davidberard98	2023-10-23 18:32:41 +00:00
albanD	236472b32a	Allow to specify specific files for debug info (#111748 ) Building with `USE_CUSTOM_DEBINFO=torch/csrc/Module.cpp python setup.py develop` for example will provide debug info only for this file. This allows to enable debug symbols very fast from a non-debug build by doing a clean then develop (as long as you have ccache) and avoid very large binaries that take a very long time to load in gdb. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111748 Approved by: https://github.com/drisspg, https://github.com/ezyang, https://github.com/malfet	2023-10-23 14:00:54 +00:00
Sergii Dymchenko	3c4581d613	Remove outdated declarations from setup.py (#110660 ) `-Wno-deprecated-declarations` should not be needed after Python 2 not supported. Clang issue for `-Wno-missing-braces` was fixed in 2018. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110660 Approved by: https://github.com/huydhn, https://github.com/atalman, https://github.com/malfet	2023-10-21 04:55:44 +00:00
Aleksei Nikiforov	ba04d84089	S390x inductor support (#111367 ) Use arch compile flags. They are needed for vectorization support on s390x. Implement new helper functions for inductor. This change fixes multiple tests in test_cpu_repro.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/111367 Approved by: https://github.com/ezyang	2023-10-20 19:38:46 +00:00
PyTorch MergeBot	715dfced72	Revert "Nvfuser code removal (#111093 )" This reverts commit `572628e520`. Reverted https://github.com/pytorch/pytorch/pull/111093 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, @albanD please help to support the author with the next steps to get this diff merged ([comment](https://github.com/pytorch/pytorch/pull/111093#issuecomment-1771434853))	2023-10-19 17:39:49 +00:00
jjsjann123	572628e520	Nvfuser code removal (#111093 ) Removes the existing integration code & build of nvfuser in TorchScript. Note that I intentionally left the part where we wipe out `third_party/nvfuser` repo. I'll do that in a separate PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111093 Approved by: https://github.com/albanD	2023-10-18 01:00:47 +00:00
atalman	f9053877b4	Add pypi required metadata to all wheels except linux (#111042 ) Will fix package after publishing https://github.com/pytorch/pytorch/issues/100974 Poetry install requires all wheels on pypi to have same metadata. Hence including linux dependencies in all non-linux wheels Pull Request resolved: https://github.com/pytorch/pytorch/pull/111042 Approved by: https://github.com/malfet	2023-10-12 17:40:13 +00:00
Bin Bao	4bf1cd6961	[aotinductor] Rename aot_runtime to aoti_runtime (#110007 ) Summary: Make the naming more explicit Differential Revision: D49593528 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110007 Approved by: https://github.com/houseroad	2023-09-26 00:46:54 +00:00
Bin Bao	9c2715bbb2	[inductor] Clean up AOTInductor runtime ABI (#109678 ) Summary: Change the AOTInductor runtime interface to avoid referring to aten data structures directly, mostly at::Tensor and ProxyExecutor. This a combination of https://github.com/pytorch/pytorch/pull/109436, https://github.com/pytorch/pytorch/pull/109498, https://github.com/pytorch/pytorch/pull/109450, https://github.com/pytorch/pytorch/pull/109606, plus a few internal build changes. Reviewed By: frank-wei Differential Revision: D49374820 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109678 Approved by: https://github.com/frank-wei, https://github.com/chenyang78	2023-09-21 00:25:24 +00:00
Xuehai Pan	0bf30c140a	[pytree] Use OpTree for PyTree manipulation (#93139 ) Split from #92679. Use C++-based PyTree implementation. ## Highlights 1. High performance (20x speedup than the pure-Python implementation, 10%-20% overall speedup for `torch.fx`) 2. Multi-input tree-map support 3. Custom tree node registry with namespace isolation Refs: - #65761 - #91323 - #92679 From https://github.com/pytorch/pytorch/issues/65761#issuecomment-1334746366: > ### 0. Out-of-box compatible with JAX's pytree, provides the same interfaces and functions (and more). > > ### 1. High-performance: `optree` has comparable fast tree operations (~0.9x for `dict`s and ~2.5x for `OrderedDict`s) than JAX's pytree and it is 20x faster than `torch.utils._pytree`. > > `optree` implements some common Python container types in C++ (e.g., `OrderedDict`) and achieves 2.5x performance than JAX's pytree. Check out section [Built-in PyTree Node Types](https://github.com/metaopt/optree#built-in-pytree-node-types) and [Benchmark](https://github.com/metaopt/optree#benchmark) for more details. > > \| Module \| Nodes \| OpTree (μs) \| JAX XLA (μs) \| PyTorch (μs) \| DM-Tree (μs) \| Speedup (J / O) \| Speedup (P / O) \| Speedup (D / O) \| > \| :-------- \| ----: \| ----------: \| -----------: \| -----------: \| -----------: \| --------------: \| --------------: \| --------------: \| > \| TinyMLP \| 53 \| 26.40 \| 68.19 \| 586.87 \| 34.14 \| 2.58 \| 22.23 \| 1.29 \| > \| AlexNet \| 188 \| 84.28 \| 259.51 \| 2182.07 \| 125.12 \| 3.08 \| 25.89 \| 1.48 \| > \| ResNet18 \| 698 \| 288.57 \| 807.27 \| 7881.69 \| 429.39 \| 2.80 \| 27.31 \| 1.49 \| > \| ResNet34 \| 1242 \| 580.75 \| 1564.97 \| 15082.84 \| 819.02 \| 2.69 \| 25.97 \| 1.41 \| > \| ResNet50 \| 1702 \| 791.18 \| 2081.17 \| 20982.82 \| 1104.62 \| 2.63 \| 26.52 \| 1.40 \| > \| ResNet101 \| 3317 \| 1603.93 \| 3939.37 \| 40382.14 \| 2208.63 \| 2.46 \| 25.18 \| 1.38 \| > \| ResNet152 \| 4932 \| 2446.56 \| 6267.98 \| 56892.36 \| 3139.17 \| 2.56 \| 23.25 \| 1.28 \| > \| ViT-H/14 \| 3420 \| 1681.48 \| 4488.33 \| 41703.16 \| 2504.86 \| 2.67 \| 24.80 \| 1.49 \| > \| Swin-B \| 2881 \| 1565.41 \| 4091.10 \| 34241.99 \| 1936.75 \| 2.61 \| 21.87 \| 1.24 \| > \| \| \| \| \| \| Average \| 2.68 \| 24.78 \| 1.38 \| > > <div align="center"> > <img src="https://user-images.githubusercontent.com/16078332/200494435-fd5bb385-59f7-4811-b520-98bf5763ccf3.png" width="90%" /> > </div> > > ### 2. Namespace Isolation for the PyTree Type Registry > > In addition to the JAX's pytree registry for custom node type registration, `optree` adds `namespace` isolation to the registry. Users can register the same type multiple times for different flatten/unflatten behavior. It also provides module-level isolation for safety reasons. For example, you can add a unique prefix to your namespace to isolate your registry with other modules (e.g., `torch.xxx`, `torch.functorch.xxx`): > > ```python > # Register a Python type into a namespace > import torch > > optree.register_pytree_node( > torch.Tensor, > # (tensor) -> (children, metadata) > flatten_func=lambda tensor: ( > (tensor.cpu().numpy(),), > dict(dtype=tensor.dtype, device=tensor.device, requires_grad=tensor.requires_grad), > ), > # (metadata, children) -> tensor > unflatten_func=lambda metadata, children: torch.tensor(children[0], *metadata), > namespace='torch.torch2numpy', > ) > ``` > > ```python > >>> tree = {'weight': torch.ones(size=(1, 2)).cuda(), 'bias': torch.zeros(size=(2,))} > >>> tree > {'weight': tensor([[1., 1.]], device='cuda:0'), 'bias': tensor([0., 0.])} > > # Flatten without specifying the namespace > >>> tree_flatten(tree) # `torch.Tensor`s are leaf nodes > ([tensor([0., 0.]), tensor([[1., 1.]], device='cuda:0')], PyTreeSpec({'bias': , 'weight': })) > > # Flatten with the namespace > >>> leaves, treespec = optree.tree_flatten(tree, namespace='torch.torch2numpy') > >>> leaves, treespec > ( > [array([0., 0.], dtype=float32), array([[1., 1.]], dtype=float32)], > PyTreeSpec( > { > 'bias': CustomTreeNode(Tensor[{'dtype': torch.float32, 'device': device(type='cpu'), 'requires_grad': False}], []), > 'weight': CustomTreeNode(Tensor[{'dtype': torch.float32, 'device': device(type='cuda', index=0), 'requires_grad': False}], [*]) > }, > namespace='torch.torch2numpy' > ) > ) > > # `entries` are not defined and use `range(len(children))` > >>> optree.tree_paths(tree, namespace='torch.torch2numpy') > [('bias', 0), ('weight', 0)] > > # Unflatten back to a copy of the original object > >>> optree.tree_unflatten(treespec, leaves) > {'bias': tensor([0., 0.]), 'weight': tensor([[1., 1.]], device='cuda:0')} > ``` > > Check out section [Registering a Container-like Custom Type as Non-leaf Nodes](https://github.com/metaopt/optree#notes-about-the-pytree-type-registry) for more details. > > ### 3. Support both `None` as Non-leaf Node and `None` as Leaf > > In JAX's implementation, `None` is always an internal non-leaf node with an arity 0, which is like an empty tuple. This limits the usage of the JAX's pytree utilities for PyTorch. For example, the `nn.Module` uses `_parameters` and `_buffers` (`OrderedDict[str, Optional[Tensor]]`) to hold the tensors, while the value can be a tensor or `None`. > > `optree` supports both `None` as Non-leaf Node (JAX's default) and `None` as Leaf (PyTorch's default). Check out section [None is Non-leaf Node vs. None is Leaf](https://github.com/metaopt/optree#none-is-non-leaf-node-vs-none-is-leaf) for more details. > > ### 4. Some other improvements and bug fixes > > 1. Adds in-place version of treemap (`tree_map_`), which reduces redundant unflatten operation for better performance. > 2. Adds support for tree flatten and tree map with paths. (useful for `functorch` module extraction). > 3. Improves the JAX's pytree sorting support for `dict`s. > 4. Better string representation `repr(PyTreeSpec)`. > 5. Fixes some bugs for JAX's pytree of hashing, pickle serialization, segmentation fault for infinite recursion, and tree-compose/tree-transpose. From https://github.com/pytorch/pytorch/pull/92679#issuecomment-1398778481: > ```python > # pytree_make_fx_bench.py > import torch > from torch.fx.experimental.proxy_tensor import make_fx > import time > > def f(x): > for _ in range(10000): > x = x+x > return x > > import time > begin = time.time() > out = make_fx(f, tracing_mode="real")(torch.randn(20)) > begin = time.time() > print(f'tracing_mode="real" {time.time() - begin:.2f}') > out = make_fx(f, tracing_mode="fake")(torch.randn(20)) > print(f'tracing_mode="fake" {time.time() - begin:.2f}') > > out = make_fx(f, tracing_mode="symbolic")(torch.randn(20)) > print(f'tracing_mode="symbolic" {time.time() - begin:.2f}') > ``` > > This seems to run around 10-20% faster with the optree implementation: > > ``` > # Optree > python pytree_make_fx_bench.py > tracing_mode="real" 0.00 > tracing_mode="fake" 6.32 > tracing_mode="symbolic" 27.13 > ``` > > ``` > # torch.utils._pytree > python pytree_make_fx_bench.py > tracing_mode="real" 0.00 > tracing_mode="fake" 7.66 > tracing_mode="symbolic" 31.07 > ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/93139 Approved by: https://github.com/malfet	2023-09-18 21:24:56 +00:00
Bin Bao	0f646b1d15	[inductor] Add a C shim layer for libtorch (#109391 ) Summary: This PR adds a limited C shim layer for libtorch. The ultimate goal is to ban any direct reference to aten/c10 data structures or functions, to avoid ABI breakage by providing stable C interfaces. To make the review and landing easier, we broke the changes into several steps. In this PR (a combination of https://github.com/pytorch/pytorch/pull/109022 and https://github.com/pytorch/pytorch/pull/109351), we add C interfaces for certain libtorch functions and modify the wrapper codegen to generate calls to those interfaces. There are a few other items to be addressed in future PRs: * The AOTInductor runtime interface still takes lists of aten tensors as input and output * The interaction with ProxyExecutor (general fallback support) needs to move away from aten tensor * Remove all references to aten/c10 headers in the AOTInductor-generated code Differential Revision: D49302669 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109391 Approved by: https://github.com/chenyang78	2023-09-16 16:46:26 +00:00
Yu, Guangye	b1f21399c8	Prerequisite of ATen/native/utils header for C++ extension (#109013 ) # Motivate Without this PR, if we would like to include the header file like ```#include <ATen/native/ForeachUtils.h>``` in our C++ extension, it will raise a Error ```/home/xxx/torch/include/ATen/native/ForeachUtils.h:7:10: fatal error: 'ATen/native/utils/ParamsHash.h' file not found```. We should fix it. # Solution Add the ATen/native/utils header file in the build. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109013 Approved by: https://github.com/ezyang	2023-09-12 02:30:45 +00:00
Bin Bao	60bd30ee0b	[inductor] Move AOTInductor runtime headers (#108564 ) Summary: Move AOTInductor runtime header files into its own subdirectory, to separate them from to-be-added libtorch C interface. Reviewed By: frank-wei Differential Revision: D48905038 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108564 Approved by: https://github.com/frank-wei	2023-09-06 11:50:41 +00:00
Huy Do	4084d039b7	Only add triton dependency to CUDA and ROCm binaries if it hasn't been set as an installation requirement yet (#108424 ) The dependency was added twice before in CUDA and ROCm binaries, one as an installation dependency from builder and the later as an extra dependency for dynamo, for example: ``` Requires-Python: >=3.8.0 Description-Content-Type: text/markdown License-File: LICENSE License-File: NOTICE Requires-Dist: filelock Requires-Dist: typing-extensions Requires-Dist: sympy Requires-Dist: networkx Requires-Dist: jinja2 Requires-Dist: fsspec Requires-Dist: pytorch-triton (==2.1.0+e6216047b8) Provides-Extra: dynamo Requires-Dist: pytorch-triton (==2.1.0+e6216047b8) ; extra == 'dynamo' Requires-Dist: jinja2 ; extra == 'dynamo' Provides-Extra: opt-einsum Requires-Dist: opt-einsum (>=3.3) ; extra == 'opt-einsum' ``` In the previous release, we needed to remove this part from `setup.py` to build release binaries https://github.com/pytorch/pytorch/pull/96010. With this, that step isn't needed anymore because the dependency will come from builder. ### Testing Using the draft https://github.com/pytorch/pytorch/pull/108374 for testing and manually inspect the wheels artifact at https://github.com/pytorch/pytorch/actions/runs/6045878399 (don't want to go through all `ciflow/binaries` again) * torch-2.1.0.dev20230901+cu121-cp39-cp39-linux_x86_64 ``` Requires-Python: >=3.8.0 Description-Content-Type: text/markdown Requires-Dist: filelock Requires-Dist: typing-extensions Requires-Dist: sympy Requires-Dist: networkx Requires-Dist: jinja2 Requires-Dist: fsspec Requires-Dist: pytorch-triton (==2.1.0+e6216047b8) <-- This will be 2.1.0 on the release branch after https://github.com/pytorch/builder/pull/1515 Provides-Extra: dynamo Requires-Dist: jinja2 ; extra == 'dynamo' Provides-Extra: opt-einsum Requires-Dist: opt-einsum (>=3.3) ; extra == 'opt-einsum' ``` * torch-2.1.0.dev20230901+cu121.with.pypi.cudnn-cp39-cp39-linux_x86_64 ``` Requires-Python: >=3.8.0 Description-Content-Type: text/markdown Requires-Dist: filelock Requires-Dist: typing-extensions Requires-Dist: sympy Requires-Dist: networkx Requires-Dist: jinja2 Requires-Dist: fsspec Requires-Dist: pytorch-triton (==2.1.0+e6216047b8) Requires-Dist: nvidia-cuda-nvrtc-cu12 (==12.1.105) ; platform_system == "Linux" and platform_machine == "x86_64" Requires-Dist: nvidia-cuda-runtime-cu12 (==12.1.105) ; platform_system == "Linux" and platform_machine == "x86_64" Requires-Dist: nvidia-cuda-cupti-cu12 (==12.1.105) ; platform_system == "Linux" and platform_machine == "x86_64" Requires-Dist: nvidia-cudnn-cu12 (==8.9.2.26) ; platform_system == "Linux" and platform_machine == "x86_64" Requires-Dist: nvidia-cublas-cu12 (==12.1.3.1) ; platform_system == "Linux" and platform_machine == "x86_64" Requires-Dist: nvidia-cufft-cu12 (==11.0.2.54) ; platform_system == "Linux" and platform_machine == "x86_64" Requires-Dist: nvidia-curand-cu12 (==10.3.2.106) ; platform_system == "Linux" and platform_machine == "x86_64" Requires-Dist: nvidia-cusolver-cu12 (==11.4.5.107) ; platform_system == "Linux" and platform_machine == "x86_64" Requires-Dist: nvidia-cusparse-cu12 (==12.1.0.106) ; platform_system == "Linux" and platform_machine == "x86_64" Requires-Dist: nvidia-nccl-cu12 (==2.18.1) ; platform_system == "Linux" and platform_machine == "x86_64" Requires-Dist: nvidia-nvtx-cu12 (==12.1.105) ; platform_system == "Linux" and platform_machine == "x86_64" Requires-Dist: triton (==2.1.0) ; platform_system == "Linux" and platform_machine == "x86_64" <--This is 2.1.0 because it already has https://github.com/pytorch/pytorch/pull/108423, but the package doesn't exist yet atm Provides-Extra: dynamo Requires-Dist: jinja2 ; extra == 'dynamo' Provides-Extra: opt-einsum Requires-Dist: opt-einsum (>=3.3) ; extra == 'opt-einsum' ``` * torch-2.1.0.dev20230901+rocm5.6-cp38-cp38-linux_x86_64 ``` Requires-Python: >=3.8.0 Description-Content-Type: text/markdown Requires-Dist: filelock Requires-Dist: typing-extensions Requires-Dist: sympy Requires-Dist: networkx Requires-Dist: jinja2 Requires-Dist: fsspec Requires-Dist: pytorch-triton-rocm (==2.1.0+34f8189eae) <-- This will be 2.1.0 on the release branch after https://github.com/pytorch/builder/pull/1515 Provides-Extra: dynamo Requires-Dist: jinja2 ; extra == 'dynamo' Provides-Extra: opt-einsum Requires-Dist: opt-einsum (>=3.3) ; extra == 'opt-einsum' ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/108424 Approved by: https://github.com/atalman	2023-09-02 01:16:18 +00:00
drisspg	182a9cf366	Add Independent Memory Efficient and Flash Attention Build Flags (#107985 ) # Summary In an effort to simplify https://github.com/pytorch/pytorch/pull/105602, this PR pulls out independent chunks of code that can be landed prior to FlashV2 landing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107985 Approved by: https://github.com/cpuhrsch	2023-08-28 18:39:18 +00:00
PyTorch MergeBot	22cade56ba	Revert "[Reland] Upgrade NVTX to NVTX3 (#97582 )" This reverts commit `5bbfb96203`. Reverted https://github.com/pytorch/pytorch/pull/97582 on behalf of https://github.com/izaitsevfb due to Breaks meta RL builds ([comment](https://github.com/pytorch/pytorch/pull/97582#issuecomment-1679568525))	2023-08-15 20:55:12 +00:00
cyy	5bbfb96203	[Reland] Upgrade NVTX to NVTX3 (#97582 ) PR #90689 replaces NVTX with NVTX3. However, the torch::nvtoolsext is created only when the third party NVTX is used. This is clear a logical error. We now move the creation code out of the branch to cover all cases. This should fix the issues reported in the comments of #90689. It would be better to move configurations of the failed FRL jobs to CI tests so that we can find such issues early before merging. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97582 Approved by: https://github.com/peterbell10	2023-08-14 16:55:25 +00:00
shibo19	6691413145	export torch/csrc/dynamo/*.h (#106757 ) Fixes #ISSUE_NUMBER as title, we need the header files in torch/csrc/dynamo, so to export it. could you have a look? @albanD Pull Request resolved: https://github.com/pytorch/pytorch/pull/106757 Approved by: https://github.com/albanD	2023-08-09 03:57:49 +00:00
shibo19	26846546e8	export tools/autograd to torchgen package (#106663 ) Fixes #ISSUE_NUMBER as discussed here https://github.com/pytorch/pytorch/pull/105003, I have exported tools/autograd to torchgen package, and could you have a look? @zou3519 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106663 Approved by: https://github.com/zou3519	2023-08-07 16:14:51 +00:00
Jesse Cai	f81f9093ec	[core][pruning][feature] cuSPARSELt build integration (#103700 ) Summary: This stack of PR's integrates cuSPARSELt into PyTorch. This PR adds support for cuSPARSELt into the build process. It adds in a new flag, USE_CUSPARSELT that defaults to false. When USE_CUSPASRELT=1 is specified, the user can also specify CUSPASRELT_ROOT, which defines the path to the library. Compiling pytorch with cusparselt support can be done as follows: `` USE_CUSPARSELT=1 CUSPARSELT_ROOT=/path/to/cusparselt python setup.py develop ``` Test Plan: Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/103700 Approved by: https://github.com/albanD	2023-08-02 12:48:39 +00:00
Edward Z. Yang	f70844bec7	Enable UFMT on a bunch of low traffic Python files outside of main files (#106052 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106052 Approved by: https://github.com/albanD, https://github.com/Skylion007	2023-07-27 01:01:17 +00:00
Justin Chu	4cc1745b13	[BE] f-stringify torch/ and scripts (#105538 ) This PR is a follow up on the pyupgrade series to convert more strings to use f-strings using `flynt`. - https://docs.python.org/3/reference/lexical_analysis.html#f-strings - https://pypi.org/project/flynt/ Command used: ``` flynt torch/ -ll 120 flynt scripts/ -ll 120 flynt tools/ -ll 120 ``` and excluded `collect_env.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538 Approved by: https://github.com/ezyang, https://github.com/malfet	2023-07-21 19:35:24 +00:00
George White	803d58a408	Add TensorPipe header files to Python package (#105521 ) This change adds the TensorPipe header files to `torch_package_data` if `USE_DISTRIBUTED` is set to `ON` in the CMake cache. The TensorPipe library and CMake config is already available in the Torch wheel, but the headers are not. This resolves issue where out-of-tree backends could not implement TensorPipe converters, because the definition of the `tensorpipe::Message` struct is defined in the TensorPipe headers. Fixes #105224. Pull Request resolved: https://github.com/pytorch/pytorch/pull/105521 Approved by: https://github.com/albanD	2023-07-20 16:06:00 +00:00
Justin Chu	14d87bb5ff	[BE] Enable ruff's UP rules and autoformat tools and scripts (#105428 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105428 Approved by: https://github.com/albanD, https://github.com/soulitzer, https://github.com/malfet	2023-07-19 01:24:44 +00:00
Bin Bao	b10de43c0a	Add aot_inductor as a test backend for benchmarking (#105221 ) Summary: Original PR at https://github.com/pytorch/pytorch/pull/104977. Landing from fbcode instead. Add an aot_inductor backend (Export+AOTInductor) in the benchmarking harness. Note it is not a dynamo backend. Moved files from torch/_inductor/aot_inductor_include to torch/csrc/inductor as a more standard way for exposing headers Created a caching function in benchmarks/dynamo/common.py for compiling, loading and caching the .so file, as a proxy for a pure C++ deployment, but easier for benchmarking. Differential Revision: D47452591 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105221 Approved by: https://github.com/jansel	2023-07-18 13:16:36 +00:00
Bin Bao	528ab477ce	[reland][inductor] Register an op for mm_plus_mm (#105153 ) Summary: Reland https://github.com/pytorch/pytorch/pull/104835 after fixing internal build issues Test Plan: CI Differential Revision: D47442849 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105153 Approved by: https://github.com/clee2000	2023-07-14 14:35:29 +00:00
Catherine Lee	c36dca7bc5	Revert "[inductor] Register an op for mm_plus_mm (#104835 )" (#105150 ) This reverts commit `9c46a1620c`. Actual revert referenced in https://github.com/pytorch/pytorch/pull/105149 #104835 is causing internal builds to fail Pull Request resolved: https://github.com/pytorch/pytorch/pull/105150 Approved by: https://github.com/atalman	2023-07-13 17:13:45 +00:00
Bin Bao	9c46a1620c	[inductor] Register an op for mm_plus_mm (#104835 ) Summary: Currently the aten version of mm_plus_mm has no cpp implementation, and thus cpp_wrapper can not generate the correct cpp function call for it. Differential Revision: [D47372057](https://our.internmc.facebook.com/intern/diff/D47372057) Pull Request resolved: https://github.com/pytorch/pytorch/pull/104835 Approved by: https://github.com/jansel, https://github.com/SherlockNoMad	2023-07-12 02:34:02 +00:00
Edward Z. Yang	3dc4adc7a6	Don't build CUDA with debug info by default. (#102617 ) Fixes https://github.com/pytorch/pytorch/issues/102594 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/102617 Approved by: https://github.com/malfet	2023-07-05 20:16:19 +00:00
Xu Han	6c1ccccf21	Enable mimalloc on pytorch Windows (#102595 ) This PR is implemention of [#102534](https://github.com/pytorch/pytorch/issues/102534), option 2. Major changes: 1. Add mimalloc to the submodule. 2. Add build option "USE_MIMALLOC". 3. It is only enabled on Windows build, And it would improve pytorch memory allocation performance. Additional Test: <img width="953" alt="image" src="https://github.com/pytorch/pytorch/assets/8433590/4b2ec2dc-16f1-4ad9-b457-cfeb37e489d3"> This PR also build & static link mimalloc on Linux well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102595 Approved by: https://github.com/jgong5, https://github.com/malfet	2023-06-27 08:53:26 +00:00
Yang Chen	d2281e38ae	Adds the initial support for AOTInductor model and interface (#104202 ) This PR combines the C++ code for the AOTInductor's model and interface with Bin Bao's changes to AOTInductor codegen. It adds a number of AOTInductor C interfaces that can be used by an inference runtime. Under the hood of the interfaces, the model code generated by the AOTInductor's codegen is wrapped into a class, AOTInductorModel, which manages tensors and run the model inference. On top of AOTInductorModel, we provide one more abstract layer, AOTInductorModelContainer, which allows the user to have multiple inference runs concurrently for the same model. This PR also adjusts the compilation options for AOT codegen, particularly some fbcode-related changes such as libs to be linked and header-file search paths. Note that this is the very first version of the AOTInductor model and interface, so many features (e.g. dynamic shape) are incomplete. We will support those missing features in in future PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104202 Approved by: https://github.com/desertfire	2023-06-27 00:37:26 +00:00
PyTorch MergeBot	2c313e7b99	Revert "Record view stacks if running anomaly mode (#103185 )" This reverts commit `a02c573a89`. Reverted https://github.com/pytorch/pytorch/pull/103185 on behalf of https://github.com/izaitsevfb due to Breaks internal builds, see D46629734 ([comment](https://github.com/pytorch/pytorch/pull/103185#issuecomment-1588258206))	2023-06-12 23:52:10 +00:00
Edward Z. Yang	a02c573a89	Record view stacks if running anomaly mode (#103185 ) Now, when you do an inplace mutation and the view is naughty, you get this message: ``` RuntimeError: A view was created in no_grad mode and is being modified inplace with grad mode enabled. Given that this use case is ambiguous and error-prone, it is forbidden. You can clarify your code by moving both the view and the inplace either both inside the no_grad block (if you don't want the inplace to be tracked) or both outside (if you want the inplace to be tracked). To find out where this view was allocated, run your entire forward region under anomaly mode (torch.autograd.detect_anomaly(check_nan=False)). ``` When you run under anomaly mode, you get: ``` RuntimeError: A view was created in no_grad mode and is being modified inplace with grad mode enabled. Given that this use case is ambiguous and error-prone, it is forbidden. You can clarify your code by moving both the view and the inplace either both inside the no_grad block (if you don't want the inplace to be tracked) or both outside (if you want the inplace to be tracked). This view was allocated at: File "/data/users/ezyang/c/pytorch/test/test_autograd.py", line 4299, in arglebargle File "/data/users/ezyang/c/pytorch/test/test_autograd.py", line 4306, in test_anomaly_gives_view_stack File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/case.py", line 549, in _callTestMethod File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/case.py", line 591, in run File "/data/users/ezyang/c/pytorch/torch/testing/_internal/common_utils.py", line 2266, in _run_with_retry File "/data/users/ezyang/c/pytorch/torch/testing/_internal/common_utils.py", line 2337, in run File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/case.py", line 650, in __call__ File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/suite.py", line 122, in run File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/suite.py", line 84, in __call__ File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/suite.py", line 122, in run File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/suite.py", line 84, in __call__ File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/runner.py", line 184, in run File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/main.py", line 271, in runTests File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/main.py", line 101, in __init__ File "/data/users/ezyang/c/pytorch/torch/testing/_internal/common_utils.py", line 894, in run_tests File "/data/users/ezyang/c/pytorch/test/test_autograd.py", line 11209, in <module> ``` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103185 Approved by: https://github.com/zdevito	2023-06-09 16:56:28 +00:00
Li-Huai (Allan) Lin	3c0072e7c0	[MPS] Prerequisite for MPS C++ extension (#102483 ) in order to add mps kernels to torchvision codebase, we need to expose mps headers and allow objc++ files used in extensions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102483 Approved by: https://github.com/malfet	2023-06-07 17:28:31 +00:00
lkct	9567aaebe5	Package `torch/*.pyi` type hints (#103016 ) Including `torch._VF` and `torch.return_types` These are generated by: `4003e96ca1/tools/pyi/gen_pyi.py (L1139-L1155)` Ref #99541 Pull Request resolved: https://github.com/pytorch/pytorch/pull/103016 Approved by: https://github.com/Skylion007	2023-06-05 23:08:10 +00:00
Nikita Shulga	49d0d1d79f	Update XLA pin (#102446 ) Updating the pin to the same hash as https://github.com/pytorch/pytorch/pull/100922 On the XLA side, build have switch from CMake to bazel, which requires number of changes on PyTorch side: - Copy installed headers back to the `torch/` folder before starting the build - Install `torch/csrc/lazy/python/python_utils.h` - Define `LD_LIBRARY_PATH` TODO: - Enable bazel caching - Pass CXX11_ABI flag to `//test/cpp:all` to reuse build artifacts from `//:_XLAC.so` <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at cd4768b</samp> > _To fix the XLA tests that were failing_ > _We updated the submodule and scaling_ > _We added `python_util.h`_ > _And copied `torch` as well_ > _And set `LD_LIBRARY_PATH` for linking_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/102446 Approved by: https://github.com/huydhn	2023-06-01 02:04:07 +00:00
lantiankaikai	17166c2511	python_arg_parser to allow fake tensor element in symint_list when in dynamo mode #95424 (#97508 ) Failing mechanism on #95424 : In dynamo mode, when passing numpy.int_ to 'shape' like param (Sequence[Union[int, symint]]) is wrapped as list with FakeTensor. However, in python_arg_parser, parser expect int in symint_list but got FakeTensor. Following #85759, this PR allow tensor element in symint_list when in dynamo mode This PR also fix below test with similar failing mechanism pytest ./generated/test_huggingface_diffusers.py -k test_016 pytest ./generated/test_ustcml_RecStudio.py -k test_036 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/97508 Approved by: https://github.com/yanboliang	2023-05-31 19:19:17 +00:00
mikey dagitses	979f55d3bc	implementation of DataPtr context for copy-on-write tensors (#100818 ) implementation of DataPtr context for copy-on-write tensors Summary: Copy-on-write storage ===================== This library adds support for copy-on-write storage, i.e. lazy copies, to tensors. The design maintains the PyTorch invariant that tensors alias if and only if they share a storage. Thus, tensors that are lazy copies of one another will have distinct storages that share a data allocation. Thread-safety ------------- The correctness of this design hinges on the pre-existing PyTorch user requirement (and general default programming assumption) that users are responsible for guaranteeing that writes do not take places concurrently with reads and other writes. Lazily copied tensors add a complication to this programming model because users are not required to know if lazy copies exist and are not required to serialize writes across lazy copies. For example: two tensors with distinct storages that share a copy-on-write data context may be given to different threads that may do whatever they wish to them, and the runtime is required to guarantee its safety. It turns out that this is not that difficult to protect because, due to the copy-on-write requirement, we just need to materialize a tensor upon writing. This could be done entirely without synchronization if we materialized each copy, however, we have a common-sense optimization to elide the copy for the last remaining reference. This requires waiting for any pending copies. ### Thread-safety detailed design There are two operations that affect the copy-on-write details of a tensor: 1) lazy-clone (e.g. an explicit call or a hidden implementation detail added through an operator like reshape) 2) materialization (i.e. any write to the tensor) The key insight that we exploit is that lazy-clone is logically a read operation and materialization is logically a write operation. This means that, for a given set of tensors that share a storage, if materialization is taking place, no other read operation, including lazy-clone, can be concurrent with it. However, this insight only applies within a set of tensors that share a storage. We also have to be concerned with tensors with different storages that share a copy-on-write context. In this world, materialization can race with lazy-clone or even other materializations. _However_, in order for this to be the case, there must be _at least_ two references to the context. This means that the context _can not_ vanish out from under you if you are performing a lazy-clone, and hence, it only requires an atomic refcount bump. The most complicated case is that all lazy-copies are concurrently materializing. In this case, because a write is occurring, there are no in-flight lazy-copies taking place. We must simply ensure that all lazy-copies are able to materialize (read the data) concurrently. If we didn't have the aforementioned optimization where the last copy steals the data, we could get away with no locking whatsoever: each makes a copy and decrements the refcount. However, because of the optimization, we require the loser of the materializing race wait for the pending copies to finish, and then steal the data without copying it. We implement this by taking a shared lock when copying the data and taking an exclusive lock when stealing the data. The exclusive lock acquisition ensures that all pending shared locks are finished before we steal the data. Test Plan: 100% code coverage. --- Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/100818). * #100821 * #100820 * #100819 * __->__ #100818 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100818 Approved by: https://github.com/ezyang	2023-05-11 11:13:51 +00:00
Nikita Shulga	08ef92e711	Delete Python-2 checks from setup.py (#101112 ) <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at 557960b</samp> > _`Python 2` is gone_ > _PyTorch cleans up its code_ > _Winter of legacy_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/101112 Approved by: https://github.com/kit1980, https://github.com/albanD	2023-05-10 20:17:31 +00:00
Iris	466adab7c4	Add fsspec to PT setup.py (#99768 ) Follow up for https://github.com/pytorch/pytorch/pull/96532. Including this in setup.py so the package will be available for CI. Fsspec package size: ``` du -h /fsx/users/irisz/conda/envs/pytorch/lib/python3.9/site-packages/fsspec-2023.3.0-py3.9.egg 264K /fsx/users/irisz/conda/envs/pytorch/lib/python3.9/site-packages/fsspec-2023.3.0-py3.9.egg/fsspec/__pycache__ 58K /fsx/users/irisz/conda/envs/pytorch/lib/python3.9/site-packages/fsspec-2023.3.0-py3.9.egg/fsspec/implementations/__pycache__ 377K /fsx/users/irisz/conda/envs/pytorch/lib/python3.9/site-packages/fsspec-2023.3.0-py3.9.egg/fsspec/implementations 1017K /fsx/users/irisz/conda/envs/pytorch/lib/python3.9/site-packages/fsspec-2023.3.0-py3.9.egg/fsspec 96K /fsx/users/irisz/conda/envs/pytorch/lib/python3.9/site-packages/fsspec-2023.3.0-py3.9.egg/EGG-INFO 1.2M /fsx/users/irisz/conda/envs/pytorch/lib/python3.9/site-packages/fsspec-2023.3.0-py3.9.egg ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/99768 Approved by: https://github.com/kit1980	2023-04-25 01:34:08 +00:00
Nikita Shulga	32cd05ae60	Package `torch.fx` type hints (#99541 ) <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at ca3aab4</samp> > _`fx` module traced_ > _Symbolic graphs transformed_ > _Type stubs for winter_ Fixes https://github.com/pytorch/pytorch/issues/99530 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99541 Approved by: https://github.com/kit1980, https://github.com/Chillee	2023-04-19 22:00:07 +00:00
Jithun Nair	ce4df4cc59	Enable triton build in CI docker image for ROCm (#98096 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98096 Approved by: https://github.com/malfet	2023-04-11 09:02:19 +00:00
PyTorch MergeBot	cb3c478069	Revert "refactor(add privateuseone floder in aten/src/ATen): add a PrivateUse… (#98127 )" This reverts commit `5a537e291d`. Reverted https://github.com/pytorch/pytorch/pull/98127 on behalf of https://github.com/weiwangmeta due to Sorry, our internal code is not ready to take such changes	2023-04-08 05:32:21 +00:00
ykddd	5a537e291d	refactor(add privateuseone floder in aten/src/ATen): add a PrivateUse… (#98127 ) Add a PrivateUse1 folder to contain all the feature adaptations for PrivateUse1 under Aten,For example GetGeneratorPrivate which is used for the three-party backend to register his own Generator implementation.This makes it easier for us to centrally manage these features, and it will increase the convenience of adaptation for different back-end manufacturers. For more info: https://github.com/pytorch/pytorch/issues/98073 Pull Request resolved: https://github.com/pytorch/pytorch/pull/98127 Approved by: https://github.com/bdhirsh	2023-04-07 03:43:16 +00:00
jjsjann123	7282be3d91	Patch for nvfuser build (#97404 ) 1. Packaging nvfuser header for support c++ build against nvfuser; 2. Moving `#include <torch/csrc/jit/codegen/fuser/interface.h>` from `torch/csrc/jit/runtime/register_ops_utils.h` to `torch/csrc/jit/runtime/register_prim_ops_fulljit.cpp` to avoid missing header, since pytorch doesn't package `interface.h`; 3. Patching DynamicLibrary load of nvfuser to leak the handle, this avoids double de-allocation of `libnvfuser_codegen.so`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97404 Approved by: https://github.com/davidberard98	2023-03-28 23:36:08 +00:00
Han Qi (qihqi)	b895a0a675	[BE] Move flatbuffer related python C bindings to script_init (#97476 ) Summary: Extra C binding module for flatbuffer was introduced because not all dependencies of Pytorch want (or can) bundle in flatbuffer. However, flatbuffer is in by default now so this separate binding is not longer needed. Test Plan: existing unit tests Differential Revision: D44352583 Pull Request resolved: https://github.com/pytorch/pytorch/pull/97476 Approved by: https://github.com/dbort	2023-03-28 17:56:32 +00:00
PyTorch MergeBot	5170995b2a	Revert "Upgrade NVTX to NVTX3 (#90689 )" This reverts commit `e64ddd1ab9`. Reverted https://github.com/pytorch/pytorch/pull/90689 on behalf of https://github.com/osalpekar due to Build Failures due to not being able to find one nvtx3 header in FRL jobs: [D42332540](https://www.internalfb.com/diff/D42332540)	2023-03-24 18:16:06 +00:00
cyy	e64ddd1ab9	Upgrade NVTX to NVTX3 (#90689 ) Due to recent upgrade to CUDA 11, we can upgrade NVTX to NVTX3 as well, which is a header only library that can simplify the building system a lot. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90689 Approved by: https://github.com/soumith, https://github.com/malfet	2023-03-23 01:56:42 +00:00
Nikita Shulga	1ab883797a	[BE] Dedup hardcoded triton versions (#96580 ) Define it once in `.ci/docker/trition_version.txt` and use everywhere. Also, patch version defined in `triton/__init__.py` as currently it always returns `2.0.0` even if package name is `2.1.0` Followup after https://github.com/pytorch/pytorch/pull/95896 where version needed to be updated in 4+ places Pull Request resolved: https://github.com/pytorch/pytorch/pull/96580 Approved by: https://github.com/huydhn	2023-03-12 20:00:48 +00:00
PyTorch MergeBot	30b968f60d	Revert "[BE] Dedup hardcoded triton versions (#96580 )" This reverts commit `c131e51e62`. Reverted https://github.com/pytorch/pytorch/pull/96580 on behalf of https://github.com/malfet due to Forgot to fix lint	2023-03-12 19:37:52 +00:00
Nikita Shulga	c131e51e62	[BE] Dedup hardcoded triton versions (#96580 ) Define it once in `.ci/docker/trition_version.txt` and use everywhere. Also, patch version defined in `triton/__init__.py` as currently it always returns `2.0.0` even if package name is `2.1.0` Followup after https://github.com/pytorch/pytorch/pull/95896 where version needed to be updated in 4+ places Pull Request resolved: https://github.com/pytorch/pytorch/pull/96580 Approved by: https://github.com/huydhn	2023-03-12 16:56:04 +00:00
Natalia Gimelshein	76cac70939	new triton main pin (#95896 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/95896 Approved by: https://github.com/jansel, https://github.com/malfet	2023-03-10 06:30:41 +00:00
cyy	6786a24fd2	fix some tiny code issues (#95757 ) This PR tries to fix: 1. a misspelled NDEBUG preprocessing condition. 2. get ride of all writable-strings warnings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95757 Approved by: https://github.com/soulitzer	2023-03-01 23:27:32 +00:00
Wei Wang	46f092dc66	Add jinja2 as mandatory dependency (#95691 ) Should fix #95671 for nightly wheels issue. v2.0.0 RC does not need this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95691 Approved by: https://github.com/malfet	2023-03-01 17:28:55 +00:00
cyy	f27e09de04	Cleanup Windows warning suppression in CMake and fix some warnings in the source code (#94927 ) This PR do two things: 1. It moves some Windows warning suppression from various CMake files into the main CMakeList.txt, following the conventions of gcc and clang. 2. It fixes some Windows warnings in the source code. Most importantly, it fixes lots of dll warnings by adjusting C10_API to TORCH_API or TORCH_PYTHON_API. There are still some dll warnings because some TORCH_API functions are actually built as part of libtorch_python Pull Request resolved: https://github.com/pytorch/pytorch/pull/94927 Approved by: https://github.com/malfet	2023-02-27 19:22:20 +00:00
donnyyou	5d70ee93fa	Expose more headers for extensions. (#95447 ) Fixes #ISSUE_NUMBER Expose more headers for extensions of distributed methods. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95447 Approved by: https://github.com/ezyang	2023-02-27 18:59:40 +00:00
jjsjann123	21eb7f70f1	Nvfuser python API import fix (#94036 ) 1. Having nvfuser python API import working with both devel and upstream; 2. Add environment variable to allow custom nvfuser code base to be built with upstream pytorch core. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94036 Approved by: https://github.com/malfet, https://github.com/davidberard98	2023-02-16 20:10:40 +00:00
Douglas Lehr	77d1135566	[ROCm] Pyt 2.0 rocm staging (#94660 ) Add triton support for ROCm builds of PyTorch. * Enables inductor and dynamo when rocm is detected * Adds support for pytorch-triton-mlir backend * Adds check_rocm support for verify_dynamo.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/94660 Approved by: https://github.com/malfet	2023-02-15 06:15:18 +00:00
Wen Chen	69bcefceec	[ROCm] Added MIOpen header files to installation package for ROCm. (#92969 ) Added MIOpen header files to installation package for building Pytorch extensions that requires MIOpen as a dependency. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92969 Approved by: https://github.com/jeffdaily, https://github.com/malfet	2023-02-14 21:43:31 +00:00
Xuehai Pan	69e0bda999	[BE] Import `Literal`, `Protocol`, and `Final` from standard library `typing` as of Python 3.8+ (#94490 ) Changes: 1. `typing_extensions -> typing-extentions` in dependency. Use dash rather than underline to fit the [PEP 503: Normalized Names](https://peps.python.org/pep-0503/#normalized-names) convention. ```python import re def normalize(name): return re.sub(r"[-_.]+", "-", name).lower() ``` 2. Import `Literal`, `Protocal`, and `Final` from standard library as of Python 3.8+ 3. Replace `Union[Literal[XXX], Literal[YYY]]` to `Literal[XXX, YYY]`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94490 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-02-09 19:17:49 +00:00
Soumith Chintala	76b999803a	add filelock as a dependency (#91607 ) `filelock` is a dependency now for inductor's caching mechanism and CPU backend. Add `filelock` as a dependency Fixes https://github.com/pytorch/pytorch/issues/93499 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91607 Approved by: https://github.com/anijain2305, https://github.com/jansel	2023-02-01 17:30:55 +00:00
Nikita Shulga	5976f0bdfe	Set min supported Python version to 3.8 (#93155 ) Also, grep for `if sys.version_info .cond. (3, 8)` and replaces them with appropriate action. This is a last in a series of PRs that moved CI/CD away from testing PyTorch behavior against Python-3.7. Fixes https://github.com/pytorch/pytorch/issues/80513 Pull Request resolved: https://github.com/pytorch/pytorch/pull/93155 Approved by: https://github.com/huydhn	2023-01-29 18:28:46 +00:00
jjsjann123	c11b301bcd	[NVFUSER] refactor nvfuser build (#89621 ) This PR is the first step towards refactors the build for nvfuser in order to have the coegen being a standalone library. Contents inside this PR: 1. nvfuser code base has been moved to `./nvfuser`, from `./torch/csrc/jit/codegen/cuda/`, except for registration code for integration (interface.h/interface.cpp) 2. splits the build system so nvfuser is generating its own `.so` files. Currently there are: - `libnvfuser_codegen.so`, which contains the integration, codegen and runtime system of nvfuser - `nvfuser.so`, which is nvfuser's python API via pybind. Python frontend is now exposed via `nvfuser._C.XXX` instead of `torch._C._nvfuser` 3. nvfuser cpp tests is currently being compiled into `nvfuser_tests` 4. cmake is refactored so that: - nvfuser now has its own `CMakeLists.txt`, which is under `torch/csrc/jit/codegen/cuda/`. - nvfuser backend code is not compiled inside `libtorch_cuda_xxx` any more - nvfuser is added as a subdirectory under `./CMakeLists.txt` at the very end after torch is built. - since nvfuser has dependency on torch, the registration of nvfuser at runtime is done via dlopen (`at::DynamicLibrary`). This avoids circular dependency in cmake, which will be a nightmare to handle. For details, look at `torch/csrc/jit/codegen/cuda/interface.cpp::LoadingNvfuserLibrary` Future work that's scoped in following PR: - Currently since nvfuser codegen has dependency on torch, we need to refactor that out so we can move nvfuser into a submodule and not rely on dlopen to load the library. @malfet - Since we moved nvfuser into a cmake build, we effectively disabled bazel build for nvfuser. This could impact internal workload at Meta, so we need to put support back. cc'ing @vors Pull Request resolved: https://github.com/pytorch/pytorch/pull/89621 Approved by: https://github.com/davidberard98	2023-01-26 02:50:44 +00:00
Driss Guessous	4bc0491752	Add USE_FLASH_ATTENTION flag to setup.py (#92903 ) # Summary Adds documentation to setup.py for USE_FLASH_ATTENTION=0 disabling to decrease build times. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92903 Approved by: https://github.com/cpuhrsch, https://github.com/bdhirsh	2023-01-24 22:59:51 +00:00
Jason Ansel	7c1c239db1	[inductor] Rewrite Triton templates + epilogue fusion (retry) (#91575 ) This reverts commit `94262efc7d` to reland #91105 / #90738. Fixes https://github.com/pytorch/torchdynamo/issues/2015 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91575 Approved by: https://github.com/ngimel	2023-01-11 00:08:03 +00:00
Adrian Ostrowski	d0a4e2e782	Don't remove files across the whole OS on clean (#91503 ) setup.py clean now won't remove paths matching .gitignore patterns across the entire OS. Instead, now only files from the repository will be removed. `/build_*` had to be removed from .gitignore because with the wildcard fixed, build_variables.bzl file was deleted on cleanup. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91503 Approved by: https://github.com/soumith	2023-01-06 05:13:51 +00:00

1 2 3 4 5 ...

921 Commits