Summary:
This PR greatly simplifies `mypy-strict.ini` by strictly typing everything in `.github` and `tools`, rather than picking and choosing only specific files in those two dirs. It also removes `warn_unused_ignores` from `mypy-strict.ini`, for reasons described in https://github.com/pytorch/pytorch/pull/56402#issuecomment-822743795: basically, that setting makes life more difficult depending on what libraries you have installed locally vs in CI (e.g. `ruamel`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59117
Test Plan:
```
flake8
mypy --config mypy-strict.ini
```
Reviewed By: malfet
Differential Revision: D28765386
Pulled By: samestep
fbshipit-source-id: 3e744e301c7a464f8a2a2428fcdbad534e231f2e
Summary:
[distutils](https://docs.python.org/3/library/distutils.html) is on its way out and will be deprecated-on-import for Python 3.10+ and removed in Python 3.12 (see [PEP 632](https://www.python.org/dev/peps/pep-0632/)). There's no reason for us to keep it around since all the functionality we want from it can be found in `setuptools` / `sysconfig`. `setuptools` includes a copy of most of `distutils` (which is fine to use according to the PEP), that it uses under the hood, so this PR also uses that in some places.
Fixes#56527
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57040
Pulled By: driazati
Reviewed By: nikithamalgifb
Differential Revision: D28051356
fbshipit-source-id: 1ca312219032540e755593e50da0c9e23c62d720
Summary:
The Python traceback on a cmake invocation is meaningless to most developers, so this PR wraps it in a `try..catch` so we can ignore it and save scrolling through the 20-or-so lines.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55986
Pulled By: driazati
Reviewed By: wanchaol
Differential Revision: D27769304
fbshipit-source-id: 5889eea03db098d10576290abeeb4600029fb3f2
Summary:
We are trying to build libtorch statically (BUILD_SHARED_LIBS=OFF) then link it into a DLL. Our setup hits the infinite loop mentioned [here](54c05fa34e/torch/csrc/autograd/engine.cpp (L228)) because we build with `BUILD_SHARED_LIBS=OFF` but still link it all into a DLL at the end of the day.
This PR fixes the issue by changing the condition to guard on which windows runtime the build links against using the `CAFFE2_USE_MSVC_STATIC_RUNTIME` flag. `CAFFE2_USE_MSVC_STATIC_RUNTIME` defaults to ON when `BUILD_SHARED_LIBS=OFF`, so backwards compatibility is maintained.
I'm not entirely confident I understand the subtleties of the windows runtime versus linking setup, but this setup works for us and should not affect the existing builds.
Fixes https://github.com/pytorch/pytorch/issues/44470
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43532
Reviewed By: mrshenli
Differential Revision: D24053767
Pulled By: albanD
fbshipit-source-id: 1127fefe5104d302a4fc083106d4e9f48e50add8
Summary:
There is a module called `2to3` which you can target for future specifically to remove these, the directory of `caffe2` has the most redundant imports:
```2to3 -f future -w caffe2```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033
Reviewed By: seemethere
Differential Revision: D23808648
Pulled By: bugra
fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
Summary:
It is currently broken due to a ninja bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37917
Differential Revision: D21470357
Pulled By: ezyang
fbshipit-source-id: c0ed858c63a7504bf2c4961dd7ed906fc3f4502a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26304
Test procedure:
With ninja:
[x] Build a clean checkout
[x] Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s.
[x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding.
[x] Modify DispatchTable.h. Build again. Result: `.cu` files are rebuilt, as well as many `.cpp` files
[x] Build for the fifth time. Result: Virtually instantaneous, with no extra rebuilding.
[x] Touch one of the `.depend` files. Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s.
Without ninja:
[x] Build a clean checkout
[x] Build again. Result: There is some unnecessary rebuilding. But it was also happening before this change.
[x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37661
Differential Revision: D21434624
Pulled By: ezyang
fbshipit-source-id: 379d2315486b8bb5972c184f9b8da8e00d38c338
Summary:
In Summary specify whether CUDA code is compiled with separate compilation enabled
Also, correctly handle space-separate TORCH_NVCC_FLAGS when adding them to NVCC_CUDA_FLAGS
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35726
Test Plan: CI + local build with TORCH_NVCC_FLAGS set to "-Xfatbin -compress-all"
Differential Revision: D20830885
Pulled By: malfet
fbshipit-source-id: 0e0ecab4a97b6c8662a2c4bfc817857da9f32201
Summary:
## Motivation
This PR upgrades MKL-DNN from v0.20 to DNNL v1.2 and resolves https://github.com/pytorch/pytorch/issues/30300.
DNNL (Deep Neural Network Library) is the new brand of MKL-DNN, which improves performance, quality, and usability over the old version.
This PR focuses on the migration of all existing functionalities, including minor fixes, performance improvement and code clean up. It serves as the cornerstone of our future efforts to accommodate new features like OpenCL support, BF16 training, INT8 inference, etc. and to let the Pytorch community derive more benefits from the Intel Architecture.
<br>
## What's included?
Even DNNL has many breaking changes to the API, we managed to absorb most of them in ideep. This PR contains minimalist changes to the integration code in pytorch. Below is a summary of the changes:
<br>
**General:**
1. Replace op-level allocator with global-registered allocator
```
// before
ideep::sum::compute<AllocForMKLDNN>(scales, {x, y}, z);
// after
ideep::sum::compute(scales, {x, y}, z);
```
The allocator is now being registeted at `aten/src/ATen/native/mkldnn/IDeepRegistration.cpp`. Thereafter all tensors derived from the `cpu_engine` (by default) will use the c10 allocator.
```
RegisterEngineAllocator cpu_alloc(
ideep::engine::cpu_engine(),
[](size_t size) {
return c10::GetAllocator(c10::DeviceType::CPU)->raw_allocate(size);
},
[](void* p) {
c10::GetAllocator(c10::DeviceType::CPU)->raw_deallocate(p);
}
);
```
------
2. Simplify group convolution
We had such a scenario in convolution where ideep tensor shape mismatched aten tensor: when `groups > 1`, DNNL expects weights tensors to be 5-d with an extra group dimension, e.g. `goihw` instead of `oihw` in 2d conv case.
As shown below, a lot of extra checks came with this difference in shape before. Now we've completely hidden this difference in ideep and all tensors are going to align with pytorch's definition. So we could safely remove these checks from both aten and c2 integration code.
```
// aten/src/ATen/native/mkldnn/Conv.cpp
if (w.ndims() == x.ndims() + 1) {
AT_ASSERTM(
groups > 1,
"Only group _mkldnn_conv2d weights could have been reordered to 5d");
kernel_size[0] = w.get_dim(0) * w.get_dim(1);
std::copy_n(
w.get_dims().cbegin() + 2, x.ndims() - 1, kernel_size.begin() + 1);
} else {
std::copy_n(w.get_dims().cbegin(), x.ndims(), kernel_size.begin());
}
```
------
3. Enable DNNL built-in cache
Previously, we stored DNNL jitted kernels along with intermediate buffers inside ideep using an LRU cache. Now we are switching to the newly added DNNL built-in cache, and **no longer** caching buffers in order to reduce memory footprint.
This change will be mainly reflected in lower memory usage from memory profiling results. On the code side, we removed couple of lines of `op_key_` that depended on the ideep cache before.
------
4. Use 64-bit integer to denote dimensions
We changed the type of `ideep::dims` from `vector<int32_t>` to `vector<int64_t>`. This renders ideep dims no longer compatible with 32-bit dims used by caffe2. So we use something like `{stride_.begin(), stride_.end()}` to cast parameter `stride_` into a int64 vector.
<br>
**Misc changes in each commit:**
**Commit:** change build options
Some build options were slightly changed, mainly to avoid name collisions with other projects that include DNNL as a subproject. In addition, DNNL built-in cache is enabled by option `DNNL_ENABLE_PRIMITIVE_CACHE`.
Old | New
-- | --
WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES
WITH_TEST | MKLDNN_BUILD_TESTS
MKLDNN_THREADING | MKLDNN_CPU_RUNTIME
MKLDNN_USE_MKL | N/A (not use MKL anymore)
------
**Commit:** aten reintegration
- aten/src/ATen/native/mkldnn/BinaryOps.cpp
Implement binary ops using new operation `binary` provided by DNNL
- aten/src/ATen/native/mkldnn/Conv.cpp
Clean up group convolution checks
Simplify conv backward integration
- aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp
Simplify prepacking convolution weights
- test/test_mkldnn.py
Fixed an issue in conv2d unit test: it didn't check conv results between mkldnn and aten implementation before. Instead, it compared the mkldnn with mkldnn as the default cpu path will also go into mkldnn. Now we use `torch.backends.mkldnn.flags` to fix this issue
- torch/utils/mkldnn.py
Prepack weight tensor on module `__init__` to achieve better performance significantly
------
**Commit:** caffe2 reintegration
- caffe2/ideep/ideep_utils.h
Clean up unused type definitions
- caffe2/ideep/operators/adam_op.cc & caffe2/ideep/operators/momentum_sgd_op.cc
Unify tensor initialization with `ideep::tensor::init`. Obsolete `ideep::tensor::reinit`
- caffe2/ideep/operators/conv_op.cc & caffe2/ideep/operators/quantization/int8_conv_op.cc
Clean up group convolution checks
Revamp convolution API
- caffe2/ideep/operators/conv_transpose_op.cc
Clean up group convolution checks
Clean up deconv workaround code
------
**Commit:** custom allocator
- Register c10 allocator as mentioned above
<br><br>
## Performance
We tested inference on some common models based on user scenarios, and most performance numbers are either better than or on par with DNNL 0.20.
ratio: new / old | Latency (batch=1 4T) | Throughput (batch=64 56T)
-- | -- | --
pytorch resnet18 | 121.4% | 99.7%
pytorch resnet50 | 123.1% | 106.9%
pytorch resnext101_32x8d | 116.3% | 100.1%
pytorch resnext50_32x4d | 141.9% | 104.4%
pytorch mobilenet_v2 | 163.0% | 105.8%
caffe2 alexnet | 303.0% | 99.2%
caffe2 googlenet-v3 | 101.1% | 99.2%
caffe2 inception-v1 | 102.2% | 101.7%
caffe2 mobilenet-v1 | 356.1% | 253.7%
caffe2 resnet101 | 100.4% | 99.8%
caffe2 resnet152 | 99.8% | 99.8%
caffe2 shufflenet | 141.1% | 69.0% †
caffe2 squeezenet | 98.5% | 99.2%
caffe2 vgg16 | 136.8% | 100.6%
caffe2 googlenet-v3 int8 | 100.0% | 100.7%
caffe2 mobilenet-v1 int8 | 779.2% | 943.0%
caffe2 resnet50 int8 | 99.5% | 95.5%
_Configuration:
Platform: Skylake 8180
Latency Test: 4 threads, warmup 30, iteration 500, batch size 1
Throughput Test: 56 threads, warmup 30, iteration 200, batch size 64_
† Shufflenet is one of the few models that require temp buffers during inference. The performance degradation is an expected issue since we no longer cache any buffer in the ideep. As for the solution, we suggest users opt for caching allocator like **jemalloc** as a drop-in replacement for system allocator in such heavy workloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32422
Test Plan:
Perf results: https://our.intern.facebook.com/intern/fblearner/details/177790608?tab=Experiment%20Results
10% improvement for ResNext with avx512, neutral on avx2
More results: https://fb.quip.com/ob10AL0bCDXW#NNNACAUoHJP
Reviewed By: yinghai
Differential Revision: D20381325
Pulled By: dzhulgakov
fbshipit-source-id: 803b906fd89ed8b723c5fcab55039efe3e4bcb77
Summary:
## Several flags
`/MP[M]`: It is a flag for the compiler `cl`. It leads to object-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC.
`/maxcpucount:[M]`: It is a flag for the generator `msbuild`. It leads to project-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC.
`/p:CL_MPCount=[M]`: It is a flag for the generator `msbuild`. It leads the generator to pass `/MP[M]` to the compiler.
`/j[M]`: It is a flag for the generator `ninja`. It leads to object-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC.
## Reason for the change
1. Object-level multiprocessing is preferred over project-level multiprocessing.
2. ~For ninja, we don't need to set `/MP` otherwise M * M processes will be spawned.~ Actually, it is not correct because in ninja configs, there are only one source file in the command. Therefore, the `/MP` switch should be useless.
3. For msbuild, if it is called through Python configuration scripts, then `/p:CL_MPCount=[M]` will be added, otherwise, we add `/MP` to `CMAKE_CXX_FLAGS`.
4. ~It may be a possible fix for https://github.com/pytorch/pytorch/issues/28271, https://github.com/pytorch/pytorch/issues/27463 and https://github.com/pytorch/pytorch/issues/25393. Because `/MP` is also passed to `nvcc`.~ It is probably not true. Because `/MP` should not be effective given there is only one source file per command.
## Reference
1. https://docs.microsoft.com/en-us/cpp/build/reference/mp-build-with-multiple-processes?view=vs-2019
2. https://github.com/Microsoft/checkedc-clang/wiki/Parallel-builds-of-clang-on-Windows
3. https://blog.kitware.com/cmake-building-with-all-your-cores/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33120
Differential Revision: D19817227
Pulled By: ezyang
fbshipit-source-id: f8d01f835016971729c7a8d8a0d1cb8a8c2c6a5f
Summary:
- Add a "BUILD_JNI" option that enables building PyTorch JNI bindings and
fbjni. This is off by default because it adds a dependency on jni.h.
- Update to the latest fbjni so we can inhibit building its tests,
because they depend on gtest.
- Set JAVA_HOME and BUILD_JNI in Linux binary build configurations if we
can find jni.h in Docker.
Test Plan:
- Built on dev server.
- Verified that libpytorch_jni links after libtorch when both are built
in a parallel build.
Differential Revision: D18536828
fbshipit-source-id: 19cb3be8298d3619352d02bb9446ab802c27ec66
Summary:
Except for the Windows default path, everything it does has been done in
FindCUDA.cmake. Search for nvcc in path has been added to FindCUDA.cmake (https://github.com/pytorch/pytorch/issues/29160). The Windows default path part is moved to
build_pytorch_libs.py. CUDA_HOME is kept for now because other parts of
the build system is still using it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28617
Differential Revision: D18347814
Pulled By: ezyang
fbshipit-source-id: 22bb7eccc17b559ce3efc1ca964e3fbb270b5b0f
Summary:
FindCUDNN.cmake and cuda.cmake have done the detection. This commit deletes `tools/setup_helpers/cudnn.py` as it is no longer needed.
Previously in https://github.com/pytorch/pytorch/issues/25482, one test failed because TensorRT detects cuDNN differently, and there may be situations we can find cuDNN but TensorRT cannot. This is fixed by passing our detection result down to TensorRT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25876
Differential Revision: D17346270
Pulled By: ezyang
fbshipit-source-id: c1e7ad4a1cb20f964fe07a72906f2f002425d894
Summary:
What dist_check.py does is largely merely determining whether we should
use set "USE_IBVERBS" to ON or OFF when the user sets "USE_GLOO_IBVERBS"
to ON. But this is unnecessary, because this complicated determination
will always be overrided by gloo:
2101e02cea/cmake/Dependencies.cmake (L19-L28)
Since dist_check.py becomes irrelevant, this commit also simplifies the
setting of `USE_DISTRIBUTED` (by removing its explicit setting in Python scripts), and deprecate `USE_GLOO_IBVERBS` in favor
of `USE_IBVERBS`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25879
Differential Revision: D17282395
Pulled By: pietern
fbshipit-source-id: a10735f50728d89c3d81fd57bcd26764e7f84dd1
Summary:
FindCUDNN.cmake and cuda.cmake have done the detection. This commit deletes `tools/setup_helpers/cudnn.py` as it is no longer needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25482
Differential Revision: D17226408
Pulled By: ezyang
fbshipit-source-id: abd9cd0244cabea1f5d9f93f828d632d77c8dd5e
Summary:
It doesn't seem to be used anywhere once down to CMake in this repo or any submodules
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25720
Differential Revision: D17225088
Pulled By: pietern
fbshipit-source-id: a24b080e6346a203b345e2b834fe095e3b9aece0
Summary:
Which was added in https://github.com/pytorch/pytorch/issues/16412.
Also make some CUDNN_* CMake variables to be build options so as to avoid direct reading using `$ENV` from environment variables from CMake scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24044
Differential Revision: D16783426
Pulled By: ezyang
fbshipit-source-id: cb196b0013418d172d0d36558995a437bd4a3986
Summary:
Not sure whether 34c0043aae still makes sense.
`USE_SYSTEM_EIGEN_INSTALL` is OFF by default (as set in CMakeLists.txt). If a user wants to change this build option, I don't see any reason to force them to do it in `CMakeCache.txt`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23990
Differential Revision: D16732569
Pulled By: ezyang
fbshipit-source-id: 4604b4a1d5857552ad02e76aee91641aea48801a
Summary:
Currently when reading CMakeCache.txt, only `VAR:TYPE=VAL` can be matched.
This works well for CMake-generated lines, but a user may add a line
without specifying type (`VAR=VAL`), which is totally legitimate in the
eyes of CMake. This improvements in regex ensure that `VAR:TYPE=VAL` is
also matched. The situation of `"VAR":TYPE=VAL` is also corrected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23745
Differential Revision: D16726514
Pulled By: ezyang
fbshipit-source-id: 6c50150d58926563837cf77d156c24d644666ef0
Summary:
Simplifying https://github.com/pytorch/pytorch/issues/23793: The dependency relationship between
{INSTALL,BUILD}_TEST is already properly handled in CMakeLists.txt. All
we need to do is to pass down INSTALL_TEST.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23806
Differential Revision: D16691833
Pulled By: soumith
fbshipit-source-id: 7607492b2d82db3f79b174373a92e2810a854a61
Summary:
This allows `INSTALL_*` to pass through to cmake.
Additional fix is that if `INSTALL_TEST` is specified, it wont use `BUILD_TEST` as the default value for `INSTALL_TEST`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23793
Differential Revision: D16648668
Pulled By: soumith
fbshipit-source-id: 52c2a0d8033bc556355b87a6731a577940de9859
Summary:
- MSVC_Z7_OVERRIDE has already handled in CMakeLists.txt. No need to process it for once more in the Python scripts.
- Option MSVC_Z7_OVERRIDE should be visible to the user only if MSVC is used.
- Move the setting of "/EHa" flag to CMakeLists.txt, where other MSVC-specific flags are processed. This also further prepares the removal of redundant cflags setup in Python build scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23455
Differential Revision: D16542274
Pulled By: ezyang
fbshipit-source-id: 4d3b8b07161478bbba8a21feb6ea24c9024e21ac
Summary:
Instead, defer its default value to CMakeLists.txt
NO_FBGEMM has already been handled in tools/setup_helpers/env.py
(although deprecated)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23314
Differential Revision: D16493580
Pulled By: ezyang
fbshipit-source-id: 7255eb1df5e8a6dd0362507d68da0986a9ed46e2
Summary:
Currently the build type is decided by the environment variable DEBUG
and REL_WITH_DEB_INFO. This commit also lets CMAKE_BUILD_TYPE be
effective. This makes the interface more consistent with CMake. This
also prepares https://github.com/pytorch/pytorch/issues/22776.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22875
Differential Revision: D16281663
Pulled By: ezyang
fbshipit-source-id: 952f92aad85ff59f1c7abe8256eca8a4a0936026
Summary:
---
How does the current code subsume all detections in the deleted `nccl.py`?
- The dependency of `USE_NCCL` on the OS and `USE_CUDA` is handled as dependency options in `CMakeLists.txt`.
- The main NCCL detection happens in [FindNCCL.cmake](8377d4b32c/cmake/Modules/FindNCCL.cmake), which is called by [nccl.cmake](8377d4b32c/cmake/External/nccl.cmake). When `USE_SYSTEM_NCCL` is false, the previous Python code defer the detection to `find_package(NCCL)`. The change in `nccl.cmake` retains this.
- `USE_STATIC_NCCL` in the previous Python code simply changes the name of the detected library. This is done in `IF (USE_STATIC_NCCL)`.
- Now we only need to look at how the lines below line 20 in `nccl.cmake` are subsumed. These lines list paths to header and library directories that NCCL headers and libraries may reside in and try to search these directories for the key header and library files in turn. These are done by `find_path` for headers and `find_library` for the library files in `FindNCCL.cmake`.
* The call of [find_path](https://cmake.org/cmake/help/v3.8/command/find_path.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for headers in `<prefix>/include` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. Like the Python code, this commit sets `CMAKE_PREFIX_PATH` to search for `<prefix>` in `NCCL_ROOT_DIR` and home to CUDA. `CMAKE_SYSTEM_PREFIX_PATH` includes the standard directories such as `/usr/local` and `/usr`. `NCCL_INCLUDE_DIR` is also specifically handled.
* Similarly, the call of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for libraries in directories including `<prefix>/lib` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. But it also handles the edge cases intended to be solved in the Python code more properly:
- It only searches for `<prefix>/lib64` (and `<prefix>/lib32`) if it is appropriate on the system.
- It only searches for `<prefix>/lib/<arch>` for the right `<arch>`, unlike the Python code searches for `lib/<arch>` in a generic way (e.g., the Python code searches for `/usr/lib/x86_64-linux-gnu` but in reality systems have `/usr/lib/x86_64-some-customized-name-linux-gnu`, see https://unix.stackexchange.com/a/226180/38242 ).
---
Regarding for relevant issues:
- https://github.com/pytorch/pytorch/issues/12063 and https://github.com/pytorch/pytorch/issues/2877: These are properly handled, as explained in the updated comment.
- https://github.com/pytorch/pytorch/issues/2941 does not changes NCCL detection specifically for Windows (it changed CUDA detection).
- b7e258f81e A versioned library detection is added, but the order is reversed: The unversioned library becomes preferred. This is because normally unversioned libraries are linked to versioned libraries and preferred by users, and local installation by users are often unversioned. Like the document of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) suggests:
> When using this to specify names with and without a version suffix, we recommend specifying the unversioned name first so that locally-built packages can be found before those provided by distributions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22930
Differential Revision: D16440275
Pulled By: ezyang
fbshipit-source-id: 11fe80743d4fe89b1ed6f96d5d996496e8ec01aa
Summary:
Also revert the change of cmake.py in
c97829d701 . The comments are added to
prevent future similar incidents in the future (which has occurred a couple of times in the past).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22641
Differential Revision: D16171763
Pulled By: ezyang
fbshipit-source-id: 5a65f9fbb3c1c798ebd25521932bfde0ad3d16fc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22174
This is a preliminary change outlining the approach we plan to follow to integrate QNNPACK operators into the pytorch backend. The operators will not be made visible to the user in the python world, so ultimately we will have a function that calls qnnpack backend based on the environment being run on.
The goal of the project is to integrate QNNPACK library with PyTorch to achieve good performance for quantized mobile models.
Reviewed By: ljk53
Differential Revision: D15806325
fbshipit-source-id: c14e1d864ac94570333a7b14031ea231d095c2ae
Summary:
It's always set to equal USE_NCCL, we made Gloo depending on Caffe2 NCCL
build. See 30da84fbe1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22467
Differential Revision: D16098581
Pulled By: ezyang
fbshipit-source-id: f706ec7cebc2e6315bafca013b669f5a72e04815
Summary:
ROCm is already detected in cmake/public/LoadHIP.cmake. No need to
detect twice. Plus, the Python script read environment variable
ROCM_HOME, but what is really used in CMake scripts is ROCM_PATH -- A
user must specify both environment variables right. Since ROCM_HOME is
undocumented, this commit completely eradicates it.
---
ezyang A remake of https://github.com/pytorch/pytorch/issues/22228 because its dependency has been dismissed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22464
Differential Revision: D16096833
Pulled By: bddppq
fbshipit-source-id: fea461e80ee61ec77fa3a7b476f7aec4fc453d5d
Summary:
Currently the build system accepts USE_NAMEDTENSOR from the environment
variable and turns it into NAMEDTENSOR_ENABLED when passing to CMake.
This discrepancy does not seem necessary and complicates the build
system. The naming of this build option is also semantically incorrect
("BUILD_" vis-a-vis "USE_"). This commit eradicate this issue before it
is made into a stable release.
The support of NO_NAMEDTENSOR is also removed, since PyTorch has been
quite inconsistent about "NO_*" build options.
---
Note: All environment variables with their names starting with `BUILD_` are currently automatically passed to CMake with no need of an additional wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22360
Differential Revision: D16074509
Pulled By: zou3519
fbshipit-source-id: dc316287e26192118f3c99b945454bc50535b2ae
Summary:
`setup.py` recommends setting `USE_QNNPACK=0` and `USE_NNPACK=0` to disable building QNNPACK and NNPACK respectively. However this wasn't reflected correctly because we were looking for `NO_QNNPACK` and `NO_NNPACK`. This PR fixes it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22367
Differential Revision: D16067393
Pulled By: soumith
fbshipit-source-id: 6491865ade9a6d41b7a79d68fd586a7854051f28
Summary:
This is yet another step to disentangle Python build scripts and CMake
and improve their integration (Let CMake handle more build environment
detections, and less by our handcrafted Python scripts).
The processor detection logic also changed a bit: Instead of detecting
whether the system processor is PPC or ARM, this PR changes to detect
Intel CPUs, because this is more precise as MKL only supports Intel
CPUs. The build option `USE_MKLDNN` will also not be presented to
users on non-Intel processors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22215
Differential Revision: D16005953
Pulled By: ezyang
fbshipit-source-id: bf3f74d53609b3f835e280f63a872ff3c9352763
Summary:
Currently many build options are explicitly passed from Python build scripts to CMake. But this is unecessary, at least for many of them. This commit removes the build options that have the same name in CMakeLists.txt and environment variables (e.g., `USE_REDIS`). Additionally, many build options that are not explicitly passed to CMake are lost.
For `ONNX_ML`, `ONNX_NAMESPACE`, and `BUILDING_WITH_TORCH_LIBS`, I changed their default values in CMake scripts (as consistent with what the `CMake.defines` call meant), to avoid their default values being redundantly set in the Python build scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21877
Differential Revision: D15964996
Pulled By: ezyang
fbshipit-source-id: 127a46af7e2964885ffddce24e1a62995e0c5007
Summary:
Following up b811b6d5c0
* Use property instead of __setattr__ in CMake.
* Add a comment clarifying when built_ext.run is called.
---
cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21792
Differential Revision: D15860606
Pulled By: umanwizard
fbshipit-source-id: ba1fa07f58d4eac81ac27fa9dc7115d1cdd3dec0
Summary:
Currently when building extensions, variables such as USE_CUDA, USE_CUDNN are used to determine what libraries should be linked. But we should use what CMake has detected, because:
1. If CMake found them unavailable but the variables say some libraries should be linked, the build would fail.
2. If the first build is made using a set of non-default build options, rebuild must have these option passed to setup.py again, otherwise the extension build process is inconsistent with CMake. For example,
```bash
# First build
USE_CUDA=0 python setup.py install
# Subsequent builds like this would fail, unless "build/" is deleted
python setup.py install
```
This commit addresses the above issues by using variables from CMakeCache.txt when building the extensions.
---
The changes in `setup.py` may look lengthy, but the biggest changed block is mostly moving them into a function `configure_extension_build` (along with some variable names changed to `cmake_cache_vars['variable name']` and other minor changes), because it must be called after CMake has been called (and thus the options used and system environment detected by CMake become available).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21653
Differential Revision: D15824506
Pulled By: ezyang
fbshipit-source-id: 1e1eb7eec7debba30738f65472ccad966ee74028
Summary:
Fixes#21026.
1. Improve build docs for Windows
2. Change `BUILD_SHARED_LIBS=ON` for Caffe2 local builds
3. Change to out-source builds for LibTorch and Caffe2 (transferred to #21452)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21190
Differential Revision: D15695223
Pulled By: ezyang
fbshipit-source-id: 0ad69d7553a40fe627582c8e0dcf655f6f63bfdf
Summary:
Currently tools/build_pytorch_libs.py looks quite convoluted. This commit reorgnizes cmake related functions to a separate file to make the code clearer.
---
This is hopefully helpful for further contribution for better integration with cmake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21367
Differential Revision: D15636991
Pulled By: soumith
fbshipit-source-id: 44d76e4e77aec0ce33cb32962b6a79a7f82785da