Commit Graph

81 Commits

Author SHA1 Message Date
Nikita Shulga
c373387709 Update CMake and use native CUDA language support (#62445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445

PyTorch currently uses the old style of compiling CUDA in CMake which is just a
bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as
a language just like C++ or C.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31503350

fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55
2021-10-11 09:05:48 -07:00
Nikita Shulga
923f06621c Fix Windows ninja builds when MAX_JOBS is specified (#65444)
Summary:
Reported by cloudhan in https://github.com/pytorch/pytorch/pull/64733#issuecomment-924545463

Fixes regression introduced by 047e68235f

cc malfet seemethere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65444

Reviewed By: dagitses, seemethere

Differential Revision: D31103260

Pulled By: malfet

fbshipit-source-id: 9d5454a64cb8a0b96264119cf16582cc5afed284
2021-09-22 14:04:31 -07:00
Michael Dagitses
047e68235f delegate parallelism to Ninja when possible (#64733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64733

The previous implementation was wrong when CPU scheduling affinity is
set. In fact, it is still wrong if Ninja is not being used.

When there is CPU scheduling affinity set, the number of processors
available on the system likely exceeds the number of processors that
are usable to the build. We ought to use
`len(os.sched_getaffinity(0))` to determine the effective parallelism.

This change is more minimal and instead just delegates to Ninja (which
handles this correctly) when it is used.

Test Plan:
I verified this worked as correctly using Ninja on a 96-core machine
with 24 cores available for scheduling by checking:
 * the cmake command did not specify "-j"
 * the number of top-level jobs in top/pstree never exceeded 26 (24 +
   2)

And I verified we get the legacy behavior by specifying USE_NINJA=0 on
the build.

Reviewed By: jbschlosser, driazati

Differential Revision: D30968796

Pulled By: dagitses

fbshipit-source-id: 29547dd378fea793957bcc2f7d52d5def1ecace2
2021-09-17 12:28:28 -07:00
XiaobingSuper
0561e104d9 fix build error when system cmake3 version >=3.5 but <=3.10 (#64914)
Summary:
For PyTorch source build using conda, there will raise an error in 8535418a06/CMakeLists.txt (L1) when we get a CMake version < 3.10, it can be fixed by upgrade CMake in conda env, but for centos, there has CMake3, PyTorch fist check whether CMake3's verison<=3.5, so if user's system camke<= 3.5, PyTorch will use the system's cmake3, which will have build error like:
```
CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
  CMake 3.10 or higher is required.  You are running version 3.6.3

-- Configuring incomplete, errors occurred!
```

we need to check CMake3 also >=3.10, if not, then check conda's CMake version.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64914

Reviewed By: jbschlosser

Differential Revision: D30901673

Pulled By: ezyang

fbshipit-source-id: 064e2c5bc0b9331d6ecd65cd700e5a42c3403790
2021-09-13 13:26:06 -07:00
Can Balioglu
e711b5ce6c Respect user-set CMAKE_PREFIX_PATH (#61904)
Summary:
Fixes the case where the `CMAKE_PREFIX_PATH` variable gets silently overwritten by a user specified environment variable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61904

Reviewed By: walterddr, malfet

Differential Revision: D29792014

Pulled By: cbalioglu

fbshipit-source-id: babacc8d5a1490bff1e14247850cc00c6ba9e6be
2021-08-13 13:49:05 -07:00
peterjc123
d16587f84d Enable rebuilds for Ninja on Windows (#62948)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59859.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62948

Reviewed By: seemethere, tktrungna

Differential Revision: D30192246

Pulled By: janeyx99

fbshipit-source-id: af25cc4bf0db67a1304d9971cfa0ff6831bb3b48
2021-08-09 16:15:45 -07:00
Isuru Fernando
273188549f pass through *EXITCODE *EXITCODE__TRYRUN_OUTPUT variables (#49646)
Summary:
This is needed to allow cross compiling to work

There are some `try_run` statements in CMake files used for building pytorch and dependencies. Since we are cross compiling, there's no way to run the compiled executables to get the output for `try_run` function. CMake provides a solution to this by requiring the user to manually provide the exitcode and the output of the executable which should be given by `*EXITCODE` and `*EXITCODE__TRYRUN_OUTPUT` respectively.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49646

Reviewed By: heitorschueroff

Differential Revision: D29960301

Pulled By: malfet

fbshipit-source-id: b10ab9c182d1220f7e1911f922e7db261d521145
2021-07-30 08:22:33 -07:00
Yi Zhang
cc32dcadd9 Fix Error when run python setup.py install again on Windows (#59689)
Summary:
Fix https://github.com/pytorch/pytorch/issues/59688

So far, .build.ninja should be removed before building the source code on Windows at any time

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59689

Reviewed By: bdhirsh

Differential Revision: D29032960

Pulled By: walterddr

fbshipit-source-id: 2b8162cd119820d3b6d8715745ec29b9c381e01f
2021-06-10 12:22:21 -07:00
Sam Estep
737d920b21 Strictly type everything in .github and tools (#59117)
Summary:
This PR greatly simplifies `mypy-strict.ini` by strictly typing everything in `.github` and `tools`, rather than picking and choosing only specific files in those two dirs. It also removes `warn_unused_ignores` from `mypy-strict.ini`, for reasons described in https://github.com/pytorch/pytorch/pull/56402#issuecomment-822743795: basically, that setting makes life more difficult depending on what libraries you have installed locally vs in CI (e.g. `ruamel`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59117

Test Plan:
```
flake8
mypy --config mypy-strict.ini
```

Reviewed By: malfet

Differential Revision: D28765386

Pulled By: samestep

fbshipit-source-id: 3e744e301c7a464f8a2a2428fcdbad534e231f2e
2021-06-07 14:49:36 -07:00
Your Name
65748f81c9 Un-verbose the build (#59235)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59235

Reviewed By: zou3519

Differential Revision: D28792468

Pulled By: driazati

fbshipit-source-id: 98f730ea0ee28b4b5c13198879bee8f586c0c14c
2021-06-01 10:14:26 -07:00
Jane Xu
29487ac7ff Add 11.3 binaries without conda (#58877)
Summary:
Tested specifically for cuda 11.3 in https://github.com/pytorch/pytorch/pull/57522.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58877

Reviewed By: walterddr

Differential Revision: D28697703

Pulled By: janeyx99

fbshipit-source-id: 08ae7f7d023cb93e47a2e0a4f115cee9e8a6156a
2021-05-26 12:40:13 -07:00
davidriazati@fb.com
4b96fc060b Remove distutils (#57040)
Summary:
[distutils](https://docs.python.org/3/library/distutils.html) is on its way out and will be deprecated-on-import for Python 3.10+ and removed in Python 3.12 (see [PEP 632](https://www.python.org/dev/peps/pep-0632/)). There's no reason for us to keep it around since all the functionality we want from it can be found in `setuptools` / `sysconfig`. `setuptools` includes a copy of most of `distutils` (which is fine to use according to the PEP), that it uses under the hood, so this PR also uses that in some places.

Fixes #56527
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57040

Pulled By: driazati

Reviewed By: nikithamalgifb

Differential Revision: D28051356

fbshipit-source-id: 1ca312219032540e755593e50da0c9e23c62d720
2021-04-29 12:10:11 -07:00
Pavel Belevich
dde2bc4818 Add OPENSSL_ROOT_DIR to cmake.py (#56846)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56846

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D27992923

Pulled By: pbelevich

fbshipit-source-id: dc2d26d4bc9d17a5da441ae4db8241609ca97c6e
2021-04-25 20:14:56 -07:00
davidriazati@fb.com
48a7d69946 Catch and ignore tracebacks for compilation errors (#55986)
Summary:
The Python traceback on a cmake invocation is meaningless to most developers, so this PR wraps it in a `try..catch` so we can ignore it and save scrolling through the 20-or-so lines.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55986

Pulled By: driazati

Reviewed By: wanchaol

Differential Revision: D27769304

fbshipit-source-id: 5889eea03db098d10576290abeeb4600029fb3f2
2021-04-14 13:05:27 -07:00
Richard Barnes
a5339b9d7c Drop unused imports from leftovers (#49953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49953

From
```
./python/libcst/libcst codemod remove_unused_imports.RemoveUnusedImportsWithGlean --no-format caffe2/
```

Test Plan: Standard sandcastle tests

Reviewed By: xush6528

Differential Revision: D25727348

fbshipit-source-id: b3feef80b9b4b535f1bd4060dace5b1a50bd5e69
2021-01-04 16:31:48 -08:00
Samuel Marks
e6779d4357 [*.py] Rename "Arguments:" to "Args:" (#49736)
Summary:
I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings.

```sh
(pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do
    printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" | paste -s -d+ -- | bc)"; done
Args:      1095
Arguments: 0336
```

It is easy enough to extend my parsers to support both variants, however it looks like `Arguments:` is wrong anyway, as per:

  - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md)

  - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md)

  - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst)

Therefore, only `Args:` is valid. This PR replaces them throughout the codebase.

PS: For related PRs, see tensorflow/tensorflow/pull/45420

PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736

Reviewed By: albanD

Differential Revision: D25710534

Pulled By: soumith

fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619
2020-12-28 09:34:47 -08:00
Nikita Shulga
2dff0b3e91 Fix typos in comments (#48316)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48316

Reviewed By: walterddr, mrshenli

Differential Revision: D25125123

Pulled By: malfet

fbshipit-source-id: 6f31e5456cc078cc61b288191f1933711acebba0
2020-11-24 10:56:40 -08:00
Abaho Katabarwa
de3a48013a Use CAFFE2_USE_MSVC_STATIC_RUNTIME to determine when to avoid waiting for global destructors on Windows (#43532)
Summary:
We are trying to build libtorch statically (BUILD_SHARED_LIBS=OFF) then link it into a DLL. Our setup hits the infinite loop mentioned [here](54c05fa34e/torch/csrc/autograd/engine.cpp (L228)) because we build with `BUILD_SHARED_LIBS=OFF` but still link it all into a DLL at the end of the day.

This PR fixes the issue by changing the condition to guard on which windows runtime the build links against using the `CAFFE2_USE_MSVC_STATIC_RUNTIME` flag. `CAFFE2_USE_MSVC_STATIC_RUNTIME` defaults to ON when `BUILD_SHARED_LIBS=OFF`, so backwards compatibility is maintained.

I'm not entirely confident I understand the subtleties of the windows runtime versus linking setup, but this setup works for us and should not affect the existing builds.

Fixes https://github.com/pytorch/pytorch/issues/44470

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43532

Reviewed By: mrshenli

Differential Revision: D24053767

Pulled By: albanD

fbshipit-source-id: 1127fefe5104d302a4fc083106d4e9f48e50add8
2020-10-01 16:41:14 -07:00
Bugra Akyildiz
27c7158166 Remove __future__ imports for legacy Python2 supports (#45033)
Summary:
There is a module called `2to3` which you can target for future specifically to remove these, the directory of `caffe2` has the most redundant imports:

```2to3 -f future -w caffe2```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033

Reviewed By: seemethere

Differential Revision: D23808648

Pulled By: bugra

fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
2020-09-23 17:57:02 -07:00
guol-fnst
e1afa9daff fix cmake bug (#39930)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39930

Differential Revision: D22391207

Pulled By: ezyang

fbshipit-source-id: bde19a112846e124d4e5316ba947f48d4dccf361
2020-07-06 08:02:30 -07:00
peter
bfa5070cbc Fix rebuild with Ninja on Windows (#37917)
Summary:
It is currently broken due to a ninja bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37917

Differential Revision: D21470357

Pulled By: ezyang

fbshipit-source-id: c0ed858c63a7504bf2c4961dd7ed906fc3f4502a
2020-05-07 19:15:27 -07:00
Wojciech Baranowski
945672bf3e cmake: improve dependencies in incremental builds (#37661)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26304

Test procedure:
With ninja:
[x] Build a clean checkout
[x] Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s.
[x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding.
[x] Modify DispatchTable.h. Build again. Result: `.cu` files are rebuilt, as well as many `.cpp` files
[x] Build for the fifth time. Result: Virtually instantaneous, with no extra rebuilding.
[x] Touch one of the `.depend` files. Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s.

Without ninja:
[x] Build a clean checkout
[x] Build again. Result: There is some unnecessary rebuilding. But it was also happening before this change.
[x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37661

Differential Revision: D21434624

Pulled By: ezyang

fbshipit-source-id: 379d2315486b8bb5972c184f9b8da8e00d38c338
2020-05-06 14:25:18 -07:00
Nikita Shulga
e2adcc1c53 Report CUDA separate compilation flag (#35726)
Summary:
In Summary specify whether CUDA code is compiled with separate compilation enabled

Also, correctly handle space-separate TORCH_NVCC_FLAGS when adding them to NVCC_CUDA_FLAGS
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35726

Test Plan: CI + local build with TORCH_NVCC_FLAGS set to "-Xfatbin -compress-all"

Differential Revision: D20830885

Pulled By: malfet

fbshipit-source-id: 0e0ecab4a97b6c8662a2c4bfc817857da9f32201
2020-04-02 19:35:02 -07:00
pinzhenx
bd604cb5b7 Upgrade MKL-DNN to DNNL v1.2 (#32422)
Summary:
## Motivation

This PR upgrades MKL-DNN from v0.20 to DNNL v1.2 and resolves https://github.com/pytorch/pytorch/issues/30300.

DNNL (Deep Neural Network Library) is the new brand of MKL-DNN, which improves performance, quality, and usability over the old version.

This PR focuses on the migration of all existing functionalities, including minor fixes, performance improvement and code clean up. It serves as the cornerstone of our future efforts to accommodate new features like OpenCL support, BF16 training, INT8 inference, etc. and to let the Pytorch community derive more benefits from the Intel Architecture.

<br>

## What's included?

Even DNNL has many breaking changes to the API, we managed to absorb most of them in ideep. This PR contains minimalist changes to the integration code in pytorch. Below is a summary of the changes:

<br>

**General:**

1. Replace op-level allocator with global-registered allocator

```
// before
ideep::sum::compute<AllocForMKLDNN>(scales, {x, y}, z);

// after
ideep::sum::compute(scales, {x, y}, z);
```

The allocator is now being registeted at `aten/src/ATen/native/mkldnn/IDeepRegistration.cpp`. Thereafter all tensors derived from the `cpu_engine` (by default) will use the c10 allocator.

```
RegisterEngineAllocator cpu_alloc(
  ideep::engine::cpu_engine(),
  [](size_t size) {
    return c10::GetAllocator(c10::DeviceType::CPU)->raw_allocate(size);
  },
  [](void* p) {
    c10::GetAllocator(c10::DeviceType::CPU)->raw_deallocate(p);
  }
);
```
------

2. Simplify group convolution

We had such a scenario in convolution where ideep tensor shape mismatched aten tensor: when `groups > 1`, DNNL expects weights tensors to be 5-d with an extra group dimension, e.g. `goihw` instead of `oihw` in 2d conv case.

As shown below, a lot of extra checks came with this difference in shape before. Now we've completely hidden this difference in ideep and all tensors are going to align with pytorch's definition. So we could safely remove these checks from both aten and c2 integration code.

```
// aten/src/ATen/native/mkldnn/Conv.cpp

if (w.ndims() == x.ndims() + 1) {
  AT_ASSERTM(
      groups > 1,
      "Only group _mkldnn_conv2d weights could have been reordered to 5d");
  kernel_size[0] = w.get_dim(0) * w.get_dim(1);
  std::copy_n(
      w.get_dims().cbegin() + 2, x.ndims() - 1, kernel_size.begin() + 1);
} else {
  std::copy_n(w.get_dims().cbegin(), x.ndims(), kernel_size.begin());
}
```

------

3. Enable DNNL built-in cache

Previously, we stored DNNL jitted kernels along with intermediate buffers inside ideep using an LRU cache. Now we are switching to the newly added DNNL built-in cache, and **no longer** caching buffers in order to reduce memory footprint.

This change will be mainly reflected in lower memory usage from memory profiling results. On the code side, we removed couple of lines of `op_key_` that depended on the ideep cache before.

------

4. Use 64-bit integer to denote dimensions

We changed the type of `ideep::dims` from `vector<int32_t>` to `vector<int64_t>`. This renders ideep dims no longer compatible with 32-bit dims used by caffe2. So we use something like `{stride_.begin(), stride_.end()}` to cast parameter `stride_` into a int64 vector.

<br>

**Misc changes in each commit:**

**Commit:** change build options

Some build options were slightly changed, mainly to avoid name collisions with other projects that include DNNL as a subproject. In addition, DNNL built-in cache is enabled by option `DNNL_ENABLE_PRIMITIVE_CACHE`.

Old | New
-- | --
WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES
WITH_TEST | MKLDNN_BUILD_TESTS
MKLDNN_THREADING | MKLDNN_CPU_RUNTIME
MKLDNN_USE_MKL | N/A (not use MKL anymore)

------

**Commit:** aten reintegration

- aten/src/ATen/native/mkldnn/BinaryOps.cpp

    Implement binary ops using new operation `binary` provided by DNNL

- aten/src/ATen/native/mkldnn/Conv.cpp

    Clean up group convolution checks
    Simplify conv backward integration

- aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp

    Simplify prepacking convolution weights

- test/test_mkldnn.py

    Fixed an issue in conv2d unit test: it didn't check conv results between mkldnn and aten implementation before. Instead, it compared the mkldnn with mkldnn as the default cpu path will also go into mkldnn. Now we use `torch.backends.mkldnn.flags` to fix this issue

- torch/utils/mkldnn.py

    Prepack weight tensor on module `__init__` to achieve better performance significantly

------

**Commit:** caffe2 reintegration

- caffe2/ideep/ideep_utils.h

    Clean up unused type definitions

- caffe2/ideep/operators/adam_op.cc & caffe2/ideep/operators/momentum_sgd_op.cc

   Unify tensor initialization with `ideep::tensor::init`. Obsolete `ideep::tensor::reinit`

- caffe2/ideep/operators/conv_op.cc & caffe2/ideep/operators/quantization/int8_conv_op.cc

    Clean up group convolution checks
    Revamp convolution API

- caffe2/ideep/operators/conv_transpose_op.cc

    Clean up group convolution checks
    Clean up deconv workaround code

------

**Commit:** custom allocator

- Register c10 allocator as mentioned above

<br><br>

## Performance

We tested inference on some common models based on user scenarios, and most performance numbers are either better than or on par with DNNL 0.20.

ratio: new / old | Latency (batch=1 4T) | Throughput (batch=64 56T)
-- | -- | --
pytorch resnet18 | 121.4% | 99.7%
pytorch resnet50 | 123.1% | 106.9%
pytorch resnext101_32x8d | 116.3% | 100.1%
pytorch resnext50_32x4d | 141.9% | 104.4%
pytorch mobilenet_v2 | 163.0% | 105.8%
caffe2 alexnet | 303.0% | 99.2%
caffe2 googlenet-v3 | 101.1% | 99.2%
caffe2 inception-v1 | 102.2% | 101.7%
caffe2 mobilenet-v1 | 356.1% | 253.7%
caffe2 resnet101 | 100.4% | 99.8%
caffe2 resnet152 | 99.8% | 99.8%
caffe2 shufflenet | 141.1% | 69.0% †
caffe2 squeezenet | 98.5% | 99.2%
caffe2 vgg16 | 136.8% | 100.6%
caffe2 googlenet-v3 int8 | 100.0% | 100.7%
caffe2 mobilenet-v1 int8 | 779.2% | 943.0%
caffe2 resnet50 int8 | 99.5% | 95.5%

_Configuration:
Platform: Skylake 8180
Latency Test: 4 threads, warmup 30, iteration 500, batch size 1
Throughput Test: 56 threads, warmup 30, iteration 200, batch size 64_

† Shufflenet is one of the few models that require temp buffers during inference. The performance degradation is an expected issue since we no longer cache any buffer in the ideep. As for the solution, we suggest users opt for caching allocator like **jemalloc** as a drop-in replacement for system allocator in such heavy workloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32422

Test Plan:
Perf results: https://our.intern.facebook.com/intern/fblearner/details/177790608?tab=Experiment%20Results

10% improvement for ResNext with avx512, neutral on avx2

More results: https://fb.quip.com/ob10AL0bCDXW#NNNACAUoHJP

Reviewed By: yinghai

Differential Revision: D20381325

Pulled By: dzhulgakov

fbshipit-source-id: 803b906fd89ed8b723c5fcab55039efe3e4bcb77
2020-03-26 22:07:59 -07:00
cyy
5be8a4e027 find mkl installed by nuget (#34031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34031

Differential Revision: D20221807

Pulled By: ezyang

fbshipit-source-id: 827e2775956f408febb287676bbf9a96a70fe2d4
2020-03-03 07:44:20 -08:00
Hong Xu
f255b7a3ac Drop support of the build option USE_GLOO_IBVERBS (#33163)
Summary:
Two releases have passed since its deprecation:
8a026d4f74
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33163

Differential Revision: D19850713

Pulled By: ezyang

fbshipit-source-id: 30a60df470b88e8c40e33112296e437cde29c49f
2020-02-11 20:35:50 -08:00
peterjc123
ebed008dd4 Correct /MP usage in MSVC (#33120)
Summary:
## Several flags
`/MP[M]`: It is a flag for the compiler `cl`. It leads to object-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC.
`/maxcpucount:[M]`: It is a flag for the generator `msbuild`. It leads to project-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC.
`/p:CL_MPCount=[M]`: It is a flag for the generator `msbuild`. It leads the generator to pass `/MP[M]` to the compiler.
`/j[M]`: It is a flag for the generator `ninja`. It leads to object-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC.

## Reason for the change
1. Object-level multiprocessing is preferred over project-level multiprocessing.
2. ~For ninja, we don't need to set `/MP` otherwise M * M processes will be spawned.~ Actually, it is not correct because in ninja configs, there are only one source file in the command. Therefore, the `/MP` switch should be useless.
3. For msbuild, if it is called through Python configuration scripts, then `/p:CL_MPCount=[M]` will be added, otherwise, we add `/MP` to `CMAKE_CXX_FLAGS`.
4. ~It may be a possible fix for https://github.com/pytorch/pytorch/issues/28271, https://github.com/pytorch/pytorch/issues/27463 and https://github.com/pytorch/pytorch/issues/25393. Because `/MP` is also passed to `nvcc`.~ It is probably not true. Because `/MP` should not be effective given there is only one source file per command.

## Reference
1. https://docs.microsoft.com/en-us/cpp/build/reference/mp-build-with-multiple-processes?view=vs-2019
2. https://github.com/Microsoft/checkedc-clang/wiki/Parallel-builds-of-clang-on-Windows
3. https://blog.kitware.com/cmake-building-with-all-your-cores/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33120

Differential Revision: D19817227

Pulled By: ezyang

fbshipit-source-id: f8d01f835016971729c7a8d8a0d1cb8a8c2c6a5f
2020-02-10 11:29:25 -08:00
cyy
27e1fecabd let user specify CUDA_HOST_COMPILER
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32904

Differential Revision: D19729047

Pulled By: ezyang

fbshipit-source-id: c233e3924f71a025c51d25a7e3a8d728dac8730a
2020-02-04 14:32:12 -08:00
peterjc123
9a5fd2eb07 Fix conflicts in CMAKE_GENERATOR and generator (#30971)
Summary:
...specified in -G

https://cmake.org/cmake/help/latest/variable/CMAKE_GENERATOR.html
According to the document, the generator could be determined through two methods:
1. Specify in `-G`
2. Read from `CMAKE_GENERATOR`

We should avoid conflicts in these two methods. This fixes https://github.com/pytorch/pytorch/issues/30910.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30971

Differential Revision: D18927529

Pulled By: mingbowan

fbshipit-source-id: e9a179ceb32d6fbabfaeac6cfe9e6170ca170b20
2019-12-10 22:22:26 -08:00
Hong Xu
21d7532dfe Add more comment on NumPy detection in Python scripts.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30417

Differential Revision: D18716502

Pulled By: albanD

fbshipit-source-id: 0b1b86f882e0e24cb6845e4a44708048e7e3b4a8
2019-11-26 17:38:27 -08:00
Hong Xu
3455231e9c Expose configuration of Numa directories to setup.py (#30104)
Summary:
https://github.com/pytorch/pytorch/issues/29968
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30104

Differential Revision: D18656882

Pulled By: ezyang

fbshipit-source-id: f932a98674033f1a3184dc1c22faa6f8c2b50134
2019-11-22 07:07:39 -08:00
David Reiss
d22f61432d Update fbjni and enable PyTorch JNI build
Summary:
- Add a "BUILD_JNI" option that enables building PyTorch JNI bindings and
  fbjni.  This is off by default because it adds a dependency on jni.h.
- Update to the latest fbjni so we can inhibit building its tests,
  because they depend on gtest.
- Set JAVA_HOME and BUILD_JNI in Linux binary build configurations if we
  can find jni.h in Docker.

Test Plan:
- Built on dev server.
- Verified that libpytorch_jni links after libtorch when both are built
  in a parallel build.

Differential Revision: D18536828

fbshipit-source-id: 19cb3be8298d3619352d02bb9446ab802c27ec66
2019-11-15 13:59:44 -08:00
Hong Xu
ff9d508b88 Remove tools/setup_helpers/cuda.py. (#28617)
Summary:
Except for the Windows default path, everything it does has been done in
FindCUDA.cmake. Search for nvcc in path has been added to FindCUDA.cmake (https://github.com/pytorch/pytorch/issues/29160). The Windows default path part is moved to
build_pytorch_libs.py. CUDA_HOME is kept for now because other parts of
the build system is still using it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28617

Differential Revision: D18347814

Pulled By: ezyang

fbshipit-source-id: 22bb7eccc17b559ce3efc1ca964e3fbb270b5b0f
2019-11-06 07:12:01 -08:00
Hong Xu
5e5cbceeba remove tools/setup_helpers/cudnn.py (#25876)
Summary:
FindCUDNN.cmake and cuda.cmake have done the detection. This commit deletes `tools/setup_helpers/cudnn.py` as it is no longer needed.

Previously in https://github.com/pytorch/pytorch/issues/25482, one test failed because TensorRT detects cuDNN differently, and there may be situations we can find cuDNN but TensorRT cannot. This is fixed by passing our detection result down to TensorRT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25876

Differential Revision: D17346270

Pulled By: ezyang

fbshipit-source-id: c1e7ad4a1cb20f964fe07a72906f2f002425d894
2019-09-24 07:44:33 -07:00
Hong Xu
a96e41b7c0 Use expected_wrapper only if CMAKE_{C,CXX}_COMPILER and/or is not set by user (#26306)
Summary:
This will honor user's preference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26306

Differential Revision: D17408030

Pulled By: soumith

fbshipit-source-id: 6841b805603d40cd7caf78dbb42405a0c931f052
2019-09-16 16:12:29 -07:00
Hong Xu
8a026d4f74 Remove tools/setup_helpers/dist_check.py (#25879)
Summary:
What dist_check.py does is largely merely determining whether we should
use set "USE_IBVERBS" to ON or OFF when the user sets "USE_GLOO_IBVERBS"
to ON. But this is unnecessary, because this complicated determination
will always be overrided by gloo:

2101e02cea/cmake/Dependencies.cmake (L19-L28)

Since dist_check.py becomes irrelevant, this commit also simplifies the
setting of `USE_DISTRIBUTED` (by removing its explicit setting in Python scripts), and deprecate `USE_GLOO_IBVERBS` in favor
of `USE_IBVERBS`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25879

Differential Revision: D17282395

Pulled By: pietern

fbshipit-source-id: a10735f50728d89c3d81fd57bcd26764e7f84dd1
2019-09-10 04:33:28 -07:00
Edward Yang
97b432bdf0 Back out "[pytorch][PR] remove tools/setup_helpers/cudnn.py"
Summary:
Original commit changeset: abd9cd0244ca

(Note: this ignores all push blocking failures!)

Test Plan: none

Reviewed By: nairbv

Differential Revision: D17259003

fbshipit-source-id: d7e067eeb36192766c639bfcbc66f540ce8eb77e
2019-09-09 06:47:45 -07:00
Hong Xu
66ac6698f6 remove tools/setup_helpers/cudnn.py (#25482)
Summary:
FindCUDNN.cmake and cuda.cmake have done the detection. This commit deletes `tools/setup_helpers/cudnn.py` as it is no longer needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25482

Differential Revision: D17226408

Pulled By: ezyang

fbshipit-source-id: abd9cd0244cabea1f5d9f93f828d632d77c8dd5e
2019-09-06 06:54:35 -07:00
Hong Xu
cc4211069e Do not pass down USE_GLOO_IBVERBS to CMake (#25720)
Summary:
It doesn't seem to be used anywhere once down to CMake in this repo or any submodules
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25720

Differential Revision: D17225088

Pulled By: pietern

fbshipit-source-id: a24b080e6346a203b345e2b834fe095e3b9aece0
2019-09-06 02:40:42 -07:00
Hong Xu
0b1fee0819 Remove escape_path in our build system. (#24044)
Summary:
Which was added in https://github.com/pytorch/pytorch/issues/16412.

Also make some CUDNN_* CMake variables to be build options so as to avoid direct reading using `$ENV` from environment variables from CMake scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24044

Differential Revision: D16783426

Pulled By: ezyang

fbshipit-source-id: cb196b0013418d172d0d36558995a437bd4a3986
2019-08-13 20:38:19 -07:00
Hong Xu
994f643d9a Do not force USE_SYSTEM_EIGEN_INSTALL to be OFF in Python build scripts (#23990)
Summary:
Not sure whether 34c0043aae still makes sense.

`USE_SYSTEM_EIGEN_INSTALL` is OFF by default (as set in CMakeLists.txt). If a user wants to change this build option, I don't see any reason to force them to do it in `CMakeCache.txt`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23990

Differential Revision: D16732569

Pulled By: ezyang

fbshipit-source-id: 4604b4a1d5857552ad02e76aee91641aea48801a
2019-08-09 08:33:48 -07:00
Hong Xu
e80b48390d When matching a line in CMakeCache.txt, ensure A=B and "A"=B are matched (#23745)
Summary:
Currently when reading CMakeCache.txt, only `VAR:TYPE=VAL` can be matched.
This works well for CMake-generated lines, but a user may add a line
without specifying type (`VAR=VAL`), which is totally legitimate in the
eyes of CMake. This improvements in regex ensure that `VAR:TYPE=VAL` is
also matched. The situation of `"VAR":TYPE=VAL` is also corrected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23745

Differential Revision: D16726514

Pulled By: ezyang

fbshipit-source-id: 6c50150d58926563837cf77d156c24d644666ef0
2019-08-08 18:07:28 -07:00
Hong Xu
1a9334ea59 Hotpatch CXXFLAGS to be the same as CFLAGS if CXXFLAGS is not set. (#23568)
Summary:
This fixes build regression caused by https://github.com/pytorch/pytorch/issues/23528 because we used to let CXXFLAGS equal CFLAGS.

cc suo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23568

Differential Revision: D16568820

Pulled By: suo

fbshipit-source-id: 64a0dc923c08ac1751224f42bc4ccdc707341762
2019-08-07 16:25:57 -07:00
Hong Xu
323aad6b20 No need to handle the dependency of INSTALL_TEST on BUILD_TEST in cmake.py (#23806)
Summary:
Simplifying https://github.com/pytorch/pytorch/issues/23793: The dependency relationship between
{INSTALL,BUILD}_TEST is already properly handled in CMakeLists.txt. All
we need to do is to pass down INSTALL_TEST.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23806

Differential Revision: D16691833

Pulled By: soumith

fbshipit-source-id: 7607492b2d82db3f79b174373a92e2810a854a61
2019-08-07 11:34:31 -07:00
Soumith Chintala
7d9e69e62e allow INSTALL_TEST to pass through from env to cmake (#23793)
Summary:
This allows `INSTALL_*` to pass through to cmake.
Additional fix is that if `INSTALL_TEST` is specified, it wont use `BUILD_TEST` as the default value for `INSTALL_TEST`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23793

Differential Revision: D16648668

Pulled By: soumith

fbshipit-source-id: 52c2a0d8033bc556355b87a6731a577940de9859
2019-08-05 09:55:14 -07:00
Edward Yang
3cc7da3a7d Revert D16561561: [pytorch][PR] Remove preprocessing of CFLAGS, CPPFLAGS, and LDFLAGS in Python scripts.
Differential Revision:
D16561561

Original commit changeset: 962a27a2b0a1

fbshipit-source-id: 82ed08e5599ddbb9ed96352ac4572aa73df65aac
2019-07-30 13:28:19 -07:00
Hong Xu
cfe9400996 Remove preprocessing of CFLAGS, CPPFLAGS, and LDFLAGS in Python scripts. (#23528)
Summary:
After https://github.com/pytorch/pytorch/issues/23455, there is no need of this preprocessing in Python scripts.
They will be automatically processed in CMake (plus CPPFLAGS here
probably meant to be CXXFLAGS).

Reference:

- https://cmake.org/cmake/help/v3.15/envvar/CFLAGS.html
- https://cmake.org/cmake/help/v3.15/envvar/CXXFLAGS.html
- https://cmake.org/cmake/help/v3.15/envvar/LDFLAGS.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23528

Differential Revision: D16561561

Pulled By: ezyang

fbshipit-source-id: 962a27a2b0a18db0f95477ad067a2611e4128187
2019-07-30 08:07:36 -07:00
Hong Xu
8ada7c9920 Remove two CMAKE_ build options from additional_options. (#23451)
Summary:
Following up 915261c8be
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23451

Differential Revision: D16542303

Pulled By: ezyang

fbshipit-source-id: 1406c311c198eb237f85d6d8f1f0d58626be8257
2019-07-29 08:13:59 -07:00
Hong Xu
b335f3910f Remove redundant MSVC_Z7_OVERRIDE processing and combine "/EHa" flag setup (#23455)
Summary:
- MSVC_Z7_OVERRIDE has already handled in CMakeLists.txt. No need to process it for once more in the Python scripts.
- Option MSVC_Z7_OVERRIDE should be visible to the user only if MSVC is used.
- Move the setting of "/EHa" flag to CMakeLists.txt, where other MSVC-specific flags are processed. This also further prepares the removal of redundant cflags setup in Python build scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23455

Differential Revision: D16542274

Pulled By: ezyang

fbshipit-source-id: 4d3b8b07161478bbba8a21feb6ea24c9024e21ac
2019-07-29 08:08:47 -07:00
Ilia Cherniavskii
74f8094ea5 Rename threading build options
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23407

Test Plan:
USE_CUDA=0 ATEN_THREADING=TBB USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB
BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python
setup.py develop install --cmake

./build/bin/parallel_info

Imported from OSS

Differential Revision: D16522538

Pulled By: ilia-cher

fbshipit-source-id: 75c4761d93a7f5936f28e4c5eedcd27d8490d0c5
2019-07-26 13:09:14 -07:00