Summary:
This PR greatly simplifies `mypy-strict.ini` by strictly typing everything in `.github` and `tools`, rather than picking and choosing only specific files in those two dirs. It also removes `warn_unused_ignores` from `mypy-strict.ini`, for reasons described in https://github.com/pytorch/pytorch/pull/56402#issuecomment-822743795: basically, that setting makes life more difficult depending on what libraries you have installed locally vs in CI (e.g. `ruamel`).
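A minimal sketch of the kind of situation where `warn_unused_ignores` gets in the way. The module below is hypothetical and not taken from this PR; it assumes the library is missing in one environment, and installed with type information in another.

```python
# Hypothetical module, not code from this PR.
# Where mypy cannot find the library, the ignore comment below suppresses a
# missing-import error. Where the library and its type information are
# installed, the same comment is unused, and `warn_unused_ignores = True`
# would report it as an error. No single spelling passes in both environments.
try:
    import ruamel.yaml  # type: ignore
    HAVE_RUAMEL = True
except ImportError:
    HAVE_RUAMEL = False
```

With the setting removed, the ignore comment is accepted in both environments.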
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59117
Test Plan:
```
flake8
mypy --config mypy-strict.ini
```
Reviewed By: malfet
Differential Revision: D28765386
Pulled By: samestep
fbshipit-source-id: 3e744e301c7a464f8a2a2428fcdbad534e231f2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51043
This PR makes `fast_nvcc` stop at a failing command instead of going on to run the commands that would otherwise run after it. `fast_nvcc` can still run more commands than `nvcc` would when there is no dependency between them, but this should help to reduce noise from failing `fast_nvcc` runs.
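The behavior described above can be sketched roughly as follows. This is an illustrative sketch, not the actual `fast_nvcc` implementation: `run_graph`, `demo_failure`, and the dependency map are hypothetical names, and the commands are Python one-liners standing in for real compile steps.

```python
import asyncio
import sys
from typing import Dict, List, Optional


async def run_graph(
    commands: List[List[str]],
    deps: Dict[int, List[int]],
) -> List[Optional[int]]:
    """Run commands concurrently, skipping any whose dependency did not succeed.

    Returns one entry per command: its exit code, or None if it was skipped.
    """
    tasks: Dict[int, asyncio.Task] = {}

    async def run_one(i: int) -> Optional[int]:
        # Wait for each dependency; give up if one failed or was itself skipped.
        for d in deps.get(i, []):
            if await tasks[d] != 0:
                return None
        proc = await asyncio.create_subprocess_exec(*commands[i])
        return await proc.wait()

    # All tasks are created before any of them starts running, so run_one
    # can safely look up its dependencies in `tasks`.
    for i in range(len(commands)):
        tasks[i] = asyncio.create_task(run_one(i))
    return list(await asyncio.gather(*tasks.values()))


def demo_failure() -> List[Optional[int]]:
    py = sys.executable
    commands = [
        [py, "-c", "import sys; sys.exit(1)"],  # 0: fails
        [py, "-c", "pass"],                     # 1: depends on 0, so it is skipped
        [py, "-c", "pass"],                     # 2: independent, so it still runs
    ]
    return asyncio.run(run_graph(commands, {1: [0]}))
```

Here `demo_failure()` returns `[1, None, 0]`: the dependent command never starts, while the independent one still runs, which mirrors the caveat about commands with no dependency between them.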
Test Plan: Unfortunately the test suite for this script is FB-internal. It would probably be a good idea to move it into the PyTorch GitHub repo, but I'm not entirely sure how to do so, since I don't believe we currently have a good place to put tests for things in `tools`.
Reviewed By: malfet
Differential Revision: D26007788
fbshipit-source-id: 8fe1e7d020a29d32d08fe55fb59229af5cdfbcaa
Summary:
Draft PR to enable `fast_nvcc`.
* cleaned up some non-standard usages
* added fall-back to wrap_nvcc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49773
Test Plan:
Configuration to enable fast nvcc:
- Install and enable `ccache`, but delete the `.ccache/` folder before each build.
- Set `TORCH_CUDA_ARCH_LIST=6.0;6.1;6.2;7.0;7.5`.
- Toggle `USE_FAST_NVCC=ON/OFF` in the cmake config and run `cmake --build` to compare build times.
Initial statistics for a full compilation:
* `cmake --build . -- -j $(nproc)`:
- fast NVCC:
```
real 48m55.706s
user 1559m14.218s
sys 318m41.138s
```
- normal NVCC:
```
real 43m38.723s
user 1470m28.131s
sys 90m46.879s
```
* `cmake --build . -- -j $(( $(nproc) / 4 ))`:
- fast NVCC:
```
real 53m44.173s
user 1130m18.323s
sys 71m32.385s
```
- normal NVCC:
```
real 81m53.768s
user 858m45.402s
sys 61m15.539s
```
* Conclusion: fast NVCC doesn't provide much gain when the build is already set to use all CPU cores; in fact it is **even worse**, presumably because of thread switching.
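To make the comparison concrete, the wall-clock ("real") figures above can be converted to seconds and compared. This is a small sketch for the arithmetic only; `to_seconds` and `real_ratio` are hypothetical helper names, and the durations are copied from the measurements above.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style duration such as '48m55.706s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized duration: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# "real" times copied from the full-compilation measurements above.
REAL = {
    "nproc": {"fast": "48m55.706s", "normal": "43m38.723s"},
    "nproc/4": {"fast": "53m44.173s", "normal": "81m53.768s"},
}


def real_ratio(jobs: str) -> float:
    """fast NVCC wall time divided by normal NVCC wall time (below 1 means faster)."""
    return to_seconds(REAL[jobs]["fast"]) / to_seconds(REAL[jobs]["normal"])
```

`real_ratio("nproc")` comes out at about 1.12 (fast NVCC roughly 12% slower), while `real_ratio("nproc/4")` is about 0.66 (roughly 34% faster).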
Initial statistics for a partial recompile (editing `.cu` files):
* `cmake --build . -- -j $(nproc)`:
- fast NVCC:
```
[2021-01-13 18:10:24] [ 86%] Building NVCC (Device) object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMiscOpsKernels.cu.o
[2021-01-13 18:11:08] [ 86%] Linking CXX shared library ../lib/libtorch_cuda.so
```
- normal NVCC:
```
[2021-01-13 17:35:40] [ 86%] Building NVCC (Device) object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMiscOpsKernels.cu.o
[2021-01-13 17:38:08] [ 86%] Linking CXX shared library ../lib/libtorch_cuda.so
```
* Conclusion: The effective compilation time for a single modified CU file dropped from about 2min30sec to about 45sec when compiling for multiple architectures. By the timestamps above, that is a speedup of roughly **3.4X** from fast NVCC, approaching the theoretical limit of 5X when compiling 5 gencode architectures at the same time.
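The elapsed times can be read off the log timestamps above. The sketch below assumes that the interval between the "Building NVCC" line and the following "Linking" line is the compile time for that file; `elapsed_seconds` is a hypothetical helper.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two log timestamps in the format used above."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Timestamps copied from the partial-recompile logs above.
fast = elapsed_seconds("2021-01-13 18:10:24", "2021-01-13 18:11:08")
normal = elapsed_seconds("2021-01-13 17:35:40", "2021-01-13 17:38:08")
speedup = normal / fast
```

This gives 44 seconds for fast NVCC against 148 seconds for normal NVCC, a ratio of about 3.4.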
Follow up PRs:
- Add a better fallback mechanism that detects up front whether a build is supported by `fast_nvcc`, instead of dry-running and then falling back on failure.
- Add performance instrumentation to compare the total compile time against the critical-path time of the parallel tasks.
- Figure out why `-j $(nproc)` incurs significant sys overhead (`sys 318m41.138s` vs `sys 90m46.879s`) compared to normal nvcc. The guess is context switching, but this is not confirmed.
Reviewed By: malfet
Differential Revision: D25692758
Pulled By: walterddr
fbshipit-source-id: c244d07b9b71f146e972b6b3682ca792b38c4457
Summary:
This PR adds `tools/fast_nvcc/fast_nvcc.py`, a mostly-transparent wrapper over `nvcc` that parallelizes compilation of CUDA files when building for multiple architectures at once.
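The core idea, running the per-architecture steps concurrently rather than one after another, can be sketched as below. This is only an illustration under stated assumptions, not the contents of `fast_nvcc.py`: `run_all`, `per_arch_commands`, and `demo_parallel` are hypothetical names, and the commands are no-op Python processes standing in for real compile steps.

```python
import asyncio
import sys
from typing import List


async def run_all(commands: List[List[str]]) -> List[int]:
    """Start every command at once and return the exit codes in input order."""

    async def run(cmd: List[str]) -> int:
        proc = await asyncio.create_subprocess_exec(*cmd)
        return await proc.wait()

    return list(await asyncio.gather(*(run(cmd) for cmd in commands)))


def per_arch_commands(archs: List[str]) -> List[List[str]]:
    # Stand-in for the real per-architecture compile steps: each command is a
    # no-op Python process that merely receives the architecture as an argument.
    return [[sys.executable, "-c", "pass", f"sm_{arch}"] for arch in archs]


def demo_parallel() -> List[int]:
    return asyncio.run(run_all(per_arch_commands(["60", "61", "70"])))
```

With steps that are independent of each other, the wall time approaches that of the slowest single step, which is where the benefit of building for several architectures at once would come from.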
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48934
Test Plan: Currently this script isn't actually used in PyTorch OSS. Coming soon!
Reviewed By: walterddr
Differential Revision: D25286030
Pulled By: samestep
fbshipit-source-id: 971a404cf57f5694dea899a27338520d25191706