Commit Graph

1387 Commits

Author SHA1 Message Date
Nikita Shulga
36ac095ff8 Migrate PyTorch to C++17 (#85969)
With CUDA-10.2 gone we can finally do it!

This PR mostly contains build-system-related changes; more invasive functional ones are to follow.
Among the many expected tweaks to the build system, here are a few unexpected ones:
 - Force the onnx_proto project to C++17 to avoid a `duplicate symbols` error when compiled by gcc-7.5.0: the storage rule for `constexpr` changed in C++17, but gcc does not seem to follow it (see the sketch after this list)
 - Do not use `std::apply` on CUDA but rely on the built-in variant, as it results in test failures when the CUDA runtime picks the host rather than the device function when `std::apply` is invoked from CUDA code.
 - `std::decay_t` -> `::std::decay_t` and `std::move` -> `::std::move`, as VC++ for some reason claims that the `std` symbol is ambiguous
 - Disable use of `std::aligned_alloc` on Android, as its `libc++` does not implement it.
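
A minimal sketch of the `constexpr` storage-rule change behind the onnx_proto tweak (illustrative names, not the actual onnx_proto code): since C++17 a `constexpr` static data member is implicitly `inline`, so the out-of-line definition below becomes redundant, and mixing C++14 and C++17 translation units can produce duplicate or missing symbols.

```
#include <cstdio>

struct Config {
  static constexpr int kVersion = 17;
};

// Required in C++14 whenever kVersion is ODR-used; in C++17 the member is
// implicitly inline and this definition is redundant (and deprecated).
constexpr int Config::kVersion;

int main() {
  std::printf("%d\n", Config::kVersion);
  return 0;
}
```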

Some prerequisites:
 - https://github.com/pytorch/pytorch/pull/89297
 - https://github.com/pytorch/pytorch/pull/89605
 - https://github.com/pytorch/pytorch/pull/90228
 - https://github.com/pytorch/pytorch/pull/90389
 - https://github.com/pytorch/pytorch/pull/90379
 - https://github.com/pytorch/pytorch/pull/89570
 - https://github.com/facebookincubator/gloo/pull/336
 - https://github.com/facebookincubator/gloo/pull/343
 - 919676fb32

Fixes https://github.com/pytorch/pytorch/issues/56055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85969
Approved by: https://github.com/ezyang, https://github.com/kulinseth
2022-12-08 02:27:48 +00:00
Peter Bell
5a8b07de75 Declare public dependencies on libshm (#82694)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82694
Approved by: https://github.com/malfet
2022-10-07 00:01:25 +00:00
Tongliang Liao
dff70a5e1a Make language std configurable. (#75519)
RocksDB 7 starts to use C++17 in its headers.
We should make this configurable, in case the user needs a higher std version.

The list of files to change was found with `git grep 'CMAKE_[^_]*_STANDARD'`.
The doc string is taken from CMake code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75519
Approved by: https://github.com/malfet
2022-07-13 14:21:27 +00:00
Michael Andreas Dagitses
ab2ca95dd1 turn on -Werror=unused-variable in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.
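
A hedged, illustrative example of the failure mode (the function is made up, not taken from the codebase): under `-Werror=unused-variable` a never-read local becomes a hard error, so each fix either deletes the variable or marks it deliberately unused.

```
int divide(int a, int b) {
  int remainder = a % b;  // flagged as a hard error if never read
  (void)remainder;        // classic way to mark it intentionally unused
  return a / b;
}
```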

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79156

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-11 02:46:34 +00:00
Shashank Chaudhry
89c4e8c22b [NOOP][clangformat][codemod] Enable CLANGFORMAT for some folders in caffe2/* (#67746)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67746

Test Plan: Visual inspection. Sandcastle.

Reviewed By: zertosh

Differential Revision: D31986646

fbshipit-source-id: 91885c20c3cead3853c49abb9fe0a94a67f33cc8
2021-11-03 12:23:14 -07:00
Nikita Shulga
c373387709 Update CMake and use native CUDA language support (#62445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445

PyTorch currently uses the old style of compiling CUDA in CMake which is just a
bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as
a language just like C++ or C.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31503350

fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55
2021-10-11 09:05:48 -07:00
Nikita Shulga
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
The GoogleTest `TEST` macro is non-compliant with this check, as is `DEFINE_DISPATCH`.

All changes except the ones to `.clang-tidy` were generated with the following script:
```
for i in $(find . -type f -iname "*.c*" -or -iname "*.h" \
           | xargs grep cppcoreguidelines-avoid-non-const-global-variables \
           | cut -f1 -d: | sort | uniq); do
  sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" "$i"
done
```
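
For context, a hypothetical example of the suppression comment the script deletes (the global itself is made up):

```
#include <atomic>

// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
std::atomic<int> g_counter{0};  // a global the check would flag

int main() {
  return g_counter.load();
}
```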

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
Richard Barnes
a91be24e2d Modernize make pointers (#61741)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61741

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717385

fbshipit-source-id: 4452b77981e49175f744bdaab12cd225bf75b90e
2021-07-22 15:54:37 -07:00
Richard Barnes
a8d99a28d7 Modernize avoid a C array (#61740)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61740

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717118

fbshipit-source-id: 70e73346b75deb4fe6b6399e06bd576f3b6e2b91
2021-07-21 13:52:54 -07:00
Richard Barnes
59a5312ce6 Modernize fix deprecated header (#61736)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61736

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29716965

fbshipit-source-id: 314c2b557c240ac16bbfab114ab764beb189e78a
2021-07-20 10:06:11 -07:00
Peter Bell
4a7d281119 Migrate THAllocator to ATen (#60325)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60325

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29371715

Pulled By: ngimel

fbshipit-source-id: 78ec8368a48e1a4690d0664a0b02d2a235af98ff
2021-06-24 19:42:14 -07:00
Luca Wehrstedt
a016150163 Move torch/lib/c10d to torch/csrc/distributed/c10d (#60543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60543

Since now c10d is part of libtorch, it would also be nice if the sources lived all in one place.
ghstack-source-id: 132306292

Test Plan: It builds

Reviewed By: cbalioglu

Differential Revision: D29062002

fbshipit-source-id: d9e1301e9d73e1643fa0f0119cd2d618f1ad52e6
2021-06-24 12:38:51 -07:00
Eli Uriegas
2dedd96dd2 cmake: Prefer CMAKE_CURRENT_SOURCE_DIR to TORCH_SRC_DIR (#60493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60493

TORCH_SRC_DIR appears to be a bit bugged when it comes to identifying
include directories, so let's try using CMAKE_CURRENT_SOURCE_DIR instead.

<details>
<summary>Logs for builds with torchaudio</summary>

```
-- Building version 0.10.0a0+9e36281
running bdist_wheel
running build
running build_py
copying torchaudio/version.py -> build/lib.linux-x86_64-3.6/torchaudio
running build_ext
-- Configuring done
-- Generating done
-- Build files have been written to: /home/eliuriegas/work/audio/build/temp.linux-x86_64-3.6
[1/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o -c ../../third_party/kaldi/submodule/src/base/kaldi-error.cc
[2/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o -c ../../third_party/kaldi/submodule/src/base/kaldi-math.cc
[3/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o -c ../../third_party/kaldi/submodule/src/feat/feature-functions.cc
[4/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o -c ../../third_party/kaldi/src/matrix/kaldi-matrix.cc
[5/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o -c ../../third_party/kaldi/submodule/src/feat/resample.cc
[6/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o -c ../../third_party/kaldi/src/matrix/kaldi-vector.cc
[7/11] /usr/lib64/ccache/c++ -DINCLUDE_KALDI -DTORCH_API_INCLUDE_EXTENSION_H -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_torchaudio_EXPORTS -I../../ -I/tmp/tmp.GKeM3KKcFi/include/python3.6m -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o -MF torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o.d -o torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o -c ../../torchaudio/csrc/kaldi.cpp
[8/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o -c ../../third_party/kaldi/submodule/src/feat/pitch-functions.cc
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc: In member function ‘void kaldi::OnlinePitchFeatureImpl::UpdateRemainder(const kaldi::VectorBase<float>&)’:
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc:814:11: warning: unused variable ‘full_frame_length’ [-Wunused-variable]
  814 |     int32 full_frame_length = opts_.NccfWindowSize() + nccf_last_lag_;
      |           ^~~~~~~~~~~~~~~~~
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc: In member function ‘void kaldi::OnlineProcessPitch::UpdateNormalizationStats(kaldi::int32)’:
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc:1504:35: warning: comparison of integer expressions of different signedness: ‘std::vector<kaldi::OnlineProcessPitch::NormalizationStats>::size_type’ {aka ‘long unsigned int’} and ‘kaldi::int32’ {aka ‘int’} [-Wsign-compare]
 1504 |   if (normalization_stats_.size() <= frame)
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
[9/11] : && /usr/bin/cmake -E rm -f third_party/kaldi/libkaldi.a && /usr/bin/ar qc third_party/kaldi/libkaldi.a  third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o && /usr/bin/ranlib third_party/kaldi/libkaldi.a && :
[10/11] : && /usr/lib64/ccache/c++ -fPIC -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -O3 -DNDEBUG   -shared -Wl,-soname,_torchaudio.so -o torchaudio/csrc/_torchaudio.so torchaudio/csrc/CMakeFiles/_torchaudio.dir/pybind.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/lfilter.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/overdrive.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/utils.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o  -Wl,-rpath,/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib:  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch_python.so  third_party/kaldi/libkaldi.a  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch.so  -Wl,--no-as-needed,"/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed  /usr/local/lib/libbreakpad_client.a  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so  -lpthread  -Wl,--no-as-needed,"/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch.so" -Wl,--as-needed  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so && :
[10/11] cd /home/eliuriegas/work/audio/build/temp.linux-x86_64-3.6 && /usr/bin/cmake -P cmake_install.cmake
-- Install configuration: "Release"
-- Installing: /home/eliuriegas/work/audio/build/lib.linux-x86_64-3.6/torchaudio/./_torchaudio.so
-- Set runtime path of "/home/eliuriegas/work/audio/build/lib.linux-x86_64-3.6/torchaudio/./_torchaudio.so" to ""
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/kaldi_io.py -> build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/transforms.py -> build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio
creating build/bdist.linux-x86_64/wheel/torchaudio/compliance
copying build/lib.linux-x86_64-3.6/torchaudio/compliance/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/compliance
copying build/lib.linux-x86_64-3.6/torchaudio/compliance/kaldi.py -> build/bdist.linux-x86_64/wheel/torchaudio/compliance
creating build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/cmuarctic.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/librispeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/libritts.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/vctk.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/commonvoice.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/gtzan.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/ljspeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/speechcommands.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/tedlium.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/yesno.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
creating build/bdist.linux-x86_64/wheel/torchaudio/_internal
copying build/lib.linux-x86_64-3.6/torchaudio/_internal/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal
copying build/lib.linux-x86_64-3.6/torchaudio/_internal/fft.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal
copying build/lib.linux-x86_64-3.6/torchaudio/_internal/module_utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal
creating build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/common.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/no_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/soundfile_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/sox_io_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
creating build/bdist.linux-x86_64/wheel/torchaudio/extension
copying build/lib.linux-x86_64-3.6/torchaudio/extension/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/extension
copying build/lib.linux-x86_64-3.6/torchaudio/extension/extension.py -> build/bdist.linux-x86_64/wheel/torchaudio/extension
creating build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/conv_tasnet.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/deepspeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2letter.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/wavernn.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
creating build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/components.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/model.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
creating build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/import_fairseq.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/import_huggingface.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
creating build/bdist.linux-x86_64/wheel/torchaudio/sox_effects
copying build/lib.linux-x86_64-3.6/torchaudio/sox_effects/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/sox_effects
copying build/lib.linux-x86_64-3.6/torchaudio/sox_effects/sox_effects.py -> build/bdist.linux-x86_64/wheel/torchaudio/sox_effects
creating build/bdist.linux-x86_64/wheel/torchaudio/utils
copying build/lib.linux-x86_64-3.6/torchaudio/utils/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/utils
copying build/lib.linux-x86_64-3.6/torchaudio/utils/sox_utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/utils
creating build/bdist.linux-x86_64/wheel/torchaudio/functional
copying build/lib.linux-x86_64-3.6/torchaudio/functional/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional
copying build/lib.linux-x86_64-3.6/torchaudio/functional/filtering.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional
copying build/lib.linux-x86_64-3.6/torchaudio/functional/functional.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional
creating build/bdist.linux-x86_64/wheel/torchaudio/prototype
copying build/lib.linux-x86_64-3.6/torchaudio/prototype/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/prototype
copying build/lib.linux-x86_64-3.6/torchaudio/prototype/rnnt_loss.py -> build/bdist.linux-x86_64/wheel/torchaudio/prototype
copying build/lib.linux-x86_64-3.6/torchaudio/version.py -> build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/_torchaudio.so -> build/bdist.linux-x86_64/wheel/torchaudio
running install_egg_info
running egg_info
writing torchaudio.egg-info/PKG-INFO
writing dependency_links to torchaudio.egg-info/dependency_links.txt
writing requirements to torchaudio.egg-info/requires.txt
writing top-level names to torchaudio.egg-info/top_level.txt
reading manifest file 'torchaudio.egg-info/SOURCES.txt'
writing manifest file 'torchaudio.egg-info/SOURCES.txt'
Copying torchaudio.egg-info to build/bdist.linux-x86_64/wheel/torchaudio-0.10.0a0+9e36281-py3.6.egg-info
running install_scripts
adding license file "LICENSE" (matched pattern "LICEN[CS]E*")
creating build/bdist.linux-x86_64/wheel/torchaudio-0.10.0a0+9e36281.dist-info/WHEEL
creating 'dist/torchaudio-0.10.0a0+9e36281-cp36-cp36m-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
adding 'torchaudio/__init__.py'
adding 'torchaudio/_torchaudio.so'
adding 'torchaudio/kaldi_io.py'
adding 'torchaudio/transforms.py'
adding 'torchaudio/version.py'
adding 'torchaudio/_internal/__init__.py'
adding 'torchaudio/_internal/fft.py'
adding 'torchaudio/_internal/module_utils.py'
adding 'torchaudio/backend/__init__.py'
adding 'torchaudio/backend/common.py'
adding 'torchaudio/backend/no_backend.py'
adding 'torchaudio/backend/soundfile_backend.py'
adding 'torchaudio/backend/sox_io_backend.py'
adding 'torchaudio/backend/utils.py'
adding 'torchaudio/compliance/__init__.py'
adding 'torchaudio/compliance/kaldi.py'
adding 'torchaudio/datasets/__init__.py'
adding 'torchaudio/datasets/cmuarctic.py'
adding 'torchaudio/datasets/commonvoice.py'
adding 'torchaudio/datasets/gtzan.py'
adding 'torchaudio/datasets/librispeech.py'
adding 'torchaudio/datasets/libritts.py'
adding 'torchaudio/datasets/ljspeech.py'
adding 'torchaudio/datasets/speechcommands.py'
adding 'torchaudio/datasets/tedlium.py'
adding 'torchaudio/datasets/utils.py'
adding 'torchaudio/datasets/vctk.py'
adding 'torchaudio/datasets/yesno.py'
adding 'torchaudio/extension/__init__.py'
adding 'torchaudio/extension/extension.py'
adding 'torchaudio/functional/__init__.py'
adding 'torchaudio/functional/filtering.py'
adding 'torchaudio/functional/functional.py'
adding 'torchaudio/models/__init__.py'
adding 'torchaudio/models/conv_tasnet.py'
adding 'torchaudio/models/deepspeech.py'
adding 'torchaudio/models/wav2letter.py'
adding 'torchaudio/models/wavernn.py'
adding 'torchaudio/models/wav2vec2/__init__.py'
adding 'torchaudio/models/wav2vec2/components.py'
adding 'torchaudio/models/wav2vec2/model.py'
adding 'torchaudio/models/wav2vec2/utils/__init__.py'
adding 'torchaudio/models/wav2vec2/utils/import_fairseq.py'
adding 'torchaudio/models/wav2vec2/utils/import_huggingface.py'
adding 'torchaudio/prototype/__init__.py'
adding 'torchaudio/prototype/rnnt_loss.py'
adding 'torchaudio/sox_effects/__init__.py'
adding 'torchaudio/sox_effects/sox_effects.py'
adding 'torchaudio/utils/__init__.py'
adding 'torchaudio/utils/sox_utils.py'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/LICENSE'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/METADATA'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/WHEEL'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/top_level.txt'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel

```

</details>

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D29316372

Pulled By: seemethere

fbshipit-source-id: 02be64df6197c0d4bad5a5bfb3cef336c11f53ed
2021-06-23 14:08:19 -07:00
Rohan Varma
d5df274ea5 [DDP] Support for multiple backwards (#59359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59359

Move `prepare_for_backward` into the `_DDPSink` backward pass instead of calling it in the DDP forward pass, so that we can run multiple backwards in DDP with `retain_graph=True`.

ghstack-source-id: 131774159
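
A hedged illustration of the semantics this enables, using plain libtorch autograd rather than DDP itself (DDP internals omitted):

```
#include <torch/torch.h>

int main() {
  auto x = torch::ones({2, 2}, torch::requires_grad());
  auto loss = (x * x).sum();
  loss.backward({}, /*retain_graph=*/true);  // keep the graph alive
  loss.backward();  // second backward now succeeds on the same graph
  return 0;
}
```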

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D28855226

fbshipit-source-id: 6b7b25d75b7696f5b5629078233433f97663d61c
2021-06-18 09:23:57 -07:00
Alexander Golynski
ed1da5be21 PG NCCL cleanup: remove usage of completed_ in WorkNCCL copies (#59899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59899

Test Plan: Imported from OSS

Reviewed By: cbalioglu, osalpekar

Differential Revision: D29080299

Pulled By: agolynski

fbshipit-source-id: 9ae368f91e81f19471e0a20fc913d8e9df1b9dec
2021-06-17 09:05:35 -07:00
Neel Pragnesh Gandhi
2c5db9a40a Add c10d filestore functionality to the current c10d_rendezvous_backend (#59719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59719

Added FileStore functionality to the c10d backend. FileStore will create a temporary file in the /tmp directory to use if it is selected as the store type. Appropriate tests were added as well.
FileStore was modified to expose the path field for testing. It was also modified so that the numWorkers field in the constructor is optional (defaulting to -1). A negative value indicates there is no fixed number of workers; in this case, no attempt is made to clean up the file at the end.

Test Plan: Unit tests for creating a c10d backend with filestore and simple error handling.

Reviewed By: cbalioglu, H-Huang

Differential Revision: D28997436

fbshipit-source-id: 24c9b2c9b13ea6c947e8b1207beda892bdca2217
2021-06-16 12:13:36 -07:00
Luca Wehrstedt
a1780432fa Move c10d to libtorch(_cuda) (#59563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59563

ghstack-source-id: 131331264

Test Plan: CI

Reviewed By: malfet

Differential Revision: D28932239

fbshipit-source-id: 5df6cdfa5253b15cbbc97039fe672d6d97321e34
2021-06-15 02:01:31 -07:00
Rohan Varma
580a20f33b [reland] torch/lib/c10d: Use torch_check instead of throwing runtime_error (#59918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59918

Reland of https://github.com/pytorch/pytorch/pull/59684
ghstack-source-id: 131303057

Test Plan: ci

Reviewed By: cbalioglu

Differential Revision: D29081452

fbshipit-source-id: 419df79341f702e796f7adf5f1071a6cd1dcd8d1
2021-06-14 09:52:54 -07:00
Michael Carilli
be038d8989 [CUDA graphs] Make stream semantics of backward calls consistent with other cuda ops (ci-all edition) (#57833)
Summary:
ci-all resubmit of https://github.com/pytorch/pytorch/pull/54227.

Tests look good except for a few distributed autograd failures (pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test) and rocm failures (pr/pytorch-linux-bionic-rocm4.1-py3.6).

The common denominator in rocm failures appears to be multi-gpu activity: some [multiprocess DDP failures](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.1-py3.6-test1/8115/console), some [single-process failures](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.1-py3.6-test2/8115/console) where the single process has autograd ops that span devices. jeffdaily jithunnair-amd sunway513, could one of you take a look? The streaming backward change is also beneficial to rocm, I expect.

For debugging rocm failures, I think we should ignore the multiprocess/DDP tests and focus on the single process cases. The root cause is probably the same and the single process cases are simpler.

----------------------------------

Update: Rocm failures are due to https://github.com/pytorch/pytorch/issues/59750.
2718a54032 is a workaround, to be updated once https://github.com/pytorch/pytorch/issues/59750 is fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57833

Reviewed By: mruberry

Differential Revision: D28942391

Pulled By: ngimel

fbshipit-source-id: d6047e971c5f1c6386334bf3641402a92f12e2f8
2021-06-13 12:09:56 -07:00
Rohan Varma
3529a48ebb Revert D28981326: torch/lib/c10d: Use torch_check instead of throwing runtime_error
Test Plan: revert-hammer

Differential Revision:
D28981326 (6ea6075002)

Original commit changeset: 264a7f787ea8

fbshipit-source-id: 75625b76dfbd0cbaf59705d621ef9e2d1677c482
2021-06-11 17:17:10 -07:00
Rohan Varma
6ea6075002 torch/lib/c10d: Use torch_check instead of throwing runtime_error (#59684)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59684

Same reasoning as in the below diff.
ghstack-source-id: 131167212

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D28981326

fbshipit-source-id: 264a7f787ea8be76f743a2eaca67ae1d3bd8073a
2021-06-11 11:16:58 -07:00
Luca Wehrstedt
c9e4d1372f Add guards for USE_C10D_FOO in relevant c10d files (#59697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59697

The c10d build process selectively adds files based on the `USE_C10D_FOO` flags (where `FOO` is one of `GLOO`, `NCCL` or `MPI`). Replicating this logic inside libtorch will be harder, since libtorch uses a simpler approach (i.e., it lists the files in `build_variables.bzl`). So instead we could always include all files, and "disable" each file as needed using `#ifdef`s. Note that this is not a new approach: we already do the same for all the files of the TensorPipe agent based on the flag `USE_TENSORPIPE`.
ghstack-source-id: 131169540
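
A minimal sketch of the guard pattern, with illustrative names (not the actual c10d sources): the file is always listed in `build_variables.bzl`, but compiles to nothing unless the backend flag is defined.

```
// process_group_gloo_stub.cpp (hypothetical file name)
#ifdef USE_C10D_GLOO
#include <cstdio>

void initGlooBackend() {
  std::puts("Gloo backend enabled");  // Gloo-specific code would live here
}
#endif  // USE_C10D_GLOO
```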

Test Plan: CI

Reviewed By: agolynski

Differential Revision: D28987577

fbshipit-source-id: 4c6195de4e9a58101dad9379537e8d055dfd38af
2021-06-11 05:06:42 -07:00
Luca Wehrstedt
773b56e719 Fix Windows guards in c10d (#59696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59696

Some files in c10d refer to dist autograd. However, on Windows, dist autograd isn't built, so we need to "mask out" those references under Windows. This was already partly done, but when moving c10d to libtorch some issues came up, possibly due to the different way in which linking happens, so I masked out the remaining references.
ghstack-source-id: 131169541

Test Plan: CI

Reviewed By: agolynski

Differential Revision: D28987579

fbshipit-source-id: c29c5330f8429d699554972d30f99a89b2e3971d
2021-06-11 05:06:40 -07:00
Luca Wehrstedt
cbcae46fa5 Remove USE_CUDA from c10d reducer/logger (#59562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59562

Needed to merge c10d into libtorch(_cuda).

ghstack-source-id: 131169542

Test Plan: CI

Reviewed By: agolynski

Differential Revision: D28931378

fbshipit-source-id: 71376b862ff6ef7dbfa7331ec8d269bd3fcc7e0d
2021-06-11 05:06:39 -07:00
Luca Wehrstedt
b4c35d7ae7 Remove USE_CUDA from ProcessGroupGloo (#59561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59561

Needed to merge c10d into libtorch(_cuda).

ghstack-source-id: 131169544

Test Plan: CI

Reviewed By: agolynski

Differential Revision: D28931379

fbshipit-source-id: 9bd68477ae7bb870b6737a555edd5696149ff5d6
2021-06-11 05:05:31 -07:00
Rohan Varma
fc0582ee95 [c10d] Use TORCH_CHECK for monitored barrier error (#59667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59667

Use `TORCH_CHECK` instead of throwing `std::runtime_error` in the monitored
barrier so that it works with `TORCH_SHOW_CPP_STACKTRACES` to reveal the entire
call stack where the monitored barrier failed, which can help determine where
the particular rank encountered an issue.
ghstack-source-id: 130993689
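
A hedged sketch of the pattern this commit applies (the condition and message are illustrative):

```
#include <c10/util/Exception.h>

void checkRankResponded(bool responded, int rank) {
  // Before: throw std::runtime_error("rank " + std::to_string(rank) + " failed");
  // After: a TORCH_CHECK failure can carry a full C++ stack trace.
  TORCH_CHECK(responded, "Rank ", rank, " failed to pass monitored barrier");
}
```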

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D28974510

fbshipit-source-id: 6a6958995c1066cddcd647ca88c74473079b69fc
2021-06-09 22:31:33 -07:00
Richard Barnes
e3d75b8475 irange for PyTorch sans jit (#59481)
Summary:
Switches most of the simple for loops outside of `jit` directories to use `c10::irange`.

Generated with D28874212.
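
A hedged before/after sketch of the codemod (the loop itself is made up):

```
#include <c10/util/irange.h>
#include <cstdint>
#include <vector>

int64_t totalElements(const std::vector<std::vector<int>>& buckets) {
  int64_t total = 0;
  // Before: for (size_t i = 0; i < buckets.size(); ++i)
  for (const auto i : c10::irange(buckets.size())) {
    total += static_cast<int64_t>(buckets[i].size());
  }
  return total;
}
```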

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59481

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D28909681

fbshipit-source-id: ec9ab1bd602933238d9d0f73d4d8d027b75d9d85
2021-06-09 14:46:11 -07:00
Yi Wang
31d136c81f [DDP] Rename the member divFactor_ as div_factor for naming consistency in reducer (#59523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59523

Should use snake case instead of camel case for consistency.
ghstack-source-id: 130759655

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs

Reviewed By: cbalioglu

Differential Revision: D28922896

fbshipit-source-id: e04298284a78b2e71b562f790a878731962f873a
2021-06-08 10:04:20 -07:00
Yi Wang
b7ee164456 [DDP] Remove the duplicate parseHookResult in reducer (#59510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59510

Address the comment in https://github.com/pytorch/pytorch/pull/58937#discussion_r645822768

#Closes: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130758758

Test Plan: waitforbuildbot

Reviewed By: cbalioglu

Differential Revision: D28918694

fbshipit-source-id: 7ac4e4e6268e220adefed230bdb377ab3b25e302
2021-06-08 10:04:18 -07:00
Yi Wang
2b398d0537 [Reland][Gradient Compression] Apply division first to avoid overflow (#59576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59576

If the gradients before allreduce are large, then the sum after allreduce may overflow, especially for FP16. Therefore, apply the division before allreduce.

This fix is applied to both C++ and Python comm hooks.
ghstack-source-id: 130754510
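
A worked illustration of the overflow being avoided (the numbers are made up; FP16's maximum finite value is 65504):

```
#include <cstdio>

int main() {
  const float fp16_max = 65504.0f;
  const int world_size = 64;
  const float grad = 2048.0f;  // per-rank gradient value

  // Sum first: 64 * 2048 = 131072, which exceeds the FP16 range (-> inf).
  float sum_then_divide = grad * world_size;
  // Divide first: every intermediate stays at 2048 / 64 * 64 = 2048.
  float divide_then_sum = (grad / world_size) * world_size;

  std::printf("sum first: %g (overflows FP16: %d)\n",
              sum_then_divide, sum_then_divide > fp16_max);
  std::printf("divide first: %g (overflows FP16: %d)\n",
              divide_then_sum, divide_then_sum > fp16_max);
  return 0;
}
```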

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl_grad_is_view

Reviewed By: rohan-varma

Differential Revision: D28941327

fbshipit-source-id: 932e8ddbdb2bfd609a78943f6dc390d3d6ca333f
2021-06-08 10:03:21 -07:00
Yi Wang
6575975da9 [Reland2][DDP] Merge work and future_work in reducer (#59574)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59574

Remove `work` attribute from Reducer class in favor of `future_work`.

Additionally, remove the `copy_grad_to_bucket` method, since it is now only a one-line implementation, and create a new C++ comm hook called `_AllReduceCommHookWithDivFactor` to replace allreduce and also support handling uneven input.

1) Compared with the reverted https://github.com/pytorch/pytorch/pull/58937, updated `_AllReduceCommHookWithDivFactor` in `default_comm_hooks.cpp` to apply division first and hence avoid FP16 overflow.

2) Compared with the reverted https://github.com/pytorch/pytorch/pull/59520, disabled `test_DistributedDataParallel_non_default_stream` on AMD, because applying division first now hurts gradient-averaging accuracy there.
See [07:48:26]:
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.2-py3.6-test1/1129/console

#Original PR Issue: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130752393

Test Plan:
buck test caffe2/test/distributed:distributed_gloo_fork --  test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_grad_is_view

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork --  test_DistributedDataParallel_non_default_stream

Reviewed By: rohan-varma

Differential Revision: D28940800

fbshipit-source-id: 1ba727ac951ebc1e7875dc1a1be8108a2c8d9462
2021-06-07 16:52:20 -07:00
Richard Barnes
93140a31e2 Use irange in a few places (#55325)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55325

Test Plan: Sandcastle

Reviewed By: SciPioneer

Differential Revision: D27573006

fbshipit-source-id: 647b5da3901e92c23e95b2fe5e833e9081d72837
2021-06-07 14:53:41 -07:00
Mike Ruberry
94cc681fc2 Revert D28922305: [Reland][DDP] Merge work and future_work in reducer
Test Plan: revert-hammer

Differential Revision:
D28922305 (3137bbeb1a)

Original commit changeset: 6388a96eda7a

fbshipit-source-id: bc150672e857286eeb129ea683b1cfd2034f0564
2021-06-07 03:58:20 -07:00
Mike Ruberry
f998e63dca Revert D28922548: [Gradient Compression] Apply division first to avoid overflow
Test Plan: revert-hammer

Differential Revision:
D28922548 (459270ac01)

Original commit changeset: 442bd3cc7a35

fbshipit-source-id: 7e4361b4eb283cdb21f15a36d6eebf558dd7386f
2021-06-07 03:57:10 -07:00
Yi Wang
459270ac01 [Gradient Compression] Apply division first to avoid overflow (#59522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59522

If the gradients before allreduce are large, then the sum after allreduce may overflow, especially for FP16. Therefore, apply the division before allreduce.

This fix is applied to both C++ and Python comm hooks.
ghstack-source-id: 130686229

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl_grad_is_view

Reviewed By: rohan-varma

Differential Revision: D28922548

fbshipit-source-id: 442bd3cc7a35a8b948f626062fa7ad2e3704c5be
2021-06-07 01:43:10 -07:00
Yi Wang
3137bbeb1a [Reland][DDP] Merge work and future_work in reducer (#59520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59520

Remove `work` attribute from Reducer class in favor of `future_work`.

Additionally, remove the `copy_grad_to_bucket` method, since it is now only a one-line implementation, and create a new C++ comm hook called `_AllReduceCommHookWithDivFactor` to replace allreduce and also support handling uneven input.

Compared with the reverted https://github.com/pytorch/pytorch/pull/58937, updated `_AllReduceCommHookWithDivFactor` in `default_comm_hooks.cpp` to apply division first and hence avoid FP16 overflow.

#Original PR Issue: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130685351

Test Plan:
buck test caffe2/test/distributed:distributed_gloo_fork --  test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_grad_is_view

Reviewed By: walterddr

Differential Revision: D28922305

fbshipit-source-id: 6388a96eda7a06f292873afed6d1362096c13e1c
2021-06-06 09:49:08 -07:00
Can Balioglu
1d9c1cc00a [4/n] [c10d] Introduce the multi-tenancy feature in TCPStore (#58331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58331

This PR is the final part of a stack that addresses GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class, allowing two server stores to be instantiated with the same host:port pair.
ghstack-source-id: 130676394

Test Plan:
- Run the existing and newly-introduced tests.
- Run several smoke tests, including the short code snippet referred to in GitHub issue #41614.

Reviewed By: H-Huang

Differential Revision: D28453850

fbshipit-source-id: f9066b164305de0f8c257e9d5736e93fd7e21ec6
2021-06-05 07:50:07 -07:00
Can Balioglu
844a98758a [3/n] [c10d] Revise the implementation of TCPStore (#58330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58330

This PR is part of a stack that addresses GitHub issue #41614; it introduces a major refactoring of the `TCPStore` class in preparation for the multi-tenancy feature.

- All TCP sockets are wrapped with a new `TCPSocket` RAII type (see the sketch after this list).
- `BackgroundThread` and daemon types are moved from header to cpp file.
- Server, client, and callback sockets are refactored into their own internal types `TCPServer`, `TCPClient` and `TCPCallbackClient`.
- Calls to `tcputil::send*` and `tcputil::recv*` are wrapped in `TCPClient` for easier readability and maintenance purposes.
- Two `TODO` statements are put to reference future improvements. Based on feedback, I will either create separate GitHub issues for them or address them as part of this stack.
ghstack-source-id: 130676392
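
A hedged sketch of the RAII idea behind `TCPSocket` (illustrative, not the actual c10d implementation): the destructor closes the descriptor, so every exit path cleans up.

```
#include <unistd.h>

class TCPSocket {
 public:
  explicit TCPSocket(int fd) : fd_(fd) {}
  TCPSocket(const TCPSocket&) = delete;             // no double-close
  TCPSocket& operator=(const TCPSocket&) = delete;
  TCPSocket(TCPSocket&& other) noexcept : fd_(other.fd_) { other.fd_ = -1; }
  ~TCPSocket() {
    if (fd_ >= 0) {
      ::close(fd_);
    }
  }
  int fd() const { return fd_; }

 private:
  int fd_;
};
```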

Test Plan: Run the existing tests since there are no user-facing behavioral changes.

Reviewed By: H-Huang

Differential Revision: D28448981

fbshipit-source-id: 415b21e74b3cd51d673c1d5c349c6a2cb21dd667
2021-06-05 07:50:06 -07:00
Can Balioglu
4ee761c2c5 [2/n] [c10d] Introduce the 'multiTenant' constructor parameter in TCPStore (#58329)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58329

This PR is part of a stack that addresses GitHub issue #41614; it introduces:

- A new `multiTenant` constructor option for the `TCPStore` class indicating whether multiple store instances can be initialized with the same host:port pair.

- Updates to the C10d distributed (elastic) rendezvous and the `init_process_group` method to leverage the new `multiTenant` feature.

Note that the multi-tenancy feature itself is implemented in the fourth PR of this stack. In this PR, passing `true` to `multiTenant` results only in a warning output.
ghstack-source-id: 130676389

Test Plan: Run the existing tests since there are no behavioral changes.

Reviewed By: rohan-varma

Differential Revision: D28424978

fbshipit-source-id: fb1d1d81b8b5884cc5b54486700a8182a69c1f29
2021-06-05 07:50:04 -07:00
Can Balioglu
cf408c3743 [1/n] [c10d] Introduce a new TCPStore constructor (#58328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58328

This PR is part of a stack that addresses GitHub issue #41614; it introduces a new `TCPStore` constructor that takes its optional parameters via a newly introduced `TCPStoreOptions` structure. This gives API callers the flexibility to specify only the desired options while skipping the rest.

The main motivation behind this change is the introduction of the `multiTenant` constructor option in the second PR of this stack.
ghstack-source-id: 130676384
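
A minimal sketch of the options-struct pattern (field names other than `multiTenant` are assumptions, not the real `TCPStoreOptions` definition):

```
#include <chrono>
#include <cstdint>

struct TCPStoreOptions {
  std::uint16_t port = 29500;   // assumed default
  bool isServer = false;
  std::chrono::milliseconds timeout{300000};
  bool multiTenant = false;     // introduced in PR 2/n of this stack
};

int main() {
  TCPStoreOptions opts;  // callers set only the fields they care about
  opts.isServer = true;
  // TCPStore store("localhost", opts);  // hypothetical call site
  return 0;
}
```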

Test Plan: Run the existing tests since there are no behavioral changes.

Reviewed By: H-Huang

Differential Revision: D28417742

fbshipit-source-id: e6ac2a057f7ad1908581176ee6d2c2554c3c74a9
2021-06-05 07:50:02 -07:00
Rong Rong (AI Infra)
c88a0b55b3 Revert D28677383: [DDP] Merge work and future_work in reducer
Test Plan: revert-hammer

Differential Revision:
D28677383 (f8bebade47)

Original commit changeset: 85e0620378b7

fbshipit-source-id: ef3c65b88c375aa9a6befe2ab004ec37ae7eb587
2021-06-05 07:25:44 -07:00
Yi Wang
f8bebade47 [DDP] Merge work and future_work in reducer (#58937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58937

Remove `work` attribute from Reducer class in favor of `future_work`.

Additionally, remove the `copy_grad_to_bucket` method, since it is now only a one-line implementation, and create a new C++ comm hook called `_AllReduceCommHookWithDivFactor` to replace allreduce and also support handling uneven input.

#Original PR Issue: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130673249

Test Plan:
buck test caffe2/test/distributed:distributed_gloo_fork --  test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs

Reviewed By: agolynski

Differential Revision: D28677383

fbshipit-source-id: 85e0620378b7e9d837e436e94b9d807631d7d752
2021-06-05 01:18:30 -07:00
Alexander Golynski
1183fa3817 Switch PG::Work to Future in default_comm_hooks.cpp (#59398)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59398

Test Plan: Imported from OSS

Reviewed By: SciPioneer

Differential Revision: D28876182

Pulled By: agolynski

fbshipit-source-id: 9d8f09ffa2f40bb0fb25c626b52678a1597a797e
2021-06-04 15:27:13 -07:00
Liang Luo
77de640f4b [torch distributed] Implementing reduce_scatter_base (#57567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57567

Support flattened reduce_scatter.
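
A hedged, in-process illustration of flattened reduce-scatter semantics (the real op runs over NCCL): each rank contributes a flat tensor of `world_size * chunk` elements, the tensors are summed elementwise, and rank `r` keeps chunk `r` of the result.

```
#include <cstdio>
#include <vector>

int main() {
  const int world_size = 2, chunk = 2;
  // Each inner vector is one rank's flat input (world_size * chunk elements).
  std::vector<std::vector<int>> inputs = {{1, 2, 3, 4}, {10, 20, 30, 40}};

  for (int rank = 0; rank < world_size; ++rank) {
    std::printf("rank %d receives:", rank);
    for (int i = 0; i < chunk; ++i) {
      int sum = 0;
      for (const auto& in : inputs) {
        sum += in[rank * chunk + i];  // elementwise reduction across ranks
      }
      std::printf(" %d", sum);  // rank 0 -> 11 22, rank 1 -> 33 44
    }
    std::printf("\n");
  }
  return 0;
}
```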

Test Plan:
buck test mode/opt -c fbcode.enable_gpu_sections=true //caffe2/torch/lib/c10d:ProcessGroupNCCLTest
buck test mode/opt -c fbcode.enable_gpu_sections=true //caffe2/test/distributed:c10d

Reviewed By: zhaojuanmao

Differential Revision: D27876281

fbshipit-source-id: 58e2edfb1baff5cdc083dbaaba9f19502ef0b298
2021-06-03 17:17:53 -07:00
Rohan Varma
332b01e93f [DDP] log usage of torch_distributed_debug (#59351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59351

Log the PT distributed debug level to track usage internally.
ghstack-source-id: 130443122

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D28854914

fbshipit-source-id: a8e85ca4a3c9ac2f18d13190e87c0ebc4a8e7ea2
2021-06-03 11:49:23 -07:00
Richard Barnes
3979cb0656 irange for size_t (#55320)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55320

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27572577

fbshipit-source-id: 97710fd2bb1303006b05828a0d1343b0b59ccb03
2021-06-03 01:04:13 -07:00
Rohan Varma
79aeca0b00 [DDP] Log when errors happen (#59281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59281

Adds the ability to log when the reducer/DDP encounters an error. We add the
fields "has_error" and "error" to indicate that an error has occurred in this
iteration and that the other fields (performance stats) are not guaranteed to
be updated.

Errors encountered in python-side DDP will be added in the next diff.
ghstack-source-id: 130412974

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28652717

fbshipit-source-id: 9772abc2647a92dac6a325da6976ef5eb877c589
2021-06-02 19:48:26 -07:00
Rohan Varma
1968efa2dd [c10d] Remove verbose log (#59070)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59070

This log is too verbose, especially in the case where we call monitored
barrier before every collective, as we do in ProcessGroupWrapper.
ghstack-source-id: 130052822

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D28738189

fbshipit-source-id: f2899537caa4c13508da31134d5dd0f4fd6a1f3a
2021-06-02 13:50:11 -07:00
Michael Suo
b977a3b66d [c10d] Split custom class bindings out of python binding code (#58992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58992

Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registrations of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.
ghstack-source-id: 130168942
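
A hedged sketch of the split (the class and names are made up): the registration lives in its own translation unit linked into libc10d, so the static initializer runs exactly once no matter how many Python interpreters torch::deploy creates.

```
// custom_class_registration.cpp (hypothetical file, compiled into libc10d)
#include <torch/custom_class.h>

namespace {

struct ExampleStore : torch::CustomClassHolder {};  // illustrative class

// Static initialization runs once, when the shared library is loaded,
// rather than once per copy of the Python bindings.
const auto registration =
    torch::class_<ExampleStore>("c10d_example", "ExampleStore")
        .def(torch::init<>());

}  // namespace
```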

Test Plan: CI

Reviewed By: wconstab

Differential Revision: D28690832

fbshipit-source-id: 3c5e3fff28abb8bcdb4a952794c07de1ee2ae5a8
2021-05-28 15:35:23 -07:00
Nikita Shulga
0e9a295b41 Refactor GlooDeviceFactory::makeDeviceFor... (#58996)
Summary:
`makeDeviceForHostname` and `makeDeviceForInterface` are nearly duplicates, differing only in their default argument values.

Create a generic `makeGlooDevice` anonymous-namespace function that takes both a hostname and an interface name, and call it from both `makeDeviceFor[Hostname|Interface]` (see the sketch below).

Also solve two other minor issues:
 - do not call `getenv("GLOO_DEVICE_TRANSPORT")` at library load time
 - raise an exception rather than crashing if GLOO_DEVICE_TRANSPORT is set to an unknown value
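
A hedged sketch of the shape of the refactor (signatures and types are illustrative, not the real factory):

```
#include <memory>
#include <string>

struct GlooDevice {};  // stand-in for the real Gloo device type

namespace {
std::shared_ptr<GlooDevice> makeGlooDevice(
    const std::string& hostname, const std::string& interface) {
  // Shared creation logic: read GLOO_DEVICE_TRANSPORT lazily here (not at
  // library-load time) and reject unknown transport values.
  (void)hostname;   // the real version would dispatch on these
  (void)interface;
  return std::make_shared<GlooDevice>();
}
}  // namespace

std::shared_ptr<GlooDevice> makeDeviceForHostname(const std::string& host) {
  return makeGlooDevice(host, /*interface=*/"");
}

std::shared_ptr<GlooDevice> makeDeviceForInterface(const std::string& iface) {
  return makeGlooDevice(/*hostname=*/"", iface);
}
```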

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58996

Reviewed By: pbelevich

Differential Revision: D28713324

Pulled By: malfet

fbshipit-source-id: cb33b438078d163e3ec6f047f2e5247b07d94f8d
2021-05-26 20:33:11 -07:00