RocksDB 7 has started to use C++17 in its headers.
We should make this configurable, in case users need a higher standard version.
The list of files to change was found with `git grep 'CMAKE_[^_]*_STANDARD'`.
The doc string is copied from the CMake code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75519
Approved by: https://github.com/malfet
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445
PyTorch currently uses the old style of compiling CUDA in CMake, which is just a
bunch of scripts in `FindCUDA.cmake`. Newer versions of CMake support CUDA natively
as a language, just like C++ or C.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D31503350
fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55
Summary:
The GoogleTest `TEST` macro is non-compliant with the `cppcoreguidelines-avoid-non-const-global-variables` check, as is `DEFINE_DISPATCH`.
All changes except the ones to `.clang-tidy` were generated using the following script:
```
for i in $(find . -type f -iname "*.c*" -or -iname "*.h" \
             | xargs grep cppcoreguidelines-avoid-non-const-global-variables \
             | cut -f1 -d: | sort | uniq); do
  sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" "$i"
done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60543
Now that c10d is part of libtorch, it would also be nice if the sources all lived in one place.
ghstack-source-id: 132306292
Test Plan: It builds
Reviewed By: cbalioglu
Differential Revision: D29062002
fbshipit-source-id: d9e1301e9d73e1643fa0f0119cd2d618f1ad52e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59359
Move `prepare_for_backward` into the `_DDPSink` backward instead of calling it in the DDP forward pass, so that we can run multiple backwards in DDP with `retain_graph=True`.
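For reference, a minimal libtorch sketch of the multiple-backward pattern this change enables (plain autograd here, not DDP; the tensor and loss are illustrative):
```
#include <torch/torch.h>

int main() {
  auto x = torch::randn({4}, torch::requires_grad());
  auto loss = (x * x).sum();
  loss.backward({}, /*retain_graph=*/true); // keep the autograd graph alive
  loss.backward();                          // second backward now succeeds
  return 0;
}
```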
ghstack-source-id: 131774159
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D28855226
fbshipit-source-id: 6b7b25d75b7696f5b5629078233433f97663d61c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59719
Added FileStore functionality to the c10d backend. If FileStore is selected as the store type, it creates a temporary file in the /tmp directory to use. Appropriate tests were added as well.
FileStore was modified to expose the path field for testing. It was also modified so that the numWorkers field in the constructor is optional (defaulting to -1). A negative value indicates that there is no fixed number of workers; in this case, no attempt is made to clean up the file at the end.
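A minimal sketch of constructing a FileStore directly from C++ (header path per the current source layout; the file path and key are illustrative):
```
#include <torch/csrc/distributed/c10d/FileStore.hpp>

#include <cstdint>
#include <vector>

int main() {
  // numWorkers < 0 means there is no fixed number of workers; the store then
  // makes no attempt to clean up the backing file when it is destroyed.
  c10d::FileStore store("/tmp/c10d_example_store", /*numWorkers=*/-1);

  std::vector<uint8_t> value = {1, 2, 3};
  store.set("example_key", value);
  auto fetched = store.get("example_key"); // std::vector<uint8_t>
  return fetched == value ? 0 : 1;
}
```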
Test Plan: Unit tests for creating a c10d backend with filestore and simple error handling.
Reviewed By: cbalioglu, H-Huang
Differential Revision: D28997436
fbshipit-source-id: 24c9b2c9b13ea6c947e8b1207beda892bdca2217
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59684
Same reasoning as in the below diff.
ghstack-source-id: 131167212
Test Plan: CI
Reviewed By: cbalioglu
Differential Revision: D28981326
fbshipit-source-id: 264a7f787ea8be76f743a2eaca67ae1d3bd8073a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59697
The c10d build process selectively adds files based on the `USE_C10D_FOO` flags (where `FOO` is one of `GLOO`, `NCCL` or `MPI`). Replicating this logic inside libtorch will be harder, since libtorch uses a simpler approach (i.e., it lists the files in `build_variables.bzl`). So instead we could always include all files, and "disable" each file as needed using `#ifdef`s. Note that this is not a new approach: we already do the same for all the files of the TensorPipe agent based on the flag `USE_TENSORPIPE`.
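A minimal sketch of the guard pattern described above (macro and file names follow the description; the body is elided):
```
// ProcessGroupGloo.cpp -- always listed in build_variables.bzl, but the whole
// translation unit compiles to nothing unless the Gloo backend is enabled.
#ifdef USE_C10D_GLOO

namespace c10d {
// Gloo-specific implementation goes here.
} // namespace c10d

#endif // USE_C10D_GLOO
```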
ghstack-source-id: 131169540
Test Plan: CI
Reviewed By: agolynski
Differential Revision: D28987577
fbshipit-source-id: 4c6195de4e9a58101dad9379537e8d055dfd38af
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59696
Some files in c10d refer to dist autograd. However, on Windows, dist autograd isn't built. Hence we need to "mask out" those references under Windows. This was already partly done, but when moving c10d to libtorch some issues came up, possibly due to the different way in which linking happens. Hence I masked out the remaining references.
ghstack-source-id: 131169541
Test Plan: CI
Reviewed By: agolynski
Differential Revision: D28987579
fbshipit-source-id: c29c5330f8429d699554972d30f99a89b2e3971d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59667
Use `TORCH_CHECK` instead of `throw std::runtime_error` in monitored barrier so
that it works with `TORCH_SHOW_CPP_STACKTRACES` to reveal the entire call stack
where the monitored barrier failed, which can help determine where the
particular rank encountered an issue.
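A hedged before/after sketch (the function and message are illustrative, not the exact c10d code):
```
#include <c10/util/Exception.h>

void checkRankResponded(bool responded, int rank) {
  // Before: throw std::runtime_error("Rank " + std::to_string(rank) + " failed");
  // After: TORCH_CHECK raises a c10::Error, which carries a C++ stack trace
  // when TORCH_SHOW_CPP_STACKTRACES=1 is set.
  TORCH_CHECK(responded, "Rank ", rank, " failed to pass monitored barrier");
}
```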
ghstack-source-id: 130993689
Test Plan: CI
Reviewed By: cbalioglu
Differential Revision: D28974510
fbshipit-source-id: 6a6958995c1066cddcd647ca88c74473079b69fc
Summary:
Switches most of the simple for loops outside of `jit` directories to use `c10::irange`.
Generated with D28874212.
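For reference, the pattern the codemod introduces looks roughly like this (the loop body is illustrative):
```
#include <c10/util/irange.h>

#include <vector>

// Assumes out.size() >= in.size().
void doubleAll(const std::vector<int>& in, std::vector<int>& out) {
  // Before: for (size_t i = 0; i < in.size(); ++i) { ... }
  // c10::irange deduces the index type from the bound and evaluates it once.
  for (const auto i : c10::irange(in.size())) {
    out[i] = in[i] * 2;
  }
}
```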
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59481
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D28909681
fbshipit-source-id: ec9ab1bd602933238d9d0f73d4d8d027b75d9d85
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59523
Use snake case instead of camel case for consistency.
ghstack-source-id: 130759655
Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
Reviewed By: cbalioglu
Differential Revision: D28922896
fbshipit-source-id: e04298284a78b2e71b562f790a878731962f873a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59576
If the gradients before allreduce are large, then the sum after allreduce may overflow, especially for FP16. Therefore, apply the division before allreduce.
This fix is applied to both C++ and Python comm hooks.
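A hedged sketch of the ordering change (header path per the current layout; the helper is illustrative, not the actual hook):
```
#include <ATen/ATen.h>
#include <torch/csrc/distributed/c10d/ProcessGroup.hpp>

#include <vector>

void allreduceAveraged(c10d::ProcessGroup& pg, at::Tensor& grad) {
  // Divide first: each shard shrinks by the world size, so the FP16 sum
  // produced by the allreduce stays within the representable range.
  grad.div_(pg.getSize());
  std::vector<at::Tensor> tensors{grad};
  pg.allreduce(tensors)->wait(); // sums the already-scaled shards
}
```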
ghstack-source-id: 130754510
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl_grad_is_view
Reviewed By: rohan-varma
Differential Revision: D28941327
fbshipit-source-id: 932e8ddbdb2bfd609a78943f6dc390d3d6ca333f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59574
Remove `work` attribute from Reducer class in favor of `future_work`.
Additionally, remove the `copy_grad_to_bucket` method, since it is now only a one-line implementation, and create a new C++ comm hook called `_AllReduceCommHookWithDivFactor` to replace allreduce and also support handling uneven inputs.
1) Compared with the reverted https://github.com/pytorch/pytorch/pull/58937, `_AllReduceCommHookWithDivFactor` in `default_comm_hooks.cpp` is updated to apply the division first and hence avoid FP16 overflow.
2) Compared with the reverted https://github.com/pytorch/pytorch/pull/59520, `test_DistributedDataParallel_non_default_stream` is disabled on AMD, because applying the division first now hurts the gradient averaging accuracy on AMD.
See [07:48:26]:
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.2-py3.6-test1/1129/console
#Original PR Issue: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130752393
Test Plan:
buck test caffe2/test/distributed:distributed_gloo_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_non_default_stream
Reviewed By: rohan-varma
Differential Revision: D28940800
fbshipit-source-id: 1ba727ac951ebc1e7875dc1a1be8108a2c8d9462
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59522
If the gradients before allreduce are large, then the sum after allreduce may overflow, especially for FP16. Therefore, apply the division before allreduce.
This fix is applied to both C++ and Python comm hooks.
ghstack-source-id: 130686229
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl_grad_is_view
Reviewed By: rohan-varma
Differential Revision: D28922548
fbshipit-source-id: 442bd3cc7a35a8b948f626062fa7ad2e3704c5be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59520
Remove `work` attribute from Reducer class in favor of `future_work`.
Additionally, remove the `copy_grad_to_bucket` method, since it is now only a one-line implementation, and create a new C++ comm hook called `_AllReduceCommHookWithDivFactor` to replace allreduce and also support handling uneven inputs.
Compared with the reverted https://github.com/pytorch/pytorch/pull/58937, `_AllReduceCommHookWithDivFactor` in `default_comm_hooks.cpp` is updated to apply the division first and hence avoid FP16 overflow.
#Original PR Issue: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130685351
Test Plan:
buck test caffe2/test/distributed:distributed_gloo_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_grad_is_view
Reviewed By: walterddr
Differential Revision: D28922305
fbshipit-source-id: 6388a96eda7a06f292873afed6d1362096c13e1c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58331
This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class, allowing two server stores to be instantiated with the same host:port pair.
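A hedged sketch against the current C++ API (field names and header path may differ slightly from the version introduced in this stack):
```
#include <c10/util/intrusive_ptr.h>
#include <torch/csrc/distributed/c10d/TCPStore.hpp>

int main() {
  c10d::TCPStoreOptions opts;
  opts.port = 29500;
  opts.isServer = true;
  opts.multiTenant = true; // allow another server store on the same host:port

  // Without multiTenant the second construction would fail because the port
  // is already bound; with it, both server stores coexist.
  auto first = c10::make_intrusive<c10d::TCPStore>("localhost", opts);
  auto second = c10::make_intrusive<c10d::TCPStore>("localhost", opts);
  return 0;
}
```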
ghstack-source-id: 130676394
Test Plan:
- Run the existing and newly-introduced tests.
- Run several smoke tests including the short code snippet referred in GitHub issue #41614.
Reviewed By: H-Huang
Differential Revision: D28453850
fbshipit-source-id: f9066b164305de0f8c257e9d5736e93fd7e21ec6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58330
This PR is part of a stack that addresses the GitHub issue #41614; it introduces a major refactoring of the `TCPStore` class in preparation of the multi-tenancy feature.
- All TCP sockets are wrapped with a new `TCPSocket` RAII type (a minimal sketch of this pattern follows the list).
- `BackgroundThread` and daemon types are moved from the header to the cpp file.
- Server, client, and callback sockets are refactored into their own internal types `TCPServer`, `TCPClient`, and `TCPCallbackClient`.
- Calls to `tcputil::send*` and `tcputil::recv*` are wrapped in `TCPClient` for readability and maintainability.
- Two `TODO` statements are added to reference future improvements. Based on feedback, I will either create separate GitHub issues for them or address them as part of this stack.
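A minimal sketch of the RAII wrapping from the first bullet, assuming a plain POSIX file descriptor; the real `TCPSocket` carries more state and error handling:
```
#include <unistd.h>

#include <utility>

class TCPSocket {
 public:
  explicit TCPSocket(int fd = -1) noexcept : fd_(fd) {}
  ~TCPSocket() {
    if (fd_ >= 0) {
      ::close(fd_); // released on every exit path
    }
  }

  // Movable but not copyable, so ownership of the descriptor stays unique.
  TCPSocket(TCPSocket&& other) noexcept : fd_(std::exchange(other.fd_, -1)) {}
  TCPSocket& operator=(TCPSocket&& other) noexcept {
    if (this != &other) {
      if (fd_ >= 0) {
        ::close(fd_);
      }
      fd_ = std::exchange(other.fd_, -1);
    }
    return *this;
  }
  TCPSocket(const TCPSocket&) = delete;
  TCPSocket& operator=(const TCPSocket&) = delete;

  int handle() const noexcept { return fd_; }

 private:
  int fd_;
};
```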
ghstack-source-id: 130676392
Test Plan: Run the existing tests since there are no user-facing behavioral changes.
Reviewed By: H-Huang
Differential Revision: D28448981
fbshipit-source-id: 415b21e74b3cd51d673c1d5c349c6a2cb21dd667
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58329
This PR is part of a stack that addresses the GitHub issue #41614; it introduces:
- A new `multiTenant` constructor option for the `TCPStore` class indicating whether multiple store instances can be initialized with the same host:port pair.
- Updates to the C10d distributed (elastic) rendezvous and the `init_process_group` method to leverage the new `multiTenant` feature.
Note that the multi-tenancy feature itself is implemented in the fourth PR of this stack. In this PR, passing `true` to `multiTenant` results only in a warning output.
ghstack-source-id: 130676389
Test Plan: Run the existing tests since there are no behavioral changes.
Reviewed By: rohan-varma
Differential Revision: D28424978
fbshipit-source-id: fb1d1d81b8b5884cc5b54486700a8182a69c1f29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58328
This PR is part of a stack that addresses the GitHub issue #41614; it introduces a new `TCPStore` constructor that takes its optional parameters via a newly introduced `TCPStoreOptions` structure. This gives the API callers the flexibility to specify only the desired options while skipping the rest.
The main motivation behind this change is the introduction of the `multiTenant` constructor option in the second PR of this stack.
ghstack-source-id: 130676384
Test Plan: Run the existing tests since there are no behavioral changes.
Reviewed By: H-Huang
Differential Revision: D28417742
fbshipit-source-id: e6ac2a057f7ad1908581176ee6d2c2554c3c74a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58937
Remove `work` attribute from Reducer class in favor of `future_work`.
Additionally, remove the `copy_grad_to_bucket` method, since it is now only a one-line implementation, and create a new C++ comm hook called `_AllReduceCommHookWithDivFactor` to replace allreduce and also support handling uneven inputs.
#Original PR Issue: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130673249
Test Plan:
buck test caffe2/test/distributed:distributed_gloo_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
Reviewed By: agolynski
Differential Revision: D28677383
fbshipit-source-id: 85e0620378b7e9d837e436e94b9d807631d7d752
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59281
Adds the ability to log when the reducer/DDP encounters an error. We add the fields "has_error" and "error" to indicate that an error has
occurred in this iteration and that the other fields (performance stats) are not
guaranteed to be updated.
Errors encountered in Python-side DDP will be added in the next diff.
ghstack-source-id: 130412974
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28652717
fbshipit-source-id: 9772abc2647a92dac6a325da6976ef5eb877c589
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59070
This log is too verbose, especially in the case where we call monitored
barrier before every collective, as we do in ProcessGroupWrapper.
ghstack-source-id: 130052822
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D28738189
fbshipit-source-id: f2899537caa4c13508da31134d5dd0f4fd6a1f3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58992
Currently, we define Torchbind custom classes in the same place that we define Python bindings.
This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.
Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registrations of the custom classes (and therefore an
error).
This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.
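A hedged sketch of the pattern (the class and names are illustrative, not the real c10d bindings): the Torchbind registration sits in an ordinary C++ translation unit that is linked into the library exactly once, separate from the Python-binding code.
```
#include <torch/custom_class.h>

namespace {

struct Counter : torch::CustomClassHolder {
  int64_t value = 0;
  int64_t increment() {
    return ++value;
  }
};

// This static initializer runs once per shared library. Keeping it out of the
// Python-binding files means torch::deploy's per-interpreter copies of the
// bindings no longer re-register the class.
auto counter_class =
    torch::class_<Counter>("illustrative_classes", "Counter")
        .def(torch::init<>())
        .def("increment", &Counter::increment);

} // namespace
```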
ghstack-source-id: 130168942
Test Plan: CI
Reviewed By: wconstab
Differential Revision: D28690832
fbshipit-source-id: 3c5e3fff28abb8bcdb4a952794c07de1ee2ae5a8
Summary:
`makeDeviceForHostname` and `makeDeviceForInterface` are almost duplicates,
differing only in their default argument values.
Create a generic `makeGlooDevice` helper (in an anonymous namespace) that takes
both the host name and the interface name, and call it from both
makeDeviceFor[Hostname|Interface]; a sketch follows below.
Also fix two other minor issues:
- Do not call `getenv("GLOO_DEVICE_TRANSPORT")` at library load time
- Raise an exception rather than crash if GLOO_DEVICE_TRANSPORT is set to an unknown value
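A hedged sketch of the consolidation (the `Device` type and `createDevice` factory are stand-ins for the real gloo/c10d types, which are elided):
```
#include <cstdlib>
#include <memory>
#include <stdexcept>
#include <string>

// Stand-ins for the real gloo transport types and factory helpers.
struct Device {};
bool isKnownTransport(const std::string& name) {
  return name == "tcp" || name == "uv";
}
std::shared_ptr<Device> createDevice(
    const std::string& interfaceName,
    const std::string& hostname,
    const std::string& transport) {
  (void)interfaceName; (void)hostname; (void)transport;
  return std::make_shared<Device>();
}

namespace {

std::shared_ptr<Device> makeGlooDevice(
    const std::string& interfaceName,
    const std::string& hostname) {
  // Read GLOO_DEVICE_TRANSPORT lazily, on first use, not at library load time.
  const char* env = std::getenv("GLOO_DEVICE_TRANSPORT");
  const std::string transport = env != nullptr ? env : "tcp";
  if (!isKnownTransport(transport)) {
    // Raise instead of crashing on an unknown transport value.
    throw std::invalid_argument("Unknown GLOO_DEVICE_TRANSPORT: " + transport);
  }
  return createDevice(interfaceName, hostname, transport);
}

} // namespace

// Both public entry points forward to the shared helper, defaulting the
// argument they do not take.
std::shared_ptr<Device> makeDeviceForHostname(const std::string& hostname) {
  return makeGlooDevice(/*interfaceName=*/"", hostname);
}
std::shared_ptr<Device> makeDeviceForInterface(const std::string& interfaceName) {
  return makeGlooDevice(interfaceName, /*hostname=*/"");
}
```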
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58996
Reviewed By: pbelevich
Differential Revision: D28713324
Pulled By: malfet
fbshipit-source-id: cb33b438078d163e3ec6f047f2e5247b07d94f8d