This PR applies clang-tidy readability checks to jit sources and all headers in the code base.
`readability-redundant-inline-specifier` is suppressed because it incurs too many changes. `readability-redundant-inline-specifier` is used to detect redundant inline specifiers on function and variable declarations. There are many in-class method definitions that are marked inline.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164652
Approved by: https://github.com/Skylion007
This PR applies clang-tidy readability checks to jit sources and all headers in the code base.
`readability-redundant-inline-specifier` is suppressed because it incurs too many changes. `readability-redundant-inline-specifier` is used to detect redundant inline specifiers on function and variable declarations. There are many in-class method definitions that are marked inline.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164652
Approved by: https://github.com/Skylion007
Fixes#134714 (or attempts to, idk how to test yet)
For posterity, how one can test:
1. make sure you have USE_PTHREADPOOL=1 or pull a packaged binary
2. run gdb --args python, with `r` to enter, `Ctrl-C` to pause, and `c` to get back into Python
3. import torch
4. torch.set_num_threads(1), make sure this does not trigger any additional threads getting created.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136793
Approved by: https://github.com/albanD
Threads inside the thread pools are not named, so they inherit the main process name or the name of the first thread. In our case if we set `pt_main_thread` as the thread name when a thread does `import torch`, this name will be inherited by all the threads in the created pools.
This PR names the threads in the pools I was able to find. There are other pools created, like OpenMP ones and we need to follow-up on those.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130270
Approved by: https://github.com/d4l3k, https://github.com/albanD
Summary:
`nullptr` is typesafe. `0` and `NULL` are not. In the future, only `nullptr` will be allowed.
This diff helps us embrace the future _now_ in service of enabling `-Wzero-as-null-pointer-constant`.
Test Plan: Sandcastle
Reviewed By: meyering
Differential Revision: D54163060
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120740
Approved by: https://github.com/Skylion007
Previously a crash in PyTorch on power systems was fixed with #110708.
Even with the fix, the torch_test.py test throws the following error
for one of the tests.
"Error in cpuinfo: processor architecture is not supported in cpuinfo"
This is a follow up patch to fix this error.
Fixes #ISSUE_NUMBER
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112707
Approved by: https://github.com/albanD
Summary:
This will make sure we don't run into an internal assert for clang tsan which has a cap of 63 on concurrently held lock count.
Seems like it is failing with 64 since the comparison is `<`, so setting it to 63 here.
```
llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))"
```
Created from CodeHub with https://fburl.com/edit-in-codehub
Test Plan:
CI
Sandcastle run
Reviewed By: kimishpatel, salilsdesai
Differential Revision: D41444710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89453
Approved by: https://github.com/mcr229
Summary: Cap the thread count to 64 unconditionally to solve this tsan issue which leads to harder to debug, flaky test failures.
Test Plan: CI
Reviewed By: kimishpatel
Differential Revision: D38136212
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83950
Approved by: https://github.com/kimishpatel
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76366
caffe2 is not currently being built for XROS.
Test Plan: CI
Reviewed By: kimishpatel
Differential Revision: D35923922
fbshipit-source-id: 260dacadf0bd5b6bab7833a4ce81e896d280b053
(cherry picked from commit 8370b8dd2519d55a79fa8d45e7951ca8dc0b21a8)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69533
Modified loops in files under fbsource/fbcode/caffe2/ from the format
```
for(TYPE var=x0;var<x_max;x++)
```
to the format
```
for(const auto var: irange(xmax))
```
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D32837942
fbshipit-source-id: 8663037a38ade8f81bd5e983a614d197ea11f0d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66743
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D31705359
fbshipit-source-id: c9ea2fbc0f9cd29e97a52dcb203addc5f2abb09b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65245
Building and running c10 and qnnpack tests on XROS.
Notable changes:
- Adding #if define(_XROS_) in few places not supported by XROS
- Changing Threadpool to abstract class
ghstack-source-id: 139513579
Test Plan: Run c10 and qnnpack tests on XROS.
Reviewed By: veselinp, iseeyuan
Differential Revision: D30137333
fbshipit-source-id: bb6239b935187fac712834341fe5a8d3377762b1
Summary:
As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH`
All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57273.
Some users reported that they dislike the Caffe2 thread-pool leak warning, as it floods their logs, and have requested disabling it, or have asked for a way to filter it.
It seems caffe2 pthreadpool already exists because of some dependency in the binary distribution, so `torch.set_num_threads()` invocation isn't required to reproduce the issue (as is otherwise the case when building from the master branch).
https://github.com/pytorch/pytorch/issues/60171's test script does have a `set_num_threads` invocation & hence that's why I was able to reproduce the issue after building from the master branch's source code.
cc malfet & ejguan, who have the authority to make a decision.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60318
Reviewed By: albanD
Differential Revision: D29265771
Pulled By: ezyang
fbshipit-source-id: 26f678af2fec45ef8f7e1d39a57559790eb9e94b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58759
* Makes `pthreadpool()->run` respect `_NoPThreadPoolGuard`
Runs tasks on the same thread instead of parallelizing when guard is present
Test Plan:
buck build //xplat/caffe2:aten_test_test_thread_pool_guard
./buck-out/last/aten_test_test_thread_pool_guard
Reviewed By: kimishpatel
Differential Revision: D28597425
fbshipit-source-id: 0365ad9947c239f5b37ce682802d4d401b8b0a48
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os
def get_compiled_files_list():
import json
with open("build/compile_commands.json") as f:
data = json.load(f)
files = [os.path.relpath(node['file']) for node in data]
for idx, fname in enumerate(files):
if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
return files
def run_clang_tidy(fname):
check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
changes = check_output(["git", "ls-files", "-m"])
if len(changes) == 0:
return
check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])
def main():
git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
compiled_files = get_compiled_files_list()
for idx, fname in enumerate(git_files):
if fname not in compiled_files:
continue
if fname.startswith("caffe2/contrib/aten/"):
continue
print(f"[{idx}/{len(git_files)}] Processing {fname}")
run_clang_tidy(fname)
if __name__ == "__main__":
main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Following up on https://github.com/pytorch/pytorch/pull/54895#discussion_r606402656.
A race-condition wouldn't arise because `leak_corrupted_threadpool` can be set to true only after fork via the `pthread_atfork` handler, when a (child) process would be single-threaded. It's set to false also when the process is still single-threaded (`pthreadpool` is called during an invocation to `set_num_threads`, prior to which a child process would remain single-threaded). All threads (if & when multiple threads would be created) would always see `leak_corrupted_threadpool` as false if it would be accessed concurrently.
Since no reader threads can exist while a writer thread changes its value (false->true and true->false), `leak_corrupted_threadpool` might as well be a non-atomic bool.
### Pros
1. No thread-synchronization is required for `leak_corrupted_threadpool`, as it's a non-atomic bool.
2. The call to `compare_exchange_strong` has been be removed.
cc: malfet VitalyFedyunin ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55341
Reviewed By: albanD
Differential Revision: D27669442
Pulled By: ezyang
fbshipit-source-id: 926cb5c1b0a537c1c2ab164b0d51d37c1f1b67f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55435
We've seen issues from the macos skylight app that PyTorch is super slow due to the lack of cap support in pthreadpools. For mac builds, we set the thread count to `#threads/2`.
ghstack-source-id: 125900852
Test Plan:
- Sandcastle CI
- CircleCI
Reviewed By: kimishpatel
Differential Revision: D27578871
fbshipit-source-id: 7b947bc5d6cf289378abf5f479575e112325d02b
Summary:
## Problem summary
Fixes https://github.com/pytorch/pytorch/issues/54752 - when the number of threads is more than 3 and at least one `set_num_threads` invocation has taken place before forking child processes by the dataloader, `set_num_threads(1)` in the child process causes a segfault, as during its invocation, the child process is made to handle the data structures of the Caffe2 thread-pool of the parent process, whose data structures it inherits from the parent process (these threads don't exist in the child process, but some of its data structures do, due to the copy-on-write technique used by `fork`).
## Solution
malfet [advised](https://github.com/pytorch/pytorch/issues/54752#issuecomment-810315302) & [authored code](https://github.com/pytorch/pytorch/pull/54895#pullrequestreview-625670122) for adding a `pthread_atfork` handler in `pytorch/caffe2/utils/threadpool/pthreadpool-cpp.cc`, that's invoked in the child process right after fork, to leak the Caffe2 thread-pool (the child inherits the thread-pool's data structures from its parent process, but doesn't actually have those threads, since after `fork` , a child process only has one thread).
## Additional changes
Added unittest `test_no_segfault` to test for this issue in `test_dataloader.py`
Also enabled `test_segfault` (which actually makes sure that segfaults happen in worker processes in a particular case).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54895
Reviewed By: zhangguanheng66
Differential Revision: D27542253
Pulled By: malfet
fbshipit-source-id: 10f9c67ce1ff1aa37d3efebf405bd93f7f9d2489
Summary:
Since caffe2 and torch have been consolidated, CAFFE2_API should be merged with TORCH_API. Addresses a TODO.
Manually edited some references of the removed `CAFFE2_API`:
* `CONTRIBUTING.md`
* `caffe2/proto/CMakeLists.txt`
* `cmake/ProtoBuf.cmake`
* `c10/macros/Export.h`
* `torch/csrc/WindowsTorchApiMacro.h`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49496
Reviewed By: malfet, samestep
Differential Revision: D25600726
Pulled By: janeyx99
fbshipit-source-id: 7e068d959e397ac183c097d7e9a9afeca5ddd782
Summary:
This re-applies D21232894 (b9d3869df3) and D22162524, plus updates jni_deps in a few places
to avoid breaking host JNI tests.
Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test`
Reviewed By: xcheng16
Differential Revision: D22199952
fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37243
*** Why ***
As it stands, we have two thread pool solutions concurrently in use in PyTorch mobile: (1) the open source pthreadpool library under third_party, and (2) Caffe2's implementation of pthreadpool under caffe2/utils/threadpool. Since the primary use-case of the latter has been to act as a drop-in replacement for the third party version so as to enable integration and usage from within NNPACK and QNNPACK, Caffe2's implementation is intentionally written to the exact same interface as the third party version.
The original argument in favor of C2's implementation has been improved performance as a result of using spin locks, as opposed to relinquishing the thread's time slot and putting it to sleep - a less expensive operation up to a point. That seems to have given C2's implementation the upper hand in performance, hence justifying the added maintenance complexity, until the third party version improved in parallel surpassing the efficiency of C2's implementation as I have verified in benchmarks. With that advantage gone, there is no reason to continue using C2's implementation in PyTorch mobile either from the perspective of performance or code hygiene. As a matter of fact, there is considerable performance benefit to be had as a result of using the third party version as it currently stands.
This is a tricky change though, mainly because in order to avoid potential performance regressions, of which I have witnessed none but just in abundance of caution, we have decided to continue using the internal C2's implementation whenever building for Caffe2. Again, this is mainly to avoid potential performance regressions in production C2 use cases even if doing so results in reduced performance as far as I can tell.
So to summarize, today, and as it currently stands, we are using C2's implementation for (1) NNPACK, (2) PyTorch QNNPACK, and (3) ATen parallel_for on mobile builds, while using the third party version of pthreadpool for XNNPACK as XNNPACK does not provide any build options to link against an external implementation unlike NNPACK and QNNPACK do.
The goal of this PR then, is to unify all usage on mobile to the third party implementation both for improved performance and better code hygiene. This applies to PyTorch's use of NNPACK, QNNPACK, XNNPACK, and mobile's implementation of ATen parallel_for, all getting routed to the
exact same third party implementation in this PR.
Considering that NNPACK, QNNPACK, and XNNPACK are not mobile specific, these benefits carry over to non-mobile builds of PyTorch (but not Caffe2) as well. The implementation of ATen parallel_for on non-mobile builds remains unchanged.
*** How ***
This is where things get tricky.
A good deal of the build system complexity in this PR arises from our desire to maintain C2's implementation intact for C2's use.
pthreadpool is a C library with no concept of namespaces, which means two copies of the library cannot exist in the same binary or symbol collision will occur violating ODR. This means that somehow, and based on some condition, we must decide on the choice of a pthreadpool implementation. In practice, this has become more complicated as a result of all the possible combinations that USE_NNPACK, USE_QNNPACK, USE_PYTORCH_QNNPACK, USE_XNNPACK, USE_SYSTEM_XNNPACK, USE_SYSTEM_PTHREADPOOL and other variables can result in. Having said that, I have done my best in this PR to surgically cut through this complexity in a way that minimizes the side effects, considering the significance of the performance we are leaving on the table, yet, as a result of this combinatorial explosion explained above I cannot guarantee that every single combination will work as expected on the first try. I am heavily relying on CI to find any issues as local testing can only go that far.
Having said that, this PR provides a simple non mobile-specific C++ thread pool implementation on top of pthreadpool, namely caffe2::PThreadPool that automatically routes to C2's implementation or the third party version depending on the build configuration. This simplifies the logic at the cost of pushing the complexity to the build scripts. From there on, this thread pool is used in aten parallel_for, and NNPACK and family, again, routing all usage of threading to C2 or third party pthreadpool depending on the build configuration.
When it is all said or done, the layering will look like this:
a) aten::parallel_for, uses
b) caffe2::PThreadPool, which uses
c) pthreadpool C API, which delegates to
c-1) third_party implementation of pthreadpool if that's what the build has requested, and the rabbit hole ends here.
c-2) C2's implementation of pthreadpool if that's what the build has requested, which itself delegates to
c-2-1) caffe2::ThreadPool, and the rabbit hole ends here.
NNPACK, and (PyTorch) QNNPACK directly hook into (c). They never go through (b).
Differential Revision: D21232894
Test Plan: Imported from OSS
Reviewed By: dreiss
Pulled By: AshkanAliabadi
fbshipit-source-id: 8b3de86247fbc3a327e811983e082f9d40081354
Summary:
Because MacOS is not iOS
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37283
Test Plan: CI
Differential Revision: D21244398
Pulled By: malfet
fbshipit-source-id: b822e216e83887e2f2961b5c5384eaf749629f61
Summary:
We get seg fault without this in using XNNPACK.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34087
Differential Revision: D20199787
Pulled By: kimishpatel
fbshipit-source-id: d3d274e7bb197461632b21688820cd4c10dcd819
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33957
lots of small preprocessor warning cleanup for windows
Test Plan: CI green
Reviewed By: malfet, albanD
Differential Revision: D20153582
fbshipit-source-id: 18fd61c466fd1f55ededdae4448b3009a9cedc04