Not only is this change usually shorter and more readable, it can also yield better performance. `size()` is not always a constant-time operation (for example, on linked lists), but `empty()` always is.
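To illustrate the performance argument, here is a minimal Python sketch of a singly linked list that keeps no element count. The class and method names are hypothetical stand-ins for a C++ container, not code from this change.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


class SinglyLinkedList:
    """Hypothetical list that does not cache its length."""

    def __init__(self):
        self.head = None

    def push_front(self, value):
        self.head = Node(value, self.head)

    def size(self):
        # Walks every node: linear in the number of elements.
        count = 0
        node = self.head
        while node is not None:
            count += 1
            node = node.next_node
        return count

    def empty(self):
        # Looks only at the head pointer: constant time.
        return self.head is None
```

With such a container, `lst.size() == 0` and `lst.empty()` give the same answer, but only the second avoids traversing the list.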
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93236
Approved by: https://github.com/malfet
Summary:
The GoogleTest `TEST` macro, as well as `DEFINE_DISPATCH`, is non-compliant with the `cppcoreguidelines-avoid-non-const-global-variables` check.
All changes except the ones to `.clang-tidy` were generated using the following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57566
Fix the problem that the `tempfile` is never deleted, even after `torch_shm_manager` is destroyed.
- The previous implementation used the wrong path length for the Unix domain socket. As a result, the last character of the `tempfile` name was lost when binding the pathname to the socket, and the file could not be deleted afterwards because it had an unexpected name.
- After the race condition was fixed by introducing a temporary directory, the problem became more serious: the `tempfile` persisting in the temporary directory prevents `torch_shm_manager` from deleting that directory.
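The effect of both points can be reproduced in a small Python sketch. It simulates the off-by-one path length by dropping the last character of the path before calling `bind()`. The file name `manager.sock` and the helper names are assumptions for illustration, and this is not the C++ code that was fixed.

```python
import os
import socket
import tempfile


def bind_with_truncated_path(directory, name):
    """Bind a Unix domain socket to a path that is one character too short."""
    intended = os.path.join(directory, name)
    truncated = intended[:-1]  # simulates the wrong path length
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(truncated)
    server.close()
    return intended, truncated


def demo():
    directory = tempfile.mkdtemp()
    intended, truncated = bind_with_truncated_path(directory, "manager.sock")
    results = {
        # The file we expect to clean up was never created ...
        "intended_exists": os.path.exists(intended),
        # ... but a file with the truncated name was.
        "truncated_exists": os.path.exists(truncated),
    }
    try:
        os.rmdir(directory)  # fails: the stray socket file is still inside
        results["rmdir_failed"] = False
    except OSError:
        results["rmdir_failed"] = True
    # Clean up properly so the sketch leaves nothing behind.
    os.unlink(truncated)
    os.rmdir(directory)
    return results
```

Cleanup code that only knows the intended name cannot remove the stray file, so removing the directory fails as well.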
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D28202866
Pulled By: ejguan
fbshipit-source-id: 912cfd8fec0cc309d47df223b2b0faa599c60799
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57309
Addressing a race condition that can occur in `torch_shm_manager` between the time its temporary file is unlinked and when it `bind()`s the manager server socket to that same name. In that time window, other threads/processes can re-create another temporary file with the same name, causing `bind()` to fail with `EADDRINUSE`.
This diff introduces `c10::TempDir` and associated helper functions that mirror those of `c10::TempFile` and generates the manager socket name using a combination of a temporary directory, which will be valid for the lifetime of `torch_shm_manager`, and a well-known file name within that directory that will never be used outside of `bind()`.
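The idea can be sketched in Python with the standard `tempfile` module standing in for `c10::TempDir`. The file name `manager.sock` is a hypothetical placeholder, and this only illustrates the approach, not the actual implementation.

```python
import os
import socket
import tempfile

SOCKET_FILE_NAME = "manager.sock"  # hypothetical well-known name


def serve_once():
    # The directory is private to this process and lives for the whole
    # block, so no other process creates a file at `path` before bind().
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, SOCKET_FILE_NAME)
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)
            server.listen(1)
            exists_while_bound = os.path.exists(path)
    # Leaving the block removes the directory together with the socket file.
    return exists_while_bound, os.path.exists(path)
```

Because the uniquely named object is the directory rather than the socket file, no file has to be unlinked before `bind()`, which removes the window in which the name could be reused.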
Reviewed By: ejguan
Differential Revision: D28047914
fbshipit-source-id: 148d54818add44159881d3afc2ffb31bd73bcabf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57308
This diff makes `c10::TempFile` non-copyable but movable. `torch_shm_manager` was previously dependent upon some hidden behavior that was a result of copying `TempFile`s, which is also being made more explicit now that they can be moved but not copied.
Context:
`c10::TempFile` is currently copyable, which leads to surprising behavior. A seemingly valid `TempFile` may in fact be invalid if the original it was copied from has already been destroyed, resulting in the file descriptor being closed and the filename being unlinked without the user knowing about it.
**In fact, both `c10::try_make_tempfile` and `c10::make_tempfile` cause copies of `TempFile` to be made**, which can easily be verified by explicitly deleting the copy constructor of `TempFile` and attempting to compile. This means that in practice, users of these functions are getting temporary files that have already been closed and unlinked.
This copying of `TempFile` is particularly interesting in the case of `torch_shm_manager`, which uses `try_make_tempfile` to generate the name of a Unix domain socket to communicate with clients. In order for `bind()` on the socket name to be successful, a file with that same name must not be linked in the filesystem, or `EADDRINUSE` will result. Happily, because `try_make_tempfile` previously created a copy of the `TempFile` while destroying the original, `torch_shm_manager` did not encounter this. With this change, however, `torch_shm_manager` must now explicitly destroy the `TempFile` before attempting to `bind()`. Unfortunately, this exposes a race condition--**other code can re-generate the same-named temporary file after the one created by `torch_shm_manager` is explicitly unlinked but before `torch_shm_manager` binds it to the server socket.** To be clear: this race condition already existed before this diff, but this makes things more explicit. The real fix will be in a follow-up change.
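The `bind()` behavior described above can be observed with a short Python sketch on a Unix-like system. The helper names are hypothetical, and `tempfile.mkstemp` only stands in for `try_make_tempfile`.

```python
import os
import socket
import tempfile


def try_bind(path):
    """Return None if bind() succeeds, or the errno if it fails."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        try:
            server.bind(path)
        except OSError as exc:
            return exc.errno
    return None


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    # The temporary file is still linked, so bind() is rejected.
    while_linked = try_bind(path)
    os.unlink(path)
    # Race window: another process could re-create `path` right here.
    after_unlink = try_bind(path)
    os.unlink(path)  # remove the socket file created by the successful bind()
    return while_linked, after_unlink
```

The first call fails with `errno.EADDRINUSE`. The second succeeds only because nothing re-created the file between `unlink()` and `bind()`.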
Reviewed By: ejguan
Differential Revision: D28047915
fbshipit-source-id: e8a1b6bb50419fe65620cfecdb67c566a4cf9056
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57307
Extend the `"ERROR"` message that `torch_shm_manager` writes to the pipe when it encounters a fatal error with some extra context (specifically, the `what()` on a caught `std::exception`), allowing the parent process to gain some insight into the cause of the failure.
Also, simply return from `main()` with an error exit code when a fatal exception is caught rather than re-throwing, because re-throwing leads to premature process termination that may prevent standard output from being flushed (and therefore the parent process from being able to read the error context from the pipe).
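A rough Python analogue of this pattern is sketched below. The child catches the exception, writes an `ERROR` line with the exception text to its standard output, flushes it, and returns a non-zero exit code instead of re-raising. The exact message format and the failure text are assumptions, not the format used by `torch_shm_manager`.

```python
import subprocess
import sys

# Source of a hypothetical child process, passed to the interpreter via -c.
CHILD_SOURCE = r'''
import sys

def run():
    raise RuntimeError("could not bind socket")  # hypothetical failure

def main():
    try:
        run()
    except Exception as exc:
        # Report the context and flush before exiting, so the parent
        # can always read it from the pipe.
        print(f"ERROR: {exc}", flush=True)
        return 1
    return 0

sys.exit(main())
'''


def launch_child():
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        stdout=subprocess.PIPE,
        text=True,
    )
    return completed.returncode, completed.stdout.strip()
```

The parent can then inspect both the exit code and the message, for example `code, message = launch_child()`.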
Reviewed By: ejguan
Differential Revision: D28047916
fbshipit-source-id: d423ee8ed1b2bf7831db877e8f8515ec6d6aa169
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os


def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files


def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])


def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)


if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
libshm_manager doesn't need to depend on all of libtorch. It only uses the tiny tempfile.h, which can be moved to c10. I could just duplicate the file instead, but it's not worth it, as c10 is small enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17019
Differential Revision: D14052688
Pulled By: dzhulgakov
fbshipit-source-id: 8797d15f8c7c49c49d40b7ab2f43aa3bf6becb0c
Summary:
Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or
the root level directory.
I modified CMakeLists.txt by hand to remove TH and THC from
the include paths.
I used the following script to do the canonicalization:
```
import subprocess
import re
import os.path

files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
for fn in files:
    if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
        continue
    if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
        continue
    with open(fn, 'r') as f:
        c = f.read()
    def fmt(p):
        return "#include <{}>".format(p)
    def repl(m):
        p = m.group(1)
        if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
            return fmt(p)
        if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
            return fmt(p)
        for root in ["aten/src", "torch/lib", ""]:
            for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                new_p = os.path.relpath(os.path.join(bad_root, p), root)
                if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                    return fmt(new_p)
        print("ERROR: ", fn, p)
        return m.group(0)
    new_c = re.sub(r'#include "([^"]+)"', repl, c)
    if new_c != c:
        print(fn)
        with open(fn, 'w') as f:
            f.write(new_c)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849
Reviewed By: dzhulgakov
Differential Revision: D13363445
Pulled By: ezyang
fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68
- Add additional timeouts to test_multiprocessing to reduce chances of
hanging indefinitely on failure
- Add missing header guards
- Fix typo
- Check that torch_shm_manager exists in torch/__init__.py