PR #106656 was reverted due to iOS failures; iOS builds don't have full support for std::filesystem. This PR discards the std::filesystem changes and adds temp-file creation on Windows. It also moves the platform-specific syscalls into a separate .cpp file.
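For illustration only, a minimal sketch of Win32-based temp-file creation without std::filesystem; the helper name and structure below are hypothetical, not necessarily what this PR implements:

```cpp
// Hypothetical sketch of Win32 temp-file creation (illustrative only,
// not the PR's actual code).
#include <windows.h>

#include <optional>
#include <string>

std::optional<std::string> make_windows_temp_file(const char* prefix) {
  char dir[MAX_PATH + 1];
  // GetTempPathA fills in the user's temp directory (e.g. %TEMP%).
  DWORD dir_len = GetTempPathA(static_cast<DWORD>(sizeof(dir)), dir);
  if (dir_len == 0 || dir_len > MAX_PATH) {
    return std::nullopt;
  }
  char path[MAX_PATH + 1];
  // GetTempFileNameA creates a uniquely named empty file and returns its path.
  if (GetTempFileNameA(dir, prefix, /*uUnique=*/0, path) == 0) {
    return std::nullopt;
  }
  return std::string(path);
}
```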
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108508
Approved by: https://github.com/ezyang
This clang-tidy check is disabled globally due to false positives on containers, but there are a few places here where applying its suggested fixes would actually improve performance (by allowing STL containers to use move construction / assignment).
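As an illustration (not code from this PR), this is the kind of fix the check suggests:

```cpp
// Illustrative example of moving a local STL container instead of copying it.
#include <string>
#include <utility>
#include <vector>

struct Registry {
  std::vector<std::string> names;

  void set_names(std::vector<std::string> new_names) {
    // Without std::move this assignment copies every element; with it,
    // the vector's move assignment transfers the buffer instead.
    names = std::move(new_names);
  }
};
```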
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92156
Approved by: https://github.com/ngimel
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57566
Fix the problem that the `tempfile` was never deleted, even after `torch_shm_manager` was destroyed.
- The previous implementation used the wrong path length for the Linux socket, so the last character of the `tempfile` name was dropped when binding the pathname to the socket. As a result, the file could never be deleted because its actual name did not match the expected one.
- After the racing problem is solved by introducing a temporary directory, this becomes more dangerous: the leftover tempfile inside the temporary directory prevents `torch_shm_manager` from deleting the directory.
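For context, a simplified sketch (not libshm's exact code) of the `bind()` call involved; the address length must cover the full path, and an off-by-one here is what truncated the tempfile name:

```cpp
// Simplified sketch of binding a Unix domain socket to a pathname.
// Passing an address length that is one byte too short silently drops
// the last character of the bound name.
#include <sys/socket.h>
#include <sys/un.h>

#include <cstddef>
#include <cstring>

int bind_to_path(int fd, const char* path) {
  sockaddr_un addr{};
  addr.sun_family = AF_UNIX;
  // Leave room for the terminating NUL byte.
  std::strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
  // The length must cover the family field plus the full path, including
  // the NUL terminator; an off-by-one here truncates the socket name.
  socklen_t len = static_cast<socklen_t>(
      offsetof(sockaddr_un, sun_path) + std::strlen(addr.sun_path) + 1);
  return bind(fd, reinterpret_cast<const sockaddr*>(&addr), len);
}
```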
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D28202866
Pulled By: ejguan
fbshipit-source-id: 912cfd8fec0cc309d47df223b2b0faa599c60799
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830
Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase.
Test Plan: CI
Reviewed By: zertosh
Differential Revision: D27979080
fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57309
Addressing a race condition that can occur in `torch_shm_manager` between the time its temporary file is unlinked and when it `bind()`s the manager server socket to that same name. In that time window, other threads/processes can re-create another temporary file with the same name, causing `bind()` to fail with `EADDRINUSE`.
This diff introduces `c10::TempDir` and associated helper functions that mirror those of `c10::TempFile` and generates the manager socket name using a combination of a temporary directory, which will be valid for the lifetime of `torch_shm_manager`, and a well-known file name within that directory that will never be used outside of `bind()`.
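A rough, hypothetical sketch of the scheme; the type and helper names below are illustrative, not c10's actual API:

```cpp
// Keep a temporary directory alive for the manager's lifetime and place
// the socket at a well-known name inside it, so there is no
// unlink-then-bind window to race against.
#include <string>

struct TempDirHandle {   // stand-in for c10::TempDir; not its real shape
  std::string name;      // directory path; removed when the handle dies
};

std::string manager_socket_path(const TempDirHandle& dir) {
  // No other temp-file user ever creates this name, so bind() on it
  // cannot fail with EADDRINUSE due to a racing re-creation.
  return dir.name + "/manager.sock";
}
```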
Reviewed By: ejguan
Differential Revision: D28047914
fbshipit-source-id: 148d54818add44159881d3afc2ffb31bd73bcabf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57308
This diff makes `c10::TempFile` non-copyable but movable. `torch_shm_manager` was previously dependent upon some hidden behavior that was a result of copying `TempFile`s, which is also being made more explicit now that they can be moved but not copied.
Context:
`c10::TempFile` is currently copyable, which leads to surprising behavior. A seemingly valid `TempFile` may in fact be invalid if the original it was copied from has already been destroyed, resulting in the file descriptor being closed and the filename being unlinked without the user knowing about it.
**In fact, both `c10::try_make_tempfile` and `c10::make_tempfile` cause copies of `TempFile` to be made**, which can easily be verified by explicitly deleting the copy constructor of `TempFile` and attempting to compile. This means that in practice, users of these functions are getting temporary files that have already been closed and unlinked.
This copying of `TempFile` is particularly interesting in the case of `torch_shm_manager`, which uses `try_make_tempfile` to generate the name of a Unix domain socket used to communicate with clients. For `bind()` on the socket name to succeed, a file with that same name must not be linked in the filesystem, or `EADDRINUSE` will result. Happily, because `try_make_tempfile` previously created a copy of the `TempFile` while destroying the original, `torch_shm_manager` did not encounter this. With this change, however, `torch_shm_manager` must now explicitly destroy the `TempFile` before attempting to `bind()`. Unfortunately, this exposes a race condition: **other code can re-create a temporary file with the same name after the one created by `torch_shm_manager` is explicitly unlinked but before `torch_shm_manager` binds it to the server socket.** To be clear, this race condition already existed before this diff; this change just makes it more explicit. The real fix will be in a follow-up change.
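For illustration, a simplified move-only temp-file handle in the spirit of this change (not c10::TempFile's actual definition):

```cpp
// A move-only handle: exactly one owner closes the descriptor and
// unlinks the file.
#include <unistd.h>

#include <string>
#include <utility>

struct MoveOnlyTempFile {
  std::string name;
  int fd = -1;

  MoveOnlyTempFile(std::string name_, int fd_)
      : name(std::move(name_)), fd(fd_) {}

  // Copying is forbidden: two owners would double-close the descriptor
  // and double-unlink the file.
  MoveOnlyTempFile(const MoveOnlyTempFile&) = delete;
  MoveOnlyTempFile& operator=(const MoveOnlyTempFile&) = delete;

  // Moving transfers ownership and leaves the source inert.
  MoveOnlyTempFile(MoveOnlyTempFile&& other) noexcept
      : name(std::move(other.name)), fd(other.fd) {
    other.fd = -1;
    other.name.clear();
  }

  ~MoveOnlyTempFile() {
    if (fd >= 0) {
      ::close(fd);
    }
    if (!name.empty()) {
      ::unlink(name.c_str());
    }
  }
};
```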
Reviewed By: ejguan
Differential Revision: D28047915
fbshipit-source-id: e8a1b6bb50419fe65620cfecdb67c566a4cf9056
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28039
Right now, torch::save() uses std::ostream, which results in unnecessary
data copies in practice. The same applies to torch::load().
Adding a std::function<size_t(const void*, size_t)> as an output option,
parallel to the existing filename and std::ostream APIs, gives users the
flexibility to emit directly to a backing store.
For the simple case of appending the output to a std::string, we observe
significant benchmark savings (on the order of -50%), even with the
minor std::function<> dispatch overhead. The main reason is that
std::ostringstream effectively requires two extra copies of the data
compared to a simple string.append lambda.
We also provide a parallel API for load(), though this one is
slightly more complex due to the need to do arbitrary-position reads.
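A minimal usage sketch, assuming the new writer overload is forwarded through torch::save as described above (not taken from the PR's tests):

```cpp
// Serialize directly into a std::string via the writer callback.
#include <torch/torch.h>

#include <string>
#include <vector>

int main() {
  std::vector<torch::Tensor> tensors = {torch::ones({2, 2})};
  std::string buffer;
  // Each chunk of serialized bytes is appended straight onto the string,
  // avoiding the extra copies std::ostringstream would incur.
  torch::save(tensors, [&buffer](const void* data, size_t size) -> size_t {
    buffer.append(static_cast<const char*>(data), size);
    return size;
  });
  // `buffer` now holds the serialized archive.
  return 0;
}
```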
Test Plan:
buck test mode/dev-nosan caffe2/test/...
(Basic serialization test in caffe2/test/cpp/api/serialize.cpp)
Benchmark in experimental/jeremyl/c2/SerializationBench.cpp, with D17823443
(1M time goes from 90ms -> 40ms, albeit with crc patch applied)
Differential Revision: D17939034
fbshipit-source-id: 344cce46f74b6438cb638a8cfbeccf4e1aa882d7
Summary:
Right now, torch::save() uses std::ostream, which results in unnecessary
data copies in practice. The same applies to torch::load().
Adding a std::function<size_t(const void*, size_t)> as an output option,
parallel to the existing filename and std::ostream APIs, gives users the
flexibility to emit directly to a backing store.
For the simple case of appending the output to a std::string, we observe
significant benchmark savings (on the order of -50%), even with the
minor std::function<> dispatch overhead. The main reason is that
std::ostringstream effectively requires two extra copies of the data
compared to a simple string.append lambda.
We also provide a parallel API for load(), though this one is
slightly more complex due to the need to do arbitrary-position reads.
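And a load-side sketch; the reader and size callback signatures below are assumptions based on the description above and may not match the exact header:

```cpp
// Load from an in-memory buffer. The reader takes an absolute position
// because the archive format requires arbitrary-position reads.
// Callback signatures here are assumed, not verified against the header.
#include <torch/torch.h>

#include <algorithm>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

void load_from_string(const std::string& buffer,
                      std::vector<torch::Tensor>& out) {
  auto read_func = [&buffer](uint64_t pos, void* buf, size_t nbytes) -> size_t {
    // Copy up to `nbytes` starting at offset `pos`.
    size_t available = pos < buffer.size() ? buffer.size() - pos : 0;
    size_t n = std::min(nbytes, available);
    std::memcpy(buf, buffer.data() + pos, n);
    return n;
  };
  auto size_func = [&buffer]() -> size_t { return buffer.size(); };
  torch::load(out, read_func, size_func);
}
```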
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27586
Test Plan:
buck test mode/dev-nosan caffe2/test/...
(Basic serialization test in caffe2/test/cpp/api/serialize.cpp)
Benchmark in experimental/jeremyl/c2/SerializationBench.cpp, with D17823443
(1M time goes from 90ms -> 40ms, albeit with crc patch applied)
Differential Revision: D17822962
Pulled By: jjlilley
fbshipit-source-id: d344a7e59707f3b30d42280fbab78f87399e4d10
Summary:
libshm_manager doesn't need to depend on all of libtorch. It only uses the tiny tempfile.h, which can be moved to c10. I could just duplicate the file instead, but it's not worth it since c10 is a small enough dependency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17019
Differential Revision: D14052688
Pulled By: dzhulgakov
fbshipit-source-id: 8797d15f8c7c49c49d40b7ab2f43aa3bf6becb0c