… all instances of `std::result_of` and `std::result_of_t` are conditionally replaced by `std::invoke_result` and `std::invoke_result_t` if `__cpp_lib_is_invocable >= 201703L`. `std::invoke_result` was only introduced in C++17, so it should probably not be required unconditionally yet.
Fixes #71657 and a small part of #69290
Tested on CentOS 7 / GCC 11, plus a private project that requires C++20.
I think the main questions for a maintainer to check are:
- whether my choices of preprocessor blocks are appropriate
- whether there are any very subtle differences between std::result_of and std::invoke_result that I have missed
- whether in any of the replacements the 'new' side can/should be simplified further
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79985
Approved by: https://github.com/ezyang
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74109
As the title says - to prevent unintentional usage of the `data_` member without the lock held.
ghstack-source-id: 151471742
Test Plan:
Build using:
```
buck build -c pt.disable_per_op_profiling=0 -c pt.enable_record_kernel_dtype=1 --show-output xplat/caffe2/fb/model_tracer:model_tracer
```
Reviewed By: malfet
Differential Revision: D34822913
fbshipit-source-id: 956fbe78956cf556fd6c8481b910acf557fbe608
(cherry picked from commit 9e4e3115f8e92bc4834179902d7ca79b3f97d985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74061
As the title says - to prevent unintentional usage of the `data_` member without the lock held.
ghstack-source-id: 150555298
Test Plan:
Build using:
```
buck build -c pt.disable_per_op_profiling=0 -c pt.enable_record_kernel_dtype=1 --show-output xplat/caffe2/fb/model_tracer:model_tracer
```
Reviewed By: JacobSzwejbka
Differential Revision: D34645508
fbshipit-source-id: effa8064f92550cb4fdd078fd85887751d8f849d
(cherry picked from commit 377f85907c14c0f2f0bd19068ee23cf48df9e17b)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66064
The only place this is used seems to be in the dispatcher for `operatorLookupTable_`. Disarming `LeftRight` disarms it for this one use case.
This should make .so loading faster, and also reduce memory consumption, since `LeftRight<T>` performs 2 writes for every write. I'd like a thorough review of this diff, since I want to make sure that initialization of code that writes into the dispatcher won't happen on multiple threads for on-device use.
Created a new class named `LeftRightNoOpWrapper<T>` for use in mobile builds.
### Why is LeftRight<T> slow?
It maintains 2 copies of each data structure `T` to be able to keep reads quick. Every write goes to both data structures, which means write cost is 2x and memory overhead is also 2x.
### Why is this safe for mobile builds?
1. .so loading never happens concurrently with model execution
2. Custom ops are loaded during .so load - initializers are all run serially
3. I don't see any threads being spawned from the global schema and kernel initializers
After discussing with dreiss, it seems like there could be rare cases in OSS apps or internal Android/iOS apps where a `.so` or `dylib` is loaded after the PT runtime is loaded, and this load happens concurrently with an in-progress inference run, which is looking up the operator table in the dispatcher.
To avoid crashes there, it seems reasonable to use the RW lock, since I don't expect any contention 99.9% of the time.
When registering operators, everything is serial, so only one thread will ever hold the write lock at a time, and it will have released the lock before any other thread needs it.
During inference runs, only one thread will ask for the shared lock unless multiple concurrent inferences are in progress. Even in that case, they will all be able to hold the read lock simultaneously.
Test Plan: Build and generate a local build of the iOS app to test.
Reviewed By: swolchok
Differential Revision: D31352346
fbshipit-source-id: c3f12454de3dbd7b421a6057d561e9373ef5bf98
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830
Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase.
Test Plan: CI
Reviewed By: zertosh
Differential Revision: D27979080
fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57164
Give some more indications about its performance characteristics
and when it is appropriate to use.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D28064685
Pulled By: ezyang
fbshipit-source-id: dbf5e041088d7921db2111d287feb9079466f1b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31026
This is error prone and probably wrong. Since we don't use LeftRight on the hot path anymore, let's remove this.
ghstack-source-id: 96369644
Test Plan: none
Differential Revision: D18902165
fbshipit-source-id: 7b9478cd7cc071f403d75da20c7c889c27248b5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30282
The atomic increment/decrements in LeftRight::read() were measurable in perf benchmarks. Let's improve their perf.
ghstack-source-id: 94443230
Test Plan: unit tests, perf benchmarks
Differential Revision: D18650228
fbshipit-source-id: d184ce8288510ab178e7c7da73562609d1ca3c9f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25133
This is driven by benchmarks I did for moving ATen ops to the c10 operator library.
Improvements:
- tell the compiler that the error cases are unlikely so it can optimize code better
- optimize cache layout of LeftRight.
ghstack-source-id: 88907294
Test Plan: unit tests
Differential Revision: D16998010
fbshipit-source-id: 0e3cbff0a4983133a4447ec093444f5d85dd61d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16524
- Make it exception safe. When an exception happens during write, the old state is recovered.
- Use RAII instead of try/catch to increment counters in readers. This is more readable, and it also makes it work with reader closures that return void, which previously didn't work because the reader return value was stored on the stack.
- Assert there are no reads or writes in progress when it's destructed, to avoid destruction race conditions
- Explain the algorithm in detail in comments
- Add test cases
Reviewed By: ezyang
Differential Revision: D13866609
fbshipit-source-id: 01306a282a3f555569caa13d8041486f960d00e2