pytorch/c10/util/Optional.cpp
Scott Wolchok df5b4696cf [Pytorch] Specialize guts of c10::optional for 32-bit scalars (#47015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47015

c10::optional has non-trivial copy and move operations always. This change specializes it for 32-bit scalars so that it has trivial copy and move operations in that case. Ideally, we would instead rely on P0602 "variant and optional should propagate copy/move triviality" and use `std::optional` (or implement that functionality ourselves). We can't use `std::optional` because we are stuck with C++14. Implementing the full P0602 ourselves would add even more complexity. We could do it, but this should be a helpful first step.
ghstack-source-id: 115886743

Test Plan:
Collect Callgrind instruction counts for `torch.empty(())`. Data:

Make empty c10-ful (https://github.com/pytorch/pytorch/pull/46092):

```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7ffaed1128e0>
torch.empty(())
                           All          Noisy symbols removed
    Instructions:       648005                     632899
    Baseline:             4144                       3736
100 runs per measurement, 1 thread
```

This diff atop #46092:

```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f943f1dc8e0>
torch.empty(())
                           All          Noisy symbols removed
    Instructions:       602347                     591005
    Baseline:             4106                       3736
100 runs per measurement, 1 thread
```

(6.6% improvement vs #46092)

Pass optionals by const reference (https://github.com/pytorch/pytorch/pull/46598)

```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f1abb3988e0>
torch.empty(())
                           All          Noisy symbols removed
    Instructions:       601349                     590005
    Baseline:             4162                       3736
100 runs per measurement, 1 thread
```
(6.8% improvement vs #46092)

This diff atop #46598 (i.e., both together)

```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f9577c22850>
torch.empty(())
                           All          Noisy symbols removed
    Instructions:       596095                     582451
    Baseline:             4162                       3736
100 runs per measurement, 1 thread
Warning: PyTorch was not built with debug symbols.
         Source information may be limited. Rebuild with
         REL_WITH_DEB_INFO=1 for more detailed results.
```

(another 1.3% savings!)

#46598 outperformed this change slightly, and combining the two leads to further benefits. I guess we should do both! (Though I still don't understand why passing optionals that should fit in a register by const reference would help...)

Reviewed By: smessmer

Differential Revision: D24552280

fbshipit-source-id: 4d93bfcffafebd8c01559398513fa6b9db959d11
2020-11-04 21:08:50 -08:00

7 lines
284 B
C++

#include <c10/util/Optional.h>
#include <type_traits>
static_assert(C10_IS_TRIVIALLY_COPYABLE(c10::optional<int>), "c10::optional<int> should be trivially copyable");
static_assert(C10_IS_TRIVIALLY_COPYABLE(c10::optional<bool>), "c10::optional<bool> should be trivially copyable");