pytorch/test/cpp
Jeremy Lilley 2e0294cb39 Make JIT Serialization support arbitrary std::function<> IO (#28039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28039

Right now, torch::save() uses std::ostream, which results in unnecessary
data copies in practice. The same applies to torch::load().

Adding a std::function<size_t(const void*, size_t)> as an output option,
parallel to the existing filename and std::ostream APIs, gives users the
flexibility to emit directly to a backing store.

For the simple case of appending the output to a std::string, we observe
significant benchmark savings (on the order of -50%), even with the
minor std::function<> dispatch overhead. The main reason is that
std::ostringstream effectively requires two extra copies of the data
beyond a simple string.append lambda.
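
To make the writer contract concrete, here is a minimal sketch of the
string.append case described above. It assumes torch::save() forwards
the callback to the archive's writer overload and that the callback
returns the number of bytes it consumed; treat it as illustrative
rather than a pinned API:

  #include <torch/torch.h>
  #include <string>

  int main() {
    torch::Tensor t = torch::rand({3, 3});
    std::string out;
    // Writer callback: append each serialized chunk to the backing
    // string and report how many bytes were consumed.
    torch::save(t, [&out](const void* buf, size_t n) -> size_t {
      out.append(static_cast<const char*>(buf), n);
      return n;
    });
    // `out` now holds the archive bytes, with no intermediate
    // std::ostringstream copies.
    return 0;
  }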

We also provide a parallel API for load(), though this one is
slightly more complex due to the need to perform arbitrary-position reads.
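
Correspondingly, a minimal load sketch from an in-memory buffer. This
assumes the reader side takes a positional read callback plus a size
callback; the exact signatures here are inferred from the shape of the
change, not guaranteed:

  #include <torch/torch.h>
  #include <algorithm>
  #include <cstring>
  #include <string>

  void load_from_string(const std::string& in, torch::Tensor& t) {
    torch::load(
        t,
        // Reader callback: copy up to `n` bytes starting at archive
        // offset `pos` into `buf`, returning the bytes actually read.
        [&in](uint64_t pos, void* buf, size_t n) -> size_t {
          size_t available = pos < in.size() ? in.size() - pos : 0;
          size_t to_copy = std::min(n, available);
          std::memcpy(buf, in.data() + pos, to_copy);
          return to_copy;
        },
        // Size callback: total archive size, needed to support the
        // arbitrary-position reads mentioned above.
        [&in]() -> size_t { return in.size(); });
  }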

Test Plan:
buck test mode/dev-nosan caffe2/test/...
      (Basic serialization test in caffe2/test/cpp/api/serialize.cpp)
      Benchmark in experimental/jeremyl/c2/SerializationBench.cpp, with D17823443
        (1M time goes from 90ms -> 40ms, albeit with crc patch applied)

Differential Revision: D17939034

fbshipit-source-id: 344cce46f74b6438cb638a8cfbeccf4e1aa882d7
2019-10-15 22:12:04 -07:00
api Make JIT Serialization support arbitrary std::function<> IO (#28039) 2019-10-15 22:12:04 -07:00
common Trim libshm deps, move tempfile.h to c10 (#17019) 2019-02-13 19:38:35 -08:00
dist_autograd add known worker ids to distributed autograd context (#26324) 2019-10-14 10:43:09 -07:00
jit module dedupe (#26666) 2019-10-12 09:51:57 -07:00
__init__.py Add train() / eval() / is_training() to C++ ScriptModule API (#16044) 2019-02-01 13:07:38 -08:00