Commit Graph

4 Commits

Author SHA1 Message Date
Lu Fang
a918f1d9af Adding a hook (wrapper) for non-std stream reader in PyTorchStreamReader (#15551)
Summary:
To implement a stream is very annoying, since it is closely defined with the underlying storage streambuffer.

So in this PR, we add ReadAdapterInterface and PyTorchStreamReader will use it. We implement IStreamAdapter as a wrapper of std::istream. And keep the user interface unchanged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15551

Reviewed By: zrphercule

Differential Revision: D13568907

Pulled By: houseroad

fbshipit-source-id: 93708cb801248a6c101f35cb14d1631029365c3c
2019-01-04 22:50:07 -08:00
Zachary DeVito
170ff7764f Use a zip archive as our container format (#14521)
Summary:
After consulting with Owen, who pointed out the existence of the miniz library, I decided to take one last shot at using zip as our container format.
miniz makes this surprisingly feasible and I think the benefits of using zip are large enough that we should do it.

This replaces our custom container format with a zip archive, preserving all of the
desirable features of our custom format, such as append-oriented writing, and
mmap'able tensor data while adding a bunch of debugging advantages:

1. You can unzip and explore the container to debug what is going on with a model.
2. You can edit the model using a text editor (e.g. change the definition of a method,
   or editing the json-serialized meta-data), re-zip the file use OSX's native 'Compress'
   option, and re-load the result into pytorch. Note: this enables you to, e.g., print-debug
   serialized models.
3. We can easily enable features like compression in the future.
4. Stock python , without pytorch installed, and other programming languages
   can reasonably consume this format,using json  and zipfile packages, which enables
   people to build tools like visualizers without those visualizers depending on pytorch.
   This will be especially useful if you want to, for instance, write a visualizer in javascript.

Notes:

*  This add miniz (https://github.com/richgel999/miniz) as a dependency. miniz is a self-contained
   library for reading/writing zipfiles that unlike other zip libraries also includes libz
   compatible compress/decompress support. It is a single header and a single C file without
   any other dependencies. Note that the instructions for miniz explicitly state:

   > Please use the files from the releases page in your projects. Do not use the git checkout directly!

   So we have checked in the 'release' source. Miniz supports zip64, and its API is amenable
   to doing zip-align style things to align data.

*  Removes 'size' from RecordRef. This allows you to edit files in the zip archive without
   editing the meta-data file. Very important if you want to print-debug serialized models.

*  PyTorchStreamReader/PyTorchStreamWriter keep mostly the same API (though keys become strings)
   However, their implementation is completely swapped out to use miniz.

*  Code exists to check for the old magic number to give a decent warning to our preview users
   after we change the format.

*  Container version information is now put in a stand-alone 'version' file in the archive
   and serves a similar purpose to the other container version info.

*  All files in the zip archive start at 64-byte boundaries, using an approach similar to
   zip-align. Tests check that this property remains true. While the writer does this,
   the reader doesn't depend on it, allowing user-created archives that can use compression,
   and do not have to align data.

*  Added test to check for > 4GB files and archives. Disabled by default because it takes
   almost 2 minutes to run.

*  torchscript files are now optional: if a submodule does not have methods, it will
   not be written.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14521

Reviewed By: jamesr66a

Differential Revision: D13252945

Pulled By: zdevito

fbshipit-source-id: 01209294c0f6543d0fd716f85a38532249c52f8c
2018-11-30 19:19:29 -08:00
Lu Fang
5182fdad0b Compute the offset to make sure the order in InlineContainer test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13198

Reviewed By: bddppq

Differential Revision: D12812909

Pulled By: houseroad

fbshipit-source-id: f448e0d7957c316099a6b565d129eabb7ef81e59
2018-10-26 21:32:25 -07:00
Lu Fang
9f9f06c937 Improve inline container and add some test (#12993)
Summary:
Added getNextRecord/hasNextRecord methods. Even the model data is stored at the end, we can still read the file from the beginning.

Added gtest to cover reader and writer's code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12993

Reviewed By: yinghai

Differential Revision: D10860086

Pulled By: houseroad

fbshipit-source-id: 01b1380f8f50f5e853fe48a8136e3176eb3b0c29
2018-10-26 12:06:47 -07:00