Summary:
Implementing a custom stream is cumbersome, since a stream is tightly coupled to its underlying storage, the stream buffer.
This PR therefore adds ReadAdapterInterface, which PyTorchStreamReader now uses, and implements IStreamAdapter as a wrapper around std::istream. The user-facing interface is unchanged.
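The adapter idea can be sketched in Python as follows. This is an illustrative analogue only, not the C++ code in this PR: the class names `ReadAdapter`, `FileObjAdapter`, and `BytesAdapter`, and the method names `size` and `read`, are assumptions made for the sketch.

```python
import io
from abc import ABC, abstractmethod


class ReadAdapter(ABC):
    """Hypothetical analogue of a read-adapter interface (names are assumptions)."""

    @abstractmethod
    def size(self) -> int:
        """Total number of readable bytes."""

    @abstractmethod
    def read(self, pos: int, n: int) -> bytes:
        """Return up to n bytes starting at absolute offset pos."""


class FileObjAdapter(ReadAdapter):
    """Wraps a seekable binary file object, much as a stream wrapper would."""

    def __init__(self, fileobj):
        self._f = fileobj

    def size(self) -> int:
        here = self._f.tell()
        end = self._f.seek(0, io.SEEK_END)  # seek returns the new position
        self._f.seek(here)                  # restore the caller's position
        return end

    def read(self, pos: int, n: int) -> bytes:
        self._f.seek(pos)
        return self._f.read(n)


class BytesAdapter(ReadAdapter):
    """A second backend that needs no stream machinery at all."""

    def __init__(self, data: bytes):
        self._data = data

    def size(self) -> int:
        return len(self._data)

    def read(self, pos: int, n: int) -> bytes:
        return self._data[pos:pos + n]


def read_prefix(adapter: ReadAdapter, n: int = 4) -> bytes:
    """A reader written against the interface only, not a concrete backend."""
    return adapter.read(0, min(n, adapter.size()))
```

A new storage backend then only has to provide two small methods instead of a full stream and stream buffer, while callers that already pass a stream keep working through the wrapper.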
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15551
Reviewed By: zrphercule
Differential Revision: D13568907
Pulled By: houseroad
fbshipit-source-id: 93708cb801248a6c101f35cb14d1631029365c3c
Summary:
After consulting with Owen, who pointed out the existence of the miniz library, I decided to take one last shot at using zip as our container format.
miniz makes this surprisingly feasible and I think the benefits of using zip are large enough that we should do it.
This replaces our custom container format with a zip archive, preserving all of the
desirable features of our custom format, such as append-oriented writing and
mmap'able tensor data, while adding a bunch of debugging advantages:
1. You can unzip and explore the container to debug what is going on with a model.
2. You can edit the model using a text editor (e.g. change the definition of a method,
or edit the JSON-serialized metadata), re-zip the file using OSX's native 'Compress'
option, and re-load the result into PyTorch. Note: this enables you to, e.g., print-debug
serialized models.
3. We can easily enable features like compression in the future.
4. Stock Python, without PyTorch installed, and other programming languages
can reasonably consume this format using json and zipfile packages, which enables
people to build tools like visualizers without those visualizers depending on PyTorch.
This will be especially useful if you want to, for instance, write a visualizer in javascript.
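As a rough illustration of point 4, the sketch below inspects an archive using only the Python standard library. The entry names (`example/version`, `example/model.json`, `example/tensors/0`) and the JSON layout are assumptions made up for this example; the real archive layout may differ.

```python
import io
import json
import zipfile


def build_example_archive() -> bytes:
    """Build a tiny stand-in archive in memory (hypothetical layout)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("example/version", "1\n")
        zf.writestr("example/model.json",
                    json.dumps({"tensors": [{"data": "tensors/0"}]}))
        zf.writestr("example/tensors/0", bytes(16))
    return buf.getvalue()


def describe(archive: bytes) -> dict:
    """List entry sizes and parse the JSON metadata without PyTorch."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        # Sizes come from the zip directory, not from the metadata file.
        sizes = {info.filename: info.file_size for info in zf.infolist()}
        meta = json.loads(zf.read("example/model.json"))
    return {"sizes": sizes, "meta": meta}


summary = describe(build_example_archive())
for name, size in summary["sizes"].items():
    print(name, size)
```

The same two steps, listing the directory and parsing one JSON entry, are available in most languages with a zip library, which is what makes external tooling practical.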
Notes:
* This adds miniz (https://github.com/richgel999/miniz) as a dependency. miniz is a self-contained
library for reading/writing zipfiles that, unlike other zip libraries, also includes libz-compatible
compress/decompress support. It is a single header and a single C file without
any other dependencies. Note that the instructions for miniz explicitly state:
> Please use the files from the releases page in your projects. Do not use the git checkout directly!
So we have checked in the 'release' source. Miniz supports zip64, and its API is amenable
to doing zip-align style things to align data.
* Removes 'size' from RecordRef. This allows you to edit files in the zip archive without
editing the meta-data file. Very important if you want to print-debug serialized models.
* PyTorchStreamReader/PyTorchStreamWriter keep mostly the same API (though keys become strings).
However, their implementation is completely swapped out to use miniz.
* Code exists to check for the old magic number to give a decent warning to our preview users
after we change the format.
* Container version information is now put in a stand-alone 'version' file in the archive
and serves a similar purpose to the other container version info.
* All files in the zip archive start at 64-byte boundaries, using an approach similar to
zip-align. Tests check that this property remains true. While the writer does this,
the reader doesn't depend on it, allowing user-created archives that can use compression,
and do not have to align data.
* Added a test to check for > 4GB files and archives. Disabled by default because it takes
almost 2 minutes to run.
* TorchScript files are now optional: if a submodule does not have methods, it will
not be written.
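The 64-byte alignment note can be sketched with Python's zipfile module. This is a minimal sketch of one zip-align style approach (padding the local header's extra field), not the miniz-based writer in this PR; `write_aligned`, `data_offsets`, and the extra-field ID `0xFFFE` are assumptions chosen only for illustration.

```python
import io
import struct
import zipfile

ALIGNMENT = 64
LOCAL_HEADER_SIZE = 30   # fixed-size part of a zip local file header
PAD_FIELD_ID = 0xFFFE    # arbitrary extra-field ID, an assumption of this sketch


def write_aligned(buf: io.BytesIO, zf: zipfile.ZipFile, name: str, data: bytes) -> None:
    """Store data uncompressed so its first byte lands on a 64-byte boundary."""
    # Where the data would start with no extra field at all.
    start = buf.tell() + LOCAL_HEADER_SIZE + len(name.encode("utf-8"))
    pad = (-start) % ALIGNMENT
    if 0 < pad < 4:      # an extra field needs at least a 4-byte header
        pad += ALIGNMENT
    info = zipfile.ZipInfo(name)  # ZipInfo defaults to ZIP_STORED
    if pad:
        info.extra = struct.pack("<HH", PAD_FIELD_ID, pad - 4) + bytes(pad - 4)
    zf.writestr(info, data)


def data_offsets(archive: bytes) -> dict:
    """Compute where each entry's data begins by parsing its local header."""
    offsets = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            # Name length and extra length sit at bytes 26..29 of the header.
            name_len, extra_len = struct.unpack_from(
                "<HH", archive, info.header_offset + 26)
            offsets[info.filename] = (
                info.header_offset + LOCAL_HEADER_SIZE + name_len + extra_len)
    return offsets


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    write_aligned(buf, zf, "version", b"1\n")
    write_aligned(buf, zf, "tensors/0", bytes(100))
    write_aligned(buf, zf, "tensors/1", bytes(7))

archive = buf.getvalue()
offsets = data_offsets(archive)
print(all(off % ALIGNMENT == 0 for off in offsets.values()))
```

Because the padding lives in a standard extra field, a reader that ignores alignment still parses the archive normally, which matches the note that the reader does not depend on this property.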
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14521
Reviewed By: jamesr66a
Differential Revision: D13252945
Pulled By: zdevito
fbshipit-source-id: 01209294c0f6543d0fd716f85a38532249c52f8c
Summary:
Hi guys,
I'd like to build Caffe2 with more supported options on Windows with Microsoft Visual Studio.
This is the first pull request.
Running scripts/build_windows_shared.bat is able to build Caffe2 with both CMAKE_BUILD_TYPE=Debug and CMAKE_BUILD_TYPE=Release with Visual Studio 14 2015.
CUDA is 9.0 and cuDNN is 7.0.5; glog, gflags and lmdb are supported on my system.
Python is 3.5, and Detectron works from the Python interface as well.
It was even possible to debug detectron code and step into caffe2_gpu.dll with pdbs built.
Disappointingly, c10/experimental ops don't build with this Visual Studio generator, so I added a special option INCLUDE_EXPERIMENTAL_C10_OPS (default ON) to deal with it in build_windows_shared.bat.
After this pull request the next step is to add Visual Studio 2017 support in the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13550
Reviewed By: ezyang
Differential Revision: D13042597
Pulled By: orionr
fbshipit-source-id: f313f909f599cd582a1d000eff766eef3a9fc4fc
Summary:
When loading a non-existent / non-openable file, the current error message is
```
Expected to read 8 bytes but got %llu bytes0
```
This
- fixes two ASSERTM formatting calls (including the above),
- throws a more specific error message if the ifstream constructor sets `.fail`.
Here is someone apparently confused by the current message: https://github.com/facebookresearch/maskrcnn-benchmark/pull/138#issuecomment-437848307
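The two fixes can be illustrated with a small Python analogue. This is not the C++ change itself: `read_header`, the little-endian interpretation of the 8 bytes, and the exact message wording are assumptions of the sketch. It reports an open failure separately from a short read, and it formats the byte count into the message instead of leaving a raw format specifier.

```python
import os
import struct
import tempfile

HEADER_SIZE = 8  # the original message mentions an 8-byte read


def read_header(path: str) -> int:
    """Read an 8-byte header, distinguishing 'cannot open' from 'short read'."""
    try:
        f = open(path, "rb")
    except OSError as err:
        # Specific message for the open failure, raised before any read.
        raise RuntimeError(f"could not open '{path}': {err.strerror}") from err
    with f:
        raw = f.read(HEADER_SIZE)
    if len(raw) != HEADER_SIZE:
        # The count is formatted into the message rather than printed as '%llu'.
        raise RuntimeError(
            f"expected to read {HEADER_SIZE} bytes but got {len(raw)} bytes")
    return struct.unpack("<Q", raw)[0]  # little-endian is an assumption


workdir = tempfile.mkdtemp()
missing_path = os.path.join(workdir, "missing.pt")
try:
    read_header(missing_path)
except RuntimeError as err:
    print(err)
```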
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13894
Differential Revision: D13043228
Pulled By: soumith
fbshipit-source-id: b348b482c66d5e420874ae6e101b834106b89e82
Summary:
Added getNextRecord/hasNextRecord methods. Even though the model data is stored at the end, we can still read the file from the beginning.
Added gtests to cover the reader and writer code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12993
Reviewed By: yinghai
Differential Revision: D10860086
Pulled By: houseroad
fbshipit-source-id: 01b1380f8f50f5e853fe48a8136e3176eb3b0c29