Summary:
The goals of this diff are (both sketched after this list):
1) Enable checkpointing to honor batches_per_epoch
2) Resume hive_readers mid-split
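A minimal standalone sketch of both behaviors (plain Python; HiveSplitReader, its fields, and read_epoch are hypothetical names, not the actual Caffe2 implementation):

    class HiveSplitReader:
        """Reads batches from a list of splits and can resume mid-split."""

        def __init__(self, splits, batches_per_epoch):
            self.splits = splits              # e.g. a list of lists of batches
            self.batches_per_epoch = batches_per_epoch
            self.split_idx = 0                # which split we are in
            self.offset = 0                   # position inside that split

        def read_epoch(self):
            """Yield at most batches_per_epoch batches per epoch."""
            for _ in range(self.batches_per_epoch):
                if self.split_idx >= len(self.splits):
                    return                    # out of data
                yield self.splits[self.split_idx][self.offset]
                self.offset += 1
                if self.offset == len(self.splits[self.split_idx]):
                    self.split_idx += 1       # advance to the next split
                    self.offset = 0

        # Checkpoint state includes the mid-split offset, so a restored
        # job resumes exactly where the previous epoch stopped.
        def state(self):
            return {"split_idx": self.split_idx, "offset": self.offset}

        def load_state(self, s):
            self.split_idx, self.offset = s["split_idx"], s["offset"]

Because the checkpoint captures (split_idx, offset) rather than just a split index, a restart does not re-read the partially consumed split.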
Reviewed By: azzolini
Differential Revision: D5004212
fbshipit-source-id: 2ff5df30ba946eefadd109d80056cde67398a080
Summary:
Previously, we didn't propagate the 'out-of-data' signal when splits_per_epoch wasn't specified.
For now it's a hacky fix (we just reuse ReaderWithLimit). azzolini - any suggestions for a more elegant solution? I could create an extra reader that just exports an 'is empty' signal (sketched below).
Overall, I guess we need to turn global_queue into a more sustainable unittest that verifies all possible combinations - I'm still not sure it's correct :-\
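For reference, the 'extra reader' alternative could look roughly like this (plain-Python sketch; EmptySignalReader and data_finished are hypothetical names, and the actual fix reuses ReaderWithLimit instead):

    class EmptySignalReader:
        """Pass-through reader whose only job is to export an 'is empty'
        signal when the wrapped reader runs out of data."""

        def __init__(self, reader):
            self.reader = reader
            self.data_finished = False  # the exported out-of-data signal

        def read(self):
            batch = self.reader.read()  # delegate unchanged
            if batch is None:
                self.data_finished = True  # propagate 'out of data' upstream
            return batch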
Reviewed By: xianjiec
Differential Revision: D4665677
fbshipit-source-id: fe44d10ee82c3383145635e67dea1d9b666e061f
Summary: This makes sure dper_example is compatible with the new way of defining checkpoint epochs. See D4499320.
Reviewed By: xianjiec
Differential Revision: D4511618
fbshipit-source-id: f5188010cdefe3739f87f6049d1ea6aee765c514
Summary: For customers like Ads, Feeds, and MarketPlace, the training data is extremely large, and it is unnecessary and costly to scan all of it just to compute meta information. This diff adds a numSample option to preCompute so that users can control how many samples are used when computing meta information.
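The idea behind numSample, as a rough standalone sketch (the function name, signature, and the particular meta information computed here are illustrative, not the real preCompute code):

    import itertools

    def compute_meta(rows, num_samples=None):
        """Compute meta information (here: per-feature min/max) over at
        most num_samples rows instead of scanning the full dataset."""
        meta = {}
        for row in itertools.islice(rows, num_samples):  # None = all rows
            for feature, value in row.items():
                lo, hi = meta.get(feature, (value, value))
                meta[feature] = (min(lo, value), max(hi, value))
        return meta

    # E.g. cap the scan at the first 10000 rows of a huge stream:
    # meta = compute_meta(stream_of_rows, num_samples=10000)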
Differential Revision: D4492399
fbshipit-source-id: 7199381d226ee6300a959fc5e116d39984d199fc
(1) nccl submodule, cnmem submodule
(2) mpi ops fallback test
(3) a bit more blob interface
(4) fixed tests
(5) caffe2.python.io -> caffe2.python.dataio to avoid name conflicts (see the import example after this list)
(6) In the build system, autogenerate __init__.py instead of having manual rules just to copy over an empty __init__.py.
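For (5), call sites only change the module name; a before/after sketch of a typical import (assuming the module contents are otherwise unchanged):

    # Before the rename (could collide with Python's built-in io module):
    #   from caffe2.python import io
    # After:
    from caffe2.python import dataio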