Summary: For data parallelism we need the batch size to be a multiple of the number of replicas. With this diff we achieve that via Dataset(rec).trim(multiple_of=num_replicas)
Reviewed By: dzhulgakov, harouwu
Differential Revision: D5753861
fbshipit-source-id: c5d728b925707dbd3d1f500a93e67e185c223569
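A minimal sketch of the trim arithmetic in plain Python (not the actual Caffe2 op; the example list and replica count are made up for illustration):

    def trim(examples, multiple_of):
        # Keep the largest prefix whose length is a multiple of
        # multiple_of; the remainder is thrown away.
        keep = (len(examples) // multiple_of) * multiple_of
        return examples[:keep]

    examples = list(range(10))
    num_replicas = 4
    trimmed = trim(examples, multiple_of=num_replicas)
    # 10 examples, 4 replicas -> keep 8, drop 2
    assert len(trimmed) % num_replicas == 0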
Summary: This will throw away a few examples. It is desirable to keep the batch size constant for fully synchronous data parallelism
Reviewed By: dzhulgakov
Differential Revision: D5531788
fbshipit-source-id: e19385401155e731cfc5b25e8e9ea7c16c19d478
Summary: Currently the dataset cursor blob uses a fixed name. When we read from multiple input tables, the dataset cursor of each table uses the same blob. This messed up the split queue and crashed the reader pipelines (see the errors and failures in https://fb.quip.com/uzbIA7K0PgVe)
Reviewed By: dragonxlwang, rayleichen
Differential Revision: D5419863
fbshipit-source-id: 5983a3d8d2e286dc47c2ec38ed1dbbe30c7c9b49
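A minimal sketch of the idea behind the fix, with a hypothetical make_cursor_name helper (not the actual Caffe2 code): derive the cursor blob name from the input table so readers of different tables no longer share one cursor:

    def make_cursor_name(table_name):
        # Hypothetical: one cursor blob per input table instead of a
        # single fixed name shared by every reader.
        return '{}/cursor'.format(table_name)

    cursors = {t: make_cursor_name(t) for t in ['table_a', 'table_b']}
    assert cursors['table_a'] != cursors['table_b']  # no collision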
(1) nccl submodule, cnmem submodule
(2) mpi ops fallback test
(3) a bit more blob interface
(4) fixed tests
(5) caffe2.python.io -> caffe2.python.dataio to avoid name conflicts
(6) In the build system, autogenerate __init__.py instead of having manual
rules just to copy over an empty __init__.py (see the sketch below).
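A minimal sketch of what (6) amounts to, as a hypothetical build-time helper (the function name and directory layout are assumptions, not the actual build rule):

    import os

    def autogen_init_py(root):
        # Hypothetical helper: make every directory under `root` a
        # Python package by writing an empty __init__.py where missing.
        for dirpath, _dirnames, filenames in os.walk(root):
            if '__init__.py' not in filenames:
                open(os.path.join(dirpath, '__init__.py'), 'w').close()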