Summary:
Refactor the data_parallel_model all_reduce and broadcast methods to work on
any given set of parameters, not only gradients, and reuse them in the
distributed BMUF implementation.
Add a distributed (multiprocessing) test for BMUF.
Reviewed By: akyrola
Differential Revision: D5267083
fbshipit-source-id: 8dcc7527d0a755b903d693d8071585f0b54d3403