Summary:
My initial implementation did not really work once nesting was involved. This
diff fixes that by replacing the itertools-based logic with something that is
much easier to reason about.
Reviewed By: idning
Differential Revision: D6933763
fbshipit-source-id: f7a1de996d878a41bac2b2acd9d87a7c4b416778
Summary: Added an initializer which sets up the ParameterInfo object in the opposite format to the pFP16Initializer. This is needed when the op requires the initialized blob to be FP32 but an FP16 copy of the weights is also needed.
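A minimal sketch of the idea (illustrative only: the blob names and fill op
below are assumptions, not the new initializer class itself). The op consumes
an FP32 master blob, while an FP16 shadow copy of the same weights is kept
alongside it:

    from caffe2.python import core, workspace

    init_net = core.Net("init")
    # FP32 master blob that the op reads directly.
    init_net.XavierFill([], "fc_w", shape=[16, 8])
    # FP16 shadow copy of the same weights, kept for low-precision use.
    init_net.FloatToHalf("fc_w", "fc_w_fp16")

    workspace.RunNetOnce(init_net)
    print(workspace.FetchBlob("fc_w").dtype)       # float32
    print(workspace.FetchBlob("fc_w_fp16").dtype)  # float16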
Reviewed By: wesolwsk
Differential Revision: D5840832
fbshipit-source-id: 439e87f41a1dbc58bf63a5c0e7f7fc4cb00b4d65
Summary:
Today, the PS's weirdly store the entire embedding and not just their
subsection of it. This was simply an oversight on the part of the original
author, and this diff fixes it.
1. The sparse params are sharded across the PS's, and each PS stores only its
section of the embedding. The trainer requests the ids as-is from the PS, but
the PS divides each id by num_of_shards before looking it up in the embedding
table blob. This happens on both the forward and the backward pass. During the
model download step, however, the PS multiplies by num_of_shards before
returning the embeddings to the trainer. The upshot is that the trainer knows
nothing about how the embeddings are scaled on the PS; the PS adds the extra
divide and multiply steps to achieve that (see the sketch below).
2. During estimation time, we allocate just one PS. So, in order to make all
of the embeddings fit on that single PS, we additionally scale the hash table
sizes (proportionally and equally for all the sparse params) such that they
fit. This scaling is handled analogously to (1).
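Illustrative sketch of the id arithmetic in (1), in plain Python rather than
the actual PS code; adding the shard id back in the inverse mapping is my
assumption so the round trip works:

    num_shards = 4

    def to_local_row(global_id):
        # The trainer sends global ids as-is; the owning PS divides by
        # num_of_shards to find the row in its local embedding slice.
        return global_id // num_shards

    def to_global_id(local_row, shard_id):
        # On model download the mapping is inverted (multiply back by
        # num_of_shards), so the trainer never sees the scaling.
        return local_row * num_shards + shard_id

    for gid in [0, 5, 6, 11]:
        shard = gid % num_shards
        assert to_global_id(to_local_row(gid), shard) == gid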
Reviewed By: boryiingsu
Differential Revision: D5664093
fbshipit-source-id: 92f501f61566f939c41ce0b614a1b499669f978a
Summary: This diff adds an optimizer to param_info, along with the associated implementations for model_helper and brew to set an optimizer for each individual parameter.
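A rough sketch of what setting a per-parameter optimizer could look like; the
exact create_param keyword and constructor arguments here are assumptions, not
a confirmed API:

    from caffe2.python import model_helper, optimizer
    from caffe2.python.modeling.initializers import Initializer

    model = model_helper.ModelHelper(name="per_param_optim")
    # Attach a dedicated SGD optimizer to this one parameter; other params
    # can keep whatever the model-level optimizer builder assigns.
    model.create_param(
        param_name="fc_w",
        shape=[16, 8],
        initializer=Initializer("XavierFill"),
        optimizer=optimizer.SgdOptimizer(base_learning_rate=0.01),
    )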
Reviewed By: kennyhorror
Differential Revision: D5385432
fbshipit-source-id: 5d682f9d1ab077e04a5d76a24d71470f4e64fc92
Summary:
This diff introduces abstractions for parameter sharing for all parameters
that are created through the new create_param syntax.
Possible use cases of this parameter sharing (see the sketch after this list):
1. Sharing params within the RNN interface.
2. Complicated models that might share some of their branches.
3. TODO (next diff): cross-model parameter sharing.
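A minimal usage sketch, assuming the ParameterSharing context manager from
this stack together with create_param-based registration (the scope names and
the exact resolution rule are assumptions):

    from caffe2.python import model_helper, scope
    from caffe2.python.modeling.initializers import Initializer
    from caffe2.python.modeling.parameter_sharing import ParameterSharing

    model = model_helper.ModelHelper(name="sharing_demo")

    # Ask that params created under 'branch_b' resolve to 'branch_a', so
    # both branches read the same underlying blob.
    with ParameterSharing({'branch_b': 'branch_a'}):
        with scope.NameScope('branch_a'):
            w_a = model.create_param('w', shape=[8],
                                     initializer=Initializer("ConstantFill"))
        with scope.NameScope('branch_b'):
            w_b = model.create_param('w', shape=[8],
                                     initializer=Initializer("ConstantFill"))

    # Both references should point at the same parameter blob.
    print(w_a, w_b)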
Reviewed By: salexspb
Differential Revision: D5160935
fbshipit-source-id: c6d40a5ed7ead240cd7db0eb69de6dc5f505b05a
Summary:
This diff creates a new type of Initializer - ExternalInitializer. This
initializer is meant to be used in cases where the parameter blob is already
expected to exist in the workspace.
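A short sketch of the intended use (the blob name and shape are made up for
illustration): the blob already exists in the workspace, so the initializer
only registers it as a parameter instead of emitting a fill op into the init
net:

    import numpy as np
    from caffe2.python import model_helper, workspace
    from caffe2.python.modeling.initializers import ExternalInitializer

    # The blob is already present, e.g. loaded from a checkpoint.
    workspace.FeedBlob("pretrained_w",
                       np.random.randn(16, 8).astype(np.float32))

    model = model_helper.ModelHelper(name="external_init_demo")
    model.create_param(
        param_name="pretrained_w",
        shape=[16, 8],
        initializer=ExternalInitializer(),
    )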
Reviewed By: dzhulgakov
Differential Revision: D5171322
fbshipit-source-id: d27861f0f80afdea93c235d49f63da19adccc92c
Summary:
This diff is the first step in the effort to refactor all parameters. As a
first step, I'm merging the concepts of params and computed_params, which will
be distinguished by tags instead (in this first version it still uses the old
data structs to store all the BlobReferences).
Renaming computed_params to non-trainable/non-backprop params should be done
in some other diff.
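A small sketch of the tag-based split, assuming ParameterTags stays the public
surface for this (the specific COMPUTED_PARAM tag and blob names below are my
reading of the diff, not confirmed):

    from caffe2.python import model_helper
    from caffe2.python.modeling.initializers import Initializer
    from caffe2.python.modeling.parameter_info import ParameterTags

    model = model_helper.ModelHelper(name="tags_demo")

    # Trainable parameter: participates in backprop.
    model.create_param('fc_w', shape=[16, 8],
                       initializer=Initializer("XavierFill"))

    # Computed (non-backprop) parameter, e.g. a batch-norm running mean,
    # distinguished by a tag rather than a separate computed_params list.
    model.create_param('bn_rm', shape=[8],
                       initializer=Initializer("ConstantFill"),
                       tags=ParameterTags.COMPUTED_PARAM)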
Reviewed By: salexspb
Differential Revision: D5171159
fbshipit-source-id: 68031ca779f053fb266a7c4a2e5b482a3bd9c832
Summary:
This diff is the first step in the effort to refactor all parameters. As a
first step, I'm merging the concepts of params and computed_params, which will
be distinguished by tags instead (in this first version it still uses the old
data structs to store all the BlobReferences).
Renaming computed_params to non-trainable/non-backprop params should be done
in some other diff.
Reviewed By: salexspb
Differential Revision: D5119830
fbshipit-source-id: 2001090a37346eb12abbb234e13e727c288eb8a7
Summary:
Adds support for generating and training pfp16 models. Also adds an SGD optimizer for multi-precision trainers and a new callback to data_parallel_model that helps multi-precision models keep their different copies of parameters in sync during training.
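This is not the actual callback API added here, just a sketch of the sync
idea: after the FP32 master weight is updated, the FP16 copy used by the
forward pass is re-derived from it (the blob names and the plain WeightedSum
update are illustrative):

    import numpy as np
    from caffe2.python import core, workspace

    # FP32 master weight, its gradient, and scalars for a plain SGD step.
    workspace.FeedBlob("fc_w", np.random.randn(4, 4).astype(np.float32))
    workspace.FeedBlob("fc_w_grad", np.random.randn(4, 4).astype(np.float32))
    workspace.FeedBlob("ONE", np.array([1.0], dtype=np.float32))
    workspace.FeedBlob("NEG_LR", np.array([-0.01], dtype=np.float32))

    train_net = core.Net("train")
    # SGD update on the FP32 master: w = 1 * w + (-lr) * grad.
    train_net.WeightedSum(["fc_w", "ONE", "fc_w_grad", "NEG_LR"], "fc_w")
    # Keep the FP16 copy in sync with the freshly updated master.
    train_net.FloatToHalf("fc_w", "fc_w_fp16")

    workspace.RunNetOnce(train_net)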
Closes https://github.com/caffe2/caffe2/pull/697
Differential Revision: D5159712
Pulled By: salexspb
fbshipit-source-id: 60a889494d2e2f4df1d720331e19f638c5eb95cc
Summary:
This is going to unblock Nvidia in their work on adding fp16 support to
Caffe2. I discussed this with kennyhorror beforehand to make sure it fits into
his work on parameter sharing.
Reviewed By: kennyhorror
Differential Revision: D5127797
fbshipit-source-id: 4db155d320b1862570c23b77c4252bdacbf2296f