Summary:
There is a module called `2to3` which you can target for future specifically to remove these, the directory of `caffe2` has the most redundant imports:
```2to3 -f future -w caffe2```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033
Reviewed By: seemethere
Differential Revision: D23808648
Pulled By: bugra
fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31707
Change the initialization value for FC weight init and sparse embedding lookup init.
Previous default initialization is uniform(-\sqrt(1/input_dim), \sqrt(1/input_dim)); Now pass into a flexible hyperparameter, say \alpha into it, to change into uniform(-\sqrt(\alpha/input_dim), \sqrt(\alpha/input_dim));
Reviewed By: chonglinsun
Differential Revision: D18825615
fbshipit-source-id: 4c5f2e07f2b3f5d642fd96d64dbf68892ebeb30b
Summary:
To achive this, I modified the blob name scheme defined in a layer.
Before it was scope/fc_w and scope/fc_w_auto_0 (if there is another fc
within the same scope).
Now I change it to scope/fc/w and scope/fc_auto_0/w.
That is, we rely on the uniqueness of the scoped layer name to define
names for blobs.
I also overwrote the create_param method in LayerModelHelper to let it
use the resolved name for blobs given the sharingparameter context.
There are some details such as making the initializer more structured
that I need to finalize.
Reviewed By: kennyhorror
Differential Revision: D5435132
fbshipit-source-id: a0525f5ea0977e255dd5ea765b38913f5951d455
Summary:
`SamplingTrain` layer is a wrapper around another layer subclassing `SamplingTrainableMixin`. When initiated in the training context, `SamplingTrain` produces sparse output of the wrapped layer. Output can be paired with `indices` to create Map schema. When initiated in prediction context, the full output of the wrap layer is produced.
This is liked the SampledFC function in model helper, https://fburl.com/gi9g1awh, with the ability to initiated in both trainig and prediction context.
I'd like to get consensus whether we should introduce the `SamplingTrain` layer and the accompaying mixin. This can probably be accomplished in some other way, but I think this is not too bad.
Reviewed By: xianjiec
Differential Revision: D4689887
fbshipit-source-id: 7be8a52d82f3a09a053378146262df1047ab26a8
Summary: For some embedding task, we don't want to include bias term in embedding computation.
Reviewed By: xianjiec
Differential Revision: D4689620
fbshipit-source-id: 4168584681d30c0eaa1d17ceaf68edda11924644