Summary: Per discussion with Fei Tian, we need to add a `scale_init_value` option to scale down the output of normalization layers such as batch-norm and layer-norm. Currently we have `sparse_normalization_options` to normalize the embedding-pooling output. By default scale = 1.0; we found it works better to set the scale between 0.025 and 0.1 (https://fb.quip.com/MiKUAibEaYhH). In addition, the tags are removed from the normalizers because it makes more sense to run the norm ops in the distributed trainers, not on the parameter servers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/31983

Test Plan:
Testing LN and BN after sum-pooling -- baseline: f160348514, LN: f160348609, BN: f160348710 {F226106518}
Layer norm after sum-pooling fwd_net: https://fburl.com/sa4j207n
Layer norm after dot-prod fwd_net: https://fburl.com/twggwyvb

## Unit Tests

Testing normalization after pooling:
```
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_sparse_pooling_batch_normalization
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_dense_sparse_pooling_batch_normalization
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_sparse_pooling_layer_normalization
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_dense_sparse_pooling_layer_normalization
```

Testing normalization after dot-prod:
```
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_last_layer_use_batch_norm
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_last_layer_use_layer_norm
```

Differential Revision: D19277618

Pulled By: SilunWang

fbshipit-source-id: ea323e33e3647ba55d2e808ef09d94ad7b45b934
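To make the new knob concrete, here is a minimal sketch of constructing the updated normalizers with a scaled-down gamma init. The `build_normalizer` helper and the layout of the options dict are illustrative assumptions; only the `BatchNormalizer` / `LayerNormalizer` signatures come from the file below.

```
# Illustrative only: picking a scale_init_value in the suggested 0.025-0.1 range.
# build_normalizer and the options dict layout are hypothetical; the normalizer
# constructor signatures match caffe2/python/normalizer.py below.
from caffe2.python.normalizer import BatchNormalizer, LayerNormalizer


def build_normalizer(options):
    # Default scale is 1.0; values between 0.025 and 0.1 were found to work better.
    scale = options.get("scale_init_value", 1.0)
    if options.get("norm_type") == "batch_norm":
        return BatchNormalizer(momentum=options.get("momentum", 0.9), scale_init_value=scale)
    return LayerNormalizer(epsilon=options.get("epsilon", 1e-4), scale_init_value=scale)


ln = build_normalizer({"norm_type": "layer_norm", "epsilon": 1e-4, "scale_init_value": 0.05})
```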
45 lines
1.5 KiB
Python
# @package normalizer
# Module caffe2.python.normalizer
from __future__ import absolute_import, division, print_function, unicode_literals

class Normalizer(object):
    """
    Adds normalization to train_net for a given parameter. Its factor ahead of
    regularization is given at initialization.

    The param should be a BlobReference.
    """

    def __init__(self):
        pass

    def __call__(self, net, param):
        return self._run(net, param)

    def _run(self, net, param):
        raise NotImplementedError("Normalizer subclasses must implement _run")

class BatchNormalizer(Normalizer):
    def __init__(self, momentum, scale_init_value=1.0):
        super(BatchNormalizer, self).__init__()
        self._momentum = float(momentum)
        self._scale_init_value = float(scale_init_value)

    def _run(self, layer_model, param):
        return layer_model.BatchNormalization(
            param, momentum=self._momentum, scale_init_value=self._scale_init_value
        )

class LayerNormalizer(Normalizer):
    def __init__(self, epsilon, use_layer_norm_op=True, scale_init_value=1.0):
        super(LayerNormalizer, self).__init__()
        self._epsilon = float(epsilon)
        self._use_layer_norm_op = use_layer_norm_op
        self._scale_init_value = float(scale_init_value)

    def _run(self, layer_model, param):
        return layer_model.LayerNormalization(
            param,
            epsilon=self._epsilon,
            use_layer_norm_op=self._use_layer_norm_op,
            scale_init_value=self._scale_init_value,
        )
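For reference, a small sketch of how these classes are driven: a normalizer instance is called with a layer-model object and a parameter blob, and `__call__` forwards to `_run`, which invokes the model's normalization method. `StubLayerModel` below is a hypothetical stand-in for the real dper layer-model helper that exposes `BatchNormalization` / `LayerNormalization`; it exists only to show the call flow.

```
# Sketch of the call flow; StubLayerModel is a hypothetical stand-in, not real API.
from caffe2.python.normalizer import BatchNormalizer, LayerNormalizer


class StubLayerModel(object):
    def BatchNormalization(self, param, momentum, scale_init_value):
        return "BN(%s, momentum=%s, scale_init=%s)" % (param, momentum, scale_init_value)

    def LayerNormalization(self, param, epsilon, use_layer_norm_op, scale_init_value):
        return "LN(%s, eps=%s, scale_init=%s)" % (param, epsilon, scale_init_value)


model = StubLayerModel()
# __call__ delegates to _run, which calls the model's normalization method.
print(BatchNormalizer(momentum=0.9, scale_init_value=0.05)(model, "pooled_emb"))
print(LayerNormalizer(epsilon=1e-4, scale_init_value=0.05)(model, "pooled_emb"))
```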