pytorch/caffe2/python/normalizer.py
Silun Wang 28c1258f18 Scale init for batch-norm and layer-norm (#31983)
Summary:
Per discussion with Fei Tian, we need to add a `scale_init_value` to scale down the output of normalization such as batch-norm and layer-norm.

Currently we have `sparse_normalization_options` to normalize embedding pooling output. The default scale is 1.0, but we found it works better to set the scale between 0.025 and 0.1: https://fb.quip.com/MiKUAibEaYhH
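
To make the effect of the scale concrete, here is a minimal pure-Python sketch of layer normalization followed by a scale and bias. It assumes `scale_init_value` is the initial value of the learnable scale parameter; the helper `layer_norm` and the sample values are hypothetical and do not come from the Caffe2 code.

```python
import math


def layer_norm(xs, scale=1.0, bias=0.0, epsilon=1e-4):
    """Normalize xs to zero mean and unit variance, then apply scale and bias.

    Hypothetical helper for illustration; `scale` plays the role that
    scale_init_value is assumed to play at initialization time.
    """
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    inv_std = 1.0 / math.sqrt(var + epsilon)
    return [scale * (x - mean) * inv_std + bias for x in xs]


# Hypothetical sum-pooled embedding values.
pooled = [4.0, -2.0, 10.0, 0.0]

default_out = layer_norm(pooled, scale=1.0)   # default scale
scaled_out = layer_norm(pooled, scale=0.05)   # within the 0.025 to 0.1 range

# With a zero bias, the scaled output is the default output times the scale.
print(max(abs(v) for v in default_out))
print(max(abs(v) for v in scaled_out))
```

Under that assumption the scale only sets the starting magnitude of the normalized output; training is still free to move it afterwards.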

In addition, I am removing the tags from the normalizers, because it makes more sense to compute the norm ops in the distributed trainers rather than on the parameter servers (ps).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31983

Test Plan:
Testing LN and BN after sum-pooling --
baseline f160348514
LN: f160348609
BN: f160348710

{F226106518}

Layer norm after sum-pooling fwd_net https://fburl.com/sa4j207n
Layer norm after dot-prod fwd_net https://fburl.com/twggwyvb

## Unit Tests
Testing normalization after pooling
```
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_sparse_pooling_batch_normalization
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_dense_sparse_pooling_batch_normalization
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_sparse_pooling_layer_normalization
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_dense_sparse_pooling_layer_normalization
```

Testing normalization after dot-prod
```
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_last_layer_use_batch_norm
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_last_layer_use_layer_norm
```

Differential Revision: D19277618

Pulled By: SilunWang

fbshipit-source-id: ea323e33e3647ba55d2e808ef09d94ad7b45b934
2020-01-10 11:55:56 -08:00


# @package optimizer
# Module caffe2.python.normalizer
from __future__ import absolute_import, division, print_function, unicode_literals


class Normalizer(object):
    """
    Adds normalization to train_net for the given parameter. Its factor ahead of
    regularization is given at initialization.
    The param should be a BlobReference.
    """

    def __init__(self):
        pass

    def __call__(self, net, param):
        return self._run(net, param)

    def _run(self, net, param):
        # Subclasses override this to add their normalization op.
        raise Exception("Not Implemented")


class BatchNormalizer(Normalizer):
    def __init__(self, momentum, scale_init_value=1.0):
        super(BatchNormalizer, self).__init__()
        self._momentum = float(momentum)
        self._scale_init_value = float(scale_init_value)

    def _run(self, layer_model, param):
        return layer_model.BatchNormalization(
            param, momentum=self._momentum, scale_init_value=self._scale_init_value
        )


class LayerNormalizer(Normalizer):
    def __init__(self, epsilon, use_layer_norm_op=True, scale_init_value=1.0):
        super(LayerNormalizer, self).__init__()
        self._epsilon = float(epsilon)
        self._use_layer_norm_op = use_layer_norm_op
        self._scale_init_value = float(scale_init_value)

    def _run(self, layer_model, param):
        return layer_model.LayerNormalization(
            param,
            epsilon=self._epsilon,
            use_layer_norm_op=self._use_layer_norm_op,
            scale_init_value=self._scale_init_value,
        )
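
The following sketch shows how a normalizer forwards its settings when it is called. So that it runs without Caffe2 installed, it re-declares trimmed copies of `Normalizer` and `LayerNormalizer` and uses a hypothetical stub, `RecordingLayerModel`, in place of a real layer model. In real use the param would be a BlobReference rather than a string.

```python
# Trimmed copies of the classes above, so the sketch is self-contained.
class Normalizer(object):
    def __call__(self, net, param):
        return self._run(net, param)

    def _run(self, net, param):
        raise Exception("Not Implemented")


class LayerNormalizer(Normalizer):
    def __init__(self, epsilon, use_layer_norm_op=True, scale_init_value=1.0):
        self._epsilon = float(epsilon)
        self._use_layer_norm_op = use_layer_norm_op
        self._scale_init_value = float(scale_init_value)

    def _run(self, layer_model, param):
        return layer_model.LayerNormalization(
            param,
            epsilon=self._epsilon,
            use_layer_norm_op=self._use_layer_norm_op,
            scale_init_value=self._scale_init_value,
        )


class RecordingLayerModel(object):
    """Hypothetical stub: records what LayerNormalization is called with."""

    def __init__(self):
        self.calls = []

    def LayerNormalization(self, param, **kwargs):
        self.calls.append((param, kwargs))
        # Stand-in for the normalized output a real layer model would return.
        return "normalized/" + param


model = RecordingLayerModel()
normalizer = LayerNormalizer(epsilon=1e-4, scale_init_value=0.05)

# Calling the normalizer dispatches through __call__ to _run.
output = normalizer(model, "pooled_embedding")
print(output)
print(model.calls[0][1])
```

`BatchNormalizer` follows the same pattern, passing `momentum` and `scale_init_value` to `layer_model.BatchNormalization`.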