Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion

A Meterez, A Joudaki, F Orabona, A Immer, G Rätsch, H Daneshmand
ICLR, 2024
Overview

Abstract

Normalization layers are one of the key building blocks of deep neural networks. Several theoretical studies have shown that batch normalization improves signal propagation by preventing the representations from becoming collinear across layers. However, results from the mean-field theory of batch normalization also conclude that this benefit comes at the expense of exploding gradients in depth.
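
The tension the abstract describes, batch normalization keeping representations well-separated while amplifying backpropagated gradients as depth grows, can be probed with a small experiment. Below is a minimal sketch, not the paper's code: it assumes a PyTorch stack of Linear + BatchNorm1d blocks at random initialization, backpropagates a random upstream gradient, and reports how much its norm is amplified at increasing depths. All sizes and names are illustrative choices.

```python
# Minimal sketch (assumption: PyTorch; not the paper's implementation).
# Measures how a random upstream gradient is amplified when backpropagated
# through a randomly initialized stack of Linear + BatchNorm1d blocks.
import torch
import torch.nn as nn

torch.manual_seed(0)
width, batch_size = 128, 64  # illustrative sizes

def gradient_amplification(depth: int) -> float:
    """Ratio ||dL/dx|| / ||dL/dout|| for a random upstream gradient dL/dout."""
    blocks = []
    for _ in range(depth):
        blocks += [nn.Linear(width, width, bias=False), nn.BatchNorm1d(width)]
    net = nn.Sequential(*blocks)  # train mode by default: BN uses batch statistics

    x = torch.randn(batch_size, width, requires_grad=True)
    out = net(x)
    g = torch.randn_like(out)     # random upstream gradient
    out.backward(g)
    return (x.grad.norm() / g.norm()).item()

for depth in [1, 2, 4, 8, 16, 32]:
    print(f"depth={depth:3d}  amplification={gradient_amplification(depth):.3e}")
```

Under these assumptions, the printed amplification factor tends to grow with depth, which is the gradient-explosion effect the mean-field analyses predict for batch-normalized networks at initialization.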

Links