Characterizing Signal Propagation to Design Performant ResNets Without Normalization
Batch Normalization is a key component in virtually every state-of-the-art image classifier, but it also introduces a number of practical challenges: it breaks the independence between training examples in the same batch, it can be surprisingly expensive to compute, and it often introduces unexpected bugs. Building on recent theoretical analyses of deep ResNets at initialization, we develop a set of analysis tools that characterize signal propagation on the forward pass, and we leverage these tools to design a simple class of ResNets without normalization layers. Across a range of compute budgets, our networks attain performance competitive with the state-of-the-art EfficientNet family on ImageNet. Crucial to our success is an adapted version of the recently proposed Weight Standardization. Our analysis tools show how this technique prevents signal collapse in ReLU networks by ensuring that per-channel mean activations do not grow with depth.
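To make the final claim concrete, the following is a minimal NumPy sketch of a weight-standardization step of the kind the abstract describes. The function name, the `gain` parameter (standing in for the nonlinearity-dependent constant), and the exact scaling convention are assumptions for illustration, not the paper's definitive formulation; the key property it demonstrates is that each output channel's weights are shifted to zero mean, so pre-activation means cannot accumulate with depth.

```python
import numpy as np

def scaled_weight_standardization(W, gain=1.0, eps=1e-5):
    """Hypothetical sketch of a scaled weight-standardization transform.

    W: weight matrix of shape (fan_out, fan_in). Each output channel
    (row) is shifted to zero mean and rescaled by 1/sqrt(var * fan_in),
    so that at initialization the pre-activations have zero mean and
    controlled variance. `gain` stands in for a nonlinearity-dependent
    constant (assumed here; the paper derives its exact value).
    """
    fan_in = W.shape[1]
    mean = W.mean(axis=1, keepdims=True)  # per-output-channel mean
    var = W.var(axis=1, keepdims=True)    # per-output-channel variance
    W_hat = (W - mean) / np.sqrt(var * fan_in + eps)
    return gain * W_hat

# Usage: the standardized weights have zero mean per output channel,
# so mean activations do not grow as layers are stacked.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))
W_hat = scaled_weight_standardization(W)
```

Because every row of `W_hat` has zero mean, a constant input offset is annihilated by each layer, which is the mechanism by which signal collapse from growing mean activations is prevented.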