On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models

Abstract

Recent work has shown that it is possible to train deep neural networks that are verifiably robust to norm-bounded adversarial perturbations. Most of these methods are based on minimizing an upper bound on the worst-case loss over all possible adversarial perturbations. While these techniques show promise, they remain hard to scale to larger networks. Through a comprehensive analysis, we show how a careful implementation of a simple bounding technique, interval bound propagation (IBP), can be exploited to train verifiably robust neural networks that beat the state-of-the-art in verified accuracy. While the upper bound computed by IBP can be quite weak for general networks, we demonstrate that an appropriate loss and choice of hyper-parameters allows the network to adapt such that the IBP bound is tight. This results in a fast and stable learning algorithm that outperforms more sophisticated methods and achieves state-of-the-art results on MNIST, CIFAR-10 and SVHN. It also allows us to obtain the first verifiably robust model on a downscaled version of ImageNet.

Publications