A Closer Look at the Adversarial Robustness of Information Bottleneck Models


We study the adversarial robustness of information bottleneck models for classification. Our evaluation under a diverse range of white-box $l_{\infty}$ attacks suggests that information bottlenecks alone do not constitute a strong defense. Previous results showing improved robustness of models trained with information bottleneck objectives, in comparison to adversarially trained models, were likely influenced by gradient obfuscation.