Self-supervised learning with kernel dependence maximization
Self-supervised image representation learning has greatly narrowed the gap with its supervised counterpart, and even outperforms it on some downstream tasks, thanks to contrastive-style losses and carefully chosen data augmentations. We study self-supervised learning from a statistical dependence perspective and propose SSL-HSIC Bottleneck, a new loss function that maximizes the dependence between the features and the self-supervised "labels" using a kernel dependence measure, the Hilbert-Schmidt Independence Criterion (HSIC). We establish connections between SSL-HSIC Bottleneck and both mutual information and clustering, which sheds new light on how contrastive learning works. Our method is theoretically grounded, does not require computing variational bounds, and does not need a target network. Using Random Fourier Features, our algorithm scales linearly with the batch size. Finally, SSL-HSIC Bottleneck achieves performance on par with the current state of the art on downstream tasks.
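The kernel dependence measure referenced in the abstract is the Hilbert-Schmidt Independence Criterion (HSIC). The abstract does not spell out its estimator, but a minimal NumPy sketch of the standard biased empirical HSIC estimator, using Gaussian (RBF) kernels as an illustrative (assumed) kernel choice, shows what "maximizing dependence between features and labels" computes on a batch:

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian (RBF) kernel matrix from pairwise squared distances."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic_biased(X, Y, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2.

    K, L are kernel matrices on the two samples; H = I - (1/n) 11^T
    centers the kernels in feature space. Larger values indicate
    stronger statistical dependence between X and Y.
    """
    n = X.shape[0]
    K = rbf_kernel(X, sigma)
    L = rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Dependent pairs score higher than independent ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y_dependent = X + 0.1 * rng.normal(size=(200, 2))
Y_independent = rng.normal(size=(200, 2))
print(hsic_biased(X, Y_dependent) > hsic_biased(X, Y_independent))
```

Note that this naive estimator builds two n-by-n kernel matrices, so it costs O(n^2) in the batch size n; the linear scaling claimed in the abstract comes from replacing the exact kernels with Random Fourier Feature approximations, which is not shown here.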