Mutual Information Constraints for Monte-Carlo Objectives


The best density models trained as variational autoencoders have a tendency to model the data without relying on their latent variables, rendering these variables useless. Two contributing factors, the underspecification of the model and the looseness of the variational lower bound, have been studied separately in the literature. We weave these two strands of research together, specifically the tighter bounds of Monte-Carlo objectives and constraints on the mutual information between the data and the latent variables. Estimating the mutual information as the average Kullback-Leibler divergence between the easily available variational posterior $q(z|x)$ and the prior does not work with Monte-Carlo objectives, because $q(z|x)$ is no longer a direct approximation to the model's true posterior $p(z|x)$. Hence, we construct estimators of the Kullback-Leibler divergence of the true posterior from the prior by repurposing the samples used in the objective. This allows us to train models with continuous and discrete latents at much improved rate-distortion and without posterior collapse. Our experiments indicate a severe tradeoff between modelling the data and using the latents, emphasizing the need to evaluate inference methods with this tradeoff in mind.
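To make the idea of repurposing the objective's samples concrete, here is a minimal sketch under stated assumptions. It is not necessarily the estimator the paper proposes. Assume $K$ samples $z_k \sim q(z|x)$ and their log importance weights $\log w_k = \log p(x|z_k) + \log p(z_k) - \log q(z_k|x)$, as an importance-weighted (IWAE-style) Monte-Carlo objective would compute. One simple estimator then uses the identity $\mathrm{KL}(p(z|x)\,\|\,p(z)) = \mathbb{E}_{p(z|x)}[\log p(x|z)] - \log p(x)$. The expectation is approximated with self-normalized weights, and $\log p(x)$ with the log of the mean weight. The function names, the toy Gaussian model (prior $\mathcal{N}(0,1)$, likelihood $\mathcal{N}(z,\sigma^2)$) and the deliberately mismatched proposal are hypothetical choices for illustration.

```python
import numpy as np


def log_normal(v, mean, var):
    """Log density of a univariate Gaussian N(mean, var) evaluated at v."""
    return -0.5 * (np.log(2 * np.pi * var) + (v - mean) ** 2 / var)


def kl_true_posterior_to_prior(log_lik, log_prior, log_q):
    """Self-normalized importance-sampling estimate of KL(p(z|x) || p(z)).

    Hypothetical helper. All inputs have shape (K,) and are evaluated at
    samples z_k ~ q(z|x):
      log_lik   = log p(x | z_k)
      log_prior = log p(z_k)
      log_q     = log q(z_k | x)
    Uses KL(p(z|x) || p(z)) = E_{p(z|x)}[log p(x|z)] - log p(x).
    The estimate is consistent in K but biased for finite K.
    """
    log_w = log_lik + log_prior - log_q       # log importance weights
    m = log_w.max()
    w = np.exp(log_w - m)                      # weights rescaled for stability
    log_px = m + np.log(w.mean())              # IWAE-style estimate of log p(x)
    w_norm = w / w.sum()                       # self-normalized weights
    return np.sum(w_norm * log_lik) - log_px


def kl_gauss_to_std_normal(mean, var):
    """Closed-form KL(N(mean, var) || N(0, 1))."""
    return 0.5 * (var + mean ** 2 - 1.0 - np.log(var))


# Toy model (assumption): z ~ N(0, 1), x | z ~ N(z, sigma2), one observed x.
rng = np.random.default_rng(0)
x, sigma2 = 1.5, 0.5
post_mean, post_var = x / (1 + sigma2), sigma2 / (1 + sigma2)
exact_kl = kl_gauss_to_std_normal(post_mean, post_var)

# Deliberately mismatched proposal q(z|x) = N(0.8, 0.5).
q_mean, q_var, K = 0.8, 0.5, 200_000
z = rng.normal(q_mean, np.sqrt(q_var), size=K)
estimate = kl_true_posterior_to_prior(
    log_lik=log_normal(x, z, sigma2),
    log_prior=log_normal(z, 0.0, 1.0),
    log_q=log_normal(z, q_mean, q_var),
)

# Naive estimate: KL(q(z|x) || p(z)), which ignores the mismatch with p(z|x).
naive_kl = kl_gauss_to_std_normal(q_mean, q_var)

print(f"exact {exact_kl:.4f}  sampled {estimate:.4f}  naive {naive_kl:.4f}")
```

In this toy setting the sampled estimate tracks the exact divergence. The naive $\mathrm{KL}(q(z|x)\,\|\,p(z))$ does not, because $q$ differs from the true posterior. Averaging such per-datapoint estimates over the data would give a mutual-information estimate. The self-normalization trades a small finite-$K$ bias for the ability to reuse exactly the samples an importance-weighted objective already draws.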