NeRF-VAE: A Geometry-Aware 3D Scene Generative Model


We propose NeRF-VAE, a 3D scene generative model that incorporates geometric structure via NeRF and differentiable volume rendering. In contrast to NeRF, our model is able to infer scene structure from a few input views, without the need to retrain, using amortized inference. NeRF-VAE is further able to handle uncertainty, as opposed to NeRF and recently proposed deterministic variants. NeRF-VAE's explicit 3D rendering process also contrasts with previous convolutional scene generative models, whose rendering process lacks geometric structure. NeRF-VAE is a variational autoencoder that learns a distribution over radiance fields, where a particular radiance field is conditioned on a latent scene representation. We show that, once trained, NeRF-VAE is able to infer and render geometrically consistent scenes from previously unseen 3D environments using very few input images. We further demonstrate that NeRF-VAE generalizes well to out-of-distribution cameras, while convolutional generative models do not. Finally, we study different methods of conditioning NeRF-VAE's decoder on the latent representation and introduce a novel attention-based mechanism.
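To make the core idea concrete, the following is a minimal sketch, not the paper's implementation, of a radiance field conditioned on a latent scene code and rendered by alpha compositing along a ray. All names (`mlp_radiance_field`, `render_ray`, the parameter shapes) are hypothetical; the conditioning shown here is plain concatenation, whereas the paper studies several conditioning mechanisms including an attention-based one.

```python
import numpy as np

def mlp_radiance_field(points, z, params):
    """Toy latent-conditioned radiance field: maps 3D points plus a latent
    scene code z to (rgb, density). Hypothetical stand-in for the decoder."""
    # Condition by concatenating each point with the latent scene code.
    n = points.shape[0]
    h = np.concatenate([points, np.broadcast_to(z, (n, z.shape[0]))], axis=-1)
    h = np.tanh(h @ params["w1"] + params["b1"])
    out = h @ params["w2"] + params["b2"]
    rgb = 1.0 / (1.0 + np.exp(-out[:, :3]))   # colors squashed to [0, 1]
    sigma = np.log1p(np.exp(out[:, 3]))       # softplus keeps density >= 0
    return rgb, sigma

def render_ray(origin, direction, z, params, near=0.0, far=1.0, n_samples=32):
    """Differentiable-style volume rendering of one ray: sample points,
    query the conditioned field, and alpha-composite front to back."""
    t = np.linspace(near, far, n_samples)
    pts = origin[None, :] + t[:, None] * direction[None, :]
    rgb, sigma = mlp_radiance_field(pts, z, params)
    delta = np.diff(t, append=far + (far - near) / n_samples)
    alpha = 1.0 - np.exp(-sigma * delta)               # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans                            # compositing weights
    return (weights[:, None] * rgb).sum(axis=0)        # composited RGB
```

In the full model, the latent code `z` would be produced by an amortized encoder from a few input views, and the renderer above would form the likelihood term of the VAE; this sketch only illustrates the decoder-side geometry of that pipeline.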