Datasets for Multi-object Representation Learning
We've released several datasets for multi-object representation learning, which were used to develop scene decomposition methods such as MONet and IODINE. Each dataset consists of multi-object scenes, and every image comes with ground-truth segmentation masks for all objects in the scene. For every dataset except Objects Room, we also provide per-object generative factors to facilitate representation learning. These factors cover the features (size, color, position, etc.) that are necessary and sufficient to describe and render the objects in a scene. Finally, we provide a metric for comparing inferred object segmentations against the ground-truth segmentation masks.
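To make the idea of comparing an inferred segmentation with a ground-truth one concrete, here is a minimal sketch. This section does not name the metric, so the sketch assumes an adjusted Rand index (ARI), a common choice for scoring segmentations because it ignores how object ids are numbered. The function name `adjusted_rand_index` and the flat per-pixel id lists are illustrative assumptions, not the released implementation or its data format.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_ids, pred_ids):
    """Adjusted Rand index between two per-pixel object-id labelings.

    Assumption: both inputs are flat sequences with one integer object id
    per pixel. This is an illustrative sketch, not the released metric.
    """
    if len(true_ids) != len(pred_ids):
        raise ValueError("labelings must cover the same pixels")

    # Number of unordered pixel pairs in the image.
    total_pairs = comb(len(true_ids), 2)
    if total_pairs == 0:
        return 1.0  # Fewer than two pixels: trivially identical.

    # Contingency counts: pixels sharing a (true id, predicted id) pair.
    joint = Counter(zip(true_ids, pred_ids))
    true_sizes = Counter(true_ids)
    pred_sizes = Counter(pred_ids)

    # Pixel pairs grouped together in both, in truth only, in prediction only.
    index = sum(comb(c, 2) for c in joint.values())
    true_pairs = sum(comb(c, 2) for c in true_sizes.values())
    pred_pairs = sum(comb(c, 2) for c in pred_sizes.values())

    expected = true_pairs * pred_pairs / total_pairs
    max_index = 0.5 * (true_pairs + pred_pairs)
    if max_index == expected:
        return 1.0  # Degenerate case, e.g. a single object in both labelings.
    return (index - expected) / (max_index - expected)


# The same grouping with swapped ids counts as a perfect match.
score = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

Because the score depends only on which pixels are grouped together, a model does not need to assign objects the same ids as the ground truth. Whether background pixels are included in the comparison is a separate design decision that this sketch leaves to the caller.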