A large-scale, high-quality dataset of URLs linking to approximately 300,000 video clips covering 400 human action classes, including human-object interactions such as playing instruments and human-human interactions such as shaking hands and hugging. Each action class has at least 400 video clips, and each clip is human-annotated with a single action class and lasts around 10 seconds.
AQuA-RAT (Algebra Question Answering with Rationales)
A large-scale dataset consisting of approximately 100,000 algebraic word problems. The solution to each question is explained step by step in natural language. The data is used to train a program-generation model that learns to generate the natural-language explanation alongside the program that solves the question.
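For a concrete picture of what a problem-plus-rationale record looks like, here is a minimal sketch of reading one record; the field names (`question`, `options`, `rationale`, `correct`) follow the public AQuA release and the example problem itself is invented, so treat both as assumptions rather than part of this description:

```python
import json

# One illustrative record in the style of the public AQuA-RAT JSON-lines
# release (field names assumed from that release; the problem is made up).
record = json.loads("""
{"question": "If x + 3 = 7, what is x?",
 "options": ["A)2", "B)3", "C)4", "D)5", "E)6"],
 "rationale": "Subtract 3 from both sides: x = 7 - 3 = 4. Answer: C",
 "correct": "C"}
""")

# The rationale is the step-by-step natural-language explanation that a
# program-generation model would learn to produce alongside its program.
print(record["correct"])    # labelled answer option
print(record["rationale"])  # step-by-step solution text
```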
dSprites - Disentanglement testing Sprites dataset
This dataset consists of 737,280 images of 2D shapes, procedurally generated from five independent ground-truth latent factors controlling the shape, scale, rotation, and x and y position of a sprite. It can be used to assess the disentanglement properties of unsupervised learning methods.
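As a sanity check, the image count is exactly one image per combination of latent values. The factor sizes below are taken from the public dSprites release (an assumption beyond this description); colour has a single value, so only the remaining five factors vary:

```python
from math import prod

# Latent factor sizes from the public dSprites release (assumption:
# color=1, shape=3, scale=6, orientation=40, posX=32, posY=32).
# Colour is constant, leaving the five varying ground-truth factors.
latent_sizes = {
    "color": 1,
    "shape": 3,
    "scale": 6,
    "orientation": 40,
    "pos_x": 32,
    "pos_y": 32,
}

# One image per latent combination: 3 * 6 * 40 * 32 * 32 = 737,280.
total_images = prod(latent_sizes.values())
print(total_images)  # 737280
```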
DeepMind CNN/Daily Mail Reading Comprehension Corpus
This dataset contains over 1.5 million question-answer pairs for a reading-comprehension task based on articles from CNN and the Daily Mail. Questions, answers and context are anonymised with random entity markers, forcing systems to answer questions purely from the provided context. The dataset accompanies the paper 'Teaching Machines to Read and Comprehend'.
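To illustrate the entity-marker idea, here is a minimal sketch of the anonymisation step. The `anonymise` helper is hypothetical and greatly simplified: the real corpus identifies entities with coreference resolution and permutes marker IDs randomly per document, whereas here the entities are supplied explicitly and IDs are assigned in order:

```python
import re

def anonymise(text, entities):
    """Replace each named entity with an @entityN marker.

    Hypothetical simplification: entities are given explicitly and IDs
    are assigned in order of the input list; the real corpus uses
    coreference resolution and randomly permuted IDs per document.
    """
    mapping = {}
    for entity in entities:
        mapping.setdefault(entity, f"@entity{len(mapping)}")
    # Replace longer names first so substrings are not clobbered.
    for entity, marker in sorted(mapping.items(), key=lambda kv: -len(kv[0])):
        text = re.sub(rf"\b{re.escape(entity)}\b", marker, text)
    return text, mapping

context, mapping = anonymise(
    "Jane Smith told the BBC she would not comment.",
    ["Jane Smith", "BBC"],
)
print(context)  # "@entity0 told the @entity1 she would not comment."
```

Because the markers carry no world knowledge, a system can only answer a question about `@entity0` by reading the anonymised context, which is the point of the design.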
Metacontrol for Adaptive Imagination-Based Optimization task
An artificially generated dataset for the spaceship task from 'Metacontrol for Adaptive Imagination-Based Optimization'. It comprises five sub-datasets, each containing scenes with a different number of planets (from one to five). Each sub-dataset consists of 100,000 training scenes and 1,000 testing scenes.