A large-scale, extensible dataset that generates question-and-answer pairs from a range of question types at roughly school-level difficulty. It is designed to test the mathematical learning and algebraic reasoning skills of learning models.
StreetLearn dataset for academic research, based on Google Street View images of two cities.
This repository contains levels for boxoban, a box-pushing puzzle game inspired by Sokoban.
Abstract reasoning matrices
Progressive matrices dataset, as described in 'Measuring abstract reasoning in neural networks'.
Spatial Language Integrating Model (SLIM)
This dataset consists of virtual scenes rendered in MuJoCo, each with multiple views presented in multiple modalities: images, and synthetic or natural language descriptions. Each scene consists of two or three objects placed in a square walled room, and for each of the 10 camera viewpoints we render a 3D view of the scene as seen from that viewpoint as well as a synthetically generated description of the scene.
This repository contains an entailment dataset for propositional logic, and code for generating that dataset. It also contains code for parsing the dataset in Python.
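For readers unfamiliar with the task, the following is a minimal sketch of what propositional entailment means: a premise entails a conclusion if every truth assignment that satisfies the premise also satisfies the conclusion. It does not use the dataset's file format or the repository's parser, neither of which is described here. The `entails` helper and the representation of formulas as Python callables are assumptions made purely for illustration.

```python
from itertools import product


def entails(premise, conclusion, variables):
    """Return True if every assignment satisfying `premise` also satisfies `conclusion`.

    `premise` and `conclusion` are hypothetical callables that take a dict
    mapping variable names to booleans and return a boolean.
    """
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        # A single counterexample is enough to refute entailment.
        if premise(env) and not conclusion(env):
            return False
    return True


# Illustrative formulas (not taken from the dataset).
p_and_q = lambda env: env["p"] and env["q"]
p_or_q = lambda env: env["p"] or env["q"]
p_only = lambda env: env["p"]

print(entails(p_and_q, p_only, ["p", "q"]))  # True: (p and q) entails p
print(entails(p_or_q, p_only, ["p", "q"]))   # False: p=False, q=True is a counterexample
```

The truth-table check is exponential in the number of variables, which is acceptable for small formulas but is only one of several ways to decide entailment.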
A large-scale, high-quality dataset of URL links to approximately 300,000 video clips that covers 400 human action classes, including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands and hugging. Each action class has at least 400 video clips. Each clip is human-annotated with a single action class and lasts around 10 seconds.
This repository contains the NarrativeQA dataset. It includes the list of documents with Wikipedia summaries, links to full stories, and questions and answers.
AQuA-RAT (Algebra Question Answering with Rationales)
A large-scale dataset consisting of approximately 100,000 algebraic word problems. The solution to each question is explained step-by-step using natural language. This data is used to train a program generation model that learns to generate the explanation, while generating the program that solves the question.
dSprites - Disentanglement testing Sprites dataset
This dataset consists of 737,280 images of 2D shapes, procedurally generated from 5 independent ground-truth latent factors controlling the shape, scale, rotation, and x and y position of a sprite. This data can be used to assess the disentanglement properties of unsupervised learning methods.
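The image count can be checked as the product of the number of values each latent factor takes, assuming position is split into separate x and y coordinates. The per-factor sizes below are not stated in this listing; they are assumed from the public dSprites documentation, and the dictionary keys are illustrative names only.

```python
from math import prod

# Assumed number of distinct values per latent factor (not given in this
# listing; taken from the public dSprites documentation).
FACTOR_SIZES = {
    "shape": 3,
    "scale": 6,
    "rotation": 40,
    "position_x": 32,
    "position_y": 32,
}


def total_images(factor_sizes):
    """Number of images when every combination of factor values is rendered once."""
    return prod(factor_sizes.values())


print(total_images(FACTOR_SIZES))  # 737280
```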
Metacontrol for Adaptive Imagination-Based Optimization task
An artificially generated dataset for the spaceship task from 'Metacontrol for Adaptive Imagination-Based Optimization'. We generated five datasets, each containing scenes with a different number of planets (ranging from a single planet to five planets). Each dataset consisted of 100,000 training scenes and 1,000 testing scenes.
Collectible Card Game to Code
This dataset contains the language-to-code datasets described in our paper 'Latent Predictor Networks for Code Generation'.
Unsupervised Data Generated for GeoQuery and SAIL
This dataset contains the generated unsupervised data for GeoQuery and SAIL semantic parsing tasks in our paper 'Semantic Parsing with Semi-Supervised Sequential Autoencoders'.