Kinetics is a large-scale, high-quality dataset of YouTube video URLs covering a diverse range of human-focused actions. Our aim in releasing the Kinetics dataset is to help the machine learning community advance models for video understanding.
The dataset consists of approximately 300,000 video clips, and covers 400 human action classes with at least 400 video clips for each action class. Each clip lasts around 10s and is labeled with a single class. All of the clips have been through multiple rounds of human annotation, and each is taken from a unique YouTube video. The actions cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands and hugging.
Kinetics forms the basis of an international human action classification competition being organised by ActivityNet.
For a detailed description of how the dataset was compiled and baseline classifier performance, see our paper:
The Kinetics Human Action Video Dataset
Will Kay, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, Andrew Zisserman
arXiv:1705.06950, May 2017
Please cite the paper if you use the dataset.
- Kinetics Training (ZIP file)
- Kinetics Validation (ZIP file)
- Kinetics Test (ZIP file)
- Kinetics Readme (TXT file)
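The exact file format of the train/validation/test downloads is described in the readme. As a purely illustrative sketch, assuming the annotations are CSV rows with columns such as `label`, `youtube_id`, `time_start`, and `time_end` (these names are assumptions, not the documented schema), one could tally clips per action class like this:

```python
import csv
import io
from collections import Counter

# Hypothetical annotation rows; the real column names and format
# are defined in the Kinetics readme, not here.
SAMPLE_CSV = """label,youtube_id,time_start,time_end
hugging,abc123,10,20
hugging,def456,0,10
playing guitar,ghi789,5,15
"""

def clips_per_class(csv_text):
    """Count annotated clips for each action class label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["label"] for row in reader)

counts = clips_per_class(SAMPLE_CSV)
print(counts["hugging"])         # → 2
print(counts["playing guitar"])  # → 1
```

A check like this is a quick way to verify that each class in a downloaded split meets the stated minimum of roughly 400 clips per class.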
The dataset is made available by Google, Inc. under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
To provide suggestions for new human action classes and other feedback on the dataset, click here.
A cautionary note on the use of this dataset: Kinetics is drawn from the videos uploaded to YouTube, based on the title of the video provided by the uploader. This means that the clips obtained reflect the distribution of the uploaded videos. For example, some classes may contain predominantly males or females, and there might be a bias towards exciting and unusual videos. Consequently, the dataset is neither intended to be a canonical catalogue of human activities, nor are the example clips for the included action classes intended to be canonical representations of these actions. In particular, the distribution of gender, race, age or other factors across the depicted human actors should not be interpreted as representing the actual distribution of human actors.