RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation


Abstract

For robots to be useful outside labs and factories, we need to be able to teach them new behaviors quickly. Current approaches either lack the generality to onboard new tasks without task-specific engineering, or lack the data efficiency to do so in an amount of time that enables practical use. In this work we explore dense tracking as a representational bottleneck that enables fast and general-purpose learning from demonstration. Our approach uses Track-Any-Point (TAP) models to isolate the relevant motion in a demonstration and to parameterize a low-level controller that reproduces this motion across changes in the scene. We show that this yields robust robot policies that can solve complex object-arrangement tasks such as kitting and stacking up to four objects, as well as full path-following tasks such as applying glue and sticking objects together, all from demonstrations that can be collected in minutes.
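
To make the idea of tracked points parameterizing a low-level controller concrete, below is a minimal sketch of a proportional visual-servoing step driven by a point tracker. It is illustrative only, not the paper's implementation: the `tap_model.track` interface, the point sets, and the gain are assumptions introduced for this example.

```python
# Minimal sketch (assumed interfaces, not the authors' code): tracked 2D points
# drive a proportional visual-servoing step that moves the end effector until
# the live points match their demonstrated goal positions.
import numpy as np

def servo_step(tap_model, frame, query_points, goal_points, gain=0.5):
    """Return a 2D image-space velocity command for the end effector.

    tap_model    -- any point tracker exposing a track() method (assumed API)
    frame        -- current RGB image from the robot's camera
    query_points -- (N, 2) pixel locations of task-relevant points from the demo
    goal_points  -- (N, 2) pixel locations those points should reach (from the demo)
    """
    # Locate the demonstration's points in the current frame.
    live_points, visible = tap_model.track(frame, query_points)  # (N, 2), (N,)

    # Image-space error between where the points are and where they should be,
    # averaged over the points the tracker currently sees.
    error = goal_points - live_points          # (N, 2)
    mean_error = error[visible].mean(axis=0)   # (2,)

    # Proportional control: command motion that shrinks the error. A real system
    # would map this image-space command to Cartesian end-effector motion using
    # the camera calibration.
    return gain * mean_error
```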

Authors

Mel Vecerik, Carl Doersch, Yi Yang, Todor Davchev, Yusuf Aytar, Stannis Zhou, Andrew Zisserman, Jon Scholz, Lourdes Agapito*

Venue

ICRA 2024