Learning rich touch representations through cross-modal self-supervision


The sense of touch is fundamental to many manipulation tasks, yet it is rarely used in robotic manipulation. In this work we tackle the problem of learning rich touch features through cross-modal self-supervision, evaluating them on few-shot classification to identify objects and their properties. We introduce two new datasets, collected with an anthropomorphic robotic hand equipped with tactile sensors, covering both synthetic and daily-life objects. Several self-supervised learning methods are benchmarked on these datasets and evaluated on few-shot classification, on unseen objects, and on pose estimation. Our experiments indicate that cross-modal self-supervision effectively improves touch representations and, in turn, performance on robotic manipulation skills.