Drop, Swap, and Generate: A Self-Supervised Approach for Disentangling Neural Activity


Learning meaningful, disentangled representations of brain activity is a critical first step toward understanding how information is processed within neural circuits. Without labels, however, this is challenging. Here, we develop a novel self-supervised approach for disentangling neural activity called Swap-VAE. Our approach is inspired by methods in computer vision that decompose images into their content and style: the content representation should capture the "gist" of the image, while the style components supply the details needed to reconstruct something realistic that mimics the true structure in the data. We apply this idea to the activity of many neurons in the primate brain and ask whether these datasets can likewise be decomposed into underlying content and style. To build the content space in our model, we propose an instance-specific alignment loss that maximizes the representational similarity between transformed views of the input (neural state). By dropping out neurons and jittering samples in time to generate these views, we ask the network to find a representation that is both temporally consistent and invariant to the specific neurons used to represent the brain state. We then couple this alignment loss with a generative model that recreates and simulates new high-dimensional neural activities. Through evaluations on both synthetic and real neural datasets recorded from hundreds of neurons in the primate brain, we show that combining our self-supervised alignment loss with a generative model yields representations that disentangle neural datasets.
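The two transformations described above (neuron dropout and temporal jitter) and the alignment between views can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names (`augment`, `alignment_loss`), the dropout probability, the jitter range, and the use of an L2 distance between normalized embeddings are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x, drop_prob=0.2, max_jitter=2):
    """Create a transformed view of a neural state sequence.

    x: array of shape (T, N) -- T time bins, N neurons.
    Zeroes out ("drops") a random subset of neuron channels and
    shifts the sequence in time by a small random offset.
    Parameter values here are illustrative, not from the paper.
    """
    T, N = x.shape
    # Neuron dropout: keep each neuron with probability 1 - drop_prob.
    keep_mask = rng.random(N) >= drop_prob
    view = x * keep_mask
    # Temporal jitter: circularly shift the time axis by a random offset.
    shift = int(rng.integers(-max_jitter, max_jitter + 1))
    return np.roll(view, shift, axis=0)

def alignment_loss(z1, z2):
    """Mean squared distance between L2-normalized content embeddings.

    A stand-in for the instance-specific alignment term: it is
    minimized when the two views map to the same content representation.
    """
    z1 = z1 / np.linalg.norm(z1, axis=-1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=-1, keepdims=True)
    return float(np.mean(np.sum((z1 - z2) ** 2, axis=-1)))

# Toy usage: two views of the same brain state share the input's shape,
# and identical embeddings incur zero alignment loss.
x = rng.random((10, 50))           # 10 time bins, 50 neurons
v1, v2 = augment(x), augment(x)    # two stochastic views of one state
z = rng.random((10, 8))            # hypothetical content embeddings
print(alignment_loss(z, z))        # -> 0.0
```

In a full model, `z1` and `z2` would be the content halves of the encoder's output for the two views, and this alignment term would be added to the VAE's reconstruction and KL objectives.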