Hierarchical Protein Function Prediction with Tails-GNNs


Protein function prediction may be framed as predicting subgraphs (with certain closure properties) of a directed acyclic graph describing the hierarchy of protein functions. Graph neural networks (GNNs), with their built-in inductive bias for relational data, are hence naturally suited for this task. However, in contrast with most GNN applications, the graph is not related to the input, but to the label space. Accordingly, we propose Tail-GNNs, neural networks which naturally compose with the output space of any neural network for multi-task prediction, to provide relationally-reinforced labels. For protein function prediction, we combine a Tail-GNN with a dilated convolutional network which learns representations of the protein sequence, making significant improvement in F_1 score and demonstrating the ability of Tail-GNNs to learn useful representations of labels and exploit them in real-world problem solving.