SOLA: Continual Learning with Second-Order Loss Approximation
Neural networks have achieved remarkable success in many cognitive tasks. However, when they are trained sequentially on multiple tasks without access to old data, it is observed that their performance on old tasks tend to drop significantly after the model is trained on new tasks. Continual learning aims to tackle this problem often referred to as catastrophic forgetting and to ensure sequential learning capability. We study continual learning from the perspective of loss landscapes and propose to construct a second-order Taylor approximation of the loss functions in previous tasks. Our proposed method does not require any memorization of raw data or their gradients, and therefore, offers better privacy protection. We theoretically analyze our algorithm from an optimization viewpoint and provide a sufficient and worst-case necessary condition for the gradient updates on the approximate loss function to be descent directions for the true loss function. Experiments on multiple continual learning benchmarks suggest that our method is effective in avoiding catastrophic forgetting and in many scenarios, outperforms several baseline algorithms that do not explicitly store the data samples.