Neural rate control for video encoding using imitation learning


Rate control is a critical, heavily engineered component of modern video encoders. For every video frame, it decides how many bits to spend encoding that frame, so as to optimize the rate-distortion trade-off over the whole video. This is a challenging constrained planning problem, both because decisions for different frames depend on one another in complex ways and because the bitrate constraint is only defined at the end of the episode.

We formulate the rate control problem as a Markov Decision Process (MDP) and apply imitation learning to learn an offline optimal control policy from video encoding trajectories. Because imitation learning does not interact with the environment during training, it is extremely hard for the learned policy to satisfy the episodic bitrate constraint. We find that by employing auxiliary losses and hindsight experience replay during training, and by augmenting the learned policy with a truncation trick and feedback control, we can achieve better encoding efficiency than VP9, a widely used video encoder, with comparable bitrate accuracy. We evaluate our approach on a diverse set of real-world videos and demonstrate that the learned model achieves a significant reduction in bitrate without sacrificing visual quality.
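To make the episodic constraint concrete, the following is a minimal illustrative sketch, not the method used in this work. It assumes a toy setting in which a policy proposes a bit allocation per frame and a simple proportional feedback term pulls each proposal toward an even split of the remaining budget. The names `feedback_adjusted_bits`, `run_episode` and the `gain` parameter are hypothetical, and the final clip is a plain budget guard rather than the truncation trick mentioned above.

```python
def feedback_adjusted_bits(policy_bits, bits_left, frames_left, gain=0.5):
    """Pull the policy's proposal toward an even split of the remaining budget.

    Hypothetical proportional correction: gain=0 trusts the policy fully,
    gain=1 ignores it and splits the remaining bits evenly.
    """
    even_share = bits_left / frames_left
    adjusted = policy_bits + gain * (even_share - policy_bits)
    # Budget guard (an assumption of this toy): never go negative and
    # never spend more than what is left in the episode.
    return max(0.0, min(adjusted, bits_left))


def run_episode(proposed_bits, total_budget, gain=0.5):
    """Apply the correction frame by frame and return the bits actually spent."""
    bits_left = float(total_budget)
    spent = []
    num_frames = len(proposed_bits)
    for index, proposal in enumerate(proposed_bits):
        frames_left = num_frames - index
        bits = feedback_adjusted_bits(proposal, bits_left, frames_left, gain)
        spent.append(bits)
        bits_left -= bits
    return spent


if __name__ == "__main__":
    # A policy that asks for twice the available budget on every frame.
    allocation = run_episode([100.0, 100.0, 100.0, 100.0], total_budget=200.0)
    print(allocation, sum(allocation))
```

In this toy, the total never exceeds the budget regardless of what the policy proposes, but a low `gain` leaves the last frames starved of bits, which illustrates why satisfying an end-of-episode constraint without hurting quality is difficult.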