Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks

Abstract

When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than start with a randomly initialized one – due to lacking enough training data, performing lifelong learning where the system has to learn a new task while being previously trained for other tasks, or wishing to encode priors in the network via preset weights. The most commonly employed approaches for network adaptation are fine-tuning and using the pre-trained network as a fixed feature extractor, among others.
In this paper we propose a straightforward alternative: Side-Tuning. Side-tuning adapts a pre-trained network by training a lightweight ;‘side’ network that is fused with the (unchanged) pre-trained network using a simple additive process. This simple method works as well as or better than existing solutions while it resolves some of the basic issues with fine-tuning, fixed features, and several other common baselines. In particular, side-tuning is less prone to overfitting when little training data is available, yields better results than using a fixed feature extractor, and does not suffer from catastrophic forgetting in lifelong learning. We demonstrate the performance of side-tuning under a diverse set of scenarios, including lifelong learning (iCIFAR, Taskonomy), reinforcement learning, imitation learning (visual navigation in Habitat), NLP question-answering (SQuAD v2), and single-task transfer learning (Taskonomy), with consistently promising results.

Publication
In Conference on Computer Vision and Pattern Recognition, IEEE
Click the Cite button above to demo the feature to enable visitors to import publication metadata into their reference management software.
Click the Slides button above to demo Academic’s Markdown slides feature.

Supplementary notes can be added here, including code and math.

Alexander Sax
Alexander Sax
PhD Student, Computer Science

My research interests include distributed robotics, mobile computing and programmable matter.

Related