Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks

Jeffrey O. Zhang, Alexander Sax, Amir Zamir, Leonidas Guibas, Jitendra Malik

June 2020

PDF Code Project Project Site

Abstract

When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than start with a randomly initialized one – due to lacking enough training data, performing lifelong learning where the system has to learn a new task while being previously trained for other tasks, or wishing to encode priors in the network via preset weights. The most commonly employed approaches for network adaptation are fine-tuning and using the pre-trained network as a fixed feature extractor, among others.
In this paper we propose a straightforward alternative: Side-Tuning. Side-tuning adapts a pre-trained network by training a lightweight ;‘side’ network that is fused with the (unchanged) pre-trained network using a simple additive process. This simple method works as well as or better than existing solutions while it resolves some of the basic issues with fine-tuning, fixed features, and several other common baselines. In particular, side-tuning is less prone to overfitting when little training data is available, yields better results than using a fixed feature extractor, and does not suffer from catastrophic forgetting in lifelong learning. We demonstrate the performance of side-tuning under a diverse set of scenarios, including lifelong learning (iCIFAR, Taskonomy), reinforcement learning, imitation learning (visual navigation in Habitat), NLP question-answering (SQuAD v2), and single-task transfer learning (Taskonomy), with consistently promising results.

Type

Conference paper

Publication

In Conference on Computer Vision and Pattern Recognition, IEEE

Click the Cite button above to demo the feature to enable visitors to import publication metadata into their reference management software.

Click the Slides button above to demo Academic’s Markdown slides feature.

Supplementary notes can be added here, including code and math.

Robustness

Alexander Sax

PhD Student, Computer Science

My research interests include distributed robotics, mobile computing and programmable matter.

Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks

Abstract

Alexander Sax

PhD Student, Computer Science

Related