Robustness

Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans

The Omnidata annotator is a pipeline for resampling comprehensive 3D scans of the real world into static multi-task vision datasets. Because this resampling is parametric, we can control or steer the resulting datasets. This enables interesting lines of research …
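The snippet below is a hypothetical illustration of what "parametric resampling" could look like; the parameter names and structure are assumptions for exposition, not the actual Omnidata annotator interface.

```python
# Hypothetical sketch only: these names are NOT the real Omnidata annotator API.
# The point is that the dataset is a function of explicit sampling parameters,
# so changing the parameters steers the resulting dataset.
from dataclasses import dataclass

@dataclass
class ResamplingConfig:
    views_per_point: int = 3            # how many cameras observe each 3D point
    field_of_view_deg: float = 75.0     # sampled camera intrinsics
    min_camera_distance_m: float = 1.0
    max_camera_distance_m: float = 8.0
    image_size: int = 512
    modalities: tuple = ("rgb", "depth", "surface_normals", "semantics")

# A steered variant, e.g. wide-FoV close-ups for studying viewpoint statistics:
close_up = ResamplingConfig(field_of_view_deg=100.0, max_camera_distance_m=3.0)
print(close_up)
```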

Robustness via Cross-Domain Ensembles

We present a method for making neural network predictions robust to shifts from the training data distribution. The proposed method is based on making predictions via a diverse set of cues (called ‘middle domains’) and ensembling them into one strong …
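A minimal sketch of the idea, not the authors' released code: each path predicts the target through a different middle domain and also outputs a per-pixel uncertainty, and the paths are merged by inverse-variance weighting so that paths made unreliable by a distribution shift contribute less.

```python
# Sketch of "predict via diverse middle domains, then ensemble".
# The tiny conv layers stand in for the full middle-domain transforms and UNets.
import torch
import torch.nn as nn

class Path(nn.Module):
    """RGB -> middle domain -> (target mean, target log-variance)."""
    def __init__(self, ch=3):
        super().__init__()
        self.to_middle = nn.Conv2d(ch, 8, 3, padding=1)   # stand-in for e.g. an edge/blur domain
        self.to_target = nn.Conv2d(8, 2, 3, padding=1)    # 1 mean channel + 1 log-variance channel

    def forward(self, x):
        mid = torch.relu(self.to_middle(x))
        mean, log_var = self.to_target(mid).chunk(2, dim=1)
        return mean, log_var

def ensemble(paths, x):
    means, log_vars = zip(*(p(x) for p in paths))
    inv_var = torch.stack([torch.exp(-lv) for lv in log_vars])   # 1 / sigma^2 per path
    weights = inv_var / inv_var.sum(dim=0, keepdim=True)         # normalize across paths
    return (weights * torch.stack(means)).sum(dim=0)             # uncertainty-weighted merge

paths = nn.ModuleList([Path() for _ in range(4)])
pred = ensemble(paths, torch.randn(1, 3, 64, 64))
print(pred.shape)  # torch.Size([1, 1, 64, 64])
```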

Robust Policies via Mid-Level Visual Representations

Vision-based robotics often separates the control loop into one module for perception and a separate module for control. It is possible to train the whole system end-to-end (e.g. with deep RL), but doing it 'from scratch' comes with a high sample …
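The sketch below (assumed shapes, not the released code) shows the modular setup this line of work argues for: a frozen, pre-trained mid-level encoder replaces raw pixels as the policy's observation, and only the small policy head is trained with RL, which is where the sample-efficiency gain comes from.

```python
# Frozen perception module + trainable control module.
import torch
import torch.nn as nn

class MidLevelEncoder(nn.Module):
    """Stand-in for a network pre-trained on a mid-level task (e.g. depth)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=4), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 8 * 8, feat_dim),
        )
    def forward(self, rgb):
        return self.net(rgb)

encoder = MidLevelEncoder()
encoder.requires_grad_(False)   # perception module is frozen
encoder.eval()

policy = nn.Sequential(nn.Linear(128, 64), nn.Tanh(), nn.Linear(64, 4))  # e.g. 4 discrete actions

obs = torch.randn(1, 3, 128, 128)          # one RGB observation from the environment
with torch.no_grad():
    features = encoder(obs)
action_logits = policy(features)           # only these weights are updated by RL
print(action_logits.shape)                 # torch.Size([1, 4])
```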

Robust Learning Through Cross-Task Consistency

Visual perception entails solving a wide set of tasks, e.g., object detection, depth estimation, etc. The predictions made for multiple tasks from the same image are not independent, and therefore, are expected to be 'consistent'. We propose a …
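A minimal sketch of the consistency idea, not the released implementation: given predictors for x→depth and x→normals plus a cross-task network depth→normals, an extra consistency term penalizes the normals implied by the predicted depth for disagreeing with the normals label, on top of the usual direct losses.

```python
# Cross-task consistency loss sketch (tiny conv layers stand in for full networks).
import torch
import torch.nn as nn
import torch.nn.functional as F

x_to_depth       = nn.Conv2d(3, 1, 3, padding=1)   # direct predictor x -> depth
x_to_normals     = nn.Conv2d(3, 3, 3, padding=1)   # direct predictor x -> normals
depth_to_normals = nn.Conv2d(1, 3, 3, padding=1)   # cross-task mapping

x       = torch.randn(2, 3, 64, 64)    # RGB batch
depth_y = torch.randn(2, 1, 64, 64)    # depth labels
norm_y  = torch.randn(2, 3, 64, 64)    # normals labels

pred_depth   = x_to_depth(x)
pred_normals = x_to_normals(x)

direct_loss = F.l1_loss(pred_depth, depth_y) + F.l1_loss(pred_normals, norm_y)
# Consistency: normals inferred from the predicted depth should match the normals label.
consistency_loss = F.l1_loss(depth_to_normals(pred_depth), norm_y)

total = direct_loss + consistency_loss
total.backward()
```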

Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks

When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than start with a randomly initialized one -- due to lacking enough training data, performing lifelong learning where the system has to learn a …
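A sketch of the additive side-network idea under assumed shapes (not the released code): a frozen pre-trained base network B and a small trainable side network S are blended as alpha * B(x) + (1 - alpha) * S(x), so adaptation happens entirely in S while B's weights, and hence its prior knowledge, stay intact.

```python
# Additive side network: output = alpha * base(x) + (1 - alpha) * side(x).
import torch
import torch.nn as nn

class SideTuned(nn.Module):
    def __init__(self, base: nn.Module, side: nn.Module):
        super().__init__()
        self.base = base.requires_grad_(False)            # pre-trained, frozen
        self.side = side                                  # small, trained on the new task
        self.alpha_logit = nn.Parameter(torch.zeros(()))  # learnable blend weight

    def forward(self, x):
        alpha = torch.sigmoid(self.alpha_logit)
        return alpha * self.base(x) + (1 - alpha) * self.side(x)

base = nn.Linear(32, 10)   # stands in for a large pre-trained network
side = nn.Linear(32, 10)   # much smaller than the base in practice
model = SideTuned(base, side)

out = model(torch.randn(4, 32))
out.sum().backward()       # gradients reach the side network and alpha, not the base
print(base.weight.grad is None, side.weight.grad is not None)  # True True
```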

Mid-Level Visual Priors Improve Generalization and Sample Efficiency for Learning Visuomotor Policies

How much does having **visual priors about the world** (e.g. the fact that the world is 3D) assist in learning to perform **downstream motor tasks** (e.g. delivering a package)? We study this question by integrating a generic perceptual skill set …
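A hypothetical sketch of the experimental setup (names and shapes assumed, not the released code): the "perceptual skill set" is a bank of frozen feature extractors, each standing in for a network pre-trained on one mid-level task, and the agents under study differ only in which prior's features their policy consumes.

```python
# Compare agents that differ only in their visual prior.
import torch
import torch.nn as nn

def make_frozen_encoder(out_dim=64):
    enc = nn.Sequential(nn.Conv2d(3, 8, 8, stride=8), nn.ReLU(),
                        nn.Flatten(), nn.Linear(8 * 16 * 16, out_dim))
    return enc.requires_grad_(False).eval()

skill_set = {name: make_frozen_encoder() for name in
             ("surface_normals", "depth", "object_classes", "random_init")}

policy_head = nn.Sequential(nn.Linear(64, 32), nn.Tanh(), nn.Linear(32, 3))

rgb = torch.randn(1, 3, 128, 128)
for name, encoder in skill_set.items():
    with torch.no_grad():
        features = encoder(rgb)            # the only thing that changes between agents
    print(name, policy_head(features).shape)
```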

GibsonEnv: Embodied Real-World Active Perception

Perception and being active (i.e. having a certain level of motion freedom) are closely tied. Learning active perception and sensorimotor control in the physical world is cumbersome as existing algorithms are too slow to efficiently learn in …

2D-3D-Semantic Data for Indoor Scene Understanding

We present a dataset of large-scale indoor spaces that provides a variety of mutually registered modalities from 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations. The dataset covers over 6,000 m² and contains over …