SMASH: One-shot model architecture search through hypernetworks

Andrew Brock, Theodore Lim, James Millar Ritchie, Nicholas J. Weston

Research output: Contribution to conference › Paper › peer-review

254 Citations (Scopus)

Abstract

Designing architectures for deep neural networks requires expert knowledge and substantial computation time. We propose a technique to accelerate architecture selection by learning an auxiliary HyperNet that generates the weights of a main model conditioned on that model’s architecture. By comparing the relative validation performance of networks with HyperNet-generated weights, we can effectively search over a wide range of architectures at the cost of a single training run. To facilitate this search, we develop a flexible mechanism based on memory read-writes that allows us to define a wide range of network connectivity patterns, with ResNet, DenseNet, and FractalNet blocks as special cases. We validate our method (SMASH) on CIFAR-10, CIFAR-100, STL-10, ModelNet10, and ImageNet32x32, achieving competitive performance with similarly-sized hand-designed networks.
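
The search procedure summarized in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a tiny search space (binary masks over four input features), a linear main model, and a linear HyperNet, and every name in it (`hypernet`, `sample_arch`, `train_hypernet`, `rank_architectures`) is hypothetical. It only shows the two phases: train one HyperNet over randomly sampled architectures, then rank candidate architectures by validation loss using the generated weights.

```python
import itertools
import random

D = 4  # number of candidate input features (toy search space, an assumption)
rng = random.Random(0)

def make_data(n):
    """Toy regression task: only features 0 and 1 carry signal."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(D)]
        data.append((x, 2.0 * x[0] - 1.0 * x[1]))
    return data

def hypernet(H, b, arch):
    """Generate main-model weights from the architecture encoding."""
    return [b[i] + sum(H[i][j] * arch[j] for j in range(D)) for i in range(D)]

def predict(w, arch, x):
    """Main model: linear predictor over the features the architecture enables."""
    return sum(arch[i] * w[i] * x[i] for i in range(D))

def val_loss(H, b, arch, data):
    """Mean squared error of one architecture using HyperNet-generated weights."""
    w = hypernet(H, b, arch)
    return sum((predict(w, arch, x) - y) ** 2 for x, y in data) / len(data)

def sample_arch():
    """Sample a random non-empty binary mask."""
    while True:
        arch = tuple(rng.randint(0, 1) for _ in range(D))
        if any(arch):
            return arch

def train_hypernet(train, steps=40000, lr=0.005):
    """Single training run: SGD on the HyperNet, one random architecture per step."""
    H = [[0.0] * D for _ in range(D)]
    b = [0.0] * D
    for _ in range(steps):
        arch = sample_arch()
        x, y = rng.choice(train)
        w = hypernet(H, b, arch)
        err = predict(w, arch, x) - y
        for i in range(D):
            g = 2.0 * err * arch[i] * x[i]  # d(loss)/d(w_i)
            b[i] -= lr * g
            for j in range(D):
                H[i][j] -= lr * g * arch[j]  # chain rule through w_i = b_i + H_i . arch
    return H, b

def rank_architectures(H, b, val):
    """Rank every non-empty mask by validation loss, best first."""
    archs = [a for a in itertools.product((0, 1), repeat=D) if any(a)]
    return sorted(archs, key=lambda a: val_loss(H, b, a, val))

train, val = make_data(500), make_data(200)
H, b = train_hypernet(train)
ranking = rank_architectures(H, b, val)
best = ranking[0]
print("best architecture mask:", best)
```

In this toy setting the top-ranked mask should enable the two informative features. In the method described above, the top-ranked architectures would then be the candidates for further training, whereas this sketch stops at the ranking step.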

Original language: English
Publication status: Published - May 2018
Event: 6th International Conference on Learning Representations 2018 - Vancouver Convention Center, Vancouver, Canada
Duration: 30 Apr 2018 to 3 May 2018
Conference number: 6
https://iclr.cc/

Conference

Conference: 6th International Conference on Learning Representations 2018
Abbreviated title: ICLR 2018
Country/Territory: Canada
City: Vancouver
Period: 30/04/18 to 3/05/18

Keywords

  • Deep Learning
  • Hypernetworks
  • Transfer learning
  • Benchmarking

ASJC Scopus subject areas

  • Language and Linguistics
  • Education
  • Computer Science Applications
  • Linguistics and Language
