Designing architectures for deep neural networks requires expert knowledge and substantial computation time. We propose a technique to accelerate architecture selection by learning an auxiliary HyperNet that generates the weights of a main model conditioned on that model's architecture. By comparing the relative validation performance of networks with HyperNet-generated weights, we can effectively search over a wide range of architectures at the cost of a single training run. To facilitate this search, we develop a flexible mechanism based on memory read-writes that allows us to define a wide range of network connectivity patterns, with ResNet, DenseNet, and FractalNet blocks as special cases. We validate our method (SMASH) on CIFAR-10 and CIFAR-100, STL-10, ModelNet10, and Imagenet32x32, achieving competitive performance with similarly-sized hand-designed networks.
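The core idea in the abstract, a HyperNet that maps an architecture encoding to the main model's weights so candidate architectures can be ranked by validation performance without separate training runs, can be illustrated with a toy sketch. This is not the paper's implementation: the encoding, layer sizes, and helper names below are all illustrative assumptions, and the HyperNet here is a tiny untrained MLP rather than the memory read-write mechanism SMASH actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def hypernet(arch_encoding, w1, w2):
    """Toy HyperNet: map an architecture encoding to a flat weight vector."""
    h = np.tanh(arch_encoding @ w1)
    return h @ w2

# Hypothetical setup: a 4-dim architecture encoding, and a main model that
# is a single 3x8 linear layer (24 generated weights).
enc_dim, hidden, n_main_weights = 4, 16, 24
w1 = rng.normal(scale=0.1, size=(enc_dim, hidden))
w2 = rng.normal(scale=0.1, size=(hidden, n_main_weights))

def main_model(x, arch_encoding):
    # The main model's weights are *generated* from the architecture
    # encoding, so evaluating a new candidate needs no extra training run.
    w = hypernet(arch_encoding, w1, w2).reshape(3, 8)
    return x @ w

# Rank two candidate architectures by validation loss under generated weights.
x_val = rng.normal(size=(5, 3))
y_val = rng.normal(size=(5, 8))
candidates = [np.eye(enc_dim)[0], np.eye(enc_dim)[1]]
losses = [float(np.mean((main_model(x_val, c) - y_val) ** 2)) for c in candidates]
best = int(np.argmin(losses))
```

In the full method the HyperNet is trained jointly with sampled architectures, and the relative (not absolute) validation losses are what guide the final architecture choice.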
Publication status: Published - May 2018
Event: 6th International Conference on Learning Representations - Vancouver Convention Center, Vancouver, Canada
Duration: 30 Apr 2018 → 3 May 2018
Conference number: 6
Conference: 6th International Conference on Learning Representations
Abbreviated title: ICLR 2018
Period: 30/04/18 → 3/05/18
Keywords
- Deep Learning
- Transfer learning
ASJC Scopus subject areas
- Artificial Intelligence
Brock, A., Lim, T., Ritchie, J. M., & Weston, N. J. (2018). SMASH: One-Shot Model Architecture Search through HyperNetworks. Poster session presented at 6th International Conference on Learning Representations, Vancouver, Canada.