Much work in neural architecture search (NAS) is extremely compute-hungry — so compute-hungry that it hurts progress and scientific rigor in the community. When individual experiments require 800 GPUs for weeks, hardly anyone in academia can meaningfully participate, and even in companies with huge compute resources, nobody repeats such experiments many times to assess the stability and statistical significance of the results. Combine this with proprietary code bases, and the picture looks pretty gloomy for building an inclusive, open-minded, yet rigorous scientific community.
To address this systemic problem, my lab teamed up with Google Brain to use their large-scale resources to create a NAS benchmark that makes future research on NAS dramatically cheaper, more reproducible, and more scientific. How? By evaluating a small cell search space exhaustively and saving the results to a table. The result, our NAS-Bench-101 benchmark, allows anyone to benchmark their own NAS algorithm on a laptop, in seconds: whenever that algorithm queries the performance of a cell, instead of training a neural network with that cell for hours on a GPU, we simply look up the precomputed result in a fraction of a second. Indeed, we evaluated 423k different architectures, with 3 repetitions each — and even with 4 different epoch budgets each, in order to enable benchmarking of multi-fidelity optimizers such as Hyperband and BOHB. Of course, we made all of this data publicly available, and importantly, the exact code used to train the networks behind this data is also open source.
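To give a flavor of what such a table lookup looks like in practice, here is a minimal sketch using the open-source nasbench package. The dataset path is a placeholder and the exact operation strings reflect my reading of the repository, so treat this as illustrative rather than authoritative.

```python
from nasbench import api

# Operation labels of the NAS-Bench-101 cell search space
# (assumed to match the constants in the nasbench repository).
INPUT = 'input'
OUTPUT = 'output'
CONV1X1 = 'conv1x1-bn-relu'
CONV3X3 = 'conv3x3-bn-relu'
MAXPOOL3X3 = 'maxpool3x3'

# Load the precomputed table of results (path is a placeholder).
nasbench = api.NASBench('/path/to/nasbench_full.tfrecord')

# A cell is a small DAG: an upper-triangular adjacency matrix
# plus one operation per node.
cell = api.ModelSpec(
    matrix=[[0, 1, 1, 0, 0, 0, 1],   # input node
            [0, 0, 0, 1, 0, 0, 0],   # 1x1 conv
            [0, 0, 0, 0, 1, 0, 0],   # 3x3 conv
            [0, 0, 0, 0, 0, 1, 0],   # 3x3 conv
            [0, 0, 0, 0, 0, 0, 1],   # 3x3 max-pool
            [0, 0, 0, 0, 0, 0, 1],   # 3x3 conv
            [0, 0, 0, 0, 0, 0, 0]],  # output node
    ops=[INPUT, CONV1X1, CONV3X3, CONV3X3, MAXPOOL3X3, CONV3X3, OUTPUT])

# Instead of training for hours on a GPU, look up a stored training run
# at one of the four epoch budgets (4, 12, 36, or 108 epochs).
result = nasbench.query(cell, epochs=108)
print(result['validation_accuracy'], result['training_time'])
```

Because each cell was trained 3 times, repeated queries return one of the stored runs, so a NAS algorithm benchmarked this way still experiences realistic evaluation noise.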
Figure 1: Evaluation of NAS & HPO algorithms on NAS-Bench-101. Note that on the right, reinforcement learning (RL) improves over random search (RS) and Hyperband (HB), but regularized evolution (RE) and the Bayesian optimization algorithms SMAC and BOHB perform better still.