AutoML | Hyperparameter Optimization

The performance of a machine learning (ML) model depends heavily on its hyperparameter settings. Given a dataset and a task, the choice of the ML model and its hyperparameters is typically made manually. Hyperparameter Optimization (HPO) algorithms aim to take as much of this burden as possible off the human expert.

The design of an HPO algorithm depends on the nature of the task and its context, such as the optimization budget and the available information. Below are some of the main flavors of HPO.

Black-box Optimization

Bayesian Optimization is one of the most popular approaches for HPO, thanks to its sample efficiency, flexibility, and convergence guarantees. The central idea is to treat all tuning decisions within an ML pipeline as the search space (domain) of a black-box function. This function represents the evaluation of an ML pipeline under a fixed compute budget and yields a performance metric that is typically minimized, such as the validation error. By iteratively suggesting promising configurations, HPO algorithms strive to converge toward the global optimum.
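The loop described above can be sketched in a few lines. The following is a minimal illustration only, not the implementation used in any of our packages: it assumes a single hyperparameter scaled to [0, 1], a small Gaussian-process surrogate with a fixed kernel, expected improvement as the acquisition function, and a hypothetical `toy_validation_loss` standing in for the expensive pipeline evaluation.

```python
import math
import random


def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - rest) / M[i][i]
    return x


def fit_gp(xs, ys, noise=1e-4):
    """Return a function x -> (mean, std) of the GP posterior."""
    offset = sum(ys) / len(ys)  # centre the targets around their mean
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, [y - offset for y in ys])

    def predict(x):
        k = [rbf(a, x) for a in xs]
        mean = offset + sum(ki * ai for ki, ai in zip(k, alpha))
        var = 1.0 - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
        return mean, math.sqrt(max(var, 1e-12))

    return predict


def expected_improvement(mean, std, best):
    """Expected improvement over `best` when minimizing."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def bayesian_optimization(objective, n_init=3, n_iter=12, seed=0):
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_init)]  # random initial design
    ys = [objective(x) for x in xs]
    candidates = [i / 100 for i in range(101)]  # coarse grid over [0, 1]
    for _ in range(n_iter):
        predict = fit_gp(xs, ys)  # refit the surrogate on all observations
        best = min(ys)
        x_next = max(candidates,
                     key=lambda x: expected_improvement(*predict(x), best))
        xs.append(x_next)
        ys.append(objective(x_next))  # the only expensive call per iteration
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


def toy_validation_loss(x):
    """Hypothetical stand-in for training and validating a pipeline."""
    return (x - 0.3) ** 2


best_x, best_y = bayesian_optimization(toy_validation_loss)
```

In practice the surrogate's kernel hyperparameters are fitted to the data and the acquisition function is optimized with a proper optimizer rather than a grid; both are fixed here to keep the sketch short.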

Bayesian Optimization

Combined Algorithms Selection and Hyperparameter Optimization (CASH)

An AutoML system needs to select not only the optimal hyperparameter configuration of a given model but also which model to use. This problem can be regarded as a single HPO problem with a hierarchical configuration space, where the top-level hyperparameter decides which algorithm to choose and all other hyperparameters depend on this one. To deal with such complex and structured configuration spaces, we use, for example, random forests as surrogate models in Bayesian Optimization.
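To make the hierarchical structure concrete, here is a small sketch of sampling from such a space. The `SEARCH_SPACE` dictionary, its algorithm names, and its ranges are hypothetical and chosen only for illustration; real tools describe conditional hyperparameters with a dedicated configuration-space library.

```python
import math
import random

# Hypothetical CASH space: the top-level "algorithm" choice decides
# which child hyperparameters are active.
SEARCH_SPACE = {
    "svm": {
        "C": (1e-3, 1e3, "log"),
        "gamma": (1e-4, 1e1, "log"),
    },
    "random_forest": {
        "n_estimators": (10, 500, "int"),
        "max_features": (0.1, 1.0, "float"),
    },
}


def sample_configuration(space, rng):
    """Sample the algorithm first, then only its own hyperparameters."""
    algorithm = rng.choice(sorted(space))
    config = {"algorithm": algorithm}
    for name, (low, high, kind) in space[algorithm].items():
        if kind == "log":
            # Uniform in log space, suitable for scale-like parameters
            value = math.exp(rng.uniform(math.log(low), math.log(high)))
        elif kind == "int":
            value = rng.randint(low, high)
        else:
            value = rng.uniform(low, high)
        config[name] = value
    return config


rng = random.Random(0)
example_config = sample_configuration(SEARCH_SPACE, rng)
```

Every sampled configuration contains only the hyperparameters of the selected algorithm, which is why surrogate models for CASH must cope with inactive dimensions.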

Our Packages

Auto-sklearn provides out-of-the-box supervised machine learning by modeling the search space as a CASH problem.
Auto-PyTorch is a framework for automatically searching for a neural network architecture and its hyperparameters; it also makes use of a structured configuration space.
SMAC uses a random forest as its surrogate model, which can efficiently deal with structured search spaces.

Using many fidelities for early-stopping HPO

The black-box view adopted by Bayesian Optimization can be relaxed to a gray-box view, which allows access to intermediate states of the targeted machine learning model. That is, the function to be optimized has proxy states along one or more variables (fidelities), such as the number of training epochs or the size of the training subset, that can be obtained at a lower cost and are likely to indicate the performance at the target state. HPO algorithms that search over fidelities in this way can provide better anytime performance.
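Successive halving is one simple way to exploit a fidelity: evaluate many configurations cheaply, keep the best fraction, and re-evaluate the survivors at a higher budget. The sketch below is an illustration under assumptions, not the code of any specific package; `toy_loss` is a hypothetical objective in which the budget could stand for training epochs.

```python
def successive_halving(configs, evaluate, min_budget=1, max_budget=27, eta=3):
    """Return the best configuration found by successive halving.

    `evaluate(config, budget)` returns a loss to be minimized. After each
    round the best 1/eta of the configurations advance to an eta-times
    larger budget.
    """
    budget = min_budget
    survivors = list(configs)
    while True:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        if budget >= max_budget or len(ranked) == 1:
            return ranked[0]
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget = min(budget * eta, max_budget)


def toy_loss(config, budget):
    """Hypothetical loss: best at lr = 0.1, improving with more budget."""
    return (config["lr"] - 0.1) ** 2 + 1.0 / budget


configs = [{"lr": i / 26} for i in range(27)]
best_config = successive_halving(configs, toy_loss)
```

With 27 configurations and `eta=3`, the rounds use budgets 1, 3, 9 and 27 on 27, 9, 3 and 1 configurations, so most of the compute goes to the promising candidates. The approach relies on low-fidelity rankings being informative, which holds by construction in this toy example but not always in practice.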

Many-fidelity HPO

Speeding up HPO with learning curve extrapolation techniques

Various machine learning algorithms that are trained iteratively yield learning curves. Under different hyperparameter settings, different learning curves can be obtained. Exploiting the smooth trends of a learning curve from a partially trained machine learning model to predict future performance is an active area of research with promising results.
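As a rough illustration of the idea, the sketch below fits a power law, loss(t) = a * t^(-b), to the first epochs of a learning curve and extrapolates it to a later epoch. The power-law form is an assumption made for this example; the research in this area uses considerably richer, probabilistic models of learning curves.

```python
import math


def fit_power_law(epochs, losses):
    """Least-squares fit of loss = a * epoch**(-b) in log-log space."""
    xs = [math.log(t) for t in epochs]
    ys = [math.log(loss) for loss in losses]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def extrapolate(a, b, epoch):
    """Predicted loss at a future epoch under the fitted power law."""
    return a * epoch ** (-b)


# Synthetic partial learning curve observed for the first 10 epochs
observed_epochs = list(range(1, 11))
observed_losses = [2.0 * t ** -0.5 for t in observed_epochs]

a, b = fit_power_law(observed_epochs, observed_losses)
predicted_final_loss = extrapolate(a, b, 100)
```

An HPO algorithm could use such a prediction to stop a run early when its extrapolated final loss is clearly worse than the best result seen so far.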

HPO speedup with learning curve extrapolation

HPO with expert prior inputs

Although HPO can be seen as removing the human from the loop, the intuition and experience of the human expert offer valuable information as a guide for an HPO algorithm. The challenge then is to find suitable interfaces and principled methodologies to realize practical algorithms.
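One simple interface is to let the expert state a belief about where good values lie and to bias the sampling of new configurations toward it. The sketch below is only an illustration of that idea, not the method of a particular algorithm: the function name, the Gaussian form of the prior, and the `prior_weight` mixing parameter are all assumptions made for the example.

```python
import random


def sample_with_prior(low, high, prior_mean, prior_std, prior_weight, rng):
    """Sample one hyperparameter value from a prior/uniform mixture.

    With probability `prior_weight` the value comes from a Gaussian
    centred on the expert's guess (truncated to [low, high] by
    rejection); otherwise it is drawn uniformly, so a misleading prior
    can still be overruled by exploration.
    """
    if rng.random() < prior_weight:
        while True:
            value = rng.gauss(prior_mean, prior_std)
            if low <= value <= high:
                return value
    return rng.uniform(low, high)


rng = random.Random(0)
# Hypothetical expert belief: good dropout rates are around 0.3
suggestions = [sample_with_prior(0.0, 1.0, 0.3, 0.05, 0.8, rng)
               for _ in range(10)]
```

Lowering `prior_weight` over the course of the optimization is one way to let observed data gradually take over from the expert's initial belief.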

Expert-in-the-loop HPO

Benchmarks for reproducible research

Evaluation of AutoML and especially of HPO faces many challenges. For example, many repeated runs of HPO can be computationally expensive, the benchmarks can be fairly noisy, and it is often not clear which benchmarks are representative of typical HPO applications. Therefore, we develop HPO benchmark collections that improve reproducibility and decrease the computational burden on researchers.

HPO Benchmarks

Other resources

The book “AutoML: Methods, Systems, Challenges” provides a concise overview of HPO.
For HPO focused on deep learning (DL), refer here.
HPO for multiple objectives (multi-objective optimization, MOO) can be found here.
An overview of various open-source HPO tools.

Blog Posts

Please also check our blog posts on our work in HPO (including BO):

LC-PFN

Self-Adjusting Bayesian Optimization with SAWEI

PFNs4BO: In-Context Learning for Bayesian Optimization

Hyperparameter Tuning in Reinforcement Learning is Easy, Actually

Call for Datasets: OpenML 2023 Benchmark Suites

DEHB

HPOBench: Compare Multi-fidelity Optimization Algorithms with Ease

TrivialAugment: You don’t need to tune your augmentations for image classification

Auto-Sklearn – What happened in 2020

Auto-Sklearn 2.0: The Next Generation

BOHB: Robust and Efficient Hyperparameter Optimization at Scale