Author Archives: André Biedenkapp

Learning Step-Size Adaptation in CMA-ES


Comparison of an example optimization trajectory and the corresponding step size of our approach (“LTO”) with a hand-crafted heuristic (“CSA”)

In a Nutshell

In CMA-ES, the step size controls how fast or slowly a population traverses the search space. Large steps let you quickly skip over uninteresting areas (exploration), whereas small steps allow a more focused traversal of interesting areas (exploitation). Hand-crafted heuristics usually trade off small and large steps based on some measure of progress. Directly learning which step size is preferable in which situation would allow us to act more flexibly than a hand-crafted heuristic could. One approach to learning such dynamic configuration policies is dynamic algorithm configuration (DAC) through reinforcement learning. (more…)
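To see what a hand-crafted step-size heuristic looks like in its simplest form, here is a sketch of a (1+1)-ES with the classic one-fifth success rule. It is much simpler than the CSA rule used in CMA-ES, but it illustrates the same exploration/exploitation trade-off that a learned policy would have to navigate; the objective and all constants here are illustrative choices, not part of our method.

```python
import random

random.seed(0)

def one_plus_one_es(f, x, sigma=1.0, iters=200):
    """(1+1)-ES with the one-fifth success rule: grow the step size
    after a successful mutation, shrink it after a failed one."""
    fx = f(x)
    for _ in range(iters):
        cand = [xi + sigma * random.gauss(0, 1) for xi in x]
        fc = f(cand)
        if fc < fx:                    # success: accept and explore more
            x, fx = cand, fc
            sigma *= 1.5
        else:                          # failure: exploit with smaller steps
            sigma *= 1.5 ** (-0.25)    # keeps sigma constant at 1/5 success rate
    return x, fx, sigma

# Minimize the sphere function from a distant start
sphere = lambda v: sum(t * t for t in v)
best, fbest, sigma = one_plus_one_es(sphere, [5.0, -3.0])
```

A learned policy would replace the fixed `1.5` multipliers with a step-size choice conditioned on the current optimization state.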

Read More

Dynamic Algorithm Configuration


A versatile DAC can handle any situation at any time. (Image credit: Ina Lindauer)



When designing algorithms, we want them to be as flexible as possible so that they can solve as many problems as possible. To solve a specific family of problems well, finding well-performing hyperparameter configurations requires either extensive domain knowledge or extensive resources. The latter is especially true if we want to use algorithms we did not author ourselves: we most likely know nothing about the algorithm's internals, so to figure out what makes this black box tick, researchers of all flavors far too often fall back on “Graduate Student Descent”.

Automated hyperparameter optimization and algorithm configuration offer far better alternatives, allowing for more structured and less error-prone optimization in potentially infinite search spaces. Throughout the search, various useful statistics are logged, which can give us further insight into the algorithm, its hyperparameters, and the tasks the algorithm has to solve. At the end of the search, they present us with a hyperparameter configuration that was found to work well, e.g., one that quickly finds a solution or achieves the best solution quality.
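The simplest automated configurator follows exactly this pattern: sample configurations, log every evaluation, and return the best one found. Below is a minimal random-search sketch of that loop; the search space and cost function are hypothetical stand-ins, not a real algorithm's hyperparameters.

```python
import random

random.seed(0)

def random_search(space, evaluate, n_trials=50):
    """Sample configurations uniformly from the space, logging each
    evaluation, and return the best configuration plus the full log."""
    log = []                               # statistics gathered during the search
    best_cfg, best_cost = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: random.uniform(lo, hi) for name, (lo, hi) in space.items()}
        cost = evaluate(cfg)
        log.append((cfg, cost))
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost, log

# Hypothetical search space and cost function
space = {"lr": (1e-4, 1e-1), "momentum": (0.0, 0.99)}
cost = lambda c: (c["lr"] - 0.01) ** 2 + (c["momentum"] - 0.9) ** 2
best_cfg, best_cost, log = random_search(space, cost)
```

Real configurators replace the uniform sampling with a model of the cost surface, but the log of (configuration, result) pairs is exactly the kind of statistic the paragraph above refers to.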

However, the static configurations these approaches find exhibit two shortcomings:

  1. The configuration will only work well when the configured algorithm is used to solve similar tasks.
  2. The iterative nature of most algorithms is ignored, and a single configuration is assumed to work best at every iteration.

To address Shortcoming 1, a combination of algorithm configuration and algorithm selection can be used: first search for well-performing but potentially complementary configurations (each solving different problems best), then learn a selection mechanism that determines which configuration to use for a given problem. However, even this more general form of optimization (called per-instance algorithm configuration) cannot address Shortcoming 2. (more…)
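The selection step above can be sketched very compactly. Here is an illustrative nearest-neighbour selector over instance features: given a portfolio of complementary configurations, it picks the one associated with the closest known instance. All names and values are hypothetical, not a real per-instance configuration system.

```python
def select_config(portfolio, train_features, instance_features):
    """Pick the configuration whose associated training instance is
    closest (squared Euclidean distance) to the new instance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best_cfg, best_d = None, float("inf")
    for cfg, feats in zip(portfolio, train_features):
        d = dist(feats, instance_features)
        if d < best_d:
            best_cfg, best_d = cfg, d
    return best_cfg

portfolio = [{"restarts": 0}, {"restarts": 100}]   # hypothetical complementary configs
train_features = [(0.1, 0.9), (0.8, 0.2)]          # features where each config excels
chosen = select_config(portfolio, train_features, (0.7, 0.3))
```

Note that the chosen configuration is still static: once selected, it is applied unchanged for the whole run, which is exactly Shortcoming 2.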

Read More

BOHB: Robust and Efficient Hyperparameter Optimization at Scale


Machine learning has achieved many successes in a wide range of application areas, but more often than not, these strongly rely on choosing the correct values for many hyperparameters (see, e.g., Snoek et al., 2012). For example, we all know of the awesome results deep learning can achieve, but when we set its learning rates wrongly, it can end in total disaster. And there are many other hyperparameters, e.g., those determining architecture and regularization, which you really need to set well to get strong performance. A common solution these days is random search, but that is very inefficient. This post presents BOHB, a new and versatile tool for hyperparameter optimization that achieves substantial speedups by combining Hyperband with Bayesian optimization. The following figure shows an exemplary result: BOHB starts to make progress much faster than vanilla Bayesian optimization and random search, and finds far better solutions than Hyperband and random search.

Example results of applying BOHB (freely available) to optimize six hyperparameters of a neural network. The curves show the immediate regret of the best configuration found by each method as a function of time. BOHB combines the best of BO and Hyperband: it reaches good performance roughly 20x faster than BO in the beginning, and converges much faster than Hyperband in the end.
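Hyperband's budget schedule, which BOHB inherits, is built on successive halving: evaluate many configurations cheaply, keep the best fraction, and re-evaluate them with a larger budget. The sketch below shows that core loop with a toy objective; it is a simplified illustration, not the BOHB implementation, whose key difference is sampling configurations from a model rather than uniformly at random.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Evaluate all configs on a small budget, keep the best 1/eta
    fraction, and repeat with eta times the budget until one remains."""
    budget = min_budget
    while len(configs) > 1:
        scores = [(evaluate(c, budget), c) for c in configs]
        scores.sort(key=lambda s: s[0])                    # lower loss is better
        configs = [c for _, c in scores[: max(1, len(configs) // eta)]]
        budget *= eta
    return configs[0]

# Toy objective: loss shrinks with budget, offset by intrinsic config quality
evaluate = lambda cfg, budget: cfg + 1.0 / budget
best = successive_halving([0.9, 0.5, 0.1, 0.7, 0.3, 0.2], evaluate)
```

The cheap low-budget rounds are what give Hyperband (and BOHB) their early speed advantage over vanilla Bayesian optimization, which evaluates every configuration at full budget.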


Read More