Progress in practical hyperparameter tuning is often hampered by the lack of standardized benchmark problems. To alleviate this problem, we maintain HPOlib, a library that provides a unified interface to machine learning tasks.

Currently we are working on HPOlib2, a new version focusing on reproducible containerized benchmarks.

Our previous version, HPOlib1, can be found here.

Towards an Empirical Foundation for Assessing Bayesian Optimization of Hyperparameters [pdf] [bib] [poster]
NeurIPS workshop on Bayesian Optimization in Theory and Practice
The paper includes results for SMAC, Spearmint, and TPE on the benchmarks provided in HPOlib1.