Metadata, i.e. the number of instances and parameters as well as the configuration budget. If multiple configurator runs are compared, the statistics apply to the best run.
General information about the optimization scenario.
| # aggregated parallel BOHB runs | 10 |
|---|---|
| # parameters | 7 |
| Deterministic target algorithm | True |
| Optimized run objective | quality |
Information on the individual runs (if there are multiple runs). This is of interest for parallel optimization or when budgets/fidelities are used.
| | budget 1 | budget 3 | budget 9 |
|---|---|---|---|
| Total time spent evaluating configurations | 255454.91 sec | 458507.82 sec | 1153603.02 sec |
| Average time per configuration (mean / std) | 66.01 sec (± 57.30) | 177.72 sec (± 138.02) | 544.15 sec (± 396.82) |
| # evaluated configurations | 3870 | 2580 | 2120 |
| # changed parameters (default to incumbent) | 7 | 7 | 7 |
| Configuration origins | Acquisition Function: 2438, Random: 1432 | Acquisition Function: 2012, Random: 568 | Acquisition Function: 1617, Random: 503 |
Incumbents for each budget (i.e. the best configuration according to kernel density estimation using data from that budget).
| | budget 1 | budget 3 | budget 9 |
|---|---|---|---|
| batch_size | 10 | 10 | 8 |
| discount | 0.958445 | 0.958445 | 0.990177 |
| entropy_regularization | 0.219758 | 0.219758 | 0.206025 |
| learning_rate | 0.00267234 | 0.00267234 | 0.00136828 |
| likelihood_ratio_clipping | 0.949068 | 0.949068 | 0.922201 |
| n_units_1 | 84 | 84 | 126 |
| n_units_2 | 111 | 111 | 121 |
| Cost | 123.0 | 140.0 | 179.889 |
Visualization of the learning curves of all individual HyperBand iterations. Model-based picks are marked with a cross. The config-id tuple denotes (iteration, stage, id_within_stage), where iteration is the HyperBand iteration and stage is the index of the budget on which the configuration was first sampled (should be 0). The third entry is simply a sequential enumeration. This id can be interpreted as a nested index identifier.
Spearman rank correlation is used to compute a correlation value and a p-value for every pairwise combination of budgets. The first value is the correlation, the second is the p-value (the p-value roughly estimates the likelihood of obtaining this correlation coefficient from uncorrelated data sets). This can be used to estimate how well a lower budget approximates the function to be optimized.
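For illustration, a minimal sketch of such a budget correlation using `scipy.stats.spearmanr`; the arrays `costs_budget_1` and `costs_budget_3` are hypothetical placeholders for the costs of the same configurations evaluated on two different budgets, not values from this report.

```python
# Minimal sketch: Spearman rank correlation between two budgets.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
costs_budget_1 = rng.random(50)                          # placeholder costs on the smaller budget
costs_budget_3 = costs_budget_1 + 0.1 * rng.random(50)   # placeholder costs on the larger budget

correlation, p_value = spearmanr(costs_budget_1, costs_budget_3)
print(f"Spearman correlation: {correlation:.3f}, p-value: {p_value:.3g}")
```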
Analysis of the iteratively sampled configurations during the optimization procedure. Multi-dimensional scaling (MDS) is used to reduce the dimensionality of the search space and to plot the distribution of evaluated configurations. The larger the dot, the more often the configuration was evaluated on instances from the set. Configurations that were incumbents at least once during optimization are marked as red squares. Configurations acquired through local search are marked with an 'x'. The downward triangle denotes the final incumbent, whereas the orange upward triangle denotes the default configuration. The heatmap and the colorbar correspond to the predicted performance in that part of the search space.
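A minimal sketch of such an MDS projection with scikit-learn, assuming the evaluated configurations are already numerically encoded in a matrix `configs`; real reports additionally handle categorical parameters, scaling, and the overlaid markers described above.

```python
# Minimal sketch: projecting evaluated configurations to 2D with MDS.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
configs = rng.random((200, 7))   # placeholder: 200 configurations with 7 numeric parameters
costs = rng.random(200)          # placeholder costs, used only for colouring

embedding = MDS(n_components=2, random_state=0).fit_transform(configs)

plt.scatter(embedding[:, 0], embedding[:, 1], c=costs, cmap="viridis")
plt.colorbar(label="cost")
plt.title("MDS projection of evaluated configurations")
plt.show()
```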
Depicts the average cost of the best configuration found so far (using all trajectory data) over the time spent by the configurator (including target algorithm runs and the overhead generated by the configurator). If the curve flattens out early, this indicates that too much time was spent on the configurator run; a curve that is still improving at the end of the budget indicates that the configuration budget should be increased. The plotted standard deviation gives the uncertainty over multiple configurator runs.
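A minimal sketch of such a trajectory plot with matplotlib; `times`, `incumbent_costs` and `cost_std` are hypothetical trajectory data, not values from this report.

```python
# Minimal sketch: cost of the incumbent over configurator wallclock time.
import numpy as np
import matplotlib.pyplot as plt

times = np.array([10, 50, 200, 1000, 5000, 20000])          # placeholder time stamps in seconds
incumbent_costs = np.array([300, 250, 220, 190, 182, 180])  # placeholder best-so-far costs
cost_std = np.array([40, 35, 30, 20, 15, 12])               # placeholder std over multiple runs

plt.step(times, incumbent_costs, where="post", label="incumbent cost")
plt.fill_between(times, incumbent_costs - cost_std, incumbent_costs + cost_std,
                 step="post", alpha=0.3)
plt.xscale("log")
plt.xlabel("wallclock time [s]")
plt.ylabel("cost")
plt.legend()
plt.show()
```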
Previously used by Golovin et al. to study the frequency of chosen parameter settings in black-box optimization. Each line corresponds to one configuration in the runhistory and shows the parameter settings and the corresponding (estimated) average cost. To handle large configuration spaces with hundreds of parameters, only the (at most) 10 most important parameters according to a fANOVA parameter importance analysis are plotted. To emphasize better configurations, the performance is encoded in the colour of each line, ranging from blue to red. These plots provide insights into whether the configurator focused on specific parameter values and how these correlate with their costs. NOTE: the given runhistory should contain only optimization data and no validation data in order to analyze the explored parameter space.
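A minimal sketch of a parallel-coordinates plot with pandas; the DataFrame `df` and its columns are hypothetical, and the cost is binned into quartiles for colouring instead of the continuous blue-to-red colour map used in the report.

```python
# Minimal sketch: parallel-coordinates plot of explored configurations.
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "learning_rate": rng.random(100),
    "batch_size": rng.integers(8, 64, 100),
    "discount": rng.random(100),
    "cost": rng.random(100) * 100,
})
# Bin the cost into quartiles so pandas can colour each configuration's line.
df["cost_bin"] = pd.qcut(df["cost"], q=4, labels=["best", "good", "worse", "worst"])

parallel_coordinates(df.drop(columns="cost"), class_column="cost_bin", colormap="coolwarm")
plt.title("Explored parameter values, coloured by cost quartile")
plt.show()
```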
Parameter importance analysis to determine which parameters most influence the analysed algorithm's performance.
Parameters are initially sorted by their average importance. Only parameters with an importance greater than 5 in at least one of the methods are shown. Note that the values of the different methods are not directly comparable. For more information on the metrics, see the respective tooltips.
fANOVA (functional analysis of variance) computes, for each parameter (and for pairs of parameters), the fraction of the variance in the cost space that is explained by changing that parameter, marginalized over all other parameters. Parameters with high importance scores have a large impact on performance. To this end, a random forest is trained as an empirical performance model on the empirical data from the available runhistories.
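To illustrate the idea behind these scores, the following sketch trains a random forest as an empirical performance model on synthetic placeholder data and estimates the fraction of predicted-cost variance explained by each single parameter via Monte-Carlo marginalization. This is only a conceptual approximation; the actual fANOVA implementation (e.g. the `fanova` package) performs the decomposition analytically inside the forest and also covers pairwise effects.

```python
# Illustrative sketch of fANOVA-style main effects with an empirical performance model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((500, 7))                                        # placeholder: 500 configs, 7 parameters
y = 3 * X[:, 0] ** 2 + 0.5 * X[:, 1] + 0.1 * rng.random(500)    # placeholder costs

epm = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
total_variance = epm.predict(X).var()

def main_effect_fraction(param_idx, grid_size=20):
    """Fraction of predicted variance explained by one parameter (Monte-Carlo marginal)."""
    grid = np.linspace(X[:, param_idx].min(), X[:, param_idx].max(), grid_size)
    marginal = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, param_idx] = value            # fix this parameter, average over all others
        marginal.append(epm.predict(X_mod).mean())
    return np.var(marginal) / total_variance

for i in range(X.shape[1]):
    print(f"parameter {i}: {100 * main_effect_fraction(i):.2f} % of variance")
```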
| Single importance | |
|---|---|
| discount | 20.4606 ± 11.784 |
| batch_size | 5.4967 ± 3.8915 |
| learning_rate | 3.4615 ± 3.026 |
| entropy_regularization | 1.6406 ± 1.8142 |
| n_units_1 | 0.7475 ± 0.9433 |
| likelihood_ratio_clipping | 0.7267 ± 0.6724 |
| n_units_2 | 0.5735 ± 0.4222 |

| Pairwise importance | |
|---|---|
| discount & batch_size | 6.6184 ± 5.4168 |
| discount & learning_rate | 5.6498 ± 3.9109 |
| discount & entropy_regularization | 3.6442 ± 3.265 |
| batch_size & learning_rate | 3.0571 ± 2.0302 |
| discount & n_units_1 | 2.3893 ± 3.2305 |
| learning_rate & entropy_regularization | 0.8282 ± 1.6415 |
| batch_size & entropy_regularization | 0.8039 ± 0.5902 |
| batch_size & n_units_1 | 0.4626 ± 0.4853 |
| learning_rate & n_units_1 | 0.1925 ± 0.2415 |
| entropy_regularization & n_units_1 | 0.1666 ± 0.2919 |

| Single importance | |
|---|---|
| discount | 13.9079 ± 8.9614 |
| batch_size | 7.9511 ± 5.432 |
| entropy_regularization | 4.6757 ± 6.288 |
| learning_rate | 3.9117 ± 2.6278 |
| likelihood_ratio_clipping | 3.8846 ± 7.9333 |
| n_units_1 | 2.8052 ± 4.8185 |
| n_units_2 | 0.6493 ± 0.8385 |

| Pairwise importance | |
|---|---|
| discount & batch_size | 7.1254 ± 3.291 |
| discount & likelihood_ratio_clipping | 6.3741 ± 5.6589 |
| discount & learning_rate | 4.387 ± 2.6596 |
| discount & entropy_regularization | 3.2372 ± 3.3237 |
| batch_size & learning_rate | 2.5406 ± 1.2372 |
| batch_size & likelihood_ratio_clipping | 0.8555 ± 0.5317 |
| batch_size & entropy_regularization | 0.8151 ± 0.7631 |
| entropy_regularization & likelihood_ratio_clipping | 0.7775 ± 1.8244 |
| learning_rate & likelihood_ratio_clipping | 0.7425 ± 0.787 |
| entropy_regularization & learning_rate | 0.428 ± 0.4925 |

| Single importance | |
|---|---|
| discount | 19.317 ± 8.7635 |
| batch_size | 15.7729 ± 9.1154 |
| learning_rate | 3.7046 ± 5.6795 |
| likelihood_ratio_clipping | 3.3983 ± 3.9136 |
| n_units_1 | 1.8647 ± 3.1457 |
| entropy_regularization | 1.2473 ± 1.949 |
| n_units_2 | 0.3915 ± 0.5608 |

| Pairwise importance | |
|---|---|
| discount & batch_size | 10.4271 ± 5.1149 |
| discount & likelihood_ratio_clipping | 4.8495 ± 4.4263 |
| discount & learning_rate | 4.4318 ± 5.9691 |
| batch_size & learning_rate | 3.0992 ± 1.8549 |
| discount & n_units_1 | 2.987 ± 4.4374 |
| batch_size & likelihood_ratio_clipping | 2.4033 ± 2.5565 |
| likelihood_ratio_clipping & n_units_1 | 0.5867 ± 1.3872 |
| batch_size & n_units_1 | 0.5747 ± 0.553 |
| learning_rate & likelihood_ratio_clipping | 0.4292 ± 0.4297 |
| learning_rate & n_units_1 | 0.2549 ± 0.6863 |

Using an empirical performance model, the performance changes of a configuration along each parameter are calculated. To quantify the importance of a parameter value, the variance of all cost values obtained by changing that parameter is predicted, and the fraction of the overall variance is computed. This analysis is inspired by the human behaviour of looking for improvements in the neighbourhood of individual parameters of a configuration.
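A minimal sketch of this local analysis on synthetic placeholder data, assuming a fitted empirical performance model `epm` and an incumbent configuration; the actual implementation differs in details such as the handling of categorical parameters and neighbourhood construction.

```python
# Minimal sketch: local parameter importance around the incumbent.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((500, 7))                                        # placeholder runhistory configs
y = 3 * X[:, 0] ** 2 + 0.5 * X[:, 1] + 0.1 * rng.random(500)    # placeholder costs
epm = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
incumbent = X[np.argmin(y)]                                     # best observed configuration

def local_variance(param_idx, grid_size=20):
    """Variance of predicted cost when varying one parameter, others fixed at the incumbent."""
    grid = np.linspace(0.0, 1.0, grid_size)
    neighbours = np.tile(incumbent, (grid_size, 1))
    neighbours[:, param_idx] = grid        # only this parameter changes
    return epm.predict(neighbours).var()

variances = np.array([local_variance(i) for i in range(X.shape[1])])
importances = variances / variances.sum()  # fraction of the summed local variances
for i, imp in enumerate(importances):
    print(f"parameter {i}: local importance {100 * imp:.2f} %")
```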