AutoML in the Age of Pretrained Models

Models keep growing, and so do the size and variety of their training data and the compute they require. Large pretrained models perform impressively, but the compute and data needed to train them are not easily accessible, and hyperparameter tuning is prohibitively expensive, both for pretraining and for fine-tuning on downstream tasks. Together, these trends call for a realignment of perspective on AutoML.

This suggests two broad directions for interfacing AutoML with large model training.

AutoML for pretrained models

Under this view, we leverage existing methods, or develop new ones, for efficient hyperparameter optimization (HPO):

  • ZAP-HPO: Meta-learns over a meta-dataset of pretrained models and their fine-tuning hyperparameters to make zero-shot hyperparameter predictions for an unseen test task.
  • PriorBand: Adds an interface for expert priors to HyperBand, enabling efficient, robust multi-fidelity HPO under short compute budgets.
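
To make the idea of combining an expert prior with multi-fidelity HPO concrete, here is a minimal sketch of a single successive-halving bracket (the building block of HyperBand) whose starting configurations are drawn partly from an expert prior and partly uniformly. This is not the PriorBand implementation, which is more elaborate. The function names, the Gaussian prior, the 50/50 mixing probability, and the toy objective are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple


def sample_config(rng: random.Random, prior_mean: float, prior_std: float,
                  low: float, high: float, prior_prob: float) -> float:
    """Draw one value: from the expert prior with probability prior_prob,
    otherwise uniformly from the search space [low, high]."""
    if rng.random() < prior_prob:
        value = rng.gauss(prior_mean, prior_std)
        return min(max(value, low), high)  # clip to the search space
    return rng.uniform(low, high)


def successive_halving(objective: Callable[[float, int], float],
                       configs: List[float], min_budget: int,
                       eta: int = 3) -> Tuple[float, List[Tuple[int, int]]]:
    """Evaluate all configs at a small budget, keep the best 1/eta of them,
    multiply the budget by eta, and repeat until one config remains."""
    budget = min_budget
    rungs = []  # (budget, number of configs evaluated) for each rung
    while True:
        ranked = sorted(configs, key=lambda c: objective(c, budget))
        rungs.append((budget, len(configs)))
        if len(configs) == 1:
            return configs[0], rungs
        configs = ranked[: max(1, len(configs) // eta)]
        budget *= eta


def toy_loss(log_lr: float, budget: int) -> float:
    # Hypothetical objective: best at log10(lr) = -3, improves with budget.
    return (log_lr + 3.0) ** 2 + 1.0 / budget


rng = random.Random(0)
# Assumed expert belief: log10(lr) is around -3.5, inside the space [-6, 0].
initial_configs = [
    sample_config(rng, prior_mean=-3.5, prior_std=0.5,
                  low=-6.0, high=0.0, prior_prob=0.5)
    for _ in range(9)
]
best, rungs = successive_halving(toy_loss, initial_configs, min_budget=1)
print(best, rungs)
```

With nine starting configurations and eta = 3, the bracket evaluates nine configurations at budget 1, three at budget 3, and one at budget 9. Keeping some uniform samples alongside the prior samples is one simple way to stay robust when the expert's guess is poor.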

Pretrained models for AutoML

We also look at leveraging the strong performance of pretrained models for various downstream applications in AutoML:

  • LC-PFN: Uses a transformer pretrained on synthetic data for Bayesian learning curve extrapolation, applied to early stopping in multi-fidelity HPO.
  • CAAFE: We use automated prompting with GPT-4 for feature engineering on tabular datasets. We tell GPT-4 about a tabular dataset and ask it what operations it would perform on the dataset before feeding it to a standard ML algorithm. We build a loop out of this, feeding back the change in cross-validation performance of the last operation performed. CAAFE comes up with very interesting approaches to improve performance, e.g. it splits string attributes into multiple categorical features with lower cardinality or bins ages into relevant subgroups.
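
The feedback loop described for CAAFE can be sketched as follows. This is a simplified illustration under stated assumptions, not the CAAFE code: the GPT-4 call is replaced by a hypothetical scripted proposer, the evaluator is a toy leave-one-out lookup classifier rather than a standard ML algorithm, and all function names and the tiny dataset are invented for the example.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Operation = Callable[[List[Row]], List[Row]]
History = List[Tuple[str, float]]  # (operation name, change in CV score)
Proposer = Callable[[History], Optional[Tuple[str, Operation]]]


def loo_accuracy(rows: List[Row], target: str = "label") -> float:
    """Leave-one-out accuracy of a toy lookup classifier: predict the majority
    label among other rows with identical features, else the overall majority."""
    def features(r: Row) -> Row:
        return {k: v for k, v in r.items() if k != target}

    correct = 0
    for i, row in enumerate(rows):
        others = rows[:i] + rows[i + 1:]
        matches = [r[target] for r in others if features(r) == features(row)]
        pool = matches or [r[target] for r in others]
        correct += Counter(pool).most_common(1)[0][0] == row[target]
    return correct / len(rows)


def feature_engineering_loop(rows: List[Row], propose: Proposer,
                             evaluate: Callable[[List[Row]], float],
                             max_rounds: int = 5) -> Tuple[List[Row], History]:
    best_score = evaluate(rows)
    history: History = []
    for _ in range(max_rounds):
        proposal = propose(history)  # a real system would prompt the LLM here
        if proposal is None:
            break
        name, operation = proposal
        candidate = operation(rows)
        delta = evaluate(candidate) - best_score
        history.append((name, delta))  # fed back to the proposer next round
        if delta > 0:  # keep only operations that improve the score
            rows, best_score = candidate, best_score + delta
    return rows, history


def bin_age(rows: List[Row]) -> List[Row]:
    # Replace the raw age with a coarse subgroup.
    return [{"age_group": "under_40" if r["age"] < 40 else "40_plus",
             "label": r["label"]} for r in rows]


def drop_age_group(rows: List[Row]) -> List[Row]:
    # A deliberately unhelpful operation, to show a rejected proposal.
    return [{"label": r["label"]} for r in rows]


def scripted_proposer(operations: List[Tuple[str, Operation]]) -> Proposer:
    """Hypothetical stand-in for the LLM: replays a fixed list of operations
    and ignores the history that a real prompt would include."""
    queue = iter(operations)
    return lambda history: next(queue, None)


data = [{"age": a, "label": int(a >= 40)} for a in (22, 25, 31, 38, 41, 47, 53, 60)]
propose = scripted_proposer([("bin_age", bin_age), ("drop_age_group", drop_age_group)])
final_rows, history = feature_engineering_loop(data, propose, loo_accuracy)
print(history)
```

In this toy run, binning the ages is kept because it raises the leave-one-out accuracy, while the second operation lowers it and is discarded. The recorded score changes are what a real prompt would report back to the model.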