Title: Metalearning using structure-rich pipeline representations for improved AutoML

Authors: Brandon Schoenfeld; Kevin Seppi; Christophe Giraud-Carrier

Addresses: PassiveLogic, Inc., 6405 S 3000 E Ste 300, Salt Lake City, UT, USA; Department of Computer Science, Brigham Young University, Provo, UT, USA; Department of Computer Science, Brigham Young University, Provo, UT, USA

Abstract: Automatic machine learning (AutoML) systems have been shown to perform better when they learn from past experience. Examples include Auto-sklearn, which warm-starts the ML pipeline search using existing programs known to perform well on 'similar' tasks, and AlphaD3M, which uses online reinforcement learning to search the ML pipeline space. These metalearning approaches, as well as many others, depend on simplifying assumptions about the pipeline search space and/or the pipeline representation. Here, we attempt to extend the applicability of AutoML by relaxing such simplifications. Using a sizable metadataset of 194 classification tasks and 4,592 pipelines, we show that using pipeline metadata, including the underlying DAG structure, leads to better estimates of pipeline performance and to more robust rankings of pipelines.

Keywords: automatic machine learning; AutoML; metalearning; democratisation of data analysis.

DOI: 10.1504/IJDATS.2022.129174

International Journal of Data Analysis Techniques and Strategies, 2022 Vol.14 No.4, pp.267-282

Received: 10 Dec 2021
Received in revised form: 05 Oct 2022
Accepted: 16 Oct 2022

Published online: 27 Feb 2023
