On Multi-Label Meta-Learning for automated pipeline recommendation

MAIA, Cynthia Moreira

Please use this identifier to cite or link to this item: https://repositorio.ufpe.br/handle/123456789/67046

Title:	On Multi-Label Meta-Learning for automated pipeline recommendation
Author:	MAIA, Cynthia Moreira
Keywords:	Pipelines; Meta-learning; Multi-label
Publication date:	15-Oct-2025
Publisher:	Universidade Federal de Pernambuco
Citation:	MAIA, Cynthia Moreira. On Multi-Label Meta-Learning for automated pipeline recommendation. 2025. Tese (Doutorado em Ciência da Computação) - Universidade Federal de Pernambuco, Recife, 2025.
Abstract:	Automated Machine Learning (AutoML) aims to automate stages of the machine learning process, such as algorithm selection, data preprocessing, and hyperparameter tuning. One of its main challenges is designing a search space that can handle different problems while ensuring the best trade-off between performance and computational cost. Traditional AutoML approaches primarily explore the search space online, using optimization strategies such as Bayesian Optimization to identify the best configuration within a specified time budget. Although effective, such methods often incur high computational costs. In contrast, our proposal avoids online search by employing meta-learning: it uses the meta-features of a problem to recommend solutions suited to its nature, thereby eliminating the need for exhaustive search at runtime. Accordingly, the first study of this thesis proposes MetaML, a meta-learning approach based on multi-label algorithms for pipeline recommendation in AutoML. To this end, we present a curated search space design that automatically reduces the number of candidate pipelines based on historical data from online repositories, keeping only the most frequently used pipelines with the best performance across a significant number of datasets. Additionally, we propose chained recommendations using multi-label algorithms that take into account the interdependencies between pipeline stages. Experiments conducted on different datasets demonstrate the effectiveness of the approach, with MetaML achieving satisfactory results and, in some cases, superior outcomes at a lower computational cost compared to current AutoML methods. However, the pipelines derived from the repository experiments showed limited representativeness with respect to preprocessing techniques.
As an alternative, the second study of this thesis proposes the PIPES meta-dataset, a collection of experiments involving multiple pipelines, designed to represent all selected combinations of techniques, including different preprocessing blocks and a classification block. After constructing PIPES, we employed this meta-dataset in the third study of the thesis, MetaML 2.0, to investigate whether broader pipeline representativeness could yield even better results. The experiments demonstrated that this approach indeed achieved improved performance on specific datasets.
URI:	https://repositorio.ufpe.br/handle/123456789/67046
Appears in collections:	Teses de Doutorado - Ciência da Computação

Files in this item:

File	Description	Size	Format
TESE Cynthia Moreira Maia.pdf		2.62 MB	Adobe PDF

This item is protected by original copyright

This item is licensed under a Creative Commons License