On Multi-Label Meta-Learning for automated pipeline recommendation

MAIA, Cynthia Moreira

Use this identifier to cite or link to this item: https://repositorio.ufpe.br/handle/123456789/67046

Title:	On Multi-Label Meta-Learning for automated pipeline recommendation
Author(s):	MAIA, Cynthia Moreira
Keywords:	Pipelines; Meta-learning; Multi-label
Issue date:	15-Oct-2025
Publisher:	Universidade Federal de Pernambuco
Citation:	MAIA, Cynthia Moreira. On Multi-Label Meta-Learning for automated pipeline recommendation. 2025. Thesis (Doctorate in Computer Science) - Universidade Federal de Pernambuco, Recife, 2025.
Abstract:	Automated Machine Learning (AutoML) aims to automate stages of the machine learning process, such as algorithm selection, data preprocessing, and hyperparameter tuning. One of its main challenges is designing a search space that can handle different problems while ensuring the best trade-off between performance and computational cost. Traditional AutoML approaches primarily explore the search space online, utilizing optimization strategies such as Bayesian Optimization to identify the optimal configuration within a specified time budget. Although effective, such methods often result in high computational costs. In contrast, our proposal seeks to avoid online search strategies by employing meta-learning to address these challenges. This approach leverages the meta-features of problems to recommend solutions appropriate to their nature, thereby eliminating the need for exhaustive search at runtime. Accordingly, we propose MetaML, the first study of this thesis, a meta-learning approach based on multi-label algorithms for pipeline recommendation in AutoML. To this end, we present a curated search space design that automatically reduces the number of candidate pipelines, based on historical data from online repositories, including only the most frequently used pipelines with the best performance across a significant number of datasets. Additionally, we propose chained recommendations utilizing multi-label algorithms that take into account the interdependencies between pipeline stages. Experiments conducted on different datasets demonstrate the effectiveness of the approach, with MetaML achieving satisfactory results and, in some cases, superior outcomes at a lower computational cost compared to current AutoML methods. However, the pipelines derived from the repository experiments showed limited representativeness with respect to preprocessing techniques. As an alternative, we propose the PIPES meta-dataset, the second study of this thesis, which consists of a collection of experiments involving multiple pipelines, designed to represent all selected combinations of techniques, including different preprocessing blocks and a classification block. After constructing PIPES, we employed this meta-dataset in the third study of the thesis, MetaML 2.0, to investigate whether broader pipeline representativeness could yield even better results. The experiments demonstrated that this approach indeed achieved improved performance in specific datasets.
URI:	https://repositorio.ufpe.br/handle/123456789/67046
Appears in collections:	Doctoral Theses - Computer Science

Files associated with this item:

File	Description	Size	Format
TESE Cynthia Moreira Maia.pdf		2.62 MB	Adobe PDF

This file is protected by copyright.

This item is licensed under a Creative Commons License.