Isolating variable effects in supervised machine learning illustrated in educational data mining

SILVA FILHO, Rogério Luiz Cardoso

Por favor, use este identificador para citar o enlazar este ítem: https://repositorio.ufpe.br/handle/123456789/57408

Comparte esta pagina

Título :	Isolating variable effects in supervised machine learning illustrated in educational data mining
Autor :	SILVA FILHO, Rogério Luiz Cardoso
Palabras clave :	IA explicável; Aprendizagem de máquina interpretável; Explicadores globais; Mineração de dados educacionais; Importância de variáveis
Fecha de publicación :	18-abr-2024
Editorial :	Universidade Federal de Pernambuco
Citación :	SILVA FILHO, Rogério Luiz Cardoso. Isolating variable effects in supervised machine learning illustrated in educational data mining. 2024. Tese (Doutorado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2024.
Resumen :	This thesis investigates the application of Explainable Artificial Intelligence (XAI) in Su- pervised Machine Learning (SML) models. The motivation for this study stems from the development of Educational Data Mining (EDM), an area that frequently uses such models to analyze and extract insights from large datasets. A central issue of this work is the challenge of generating global explanations for SML, particularly in cases where data independence is not guaranteed. This is a recurring but still underexplored problem in EDM. Neglecting data interdependencies can lead to biased explanations, overestimating irrelevant variables or disproportionately assigning importance to predictors with similar relevance. To address these challenges, this work builds on Accumulated Local Effects (ALE), a recent method for post-hoc global explanation that visualizes the impact of features. ALE’s pseudo-orthogonality property allows for isolating individual variable effects, distinguishing it from widely used methods in EDM such as partial dependence plots and Shapley-based explanations. In a preliminary stage, ALE techniques is compared to other existing ones by using a new methodology that evaluates how different these techniques approximate the true variable effects in various contexts of data dependency. In a preliminary stage, ALE techniques are compared to other existing ones using a new methodology that evaluates how well these techniques approximate the true variable ef- fects in various contexts of data dependency. Furthermore, based on the ALE promising results of this stage, this work proposes new ALE-based scores to measure the impact of variables in SML. The scores are model-agnostic and can report both the magnitude and direction of the individual impact of features. The scores prove to be efficient in various scenarios when compared to existing metrics on synthetic and real-world datasets. Moreover, an empirical study using data from Brazilian secondary schools not only confirms the usefulness of the new scores in a real-world scenario but also extends the contributions of this thesis by identifying and offering new perspectives on the determinants of Brazilian school success over more than a decade.
URI :	https://repositorio.ufpe.br/handle/123456789/57408
Aparece en las colecciones:	Teses de Doutorado - Ciência da Computação

Ficheros en este ítem:

Fichero	Descripción	Tamaño	Formato
TESE Rogério Luiz Cardoso Silva Filho.pdf		5,79 MB	Adobe PDF	Visualizar/Abrir

Este ítem está protegido por copyright original

Visualizar la licencia

Mostrar el registro Dublin Core completo del ítem Recomiende este ítem

Este ítem está sujeto a una licencia Creative Commons Licencia Creative Commons