Please use this identifier to cite or link to this item: https://repositorio.ufpe.br/handle/123456789/67273


Title: Machine Learning and Readability in Accounting: An Ensemble Learning Approach
Authors: COSTA NETO, Arlindo Menezes da
Keywords: Informativeness; Machine Learning; Accounting information
Issue Date: 26-Nov-2025
Publisher: Universidade Federal de Pernambuco
Citation: COSTA NETO, Arlindo Menezes da. Machine Learning and Readability in Accounting: An Ensemble Learning Approach. 2025. Thesis (Doctorate in Accounting Sciences) - Universidade Federal de Pernambuco, Recife, 2025.
Abstract: We extend the literature on the value relevance of accounting information by exploring a new metric for valuing financial text. To do so, we employ a language model trained in Brazilian Portuguese (FinBERT-PT-BR) to develop an Informativeness Index, assigning scores to 26,804 quarterly financial statement notes from 1,152 Brazilian companies over a span of 12 years. To verify our model's capability to understand textual data, we calculate the usual readability metrics (Flesch-Kincaid reading ease, Fog index, SMOG index, and the Loughran-McDonald Index) for all notes and employ machine learning models to evaluate which readability metric best represents an informativeness index built on the dimensions of Boilerplateness, Completeness, and Density; we expect our proposed metric to be poorly related to the readability metrics. The evaluation of which readability metric comes closest to measuring the informativeness of financial text is based on feature importance, which indicates that the best proxy for the readability of Portuguese financial text is the Loughran-McDonald Index. The Loughran-McDonald Index is the only metric with any relevance in the regression models, and since it is based on file size, we take our metric to be capable of measuring the informational value of text better than common readability metrics, while pointing to the Loughran-McDonald Index as a reasonable proxy for the informational value of financial text. This research innovates by presenting a new method to quantify the informational value of financial information, contributing to the value-relevance literature as well as to the literature on machine learning in accounting research; additionally, we do so within a little-explored field (Portuguese-language financial information) with a reasonably large dataset. Further research may be needed to combine our proposed model with market-related metrics or human experiments to increase the validity of the metric concept.
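As a rough illustration of the readability side of the comparison, the following computes the standard readability formulas named in the abstract for a single note, plus the raw UTF-8 size in bytes as a stand-in for the file-size basis of the Loughran-McDonald measure. It is a minimal sketch under stated assumptions, not the thesis's implementation: the helper names (`count_syllables`, `tokenize`, `readability`) are hypothetical; the coefficients are the usual English-language ones (Flesch Reading Ease for the abstract's "Flesch-Kincaid reading ease", Gunning Fog, SMOG) rather than any Portuguese adaptation the author may have used; syllables are approximated by counting vowel groups; and "complex" words for the Fog index are simply words with three or more syllables.

```python
import math
import re

# Hypothetical helpers for illustration; not the thesis's code.
# One "syllable" per run of consecutive vowels (Portuguese accented vowels included).
VOWEL_GROUPS = re.compile(r"[aeiouyáàâãéêíóôõúü]+", re.IGNORECASE)


def count_syllables(word: str) -> int:
    """Approximate syllable count: number of vowel groups, at least 1."""
    return max(1, len(VOWEL_GROUPS.findall(word)))


def tokenize(text: str) -> tuple[list[str], list[str]]:
    """Split into sentences on . ! ? and into alphabetic words (digits dropped)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)
    return sentences, words


def readability(text: str) -> dict[str, float]:
    sentences, words = tokenize(text)
    n_sent = max(1, len(sentences))  # guard against division by zero
    n_words = max(1, len(words))
    syllables = [count_syllables(w) for w in words]
    polysyllables = sum(1 for s in syllables if s >= 3)  # words with 3+ syllables

    words_per_sentence = n_words / n_sent
    syllables_per_word = sum(syllables) / n_words

    return {
        # Flesch Reading Ease (English coefficients): higher means easier.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Gunning Fog: 0.4 * (average sentence length + percentage of complex words).
        "fog": 0.4 * (words_per_sentence + 100 * polysyllables / n_words),
        # SMOG: polysyllable count normalised to a 30-sentence sample.
        "smog": 1.0430 * math.sqrt(polysyllables * 30 / n_sent) + 3.1291,
        # Raw size in bytes, standing in for a file-size measure.
        "file_size_bytes": float(len(text.encode("utf-8"))),
    }


if __name__ == "__main__":
    note = "A Companhia apresentou lucro. O resultado foi positivo."
    for name, value in readability(note).items():
        print(f"{name}: {value:.2f}")
```

In a pipeline like the one the abstract describes, values such as these would serve as the per-note features whose importance is assessed against the Informativeness Index; how the author actually fits the ensemble models and ranks the features is not specified in this record.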
URI: https://repositorio.ufpe.br/handle/123456789/67273
Appears in Collections: Teses de Doutorado - Ciências Contábeis (Doctoral Theses - Accounting Sciences)

Files in This Item:
File: TESE Arlindo Menezes da Costa Neto.pdf
Size: 907.42 kB
Format: Adobe PDF


This item is protected by original copyright



This item is licensed under a Creative Commons License.