Please use this identifier to cite or link to this item:
https://repositorio.ufpe.br/handle/123456789/67273
| Title: | Machine Learning and Readability in Accounting: An Ensemble Learning Approach |
| Authors: | COSTA NETO, Arlindo Menezes da |
| Keywords: | Informativeness; Machine Learning; Accounting information |
| Issue Date: | 26-Nov-2025 |
| Publisher: | Universidade Federal de Pernambuco |
| Citation: | COSTA NETO, Arlindo Menezes da. Machine Learning and Readability in Accounting: An Ensemble Learning Approach. 2025. Doctoral thesis (PhD in Accounting Sciences) - Universidade Federal de Pernambuco, Recife, 2025. |
| Abstract: | We extend research on the value relevance of accounting information by exploring a new metric for valuing financial text. To do so, we employ a language model trained on Brazilian Portuguese (FinBERT-PT-BR) to develop an Informativeness Index, assigning scores to 26,804 quarterly financial statement notes from 1,152 Brazilian companies over a 12-year span. To verify our model's ability to understand textual data, we calculate the usual readability metrics (Flesch-Kincaid Reading Ease, Gunning Fog index, SMOG index, and the Loughran-McDonald index) for all notes and employ machine learning models to evaluate which readability metric best represents an informativeness index built on the dimensions of Boilerplateness, Completeness, and Density. We expect our proposed metric to be only weakly related to the readability metrics. The evaluation of which readability metric comes closest to measuring the informativeness of financial text is based on feature importance, which indicates that the best proxy for the readability of Portuguese financial text is the Loughran-McDonald index. The Loughran-McDonald index is the only metric with any relevance in the regressors, and because it is based on file size, we take this as evidence that our metric measures the informational value of text better than common readability metrics, while also suggesting that the Loughran-McDonald index is a reasonable proxy for the informational value of financial text. This research innovates by presenting a new method to quantify the informational value of financial information, contributing to the value-relevance literature and to the literature on the use of machine learning in accounting research. Additionally, it does so in a relatively underexplored setting (Portuguese-language financial information) with a reasonably large dataset. Further research may be needed to combine our proposed model with market-related metrics or human experiments to strengthen the validity of the metric concept. |
| URI: | https://repositorio.ufpe.br/handle/123456789/67273 |
| Appears in Collections: | Teses de Doutorado - Ciências Contábeis |
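The abstract names the readability measures it compares but does not define them. As a rough illustration only, the sketch below computes the commonly cited English-language forms of Flesch Reading Ease, the Gunning Fog index, and SMOG, plus a file-size measure in the spirit of the Loughran-McDonald proxy. Several choices here are assumptions, not the thesis's method: syllables are approximated as runs of vowels (crude, especially for Portuguese), sentences are split on `.`, `!`, and `?`, the coefficients are the standard English ones, and the Loughran-McDonald measure is taken as the natural log of the text's size in UTF-8 bytes. The function names (`count_syllables`, `tokenize`, `readability`) are hypothetical.

```python
import math
import re

# Assumption: each run of consecutive vowels (including common Portuguese
# accented vowels) counts as one syllable. This is only a heuristic.
VOWELS = "aeiouyáàâãéêíóôõúü"


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels (minimum 1)."""
    runs = re.findall(f"[{VOWELS}]+", word.lower())
    return max(1, len(runs))


def tokenize(text: str):
    """Split into non-empty sentences and into words made of letters only."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)  # Unicode letters, no digits/underscores
    return sentences, words


def readability(text: str) -> dict:
    """Compute English-coefficient readability scores plus log byte size."""
    sentences, words = tokenize(text)
    n_sent = max(1, len(sentences))
    n_words = max(1, len(words))
    syllables = [count_syllables(w) for w in words]
    # "Complex" (polysyllabic) words: three or more syllables.
    complex_words = sum(1 for s in syllables if s >= 3)
    words_per_sentence = n_words / n_sent
    syllables_per_word = sum(syllables) / n_words
    return {
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        "gunning_fog": 0.4 * (words_per_sentence + 100 * complex_words / n_words),
        "smog": 1.0430 * math.sqrt(complex_words * 30 / n_sent) + 3.1291,
        # Assumption: file-size proxy as ln(bytes); guarded against empty text.
        "loughran_mcdonald_log_size": math.log(max(1, len(text.encode("utf-8")))),
    }


if __name__ == "__main__":
    note = (
        "A Companhia reconhece receitas quando o controle é transferido. "
        "Os estoques são mensurados pelo custo médio."
    )
    print(readability(note))
```

Because these formulas rely only on surface counts (sentence length, syllables, size), they can be computed for every note cheaply, which is what makes them usable as candidate features when comparing against a model-based informativeness score.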
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| TESE Arlindo Menezes da Costa Neto.pdf | | 907.42 kB | Adobe PDF |
This file is protected by copyright.
This item is licensed under a Creative Commons License.

