Title: Method for improvement of transparency: use of text mining techniques for reclassification of governmental expenditures records in Brazil

Authors: Gustavo De Oliveira Almeida; Kate Revoredo; Claudia Cappelli; Cristiano Maciel

Addresses: Graduate Program in Business, Federal Fluminense University, Niteroi/RJ, Brazil; Graduate Program in Informatics, Federal University of Rio de Janeiro State, Rio de Janeiro/RJ, Brazil ' Graduate Program in Informatics, Federal University of Rio de Janeiro State, Rio de Janeiro/RJ, Brazil ' Graduate Program in Informatics, Federal University of Rio de Janeiro State, Rio de Janeiro/RJ, Brazil ' Computing Institute, Mato Grosso Federal University, Cuiabá/MT, Brazil

Abstract: Many countries have transparency laws that require data to be made available. However, data is often available without being transparent. We present the case of the Brazilian Federal Government's Transparency Portal and discuss the limitations of public acquisitions data stored in free-text format. We employed text-mining techniques to reclassify the descriptive texts of measurement units for products and services. The solution, presented in KNIME and Java, aggregated measurements in the original sample (n = 69,372, with a 78% reduction in the number of descriptions and 94% of items classified) and in a cross-validation sample (n = 105,266, with an 88% reduction and 78% of items classified). In addition, we tested the computational time for processing texts across a wide range of input sizes; the results suggest that the solution is stable and scalable enough to process larger datasets. Finally, we produced analyses identifying probable input errors, suppliers and purchasing units with abnormal transactions, and factors affecting procurement prices. We present suggestions for future research and improvements.

Keywords: e-government; data mining; open government; text mining; transparency; KNIME; knowledge discovery; techniques; Brazil.

DOI: 10.1504/IJBIDM.2021.112989

International Journal of Business Intelligence and Data Mining, 2021 Vol.18 No.2, pp.155 - 196

Received: 05 Jan 2018
Accepted: 30 Jun 2018

Published online: 28 Jan 2021
