Title: Method for improvement of transparency: use of text mining techniques for reclassification of governmental expenditures records in Brazil
Authors: Gustavo De Oliveira Almeida; Kate Revoredo; Claudia Cappelli; Cristiano Maciel
Addresses: Graduate Program in Business, Federal Fluminense University, Niteroi/RJ, Brazil; Graduate Program in Informatics, Federal University of Rio de Janeiro State, Rio de Janeiro/RJ, Brazil | Graduate Program in Informatics, Federal University of Rio de Janeiro State, Rio de Janeiro/RJ, Brazil | Graduate Program in Informatics, Federal University of Rio de Janeiro State, Rio de Janeiro/RJ, Brazil | Computing Institute, Mato Grosso Federal University, Cuiabá/MT, Brazil
Abstract: Many countries have transparency laws that require public data to be made available. However, data that is available is often not transparent. We present the case of the Transparency Portal of the Brazilian Federal Government and discuss the limitations of public procurement data stored as free text. We employed text-mining techniques to reclassify the descriptive texts of measurement units for products and services. The solution, developed in KNIME and Java, aggregated measurement descriptions in the original sample (n = 69,372; 78% reduction in the number of descriptions; 94% of items classified) and in a cross-validation sample (n = 105,266; 88% reduction; 78% of items classified). In addition, we measured the computational time for text processing across a wide range of input sizes; the results suggest that the solution is stable and scales to larger datasets. Finally, we produced analyses identifying probable input errors, suppliers and purchasing units with abnormal transactions, and factors affecting procurement prices. We present suggestions for future research and improvements.
Keywords: e-government; data mining; open government; text mining; transparency; KNIME; knowledge discovery; techniques; Brazil.
DOI: 10.1504/IJBIDM.2021.112989
International Journal of Business Intelligence and Data Mining, 2021 Vol.18 No.2, pp.155 - 196
Received: 05 Jan 2018
Accepted: 30 Jun 2018
Published online: 15 Feb 2021