Title: Enrichment of data in digital documents with metadata extraction

Authors: Clovis Dos Santos Júnior; Carina Friedrich Dorneles

Addresses: Institute of Exact and Natural Sciences, Federal University of Rondonópolis (UFR), Rondonópolis, Mato Grosso, Brazil; Department of Informatics and Statistics, Federal University of Santa Catarina (UFSC), Florianópolis, Santa Catarina, Brazil

Abstract: Companies have migrated their operational activities from paper documents to automated processes with fully digital storage. This management trend is positive, but in most cases printed documents cannot be discarded for administrative or legal reasons. This research used data extraction to enrich the database of a Non-Governmental Organisation (NGO) that monitors the use of public financial resources in counties. The implementation analysed the digital files containing official documents and identified the most frequently occurring words in each, using the algorithms presented in the research results. The resulting solution added metadata to the documents to improve document search in the database and the procedural follow-up of administrative and judicial actions. Keywords were successfully extracted from each document; the results section presents examples and shows the steps used to add metadata to the documents.
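
Illustrative sketch: the abstract describes identifying the most frequent words in each document and storing them as metadata. The Python below is one minimal way that idea could look, not the authors' implementation. The function names (top_keywords, enrich), the stop-word list, the record layout with "id" and "text" fields, and the cut-off k are all assumptions made for illustration; the article's own algorithms are given in its results section.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller,
# language-appropriate list (the source documents are likely Portuguese).
STOP_WORDS = {"the", "of", "and", "to", "in", "a", "for", "is", "on", "by", "with"}


def top_keywords(text, k=3):
    """Return up to k of the most frequent non-stop-words in text."""
    # Keep alphabetic tokens only (Unicode letters, no digits or underscores).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(k)]


def enrich(record, k=3):
    """Return a copy of a document record with a 'keywords' metadata field added."""
    return {**record, "keywords": top_keywords(record["text"], k)}


if __name__ == "__main__":
    doc = {"id": 1, "text": "The contract for the school contract was audited; "
                            "the audit of the contract"}
    print(enrich(doc, k=1))
```

A search over the enriched records could then match on the "keywords" field instead of scanning the full text of every document.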

Keywords: electronic document; text mining; data extraction; NGO.

DOI: 10.1504/IJMSO.2023.135335

International Journal of Metadata, Semantics and Ontologies, 2023 Vol.16 No.2, pp.187-193

Received: 30 Jul 2022
Received in revised form: 17 Mar 2023
Accepted: 22 Mar 2023

Published online: 05 Dec 2023
