Title: A multiclass classification approach for incremental entity resolution on short textual data

Authors: João Antonio Silva; Denilson Alves Pereira

Addresses: Department of Computer Science, Universidade Federal de Lavras, P.O. Box 3037, 37.200-000, Lavras, Brazil; Department of Computer Science, Universidade Federal de Lavras, P.O. Box 3037, 37.200-000, Lavras, Brazil

Abstract: Several web applications maintain data repositories containing references to thousands of real-world entities originating from multiple sources, and they continually receive new data. Identifying the distinct entities and associating the correct references with each one is a problem known as entity resolution. The challenge is to solve the problem incrementally, as the data arrive, especially when those data are described by a single textual attribute. In this paper, we propose a new approach for incremental entity resolution. The method we have implemented, called AssocIER, uses an ensemble of multiclass classifiers with self-training and detection of novel classes. We have evaluated our method on various real-world datasets and scenarios, comparing it with a traditional entity resolution approach. The results show that AssocIER is effective and efficient for entity resolution on unstructured data in collections with a large number of entities and features, and is able to detect hundreds of novel classes.

Keywords: entity resolution; associative classification; incremental learning; novel class detection; self-training.

DOI: 10.1504/IJBIDM.2021.112988

International Journal of Business Intelligence and Data Mining, 2021 Vol.18 No.2, pp.218 - 245

Received: 21 Mar 2018
Accepted: 28 Jun 2018

Published online: 15 Feb 2021
