Authors: Shen Lu; Richard S. Segall
Addresses: Department of Computer Science, Donaghey College of Engineering and Information Technology, University of Arkansas at Little Rock, Little Rock, AR 72204, USA; Department of Computer and Information Technology, College of Business, Arkansas State University, State University, AR 72467-0130, USA
Abstract: When each patient visit produces its own record, multiple data sources end up holding redundant information. Matching and merging duplicate records increases the amount of information available for the population units required by stand-alone and distributed databases. In this paper, we present an algorithm called entity resolution of the Fellegi-Sunter (ERFS) model, which uses the Fellegi-Sunter model to improve the results of semantic analysis for identifying similar records. In our experiments, ERFS yielded higher rates than the Stanford entity resolution framework (SERF) in about 11.07% of the experiments. Within this 11.07%, 38.1% of the experiments conducted showed increases ranging from 12.7% to 21.9%, and those with a mid-range number of records showed an average increase of 16.96%. We therefore conclude that ERFS should be used to link similar records.
Keywords: Fellegi-Sunter model; expectation maximisation; SERF model; record linkage; medical records; bioinformatics data; patient records; redundant information; semantic analysis; entity resolution.
International Journal of Information and Decision Sciences, 2013 Vol.5 No.2, pp.169 - 187
Published online: 28 Feb 2014