Authors: Anusuya Kirubakaran; M. Aramudhan
Addresses: Mother Teresa Women's University, Kodaikanal, India; PKIET, Karaikal, India
Abstract: Even though the modern world is ruled by data and preventive measures are in place to keep data quality high, risk intelligence teams are still challenged by one risk analysis task: record linkage across heterogeneous data from multiple sources. The difficulty stems from the high ratio of non-standard and poor-quality data in big data systems, caused by the variety of data formats across regions, data platforms, data storage systems, data migration, etc. With these record linkages in mind, this paper addresses the complications of the name matching process irrespective of spelling, structure and phonetic variations. Name matching succeeds when the algorithm can handle names with discrepancies due to naming conventions, cross-language translation, operating system transformation, data migration, batch feeds, typos and other external factors. We discuss the varieties of name representation in data sources and the methods to parse names and find the maximum probability of a name match, with high accuracy comparable to watchdog security and a reduced false negative rate. The proposed methods can be applied to risk intelligence analysis in the financial sector, such as know your customer (KYC), anti-money laundering (AML), customer due diligence (CDD), anti-terrorism, watchlist screening and fraud detection.
Keywords: hybrid name matching; string similarity measure; data matching; risk intelligence.
International Journal of Data Analysis Techniques and Strategies, 2018 Vol.10 No.3, pp.273 - 290
Received: 18 Jul 2016
Accepted: 15 Nov 2016
Published online: 01 Aug 2018
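
Illustration: the abstract describes name matching that tolerates spelling, structure and phonetic variation. The sketch below is one possible way to combine those three signals and is not the authors' algorithm. The helper names (`normalize`, `soundex`, `name_score`), the choice of Soundex and `difflib`, and the equal weighting are all assumptions made for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Standard Soundex consonant groups (vowels, h, w and y carry no code).
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def normalize(name):
    """Strip accents and punctuation, lowercase, and sort the tokens so that
    'Surname, Given' and 'Given Surname' compare equal (structure variation)."""
    text = unicodedata.normalize("NFKD", name)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return sorted(text.split())


def soundex(word):
    """Four-character American Soundex key for a single token."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    digits = []
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code
    return (word[0].upper() + "".join(digits) + "000")[:4]


def name_score(a, b, phonetic_weight=0.5):
    """Blend a spelling similarity with a phonetic similarity, both in [0, 1].
    The 0.5 weight is an arbitrary illustrative choice."""
    tokens_a, tokens_b = normalize(a), normalize(b)
    if not tokens_a or not tokens_b:
        return 0.0
    spelling = SequenceMatcher(None, " ".join(tokens_a),
                               " ".join(tokens_b)).ratio()
    keys_a = sorted(soundex(t) for t in tokens_a)
    keys_b = sorted(soundex(t) for t in tokens_b)
    phonetic = SequenceMatcher(None, keys_a, keys_b).ratio()
    return (1 - phonetic_weight) * spelling + phonetic_weight * phonetic


if __name__ == "__main__":
    print(name_score("Anusuya Kirubakaran", "Kirubakaran, Anusuya"))
    print(name_score("Mohammed Ali", "Muhammad Ali"))
    print(name_score("Anusuya Kirubakaran", "John Smith"))
```

In a screening setting, a score like this would typically be compared against a tuned threshold. Lowering the threshold reduces false negatives at the cost of more candidate matches to review.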