Authors: Deepesh Kumar Srivastava; Basav Roychoudhury
Addresses: Institute of Management Technology Dubai, UAE; Indian Institute of Management Shillong, Meghalaya, 793014, India
Abstract: Matching a person's profiles across online social networks is a non-trivial task. Major challenges in developing a reliable and scalable matching scheme include missing information and contradictory information for the same user across these networks. In this study, we propose a method that utilises the content generated by or shared with users across their online social networks. Using text mining techniques, we extract the high-frequency words and the common high-frequency words in the users' posts/tweets (content attributes). Classification models developed on real datasets achieve an accuracy of 72.5% and an F1 score of 67.0% in identity matching across users' profiles. This study will help enhance the accuracy of identity resolution frameworks.
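The content attributes mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stopword list, the `top_k` cut-off, the tokenisation rule and the Jaccard-style `overlap_score` are all assumptions made for illustration, and the study's actual preprocessing and classification models are not described here.

```python
import re
from collections import Counter

# Hypothetical stopword list; the study's real preprocessing is not specified.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in",
             "is", "for", "on", "with", "my", "at"}


def high_frequency_words(posts, top_k=5):
    """Return the top_k most frequent non-stopword tokens in a user's posts."""
    counts = Counter()
    for post in posts:
        tokens = re.findall(r"[a-z']+", post.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return {word for word, _ in counts.most_common(top_k)}


def common_high_frequency_words(posts_a, posts_b, top_k=5):
    """High-frequency words shared by two profiles (set intersection)."""
    return high_frequency_words(posts_a, top_k) & high_frequency_words(posts_b, top_k)


def overlap_score(posts_a, posts_b, top_k=5):
    """Jaccard overlap of the two high-frequency word sets (an assumed feature)."""
    a = high_frequency_words(posts_a, top_k)
    b = high_frequency_words(posts_b, top_k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

A score such as `overlap_score` could serve as one input feature to a classifier that decides whether two profiles belong to the same person; which features and models the study actually used is not stated in the abstract.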
Keywords: classification models; content attributes; identity matching; identity resolution; online social networks; text mining.
International Journal of Enterprise Network Management, 2022 Vol.13 No.1, pp.19 - 36
Received: 27 Jul 2019
Accepted: 29 Nov 2019
Published online: 20 Apr 2022