Title: Topic-driven top-k similarity search by applying constrained meta-path based in content-based schema-enriched heterogeneous information network

Authors: Phu Pham; Phuc Do

Addresses: Faculty of Information Systems, University of Information Technology (UIT), VNU-HCM, Quarter 6, Linh Trung Ward, Thu Duc District, Ho Chi Minh City, Vietnam; Faculty of Information Systems, University of Information Technology (UIT), VNU-HCM, Quarter 6, Linh Trung Ward, Thu Duc District, Ho Chi Minh City, Vietnam

Abstract: In this paper, we propose TopCPathSim, a model for 'topic-driven' similarity search based on 'constrained meta-paths' (also called 'restricted meta-paths') between objects of the same type in content-based heterogeneous information networks (HINs). The topic distributions over content-based objects, such as papers/articles in a bibliographic network or user comments/reviews in a social network, are obtained using the LDA topic model. We conduct experiments on the real-world DBLP, Aminer and ACM datasets, which demonstrate the effectiveness of the proposed model. Across these experiments, the proposed model achieves about 73.56% accuracy. The results also show that combining a probabilistic topic model with constrained meta-paths is a promising way to improve the output quality of topic-oriented similarity search in content-based HINs.
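
To make the idea in the abstract concrete, the following is a minimal sketch and not the authors' TopCPathSim implementation, which this page does not describe in detail. It assumes the widely used PathSim measure over the symmetric author-paper-author meta-path, and it assumes that the "constraint" keeps only papers whose weight on a chosen topic meets a threshold. The function names, the threshold and the toy matrices are hypothetical; in practice the paper-topic matrix would come from an LDA model rather than being written by hand.

```python
import numpy as np

def pathsim(adj, x, y):
    """PathSim between objects x and y for a symmetric meta-path.

    adj is an (objects x intermediate nodes) adjacency matrix, e.g.
    authors x papers for the author-paper-author meta-path.
    """
    m = adj @ adj.T                      # commuting matrix: path counts
    denom = m[x, x] + m[y, y]
    return 0.0 if denom == 0 else 2.0 * m[x, y] / denom

def topic_constrained_pathsim(adj, paper_topics, topic, x, y, threshold=0.5):
    """Assumed constraint: keep only papers with enough weight on `topic`."""
    keep = paper_topics[:, topic] >= threshold
    return pathsim(adj[:, keep], x, y)

def top_k(adj, paper_topics, topic, x, k, threshold=0.5):
    """Rank all other objects by topic-constrained PathSim to x."""
    scores = [
        (y, topic_constrained_pathsim(adj, paper_topics, topic, x, y, threshold))
        for y in range(adj.shape[0]) if y != x
    ]
    return sorted(scores, key=lambda pair: -pair[1])[:k]

# Toy data (hypothetical): 3 authors, 4 papers, 2 topics.
author_paper = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
])
# Stand-in for LDA output: one topic distribution per paper.
paper_topics = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.2, 0.8],
    [0.1, 0.9],
])

plain = pathsim(author_paper, 0, 1)
on_topic_0 = topic_constrained_pathsim(author_paper, paper_topics, 0, 0, 1)
on_topic_1 = topic_constrained_pathsim(author_paper, paper_topics, 1, 0, 1)
best = top_k(author_paper, paper_topics, topic=0, x=0, k=1)
```

In this toy network, authors 0 and 1 share a paper on topic 0 but none on topic 1, so their similarity depends on which topic drives the search. That dependence is the behaviour a topic-driven measure is meant to capture.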

Keywords: constrained meta-path; content-based heterogeneous information network; topic-driven similarity search; LDA; topic modelling.

DOI: 10.1504/IJBIDM.2020.109295

International Journal of Business Intelligence and Data Mining, 2020 Vol.17 No.3, pp.349 - 376

Received: 20 Dec 2017
Accepted: 14 Feb 2018

Published online: 03 Sep 2020