Title: The approach of using ontology as a pre-knowledge source for semi-supervised labelled topic model by applying text dependency graph

Authors: Phu Pham; Phuc Do

Addresses: Faculty of Information Science and Engineering, University of Information Technology (UIT), VNU-HCM, Quarter 6, Linh Trung Ward, Thu Duc District, Ho Chi Minh City, Vietnam (both authors)

Abstract: Discovering multiple topics in text is an important task in text mining. Earlier supervised approaches fail to explore multiple topics in a text. Topic modelling approaches such as LSI, pLSI and LDA are unsupervised methods that discover the distributions of multiple topics in text documents. The labelled LDA (LLDA) model is a supervised method that integrates human-labelled topics with the given text corpus during topic modelling. However, in real applications, we may not have enough high-quality knowledge to properly assign topics to all documents before applying LLDA. In this paper, we present two approaches that take advantage of the dependency graph-of-words (GOW) in text analysis. The GOW approach uses the frequent sub-graph mining (FSM) technique to extract graph-based concepts from the text. Our first approach, called the GC2Onto model, uses graph-based concepts to construct a domain-specific ontology. In our second approach, called the LLDA-GOW model, the graph-based concepts are also applied to improve the quality of traditional LLDA. We combine the GC2Onto and LLDA-GOW models to improve multiple topic identification as well as other text mining tasks.

Keywords: topic identification; labelled topic modelling; latent Dirichlet allocation; LDA; labelled LDA; LLDA; ontology-driven topic labelling; dependency graph.

DOI: 10.1504/IJBIDM.2021.115477

International Journal of Business Intelligence and Data Mining, 2021 Vol.18 No.4, pp.488 - 523

Received: 29 Jan 2018
Accepted: 28 Nov 2018

Published online: 04 May 2021