Title: A hierarchical topic modelling approach for short text clustering
Authors: Rahul Pradhan; Dilip Kumar Sharma
Addresses: GLA University, Mathura, UP, India; GLA University, Mathura, UP, India
Abstract: Social networking websites such as Twitter and WeChat provide microblogging services to their users, who post millions of short messages every day. Datasets built from these messages help in solving many non-trivial tasks in computer science, natural language processing, opinion mining, and other domains. Topic modelling is critical for understanding tweets and segregating them into manageable sets. We apply topic modelling approaches to cluster tweets and other short text messages into groups, because conventional approaches fail to deal properly with the noise, high volume, high dimensionality, and sparseness of short text. The proposed method addresses the data sparsity of short text using a hierarchical two-stage clustering procedure. We analysed the results on standard datasets and found that our method performed better than the other methods compared.
Keywords: short text clustering; STT; topic modelling; Dirichlet multinomial mixture; DMM; Twitter topic modelling.
DOI: 10.1504/IJICT.2022.123161
International Journal of Information and Communication Technology, 2022 Vol.20 No.4, pp.463 - 481
Received: 30 Apr 2020
Accepted: 03 Oct 2020
Published online: 01 Jun 2022