Title: A hierarchical topic modelling approach for short text clustering

Authors: Rahul Pradhan; Dilip Kumar Sharma

Addresses: GLA University, Mathura, UP, India; GLA University, Mathura, UP, India

Abstract: Social networking websites such as Twitter and WeChat provide microblogging services to their users, who post millions of short messages every day. Datasets built from these messages help in solving many non-trivial tasks in computer science, natural language processing, opinion mining, and other domains. Topic modelling is critical for understanding tweets and segregating them into manageable sets. We apply topic modelling approaches to cluster tweets and other short text messages into groups, because conventional approaches fail to deal properly with the noise, high volume, high dimensionality, and sparseness of short text. The proposed method addresses the data sparsity of short text through a hierarchical two-stage clustering procedure. We analysed the results on standard datasets and found that our method performed better than the other methods compared.

Keywords: short text clustering; STT; topic modelling; Dirichlet multinomial mixture; DMM; Twitter topic modelling.

DOI: 10.1504/IJICT.2022.123161

International Journal of Information and Communication Technology, 2022 Vol.20 No.4, pp.463 - 481

Received: 30 Apr 2020
Accepted: 03 Oct 2020

Published online: 01 Jun 2022
