Authors: Shanmugam Poomagal; Palanisamy Visalakshi; Thiagarajan Hamsapriya
Addresses: Department of Applied Mathematics and Computational Sciences, PSG College of Technology, Coimbatore, Tamil Nadu, India; Department of Electronics and Communication Engineering, PSG College of Technology, Coimbatore, Tamil Nadu, India; Oriental Institute of Science and Technology, Bhopal, India
Abstract: Twitter is a popular social networking service on which users post short messages that may be useful to others around the world. Researchers have analysed these messages in various ways. This paper proposes a technique for clustering tweets. The aim is to identify groups of similar tweets, which in turn helps to identify user communities. These communities can be recommended to advertisers on Twitter by matching each community's topics of interest with the advertiser's field. Suffix Tree Clustering (STC) is a core web document clustering algorithm that groups similar documents into clusters by constructing a suffix tree. We combine STC with semantic similarity among the posted tweets to identify topics of interest. The proposed method is compared with the STC and Lingo algorithms using intra-cluster distance and inter-cluster distance. Results show that the proposed method outperforms the existing methods, with a 10.59% reduction in intra-cluster distance and a 44.99% increase in inter-cluster distance.
Keywords: Twitter; tweets; semantic similarity; suffix tree clustering; STC; Lingo; inter-cluster distance; intra-cluster distance; tweet clustering; social networking; user communities; web based communities; virtual communities; online communities; advertising.
International Journal of Web Based Communities, 2015 Vol.11 No.2, pp.170 - 187
Received: 08 May 2021
Accepted: 12 May 2021
Published online: 21 Mar 2015
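
The abstract evaluates clusterings by intra-cluster distance (lower is better) and inter-cluster distance (higher is better) but does not define either measure. The sketch below shows one common formulation and is not necessarily the paper's. It assumes each tweet is already represented as a numeric vector, uses cosine distance, takes intra-cluster distance as the mean distance from each vector to its own cluster centroid, and takes inter-cluster distance as the mean pairwise distance between centroids. All function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity); assumed metric, not from the paper."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat a zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def intra_cluster_distance(clusters):
    """Mean distance from every vector to the centroid of its own cluster."""
    total, count = 0.0, 0
    for vectors in clusters:
        c = centroid(vectors)
        for v in vectors:
            total += cosine_distance(v, c)
            count += 1
    return total / count


def inter_cluster_distance(clusters):
    """Mean pairwise distance between cluster centroids (needs >= 2 clusters)."""
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    cents = [centroid(vectors) for vectors in clusters]
    pairs = [(i, j) for i in range(len(cents)) for j in range(i + 1, len(cents))]
    return sum(cosine_distance(cents[i], cents[j]) for i, j in pairs) / len(pairs)


# Toy data: two tight, orthogonal clusters of 2-D "tweet vectors".
clusters = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
intra = intra_cluster_distance(clusters)
inter = inter_cluster_distance(clusters)
```

Under these assumed definitions, a clustering is better when `intra` is smaller and `inter` is larger, which matches the direction of the improvements reported in the abstract.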