Title: A novel approach to text clustering using shift k-medoid

Authors: Mohit; A. Charan Kumari; Meghna Sharma

Addresses: Department of CSE and IT, The NorthCap University, Gurugram, India ' Department of CSE and IT, The NorthCap University, Gurugram, India ' Department of CSE and IT, The NorthCap University, Gurugram, India

Abstract: As the volume of data grows daily, it must be organised in a form that allows useful information to be extracted from it. Text mining addresses this task. In this paper, text clustering is used to group large text collections into clusters so that meaningful, summarised information can be extracted for analysis. Three partitioning-based clustering techniques, i.e., k-means, k-means fast and k-medoids, are compared, and a new algorithm named shift k-medoid is proposed, which is a hybrid of the k-medoid and mean shift clustering algorithms. Cosine similarity, the correlation coefficient and Jaccard similarity are used as measures to check the performance of the algorithms, and two measures, i.e., the randomised feature and the normalised mutual information (NMI) feature, are used to test their accuracy. The results show that the proposed algorithm achieves the best performance.
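As an illustration of the three similarity measures named in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes documents are represented as equal-length term-frequency vectors, and it assumes the set-based form of Jaccard similarity (a term counts as present when its frequency is above zero); the abstract does not specify either choice. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Set-based Jaccard: shared terms divided by all terms present."""
    terms_a = {i for i, x in enumerate(a) if x > 0}
    terms_b = {i for i, y in enumerate(b) if y > 0}
    union = terms_a | terms_b
    if not union:
        return 0.0  # assumption: two empty documents score zero
    return len(terms_a & terms_b) / len(union)


def correlation_coefficient(a, b):
    """Pearson correlation between two term-frequency vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant vector has no defined correlation
    return cov / math.sqrt(var_a * var_b)


# Two toy documents over a four-term vocabulary (illustrative data only).
doc1 = [1, 1, 0, 0]
doc2 = [1, 0, 1, 0]
print(cosine_similarity(doc1, doc2))
print(jaccard_similarity(doc1, doc2))
print(correlation_coefficient(doc1, doc2))
```

The three measures can rank the same pair of documents differently, which is one reason a clustering study would compare them rather than pick one in advance.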

Keywords: text clustering; cosine measure; Jaccard measure; correlation coefficient; shift k-medoid.

DOI: 10.1504/IJSCCPS.2019.100186

International Journal of Social Computing and Cyber-Physical Systems, 2019 Vol.2 No.2, pp.106 - 118

Received: 05 May 2017
Accepted: 15 Nov 2017

Published online: 17 Jun 2019
