Authors: Mohit; A. Charan Kumari; Meghna Sharma
Addresses: Department of CSE and IT, The NorthCap University, Gurugram, India; Department of CSE and IT, The NorthCap University, Gurugram, India; Department of CSE and IT, The NorthCap University, Gurugram, India
Abstract: As the volume of data grows day by day, it must be organised in a form from which useful information can be extracted. Text mining performs this task. In this paper, text clustering is used to group large collections of data into clusters so that meaningful information can be extracted and the data summarised for analysis. Three partitioning-based clustering techniques, i.e., k-means, k-means fast and k-medoids, are compared, and a new algorithm named shift k-medoid is proposed, which is a hybrid of the k-medoids and mean shift clustering algorithms. Cosine similarity, correlation coefficient and Jaccard similarity measures are used to check the performance of the algorithms, and two measures, i.e., randomised feature and normalised mutual information (NMI) feature, are used to test their accuracy. The outcomes demonstrate that the proposed algorithm achieves the best performance.
Keywords: text clustering; cosine measure; Jaccard measure; correlation coefficient; shift k-medoid.
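Illustration: the three similarity measures named in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: documents are assumed to be represented as term-frequency vectors (for cosine and correlation) or token sets (for Jaccard), and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length term-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def correlation_coefficient(a, b):
    """Pearson correlation, computed as the cosine of the mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centred_a = [x - mean_a for x in a]
    centred_b = [y - mean_b for y in b]
    return cosine_similarity(centred_a, centred_b)


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumed convention for two empty documents
    return len(set_a & set_b) / len(union)
```

Any of these functions could serve as the similarity measure inside a partitioning-based clusterer such as k-medoids, typically converted to a distance as one minus the similarity.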
International Journal of Social Computing and Cyber-Physical Systems, 2019 Vol.2 No.2, pp.106 - 118
Received: 05 May 2017
Accepted: 15 Nov 2017
Published online: 14 Jun 2019