Title: Clustering of text documents with keyword weighting function

Authors: A. Christy; G. Meera Gandhi; S. Vaithyasubramanian

Addresses: Faculty of Computing, Sathyabama Institute of Science and Technology, Chennai, India; Faculty of Computing, Sathyabama Institute of Science and Technology, Chennai, India; Department of Mathematics, Sathyabama Institute of Science and Technology, Chennai, India

Abstract: In this digital world, data is available in abundance everywhere and is growing at a phenomenal rate. Making data readily available for decision making is an important task of the data analyst. In this article, we propose an unsupervised learning algorithm for text document clustering that adopts a keyword weighting function. Documents are pre-processed, and relevant keywords are grouped together based on their weights. Clustered keyword weighting (CKW) takes each class in the training collection as a known cluster and searches for feature weights iteratively to optimise the clustering objective function, in order to retrieve the best clustering result. The performance of CKW is validated by clustering the BBC news text collection. Experiments were conducted with simple K-means and hierarchical clustering algorithms, and our keyword weighting and clustering approach showed improved cluster quality compared with the other methods.
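
The abstract does not give the details of CKW, so the following is only a minimal sketch of the baseline pipeline it mentions: pre-processing, keyword weighting, and simple K-means. It assumes smoothed TF-IDF as a stand-in weighting function, a tiny hypothetical stopword list, and fixed seed documents. All function names and the sample documents are illustrative and are not taken from the article.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for", "as"}

def tokenize(text):
    """Pre-process: lowercase, split on non-letters, drop stopwords."""
    words = "".join(c.lower() if c.isalpha() else " " for c in text).split()
    return [w for w in words if w not in STOPWORDS]

def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors), one vector per document."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    n = len(docs)
    df = Counter(w for t in tokens for w in set(t))  # document frequency
    # Smoothed IDF; this particular formula is an assumed choice.
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    vectors = []
    for t in tokens:
        tf = Counter(t)
        vec = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, seeds, iters=20):
    """Simple K-means; `seeds` are indices of the documents used as initial centroids."""
    k = len(seeds)
    centroids = [vectors[i][:] for i in seeds]
    labels = None
    for _ in range(iters):
        # Assignment step: each document goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

docs = [
    "The football team won the match",
    "The striker scored in the football match",
    "Stock markets fell as shares dropped",
    "Shares and stock prices dropped sharply",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, seeds=[0, 2])
print(labels)
```

With these sample documents, the two sports sentences share weighted keywords ("football", "match") and the two finance sentences share others ("stock", "shares", "dropped"), so each pair falls into its own cluster. An approach such as CKW would additionally adjust the feature weights iteratively against the clustering objective, which this sketch does not attempt.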

Keywords: documents; cluster; unsupervised; feature; K-means; normalised.

DOI: 10.1504/IJIE.2019.100029

International Journal of Intelligent Enterprise, 2019 Vol.6 No.1, pp.19 - 31

Received: 06 Mar 2018
Accepted: 02 May 2018

Published online: 04 Jun 2019
