CC-K-means: a candidate centres-based K-means algorithm for text data
Online publication date: Tue, 21-Jun-2016
by Xuan Li; Yongquan Liang; Yuhao Cai
International Journal of Collaborative Intelligence (IJCI), Vol. 1, No. 3, 2016
Abstract: The K-means algorithm is widely used to cluster many kinds of data thanks to its simplicity and efficiency. However, because the traditional K-means algorithm selects its initial centre points at random, it suffers from defects such as slow convergence and unstable clustering results. To reduce the impact of high dimensionality in text clustering, the latent semantic indexing (LSI) model is first used to reduce the dimensionality of the feature vectors, and a weighted adjusted cosine similarity is then used to measure the similarity between documents in order to obtain better clustering. While searching for the clustering centres, the high-density candidate centre points are partially updated on the basis of density to obtain the final clustering centres. Experimental results show that the proposed algorithm accurately finds representative and well-dispersed clustering centres, which leads to better clustering performance.
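The idea of seeding K-means from dense, mutually dissimilar candidate centres can be illustrated with a minimal sketch. It is not the authors' method: the abstract gives neither the weighted adjusted cosine formula nor the rule for updating candidates, so plain cosine similarity and a simple density-times-dissimilarity score are used here as stand-ins. The names `density_seeds`, `kmeans_cosine` and `sim_threshold` are hypothetical, and the input vectors are assumed to have already been reduced by LSI.

```python
import math

def cosine(u, v):
    """Plain cosine similarity (assumption: stands in for the weighted adjusted variant)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def density_seeds(docs, k, sim_threshold=0.8):
    """Pick k seed indices that are dense and dissimilar to each other (hypothetical rule)."""
    n = len(docs)
    sims = [[cosine(docs[i], docs[j]) for j in range(n)] for i in range(n)]
    # Density of a document: how many other documents are at least sim_threshold similar.
    density = [sum(1 for j in range(n) if j != i and sims[i][j] >= sim_threshold)
               for i in range(n)]
    # The first seed is the densest document.
    seeds = [max(range(n), key=lambda i: density[i])]
    while len(seeds) < k:
        def score(i):
            # Favour high density and low similarity to the seeds chosen so far.
            return (density[i] + 1) * (1.0 - max(sims[i][s] for s in seeds))
        candidates = [i for i in range(n) if i not in seeds]
        seeds.append(max(candidates, key=score))
    return seeds

def kmeans_cosine(docs, seeds, iterations=20):
    """K-means with cosine similarity, started from the given seed documents."""
    centres = [list(docs[s]) for s in seeds]
    labels = None
    for _ in range(iterations):
        # Assign each document to its most similar centre.
        new_labels = [max(range(len(centres)), key=lambda c: cosine(d, centres[c]))
                      for d in docs]
        if new_labels == labels:
            break  # assignments are stable, so the centres will not change either
        labels = new_labels
        # Move each centre to the mean of its members; empty clusters keep their centre.
        for c in range(len(centres)):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy vectors, assumed to be documents already projected into a 2-D LSI space.
docs = [
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
    [0.1, 1.0], [0.2, 0.9], [0.0, 1.0],
]
seeds = density_seeds(docs, k=2)
labels = kmeans_cosine(docs, seeds)
print(seeds, labels)
```

Because the seeds are chosen deterministically, repeated runs on the same data give the same clustering, which is the kind of stability that random initialisation lacks.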