Automatic generation of initial value k to apply k-means method for text documents clustering
Online publication date: Thu, 26-Feb-2015
by Namita Gupta, P.C. Saxena, J.P. Gupta
International Journal of Data Mining, Modelling and Management (IJDMMM), Vol. 3, No. 1, 2011
Abstract: Retrieving relevant text documents on a topic from a large document collection is a challenging task. Various clustering algorithms have been developed to retrieve documents of interest. Hierarchical clustering has quadratic time complexity, O(n²), for n text documents. The k-means algorithm has a time complexity of O(n), but it is sensitive to the randomly selected initial cluster centres and therefore tends to converge to a locally optimal solution. Global k-means uses the k-means algorithm as a local search procedure to reach a globally optimal solution, but it has polynomial time complexity of O(nk) to produce k clusters. In this paper, we propose an approach to clustering text documents that overcomes these drawbacks of k-means and global k-means and gives a globally optimal solution with time complexity O(lk) to obtain k clusters from an initial set of l starting clusters. Experimental evaluation on Reuters newsfeeds (Reuters-21578) shows that the clustering results (entropy, purity, F-measure) obtained by the proposed method are comparable to those of k-means and global k-means.
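The three evaluation measures named in the abstract (entropy, purity, F-measure) compare a clustering against known class labels, such as Reuters-21578 topic categories. The sketch below implements their commonly used textbook definitions; it is not the authors' evaluation code. Assumptions: each document has exactly one true class, entropy uses log base 2 without normalisation, and the F-measure is the class-size-weighted maximum F over clusters. The function names and sample labels are hypothetical. Lower entropy is better; higher purity and F-measure are better.

```python
from collections import Counter
from math import log2


def contingency(labels_true, labels_pred):
    """Count n_ij: number of documents of class i placed in cluster j."""
    return Counter(zip(labels_true, labels_pred))


def purity(labels_true, labels_pred):
    """Fraction of documents that belong to the majority class of their cluster."""
    n = len(labels_true)
    counts = contingency(labels_true, labels_pred)
    total = 0
    for cl in set(labels_pred):
        # Size of the most frequent class inside cluster `cl`.
        total += max(c for (_, j), c in counts.items() if j == cl)
    return total / n


def entropy(labels_true, labels_pred):
    """Size-weighted average class entropy (log base 2) of the clusters."""
    n = len(labels_true)
    counts = contingency(labels_true, labels_pred)
    cluster_sizes = Counter(labels_pred)
    total = 0.0
    for cl, n_j in cluster_sizes.items():
        e_j = 0.0
        for (_, j), n_ij in counts.items():
            if j == cl:
                p = n_ij / n_j
                e_j -= p * log2(p)
        total += (n_j / n) * e_j
    return total


def f_measure(labels_true, labels_pred):
    """For each class, best F over clusters, weighted by class size."""
    n = len(labels_true)
    counts = contingency(labels_true, labels_pred)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for cl, n_j in cluster_sizes.items():
            n_ij = counts.get((cls, cl), 0)
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


if __name__ == "__main__":
    # Hypothetical example: six documents, three topics, three clusters.
    true = ["earn", "earn", "earn", "acq", "acq", "grain"]
    pred = [0, 0, 1, 1, 1, 2]
    print(purity(true, pred), entropy(true, pred), f_measure(true, pred))
```

For this toy labelling, purity and F-measure both come out to 5/6 (about 0.833). Entropy is about 0.459, and all of it comes from the mixed cluster 1, which holds one "earn" and two "acq" documents.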