An improved parallel K-means algorithm based on MapReduce
by Dongbo Zhang; Yanfang Shou; Jianmin Xu
International Journal of Embedded Systems (IJES), Vol. 9, No. 3, 2017

Abstract: The K-means algorithm is one of the most popular clustering algorithms. However, it is sensitive to the initial partition and to circular datasets. To address these problems, this paper introduces CK-means, a clustering algorithm that combines the K-means algorithm with the Canopy algorithm and uses the MapReduce programming model of the Hadoop platform. The experimental results show that the CK-means algorithm has strong advantages for processing large datasets. The theoretical analysis shows that the CK-means algorithm and the traditional algorithm are of the same order of magnitude. Experiments on artificial data show that the improved algorithm outperforms the traditional algorithm in terms of acceleration ratio, accuracy and expansion rate. An experiment on real data is performed to obtain appropriate parameters.
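
The abstract does not give the algorithm's details, so the following is only a minimal single-machine sketch of the general idea of seeding K-means with Canopy centers. It is not the authors' MapReduce implementation. The function names (canopy, kmeans, ck_means), the thresholds t1 and t2, and the sample points are all assumptions made for illustration. The comments marking a "map-like" and a "reduce-like" step only indicate where a MapReduce job would typically split the work.

```python
import math


def canopy(points, t1, t2):
    """Generic Canopy pass: t1 is the loose threshold, t2 the tight one."""
    if not t1 > t2 > 0:
        raise ValueError("expected t1 > t2 > 0")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]
        # Every remaining point within the loose threshold joins this canopy.
        members = [p for p in remaining if math.dist(p, center) <= t1]
        canopies.append((center, members))
        # Points within the tight threshold can no longer become centers.
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return canopies


def kmeans(points, centroids, max_iter=100):
    """Plain K-means (Lloyd iterations) starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Map-like step: assign each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Reduce-like step: recompute each centroid as the mean of its cluster.
        updated = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


def ck_means(points, t1, t2):
    """Use the Canopy centers as the initial centroids (and as k) for K-means."""
    seeds = [center for center, _ in canopy(points, t1, t2)]
    return kmeans(points, seeds)


# Hypothetical sample data: two well-separated groups of 2-D points.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
final_centroids, final_clusters = ck_means(sample, t1=4.0, t2=2.0)
```

In this sketch the Canopy pass fixes both the number of clusters and their starting positions, so K-means no longer depends on a random initial partition. The result still depends on how t1 and t2 are chosen.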

Online publication date: Wed, 21-Jun-2017
