Int. J. of Wireless and Mobile Computing, 2016 Vol.11, No.2

Title: CDNASA: clustering data with noise and arbitrary shape

Authors: Zhong-Han Niu; Jian-Cong Fan; Wen-Hua Liu; Liang Tang; Shuai Tang

Addresses:
Provincial Key Lab. for Information Technology of Wisdom Mining of Shandong Province, Shandong University of Science and Technology, Qingdao 266590, China; College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
Provincial Key Lab. for Information Technology of Wisdom Mining of Shandong Province, Shandong University of Science and Technology, Qingdao 266590, China; College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China

Abstract: In many data domains, especially spatial data, clusters are of arbitrary shape, size and density. Traditional clustering methods often fail to identify such clusters efficiently or accurately. At the same time, the rapid growth of spatial data in recent years has created a need for scalable spatial clustering algorithms. In this paper we propose a spatial clustering method, named CDNASA, based on the idea that each data object is associated with a certain space, and that if two such spaces overlap, their objects can be merged into one cluster. Data points that cannot be merged into any cluster are treated as noise points. The effectiveness and efficiency of the proposed algorithm are tested on both synthetic and real data sets. Experimental results show that the quality of the clusters discovered by CDNASA is much better than that of the clusters found by existing algorithms, especially for arbitrarily shaped clusters. CDNASA is also noise-tolerant and has low time and space complexity.
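The merging idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how an object's space is defined, so this sketch assumes, hypothetically, that each point owns a disc of fixed `radius` (a parameter introduced here, not taken from the paper), that two discs overlap when their centres are closer than `2 * radius`, and that overlapping discs are merged with a union-find structure. Points whose disc overlaps no other disc are labelled as noise (`-1`). This is not the authors' CDNASA algorithm; in particular, the naive pairwise comparison is O(n²) and does not reflect the low time complexity the paper reports.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def find(parent: List[int], i: int) -> int:
    # Union-find lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def overlap_clustering(points: Sequence[Point], radius: float) -> List[int]:
    """Return one cluster label per point; -1 marks noise.

    Hypothetical model (not from the paper): each point owns a disc of
    `radius` around it, and two discs overlap when their centres are
    closer than 2 * radius.
    """
    n = len(points)
    parent = list(range(n))
    has_overlap = [False] * n

    # Naive O(n^2) pairwise check: merge the owners of overlapping discs.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < 2 * radius:
                has_overlap[i] = has_overlap[j] = True
                ri, rj = find(parent, i), find(parent, j)
                if ri != rj:
                    parent[rj] = ri

    # Relabel union-find roots as consecutive cluster ids 0, 1, 2, ...
    labels: List[int] = []
    root_to_label: Dict[int, int] = {}
    for i in range(n):
        if not has_overlap[i]:
            labels.append(-1)  # merged with nothing: treated as noise
            continue
        root = find(parent, i)
        if root not in root_to_label:
            root_to_label[root] = len(root_to_label)
        labels.append(root_to_label[root])
    return labels
```

For example, with `pts = [(0, 0), (0.5, 0), (1.0, 0), (5, 5), (5.4, 5), (20, 20)]`, `overlap_clustering(pts, 0.3)` returns `[0, 0, 0, 1, 1, -1]`: the first three points chain into one cluster through overlapping discs, the next two form a second cluster, and the isolated point is noise. Chaining means that two points whose discs do not overlap directly (here `(0, 0)` and `(1.0, 0)`) can still end up in the same cluster, which is what lets this style of merging follow arbitrarily shaped clusters.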

Keywords: data clustering; spatial data; data mining; noise; arbitrary shape; spatial clustering.

DOI: 10.1504/IJWMC.2016.10001085

Int. J. of Wireless and Mobile Computing, 2016 Vol.11, No.2, pp.100 - 111

Submission date: 12 Apr 2016
Date of acceptance: 11 Jul 2016
Available online: 02 Nov 2016
