Title: CDNASA: clustering data with noise and arbitrary shape

Authors: Zhong-Han Niu; Jian-Cong Fan; Wen-Hua Liu; Liang Tang; Shuai Tang

Addresses: Provincial Key Lab. for Information Technology of Wisdom Mining of Shandong Province, Shandong University of Science and Technology, Qingdao 266590, China; College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China ' Provincial Key Lab. for Information Technology of Wisdom Mining of Shandong Province, Shandong University of Science and Technology, Qingdao 266590, China; College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China ' College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China ' College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China ' College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China

Abstract: In many data domains, especially spatial data, clusters can have arbitrary shape, size and density. Traditional clustering methods often fail to identify such clusters efficiently or accurately, yet the rapid growth of spatial data in recent years has created a need for scalable spatial clustering algorithms. In this paper we propose a spatial clustering method, named CDNASA, based on the idea that each data object belongs to a certain space and that, if two spaces have overlapping sections, they can be merged into one cluster. Data points that cannot be merged into any cluster are treated as noise points. The effectiveness and efficiency of the proposed algorithm are tested on both synthetic and real data sets. Experimental results show that the quality of the clusters discovered by CDNASA is much better than that of the clusters found by existing algorithms, especially for arbitrarily shaped clusters. CDNASA is also noise-tolerant and has low time and space complexity.
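
The merging idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define how a point's "space" is constructed, so the code assumes each space is a ball of a fixed, user-chosen radius, and the function name `cluster_by_overlap` and the `radius` parameter are hypothetical. It also uses a brute-force pairwise comparison for clarity, so it does not reflect the time and space complexity reported for CDNASA.

```python
from itertools import combinations
from math import dist


def cluster_by_overlap(points, radius):
    """Label points by merging overlapping balls; -1 marks noise.

    Assumption (not from the paper): each point's "space" is a ball of the
    given radius, so two spaces overlap when their centres are at most
    2 * radius apart.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of points whose balls overlap (O(n^2) for clarity).
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= 2 * radius:
            parent[find(i)] = find(j)

    roots = [find(i) for i in range(len(points))]
    sizes = {}
    for r in roots:
        sizes[r] = sizes.get(r, 0) + 1

    # Points whose ball merged with no other ball are treated as noise.
    labels, ids = [], {}
    for r in roots:
        if sizes[r] == 1:
            labels.append(-1)
        else:
            if r not in ids:
                ids[r] = len(ids)
            labels.append(ids[r])
    return labels
```

Because merging is transitive, a chain of overlapping balls ends up in one cluster even when its end points are far apart, which is what allows clusters of arbitrary shape to form under this assumption.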

Keywords: data clustering; spatial data; data mining; noise; arbitrary shape; spatial clustering.

DOI: 10.1504/IJWMC.2016.080173

International Journal of Wireless and Mobile Computing, 2016 Vol.11 No.2, pp.100 - 111

Received: 27 Apr 2016
Accepted: 11 Jul 2016

Published online: 02 Nov 2016
