Title: SED-Stream: discriminative dimension selection for evolution-based clustering of high dimensional data streams

Authors: Kitsana Waiyamai; Thanapat Kangkachit; Thanawin Rakthanmanon; Rattanapong Chairukwattana

Addresses: Faculty of Engineering, Department of Computer Engineering, Kasetsart University, Bangkok 10900, Thailand ' Faculty of Engineering, Department of Computer Engineering, Kasetsart University, Bangkok 10900, Thailand ' Faculty of Engineering, Department of Computer Engineering, Kasetsart University, Bangkok 10900, Thailand ' Faculty of Engineering, Department of Computer Engineering, Kasetsart University, Bangkok 10900, Thailand

Abstract: Clustering of high dimensional data streams has become one of the most challenging data mining tasks. Our previous work, SE-Stream, is a standard-deviation-based projected clustering method for high dimensional data streams. Besides its ability to find clusters within subgroups of dimensions, SE-Stream is able to monitor and detect changes in the clustering structure as the data stream progresses. In this extension of SE-Stream, selected dimensions are used to represent the clusters, and our idea is to select a better set of dimensions to increase the quality of the output clustering. The proposed SED-Stream projects each cluster onto its discriminative dimensions, which are highly relevant to the cluster itself but distinguish it from the other clusters. Experimental results on both real-world and synthetic stream datasets show that SED-Stream outperforms its previous version, SE-Stream, in terms of both purity and f-measure. Compared with HPStream, a state-of-the-art algorithm for projected clustering of high dimensional data streams, SED-Stream achieves a higher f-measure and comparable purity.
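
For readers unfamiliar with the purity measure mentioned in the abstract, the following is a minimal sketch of one common definition: the fraction of points that carry the majority class label of their assigned cluster. The function name `purity` and the toy labels are assumptions for illustration only; the paper's exact definition may differ (for example, by averaging per-cluster purity over a stream window).

```python
from collections import Counter


def purity(cluster_ids, class_labels):
    """Fraction of points whose class is the majority class of their cluster.

    Illustrative definition only; not taken from the SED-Stream paper.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    if len(cluster_ids) == 0:
        raise ValueError("at least one point is required")

    # Count class labels separately inside each cluster.
    per_cluster = {}
    for cluster, label in zip(cluster_ids, class_labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1

    # Sum the size of the majority class in every cluster.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(cluster_ids)


# Toy example: cluster 0 holds classes a, a, b; cluster 1 holds b, b, b.
example_score = purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"])
```

In the toy example, five of the six points match the majority class of their cluster, so the score is 5/6. A score of 1.0 means every cluster contains a single class.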

Keywords: discriminative dimension selection; evolution-based stream clustering; high dimensional data streams; evolving data streams; data mining.

DOI: 10.1504/IJISTA.2014.065174

International Journal of Intelligent Systems Technologies and Applications, 2014 Vol.13 No.3, pp.187 - 201

Received: 05 Nov 2013
Accepted: 05 May 2014

Published online: 15 Oct 2014
