Title: Mining underlying correlated-clusters in high-dimensional data streams

Authors: Wei Fan, Toyohide Watanabe, Koichi Asakura

Addresses: Department of Systems and Social Informatics, Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8603, Japan; Department of Systems and Social Informatics, Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8603, Japan; School of Informatics, Daido University, 10-3, Takiharu-cho, Minami-ku, Nagoya, 457-8530, Japan

Abstract: High-dimensional data streams pose challenges to traditional clustering algorithms, owing to their inherent sparsity and to the tendency of data to cluster in different subspaces of the full feature space. In this paper, we address the subspace clustering problem by mining correlated-clusters, in which the selected features are correlated with each other. Moreover, to account for data evolution in data streams, we propose methods that mine feature correlations incrementally and adaptively. At each time tick t, guided by our proposed multiple regression measurement, we assign the newly arrived data sample to the correlated-cluster whose local correlations fit it, and we update those local correlations adaptively using an incremental principal component analysis technique. Experiments on high-dimensional synthetic and real data show that our methods achieve higher query accuracy than related work and run much more efficiently. Additionally, our methods can successfully forecast missing values in streaming data.
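The abstract describes a per-tick loop (assign the new sample to the best-fitting correlated-cluster, update that cluster's local correlations incrementally, and use those correlations to forecast missing values) but does not give the multiple regression measurement or the incremental PCA update. The following is a minimal sketch under stated assumptions, not the authors' method: the fit score is swapped for the squared residual from each cluster's dominant principal line (one component only, for brevity), the incremental PCA step is replaced by a Welford-style online covariance update followed by warm-started power iteration, and missing values are filled by a least-squares projection onto that line. `CorrelatedCluster`, `assign_and_update`, `residual`, `update` and `impute` are hypothetical names.

```python
"""Hypothetical sketch of a correlated-cluster stream loop.

Assumptions (not from the paper): fit = squared residual from the cluster's
dominant principal line; incremental PCA = online covariance + power iteration.
"""
import math


class CorrelatedCluster:
    """Keeps a running mean, a running scatter matrix, and the dominant
    principal direction v (a single component, for brevity)."""

    def __init__(self, seed):
        d = len(seed)
        self.n = 1
        self.mean = list(seed)
        self.m2 = [[0.0] * d for _ in range(d)]  # running sum of deviation outer products
        self.v = [1.0 / math.sqrt(d)] * d        # placeholder until the cluster has spread

    def residual(self, x):
        """Squared distance from x to the line mean + t * v (lower = better fit)."""
        dev = [xi - mi for xi, mi in zip(x, self.mean)]
        t = sum(a * b for a, b in zip(dev, self.v))
        return sum((a - t * b) ** 2 for a, b in zip(dev, self.v))

    def update(self, x, iters=20):
        """Online mean/scatter update, then re-estimate v by power iteration
        warm-started from the previous direction."""
        self.n += 1
        delta = [xi - mi for xi, mi in zip(x, self.mean)]          # x - old mean
        self.mean = [mi + di / self.n for mi, di in zip(self.mean, delta)]
        delta2 = [xi - mi for xi, mi in zip(x, self.mean)]         # x - new mean
        d = len(x)
        for i in range(d):
            for j in range(d):
                self.m2[i][j] += delta[i] * delta2[j]
        v = self.v
        dn = math.sqrt(sum(di * di for di in delta))
        if self.n == 2 and dn > 0.0:
            v = [di / dn for di in delta]  # with two points, this is the exact direction
        for _ in range(iters):
            w = [sum(self.m2[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm == 0.0:
                break  # no spread yet; keep the current direction
            v = [wi / norm for wi in w]
        self.v = v

    def impute(self, x, missing):
        """Forecast x[missing] by fitting the observed features to the line."""
        obs = [j for j in range(len(x)) if j != missing]
        num = sum((x[j] - self.mean[j]) * self.v[j] for j in obs)
        den = sum(self.v[j] ** 2 for j in obs)
        t = num / den if den > 0.0 else 0.0
        return self.mean[missing] + t * self.v[missing]


def assign_and_update(clusters, x):
    """Assign x to the cluster with the smallest residual, update it, return its index."""
    best = min(range(len(clusters)), key=lambda k: clusters[k].residual(x))
    clusters[best].update(x)
    return best
```

For example, after feeding a cluster the points `[t, 2*t, 3*t]`, `impute([4.0, 8.0, 0.0], 2)` recovers the third feature as 12, because the learned direction encodes the linear correlation among the features. Keeping only one principal direction keeps the sketch short; a closer analogue to local correlations in a subspace would track several components per cluster.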

Keywords: data streams; correlated cluster mining; multiple regression measurement; incremental PCA; principal component analysis; IPCA; social computing.

DOI: 10.1504/IJSHC.2010.032689

International Journal of Social and Humanistic Computing, 2010 Vol.1 No.3, pp.282 - 299

Published online: 12 Apr 2010
