Title: A semi-supervised approach to projected clustering with applications to microarray data

Authors: Kevin Y. Yip, Lin Cheung, David W. Cheung, Liping Jing, Michael K. Ng

Addresses: Department of Computer Science, Yale University, New Haven, Connecticut, USA; Department of Computer Science, The University of Hong Kong, Hong Kong; Department of Computer Science, The University of Hong Kong, Hong Kong; Center for Mathematical Imaging and Vision, Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong; Center for Mathematical Imaging and Vision, Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong

Abstract: Recent studies have suggested that extremely low dimensional projected clusters exist in real datasets. Here, we propose a new algorithm for identifying them. It combines object clustering and dimension selection, and allows domain knowledge to be supplied as input to guide the clustering process. Theoretical and experimental results show that even a small amount of input knowledge can help detect clusters with only 1% of the relevant dimensions. We also show that this semi-supervised algorithm can perform knowledge-guided selective clustering when there are multiple meaningful object groupings. The algorithm is also shown to be effective in analysing a microarray dataset.

Keywords: data mining; semi-supervised algorithms; projected clustering; bioinformatics; microarray data; meaningful object groupings.

DOI: 10.1504/IJDMB.2009.026700

International Journal of Data Mining and Bioinformatics, 2009, Vol. 3, No. 3, pp. 229-259

Published online: 23 Jun 2009
