Title: A pattern matching approach for clustering gene expression data

Authors: Rosy Das, Jugal Kalita, Dhruba K. Bhattacharyya

Addresses: Department of Computer Science and Engineering, Tezpur University, Napaam, Assam, 784028, India; Department of Computer Science, University of Colorado, Colorado Springs, CO 809933, USA; Department of Computer Science and Engineering, Tezpur University, Napaam, Assam, 784028, India

Abstract: Identifying groups of genes with similar expression time courses is crucial in the analysis of gene expression time series data. This paper proposes a regulation-based clustering approach, PatternClus, for clustering gene expression data. The method also identifies sub-clusters using an order-preserving ranking approach. PatternClus was evaluated on real-life datasets and was found to perform satisfactorily. Compared with two well-known clustering algorithms (k-means and hierarchical clustering), it gave better results in terms of the z-score measure of cluster validation. An incremental version of PatternClus is also presented, which identifies clusters incrementally as the database continues to grow.

Keywords: gene expression; microarrays; regulation patterns; pattern matching; clustering; sub-clusters; incremental clustering.

DOI: 10.1504/IJDMMM.2011.041492

International Journal of Data Mining, Modelling and Management, 2011 Vol.3 No.2, pp.130 - 149

Published online: 24 Jul 2011
