Title: CNNC: a common nearest neighbour clustering approach for gene expression data

Authors: Mausumi Goswami, Rosy Sarmah, D.K. Bhattacharyya

Addresses: Department of Computer Science and Engineering, Tezpur University, Tezpur, Assam, 784028, India (all authors)

Abstract: We present CNNC, an effective common nearest neighbour-based clustering technique for finding clusters in gene expression data. CNNC attempts to find all the clusters in gene expression data qualitatively. The algorithm identifies clusters using a nearest neighbour-based approach, and a regulation-based module for finding subclusters is also presented. CNNC was tested on several real-life datasets, and its effectiveness is established in terms of the well-known z-score measure and p-value. Z-score analysis shows that CNNC outperforms other comparable algorithms, and p-value analysis shows that the technique is capable of detecting biologically relevant clusters in gene expression data.
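
The abstract does not give the details of CNNC, so the following is only a minimal sketch of the general idea of grouping genes by common nearest neighbours under a Pearson correlation distance. It uses a generic Jarvis-Patrick style rule (two genes are linked when each is in the other's k-nearest-neighbour list and they share enough neighbours), which is an assumption and not the authors' published algorithm. The function names and the parameters `k` and `min_shared` are hypothetical, and the regulation-based subcluster module is not modelled.

```python
import numpy as np

def pearson_distance(X):
    """Return 1 - Pearson correlation between the rows (genes) of X."""
    return 1.0 - np.corrcoef(X)

def k_nearest(dist, k):
    """Return, for each row, the set of its k nearest rows (itself excluded)."""
    neighbours = []
    for i in range(dist.shape[0]):
        order = [int(j) for j in np.argsort(dist[i]) if int(j) != i]
        neighbours.append(set(order[:k]))
    return neighbours

def common_nn_clusters(X, k=3, min_shared=2):
    """Assumed linkage rule (not taken from the paper): genes i and j are
    merged when they are mutual k-nearest neighbours and have at least
    min_shared neighbours in common. Returns one cluster label per gene."""
    nn = k_nearest(pearson_distance(X), k)
    n = len(nn)
    parent = list(range(n))  # union-find forest over genes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            mutual = i in nn[j] and j in nn[i]
            if mutual and len(nn[i] & nn[j]) >= min_shared:
                parent[find(i)] = find(j)

    # Relabel union-find roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

Called as `common_nn_clusters(X)` on a genes-by-conditions matrix `X` with no constant rows, this returns a list of integer labels. Because Pearson correlation compares profile shape rather than magnitude, genes that rise and fall together are grouped even when their expression levels differ in scale.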

Keywords: common nearest neighbour; Pearson correlation coefficient; regulation pattern; gene expression data; CNN clustering; clusters; bioinformatics.

DOI: 10.1504/IJCVR.2011.042268

International Journal of Computational Vision and Robotics, 2011 Vol.2 No.2, pp.115 - 126

Published online: 02 Sep 2011
