Title: Gene microarray data analysis using parallel point-symmetry-based clustering

Authors: Anasua Sarkar; Ujjwal Maulik

Addresses: Information Technology Department, Government College of Engineering and Leather Technology, Kolkata, India; Department of Computer Science and Engineering, Jadavpur University, Kolkata, India

Abstract: Identification of co-expressed genes is the central goal in microarray gene expression analysis. Point-symmetry-based clustering is an important unsupervised learning technique for recognising symmetrical convex- or non-convex-shaped clusters. To enable fast clustering of large microarray data, we propose a distributed, time-efficient and scalable approach for the point-symmetry-based K-Means algorithm. A natural basis for analysing gene expression data with a symmetry-based algorithm is to group together genes with similar symmetrical expression patterns. The new parallel implementation achieves linear speedup without sacrificing the quality of the clustering solution on large microarray data sets. The parallel point-symmetry-based K-Means algorithm is compared with another new parallel symmetry-based K-Means and an existing parallel K-Means on eight artificial and benchmark microarray data sets to demonstrate its superiority in both timing and validity. Statistical analysis is also performed to establish the significance of this message-passing-interface-based point-symmetry K-Means implementation. We also analyse the biological relevance of the clustering solutions.

Keywords: clustering algorithms; cluster validity measures; k-means clustering; gene microarray data; gene expression; point-symmetry based distance; parallel algorithms; bioinformatics.

DOI: 10.1504/IJDMB.2015.067320

International Journal of Data Mining and Bioinformatics, 2015 Vol.11 No.3, pp.277 - 300

Received: 12 Dec 2011
Accepted: 02 Nov 2012

Published online: 05 Feb 2015
