
Title: K-walks: clustering gene-expression data using a K-means clustering algorithm optimised by random walks

Authors: Min Yao; Qinghua Wu; Juan Li; Tinghua Huang

Addresses: College of Animal Science, Yangtze University, Jingzhou, Hubei 434025, China; College of Life Science, Yangtze University, Jingzhou, Hubei 434025, China; College of Chemistry, Xiangtan University, Xiangtan, Hunan 411105, China; College of Animal Science, Yangtze University, Jingzhou, Hubei 434025, China

Abstract: Gene-expression data obtained from biological experiments typically have thousands of dimensions, which makes them difficult for biologists to interpret as a whole. Clustering analysis is an exploratory data-mining technique that is widely used in gene-expression data analysis. Practical approaches to the clustering problem use iterative procedures such as K-means, which typically converge to one of many local minima. Here, we propose a simulated annealing approximation algorithm that is optimised using random walks to solve the K-means clustering problem. The algorithm is verified with synthetic and real-world data sets and compared with other well-known K-means variants. The new algorithm is less sensitive to initial cluster centres, and its primary strength is its ability to produce high-quality clustering results for thousands of high-dimensional data points. However, the algorithm is computationally intensive.
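
The abstract does not give the K-walks procedure itself, so the following is only a generic illustration of the idea it names: minimising the K-means objective by simulated annealing, with candidate solutions proposed by a random walk over the cluster centres. The function names (`sse`, `anneal_kmeans`), the Gaussian step, the geometric cooling schedule and all parameter values are assumptions made for this sketch and are not taken from the paper.

```python
import math
import random

def sse(points, centres):
    """K-means objective: sum of squared distances to the nearest centre."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(pt, ctr)) for ctr in centres)
        for pt in points
    )

def anneal_kmeans(points, k, steps=5000, t0=1.0, cooling=0.999,
                  step_size=0.5, seed=0):
    """Generic simulated annealing over k centres (hypothetical sketch)."""
    rng = random.Random(seed)
    # Start from k randomly chosen data points, as plain K-means often does.
    centres = [list(p) for p in rng.sample(points, k)]
    cost = sse(points, centres)
    best, best_cost = [c[:] for c in centres], cost
    temp = t0
    for _ in range(steps):
        # Random-walk proposal: nudge one centre with Gaussian noise.
        cand = [c[:] for c in centres]
        i = rng.randrange(k)
        cand[i] = [x + rng.gauss(0.0, step_size) for x in cand[i]]
        cand_cost = sse(points, cand)
        # Metropolis rule: always accept improvements; accept a worse
        # candidate with a probability that shrinks as the temperature falls.
        if cand_cost <= cost or rng.random() < math.exp((cost - cand_cost) / temp):
            centres, cost = cand, cand_cost
            if cost < best_cost:
                best, best_cost = [c[:] for c in centres], cost
        temp *= cooling  # geometric cooling schedule (assumed)
    return best, best_cost
```

Because worse candidates are occasionally accepted, a search of this kind can leave a poor local minimum that plain K-means would stay in. Every step re-evaluates the objective over all points, which is consistent with the abstract's remark that the approach is computationally intensive.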

Keywords: gene expression data; K-means clustering algorithm; optimisation; random walks; bioinformatics.

DOI: 10.1504/IJDMB.2016.080039

International Journal of Data Mining and Bioinformatics, 2016 Vol.16 No.2, pp.121 - 140

Received: 20 Feb 2016
Accepted: 18 Sep 2016
Published online: 29 Oct 2016
