Title: Sparse p-norm Nonnegative Matrix Factorization for clustering gene expression data

Authors: Weixiang Liu, Kehong Yuan

Address (both authors): Life Science Division, Graduate School at Shenzhen, Tsinghua University, Shenzhen, 518055, China

Abstract: Nonnegative Matrix Factorization (NMF) is a powerful tool for gene expression data analysis because it reduces thousands of genes to a few compact metagenes. It is especially useful for clustering gene expression samples in cancer class discovery. Enhancing the sparseness of the factorisation can identify a small number of dominant, coexpressed metagenes and improve clustering effectiveness. Sparse p-norm (p > 1) Nonnegative Matrix Factorization (sp-NMF) produces a sparser representation by using a higher-order norm to normalise the decomposed components. In this paper, we investigate the benefit of higher-order normalisation for clustering cancer-related gene expression samples. Experimental results demonstrate that sp-NMF yields robust and effective clustering, both in automatically determining the number of clusters and in achieving high accuracy.
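The abstract does not give the sp-NMF update rules. As a rough illustration of the general pipeline it describes (factorise a genes × samples matrix V ≈ WH, normalise the metagenes, then cluster samples by their metagene coefficients), the sketch below uses the standard Lee-Seung multiplicative updates for the Frobenius loss, followed by a post-hoc rescaling of each metagene (column of W) to unit p-norm. This is an assumption-laden stand-in, not the authors' sp-NMF algorithm: the post-hoc rescaling leaves WH unchanged and does not by itself make the factors sparser. Assigning each sample to its largest coefficient in H is a common convention for NMF-based sample clustering. The helper names (`nmf`, `normalise_pnorm`, `cluster_samples`) and the toy matrix are hypothetical.

```python
import random

def matmul(A, B):
    """Plain-list matrix product A @ B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, k, iters=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||V - WH||_F^2, W, H >= 0.
    V is genes x samples; returns W (genes x k) and H (k x samples)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H

def normalise_pnorm(W, H, p=2.0):
    """Hypothetical post-hoc step: scale each metagene (column of W) to unit
    p-norm and scale the matching row of H inversely, so WH is unchanged."""
    W = [row[:] for row in W]
    H = [row[:] for row in H]
    for a in range(len(W[0])):
        norm = sum(W[i][a] ** p for i in range(len(W))) ** (1.0 / p)
        if norm > 0:
            for i in range(len(W)):
                W[i][a] /= norm
            H[a] = [h * norm for h in H[a]]
    return W, H

def cluster_samples(H):
    """Assign each sample (column of H) to the metagene with the largest coefficient."""
    return [max(range(len(H)), key=lambda a: H[a][j]) for j in range(len(H[0]))]

# Toy data: genes 0-2 are high in samples 0-2, genes 3-5 in samples 3-5.
V = [
    [5, 4, 5, 0, 0, 1],
    [4, 5, 4, 1, 0, 0],
    [5, 5, 4, 0, 1, 0],
    [0, 1, 0, 5, 4, 5],
    [1, 0, 0, 4, 5, 4],
    [0, 0, 1, 5, 5, 5],
]
W, H = nmf(V, k=2)
Wn, Hn = normalise_pnorm(W, H, p=2.0)
labels = cluster_samples(Hn)
print(labels)
```

Here the number of clusters k is fixed by hand; the sketch does not attempt the automatic selection of the cluster number that the abstract reports.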

Keywords: nonnegative matrix factorization; clustering analysis; gene expression data; NMF; p-norm; sparseness; data mining; bioinformatics; metagenes; cancer class discovery.

DOI: 10.1504/IJDMB.2008.020524

International Journal of Data Mining and Bioinformatics, 2008, Vol. 2, No. 3, pp. 236-249

Published online: 29 Sep 2008
