Title: Biomarker identification by knowledge-driven multilevel ICA and motif analysis

Authors: Li Chen, Jianhua Xuan, Chen Wang, Yue Wang, Ie-Ming Shih, Tian-Li Wang, Zhen Zhang, Robert Clarke, Eric P. Hoffman

Addresses: Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, 22203 VA, USA; Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, 22203 VA, USA; Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, 22203 VA, USA; Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, 22203 VA, USA; Department of Pathology, Gynecology and Oncology, The Johns Hopkins University, School of Medicine, Baltimore, 21231 MD, USA; Department of Pathology, Gynecology and Oncology, The Johns Hopkins University, School of Medicine, Baltimore, 21231 MD, USA; Department of Pathology, Gynecology and Oncology, The Johns Hopkins University, School of Medicine, Baltimore, 21231 MD, USA; Department of Oncology and Physiology and Biophysics, Georgetown University, School of Medicine, Washington, 20057 DC, USA; Research Center for Genetic Medicine, Children's National Medical Center, Washington, 20010 DC, USA

Abstract: Traditional statistical methods often fail to identify biologically meaningful biomarkers from expression data alone. In this paper, we develop a novel strategy, namely knowledge-driven multi-level Independent Component Analysis (ICA), to infer regulatory signals and identify biomarkers based on clustering results and partial prior knowledge. A statistical test is designed to evaluate the significance of transcription factor enrichment for the extracted gene set based on motif information. Experimental results on an Rsf-1 (HBXAP) induced microarray data set show that, compared to other gene selection methods with or without prior knowledge, our method successfully extracts biologically meaningful biomarkers related to ovarian cancer.

Keywords: biomarker identification; multi-level ICA; motif analysis; gene clustering; gene regulatory networks; microarray data analysis; independent component analysis; bioinformatics; regulatory signals; ovarian cancer.

DOI: 10.1504/IJDMB.2009.029201

International Journal of Data Mining and Bioinformatics, 2009 Vol.3 No.4, pp.365 - 381

Published online: 09 Nov 2009
