Title: A Bayesian framework for knowledge driven regression model in micro-array data analysis

Authors: Rong Jin, Luo Si, Christina Chan

Addresses: Department of Computer Science and Engineering, Michigan State University, MI, USA; Department of Computer Science, Purdue University, West Lafayette, IN, USA; Department of Chemical Engineering and Material Science, Michigan State University, MI 48864, USA

Abstract: This paper addresses the sparse data problem in linear regression, in which the number of variables is significantly larger than the number of data points available for fitting. We assume that, in addition to the measured data points, prior knowledge about the input variables may be provided in the form of pairwise similarities. We present a full Bayesian framework that exploits this similarity information about the input variables for linear regression. Empirical studies with gene expression data show that regression errors can be reduced significantly by incorporating similarity information derived from the Gene Ontology.
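The abstract does not give the model's equations, so the following is only a minimal sketch of one common way to encode pairwise similarity as prior knowledge, not the authors' method. It assumes a Gaussian likelihood and a Gaussian prior on the regression weights w whose precision is lam * (L + eps * I), where L = D - S is the graph Laplacian of the variable-similarity matrix S (the keywords mention a graph Laplacian, but the exact prior used in the paper is an assumption here). Because w^T L w = 1/2 * sum_ij S_ij (w_i - w_j)^2, the prior pulls the weights of similar genes toward each other, which helps when there are far fewer samples than variables. The sketch computes only the MAP point estimate; the paper's full Bayesian treatment (for example, of hyperparameters) is not reproduced. The function names and the parameters lam and eps are hypothetical.

```python
import numpy as np


def graph_laplacian(S):
    """Unnormalized graph Laplacian L = D - S of a symmetric similarity matrix S."""
    return np.diag(S.sum(axis=1)) - S


def laplacian_regularized_regression(X, y, S, lam=1.0, eps=1e-3):
    """MAP weights for y ~ N(Xw, I) with prior w ~ N(0, (lam * (L + eps*I))^-1).

    Hypothetical sketch: X is (n, p), y is (n,), S is a (p, p) symmetric,
    non-negative similarity matrix over the input variables (e.g. genes).
    """
    p = X.shape[1]
    L = graph_laplacian(S)
    # L is singular (constant vectors lie in its null space), so eps*I is
    # added to make the prior precision positive definite.
    A = X.T @ X + lam * (L + eps * np.eye(p))
    # Setting the gradient of the negative log posterior to zero gives A w = X^T y.
    return np.linalg.solve(A, X.T @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 20, 100                       # far fewer samples than variables
    groups = np.repeat([0, 1], p // 2)   # hypothetical: two groups of similar genes
    w_true = np.where(groups == 0, 1.0, -1.0)
    X = rng.normal(size=(n, p))
    y = X @ w_true + 0.1 * rng.normal(size=n)

    # Similarity 1 within a group, 0 across groups, no self-loops.
    S = (groups[:, None] == groups[None, :]).astype(float)
    np.fill_diagonal(S, 0.0)

    w_lap = laplacian_regularized_regression(X, y, S, lam=1.0)
    # With S = 0 the prior reduces to ordinary ridge regression (penalty lam*eps).
    w_ridge = laplacian_regularized_regression(X, y, np.zeros((p, p)), lam=1.0, eps=1.0)
    print("weight error, Laplacian prior:", np.linalg.norm(w_lap - w_true))
    print("weight error, ridge prior:   ", np.linalg.norm(w_ridge - w_true))
```

Setting S to zero recovers ridge regression, which makes the effect of the similarity prior easy to compare on the same data; in practice S could be derived from Gene Ontology annotations, but how the paper builds it is not specified here.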

Keywords: Bayesian analysis; knowledge-driven data regression; graph Laplacian; gene expression data; data mining; bioinformatics; sparse data; linear regression models; pairwise similarity; microarray data analysis; gene ontology.

DOI: 10.1504/IJDMB.2008.020525

International Journal of Data Mining and Bioinformatics, 2008, Vol. 2, No. 3, pp. 250-267

Published online: 29 Sep 2008
