Title: Predicting gene functions from multiple biological sources using novel ensemble methods

Authors: Chandan K. Reddy; Mohammad S. Aziz

Addresses: Department of Computer Science, Wayne State University, Detroit, MI, 48084, USA; Department of Computer Science, Wayne State University, Detroit, MI, 48084, USA

Abstract: The functional classification of genes plays a vital role in molecular biology. Detecting previously unknown roles of genes and their products in physiological and pathological processes is an important and challenging problem. In this work, information from several biological sources, such as comparative genome sequences, gene expression and protein interactions, is combined to obtain robust predictions of gene function. The information in such heterogeneous sources is often incomplete, so making maximum use of all the available information is challenging. We propose an algorithm that improves the predictive performance of the different models built on individual sources. We also develop a heterogeneous boosting framework that uses all the available information even when some sources provide no information about some of the genes. We demonstrate the superior performance of the proposed methods, in terms of accuracy and F-measure, compared with several imputation and integration schemes.
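The abstract does not spell out the heterogeneous boosting procedure. As a rough illustration of how boosting can still use every gene when some sources are silent about it, the sketch below swaps in a standard technique, AdaBoost with abstaining weak hypotheses (the confidence-rated variant of Schapire and Singer): each decision stump looks at a single source and abstains (outputs 0) on genes that source does not cover. This is not the authors' algorithm; the toy representation (one scalar feature per source, None for a missing source), the names fit, predict and stump_predict, and the data are all hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical toy representation: one scalar feature per source,
# with None marking a source that says nothing about that gene.
Gene = List[Optional[float]]
Stump = Tuple[float, int, float, int]  # (alpha, source, threshold, polarity)


def stump_predict(x: Gene, source: int, thresh: float, polarity: int) -> int:
    """Vote +1/-1 using a single source, or abstain (0) if it is missing."""
    v = x[source]
    if v is None:
        return 0
    return polarity if v > thresh else -polarity


def fit(genes: List[Gene], labels: List[int], rounds: int = 10) -> List[Stump]:
    n = len(genes)
    w = [1.0 / n] * n          # uniform initial weights over genes
    eps = 1e-10                # keeps alpha finite when a stump makes no errors
    model: List[Stump] = []
    for _ in range(rounds):
        best = None
        for s in range(len(genes[0])):
            for t in sorted({g[s] for g in genes if g[s] is not None}):
                for pol in (1, -1):
                    w_pos = w_neg = w_zero = 0.0
                    for g, y, wi in zip(genes, labels, w):
                        h = stump_predict(g, s, t, pol)
                        if h == 0:
                            w_zero += wi   # source missing: stump abstains
                        elif h == y:
                            w_pos += wi
                        else:
                            w_neg += wi
                    # Schapire-Singer selection criterion for abstaining learners
                    z = w_zero + 2.0 * math.sqrt(w_pos * w_neg)
                    if best is None or z < best[0]:
                        best = (z, s, t, pol, w_pos, w_neg)
        if best is None:       # no source is observed for any gene
            break
        _, s, t, pol, w_pos, w_neg = best
        alpha = 0.5 * math.log((w_pos + eps) / (w_neg + eps))
        model.append((alpha, s, t, pol))
        # Abstained genes get the factor exp(0) = 1, so they keep their weight.
        w = [wi * math.exp(-alpha * y * stump_predict(g, s, t, pol))
             for g, y, wi in zip(genes, labels, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model: List[Stump], x: Gene) -> int:
    score = sum(a * stump_predict(x, s, t, pol) for a, s, t, pol in model)
    # A gene with no observed source scores 0 and defaults to -1 here.
    return 1 if score > 0 else -1


# Toy data: source 0 and source 1 each cover only some of the genes.
genes: List[Gene] = [[0.9, None], [0.8, 0.7], [None, 0.9],
                     [0.1, None], [None, 0.2], [0.2, 0.1]]
labels = [1, 1, 1, -1, -1, -1]
model = fit(genes, labels)
print([predict(model, g) for g in genes])
```

Because an abstaining stump leaves a gene's weight unchanged while correctly classified genes are down-weighted, genes missing from the source chosen in one round gain relative weight, which pushes later rounds toward sources that do cover them. No imputation of the missing values is needed.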

Keywords: gene function prediction; data integration; ensemble methods; heterogeneous boosting; functional classification; molecular biology; bioinformatics; genome sequences; gene expression data; protein interactions.

DOI: 10.1504/IJDMB.2015.069418

International Journal of Data Mining and Bioinformatics, 2015 Vol.12 No.2, pp. 184-206

Accepted: 20 Sep 2013
Published online: 15 May 2015
