Title: Large margin classifiers and Random Forests for integrated biological prediction

Authors: Sheng Liu; Yixin Chen; Dawn Wilkins

Addresses: Department of Computer and Information Science, University of Mississippi, MS 38677, USA; Department of Computer and Information Science, University of Mississippi, MS 38677, USA; Department of Computer and Information Science, University of Mississippi, MS 38677, USA

Abstract: Incorporating diverse sources of biological information is important for biological discovery. Genes, for example, have a multiview representation: they can be described by features such as sequence length and pairwise similarities, so the feature types range from numerical to categorical. We propose a large margin Random Forests (RF) classification approach based on RF proximity kernels. Random Forests accommodate mixed data types naturally. On four biological datasets, the approach performs promisingly compared with other state-of-the-art methods, including Support Vector Machines (SVMs) and RF classifiers. It shows high potential for discovering the functional roles of biomolecules.
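The abstract does not spell out how an RF proximity kernel is computed, so the following is only a minimal sketch under one common assumption: the proximity of two samples is the fraction of trees in which they fall into the same leaf. The function name `proximity_kernel` and the toy leaf ids are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def proximity_kernel(leaves_a: Sequence[Sequence[int]],
                     leaves_b: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of trees in which each pair of samples shares a leaf.

    leaves_a[i][t] is the id of the leaf that sample i reaches in tree t
    (likewise for leaves_b). This is an assumed, standard proximity
    definition, not necessarily the exact kernel used by the authors.
    """
    if not leaves_a or not leaves_b:
        return []
    n_trees = len(leaves_a[0])
    kernel = []
    for row_a in leaves_a:
        kernel_row = []
        for row_b in leaves_b:
            # Count the trees where both samples end in the same leaf.
            shared = sum(1 for la, lb in zip(row_a, row_b) if la == lb)
            kernel_row.append(shared / n_trees)
        kernel.append(kernel_row)
    return kernel


# Made-up leaf assignments for 3 samples in a 4-tree forest.
toy_leaves = [
    [0, 2, 1, 3],
    [0, 2, 4, 3],
    [5, 1, 4, 0],
]
K = proximity_kernel(toy_leaves, toy_leaves)
```

Because the kernel depends only on which leaves the samples reach, numerical and categorical features can be mixed freely as long as the trees can split on them. One possible way to obtain a large margin classifier from such a matrix, assuming scikit-learn is available, is to take the leaf ids from `RandomForestClassifier.apply` and pass the resulting matrix to `SVC(kernel="precomputed")`.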

Keywords: random forests; proximity kernels; mixed type data; large margin classification; SVMs; support vector machines; bioinformatics; biomolecules; gene representation.

DOI: 10.1504/IJBRA.2012.045975

International Journal of Bioinformatics Research and Applications, 2012 Vol.8 No.1/2, pp.38 - 53

Published online: 05 Dec 2014
