Title: Mapping genomic features to functional traits through microbial whole genome sequences

Authors: Wei Zhang; Erliang Zeng; Dan Liu; Stuart E. Jones; Scott Emrich

Addresses: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA ' Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA; Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA ' Department of Biological Science, University of Notre Dame, Notre Dame, IN 46556, USA ' Department of Biological Science, University of Notre Dame, Notre Dame, IN 46556, USA ' Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA; Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA

Abstract: The utility of trait-based approaches for microbial communities has recently been recognised. The increasing availability of whole genome sequences provides the opportunity to explore the genetic foundations of a variety of functional traits. We proposed a machine learning framework to quantitatively link genomic features with functional traits. Genes from bacterial genomes associated with different functional traits were grouped into Clusters of Orthologous Groups (COGs), which were used as features. The TF-IDF technique from the text mining domain was then applied to transform the data so that it reflects both the abundance and the importance of each COG. After TF-IDF processing, COGs were ranked using feature selection methods to identify their relevance to the functional trait of interest. Extensive experimental results demonstrated that genes related to a functional trait can be detected using our method. Further, the method has the potential to provide novel biological insights.
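
The pipeline described in the abstract (COG counts per genome, TF-IDF weighting, then ranking COGs against a trait) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the COG identifiers and counts are invented toy data, the TF-IDF variant (relative frequency times log inverse genome frequency) is one common formulation, and the ranking score (absolute difference in mean weight between trait-positive and trait-negative genomes) is a simple stand-in, since the abstract does not name the feature selection methods used.

```python
import math

def tf_idf(counts):
    """Weight COG counts by TF-IDF.

    counts: list of dicts, one per genome, mapping COG id -> gene count.
    Returns a list of dicts mapping COG id -> TF-IDF weight.
    """
    n_genomes = len(counts)
    # Genome frequency: in how many genomes does each COG occur at least once?
    df = {}
    for genome in counts:
        for cog, c in genome.items():
            if c > 0:
                df[cog] = df.get(cog, 0) + 1
    weighted = []
    for genome in counts:
        total = sum(genome.values())
        row = {}
        for cog, c in genome.items():
            if c > 0:
                tf = c / total                       # abundance within this genome
                idf = math.log(n_genomes / df[cog])  # 0 when the COG is in every genome
                row[cog] = tf * idf
        weighted.append(row)
    return weighted

def rank_cogs(weighted, labels):
    """Rank COGs by |mean weight in trait-positive - mean weight in trait-negative|.

    Hypothetical scoring rule chosen for illustration; assumes both classes
    contain at least one genome. Ties are broken alphabetically.
    """
    pos = [w for w, y in zip(weighted, labels) if y]
    neg = [w for w, y in zip(weighted, labels) if not y]
    cogs = set().union(*weighted)

    def mean(rows, cog):
        return sum(r.get(cog, 0.0) for r in rows) / len(rows)

    scores = {cog: abs(mean(pos, cog) - mean(neg, cog)) for cog in cogs}
    return sorted(scores, key=lambda c: (-scores[c], c))

# Toy data (invented): two genomes with the trait, two without.
genomes = [
    {"COG_A": 3, "COG_B": 1, "COG_C": 2},
    {"COG_A": 2, "COG_B": 1, "COG_C": 1},
    {"COG_B": 2, "COG_C": 1},
    {"COG_B": 1, "COG_C": 3},
]
labels = [True, True, False, False]

weighted = tf_idf(genomes)
ranking = rank_cogs(weighted, labels)
print(ranking)  # COG_A ranks first: it occurs only in trait-positive genomes
```

In this toy set, COG_B and COG_C occur in every genome, so their inverse genome frequency is zero and they carry no weight; only COG_A, which is restricted to the trait-positive genomes, separates the two groups.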

Keywords: functional traits; genomic signatures; sporulation; feature selection; machine learning; phenotype-genotype association; microbial diversity; functional genomics; feature mapping; genomic features; genome sequences; bioinformatics; genes; bacteria genomes.

DOI: 10.1504/IJBRA.2014.062995

International Journal of Bioinformatics Research and Applications, 2014 Vol.10 No.4/5, pp.461 - 478

Published online: 24 Oct 2014
