Title: Genomic law guided gene prediction in fungi and metazoans

Authors: Yaping Fang; Jun Li

Addresses: Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73019, USA (both authors)

Abstract: Computational prediction of protein-coding genes is a fundamental step in genome annotation. However, accurately predicting eukaryotic genes in silico remains a challenge. By surveying the model genomes, we found that the Spearman's rank correlation coefficient between the number of experimentally verified genes and genome size was 0.96 for all eukaryotes except plants, indicating that the relationship between genome size and the number of coding genes can be expressed as a monotonic function. Regression analysis showed that the total number of protein-coding genes follows a logarithmic function of genome size. We integrated this equation into ab initio gene prediction software to guide gene prediction by constraining the total number of predicted genes. We evaluated the software on three eukaryotic genomes. Results showed that >90% of false positive predictions were removed while >80% of true positives were retained, resulting in much higher specificity.
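
The idea in the abstract can be illustrated with a minimal sketch, written under assumptions the abstract does not confirm. It assumes the logarithmic equation has the form gene_count = a * ln(genome_size) + b, and that the constraint is applied by keeping only the highest-scoring predictions up to the expected count. The function names, the (gene_id, score) tuple format, and the ranking step are all hypothetical, and no coefficients from the paper are reproduced here.

```python
import math


def fit_log_model(genome_sizes, gene_counts):
    """Ordinary least-squares fit of gene_count = a * ln(genome_size) + b.

    The functional form is an assumption; the abstract only says the
    relationship is logarithmic.
    """
    xs = [math.log(size) for size in genome_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(gene_counts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, gene_counts))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def expected_gene_count(genome_size, a, b):
    """Expected number of genes for a genome of the given size."""
    return max(0, round(a * math.log(genome_size) + b))


def constrain_predictions(predictions, max_genes):
    """Keep the max_genes highest-scoring predictions.

    predictions is a list of (gene_id, score) tuples (hypothetical format).
    Ranking by score is an assumed way of enforcing the gene-count limit.
    """
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    return ranked[:max_genes]
```

To use it, one would fit `a` and `b` on genomes with well-curated annotations, compute `expected_gene_count` for the target genome, and pass that number as `max_genes` when filtering the raw ab initio predictions.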

Keywords: gene prediction; gene count; gene structure; genome size; eukaryotic genome annotation; fungi; metazoans; protein coding; eukaryotic genes.

DOI: 10.1504/IJCBDD.2013.052197

International Journal of Computational Biology and Drug Design, 2013 Vol.6 No.1/2, pp.157 - 169

Received: 12 Apr 2012
Accepted: 19 Jul 2012

Published online: 18 Sep 2014
