Title: Bloat free genetic programming: application to human oral bioavailability prediction

Authors: Sara Silva; Leonardo Vanneschi

Addresses: INESC-ID Lisboa, IST/Technical University of Lisbon, Portugal; CISUC, University of Coimbra, Portugal ' ISEGI, Universidade Nova de Lisboa, Lisbon, Portugal; DISCo, University of Milano-Bicocca, Milan, Italy

Abstract: Being able to predict the human oral bioavailability of a potential new drug is extremely important for the drug discovery process. This problem has been addressed by several prediction tools, with Genetic Programming providing some of the best results ever achieved. In this paper, we use the newest developments of Genetic Programming, in particular the latest bloat control method, Operator Equalisation, to find out how much improvement we can achieve on this problem. We show examples of some actual solutions and discuss their quality, comparing them with previously published results. We identify some unexpected behaviours related to overfitting, and discuss ways to further improve the practical usage of the Genetic Programming approach.

Keywords: genetic programming; bloat control; code growth; operator equalisation; data mining; drug discovery; human oral bioavailability; prediction; symbolic regression; overfitting; solution length; feature selection; new drugs.

DOI: 10.1504/IJDMB.2012.050266

International Journal of Data Mining and Bioinformatics, 2012, Vol. 6, No. 6, pp. 585-601

Published online: 17 Dec 2014
