Title: Feature selection for the imbalanced QSAR problems by using EasyEnsemble

Authors: Tian-Yu Liu, Guo-Zheng Li, Jack Y. Yang, Mary Qu Yang

Addresses: School of Electric, Shanghai Dianji University, Shanghai 200040, China; Department of Control Science and Engineering, Tongji University, Shanghai, 201804, China; Harvard Medical School, Harvard University, Cambridge, Massachusetts, 02140-0888, USA; National Human Genome Research Institute, National Institutes of Health (NIH), US Department of Health and Human Services, Bethesda, MD, 20852, USA

Abstract: The activities of drug molecules can be predicted by Quantitative Structure Activity Relationship (QSAR) models, which avoid the high cost and long cycle of traditional experimental methods. Because drug molecules with positive activity are far fewer than those with negative activity, molecular activity prediction must take this class imbalance into account. Here we propose an embedded feature selection algorithm, Prediction Risk based feature selection for EasyEnsemble (PREE), to address this problem and improve the generalisation performance of the EasyEnsemble classifier. Experimental results on drug molecule data sets show that PREE performs better than asymmetric bagging and EasyEnsemble.
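The abstract names the ingredients of PREE but does not give the procedure, so the following is only an illustrative sketch of the general idea and not the authors' implementation. It assumes that each ensemble member is trained on all minority (active) samples plus an equally sized random subset of the majority (inactive) samples, and that the prediction risk of a feature is the increase in error when that feature is replaced by its mean. A nearest-centroid rule stands in for the base learner, balanced error is an assumed error measure, and all function names are hypothetical.

```python
import random

def centroid(rows):
    # Column-wise mean of a list of equal-length feature vectors.
    return [sum(col) / len(rows) for col in zip(*rows)]

def predict(model, x, features):
    # Nearest-centroid rule restricted to the currently active features.
    cp, cn = model
    dp = sum((x[j] - cp[j]) ** 2 for j in features)
    dn = sum((x[j] - cn[j]) ** 2 for j in features)
    return 1 if dp < dn else 0

def easy_ensemble(pos, neg, n_subsets, rng):
    # Each member sees every minority sample and an equally sized
    # random subset of the majority class (assumes len(neg) >= len(pos)).
    return [(centroid(pos), centroid(rng.sample(neg, len(pos))))
            for _ in range(n_subsets)]

def ensemble_predict(models, x, features):
    # Majority vote over the ensemble members.
    votes = sum(predict(m, x, features) for m in models)
    return 1 if 2 * votes > len(models) else 0

def balanced_error(models, X, y, features, replace=None):
    # Mean of the per-class error rates; `replace=(j, v)` overwrites
    # feature j with the value v in every sample before predicting.
    wrong = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for x, label in zip(X, y):
        if replace is not None:
            x = list(x)
            x[replace[0]] = replace[1]
        wrong[label] += ensemble_predict(models, x, features) != label
        total[label] += 1
    return (wrong[0] / total[0] + wrong[1] / total[1]) / 2

def prediction_risk(models, X, y, features):
    # Risk of feature j = error with j replaced by its mean - baseline error.
    base = balanced_error(models, X, y, features)
    risks = {}
    for j in features:
        mean_j = sum(x[j] for x in X) / len(X)
        risks[j] = balanced_error(models, X, y, features, (j, mean_j)) - base
    return risks

def select_features(pos, neg, n_keep, n_subsets=5, seed=0):
    # Backward elimination: repeatedly drop the lowest-risk feature.
    rng = random.Random(seed)
    X = pos + neg
    y = [1] * len(pos) + [0] * len(neg)
    features = list(range(len(pos[0])))
    while len(features) > n_keep:
        models = easy_ensemble(pos, neg, n_subsets, rng)
        risks = prediction_risk(models, X, y, features)
        features.remove(min(features, key=lambda j: risks[j]))
    return features
```

Calling `select_features(pos, neg, n_keep=k)` with lists of active and inactive descriptor vectors returns the indices of the `k` retained features. In this sketch the risk is estimated on the training data for brevity; a held-out or cross-validated estimate would be a more careful choice.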

Keywords: QSAR models; quantitative structure activity relationship; imbalanced problem; feature selection; EasyEnsemble classifier; drug molecules; molecular activities; prediction risk; ensemble learning.

DOI: 10.1504/IJCBDD.2008.022206

International Journal of Computational Biology and Drug Design, 2008 Vol.1 No.4, pp.334 - 346

Published online: 22 Dec 2008
