Title: Improving prediction accuracy of drug activities by utilising unlabelled instances with feature selection

Authors: Guo-Zheng Li, Jack Y. Yang, Wen-Cong Lu, Dan Li, Mary Qu Yang

Addresses: School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China; Harvard Medical School, Harvard University, Cambridge, Massachusetts 02140-0888, USA; Department of Chemistry, School of Science, Shanghai University, Shanghai 200444, China; School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China; National Human Genome Research Institute, National Institutes of Health (NIH), US Department of Health and Human Services, Bethesda, MD 20852, USA

Abstract: Molecular activities can be predicted by Quantitative Structure Activity Relationship (QSAR) models. Because experiments are costly, far fewer drug molecules have known activity than unknown activity, so predicting molecular activities by utilising unlabelled instances is an interesting problem. Here, Semi-Supervised Learning (SSL) is introduced, and an SSL method, Co-Training, is investigated for predicting drug activities utilising unlabelled instances. In addition, a novel algorithm called FESCOT is proposed, which applies feature selection to remove redundant and irrelevant features for Co-Training. Numerical experimental results show that Co-Training helps, and that feature selection further improves the prediction ability of Co-Training.
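The abstract does not specify how FESCOT works, so the following is only a minimal sketch of the general idea it describes: select features on the labelled data, then let two K-nearest-neighbour learners label confident unlabelled instances for each other. Everything here is an assumption made for illustration, not the authors' algorithm: the correlation-based filter, the split of the selected features into two views, the confidence threshold, the toy data, and the function names select_features, knn_predict, co_train and predict.

```python
import math
from collections import Counter


def select_features(labelled, n_keep):
    """Hypothetical filter: keep the n_keep features most correlated with the label."""
    ys = [y for _, y in labelled]
    my = sum(ys) / len(ys)
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))

    def score(j):
        xs = [f[j] for f, _ in labelled]
        mx = sum(xs) / len(xs)
        sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
        cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        # Constant features (sx == 0) carry no information and score 0.
        return abs(cov / (sx * sy)) if sx and sy else 0.0

    n_feat = len(labelled[0][0])
    return sorted(sorted(range(n_feat), key=score, reverse=True)[:n_keep])


def knn_predict(train, x, view, k=3):
    """Return (label, confidence) for x, using only the feature indices in view."""
    dists = sorted(
        (math.dist([f[i] for i in view], [x[i] for i in view]), y)
        for f, y in train
    )
    votes = Counter(y for _, y in dists[:k])
    label, count = votes.most_common(1)[0]
    return label, count / min(k, len(train))


def co_train(labelled, unlabelled, views, k=3, rounds=5, threshold=1.0):
    """Each learner pseudo-labels confident unlabelled points for the other learner."""
    train = [list(labelled), list(labelled)]
    pool = list(unlabelled)
    for _ in range(rounds):
        moved = False
        for a in (0, 1):
            b = 1 - a
            remaining = []
            for x in pool:
                label, conf = knn_predict(train[a], x, views[a], k)
                if conf >= threshold:
                    train[b].append((x, label))  # teach the other learner
                    moved = True
                else:
                    remaining.append(x)
            pool = remaining
        if not moved:
            break
    return train


def predict(train, views, x, k=3):
    """Return the label from whichever learner is more confident."""
    (l0, c0), (l1, c1) = (knn_predict(train[i], x, views[i], k) for i in (0, 1))
    return l0 if c0 >= c1 else l1


# Toy data (invented): feature 2 is constant and therefore irrelevant.
labelled = [
    ([0.0, 0.1, 5.0, 0.2], 0), ([0.2, 0.0, 5.0, 0.1], 0), ([0.1, 0.2, 5.0, 0.0], 0),
    ([1.0, 0.9, 5.0, 1.1], 1), ([0.9, 1.0, 5.0, 1.0], 1), ([1.1, 1.1, 5.0, 0.9], 1),
]
unlabelled = [
    [0.1, 0.1, 5.0, 0.1], [1.0, 1.0, 5.0, 1.0],
    [0.05, 0.15, 5.0, 0.2], [0.95, 1.05, 5.0, 0.9],
]

selected = select_features(labelled, 3)
views = (selected[0::2], selected[1::2])  # two disjoint feature subsets
train = co_train(labelled, unlabelled, views)
print(selected, predict(train, views, [0.0, 0.0, 5.0, 0.0]))
```

In this sketch the second learner ends up with more training instances than the labelled set alone provides, which is the sense in which unlabelled molecules contribute. A real QSAR study would likely use continuous activity values and a regression variant rather than the binary labels assumed here.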

Keywords: QSAR; semi-supervised learning; SSL; co-training; feature selection; K-nearest neighbour; KNN; molecular activities; quantitative structure activity relationship; drug molecules; prediction accuracy; drug activities.

DOI: 10.1504/IJCBDD.2008.018706

International Journal of Computational Biology and Drug Design, 2008 Vol.1 No.1, pp.1 - 13

Published online: 14 Jun 2008