Title: Feature selection for gene function prediction using multi-labelled lazy learning

Authors: Yu-Hai Liu, Guo-Zheng Li, Hong-Yu Zhang, Mary Qu Yang, Jack Y. Yang

Addresses: Plexus Team, Qingdao R&D Center, Alcatel-Lucent Technologies, No. 169 Songling Road, Qingdao 266101, China; Department of Control Science and Engineering, Tongji University, Shanghai 201804, China; Department of Electronic, Ocean University of China, Qingdao 266100, China; National Human Genome Research Institute, National Institutes of Health (NIH), US Department of Health and Human Services, Bethesda, MD 20852, USA; Harvard Medical School, Harvard University, Cambridge, Massachusetts 02140 0888, USA

Abstract: In multi-label learning, each training instance is associated with a set of labels, and the task is to predict, for each unseen instance, a label set whose size is not known a priori. In this paper, we propose a feature selection method for multi-label learning based on mutual information. Specifically, we use the distribution of mutual information to select features in multi-label problems. Our experiments were performed with a multi-label lazy learning approach named ML-kNN, which is derived from the traditional k-Nearest Neighbour (kNN) algorithm. Experimental results on a real-world multi-label bioinformatics dataset show that ML-kNN with feature selection greatly outperforms the original ML-kNN algorithm.
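
The abstract does not spell out how the mutual information distribution is turned into a selection rule, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes discrete feature values and 0/1 label indicators, scores each feature by its empirical mutual information summed over all labels (the summation is an assumption), and keeps the top k features. The function names `mutual_information`, `rank_features`, and `select_top_k` are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def rank_features(X, Y):
    """Score each feature by its mutual information summed over all labels.

    X: list of rows of discrete feature values.
    Y: list of rows of 0/1 label indicators (one column per label).
    Summing over labels is an assumption made for this sketch.
    """
    n_features = len(X[0])
    n_labels = len(Y[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        score = sum(
            mutual_information(column, [row[l] for row in Y])
            for l in range(n_labels)
        )
        scores.append(score)
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


def select_top_k(X, Y, k):
    """Return the indices of the k highest-scoring features and the reduced data."""
    order, _ = rank_features(X, Y)
    keep = sorted(order[:k])
    return keep, [[row[j] for j in keep] for row in X]


# Toy data: feature 0 mirrors label 0, feature 1 mirrors label 1,
# and feature 2 is constant, so it carries no information about any label.
X = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]
keep, X_reduced = select_top_k(X, Y, k=2)
```

In a pipeline of the kind the abstract describes, the reduced feature matrix would then be passed to a multi-label classifier such as ML-kNN; that step is not shown here.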

Keywords: multi-label classification; feature selection; ML-kNN; gene function prediction; multi-label learning; lazy learning; k-nearest neighbour; multi-label bioinformatics.

DOI: 10.1504/IJFIPM.2008.021388

International Journal of Functional Informatics and Personalised Medicine, 2008 Vol.1 No.3, pp.223 - 233

Published online: 22 Nov 2008
