Title: Using the two-population genetic algorithm with distance-based k-nearest neighbour voting classifier for high-dimensional data

Authors: Chien-Pang Lee; Wen-Shin Lin

Addresses: Department of Maritime Information and Technology, National Kaohsiung Marine University, No.482, Zhongzhou 3rd Rd., Qijin Dist., Kaohsiung City 805, Taiwan; Department of Plant Industry, National Pingtung University of Science and Technology, No.1, Shuefu Road, Neipu, Pingtung 912, Taiwan

Abstract: Owing to developments in computer technology, high-dimensional data has become a popular research topic. However, traditional statistical methods do not perform well when the number of variables (p) is greater than the sample size (n). Accordingly, this paper proposes a novel hybrid model that combines statistical methodology with data mining techniques for the classification of high-dimensional data. In the proposed model, Fisher's least significant difference test is first used for initial dimension reduction. Subsequently, a two-population genetic algorithm and a non-parametric statistical classification method (the distance-based k-nearest neighbour voting classifier) are used to evaluate and rank the importance of the variables. Furthermore, the relevant variables for classification are evaluated with an outlier detection method. Eight public gene expression datasets are used to compare the performance of the proposed model with existing methods. The experimental results indicate that the proposed model achieves higher classification accuracy than the existing methods.
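
As a rough illustration of what a distance-based k-nearest neighbour voting rule can look like, the sketch below lets each of the k nearest training samples vote for its class with a weight that shrinks as its distance to the query grows. The abstract does not give the exact voting formula, so the inverse Euclidean distance weighting, the function name `distance_weighted_knn`, and the parameters `k` and `eps` are assumptions made for this example only and may differ from the authors' classifier.

```python
import math
from collections import defaultdict


def distance_weighted_knn(train_X, train_y, query, k=3, eps=1e-12):
    """Hypothetical sketch: classify `query` by inverse-distance-weighted
    voting among its k nearest training samples."""
    # Euclidean distance from the query to every training sample.
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest samples.
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]
    # Each neighbour votes with weight 1/distance (assumed weighting scheme);
    # eps avoids division by zero when the query coincides with a sample.
    votes = defaultdict(float)
    for d, label in nearest:
        votes[label] += 1.0 / (d + eps)
    # Return the class with the largest total weight.
    return max(votes, key=votes.get)


# One very close "A" sample outweighs two distant "B" samples,
# although a plain majority vote with k=3 would pick "B".
train_X = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
train_y = ["A", "B", "B"]
predicted = distance_weighted_knn(train_X, train_y, (0.1, 0.1), k=3)
```

In this toy case the weighted vote favours the single nearby sample, which is the behaviour that distinguishes a distance-based vote from an unweighted one.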

Keywords: genetic algorithms; k-nearest neighbour; Fisher's least significant difference; outlier detection; high-dimensional data; gene expression data; data classification; bioinformatics; data mining.

DOI: 10.1504/IJDMB.2016.075820

International Journal of Data Mining and Bioinformatics, 2016 Vol.14 No.4, pp.315 - 331

Accepted: 13 Nov 2015
Published online: 06 Apr 2016
