Title: Optimal feature subset selection in high dimensional data clustering

Authors: Kasturi Chandrahaasan Sharmili; Arul Gnanaprakasaam Chilambuchelvan

Addresses: Department of MCA, RMK Engineering College, Chennai, India; Department of EIE, RMD Engineering College, RSM Nagar, Thiruvallur – 601206, India

Abstract: Feature subset selection is the process of identifying and removing irrelevant and redundant features. In the proposed technique, an input microarray dataset is first selected from a medical database and preprocessed. The preprocessed output is passed to a second step, in which features are optimally selected using a clustering and tree generation process: a modified kernel-based fuzzy c-means clustering algorithm, combined with an optimal minimum spanning tree algorithm, is applied to the high dimensional dataset to select the important features, and the optimal features are chosen by means of binary cuckoo search. Classification is then performed with a neuro-fuzzy classifier. Finally, experiments are carried out on different microarray datasets. The results show that the neuro-fuzzy classifier outperformed the existing approaches, attaining a maximum accuracy of 89% on the GLA-BRA-180 dataset, compared with 68.2% for the existing NN, 63.1% for the fuzzy classifier and 67.3% for the KNN classifier.
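
As an illustration of the clustering step named in the abstract, the following is a minimal sketch of the standard kernel-based fuzzy c-means update with a Gaussian kernel. It is not the authors' modified algorithm, whose details are not given here. The function name `kernel_fcm`, the parameters `m`, `sigma`, `n_iter`, `tol` and `seed`, and the choice to initialise centres from randomly chosen samples are all assumptions made for this example.

```python
import numpy as np

def kernel_fcm(X, n_clusters=2, m=2.0, sigma=1.0, n_iter=100, tol=1e-6, seed=0):
    """Standard Gaussian-kernel fuzzy c-means (illustrative sketch only).

    Returns memberships U with shape (n_clusters, n_samples) and
    centres with shape (n_clusters, n_features).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Assumption: initialise centres from randomly chosen distinct samples.
    centres = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distances, shape (n_clusters, n_samples).
        d2 = ((centres[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        K = np.exp(-d2 / sigma**2)  # Gaussian kernel values
        # Kernel-induced distance 1 - K, clipped to avoid division by zero.
        dist = np.clip(1.0 - K, 1e-12, None)
        inv = dist ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)  # each column sums to 1
        # Centre update: weights combine fuzzy membership and kernel value.
        W = (U ** m) * K
        new_centres = (W @ X) / W.sum(axis=1, keepdims=True)
        shift = np.abs(new_centres - centres).max()
        centres = new_centres
        if shift < tol:
            break
    return U, centres
```

In a feature selection setting such as the one described above, the rows of `X` could hold features (one row per gene, one column per sample) so that similar features fall into the same cluster. That arrangement is likewise an assumption of this sketch.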

Keywords: neurofuzzy classifiers; microarrays; modified kernel-based FCM; fuzzy c-means clustering; minimum spanning tree; binary cuckoo search; optimisation; feature subsets; feature selection; subset selection; high dimensional data clustering; medical data; neural networks; fuzzy logic.

DOI: 10.1504/IJBIDM.2016.081866

International Journal of Business Intelligence and Data Mining, 2016 Vol.11 No.3, pp.242 - 263

Received: 02 Aug 2016
Accepted: 14 Sep 2016

Published online: 29 Jan 2017
