Title: Data dimensionality reduction with application to improving classification performance and explaining concepts of data sets

Authors: Xiuju Fu, Lipo Wang

Addresses: Institute of High Performance Computing, Science Park 2, 117528, Singapore; School of Electrical and Electronic Engineering, Nanyang Technological University, Block S1, Nanyang Avenue, 639798, Singapore

Abstract: Data dimensionality reduction is usually carried out before patterns are input to classifiers. Good results in data mining depend on selecting relevant data. Many data sets include irrelevant or redundant attributes, which interfere with knowledge discovery. In this paper, we propose a rule-extraction method based on a novel separability-correlation measure (SCM) that ranks the importance of attributes. Based on this ranking, the attribute subsets that lead to the best classification results are selected and used as inputs to a classifier, which in our paper is an RBF neural network. The complexity of the classifier can thus be reduced and its classification performance improved. Our method uses the classification results obtained with the reduced attribute sets to extract rules. Computer simulations show that our method leads to smaller rule sets with higher accuracies than other methods.
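The rank-then-select pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not define the SCM or the RBF network, so this example swaps in two stand-ins, both assumptions: a Fisher-style ratio of between-class to within-class variance as the attribute score, and a nearest-centroid classifier in place of the RBF neural network. All function names and the toy data are hypothetical, and training accuracy is used only to keep the sketch short; held-out data would normally be used.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Stand-in relevance score (NOT the paper's SCM):
    between-class variance divided by within-class variance."""
    overall = mean(values)
    between = within = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    return between / within if within > 0 else float("inf")


def rank_attributes(X, y):
    """Return attribute indices ordered from most to least relevant."""
    n_attrs = len(X[0])
    scores = [fisher_score([row[k] for row in X], y) for k in range(n_attrs)]
    return sorted(range(n_attrs), key=lambda k: scores[k], reverse=True)


def nearest_centroid_accuracy(X, y, attrs):
    """Training accuracy of a nearest-centroid classifier
    (stand-in for the RBF network) restricted to `attrs`."""
    centroids = {}
    for c in set(y):
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [mean(r[k] for r in rows) for k in attrs]

    def predict(row):
        return min(
            centroids,
            key=lambda c: sum((row[k] - m) ** 2 for k, m in zip(attrs, centroids[c])),
        )

    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def select_subset(X, y):
    """Try the top-1, top-2, ... ranked attributes and keep the smallest
    subset that reaches the best accuracy."""
    ranking = rank_attributes(X, y)
    best_attrs, best_acc = ranking[:1], -1.0
    for size in range(1, len(ranking) + 1):
        attrs = ranking[:size]
        acc = nearest_centroid_accuracy(X, y, attrs)
        if acc > best_acc:  # strict: ties keep the smaller subset
            best_attrs, best_acc = attrs, acc
    return best_attrs, best_acc


# Toy data (hypothetical): attribute 0 separates the classes, attribute 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.0, 3.0], [0.9, 4.8], [1.0, 1.2], [1.1, 3.1]]
y = [0, 0, 0, 1, 1, 1]

ranking = rank_attributes(X, y)
selected, accuracy = select_subset(X, y)
print(ranking, selected, accuracy)
```

On this toy data the noisy attribute is ranked last and dropped, which mirrors the abstract's point that a reduced attribute set can simplify the classifier without hurting accuracy.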

Keywords: radial basis function; rule extraction; classification performance; RBF neural networks; data mining; knowledge discovery; data dimensionality reduction; data sets; classifiers; separability-correlation measure; attributes ranking; simulation; feature selection.

DOI: 10.1504/IJBIDM.2005.007319

International Journal of Business Intelligence and Data Mining, 2005 Vol.1 No.1, pp.65 - 87

Published online: 05 Jul 2005
