Authors: Anita Bai; Anima Pradhan
Addresses: Department of Computer Science and Engineering, NIT Rourkela, 769008, India; Department of Computer Science and Engineering, NIT Rourkela, 769008, India
Abstract: DNA microarray data consist of a very large number of features and a small number of samples. In this paper, we address the dimension reduction of DNA features, in which relevant features are extracted from among thousands of irrelevant ones. This enhances the speed and accuracy of the classifiers. Principal component analysis (PCA), a powerful statistical technique, is used for this purpose: it projects the original I-dimensional space onto an I0-dimensional linear subspace, where I > I0, such that the variance in the data is maximally explained within the smaller I0-dimensional space, which addresses the curse of dimensionality. Neural networks (NN) and support vector machines (SVM) are implemented, and their performance is measured and compared in terms of predictive accuracy, specificity and sensitivity. In our first contribution, we implement PCA for significant feature extraction and then implement a feed-forward neural network (FFNN) trained using back propagation (BP) and an SVM on the reduced feature set. In the second part, we attempt to validate our results on three public data sets, viz., leukaemia, ovarian and colon cancer data.
Keywords: cancer classification; feature extraction; principal component analysis; PCA; neural networks; support vector machines; SVM; microarray cancer data; DNA features; leukaemia; ovarian cancer; colon cancer.
International Journal of Computational Intelligence Studies, 2014 Vol.3 No.4, pp.339 - 355
Received: 18 May 2013
Accepted: 16 Nov 2013
Published online: 19 Jan 2015
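
The abstract describes projecting an I-dimensional feature space onto an I0-dimensional subspace that retains maximal variance, and evaluating classifiers by accuracy, sensitivity and specificity. The following is a minimal sketch of those two steps, not the authors' implementation. It assumes NumPy is available, and the function names `pca_project` and `binary_metrics`, the use of SVD to obtain the components, and the 0/1 label encoding are all illustrative choices that do not come from the paper.

```python
import numpy as np


def pca_project(X, n_components):
    """Project samples (rows of X) onto the top principal components.

    X has shape (n_samples, I); the result has shape (n_samples, I0),
    where I0 = n_components. Hypothetical helper, not from the paper.
    """
    X = np.asarray(X, dtype=float)
    # Centre each feature so the components describe variance, not the mean.
    X_centred = X - X.mean(axis=0)
    # Thin SVD: rows of Vt are orthonormal principal directions,
    # ordered by decreasing singular value (hence decreasing variance).
    _, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    components = Vt[:n_components]
    scores = X_centred @ components.T
    # Fraction of total variance explained by each retained component.
    explained_ratio = (S ** 2 / np.sum(S ** 2))[:n_components]
    return scores, components, explained_ratio


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for labels encoded as 0/1
    (1 = positive class; this encoding is an assumption)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Undefined when a class is absent, so report NaN in that case.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


if __name__ == "__main__":
    # Synthetic stand-in for microarray data: few samples, many features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 500))
    scores, _, ratio = pca_project(X, n_components=5)
    print(scores.shape, ratio.sum())
```

The reduced matrix `scores` would then be passed to a classifier such as an FFNN or SVM in place of the original features. The data in the usage example is random and stands in for a real microarray data set only to show the shapes involved.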