Title: Searching for multiple equivalent predictors from oral squamous cell carcinoma dataset using statistically equivalent signature algorithm

Authors: Utkarsh Mahadeo Khaire; R. Dhanalakshmi

Addresses: National Institute of Technology Nagaland, Dimapur, Nagaland 797103, India; National Institute of Technology Puducherry, Karaikal, 609609, India

Abstract: Selecting highly predictive features from a high-dimensional dataset is a formidable task. Existing feature selection algorithms do not handle multiple equally predictive subsets of features. We believe that other subsets of features exist that can give predictive accuracy equivalent to that of state-of-the-art algorithms. Statistically equivalent signature (SES) is one feature selection algorithm that addresses this; it is centred on constraint-based learning of Bayesian networks. The proposed model uses SES to select equivalent subsets of features from an oral squamous cell carcinoma (OSCC) dataset. To validate the output of the SES algorithm, we apply K-nearest neighbour (KNN), support vector machine (SVM) and neural network (NN) classifiers to each subset of predictive features. Finally, the results of the proposed technique are compared with support vector machine-recursive feature elimination (SVM-RFE). SES produces more stable accuracy than SVM-RFE.
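
The validation step described in the abstract (scoring each candidate feature subset with a classifier and comparing how much the accuracy varies across subsets) can be illustrated with a minimal sketch. This is not the SES algorithm and not the authors' pipeline: the toy data, the function names (`knn_predict`, `loo_accuracy`, `make_toy_data`) and the subset labels are all hypothetical, and only a plain KNN classifier with leave-one-out evaluation is shown.

```python
import random
import statistics


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x (squared Euclidean distance)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out KNN accuracy using only the given feature columns."""
    Xs = [[row[j] for j in features] for row in X]
    correct = 0
    for i in range(len(Xs)):
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, Xs[i], k) == y[i]:
            correct += 1
    return correct / len(Xs)


def make_toy_data(n=40, seed=0):
    """Hypothetical data: columns 0 and 1 carry the same signal, columns 2 and 3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        signal = 4.0 * label + rng.gauss(0, 0.5)
        X.append([signal, signal + rng.gauss(0, 0.1), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


X, y = make_toy_data()

# Two interchangeable "signatures" plus a noise-only subset for contrast (assumed labels).
subsets = {"signature_A": [0], "signature_B": [1], "noise_only": [2]}
accuracies = {name: loo_accuracy(X, y, cols) for name, cols in subsets.items()}

# Stability here means a small spread of accuracy across the equivalent subsets.
spread = statistics.pstdev([accuracies["signature_A"], accuracies["signature_B"]])

for name, acc in accuracies.items():
    print(f"{name}: {acc:.2f}")
print(f"spread across equivalent signatures: {spread:.3f}")
```

In this toy setting the two signature subsets are near-copies of one another, so they should score about the same. The same loop could be repeated with SVM or neural network classifiers in place of KNN.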

Keywords: high dimensional dataset; statistically equivalent signature; SES; oral squamous cell carcinoma; OSCC; microarray; K-nearest neighbour; KNN; support vector machine; SVM; neural networks.

DOI: 10.1504/IJMOR.2020.109052

International Journal of Mathematics in Operational Research, 2020 Vol.17 No.1, pp.78 - 89

Received: 14 Feb 2019
Accepted: 22 May 2019

Published online: 17 Aug 2020
