Title: An experimental approach of applying boruta and elastic net for variable selection in classifying breast cancer datasets

Authors: C.P. Sumathi; M.S. Padmavathi

Addresses: SDNB Vaishnav College for Women, Chennai, 600-044, Tamil Nadu, India; SDNB Vaishnav College for Women, Chennai, 600-044, Tamil Nadu, India

Abstract: Feature selection identifies the key variables involved in predicting an outcome. In this study, we propose boruta and elastic net (Enet) feature selection for classifying breast cancer datasets. Boruta and Enet are compared with genetic algorithm (GA) and consistency-based subset feature selection; Enet selected the best features for the Wisconsin diagnostic breast cancer (WDBC) and breast cancer datasets. To assess the stability of Enet feature selection, the variable importance of machine learning algorithms such as naive Bayes (NB), multilayer perceptron (MLP) and random forest (RF) is evaluated and compared. The features obtained by Enet are shown to contain all the common variables selected by the tested machine learning algorithms. The proposed Enet feature selection combined with MLP classification yields a better receiver operating characteristic (ROC) value (0.990 and 0.687) and a lower root mean squared error (RMSE) (0.159 and 0.429) for the WDBC and breast cancer datasets, respectively, than naive Bayes and RF.
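
To illustrate how an elastic net penalty can act as a feature selector, the following is a minimal sketch and not the authors' implementation, which the abstract does not describe. It assumes a squared-error loss with the common penalty form lam * (alpha * L1 + (1 - alpha) / 2 * L2) solved by coordinate descent; the function names, the parameters lam and alpha, and the four-row toy data are all hypothetical and are not the WDBC data.

```python
def soft_threshold(rho, t):
    """Shrink rho toward zero by t; anything within [-t, t] becomes exactly 0."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def elastic_net(X, y, lam=0.2, alpha=0.5, n_iter=100):
    """Coordinate descent for
    (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||^2).
    No intercept is fitted (an assumption made to keep the sketch short)."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                others = sum(X[i][k] * coef[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            # The L1 part zeroes weak features; the L2 part shrinks the rest.
            coef[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
    return coef


def selected_features(coef, tol=1e-8):
    """Indices of features whose coefficient survived the penalty."""
    return [j for j, c in enumerate(coef) if abs(c) > tol]


# Hypothetical toy data: feature 0 tracks the response, features 1 and 2 barely do.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [1.0, 0.9, 0.0, 0.1]

coef = elastic_net(X, y)
selected = selected_features(coef)
print(selected)
```

Features whose coefficients are driven exactly to zero are dropped, and the surviving subset would then be passed to a classifier such as an MLP. In practice an established implementation with a classification loss and a tuned penalty would be the more likely choice.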

Keywords: boruta; breast cancer; consistency-based; elastic net; multilayer perceptron; MLP; genetic algorithm; naive Bayes; variable importance; random forest.

DOI: 10.1504/IJKEDM.2019.105265

International Journal of Knowledge Engineering and Data Mining, 2019 Vol.6 No.4, pp.356 - 375

Received: 28 Apr 2019
Accepted: 04 Aug 2019

Published online: 22 Feb 2020
