Title: Supervised learning approaches and feature selection - a case study in diabetes

Authors: Yugowati Praharsi; Shaou-Gang Miaou; Hui-Ming Wee

Addresses: Department of Industrial and System Engineering, Chung Yuan Christian University, Chung Li, 32023, Taiwan; Department of Information Technology, Satya Wacana Christian University, Salatiga, 50711, Indonesia | Department of Electronic Engineering, Chung Yuan Christian University, Chung Li, 32023, Taiwan | Department of Industrial and System Engineering, Chung Yuan Christian University, No. 200, Chung Pei Rd., Chungli, 32023, Taiwan

Abstract: Data description and classification are important tasks in supervised learning. In this study, three supervised learning methods, namely k-nearest neighbour (k-NN), support vector data description (SVDD) and support vector machine (SVM), are considered because they do not suffer from the problem of introducing a new class. The data set chosen is the Pima Indians diabetes data set. The results show that feature selection based on mean information gain and a standard deviation threshold can be considered as a substitute for forward selection. This indicates that data variation, measured using information gain, is an important factor that must be considered when selecting a feature subset. Finally, among the eight candidate features, glucose level is the most prominent feature for diabetes detection across all classifiers and feature selection methods under consideration. Relevancy measurement based on information gain can rank the features from the most important to the least significant. This ranking can be very useful in medical applications such as defining feature prioritisation for symptom recognition.
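The thresholding idea mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact selection rule, so the cutoff used here (mean gain plus k standard deviations) is an assumption, as are the function names, the parameter k, and the toy data, which are synthetic binary columns and not the Pima Indians diabetes data.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Gain of a discrete feature: H(Y) minus the weighted entropy of Y within each feature value."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(columns, labels, k=0.0):
    """Rank features by gain and keep those at or above mean + k * std.

    The cutoff rule is a hypothetical stand-in; the paper's actual rule
    is not given in the abstract.
    """
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    cutoff = mean(gains.values()) + k * pstdev(gains.values())
    ranked = sorted(gains, key=gains.get, reverse=True)
    return [name for name in ranked if gains[name] >= cutoff], gains


# Synthetic toy data (illustrative only): 8 samples, binary class label.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "glucose_high": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly aligned with the label
    "feature_b": [1, 1, 0, 0, 1, 1, 0, 0],     # independent of the label
    "feature_c": [1, 1, 1, 0, 1, 0, 0, 0],     # weakly related to the label
}

selected, gains = select_features(columns, labels, k=0.0)
print(selected)
print({name: round(g, 3) for name, g in gains.items()})
```

Lowering k loosens the cutoff and keeps more features, while the sorted gains give the kind of most-to-least-important ordering the abstract describes. Continuous features would need to be discretised before this gain computation applies.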

Keywords: supervised learning; k-nearest neighbour; k-NN; support vector data description; SVDD; support vector machines; SVM; classification; feature selection; glucose level; diabetes detection; feature prioritisation; symptom recognition.

DOI: 10.1504/IJDATS.2013.055346

International Journal of Data Analysis Techniques and Strategies, 2013 Vol.5 No.3, pp.323 - 337

Published online: 28 Feb 2014
