Title: Development of predictive model of diabetic using supervised machine learning classification algorithm of ensemble voting
Authors: Debabrata Datta; Madhubrata Bhattacharya; S. Suman Rajest; T. Shynu; R. Regin; S. Silvia Priscila
Addresses: Department of Information Technology, Heritage Institute of Technology, AnandaPur, Kolkata-700107, India; Department of Physics, The Heritage College, Chowbaga Road, AnandaPur, Kolkata-700107, India; Department of Research and Development, Dhaanish Ahmed College of Engineering, Chennai-601301, Tamil Nadu, India; Department of Biomedical Engineering, Agni College of Technology, Chennai, Tamil Nadu, India; Department of Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai-89, Tamil Nadu, India; Department of Computer Science, Bharath Institute of Higher Education and Research (BIHER), Tamil Nadu, India
Abstract: Predicting the health status of patients with diabetes is an important task in the health sector because the medical history of the disease shows it to be a slow killer. If the collected data are sufficient, suitable, and noise-free, such health problems can be predicted accurately. AI-based machine learning algorithms can predict diabetes, but overfitting and underfitting impair the accuracy of machine learning classification models, and individual models are weak learners. Hence, there is a need to build a strong overall model by combining the weak learners to improve accuracy. Voting, which can be either soft or hard, produces a robust and accurate model. Our ensemble voting classifier integrates the ensemble machine learning models RF, AdaBoost, and Gboost with the LR, DT, and KNN models. This voting model predicts diabetes with more than 97% accuracy. Precision, recall, and F1 score are estimated for the LR, DT, and KNN models. We tested our proposed models on two input datasets with numerical and categorical features and found that categorical features improve prediction accuracy.
Keywords: diabetic; ensemble voting; classification; K-nearest neighbour; KNN; adaptive boosting; AdaBoost; random forest; RF; logistic regression; LR; decision tree; DT; gradient boosting; Gboost.
DOI: 10.1504/IJBRA.2023.133695
International Journal of Bioinformatics Research and Applications, 2023 Vol.19 No.3, pp.151 - 169
Received: 28 Feb 2023
Accepted: 10 May 2023
Published online: 29 Sep 2023
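
The abstract distinguishes hard voting from soft voting without defining them. The following is a minimal sketch of the two rules in plain Python, not the authors' implementation: hard voting takes the majority of the predicted class labels, while soft voting averages the predicted class probabilities and picks the class with the highest mean. The function names and the three-model toy outputs are hypothetical and chosen only for illustration.

```python
from collections import Counter


def hard_vote(predicted_labels):
    """Hard voting: return the class label predicted by most base models."""
    counts = Counter(predicted_labels)
    return counts.most_common(1)[0][0]


def soft_vote(predicted_probabilities):
    """Soft voting: average per-class probabilities, return the best class index."""
    n_models = len(predicted_probabilities)
    n_classes = len(predicted_probabilities[0])
    mean_probs = [
        sum(probs[c] for probs in predicted_probabilities) / n_models
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: mean_probs[c])


# Hypothetical outputs of three base models for one patient:
# each row is [P(non-diabetic), P(diabetic)]. These numbers are made up.
toy_probabilities = [
    [0.45, 0.55],
    [0.45, 0.55],
    [0.90, 0.10],
]

# Each model's hard label is the index of its largest probability.
toy_labels = [max(range(len(p)), key=lambda c: p[c]) for p in toy_probabilities]

hard_result = hard_vote(toy_labels)          # two of three models say class 1
soft_result = soft_vote(toy_probabilities)   # mean probabilities favour class 0
```

In this toy case the two rules disagree: two weakly confident models outvote one strongly confident model under hard voting, whereas soft voting lets the confident model dominate. Which rule the paper's reported accuracy relies on is not stated in the abstract.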