Title: Fine-tuning predictive models: a comprehensive analysis for accurate diabetes risk stratification
Authors: Nuzhat Ahmad Yatoo; I. Sathik Ali
Addresses: Department of Computer Applications, B.S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India; Department of Information Technology, B.S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India
Abstract: Diabetes is a major global health concern because it causes serious complications such as kidney disease, heart problems, and eyesight loss. In pursuit of accurate disease diagnosis, machine learning (ML) methods have been employed with favourable outcomes. This study introduces a diabetes prediction model built on a comparison of several ML techniques, namely logistic regression, K-nearest neighbour, naive Bayes, decision tree, and CatBoost, applied to a diabetes database in order to improve on existing systems for disease prediction. The study identifies the best-performing technique for diabetes prediction using performance metrics such as accuracy, recall, precision, F1 score, Matthews correlation coefficient (MCC), Cohen's kappa, index of agreement and area under the curve (AUC). Each technique is subjected to hyperparameter tuning to optimise its results. The metric values obtained with the proposed methodology establish CatBoost as the best-performing model and, hence, the most viable for diabetes prediction.
Keywords: diabetes; machine learning; feature selection; SMOTE Tomek; hyperparameter optimisation; prediction.
DOI: 10.1504/IJBRA.2025.146351
International Journal of Bioinformatics Research and Applications, 2025 Vol.21 No.3, pp.256 - 283
Received: 16 Jan 2024
Accepted: 18 Jun 2024
Published online: 23 May 2025
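
Illustrative note: several of the metrics named in the abstract (accuracy, precision, recall, F1 score, MCC and Cohen's kappa) can be computed directly from the four cells of a binary confusion matrix. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper. The index of agreement and AUC are left out because they require per-sample predictions or scores rather than counts alone.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp, fp, fn, tn are the numbers of true positives, false positives,
    false negatives and true negatives (hypothetical inputs for illustration).
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n

    # Precision and recall fall back to 0.0 when their denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Matthews correlation coefficient: correlation between true and predicted labels.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement (accuracy) corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Made-up counts, chosen only to exercise the function.
    for name, value in binary_metrics(tp=40, fp=10, fn=5, tn=45).items():
        print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC and Cohen's kappa both account for agreement expected by chance, which is one reason they are often reported alongside accuracy on imbalanced medical datasets.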