Authors: Sihem Khemakhem; Younes Boujelbene
Addresses: Faculty of Economics and Management of Sfax, University of Sfax, Road of Airport, Km 4 Sfax, 3018, Tunisia; Faculty of Economics and Management of Sfax, University of Sfax, Road of Airport, Km 4 Sfax, 3018, Tunisia
Abstract: Support vector machines (SVM) have limited performance in credit scoring problems because the data sets are imbalanced: the number of unpaid loans is lower than the number of paid loans. In this work, we developed an SVM model with several kernels on a set of imbalanced data and suggested two data resampling alternatives: random over sampling (ROS) and the synthetic minority oversampling technique (SMOTE). The aim of this work is to explore the relevance of resampling data with the SVM technique for accurate credit risk prediction under the class imbalance constraint. The performance criteria chosen to evaluate the suggested technique were accuracy, sensitivity, specificity, type I error, type II error, G-mean and the area under the receiver operating characteristic curve (AUC). Significant empirical results obtained from an experimental study of a real imbalanced database of loans granted by a Tunisian bank demonstrated that sampling strategies improve the performance of SVM, thus leading to a better prediction accuracy of the creditworthiness of borrowers.
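As an illustration of two ideas named in the abstract, the following minimal Python sketch shows random over sampling and the threshold-based evaluation criteria. It is not the authors' implementation. The function names, the toy labels, and the choice of the unpaid (minority) class as the positive class are assumptions made here; the type I and type II errors follow the usual statistical convention (false positive rate and false negative rate), which may differ from the convention used in the paper. SMOTE, the SVM itself and the AUC are not shown.

```python
import math
import random
from collections import Counter


def random_over_sample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class (with
    replacement) until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def imbalance_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix criteria; `positive` is assumed to mark unpaid loans."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Guard against a class being absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "type_I_error": 1.0 - specificity,   # assumed: false positive rate
        "type_II_error": 1.0 - sensitivity,  # assumed: false negative rate
        "g_mean": math.sqrt(sensitivity * specificity),
    }


if __name__ == "__main__":
    # Toy data: 8 paid loans (label 0) and 2 unpaid loans (label 1).
    X = [[i] for i in range(10)]
    y = [0] * 8 + [1] * 2
    X_bal, y_bal = random_over_sample(X, y)
    print(Counter(y_bal))
    print(imbalance_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

In practice the resampling would be applied to the training split only, so that duplicated minority rows do not leak into the data used for evaluation. The G-mean is useful here because it drops toward zero whenever either class is poorly recognized, unlike accuracy, which can stay high when the majority class dominates.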
Keywords: credit scoring; support vector machines; SVM; synthetic minority oversampling technique; SMOTE; random over sampling; ROS; credit risk assessment; imbalanced datasets; performance criteria; Tunisian bank; creditworthiness prediction accuracy.
International Journal of Data Mining, Modelling and Management, 2018 Vol.10 No.2, pp.171 - 187
Accepted: 09 Sep 2017
Published online: 07 Jun 2018