Title: Support vector machines for credit risk assessment with imbalanced datasets

Authors: Sihem Khemakhem; Younes Boujelbene

Addresses: Faculty of Economics and Management of Sfax, University of Sfax, Road of Airport, Km 4 Sfax, 3018, Tunisia; Faculty of Economics and Management of Sfax, University of Sfax, Road of Airport, Km 4 Sfax, 3018, Tunisia

Abstract: Support vector machines (SVM) perform poorly in credit scoring when the data set is imbalanced, that is, when there are fewer unpaid loans than paid loans. In this work, we developed an SVM model with several kernels on an imbalanced data set and suggested two data resampling alternatives: random over sampling (ROS) and the synthetic minority oversampling technique (SMOTE). The aim of this work is to explore the relevance of resampling data with the SVM technique for accurate credit risk prediction under the class imbalance constraint. The performance criteria chosen to evaluate the suggested technique were accuracy, sensitivity, specificity, type I error, type II error, G-mean and the area under the receiver operating characteristic curve (AUC). Empirical results from an experimental study of a real imbalanced database of loans granted by a Tunisian bank show that the sampling strategies improve SVM performance, leading to more accurate prediction of borrowers' creditworthiness.
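
As a rough illustration of two ideas named in the abstract, the sketch below shows one way random over sampling and the threshold-based performance criteria could be computed. It is not the authors' implementation. The function names `random_oversample` and `imbalance_metrics` are hypothetical, the positive class is assumed to be the minority class (unpaid loans), and the type I and type II errors are assumed to follow the statistical convention (false positive rate and false negative rate); the paper may define them differently. SMOTE and AUC are not covered here.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def imbalance_metrics(y_true, y_pred, positive=1):
    """Compute the threshold-based criteria from the confusion matrix.
    Assumption: `positive` marks the minority class (unpaid loans)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Assumed convention: type I = false positive rate,
        # type II = false negative rate.
        "type_I_error": 1.0 - specificity,
        "type_II_error": 1.0 - sensitivity,
        "g_mean": math.sqrt(sensitivity * specificity),
    }
```

G-mean is useful alongside accuracy because a classifier that ignores the minority class can still score high accuracy while its sensitivity, and therefore its G-mean, falls to zero. In practice, resampling is normally applied to the training split only, so that the test set keeps the original class distribution.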

Keywords: credit scoring; support vector machines; SVM; synthetic minority oversampling technique; SMOTE; random over sampling; ROS; credit risk assessment; imbalanced datasets; performance criteria; Tunisian bank; creditworthiness prediction accuracy.

DOI: 10.1504/IJDMMM.2018.092538

International Journal of Data Mining, Modelling and Management, 2018 Vol.10 No.2, pp.171 - 187

Accepted: 09 Sep 2017
Published online: 07 Jun 2018
