Title: Clustering-based hybrid resampling techniques for social lending data

Authors: Pankaj Kumar Jadwal; Sonal Jain; Basant Agarwal

Addresses: JK Lakshmipat University, Jaipur, India; JK Lakshmipat University, Jaipur, India; Department of Computer Science and Engineering, Indian Institute of Information Technology Kota, MNIT Campus, Jaipur, India

Abstract: Social lending is an emerging and highly popular loan disbursement process in which an individual can act as a borrower or a lender. Evaluating the credit risk of borrowers effectively is a crucial task, especially in social lending, where the chance of default is higher than in traditional lending models. Social lending datasets are imbalanced by nature because there are fewer defaulters than successful borrowers. Machine learning models trained on such datasets are biased towards the class that holds the majority of samples (the majority class), which lowers the probability of accurately predicting minority class samples. In this paper, we propose a novel clustering-based hybrid sampling (CBHS) algorithm, in which multi-phase K-means clustering is applied to the minority class samples to perform oversampling (KMBOS), and fuzzy C-means clustering is applied to the majority class samples to perform undersampling (FCBU). Experimental results show that the KMBOS and FCBU algorithms outperform state-of-the-art oversampling and undersampling techniques.
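
The general idea of cluster-guided hybrid resampling can be illustrated with a minimal sketch. This is not the paper's KMBOS or FCBU procedure, whose details the abstract does not give. It assumes a single-pass K-means on the minority class followed by interpolation between two members of one cluster, and a basic fuzzy C-means on the majority class that keeps the points with the highest peak membership. All function names, parameters (`k`, `n_new`, `n_keep`, fuzzifier `m`), and the toy Gaussian data are hypothetical and chosen only for illustration.

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means; returns the non-empty clusters as lists of points."""
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centers[j]))
            clusters[nearest].append(p)
        # Keep the previous center if a cluster received no points.
        centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
    return [c for c in clusters if c]

def oversample(minority, n_new, k, rng):
    """Assumed scheme: each synthetic point lies between two members of one cluster."""
    clusters = kmeans(minority, k, rng)
    synthetic = []
    for _ in range(n_new):
        cluster = rng.choice(clusters)
        a, b = rng.choice(cluster), rng.choice(cluster)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return minority + synthetic

def fuzzy_memberships(points, centers, m=2.0):
    """Standard fuzzy C-means membership of every point in every cluster."""
    k = len(centers)
    rows = []
    for p in points:
        d = [max(dist(p, c), 1e-12) for c in centers]  # avoid division by zero
        rows.append([1.0 / sum((d[j] / d[l]) ** (2 / (m - 1)) for l in range(k))
                     for j in range(k)])
    return rows

def fuzzy_cmeans(points, k, rng, m=2.0, iters=20):
    centers = rng.sample(points, k)
    dims = range(len(points[0]))
    for _ in range(iters):
        u = fuzzy_memberships(points, centers, m)
        centers = []
        for j in range(k):
            w = [row[j] ** m for row in u]
            total = sum(w)
            centers.append(tuple(sum(wi * p[d] for wi, p in zip(w, points)) / total
                                 for d in dims))
    return centers, fuzzy_memberships(points, centers, m)

def undersample(majority, n_keep, k, rng):
    """Assumed scheme: keep the n_keep points with the highest peak membership."""
    _, u = fuzzy_cmeans(majority, k, rng)
    ranked = sorted(range(len(majority)), key=lambda i: max(u[i]), reverse=True)
    return [majority[i] for i in ranked[:n_keep]]

# Toy data (hypothetical): 200 majority points and 20 minority points in 2-D.
rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(20)]
balanced_min = oversample(minority, n_new=40, k=3, rng=rng)
balanced_maj = undersample(majority, n_keep=60, k=3, rng=rng)
print(len(balanced_min), len(balanced_maj))
```

With these settings both classes end up with 60 samples. Interpolating only within a cluster keeps synthetic minority points inside regions the minority class already occupies, and ranking by peak membership is one possible way to favour majority points that represent their cluster well; other selection rules are equally plausible.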

Keywords: credit risk; clustering; classification; hybrid model; oversampling; undersampling; class imbalance.

DOI: 10.1504/IJISTA.2021.120495

International Journal of Intelligent Systems Technologies and Applications, 2021 Vol.20 No.3, pp.183 - 198

Received: 04 Apr 2020
Accepted: 19 Aug 2020

Published online: 24 Jan 2022
