Title: A hybrid under-sampling approach for mining unbalanced datasets: applications to banking and insurance
Authors: Madireddi Vasu, Vadlamani Ravi
Addresses: Institute for Development and Research in Banking Technology, Castle Hills Road #1, Masab Tank, Hyderabad-500057, (AP) India; Institute for Development and Research in Banking Technology, Castle Hills Road #1, Masab Tank, Hyderabad-500057, (AP) India
Abstract: In unbalanced classification problems, machine learning algorithms are overwhelmed by the majority class and consequently misclassify minority class observations. Here, we propose a hybrid under-sampling approach to improve the performance of classifiers. The proposed approach first employs the k-reverse nearest neighbour (kRNN) method to detect outliers in the majority class. After the outliers are removed, K-means clustering is applied and K clusters are selected to further reduce the influence of the majority class. We then employ support vector machine (SVM), logistic regression (LR), multilayer perceptron (MLP), radial basis function network (RBF), group method of data handling (GMDH), genetic programming (GP) and decision tree (J48) for classification. The effectiveness of the proposed approach is demonstrated on datasets from insurance fraud detection and credit card churn prediction in the banking domain, using ten-fold cross-validation. The proposed approach is observed to improve the performance of the classifiers.
Keywords: insurance fraud detection; credit card churn prediction; data mining; unbalanced datasets; machine learning; banking; classifiers; classifier performance; k-means clustering; support vector machines; SVM; logistic regression; multilayer perceptron; radial basis function networks; RBF neural networks; GMDH; genetic programming; decision trees.
International Journal of Data Mining, Modelling and Management, 2011 Vol.3 No.1, pp.75 - 105
Published online: 03 Mar 2011
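
The two under-sampling steps named in the abstract (kRNN outlier removal, then K-means reduction of the majority class) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `min_reverse` threshold, and the choice to keep the cluster centroids as the reduced majority sample are all assumptions made for illustration, since the abstract does not give these details.

```python
import math
import random


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    dists = sorted((math.dist(points[i], p), j)
                   for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def krnn_filter(points, k=3, min_reverse=1):
    """Keep points that appear in at least `min_reverse` k-NN lists.

    A point that few others count among their k nearest neighbours has a
    small reverse-neighbour set; here it is treated as an outlier.
    The threshold `min_reverse` is an assumption, not from the paper.
    """
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in knn_indices(points, i, k):
            counts[j] += 1
    return [p for p, c in zip(points, counts) if c >= min_reverse]


def kmeans(points, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns the cluster centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    for _ in range(n_iter):
        groups = [[] for _ in range(n_clusters)]
        for p in points:
            nearest = min(range(n_clusters),
                          key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group; an empty
        # group keeps its previous centroid.
        updated = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def hybrid_undersample(majority, k=3, n_clusters=5, seed=0):
    """kRNN outlier removal followed by K-means reduction (assumed variant:
    the centroids stand in for the majority class)."""
    cleaned = krnn_filter(majority, k=k)
    return kmeans(cleaned, min(n_clusters, len(cleaned)), seed=seed)
```

In this sketch, the reduced majority sample returned by `hybrid_undersample` would be combined with the untouched minority class before training any of the listed classifiers. Each point is expected to be a tuple of numeric features.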