A hybrid under-sampling approach for mining unbalanced datasets: applications to banking and insurance Online publication date: Thu, 03-Mar-2011
by Madireddi Vasu, Vadlamani Ravi
International Journal of Data Mining, Modelling and Management (IJDMMM), Vol. 3, No. 1, 2011
Abstract: In solving unbalanced classification problems, machine learning algorithms are overwhelmed by the majority class and consequently misclassify the minority class observations. Here, we propose a hybrid under-sampling approach to improve the performance of classifiers. The proposed approach first employs k-reverse nearest neighbour (kRNN) method to detect the outliers from majority class. After removing the outliers, using K-means clustering, K-clusters are selected to further reduce the influence of the majority class. Then, we employed support vector machine (SVM), logistic regression (LR), multi layer perceptron (MLP), radial basis function network (RBF), group method of data handling (GMDH), genetic programming (GP) and decision tree (J48) for classification purpose. The effectiveness of the proposed approach was demonstrated on datasets taken from insurance fraud detection and credit card churn in banking domain. Ten-fold cross validation method was used in the study. It is observed that the proposed approach improved the performance of the classifiers.
Online publication date: Thu, 03-Mar-2011
If you are not a subscriber and you just want to read the full contents of this article, buy online access here.Complimentary Subscribers, Editors or Members of the Editorial Board of the International Journal of Data Mining, Modelling and Management (IJDMMM):
Login with your Inderscience username and password:
Want to subscribe?
A subscription gives you complete access to all articles in the current issue, as well as to all articles in the previous three years (where applicable). See our Orders page to subscribe.
If you still need assistance, please email firstname.lastname@example.org