Synthetic sampling approach based on model-based clustering for imbalanced data
Online publication date: Tue, 08-Jan-2019
by Shaukat Ali Shahee; Usha Ananthakumar
International Journal of Artificial Intelligence and Soft Computing (IJAISC), Vol. 6, No. 4, 2018
Abstract: A dataset exhibits the class imbalance problem, also referred to as between-class imbalance, when one class has very few examples compared to the other class. Apart from between-class imbalance, within-class imbalance, where classes are composed of different numbers of sub-clusters and these sub-clusters contain different numbers of examples, may also affect the performance of the classifier. In this paper, we propose a method that handles both between-class and within-class imbalance simultaneously and also takes into consideration various intrinsic characteristics of the data. The proposed method uses model-based clustering with respect to the classes to identify the sub-clusters present in the dataset and oversamples examples in each sub-cluster in such a manner that it eliminates between-class and within-class imbalance simultaneously. We validate our approach using a neural network on ten publicly available datasets. The experimental results show the proposed method to be statistically significantly superior to other methods.
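The abstract does not spell out how the sub-clusters are oversampled, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that sub-cluster labels have already been obtained for each class (for example from a Gaussian mixture fitted per class, which is not shown here), that every sub-cluster of a class is grown to one common size, and that synthetic examples are produced by linear interpolation between two random members of the same sub-cluster. The function names `subcluster_targets` and `oversample`, the allocation rule, and the interpolation step are all hypothetical choices made for this example.

```python
import math
import random
from collections import defaultdict


def subcluster_targets(sizes):
    """Map {class: [sub-cluster sizes]} to {class: target size per sub-cluster}.

    Assumed rule (not taken from the paper): pick the smallest class total
    that lets every sub-cluster of a class reach the same size without
    removing any original example, then split that total evenly.
    """
    total = max(len(s) * max(s) for s in sizes.values())
    return {c: math.ceil(total / len(s)) for c, s in sizes.items()}


def oversample(points, classes, clusters, seed=0):
    """Return synthetic (point, class) pairs that balance the sub-clusters.

    points   : list of feature vectors (lists of floats)
    classes  : class label of each point
    clusters : sub-cluster label of each point within its class
               (assumed to come from a prior clustering step)
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for p, c, k in zip(points, classes, clusters):
        groups[(c, k)].append(p)

    sizes = defaultdict(list)
    for (c, _k), members in groups.items():
        sizes[c].append(len(members))
    targets = subcluster_targets(sizes)

    synthetic = []
    for (c, _k), members in groups.items():
        for _ in range(targets[c] - len(members)):
            # Interpolate between two random members of the same sub-cluster;
            # a singleton sub-cluster simply yields copies of its one point.
            a, b = rng.choice(members), rng.choice(members)
            t = rng.random()
            synthetic.append(([x + t * (y - x) for x, y in zip(a, b)], c))
    return synthetic
```

With this rule, class totals after oversampling differ by at most the number of sub-clusters in a class, and every sub-cluster within a class ends up the same size. Drawing new examples from the fitted mixture components instead of interpolating would be an equally plausible design under the same assumptions.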