Authors: Sotiris B. Kotsiantis
Addresses: Educational Software Development Laboratory, Department of Mathematics, University of Patras, Rio 26500, Greece
Abstract: Many real-world data sets exhibit skewed class distributions in which almost all instances belong to one class and far fewer instances belong to a smaller, but usually more interesting, class. A classifier induced from an imbalanced data set characteristically has a low error rate for the majority class and an undesirably high error rate for the minority class. This paper first provides a systematic study of the various methodologies that have tried to handle this problem. It then presents an experimental study of these methodologies with a modification of the Decorate algorithm and concludes that such a framework can be a more valuable solution to the problem. Our method seems to permit improved identification of difficult small classes in predictive analysis, while keeping the classification ability of the majority class at an acceptable level.
Keywords: supervised machine learning; imbalanced data sets; local learning; skewed class distributions; classification.
International Journal of Computer Applications in Technology, 2008 Vol.33 No.2/3, pp.91 - 98
Published online: 10 Dec 2008