Title: The effect of class imbalance, complexity, size, and learning distribution on classifier performance

Authors: Sofia Visa

Addresses: Department of Mathematics and Computer Science, College of Wooster, 1189 Beall Ave., Wooster, OH 44691, USA

Abstract: Classes of real-world datasets have various properties (such as imbalance, size, complexity, and class distribution) that make the classification task more difficult. We investigate the robustness of six classification techniques on data exhibiting various combinations of these properties. One artificial domain and six real-world datasets are used in the experiments. The results of our analysis point to frequency-based classifiers (such as the fuzzy and the Bayes classifiers) as being more robust across varying imbalance, size, complexity, and training distribution.

Keywords: classification techniques; learning distribution; imbalance data; fuzzy sets; fuzzy logic; classifier performance; data complexity; size.

DOI: 10.1504/IJAIP.2011.043435

International Journal of Advanced Intelligence Paradigms, 2011 Vol.3 No.3/4, pp.341 - 366

Published online: 26 Mar 2015
