Authors: Xiaohui Yuan; Mohamed Abouelenien
Addresses: Faculty of Information Engineering, China University of Geosciences, Wuhan, China; Department of Computer Science and Engineering, University of North Texas, Denton, Texas, 76203, USA | Department of Computer Science and Engineering, University of North Texas, Denton, Texas, 76203, USA
Abstract: The acquisition of face images is usually limited by policy and economic considerations, and hence the number of training examples per subject varies greatly. Face recognition with imbalanced training data has drawn the attention of researchers. It is desirable to understand under what circumstances an imbalanced dataset affects the learning outcomes, and robust methods are needed to maximise the information embedded in the training dataset without relying heavily on user-introduced bias. In this article, we study the effects of an uneven number of training images on automatic face recognition and propose a multi-class boosting method that suppresses face recognition errors by training an ensemble with subsets of examples. By restoring the balance among classes in the subsets, our proposed multiBoost.imb method circumvents the class skewness and demonstrates improved performance. Experiments are conducted with four popular face datasets and two synthetic datasets. Our method exhibits superior performance in highly imbalanced scenarios compared to AdaBoost.M1, SAMME, RUSboost, SMOTEboost, SAMME with SMOTE sampling and SAMME with random undersampling. Another advantage of ensemble training using subsets of examples is a significant gain in efficiency.
Keywords: classification; imbalanced data; multi-class boosting; learning; biometrics; image acquisition; facial images; face recognition; training data.
International Journal of Granular Computing, Rough Sets and Intelligent Systems, 2015 Vol.4 No.1, pp.13 - 29
Received: 20 Mar 2015
Accepted: 20 Mar 2015
Published online: 16 Feb 2016