Title: A novel oversampling technique based on the manifold distance for class imbalance learning

Authors: Yinan Guo; Botao Jiao; Lingkai Yang; Jian Cheng; Shengxiang Yang; Fengzhen Tang

Addresses: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China; China Coal Research Institute, Beijing 100013, China; De Montfort University, Leicester LE1 9BH, UK; Shenyang Institute of Automation, Shenyang, China

Abstract: Oversampling is a popular approach to class imbalance learning: it generates additional minority samples to balance the sizes of the different classes. However, resampling in the original space is ineffective for imbalanced datasets with class overlapping or small disjunction. To address this, a novel oversampling technique based on manifold distance is proposed, in which a new minority sample is produced according to the distances among neighbours in manifold space rather than the Euclidean distances among them. After the original data are mapped to their manifold structure, the overlapped majority and minority samples lie in areas that are easily partitioned. In addition, new samples are generated from neighbours located nearby in manifold space, which avoids the adverse effect of disjoint minority classes. An adaptive adjustment method is then presented to determine the number of newly generated minority samples according to the distribution density of the matched-pair data. Experimental results on 48 imbalanced datasets indicate that the proposed oversampling technique achieves better classification accuracy.
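The idea of choosing neighbours by manifold distance instead of Euclidean distance can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say how manifold distance is computed, so this sketch assumes an Isomap-style approximation (shortest-path length over a k-nearest-neighbour graph). It also interpolates in the original feature space, as SMOTE does, and takes the number of new samples as a fixed argument rather than adapting it to the distribution density. The names `knn_graph`, `geodesic_distances` and `manifold_oversample` are hypothetical.

```python
import heapq
import math
import random


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph weighted by Euclidean distance."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(graph, source):
    """Dijkstra shortest-path lengths from `source`.

    Assumption: graph path length stands in for the manifold distance.
    Nodes in other connected components are absent from the result.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def manifold_oversample(points, labels, minority_label, n_new, k=3, seed=0):
    """Generate `n_new` synthetic minority points.

    Each new point is a random interpolation between a minority sample and
    one of its k nearest minority neighbours, ranked by graph distance.
    """
    rng = random.Random(seed)
    graph = knn_graph(points, k)
    minority = [i for i, y in enumerate(labels) if y == minority_label]

    # For each minority sample, keep the reachable minority neighbours
    # that are closest along the graph.
    neighbours = {}
    for i in minority:
        dist = geodesic_distances(graph, i)
        ranked = sorted((dist[j], j) for j in minority if j != i and j in dist)
        if ranked:
            neighbours[i] = [j for _, j in ranked[:k]]
    if not neighbours:
        return []

    seeds = sorted(neighbours)
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(seeds)
        j = rng.choice(neighbours[i])
        t = rng.random()
        synthetic.append(
            tuple(a + t * (b - a) for a, b in zip(points[i], points[j]))
        )
    return synthetic
```

Because neighbours are ranked by path length through the graph, two minority samples that are close in Euclidean terms but joined only by a long path are less likely to be paired. This is the intuition behind avoiding interpolation across disjoint minority regions.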

Keywords: class imbalance learning; oversampling; manifold learning; overlapping; small disjunction.

DOI: 10.1504/IJBIC.2021.119197

International Journal of Bio-Inspired Computation, 2021 Vol.18 No.3, pp.131 - 142

Received: 02 Mar 2020
Accepted: 06 Aug 2020

Published online: 29 Nov 2021
