Title: Logistic regression for imbalanced learning based on clustering

Authors: Huaping Guo; Tao Wei

Addresses: School of Computer and Information Technology, Xinyang Normal University, Xinyang, 464000, China; Computer College, Henan Institute of Engineering, Zhengzhou, 450000, China

Abstract: Class imbalance is very common in real-world data, and traditional state-of-the-art classifiers do not work well on imbalanced datasets. In this paper, we apply the well-known statistical model logistic regression to the imbalanced learning problem. To improve its performance, we use clustering algorithms as a data pre-processing step to partition the majority class data into clusters. Logistic regression is then learned on the corresponding rebalanced datasets. Experimental results show that, compared with other state-of-the-art methods, the proposed method shows significantly better performance on the measures of recall, g-mean, f-measure, AUC and accuracy.
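The abstract does not say how the rebalanced datasets are built or how the learned models are combined, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes k-means as the clustering algorithm, pairs each majority-class cluster with all minority-class examples to form one rebalanced dataset, fits one logistic regression per dataset by gradient descent, and averages the predicted probabilities. All function names, the synthetic data, and the hyperparameters (k, learning rate, epochs) are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
    return [c for c in clusters if c]


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logreg(X, y, lr=0.1, epochs=300):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_clustered_ensemble(majority, minority, k):
    """Assumed scheme: one model per majority cluster, each trained with all minority points."""
    models = []
    for cluster in kmeans(majority, k):
        X = cluster + minority
        y = [0] * len(cluster) + [1] * len(minority)  # label 1 = minority class
        models.append(fit_logreg(X, y))
    return models


def ensemble_proba(models, x):
    """Assumed combination rule: average of the per-model minority probabilities."""
    return sum(predict_proba(m, x) for m in models) / len(models)


# Synthetic 10:1 imbalanced data (hypothetical, for illustration only).
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(20)]
models = fit_clustered_ensemble(majority, minority, k=4)
print(ensemble_proba(models, (5.0, 5.0)), ensemble_proba(models, (-1.0, -1.0)))
```

Splitting the majority class into clusters means each logistic regression sees a class ratio much closer to balanced than the original 10:1, which is the rebalancing effect the abstract describes. Other choices, such as a different clustering algorithm or a voting rule instead of averaging, would fit the same description.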

Keywords: class imbalance; logistic regression; clustering.

DOI: 10.1504/IJCSE.2019.096987

International Journal of Computational Science and Engineering, 2019 Vol.18 No.1, pp.54 - 64

Received: 07 Mar 2017
Accepted: 15 Sep 2017

Published online: 14 Dec 2018
