Title: Imbalanced COVID-19 dataset classification with bidirectional sampling based on sample correlation

Authors: Mansheng Xiao; Mingkai Fan; Guocai Zuo

Addresses: School of Computer Science, Hunan University of Technology, Zhuzhou 412007, China; School of Computer Science, Hunan University of Technology, Zhuzhou 412007, China; Hunan Software Vocational and Technical University, Xiangtan 411100, China

Abstract: When a CNN model classifies an imbalanced dataset directly, the classification hyperplane is inclined toward the positive class, which results in a high misclassification rate. To address this problem, a bidirectional sampling method based on sample correlation is proposed. First, the sampling ratio is designed according to the numbers of samples in the two classes. Then, taking into account the positional correlation between samples, the negative samples are under-sampled and the positive samples are oversampled, which balances the numbers of positive and negative samples. Finally, after the imbalanced Kaggle image dataset has been sampled, the deep learning model SSD is used to train on and identify the COVID-19 samples. The experimental comparison shows that the proposed method improves evaluation indices such as F+-measure and G-means by more than 5% in the identification of COVID-19.
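The general shape of bidirectional sampling and of the two evaluation indices can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the sampling ratio or the correlation criterion, so the sketch assumes a geometric-mean target size, plain random under-sampling of negatives, and linear interpolation between random pairs of positives. Samples are assumed to be numeric feature tuples rather than images, and all function names are hypothetical. The F-measure and G-mean formulas are the standard confusion-matrix definitions.

```python
import math
import random


def bidirectional_sample(positives, negatives, seed=0):
    """Balance a minority (positive) and majority (negative) class.

    Assumption: both classes are resampled to the geometric mean of
    their sizes. The paper designs its own ratio, not reproduced here.
    Requires at least two positives and len(positives) <= len(negatives).
    """
    rng = random.Random(seed)
    target = round(math.sqrt(len(positives) * len(negatives)))

    # Under-sample negatives: random choice without replacement
    # (a stand-in for the paper's correlation-based selection).
    kept_negatives = rng.sample(negatives, target)

    # Oversample positives: add points interpolated between random
    # pairs of existing positives (again a stand-in).
    grown_positives = list(positives)
    while len(grown_positives) < target:
        a, b = rng.sample(positives, 2)
        t = rng.random()
        grown_positives.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return grown_positives, kept_negatives


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def g_mean(tp, fn, tn, fp):
    """Geometric mean of the true positive and true negative rates."""
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


if __name__ == "__main__":
    # Toy data: 4 positive and 100 negative two-dimensional points.
    pos = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    neg = [(float(i), float(i % 7)) for i in range(100)]
    new_pos, new_neg = bidirectional_sample(pos, neg)
    print(len(new_pos), len(new_neg))
```

Resampling from both directions keeps the balanced set smaller than pure oversampling would and discards fewer majority samples than pure under-sampling would, which is the trade-off a bidirectional scheme targets.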

Keywords: bidirectional sampling; sample correlation; FCM; SSD; COVID-19.

DOI: 10.1504/IJES.2023.134105

International Journal of Embedded Systems, 2023 Vol.16 No.1, pp.1 - 8

Received: 17 Aug 2022
Received in revised form: 16 Nov 2022
Accepted: 14 Dec 2022

Published online: 11 Oct 2023
