Title: DSCC: a data set of cervical cell images for cervical cytology screening

Authors: Hua Chen; Juan Liu; Yu Jin; Baochuan Pang; Dehua Cao; Di Xiao

Addresses: Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, Hubei, China; Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, Hubei, China; Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, Hubei, China; Landing Artificial Intelligence Centre for Pathological Diagnosis, Wuhan, China; Landing Artificial Intelligence Centre for Pathological Diagnosis, Wuhan, China; Landing Artificial Intelligence Centre for Pathological Diagnosis, Wuhan, China

Abstract: The lack of large-scale public datasets for cytological screening of cervical cancer has hindered the development of robust screening models. To address this problem, we develop DSCC, a dataset containing 15,509 cervical cell images labelled by experienced cytologists. To our knowledge, DSCC contains nearly four times as many cell images as the largest dataset currently known. Because the purpose of cytological screening is not to diagnose cancer but to judge whether the subject needs further examination, we classify the cell images into three categories: Normal; SIL (squamous intra-epithelial lesion or cancer cell, suggesting further examination); and ASC (atypical squamous cell, needing confirmation by a professional cytologist). We also provide a nucleus mask map for each cell, based on the cytologists' annotations, to support a range of studies. From the mask map, we extract 78 features for each cell, which are also included in the dataset. Experimental results demonstrate that DSCC is useful for building classification methods for automatic cervical cytology screening.

Keywords: cell image; cervical cytology screening; dataset; classification; machine learning.

DOI: 10.1504/IJDMB.2022.130325

International Journal of Data Mining and Bioinformatics, 2022 Vol.27 No.1/2/3, pp.171 - 186

Received: 31 May 2022
Accepted: 18 Oct 2022

Published online: 17 Apr 2023
