Title: A preliminary study on automatic breast cancer data classification using semi-supervised fuzzy c-means

Authors: Daphne Teck Ching Lai; Jonathan M. Garibaldi

Addresses: School of Computer Science, University of Nottingham, Jubilee Campus, Nottingham NG8 1BB, UK; School of Computer Science, University of Nottingham, Jubilee Campus, Nottingham NG8 1BB, UK

Abstract: Soria et al. identified six novel, clinically useful subgroups in the Nottingham Tenovus Breast Cancer (NTBC) data set. However, their methodology is semi-manual, and so far no single clustering method has been able to classify the data set automatically. In this work, two variations of the semi-supervised Fuzzy c-Means (ssFCM) algorithm are explored to classify the NTBC data set into the same six subgroups. Three experiments were conducted using the two ssFCM algorithms, and the results were evaluated using inter-rater agreement measures. The ssFCM algorithms identified the six classes of breast cancer, but their classifications are in low agreement with Soria's classification. This, together with the high agreement obtained using two clustering algorithms, suggests that the problem may lie in the way we use ssFCM rather than in model correctness. Despite this, we consider the ssFCM results promising and note that further investigation of ssFCM is required.

Keywords: fuzzy c-means clustering; semi-supervised FCM; breast cancer; data classification; automatic classification.

DOI: 10.1504/IJBET.2013.058535

International Journal of Biomedical Engineering and Technology, 2013 Vol.13 No.4, pp.303 - 322

Received: 23 Nov 2012
Accepted: 28 Sep 2013

Published online: 27 Sep 2014
