Title: Fuzzy cluster stability analysis with missing values using resampling

Authors: Selma T. Milagre, Carlos Dias Maciel, Jose Carlos Pereira, Adriano A. Pereira

Addresses: Computational Science Department, Federal University of Goias, Av. Dr. Lamartine Pinto de Avelar, 1120, Catalao, GO 75705-220, Brazil; Electrical Department, University of Sao Paulo, Av. Trabalhador Sao-Carlense, 400, Sao Carlos, SP 13566-590, Brazil; Electrical Department, University of Sao Paulo, Av. Trabalhador Sao-Carlense, 400, Sao Carlos, SP 13566-590, Brazil; Department of Electrical Engineering, Federal University of Uberlandia, Av. Joao Naves de Avila, 2121, Uberlandia, MG 38400-902, Brazil

Abstract: Exploratory data analysis, such as grouping the data into clusters, is often necessary to evaluate potential hypotheses for subsequent studies. In real data sets, incompleteness is very common. We propose a fuzzy clustering method that tolerates missing values, using resampling (bootstrapping) and cluster stability analysis. The quality of classification is assessed with measures such as F1 and Hubert. The central idea is to compare a reference clustering with many clusterings obtained from sub-samples of the original data set. The results demonstrate that our method is capable of identifying relevant partitions even when a high proportion of values is missing.
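The central idea of comparing a reference clustering against clusterings of sub-samples can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how missing values are handled or how the sub-samples are drawn, so the sketch assumes a partial-distance fuzzy c-means (missing entries are skipped), sub-sampling without replacement, and a pairwise F1 score as the agreement measure. The Hubert measure is not shown. All function names and parameters (`fuzzy_cmeans_partial`, `pairwise_f1`, `stability`, `frac`, `n_resamples`) are hypothetical.

```python
import numpy as np

def fuzzy_cmeans_partial(X, c, m=2.0, n_iter=100, seed=0):
    """Fuzzy c-means that skips NaN entries (partial-distance strategy; an assumption)."""
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(X)                      # True where a value is observed
    Xz = np.where(mask, X, 0.0)              # zero-fill so sums ignore missing entries
    n, d = X.shape
    U = rng.dirichlet(np.ones(c), size=n)    # random memberships, each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers: membership-weighted mean over observed entries only.
        V = (W.T @ Xz) / np.maximum(W.T @ mask.astype(float), 1e-12)
        # Squared distance over observed features, rescaled to the full dimension.
        diff = (Xz[:, None, :] - V[None, :, :]) * mask[:, None, :]
        scale = d / np.maximum(mask.sum(axis=1), 1)
        D = np.maximum((diff ** 2).sum(axis=2) * scale[:, None], 1e-12)
        inv = D ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U.argmax(axis=1)                  # hard labels taken from fuzzy memberships

def pairwise_f1(a, b):
    """F1 over point pairs: a pair is 'positive' when both points share a cluster."""
    iu = np.triu_indices(len(a), k=1)
    same_a = (a[:, None] == a[None, :])[iu]
    same_b = (b[:, None] == b[None, :])[iu]
    tp = np.sum(same_a & same_b)
    fp = np.sum(~same_a & same_b)
    fn = np.sum(same_a & ~same_b)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 1.0

def stability(X, c, n_resamples=20, frac=0.8, seed=0):
    """Mean agreement between the reference clustering and sub-sample clusterings."""
    rng = np.random.default_rng(seed)
    ref = fuzzy_cmeans_partial(X, c, seed=seed)          # reference clustering
    scores = []
    for r in range(n_resamples):
        idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
        sub = fuzzy_cmeans_partial(X[idx], c, seed=seed + r + 1)
        scores.append(pairwise_f1(ref[idx], sub))        # compare on shared points
    return float(np.mean(scores))
```

Under these assumptions, a stability score close to 1 for a given number of clusters would suggest a partition that is reproduced across sub-samples, while a low score would suggest a partition that depends on which points happened to be drawn.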

Keywords: bioinformatics; missing values; fuzzy clustering; stability analysis; resampling; exploratory data analysis; clusters; bootstrapping; classification.

DOI: 10.1504/IJBRA.2009.024038

International Journal of Bioinformatics Research and Applications, 2009, Vol. 5, No. 2, pp. 207-223

Published online: 24 Mar 2009