Journal: International Journal of Knowledge Engineering and Soft Data Paradigms (IJKESDP) 2009 Vol.1 No.4 pp.363 - 375

Title: Clustering in the membership embedding space

Authors: Maurizio Filippone, Francesco Masulli, Stefano Rovetta

Addresses: Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK. DISI, Dipartimento di Informatica e Scienze dell'Informazione, Università di Genova and CNISM, Via Dodecaneso 35, Genoa, Italy; Center for Biotechnology, Temple University, 1900 N 12th Street, Philadelphia, PA 19122, USA. DISI, Dipartimento di Informatica e Scienze dell'Informazione, Università di Genova and CNISM, Via Dodecaneso 35, Genoa, Italy

Abstract: In several data-mining applications involving high-dimensional data, clustering techniques developed for low-to-moderate-sized problems give unsatisfactory results, one manifestation of the curse of dimensionality. A traditional remedy is to represent the data in a suitable similarity space instead of the original high-dimensional attribute space. In this paper, we propose a solution to this problem that projects the data onto a so-called membership embedding space, obtained from the memberships of the data points in fuzzy sets centred on a set of prototypes. As we show in an experimental comparison, this approach can increase the efficiency of the popular fuzzy C-means (FCM) method on high-dimensional datasets. We also present a constructive method for prototype selection, based on simulated annealing, that is viable for semi-supervised clustering problems.
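The abstract names the ingredients (fuzzy sets centred on prototypes, membership vectors as the new representation, fuzzy C-means run on top) but not their exact form. As a minimal sketch under stated assumptions, the Python below uses Gaussian fuzzy sets of a common width `sigma`, hand-picked prototypes (the paper's simulated-annealing prototype selection is not reproduced), and a plain FCM with a deterministic farthest-point initialisation. The function names `gaussian_membership`, `membership_embedding`, `fuzzy_c_means` and `crisp_labels` are hypothetical, not taken from the paper.

```python
import math
import random


def gaussian_membership(x, prototype, sigma):
    """Membership of x in a fuzzy set centred on `prototype`.

    The Gaussian shape and the width `sigma` are illustrative assumptions.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, prototype))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def membership_embedding(data, prototypes, sigma):
    """Map every point to its vector of memberships on the prototype-centred fuzzy sets.

    The embedded dimension equals len(prototypes), not the original dimension.
    """
    return [[gaussian_membership(x, v, sigma) for v in prototypes] for x in data]


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy C-means; returns (centres, U) with U[i][k] = membership of point k in cluster i."""
    n, dim = len(points), len(points[0])
    # Deterministic farthest-point initialisation (an assumption; random starts are also common).
    centres = [list(points[0])]
    while len(centres) < c:
        far = max(points, key=lambda p: min(math.dist(p, v) for v in centres))
        centres.append(list(far))
    u = [[0.0] * n for _ in range(c)]
    for _ in range(max_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        for k, x in enumerate(points):
            d = [math.dist(x, v) for v in centres]
            if min(d) == 0.0:
                # The point sits exactly on a centre: give it crisp membership there.
                hit = d.index(0.0)
                for i in range(c):
                    u[i][k] = 1.0 if i == hit else 0.0
                continue
            for i in range(c):
                u[i][k] = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
        # Centre update: mean of the points weighted by u_ik^m.
        shift = 0.0
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            new = [sum(w[k] * points[k][t] for k in range(n)) / total for t in range(dim)]
            shift = max(shift, math.dist(new, centres[i]))
            centres[i] = new
        if shift < tol:
            break
    return centres, u


def crisp_labels(u):
    """Assign each point to the cluster with the highest membership."""
    c, n = len(u), len(u[0])
    return [max(range(c), key=lambda i: u[i][k]) for k in range(n)]


if __name__ == "__main__":
    # Toy data: two Gaussian blobs in 50 dimensions.
    rng = random.Random(0)
    dim = 50
    data = [[rng.gauss(0.0, 0.3) for _ in range(dim)] for _ in range(20)]
    data += [[rng.gauss(5.0, 0.3) for _ in range(dim)] for _ in range(20)]
    # Hypothetical prototype choice: two points from each blob, picked by hand.
    prototypes = [data[0], data[1], data[20], data[21]]
    embedded = membership_embedding(data, prototypes, sigma=3.0)
    _, u = fuzzy_c_means(embedded, c=2)
    print(crisp_labels(u))
```

In this sketch FCM runs in a space whose dimension equals the number of prototypes (four in the toy run) rather than the original 50 attributes. The width `sigma` controls how quickly memberships decay with distance and would need tuning on real data.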

Keywords: high-dimensional datasets; unsupervised clustering; semi-supervised clustering; fuzzy sets; embedding spaces; fuzzy C-means; FCM; simulated annealing; curse of dimensionality; knowledge engineering; data mining; membership embedding space.

DOI: 10.1504/IJKESDP.2009.028988

International Journal of Knowledge Engineering and Soft Data Paradigms, 2009 Vol.1 No.4, pp.363 - 375

Published online: 19 Oct 2009
