Title: Hierarchical clustering algorithm with combined criteria for large and complex similarity data
Author: Kensuke Tanioka; Hiroshi Yadohisa
Graduate School of Culture and Information Science, Doshisha University, Tatara Miyakodani 1-3, Kyotanabe-shi, Kyoto 610-0394, Japan.
Department of Culture and Information Science, Doshisha University, Tatara Miyakodani 1-3, Kyotanabe-shi, Kyoto 610-0394, Japan
Journal: Int. J. of Knowledge Engineering and Soft Data Paradigms, 2011 Vol.3, No.2, pp.121 - 131
Abstract: Recent developments in information technology have enabled us to deal with large and complex similarity data. Researchers often need to know the cluster structures of such datasets before constructing inferential models or applying other interrogation techniques. To reveal cluster structures, Chameleon (Karypis et al., 1999) builds subclusters from a dataset using graph partitioning methods and then applies hierarchical clustering to them, which reduces the amount of computation and reflects the structures within the clusters. Chameleon can capture arbitrarily shaped clusters from similarity data. It considers both intrasimilarities and intersimilarities when two clusters are combined. In addition, the method is robust to outliers, that is, objects that are far from the other objects in the same subcluster. However, it cannot detect cluster structures that the group average method cannot detect. This paper proposes a new hierarchical clustering method based on the single linkage method for use when similarity data and subclusters are given. The proposed method has three advantages. First, it considers the intrasimilarities and intersimilarities of some parts of the subclusters. Second, it considers the effects of outliers and cluster sizes. Finally, it detects arbitrarily shaped cluster structures that cannot be detected by Chameleon.
Keywords: arbitrary shaped clusters; Cure; Chameleon; outliers; single linkage method; hierarchical clustering; complex similarity data; cluster structures; subclusters; cluster size.