Title: Evaluation of clustering algorithms for word sense disambiguation

Authors: Bartosz Broda; Wojciech Mazur

Addresses: Institute of Informatics, Wroclaw University of Technology, 50-370 Wroclaw, Poland; Institute of Informatics, Wroclaw University of Technology, 50-370 Wroclaw, Poland

Abstract: Word sense disambiguation in text remains a difficult problem, as the best supervised methods require laborious and costly preparation of training data. This work evaluates selected clustering algorithms on the task of word sense disambiguation. We used five datasets covering two languages (English and Polish). Five clustering algorithms (k-means, k-medoids, hierarchical agglomerative clustering, hierarchical divisive clustering, and graph-partitioning-based clustering) and two weighting schemes were tested. The best parameters of the algorithms were chosen using 5 × 2 cross-validation. The BCubed measure was employed to evaluate the clusterings. We conclude that, with these settings, agglomerative hierarchical clustering achieves the best results on all the datasets.
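
As an illustration of the evaluation measure named in the abstract, the following is a minimal sketch of BCubed precision, recall and F-score using the standard per-item definition. It is not taken from the article: the function name `bcubed`, the equal weighting of precision and recall in the F-score, and the toy sense labels are assumptions made for this example only.

```python
from collections import Counter


def bcubed(clusters, senses):
    """Return (precision, recall, f_score) for parallel lists of
    cluster ids and gold sense labels, one entry per occurrence.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(clusters) != len(senses):
        raise ValueError("clusters and senses must have the same length")
    n = len(clusters)
    if n == 0:
        raise ValueError("at least one item is required")

    cluster_sizes = Counter(clusters)           # items per cluster
    sense_sizes = Counter(senses)               # items per gold sense
    pair_sizes = Counter(zip(clusters, senses))  # items sharing both

    # Per-item precision: share of the item's cluster with the same sense.
    precision = sum(
        pair_sizes[(c, s)] / cluster_sizes[c] for c, s in zip(clusters, senses)
    ) / n
    # Per-item recall: share of the item's sense found in the same cluster.
    recall = sum(
        pair_sizes[(c, s)] / sense_sizes[s] for c, s in zip(clusters, senses)
    ) / n
    # Harmonic mean with equal weights (an assumption of this sketch).
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Toy data: five occurrences of an ambiguous word, two clusters.
    clusters = [0, 0, 0, 1, 1]
    senses = ["river", "river", "money", "money", "money"]
    print(bcubed(clusters, senses))
```

Each item counts itself in both its cluster and its sense, so precision and recall are always positive and the harmonic mean is well defined.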

Keywords: clustering algorithms; word sense disambiguation; WSD; BCubed; senseval; bag of words; English; Polish.

DOI: 10.1504/IJDATS.2012.047817

International Journal of Data Analysis Techniques and Strategies, 2012 Vol.4 No.3, pp.219 - 236

Published online: 06 Sep 2014
