Title: A comparative study of feature weighting methods for document co-clustering

Authors: Yunming Ye, Xutao Li, Biao Wu, Yan Li

Addresses: Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, 518055, China; Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, 518055, China; Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, 518055, China; Department of Computing, The Hong Kong Polytechnic University, Mailbox 56, PQ 821, Hung Hom, KLN, Hong Kong

Abstract: Document clustering is an important task in data mining, and co-clustering has become one of the state-of-the-art methods for it. In this paper, we propose a feature-weighting co-clustering algorithm for documents and present a comparative study of how different weighting methods affect its performance. The compared feature weighting approaches include inverse document frequency-based methods, information theory-based methods and term variance-based methods. The comparison results on benchmark data sets show that the mutual information weighting method can lead to better performance for the proposed algorithm than the other weighting schemes.

Keywords: document co-clustering; feature weighting; text clustering; document clustering; data mining; inverse document frequency; information theory; term variance.

DOI: 10.1504/IJITCC.2011.039286

International Journal of Information Technology, Communications and Convergence, 2011 Vol.1 No.2, pp.206 - 220

Published online: 29 Mar 2011
