Int. J. of Information Technology, Communications and Convergence   »   2011 Vol.1, No.2

 

 

Title: A comparative study of feature weighting methods for document co-clustering

 

Authors: Yunming Ye, Xutao Li, Biao Wu, Yan Li

 

Addresses:
Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, 518055, China.
Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, 518055, China.
Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, 518055, China.
Department of Computing, The Hong Kong Polytechnic University, Mailbox 56, PQ 821, Hung Hom, KLN, Hong Kong

 

Abstract: Document clustering is an important task in data mining. Co-clustering has become one of the state-of-the-art methods for this task. In this paper, we propose a feature weighting co-clustering algorithm for documents and present a comparative study of how different weighting methods affect its performance. The compared feature weighting approaches include inverse document frequency-based methods, information theory-based methods and term variance-based methods. The comparison results on benchmark data sets show that the mutual information weighting method can lead to better performance for the proposed algorithm than the other weighting schemes.

 

Keywords: document co-clustering; feature weighting; text clustering; document clustering; data mining; inverse document frequency; information theory; term variance.

 

DOI: 10.1504/IJITCC.2011.039286

 

Int. J. of Information Technology, Communications and Convergence, 2011 Vol.1, No.2, pp.206-220

 

Available online: 29 Mar 2011

 

 
