Title: Constrained co-clustering with non-negative matrix factorisation

Authors: Amit Salunke; Xumin Liu; Manjeet Rege

Addresses: Department of Computer Science, Rochester Institute of Technology, Rochester, NY, USA; Department of Computer Science, Rochester Institute of Technology, Rochester, NY, USA; Department of Computer Science, Rochester Institute of Technology, Rochester, NY, USA

Abstract: Co-clustering refers to the problem of deriving sub-matrices of a data matrix by simultaneously clustering its rows (data instances) and columns (features). While very effective at discovering useful knowledge, many co-clustering algorithms adopt a completely unsupervised approach. Integrating domain knowledge can guide the co-clustering process and greatly enhance overall performance. We propose a semi-supervised non-negative matrix factorisation (SS-NMF) based framework that integrates domain knowledge in the form of must-link and cannot-link constraints. Specifically, we augment the data matrix by integrating the constraints using metric learning and then perform NMF to obtain the co-clustering. Under the proposed framework, we present two approaches to integrating domain knowledge: a distance metric learning approach and an information theoretic metric learning approach. Through experiments on real-world web service data and publicly available text datasets, we demonstrate the performance of the proposed SS-NMF based approach for data co-clustering.
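
As a rough illustration of the two-stage pipeline the abstract describes (use the constraints to reshape the data matrix, then factorise it with NMF), the following is a minimal sketch in Python with NumPy. It is not the authors' method: the diagonal feature reweighting is a simple heuristic stand-in for the distance metric learning and information theoretic metric learning approaches named in the abstract, the factorisation is plain two-factor NMF with multiplicative updates, and every function and parameter name is hypothetical.

```python
import numpy as np


def diagonal_metric_weights(X, must_link, cannot_link):
    """Heuristic per-feature weights (an assumption, not the paper's metric learning).

    Features that differ across cannot-link row pairs are up-weighted;
    features that differ across must-link row pairs are down-weighted.
    """
    ml = np.zeros(X.shape[1])
    cl = np.zeros(X.shape[1])
    for i, j in must_link:
        ml += (X[i] - X[j]) ** 2
    for i, j in cannot_link:
        cl += (X[i] - X[j]) ** 2
    w = (1.0 + cl) / (1.0 + ml)
    return w / w.mean()  # normalise so the average weight is 1


def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF, X ~= W @ H, using multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def constrained_cocluster(X, must_link, cannot_link, k):
    """Reweight features from row constraints, then co-cluster with NMF."""
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("NMF requires a non-negative data matrix")
    w = diagonal_metric_weights(X, must_link, cannot_link)
    Xs = X * np.sqrt(w)  # scaling by positive weights keeps X non-negative
    W, H = nmf(Xs, k)
    row_labels = W.argmax(axis=1)  # row (instance) cluster = dominant factor
    col_labels = H.argmax(axis=0)  # column (feature) cluster = dominant factor
    return row_labels, col_labels, W, H
```

Here a must-link pair `(i, j)` states that rows `i` and `j` should share a cluster, and a cannot-link pair states that they should not. Reading both row and column labels off the same factorisation is what makes this a co-clustering rather than a one-sided clustering.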

Keywords: semi-supervised matrix factorisation; non-negative matrix factorisation; clustering; data co-clustering; cannot-link constraints; must-link constraints; domain knowledge integration; metric learning; information theory.

DOI: 10.1504/IJBIDM.2012.048728

International Journal of Business Intelligence and Data Mining, 2012 Vol.7 No.1/2, pp.60-79

Published online: 12 Nov 2014
