Title: Distributed document clustering algorithms: a recent survey

Authors: J.E. Judith; J. Jayakumari

Addresses: Department of Computer Science and Engineering, Noorul Islam Centre for Higher Education, Kumaracoil, India; Department of Electronics and Communication Engineering, Noorul Islam Centre for Higher Education, Kumaracoil, India

Abstract: Distributed data mining is an active research area because of the enormous volume of data that must be processed across large clusters of data nodes. Document clustering algorithms are widely applied in a variety of distributed environments, such as peer-to-peer networks and wireless sensor networks. This paper presents a comprehensive review of most of the recent distributed document clustering algorithms, which are having a significant impact on the field. These algorithms are analysed on a few pivotal criteria: clustering quality, scale-up, speed-up and accuracy. Recent MapReduce-based distributed document clustering algorithms show dramatic improvements on these criteria. Based on the review, directions for algorithm development and implementation are discussed.

Keywords: distributed documents; document clustering; speed-up; scale-up; MapReduce; clustering algorithms; data mining.

DOI: 10.1504/IJENM.2015.071134

International Journal of Enterprise Network Management, 2015, Vol. 6, No. 3, pp. 207-221

Received: 10 Oct 2014
Accepted: 17 Nov 2014

Published online: 13 Aug 2015
