Title: Distributed document clustering algorithms: a recent survey
Authors: J.E. Judith; J. Jayakumari
Addresses: Department of Computer Science and Engineering, Noorul Islam Centre for Higher Education, Kumaracoil, India; Department of Electronics and Communication Engineering, Noorul Islam Centre for Higher Education, Kumaracoil, India
Abstract: Distributed data mining is an active research area because of the enormous volume of data that must be processed across a wide cluster of data nodes. Document clustering algorithms are widely applied in a variety of distributed environments, such as peer-to-peer networks and wireless sensor networks. This paper presents a comprehensive review of most of the recent distributed document clustering algorithms, which are having a significant impact on the field. These algorithms are analysed on a few pivotal criteria: clustering quality, scale-up, speed-up and accuracy. Recent advances have produced MapReduce-based distributed document clustering algorithms, which show dramatic improvements on these criteria. Based on the review, a discussion of algorithm development and implementation is presented.
Keywords: distributed documents; document clustering; speed-up; scale-up; MapReduce; clustering algorithms; data mining.
DOI: 10.1504/IJENM.2015.071134
International Journal of Enterprise Network Management, 2015 Vol.6 No.3, pp.207 - 221
Received: 10 Oct 2014
Accepted: 17 Nov 2014
Published online: 13 Aug 2015