Title: KH-FC: krill herd-based fractional calculus algorithm for text document clustering using MapReduce structure

Authors: Priyanka Shivaprasad More; Baljit Singh Saini

Addresses: School of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India; School of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India

Abstract: This paper proposes a krill herd-based fractional calculus (KH-FC) algorithm, implemented on the MapReduce framework, for effective text document clustering. In the pre-processing step, stop word removal and stemming are applied to remove redundant information; this reduces the size of the data, which in turn enhances the clustering accuracy. Term frequency (TF) and inverse document frequency (IDF) are then employed to extract significant features. Finally, the KH-FC model is used to cluster the text documents. The KH-FC algorithm is obtained by incorporating the FC concept into the KH technique. Pre-processing and feature extraction are performed in the mapper phase, whereas the clustering process is executed in the reducer phase. The performance of the approach is evaluated using accuracy, Jaccard coefficient, and F-measure. The KH-FC approach achieved an accuracy of 0.983, a Jaccard coefficient of 0.936, and an F-measure of 0.967.
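
As an illustration of the mapper-side steps named in the abstract (stop word removal, stemming, and TF-IDF feature extraction), a minimal Python sketch might look like the following. It is not the authors' implementation: the stop-word list, the crude suffix-stripping stemmer, the function names, and the particular TF-IDF weighting (relative term frequency times log(N/df)) are all assumptions made for this example, and the KH-FC clustering step itself is not shown.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not the one used in the paper.
STOP_WORDS = {"a", "an", "and", "are", "the", "is", "of", "in", "to", "for"}


def naive_stem(token):
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem each remaining token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: (term count / document length) * log(N / df).
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors
```

In a MapReduce setting, a mapper would typically emit such weighted term vectors per document, and the reducer would consume them for clustering; how the paper partitions this work beyond the mapper/reducer split stated in the abstract is not specified here.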

Keywords: text document clustering; fractional calculus; krill herd algorithm; Jaccard similarity; term frequency-inverse document frequency; IDF.

DOI: 10.1504/IJCSE.2022.127188

International Journal of Computational Science and Engineering, 2022 Vol.25 No.6, pp.668 - 684

Received: 06 Apr 2021
Accepted: 13 Aug 2021

Published online: 25 Nov 2022
