Title: An improved TFIDF algorithm based on dual parallel adaptive computing model

Authors: Yuwan Gu; Yaru Wang; Juan Huan; Yuqiang Sun; Shoukun Xu

Addresses: School of Information Science and Engineering, Changzhou University, Changzhou, China ' School of Information Science and Engineering, Changzhou University, Changzhou, China ' School of Information Science and Engineering, Changzhou University, Changzhou, China ' School of Information Science and Engineering, Changzhou University, Changzhou, China ' School of Information Science and Engineering, Changzhou University, Changzhou, China

Abstract: A double parallel cloud computing framework based on the graphics processing unit (GPU) and MapReduce is proposed. The method targets the low efficiency of text categorisation algorithms when processing large data sets on a stand-alone machine. It constructs an adaptive computation process for double parallel computing and combines this with the advantages of an improved term frequency-inverse document frequency (TFIDF) algorithm, yielding an improved TFIDF text categorisation algorithm with double parallel adaptive computing. The efficiency of the improved TFIDF algorithm is compared across different operating environments and different numbers of computing nodes. The results show that the improved TFIDF based on dual parallel adaptation achieves an increase of 6.48% in Macro_F1 compared with the CPU-based TFIDF, and that operating efficiency increases nearly sevenfold. As the number of nodes increases, the execution efficiency of the algorithm with double parallel adaptive computing improves further.
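
For readers unfamiliar with the baseline, the following is a minimal sketch of standard TFIDF weighting arranged as a map phase and a reduce phase. It is not the improved weighting or the GPU stage described in the article, whose details are not given in this abstract. It assumes raw term counts as the term frequency and an unsmoothed log(N / df) as the inverse document frequency; the function names and the two sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def map_term_counts(doc_id, text):
    """Map phase: emit ((term, doc_id), count) pairs for one document."""
    counts = Counter(text.lower().split())
    return [((term, doc_id), n) for term, n in counts.items()]

def reduce_tfidf(pairs, num_docs):
    """Reduce phase: group counts by term, then weight each by log(N / df)."""
    by_term = defaultdict(dict)
    for (term, doc_id), n in pairs:
        by_term[term][doc_id] = n
    weights = {}
    for term, postings in by_term.items():
        # df is the number of documents that contain the term.
        idf = math.log(num_docs / len(postings))
        for doc_id, n in postings.items():
            weights[(term, doc_id)] = n * idf
    return weights

# Hypothetical two-document corpus, used only for illustration.
docs = {"d1": "gpu parallel computing", "d2": "parallel text categorisation"}
pairs = [p for doc_id, text in docs.items() for p in map_term_counts(doc_id, text)]
tfidf = reduce_tfidf(pairs, num_docs=len(docs))
```

In this sketch a term that occurs in every document, such as "parallel", receives a weight of zero, while a term unique to one document receives log(2). In a real MapReduce job the map calls would run independently per document, which is what makes the computation easy to distribute across nodes.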

Keywords: improved TFIDF algorithm; MapReduce; graphics processing unit; GPU; parallel computation.

DOI: 10.1504/IJES.2020.108278

International Journal of Embedded Systems, 2020 Vol.13 No.1, pp.18 - 27

Received: 01 Dec 2018
Accepted: 20 Jan 2019

Published online: 08 Jul 2020
