Title: Design of a web-based text mining strategy for collecting customer information

Authors: Jiafei Geng; Xiaoli Wu

Addresses: Department of Public Security, Anhui Public Security College, Hefei, 230088, China; Institute of Economy, Trade and Tourism, Anhui Lvhai Vocational College of Business, Hefei, 230001, China

Abstract: This study first improves the keyword extraction algorithm used in text mining by introducing information entropy, relative entropy, a word-length weighting factor and a word-location weighting factor, and runs the computation in the Spark computing framework. The improved algorithm is then applied to a web text mining system and used to mine and collect information about internet customers. To test its performance, the study compares the recall, accuracy and F-value of four algorithms on four datasets. The proposed algorithm performs best, with maximum recall, accuracy and F-value of 81.6%, 84.8% and 84.1%, respectively. Finally, the proposed approach achieves a maximum prediction accuracy of 99.5%, which is considerably higher than that of the traditional algorithm for customer information collection.
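
The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that "accuracy" is used in the sense of precision (the share of extracted keywords that are correct) and that the F-value is the usual harmonic mean of precision and recall; the abstract confirms neither. The function name `keyword_prf` and the sample keywords are hypothetical and are not taken from the paper.

```python
def keyword_prf(predicted, gold):
    """Return (precision, recall, f_value) for extracted keywords.

    Assumption: the abstract's "accuracy" is treated as precision and
    the F-value as the harmonic mean of precision and recall.
    """
    predicted_set, gold_set = set(predicted), set(gold)
    hits = len(predicted_set & gold_set)  # keywords found in both sets
    precision = hits / len(predicted_set) if predicted_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    # With no hits both measures are zero, so the F-value is defined as zero.
    f_value = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_value


# Hypothetical example: four extracted keywords, three reference keywords.
p, r, f = keyword_prf(
    ["spark", "entropy", "customer", "web"],
    ["entropy", "customer", "mining"],
)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Under these assumptions, the reported maxima would be the highest values of each measure across the four datasets, so they need not come from the same run.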

Keywords: text mining; web; information gathering; big data; computational framework.

DOI: 10.1504/IJCSYSE.2024.142775

International Journal of Computational Systems Engineering, 2024 Vol.8 No.3/4, pp.220 - 228

Received: 22 Mar 2023
Accepted: 11 Jun 2023

Published online: 21 Nov 2024
