Title: Large scale MicroBlog location data capture method based on dynamic web page parsing

Authors: Yu Ji; Huanhuan Liu; Zhenzhen Wang; Rui Sun

Addresses: Organization Department, Hebei Institute of Mechanical and Electrical Technology, Xingtai, 054000, China; Department of Information Engineering, Hebei Institute of Mechanical and Electrical Technology, Xingtai, 054000, China; Department of Information Engineering, Hebei Institute of Mechanical and Electrical Technology, Xingtai, 054000, China; Department of Information Engineering, Hebei Institute of Mechanical and Electrical Technology, Xingtai, 054000, China

Abstract: Because Weibo location data are large in scale, the captured data tend to show a large deviation coefficient and capture efficiency is low. To address this, a large-scale Weibo location data capture method based on dynamic web page parsing is proposed. First, based on the source of the Weibo location data, an artificial neuron model and a random function are introduced to calculate the weights of the feature data. Next, a feature vector table and a classifier model are generated, and the feature text is filtered using the established classification model. Finally, by matching the feature data of Weibo location data between dynamic script sites and web pages, a dynamic script parsing framework for Weibo location data on web pages is constructed, and dynamic web page parsing technology is used to capture the data. The experimental results show that the proposed method has a data capture deviation of only 0.1% and a capture efficiency of 99%. The method can therefore significantly improve the crawling of large-scale Weibo location data and is feasible in practice.
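
The abstract gives no implementation details, so the following is only an illustrative sketch of the final step (reading location data out of script content embedded in a web page), not the authors' framework. It assumes, purely for illustration, that the page carries post data as JSON inside a `<script type="application/json">` tag; the function name `extract_locations` and the field names `posts`, `geo`, `lat` and `lon` are hypothetical and do not come from the paper or from any real Weibo page.

```python
import json
from html.parser import HTMLParser


class ScriptJSONExtractor(HTMLParser):
    """Collect the text bodies of <script type="application/json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self._buffer = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self._in_json_script = False
            self.payloads.append("".join(self._buffer))


def extract_locations(html):
    """Return (post_id, lat, lon) tuples found in embedded JSON payloads.

    Assumed (hypothetical) payload shape:
    {"posts": [{"id": ..., "geo": {"lat": ..., "lon": ...}}, ...]}
    """
    parser = ScriptJSONExtractor()
    parser.feed(html)
    parser.close()

    locations = []
    for raw in parser.payloads:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip script bodies that are not valid JSON
        if not isinstance(doc, dict):
            continue
        for post in doc.get("posts", []):
            geo = post.get("geo")
            if geo and "lat" in geo and "lon" in geo:
                locations.append(
                    (post.get("id"), float(geo["lat"]), float(geo["lon"]))
                )
    return locations
```

Posts without a `geo` field are skipped rather than reported with placeholder coordinates, so the result contains only records that actually carry a location.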

Keywords: dynamic web page parsing; MicroBlog location data; crawling; artificial neuron model; random function; dynamic script site.

DOI: 10.1504/IJWBC.2025.145137

International Journal of Web Based Communities, 2025 Vol.21 No.1/2, pp.36 - 49

Received: 28 Jun 2023
Accepted: 07 Nov 2023

Published online: 21 Mar 2025
