Research on crawling mechanism and policy for crawling product information from mobile internet
by Shu Wang; Jia Chen; Chonghuan Xu
International Journal of Computing Science and Mathematics (IJCSM), Vol. 8, No. 6, 2017

Abstract: Product information on the mobile internet is growing rapidly in volume and becoming harder to acquire. Companies tend to deliver product information through well-tuned mobile websites or websites that are responsive to various mobile devices. Such a site is more of a web app than a traditional website, and we call it a rich internet application (RIA). In RIAs, information is hidden from search engine spiders in the deep web by HTML5, Ajax and other scripting techniques; user interactions are needed to trigger prescribed events in a certain order to reveal the complete information we need. In this paper, we identify the crux of the problem as how to provide a mechanism to parse scripts and manipulate the document object model (DOM), and a policy to trigger user events and run the scrape process. A new mechanism and policy were formulated based on web crawler techniques and studies of Ajax-specific web crawlers. By remodelling web pages, redesigning the architecture of the web crawler and refining the scrape algorithm, we successfully scraped product data from mobile internet RIAs.
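The abstract's idea of triggering events in a prescribed order to reveal hidden information can be illustrated with a minimal sketch. This is not the authors' mechanism or policy, which the abstract does not detail; it is a generic breadth-first, state-based exploration of the kind commonly used in Ajax crawling. The toy application `TOY_APP`, its state and event names, and the `crawl` function are all hypothetical, and a real crawler would drive a browser or script engine rather than a dictionary.

```python
from collections import deque

# Hypothetical toy RIA: each state stands in for a DOM snapshot, and each
# event name maps to the state the page reaches after that event fires.
TOY_APP = {
    "home": {"data": {}, "events": {"tap_menu": "list"}},
    "list": {"data": {}, "events": {"tap_item": "detail", "tap_back": "home"}},
    "detail": {
        "data": {"name": "Widget", "price": "9.99"},
        "events": {"tap_specs": "specs", "tap_back": "list"},
    },
    "specs": {"data": {"weight": "1kg"}, "events": {"tap_back": "detail"}},
}


def crawl(app, start):
    """Explore states breadth-first, firing every event available in each
    state, and record exposed data with the event path that reached it."""
    seen = {start}
    queue = deque([(start, [])])
    records = []
    while queue:
        state, path = queue.popleft()
        data = app[state]["data"]
        if data:  # this state exposes product information
            records.append({"state": state, "path": path, "data": data})
        for event, target in app[state]["events"].items():
            if target not in seen:  # skip states already reached
                seen.add(target)
                queue.append((target, path + [event]))
    return records


records = crawl(TOY_APP, "home")
for record in records:
    print(record["state"], record["path"], record["data"])
```

Each record keeps the ordered event path, so the same sequence of interactions could be replayed later to reach the same data.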

Online publication date: Wed, 03-Jan-2018
