Title: Web task automation: a standards-based proposal

Authors: Vicente Luque Centeno, Carlos Delgado Kloos, Luis Sanchez Fernandez, Norberto Fernandez Garcia

Addresses: Department of Telematics Engineering, Universidad Carlos III de Madrid, 28911 Leganes (Madrid), Spain (all authors)

Abstract: Tasks on the web are performed worldwide for many different purposes (banking, shopping, auctions, e-mail, hotel reservations, flight booking, etc.). Until now, using typical HTML-based web browsers has required users to interact mechanically and continually with the on-screen view of remotely retrieved documents (clicking on links or buttons, filling in and submitting forms, scrolling, and visually locating data on the screen, to name a few). When the amount of data within those documents is large, this manual navigation easily becomes overwhelming in cost and effort, even for the simplest tasks. Developing ad-hoc wrapper agents that automate these tasks for the user, by intelligently integrating semistructured web data from heterogeneous sources, may considerably reduce this interaction and effort. Bargain finders or price comparers, among others, might present only the final, valuable results to users, sparing them most of the navigation. However, ad-hoc wrapper agents have traditionally had high development and maintenance costs. Because HTML is semistructured, any minor unexpected change often causes them to stop working properly. This paper presents several new standards-based techniques for reducing these development and maintenance costs and for making these programs more compact and stable.

Keywords: web wrapper agent; screen scraping; web data integration; semistructured information automation; web tasks; web mediator; information retrieval; software maintenance; XPath; Message Sequence Charts.

DOI: 10.1504/IJWET.2004.005239

International Journal of Web Engineering and Technology, 2004, Vol. 1, No. 3, pp. 374-391

Published online: 14 Sep 2004
