Title: Content-based indexing of visual information in the web pages

Authors: Ali Benafia; Ramdane Maamri; Zaidi Sahnoun; Sara Benafia

Addresses: University of Batna, Avenue Chahid Boukhlouf, Batna (05000), Algeria; Lire Laboratory, Department of Software Technologies and Information Systems, Faculty of New Technologies of Information and Communication, University Constantine 2, Nouvelle ville Ali, Mendjli BP67A, Constantine (25000), Algeria; Lire Laboratory, Department of Software Technologies and Information Systems, Faculty of New Technologies of Information and Communication, University Constantine 2, Nouvelle ville Ali, Mendjli BP67A, Constantine (25000), Algeria; University of Biskra, BP 145 RP, Biskra (07000), Algeria

Abstract: As the World Wide Web has grown, indexing methods have changed considerably. In this context, we present a novel approach to indexing web pages. A web page generally contains many objects at once: some serve only the static or dynamic design of the page, while others carry the information that is relevant to the page content. To identify this informative part, we developed a method that removes superfluous objects and keeps the images and text of the web page. Once the images and text are identified, we link the text to the visual characteristics of each image in order to bridge the semantic gap. To do this, we explore ways of integrating visual and textual features. The proposed approach is tested on a large corpus, and the results are compared with indexing performed by human experts.

Keywords: image features; textual features; feature extraction; term extraction; web page cleaning; text similarity; web pages; content-based indexing; visual information; web page indexing.

DOI: 10.1504/IJRIS.2015.070879

International Journal of Reasoning-based Intelligent Systems, 2015 Vol.7 No.1/2, pp.93 - 113

Received: 17 Sep 2013
Accepted: 07 Apr 2014

Published online: 31 Jul 2015
