Title: A multimodal content-based approach for web pages analysis

Authors: Antonio M. Rinaldi

Addresses: DIETI – Dipartimento di Ingegneria Elettrica e delle Tecnologie dell'Informazione, 80125 Via Claudio, 21, Napoli, Italy; IKNOS-LAB – Intelligent and Knowledge Systems, LUPT, Università di Napoli Federico II, 80134 Via Toledo, 402, Napoli, Italy

Abstract: Multimedia information retrieval (MIR) has become a very active research area. Many retrieval approaches, based on extracting and representing visual properties of multimedia data, have been developed. These approaches usually rely either on processing low-level features (i.e., visual features for image and video data) or on high-level knowledge; combining the two has not yet drawn enough attention. This paper introduces a novel approach and a complete framework for MIR, in which the user draws on both textual and visual semantic information to retrieve data more accurately. The approach has been implemented in a prototype system, and numerous tests have been carried out in the World Wide Web domain. These tests show that integrating such data significantly improves the performance of the retrieval system.

Keywords: multimodal web information retrieval; content-based information retrieval; CBIR; semantic relatedness metric; ontologies; WordNet; web pages; web page analysis; websites; website analysis; multimedia information retrieval; MIR; textual information; visual information; semantic information.

DOI: 10.1504/IJKEDM.2013.059346

International Journal of Knowledge Engineering and Data Mining, 2013 Vol.2 No.4, pp.292-316

Published online: 31 Jul 2014
