Title: Automatic web pages hierarchical classification using dynamic domain ontologies

Authors: Antonio M. Rinaldi

Addresses: Dipartimento di Informatica e Sistemistica, Università di Napoli Federico II, Via Claudio, 21, 80125 Napoli, Italy

Abstract: The use of ontologies for knowledge representation has grown rapidly in recent years, and ontologies are now used in several application contexts. One of the most challenging of these applications is the web. Managing large amounts of information on the internet requires more efficient and effective methods and techniques for mining and representing information. In this article, we present a methodology for automatic topic annotation of web pages. We describe a word sense disambiguation algorithm that uses a dedicated metric for measuring semantic relatedness, and we show a technique that detects the topic of the analysed document using ontologies extracted from a knowledge base. The strategy is implemented in a system that uses this information to build a topic hierarchy for classifying web pages; the hierarchy is created automatically rather than defined a priori. Experimental results are presented and discussed in order to measure the effectiveness of our approach.
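The abstract does not spell out the relatedness metric or the disambiguation procedure. The following is only a minimal sketch of the general idea behind relatedness-based word sense disambiguation: each word receives the sense that is, in aggregate, most related to the senses of the other words in the same context. The sense inventory (`SENSES`), the sense labels, and the use of Jaccard overlap between related-concept sets as the relatedness measure are all hypothetical stand-ins chosen for illustration, not the author's metric or knowledge base.

```python
# Hypothetical toy sense inventory: each sense is described by a set of related concepts.
# This stands in for a real knowledge base; it is NOT the resource used in the paper.
SENSES: dict[str, dict[str, set[str]]] = {
    "bank": {
        "bank.finance": {"money", "deposit", "loan", "account"},
        "bank.river": {"river", "water", "shore", "slope"},
    },
    "loan": {
        "loan.credit": {"money", "credit", "bank", "interest", "deposit"},
    },
    "interest": {
        "interest.finance": {"money", "rate", "loan", "bank"},
        "interest.curiosity": {"curiosity", "attention", "hobby"},
    },
    "river": {
        "river.stream": {"water", "bank", "shore", "flow"},
    },
}


def relatedness(a: set[str], b: set[str]) -> float:
    """Jaccard overlap between two concept sets (an assumed stand-in metric)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def disambiguate(words: list[str]) -> dict[str, str]:
    """For each known word, pick the sense most related to the other context words."""
    chosen: dict[str, str] = {}
    for w in words:
        best_sense, best_score = None, -1.0
        for sense, concepts in SENSES.get(w, {}).items():
            score = 0.0
            for other in words:
                if other == w or other not in SENSES:
                    continue
                # Credit only the best-matching sense of each other context word.
                score += max(relatedness(concepts, c) for c in SENSES[other].values())
            if score > best_score:  # ties keep the first sense listed
                best_sense, best_score = sense, score
        if best_sense is not None:
            chosen[w] = best_sense
    return chosen


if __name__ == "__main__":
    print(disambiguate(["bank", "loan", "interest"]))  # financial context
    print(disambiguate(["bank", "river"]))             # geographic context
```

With the financial context, "bank" resolves to `bank.finance`; with "river" in the context, the same word resolves to `bank.river`, which is the context sensitivity a relatedness-based approach is meant to provide.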

Keywords: knowledge engineering; document analysis; text processing; semantic networks; ontologies; web page classification; natural language processing; NLP; hierarchical classification; topic annotation; subject annotation; websites.

DOI: 10.1504/IJKWI.2011.045162

International Journal of Knowledge and Web Intelligence, 2011 Vol.2 No.4, pp.231 - 256

Published online: 07 Mar 2015
