Enhanced and combined centroid-based approach for multi-label genre classification of web pages
by Chaker Jebari
International Journal of Metaheuristics (IJMHEUR), Vol. 4, No. 3/4, 2015

Abstract: This paper proposes an enhanced and combined centroid-based approach to classify web pages by genre. To deal with the complexity of web pages, the proposed approach implements a multi-label classification scheme in which a web page can be assigned to more than one genre. In addition, it implements incremental classification to handle the rapid evolution of web genres: web pages are classified one by one and, depending on the similarity between the new page and each genre centroid, our approach either adjusts the genre centroid or treats the new page as a noise page and discards it. Moreover, our approach combines three homogeneous centroid-based classifiers (contextual, logical and hyperlink classifiers), which exploit character n-grams extracted from different sources: the URL, title, headings and anchors. Experiments conducted on a known multi-label corpus show that our approach is very fast and outperforms many other multi-label classifiers.
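The incremental, multi-label and combined scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The following are all assumptions made for illustration:

- character trigrams as features
- cosine similarity to each centroid
- a running-mean centroid update
- averaging the three classifiers' similarities
- a single threshold of 0.3
- feeding the contextual, logical and hyperlink classifiers from the URL, the title plus headings, and the anchor text respectively

The names `SOURCES`, `CentroidModel` and `CombinedIncrementalClassifier` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical mapping of the three classifiers to the page fields they read.
SOURCES = {
    "contextual": lambda page: page["url"],
    "logical": lambda page: page["title"] + " " + page["headings"],
    "hyperlink": lambda page: page["anchors"],
}


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased string (n=3 is assumed)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CentroidModel:
    """One centroid per genre, built from a single page source."""

    def __init__(self):
        self.centroids = {}  # genre -> {n-gram: mean weight}
        self.counts = {}     # genre -> number of pages merged into the centroid

    def update(self, genre, vec):
        """Fold a page vector into the genre centroid as a running mean (assumed rule)."""
        k = self.counts.get(genre, 0)
        centroid = self.centroids.setdefault(genre, {})
        for key in set(centroid) | set(vec):
            centroid[key] = (centroid.get(key, 0.0) * k + vec.get(key, 0)) / (k + 1)
        self.counts[genre] = k + 1

    def similarity(self, genre, vec):
        return cosine(vec, self.centroids.get(genre, {}))


class CombinedIncrementalClassifier:
    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed value, not taken from the paper
        self.models = {name: CentroidModel() for name in SOURCES}

    def _vectors(self, page):
        return {name: char_ngrams(get(page)) for name, get in SOURCES.items()}

    def fit(self, pages, label_sets):
        """Build initial centroids; each labelled page may carry several genres."""
        for page, labels in zip(pages, label_sets):
            vecs = self._vectors(page)
            for genre in labels:
                for name, model in self.models.items():
                    model.update(genre, vecs[name])

    def classify_incremental(self, page):
        """Return every genre whose combined score reaches the threshold and adapt
        those centroids; a page matching no genre is treated as noise and discarded."""
        vecs = self._vectors(page)
        genres = set().union(*(m.centroids for m in self.models.values()))
        labels = []
        for genre in sorted(genres):
            # Combine the three classifiers by averaging their similarities (assumed rule).
            score = sum(m.similarity(genre, vecs[n]) for n, m in self.models.items())
            if score / len(self.models) >= self.threshold:
                labels.append(genre)
        for genre in labels:
            for name, model in self.models.items():
                model.update(genre, vecs[name])
        return labels
```

After `fit` builds one centroid per genre for each source, `classify_incremental` returns every genre whose averaged similarity reaches the threshold, so a page can receive several labels. It also folds the page into those centroids. A page that matches no genre gets an empty list and leaves every centroid unchanged, which is how this sketch models discarding a noise page.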

Online publication date: Fri, 29-Jan-2016
