Detecting spam web pages using multilayer extreme learning machine
by Rajendra Kumar Roul
International Journal of Big Data Intelligence (IJBDI), Vol. 5, No. 1/2, 2018

Abstract: Web spamming artificially raises the ranking of unimportant pages in search results. Such spam pages mislead search engines in their task of retrieving high-quality information, so detecting and eliminating them is a pressing need. To this end, this study focuses on two important aspects of machine learning. First, it proposes a new content-based spam detection technique built on nine important features that help determine whether a page is spam or non-spam. Each feature's value is obtained by parsing the documents and then carrying out the steps required to compute its score. These nine feature values, together with the class label (spam or non-spam), form a feature vector used to train classifiers to detect spam pages. Second, it highlights the importance of deep learning, in the form of a multilayer extreme learning machine (ELM), for spam page detection. Experiments on two benchmark datasets (WEBSPAM-UK2002 and WEBSPAM-UK2006) show that multilayer ELM gives more promising results than other established classifiers.
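The abstract does not spell out the multilayer ELM procedure or name the nine content features, so the following is only a minimal sketch under stated assumptions, not the author's implementation. It follows the commonly described multilayer ELM design: stacked ELM autoencoders whose random input weights are never trained, with only the output weights solved in closed form by regularised least squares, followed by a least-squares output layer that maps the final representation to the two classes. The class name `MultilayerELM`, the parameters `layer_sizes` and `C`, the sigmoid activation, the label encoding (1 = spam, 0 = non-spam) and the synthetic nine-dimensional data are all hypothetical choices for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function (avoids OverflowError in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matmul(A, B):
    # Matrices are lists of rows; returns the product A @ B.
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, B):
    # Solve A X = B (A square) by Gauss-Jordan elimination with partial pivoting.
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def ridge(H, T, C):
    # Regularised least squares: beta = (H^T H + I/C)^(-1) H^T T.
    Ht = transpose(H)
    G = matmul(Ht, H)
    for i in range(len(G)):
        G[i][i] += 1.0 / C
    return solve(G, matmul(Ht, T))


def project(Z, beta):
    # Next-layer representation g(Z beta^T): the autoencoder's output
    # weights beta (shape L x d) are reused, transposed, as the encoder.
    return [[sigmoid(v) for v in row] for row in matmul(Z, transpose(beta))]


class MultilayerELM:
    """Hypothetical sketch: stacked ELM autoencoders + least-squares output layer."""

    def __init__(self, layer_sizes=(20, 10), C=1e3, seed=0):
        self.layer_sizes = layer_sizes  # hidden units per autoencoder layer
        self.C = C                      # regularisation constant
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        self.encoders = []
        Z = X
        for L in self.layer_sizes:
            d = len(Z[0])
            # Random, untrained input weights and biases of the ELM autoencoder.
            W = [[rng.uniform(-1, 1) for _ in range(L)] for _ in range(d)]
            b = [rng.uniform(-1, 1) for _ in range(L)]
            H = [[sigmoid(v + bj) for v, bj in zip(row, b)] for row in matmul(Z, W)]
            # Only the output weights are learned: they reconstruct Z from H.
            beta = ridge(H, Z, self.C)
            self.encoders.append(beta)
            Z = project(Z, beta)
        # One-hot targets: column 0 = non-spam (label 0), column 1 = spam (label 1).
        T = [[1.0, 0.0] if label == 0 else [0.0, 1.0] for label in y]
        self.out = ridge(Z, T, self.C)
        return self

    def predict(self, X):
        Z = X
        for beta in self.encoders:
            Z = project(Z, beta)
        scores = matmul(Z, self.out)
        return [max(range(len(s)), key=s.__getitem__) for s in scores]


if __name__ == "__main__":
    # Synthetic nine-feature vectors (NOT WEBSPAM data): spam near 0.8, non-spam near 0.2.
    rng = random.Random(42)
    X = [[0.8 + rng.uniform(-0.1, 0.1) for _ in range(9)] for _ in range(20)]
    X += [[0.2 + rng.uniform(-0.1, 0.1) for _ in range(9)] for _ in range(20)]
    y = [1] * 20 + [0] * 20
    model = MultilayerELM().fit(X, y)
    print(model.predict([[0.85] * 9, [0.15] * 9]))
```

The sketch is written in pure Python so it runs without dependencies; a practical version would use a numerical library for the matrix algebra. As commonly described, multilayer ELM also orthogonalises the random autoencoder weights, which this sketch omits for brevity. In the paper's setting, each row of `X` would be the nine computed feature scores of one page and `y` its spam or non-spam label.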

Online publication date: Fri, 01-Dec-2017
