Improve feature selection method of web page language identification using fuzzy ARTMAP
by Choon-Ching Ng, Ali Selamat
International Journal of Intelligent Information and Database Systems (IJIIDS), Vol. 4, No. 6, 2010

Abstract: The amount of information available in languages other than English on the World Wide Web and in global information systems is increasing significantly. Different languages can share a single script: Arabic, Persian, Urdu and Pashto, for example, are all written with Arabic script letters. The issue is how to produce reliable features from a web page that is to undergo language identification. Incorrectly identifying the language results in garbled translations as well as faulty and incomplete analyses. The aim of this study is to enhance the effectiveness of the feature selection method for web page language identification. We have investigated total N-grams, N-grams frequency, N-grams frequency document frequency, and N-grams frequency inverse document frequency as feature selection methods for web page language identification. The experimental results show that N-grams frequency gives the most promising result compared to the other feature selection methods.
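
To make the feature types named in the abstract more concrete, the following is a minimal Python sketch of character N-gram frequency, document frequency, and an inverse-document-frequency weighting. The abstract does not give the authors' exact definitions, so the function names, the default N of 2, the use of relative frequency, and the log(N / df) weighting are all assumptions chosen for illustration, not the paper's method. The fuzzy ARTMAP classifier itself is not shown.

```python
from collections import Counter
import math


def char_ngrams(text, n):
    """Return all overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_frequency(text, n=2):
    """Relative frequency of each character n-gram in one document (assumed definition)."""
    counts = Counter(char_ngrams(text, n))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def document_frequency(docs, n=2):
    """Number of documents in which each n-gram occurs at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc, n)))
    return df


def ngram_tf_idf(docs, n=2):
    """Weight each document's n-gram frequency by log(N / df) (assumed weighting)."""
    df = document_frequency(docs, n)
    num_docs = len(docs)
    return [
        {g: f * math.log(num_docs / df[g]) for g, f in ngram_frequency(doc, n).items()}
        for doc in docs
    ]
```

Under these assumptions, an N-gram that occurs in every document receives an inverse-document-frequency weight of zero, which may be one reason a plain frequency feature can behave differently from a weighted one when related languages share a script.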

Online publication date: Mon, 15-Nov-2010
