A parallel ACO algorithm to select terms to categorise longer documents
by M. Janaki Meena; K.R. Chandran; A. Karthik; A. Vijay Samuel
International Journal of Computational Science and Engineering (IJCSE), Vol. 6, No. 4, 2011

Abstract: Text categorisation (TC) is the task of assigning predefined categories to text. The primary step in TC is to transform documents into a representation suitable for machine learning algorithms, of which Bag of Words is the most popular. Most machine learning algorithms are sensitive to the features fed into them and are misled by the high dimensionality of text. Feature selection (FS) is therefore an important preprocessing step that removes redundant and irrelevant terms from the training corpus. This paper proposes an ant colony optimisation (ACO) algorithm to select features for categorising longer documents whose categories are closely related. The heuristic value of each word is computed from the statistical dependency of the term on a category and from its compactness value, where the compactness of a term indicates its spread within a document. Experiments were conducted with documents from the 20 Newsgroups and Reuters-21578 benchmarks. The selected features were fed into a naïve Bayes classifier and its performance was analysed. The classifier's performance was observed to improve with the features selected by the proposed method. Because the processes involved in the algorithm are time-intensive and demand parallelism, the ACO algorithm was parallelised using the MapReduce programming model. The parallel algorithm was implemented and tested on a cluster of six machines formed using Hadoop.
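To make the idea of ACO-based term selection more concrete, the following is a minimal, generic sketch under stated assumptions, not the authors' algorithm. It assumes the chi-square statistic as the "statistical dependency" measure, takes compactness as a precomputed per-term score, and combines the two by a simple product; the names `chi_square`, `combine_heuristic` and `aco_select`, the parameters `alpha`, `beta` and `rho`, and the iteration-best pheromone update are all hypothetical choices. The `fitness` function stands in for whatever subset evaluation is used (for example, naïve Bayes accuracy on held-out data) and is assumed to return non-negative values.

```python
import random


def chi_square(n11, n10, n01, n00):
    """Chi-square dependency of a term on a category from a 2x2 document table.

    n11: docs in the category containing the term
    n10: docs outside the category containing the term
    n01: docs in the category without the term
    n00: docs outside the category without the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return 0.0 if denom == 0 else n * (n11 * n00 - n01 * n10) ** 2 / denom


def combine_heuristic(chi2, compactness):
    # Hypothetical combination: product of dependency and compactness scores.
    return {t: chi2[t] * compactness.get(t, 0.0) for t in chi2}


def aco_select(terms, heuristic, fitness, k, n_ants=10, n_iter=20,
               alpha=1.0, beta=1.0, rho=0.2, seed=0):
    """Each ant builds a k-term subset; pheromone tracks good terms."""
    rng = random.Random(seed)
    tau = {t: 1.0 for t in terms}  # initial pheromone on every term
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iter):
        solutions = []
        for _ in range(n_ants):
            remaining, subset = list(terms), []
            while len(subset) < k and remaining:
                # Selection probability proportional to tau^alpha * eta^beta.
                weights = [(tau[t] ** alpha) * (heuristic[t] ** beta)
                           for t in remaining]
                if sum(weights) > 0:
                    pick = rng.choices(remaining, weights=weights, k=1)[0]
                else:
                    pick = rng.choice(remaining)
                subset.append(pick)
                remaining.remove(pick)
            score = fitness(subset)
            solutions.append((score, subset))
            if score > best_score:
                best_score, best_subset = score, subset
        for t in tau:  # evaporation
            tau[t] *= 1.0 - rho
        it_score, it_subset = max(solutions, key=lambda s: s[0])
        for t in it_subset:  # iteration-best ant deposits pheromone
            tau[t] += max(it_score, 0.0)
    return best_subset, best_score
```

As a toy usage example, with `heuristic = {"goal": 5.0, "match": 4.0, "the": 0.1, "of": 0.1}` and a stand-in fitness of `lambda s: sum(heuristic[t] for t in s)`, calling `aco_select(list(heuristic), heuristic, fitness, k=2)` converges on the two discriminative terms. In practice the fitness evaluation (training and testing a classifier per ant) dominates the running time, which is what makes the search a natural candidate for parallelisation.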
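The abstract states only that the ACO algorithm was parallelised with MapReduce on Hadoop; which steps were distributed is not given here. As one illustrative assumption, the term-category document counts needed for dependency statistics fit the MapReduce pattern well. The sketch below simulates map, shuffle and reduce in a single Python process; it does not use the Hadoop API, and the functions `mapper`, `shuffle`, `reducer` and `run_job` are hypothetical names.

```python
from collections import defaultdict


def mapper(doc_id, category, text):
    # Emit ((term, category), 1) once per distinct term in the document,
    # so the reduced value is a document frequency, not a raw term count.
    for term in set(text.lower().split()):
        yield (term, category), 1


def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    # Sum the per-document counts for one (term, category) key.
    return key, sum(values)


def run_job(docs):
    """docs: iterable of (doc_id, category, text) tuples."""
    mapped = (pair for doc in docs for pair in mapper(*doc))
    return dict(reducer(k, v) for k, v in shuffle(mapped).items())
```

On a real cluster, each mapper would process one split of the corpus and the reducers would run on separate nodes; the resulting counts give, for every term and category, the number of documents in that category that contain the term.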

Online publication date: Sat, 21-Mar-2015
