CNN-based text multi-classifier using filters initialised by N-gram vector
by Yan Xiang; Ying Xu; Zhengtao Yu; Dangguo Shao; Hongbin Wang; Yantuan Xian
International Journal of Information and Communication Technology (IJICT), Vol. 15, No. 4, 2019

Abstract: Text classification based on convolutional neural networks (CNNs) has attracted growing attention recently. This paper presents an improved CNN-based text multi-classifier. First, word vectors are trained on the corpus to be classified. Then, the most important N-grams for each category are selected and clustered into groups. Finally, the centroid vector of each group is used to initialise the centre weights of a filter. This initialisation enables the CNN to extract N-gram features more effectively and ultimately improves text classification results. Multi-classification experiments using several advanced models were performed on different data sets. The experiments show that the proposed model is more accurate and more stable than the baseline models.
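
The initialisation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`kmeans`, `init_filters_from_ngrams`), the use of plain k-means for clustering, the small uniform background initialisation, and the reading of "centre weights" as the middle rows of each filter are all assumptions made for illustration. The selection of important N-grams is not shown; the N-gram vectors are taken as given.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed clustering choice); returns (k, dim) centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Distance from every point to every centroid, shape (num_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def init_filters_from_ngrams(ngram_vectors, num_filters, width, emb_dim, seed=0):
    """Hypothetical helper: build filters of shape (num_filters, width, emb_dim).

    ngram_vectors has shape (num_ngrams, n, emb_dim): each selected N-gram is
    the stack of its n word vectors, with n <= width.
    """
    rng = np.random.default_rng(seed)
    n = ngram_vectors.shape[1]
    # Assumed background: small uniform random values for all weights.
    filters = rng.uniform(-0.01, 0.01, size=(num_filters, width, emb_dim))
    # Cluster the flattened N-gram vectors and take the centroids.
    flat = ngram_vectors.reshape(len(ngram_vectors), n * emb_dim)
    k = min(num_filters, len(flat))
    centroids = kmeans(flat, k, seed=seed).reshape(k, n, emb_dim)
    # Write each centroid into the centre rows of one filter.
    start = (width - n) // 2
    filters[:k, start:start + n, :] = centroids
    return filters

# Usage with random stand-ins for 12 selected bigrams and 4-dimensional word vectors.
rng = np.random.default_rng(1)
ngrams = rng.normal(size=(12, 2, 4))
filters = init_filters_from_ngrams(ngrams, num_filters=6, width=4, emb_dim=4)
print(filters.shape)  # (6, 4, 4)
```

In this sketch the outer rows of each filter keep their small random values, so only the centre positions carry the N-gram centroids; the resulting array could then be copied into the convolution layer of whichever framework is used.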

Online publication date: Tue, 22-Oct-2019
