Title: Short text classification using feature enrichment from credible texts

Authors: Issa M. Alsmadi; Keng Hoon Gan

Addresses: School of Computer Sciences, Universiti Sains Malaysia, Penang 11800, Malaysia; School of Computer Sciences, Universiti Sains Malaysia, Penang 11800, Malaysia

Abstract: Classifying the content of tweets can serve as a useful feature for other application tasks. However, such classification is challenging because tweets are short and their content is sparse. Although individual tweets are limited in length, they span many different topics, so achieving good coverage of content features remains difficult. In this research, we adopt a keyword expansion technique and study the enrichment of tweet content using text from credible sources, such as news sites. For evaluation, we conduct experiments on two Twitter datasets using four standard classifiers. The proposed approach improves classification accuracy by +0.05% to +3.54% across both datasets. The experimental results indicate that the proposed feature enrichment method can mitigate the sparseness of short text and improve classification performance across a range of classifiers.
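
Illustrative sketch (not the authors' implementation): the abstract does not specify how keywords are expanded, so the following Python example only shows one plausible reading of the general idea, in which a tweet's tokens are enriched with frequent terms from credible documents that share a keyword with it. The names `tokenize`, `enrich`, `STOPWORDS`, and `top_k`, the stopword list, and the co-occurrence heuristic are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def enrich(tweet, credible_docs, top_k=3):
    """Return the tweet's tokens plus up to top_k expansion terms.

    Expansion terms are drawn from credible documents that share at least
    one non-stopword keyword with the tweet, ranked by how often they occur
    in those documents (ties broken alphabetically for determinism).
    """
    tweet_tokens = tokenize(tweet)
    keywords = set(tweet_tokens) - STOPWORDS
    candidates = Counter()
    for doc in credible_docs:
        doc_tokens = tokenize(doc)
        if keywords & set(doc_tokens):
            candidates.update(
                t for t in doc_tokens
                if t not in keywords and t not in STOPWORDS
            )
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return tweet_tokens + [term for term, _ in ranked[:top_k]]


# Toy "credible" corpus standing in for news text (invented for this example).
news = [
    "The central bank raised interest rates to curb inflation",
    "Inflation pushes the bank to raise interest rates again",
    "Local team wins the football final",
]

print(enrich("inflation is crazy", news))
```

In this sketch the enriched token list would then be passed to any standard bag-of-words classifier in place of the raw tweet tokens; a tweet with no keyword overlap is returned unchanged.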

Keywords: short text; classification; social media; Twitter; enrichment; feature selection.

DOI: 10.1504/IJWET.2020.107689

International Journal of Web Engineering and Technology, 2020 Vol.15 No.1, pp.59 - 80

Published online: 08 Jun 2020
