Title: A new text categorisation strategy: prototype design and experimental analysis

Authors: N. Venkata Sailaja; L. Padma Sree; N. Mangathayaru

Addresses: Department of Computer Science and Engineering, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India; Department of Electronics and Communication Engineering, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India; Department of Information Technology, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India

Abstract: Over the past decade, a large amount of text data has been generated by various web sources, both online and offline. Most of this data is inconsistent and unstructured, which makes it hard to process with the available computing machines. With the advent of computers and the information age, statistical and analytical problems have also grown in both size and complexity. Text classification using machine learning techniques is hampered by the high dimensionality of the attribute vector. A feature selection technique is therefore required to discard irrelevant and noisy attributes from the feature vector so that ML algorithms can work efficiently. In this paper, a hybrid method is proposed for text document classification. The performance of the proposed method is evaluated on two standard datasets, Reuters-21578 and 20 newsgroups. For 20 newsgroups, we used the 'bydate' version of the dataset, containing 18,941 documents. In our experiments, we examine several performance measures.

Keywords: text classification; rough sets; RS; information retrieval; feature selection; machine learning; evaluation; 20 newsgroups; Reuters-21578.

DOI: 10.1504/IJKL.2020.106650

International Journal of Knowledge and Learning, 2020 Vol.13 No.2, pp.146 - 167

Accepted: 14 Feb 2019
Published online: 16 Apr 2020
