Authors: Abdullah Alsaeedi
Addresses: Department of Computer Science, College of Computer Science and Engineering (CCSE), Taibah University, Medina, Saudi Arabia
Abstract: Text document classification approaches are designed to categorise documents into predefined classes. These approaches have two main components: document representation models and term-weighting methods. The high dimensionality of the feature space has always been a major problem in text classification. To address this high dimensionality and to improve classification accuracy, various feature selection approaches have been presented in the literature. In addition, several term-weighting schemes have been introduced that can also be used for feature selection. This work surveys and investigates the term (feature) weighting approaches that have been presented in the text classification context.
Keywords: document frequency; supervised term weighting; text classification; unsupervised term weighting.
International Journal of Data Mining, Modelling and Management, 2020 Vol.12 No.2, pp.237 - 254
Received: 10 Oct 2018
Accepted: 15 Mar 2019
Published online: 20 Apr 2020