Title: A survey of term weighting schemes for text classification

Authors: Abdullah Alsaeedi

Addresses: Department of Computer Science, College of Computer Science and Engineering (CCSE), Taibah University, Medina, Saudi Arabia

Abstract: Text document classification approaches are designed to categorise documents into predefined classes. These approaches have two main components: document representation models and term-weighting methods. The high dimensionality of the feature space has long been a major problem in text classification. To address high dimensionality and to improve classification accuracy, various feature selection approaches have been presented in the literature. In addition, several term-weighting schemes have been introduced that can be used for feature selection. This work surveys and investigates the term (feature) weighting approaches that have been presented in the text classification context.

Keywords: document frequency; supervised term weighting; text classification; unsupervised term weighting.

DOI: 10.1504/IJDMMM.2020.106741

International Journal of Data Mining, Modelling and Management, 2020 Vol.12 No.2, pp.237-254

Received: 10 Oct 2018
Accepted: 15 Mar 2019

Published online: 20 Apr 2020
