Title: A comparison of text classification methods using different stemming techniques

Authors: Mariem Bounabi; Karim El Moutaouakil; Khalid Satori

Addresses: Computer Sciences, Imaging and Numerical Analysis Laboratory (LIIAN), USMBA University Fes, Fez City, Morocco; Hoceima National School of Applied Sciences (ENSAH), Mohammed First University, Al-Hoceima, Morocco; Computer Sciences, Imaging and Numerical Analysis Laboratory (LIIAN), USMBA University Fes, Fez City, Morocco

Abstract: In information retrieval, two factors strongly affect system performance: the extracted features and the matching process. In this work, we compare three well-known stemming techniques: the Lovins stemmer, the iterated Lovins stemmer and the Snowball stemmer. For the classification phase, we experimentally compare six methods: BNET, NBMU, CNB, RF, SLogicF, and SVM. Based on this comparison, we propose a new retrieval system that uses the voting method as the matching tool to improve on the performance of the classical systems. We use the TF-IDF algorithm to extract features. The systems are tested on two databases: BBCNEWS and BBCSPORT. The systems based on Lovins stemmers and on the voting technique give the best results. For the first database, the best accuracy is obtained by the Lovins + Vote system, with a recognition rate of 97%. For the second database, the Snowball + Vote system reaches a recognition rate of 99%.
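
As a rough illustration of the two building blocks named in the abstract, TF-IDF term weighting and voting over classifier outputs, the following minimal Python sketch may help. It is not the authors' implementation: the abstract does not state which TF-IDF variant or voting rule was used, so the relative-frequency TF, the unsmoothed logarithmic IDF, the simple majority vote with first-seen tie-breaking, and the function names `tf_idf` and `majority_vote` are all assumptions made for this example. Stemming is omitted; the tokens are assumed to be stemmed already.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (assumed already stemmed).
    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def majority_vote(predictions):
    """Combine one predicted label per classifier into a single label.

    Assumed rule: plain majority; ties go to the label that appears
    first in the predictions list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


docs = [["goal", "match", "goal"], ["market", "match"]]
weights = tf_idf(docs)
combined = majority_vote(["sport", "business", "sport"])
```

With this weighting, a term that occurs in every document (here `match`) receives weight zero, while a term concentrated in one document (here `goal`) receives a positive weight. The vote simply returns the label chosen by most of the individual classifiers.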

Keywords: NBMU; SVM; RF; NB; SLogiF; CNB; voting technique; classification; stemmer; term-weighting.

DOI: 10.1504/IJCAT.2019.101171

International Journal of Computer Applications in Technology, 2019 Vol.60 No.4, pp.298 - 306

Received: 31 May 2017
Accepted: 20 Feb 2018

Published online: 26 Jul 2019
