Title: A comparative study on various pre-processing techniques and deep learning algorithms for text classification

Authors: P. Bhuvaneshwari; A. Nagaraja Rao

Addresses: School of Computer Science and Engineering, VIT, Vellore, India; School of Computer Science and Engineering, VIT, Vellore, India

Abstract: Pre-processing is a primary step in sentiment analysis, and selecting suitable techniques for the application at hand can increase classifier accuracy. It reduces the complexity inherent in raw data, which lets the classifier learn faster and more precisely. Despite its importance, pre-processing for polarity detection has received little attention in the deep learning literature. In this paper, 13 popularly used pre-processing techniques are evaluated on online user review datasets from three different domains. To evaluate the impact of each pre-processing technique, four deep neural networks are used: auto-encoder, convolution neural network (CNN), long short-term memory (LSTM), and bidirectional LSTM (Bi-LSTM). The purpose of this paper is to identify the appropriate pre-processing techniques and the classifier that achieves the highest accuracy. The experimental results show that using appropriate pre-processing techniques can significantly improve classifier accuracy, and that the Bi-LSTM model achieves a higher accuracy rate than the other neural networks.

Keywords: pre-processing; sentiment analysis; deep learning; auto-encoder; convolution neural network; CNN; long short-term memory; LSTM; bidirectional LSTM; Bi-LSTM.

DOI: 10.1504/IJCC.2022.121076

International Journal of Cloud Computing, 2022 Vol.11 No.1, pp.61 - 78

Received: 20 Jun 2019
Accepted: 01 Oct 2019

Published online: 18 Feb 2022
