Title: Sentiment classification using unlabelled data with emoticon classification

Authors: S. Surya Kumari; G. Anjan Babu

Addresses: Department of Computer Science, S.V. University, Tirupati, AP, 517502, India; Department of Computer Science, S.V. University, Tirupati, AP, 517502, India

Abstract: Sentiment analysis is the part of opinion mining used to discover variations in user mood. It generally involves feature extraction and sentiment classification, and most of this analysis is done with text mining, usually by training classifiers on labelled data. Emoticon reactions have become a major means of communication in social media, where they express emotions and provide non-verbal communication. This paper proposes a classifier that uses emoticons and unsupervised learning, namely K-means clustering, to provide sentiment analysis in an automated manner. The proposed method is trained on data with emoticon expressions collected from Facebook and evaluated on six different sentiment analysis datasets. Accuracy and the adjusted Rand index (ARI) are used for evaluation, and the findings are positive: the classifier outperforms K-means clustering and the sentistrength2 algorithm in accuracy, and its training time depends on emoticons rather than text features, which is an order of magnitude less.
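
For readers unfamiliar with the ARI metric named in the abstract, the sketch below computes it from two label lists using the standard Hubert-Arabie formula. It is an illustration only, not the authors' code: the function name `adjusted_rand_index` and the toy labels are assumptions introduced here, and no data from the paper is used.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items.

    Hypothetical helper for illustration. The score is 1.0 when the two
    partitions agree (cluster names may differ) and is close to 0.0 for
    random assignments; it can be negative.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")

    n = len(labels_true)
    # Contingency counts: how many items fall in each (true, predicted) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs that share a cell, a true class, or a cluster.
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree

    expected = sum_true * sum_pred / total_pairs
    maximum = 0.5 * (sum_true + sum_pred)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every item in a single cluster
    return (sum_cells - expected) / (maximum - expected)


# Toy example (made-up labels): the clustering matches the reference
# sentiment labels up to a renaming of the clusters.
reference = ["pos", "pos", "neg", "neg"]
clusters = [1, 1, 0, 0]
score = adjusted_rand_index(reference, clusters)
```

Because ARI ignores cluster names, it suits unsupervised methods such as K-means, where cluster indices carry no inherent sentiment label.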

Keywords: sentiment emoticon clustering algorithm; SECA; adjusted rand index; ARI; tokenisation; lemmatisation; TF-IDF; sentistrength2; k-means; pre-processing; sentiment classification; clustering.

DOI: 10.1504/IJKESDP.2020.112616

International Journal of Knowledge Engineering and Soft Data Paradigms, 2020 Vol.7 No.1, pp.1 - 13

Received: 07 Jan 2019
Accepted: 03 Jul 2019

Published online: 25 Jan 2021
