Title: Lexicon-based Twitter sentiment analysis for vote share prediction using emoji and N-gram features

Authors: Barkha Bansal; Sangeet Srivastava

Addresses: Department of Applied Sciences, The NorthCap University, Gurugram, 122017, India; Department of Applied Sciences, The NorthCap University, Gurugram, 122017, India

Abstract: Recently, Twitter sentiment analysis (TSA) has been successfully employed to monitor and forecast elections in many studies. However, most existing studies rely on extracting sentiments from explicit textual features. Moreover, only a few studies have included non-textual features such as emojis in election forecasts. In this study, we incorporated N-gram features to predict vote shares in the 2017 Uttar Pradesh (UP) legislative elections. Also, the sentiment distribution of tweets containing emojis was significantly different from that of tweets without emojis. Therefore, emoji sentiments were detected and incorporated to predict the vote shares. We collected more than 0.3 million tweets, wherein geo-tagging was applied on search keywords that were not exclusive to elections. We employed seven lexicons for labelling tweets and compared two methods to reduce prediction error: sentiment magnitude-based criteria and polarity of tweets. Results show that the proposed method of incorporating N-gram features and emoji sentiments significantly decreases prediction error.
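
The general idea described in the abstract (lexicon scoring that also looks at N-grams and emojis, followed by a polarity-based vote share estimate) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy `LEXICON` is hypothetical and is not one of the seven lexicons used in the study, the party labels `A` and `B` are placeholders, and the share formula (each party's fraction of all positive tweets) is one simple polarity-based choice, not necessarily the authors' criterion.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption, not from the paper):
# unigrams, one bigram, and emojis mapped to sentiment scores.
LEXICON = {
    "good": 1.0,
    "win": 1.0,
    "bad": -1.0,
    "corrupt": -1.5,
    "not good": -1.0,  # bigram entry takes precedence over unigram "good"
    "👍": 1.0,
    "😡": -1.5,
}


def score_tweet(text):
    """Sum lexicon scores over whitespace tokens, matching bigrams first."""
    tokens = text.lower().split()
    score = 0.0
    i = 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and bigram in LEXICON:
            score += LEXICON[bigram]
            i += 2  # both tokens of the bigram are consumed
        else:
            score += LEXICON.get(tokens[i], 0.0)  # unknown tokens score 0
            i += 1
    return score


def vote_shares(tweets_by_party):
    """Estimate each party's share as its fraction of all positive tweets."""
    positive = Counter()
    for party, tweets in tweets_by_party.items():
        positive[party] = sum(1 for t in tweets if score_tweet(t) > 0)
    total = sum(positive.values())
    if total == 0:
        return {party: 0.0 for party in tweets_by_party}
    return {party: positive[party] / total for party in tweets_by_party}


# Made-up example tweets for two placeholder parties.
tweets = {
    "A": ["good win 👍", "win", "not good"],
    "B": ["corrupt 😡", "good"],
}
shares = vote_shares(tweets)
```

The sketch assumes emojis are separated from words by whitespace; real tweets would need a tokenizer that splits emojis out of surrounding text. Matching the bigram before its unigrams is what lets "not good" score negative even though "good" alone is positive.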

Keywords: Twitter sentiment analysis; TSA; emoji sentiment; geo-tag; electoral forecast; lexicon; N-grams.

DOI: 10.1504/IJWBC.2019.098693

International Journal of Web Based Communities, 2019 Vol.15 No.1, pp.85 - 99

Received: 23 Sep 2018
Accepted: 11 Oct 2018

Published online: 29 Mar 2019
