Authors: Monalisha Ghosh; Goutam Sanyal
Addresses: National Institute of Technology Durgapur, Mahatma Gandhi Rd, A-Zone, Durgapur, West Bengal 713209, India ' Computer Science and Engineering Department, National Institute of Technology Durgapur, Mahatma Gandhi Rd, A-Zone, Durgapur, West Bengal 713209, India
Abstract: Research on sentiment analysis is growing rapidly and attracting wide attention from both academia and industry. Feature generation and selection are consequential for text mining, as a high-dimensional feature set can degrade the performance of sentiment analysis. This paper demonstrates the efficacy of the proposed combined feature selection technique for machine learning classification algorithms, compared with using the individual methods alone. Initially, we transform the review datasets into feature vectors of unigram features along with bi-tagged features based on part-of-speech (POS) patterns. Next, the information gain (IG), chi-squared (χ²) and minimum redundancy maximum relevancy (mRMR) feature selection methods are applied to obtain an optimal feature subset for subsequent classification. These features are then given as input to multiple machine learning classifiers, namely support vector machine (SVM), multinomial Naïve Bayes (MNB), Bernoulli Naïve Bayes (BNB) and logistic regression (LR), on multi-domain product review datasets. The performance of the algorithms is measured by precision, recall and F-measure. Experimental results show that the mRMR feature selection method with SVM achieved a better accuracy of 91.39, which is encouraging and comparable to related research.
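As an illustration of the chi-squared selection step named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`chi2_scores`, `select_top_k`), the whitespace tokeniser, the binary presence features and the four toy reviews are all assumptions made for this example. Each unigram is scored with the standard 2x2 contingency form of the statistic, χ² = N(AD − BC)² / ((A + C)(B + D)(A + B)(C + D)), where A and B count positive and negative documents containing the term, and C and D count those that do not.

```python
from collections import Counter


def chi2_scores(docs, labels, positive="pos"):
    """Score each unigram with the 2x2 chi-squared statistic (hypothetical helper)."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos

    # Document frequency of each term per class (presence, not term frequency).
    df_pos, df_neg = Counter(), Counter()
    for text, y in zip(docs, labels):
        terms = set(text.lower().split())  # assumed tokeniser: whitespace split
        (df_pos if y == positive else df_neg).update(terms)

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive docs containing the term
        b = df_neg[term]   # negative docs containing the term
        c = n_pos - a      # positive docs without the term
        d = n_neg - b      # negative docs without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy reviews invented for illustration only; not data from the paper.
docs = [
    "great phone great battery",
    "terrible phone poor battery",
    "great screen",
    "poor screen terrible",
]
labels = ["pos", "neg", "pos", "neg"]

scores = chi2_scores(docs, labels)
print(select_top_k(scores, 3))
```

In this toy set, terms that appear equally often in both classes (such as "phone") score zero and are dropped, while class-specific terms rank highest. The retained terms would then form the reduced feature vector passed to a classifier.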
Keywords: sentiment analysis; opinion mining; text classification; feature selection method; machine learning algorithms; optimal feature vector.
International Journal of Data Mining, Modelling and Management, 2019 Vol.11 No.4, pp.391 - 416
Available online: 06 Sep 2019