Title: Sentiment classification of Chinese online reviews: analysing and improving supervised machine learning

Authors: Pei Yin; Hongwei Wang; Lijuan Zheng

Addresses: Department of Management Science and Engineering, School of Economics and Management, Tongji University, 1239 Siping Road, Shanghai 200092, China (all three authors)

Abstract: The rapid growth of online reviews means that large volumes of consumer opinion on products and services are generated and spread over the internet, and sentiment classification techniques have emerged in response to the need to retrieve valuable information from them. This paper focuses on improving sentiment classification of Chinese online reviews by analysing and improving each step of supervised machine learning. First, adjectives, adverbs and verbs are selected as the initial text features. Then, three statistical methods, document frequency (DF), information gain (IG) and the chi-square statistic (CHI), are used to extract features. Finally, a Boolean method is applied to weight the features and a support vector machine (SVM) is employed as the classifier. Several comparative experiments are conducted on reviews from two domains: mobile phone (product) reviews and hotel (service) reviews. The experimental results indicate that part of speech (POS), the number of features, the evaluation domain, the feature extraction algorithm and the SVM kernel function strongly influence sentiment classification, while the size of the training corpus has little impact. In addition, further improvements to DF, IG and CHI are made, which demonstrate the theoretical significance and practical value of this research.
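
The feature extraction and weighting steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy English tokens standing in for POS-filtered Chinese words, and the use of the textbook 2x2 chi-square formula are all assumptions made for illustration, and the SVM training step is omitted.

```python
from collections import Counter

def document_frequency(docs):
    """DF: count how many documents contain each term."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df

def chi_square(docs, labels, term, target_label):
    """CHI score of one term for one class, from a 2x2 contingency table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target_label
        if has_term and in_class:
            a += 1          # term present, document in class
        elif has_term:
            b += 1          # term present, document not in class
        elif in_class:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document not in class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom

def select_features(docs, labels, target_label, k):
    """Keep the k terms with the highest CHI score (ties broken alphabetically)."""
    vocab = document_frequency(docs).keys()
    ranked = sorted(vocab, key=lambda t: (-chi_square(docs, labels, t, target_label), t))
    return ranked[:k]

def boolean_vector(tokens, features):
    """Boolean weighting: 1 if the feature occurs in the review, else 0."""
    present = set(tokens)
    return [1 if f in present else 0 for f in features]

# Hypothetical toy corpus (assumed already segmented and POS-filtered).
docs = [
    ["good", "screen", "fast"],
    ["good", "battery"],
    ["bad", "screen", "slow"],
    ["bad", "battery"],
]
labels = ["pos", "pos", "neg", "neg"]

features = select_features(docs, labels, "pos", k=2)
vectors = [boolean_vector(tokens, features) for tokens in docs]
```

The resulting Boolean vectors, paired with the labels, are the kind of input a classifier such as an SVM would then be trained on.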

Keywords: sentiment classification; feature extraction; Chinese online reviews; supervised machine learning; web technology; customer opinion; information retrieval; statistical methods; Boolean method; support vector machines; SVM; China; mobile phone reviews; hotel reviews; mobile phones; product reviews; service reviews; cell phones; hotels; web reviews; consumer reviews.

DOI: 10.1504/IJWET.2012.050968

International Journal of Web Engineering and Technology, 2012 Vol.7 No.4, pp.381 - 398

Available online: 11 Dec 2012
