Title: User-generated content data analysis using machine learning methods: a case study in Bangkok, Thailand

Authors: Naragain Phumchusri; Naina Chugh

Addresses: Department of Industrial Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand; Department of Industrial Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand

Abstract: The travel and tourism (T&T) sector is a backbone of the global economy, and as the sector becomes more saturated and competitive, insights into T&T are more vital than ever. The rise of social media has created an opportunity for systematic analysis of tourist preferences through user-generated content. The objective of this paper is to obtain insights into tourist preferences and tourism trends in Bangkok, Thailand, from user-generated content scraped from TripAdvisor's online reviews of tours and activities. To this end, several analyses are implemented: sentiment analysis to capture tourists' points of view, association rule mining to find patterns of preferences, and natural language processing with text frequency analysis to identify the features tourists discuss most often. This paper also proposes machine learning prediction models, using logistic regression, support vector machine and random forest algorithms, to predict 5-star ratings of reviews, with the goal of identifying factors that significantly affect positive sentiment towards tours and activities.

Keywords: user-generated content; TripAdvisor; sentiment analysis; machine learning; data analysis; Bangkok; Thailand.

DOI: 10.1504/IJBDA.2022.124054

International Journal of Business and Data Analytics, 2022 Vol.2 No.1, pp.72 - 109

Received: 28 Apr 2021
Accepted: 07 Jan 2022

Published online: 11 Jul 2022
