Title: A unified workflow strategy for analysing large-scale TripAdvisor reviews with BOW model

Authors: Jale Bektaş; Arwa Elsadig

Addresses: School of Applied Technology and Management, Computer Technology and Information Systems, Mersin University, Mersin, 33730, Turkey; School of Applied Technology and Management, Computer Technology and Information Systems, Mersin University, Mersin, 33730, Turkey

Abstract: Firms need to turn online customer review data into usable information in order to gain a competitive edge and improve service quality. This paper presents a unified workflow for analysing large-scale review data, applying text mining methods to 710,450 reviews of 1,134 hotels across the touristic regions of Turkey. First, a star schema dimensional data mart is built, comprising one fact table and two dimension tables. Then, a series of text mining processes, including data cleaning, tokenisation, and analysis, is applied. Text mining is implemented through the standard bag-of-words (BOW) model and the extended BON model. The results show significant findings through this workflow. We propose building a dimensional model dataset before performing any text mining, since such a dataset optimises data retrieval and helps represent the data along different measures of interest.
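
The workflow described in the abstract (dimensional data first, then cleaning, tokenisation, and counting) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the table names `fact_review`, `dim_hotel`, and `dim_region`, their columns, the sample rows, and the reading of BON as a bag of N-grams (suggested by the "N-gram tokenisation" keyword). It is not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical star schema: one fact table and two dimension tables.
# All names and rows are made up for illustration.
dim_region = {10: "Region A", 20: "Region B"}
dim_hotel = {
    1: {"name": "Hotel 1", "region_id": 10},
    2: {"name": "Hotel 2", "region_id": 20},
}
fact_review = [
    {"hotel_id": 1, "rating": 5, "text": "Great pool, great staff."},
    {"hotel_id": 2, "rating": 2, "text": "The room was not clean."},
    {"hotel_id": 1, "rating": 4, "text": "Great staff and clean room."},
]


def tokenise(text):
    """Clean and tokenise: lowercase, keep runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_by_region(n=1):
    """Count n-grams per region by joining the fact table to its dimensions.

    n=1 gives a bag of words; n>1 gives a bag of N-grams.
    """
    bags = {}
    for row in fact_review:
        hotel = dim_hotel[row["hotel_id"]]
        region = dim_region[hotel["region_id"]]
        tokens = tokenise(row["text"])
        bags.setdefault(region, Counter()).update(ngrams(tokens, n))
    return bags


if __name__ == "__main__":
    print(bag_by_region(1)["Region A"].most_common(3))
    print(bag_by_region(2)["Region B"].most_common(3))
```

With unigrams, "not clean" and "clean room" both contribute a count for "clean"; with bigrams, "not clean" is kept as its own feature. This is one reason an N-gram extension may be useful for review text.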

Keywords: online TripAdvisor reviews; text mining; big data; N-gram tokenisation; dimensional data mart; data mining; BOW; BON.

DOI: 10.1504/IJBIDM.2022.123801

International Journal of Business Intelligence and Data Mining, 2022 Vol.21 No.1, pp.102 - 117

Received: 11 Sep 2020
Accepted: 24 Dec 2020

Published online: 04 Jul 2022
