Title: From user generated content to social data warehouse: processes, operations and data modelling

Authors: Afef Walha; Faiza Ghozzi; Faiez Gargouri

Addresses: MIRACL Laboratory, Higher Institute of Computer Science and Multimedia, Sfax University, Sfax, Tunisia; MIRACL Laboratory, Higher Institute of Computer Science and Multimedia, Sfax University, Sfax, Tunisia; MIRACL Laboratory, Higher Institute of Computer Science and Multimedia, Sfax University, Sfax, Tunisia

Abstract: A social data warehouse (SDW) combines corporate data with user-generated content (UGC) to improve decision makers' analysis. UGC data are heterogeneous, unstructured and informal. Mapping them into meaningful and valuable information has recently become a hot topic in social business intelligence. This mapping is established through specific extraction, transformation and loading (ETL) processes during SDW development. Our main focus in this work is on ETL design and the issues that emerge when UGC semantic analysis is integrated into SDW modelling. The complexity of ETL modelling is managed by partitioning its aspects into processes, operations and data. In addition, the ETL4Social architecture is organised in three layers: meta-modelling, modelling and instantiation. The proposed meta-models cover both ETL4Social concepts and notations. Their accuracy is shown through an illustrative example entailing generic models that map UGC into the SDW. These models are implemented in ETL4SocialTool, which helps the designer to model complex ETL scenarios.

Keywords: social media; data warehouse; user-generated content; UGC; extraction, transformation and loading; ETL design; meta-model; model; business process modelling and notation; BPMN; data flow; data process; semantic analysis.

DOI: 10.1504/IJWET.2019.105589

International Journal of Web Engineering and Technology, 2019 Vol.14 No.3, pp.203 - 230

Published online: 05 Mar 2020
