Title: Scalable big data modelling
Authors: Jayesh Patel
Addresses: Rockstar Games, San Diego, California, USA
Abstract: In the information age, data integration has become easier than ever. Enterprises integrate a wide range of data sources to enrich their big data lakes. The enterprise big data lake has made data consumption simpler and faster for all stakeholders. However, stakeholders often struggle to narrow the data down to what they need for analysis and effective decision-making. As more data arrives from ever-growing data sources, users are flooded with a variety of data. Data models alleviate this pain by serving insights to enterprise users after cleansing and aggregating the data and applying business rules. As data models in big data grow, queries and analyses must process large volumes of data and perform big joins, which leads to long response and processing times. Data modelling in big data platforms therefore needs attention so that big data is cleansed, organised, and stored effectively and enterprise insights are available on time. Because scale is a critical aspect of a big data platform, big data should be modelled so that the accessibility and delivery of insights are not affected as the scale goes up. This paper presents best practices for modelling structured and semi-structured data in a big data platform.
Keywords: enterprise big data models; scalable modelling; big data lake; dimensional models; big joins; Hadoop; Spark.
International Journal of Big Data Intelligence, 2020 Vol.7 No.4, pp.194 - 201
Received: 02 Mar 2020
Accepted: 15 Sep 2020
Published online: 15 Mar 2021