Title: Data warehouse ETL+Q auto-scale framework

Authors: Pedro Martins; Maryam Abbasi; Pedro Furtado

Addresses: Department of Informatics, Faculty of Sciences and Technology, University of Coimbra, Portugal; Department of Informatics, Faculty of Sciences and Technology, University of Coimbra, Portugal; Department of Informatics, Faculty of Sciences and Technology, University of Coimbra, Portugal

Abstract: In this paper, we investigate the problem of providing scalability (out and in) to the extraction, transformation and load (ETL) and querying (Q) processes (ETL+Q) of data warehouses. In general, data loading, transformation and integration are heavy tasks that are performed only periodically, rather than row by row. Parallel architectures and mechanisms can optimise the ETL process by speeding up each part of the pipeline as more performance is needed. We propose a set of parallelisation solutions, called AScale, for each part of ETL+Q: an approach that enables automatic scalability and freshness for any data warehouse and ETL+Q process. Our results show that the proposed algorithms can handle scalability to provide the desired processing speed.

Keywords: data warehousing; scalability; freshness; processing speed; performance; parallel processing; distributed systems; parallelisation; load balancing; extraction transformation load; ETL; querying; data warehouses.

DOI: 10.1504/IJBISE.2016.081592

International Journal of Business Intelligence and Systems Engineering, 2016 Vol.1 No.1, pp.49 - 76

Available online: 16 Jan 2017
