Authors: Pedro Martins; Maryam Abbasi; Pedro Furtado
Addresses: Informatics Department, University of Coimbra, Polo 2, Portugal ' Informatics Department, University of Coimbra, Polo 2, Portugal ' Informatics Department, University of Coimbra, Polo 2, Portugal
Abstract: We investigate the problem of providing automatic scalability and data freshness to data warehouses while handling high-rate data efficiently. In general, data freshness is not guaranteed in those contexts, since data loading, transformation and integration are heavy tasks that are performed only periodically, instead of row by row. Many current data warehouses are designed to be deployed and to run on a single server. However, for many applications, problems related to data volume, processing times, data rates and requirements for fresh and fast responses call for new solutions. The solution is to use or build parallel architectures and mechanisms that speed up data integration and handle fresh data efficiently. We propose a universal data warehouse parallelisation solution, that is, an approach that enables the automatic scalability and freshness of any data warehouse and ETL process. Our results show that the proposed system can scale to provide the desired processing speed and data freshness.
Keywords: scalability evaluation; ETL; data freshness; high-rate data; performance evaluation; scale-out; scale-in; data warehouses; data warehouse parallelisation; processing speed; big data.
International Journal of Business Process Integration and Management, 2015 Vol.7 No.4, pp.300 - 313
Published online: 15 Dec 2015