Title: A platform for big data analytics on distributed scale-out storage system

Authors: Kyar Nyo Aye; Thandar Thein

Addresses: Software Department, Computer University (Thaton), The Union of Myanmar; Hardware Department, University of Computer Studies (Yangon), The Union of Myanmar

Abstract: Big data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information. Hadoop-based platforms have emerged to deal with big data. In Hadoop, the NameNode stores metadata in a single system's memory, which is a performance bottleneck for scale-out. The Gluster file system has no performance bottlenecks related to metadata. To achieve high performance, scalability and fault tolerance for big data analytics, a big data platform is proposed. The proposed platform consists of big data storage and big data processing. The Hadoop big data platform and the proposed big data platform are implemented on clusters of commodity Linux virtual machines, and performance evaluations are conducted. According to the evaluation analysis, the proposed big data platform provides better scalability and fault tolerance, and faster query response time, than the Hadoop platform.

Keywords: big data analytics; big data platforms; Hadoop MapReduce; Gluster file system; Apache Pig; Apache Hive; Jaql; distributed storage systems; scale-out storage systems; metadata storage; big data storage; big data processing; performance evaluation; scalability; fault tolerance; query response time.

DOI: 10.1504/IJBDI.2015.069088

International Journal of Big Data Intelligence, 2015 Vol.2 No.2, pp.127 - 141

Received: 07 Oct 2014
Accepted: 04 Dec 2014

Published online: 09 May 2015
