International Journal of Cloud Computing (14 papers in press)
Task Scheduling and Virtual Resource Optimizing in Hadoop YARN-based Cloud Computing Environment
by Frederic Nzanywayingoma
Abstract: We are living in a data world where high volumes of data are changing the way the traditional IT industry operates. Big data is generated around us at all times by cameras, mobile devices, sensors, and software logs, in volumes ranging from hundreds of terabytes to petabytes. Analyzing these massive data therefore requires new skills, data-intensive applications, and storage clusters. Apache Hadoop is one of the most popular tools recently developed for big data processing, and it has been deployed by many large companies to process large files in big datasets. The main purpose of this paper is to analyze different scheduling algorithms that can help achieve better performance, efficiency, and reliability in a Hadoop YARN environment. We describe task schedulers that operate at different levels of Hadoop, such as the FIFO (first in, first out) scheduler, fair scheduler, delay scheduler, deadline constraint scheduler, dynamic priority scheduler, and capacity scheduler, and we analyze the performance of these widely used Hadoop task schedulers in terms of makespan, turnaround time, and throughput. A reliable scheduling algorithm that can work efficiently in Hadoop environments is suggested, and experimental results are given to conclude the paper.
Keywords: Hadoop; MapReduce; Task Scheduling; YARN; HDFS; JobTracker; TaskTracker.
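The three metrics used to compare the schedulers can be computed directly from a schedule trace; a minimal sketch, where the `Task` record and the toy FIFO trace are illustrative assumptions rather than part of the paper:

```python
from dataclasses import dataclass

@dataclass
class Task:
    submit: float   # time the task entered the queue
    start: float    # time a container started executing it
    finish: float   # time it completed

def makespan(tasks):
    """Wall-clock time from first submission to last completion."""
    return max(t.finish for t in tasks) - min(t.submit for t in tasks)

def mean_turnaround(tasks):
    """Average time each task spent in the system (waiting + running)."""
    return sum(t.finish - t.submit for t in tasks) / len(tasks)

def throughput(tasks):
    """Tasks completed per unit of time over the whole run."""
    return len(tasks) / makespan(tasks)

# Toy FIFO trace: three tasks submitted at t=0, executed back to back.
trace = [Task(0, 0, 4), Task(0, 4, 7), Task(0, 7, 9)]
print(makespan(trace))         # 9
print(mean_turnaround(trace))  # (4 + 7 + 9) / 3
print(throughput(trace))       # 3 tasks / 9 time units
```

Different schedulers reorder the same tasks, so the same trace functions can rank them on all three metrics.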
A Survey of Scheduling Frameworks in Big Data Systems
by Ji LIU, Esther Pacitti, Patrick Valduriez
Abstract: Cloud and big data technologies are now converging to enable organizations to outsource data in the cloud and get value from data through big data analytics. Big data systems typically exploit computer clusters to gain scalability and obtain a good cost-performance ratio. However, scheduling a workload in a computer cluster remains a well-known open problem. Scheduling methods are typically implemented in a scheduling framework and may have different objectives. In this paper, we survey scheduling methods and frameworks for big data systems, propose a taxonomy and analyze the features of the different categories of scheduling frameworks. Such frameworks were initially designed for the cloud (MapReduce) to process web data. We examine sixteen popular scheduling frameworks and discuss their features. Our study shows that different frameworks are proposed for different big data systems, different scales of computer clusters and different objectives. We propose the main dimensions for workloads and metrics for benchmarks to evaluate these scheduling frameworks. Finally, we analyze their limitations and propose new research directions.
Keywords: Big data; cloud computing; cluster computing; parallel processing; scheduling method; scheduling framework.
Optimal Cloud Resource Provisioning for Auto-scaling Enterprise Applications
by Satish Srirama, Alireza Ostovar
Abstract: Auto-scaling enterprise/workflow systems on the cloud needs to deal with both the scaling policy, which determines "when to scale", and the resource provisioning policy, which determines "how to scale". This paper presents a novel resource provisioning policy that can find the most cost-optimal setup of a variety of cloud instance types that can fulfil the incoming workload. All major factors involved in estimating the resource amount, such as the processing power, periodic cost and configuration cost of each instance type, the lifetime of each running instance, and the capacity of clouds, are considered in the model. Benchmark experiments were conducted on the Amazon cloud and compared against Amazon AutoScale, using a real load trace and the two main control-flow components of enterprise applications, AND and XOR. The experiments showed that the model is plausible for auto-scaling any web/services-based enterprise workflow/application on the cloud, and they also showed the effect of individual parameters on the optimal policy.
Keywords: Cloud computing; auto-scaling; enterprise applications; resource provisioning; optimization; control flows.
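The provisioning question the paper addresses, finding the cheapest mix of instance types whose combined capacity covers the incoming workload, can be illustrated with a brute-force sketch. The instance names, capacities, and prices below are hypothetical, and the paper's actual policy models further factors (configuration cost, instance lifetime, cloud capacity) omitted here:

```python
from itertools import product

# Hypothetical instance catalogue: (capacity in requests/s, hourly cost in $).
CATALOGUE = {
    "small":  (100, 0.05),
    "medium": (250, 0.10),
    "large":  (600, 0.20),
}

def cheapest_mix(demand, max_per_type=10):
    """Exhaustively search instance counts for the least-cost mix
    whose total capacity covers `demand`."""
    best_cost, best_mix = float("inf"), None
    for mix in product(range(max_per_type + 1), repeat=len(CATALOGUE)):
        capacity = sum(n * CATALOGUE[t][0] for n, t in zip(mix, CATALOGUE))
        cost = sum(n * CATALOGUE[t][1] for n, t in zip(mix, CATALOGUE))
        if capacity >= demand and cost < best_cost:
            best_cost, best_mix = cost, dict(zip(CATALOGUE, mix))
    return best_mix, best_cost

mix, cost = cheapest_mix(850)
print(mix, round(cost, 2))   # one medium plus one large is the cheapest cover
```

Exhaustive search is only tractable for a handful of instance types; realistic policies solve the same covering problem with integer programming or heuristics.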
A Game Theory-based Design of Incentive Schemes for Social Cloud Services
by Eric Kuada
Abstract: This paper presents the design of incentive schemes that encourage the contribution of resources to the Opportunistic Cloud Services (OCS) social cloud platform as well as the efficient usage of such resources. A game theoretic approach has been employed to model and design the incentive schemes, with two game models presented. The existence of a pure strategy Nash equilibrium for both the cooperative and non-cooperative games has been proven. Three base incentive schemes have also been presented: the Dominant Strategy Scheme, the Equi-Profit Scheme, and the Dominant Equi-Profit Scheme. An evaluation of the incentive schemes is performed, and we conclude that the schemes meet the desired properties of budget balance, ex-post individual rationality, incentive compatibility, allocative efficiency, robustness, and flexibility to accommodate changing user behavior on the OCS social cloud services platform.
Keywords: opportunistic cloud services; social cloud; game theory; incentive mechanisms; social network; cloud computing; system modeling; game models; mechanism design; resource allocation.
Genetic and static algorithm for task scheduling in cloud computing
by Jocksam Gonçalves De Matos, Carlos Heitor Pereira Liberalino, Carla Katarina De Monteiro Marques
Abstract: Technological advancement has required ever more computing resources. In this context, cloud computing emerges as a new paradigm to meet this demand, though its resources are physically limited due to the growing data traffic the system may be subjected to. Task scheduling aims to distribute tasks so that they use computing resources more efficiently. This paper therefore proposes a solution to the task scheduling problem in cloud computing that reduces both the processing time of the tasks and the number of virtual machines. The algorithm was designed as a heuristic solution with the aid of a static algorithm, and was mainly inspired by the set partitioning problem, which here serves to reduce the number of virtual machines. A genetic algorithm metaheuristic is used in the first stage of the algorithm to reduce the processing time of the tasks, while the static algorithm solves the set partitioning problem. Its performance was compared with two algorithms, one classic and one heuristic. CloudSim, a cloud simulator with the characteristics and attributes of a real cloud, was used to evaluate the proposed algorithm, together with realistic workloads, in experiments that show the algorithms' behavior under different conditions of use.
Keywords: distributed computing; cloud computing; scheduling; metaheuristic.
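As an illustration of the metaheuristic stage, a minimal genetic algorithm that assigns tasks to virtual machines and minimizes makespan might look as follows; the task lengths, VM count, and GA parameters are hypothetical, and this sketch omits the authors' set-partitioning stage:

```python
import random

random.seed(42)

TASK_LENGTHS = [8, 3, 7, 2, 5, 9, 4, 6]   # hypothetical task costs
NUM_VMS = 3

def makespan(assignment):
    """Makespan = load of the busiest VM under a task-to-VM assignment."""
    loads = [0] * NUM_VMS
    for length, vm in zip(TASK_LENGTHS, assignment):
        loads[vm] += length
    return max(loads)

def evolve(pop_size=30, generations=60, mutation_rate=0.1):
    # Each individual maps task i to a VM index.
    pop = [[random.randrange(NUM_VMS) for _ in TASK_LENGTHS]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Tournament selection: keep the better of two random individuals.
        parents = [min(random.sample(pop, 2), key=makespan)
                   for _ in range(pop_size)]
        nxt = []
        for i in range(0, pop_size, 2):
            a, b = parents[i], parents[i + 1]
            cut = random.randrange(1, len(TASK_LENGTHS))  # one-point crossover
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                # Point mutation: occasionally move a task to a random VM.
                nxt.append([random.randrange(NUM_VMS)
                            if random.random() < mutation_rate else g
                            for g in child])
        pop = nxt
    return min(pop, key=makespan)

best = evolve()
print(best, makespan(best))
```

With total work 44 over 3 VMs, no assignment can beat a makespan of 15; the GA typically gets at or near that bound within a few dozen generations.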
Review of Remote Data Integrity Auditing Schemes in Cloud Computing: Taxonomy, Analysis, and Open Issues
by Jaya Rao Gudeme, Syam Kumar Pasupuleti, Ramesh Kandukuri
Abstract: Cloud storage provides a reliable and resilient storage infrastructure for users to store data remotely on a pay-as-you-go pricing model. Presently, many data owners in academic and business environments are choosing to store their data in the cloud to save costs. Cloud storage offers data owners many benefits, such as low capital costs, scalability, and access to data from anywhere, at any time, irrespective of location and device. Despite these appealing benefits, the storage service brings security challenges such as confidentiality, integrity and availability, as outsourced data is not always trustworthy once physical control and possession over the data are lost. One of the primary concerns is the integrity of data stored in the cloud. To address remote data integrity, many researchers have focused on Remote Data Integrity Auditing (RDIA) techniques. In this paper, we give an extensive review of remote data integrity auditing techniques in cloud computing. In our review, we present a thematic taxonomy of remote data integrity auditing techniques, investigate similarities and differences, and finally discuss critical issues to be addressed for the efficient and secure design of remote auditing protocols for cloud data storage in future research.
Keywords: Cloud computing; Cloud storage; Integrity; Remote data auditing; Provable Data Possession; PDP; Proof of Retrievability; PoR.
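The intuition behind challenge-response auditing schemes such as PDP and PoR can be conveyed with a much-simplified sketch: before outsourcing, the owner precomputes a limited stock of one-time challenges (a nonce, a block index, and the expected digest), so corruption can later be detected without retrieving the file. Real RDIA schemes use homomorphic tags to support unlimited challenges and public verification; the code below is only the basic idea, with made-up block sizes and data:

```python
import hashlib
import os
import random

BLOCK_SIZE = 4

def blocks(data):
    """Split the file into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

def precompute_challenges(data, n):
    """Owner: before outsourcing, precompute n one-time challenges.
    Only these small (nonce, index, digest) tokens are kept locally."""
    chal = []
    for _ in range(n):
        idx = random.randrange(len(blocks(data)))
        nonce = os.urandom(16)
        digest = hashlib.sha256(nonce + blocks(data)[idx]).hexdigest()
        chal.append((nonce, idx, digest))
    return chal

def prove(stored, nonce, idx):
    """Server: respond by hashing the nonce with the requested block."""
    return hashlib.sha256(nonce + blocks(stored)[idx]).hexdigest()

data = b"file outsourced to the cloud"
challenges = precompute_challenges(data, 10)

nonce, idx, expected = challenges.pop()
assert prove(data, nonce, idx) == expected          # honest server passes

corrupted = bytes(b ^ 1 for b in data)              # tamper with every block
nonce, idx, expected = challenges.pop()
assert prove(corrupted, nonce, idx) != expected     # corruption is detected
```

The nonce prevents the server from answering a challenge without actually holding the block at audit time; the cost is that the owner's challenge stock is finite, which is exactly the limitation homomorphic-tag schemes remove.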
Special Issue on: ICA CON 2016 & 2017, A Collaborative Community of Leaders: Cloud Computing in Education
Extreme Value Analysis for Capacity Design
by Szilard Bozoki, Andras Pataricza
Abstract: Cloud computing has become the fundamental platform for service offerings. Such services frequently face peaks in their variable workload; thus, the cloudification of critical applications with strict service level agreements (e.g. on performability) needs a properly engineered capacity to withstand peak loads. A core problem is predicting the value of peaks, especially in bursty workloads, which originate in the cumulative effect of hard-to-predict rare and extreme events. Luckily, system monitoring collects enough vital information for prediction by statistical methods, and extreme value analysis focuses precisely on the prediction of future peaks.
This paper investigates the use of extreme value theory for capacity planning in cloud platforms and services and assesses the technical metrology aspects as well.
Keywords: cloud computing; performability engineering; capacity design; extreme value analysis; Facebook Prophet.
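As a toy illustration of the statistical machinery, hourly block maxima of a synthetic workload can be fitted with a Gumbel distribution, the zero-shape member of the GEV family, and a return level read off as a capacity target. The workload, block length, and method-of-moments fit below are illustrative assumptions, not the paper's procedure:

```python
import math
import random
import statistics

random.seed(7)

# Synthetic per-minute request rates, 24 hours' worth (hypothetical).
load = [random.expovariate(1 / 100) for _ in range(24 * 60)]

# Block maxima: the peak observed in each hourly block.
maxima = [max(load[i:i + 60]) for i in range(0, len(load), 60)]

# Method-of-moments fit of a Gumbel distribution to the block maxima.
beta = statistics.stdev(maxima) * math.sqrt(6) / math.pi
mu = statistics.mean(maxima) - 0.5772 * beta   # Euler-Mascheroni constant

def return_level(p):
    """Load level that an hourly peak stays below with probability p."""
    return mu - beta * math.log(-math.log(p))

# Capacity sized to survive 99% of hourly peaks.
print(round(return_level(0.99)))
```

A production analysis would also test the shape parameter (heavy-tailed workloads need the Frechet branch of the GEV, not the Gumbel) and attach confidence intervals to the return level before committing capacity.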
A Formal Model Toward Scientific Workflow Security in the Cloud
by Donghoon Kim, Mladen Vouk
Abstract: Scientific workflow management systems (SWFMSs) may be vulnerable in the cloud, since they may not yet have embraced practical security solutions. This paper presents an approach to formal modeling of scientific workflow security in the cloud. We focus on the procedure for building secure data flows in a holistic way. This work suggests that a white-list approach to input validation can play a vital role in protecting the flows from zero-day attacks.
Keywords: Formal method; security; workflow; security property; input validation; access control; cloud.
A tale of two cloud analytics platforms for education
by Gokul Bhandari
Abstract: In this paper, we compare, using Gartner's business analytics framework, the two most popular cloud analytics platforms currently being used in higher education: IBM's Watson Analytics (WA) and SAP's Lumira Cloud (LC). The Gartner framework enables one to examine an analytics platform from three broad perspectives: people, processes, and platform. Platform capabilities enable us to identify several functional modules which can be used to evaluate the tools' perceived usefulness (PU) and perceived ease of use (PEU). Our empirical studies find that WA and LC are similar in terms of their PU and PEU.
Keywords: Cloud analytics; SAP Lumira Cloud; IBM Watson Analytics; Gartner analytics framework.
Why is Garbage Collection causing my Service Level Objectives to fail?
by Panagiotis Patros, Kenneth Kent, Michael Dawson
Abstract: Cloud computing abstracts resources and provides them as-a-service to its tenant clients. Platform-as-a-service clouds, one of the main types of cloud computing, provide large parts of the hardware/software stack to their users. Cloud systems are expected to abide by certain service level objectives (SLOs) and maintain a certain quality of service, which can be impacted by garbage collection (GC). However, cloud benchmarking is mostly focused on the interconnectivity of cloud services and often neglects the inner workings of language runtimes. In this paper, we present and evaluate CloudGC, a benchmark aiming to stress the GC component of a runtime in various and controllable ways. We then deploy CloudGC on a cloud system to evaluate the SLO satisfaction of the four GC policies of the IBM J9 Java runtime. Our findings indicate that the default policy, Gencon, generally outperforms the other three policies, including Balanced, the policy which aims at amortizing GC costs.
Keywords: Cloud; Garbage Collection; Service Level Objectives; Benchmarking; Performance Interference; CloudGC.
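CloudGC itself targets Java runtimes, but the core idea of a controllable GC stressor, tuning how much allocated memory survives so the collector is kept busy, can be sketched in any garbage-collected language. A hypothetical Python analogue, where `survivor_ratio` controls the fraction of the heap that stays live:

```python
import gc
import time

def gc_stress(objects=200_000, survivor_ratio=0.1, collect_every=50_000):
    """Allocate mostly short-lived objects plus a fraction of survivors,
    timing forced full collections as a proxy for GC-induced pauses."""
    keep_every = max(1, round(1 / survivor_ratio))
    survivors, pauses = [], []
    for i in range(objects):
        node = {"payload": [i] * 4}      # short-lived garbage
        if i % keep_every == 0:
            survivors.append(node)       # long-lived objects keep the GC busy
        if i % collect_every == 0:
            t0 = time.perf_counter()
            gc.collect()                 # force a full collection
            pauses.append(time.perf_counter() - t0)
    return pauses

pauses = gc_stress()
print(f"worst observed pause: {max(pauses):.4f}s")
```

Sweeping `survivor_ratio` and the allocation count then exposes how pause times grow with live-set size, which is the kind of controllable stress the benchmark applies to each GC policy.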
A Framework for Achieving the Required Degree of Multitenancy Isolation for Deploying Components of a Cloud-hosted Service
by Laud Ochei, Andrei Petrovski, Julian Bass
Abstract: When a cloud offering is provided to multiple users/tenants, multitenancy isolation has to be implemented. While several approaches exist for implementing multitenancy, little attention has been paid to implementing the required degree of isolation, since there are varying degrees of isolation that can be implemented for each tenant. This paper presents a framework for achieving the required degree of isolation between tenants accessing a cloud offering, so that the required performance, resource utilization and access privileges of one tenant do not affect other tenants when there are workload changes. The framework is composed of two main constituents: (i) a Component-based approach to Multitenancy Isolation through Request Re-routing (COMITRE), and (ii) an optimization model for providing optimal solutions for deploying components of a cloud-hosted service. We demonstrate, using a case study of (i) a cloud-hosted bug tracking system and (ii) a synthetic dataset, that the required degree of multitenancy isolation can be achieved while at the same time providing optimal solutions for deploying components of a cloud-hosted service. We also provide challenges and recommendations for implementing the framework on different layers of the cloud stack.
Keywords: Multitenancy; Degree of Isolation; Cloud-hosted service; Bug Tracking System; Global Software Development tools; Components; Optimal solution; Optimization Model.
Systematic Performance Evaluation Using Component-in-the-Loop Approach
by Imre Kocsis, Attila Klenik, Andras Pataricza, Miklos Telek, Florian Dee, David Cseh
Abstract: Timeliness- and throughput-critical applications require a framework offering predictable temporal characteristics. The best practice for predicting system dynamics relies on benchmarking, i.e., measuring the reaction of the system under evaluation to a representative workload. Each novel middleware solution needs such an evaluation as part of the development process to assure appropriate throughput in future use.
General purpose Blockchain frameworks are viable replacements for many current systems in several sectors such as finance, healthcare, and IoT by providing a fully distributed, secure, and non-repudiable ledger as a service. Blockchain technologies target domains with a large number of interactions, thus demanding strict performance guarantees in the form of formal Service Level Agreements. Engineering for performance targets in a trustworthy manner requires performance models. However, performance characteristics of Blockchain systems are highly unexplored due to the novelty of the technology.
This paper proposes a general-purpose, systematic methodology for the performance analysis of complex systems such as Blockchain frameworks. A component-in-the-loop approach aids the identification of throughput bottlenecks, sensitivity analysis, and configuration optimisation. Hyperledger Fabric, a Linux Foundation-hosted pilot reference implementation of a Blockchain framework, serves as a case study for the presented methodology.
Keywords: performance evaluation; Blockchain; component-in-the-loop; exploratory data analysis; sensitivity analysis.
SECross: Securing Cross Cloud Boundary
by Xianqing Yu, Mladen Vouk, Young-Hyun Oh
Abstract: A multi-cloud system may be a cost-efficient and practical way to integrate the resources of multiple clouds. However, different clouds are usually managed by different organizations with different security policies and management platforms. When some components of a multi-cloud system are compromised, attackers can potentially gain high privileges that impact the rest of the system. We analyzed the threats to the overall system when some components of the multi-cloud system in a public cloud are compromised. We developed a model we call SECross, which provides a fine-grained database access policy for SECross components and a method for users to access computing machines. We analyzed how SECross resists various potential attacks when any of its components are compromised.
Keywords: VCL; IaaS; Hybrid Cloud; Cloud Computing; Softlayer; Multi-cloud; Security; Public Cloud; Private Cloud; Security Policy.
Cloud-based Environment in Support of IoT Education
by Anand Singh, Yannis Viniotis
Abstract: Students taking an IoT curriculum need to acquire skills (among others) in areas such as (a) developing IoT applications, (b) architecting IoT systems, and (c) administering such systems. At North Carolina State University, we have developed a cloud-based environment to support the development of these skills. The environment is based on IBM's Watson IoT Cloud Platform and uses components such as Intel's Edison boards, Raspberry Pis, Cisco IoT gateways, TI boards, sensors/actuators, and GitHub to give students an end-to-end experience in all aspects of IoT solution and system development. In this paper, we discuss the challenges we faced, how we overcame them, feedback from students, and plans for our next steps.
Keywords: IoT systems; Cloud platforms; Edge Computing; Curriculum development.