Authors: Mohamed Labidi; Oleg Lodygensky; Gilles Fedak; Maher Khemakhem; Mohamed Jemni
Addresses: LaTICE, University of Tunis, Tunis, Tunisia; IN2P3, University of Paris XI, France; INRIA, University of Lyon, France; Computer Science Department, Faculty of Computing and Information Technology, King Abdul-Aziz University, KSA; LaTICE, University of Tunis, Tunis, Tunisia
Abstract: With the emergence of big data, data scheduling is becoming an important field of research in distributed computing. Software data schedulers often rely on data management policies that can be defined by the user and provide high-level features. Such advanced features are now necessary to execute data-intensive applications, which implies that data and task schedulers should cooperate closely to address large-scale data processing and ensure an optimal distribution of data-intensive applications. In this paper, we propose XtremDew, a cooperative data and task scheduling platform. We address the distribution of optical character recognition (OCR) at large scale and show, in particular, the benefit of focusing on data scheduling to distribute our OCR application. We build this data-driven distribution platform by combining two existing middleware systems: BitDew, as the data scheduler, and XtremWeb-HEP, as the task scheduler. By taking advantage of both, XtremDew provides new features. To evaluate the efficiency of our approach, we compare different strategies for scheduling tasks and data, and we present several scenarios that illustrate the benefits of using XtremDew to execute data-intensive applications.
Keywords: big data; data intensive application; cooperative middleware; big data processing.
International Journal of High Performance Computing and Networking, 2020 Vol.16 No.1, pp.55 - 66
Received: 08 Jan 2020
Accepted: 09 May 2020
Published online: 28 Sep 2020