Authors: Arun Kumar Parakh; M. Balakrishnan; Kolin Paul
Addresses: Department of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India; Department of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India; Department of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India
Abstract: Applications need specific or custom optimisations to fully exploit the compute capabilities of the underlying hardware. This is often a very tedious task for the programmer. Moreover, many of these applications do not scale well with data size. The Map-Reduce (MR) framework provides a high level of abstraction for mapping such applications onto distributed/parallel architectures, but at a large performance penalty. We analyse a state-of-the-art MR framework to assess this penalty. The primary objective of this work is to reduce the performance gap between the MR implementation and a native compute unified device architecture (CUDA) implementation of the applications (onlyCUDA). This work reports the deployment of three applications on graphics processing units (GPUs) using the MR framework. We study the performance of these applications on modern GPUs with different cache configurations. The results show that the performance of the applications under the MR framework does not decline much if the reconfigurable cache of modern GPUs is utilised properly. We show penalty reductions of 5×, 6.45× and 15.87× for the Smith-Waterman (SW) algorithm, N-body (NB) simulation, and the Blowfish (BF) algorithm, respectively.
Keywords: Map-Reduce; onlyCUDA; CUDA; penalty; performance; Smith-Waterman; N-body; Blowfish; graphics processing units; GPUs; CPU; cache; speed; simulation.
International Journal of High Performance Systems Architecture, 2015 Vol.5 No.3, pp.166 - 177
Available online: 04 Jul 2015