
Title: Evaluating attainable memory bandwidth of parallel programming models via BabelStream

Authors: Tom Deakin; James Price; Matt Martineau; Simon McIntosh-Smith

Addresses: Department of Computer Science, University of Bristol, Bristol, UK (all authors)

Abstract: Many scientific codes consist of memory-bandwidth-bound kernels. One major advantage of many-core devices such as general-purpose graphics processing units (GPGPUs) and the Intel Xeon Phi is their focus on providing increased memory bandwidth over traditional CPU architectures. Peak memory bandwidth is usually unachievable in practice, so benchmarks are required to measure a practical upper bound on expected performance. We augment the standard STREAM kernels with a dot product kernel to investigate the performance of simple reduction operations on large arrays. The choice of programming model should ideally not limit the achievable performance on a device. BabelStream (formerly GPU-STREAM) has been updated to incorporate a wide variety of the latest parallel programming models, all implementing the same parallel scheme. As such, this tool can be used as a kind of Rosetta Stone, providing an array of achievable memory bandwidth results across both platforms and programming models.
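
The kernel set described in the abstract (the four standard STREAM kernels plus a dot product) can be illustrated with a minimal sketch. This is not BabelStream's code: the function names, the scalar value, and the timing harness below are assumptions made for illustration, and a pure-Python loop measures interpreter overhead rather than real memory bandwidth. What it does show is how each kernel touches its arrays and how a bandwidth figure is derived from the bytes moved and the best observed time.

```python
import time
from array import array

SCALAR = 0.4  # assumed scaling constant, chosen only for this sketch

def copy(a, c):
    for i in range(len(a)):
        c[i] = a[i]

def mul(b, c):
    for i in range(len(b)):
        b[i] = SCALAR * c[i]

def add(a, b, c):
    for i in range(len(a)):
        c[i] = a[i] + b[i]

def triad(a, b, c):
    for i in range(len(a)):
        a[i] = b[i] + SCALAR * c[i]

def dot(a, b):
    # Simple reduction over two large arrays
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

# Number of arrays each kernel reads or writes (STREAM-style byte counting)
ARRAYS_TOUCHED = {"copy": 2, "mul": 2, "add": 3, "triad": 3, "dot": 2}

def bytes_moved(kernel, n, itemsize=8):
    return ARRAYS_TOUCHED[kernel] * n * itemsize

def bandwidth_gb_per_s(kernel, n, seconds, itemsize=8):
    return bytes_moved(kernel, n, itemsize) / seconds / 1e9

def run(n=100_000, repeats=5):
    """Time each kernel `repeats` times; report bandwidth from the best run."""
    a = array("d", [0.1] * n)
    b = array("d", [0.2] * n)
    c = array("d", [0.0] * n)
    kernels = {
        "copy": lambda: copy(a, c),
        "mul": lambda: mul(b, c),
        "add": lambda: add(a, b, c),
        "triad": lambda: triad(a, b, c),
        "dot": lambda: dot(a, b),
    }
    best = {name: float("inf") for name in kernels}
    for _ in range(repeats):
        for name, kernel in kernels.items():
            start = time.perf_counter()
            kernel()
            best[name] = min(best[name], time.perf_counter() - start)
    return {name: bandwidth_gb_per_s(name, n, t) for name, t in best.items()}

if __name__ == "__main__":
    for name, gbs in run().items():
        print(f"{name:6s} {gbs:8.3f} GB/s")
```

Taking the minimum time over several repeats is a common benchmarking convention for estimating a practical upper bound; a real measurement would run the same loops in a compiled or device-native programming model.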

Keywords: performance portability; many-core; parallel programming models; memory bandwidth benchmark.

DOI: 10.1504/IJCSE.2018.095847

International Journal of Computational Science and Engineering, 2018 Vol.17 No.3, pp.247 - 262

Received: 17 Jan 2017
Accepted: 10 Apr 2017
Published online: 25 Oct 2018


