Authors: Tom Deakin; James Price; Matt Martineau; Simon McIntosh-Smith
Addresses: Department of Computer Science, University of Bristol, Bristol, UK (all four authors)
Abstract: Many scientific codes consist of memory-bandwidth-bound kernels. One major advantage of many-core devices such as general-purpose graphics processing units (GPGPUs) and the Intel Xeon Phi is their focus on providing increased memory bandwidth over traditional CPU architectures. Peak memory bandwidth is usually unachievable in practice, so benchmarks are required to measure a practical upper bound on expected performance. We augment the standard STREAM kernels with a dot product kernel to investigate the performance of simple reduction operations on large arrays. The choice of programming model should ideally not limit the achievable performance on a device. BabelStream (formerly GPU-STREAM) has been updated to incorporate a wide variety of the latest parallel programming models, all implementing the same parallel scheme. As such, this tool can be used as a kind of Rosetta Stone, providing an array of achievable memory bandwidth results across both platforms and programming models.
Keywords: performance portability; many-core; parallel programming models; memory bandwidth benchmark.
International Journal of Computational Science and Engineering, 2018 Vol.17 No.3, pp.247 - 262
Received: 17 Jan 2017
Accepted: 10 Apr 2017
Published online: 22 Oct 2018