Performance comparison of different parallel lattice Boltzmann implementations on multi-core multi-socket systems
Online publication date: Tue, 04-Nov-2008
by S. Donath, K. Iglberger, G. Wellein, T. Zeiser, A. Nitsure, U. Rüde
International Journal of Computational Science and Engineering (IJCSE), Vol. 4, No. 1, 2008
Abstract: In this report, we discuss the performance behaviour of different parallel lattice Boltzmann implementations. In previous work, we proposed a fast serial implementation and a cache oblivious spatial and temporal blocking algorithm for the lattice Boltzmann method (LBM) in three spatial dimensions. The cache oblivious update scheme was originally proposed by Frigo et al. Its main idea is to maximise performance for stencil-based methods by dividing the space-time domain in an optimal way, independently of any external parameters such as cache size. In view of the increasing gap between processor speed and memory performance, this approach offers a promising path to increased cache utilisation. We present results for the shared memory parallelisation of the cache oblivious implementation, based on task queueing, in comparison to the iterative standard implementation, focusing on the special issues that arise on multi-core and multi-socket systems.
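The space-time decomposition mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' LBM code: it assumes a one-dimensional three-point averaging stencil with fixed boundary cells and follows the recursive trapezoid decomposition published by Frigo and Strumpen for cache oblivious stencil computations. The names `kernel`, `walk`, `run_naive` and `run_cache_oblivious` are hypothetical and chosen only for this illustration.

```python
def kernel(u, t, x):
    """Update one cell: read time level t, write time level t + 1 (two buffers)."""
    src, dst = u[t % 2], u[(t + 1) % 2]
    dst[x] = 0.25 * src[x - 1] + 0.5 * src[x] + 0.25 * src[x + 1]


def walk(u, t0, t1, x0, dx0, x1, dx1, ds=1):
    """Process the space-time trapezoid with time range [t0, t1).

    At time t the trapezoid covers x0 + dx0*(t - t0) <= x < x1 + dx1*(t - t0).
    ds is the stencil slope (1 for a three-point stencil). No cache size or
    block size appears anywhere: the recursion alone determines the blocking.
    """
    dt = t1 - t0
    if dt == 1:
        for x in range(x0, x1):
            kernel(u, t0, x)
    elif dt > 1:
        if 2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * ds * dt:
            # Wide trapezoid: cut in space along a line of slope -ds.
            xm = (2 * (x0 + x1) + (2 * ds + dx0 + dx1) * dt) // 4
            walk(u, t0, t1, x0, dx0, xm, -ds, ds)   # left part first
            walk(u, t0, t1, xm, -ds, x1, dx1, ds)   # right part depends on it
        else:
            # Tall trapezoid: cut in time, lower half before upper half.
            s = dt // 2
            walk(u, t0, t0 + s, x0, dx0, x1, dx1, ds)
            walk(u, t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1, ds)


def run_naive(initial, steps):
    """Iterative reference: sweep the whole domain once per time step."""
    u = [list(initial), list(initial)]
    for t in range(steps):
        for x in range(1, len(initial) - 1):
            kernel(u, t, x)
    return u[steps % 2]


def run_cache_oblivious(initial, steps):
    """Same computation, traversed in recursive space-time order."""
    u = [list(initial), list(initial)]
    walk(u, 0, steps, 1, 0, len(initial) - 1, 0)
    return u[steps % 2]
```

Both drivers apply the same per-cell update and differ only in traversal order, so `run_cache_oblivious(init, n)` should return the same values as `run_naive(init, n)` for any initial list `init`. The recursive order revisits a small region over several time steps before moving on, which is what improves cache reuse.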