Title: Introducing a parallel cache oblivious blocking approach for the lattice Boltzmann method

Authors: T. Zeiser, G. Wellein, A. Nitsure, K. Iglberger, U. Rüde, G. Hager

Addresses: Regionales Rechenzentrum Erlangen, Martensstr. 1, 91058 Erlangen, Germany (T. Zeiser, G. Wellein, A. Nitsure, G. Hager); Lehrstuhl für Systemsimulation, Cauerstr. 6, 91058 Erlangen, Germany (K. Iglberger, U. Rüde)

Abstract: In this report we propose a parallel cache oblivious spatial and temporal blocking algorithm for the lattice Boltzmann method in three spatial dimensions. The algorithm was originally proposed by Frigo et al. (1999) and divides the space-time domain of stencil-based methods in an optimal way, independently of any external parameters such as the cache size. In view of the increasing gap between processor speed and memory performance, this approach offers a promising path to better cache utilisation. We find that even a straightforward cache oblivious implementation reduces memory traffic by at least a factor of two compared to a highly optimised standard kernel, and that it improves scalability for shared memory parallelisation. Because of the recursive structure of the algorithm, we use an unconventional parallelisation scheme based on task queuing.
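The recursive division of the space-time domain mentioned in the abstract can be illustrated with a minimal one-dimensional sketch. It assumes a three-point explicit heat-equation stencil in place of the 3D lattice Boltzmann update and fixed (Dirichlet) boundary cells. The recursion follows the trapezoid decomposition of Frigo and Strumpen's cache oblivious stencil algorithm: trapezoids that are wide relative to their height are cut in space along a line whose slope matches the stencil's reach, all others are cut in time, and no cache size appears anywhere. The names `cache_oblivious_heat`, `naive_heat`, `SLOPE` and `alpha` are hypothetical; this is a serial illustration, not the authors' parallel 3D implementation.

```python
# Hypothetical 1D sketch of cache oblivious space-time blocking (trapezoid
# recursion in the style of Frigo and Strumpen). Not the paper's 3D LBM code.

SLOPE = 1  # a 3-point stencil reaches one cell per time step


def cache_oblivious_heat(u0, steps, alpha=0.25):
    n = len(u0)
    # Two time levels suffice: level t is stored in buf[t % 2].
    # Boundary cells 0 and n-1 are never updated, so both copies keep them.
    buf = [list(u0), list(u0)]

    def kernel(t, x):
        # Advance interior cell x from time level t to t + 1.
        old = buf[t % 2]
        buf[(t + 1) % 2][x] = old[x] + alpha * (old[x - 1] - 2.0 * old[x] + old[x + 1])

    def walk(t0, t1, x0, dx0, x1, dx1):
        # Trapezoid: for t in [t0, t1) it covers x in
        # [x0 + dx0 * (t - t0), x1 + dx1 * (t - t0)).
        dt = t1 - t0
        if dt == 1:
            for x in range(x0, x1):
                kernel(t0, x)
        elif dt > 1:
            if 2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * SLOPE * dt:
                # Wide trapezoid: space cut along a line of slope -SLOPE;
                # the left piece must be finished before the right one.
                xm = (2 * (x0 + x1) + (2 * SLOPE + dx0 + dx1) * dt) // 4
                walk(t0, t1, x0, dx0, xm, -SLOPE)
                walk(t0, t1, xm, -SLOPE, x1, dx1)
            else:
                # Tall trapezoid: time cut, lower half first.
                s = dt // 2
                walk(t0, t0 + s, x0, dx0, x1, dx1)
                walk(t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1)

    walk(0, steps, 1, 0, n - 1, 0)
    return buf[steps % 2]


def naive_heat(u0, steps, alpha=0.25):
    # Reference: one full sweep over the domain per time step.
    u = list(u0)
    for _ in range(steps):
        v = list(u)
        for x in range(1, len(u) - 1):
            v[x] = u[x] + alpha * (u[x - 1] - 2.0 * u[x] + u[x + 1])
        u = v
    return u
```

Because each cell depends only on itself and its two neighbours at the previous time level, two time levels are enough, and the result should match the plain time-step loop in `naive_heat`. The difference lies only in the traversal order: the recursion keeps working on small space-time regions that fit into whatever cache is present. A tuned kernel would typically stop the recursion at a larger base case rather than single time steps to limit call overhead.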

Keywords: lattice Boltzmann method; cache optimisation; cache oblivious blocking; multi-core; task queuing; shared memory parallelisation; memory traffic reduction.

DOI: 10.1504/PCFD.2008.018088

Progress in Computational Fluid Dynamics, An International Journal, 2008, Vol. 8, No. 1/2/3/4, pp. 179-188

Published online: 30 Apr 2008