Title: Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report

Authors: Helge Knoop; Tobias Gronemeier; Matthias Sühring; Peter Steinbach; Matthias Noack; Florian Wende; Thomas Steinke; Christoph Knigge; Siegfried Raasch; Klaus Ketelsen

Addresses: Institute of Meteorology and Climatology, Leibniz Universität Hannover, Hannover, Germany; Institute of Meteorology and Climatology, Leibniz Universität Hannover, Hannover, Germany; Institute of Meteorology and Climatology, Leibniz Universität Hannover, Hannover, Germany; Scionics Computer Innovation GmbH, Dresden, Germany; Supercomputing Department, Zuse Institute Berlin, Berlin, Germany; Supercomputing Department, Zuse Institute Berlin, Berlin, Germany; Supercomputing Department, Zuse Institute Berlin, Berlin, Germany; Institute of Meteorology and Climatology, Leibniz Universität Hannover, Hannover, Germany; Institute of Meteorology and Climatology, Leibniz Universität Hannover, Hannover, Germany; Independent Software Consultant, Berlin, Germany

Abstract: The computational power and availability of graphics processing units (GPUs) and many integrated core (MIC) processors on high performance computing (HPC) systems are rapidly evolving. However, HPC applications need to be ported to take advantage of such hardware. This paper is a report on our experience of porting the MPI+OpenMP parallelised large-eddy simulation model (PALM) to multi-GPU as well as to MIC processor environments using OpenACC and OpenMP. PALM is written in Fortran, comprises 140 kLOC and runs on HPC farms of up to 43,200 cores. The main porting challenges are the size and complexity of PALM, its inconsistent modularisation and its lack of unit tests. We report the methods used to identify performance issues as well as our experiences with state-of-the-art profiling tools. Moreover, we outline the required porting steps, describe the problems and bottlenecks we encountered and present separate performance tests for both architectures. We do not, however, provide benchmark information.

Keywords: computational fluid dynamics; CFD; graphics processing unit; GPU; many integrated core processors; MIC; Xeon Phi; high performance computing; HPC; large-eddy simulation; LES; MPI; OpenMP; OpenACC; porting.

DOI: 10.1504/IJCSE.2018.095850

International Journal of Computational Science and Engineering, 2018 Vol.17 No.3, pp.297 - 309

Received: 18 Jan 2017
Accepted: 29 Apr 2017

Published online: 25 Oct 2018
