Title: Improving runtime performance and energy consumption through balanced data locality with NUMA-BTLP and NUMA-BTDM static algorithms for thread classification and thread type-aware mapping
Authors: Iulia Ştirb
Addresses: Department of Computers and Software Engineering, Politehnica University of Timișoara, 2 Piața Victoriei, Timișoara, Romania
Abstract: Extending compilers such as LLVM with NUMA-aware optimisations significantly improves runtime performance and reduces energy consumption on NUMA systems. The paper presents NUMA-BTDM, a compile-time, thread-type-dependent mapping algorithm that maps threads uniformly according to the type assigned to each thread by the NUMA-BTLP algorithm, which classifies threads through static analysis of the code. The compiler first inserts architecture-dependent code into the program; at runtime, this code detects the characteristics of the underlying architecture for Intel processors. The mapping is then performed at runtime, through function calls from the PThreads library, according to these characteristics and to a compile-time mapping analysis that yields the CPU affinity of each thread. NUMA-BTDM allows the application to customise, control and optimise thread mapping, and achieves balanced data locality on NUMA systems for parallel C code that combines PThreads-based task parallelism with OpenMP-based loop parallelism.
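Illustration (not from the paper): the runtime step described in the abstract, setting the CPU affinity of a thread through the PThreads library, can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the NUMA-BTDM implementation. It assumes Linux with glibc, where pthread_setaffinity_np and pthread_getaffinity_np are available as GNU extensions. The helper names online_cores, first_allowed_core, pin_thread_to_core and thread_allowed_on_core are hypothetical, and the core index passed in stands in for whatever the compile-time mapping analysis would produce.

```c
#define _GNU_SOURCE            /* exposes CPU_SET and pthread_*affinity_np on glibc */
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: number of cores online, detected at runtime.
   Falls back to 1 if the query fails. */
static long online_cores(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Hypothetical helper: index of the first core in the thread's current
   affinity mask, or -1 if the mask cannot be read or is empty. */
static int first_allowed_core(pthread_t thread) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(thread, sizeof(cpu_set_t), &set) != 0)
        return -1;
    for (int c = 0; c < CPU_SETSIZE; c++)
        if (CPU_ISSET(c, &set))
            return c;
    return -1;
}

/* Hypothetical helper: restrict `thread` to a single core.
   Returns 0 on success, otherwise an error number. */
static int pin_thread_to_core(pthread_t thread, int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(thread, sizeof(cpu_set_t), &set);
}

/* Hypothetical helper: 1 if `core` is in the thread's affinity mask, else 0. */
static int thread_allowed_on_core(pthread_t thread, int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(thread, sizeof(cpu_set_t), &set) != 0)
        return 0;
    return CPU_ISSET(core, &set) ? 1 : 0;
}
```

A caller could, for example, pin the current thread with pin_thread_to_core(pthread_self(), first_allowed_core(pthread_self())). A type-aware mapping would instead choose the core from the thread's type; that policy is not specified here and is left out of the sketch.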
Keywords: static thread mapping; task parallelism; compiler optimisation; non-uniform memory access; NUMA systems; improving performance; improving energy consumption; balanced data locality.
International Journal of Computational Science and Engineering, 2020, Vol. 22, No. 2/3, pp. 200-210
Received: 17 Jun 2017
Accepted: 20 Mar 2018
Published online: 18 May 2020