Efficient implementation of OpenACC cache directive on NVIDIA GPUs
by Ahmad Lashgar; Amirali Baniasadi
International Journal of High Performance Computing and Networking (IJHPCN), Vol. 13, No. 1, 2019

Abstract: OpenACC's programming model presents a simple interface to programmers, offering a trade-off between performance and development effort. OpenACC relies on compiler technologies to generate efficient code and optimise performance. The cache directive is among the directives that are challenging to implement. It allows the programmer to utilise the accelerator's hardware- or software-managed caches by passing hints to the compiler. In this paper, we investigate the implementation aspects of the cache directive on NVIDIA-like GPUs and propose optimisations for the CUDA backend. We use CUDA's shared memory as the software-managed cache space. We first show that a straightforward implementation can be very inefficient and can degrade performance. We investigate the differences between this implementation and hand-written CUDA alternatives and introduce the following optimisations to bridge the performance gap between the two: 1) improving occupancy by sharing the cache among several parallel threads; 2) optimising cache fetch and write routines via parallelisation and minimising control flow. Investigating three test cases, we show that the best cache directive implementation can perform very close to the hand-written CUDA equivalent and improve performance by up to 2.4× (compared to the baseline OpenACC).
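
For readers unfamiliar with the directive, the following is a minimal sketch of how a cache hint might appear in user code. It is not taken from the paper: the function name `stencil3`, the 3-point stencil, and the data clauses are assumptions chosen for illustration. The `cache` directive sits at the top of the loop body and names the subarray (start index and length) that each iteration reads, which is the hint a compiler could use to stage data in a software-managed cache such as CUDA shared memory. A compiler without OpenACC support ignores the pragmas and runs the loop sequentially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: 3-point averaging stencil over interior points.
 * out[i] = (in[i-1] + in[i] + in[i+1]) / 3 for 1 <= i <= n-2.
 * Boundary elements out[0] and out[n-1] are left unchanged. */
void stencil3(const float *in, float *out, int n)
{
    assert(in != NULL && out != NULL);

    /* Offload the loop; copy the input to the device and the output both ways
     * so that the untouched boundary values survive the round trip. */
    #pragma acc parallel loop copyin(in[0:n]) copy(out[0:n])
    for (int i = 1; i < n - 1; ++i) {
        /* Hint: this iteration reads the 3 elements starting at in[i-1]. */
        #pragma acc cache(in[i-1:3])
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
    }
}
```

Neighbouring iterations read overlapping windows of `in`, which is why a cache shared among several parallel threads, as the abstract describes, can reduce redundant global-memory traffic. Whether a given compiler acts on the hint is implementation-dependent.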

Online publication date: Tue, 11-Dec-2018
