Authors: Robert J. Halstead; Jason Villarreal; Walid A. Najjar
Addresses: Department of Computer Science and Engineering, University of California, Riverside, CA 92507, USA; Jacquard Computing, Riverside, CA 92507, USA; Department of Computer Science and Engineering, University of California, Riverside, CA 92507, USA
Abstract: Algorithms that exhibit irregular memory access patterns are known to perform poorly on multiprocessor architectures, particularly when memory access latency is variable. Many common data structures, including graphs, trees, and linked lists, exhibit these irregular memory access patterns. While FPGA-based code accelerators have been successful on applications with regular memory access patterns, they have not been explored further for irregular memory access patterns. Multithreading has been shown to be an effective technique for masking long latencies. We describe the compiler generation of concurrent hardware threads for FPGAs with the objective of masking the memory latency caused by irregular memory access patterns. The CHAT (custom hardware accelerated threads) compiler extends the ROCCC toolset to generate customised state information for each dynamically generated thread. Initial results show a speed-up of 50x.
Keywords: irregular memory access patterns; custom hardware accelerated threads; CHAT; compilers; field programmable gate arrays; FPGA; C to VHDL; irregular applications; reconfigurable systems; multiprocessor architectures; memory access latency; concurrent hardware threads.
International Journal of High Performance Computing and Networking, 2014 Vol.7 No.4, pp.258 - 268
Received: 17 Mar 2012
Accepted: 24 Jan 2013
Published online: 11 Jun 2014