Authors: Yuanming Zhang; Kanemitsu Ootsu; Takashi Yokota; Takanobu Baba
Addresses: College of Computer Science and Technology, Zhejiang University of Technology, 18 Chaowang St., Hangzhou 310014, China ' Department of Information Science, Utsunomiya University, 7-1-2 Yoto, Utsunomiya 321-8585, Japan ' Department of Information Science, Utsunomiya University, 7-1-2 Yoto, Utsunomiya 321-8585, Japan ' Department of Information Science, Utsunomiya University, 7-1-2 Yoto, Utsunomiya 321-8585, Japan
Abstract: Thread decomposition is critical for pipelined multithreading (PMT) to gain higher performance on target multi-core processors. This paper presents an automatic thread decomposition approach, which maps the decomposition problem onto a graph-theoretic framework to construct an optimised directed acyclic graph (DAG) with minimal bottleneck node size and balanced node size. In this approach, control dependence is treated as special data dependence and then an effective approach is proposed to remove redundant control dependences. A weighted DAG is constructed by assigning appropriate weights to all nodes and all dependences according to profile information. An automatic thread decomposition algorithm is given to generate an optimised pipeline based on the weighted DAG. The algorithm has been evaluated on a commodity multi-core processor, and experimental results show that it has achieved speedup ranging from 113% to 174% on some SPEC CPU 2000 benchmark programs.
Keywords: high performance computing; pipelined multithreading; PMT; multi-core processors; automatic thread decomposition; optimised pipelines; graph theory; directed acyclic graph; DAG; control dependence.
International Journal of High Performance Computing and Networking, 2013 Vol.7 No.3, pp.227 - 237
Available online: 25 Sep 2013 *Full-text access for editors Access for subscribers Purchase this article Comment on this article