International Journal of High Performance Systems Architecture (5 papers in press)
A framework for evaluating branch predictors using multiple performance parameters
by Moumita Das, Ansuman Banerjee, Bhaskar Sardar
Abstract: Selecting a branch predictor for a given program is a challenging task. The performance of a branch predictor is measured not only by its prediction accuracy; parameters such as predictor size, energy expenditure and execution latency also play a key role in predictor selection. For a specific program, a predictor that gives the best results on one of these parameters may not be the best when another parameter is considered. Selecting the best predictor with all the parameters taken into account is therefore non-trivial, and is considered one of the foremost challenges. In this paper, we propose a framework that addresses this challenge systematically using the concepts of aggregation and unification. For a given program, our framework considers the performance of the different predictors with respect to the different parameters and makes a predictor selection based on all of them. On the one hand, our framework can be an important aid in deciding which predictor to use at runtime. On the other hand, a newly proposed predictor can be systematically evaluated and placed in the context of existing ones with respect to the parameters of choice. We present experimental results of our framework on the Siemens, SPEC 2006 and SPEC 2017 benchmarks.
Keywords: Branch prediction; prediction accuracy; execution latency; rank aggregation.
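The rank aggregation named in the keywords above can be illustrated with a small sketch. The abstract does not say which aggregation method the framework uses, so this example assumes a plain Borda count as a stand-in. The function name `borda_aggregate` and all per-parameter rankings below are invented for illustration and are not results from the paper.

```python
from typing import Dict, List


def borda_aggregate(rankings: Dict[str, List[str]]) -> List[str]:
    """Merge per-parameter rankings (best first) into one ordering via Borda count.

    Assumption: Borda count is used here only as a simple, well-known
    aggregation rule; the paper's own aggregation method may differ.
    """
    scores: Dict[str, int] = {}
    for ranking in rankings.values():
        n = len(ranking)
        for position, predictor in enumerate(ranking):
            # The best-ranked predictor earns n-1 points, the worst earns 0.
            scores[predictor] = scores.get(predictor, 0) + (n - 1 - position)
    # Highest total score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))


# Hypothetical rankings for one program (illustrative data, not measurements).
rankings = {
    "accuracy": ["tage", "gshare", "bimodal"],
    "size": ["bimodal", "gshare", "tage"],
    "energy": ["bimodal", "gshare", "tage"],
    "latency": ["gshare", "tage", "bimodal"],
}
aggregate = borda_aggregate(rankings)
print(aggregate)
```

With these made-up inputs, the predictor that wins on accuracy is not the overall winner once size, energy and latency are also counted, which is the situation the abstract describes.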
Image saliency and co-saliency detection by low-rank multiscale fusion
by Rui Huang, Wei Feng, Jizhou Sun, Yaobin Zou
Abstract: Saliency and co-saliency detection aim to distinguish conspicuous foreground objects in single and multiple images, respectively, and are thus essential in many multimedia and vision applications. To balance efficiency and accuracy, most recent successful saliency detectors are based on superpixels. However, saliency detection with single-scale superpixel segmentation may fail to capture the intrinsic salient objects in complex natural scenes with small-scale high-contrast backgrounds. To tackle this problem and achieve reliable saliency and co-saliency detection, we present a simple strategy that uses multiscale superpixels to jointly detect salient objects via low-rank analysis. Specifically, we first build a multiscale superpixel pyramid and derive the corresponding saliency map at each scale from multimodal saliency features and priors. Then, we use joint low-rank analysis of the multiscale saliency maps to obtain a more reliable, adaptively fused saliency map that properly takes the saliency at all scales into account. We further propose a GMM-based co-saliency prior that enables the above approach to detect co-salient objects in multiple images. Extensive experiments on benchmark datasets validate the effectiveness of the proposed saliency and co-saliency detector and its superiority over state-of-the-art methods.
Keywords: Saliency; co-saliency; co-saliency prior; generative model; GMM; low-rank analysis; multiscale.
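The idea of fusing multiscale saliency maps through their shared low-rank structure can be sketched in a much simplified form. The abstract does not specify the low-rank model, so this is not the authors' algorithm. As a stand-in, the sketch weights each scale by its loading on the dominant singular vector of the stacked maps, found by power iteration, so that scales agreeing with the consensus get larger weights. The function name and the toy maps are assumptions.

```python
from typing import List


def fuse_by_dominant_component(maps: List[List[float]], iterations: int = 50) -> List[float]:
    """Fuse flattened, non-negative saliency maps (one per scale) into one map.

    Simplified stand-in for low-rank fusion: the weight of each scale is its
    entry in the dominant eigenvector of the Gram matrix of the maps.
    """
    k = len(maps)
    # Gram matrix: gram[a][b] is the inner product of map a and map b.
    gram = [[sum(x * y for x, y in zip(maps[a], maps[b])) for b in range(k)]
            for a in range(k)]
    v = [1.0] * k
    for _ in range(iterations):
        # One power-iteration step followed by normalisation.
        w = [sum(gram[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # all maps are zero; keep uniform weights
        v = [x / norm for x in w]
    total = sum(v)
    weights = [x / total for x in v]
    n = len(maps[0])
    # Convex combination of the per-scale maps.
    return [sum(weights[a] * maps[a][i] for a in range(k)) for i in range(n)]


# Toy example: two scales agree on the first two pixels, one scale disagrees.
toy_maps = [
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.0, 0.1],
    [0.1, 0.0, 0.9, 0.2],
]
fused = fuse_by_dominant_component(toy_maps)
print(fused)
```

In this toy case the outlier scale receives a small weight, so the fused map stays close to the two scales that agree.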
Towards Designing Quantum Reversible 32-bit MIPS Register File
by Mohammad Samadi Gharajeh, Majid Haghparast
Abstract: Reversible circuit design can be applied in various emerging technologies such as quantum computing. Since researchers have already proposed many building blocks and designed small circuits (e.g., a reversible full adder), it is time to design large-scale reversible circuits. This paper proposes a novel quantum reversible 32-bit MIPS register file for quantum computer processors. It also presents a reversible 5-to-32 decoder, thirty-two reversible buffer registers, and two reversible 32-to-1 multiplexers. The proposed reversible decoder block, GH-DEC, and the proposed reversible multiplexer block, GH-MUX, use Feynman, Toffoli, and Fredkin gates. They are designed with a minimal number of constant inputs, a minimal number of garbage outputs, and a minimal quantum cost. In addition, the output expressions of all the circuits are simplified to considerably enhance the performance of the proposed quantum design. Comparison results show that the proposed reversible design surpasses existing works in terms of the number of constant inputs, the number of garbage outputs, and the quantum cost.
Keywords: Reversible Circuit Design; Quantum Computing; Reversible Register File; Reversible Decoder; Reversible Multiplexer.
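The three gates named in the abstract above have standard definitions, which the following sketch models at the classical bit level. It checks that each gate is a bijection on its inputs, and it shows how one Fredkin gate acts as a 2-to-1 multiplexer and how a Toffoli gate with a constant 0 input computes AND. The helper names `is_reversible`, `mux2` and `and_via_toffoli` are illustrative, and this is not the paper's GH-DEC or GH-MUX design.

```python
from itertools import product


def feynman(a, b):
    """Feynman (CNOT) gate: (a, b) -> (a, a XOR b)."""
    return a, a ^ b


def toffoli(a, b, c):
    """Toffoli gate: flips c only when both controls a and b are 1."""
    return a, b, c ^ (a & b)


def fredkin(a, b, c):
    """Fredkin (controlled swap) gate: swaps b and c when a is 1."""
    return (a, c, b) if a else (a, b, c)


def is_reversible(gate, arity):
    """A gate is reversible if distinct inputs always give distinct outputs."""
    inputs = list(product((0, 1), repeat=arity))
    outputs = {gate(*bits) for bits in inputs}
    return len(outputs) == len(inputs)


def mux2(select, x, y):
    """2-to-1 multiplexer from one Fredkin gate: x if select == 0, else y.

    The other two outputs are garbage outputs in reversible-logic terms.
    """
    _, out, _garbage = fredkin(select, x, y)
    return out


def and_via_toffoli(a, b):
    """AND from one Toffoli gate, using one constant input (c = 0)."""
    return toffoli(a, b, 0)[2]


for gate, arity in ((feynman, 2), (toffoli, 3), (fredkin, 3)):
    print(gate.__name__, is_reversible(gate, arity))
```

The constant input in `and_via_toffoli` and the discarded outputs in `mux2` are small instances of the constant-input and garbage-output counts that the abstract uses as cost metrics.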
A Review of Shared Resource Contention in Multicores and its Mitigating
by Preeti Jain, Sunil Surve
Abstract: Chip Multiprocessor (CMP) systems have become essential for meeting high computing demands. Their high potential and the reduced latency of inter-processor communication among CMP cores make them a viable solution for parallel execution, in contrast to conventional single-core processors. In such systems, sharing of resources is imperative for better resource utilization. The challenge arises when application programs running on neighbouring cores compete for these resources concurrently and introduce contention. Further, the urgency of mitigating contention increases as process-level parallelism grows rapidly. Contention due to resource sharing has been studied extensively, and various techniques have been proposed to mitigate it. We present, in a simple and lucid manner, a summary of previous work on contention in multicores due to shared resources such as shared caches, main memory, memory bus bandwidth and prefetchers. The work aims to briefly discuss, under a single umbrella, the key ideas proposed by the research community to alleviate contention for these resources. The paper provides a better understanding of the contention problem in multicores by presenting a cumulative overview of the challenges posed by all shared resources. The work highlights the fact that no single shared component is the dominant cause of performance degradation in CMPs; rather, all elements in the memory hierarchy introduce resource contention and thereby affect performance cumulatively. The work should help novice readers, researchers and academicians propose optimal policies for addressing contention when designing multicore applications, considering the overall impact of these resources on the performance of multicore systems.
Keywords: Multicore; shared resources; contention; LLC; Main memory; bus bandwidth; prefetching; mitigating techniques.
Partial Product Generation for Unbalanced Ternary Signed Multiplication
by Samira Din Mohammadi, Reza Faghih Mirzaee, Keivan Navi
Abstract: Signed multiplication is an essential operation in computer arithmetic. The first step of multiplication is partial product generation. In binary logic, partial products are generated simply by ANDing every bit of the multiplier with the bits of the multiplicand. Whether the numbers are signed or unsigned, AND is the partial product generator in binary logic. However, the same process is not as simple in ternary logic, where the AND gate loses its efficiency. An ordinary 1-digit ternary multiplier is not sufficient either, since it only multiplies two positive ternary digits. New ternary operators are required for the multiplication of negative digits. This paper presents these operators for the unbalanced ternary signed multiplier. The proposed operators are realized with three different well-known ternary circuit topologies in 32 nm CMOS technology.
Keywords: Baugh-Wooley Multiplication; Computer Arithmetic; Partial Product Generation; Ternary Signed Multiplier; Ternary Logic.
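The binary case described in the abstract above, where every partial product bit is the AND of one multiplier bit and one multiplicand bit, can be sketched as follows. The sketch covers unsigned operands only. It does not model the signed (Baugh-Wooley) arrangement or the ternary operators proposed in the paper, and the function names are illustrative.

```python
from typing import List


def partial_products(multiplicand: int, multiplier: int, width: int) -> List[List[int]]:
    """Return the width x width array of binary partial product bits.

    rows[i][j] is the AND of multiplier bit i and multiplicand bit j
    (unsigned operands; bit 0 is the least significant bit).
    """
    rows = []
    for i in range(width):
        m_bit = (multiplier >> i) & 1
        rows.append([((multiplicand >> j) & 1) & m_bit for j in range(width)])
    return rows


def sum_partial_products(rows: List[List[int]]) -> int:
    """Add the partial product bits, each weighted by 2**(i + j)."""
    total = 0
    for i, row in enumerate(rows):
        for j, bit in enumerate(row):
            total += bit << (i + j)
    return total


rows = partial_products(13, 11, width=4)
for row in rows:
    print(row)
print(sum_partial_products(rows))
```

The same trick does not carry over directly to ternary digits: for example, 2 × 2 = 4 is written 11 in base 3, so a single digit product already needs a carry digit, which a one-gate generator like AND cannot provide.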