Forthcoming articles

International Journal of High Performance Systems Architecture (IJHPSA)

These articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

International Journal of High Performance Systems Architecture (5 papers in press)

Regular Issues

  • A framework for evaluating branch predictors using multiple performance parameters
    by Moumita Das, Ansuman Banerjee, Bhaskar Sardar 
    Abstract: Selecting a branch predictor for a given program is a challenging task. A predictor's performance is measured not only by its prediction accuracy; parameters such as predictor size, energy expenditure and execution latency also play a key role in predictor selection. For a specific program, the predictor that gives the best results for one of these parameters may not be the best when another parameter is considered. Selecting the best predictor with all of these parameters taken into account is therefore non-trivial, and is considered one of the foremost challenges. In this paper, we propose a framework that systematically addresses this challenge using the concepts of aggregation and unification. For a given program, our framework considers the performance of the different predictors with respect to each parameter and selects a predictor based on all of them. On the one hand, the framework can help decide which predictor to use at runtime. On the other hand, a newly proposed predictor can be systematically evaluated and positioned relative to existing ones, considering the parameters of choice. We present experimental results of our framework on the Siemens, SPEC 2006 and SPEC 2017 benchmarks.
    Keywords: Branch prediction; prediction accuracy; execution latency; rank aggregation.

  • Image saliency and co-saliency detection by low-rank multiscale fusion
    by Rui Huang, Wei Feng, Jizhou Sun, Yaobin Zou 
    Abstract: Saliency and co-saliency detection aim to distinguish conspicuous foreground objects in single and multiple images, respectively, and are thus essential in many multimedia and vision applications. To balance efficiency and accuracy, most recent successful saliency detectors are based on superpixels. However, saliency detection with single-scale superpixel segmentation may fail to capture intrinsic salient objects in complex natural scenes with small-scale, high-contrast backgrounds. To tackle this problem and achieve reliable saliency and co-saliency detection, we present a simple strategy that uses multiscale superpixels to jointly detect salient objects via low-rank analysis. Specifically, we first build a multiscale superpixel pyramid and derive a saliency map at each scale from multimodal saliency features and priors. We then apply joint low-rank analysis to the multiscale saliency maps to obtain a more reliable, adaptively fused saliency map that takes the saliency at all scales into account. We further propose a GMM-based co-saliency prior that enables this approach to detect co-salient objects across multiple images. Extensive experiments on benchmark datasets validate the effectiveness and superiority of the proposed saliency and co-saliency detector over state-of-the-art methods.
    Keywords: Saliency; co-saliency; co-saliency prior; generative model; GMM; low-rank analysis; multiscale.

  • Towards Designing Quantum Reversible 32-bit MIPS Register File
    by Mohammad Samadi Gharajeh, Majid Haghparast 
    Abstract: Reversible circuit design can be applied in various emerging technologies such as quantum computing. Since researchers have already proposed many building blocks and designed small circuits (e.g., a reversible full adder), it is time to design large-scale reversible circuits. This paper proposes a novel quantum reversible 32-bit MIPS register file for quantum computer processors. It also presents a reversible 5-to-32 decoder, thirty-two reversible buffer registers and two reversible 32-to-1 multiplexers. The proposed reversible decoder block, GH-DEC, and the proposed reversible multiplexer block, GH-MUX, use Feynman, Toffoli and Fredkin gates. They are designed to minimise the number of constant inputs, the number of garbage outputs and the quantum cost. In addition, the output expressions of all the circuits are simplified, which considerably enhances the performance of the proposed quantum design. Comparison results show that the proposed reversible design surpasses existing works in terms of the number of constant inputs, the number of garbage outputs and quantum cost.
    Keywords: Reversible Circuit Design; Quantum Computing; Reversible Register File; Reversible Decoder; Reversible Multiplexer.

  • A Review of Shared Resource Contention in Multicores and its Mitigating Techniques
    by Preeti Jain, Sunil Surve 
    Abstract: Chip multiprocessor (CMP) systems have become essential for meeting high computing demands. Their high potential and reduced inter-processor communication latency among cores make them a viable solution for parallel execution, in contrast to conventional single-core processors. In such systems, resource sharing is imperative for better resource utilization. The challenge arises when application programs running on neighbouring cores compete for these resources concurrently and introduce contention. Moreover, the need to mitigate contention becomes more urgent as process-level parallelism grows rapidly. Extensive past studies have examined contention due to resource sharing and proposed various techniques to mitigate it. We present, in a simple and lucid manner, a summary of previous work on contention in multicores due to shared resources such as shared caches, main memory, memory bus bandwidth and prefetchers. The work briefly discusses, under a single umbrella, key ideas proposed by the research community to alleviate contention for these resources. Because we present a cumulative overview of the challenges posed by all shared resources, the paper provides a better understanding of the contention problem in multicores. The work highlights that no single shared component is the dominant cause of performance degradation in CMPs; rather, all elements of the memory hierarchy introduce resource contention and thereby affect performance cumulatively. The work should help novice readers, researchers and academics propose optimal policies that address contention when designing multicore applications, considering the overall impact of these resources on the performance of multicore systems.
    Keywords: Multicore; shared resources; contention; LLC; Main memory; bus bandwidth; prefetching; mitigating techniques.

  • Partial Product Generation for Unbalanced Ternary Signed Multiplication
    by Samira Din Mohammadi, Reza Faghih Mirzaee, Keivan Navi 
    Abstract: Signed multiplication is an essential operation in computer arithmetic. The first step of multiplication is called partial product generation. In binary logic, partial products are generated simply by ANDing each bit of the multiplier with the bits of the multiplicand. Whether the numbers are signed or unsigned, AND is the partial product generator in binary logic. However, the same process is not as simple in ternary logic: the AND gate loses its efficiency, and an ordinary 1-digit ternary multiplier is not sufficient either, since it only multiplies two positive ternary digits. New ternary operators are required for the multiplication of negative digits. This paper presents these operators for the unbalanced ternary signed multiplier. The proposed operators are realized with three different well-known ternary circuit topologies in 32 nm CMOS technology.
    Keywords: Baugh-Wooley Multiplication; Computer Arithmetic; Partial Product Generation; Ternary Signed Multiplier; Ternary Logic.
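
The branch-predictor framework above combines per-parameter results through rank aggregation (one of its keywords), but the abstract does not name the aggregation rule. As a minimal sketch, assuming Borda count as the rule (a common rank-aggregation method swapped in here, not necessarily the authors' choice), the following Python combines one best-first ranking per parameter into a single ranking. The predictor names and per-parameter rankings are hypothetical, not results from the paper.

```python
from collections import defaultdict


def borda_aggregate(rankings: list[list[str]]) -> list[str]:
    """Merge several best-first rankings of the same predictors into one.

    Borda count (assumed aggregation rule): in a ranking of n items, the item in
    position p (0-based) earns n - 1 - p points. Items are ordered by total
    points, highest first, with ties broken alphabetically.
    """
    scores: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, predictor in enumerate(ranking):
            scores[predictor] += n - 1 - position
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical best-first rankings, one per performance parameter.
rankings = [
    ["tage", "perceptron", "gshare", "bimodal"],  # prediction accuracy
    ["bimodal", "gshare", "perceptron", "tage"],  # predictor size
    ["bimodal", "gshare", "tage", "perceptron"],  # energy expenditure
    ["bimodal", "gshare", "tage", "perceptron"],  # execution latency
]
print(borda_aggregate(rankings))  # ['bimodal', 'gshare', 'tage', 'perceptron']
```

With these made-up rankings, the most accurate predictor (tage) places only third overall, which illustrates the abstract's point that the best predictor for one parameter need not be the best when all parameters are considered.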
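
The reversible register-file paper above builds its decoder and multiplexer blocks from Feynman, Toffoli and Fredkin gates. The Python sketch below (not the authors' GH-DEC or GH-MUX designs) gives the standard input-output mappings of these three gates, checks that each is reversible (a bijection on its inputs), and shows how constant inputs and garbage outputs arise: a Toffoli gate with its target fixed to 0 computes AND, and a Fredkin gate acts as a 2-to-1 multiplexer. The helper names are illustrative only.

```python
from collections.abc import Callable
from itertools import product

Bits = tuple[int, ...]


def feynman(a: int, b: int) -> Bits:
    """Feynman (CNOT) gate: (A, B) -> (A, A xor B)."""
    return (a, a ^ b)


def toffoli(a: int, b: int, c: int) -> Bits:
    """Toffoli (CCNOT) gate: (A, B, C) -> (A, B, C xor (A and B))."""
    return (a, b, c ^ (a & b))


def fredkin(c: int, x: int, y: int) -> Bits:
    """Fredkin (controlled-swap) gate: swaps X and Y when control C is 1."""
    return (c, y, x) if c else (c, x, y)


def is_reversible(gate: Callable[..., Bits], width: int) -> bool:
    """True iff the gate maps all 2**width input vectors to distinct outputs."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=width)}
    return len(outputs) == 2 ** width


def and_via_toffoli(a: int, b: int) -> int:
    # One constant input (target C = 0); outputs A and B are garbage.
    return toffoli(a, b, 0)[2]


def mux2_via_fredkin(sel: int, x: int, y: int) -> int:
    # Second output is X when sel = 0 and Y when sel = 1; no constant input is
    # needed, and the other two outputs are garbage unless reused elsewhere.
    return fredkin(sel, x, y)[1]


print(all(is_reversible(g, w) for g, w in [(feynman, 2), (toffoli, 3), (fredkin, 3)]))  # True
```

An ordinary AND gate fails the same check, since four input pairs collapse onto two output values; this is why reversible designs pay for logic with constant inputs and garbage outputs.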
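
For the ternary multiplication paper above, the following Python sketch shows why negative digit products appear in signed partial product generation. It assumes an encoding that the abstract does not specify: n unbalanced ternary digits in {0, 1, 2}, with the most significant digit carrying negative weight -3^(n-1), mirroring how Baugh-Wooley multiplication treats the binary sign bit. Under that assumption, every digit pair contributes a signed product from {-4, -2, -1, 0, 1, 2, 4}; the negative ones, which an unsigned 1-digit ternary multiplier cannot produce, occur exactly when one factor is a negatively weighted most significant digit. The function names are illustrative, not the paper's operators.

```python
from itertools import product


def encode(value: int, n: int) -> list[int]:
    """Return n unbalanced ternary digits of value, least significant first.

    Assumed encoding: value = -d[n-1] * 3**(n-1) + sum(d[i] * 3**i for i < n-1),
    with every digit d[i] in {0, 1, 2}.
    """
    top = 3 ** (n - 1)
    for msd in (0, 1, 2):
        rest = value + msd * top  # what the lower n-1 digits must represent
        if 0 <= rest < top:
            break
    else:
        raise ValueError(f"{value} is not representable with {n} digits")
    digits = []
    for _ in range(n - 1):
        rest, digit = divmod(rest, 3)
        digits.append(digit)
    return digits + [msd]


def decode(digits: list[int]) -> int:
    """Inverse of encode: the last (most significant) digit has negative weight."""
    n = len(digits)
    low = sum(d * 3 ** i for i, d in enumerate(digits[:-1]))
    return low - digits[-1] * 3 ** (n - 1)


def partial_products(a: list[int], b: list[int]) -> list[tuple[int, int]]:
    """Signed digit products as (weight exponent, value) for every digit pair.

    An unsigned digit product lies in {0, 1, 2, 4}; it is negated when exactly
    one factor is a negatively weighted most significant digit.
    """
    terms = []
    for i, j in product(range(len(a)), range(len(b))):
        sign = (-1 if i == len(a) - 1 else 1) * (-1 if j == len(b) - 1 else 1)
        terms.append((i + j, sign * a[i] * b[j]))
    return terms


def multiply(x: int, y: int, n: int) -> int:
    """Multiply two n-digit signed values by summing their weighted partial products."""
    terms = partial_products(encode(x, n), encode(y, n))
    return sum(value * 3 ** weight for weight, value in terms)


print(encode(-1, 2), multiply(-5, 7, 3))  # [2, 1] -35
```

In hardware each digit product must still be split into sum and carry digits (for example, 4 is 11 in base 3); the sketch sums the signed products arithmetically instead, which is enough to confirm that the partial product matrix reproduces the signed result.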