Forthcoming articles


International Journal of High Performance Systems Architecture


These articles have been peer-reviewed and accepted for publication in IJHPSA, but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.


Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.


International Journal of High Performance Systems Architecture (4 papers in press)


Regular Issues


  • CHILL: A System for Fine-Grained Mapping of Chained High Impact Long-Latency Load Phases on Tightly Coupled Heterogeneous Multi-cores
    by Robert Chen, Glenn Reinman 
    Abstract: With increasing power and application demands, heterogeneous multi-core processors are becoming more prevalent. However, the key to proper utilization of heterogeneous multi-cores is assigning, or mapping, the right application to the right core type. Recent work has shown that fine-grained mapping takes advantage of short program phases with highly variable performance requirements and can elicit greater benefits from tightly coupled heterogeneous multi-cores. In this paper, we show that performance bottlenecks can occur in fine-grained program phases during chains of high-impact long-latency loads. We design a system that detects these bottleneck phases and propose accelerating them on the out-of-order core for better performance and energy efficiency. Our system operates within 10% of the performance and 2.6% of the energy of an oracle resource mapper. This translates to a 44.4% performance gain and 9.2% energy savings over existing fine-grained mapping techniques.
    Keywords: Heterogeneous Multi-cores; Tightly Coupled Multi-cores; Resource Mapping; Fine-grained Mapping; Fine-grained Scheduling; Bottleneck Phases.

  • A Novel High-Performance and Reliable Multi-Threshold CNFET Full Adder Cell Design
    by Yavar Safaei Mehrabani 
    Abstract: The Full Adder cell is widely employed in larger circuits such as multipliers, compressors, and address calculation for cache memory. It therefore plays an important role in determining the overall performance of a digital system. In this paper, a novel high-speed, high-performance, and reliable Full Adder cell based on NAND, MAJORITY-not, and NOR functions, implemented at the nanoscale using Carbon Nanotube Field-Effect Transistors (CNFETs), is presented. Several simulations have been carried out with different power supplies, load capacitors, frequencies, and temperatures in 32 nm CMOS and 32 nm CNFET technologies using the HSPICE simulator. Simulation results demonstrate the superiority of the proposed cell in terms of delay and power-delay product (PDP) compared to other Full Adder cells. In addition, to evaluate the robustness of the CNFET-based Full Adder cells with respect to process variation (diameter mismatches of the CNFET nanotubes), Monte Carlo transient analysis is conducted. The results confirm that, in the presence of process fluctuations, the proposed design operates more reliably and is less sensitive to nanotube diameter variations than the other cells.
    Keywords: Carbon Nanotube Field-Effect Transistor (CNFET); full adder cell; high-performance; nanoelectronics; process fluctuation.

  • Hardware Design of Parallel Switch Setting Algorithm for Benes Networks
    by Mei Yang, Yikun Jiang 
    Abstract: Benes/Clos networks have been used in many areas, such as interconnection networks in parallel computers, multiprocessor systems, and networks-on-chip. The parallel switch setting algorithm is the key to satisfying the requirements of high-performance switching networks. Lee's routing algorithm is by far the most efficient parallel routing algorithm for Benes networks. However, there is no hardware implementation of this algorithm. In this paper, Lee's routing algorithm is fully implemented in RTL and synthesized. We have refined the algorithm's data structure and its initialization and updating of relation values to make it suitable for hardware implementation. The simulation and synthesis results of the switch setting circuits for 4x4 to 64x64 Benes networks confirm that the timing, area, and power consumption of the circuits are consistent with the complexity of Lee's algorithm. To the best of our knowledge, this is the first complete hardware implementation of the parallel switch setting algorithm that can handle all types of permutations, including partial ones.
    Keywords: Benes; Parallel Algorithm; Hardware; RTL; Implementation; Synthesis.

  • Multi-architecture profiler for Android
    by Anderson Luiz Sartor, Antonio Carlos Schneider Beck 
    Abstract: Performance and energy consumption are well-known constraints of modern embedded systems; thus, application analysis in the early stages of the development cycle is mandatory. However, the few available tools for evaluating the behavior of an application across different architectures do not provide a complete solution for this task. In this context, this work presents a multi-architecture profiling tool for Android applications, which fully supports the ARM, MIPS, and x86 architectures. It provides a wide range of information per application, including energy consumption, execution time, and other statistics. To that end, we have extended the QEMU-based Android Emulator and developed post-processing tools. As a case study, we have compared different architectures in terms of performance and energy consumption. Using the proposed tool, we show that, given a fixed energy budget, a different number of applications can be executed depending on how they were implemented, and that this number varies with the processor.
    Keywords: Android emulator; Android applications; QEMU; profiling; profiler; Java Native Interface; JNI; performance evaluation; energy consumption estimation; Dalvik.