International Journal of High Performance Systems Architecture (4 papers in press)
CHILL: A System for Fine-Grained Mapping of Chained High Impact Long-Latency Load Phases on Tightly Coupled Heterogeneous Multi-cores
by Robert Chen, Glenn Reinman
Abstract: With increasing power and application demands, heterogeneous multi-core processors are becoming more prevalent. However, the key to proper utilization of heterogeneous multi-cores is assigning, or mapping, the right application to the right core type. Recent work has shown that fine-grained mapping takes advantage of short program phases with highly variable performance requirements and can elicit greater benefits from tightly coupled heterogeneous multi-cores. In this paper, we show that performance bottlenecks can occur in fine-grained program phases during chains of high impact long-latency loads. We design a system that detects these bottleneck phases and propose accelerating them on the out-of-order core for better performance and energy efficiency. Our system operates within 10% of the performance and 2.6% of the energy of an oracle resource mapper. This translates to a 44.4% performance gain and 9.2% energy savings over existing fine-grained mapping techniques.
Keywords: Heterogeneous Multi-cores; Tightly Coupled Multi-cores; Resource Mapping; Fine-grained Mapping; Fine-grained Scheduling; Bottleneck Phases.
A Novel High-Performance and Reliable Multi-Threshold CNFET Full Adder Cell Design
by Yavar Safaei Mehrabani
Abstract: The Full Adder cell is widely employed in larger circuits such as multipliers, compressors, and the address calculation logic of cache memories. It therefore plays an important role in determining the overall performance of a digital system. In this paper, a novel high-speed, high-performance, and reliable nanoscale Full Adder cell based on NAND, MAJORITY-not, and NOR functions and built with Carbon Nanotube Field-Effect Transistors (CNFETs) is presented. Several simulations have been carried out with different power supplies, load capacitors, frequencies, and temperatures in 32nm CMOS and 32nm CNFET technologies using the HSPICE simulator. Simulation results demonstrate the superiority of the proposed cell in terms of delay and power-delay product (PDP) compared to other Full Adder cells. In addition, to evaluate the robustness of the CNFET-based Full Adder cells with respect to process variation (diameter mismatches of the CNFET nanotubes), Monte Carlo transient analysis is conducted. Experimental results confirm that, in the presence of process fluctuations, the proposed design operates more reliably and is less affected by diameter variations than the other cells.
Keywords: Carbon Nanotube Field-Effect Transistor (CNFET); full adder cell; high-performance; nanoelectronics; process fluctuation.
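As an illustration of how the three gate functions named in the abstract can together realize a full adder, the following is a minimal logic-level sketch in Python. It uses a well-known Boolean decomposition and is not taken from the paper: the proposed cell's actual transistor-level structure is not described in the abstract, and the function names below are hypothetical.

```python
from itertools import product

def nand3(a, b, c):
    """3-input NAND: 0 only when all inputs are 1."""
    return 1 - (a & b & c)

def nor3(a, b, c):
    """3-input NOR: 1 only when all inputs are 0."""
    return 1 - (a | b | c)

def majority_not(a, b, c):
    """Inverted 3-input majority (minority): 1 when fewer than two inputs are 1."""
    return 1 - ((a & b) | (b & c) | (a & c))

def full_adder(a, b, cin):
    """Assumed decomposition (illustrative only, not the paper's circuit):
    Cout = NOT(MAJORITY-not)
    Sum  = NOT(NAND) OR (NOT(NOR) AND MAJORITY-not)
    """
    minority = majority_not(a, b, cin)
    cout = 1 - minority
    all_ones = 1 - nand3(a, b, cin)      # 1 when a = b = cin = 1
    any_one = 1 - nor3(a, b, cin)        # 1 when at least one input is 1
    s = all_ones | (any_one & minority)  # exactly one or exactly three inputs set
    return s, cout

if __name__ == "__main__":
    # Exhaustive check against integer addition over all 8 input combinations.
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        assert s + 2 * cout == a + b + cin
        print(a, b, cin, "->", "sum", s, "carry", cout)
```

The exhaustive loop checks the decomposition against ordinary addition for every input combination, which is practical here because a full adder has only eight cases.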
Hardware Design of Parallel Switch Setting Algorithm for Benes Networks
by Mei Yang, Yikun Jiang
Abstract: Benes/Clos networks have been used in many areas, such as interconnection networks in parallel computers, multiprocessor systems, and networks-on-chip. The parallel switch setting algorithm is the key to satisfying the requirements of high-performance switching networks. Lee's routing algorithm is by far the most efficient parallel routing algorithm for Benes networks. However, there is no hardware implementation of this algorithm. In this paper, Lee's routing algorithm is fully implemented in RTL and synthesized. We have refined the algorithm's data structure and the initialization/updating of relation values to make it suitable for hardware implementation. The simulation and synthesis results of the switch setting circuits for 4x4 to 64x64 Benes networks confirm that the timing, area, and power consumption of the circuit are consistent with the complexity of Lee's algorithm. To the best of our knowledge, this is the first complete hardware implementation of the parallel switch setting algorithm that can handle all types of permutations, including partial ones.
Keywords: Benes; Parallel Algorithm; Hardware; RTL; Implementation; Synthesis.
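To give a sense of scale for the 4x4 to 64x64 range mentioned in the abstract, the sketch below computes the size of a standard N x N Benes network built from 2x2 switches, which has 2*log2(N) - 1 stages of N/2 switches each. It assumes N is a power of two. The function name is hypothetical, and the figures describe the network being configured, not the area or timing of the paper's switch setting circuit.

```python
def benes_dimensions(n_ports):
    """Return (stages, switches_per_stage, total_switches) for an
    n_ports x n_ports Benes network of 2x2 switches.

    Assumes the standard recursive construction with n_ports a power of two.
    """
    if n_ports < 2 or n_ports & (n_ports - 1):
        raise ValueError("n_ports must be a power of two, at least 2")
    log_n = n_ports.bit_length() - 1      # exact log2 for powers of two
    stages = 2 * log_n - 1
    switches_per_stage = n_ports // 2
    return stages, switches_per_stage, stages * switches_per_stage

if __name__ == "__main__":
    # Network sizes covered by the abstract: 4x4 up to 64x64.
    for n in (4, 8, 16, 32, 64):
        stages, per_stage, total = benes_dimensions(n)
        print(f"{n}x{n}: {stages} stages x {per_stage} switches = {total} switch settings")
```

Each 2x2 switch needs one setting bit (straight or cross), so the total switch count is also the number of bits a switch setting algorithm must produce per permutation.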
Multi-architecture profiler for Android
by Anderson Luiz Sartor, Antonio Carlos Schneider Beck
Abstract: Performance and energy consumption are well-known constraints of modern embedded systems, so application analysis in the early stages of the development cycle is mandatory. However, the few available tools for evaluating the behavior of an application across different architectures do not provide a complete solution for this task. In this context, this work presents a multi-architecture profiling tool for Android applications that fully supports the ARM, MIPS, and x86 architectures. It provides a wide range of information per application, including energy consumption, execution time, and other statistics. To this end, we have extended the Android Emulator QEMU and developed post-processing tools. As a case study, we have compared different architectures in terms of performance and energy consumption. Using the proposed tool, we show that, given a fixed energy budget, a different number of applications can be executed depending on how they were implemented, and that this number varies with the processor.
Keywords: Android emulator; Android applications; QEMU; profiling; profiler; java native interface; JNI; performance evaluation; energy consumption estimation; Dalvik.