International Journal of High Performance Systems Architecture (8 papers in press)
Soft Skills Requirements in Mobile Applications Development Employment Market
by Jingdong Jia, Zupeng Chen, Xi Liu
Abstract: The soft skills of developers have a major influence on the quality of software products and projects. However, which soft skills are important for mobile applications development remains unknown. It is also necessary to examine how soft skills requirements differ between traditional software development and mobile applications development. In this article, using text mining (word segmentation, similarity calculation and clustering analysis), we analyse a large number of job advertisements and extract 13 categories of soft skills requirements for mobile applications development. We also compare these categories with those for traditional software development. We find that communication and teamwork are still the two most important soft skills. However, fast learning is more important for mobile developers, and we identify four soft skills that have not been proposed before. Additionally, the season has a minor impact on the soft skills requirements of mobile applications development.
Keywords: soft skill; mobile application development; job advertisement; text mining; cluster analysis.
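The three text-mining steps named in the abstract (word segmentation, similarity calculation, clustering) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state which similarity measure or clustering algorithm the authors used, so Jaccard similarity and a greedy threshold clustering stand in for them, and the phrases in ADS are invented examples, not data from the paper.

```python
import re

def segment(phrase):
    """Word segmentation: split a phrase into a set of lowercase word tokens."""
    return set(re.findall(r"[a-z]+", phrase.lower()))

def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(phrases, threshold=0.5):
    """Greedy threshold clustering (an assumed stand-in, not the paper's method):
    a phrase joins the first cluster holding a member at least `threshold`
    similar to it; otherwise it starts a new cluster."""
    clusters = []
    for phrase in phrases:
        tokens = segment(phrase)
        for group in clusters:
            if any(jaccard(tokens, segment(m)) >= threshold for m in group):
                group.append(phrase)
                break
        else:
            clusters.append([phrase])
    return clusters

# Hypothetical soft-skill phrases, invented for this example.
ADS = [
    "good communication skills",
    "strong communication skills",
    "team player",
    "fast learning ability",
    "good team player",
]

if __name__ == "__main__":
    for group in cluster(ADS):
        print(group)
```

With these inputs the communication phrases and the teamwork phrases each fall into one group, while the learning phrase stays on its own. Changing `threshold` changes how aggressively phrases are grouped.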
Energy Optimized Cryptography (EOC) for Low Power Devices in Internet of Things
by RAJESH G, Vamsi Krishna C, Christopher Selvaraj B, Roshan Karthik S, Arun Kumar Sangaiah
Abstract: The Internet of Things (IoT) comprises a plethora of devices, ranging from high-capacity servers to low-power devices that work with Bluetooth, ZigBee, GPRS, RFID, WiFi, etc. These low-power devices are constrained by security, power management, reliability and privacy limitations. Existing traditional security algorithms cannot be applied to them because of their high processing and battery power requirements. We propose Energy Optimized Cryptography (EOC) for low-power devices in the IoT. Security is provided by two lightweight techniques: R2CV, a sub-key generation method, and the Optimized Message Authentication Code Generation Function (OMGF), which maintain security without compromising energy and processing power consumption. The proposed algorithms reduce the computational requirements of sub-key generation and MAC generation in low-power devices. The experimental results are compared with existing security algorithms such as RC5 and SHA, and show that R2CV and OMGF reduce the time consumed and increase battery life, which in turn extends the network lifetime.
Keywords: IoT Security; low-power devices; Message authentication code; Energy efficiency; Internet of Things.
Real-Time Physical Register File Allocation with Neural Networks for Simultaneous Multi-Threading Processors
by Wenjun Wang, Wei-Ming Lin
Abstract: Simultaneous Multi-Threading (SMT) processors improve system performance by allowing concurrent execution of multiple independent threads that share key resources. The physical register file, shared among the threads in real time, is one of the most critical resources in determining overall system performance. A disproportionate distribution of registers among the threads may easily hamper the normal processing of some threads. In this paper, we develop a machine learning algorithm to efficiently allocate registers among concurrently executing threads based on current resource utilization. An off-line training process is first employed to establish a well-trained neural network, which is then applied to dynamically adjust the resource distribution in real time. Our experimental results on M-sim, a multi-threaded micro-architectural simulation environment, show that the proposed technique improves the average system throughput by up to 42% without sacrificing execution fairness among the threads.
Keywords: Simultaneous Multi-Threading; Register Renaming; Physical Register File; Neural Networks; Machine Learning.
Multiprocessing Scalable String Matching Algorithm for Network Intrusion Detection System
by Adnan Hnaif, Ali Aldahoud, Mohammad Alia, Issa Al’otoum, Duaa Hani
Abstract: The rapidly increasing speed of today's computer networks affects the detection speed of security tools, and traditional tools such as firewalls are insufficient to protect networks from external threats. Intrusion Detection Systems (IDS) are among the most reliable tools for monitoring all network traffic to identify unauthorized usage of computer networks. In this paper, we propose a scalable string matching algorithm for Network IDS (NIDS) that enhances the speed of the NIDS detection engine, called the Multiprocessing Scalable String Matching Algorithm for Network Intrusion Detection System (MSNIDS). MSNIDS is implemented using the enhanced weighted exact matching algorithm (EWEMA) in both sequential and parallel processing. MSNIDS based on EWEMA achieves an improvement of more than 89% in sequential processing time compared with WEMA, and of 86% in parallel processing time compared with sequential matching.
Keywords: String Matching Algorithms; Distributed Architecture; Parallel Processing; Network Intrusion Detection System.
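The general idea of running an exact string matcher in parallel can be sketched as follows. This is not EWEMA or MSNIDS, whose details the abstract does not give: the per-chunk matcher is Python's built-in `str.find`, and the function names `find_all` and `parallel_find` are hypothetical. A thread pool is used only to keep the sketch portable; a process pool would be the natural choice for CPU-bound matching.

```python
from concurrent.futures import ThreadPoolExecutor

def find_all(text, pattern, offset=0):
    """Return absolute start positions of every (possibly overlapping) match."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(offset + i)
        i = text.find(pattern, i + 1)
    return hits

def parallel_find(text, pattern, workers=4):
    """Split `text` into chunks that overlap by len(pattern) - 1 characters,
    so a match straddling a chunk boundary is reported by exactly one chunk."""
    if not pattern:
        return []
    size = max(len(pattern), -(-len(text) // workers))  # ceiling division
    overlap = len(pattern) - 1
    starts = range(0, len(text), size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda s: find_all(text[s:s + size + overlap], pattern, s), starts
        )
        # pool.map preserves input order, so positions come out ascending.
        return [pos for part in parts for pos in part]

if __name__ == "__main__":
    payload = "abracadabra abracadabra"
    print(parallel_find(payload, "abra"))
```

The overlap is the key design point: without it, a signature that begins in one chunk and ends in the next would be missed by both workers.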
An Efficient VLSI Architecture For Two-Dimensional Discrete Wavelet Transform
by Rohan Pinto
Abstract: In this paper, a memory-efficient 2-D discrete wavelet transform (DWT) structure is presented for high-speed applications. The architecture is based on a modified lifting scheme that reduces the critical path to one multiplier delay. To increase the processing speed, four pipeline stages are introduced in the structure. The computation time for an N × N image is N²/4, as the throughput rate of the structure is four. Comparison results reveal that the proposed architecture requires less temporal memory than other DWT architectures. The Z-scan method is employed to fetch the input data, which suits the transpose unit design. Five registers and a multiplexer constitute a transpose unit, which is required to transpose the data between the row and the column processor. The proposed 2-D dual-scan DWT architecture has the merits of low latency, low control complexity and regular signal flow, making it suitable for very large-scale integration (VLSI) implementation. The architecture is modeled in VHDL and synthesized with CMOS 180 nm technology.
Keywords: Discrete wavelet transform (DWT); lifting scheme; pipeline; VLSI; architecture.
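For readers unfamiliar with lifting, the predict/update structure and the row-then-column organisation of a 2-D DWT can be sketched in software. This is only an illustration under stated assumptions: it uses the Haar wavelet because it is the simplest lifting example, whereas the abstract does not say which filter the modified lifting scheme implements, and it models none of the pipelining, Z-scan or hardware details. All function names are hypothetical.

```python
def lift_forward(x):
    """One level of Haar lifting on an even-length sequence."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for e, o in zip(even, odd)]         # predict step
    approx = [e + d / 2 for e, d in zip(even, detail)]  # update step
    return approx, detail

def lift_inverse(approx, detail):
    """Undo the lifting steps in reverse order to recover the input."""
    even = [a - d / 2 for a, d in zip(approx, detail)]  # undo update
    odd = [d + e for e, d in zip(even, detail)]         # undo predict
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out

def transform_rows(matrix):
    """Apply the 1-D transform to each row, storing approx then detail."""
    return [a + d for a, d in (lift_forward(row) for row in matrix)]

def transpose(matrix):
    """Software counterpart of a transpose step between row and column passes."""
    return [list(col) for col in zip(*matrix)]

def dwt2(image):
    """One-level 2-D DWT: row pass, transpose, column pass, transpose back."""
    rows_done = transform_rows(image)
    return transpose(transform_rows(transpose(rows_done)))

if __name__ == "__main__":
    print(dwt2([[1, 2], [3, 4]]))
```

For the 2 × 2 example the top-left output is the mean of the four pixels, and the remaining three entries are the detail coefficients. Because each lifting step is undone by a simple subtraction or addition, the inverse reconstructs the input exactly.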
Heterogeneous Computing on Mobile GPU-FPGA Cooperation Platform
by Nan Hu, Xuehai Zhou, Xi Li
Abstract: In recent years, mobile GPUs have been widely adopted in System-on-Chip (SoC) platforms, especially in the graphics area. Meanwhile, reconfigurable processors and emerging FPGA computing devices are also widely used. However, research on mobile GPUs cooperating with FPGAs for general-purpose computing is still scarce. Such heterogeneous systems pose a great challenge to parallel programming. In this paper, we propose a Flow-Lead-In Architecture (FLIA) as a unified, data-flow-driven development model based on a coupled GPU-FPGA platform. A servant represents an intermediate language module that is compiled from the high-level programming language and is compiled to different types of processors at runtime. An execution-flow abstracts the communication task between servants and controls the pipeline execution for spatial parallelism. By scheduling multiple servants onto heterogeneous processors, the cooperation system uses fewer resources to achieve performance and power close to those of a pure FPGA system.
Keywords: heterogeneous computing; GPU-FPGA cooperation; mobile GPU; ARM GPU FPGA partitioning; reconfigurable computing.
A framework for evaluating branch predictors using multiple performance parameters
by Moumita Das, Ansuman Banerjee, Bhaskar Sardar
Abstract: Selecting a branch predictor for a program is a challenging task. The performance of a branch predictor is measured not only by its prediction accuracy: parameters such as predictor size, energy expenditure and execution latency also play a key role in predictor selection. For a specific program, a predictor that provides the best results on one of these parameters may not be the best when another parameter is considered. Selecting the best predictor while considering all the different parameters is therefore a non-trivial task and one of the foremost challenges. In this paper, we propose a framework that systematically addresses this challenge using the concepts of aggregation and unification. For a given program, our framework considers the performance of the different predictors with respect to the different parameters and makes a predictor selection based on all of them. On the one hand, our framework can be an important aid in deciding on the best predictor to use at runtime. On the other hand, a newly proposed predictor can be systematically evaluated and placed in the context of existing ones, considering the parameters of choice. We present experimental results of our framework on the Siemens, SPEC 2006 and SPEC 2017 benchmarks.
Keywords: Branch prediction; prediction accuracy; execution latency; rank aggregation.
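Combining per-parameter rankings into a single choice can be illustrated with a small rank-aggregation sketch. The Borda count used here is an assumption chosen for simplicity; the abstract does not say which aggregation method the framework uses. The predictor names are well-known predictor families, but every number below is invented for the example and is not taken from the paper.

```python
def rank(scores, higher_is_better):
    """Return predictor names ordered best-first for one parameter."""
    return sorted(scores, key=scores.get, reverse=higher_is_better)

def borda(rankings):
    """Borda count: position i in a ranking of n items earns n - 1 - i points.
    Ties in total points are broken alphabetically for determinism."""
    points = {}
    for ranking in rankings:
        n = len(ranking)
        for i, name in enumerate(ranking):
            points[name] = points.get(name, 0) + (n - 1 - i)
    return sorted(points, key=lambda name: (-points[name], name))

# Hypothetical measurements for one program (illustrative values only).
ACCURACY = {"bimodal": 0.91, "gshare": 0.95, "tage": 0.98}   # higher is better
SIZE_KB = {"bimodal": 2, "gshare": 8, "tage": 64}            # lower is better
ENERGY_NJ = {"bimodal": 0.5, "gshare": 0.6, "tage": 1.4}     # lower is better

def select(accuracy, size_kb, energy_nj):
    """Aggregate the three per-parameter rankings into one overall ranking."""
    return borda([
        rank(accuracy, higher_is_better=True),
        rank(size_kb, higher_is_better=False),
        rank(energy_nj, higher_is_better=False),
    ])

if __name__ == "__main__":
    print(select(ACCURACY, SIZE_KB, ENERGY_NJ))
```

In this toy data the most accurate predictor is also the largest and most energy-hungry, so it ends up last overall. That is the effect the abstract describes: the best predictor on one parameter need not be the best when all parameters are considered.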
Special Issue on: On-Chip Communication Theory and Applications
Parallel Video Processing on FPGA Architecture
by Lamjed Touil, Abdessalem Bn Abdelali, Lilia Kechiche, Bouraoui Ouni, Abdelatif MTIBAA
Abstract: Real-time video applications are becoming widely used in many domains, with growing demand for high performance. Video processing is computationally intensive and usually comes with real-time or super-real-time requirements. For example, multiple cameras are used in monitoring and surveillance systems to analyse video automatically in real time and detect unusual events. Because of the heavy computational load imposed by video algorithms, real-time video processing is particularly amenable to concurrent processing. Classical implementation solutions, whether based on general-purpose processors or dedicated ones such as DSPs, cannot deliver the required performance. In this article, we focus on the applicability of reconfigurable computing architectures to parallel video processing applications. The experimental results show that the proposed hardware-oriented multi-treatment architecture can provide an average frame rate of 45 frames/s at high-definition resolution. Resource statistics show a consumption of about 18% of logic resources and 27% of on-chip memory, which leaves room to integrate additional processing functions.
Keywords: FPGA; MPMC; Video processing; Cut Detection; Picture in Picture.