Inderscience Publishers
  PUBLISHERS OF DISTINGUISHED ACADEMIC, SCIENTIFIC AND PROFESSIONAL JOURNALS

Forthcoming Papers: International Journal of High Performance Systems Architecture (IJHPSA)

This page lists papers submitted to IJHPSA online that have been reviewed and accepted but not yet published. Please note that titles, authors, abstracts and keywords may change upon publication.


International Journal of High Performance Systems Architecture (11 papers in press)

  • High-Speed, Low-Power, 100 MS/s Front-End Track-and-Hold Amplifier for 10-bit Pipelined ADC
    by Meganathan Deivasigamani, Rajapaul Perinbam 
    Abstract: This work focuses on the design of a high-speed, low-power track-and-hold amplifier (THA) for a 10-bit, 100 MS/s pipelined analog-to-digital converter (ADC). A wide-bandwidth, high-gain two-stage operational transconductance amplifier (OTA) is selected as the THA's OTA to reduce power consumption and the thermal noise contributed by the OTA. A bootstrapping technique is employed to reduce the non-linearity error associated with the input signal. The signal swing of the circuit is allowed to exceed the supply voltage, which further reduces the relative impact of the circuit's thermal noise and increases its dynamic range. The circuit is implemented in UMC 180 nm digital CMOS technology. Together with its biasing circuit, the THA consumes 5.706 mW and achieves a spurious-free dynamic range (SFDR) of 81.23 dB for a 2 V output at a 100 MHz sampling rate. The dynamic range of the THA is 94.97 dB. The proposed THA is simulated with the SPECTRE simulator under a variety of process and temperature conditions.
    Keywords: Multiplying Digital-to-Analog Converter (MDAC), Common Mode Feedback (CMFB), Switched Capacitor (SC), peak-to-peak (p-p).
     
  • Dual Channel Addition Based FFT Processor Architecture for Signal and Image Processing
    by Subhendu Kumar Sahoo, Chandra Shekhar Sharma, Sudeepti Kodali, Abhijit R. Asati, Anu Gupta
    Abstract: This paper presents a novel fixed-point, 16-bit word-width, 16-point FFT/IFFT processor architecture designed primarily for signal and image processing applications. The 16-point FFT is realized with the Cooley-Tukey decimation-in-time algorithm, which reduces the number of required complex multiplications compared with a direct discrete Fourier transform. Because multipliers are power-hungry elements in VLSI designs, the complex multiplications are realized with shift-and-add operations. The proposed algorithm performs all intermediate additions with a novel dual-channel addition technique that avoids carry-propagation delay; carry-lookahead adders are used only in the last stage to produce the final result. This dual-channel addition algorithm reduces the critical-path delay by 42% and 38.29% compared with the traditional approach and the Maharatna approach, respectively.
    Keywords: DFT, FFT processor, IFFT processor, Multiplier
     
  • An Architecture for DICOM Medical Images Storage and Retrieval Adopting Distributed File Systems
    by Mario Dantas, Douglas Macedo, Hilton Perantunes, Aldo Wangenhein
    Abstract: Conventional storage and retrieval of information in telemedicine environments is usually based on ordinary database systems. As a result, scalability, information distribution, high-performance techniques and operational costs are well-known challenges that new proposals must address. This work presents an architecture that targets high performance in storing and retrieving DICOM medical images by adopting a distributed approach on a cluster configuration. The proposal has two main components. The first is a data model based on the image hierarchy, using the Hierarchical Data Format 5 (HDF5). The second is a distributed file system, the Parallel Virtual File System (PVFS), employed as the distributed storage system. The result is a distinct approach to storing and retrieving information in a telemedicine environment. Experimental results with the architecture indicate a storage performance improvement of around 16% compared with a conventional database system.
    Keywords: Distributed file systems; medical images; DICOM; HDF5; PVFS.
     
  • Online Mapping of MPI-2 Dynamic Tasks to Processes and Threads
    by Joao Vicente Ferreira Lima, Nicolas Maillard 
    Abstract: In recent years, distributed platforms have become widely used in HPC, and most of these architectures have different levels of parallelism. Hence, one of the key design stages in parallel programming is task mapping, which attempts to maximise processor utilisation and minimise communication cost. However, this depends on a programming environment with an efficient mapping scheme. This paper presents a library for MPI-2 (libSpawn) that implements a scheme to map tasks between processes and threads in order to minimise communication and task-creation costs. We evaluated libSpawn with two dynamic MPI programs: Fibonacci and Mergesort. Our experiments demonstrate that the mapping scheme offers significant performance improvements.
    Keywords: dynamic programs; task mapping; load balancing; multithreaded programming; high performance.
     
  • HieraAnalyses - A tool for hierarchical analysis of parallel programs
    by Thatyana Seraphim, Enzo Seraphim, Gonzalo Travieso 
    Abstract: Detailed information for the performance analysis of parallel programs can be collected through trace files, which generally record the individual events that occurred during program execution. The traced events are commonly low level, such as communication operations in a parallel system, whereas application programmers increasingly use higher-level abstractions (e.g. a parallel eigenvalue routine). This creates a semantic gap between the collected information and the concepts used to develop the application, hindering effective use of that information. This work proposes a new approach to trace files in which the files retain information about the different hierarchical levels of the application. The files follow an XML format: each routine is an XML tag, and the auxiliary routines called during its execution appear as child tags. The approach is demonstrated with implementations at the MPI library level and at the OOPS level; OOPS is an object-oriented framework with higher-level abstractions for developing parallel programs, itself implemented on top of MPI. To complement the work, several analysis tools that use the file format are presented.
    Keywords: trace; performance analysis; parallel programming
     
  • Context-oriented Exception Handling
    by Fabiane Cristine Dillenburg, Jorge Luis Victoria Barbosa 
    Abstract: The growing availability of smaller, faster computational devices has made mobile computing more common in daily life. Developing new applications requires adapting programming languages to this reality in order to take advantage of newly available technologies. In this context, this paper proposes a specification of context-oriented exception handling features aimed at the development of mobile and ubiquitous applications. These features were implemented in the Holoparadigm mobile development and execution platform and were tested with specially crafted applications that trigger exception handling.
    Keywords: mobile computing; ubiquitous computing; programming languages; exception handling; Holoparadigm
     
  • An Evaluation of the Performance Impact of Generic Group Communication APIs
    by Leandro Sales, Henrique Teofilo, Nabor Mendonca, Jonathan D'Orleans, Rafael Barbosa, Fernando Trinta 
    Abstract: This paper evaluates the performance impact of two generic group communication APIs, Hedera and jGCS, over three well-known group communication systems: JGroups, Spread and Appia. The evaluation compared the performance of different configurations of the three systems in a local cluster environment, under different message and cluster sizes, both in standalone mode and when used as plug-ins for the two generic APIs. The results show significant differences in the overhead each generic API imposes relative to the standalone performance of the three group communication systems, and these differences are strongly related to variations in message size and to the way the generic APIs and their plug-in mechanisms are implemented. Based on these results, the paper discusses some of the circumstances under which it would be worth implementing group communication using the investigated systems.
    Keywords: group communication; performance evaluation; generic APIs
     
  • A Pattern Based Instruction Encoding Technique for High Performance Architectures
    by Ricardo Santos, Rafael Batistella, Rodolfo Azevedo 
    Abstract: In this paper we propose a new technique to reduce the program footprint and the instruction fetch latency in high-performance architectures that keep long instructions in memory. Our technique is based on an algorithm that factors long instructions into instruction patterns and encoded instructions; the encoded instructions contain no redundant data and are stored in an I-cache. The instruction patterns act as a map that tells the decode logic how to prepare each instruction for the execution stages. These patterns are stored in a new cache, the P-cache. We evaluated the technique on a high-performance architecture called 2D-VLIW through trace-driven experiments with MediaBench and SPEC programs. We compared 2D-VLIW execution time before and after encoding, and also against other encoding techniques implemented in computer architectures. Experimental results reveal that our encoding strategy yields program execution times up to 69% better than EPIC.
    Keywords: computer architecture; high performance; PBIW; pattern based instruction encoding; instruction encoding; 2D-VLIW; EPIC; memory bottleneck; P-cache; pattern cache.
     
  • FPGA Implementation and Performance Evaluation of an RFC 2544 Compliant Ethernet TestSet
    by Cristiano Both 
    Abstract: With constant and rapid advances in microelectronics and networking technology, network service providers' need to tune their services to attract more subscribers has become more important. Ethernet technology has improved in communication speed and established itself as a standard, more recently enabling throughput rates from 1 to 100 Gbps. However, the need for quality services requires Ethernet testers not only to be standard-compliant but also to meet the performance criteria specified by the standard. Meeting these criteria is difficult to prove and typically cannot be accomplished in software, because of the limitations of the underlying general-purpose hardware and the many software layers involved. In this paper, we present the design, implementation and performance verification of an Ethernet tester compliant with the throughput and latency tests specified by RFC 2544 for 10/100 Mbps Ethernet networks. The results show that the device meets the performance criteria defined by the RFC while being implemented on a low-cost Commercial Off-The-Shelf (COTS) FPGA board. Its performance was compared with an existing software implementation, and the results show that the usual limitations added by several hardware and software layers can be overcome by implementing the frame generator, monitor and media access control (MAC, layer 2) directly in an FPGA device.
    Keywords: Performance Evaluation, High Performance, Reconfigurable Computing, Computer Networks
     
  • Evolutionary Computer-Aided Design for Efficient Application Mapping on NoC Platforms
    by Marcus V. C. da Silva, Nadia Nedjah, Luiza Mourelle 
    Abstract: Networks-on-chip (NoCs) are considered the next generation of communication infrastructure and will be omnipresent in different environments. In the platform-based methodology, an application is implemented by a set of collaborating intellectual property (IP) blocks. Increasing integration scale increases the number of candidate IPs to be considered on a NoC platform, and selecting the best set of IPs and physically organizing them is a hard combinatorial problem. In this paper, we propose a multi-objective evolutionary decision support system to aid the mapping stage of platform-based NoC design. The IP mapping optimization is driven by occupied area, execution time and power consumption.
    Keywords: network-on-chip, mapping problem, evolutionary computation
     
  • A Massively Parallel Hardware Architecture for Ray-Tracing
    by Alexandre Nery, Nadia Nedjah, Felipe França 
    Abstract: In this paper, we propose an architecture, which we call GridRT, capable of handling the main features of ray tracing used to render three-dimensional scenes, such as shadow and reflection effects. The architecture achieves efficient overall performance with a simple, compact, massively parallel design. The design exploits the Xilinx® Floating-Point Operator IP core and the regular-grid spatial data structure.
    Keywords: ray-tracing; architecture; parallelism; computer graphics; fpga
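For the track-and-hold amplifier entry (Deivasigamani and Perinbam): the abstract links a larger signal swing to a lower relative thermal noise and a wider dynamic range. The sketch below illustrates only that reasoning, assuming a single sampling capacitor limited by kT/C noise and a hypothetical 1 pF capacitance; it does not model the paper's circuit or reproduce its reported figures. It also computes the ideal quantization-limited SNR of a 10-bit converter for comparison.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K

def ktc_noise_vrms(c_farads: float, temp_kelvin: float = 300.0) -> float:
    """RMS sampled thermal noise sqrt(kT/C) on a single sampling capacitor."""
    return math.sqrt(BOLTZMANN * temp_kelvin / c_farads)

def dynamic_range_db(v_pp: float, noise_vrms: float) -> float:
    """Full-scale sine (peak-to-peak amplitude v_pp) over rms noise, in dB."""
    signal_rms = v_pp / (2 * math.sqrt(2))
    return 20 * math.log10(signal_rms / noise_vrms)

def ideal_adc_snr_db(bits: int) -> float:
    """Quantization-limited SNR of an ideal N-bit converter: 6.02 N + 1.76 dB."""
    return 6.02 * bits + 1.76

if __name__ == "__main__":
    noise = ktc_noise_vrms(1e-12)  # hypothetical 1 pF capacitor at 300 K
    for swing in (1.0, 2.0):       # doubling the swing adds about 6 dB
        print(f"{swing:.1f} V p-p: DR = {dynamic_range_db(swing, noise):.1f} dB")
    print(f"ideal 10-bit SNR = {ideal_adc_snr_db(10):.2f} dB")
```

Because the noise floor is fixed by the capacitor, every doubling of the swing buys roughly 6 dB, which is why letting the swing grow is an alternative to enlarging (and slowing) the sampling capacitor.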
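For the dual-channel FFT processor entry (Sahoo et al.): the abstract relies on the Cooley-Tukey decimation-in-time algorithm to cut the number of complex multiplications relative to a direct DFT. A minimal floating-point sketch of a radix-2 DIT transform, checked against a direct DFT, with the IFFT obtained by conjugation. It is not the authors' 16-bit fixed-point datapath, and the multiplication count includes trivial twiddle factors such as W_N^0 = 1.

```python
import cmath

def dft(x):
    """Direct DFT: N^2 complex multiplications, used here as a reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]

def fft_dit(x):
    """Recursive radix-2 Cooley-Tukey decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_dit(x[0::2])  # transform of the even-indexed samples
    odd = fft_dit(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor W_N^k
        out[k] = even[k] + t                            # butterfly, upper output
        out[k + n // 2] = even[k] - t                   # butterfly, lower output
    return out

def ifft_dit(X):
    """Inverse transform via conjugation: x = conj(FFT(conj(X))) / N."""
    n = len(X)
    return [v.conjugate() / n for v in fft_dit([v.conjugate() for v in X])]

def multiplication_counts(n):
    """(direct DFT, radix-2 FFT) complex multiplications, trivial twiddles included."""
    return n * n, (n // 2) * (n.bit_length() - 1)
```

For N = 16, `multiplication_counts(16)` gives 256 for the direct DFT against 32 for the radix-2 FFT, before trivial twiddles are removed.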
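The same FFT entry replaces complex multiplications with shift-and-add operations. One common way to do this for a constant twiddle factor is to quantize it to fixed point and add one shifted copy of the input per set bit of the coefficient. The sketch assumes 8 fractional bits purely for illustration (the abstract states a 16-bit word width but not the coefficient format): cos(π/4) becomes 181/256, whose five set bits need five partial products.

```python
import math

def shift_add_multiply(x: int, constant: int) -> int:
    """x * constant using only shifts and additions (one partial product per set bit)."""
    if constant < 0:
        return -shift_add_multiply(x, -constant)
    acc, bit = 0, 0
    while constant >> bit:
        if (constant >> bit) & 1:
            acc += x << bit  # add the partial product for this set bit
        bit += 1
    return acc

def quantize_coefficient(value: float, frac_bits: int) -> int:
    """Fixed-point twiddle coefficient with frac_bits fractional bits."""
    return round(value * (1 << frac_bits))

def fixed_point_scale(x: int, value: float, frac_bits: int = 8) -> int:
    """Approximate x * value with shift-and-add, then drop the fractional bits."""
    coeff = quantize_coefficient(value, frac_bits)
    return shift_add_multiply(x, coeff) >> frac_bits

if __name__ == "__main__":
    w = math.cos(math.pi / 4)
    print(quantize_coefficient(w, 8), fixed_point_scale(1000, w))
```

In hardware the coefficient is fixed, so the loop unrolls into a fixed adder tree; canonical signed-digit recoding is a common refinement that reduces the number of adders further.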
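The FFT entry's abstract does not describe how its dual-channel addition works, so the following sketch substitutes a different, well-known carry-free technique: carry-save addition. It only illustrates the general idea the abstract describes, keeping intermediate results in a redundant form so that no carry propagates, then resolving carries with one conventional adder at the end (where the paper uses carry-lookahead adders).

```python
def carry_save_add(a: int, b: int, c: int) -> tuple:
    """3:2 compressor on non-negative integers: a + b + c == s + carry, with no carry chain."""
    s = a ^ b ^ c                               # per-bit sum, each bit computed independently
    carry = ((a & b) | (a & c) | (b & c)) << 1  # per-bit majority, shifted one place left
    return s, carry

def multi_operand_sum(values) -> int:
    """Accumulate in redundant (sum, carry) form; resolve carries only once at the end."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)  # constant delay per step, independent of word width
    return s + c  # the single carry-propagating addition (e.g. a carry-lookahead adder)

if __name__ == "__main__":
    print(multi_operand_sum([3, 5, 7, 11]))
```

This is not the authors' scheme; it is shown because it shares the property the abstract highlights, a critical path that does not grow with carry propagation in the intermediate stages.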
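For the DICOM storage entry (Dantas et al.): the data model is built on the image hierarchy using HDF5. As an illustration only, the sketch mirrors the standard DICOM Patient/Study/Series/Image levels as nested groups, with plain dictionaries standing in for an HDF5 file stored on PVFS. The path layout, the class names and the example UIDs are hypothetical, not the authors' schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DicomKey:
    """Identifiers for one image at each level of the DICOM Patient/Study/Series/Image hierarchy."""
    patient_id: str
    study_uid: str
    series_uid: str
    sop_instance_uid: str

    def hdf5_path(self) -> str:
        """Hypothetical HDF5 layout: one group per level, one dataset per image."""
        return f"/{self.patient_id}/{self.study_uid}/{self.series_uid}/{self.sop_instance_uid}"

class HierarchicalStore:
    """In-memory stand-in for an HDF5 file: nested dicts play the role of groups."""

    def __init__(self) -> None:
        self.root: dict = {}

    def put(self, key: DicomKey, pixels: bytes) -> None:
        node = self.root
        for level in (key.patient_id, key.study_uid, key.series_uid):
            node = node.setdefault(level, {})  # create the group if it is missing
        node[key.sop_instance_uid] = pixels

    def get_series(self, patient_id: str, study_uid: str, series_uid: str) -> dict:
        """Fetch every image of one series with a single walk down the hierarchy."""
        return self.root.get(patient_id, {}).get(study_uid, {}).get(series_uid, {})
```

The design point is that a typical query (all images of one series) maps to one subtree, instead of a filtered scan over a flat table.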
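For the libSpawn entry (Lima and Maillard): the abstract describes mapping dynamically created MPI-2 tasks either to processes or to threads. Its actual policy and API are not given, so the sketch assumes a hypothetical depth cutoff: children spawned near the root of a Fibonacci task tree become processes (in MPI-2 these would be created with `MPI_Comm_spawn`), and deeper children become threads, which are cheaper to create and share memory with their parent. Only the mapping decisions are simulated; nothing is actually spawned.

```python
from collections import Counter

def map_spawn(depth: int, process_depth: int) -> str:
    """Hypothetical policy: children near the root become processes, deeper ones threads."""
    return "process" if depth < process_depth else "thread"

def fib_tasks(n: int, counts: Counter, process_depth: int, depth: int = 0) -> int:
    """Compute fib(n) as a task tree, recording where each spawned child would run."""
    if n < 2:
        return n
    for _ in range(2):  # fib(n-1) and fib(n-2) are each spawned as a child task
        counts[map_spawn(depth + 1, process_depth)] += 1
    return (fib_tasks(n - 1, counts, process_depth, depth + 1)
            + fib_tasks(n - 2, counts, process_depth, depth + 1))

if __name__ == "__main__":
    counts = Counter()
    print(fib_tasks(10, counts, process_depth=3), dict(counts))
```

With a cutoff of 3, only the six children in the top two levels of the tree become processes and the remaining 170 become threads, which shows why a mixed mapping keeps expensive process creation confined to a few coarse tasks.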
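For the HieraAnalyses entry (Seraphim et al.): traces are stored in XML, with each routine as a tag and the auxiliary routines it calls as child tags. A minimal sketch of such a tracer using Python's `xml.etree.ElementTree`; the `level`, `start` and `end` attributes and the example routine names are assumptions for illustration, not the tool's actual file format.

```python
import time
import xml.etree.ElementTree as ET
from contextlib import contextmanager

class HierarchicalTracer:
    """Records routine calls as nested XML elements (hypothetical schema)."""

    def __init__(self, rank: int) -> None:
        self.root = ET.Element("trace", rank=str(rank))
        self.stack = [self.root]  # innermost open routine is at the top

    @contextmanager
    def routine(self, name: str, level: str):
        elem = ET.SubElement(self.stack[-1], name, level=level,
                             start=f"{time.perf_counter():.6f}")
        self.stack.append(elem)  # calls made inside become children of this element
        try:
            yield elem
        finally:
            self.stack.pop()
            elem.set("end", f"{time.perf_counter():.6f}")

    def to_xml(self) -> str:
        return ET.tostring(self.root, encoding="unicode")

if __name__ == "__main__":
    tracer = HierarchicalTracer(rank=0)
    with tracer.routine("eigenvalues", level="OOPS"):  # high-level abstraction
        with tracer.routine("MPI_Bcast", level="MPI"):  # low-level events it triggers
            pass
    print(tracer.to_xml())
```

An analysis tool can then aggregate time per high-level routine or drill down to the MPI calls beneath it, which is the semantic gap the abstract targets.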
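For the context-oriented exception handling entry (Dillenburg and Barbosa): the abstract does not describe the Holoparadigm syntax, so the following is a language-neutral sketch of the general idea, assuming that the handler chosen for an exception depends on both its type and the device's current context (here a hypothetical connectivity label). The registry class and the context names are hypothetical.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[Exception], str]

class ContextualHandlers:
    """Hypothetical registry: the handler depends on the exception type and the current context."""

    def __init__(self) -> None:
        self._handlers: Dict[Tuple[type, str], Handler] = {}

    def register(self, exc_type: type, context: str, handler: Handler) -> None:
        self._handlers[(exc_type, context)] = handler

    def run(self, action: Callable[[], str], context: str) -> str:
        """Run action; on failure, pick a handler registered for this exception and context."""
        try:
            return action()
        except Exception as exc:
            for klass in type(exc).__mro__:  # most specific exception class first
                handler = self._handlers.get((klass, context))
                if handler is not None:
                    return handler(exc)
            raise  # nothing registered for this context: propagate normally

if __name__ == "__main__":
    def fetch_record() -> str:
        raise ConnectionError("no route to server")

    handlers = ContextualHandlers()
    handlers.register(ConnectionError, "offline", lambda e: "queued for later sync")
    handlers.register(ConnectionError, "wifi", lambda e: "retrying now")
    print(handlers.run(fetch_record, context="offline"))
```

The same failure is thus recovered differently depending on where the device is, which is the behaviour a context-oriented construct would express declaratively.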
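For the group communication entry (Sales et al.): the study attributes part of the overhead to how the generic APIs and their plug-in mechanisms are implemented. The sketch shows that plug-in indirection in its simplest form; the class names are hypothetical and do not reproduce the Java interfaces of Hedera or jGCS. Each layer a message passes through (here just a framing header that forces a copy on send and on delivery) is a place where per-message overhead accumulates.

```python
from abc import ABC, abstractmethod
from typing import Callable, List

class GroupPlugin(ABC):
    """Hypothetical contract a toolkit-specific plug-in fulfils for the generic API."""

    @abstractmethod
    def multicast(self, payload: bytes) -> None: ...

    @abstractmethod
    def on_deliver(self, callback: Callable[[bytes], None]) -> None: ...

class LoopbackPlugin(GroupPlugin):
    """Stand-in toolkit that delivers each multicast to every local listener."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[bytes], None]] = []

    def multicast(self, payload: bytes) -> None:
        for listener in self._listeners:
            listener(payload)

    def on_deliver(self, callback: Callable[[bytes], None]) -> None:
        self._listeners.append(callback)

class GenericGroupAPI:
    """Toolkit-independent facade; each wrapping step adds per-message work."""

    HEADER = b"GAPI"

    def __init__(self, plugin: GroupPlugin) -> None:
        self._plugin = plugin
        self.delivered: List[bytes] = []
        plugin.on_deliver(self._unwrap)

    def send(self, message: bytes) -> None:
        self._plugin.multicast(self.HEADER + message)  # framing copies the payload

    def _unwrap(self, payload: bytes) -> None:
        if payload.startswith(self.HEADER):
            self.delivered.append(payload[len(self.HEADER):])  # slicing copies again
```

Because these copies scale with payload length, an overhead that depends on message size, as the abstract reports, is what one would expect from this kind of layering.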
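For the pattern-based instruction encoding entry (Santos et al.): long instructions are factored into patterns (kept in a P-cache) and encoded instructions without redundant data (kept in the I-cache). A minimal sketch of that factoring, assuming a hypothetical slot format of (opcode, destination, source 1, source 2) and treating repeated register names as the redundancy to remove; the real PBIW field layout for 2D-VLIW is not given in the abstract.

```python
from typing import Dict, List, Optional, Tuple

Slot = Optional[Tuple[str, str, str, str]]         # (opcode, dst, src1, src2) or None for a NOP
PatternSlot = Optional[Tuple[str, int, int, int]]  # opcode plus indices into the operand list
Pattern = Tuple[PatternSlot, ...]

def encode(instr: Tuple[Slot, ...], p_cache: Dict[Pattern, int]) -> Tuple[int, Tuple[str, ...]]:
    """Split a long instruction into a pattern id and a redundancy-free operand list."""
    operands: List[str] = []

    def index(reg: str) -> int:
        if reg not in operands:
            operands.append(reg)  # each distinct register is stored only once
        return operands.index(reg)

    pattern = tuple(None if s is None else (s[0], index(s[1]), index(s[2]), index(s[3]))
                    for s in instr)
    pattern_id = p_cache.setdefault(pattern, len(p_cache))  # identical patterns share one entry
    return pattern_id, tuple(operands)

def decode(pattern_id: int, operands: Tuple[str, ...], patterns: List[Pattern]) -> Tuple[Slot, ...]:
    """Rebuild the original long instruction from its pattern and operands."""
    return tuple(None if p is None else (p[0], operands[p[1]], operands[p[2]], operands[p[3]])
                 for p in patterns[pattern_id])
```

Two instructions that differ only in register names then share a single P-cache entry, and each stores four register names instead of six register fields.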
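For the RFC 2544 Ethernet tester entry (Both): RFC 2544 uses the RFC 1242 definition of throughput, the highest rate at which none of the offered frames are dropped. The sketch computes the theoretical maximum frame rate (each frame also occupies 8 bytes of preamble and start-of-frame delimiter plus a 12-byte minimum inter-frame gap) and runs a binary search for the zero-loss rate, a common way to implement the throughput test. The `trial` callback is a hypothetical hook standing in for the FPGA frame generator and monitor.

```python
from typing import Callable

RFC2544_FRAME_SIZES = (64, 128, 256, 512, 1024, 1280, 1518)
OVERHEAD_BYTES = 8 + 12  # preamble + SFD, plus the minimum inter-frame gap

def max_frame_rate(line_rate_bps: float, frame_bytes: int) -> float:
    """Theoretical maximum frames per second on an Ethernet link."""
    return line_rate_bps / ((frame_bytes + OVERHEAD_BYTES) * 8)

def throughput_search(trial: Callable[[float], int], max_rate: float,
                      resolution: float = 0.001) -> float:
    """Binary search for the highest offered rate (frames/s) with zero frame loss.

    trial(rate) offers frames at `rate` for one trial and returns the number lost.
    """
    if trial(max_rate) == 0:
        return max_rate
    lo, hi = 0.0, max_rate  # invariant: lo passed (or is zero), hi lost frames
    while hi - lo > resolution * max_rate:
        mid = (lo + hi) / 2
        if trial(mid) == 0:
            lo = mid
        else:
            hi = mid
    return lo

if __name__ == "__main__":
    for size in RFC2544_FRAME_SIZES:
        print(size, round(max_frame_rate(10e6, size)), round(max_frame_rate(100e6, size)))
```

With 64-byte frames the theoretical maximum is about 14,881 frames/s at 10 Mbps and about 148,810 frames/s at 100 Mbps; sustaining these back-to-back rates is what software testers on general-purpose hardware typically struggle with.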
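For the NoC mapping entry (da Silva, Nedjah and Mourelle): the abstract describes a multi-objective evolutionary search driven by area, execution time and power. Those cost models are not given, so the sketch substitutes two hypothetical proxies on a 2D mesh with XY routing: total traffic volume weighted by hop count, and the worst-case hop count between communicating IPs. It keeps a Pareto archive and uses a swap mutation; it illustrates the search structure, not the authors' decision support system.

```python
import random

def hops(tile_a: int, tile_b: int, width: int) -> int:
    """Manhattan distance (XY routing) between two tiles of a width x width mesh."""
    ay, ax = divmod(tile_a, width)
    by, bx = divmod(tile_b, width)
    return abs(ax - bx) + abs(ay - by)

def objectives(mapping, traffic, width):
    """Proxy objectives to minimise: hop-weighted traffic volume, worst-case hop count."""
    dists = [(vol, hops(mapping[s], mapping[d], width)) for (s, d), vol in traffic.items()]
    return (sum(v * h for v, h in dists), max(h for _, h in dists))

def dominates(a, b) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(scored):
    """Keep the non-dominated (mapping, objectives) pairs, one per objective vector."""
    front, seen = [], set()
    for cand, obj in scored:
        if obj in seen or any(dominates(other, obj) for _, other in scored):
            continue
        seen.add(obj)
        front.append((cand, obj))
    return front

def swap_mutation(mapping, rng):
    """Swap two entries of a tile permutation; mapping[i] is the tile of IP i."""
    child = list(mapping)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return tuple(child)

def evolve(traffic, width, generations=300, seed=1):
    """Mutate random archive members and keep only the Pareto-optimal mappings."""
    rng = random.Random(seed)
    start = tuple(rng.sample(range(width * width), width * width))
    archive = [(start, objectives(start, traffic, width))]
    for _ in range(generations):
        parent = rng.choice(archive)[0]
        child = swap_mutation(parent, rng)
        archive = pareto_front(archive + [(child, objectives(child, traffic, width))])
    return archive

if __name__ == "__main__":
    chain = {(0, 1): 10, (1, 2): 10, (2, 3): 10}  # hypothetical 4-IP pipeline
    for mapping, obj in evolve(chain, width=3):
        print(mapping[:4], obj)
```

Keeping a whole Pareto archive rather than a single best mapping is what lets a designer trade one objective against another afterwards, which is the role of a decision support system.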
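For the GridRT entry (Nery, Nedjah and França): the architecture relies on regular grids as its spatial data structure. A common way to walk a ray through a regular grid is the Amanatides and Woo 3D-DDA traversal, sketched below in floating point. Whether GridRT's hardware uses this exact stepping scheme is an assumption, not something the abstract states.

```python
import math
from typing import Iterator, Sequence, Tuple

def grid_traverse(origin: Sequence[float], direction: Sequence[float], grid_res: int,
                  cell_size: float = 1.0) -> Iterator[Tuple[int, int, int]]:
    """Yield, in order, the cells of a grid_res^3 regular grid pierced by a ray (3D-DDA).

    Assumes the ray origin lies inside the grid, i.e. in [0, grid_res * cell_size)^3.
    """
    if not any(direction):
        raise ValueError("direction must be non-zero")
    cell = [int(origin[i] // cell_size) for i in range(3)]
    step, t_max, t_delta = [0] * 3, [math.inf] * 3, [math.inf] * 3
    for i in range(3):
        if direction[i] > 0:
            step[i] = 1
            t_max[i] = ((cell[i] + 1) * cell_size - origin[i]) / direction[i]
            t_delta[i] = cell_size / direction[i]
        elif direction[i] < 0:
            step[i] = -1
            t_max[i] = (cell[i] * cell_size - origin[i]) / direction[i]
            t_delta[i] = -cell_size / direction[i]
    while all(0 <= c < grid_res for c in cell):
        yield tuple(cell)
        axis = t_max.index(min(t_max))  # cross the nearest cell boundary next
        cell[axis] += step[axis]
        t_max[axis] += t_delta[axis]

if __name__ == "__main__":
    print(list(grid_traverse((0.5, 0.5, 0.5), (1.0, 1.0, 0.0), grid_res=4)))
```

Each step needs only a comparison and an addition, and cells are visited front to back, so intersection tests can stop at the first hit; this regularity is what makes grids attractive for simple, replicated hardware.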