Authors: Yifeng Zhu, Hong Jiang, Xiao Qin, David Swanson
Addresses: Department of Electrical and Computer Engineering, University of Maine, Orono, ME, USA; Department of Computer Science and Engineering, University of Nebraska–Lincoln, NE, USA; Department of Computer Science, New Mexico Institute of Mining and Technology, Socorro, NM, USA; Department of Computer Science and Engineering, University of Nebraska–Lincoln, NE, USA
Abstract: In this work, we investigate parallel I/O efficiency in parallelised BLAST, the most popular tool for similarity searching in biological databases, and implement two variations that incorporate the PVFS and CEFT-PVFS parallel I/O facilities. Our goal is to study the performance gain from parallel I/O under the constraints of different numbers of commodity storage devices in a Linux cluster. We also evaluate two read performance optimisation techniques employed in CEFT-PVFS: (1) doubling the degree of parallelism, which is shown to give read performance comparable to that of PVFS when both systems have the same number of servers; (2) skipping hot-spot nodes, which can reduce the performance penalty when I/O workloads are highly imbalanced. I/O resource contention between multiple applications running in the same cluster can degrade the performance of the original parallel BLAST and the PVFS version by up to 10-fold and 21-fold, respectively, whereas the version based on CEFT-PVFS, which can skip hot-spot nodes, suffers only a two-fold performance degradation.
Keywords: bioinformatics; sequence comparison; parallel I/O; cluster computing; PVFS; CEFT-PVFS; BLAST; input/output; high performance computing; biological sequence search; Linux clusters; read performance optimisation; performance evaluation; molecular biology; access patterns.
International Journal of High Performance Computing and Networking, 2004, Vol. 1, No. 4, pp. 214-222
Available online: 07 Dec 2005