
Title: Benchmark datasets for the DNA fragment assembly problem

Authors: Guillermo M. Mallén-Fullerton; James Alexander Hughes; Sheridan Houghten; Guillermo Fernández-Anaya

Addresses: Engineering Department, Universidad Iberoamericana, Prol. Paseo de la Reforma 880, Lomas de Santa Fe, 01219 México, Distrito Federal, México; Department of Computer Science, Brock University, 500 Glenridge Avenue, St. Catharines, Ontario, L2S 3A1, Canada; Department of Computer Science, Brock University, 500 Glenridge Avenue, St. Catharines, Ontario, L2S 3A1, Canada; Department of Physics and Mathematics, Universidad Iberoamericana, Prol. Paseo de la Reforma 880, Lomas de Santa Fe, 01219 México, Distrito Federal, México

Abstract: Many computational intelligence approaches have been used for the fragment assembly problem. However, comparing and analysing these approaches is difficult because standard benchmarks are not available. Although similar datasets may be used as a starting point, there is not enough information to reproduce the exact overlaps matrix for the fragments used by the various approaches, which makes results hard to compare consistently. This paper presents a collection of benchmark datasets covering a wide range of fragment lengths, numbers of fragments, and sequence lengths, along with a description of the method used to produce them. A website has been created to maintain the datasets and the tables of results at http://chac.sis.uia.mx/fragbench/. Researchers are invited to add to the datasets by following the method described, as well as to submit results obtained by their algorithms on the benchmarks.
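
To illustrate what an overlaps matrix is, the following is a minimal sketch and not the method described in the paper. It assumes error-free fragments, ignores reverse complements, and scores an overlap as the length of the longest suffix of one fragment that exactly matches a prefix of another. The function names `overlap_length` and `overlap_matrix` and the three example fragments are hypothetical.

```python
from typing import List


def overlap_length(a: str, b: str, min_len: int = 1) -> int:
    """Return the length of the longest suffix of `a` equal to a prefix of `b`.

    Assumption: exact matching only (no sequencing errors, no scoring scheme).
    """
    shortest = max(min_len, 1)  # never test a zero-length overlap
    for k in range(min(len(a), len(b)), shortest - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def overlap_matrix(fragments: List[str], min_len: int = 1) -> List[List[int]]:
    """Build the matrix whose entry [i][j] is the overlap of fragment i into j."""
    n = len(fragments)
    return [
        [0 if i == j else overlap_length(fragments[i], fragments[j], min_len)
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy fragments, not taken from the benchmark datasets.
fragments = ["ACGTAC", "TACGGA", "GGATTT"]
matrix = overlap_matrix(fragments)
for row in matrix:
    print(row)
```

Under these assumptions the matrix is asymmetric, since the overlap of fragment i into fragment j generally differs from that of j into i. Small changes in how overlaps are scored change the matrix, which is one reason a shared, precomputed benchmark helps when comparing algorithms.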

Keywords: bioinformatics; DNA fragments; fragment assembly problem; FAP; DNA sequence assembly; benchmark datasets; benchmarking.

DOI: 10.1504/IJBIC.2013.058912

International Journal of Bio-Inspired Computation, 2013 Vol.5 No.6, pp.384 - 394

Received: 10 Oct 2013
Accepted: 10 Oct 2013
Published online: 31 Mar 2014


