Authors: Guillermo M. Mallén-Fullerton; James Alexander Hughes; Sheridan Houghten; Guillermo Fernández-Anaya
Addresses: Engineering Department, Universidad Iberoamericana, Prol. Paseo de la Reforma 880, Lomas de Santa Fe, 01219 México, Distrito Federal, México; Department of Computer Science, Brock University, 500 Glenridge Avenue, St. Catharines, Ontario, L2S 3A1, Canada; Department of Computer Science, Brock University, 500 Glenridge Avenue, St. Catharines, Ontario, L2S 3A1, Canada; Department of Physics and Mathematics, Universidad Iberoamericana, Prol. Paseo de la Reforma 880, Lomas de Santa Fe, 01219 México, Distrito Federal, México
Abstract: Many computational intelligence approaches have been applied to the fragment assembly problem. However, the comparison and analysis of these approaches is difficult because no standard benchmarks are available. Although similar datasets may be used as a starting point, there is not enough information to reproduce the exact overlap matrix for the fragments used by the various approaches, which undermines consistency between studies. This paper presents a collection of benchmark datasets covering a wide range of fragment lengths, numbers of fragments, and sequence lengths, along with a description of the method used to produce them. A website has been created to maintain the datasets and the tables of results at http://chac.sis.uia.mx/fragbench/. Researchers are invited to add to the datasets by following the method described, and to submit results obtained by their algorithms on the benchmarks.
Keywords: bioinformatics; DNA fragments; fragment assembly problem; FAP; DNA sequence assembly; benchmark datasets; benchmarking.
International Journal of Bio-Inspired Computation, 2013 Vol.5 No.6, pp.384 - 394
Received: 10 Oct 2013
Accepted: 10 Oct 2013
Published online: 27 Jan 2014