Title: Detecting distant homologies on protozoans metabolic pathways using scientific workflows

Authors: Sergio Manuel Serra Da Cruz, Vanessa Batista, Edno Silva, Frederico Tosta, Clarissa Vilela, Rafael Cuadrat, Diogo Tschoeke, Alberto M.R. Davila, Maria Luiza Machado Campos, Marta Mattoso

Addresses: Computer Science Department, COPPE, Federal University of Rio de Janeiro (UFRJ), P.O. Box 68511, 21941-972 Rio de Janeiro, RJ, Brazil (Serra Da Cruz, Batista, Silva, Tosta, Vilela, Mattoso); Oswaldo Cruz Institute, FIOCRUZ, Av. Brasil 4365, 21040-360 Rio de Janeiro, RJ, Brazil (Cuadrat, Tschoeke, Davila); PPGI – IM/NCE, Federal University of Rio de Janeiro (UFRJ), Bloco C, CCMN, Sala E-2206, 21945-970 Rio de Janeiro, RJ, Brazil (Campos)

Abstract: Bioinformatics experiments are typically composed of programs chained in pipelines that manipulate enormous quantities of data. Workflow management systems (WfMS) offer one approach to managing such experiments. In this work, we discuss WfMS features that support genome homology workflows and present some relevant issues for typical genomic experiments. Our evaluation used the Kepler WfMS to manage a real genomic pipeline, named OrthoSearch, originally defined as a Perl script. We present a case study on detecting distant homologies in trypanosomatid metabolic pathways. Our results reinforce the benefits of WfMS over scripting languages and point out challenges for WfMS in distributed environments.

Keywords: distant homologies; neglected diseases; protozoans; Kepler; scientific workflows; WfMS; workflow management systems; bioinformatics; metabolic pathways; genome homology workflows; genomic pipelines; trypanosomatids.

DOI: 10.1504/IJDMB.2010.033520

International Journal of Data Mining and Bioinformatics, 2010, Vol. 4, No. 3, pp. 256-280

Published online: 02 Jun 2010
