Authors: Vasa Curcin, Moustafa Ghanem, Yike Guo
Addresses: Department of Computing, Imperial College London, SW7 2AZ, UK; Department of Computing, Imperial College London, SW7 2AZ, UK; Department of Computing, Imperial College London, SW7 2AZ, UK
Abstract: Scientific workflow systems provide languages for representing complex scientific processes as decompositions into lower-level tasks, down to the level of atomic, executable units. To support data analysis activities, a wide variety of such languages represent data transformation and processing operations as task nodes within a workflow. Adding data type information to the task inputs and outputs allows workflow authors to perform type checking at design time, search for compatible nodes in public component repositories, and define specifications of abstract workflows. Introducing support for strict data typing simplifies the implementation of a workflow system in addressing these issues, but at the expense of flexibility. We address this challenge by introducing workflow type signatures suitable for use in registries and for type matching, and by developing a polymorphic type inference over compositions of such signatures. The focus is on the relational data model, popular in data analysis workflow systems, and the techniques introduced are validated by applying the inference engine prototype to an adverse drug reaction study implemented in the relational algebra subset of the Discovery Net workflow system.
Keywords: scientific workflows; dataflow; workflow type signatures; registries; type matching; polymorphic type inference; relational data models; data analysis; adverse drug reaction.
International Journal of Business Process Integration and Management, 2010, Vol. 5, No. 1, pp. 45-62
Published online: 10 May 2010