Title: Towards rule-based metabolic databases: a requirement analysis based on KEGG

Authors: Stephan Richter; Ingo Fetzer; Martin Thullner; Florian Centler; Peter Dittrich

Addresses: Bio Systems Analysis Group, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Ernst-Abbe-Platz 2, 07737 Jena, Germany; Stockholm Resilience Centre, Stockholm University, Kräftriket 2B, 11419 Stockholm, Sweden; Department of Environmental Microbiology, UFZ - Helmholtz Centre for Environmental Research, Permoserstraße 15, 04318 Leipzig, Germany; Department of Environmental Microbiology, UFZ - Helmholtz Centre for Environmental Research, Permoserstraße 15, 04318 Leipzig, Germany; Bio Systems Analysis Group, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Ernst-Abbe-Platz 2, 07737 Jena, Germany

Abstract: Knowledge of metabolic processes is collected in easily accessible online databases that are growing rapidly in content and detail. Using these databases for the automatic construction of metabolic network models requires high accuracy and consistency. In this two-part study, we evaluate current accuracy and consistency problems using the KEGG database as a prominent example and propose design principles for dealing with such problems. In the first part, we present our computational approach for classifying inconsistencies and provide an overview of the classes of inconsistencies we identified. We detected inconsistencies both in database entries referring to substances and in entries referring to reactions. In the second part, we present strategies to deal with the detected problem classes. In particular, we propose a rule-based database approach that allows for the inclusion of parameterised molecular species and parameterised reactions. Detailed case studies and a comparison of explicit networks from KEGG with their anticipated rule-based representation underline the applicability and scalability of this approach.

Keywords: metabolism; rule-based metabolic databases; rule-based modelling; inconsistency check; cell models; KEGG database; reaction balance; pathway analysis; ontology; bioinformatics; classification; inconsistencies; parameterised molecular species; parameterised reactions.

DOI: 10.1504/IJDMB.2015.072103

International Journal of Data Mining and Bioinformatics, 2015 Vol.13 No.3, pp.289-319

Received: 08 Feb 2014
Accepted: 14 Jan 2015

Published online: 30 Sep 2015
