Title: A performance evaluation of NoSQL databases to manage proteomics data

Authors: Chaimaa Messaoudi; Rachida Fissoune; Hassan Badir

Addresses: National School of Applied Sciences, Abdelmalek Essaadi University, BP 1818, 90000, Tangier, Morocco ' National School of Applied Sciences, Abdelmalek Essaadi University, BP 1818, 90000, Tangier, Morocco ' National School of Applied Sciences, Abdelmalek Essaadi University, BP 1818, 90000, Tangier, Morocco

Abstract: NoSQL databases have recently been introduced as alternatives to traditional relational database management systems because of their data storage and query retrieval capabilities. Biological datasets can be represented with various data models, for example graphs (protein-protein interactions) or documents (protein sequence information). Applications that involve both data models can be built on a single architecture using either a polyglot persistence approach or a multi-model approach. This paper evaluates the performance of a polyglot persistence approach against a multi-model store. The polyglot persistence approach combines a graph-oriented database (Neo4j) with a document-oriented database (MongoDB); the multi-model system is OrientDB. The comparison covers three aspects: data import, single operations, and query performance. OrientDB shows potential for managing large proteomics datasets in query retrieval and graph import. However, OrientDB was found to be slow when updating records. No single store performs best in all cases.

Keywords: proteomics; MongoDB; multi-model; Neo4j; OrientDB; polyglot persistence.

DOI: 10.1504/IJDMB.2018.095556

International Journal of Data Mining and Bioinformatics, 2018 Vol.21 No.1, pp.70 - 89

Received: 12 Feb 2018
Accepted: 11 Sep 2018

Published online: 09 Oct 2018
