Title: Evaluating data mining algorithms using molecular dynamics trajectories

Authors: Vasileios A. Tatsis; Christos Tjortjis; Panagiotis Tzirakis

Addresses: Department of Engineering Informatics and Telecommunications, Section of Applied Informatics, University of Western Macedonia, Vermiou & Ligeris, Kozani 50100, Greece; Department of Computer Science, University of Ioannina, Ioannina 45110, Greece; Department of Engineering Informatics and Telecommunications, Section of Applied Informatics, University of Western Macedonia, Vermiou & Ligeris, Kozani 50100, Greece

Abstract: Molecular dynamics simulations provide a sample of a molecule's conformational space. Simulations on the µs time scale, which produce large amounts of data, are now routine. Data mining techniques such as classification provide a way to analyse such data. In this work, we evaluate and compare several classification algorithms using three data sets that resulted from computer simulations of a potential enzyme-mimetic biomolecule. We evaluated 65 classifiers available in the well-known data mining toolkit Weka, using classification error to assess algorithmic performance. Results suggest that: (i) 'meta' classifiers perform better than the other groups when applied to molecular dynamics data sets; (ii) Random Forest and Rotation Forest are the best classifiers for all three data sets; and (iii) classification via clustering yields the highest classification error. Our findings are consistent with bibliographic evidence, suggesting a 'roadmap' for dealing with such data.
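
As a rough illustration of the evaluation criterion named in the abstract, the sketch below computes classification error (the fraction of misclassified instances) for several classifiers and ranks them from lowest to highest error. It is not the authors' Weka workflow: the function names, the classifier names, and the toy labels are all hypothetical and stand in for real predictions on held-out molecular dynamics frames.

```python
from typing import Dict, List, Sequence, Tuple


def classification_error(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of instances whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one instance is required")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def rank_by_error(
    y_true: Sequence[str], predictions: Dict[str, Sequence[str]]
) -> List[Tuple[str, float]]:
    """Return (classifier name, error) pairs sorted from lowest to highest error."""
    scored = [
        (name, classification_error(y_true, y_pred))
        for name, y_pred in predictions.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Toy, made-up labels (assumption): conformational classes "A", "B", "C".
y_true = ["A", "A", "B", "B", "C", "C", "A", "B"]

# Hypothetical predictions from three unnamed classifiers (not results from the paper).
predictions = {
    "classifier_1": ["A", "A", "B", "B", "C", "C", "A", "A"],  # 1 of 8 wrong
    "classifier_2": ["A", "B", "B", "A", "C", "C", "A", "B"],  # 2 of 8 wrong
    "classifier_3": ["B", "B", "A", "A", "C", "A", "A", "B"],  # 5 of 8 wrong
}

ranking = rank_by_error(y_true, predictions)
for name, err in ranking:
    print(f"{name}: {err:.3f}")
```

In practice the predictions would come from a held-out test set or cross-validation rather than hand-written lists, so that the error estimates are not biased towards the training data.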

Keywords: classification; data mining; evaluation; molecular dynamics; simulation; bioinformatics; molecules; conformational space.

DOI: 10.1504/IJDMB.2013.055499

International Journal of Data Mining and Bioinformatics, 2013 Vol.8 No.2, pp.169 - 187

Received: 06 Nov 2010
Accepted: 28 Nov 2011

Published online: 20 Oct 2014
