Title: Exploiting multi-layered vector spaces for signal peptide detection

Authors: Tom Johnsten; Laura Fain; Leanna Fain; Ryan G. Benton; Ethan Butler; Lewis Pannell; Ming Tan

Addresses: School of Computing, University of South Alabama, 150 Jaguar Drive, Suite 2101, Mobile, AL 36688, USA; School of Computing, University of South Alabama, 150 Jaguar Drive, Suite 2101, Mobile, AL 36688, USA; School of Computing, University of South Alabama, 150 Jaguar Drive, Suite 2101, Mobile, AL 36688, USA; Center for Visual and Decision Informatics, University of Louisiana at Lafayette, 635 Cajundome, Lafayette, LA 70506, USA; USA Mitchell Cancer Institute, 1660 Springhill Avenue, Mobile, Alabama 36604, USA; USA Mitchell Cancer Institute, 1660 Springhill Avenue, Mobile, Alabama 36604, USA; USA Mitchell Cancer Institute, 1660 Springhill Avenue, Mobile, Alabama 36604, USA

Abstract: Analysing and classifying sequences based on their similarities and differences is a mathematical problem of growing importance in many scientific disciplines. One of the primary challenges in applying machine learning algorithms to sequential data, such as biological sequences, is extracting and representing significant features of the data. To address this problem, we have recently developed a representation called Multi-Layered Vector Spaces (MLVS), a simple mathematical model that maps sequences into a set of multi-layered vector spaces. We demonstrate the usefulness of the model by applying it to the problem of identifying signal peptides. MLVS feature vectors are generated from a collection of protein sequences, and the resulting vectors are used to train support vector machine classifiers. Experiments show that the MLVS-based classifiers outperform or perform on par with several existing methods designed specifically for identifying signal peptides.

Keywords: bioinformatics; signal peptide detection; support vector machines; SVM; multi-layered vector spaces; modelling; machine learning; sequential data; signal peptides; protein sequences; classification.

DOI: 10.1504/IJDMB.2015.071544

International Journal of Data Mining and Bioinformatics, 2015 Vol.13 No.2, pp.141 - 157

Received: 19 Oct 2013
Accepted: 24 Jun 2014

Published online: 31 Aug 2015
