Title: A new feature selection method for computational prediction of type III secreted effectors

Authors: Yang Yang; Sihui Qi

Addresses: Department of Computer Science and Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, China; Department of Computer Science and Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, China

Abstract: The type III secretion system (T3SS) is a specialised protein delivery system that plays an important role in pathogenic bacteria. However, its secretion mechanism is not yet fully understood. In particular, the identification of type III secreted effectors (T3SEs) is a notoriously challenging problem that has attracted considerable research interest in recent years. In this paper, we introduce a machine learning method that uses amino acid sequence features to predict T3SEs. We use a topic model called HMM-LDA to select useful features, and conduct experiments on Pseudomonas syringae as well as several other bacterial genomes. The cross-validation results on the P. syringae data set show improved prediction accuracy with the reduced feature set. The experimental results on the test sets also demonstrate that the accuracy of the proposed method is comparable to or better than that of other available T3SE prediction tools.

Keywords: type III secretion system; effectors; topic models; computational prediction; feature selection; bioinformatics; protein delivery systems; pathogenic bacteria; machine learning; amino acid sequences.

DOI: 10.1504/IJDMB.2014.064894

International Journal of Data Mining and Bioinformatics, 2014 Vol.10 No.4, pp.440 - 454

Received: 10 May 2012
Accepted: 02 Nov 2012

Published online: 21 Oct 2014
