Title: Modelling splice sites with locality-sensitive sequence features

Authors: Sing-Wu Liou; Yin-Fu Huang

Addresses: Graduate School of Engineering Science and Technology, National Yunlin University of Science and Technology, 123 University Road, Section 3, Touliu, Yunlin, Taiwan 640, ROC; Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, 123 University Road, Section 3, Touliu, Yunlin, Taiwan 640, ROC

Abstract: Splice sites are essential for pre-mRNA maturation and crucial for Splice Site Modelling (SSM); however, gaps remain between the splicing signals and the computationally identified sequence features. In this paper, Locality Sensitive Features (LSFs) are proposed to reduce these gaps by homogenising their contexts. Using skewness-kurtosis based statistics and data analysis, SSM with LSFs as attributes is carried out by double-boundary outlier filters. The LSF-based SSM was applied to six model organisms of diverse species; accuracy and Receiver Operating Characteristic (ROC) analysis give promising results, indicating that the proposed methodology is versatile and robust for splice-site classification. The LSF-based SSM is expected to serve as a new infrastructure for developing effective splice-site prediction methods and has the potential to be applied to other sequence prediction problems.

Keywords: splice sites; modelling; splice site classification; locality-sensitive sequence features; splicing signals; bioinformatics; sequence prediction.

DOI: 10.1504/IJDMB.2013.050979

International Journal of Data Mining and Bioinformatics, 2013 Vol.7 No.1, pp.78-102

Received: 19 Nov 2010
Accepted: 04 Apr 2011

Published online: 20 Oct 2014
