Authors: Thanapat Kangkachit; Kitsana Waiyamai; Philippe Lenca
Addresses: Faculty of Engineering, Department of Computer Engineering, Kasetsart University, Bangkok 10900, Thailand; Faculty of Engineering, Department of Computer Engineering, Kasetsart University, Bangkok 10900, Thailand; Institut Telecom, Telecom Bretagne, UMR CNRS 6285 Lab-STICC, Université européenne de Bretagne, 29238, France
Abstract: Reactive motifs are short conserved sub-sequences discovered from functional sites of enzyme sequences, and can be used as an effective representation of enzyme sequences. However, the lack of site information leads to low-coverage reactive motifs. With the use of background knowledge, a motif generalisation method is required to increase reactive motifs' coverage. We show that a fuzzy concept lattice (FCL) provides an efficient representation of both single-value and multi-value biological background knowledge and an efficient computational support for generalising reactive motifs. We also show that, compared with statistical and expert-based motifs, the generalised reactive motifs using FCL with an SVM classifier produce satisfactory accuracy in classifying new enzymes. Further, they improve the interpretability of the classification results and provide more biological evidence to biologists. All of the generalised reactive motifs are relevant to the functional sites, and the way they are combined to perform protein function is useful for numerous applications in bioinformatics.
Keywords: enzyme classification; complete substitution group; FCL; fuzzy concept lattice; reactive motifs; generalised motifs; active sites; binding sites; amino acid substitution matrix; amino acid properties; functional informatics; enzyme sequences; protein function; bioinformatics.
International Journal of Functional Informatics and Personalised Medicine, 2014 Vol.4 No.3/4, pp.243 - 258
Received: 07 May 2013
Accepted: 05 Dec 2013
Published online: 19 Mar 2015