Title: Mechanisms classification for glycoside hydrolases by sequence and structure features using computational methods
Authors: Fan Yang; Lin Wang
Addresses: Department of Molecular Genetics, University of Toronto, Medical Sciences Building, Toronto, Ontario M5S 1A8, Canada; Shandong Provincial Key Laboratory Based Intelligent Computing, University of Jinan, No. 106 Jiwei Road, Jinan 250022, Shandong, P.R. China
Abstract: Glycoside Hydrolases (GHs) have played key roles in the development of biofuels as well as in many other industries. Research aimed at accurately classifying the catalytic mechanisms of GHs, with the goal of increasing their catalytic activity, is receiving extensive attention. Traditional theories and methods for studying the catalytic mechanisms of GHs are limited by reaction conditions. They are not suitable for studying diverse GHs because different enzymes show different physicochemical properties. In this paper, a new method is proposed to classify and predict the catalytic mechanism of a given glycoside hydrolase from its sequence and structure features using a k-Nearest Neighbour (kNN) classifier, a Support Vector Machine (SVM), a Naive Bayes (NB) classifier and a Multilayer Perceptron (MLP) classifier. The classification performance of the four computational methods was evaluated and compared. Experimental results show that each classifier has its own advantages, but the kNN classifier is the most accurate overall. This research also helps us gain a better understanding of the catalytic mechanisms of different GHs.
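To make the kNN idea above concrete, here is a minimal sketch, not the authors' implementation. Several assumptions are made for illustration only: each enzyme is represented by a short numeric feature vector (the values below are invented, not real sequence or structure descriptors), the labels are the two commonly cited GH mechanisms "retaining" and "inverting", distance is Euclidean, k = 3, and performance is estimated by leave-one-out validation. The abstract does not state which features, distance, k or validation scheme the paper used.

```python
import math
from collections import Counter

# HYPOTHETICAL toy data: invented feature vectors standing in for
# sequence/structure descriptors, each labelled with an assumed mechanism class.
TRAINING = [
    ((0.90, 0.10, 0.30), "retaining"),
    ((0.80, 0.20, 0.40), "retaining"),
    ((0.85, 0.15, 0.35), "retaining"),
    ((0.20, 0.90, 0.70), "inverting"),
    ((0.10, 0.80, 0.60), "inverting"),
    ((0.15, 0.85, 0.65), "inverting"),
]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query, training, k=3):
    """Predict a label by majority vote among the k nearest training examples."""
    # Sort training examples by distance to the query and keep the k closest.
    neighbours = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    # Count the labels of those neighbours and return the most frequent one.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(training, k=3):
    """Fraction of examples classified correctly when each is held out in turn."""
    correct = 0
    for i, (features, label) in enumerate(training):
        rest = training[:i] + training[i + 1:]
        if knn_predict(features, rest, k) == label:
            correct += 1
    return correct / len(training)


if __name__ == "__main__":
    print(knn_predict((0.82, 0.18, 0.33), TRAINING))
    print(leave_one_out_accuracy(TRAINING))
```

The same leave-one-out loop could, in principle, wrap an SVM, Naive Bayes or MLP classifier, which is one simple way to compare several classifiers on identical data as the abstract describes.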
Keywords: glycoside hydrolase; catalytic mechanisms; classification; data mining; bioinformatics; computational methods; structure features; sequence features; biofuels; classifiers; k-nearest neighbour; kNN; support vector machines; SVM; naive Bayes; multilayer perceptron; MLP.
International Journal of Data Mining and Bioinformatics, 2014 Vol.9 No.4, pp.444 - 457
Received: 13 May 2011
Accepted: 02 Mar 2012
Published online: 15 Oct 2013