Title: Sequence based human leukocyte antigen gene prediction using informative physicochemical properties

Authors: Watshara Shoombuatong; Panuwat Mekha; Jeerayut Chaijaruwanich

Addresses: Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand; Department of Computer Science, Maejo University, Chiang Mai 50130, Thailand; Department of Computer Science, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand

Abstract: Predicting the different classes within the human leukocyte antigen (HLA) gene family can provide insight into the human immune system and its response to viral pathogens. It is therefore desirable to develop a method for predicting HLA gene class that is more efficient and more easily interpretable than existing methods. We investigated the HLA gene prediction problem in three steps: (a) establishing a dataset (HLA262) by reducing the sequence identity of the complete HLA dataset to 30%; (b) proposing a feature set of informative physicochemical properties used with a support vector machine (SVM), a method named HLAPred, which achieves higher accuracy and sensitivity (90.04% and 82.99%, respectively) than existing methods; and (c) analysing the informative physicochemical properties to understand the physicochemical characteristics and molecular mechanisms of the HLA gene family.
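
As a rough illustration of step (b) and the reported metrics, the sketch below encodes a protein sequence as the average of per-residue physicochemical property values and computes accuracy and per-class sensitivity. It is a minimal sketch under stated assumptions, not HLAPred itself: the two scales (the Kyte-Doolittle hydropathy scale and a simplified net-charge scale) are illustrative choices that are not necessarily among the informative properties selected by the authors, the helper names are hypothetical, and the SVM classifier is not reproduced.

```python
# Illustrative sketch only: the scales and helper names are assumptions,
# not the property set or code used by HLAPred.

# Kyte-Doolittle hydropathy scale (one example physicochemical property).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified net-charge scale (assumption): K and R count as +1,
# D and E as -1, every other residue as 0.
CHARGE = {aa: 0.0 for aa in KYTE_DOOLITTLE}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})


def mean_property(sequence, scale):
    """Average a per-residue property over a sequence, skipping unknown residues."""
    values = [scale[aa] for aa in sequence.upper() if aa in scale]
    if not values:
        raise ValueError("sequence contains no residues covered by the scale")
    return sum(values) / len(values)


def encode(sequence, scales):
    """Map a protein sequence to one averaged value per property scale."""
    return [mean_property(sequence, scale) for scale in scales]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def sensitivity(y_true, y_pred, positive):
    """Sensitivity (recall) for one class: TP / (TP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    # Toy peptide, used only to exercise the encoder.
    print(encode("MAVMAPRTLLL", [KYTE_DOOLITTLE, CHARGE]))
    labels = ["A", "A", "B", "B"]
    preds = ["A", "B", "B", "B"]
    print(accuracy(labels, preds), sensitivity(labels, preds, "A"))
```

In a setup like the one described, one such feature vector per sequence could be passed to an SVM (for example scikit-learn's `sklearn.svm.SVC`), with sensitivity computed for each HLA class and accuracy computed over all predictions.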

Keywords: human leukocyte antigen; support vector machines; SVM; physicochemical properties; HLA gene prediction; human immune system; viral pathogens; gene sequences; bioinformatics.

DOI: 10.1504/IJDMB.2015.072072

International Journal of Data Mining and Bioinformatics, 2015 Vol.13 No.3, pp. 211–224

Received: 08 Feb 2014
Accepted: 06 Sep 2014

Published online: 30 Sep 2015
