Authors: Sunghwan Bae; Taesung Park
Addresses: Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea; Department of Statistics, Seoul National University, Seoul, South Korea
Abstract: Recent developments in next-generation sequencing technology have led to the identification of several disease-related genetic variants. In this study, we systematically compared the performance of prediction models built from common and rare variants in the Whole Exome Sequencing (WES) data of the Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples (T2D-GENES). We evaluated several methods for predicting binary phenotypes, including Stepwise Logistic Regression, Penalised Regression and Support Vector Machine (SVM). We first constructed prediction models for Type 2 Diabetes (T2D) by combining variable selection and prediction methods. We then calculated the Area Under the Curve (AUC) to compare the performance of the prediction models. The results indicate that models combining common and rare variants performed better than models using common variants only or rare variants only. Further, the AUC values of SVM were always larger than those of the other prediction models.
Keywords: WES; whole exome sequencing; risk prediction model; T2D; type 2 diabetes; penalised regression methods; stepwise selection; SVM; support vector machine.
International Journal of Data Mining and Bioinformatics, 2018 Vol.20 No.1, pp.77 - 90
Available online: 26 May 2018
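
The abstract ranks prediction models by their AUC. As a minimal sketch of how that metric can be computed for a binary phenotype, the Python function below uses the rank-sum (Mann-Whitney) identity. The function name `auc`, the labels and the two sets of model scores are hypothetical toy values chosen for illustration; they are not taken from the paper and do not reproduce the authors' pipeline.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 phenotype labels (1 = case, 0 = control).
    scores: sequence of predicted risk scores, higher meaning more likely a case.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n

    # Assign 1-based ranks, giving tied scores their average rank.
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one control")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy comparison of two hypothetical models on the same four samples.
toy_labels = [0, 0, 1, 1]
model_a_scores = [0.1, 0.4, 0.35, 0.8]  # one case ranked below a control
model_b_scores = [0.1, 0.2, 0.3, 0.4]   # every case ranked above every control

auc_a = auc(toy_labels, model_a_scores)
auc_b = auc(toy_labels, model_b_scores)
print(f"model A AUC = {auc_a:.2f}, model B AUC = {auc_b:.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls, so in a comparison of this kind the model with the larger AUC is preferred.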