Title: Small Ancestry Informative Marker panels for complete classification between the original four HapMap populations

Authors: Damrongrit Setsirichok; Theera Piroonratana; Anunchai Assawamakin; Touchpong Usavanarong; Chanin Limwongse; Waranyu Wongseree; Chatchawit Aporntewan; Nachol Chaiyaratana

Addresses: Department of Electrical Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Piboolsongkram Road, Bangsue, Bangkok 10800, Thailand ' Department of Electrical Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Piboolsongkram Road, Bangsue, Bangkok 10800, Thailand ' Biostatistics and Informatics Laboratory, Genome Institute, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, 113 Thailand Science Park, Phahonyothin Road, Klong 1, Klong Luang, Pathumthani 12120, Thailand ' Department of Electrical Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Piboolsongkram Road, Bangsue, Bangkok 10800, Thailand ' Division of Molecular Genetics, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Prannok Road, Bangkoknoi, Bangkok 10700, Thailand ' Department of Electrical Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Piboolsongkram Road, Bangsue, Bangkok 10800, Thailand ' Department of Mathematics, Faculty of Science, Chulalongkorn University, 254 Phayathai Road, Pathumwan, Bangkok 10330, Thailand ' Department of Electrical Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Piboolsongkram Road, Bangsue, Bangkok 10800, Thailand; Division of Molecular Genetics, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Prannok Road, Bangkoknoi, Bangkok 10700, Thailand

Abstract: A protocol for the identification of Ancestry Informative Markers (AIMs) from genome-wide Single Nucleotide Polymorphism (SNP) data is proposed. The protocol consists of three main steps: identification of potential positive selection regions via FST extremity measurement, SNP screening via two-stage attribute selection and classification model construction using a Naïve Bayes classifier. The two-stage attribute selection is composed of a newly developed round robin Symmetrical Uncertainty (SU) ranking technique and a wrapper embedded with a Naïve Bayes classifier. The protocol has been applied to the HapMap Phase II data. Two AIM panels, which consist of 10 and 16 SNPs that lead to complete classification between CEU, CHB, JPT and YRI populations, are identified. Moreover, the panels are at least four times smaller than those reported in previous studies. The results suggest that the protocol could be useful in a scenario involving a larger number of populations.
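The Symmetrical Uncertainty measure named in the abstract can be illustrated with a minimal sketch. It uses the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) to score a single SNP against population labels. It does not reproduce the authors' round robin ranking technique, whose details are not given here. The function names, the 0/1/2 genotype coding and the toy data are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Standard SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), ranging over [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        # Both variables are constant, so there is nothing to share.
        return 0.0
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy      # I(X; Y)
    return 2.0 * mutual_info / (hx + hy)


# Hypothetical toy data: genotypes coded 0/1/2, one population label per sample.
labels = ["CEU", "CEU", "YRI", "YRI", "CHB", "CHB"]
snps = {
    "snp_a": [0, 0, 2, 2, 1, 1],  # genotype tracks the population label exactly
    "snp_b": [0, 1, 0, 1, 0, 1],  # genotype is independent of the label
}

# Rank SNPs by SU against the labels, most informative first.
ranked = sorted(
    snps,
    key=lambda name: symmetrical_uncertainty(snps[name], labels),
    reverse=True,
)
```

In this toy case `snp_a` ranks ahead of `snp_b`. A real screening step would score many thousands of SNPs and pass the top candidates on to a wrapper stage.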

Keywords: AIM identification; ancestry informative markers; attribute selection; HapMap populations; heterozygosity; population classification; positive selection; SNP; single nucleotide polymorphism; bioinformatics.

DOI: 10.1504/IJDMB.2012.050249

International Journal of Data Mining and Bioinformatics, 2012 Vol.6 No.6, pp.651 - 674

Received: 07 May 2010
Accepted: 26 Feb 2011

Published online: 11 Nov 2012
