Title: Modified bagging of maximal information coefficient for genome-wide identification

Authors: Han-Ming Liu; Nini Rao; Dan Yang; Ling Yang; Yi Li; Feng-Biao Guo

Addresses: Key Laboratory for NeuroInformation of Ministry of Education, Center for Information in BioMedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, No. 4, Section 2, North Jianshe Road, 610054, Chengdu, Sichuan, China; School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, Jiangxi, 341000, China ' Key Laboratory for NeuroInformation of Ministry of Education, Center for Information in BioMedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, No. 4, Section 2, North Jianshe Road, 610054, Chengdu, Sichuan, China ' School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, Jiangxi, 341000, China ' Key Laboratory for NeuroInformation of Ministry of Education, Center for Information in BioMedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, No. 4, Section 2, North Jianshe Road, 610054, Chengdu, Sichuan, China ' Key Laboratory for NeuroInformation of Ministry of Education, Center for Information in BioMedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, No. 4, Section 2, North Jianshe Road, 610054, Chengdu, Sichuan, China ' Key Laboratory for NeuroInformation of Ministry of Education, Center for Information in BioMedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, No. 4, Section 2, North Jianshe Road, 610054, Chengdu, Sichuan, China

Abstract: A new method, modified Bagging (mBagging) of Maximal Information Coefficient (mBoMIC), was developed for genome-wide identification. Traditional Bagging does not meet some requirements of genome-wide identification in terms of statistical performance and time cost. To improve statistical performance and reduce time cost, mBagging was developed to introduce the Maximal Information Coefficient (MIC) into genome-wide identification. mBoMIC overcame the weaknesses of the original MIC, namely inadequate statistical power and volatile MIC values. The three conflicting measures of Bagging, i.e. time cost, statistical power and false positive rate, were significantly improved simultaneously. Compared with traditional Bagging, mBagging reduced time cost by 80%, improved statistical power by 15%, and decreased the false positive rate by 31%. mBoMIC shows sensitivity and universality in genome-wide identification. The SNPs identified only by mBoMIC have previously been reported as associated with cardiac disease.

Keywords: modified bagging; mBagging; time cost; statistical performance; maximal information coefficient; genome-wide identification; SNPs; genetic variations; single nucleotide polymorphisms; heart disease; cardiac disease; ensemble method; multi-gene co-effects; bioinformatics.

DOI: 10.1504/IJDMB.2016.074875

International Journal of Data Mining and Bioinformatics, 2016 Vol.14 No.3, pp.229 - 257

Received: 14 Jan 2015
Accepted: 06 Aug 2015

Published online: 22 Feb 2016
