Title: A novel approach for imputation of missing continuous attribute values in databases using genetic algorithm
Authors: R. Devi Priya; S. Kuppuswami
Addresses: Kongu Engineering College, Erode-638 052, Tamil Nadu, India; Kongu Engineering College, Erode-638 052, Tamil Nadu, India
Abstract: Missing values are common in databases and, if left untreated, distort estimates. Researchers have developed numerous methods to replace missing values in continuous attributes. The simple methods are less efficient, and the efficient methods are very complex to implement. Hence, to balance simplicity and efficiency, a new method called Bayesian genetic algorithm (BGA), based on the genetic algorithm and Bayes' theorem, is proposed for both the missing at random (MAR) and missing completely at random (MCAR) assumptions. The accuracy of BGA in estimating the missing values is compared with that of mean, kNN and multiple imputation. BGA produces more accurate results than the other methods on the four datasets studied, at rates of missingness ranging from 5% to 60%. BGA continues to perform better even on large datasets, yielding less biased estimates.
Keywords: continuous attributes; missing values; Bayesian genetic algorithms; BGA; missing at random; MAR; missing completely at random; MCAR; databases.
International Journal of Information Technology and Management, 2015 Vol.14 No.2/3, pp.185 - 200
Received: 02 May 2012
Accepted: 05 Dec 2012
Published online: 18 Mar 2015