Title: A novel approach for imputation of missing continuous attribute values in databases using genetic algorithm

Authors: R. Devi Priya; S. Kuppuswami

Addresses: Kongu Engineering College, Erode-638 052, Tamil Nadu, India; Kongu Engineering College, Erode-638 052, Tamil Nadu, India

Abstract: Missing values are common in databases and, if left untreated, distort estimates. Researchers have developed numerous methods to replace missing values in continuous attributes, but the simple methods are less efficient and the efficient methods are very complex to implement. To balance simplicity and efficiency, a new method called Bayesian genetic algorithm (BGA) is proposed, based on a genetic algorithm and Bayes' theorem, for both the missing at random (MAR) and missing completely at random (MCAR) assumptions. The accuracy of BGA in estimating the missing values is compared with that of mean, kNN and multiple imputation. BGA produces more accurate results than the other methods on the four datasets studied, at rates of missingness ranging from 5% to 60%. BGA performs better even on large datasets, yielding less biased estimates.
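
As a rough illustration of the kind of evaluation the abstract describes, the sketch below hides values completely at random (MCAR) at several rates, fills them with the simplest baseline named above (mean imputation), and scores the result. It is not the BGA method, whose details are not given here. The function names, the synthetic Gaussian attribute, and the use of RMSE as the accuracy measure are all assumptions made for illustration.

```python
import math
import random


def mask_mcar(values, rate, rng):
    """Hide a fraction `rate` of entries completely at random (MCAR)."""
    n_missing = int(round(rate * len(values)))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def impute_mean(masked):
    """Replace every missing entry with the mean of the observed entries."""
    observed = [v for v in masked if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in masked]


def rmse_on_missing(truth, masked, imputed):
    """Root mean squared error over the entries that were hidden."""
    errors = [(t - p) ** 2 for t, m, p in zip(truth, masked, imputed) if m is None]
    return math.sqrt(sum(errors) / len(errors))


rng = random.Random(0)
# Synthetic continuous attribute (an assumption, not one of the paper's datasets).
truth = [rng.gauss(50.0, 10.0) for _ in range(200)]
for rate in (0.05, 0.30, 0.60):
    masked = mask_mcar(truth, rate, rng)
    imputed = impute_mean(masked)
    print(f"rate={rate:.2f} rmse={rmse_on_missing(truth, masked, imputed):.3f}")
```

Any other imputer with the same list-in, list-out shape could be substituted for `impute_mean` to compare methods at the same missingness rates.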

Keywords: continuous attributes; missing values; Bayesian genetic algorithms; BGA; missing at random; MAR; missing completely at random; MCAR; databases.

DOI: 10.1504/IJITM.2015.068461

International Journal of Information Technology and Management, 2015 Vol.14 No.2/3, pp.185 - 200

Received: 02 May 2012
Accepted: 05 Dec 2012

Published online: 18 Mar 2015