Title: Missing Data Imputation Techniques

Authors: Qinbao Song, Martin Shepperd

Addresses: Department of Computer Science and Technology, Xi'an Jiaotong University, 28 Xian-Ning West Road, Xi'an, Shaanxi 710049, China; School of IS, Computing and Maths, Brunel University, Uxbridge UB8 3PH, UK

Abstract: Intelligent data analysis techniques are useful for exploring real-world data sets. However, real-world data sets are always accompanied by missing data, which is one major factor affecting data quality. At the same time, good intelligent data exploration requires quality data. Fortunately, Missing Data Imputation Techniques (MDITs) can be used to improve data quality. However, no single MDIT can be used in all conditions; each method has its own context. In this paper, we introduce MDITs to the KDD and machine learning communities by presenting the basic idea and highlighting the advantages and limitations of each method.

Keywords: data quality; KDD; data mining; machine learning; data cleaning; missing data; data imputation; missingness mechanism; missing data pattern; single imputation; multiple imputation; intelligent data analysis; knowledge discovery from databases.

DOI: 10.1504/IJBIDM.2007.015485

International Journal of Business Intelligence and Data Mining, 2007 Vol.2 No.3, pp.261 - 291

Published online: 19 Oct 2007
