
Title: Missing data imputation by the aid of features similarities

Authors: Samih M. Mostafa

Addresses: Mathematics Department, Faculty of Science, South Valley University, Qena, Egypt

Abstract: Missing data are likely to occur in statistical analyses, and the quality of the data is affected by the imputation method used. This paper proposes a method that imputes missing data in a variable of interest (the recipient) using observed values from other variables (the donors). Some existing methods rely only on the recipient (e.g., unconditional means); others rely on the recipient and one donor (e.g., interpolation). The proposed method uses the similarities of the values in the donor to impute the missing data in the recipient. If the similarities are not sufficient to impute all missing values, another method is combined with the proposed method to impute the residual missing data. The proposed approach is straightforward and can be combined with existing methods. The empirical study validated the superiority of the proposed approach and showed that it can significantly improve the quality of the data. The improvement is more pronounced when the ratio of missing values is greater.
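
The general idea in the abstract can be illustrated with a minimal sketch. It is one possible reading, not the paper's algorithm: it assumes a single numeric donor, treats two donor values as "similar" when they differ by at most a hypothetical tolerance `tol`, and falls back to the unconditional mean of the recipient for any residual missing values. The function name, the `tol` parameter, and the use of `None` to mark missing entries are all assumptions made for illustration.

```python
from statistics import mean


def impute_by_donor_similarity(recipient, donor, tol=0.0):
    """Fill None entries in `recipient` from rows with a similar `donor` value.

    Illustrative assumption: "similar" means |donor_i - donor_j| <= tol.
    Residual gaps are filled with the unconditional mean of the recipient.
    """
    # Observed recipient values, used for the unconditional-mean fallback.
    observed_values = [r for r in recipient if r is not None]
    if not observed_values:
        return list(recipient)  # nothing observed, nothing to impute from
    fallback = mean(observed_values)

    # Rows where both donor and recipient are observed.
    pairs = [(d, r) for d, r in zip(donor, recipient)
             if d is not None and r is not None]

    result = []
    for d, r in zip(donor, recipient):
        if r is not None:
            result.append(r)  # keep observed values untouched
            continue
        similar = [ro for do, ro in pairs
                   if d is not None and abs(do - d) <= tol]
        # Use the similar rows when any exist, else the fallback method.
        result.append(mean(similar) if similar else fallback)
    return result


donor = [1.0, 1.0, 2.0, 2.0, 3.0]
recipient = [10.0, None, 20.0, None, None]
filled = impute_by_donor_similarity(recipient, donor)
```

In this toy input, the second and fourth rows have a donor value that matches an observed row, so they borrow that row's recipient value; the last row has no similar donor value and receives the unconditional mean instead.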

Keywords: imputation; unconditional mean; missingness mechanisms; missing values.

DOI: 10.1504/IJBDM.2020.106883

International Journal of Big Data Management, 2020 Vol.1 No.1, pp.81 - 103

Received: 07 Mar 2019
Accepted: 21 Aug 2019

Published online: 24 Apr 2020
