Title: An effective ensemble method for missing data imputation

Authors: Bikash Baruah; Manash P. Dutta; Dhruba K. Bhattacharyya

Addresses: Department of Computer Science and Engineering, NIT Arunachal Pradesh, India; Department of Computer Science and Engineering, NIT Arunachal Pradesh, India; Department of Computer Science and Engineering, Tezpur University, Tezpur, India

Abstract: The presence of missing data in a dataset plays a vital role in the design of classification, clustering, or regression methods. Efficient missing data imputation can enhance the overall performance of a machine learning method. This paper combines k-nearest neighbour imputation, local least square imputation, miss forest imputation, and k-means clustering imputation into an ensemble using the bagging approach to handle missing values over a wide range of datasets. The method has been tested on eight different datasets in terms of root mean square error, median absolute percentage error, mean absolute percentage error, and standard deviation. Experimental results show that our method gives a lower error rate than its close competitors.
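As a rough illustration of the idea described in the abstract, the sketch below bags two simple base imputers and averages their outputs, then scores the result with root mean square error and mean absolute percentage error. It is not the authors' implementation: the function names (`knn_impute`, `mean_impute`, `bagged_impute`), the toy data, the use of a column-mean imputer as a stand-in for the local least square, miss forest, and k-means imputers, and the plain averaging rule are all assumptions made for brevity.

```python
import math
import random

def knn_impute(rows, target, k=2):
    """Fill None entries of target with the mean of its k nearest rows."""
    observed = [j for j, v in enumerate(target) if v is not None]
    def dist(row):
        # Euclidean distance over the columns observed in target only.
        return math.sqrt(sum((row[j] - target[j]) ** 2 for j in observed))
    nearest = sorted(rows, key=dist)[:k]
    return [v if v is not None else sum(r[j] for r in nearest) / len(nearest)
            for j, v in enumerate(target)]

def mean_impute(rows, target):
    """Fill None entries of target with the column mean (stand-in base imputer)."""
    return [v if v is not None else sum(r[j] for r in rows) / len(rows)
            for j, v in enumerate(target)]

def bagged_impute(complete_rows, target, imputers, n_bags=25, seed=0):
    """Average the fills of every base imputer over bootstrap samples (assumed rule)."""
    rng = random.Random(seed)
    totals = [0.0] * len(target)
    runs = 0
    for _ in range(n_bags):
        # Bootstrap: resample the complete rows with replacement.
        sample = [rng.choice(complete_rows) for _ in complete_rows]
        for impute in imputers:
            filled = impute(sample, target)
            totals = [t + f for t, f in zip(totals, filled)]
            runs += 1
    # Observed values are kept as-is; only missing entries take the average.
    return [v if v is not None else t / runs for v, t in zip(target, totals)]

def rmse(predicted, actual):
    """Root mean square error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def mape(predicted, actual):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical toy data: four complete rows and one row with a missing value.
complete = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0], [4.0, 8.0, 12.0]]
target = [2.5, None, 7.5]
filled = bagged_impute(complete, target, [knn_impute, mean_impute])
error = rmse([filled[1]], [5.0])  # 5.0 is the value the toy pattern would suggest
```

Averaging is only one way to combine the base imputers; a weighted vote or a per-dataset selection step would fit the same structure.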

Keywords: missing data imputation; ensemble method; k-nearest neighbour; KNN; local least square; LLS; miss forest; k-means clustering; KMC.

DOI: 10.1504/IJICS.2023.128846

International Journal of Information and Computer Security, 2023 Vol.20 No.3/4, pp.295 - 314

Received: 25 Mar 2021
Accepted: 26 Jul 2021

Published online: 07 Feb 2023
