Title: Categorical missing data imputation approach via sparse representation

Authors: Xiaochen Shao; Sen Wu; Xiaodong Feng; Rui Song

Addresses: Donlinks School of Economics and Management, University of Science and Technology Beijing, 30 Xueyuan Road, Haidian District, Beijing 100083, China; Donlinks School of Economics and Management, University of Science and Technology Beijing, 30 Xueyuan Road, Haidian District, Beijing 100083, China; School of Political Science and Public Administration, University of Electronic Science and Technology of China, 2006 Xiyuan Road, High-Tech West District, Chengdu, Sichuan 611731, China; Datang Telecom Technology and Industry Group, 40 Xueyuan Road, Haidian District, Beijing 100083, China

Abstract: K-nearest neighbour (KNN) is an important method for imputing categorical missing data. However, the effectiveness of KNN is highly sensitive to local parameters such as the choice of similarity function and the number of neighbours. To address these two issues, a categorical missing data imputation algorithm based on sparse representation (CSR) is proposed. It first applies a matrix transformation to make categorical data amenable to numerical computation. It then introduces a locality constraint into sparse representation theory, using the K nearest neighbours to construct the dictionary. Next, it obtains for each instance with missing values a weight vector that reflects local smoothness and local structure. Finally, using the sparse reconstruction coefficient vector, the algorithm fills each missing attribute with the value that has the maximal reconstruction value. Empirical tests show that CSR outperforms KNNimpute (including its two derivative methods, IKNNimpute and SKNNimpute) and LLSimpute in terms of efficiency and stability.
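The pipeline described in the abstract can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the "matrix transform" is assumed to be one-hot encoding, the distance is assumed to be the number of mismatched observed attributes, and the locality constraint is assumed to be a distance-weighted lasso with a hypothetical penalty of `1 + distance` per dictionary atom, solved by plain coordinate descent. The function names `encode`, `soft_threshold`, `weighted_lasso` and `impute`, the defaults `k=3` and `lam=0.1`, and the zero-weight fallback are all hypothetical choices made for this sketch.

```python
from typing import List, Optional, Sequence

Record = List[Optional[str]]  # None marks a missing categorical value


def encode(record: Record, categories: Sequence[Sequence[str]], attrs: Sequence[int]) -> List[float]:
    """Assumed matrix transform: concatenate one-hot indicator blocks for `attrs`."""
    vec: List[float] = []
    for a in attrs:
        vec.extend(1.0 if record[a] == c else 0.0 for c in categories[a])
    return vec


def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of t*|w|, used by the lasso update."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def weighted_lasso(atoms: List[List[float]], y: List[float],
                   penalties: List[float], lam: float, iters: int = 200) -> List[float]:
    """Coordinate descent for 0.5*||y - sum_j w_j*atoms[j]||^2 + lam*sum_j penalties[j]*|w_j|."""
    w = [0.0] * len(atoms)
    resid = list(y)  # residual y - A w, with w = 0 initially
    for _ in range(iters):
        for j, col in enumerate(atoms):
            norm2 = sum(v * v for v in col)
            if norm2 == 0.0:
                continue
            # Correlation of atom j with the residual that excludes atom j's own contribution.
            rho = sum(c * r for c, r in zip(col, resid)) + norm2 * w[j]
            new = soft_threshold(rho, lam * penalties[j]) / norm2
            delta = new - w[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                w[j] = new
    return w


def impute(query: Record, complete: Sequence[Record], categories: Sequence[Sequence[str]],
           k: int = 3, lam: float = 0.1) -> Record:
    obs = [a for a, v in enumerate(query) if v is not None]
    mis = [a for a, v in enumerate(query) if v is None]

    def mismatches(r: Record) -> int:
        # Squared Euclidean distance between one-hot codes equals 2x this count.
        return sum(1 for a in obs if r[a] != query[a])

    # Dictionary construction: the k nearest complete records on the observed attributes.
    neighbours = sorted(complete, key=mismatches)[:k]
    atoms = [encode(r, categories, obs) for r in neighbours]
    # Hypothetical locality (distance) penalty: farther atoms are shrunk harder.
    penalties = [1.0 + mismatches(r) for r in neighbours]
    w = weighted_lasso(atoms, encode(query, categories, obs), penalties, lam)

    filled = list(query)
    for a in mis:
        # Reconstruction value of category c: total weight of atoms whose attribute a equals c.
        scores = {c: sum(wj for wj, r in zip(w, neighbours) if r[a] == c) for c in categories[a]}
        best = max(categories[a], key=lambda c: scores[c])
        # Fallback (assumption): if every weight was shrunk to zero, copy the nearest neighbour.
        filled[a] = best if scores[best] > 0.0 else neighbours[0][a]
    return filled


categories = [["a", "b"], ["x", "y"], ["p", "q"]]
complete = [["a", "x", "p"], ["a", "x", "p"], ["b", "y", "q"], ["a", "y", "q"]]
print(impute(["a", "x", None], complete, categories))  # ['a', 'x', 'p']
```

Choosing the category with the largest summed weight mirrors the abstract's "maximal reconstruction value" step: reconstructing the one-hot block of a missing attribute as a weighted sum of dictionary atoms gives one score per category, and the largest score is taken as the imputed value.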

Keywords: missing values; K-nearest neighbour; kNN; categorical attribute; missing data imputation; sparse representation; dictionary learning; locality constraint; lasso optimisation; distance penalty; local smoothness.

DOI: 10.1504/IJSTM.2016.078542

International Journal of Services Technology and Management, 2016 Vol.22 No.3/4/5, pp. 256-270

Received: 20 Jan 2016
Accepted: 15 Apr 2016

Published online: 22 Aug 2016
