Title: A comparative analysis of predictive data mining techniques

Authors: Xueping Li, Godswill Chukwugozie Nsofor, Laigang Song

Addresses: Department of Industrial and Information Engineering, University of Tennessee, Knoxville, TN 37996, USA (all authors)

Abstract: It is non-trivial to select the appropriate prediction technique for a dataset from the variety of existing techniques, since the competitive evaluation of techniques (bagging, boosting, stacking and meta-learning) can be time consuming. This paper compares five predictive data mining techniques on four unique datasets that have a combination of the following characteristics: few predictor variables, many predictor variables, highly collinear variables, very redundant variables and the presence of outliers. The five techniques, multiple linear regression (MLR), principal component regression (PCR), ridge regression, partial least squares (PLS) and non-linear partial least squares (NLPLS), are applied to each of the datasets. The comparisons are based on several criteria: R-square, adjusted R-square, mean square error (MSE), mean absolute error (MAE), coefficient of efficiency, condition number (CN) and the number of variables or features included in the model. The advantages and disadvantages of the techniques are discussed and summarised.
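
The comparison criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It fits MLR by ordinary least squares with NumPy and scores the fit. The function names (`fit_mlr`, `predict_mlr`, `evaluate`, `condition_number`) and the synthetic data are hypothetical. The abstract gives no formulas, so the definitions are assumptions: MSE is taken as the plain mean of squared residuals, and the coefficient of efficiency is taken in its absolute-error form, 1 - sum|y - y_hat| / sum|y - mean(y)|.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares fit with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Predictions from the coefficient vector returned by fit_mlr."""
    return coef[0] + X @ coef[1:]


def evaluate(y, y_hat, n_predictors):
    """Score a fit with the criteria listed in the abstract (assumed definitions)."""
    n = len(y)
    resid = y - y_hat          # prediction errors
    dev = y - y.mean()         # deviations from the mean response
    r2 = 1.0 - np.sum(resid**2) / np.sum(dev**2)
    return {
        "r2": r2,
        "r2_adj": 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1),
        "mse": np.mean(resid**2),
        "mae": np.mean(np.abs(resid)),
        # Assumed absolute-error form of the coefficient of efficiency.
        "efficiency": 1.0 - np.sum(np.abs(resid)) / np.sum(np.abs(dev)),
    }


def condition_number(X):
    """2-norm condition number: largest over smallest singular value of X."""
    return np.linalg.cond(X)


# Synthetic demonstration data (hypothetical, not one of the paper's datasets).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

coef = fit_mlr(X, y)
scores = evaluate(y, predict_mlr(coef, X), n_predictors=X.shape[1])
cn = condition_number(X)
```

Under these assumptions, a large condition number signals collinear predictors, which is the situation in which PCR, ridge regression or PLS would usually be weighed against plain MLR.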

Keywords: predictive data mining; statistical analysis; knowledge discovery; multiple linear regression; MLR; principal component regression; PCR; ridge regression; partial least squares; nonlinear PLS.

DOI: 10.1504/IJRAPIDM.2009.029380

International Journal of Rapid Manufacturing, 2009 Vol.1 No.2, pp.150 - 172

Published online: 28 Nov 2009
