Title: A comparative analysis of predictive data mining techniques

 

Author: Xueping Li, Godswill Chukwugozie Nsofor, Laigang Song

 

Address: Department of Industrial and Information Engineering, University of Tennessee, Knoxville, TN 37996, USA

 

Journal: Int. J. of Rapid Manufacturing, 2009 Vol.1, No.2, pp.150-172

 

Abstract: It is non-trivial to select the appropriate prediction technique for a given dataset from the many techniques available, since competitively evaluating techniques (bagging, boosting, stacking and meta-learning) can be time-consuming. This paper compares five predictive data mining techniques on four unique datasets that combine the following characteristics: few predictor variables, many predictor variables, highly collinear variables, highly redundant variables and the presence of outliers. Five techniques are applied to each dataset: multiple linear regression (MLR), principal component regression (PCR), ridge regression, partial least squares (PLS) and non-linear partial least squares (NLPLS). The comparisons are based on several criteria: R-square, adjusted R-square, mean square error (MSE), mean absolute error (MAE), coefficient of efficiency, condition number (CN) and the number of variables or features included in the model. The advantages and disadvantages of the techniques are discussed and summarised.
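To make the prediction-accuracy criteria named above concrete, the sketch below computes them from a list of observed and predicted values. The abstract does not give the paper's exact formulas, so it assumes common textbook definitions: R-square as the squared Pearson correlation between observed and predicted values, adjusted R-square penalised by the number of predictors k, and the coefficient of efficiency in its Nash-Sutcliffe form (1 minus the ratio of the residual to the total sum of squares). All function names are illustrative. The condition number is left out because it describes the predictor matrix rather than the predictions.

```python
"""Sketch of regression comparison criteria (assumed textbook definitions)."""
from math import fsum


def _mean(values):
    return fsum(values) / len(values)


def mse(observed, predicted):
    """Mean square error: average squared residual."""
    return fsum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def mae(observed, predicted):
    """Mean absolute error: average absolute residual."""
    return fsum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_square(observed, predicted):
    """Assumed definition: squared Pearson correlation of observed vs predicted."""
    mo, mp = _mean(observed), _mean(predicted)
    cov = fsum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    ss_o = fsum((o - mo) ** 2 for o in observed)
    ss_p = fsum((p - mp) ** 2 for p in predicted)
    return cov ** 2 / (ss_o * ss_p)


def adjusted_r_square(r2, n, k):
    """Penalise R-square for k predictors fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def coefficient_of_efficiency(observed, predicted):
    """Assumed Nash-Sutcliffe form: 1 - SSE / total sum of squares."""
    mo = _mean(observed)
    sse = fsum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = fsum((o - mo) ** 2 for o in observed)
    return 1 - sse / sst


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper's datasets.
    observed = [3.0, 5.0, 7.0, 9.0]
    predicted = [2.5, 5.5, 7.0, 9.5]
    print(mse(observed, predicted))  # 0.1875
    print(mae(observed, predicted))  # 0.375
    r2 = r_square(observed, predicted)
    print(r2, adjusted_r_square(r2, n=len(observed), k=1))
    print(coefficient_of_efficiency(observed, predicted))
```

For an in-sample least-squares fit with an intercept, the squared correlation and the efficiency coincide. They diverge on held-out data or biased predictions, which is one reason a comparison may report both.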

 

Keywords: predictive data mining; statistical analysis; knowledge discovery; multiple linear regression; MLR; principal component regression; PCR; ridge regression; partial least squares; nonlinear PLS.

 

DOI: 10.1504/IJRAPIDM.2009.029380


 

 
