Authors: Xueping Li, Godswill Chukwugozie Nsofor, Laigang Song
Addresses: Department of Industrial and Information Engineering, University of Tennessee, Knoxville, TN 37996, USA (all three authors)
Abstract: Selecting the appropriate prediction technique for a dataset from the variety of existing techniques is non-trivial, since the competitive evaluation of techniques (bagging, boosting, stacking and meta-learning) can be time-consuming. This paper compares five predictive data mining techniques on four unique datasets that have a combination of the following characteristics: few predictor variables, many predictor variables, highly collinear variables, very redundant variables and the presence of outliers. The five techniques, multiple linear regression (MLR), principal component regression (PCR), ridge regression, partial least squares (PLS) and non-linear partial least squares (NLPLS), are applied to each of the datasets. The comparisons are based on several criteria: R-square, adjusted R-square, mean square error (MSE), mean absolute error (MAE), coefficient of efficiency, condition number (CN) and the number of variables or features included in the model. The advantages and disadvantages of the techniques are discussed and summarised.
Keywords: predictive data mining; statistical analysis; knowledge discovery; multiple linear regression; MLR; principal component regression; PCR; ridge regression; partial least squares; nonlinear PLS.
International Journal of Rapid Manufacturing, 2009 Vol.1 No.2, pp.150 - 172
Available online: 28 Nov 2009
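
As an illustration of four of the comparison criteria named in the abstract (R-square, adjusted R-square, MSE and MAE), the following is a minimal Python sketch using the standard textbook definitions. It is not taken from the paper: the function name regression_metrics, its arguments and the sample numbers are hypothetical, and MSE is assumed here to divide by the number of observations n rather than by the residual degrees of freedom. The coefficient of efficiency and the condition number are left out because the abstract does not say which definitions the authors use.

```python
def regression_metrics(y_true, y_pred, n_predictors):
    """Return R-square, adjusted R-square, MSE and MAE for one fitted model.

    Hypothetical helper: y_true and y_pred are equal-length sequences of
    observed and predicted responses; n_predictors is the number of
    predictor variables in the model (excluding the intercept).
    """
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than model parameters")

    mean_y = sum(y_true) / n
    residuals = [yt - yp for yt, yp in zip(y_true, y_pred)]

    sse = sum(r * r for r in residuals)              # residual sum of squares
    sst = sum((yt - mean_y) ** 2 for yt in y_true)   # total sum of squares
    if sst == 0:
        raise ValueError("y_true is constant, so R-square is undefined")

    r2 = 1.0 - sse / sst
    # Adjusted R-square penalises models that include more predictors.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    mse = sse / n                                    # assumption: divide by n
    mae = sum(abs(r) for r in residuals) / n

    return {"r2": r2, "adj_r2": adj_r2, "mse": mse, "mae": mae}


# Made-up responses and predictions, for illustration only.
metrics = regression_metrics(
    y_true=[1.0, 2.0, 3.0, 4.0, 5.0],
    y_pred=[1.1, 1.9, 3.2, 3.8, 5.0],
    n_predictors=1,
)
print(metrics)
```

Calling the same function on the predictions of each technique (MLR, PCR, ridge regression, PLS, NLPLS) for a given dataset would give one row of a side-by-side comparison of the kind the abstract describes.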