Title: Application of ensemble methods for classification of water quality

Authors: Mohamad Sakizadeh

Addresses: Department of Environmental Sciences, Faculty of Sciences, Shahid Rajaee Teacher Training University, Shahid Shabanloo Avenue, Lavizan, Tehran, Iran

Abstract: Groundwater pollution in the Shoosh Aquifer, located in Khuzestan Province, Iran, was studied using an eight-year data set collected from 30 sampling wells. Cluster analysis produced a dendrogram in which the 30 sampling wells were grouped into three statistically significant clusters. Two classification methods, k-nearest neighbour and classification tree, were used to classify the sampling stations by level of pollution. The optimum tree depth and number of neighbours were determined from the 4-fold cross-validation misclassification error; both classifiers had an error of 0.167. Ensembles were then created using these base classifiers. In addition, given the small sample size in this study, the random subspace method, a feature selection technique, was combined with the k-nearest neighbour ensemble. The misclassification errors of the classification tree and k-nearest neighbour ensembles were 0.13 and 0.10, respectively. The results confirm the high accuracy of ensemble methods for data classification, with both ensembles improving on the 0.167 error of the single base classifiers.
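
The k-nearest neighbour part of the workflow in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the data are synthetic (30 points in three classes, standing in for the clustered wells), the function names, candidate values of k, subspace size and ensemble size are all hypothetical, and "4-fold" is assumed to mean 4-fold cross-validation. The classification tree ensemble is omitted.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """Majority vote among the k nearest training points, using only `features`."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def subspace_ensemble_predict(train_X, train_y, x, k, subspaces):
    """Random subspace ensemble: one k-NN vote per feature subset, then a majority vote."""
    votes = Counter(knn_predict(train_X, train_y, x, k, fs) for fs in subspaces)
    return votes.most_common(1)[0][0]


def kfold_error(X, y, n_folds, predict):
    """Fraction of samples misclassified when each fold is held out in turn."""
    indices = list(range(len(X)))
    errors = 0
    for fold in range(n_folds):
        test = indices[fold::n_folds]          # every n_folds-th sample
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        errors += sum(predict(train_X, train_y, X[i]) != y[i] for i in test)
    return errors / len(X)


def make_synthetic_wells(rng, n_per_class=10, n_features=6):
    """Synthetic stand-in data (assumption): 3 classes with shifted Gaussian features."""
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            X.append([rng.gauss(2.0 * label, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


rng = random.Random(0)
X, y = make_synthetic_wells(rng)
all_features = list(range(len(X[0])))

# Step 1: choose the number of neighbours by 4-fold misclassification error.
candidate_ks = [1, 3, 5, 7]  # hypothetical search grid
cv_errors = {
    k: kfold_error(
        X, y, 4,
        lambda tX, ty, x, k=k: knn_predict(tX, ty, x, k, all_features),
    )
    for k in candidate_ks
}
best_k = min(cv_errors, key=cv_errors.get)

# Step 2: build a random subspace ensemble of k-NN classifiers.
subspaces = [rng.sample(all_features, 3) for _ in range(15)]  # hypothetical sizes
ensemble_error = kfold_error(
    X, y, 4,
    lambda tX, ty, x: subspace_ensemble_predict(tX, ty, x, best_k, subspaces),
)

print("single k-NN 4-fold errors:", cv_errors)
print("chosen k:", best_k)
print("random subspace ensemble 4-fold error:", ensemble_error)
```

Each ensemble member sees only a random subset of the features, which is the usual motivation for random subspaces when the sample is small relative to the number of measured variables. The errors printed here describe the synthetic data only and say nothing about the values reported in the abstract.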

Keywords: groundwater contamination; classification methods; classification tree; k-nearest neighbour; k-NN; ensemble methods.

DOI: 10.1504/IJW.2017.083764

International Journal of Water, 2017 Vol.11 No.2, pp.114 - 131

Received: 29 Jun 2015
Accepted: 04 Nov 2015

Published online: 12 Apr 2017
