Title: Winsorize tree algorithm for handling outlier in classification problem

Authors: Chee Keong Ch'ng; Nor Idayu Mahat

Addresses: School of Quantitative Sciences, College of Arts and Sciences, Universiti Utara Malaysia, 06010 UUM Sintok, Kedah, Malaysia; School of Quantitative Sciences, College of Arts and Sciences, Universiti Utara Malaysia, 06010 UUM Sintok, Kedah, Malaysia

Abstract: Classification and regression trees (CART) are widely used to support users in classification and prediction. However, outliers in a dataset are inevitable and can affect both the size and the accuracy of the tree. Failing to handle outliers can distort the splitting point, which yields a biased and inaccurate tree. In this paper, we propose a winsorize tree algorithm that detects and handles outliers before the Gini index is calculated in every non-terminal node. As a result, the constructed tree can grow without needing to be pruned. For evaluation, the proposed approach was compared with the classical tree and the pruned tree. The results obtained from seven real datasets indicate that the proposed winsorize tree performs as well as, or even better than, the other investigated trees.
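
The idea of treating outliers at a node before scoring candidate splits can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the outlier detection rule, so the sketch assumes Tukey fences (1.5 times the interquartile range) and clamps values beyond the fences to the fence values, which is one common form of winsorizing. The function names `winsorize`, `gini` and `best_split` are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def winsorize(values, k=1.5):
    """Clamp values outside [Q1 - k*IQR, Q3 + k*IQR] to those fences.

    Assumption: the fence rule and k=1.5 are illustrative choices,
    not taken from the paper.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo), hi) for v in values]


def gini(labels):
    """Gini index of a set of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best split of the form x <= t."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        # Only consider thresholds between two distinct feature values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# One feature at a single node, with an extreme value in the last position.
x = [10, 11, 12, 13, 14, 15, 16, 17, 18, 500]
y = ["a"] * 5 + ["b"] * 5

x_w = winsorize(x)                    # handle the outlier first
threshold, score = best_split(x_w, y)  # then score splits with the Gini index
print(x_w, threshold, score)
```

In a full tree, the same two steps would be repeated for each feature at each non-terminal node; how the paper chooses the replacement values and stopping conditions is not specified in the abstract.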

Keywords: winsorize tree algorithm; gini index; error rate; classification; outlier; classification and regression tree; winsorized tree.

DOI: 10.1504/IJOR.2020.107073

International Journal of Operational Research, 2020 Vol.38 No.2, pp.278 - 293

Accepted: 29 Aug 2017
Published online: 04 May 2020
