Authors: Kelly C. Ramos Da Silva; Helder L. Costa De Oliveira; André C.P.L.F. De Carvalho
Addresses: Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos, SP, Brazil; Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos, SP, Brazil; Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos, SP, Brazil
Abstract: The output of an outlier detection algorithm applied to a multidimensional dataset usually consists of scores that define the level of abnormality of each instance. However, this process does not by itself identify the outlying instances. For this purpose, it is common to use an outlier rule to convert outlier scores into labels. The problem is therefore to determine an appropriate outlier rule based only on patterns in the scores. To address this problem, we studied and evaluated several traditional robust outlier rules following a pragmatic approach. The analysis of the results was facilitated by an evaluation measure that we developed. This measure proved to be more effective than traditional measures involving only true positive and true negative rates. Using this measure, we were able to study the behaviour of different outlier rules, whose performance was evaluated under varying skewness and contamination levels.
Keywords: outlier detection; outlier rule; evaluation measure; boxplot; adjusted boxplot; k-NN.
International Journal of Business Intelligence and Data Mining, 2021 Vol.19 No.2, pp.135 - 152
Received: 17 Sep 2018
Accepted: 08 May 2019
Published online: 21 Jul 2021
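The step the abstract describes, converting outlier scores into labels with an outlier rule, can be illustrated with a minimal sketch. It assumes the standard Tukey boxplot rule (upper fence at Q3 + 1.5 × IQR), which is one of the traditional rules named in the keywords; it is not necessarily the exact configuration the authors evaluated. The function name `boxplot_rule`, the constant `c`, and the score values are hypothetical and chosen only for illustration.

```python
from statistics import quantiles


def boxplot_rule(scores, c=1.5):
    """Label a score as an outlier when it exceeds the upper fence Q3 + c * IQR.

    Only the upper fence is used, on the assumption that larger scores
    mean more abnormal instances.
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    fence = q3 + c * (q3 - q1)
    return [s > fence for s in scores]


# Made-up outlier scores (for example, k-NN distances); not data from the paper.
scores = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 0.8, 6.0]
labels = boxplot_rule(scores)
print(labels)
```

With these toy scores, only the last instance lies above the fence. For skewed score distributions, the adjusted boxplot mentioned in the keywords replaces the symmetric fence with a skewness-dependent one; that variant is not shown here.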