Title: Performance evaluation of outlier rules for labelling outliers in multidimensional dataset

Authors: Kelly C. Ramos Da Silva; Helder L. Costa De Oliveira; André C.P.L.F. De Carvalho

Addresses: Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos, SP, Brazil ' Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos, SP, Brazil ' Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos, SP, Brazil

Abstract: The output of an outlier detection algorithm applied to a multidimensional dataset usually consists of scores defining the level of abnormality of each instance. However, this process per se does not identify the outlying instances. For this purpose, it is common to use an outlier rule to convert outlier scores into labels. The problem is therefore to determine an appropriate outlier rule based on patterns in the scores alone. To deal with this problem, we studied and evaluated several traditional robust outlier rules following a pragmatic approach. The analysis of the results was facilitated by an evaluation measure that we developed. This measure was shown to be more effective than traditional measures involving only true positive and true negative rates. Using this measure, we were able to study the behaviour of different outlier rules, whose performances were evaluated under varying skewness and contamination levels.

Keywords: outlier detection; outlier rule; evaluation measure; boxplot; adjusted boxplot; k-NN.
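
Illustration (not from the article): the score-to-label step described in the abstract can be sketched as follows. This is a minimal sketch that assumes the outlier score is the distance to the k-th nearest neighbour and that the outlier rule is the standard Tukey boxplot upper fence (Q3 + 1.5 IQR); the exact variants evaluated by the authors may differ. The function names knn_scores and boxplot_rule, the value k=5, and the synthetic data are hypothetical choices made only for this example.

```python
import numpy as np


def knn_scores(X, k=5):
    """Outlier score: Euclidean distance to the k-th nearest neighbour.

    Brute-force pairwise distances, so only suitable for small datasets.
    """
    X = np.asarray(X, dtype=float)
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the zero distance to the point itself,
    # so column k holds the distance to the k-th nearest neighbour.
    return np.sort(dists, axis=1)[:, k]


def boxplot_rule(scores, whisker=1.5):
    """Label a score as an outlier if it exceeds Q3 + whisker * IQR."""
    scores = np.asarray(scores, dtype=float)
    q1, q3 = np.percentile(scores, [25, 75])
    upper_fence = q3 + whisker * (q3 - q1)
    return scores > upper_fence


# Synthetic data (assumption): 200 inliers near the origin, 5 distant points.
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(200, 3))
outliers = rng.normal(8.0, 0.5, size=(5, 3))
X = np.vstack([inliers, outliers])

scores = knn_scores(X, k=5)
labels = boxplot_rule(scores)  # boolean array, True = labelled as outlier
print("instances labelled as outliers:", int(labels.sum()))
```

The boxplot fence assumes roughly symmetric scores. For skewed score distributions, the adjusted boxplot named in the keywords shifts the fences using a robust skewness estimate; that variant is not implemented here.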

DOI: 10.1504/IJBIDM.2021.117111

International Journal of Business Intelligence and Data Mining, 2021 Vol.19 No.2, pp.135 - 152

Received: 17 Sep 2018
Accepted: 08 May 2019

Published online: 17 Aug 2021
