Title: Fraud detection with machine learning: model comparison

Authors: João Pacheco; João Chela; Guilherme Salomé

Addresses: Getulio Vargas Foundation (FGV), Rio de Janeiro, Brazil; Getulio Vargas Foundation (FGV), Rio de Janeiro, Brazil; Eli Lilly and Company, Indianapolis, Indiana, USA

Abstract: This work evaluates the performance of different models for predicting three types of fraudulent behaviour in a novel, imbalanced dataset. The logistic regression model, a staple in the credit risk industry, is compared with several machine learning models. This work shows that, in the binary classification case, all compared models achieved results similar to those of the logistic regression. The random forest model showed superior performance when classifying credit frauds that ended in lawsuits. In the multi-label classification case, the logistic regression attains high precision for all types of fraud, but at lower recall, whereas the random forest model achieves higher recall, but with lower precision.
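The multi-label comparison above is stated in terms of per-label precision and recall. As a minimal sketch (not the authors' code or evaluation pipeline), the snippet below computes both metrics for each fraud label from 0/1 indicator rows. The toy labels and the two prediction sets (`CONSERVATIVE`, `LIBERAL`) are hypothetical and invented only to show the trade-off: a classifier that flags few cases can reach high precision at low recall, while one that flags more cases raises recall at the cost of precision. They are not the paper's data or results.

```python
from typing import List, Tuple

# Hypothetical toy data: 8 cases x 3 fraud labels (1 = that fraud type is present).
# Invented for illustration only; NOT taken from the paper.
Y_TRUE = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
]

# Hypothetical "conservative" predictions: only the most obvious cases are flagged.
CONSERVATIVE = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
]

# Hypothetical "liberal" predictions: many more cases are flagged.
LIBERAL = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]


def per_label_precision_recall(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> List[Tuple[float, float]]:
    """Return a (precision, recall) pair for each label column."""
    n_labels = len(y_true[0])
    results = []
    for j in range(n_labels):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in pairs if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in pairs if t[j] == 1 and p[j] == 0)
        # Convention: report 0.0 when a denominator is zero.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results.append((precision, recall))
    return results


if __name__ == "__main__":
    for name, preds in (("conservative", CONSERVATIVE), ("liberal", LIBERAL)):
        for j, (p, r) in enumerate(per_label_precision_recall(Y_TRUE, preds)):
            print(f"{name} label {j}: precision={p:.2f} recall={r:.2f}")
```

On this toy data the conservative set scores precision 1.00 on every label with recall between 0.33 and 0.50, while the liberal set reaches recall 1.00 on every label with precision between 0.50 and 0.75.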

Keywords: fraud detection; machine learning; imbalanced data; multi-label classification.

DOI: 10.1504/IJBIDM.2023.130587

International Journal of Business Intelligence and Data Mining, 2023 Vol.22 No.4, pp. 434-450

Received: 24 May 2021
Accepted: 26 Oct 2021

Published online: 01 May 2023
