Title: Fraud detection with machine learning: model comparison
Authors: João Pacheco; João Chela; Guilherme Salomé
Addresses: Getulio Vargas Foundation (FGV), Rio de Janeiro, Brazil; Getulio Vargas Foundation (FGV), Rio de Janeiro, Brazil; Eli Lilly and Company, Indianapolis, Indiana, USA
Abstract: This work evaluates the performance of different models for predicting three types of fraudulent behaviour in a novel, imbalanced dataset. The logistic regression model, a staple of the credit risk industry, is compared with several machine learning models. In the binary classification case, all of the compared models achieve results similar to those of the logistic regression, although the random forest model shows superior performance when classifying credit frauds that end in lawsuits. In the multi-label classification case, the logistic regression attains high precision for all types of fraud but at lower recall, whereas the random forest model achieves higher recall but with lower precision.
Keywords: fraud detection; machine learning; imbalanced data; multi-label classification.
DOI: 10.1504/IJBIDM.2023.130587
International Journal of Business Intelligence and Data Mining, 2023 Vol.22 No.4, pp.434-450
Received: 24 May 2021
Accepted: 26 Oct 2021
Published online: 01 May 2023