Title: Prediction of customer churn risk with advanced machine learning methods

Authors: Oguzhan Akan; Abhishek Verma; Sonika Sharma

Addresses: Department of Computer Science, California State University, Northridge, CA 91330, USA; Department of Computer Science, California State University, Northridge, CA 91330, USA; Department of Commerce, Shaheed Bhagat Singh College, University of Delhi, Delhi, 110017, India

Abstract: Customer churn risk prediction is an important area of research because churn directly affects the revenue stream of businesses. The ability to predict customer churn allows businesses to devise better strategies for retaining existing customers. In this research we perform a comprehensive comparison of feature selection methods, upsampling methods, and machine learning methods on the customer churn risk dataset: i) we compare likelihood-based, tree-based, and layer-based machine learning methods on the churn dataset; ii) models built on the churn dataset without upsampling performed better than those built with oversampling, although the synthetic minority oversampling technique (SMOTE) and adaptive synthetic sampling (ADASYN) helped stabilise model performance; iii) the models built on the ADASYN dataset were slightly better than their SMOTE counterparts; iv) XGBoost and deep cascading forest (DCF) combined with XGBoost were consistently better than the other methods across all metrics; and v) information value analysis performed better than PCA. In particular, the IVR DCFX model achieved the best AUROC score, 89.1%.
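As an illustration of the information value analysis mentioned in the abstract, the following is a minimal sketch of the standard weight-of-evidence formulation of information value for one pre-binned feature against a binary churn label. It is not the authors' implementation: the function name `information_value`, the `smoothing` parameter, and the toy data are assumptions made for this example, and the abstract does not state how features were binned or how empty bins were handled.

```python
import math
from collections import defaultdict


def information_value(feature_bins, churned, smoothing=0.5):
    """Information value (IV) of a binned feature against a binary label.

    feature_bins: iterable of bin labels, one per customer.
    churned: iterable of 0/1 flags, 1 meaning the customer churned.
    smoothing: pseudo-count added to every bin so that log(0) never
        occurs (an assumption of this sketch, not taken from the paper).
    """
    events = defaultdict(float)      # churners per bin
    non_events = defaultdict(float)  # retained customers per bin
    for bin_label, flag in zip(feature_bins, churned):
        if flag:
            events[bin_label] += 1
        else:
            non_events[bin_label] += 1

    bins = set(events) | set(non_events)
    total_events = sum(events.values()) + smoothing * len(bins)
    total_non_events = sum(non_events.values()) + smoothing * len(bins)

    iv = 0.0
    for bin_label in bins:
        share_events = (events[bin_label] + smoothing) / total_events
        share_non_events = (non_events[bin_label] + smoothing) / total_non_events
        # Weight of evidence for this bin.
        woe = math.log(share_non_events / share_events)
        iv += (share_non_events - share_events) * woe
    return iv


# Toy data (hypothetical): the first feature is unrelated to churn,
# the second separates churners from retained customers perfectly.
uninformative = information_value(["a", "a", "b", "b"], [1, 0, 1, 0])
informative = information_value(["a", "a", "b", "b"], [1, 1, 0, 0])
```

Under this formulation every bin contributes a non-negative term, so a feature whose bins carry no information about churn scores zero and more discriminating features score higher, which is what makes the quantity usable for ranking candidate features.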

Keywords: customer churn; DNNs; deep neural networks; DCF; deep cascading forest; SMOTE; synthetic minority oversampling technique; ADASYN; adaptive synthetic sampling.

DOI: 10.1504/IJDS.2025.144832

International Journal of Data Science, 2025 Vol.10 No.1, pp.70 - 95

Received: 20 May 2023
Accepted: 15 Apr 2024

Published online: 04 Mar 2025
