Authors: N.D. Patel; B.M. Mehtre; Rajeev Wankar
Addresses: Centre of Excellence in Cyber Security, Institute for Development and Research in Banking Technology (IDRBT), Hyderabad, India; School of Computer and Information Sciences (SCIS), University of Hyderabad (UoH), Telangana, India ' Centre of Excellence in Cyber Security, Institute for Development and Research in Banking Technology (IDRBT), Hyderabad, India ' School of Computer and Information Sciences (SCIS), University of Hyderabad (UoH), Telangana, India
Abstract: Existing machine-learning research aims to improve predictive performance on datasets using various feature selection and classification models. In intrusion detection, the data consists of normal data and only a small number of attack samples. This data imbalance degrades prediction performance through factors such as prediction bias arising from the small amount of minority data and the presence of outliers. To address this issue, we oversampled the minority class of existing intrusion detection datasets using four data oversampling methods and tested the results using three different classifiers. To further assess the real-time applicability of these oversampling methods with these classifiers, we also generated a real-time testbed (RTT) resampled dataset. We observe that the CTGAN oversampling method, combined with the LightGBM classifier, gives the best results on the existing CICIDS2018 dataset and on the RTT resampled dataset. The test results also exceed those of existing intrusion detection methods and datasets (credit card, gambling fraud, ISCX-Bot-2014, and CICIDS2017) in terms of accuracy, precision, and other metrics.
Keywords: intrusion detection system; data imbalance; SMOTE; borderline-SMOTE; ADASYN; CTGAN; oversampling; classification model; NSL-KDD; CIC-IDS2018; attack detection system.
International Journal of Ad Hoc and Ubiquitous Computing, 2023 Vol.42 No.4, pp.243 - 257
Received: 07 Mar 2022
Received in revised form: 25 May 2022
Accepted: 08 Jun 2022
Published online: 21 Apr 2023
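The minority-class oversampling idea summarised in the abstract can be illustrated with a minimal sketch. The code below is an assumption-laden illustration of basic SMOTE-style interpolation only; it is not the authors' implementation, it does not cover Borderline-SMOTE, ADASYN, or CTGAN, and it leaves out the classification step. The function name `smote_sketch`, its parameters, and the toy `normal` and `attack` data are all hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Hypothetical helper: build n_new synthetic minority points by
    interpolating between a minority sample and one of its k nearest
    minority neighbours (the core idea of basic SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        # Pick a random point on the segment between base and mate.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Toy, made-up data: 20 "normal" points and 4 "attack" points.
normal = [(float(i), float(i % 5)) for i in range(20)]
attack = [(50.0, 50.0), (51.0, 49.0), (52.0, 51.0), (49.0, 52.0)]

# Oversample the attack class until both classes have the same size.
new_attack = smote_sketch(attack, n_new=len(normal) - len(attack))
balanced_attack = attack + new_attack
```

In practice one would oversample only the training split, so that synthetic points do not leak into the evaluation data, and then fit a classifier on the balanced set.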