Title: A comparison of two blending-based ensemble techniques for network anomaly detection in Spark distributed environment

Authors: Gagandeep Kaur; Meenal Jain

Addresses: Department of CSE and IT, JIIT, Noida Sector 62, Noida, 201309, India; Department of CSE and IT, JIIT, Noida Sector 62, Noida, 201309, India

Abstract: In this paper, two blending-based ensemble models, namely a logistic regression-based blending ensemble and an SVM-based blending ensemble, are compared in terms of their total training time in a distributed environment and their detection accuracy. To handle concept drift, two clustering algorithms are also compared for their training times in a distributed environment. Tests were conducted on different machines, varying the number of executor cores, to study time latency in a distributed Spark environment. The logistic regression-based blending ensemble achieved an accuracy of 93%, and the SVM-based blending ensemble achieved an accuracy of 98%. The proposed models were evaluated on the CIDDS dataset.
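The two ensembles compared in the abstract differ in their meta-learner (logistic regression vs SVM) that combines the base models' outputs. The following is a minimal, single-machine sketch of blending in plain Python, not the authors' Spark implementation. It assumes hypothetical synthetic two-feature records instead of CIDDS, hypothetical simple base learners (a nearest-centroid scorer and two one-feature thresholds; the random forest named in the keywords is not reproduced), and a 50/25/25 train/holdout/test split. What makes it blending is that the base learners are fitted on one split and the meta-learner is fitted only on their outputs for a disjoint holdout split.

```python
import math
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(n, seed=0):
    """Hypothetical 2-feature records; label 1 = anomaly (about 30% of rows)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < 0.3 else 0
        mu = 3.0 if y else 0.0
        rows.append(((rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)), y))
    return rows


def fit_centroid(train):
    """Base learner: nearest centroid; score > 0 means closer to the anomaly centroid."""
    cents = {}
    for label in (0, 1):
        pts = [x for x, y in train if y == label]
        cents[label] = tuple(sum(p[i] for p in pts) / len(pts) for i in range(2))
    return lambda x: math.dist(x, cents[0]) - math.dist(x, cents[1])


def fit_threshold(train, f):
    """Base learner: one-feature threshold at the midpoint of the two class means."""
    means = []
    for label in (0, 1):
        vals = [x[f] for x, y in train if y == label]
        means.append(sum(vals) / len(vals))
    thr = (means[0] + means[1]) / 2
    sign = 1.0 if means[1] >= means[0] else -1.0
    return lambda x: sign * (x[f] - thr)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Meta-learner A: logistic regression by batch gradient descent; returns the logit."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + dot(w, xi)) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda xi: b + dot(w, xi)


def fit_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Meta-learner B: linear SVM (L2-regularised hinge loss) by subgradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            t = 1.0 if yi else -1.0
            if t * (b + dot(w, xi)) < 1.0:  # inside the margin: hinge term is active
                gw = [g - t * xj / n for g, xj in zip(gw, xi)]
                gb -= t / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return lambda xi: b + dot(w, xi)


def blend(rows, fit_meta, seed=1):
    """Blending: base learners on one split, meta-learner on a disjoint holdout split."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n = len(rows)
    train, holdout, test = rows[: n // 2], rows[n // 2 : 3 * n // 4], rows[3 * n // 4 :]
    bases = [fit_centroid(train), fit_threshold(train, 0), fit_threshold(train, 1)]
    meta = fit_meta([[f(x) for f in bases] for x, _ in holdout], [y for _, y in holdout])
    # Both meta-learners return a decision value; >= 0 is classified as an anomaly.
    hits = sum((meta([f(x) for f in bases]) >= 0) == (y == 1) for x, y in test)
    return hits / len(test)


data = make_data(400)
print("LR blend accuracy: ", blend(data, fit_logistic))
print("SVM blend accuracy:", blend(data, fit_linear_svm))
```

Only the meta-learner argument changes between the two runs, which mirrors the comparison described in the abstract. The accuracies printed by this toy data say nothing about the 93% and 98% reported for CIDDS. In a distributed Spark setting the base and meta learners would be distributed estimators trained on partitioned data, which this sketch does not attempt to show.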

Keywords: resilient distributed datasets; Apache Spark; clustering; K-means; Gaussian mixture model; GMM; random forest; ensemble; anomaly detection.

DOI: 10.1504/IJAHUC.2020.109794

International Journal of Ad Hoc and Ubiquitous Computing, 2020 Vol.35 No.2, pp. 71-83

Received: 04 Mar 2020
Accepted: 17 Apr 2020

Published online: 24 Sep 2020
