Authors: Jing Wang; Dezhi Han
Addresses: School of Information Engineering, Shanghai Maritime University, 1550 Pudong Avenue, Pudong New Area, Shanghai, Shanghai, 201306, China; School of Information Engineering, Shanghai Maritime University, 1550 Pudong Avenue, Pudong New Area, Shanghai, Shanghai, 201306, China
Abstract: In the era of big data, network intrusion detection systems based on the K-means algorithm can no longer meet the detection efficiency and speed requirements of big data environments. The DPC algorithm can be applied to high-dimensional network traffic and large-scale data environments, but it is computationally expensive and its serial processing capability is limited. To address these problems, the DPC algorithm is first adjusted to improve its clustering accuracy. The DPC algorithm is then parallelised on the Spark platform, so that its processing capacity and running speed are greatly improved by running in parallel in the memory of multiple virtual machines. The experimental results show that the network intrusion detection system based on the parallel DPC clustering algorithm achieves a higher detection rate and a lower false rate, and that the parallelised clustering is much more efficient than single-machine clustering.
Keywords: DPC; clustering; network intrusion detection; Spark; parallel.
International Journal of Embedded Systems, 2020 Vol.13 No.3, pp.318 - 327
Received: 09 Feb 2019
Accepted: 05 Aug 2019
Published online: 30 Sep 2020