Title: Malicious URL detection with feature extraction based on machine learning

Authors: Baojiang Cui; Shanshan He; Xi Yao; Peilin Shi

Addresses: National Engineering Laboratory for Mobile Network Security, Beijing University of Posts and Telecommunications, Beijing, China; National Engineering Laboratory for Mobile Network Security, Beijing University of Posts and Telecommunications, Beijing, China; QIHU 360 Software Co. Limited, Building #2, No. 6 Jiuxianqiao Road, Chaoyang District, Beijing, China; National Engineering Laboratory for Mobile Network Security, Beijing University of Posts and Telecommunications, Beijing, China

Abstract: Many web applications are vulnerable to web attacks because of a lack of security awareness. It is therefore necessary to improve the reliability of web applications by accurately detecting malicious URLs. Previous studies have commonly relied on keyword matching to detect malicious URLs, but this method is not adaptive. This paper proposes a new detection approach based on machine learning techniques, combining statistical analyses based on gradient learning with feature extraction using a sigmoidal threshold level. The naïve Bayes, decision tree and SVM classifiers are used to validate the accuracy and efficiency of the method. The experimental results demonstrate good detection performance, with an accuracy rate above 98.7%. In practical use, the system has been deployed online for large-scale detection, analysing approximately 2 TB of data every day.

Keywords: malicious URLs; feature selection; machine learning; multiple algorithms.

DOI: 10.1504/IJHPCN.2018.094367

International Journal of High Performance Computing and Networking, 2018 Vol.12 No.2, pp.166 - 178

Received: 24 Mar 2016
Accepted: 27 Jul 2016

Published online: 31 Aug 2018
