Title: Detection of phishing websites using data mining tools and techniques

Authors: Mansi Somani; Mamatha Balachandra

Addresses: Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India; Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India

Abstract: Phishing, a prevalent cyber-security issue, is one of the most common attacks used to obtain users' sensitive information. To eradicate it, users or software must first detect it. A popular way of carrying out phishing is by generating phishing URLs. Because a URL is either legitimate or phishy, phishing detection fits naturally into the classification problem of data mining. Hence, five data mining algorithms, C4.5 (J48), SVM, Random Forest, Treebag and GBM, have been trained and compared on accuracy, recall and precision to determine the most suitable model. Rules have been listed that categorise the features which make a website phishy or legitimate. The work has been done in the R language using RStudio. The dataset used comprises 11,055 tuples and 31 attributes; it is used to train and test the models and for detection. Among the five classifiers, the best accuracy, 97.21%, is obtained with the Random Forest model.
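The classifiers are compared on accuracy, recall and precision, all of which are derived from the confusion matrix of actual versus predicted labels. The following is a minimal sketch in Python (the study itself used R), assuming the phishing class is treated as the positive class. The label strings "phishy"/"legitimate", the model names and the toy predictions are hypothetical illustrations, not data or results from the paper.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="phishy"):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class.

    The label "phishy" is a hypothetical encoding chosen for illustration.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted phishy: correct if it really was phishy.
            counts["tp" if a == positive else "fp"] += 1
        else:
            # Predicted legitimate: a missed phish is a false negative.
            counts["fn" if a == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def evaluate(actual, predicted, positive="phishy"):
    """Compute accuracy, precision and recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical ground truth and predictions from two hypothetical models.
    actual = ["phishy", "phishy", "legitimate", "legitimate", "phishy"]
    predictions = {
        "model_a": ["phishy", "legitimate", "legitimate", "phishy", "phishy"],
        "model_b": ["phishy", "phishy", "legitimate", "legitimate", "legitimate"],
    }
    scores = {name: evaluate(actual, pred) for name, pred in predictions.items()}
    for name, s in scores.items():
        print(name, {k: round(v, 3) for k, v in s.items()})
    # Select the "most suitable" model by accuracy, as the abstract does.
    best = max(scores, key=lambda n: scores[n]["accuracy"])
    print("best by accuracy:", best)
```

Precision answers "of the URLs flagged as phishy, how many really were", while recall answers "of the real phishing URLs, how many were caught"; a model can have high accuracy on an imbalanced dataset yet poor recall, which is why the three measures are reported together.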

Keywords: phishing; security; data mining; URL; features; algorithm; classifiers; accuracy; precision; recall; confusion matrix.

DOI: 10.1504/IJAIP.2022.123021

International Journal of Advanced Intelligence Paradigms, 2022 Vol.22 No.1/2, pp.167 - 183

Received: 31 May 2018
Accepted: 22 Oct 2018

Published online: 23 May 2022
