Authors: Brad Wardman; Jason Britt; Gary Warner
Addresses: PayPal Inc., 2211 N 1st St., San Jose, California 95131, USA; University of Alabama at Birmingham, 1720 2nd Ave. S, Birmingham, AL 35233, USA; University of Alabama at Birmingham, 1720 2nd Ave. S, Birmingham, AL 35233, USA
Abstract: Organisations continue to pursue new strategies to thwart phishing attacks as well as investigate the criminals behind these scams. To address these issues, a novel algorithm named syntactical fingerprinting is proposed, which automatically identifies phishing websites and indicates their provenance using the structural components that compose each website. Syntactical fingerprinting demonstrates the ability to accurately identify newly observed phishing websites through an experiment on a custom dataset consisting of 49,840 URLs collected over three months by the UAB phishing data mine. An additional experiment run over a different set of website content in early 2011 exhibits the use of syntactical fingerprinting as a distance metric for clustering phishing websites. Finally, varying the threshold value used by syntactical fingerprinting demonstrates the capability for phishing investigators to identify not only the source of phishing websites, but individual phishers as well.
Keywords: cybercrime; digital forensics; website clustering; file matching algorithms; anti-fraud; fraud detection; social engineering; provenance; attribution; branding; phishing attacks; phishers; electronic security; syntactical fingerprinting; phishing websites; phishing website identification; phisher identification.
International Journal of Electronic Security and Digital Forensics, 2014 Vol.6 No.1, pp.62 - 80
Received: 03 Jul 2013
Accepted: 14 Jan 2014
Published online: 27 Mar 2014