Title: Malware detection using augmented naive Bayes with domain knowledge and under presence of class noise

Authors: Ismahani Ismail; Muhammad Nadzir Marsono; Sulaiman Mohd Nor

Addresses: Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310, Johor Bahru, Malaysia ' Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310, Johor Bahru, Malaysia ' Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310, Johor Bahru, Malaysia

Abstract: Malicious software (malware) attacks on the internet are increasing in frequency and sophistication. Content-based malware detection can detect malware more accurately because it screens the payload for known malware signatures. New malware variants still exhibit prevalent contents that can be detected by looking at fixed substrings, especially when using n-grams and machine learning techniques. This paper focuses on detecting malware with a content classification technique that is augmented with domain knowledge (Snort signatures) to reduce the feature set and improve detection accuracy. Using a 15-day dataset, the naive Bayes model generated with domain knowledge and the 91,127 most descriptive features shows the lowest false negative rate (around 2%). However, the presence of class noise has a significant impact on the results, even for machine learning techniques augmented with domain knowledge.
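
The general idea in the abstract (byte n-gram features, restricted by signature-derived domain knowledge, fed to a naive Bayes classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the n-gram length, the Bernoulli (presence/absence) variant of naive Bayes, the Laplace smoothing, the function names, and the toy signature strings and payloads are all assumptions made for illustration. The signature strings only stand in for content patterns of the kind found in Snort rules.

```python
import math

N = 4  # n-gram length in bytes; an assumed value, not taken from the paper


def ngrams(payload: bytes, n: int = N) -> set:
    """Return the set of distinct overlapping byte n-grams in a payload."""
    return {payload[i:i + n] for i in range(len(payload) - n + 1)}


def signature_vocabulary(signatures) -> set:
    """Domain knowledge: keep only n-grams that occur in known signature strings."""
    vocab = set()
    for sig in signatures:
        vocab |= ngrams(sig)
    return vocab


def train(samples, vocab):
    """Count, per label, how many payloads contain each in-vocabulary n-gram."""
    doc_count = {}   # label -> number of training payloads
    feat_count = {}  # label -> {n-gram: number of payloads containing it}
    for payload, label in samples:
        doc_count[label] = doc_count.get(label, 0) + 1
        per_label = feat_count.setdefault(label, {})
        for g in ngrams(payload) & vocab:
            per_label[g] = per_label.get(g, 0) + 1
    return doc_count, feat_count, vocab


def classify(model, payload):
    """Return the label with the highest Bernoulli naive Bayes log-posterior."""
    doc_count, feat_count, vocab = model
    present = ngrams(payload) & vocab
    total = sum(doc_count.values())
    scores = {}
    for label, n_docs in doc_count.items():
        score = math.log(n_docs / total)  # class prior
        for g in vocab:
            # Laplace-smoothed probability that a payload of this class contains g.
            p = (feat_count[label].get(g, 0) + 1) / (n_docs + 2)
            score += math.log(p if g in present else 1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, invented for this sketch.
signatures = [b"cmd.exe /c", b"\x90\x90\x90\x90\x90\x90"]
training = [
    (b"GET /index.html HTTP/1.1", "benign"),
    (b"POST /login HTTP/1.1 user=alice", "benign"),
    (b"GET /a?x=cmd.exe /c del", "malware"),
    (b"\x90\x90\x90\x90\x90\x90\x90\x90 shell", "malware"),
]
model = train(training, signature_vocabulary(signatures))
print(classify(model, b"GET /b?y=cmd.exe /c dir"))
print(classify(model, b"GET /about.html HTTP/1.1"))
```

Restricting the vocabulary to signature-derived n-grams keeps the feature set small, which is the role the abstract assigns to domain knowledge. Class noise of the kind the abstract mentions would correspond here to some entries in `training` carrying the wrong label.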

Keywords: malware detection; feature classification; class noise; domain knowledge; augmented naive Bayes; content classification; machine learning.

DOI: 10.1504/IJICS.2014.065173

International Journal of Information and Computer Security, 2014 Vol.6 No.2, pp.179 - 197

Received: 20 Jul 2013
Accepted: 04 May 2014

Published online: 31 Oct 2014