Title: K-mean and mean-shift algorithms in machine learning model for efficient malware categorisation

Authors: Keshava Srinivas; Rahul Trivedi; Dharmik Patel; Kakelli Anil Kumar

Addresses: School of Computer Science and Engineering, Vellore Institute of Technology, Vellore-632014, TN, India ' School of Computer Science and Engineering, Vellore Institute of Technology, Vellore-632014, TN, India ' School of Computer Science and Engineering, Vellore Institute of Technology, Vellore-632014, TN, India ' School of Computer Science and Engineering, Vellore Institute of Technology, Vellore-632014, TN, India

Abstract: Malware can cause havoc not only on the personal devices of individuals but also on the data handling capabilities of multinational organisations. It is therefore paramount to classify whether a file is malware or not, and categorising malware by its behaviour is essential for efficient malware analysis. This paper proposes a new technique based on machine learning (ML) to classify and categorise malware (PE32 files). We extract various features from the subsections of each PE32 file and apply dimensionality reduction techniques such as principal component analysis (PCA) and non-negative matrix factorisation (NMF) to reduce the feature set to those best suited for use by the machine learning model. The proposed technique is implemented using the K-means and mean-shift algorithms for better accuracy in malware detection and behaviour-based classification.

Keywords: static analysis; clustering; K-means; mean-shift; principal component analysis; PCA; non-negative matrix factorisation; elbow method; silhouette measure.
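
As an illustration of the clustering step named in the abstract and keywords, the following is a minimal sketch of K-means with the two model-selection aids listed above: the elbow method (inertia per value of k) and the silhouette measure. It is not the authors' implementation. The names `kmeans`, `best_kmeans`, `silhouette` and `toy_points` are hypothetical, the 2-D points are synthetic stand-ins for PE32 features after dimensionality reduction, and the PCA, NMF and mean-shift stages are not shown.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain Lloyd's algorithm. Returns (labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an emptied cluster keeps its old centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    # Inertia: sum of squared distances to the assigned centroid.
    inertia = sum(
        math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels)
    )
    return labels, inertia


def best_kmeans(points, k, restarts=5):
    """Keep the lowest-inertia run over several random initialisations."""
    runs = (kmeans(points, k, seed=s) for s in range(restarts))
    return min(runs, key=lambda run: run[1])


def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two non-empty clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:  # convention: a singleton cluster scores 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != l
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Synthetic data (an assumption): two well-separated 2-D blobs.
_rng = random.Random(42)
toy_points = [
    (_rng.gauss(cx, 1.0), _rng.gauss(cy, 1.0))
    for cx, cy in [(0.0, 0.0), (10.0, 10.0)]
    for _ in range(20)
]

for k in range(1, 5):
    labels, inertia = best_kmeans(toy_points, k)
    line = f"k={k} inertia={inertia:.1f}"
    if k > 1:  # the silhouette is undefined for a single cluster
        line += f" silhouette={silhouette(toy_points, labels):.3f}"
    print(line)
```

Reading the printed table, the elbow method looks for the k after which inertia stops dropping sharply, while the silhouette measure favours the k with the highest mean score. On real feature matrices a library implementation would normally be preferred over this hand-rolled version.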

DOI: 10.1504/IJITST.2022.125789

International Journal of Internet Technology and Secured Transactions, 2022 Vol.12 No.5, pp.406 - 424

Received: 09 Dec 2020
Accepted: 22 Sep 2021

Published online: 28 Sep 2022