Title: Obfuscated code is identifiable by a token-based code clone detection technique

Authors: Junaid Akram; Danish Vasan; Ping Luo

Addresses: Key Laboratory of Information System Security, School of Software, Tsinghua University, 100084, China; Key Laboratory of Information System Security, School of Software, Tsinghua University, 100084, China; Key Laboratory of Information System Security, School of Software, Tsinghua University, 100084, China

Abstract: Developers, and malware developers in particular, increasingly use obfuscation techniques to make their code difficult to understand or analyse. When an Android application is obfuscated, it is hard to recover the exact source code by reverse engineering. In this paper, we propose an approach based on a clone detection technique that detects obfuscated code in Android applications efficiently. We perform two experiments on different types of datasets, including the source code of obfuscated and non-obfuscated applications. We successfully detected two types of obfuscated code, identifier-renaming and string-encryption, with a high accuracy of 95%. A comparative study with other state-of-the-art tools demonstrates the efficiency of the proposed approach. Experimental results show that our approach is reliable, efficient and can be applied at large scale.
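
The abstract does not describe the detection pipeline, so the following is only a minimal sketch of the general idea behind token-based clone detection, not the authors' implementation. It assumes a regex tokenizer, a small hypothetical keyword set (`KEYWORDS`), and Jaccard similarity over token trigrams; the names `normalize`, `ngrams` and `similarity` are illustrative. It shows why identifier-renaming alone does not hide a clone: once identifiers are mapped to a placeholder token, the renamed code yields the same token sequence as the original. String-encryption is not modelled here.

```python
import re

# Hypothetical, deliberately small keyword set; a real tool would cover the full grammar.
KEYWORDS = {"class", "public", "private", "static", "void", "int", "return",
            "if", "else", "for", "while", "new", "String"}

# String literals, identifiers/keywords, integer literals, then any other single character.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|\S')

def normalize(source):
    """Map identifiers to ID, string literals to STR and integers to NUM."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok.startswith('"'):
            tokens.append("STR")
        elif tok in KEYWORDS:
            tokens.append(tok)          # keywords carry structure, keep them
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")         # renaming an identifier no longer matters
        elif tok.isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)          # operators and punctuation
    return tokens

def ngrams(tokens, n=3):
    """Return the set of consecutive token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def similarity(a, b, n=3):
    """Jaccard similarity between the normalized token n-grams of two snippets."""
    ga, gb = ngrams(normalize(a), n), ngrams(normalize(b), n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)

original = 'int computeTotal(int price, int qty) { return price * qty; }'
renamed = 'int a(int b, int c) { return b * c; }'
unrelated = 'void log(String msg) { if (msg) { print(msg); } }'

print(similarity(original, renamed))
print(similarity(original, unrelated))
```

In this sketch the original and renamed snippets normalize to the same token sequence, so their similarity is 1.0, while the unrelated snippet scores lower. A practical detector would add a proper lexer, indexing for scale, and a tuned similarity threshold.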

Keywords: obfuscation handling; code clones; software security; malware detection; Android applications; code reuse.

DOI: 10.1504/IJICS.2022.127132

International Journal of Information and Computer Security, 2022 Vol.19 No.3/4, pp.254 - 273

Received: 12 Nov 2019
Accepted: 09 Apr 2020

Published online: 23 Nov 2022
