MALWARE CLASSIFICATION OF DECOMPILED WINDOWS EXECUTABLES USING TRANSFER LEARNING TECHNIQUES

Date

2022-05

Authors

Dyussekeyev, Askar

Publisher

Nazarbayev University School of Engineering and Digital Sciences

Abstract

Malicious software is recognized as a threat at both the individual and national levels because it endangers critical infrastructure such as energy systems and communication networks, which are increasingly subject to probes and attacks. The high variability and creativity of malware strategies and anti-detection techniques significantly complicate detection by traditional methods. Machine learning has previously proven effective for malware classification. We propose using transfer learning to bring promising algorithms from other problem domains to malware detection and to test their efficacy there. In this paper, we consider two approaches that apply pre-trained models from the domains of Computer Vision (CV) and Natural Language Processing (NLP) to a malware classification problem. In the NLP approach, the industry-leading decompiler IDA Pro converts binary samples to decompiled code, and the text classification model CodeBERT is then applied to classify malware families. The primary difference between this design and previous work is the introduction of the decompilation stage, which allows techniques from the text processing domain to be applied. In the Computer Vision approach, each word of decompiled code is encoded into a 3-byte form, and the resulting byte sequence is transformed into an RGB image. We then apply fine-tuned variations of a state-of-the-art Computer Vision model. The main contributions of this paper are the introduction of a decompilation stage for Windows binaries and an assessment of current state-of-the-art models from the NLP and CV domains for malware classification. The last section compares the baseline with the techniques implemented during this research. It also discusses the opportunities and limitations of the presented approaches, along with implications and proposals for further research.
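
The word-to-RGB step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a word is mapped to its 3 bytes, so the truncated SHA-256 hash used here is an assumption, as are the tokenizer regex, the near-square image layout, the black padding, and the function names `token_to_rgb` and `code_to_image`. None of these should be read as the thesis's actual encoding.

```python
import hashlib
import math
import re
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]


def token_to_rgb(token: str) -> Pixel:
    """Map one token to 3 bytes (R, G, B).

    Assumption: the first three bytes of a SHA-256 digest stand in for
    whatever 3-byte encoding the thesis actually uses.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return digest[0], digest[1], digest[2]


def code_to_image(code: str, width: Optional[int] = None) -> List[List[Pixel]]:
    """Turn decompiled code into rows of RGB pixels, one pixel per token."""
    # Assumed tokenizer: identifiers/numbers, or single punctuation characters.
    tokens = re.findall(r"\w+|[^\w\s]", code)
    pixels = [token_to_rgb(t) for t in tokens]
    if not pixels:
        return []
    if width is None:
        # Assumed layout: a roughly square image.
        width = math.ceil(math.sqrt(len(pixels)))
    # Pad the final row with black pixels so every row has the same width.
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([(0, 0, 0)] * (width - remainder))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


if __name__ == "__main__":
    image = code_to_image("int main() { return 0; }")
    print(len(image), "rows x", len(image[0]), "columns")
```

The resulting rows of pixels could then be written to an image file or converted to an array and passed to a pre-trained vision model for fine-tuning. A hash-based mapping keeps the sketch free of a vocabulary file, at the cost of occasional collisions between distinct words.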

Keywords

Malware, Transfer Learning Techniques, CodeBERT, IDA Pro, Natural Language Processing, Computer Vision, CV, Research Subject Categories::TECHNOLOGY, Type of access: Gated Access

Citation

Dyussekeyev, A. (2022). Malware Classification of Decompiled Windows Executables using Transfer Learning Techniques (Unpublished master's thesis). Nazarbayev University, Nur-Sultan, Kazakhstan.