MALWARE CLASSIFICATION OF DECOMPILED WINDOWS EXECUTABLES USING TRANSFER LEARNING TECHNIQUES
Date
2022-05
Authors
Dyussekeyev, Askar
Publisher
Nazarbayev University School of Engineering and Digital Sciences
Abstract
Malicious software is recognized as a threat at both the individual and national levels
because critical infrastructures, such as energy systems and communication networks,
are increasingly subject to probes and attacks. The high variability and creativity
of malware strategies and anti-detection techniques significantly complicate malware
detection by traditional methods. Machine learning has previously proven effective
for malware classification. We propose
using transfer learning to employ promising algorithms from other problem domains
and test their efficacy on malware detection. In this
paper, we consider two approaches that utilize pre-trained models from the domains
of Computer Vision (CV) and Natural Language Processing (NLP) and apply them
to a malware classification problem. The industry-leading decompiler IDA Pro is
applied to convert binary samples into decompiled code. Then, the text classification
model CodeBERT is applied to classify various malware families. The primary
difference between this design and previous work is the introduction of a decompilation
stage, which allows techniques from the text-processing domain to be applied. Regarding
the Computer Vision method, each word of the decompiled code is encoded as three bytes,
and the resulting byte sequence is transformed into an RGB image.
Then, we apply fine-tuned variants of a state-of-the-art Computer Vision model.
The main contributions of this paper are the introduction of a decompilation stage
for Windows binaries and an assessment of current state-of-the-art NLP and CV models
for malware classification. The last section compares the baseline with the techniques
implemented during this research. It also discusses the opportunities and limitations
of the presented approaches, their implications, and proposals for further research.
Keywords
Type of access: Gated Access, Natural Language Processing, Computer Vision, Research Subject Categories::TECHNOLOGY, CV, CodeBERT, IDA Pro, Transfer Learning Techniques, Malware
Citation
Dyussekeyev, A. (2022). Malware Classification of Decompiled Windows Executables using Transfer Learning Techniques (Unpublished master's thesis). Nazarbayev University, Nur-Sultan, Kazakhstan