MALWARE CLASSIFICATION OF DECOMPILED WINDOWS EXECUTABLES USING TRANSFER LEARNING TECHNIQUES

dc.contributor.author: Dyussekeyev, Askar
dc.date.accessioned: 2022-06-10T04:47:51Z
dc.date.available: 2022-06-10T04:47:51Z
dc.date.issued: 2022-05
dc.description.abstract: Malicious software is recognized as a threat at both the individual and national levels because it endangers critical infrastructure such as energy systems and communication networks, which are increasingly subject to probes and attacks. The high variability and creativity of malware strategies and anti-detection techniques significantly complicate detection by traditional methods. Machine learning has previously proven to be an effective approach to malware classification. We propose using transfer learning to take promising algorithms from other problem domains and test their efficacy on malware detection. In this paper, we consider two approaches that apply pre-trained models from Computer Vision (CV) and Natural Language Processing (NLP) to a malware classification problem. The industry-leading decompiler IDA Pro is used to convert binary samples into decompiled code. The text classification model CodeBERT is then applied to classify the samples into malware families. The primary difference between this design and previous papers is the introduction of a decompilation stage, which allows techniques from the text processing domain to be applied. For the Computer Vision method, each word of the decompiled code is encoded into a 3-byte form, and the resulting byte sequence is transformed into an RGB image. We then apply fine-tuned variations of the state-of-the-art model for Computer Vision. The main contributions of this paper are introducing the decompilation stage for Windows binaries and assessing current state-of-the-art models from the NLP and CV domains for malware classification. The last section compares the baseline with the techniques implemented during this research. It also discusses the opportunities and limitations of the presented approaches, along with implications and proposals for further research. [en_US]
dc.identifier.citation: Dyussekeyev, A. (2022). Malware Classification of Decompiled Windows Executables using Transfer Learning Techniques (Unpublished master's thesis). Nazarbayev University, Nur-Sultan, Kazakhstan [en_US]
dc.identifier.uri: http://nur.nu.edu.kz/handle/123456789/6202
dc.language.iso: en [en_US]
dc.publisher: Nazarbayev University School of Engineering and Digital Sciences [en_US]
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/us/ [*]
dc.subject: Type of access: Gated Access [en_US]
dc.subject: Natural Language Processing [en_US]
dc.subject: Computer Vision [en_US]
dc.subject: Research Subject Categories::TECHNOLOGY [en_US]
dc.subject: CV [en_US]
dc.subject: CodeBERT [en_US]
dc.subject: IDA Pro [en_US]
dc.subject: Transfer Learning Techniques [en_US]
dc.subject: Malware [en_US]
dc.title: MALWARE CLASSIFICATION OF DECOMPILED WINDOWS EXECUTABLES USING TRANSFER LEARNING TECHNIQUES [en_US]
dc.type: Master's thesis [en_US]
workflow.import.source: science
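
The abstract describes encoding each word of decompiled code into a 3-byte form and turning the resulting byte sequence into an RGB image. The record does not say how the 3 bytes are derived or how the image is laid out, so the following is only a minimal sketch under stated assumptions: the hash-based encoding (first 3 bytes of an MD5 digest), the whitespace tokenization, the near-square layout, and the names `word_to_rgb` and `code_to_image` are all hypothetical and are not taken from the thesis.

```python
import hashlib
import math
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]


def word_to_rgb(word: str) -> Pixel:
    """Map one token to 3 bytes, read as an (R, G, B) pixel.

    ASSUMPTION: the first 3 bytes of the MD5 digest are used. The thesis
    abstract only says "3-byte form"; the actual scheme may differ.
    """
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return digest[0], digest[1], digest[2]


def code_to_image(code: str, width: Optional[int] = None) -> List[List[Pixel]]:
    """Turn decompiled source text into a 2D grid of RGB pixels.

    ASSUMPTION: tokens are split on whitespace and laid out row by row in a
    near-square grid, padded with black pixels at the end.
    """
    pixels = [word_to_rgb(token) for token in code.split()]
    if width is None:
        width = max(1, math.ceil(math.sqrt(len(pixels))))
    height = max(1, math.ceil(len(pixels) / width))
    # Pad the final row so every row has exactly `width` pixels.
    pixels += [(0, 0, 0)] * (width * height - len(pixels))
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    image = code_to_image("int main ( ) { return 0 ; }")
    print(len(image), "rows x", len(image[0]), "columns")
```

A grid like this could then be converted to an array and resized to the input resolution expected by a pre-trained vision model; that step depends on the chosen model and is left out here.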

Files

Original bundle

Name: Thesis - Askar Dyussekeyev.pdf
Size: 1.23 MB
Format: Adobe Portable Document Format
Description: Thesis

Name: Presentation - Askar Dyussekeyev.pptx
Size: 2.98 MB
Format: Microsoft Powerpoint XML
Description: Presentation

License bundle

Name: license.txt
Size: 6.28 KB
Format: Item-specific license agreed upon to submission