Dyussekeyev, Askar
2022-06-10
2022-06-10
2022-05
Dyussekeyev, A. (2022). Malware Classification of Decompiled Windows Executables using Transfer Learning Techniques (Unpublished master's thesis). Nazarbayev University, Nur-Sultan, Kazakhstan
http://nur.nu.edu.kz/handle/123456789/6202
Malicious software is recognized as a threat at both the individual and national levels because it endangers critical infrastructures such as energy systems and communication networks, which are increasingly subject to probes and attacks. The high variability and creativity of malware strategies and anti-detection techniques significantly complicate detection by traditional methods. Machine learning has previously proven effective for malware classification. We propose using transfer learning to employ promising algorithms from other problem domains and test their efficacy for malware detection. In this paper, we consider two approaches that take pre-trained models from the domains of Computer Vision (CV) and Natural Language Processing (NLP) and apply them to a malware classification problem. The industry-leading decompiler IDA Pro is used to convert binary samples to decompiled code. For the NLP approach, the text classification model CodeBERT is then applied to classify various malware families. The primary difference between this design and previous papers is the introduction of a decompilation stage, which allows the application of techniques from the text processing domain. For the Computer Vision approach, each word of decompiled code is encoded into a 3-byte form, and the resulting byte sequence is transformed into an RGB image. We then apply fine-tuned variations of a state-of-the-art Computer Vision model. The main contributions of this paper are introducing the decompilation stage for Windows binaries and assessing current state-of-the-art models from the NLP and CV domains for malware classification.
The last section compares the baseline with the techniques implemented during this research. It also discusses the opportunities and limitations of the presented approaches, along with implications and proposals for further research.
en
Attribution-NonCommercial-ShareAlike 3.0 United States
Type of access: Gated Access
Natural Language Processing
Computer Vision
Research Subject Categories::TECHNOLOGY
CV
CodeBERT
IDA Pro
Transfer Learning Techniques
Malware
MALWARE CLASSIFICATION OF DECOMPILED WINDOWS EXECUTABLES USING TRANSFER LEARNING TECHNIQUES
Master's thesis
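
The abstract states that each word of decompiled code is encoded into a 3-byte form and that the byte sequence is then turned into an RGB image, but it does not say how the 3 bytes are derived. The following is a minimal sketch of one possible reading, not the thesis's actual method. It assumes that a "word" is a whitespace-separated token, that the 3 bytes are the first three bytes of the token's MD5 digest, and that pixels are laid out row by row with black padding; the function names `word_to_rgb` and `code_to_pixels` and the `width` parameter are hypothetical.

```python
import hashlib
import math
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def word_to_rgb(word: str) -> Pixel:
    """Map one token to 3 bytes.

    Assumption: the first three bytes of the MD5 digest are used.
    The source does not specify the actual encoding.
    """
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return digest[0], digest[1], digest[2]


def code_to_pixels(code: str, width: int = 4) -> List[List[Pixel]]:
    """Turn decompiled code into rows of RGB pixels, one pixel per token."""
    # Assumption: tokens are whitespace-separated.
    words = code.split()
    pixels = [word_to_rgb(w) for w in words]
    rows = math.ceil(len(pixels) / width)
    # Pad the last row with black pixels so the grid is rectangular.
    pixels += [(0, 0, 0)] * (rows * width - len(pixels))
    return [pixels[r * width:(r + 1) * width] for r in range(rows)]


# Hypothetical decompiled snippet: 9 tokens, so 3 rows of 4 pixels.
sample = "int main ( ) { return 0 ; }"
grid = code_to_pixels(sample, width=4)
```

A grid built this way could be handed to an image library and resized to the input shape expected by a pre-trained vision model. A hash keeps the mapping deterministic, so the same token always yields the same color, at the cost of occasional collisions between different tokens.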