A Modern Solution for Windows Malware Detection with Static PE Header Features

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Malware continues to pose a significant threat to individuals and businesses of all sizes. Malware analysts have long sought an automated solution for detecting malware and usually resort to machine learning. They train models on features extracted statically and dynamically from portable executable (PE) files and then use those models for malware detection. In this project, we created a dataset comprising 34,055 samples using a low-level tool called "PExtract". The dataset is intended for malware detection and contains DOS Header information, Optional Header information, Section Names, DLLs, and API calls. Using the string features from the dataset, we trained and evaluated 15 models, including Transformers, Naive Bayes, and a neural network model proposed in the state-of-the-art literature. We reproduced API-MalDetect as a representative of the state of the art in order to compare it with our proposal. We found that ModernBERT is the best option for static malware detection on string-type features: in our evaluation it yielded the best results, reaching 98.7% accuracy on our test set. API-MalDetect reached a peak accuracy of 91.55% when trained on Section Names. Naive Bayes models, by contrast, consistently underperformed and are therefore not recommended when working with low-level static features. Our contributions include the development of the PExtract tool and dataset, as well as the fine-tuned Transformer model, ModernBERT, which outperforms current state-of-the-art methods.
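To make the feature groups named in the abstract more concrete, the following is a minimal sketch of reading a few static fields (DOS Header, Optional Header, Section Names) directly from the bytes of a PE file. It is not PExtract and does not reproduce the authors' pipeline; the function name `parse_pe_headers` and the returned dictionary keys are hypothetical, and only the field offsets come from the published PE format.

```python
import struct


def parse_pe_headers(data: bytes) -> dict:
    """Read a handful of static header fields from a PE image held in memory.

    Illustrative only: the function name and output keys are assumptions,
    not part of PExtract or the paper's dataset schema.
    """
    # DOS header: 'MZ' magic at offset 0, e_lfanew (PE signature offset) at 0x3C.
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a DOS/PE image")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    # COFF file header: 20 bytes immediately after the 4-byte signature.
    machine, num_sections, _ts, _symtab, _nsyms, opt_size, _chars = (
        struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    )

    # Optional header: magic at offset 0, AddressOfEntryPoint at offset 16.
    opt_start = e_lfanew + 24
    (opt_magic,) = struct.unpack_from("<H", data, opt_start)
    (entry_point,) = struct.unpack_from("<I", data, opt_start + 16)

    # Section table: 40-byte entries; the first 8 bytes hold the padded name.
    sec_start = opt_start + opt_size
    section_names = []
    for i in range(num_sections):
        raw = data[sec_start + 40 * i: sec_start + 40 * i + 8]
        section_names.append(raw.rstrip(b"\0").decode("ascii", errors="replace"))

    return {
        "e_lfanew": e_lfanew,
        "machine": machine,
        "optional_magic": opt_magic,
        "entry_point": entry_point,
        "section_names": section_names,
    }
```

A string feature of the kind the abstract mentions could then be formed, for example, by joining the section names with spaces (`" ".join(info["section_names"])`) before tokenization; how the authors actually encode these strings is not stated in the abstract.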
Original language: English
Title of host publication: 2025 9th Cyber Security in Networking Conference (CSNet)
Publisher: IEEE
ISBN (Electronic): 9798331575564
ISBN (Print): 9798331575571
Publication status: Published - 16 Dec 2025
Event: 9th Cyber Security in Networking Conference 2025 - Abu Dhabi, United Arab Emirates
Duration: 20 Oct 2025 to 22 Oct 2025

Conference

Conference: 9th Cyber Security in Networking Conference 2025
Abbreviated title: CSNet 2025
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 20/10/25 to 22/10/25

Keywords

  • Accuracy
  • Neural networks
  • Feature extraction
  • Transformers
  • Probabilistic logic
  • Malware
  • Tokenization
  • Bayes methods
  • Proposals
  • Random forests
  • malware detection
  • transformers
  • malware analysis
  • malware dataset
  • neural networks
  • deep learning
