Comparison of Label Encoding and Evidence Counting for Malware Classification

Min Xuan Low, Timothy Tzen Vun Yap*, Wooi King Soo, Hu Ng, Vik Tor Goh, Ji Jian Chin, Thiam Yong Kuek

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Malware is software designed to attack or harm a computer system without the owner's knowledge or permission. Traditional signature-based and behaviour-based malware analysis techniques depend heavily on manual inspection and detection by analysts. Although reliable, this manual approach becomes nearly impossible to sustain against evasion techniques such as polymorphism, metamorphism and obfuscation. This project therefore aims to develop models to detect malware in operation. Two feature engineering techniques are considered, namely label encoding and evidence counting. Feature selection is applied to isolate less significant features. Machine learning models, namely Random Forest (RF), Decision Tree (DT), K-Nearest Neighbour (K-NN) and Support Vector Machine (SVM) classifiers, Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM), are applied to compare the efficacy of the two feature engineering approaches. Model performance is evaluated through accuracy, precision, recall, F1-score and loss. Tree-based models such as RF and DT seem more suitable for mining patterns from labelled data, as their performance is generally better with the label encoding approach. SVM and K-NN tend not to cope well with labelled data in this study. The deep learning approaches in this study have shown potential in malware classification, with further improvements required to build a robust solution for complex real-world malware detection and classification.
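The abstract does not define the two feature engineering techniques, so the following is only a minimal sketch under the usual meaning of the terms: label encoding is assumed to map each distinct categorical value to an integer code, and evidence counting is assumed to represent each sample by how often each observed item occurs in it. The function names and the toy API-call data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def label_encode(values):
    """Map each distinct categorical value to an integer code (sorted order)."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def evidence_counts(samples, vocabulary=None):
    """Turn each sample (a list of observed tokens) into a vector of occurrence counts."""
    if vocabulary is None:
        vocabulary = sorted({tok for sample in samples for tok in sample})
    rows = []
    for sample in samples:
        counts = Counter(sample)  # missing tokens count as 0
        rows.append([counts[tok] for tok in vocabulary])
    return rows, vocabulary


# Hypothetical toy data: API-call names observed in two samples.
calls = ["CreateFile", "WriteFile", "CreateFile", "RegSetValue"]
codes, mapping = label_encode(calls)

samples = [["CreateFile", "WriteFile", "CreateFile"], ["RegSetValue"]]
counts, vocab = evidence_counts(samples)
```

Under these assumptions, label encoding yields one integer per value while evidence counting yields one count per vocabulary item per sample. Integer codes impose an arbitrary ordering on the categories, which split-based models can generally tolerate more easily than distance-based ones; that is one possible reading of the differences the abstract reports, not a claim made by the authors.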

Original language: English
Pages (from-to): 17-30
Number of pages: 14
Journal: Journal of System and Management Sciences
Volume: 12
Issue number: 6
Publication status: Published - Dec 2022

Keywords

  • classification
  • deep learning
  • machine learning
  • malware detection

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Information Systems
  • Computer Science Applications
  • Information Systems and Management
  • Management of Technology and Innovation
