TY - GEN
T1 - Android Malware Detection Using API Calls: A Comparison of Feature Selection and Machine Learning Models
AU - Muzaffar, Ali
AU - Ragab Hassen, Hani
AU - Lones, Michael Adam
AU - Zantout, Hind
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022/2/2
Y1 - 2022/2/2
N2 - Android has become a major target for malware attacks due to its popularity and the ease of distributing applications. According to a recent study, around 11,000 new malware samples appear online on a daily basis. Machine learning approaches have been shown to perform well in detecting malware. In particular, API calls have been found to be one of the best performing features in malware detection. However, due to the functionalities provided by the Android SDK, applications can use many API calls, creating a computational overhead while training machine learning models. In this study, we look at the benefits of using feature selection to reduce this overhead. We consider three different feature selection algorithms (mutual information, variance threshold and Pearson correlation coefficient) when used with five different machine learning models: support vector machines, decision trees, random forests, Naïve Bayes and AdaBoost. We collected a dataset of 40,000 Android applications that used 134,207 different API calls. Our results show that the number of API calls can be reduced by approximately 95%, whilst still being more accurate than when the full API feature set is used. Random forests achieve the best discrimination between malware and benign applications, with an accuracy of 96.1%.
AB - Android has become a major target for malware attacks due to its popularity and the ease of distributing applications. According to a recent study, around 11,000 new malware samples appear online on a daily basis. Machine learning approaches have been shown to perform well in detecting malware. In particular, API calls have been found to be one of the best performing features in malware detection. However, due to the functionalities provided by the Android SDK, applications can use many API calls, creating a computational overhead while training machine learning models. In this study, we look at the benefits of using feature selection to reduce this overhead. We consider three different feature selection algorithms (mutual information, variance threshold and Pearson correlation coefficient) when used with five different machine learning models: support vector machines, decision trees, random forests, Naïve Bayes and AdaBoost. We collected a dataset of 40,000 Android applications that used 134,207 different API calls. Our results show that the number of API calls can be reduced by approximately 95%, whilst still being more accurate than when the full API feature set is used. Random forests achieve the best discrimination between malware and benign applications, with an accuracy of 96.1%.
UR - http://www.scopus.com/inward/record.url?scp=85125229811&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-95918-0_1
DO - 10.1007/978-3-030-95918-0_1
M3 - Conference contribution
SN - 9783030959173
T3 - Lecture Notes in Networks and Systems
SP - 3
EP - 12
BT - Proceedings of the International Conference on Applied CyberSecurity (ACS) 2021
A2 - Ragab Hassen, Hani
A2 - Batatia, Hadj
PB - Springer
ER -