Classification of SARS-CoV-2 and Non-SARS-CoV-2 Using Machine Learning Algorithms

Om Prakash Singh, Marta Vallejo, Ismail M. El-badawy, Ali Aysha, Jagannathan Madhanagopal, Ahmad Athif Mohd Faudzi

Research output: Contribution to journal › Article › peer-review


Due to the continued evolution of the SARS-CoV-2 pandemic, researchers worldwide are working to mitigate and suppress its spread, and to better understand it, by deploying digital signal processing (DSP) and machine learning approaches. This study presents an alignment-free approach to classify SARS-CoV-2 using complementary DNA, which is DNA synthesized from the single-stranded RNA virus. Herein, a total of 1582 samples, with genome sequences of different lengths from different regions, were collected from various data sources and divided into a SARS-CoV-2 group and a non-SARS-CoV-2 group. We extracted eight biomarkers based on three-base periodicity, using DSP techniques, and ranked them using filter-based feature selection. The ranked biomarkers were fed into k-nearest neighbor, support vector machine, decision tree, and random forest classifiers to distinguish SARS-CoV-2 from other coronaviruses. The training dataset was used to evaluate the performance of the classifiers based on accuracy and F-measure via 10-fold cross-validation. Kappa scores were estimated to check the influence of unbalanced data. Further, a 10x10 cross-validation paired t-test was used to test the best model with unseen data. Random forest was selected as the best model, differentiating SARS-CoV-2 from other coronaviruses and a control group with an accuracy of 97.4%, a sensitivity of 96.2%, and a specificity of 98.2% when tested with unseen samples. Moreover, the proposed algorithm was computationally efficient, taking only 0.31 seconds to compute the genome biomarkers, outperforming previous studies.
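The abstract does not specify the eight biomarkers, so the following is only a minimal sketch of the general idea behind three-base periodicity, not the authors' method. It assumes the common binary-indicator (Voss) mapping, in which each base A, C, G, T becomes a 0/1 sequence, and sums the spectral power of the four sequences at a frequency of 1/3 cycles per base. The function name `period3_power` and the normalization by the squared sequence length are illustrative assumptions.

```python
import cmath


def period3_power(seq: str) -> float:
    """Normalized spectral power at frequency 1/3 for a DNA sequence.

    Assumption: binary-indicator (Voss) mapping, one 0/1 sequence per base.
    Characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0

    total = 0.0
    for base in "ACGT":
        # Fourier coefficient of this base's indicator sequence at 1/3 cycles per base.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, b in enumerate(seq)
            if b == base
        )
        total += abs(coeff) ** 2

    # Divide by n^2 so sequences of different lengths are comparable (an assumption).
    return total / (n * n)


if __name__ == "__main__":
    print(period3_power("ATG" * 10))  # strongly period-3 sequence
    print(period3_power("A" * 30))    # no period-3 structure
```

A score of this kind could serve as one numeric feature per genome for a classifier such as random forest; the actual feature set and ranking procedure are described only in the full article.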
Original language: English
Article number: 104650
Journal: Computers in Biology and Medicine
Early online date: 21 Jul 2021
Publication status: E-pub ahead of print - 21 Jul 2021

