Investigating the Stability of SMOTE-Based Oversampling on COVID-19 Data

Jih Soong Tan*, Hui Jia Yee, Ivan Boo, Ian K. T. Tan, Helmi Zakariah

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution


Predictive analytic methods for medical diagnosis can support decision-making on medical treatment, which in turn reduces the need for medical experts’ attention. However, medical diagnosis datasets are often imbalanced, and this negatively impacts the models’ predictive performance. Learning algorithms trained on imbalanced data produce biased results and often over-fit the majority class. The Synthetic Minority Over-sampling Technique (SMOTE) was proposed to deal with this over-fitting challenge. Applying SMOTE requires over-sampling the minority class(es). However, there are only vague guidelines on how much oversampling of the minority class is suitable. Therefore, experiments using SMOTE with different oversampling ratio setups are conducted on a medical diagnosis dataset. It is observed that increasing the oversampling rate reduces accuracy and precision. Both oversampling to a uniform level and excessive oversampling can cause poorer performance. Both recall and precision should be weighed according to their costs when deciding the best oversampling percentage.
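As a rough illustration of what varying the oversampling percentage means, the following is a minimal pure-Python sketch of SMOTE-style interpolation. It is not the authors' implementation: the helper names `smote` and `oversample`, the toy data, the choice of `k`, and the reading of "percentage" as the number of synthetic points relative to the original minority size are all assumptions made for this example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def oversample(minority, percentage, k=5, seed=0):
    """Assumed convention: 100 (%) adds as many synthetic points as originals."""
    n_synthetic = round(len(minority) * percentage / 100)
    return list(minority) + smote(minority, n_synthetic, k=k, seed=seed)


# Toy data invented for illustration: 6 minority points vs 30 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7), (1.3, 1.0)]
majority_size = 30

for pct in (100, 200, 400):
    resampled = oversample(minority, pct)
    # Print the percentage, new minority size, and minority/majority ratio.
    print(pct, len(resampled), round(len(resampled) / majority_size, 2))
```

With this toy data, 400% brings the minority class to the same size as the majority class (the "uniform level" case), while larger percentages would exceed it. Evaluating a classifier's precision and recall at each setting is left out of the sketch.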

Original language: English
Title of host publication: Intelligent Computing. SAI 2023
Editors: Kohei Arai
Number of pages: 11
ISBN (Electronic): 9783031379635
ISBN (Print): 9783031379628
Publication status: Published - 20 Aug 2023
Event: Computing Conference 2023 - London, United Kingdom
Duration: 22 Jun 2023 to 23 Jun 2023

Publication series

Name: Lecture Notes in Networks and Systems
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389


Conference: Computing Conference 2023
Country/Territory: United Kingdom


Keywords

  • Boosting
  • COVID-19
  • Data Pre-processing
  • Data Sampling

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Signal Processing
  • Computer Networks and Communications

