An Evaluation Study on the Predictive Models of Breast Cancer Risk Factor Classification

Wen San Yee, Hu Ng*, Timothy Tzen Vun Yap, Vik Tor Goh, Keng Hong Ng, Dong Theng Cher

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

This research explores and evaluates various predictive models for classifying breast cancer risk factors. First, data acquisition is carried out to obtain three datasets from the Breast Cancer Surveillance Consortium (BCSC). Next, data integration combines the datasets into one. Data preprocessing then cleans the data, feature selection eliminates unrelated attributes, and data resampling addresses class imbalance. Four classifiers, namely Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and Multilayer Perceptron (MLP), are used to classify the risk factors of breast cancer. The four classifiers are trained and tested with 80-20, 70-30, and 60-40 train-test splits. RF performs best, with 82% accuracy at the 80-20 train-test split.
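
The splitting and resampling steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the resampling method, so random oversampling is assumed here purely for illustration, the data are synthetic rather than BCSC records, and the helper names `stratified_split` and `random_oversample` are hypothetical. Resampling only the training portion is a common practice assumed in this sketch, not a detail stated in the abstract.

```python
import random
from collections import Counter


def stratified_split(rows, labels, train_fraction, seed=0):
    """Split (row, label) pairs so each class keeps roughly the same proportion."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    train, test = [], []
    for label, members in by_class.items():
        members = members[:]  # copy so the caller's data is left untouched
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train += [(r, label) for r in members[:cut]]
        test += [(r, label) for r in members[cut:]]
    return train, test


def random_oversample(pairs, seed=0):
    """Duplicate randomly chosen minority-class pairs until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in pairs)
    target = max(counts.values())
    balanced = list(pairs)
    for label, count in counts.items():
        pool = [p for p in pairs if p[1] == label]
        balanced += [rng.choice(pool) for _ in range(target - count)]
    return balanced


# Synthetic, imbalanced toy data (assumption): 90 negatives, 10 positives.
rows = list(range(100))
labels = [0] * 90 + [1] * 10

# The three train-test ratios named in the abstract.
for train_fraction in (0.8, 0.7, 0.6):
    train, test = stratified_split(rows, labels, train_fraction)
    train_balanced = random_oversample(train)  # the test set stays untouched
    class_counts = Counter(label for _, label in train_balanced)
    print(train_fraction, len(train), len(test), dict(class_counts))
```

In a full pipeline, each classifier would be fitted on `train_balanced` and scored on the untouched `test` pairs, so that duplicated minority examples cannot leak into the evaluation.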

Original language: English
Pages (from-to): 129-145
Number of pages: 17
Journal: Journal of Logistics, Informatics and Service Science
Volume: 9
Issue number: 3
Publication status: Published - 2022

Keywords

  • Boruta feature selection
  • breast cancer
  • data resampling
  • logistic regression
  • multilayer perceptron
  • random forest
  • support vector machine

ASJC Scopus subject areas

  • Management Information Systems
  • Information Systems
  • Computer Networks and Communications
  • Information Systems and Management
  • Management of Technology and Innovation
