Abstract
This study explores and evaluates several predictive models for classifying breast cancer risk factors. First, three datasets are acquired from the Breast Cancer Surveillance Consortium (BCSC) and integrated into a single dataset. The data are then preprocessed for cleaning, feature selection is applied to eliminate irrelevant attributes, and data resampling is applied to address class imbalance. Four classifiers, namely Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and Multilayer Perceptron (MLP), are used to classify the risk factors of breast cancer. The classifiers are trained and tested with 80-20, 70-30, and 60-40 train-test splits. RF performs best, with 82% accuracy at the 80-20 train-test split.
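The splitting and resampling steps can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the abstract does not name the resampling method or say whether resampling happens before or after the split, so random oversampling applied to the training portion only is an assumption here, and the toy labels, function names, and seed are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Split indices into train/test while keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def random_oversample(indices, labels, seed=0):
    """Duplicate minority-class indices at random until all classes are equal in size.

    Assumption: random oversampling stands in for the unspecified
    resampling method mentioned in the abstract.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(idxs) for idxs in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    return balanced


# Hypothetical imbalanced toy labels: 900 negatives, 100 positives.
labels = [0] * 900 + [1] * 100

splits = {}
for test_fraction in (0.20, 0.30, 0.40):  # 80-20, 70-30, 60-40
    train_idx, test_idx = stratified_split(labels, test_fraction)
    # Only the training portion is balanced; the test set keeps its
    # original class ratio so that accuracy is measured on realistic data.
    balanced_train = random_oversample(train_idx, labels)
    splits[test_fraction] = (train_idx, test_idx, balanced_train)
    print(test_fraction, len(train_idx), len(test_idx),
          Counter(labels[i] for i in balanced_train))
```

Each of the four classifiers would then be fitted on `balanced_train` and scored on `test_idx` for every split ratio.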
Original language | English |
---|---|
Pages (from-to) | 129-145 |
Number of pages | 17 |
Journal | Journal of Logistics, Informatics and Service Science |
Volume | 9 |
Issue number | 3 |
Publication status | Published - 2022 |
Keywords
- boruta feature selection
- breast cancer
- data resampling
- logistic regression
- multilayer perceptron
- random forest
- support vector machine
ASJC Scopus subject areas
- Management Information Systems
- Information Systems
- Computer Networks and Communications
- Information Systems and Management
- Management of Technology and Innovation