Abstract
Dataset balance plays a key role in the bias that machine learning algorithms exhibit in classification and prediction. The CSE-CIC IDS datasets published in 2017 and 2018 have both attracted considerable scholarly attention in intrusion detection systems research. Recent work published using these datasets indicates that little attention has been paid to their imbalance. This paper explores the degree to which imbalance has been treated and provides a taxonomy of the machine learning approaches developed using these datasets. We surveyed published works related to these datasets, combining qualitative and quantitative analysis to derive the taxonomy. The results confirm that the impact of bias due to imbalanced datasets is rarely addressed. This finding supports further research and development of supervised machine learning techniques that reduce the impact of bias in classification or prediction due to these imbalanced datasets.
| Original language | English |
|---|---|
| Title of host publication | 2020 International Conference on Communications, Signal Processing, and their Applications (ICCSPA) |
| Publisher | IEEE |
| ISBN (Electronic) | 9781728165356 |
| ISBN (Print) | 9781728165363 |
| Publication status | Published - 2 Apr 2021 |
| Event | International Conference on Communications, Signal Processing, and their Applications 2020 - Sharjah, United Arab Emirates. Duration: 16 Mar 2021 → 18 Mar 2021 |
Conference
| Conference | International Conference on Communications, Signal Processing, and their Applications 2020 |
|---|---|
| Abbreviated title | ICCSPA 2020 |
| Country/Territory | United Arab Emirates |
| City | Sharjah |
| Period | 16/03/21 → 18/03/21 |
Keywords
- Measurement
- Machine learning algorithms
- Taxonomy
- Machine learning
- Signal processing
- Predictive models
- Research and Development
- Balance
- Dataset
- Intrusion detection system