Abstract
Support Vector Machines (SVMs) are known to be robust tools for classification and regression in noisy and complex domains, and SVM ensembles have been widely used to improve classification accuracy in complicated pattern recognition tasks. A good example is DNA microarray data for tumor classification, which is usually characterized by small sample sizes, high dimensionality, noise, and large biological variability. In this work we propose an ensemble of SVMs coupled with feature-subset selection methods to alleviate the curse of dimensionality associated with expression-based classification of DNA microarray data, with the aim of achieving stable and reliable results. We compare a single SVM classifier with SVM ensembles that use two different feature-subset selection techniques, namely random selection and k-means clustering, and that combine the base classifiers using either majority vote or SVM fusion. Two real-world datasets are used as benchmarks to evaluate and compare performance. Experimental results show that the ensemble that uses k-means clustering for feature-subset selection, SVM base classifiers, and an SVM combiner achieves the best classification accuracy, and that the choice of feature-subset selection method can have a considerable impact on classification accuracy.
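The ensemble design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, linear kernels, a synthetic dataset standing in for microarray data, disjoint random subsets, and a held-out split for training the SVM combiner. All function names (`random_feature_subsets`, `kmeans_feature_subsets`, `base_outputs`, `majority_vote`) and parameter values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def random_feature_subsets(n_features, n_subsets, rng):
    """Assumption: split the feature indices into disjoint random groups."""
    return np.array_split(rng.permutation(n_features), n_subsets)


def kmeans_feature_subsets(X, n_subsets, seed=0):
    """Cluster the features (columns of X) by their values across samples."""
    km = KMeans(n_clusters=n_subsets, n_init=10, random_state=seed)
    labels = km.fit_predict(X.T)  # one row per feature
    groups = [np.flatnonzero(labels == k) for k in range(n_subsets)]
    return [g for g in groups if g.size > 0]  # drop any empty cluster


def fit_base_svms(X, y, subsets):
    """Train one SVM per feature subset (linear kernel is an assumption)."""
    return [SVC(kernel="linear").fit(X[:, idx], y) for idx in subsets]


def base_outputs(models, subsets, X, method="predict"):
    """One column per base SVM: labels ("predict") or margins ("decision_function")."""
    cols = [getattr(m, method)(X[:, idx]) for m, idx in zip(models, subsets)]
    return np.column_stack(cols)


def majority_vote(votes):
    """Binary 0/1 labels: predict 1 when more than half of the base SVMs vote 1."""
    return (votes.mean(axis=1) > 0.5).astype(int)


# Synthetic stand-in for a high-dimensional, small-sample dataset.
X, y = make_classification(n_samples=150, n_features=200, n_informative=20,
                           random_state=0)
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
# Hold out part of the training data to fit the combiner (an assumption).
X_base, X_fuse, y_base, y_fuse = train_test_split(
    X_tmp, y_tmp, test_size=0.3, random_state=0, stratify=y_tmp)

subsets = kmeans_feature_subsets(X_base, n_subsets=5)
# Alternative: random_feature_subsets(X.shape[1], 5, np.random.default_rng(0))
models = fit_base_svms(X_base, y_base, subsets)

# Combination rule 1: majority vote over the base SVM labels.
vote_pred = majority_vote(base_outputs(models, subsets, X_test))

# Combination rule 2: SVM fusion, a second-level SVM on the base SVM margins.
combiner = SVC(kernel="linear").fit(
    base_outputs(models, subsets, X_fuse, "decision_function"), y_fuse)
fusion_pred = combiner.predict(
    base_outputs(models, subsets, X_test, "decision_function"))

print("majority vote accuracy:", np.mean(vote_pred == y_test))
print("SVM fusion accuracy:", np.mean(fusion_pred == y_test))
```

The accuracies printed for this synthetic data say nothing about the results reported in the paper; the sketch only shows how the two feature-subset selection options and the two combination rules fit together.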
Original language | English |
---|---|
Pages (from-to) | 1-11 |
Number of pages | 11 |
Journal | International Journal of Applied Mathematics and Statistics |
Volume | 28 |
Issue number | 4 |
Publication status | Published - 2012 |
Keywords
- Ensemble classification
- Feature selection
- Feature subsets
- Microarray data
- Support vector machines (SVM)
- SVM fusion
ASJC Scopus subject areas
- Applied Mathematics