Abstract
We describe the concept of feature bias (FB) strategies and compare them with traditional feature selection (FS) for predictive machine learning on a collection of datasets. FS is a common step in many classification and regression tasks. It is necessary because machine learning tools often cannot cope when the data has thousands of attributes. However, the strategy used by FS techniques is essentially binary: each feature is either kept or discarded. It is hoped that most "irrelevant" features are removed prior to the application of machine learning, and that the subsequent machine learning stage will be much faster (since there are fewer features to process) and also more successful (since the features removed by FS are those that seem unimportant for the classification task at hand). However, FS methods typically rely on standard statistical ideas and cannot guarantee that all and only the relevant features remain. A feature bias strategy, on the other hand, is an alternative approach in which no feature is ever entirely removed from consideration. Experimental results reveal that FB can greatly improve upon FS for prediction tasks, particularly on poorly correlated datasets. We propose a tentative guideline for choosing an FS or FB strategy based on a simply calculated measure of the dataset's inherent correlation.
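The contrast between a binary FS decision and a graded FB weighting can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how features are scored or how the bias is applied, nor how inherent correlation is computed. The sketch assumes absolute Pearson correlation with the target as the relevance score and assumes the bias is a sampling probability with a small floor. The names `feature_scores`, `fs_mask`, `fb_weights` and the `floor` parameter are hypothetical.

```python
import numpy as np

def feature_scores(X, y):
    """Assumed relevance score: |Pearson correlation| of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom = np.where(denom == 0, 1.0, denom)  # constant columns end up with score 0
    return np.abs(Xc.T @ yc) / denom

def fs_mask(scores, k):
    """Feature selection: binary mask that keeps only the k highest-scoring features."""
    mask = np.zeros(scores.shape, dtype=bool)
    if k > 0:
        mask[np.argsort(scores)[-k:]] = True
    return mask

def fb_weights(scores, floor=0.01):
    """Feature bias (assumed form): probabilities that favour high scores but are never zero."""
    w = scores + floor
    return w / w.sum()
```

Under these assumptions, `fs_mask` discards every feature outside the top `k`, whereas `fb_weights` leaves each feature with a nonzero chance of being considered by a downstream learner that samples features according to those weights.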
Original language | English |
---|---|
Title of host publication | Proceedings of the 10th IASTED International Conference on Artificial Intelligence and Applications, AIA 2010 |
Pages | 50-57 |
Number of pages | 8 |
Publication status | Published - 2010 |
Event | 10th IASTED International Conference on Artificial Intelligence and Applications - Innsbruck, Austria; Duration: 15 Feb 2010 → 17 Feb 2010 |
Conference
Conference | 10th IASTED International Conference on Artificial Intelligence and Applications |
---|---|
Abbreviated title | AIA 2010 |
Country/Territory | Austria |
City | Innsbruck |
Period | 15/02/10 → 17/02/10 |
Keywords
- Classification
- Feature bias
- Feature selection
- Machine learning
- Prediction tasks
- Proteomics