TY - GEN
T1 - New Feature Splitting Criteria for Co-training Using Genetic Algorithm Optimization
AU - Salaheldin, Ahmed
AU - El Gayar, Neamat
PY - 2010
Y1 - 2010
AB - In real-world applications, often only a small amount of labeled data is available, while unlabeled data is abundant. Therefore, it is important to make use of unlabeled data. Co-training is a popular semi-supervised learning technique that uses a small set of labeled data and enough unlabeled data to create more accurate classification models. A key requirement for successful co-training is to split the features among more than one view. In this paper we propose new splitting criteria based on the confidence of the views and the diversity of the views, and compare them to random and natural splits. We also examine a previously proposed artificial split that maximizes the independence between the views, and propose a mixed criterion for splitting features based on both the confidence and the independence of the views. Genetic algorithms are used to choose the splits that optimize the independence of the views given the class, the confidence of the views in their predictions, and the diversity of the views. We demonstrate that our proposed splitting criteria improve the performance of co-training.
UR - http://www.scopus.com/inward/record.url?scp=77952086656&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-12127-2_3
DO - 10.1007/978-3-642-12127-2_3
M3 - Conference contribution
AN - SCOPUS:77952086656
SN - 9783642121265
T3 - Lecture Notes in Computer Science
SP - 22
EP - 32
BT - Multiple Classifier Systems. MCS 2010
PB - Springer
T2 - 9th International Workshop on Multiple Classifier Systems 2010
Y2 - 7 April 2010 through 9 April 2010
ER -