TY - GEN
T1 - Readable and accurate rulesets with ORGA
AU - Daud, M. N. R.
AU - Corne, David
PY - 2008
Y1 - 2008
N2 - A key task for data mining is to produce accurate and descriptive models. 'Human readable' models are often necessary to enable understanding, potentially leading to further insight, and also inducing trust in the user. Rules or decision trees (if not too numerous or large) are readable, unlike, for example, SVM models. However, descriptiveness and accuracy normally conflict; a challenge is to find algorithms that have both high accuracy and high readability. We introduce ORGA (Optimized Ripper using Genetic Algorithm), which hybridizes evolutionary search with the RIPPER ruleset algorithm. RIPPER is effective at producing accurate and readable rulesets, and we show that ORGA provides significant further improvement. Overall, ORGA outperforms a suitable set of comparative algorithms, including implementations of RIPPER, C4.5 and PART. On a majority of the datasets, ORGA's outperformance of the other algorithms is spectacular, and it is rarely dominated in terms of both accuracy and readability. © 2008 Springer-Verlag Berlin Heidelberg.
AB - A key task for data mining is to produce accurate and descriptive models. 'Human readable' models are often necessary to enable understanding, potentially leading to further insight, and also inducing trust in the user. Rules or decision trees (if not too numerous or large) are readable, unlike, for example, SVM models. However, descriptiveness and accuracy normally conflict; a challenge is to find algorithms that have both high accuracy and high readability. We introduce ORGA (Optimized Ripper using Genetic Algorithm), which hybridizes evolutionary search with the RIPPER ruleset algorithm. RIPPER is effective at producing accurate and readable rulesets, and we show that ORGA provides significant further improvement. Overall, ORGA outperforms a suitable set of comparative algorithms, including implementations of RIPPER, C4.5 and PART. On a majority of the datasets, ORGA's outperformance of the other algorithms is spectacular, and it is rarely dominated in terms of both accuracy and readability. © 2008 Springer-Verlag Berlin Heidelberg.
KW - Data mining
KW - Human readability
KW - Hybrid machine learning
UR - http://www.scopus.com/inward/record.url?scp=56449122790&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-87700-4_86
DO - 10.1007/978-3-540-87700-4_86
M3 - Conference contribution
SN - 3540876995
SN - 9783540876991
VL - 5199 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 869
EP - 878
BT - Parallel Problem Solving from Nature - PPSN X - 10th International Conference, Proceedings
T2 - 10th International Conference on Parallel Problem Solving from Nature
Y2 - 13 September 2008 through 17 September 2008
ER -