TY - JOUR
T1 - Imbalanced learning for insurance using modified loss functions in tree-based models
AU - Hu, Changyue
AU - Quan, Zhiyu
AU - Chong, Wing Fung
N1 - Funding Information:
The authors are grateful to anonymous reviewers for their careful reading and insightful comments. Zhiyu Quan would like to thank the Campus Research Board for the funding support of this research project through the Arnold O. Beckman Research Award (No. RB21058).
Publisher Copyright:
© 2022
PY - 2022/9
Y1 - 2022/9
AB - Tree-based models have gained momentum in insurance claim loss modeling; however, the point mass at zero and the heavy tail of the insurance loss distribution make it challenging to apply conventional methods directly to claim loss modeling. With a simple illustrative dataset, we first demonstrate how the splitting function of the traditional tree-based algorithm fails to cope with a large proportion of zero responses. To address the imbalance issue in such loss modeling, this paper aims to modify the traditional splitting function of Classification and Regression Tree (CART). In particular, we propose two novel modified loss functions, namely, the weighted sum of squared error and the sum of squared Canberra error. These modified loss functions impose a significant penalty on grouping observations with non-zero responses together with those with zero responses during the splitting procedure, and thus significantly enhance their separation. Finally, we examine and compare the predictive performance of the modified tree-based models with that of the traditional model on synthetic datasets that imitate insurance loss. The results show that such modifications lead to substantially different tree structures and improved prediction performance.
KW - Canberra distance
KW - Custom loss
KW - Imbalanced learning
KW - Predictive model of insurance claims
KW - Regression tree
KW - Tree-based algorithms
UR - http://www.scopus.com/inward/record.url?scp=85130154536&partnerID=8YFLogxK
U2 - 10.1016/j.insmatheco.2022.04.010
DO - 10.1016/j.insmatheco.2022.04.010
M3 - Article
AN - SCOPUS:85130154536
SN - 0167-6687
VL - 106
SP - 13
EP - 32
JO - Insurance: Mathematics and Economics
JF - Insurance: Mathematics and Economics
ER -