Accelerating the computation of Shapley effects for datasets with many observations

Giovanni Rabitti*, George Tzougas

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Shapley effects are enjoying increasing popularity as importance measures. These indices allocate the variance of the quantity of interest among the risk factors, so that a risk factor explaining more variance than another is deemed more important. Recently, Vallarino et al. (ASTIN Bull J IAA, 2023. https://doi.org/10.1017/asb.2023.34) proposed a computational strategy for Shapley effects based on cohorts of similar observations. However, this strategy becomes extremely computationally demanding when the dataset contains many observations. In this work we propose a computational shortcut, based on design of experiments and clustering techniques, to reduce the computational time. Using the well-known French claim frequency dataset, we demonstrate a large reduction in computational time without a significant loss of accuracy in the estimation of the Shapley effects.
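To make the variance-allocation idea concrete, the sketch below computes exact Shapley values from a given value function, where the value of a subset of risk factors stands for the variance that subset explains. It is only an illustration: the function names `shapley_effects` and `toy_value` and the toy numbers are assumptions made here, and the code implements neither the cohort-based strategy of Vallarino et al. nor the sampling and clustering shortcut proposed in the article.

```python
from itertools import combinations
from math import factorial


def shapley_effects(d, value):
    """Exact Shapley allocation of value(all factors) among d factors.

    `value` maps a frozenset of factor indices to the variance explained
    by that subset; the empty set should map to 0.
    """
    effects = []
    for i in range(d):
        others = [j for j in range(d) if j != i]
        phi = 0.0
        for size in range(d):
            # Shapley weight for coalitions of this size that exclude factor i
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal gain in explained variance from adding factor i
                phi += weight * (value(s | {i}) - value(s))
        effects.append(phi)
    return effects


# Hypothetical toy value function: additive main contributions plus one
# interaction term that appears only when factors 0 and 1 are both present.
MAIN = [4.0, 2.0, 1.0]
INTERACTION_01 = 1.0


def toy_value(subset):
    v = sum(MAIN[j] for j in subset)
    if 0 in subset and 1 in subset:
        v += INTERACTION_01
    return v


effects = shapley_effects(3, toy_value)
total = toy_value(frozenset(range(3)))
print("Shapley effects:", effects)
print("Sum of effects:", sum(effects), "Total variance:", total)
```

In this toy setting the effects add up to the total explained variance, and the interaction between factors 0 and 1 is split equally between them. The loop visits every subset of factors, and in practice each subset value must itself be estimated from the data, which is where the cost for large datasets arises.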

Original language: English
Journal: European Actuarial Journal
Early online date: 7 Feb 2025
Publication status: E-pub ahead of print - 7 Feb 2025

Keywords

  • Conditional Latin hypercube sampling
  • Hierarchical k-means
  • Large insurance data
  • Latin hypercube sampling
  • Shapley effects

ASJC Scopus subject areas

  • Statistics and Probability
  • Economics and Econometrics
  • Statistics, Probability and Uncertainty
