Pseudo-model-free hedging for variable annuities via deep reinforcement learning

Wing Fung Chong, Haoen Cui, Yuxuan Li*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)
52 Downloads (Pure)

Abstract

This paper proposes a two-phase deep reinforcement learning approach for hedging variable annuity contracts with both GMMB and GMDB riders, which can address model miscalibration in Black-Scholes financial and constant force of mortality actuarial market environments. In the training phase, an infant reinforcement learning agent interacts with a pre-designed training environment, collects sequential anchor-hedging reward signals, and gradually learns how to hedge the contracts. As expected, after a sufficient number of training steps, the trained reinforcement learning agent hedges, in the training environment, as well as the correct Delta while outperforming misspecified Deltas. In the online learning phase, the trained reinforcement learning agent interacts with the market environment in real time, collects single terminal reward signals, and self-revises its hedging strategy. The hedging performance of the further trained reinforcement learning agent is demonstrated via an illustrative example on a rolling basis, revealing the self-revision capability of the hedging strategy under online learning.
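The "sequential anchor-hedging reward signal" of the training phase can be illustrated in a toy form: at each rebalancing step the agent is rewarded for how closely its chosen hedge ratio tracks an anchor Delta computed in the training environment. The sketch below is a hypothetical, deliberately simplified tabular stand-in (not the paper's deep reinforcement learning architecture): a GMMB-like liability is proxied by a European put, the anchor is the Black-Scholes put Delta, and a bandit-style update learns hedge ratios over discretised moneyness buckets. All names (`bs_put_delta`, `state_of`) and parameter values are illustrative assumptions.

```python
import numpy as np
from math import log, sqrt, exp
from statistics import NormalDist

def bs_put_delta(S, K, r, sigma, tau):
    """Black-Scholes Delta of a European put (toy anchor for a GMMB-like payoff)."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return NormalDist().cdf(d1) - 1.0

rng = np.random.default_rng(0)
actions = np.linspace(-1.0, 0.0, 21)    # candidate hedge ratios (put Deltas lie in [-1, 0])
n_states = 20                           # discretised log-moneyness buckets
Q = np.zeros((n_states, len(actions)))  # tabular action values (toy stand-in for a deep net)

S0, K, r, sigma, T, n_steps = 100.0, 100.0, 0.02, 0.2, 1.0, 50
dt = T / n_steps

def state_of(S):
    # Map log-moneyness into a bucket index in [0, n_states).
    return int(np.clip((log(S / K) + 0.5) * n_states, 0, n_states - 1))

eps, lr = 0.1, 0.2                      # epsilon-greedy exploration, learning rate
for episode in range(2000):
    S = S0
    for k in range(n_steps):
        tau = T - k * dt
        s = state_of(S)
        a = rng.integers(len(actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        anchor = bs_put_delta(S, K, r, sigma, tau)
        reward = -(actions[a] - anchor) ** 2   # sequential anchor-hedging reward signal
        Q[s, a] += lr * (reward - Q[s, a])     # incremental average-reward update
        # Simulate the training environment's risky asset under Black-Scholes dynamics.
        S *= exp((r - 0.5 * sigma**2) * dt + sigma * sqrt(dt) * rng.standard_normal())

# After training, the greedy hedge ratio at-the-money should sit in the put-Delta range.
learned = actions[int(np.argmax(Q[state_of(S0)]))]
true_delta = bs_put_delta(S0, K, r, sigma, T)
```

The online learning phase described in the abstract would instead update the agent from a single terminal reward observed at contract maturity in the real market environment; that phase is not sketched here, since its reward design is specific to the paper's setup.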

Original language: English
Pages (from-to): 503-546
Number of pages: 44
Journal: Annals of Actuarial Science
Volume: 17
Issue number: 3
Early online date: 14 Mar 2023
DOIs
Publication status: Published - Nov 2023

Keywords

  • Hedging strategy self-revision
  • Online learning phase
  • Sequential anchor-hedging reward signals
  • Single terminal reward signals
  • Training phase
  • Two-phase deep reinforcement learning
  • Variable annuities hedging

ASJC Scopus subject areas

  • Statistics and Probability
  • Economics and Econometrics
  • Statistics, Probability and Uncertainty
