Planning problems with a first-order structure can be modelled compactly with Relational Markov Decision Processes (RMDPs). If the model is unknown, value-based reinforcement learning methods can be used to solve these problems. The action-value function is then approximated with features that are conjunctions of ground state fluents. However, this approximation does not exploit the first-order structure of RMDPs, and the resulting policy can only solve a single ground MDP of the RMDP. Our objective is to learn a generalised function approximation that induces a policy able to solve multiple ground MDPs. We achieve this by using conjunctions of lifted state fluents as first-order features. This first-order approximation generalises better but has a coarser granularity, which can worsen performance. We therefore propose combining first-order features with ground features to obtain the strengths of both. Empirical results on four domains show that our method can generalise over problems regardless of their scale and enables transfer learning.
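To make the difference between ground and first-order (lifted) conjunctive features concrete, here is a minimal Python sketch of a linear action-value function that sums both kinds of feature. It is not the authors' implementation. The tuple encoding of fluents, the `?`-prefixed variables, the convention of binding an action's arguments to hypothetical variables `?p0`, `?p1`, ..., the helper names (`ground_feature`, `lifted_feature`, `q_value`) and all weights are illustrative assumptions. In this sketch a lifted feature fires if some substitution of objects for its free variables makes every fluent in the conjunction true.

```python
from itertools import product

# Hypothetical encoding (assumption): a fluent is a tuple such as ("on", "a", "b");
# terms starting with "?" are variables; a state is a frozenset of ground fluents.


def is_var(term):
    return term.startswith("?")


def ground_feature(conj, state):
    """Conjunctive ground feature: 1.0 iff every ground fluent holds in the state."""
    return 1.0 if all(f in state for f in conj) else 0.0


def lifted_feature(conj, state, binding, objects):
    """Conjunctive lifted feature: 1.0 iff some substitution of objects for the
    still-unbound variables makes every fluent in the conjunction true."""
    free = sorted({t for f in conj for t in f[1:] if is_var(t) and t not in binding})
    for values in product(objects, repeat=len(free)):
        sub = {**binding, **dict(zip(free, values))}
        if all((f[0], *(sub.get(t, t) for t in f[1:])) in state for f in conj):
            return 1.0
    return 0.0


def q_value(state, action, objects, ground_w, lifted_w):
    """Linear Q(s, a): weighted sum of ground and lifted conjunctive features."""
    name, *args = action
    # Assumption: the action's arguments are bound to variables ?p0, ?p1, ...
    binding = {f"?p{i}": arg for i, arg in enumerate(args)}
    q = 0.0
    # Ground features are keyed by one ground action, i.e. one specific problem.
    for conj, w in ground_w.get(action, {}).items():
        q += w * ground_feature(conj, state)
    # Lifted features are keyed by the action schema, so they apply to any objects.
    for conj, w in lifted_w.get(name, {}).items():
        q += w * lifted_feature(conj, state, binding, objects)
    return q


# Toy blocks-world-style weights (illustrative values, not learned).
ground_w = {("unstack", "a", "b"): {frozenset({("on", "a", "b"), ("clear", "a")}): 0.5}}
lifted_w = {
    "unstack": {
        frozenset({("on", "?p0", "?p1"), ("clear", "?p0")}): 1.0,
        frozenset({("ontable", "?z")}): 0.25,
    }
}

state1 = frozenset({("on", "a", "b"), ("clear", "a"), ("ontable", "b")})
print(q_value(state1, ("unstack", "a", "b"), ["a", "b", "c"], ground_w, lifted_w))  # 1.75

# A different problem with new objects: only the lifted part carries over.
state2 = frozenset({("on", "d", "e"), ("clear", "d"), ("ontable", "e")})
print(q_value(state2, ("unstack", "d", "e"), ["d", "e"], ground_w, lifted_w))  # 1.25
```

Because the lifted weights are keyed by the action schema and mention no specific objects, they still contribute in the second problem with the new objects `d` and `e`, whereas the ground weights, tied to `unstack(a, b)`, contribute nothing there. In toy form, this is why lifted features generalise across problems while ground features can make finer distinctions within one problem.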
|Publication status||Published - 5 Aug 2021|
|Event||31st International Conference on Automated Planning and Scheduling: Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning - Online, Guangzhou, China|
|Duration||2 Jun 2021 → 13 Jun 2021|
|Conference||31st International Conference on Automated Planning and Scheduling|
|Abbreviated title||ICAPS 2021|