The True Online Continuous Learning Automation (TOCLA) in a continuous control benchmarking of actor-critic algorithms

Gordon William Frost, Marta Vallejo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Reinforcement learning problems are often discretised, use linear function approximation, or perform batch updates. However, many applications that can benefit from reinforcement learning contain continuous variables and are inherently non-linear, for example, the control of aerospace or maritime robotic vehicles. Recent work has brought focus onto online temporal difference methods, specifically for using non-linear function approximation. In this paper, we evaluate the Forward Actor-Critic against the regular Actor-Critic and Continuous Actor-Critic Learning Automation. We also propose and evaluate a new algorithm called True Online Continuous Learning Automation (TOCLA) which combines these two approaches. The chosen benchmark problem was the MountainCarContinuous-v0 environment from OpenAI Gym, which represents a further step in complexity over the benchmark used to test the Forward Actor-Critic in previous works. Our results demonstrate the superiority of TOCLA in terms of its sensitivity to hyper-parameter selection compared with the Forward Actor-Critic, Continuous Actor-Critic Learning Automation, and Actor-Critic algorithms.
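
For readers unfamiliar with the family of methods being compared, the following is a minimal sketch of a single CACLA-style update step, assuming a linear critic and a linear actor over a hand-supplied feature vector. It is not the TOCLA algorithm proposed in the paper, whose update rule is not given in this abstract. The function name `cacla_step`, the helper `dot`, and all step sizes are hypothetical choices made for illustration only.

```python
from typing import List, Tuple


def dot(weights: List[float], feats: List[float]) -> float:
    """Inner product of a weight vector and a feature vector."""
    return sum(w * x for w, x in zip(weights, feats))


def cacla_step(
    critic_w: List[float],
    actor_w: List[float],
    feats: List[float],
    action: float,
    reward: float,
    next_feats: List[float],
    done: bool,
    gamma: float = 0.99,    # discount factor (assumed value)
    alpha_v: float = 0.1,   # critic step size (assumed value)
    alpha_a: float = 0.1,   # actor step size (assumed value)
) -> Tuple[List[float], List[float], float]:
    """One CACLA-style update with linear function approximation (illustrative)."""
    value = dot(critic_w, feats)
    next_value = 0.0 if done else dot(critic_w, next_feats)

    # One-step temporal difference error.
    td_error = reward + gamma * next_value - value

    # Critic: ordinary TD(0) update of the state-value weights.
    new_critic_w = [w + alpha_v * td_error * x for w, x in zip(critic_w, feats)]

    # Actor: moved toward the action actually taken, but only when the
    # TD error is positive, i.e. the action did better than expected.
    new_actor_w = list(actor_w)
    if td_error > 0:
        actor_out = dot(actor_w, feats)
        new_actor_w = [
            w + alpha_a * (action - actor_out) * x for w, x in zip(actor_w, feats)
        ]

    return new_critic_w, new_actor_w, td_error
```

The defining design choice in this sketch is that the actor update depends only on the sign of the TD error, not its magnitude, which is what distinguishes the CACLA family from a regular actor-critic policy-gradient step.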
Original language: English
Title of host publication: 2020 IEEE Symposium Series on Computational Intelligence: Adaptive Dynamic Programming and Reinforcement Learning (IEEE ADPRL)
Publisher: IEEE
Publication status: Accepted/In press - 18 Sep 2020
Event: 2020 IEEE Symposium Series on Computational Intelligence - Canberra, Australia
Duration: 1 Dec 2020 to 4 Dec 2020
Conference number: 46
http://www.ieeessci2020.org/

Conference

Conference: 2020 IEEE Symposium Series on Computational Intelligence
Abbreviated title: SSCI 2020
Country: Australia
City: Canberra
Period: 1/12/20 to 4/12/20

Keywords

  • Reinforcement learning
  • TOCLA
  • Actor-Critic
  • Nonlinear Function Approximation
