Abstract
Reinforcement learning problems are often discretised, use linear function approximation, or perform batch updates. However, many applications that could benefit from reinforcement learning, such as the control of aerospace or maritime robotic vehicles, involve continuous variables and are inherently non-linear. Recent work has brought renewed focus to online temporal-difference methods, specifically those using non-linear function approximation. In this paper, we evaluate the Forward Actor-Critic against the regular Actor-Critic and Continuous Actor-Critic Learning Automation. We also propose and evaluate a new algorithm, True Online Continuous Learning Automation (TOCLA), which combines these two approaches. The chosen benchmark problem was the MountainCarContinuous-v0 environment from OpenAI Gym, which represents a further step in complexity over the benchmark used to test the Forward Actor-Critic in previous works. Our results demonstrate that TOCLA is less sensitive to hyper-parameter selection than the Forward Actor-Critic, Continuous Actor-Critic Learning Automation, and Actor-Critic algorithms.
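To make the family of methods under comparison concrete, the sketch below shows a minimal one-step actor-critic update on a toy one-dimensional continuous-state, continuous-action task. This is an illustrative baseline only, not the paper's TOCLA algorithm: TOCLA additionally uses true-online eligibility traces and non-linear function approximation, whereas this sketch uses hand-picked polynomial features, a Gaussian policy, and toy dynamics, all of which are assumptions made here for brevity.

```python
import math
import random

random.seed(0)

def features(s):
    # Simple polynomial features of the state (an assumption for this sketch;
    # the paper uses non-linear function approximation instead).
    return [1.0, s, s * s]

w = [0.0, 0.0, 0.0]        # critic weights: V(s) = w . phi(s)
theta = [0.0, 0.0, 0.0]    # actor weights: mean of a Gaussian policy
sigma = 0.5                # fixed exploration noise
alpha_v, alpha_pi, gamma = 0.1, 0.01, 0.99

def v(s):
    return sum(wi * fi for wi, fi in zip(w, features(s)))

s = 0.0
for step in range(1000):
    phi = features(s)
    mu = sum(ti * fi for ti, fi in zip(theta, phi))
    a = random.gauss(mu, sigma)                 # sample a continuous action
    # Toy dynamics: the state drifts with the action; reward is highest
    # when the state is near 1.0 (a stand-in for a real benchmark).
    s_next = max(-1.0, min(1.0, s + 0.1 * a))
    r = -abs(1.0 - s_next)
    delta = r + gamma * v(s_next) - v(s)        # one-step TD error
    for i in range(3):
        w[i] += alpha_v * delta * phi[i]        # critic: TD(0) update
        # Actor: Gaussian-policy log-likelihood gradient w.r.t. the mean,
        # scaled by the TD error.
        theta[i] += alpha_pi * delta * (a - mu) / (sigma ** 2) * phi[i]
    s = s_next
```

The same critic/actor split carries over to the algorithms compared in the paper; they differ in how the TD error is formed (e.g. forward-view targets) and in how updates are propagated online through eligibility traces.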
Original language | English |
---|---|
Title of host publication | 2020 IEEE Symposium Series on Computational Intelligence: Adaptive Dynamic Programming and Reinforcement Learning (IEEE ADPRL) |
Publisher | IEEE |
Publication status | Accepted/In press - 18 Sep 2020 |
Event | 2020 IEEE Symposium Series on Computational Intelligence - Canberra, Australia Duration: 1 Dec 2020 → 4 Dec 2020 Conference number: 46 http://www.ieeessci2020.org/ |
Conference
Conference | 2020 IEEE Symposium Series on Computational Intelligence |
---|---|
Abbreviated title | SSCI 2020 |
Country | Australia |
City | Canberra |
Period | 1/12/20 → 4/12/20 |
Internet address | http://www.ieeessci2020.org/ |
Keywords
- Reinforcement learning
- TOCLA
- Actor-Critic
- Nonlinear Function Approximation