Evaluation of Q-learning for search and inspect missions using underwater vehicles

Gordon William Frost, David Michael Lane

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)


An application of offline Reinforcement Learning in the underwater domain is proposed. We present and evaluate the integration of the Q-learning algorithm into an Autonomous Underwater Vehicle (AUV) for learning the action-value function in simulation. Three experiments are presented. The first compares two search policies, ε-least-visited and random action, with respect to convergence time. The second examines the effect of the discount factor, γ (gamma), on the convergence time of the ε-least-visited search policy. The final experiment validates the use of a policy learnt offline on a real AUV. The learning phase occurs offline within a continuous simulation environment that has been discretised into a grid-world learning problem. The results show the system's convergence to a globally optimal solution whilst following both sub-optimal search policies during simulation. After a discussion of our results, future work is introduced to enable the system to be used in a real-world application. The results presented therefore form the basis for a future comparative analysis of the necessary improvements, such as function approximation of the state space.
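The kind of set-up the abstract describes can be sketched in a few lines: tabular Q-learning on a discretised grid world, comparing a random-action search policy against an ε-least-visited one. The grid size, goal cell, reward, and the α, γ, ε, and episode values below are illustrative assumptions, not values from the paper.

```python
import random

# Hypothetical 5x5 grid-world stand-in for the discretised survey area.
N = 5
GOAL = (N - 1, N - 1)                         # hypothetical inspection target cell
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # four compass moves

def step(state, action):
    """Deterministic transition: walls reflect; reaching GOAL pays 1 and ends."""
    x = min(max(state[0] + action[0], 0), N - 1)
    y = min(max(state[1] + action[1], 0), N - 1)
    nxt = (x, y)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def random_action(Q, visits, s, rng):
    """'Random action' search policy: uniform over actions."""
    return rng.randrange(len(ACTIONS))

def least_visited(Q, visits, s, rng, eps=0.1):
    """'Epsilon-least-visited' search policy: with probability 1 - eps,
    take the action tried fewest times in this state."""
    if rng.random() < eps:
        return rng.randrange(len(ACTIONS))
    return min(range(len(ACTIONS)), key=lambda a: visits[(s, a)])

def q_learning(select, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Offline learning loop: run simulated episodes and update the Q table."""
    rng = random.Random(seed)
    states = [(x, y) for x in range(N) for y in range(N)]
    Q = {(s, a): 0.0 for s in states for a in range(len(ACTIONS))}
    visits = {(s, a): 0 for s in states for a in range(len(ACTIONS))}
    for _ in range(episodes):
        s, done = (0, 0), False
        while not done:
            a = select(Q, visits, s, rng)
            visits[(s, a)] += 1
            s2, r, done = step(s, ACTIONS[a])
            target = r if done else r + gamma * max(
                Q[(s2, b)] for b in range(len(ACTIONS)))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

def greedy_path(Q, start=(0, 0), limit=4 * N * N):
    """Follow the learnt greedy policy, as the vehicle would at deployment."""
    s, path = start, [start]
    for _ in range(limit):
        if s == GOAL:
            break
        a = max(range(len(ACTIONS)), key=lambda b: Q[(s, b)])
        s, _, _ = step(s, ACTIONS[a])
        path.append(s)
    return path
```

In this toy setting both search policies eventually learn the same optimal greedy policy; what differs, as the abstract's first two experiments measure, is how many episodes each needs and how that cost varies with γ.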
Original language: English
Title of host publication: IEEE Oceans 2014 St John's
ISBN (Print): 978-1-4799-4920-5
Publication status: Published - 14 Sept 2014

