Deep Coherent Exploration for Continuous Control

Open Access
Publication date 2021
Journal Proceedings of Machine Learning Research
Event 38th International Conference on Machine Learning
Volume 139
Pages 12567-12577
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
In policy search methods for reinforcement learning (RL), exploration is often performed by injecting noise either in action space at each step independently or in parameter space over each full trajectory. Prior work has shown that, with linear policies, a more balanced trade-off between these two exploration strategies is beneficial. However, that method did not scale to policies using deep neural networks. In this paper, we introduce deep coherent exploration, a general and scalable exploration framework for deep RL algorithms for continuous control that generalizes step-based and trajectory-based exploration. This framework models the last-layer parameters of the policy network as latent variables and uses a recursive inference step within the policy update to handle these latent variables in a scalable manner. We find that deep coherent exploration improves the speed and stability of learning for A2C, PPO, and SAC on several continuous control tasks.
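The abstract describes perturbing the policy network's last-layer parameters with noise that is temporally coherent across steps, interpolating between per-step and per-trajectory exploration. A minimal sketch of that idea as an AR(1) Gaussian Markov chain over a noise vector (the function name, interface, and the coherence parameter `beta` are illustrative assumptions, not the paper's exact formulation, and the paper's recursive inference step is not shown):

```python
import numpy as np

def coherent_last_layer_noise(num_steps, dim, beta, sigma=1.0, rng=None):
    """Sample a temporally coherent sequence of parameter perturbations.

    beta = 0 draws fresh noise every step (step-based exploration);
    beta = 1 keeps a single draw for the whole trajectory (trajectory-based);
    intermediate beta yields correlated noise whose marginal at every step
    stays N(0, sigma^2), since beta^2 + (1 - beta^2) = 1.
    """
    rng = np.random.default_rng(rng)
    noise = np.empty((num_steps, dim))
    noise[0] = sigma * rng.standard_normal(dim)
    for t in range(1, num_steps):
        fresh = sigma * rng.standard_normal(dim)
        noise[t] = beta * noise[t - 1] + np.sqrt(1.0 - beta ** 2) * fresh
    return noise
```

At each environment step, such a sample would be added to the last-layer weights before computing the action, so consecutive actions share a common, slowly varying perturbation rather than independent jitter.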
Document type Article
Note International Conference on Machine Learning, 18-24 July 2021, Virtual. - With supplementary file.
Language English
Published at https://proceedings.mlr.press/v139/zhang21t.html