Postponed updates for temporal-difference reinforcement learning

Authors	H. van Seijen, S. Whiteson
Publication date	2009
Book title	2009 9th International Conference on Intelligent Systems Design and Applications (ISDA 2009): Pisa, Italy, 30 November-2 December 2009
ISBN	9780769538723
Event	9th International Conference on Intelligent Systems Design and Applications (ISDA 2009), Pisa, Italy
Pages (from-to)	665-672
Publisher	Piscataway, NJ: IEEE
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	This paper presents postponed updates, a new strategy for TD methods that can improve sample efficiency without incurring the computational and space requirements of model-based RL. By recording the agent's last-visit experience, the agent can delay its update until the given state is revisited, thereby improving the quality of the update. Experimental results demonstrate that postponed updates outperform several competitors, most notably eligibility traces, a traditional way to improve the sample efficiency of TD methods. They achieve this without the need to tune an extra parameter, as eligibility traces require.
Document type	Conference contribution
Published at	http://doi.ieeecomputersociety.org/10.1109/ISDA.2009.76