Reinforcement learning and planning for autonomous agent navigation: with a focus on sparse reward settings

Open Access
Authors
Supervisors
Co-supervisors
Award date 15-03-2024
Number of pages 148
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Being able to navigate our surroundings enables us humans to interact freely with our environment, and it is therefore an important skill for truly autonomous technical systems as well. The machine learning paradigm of reinforcement learning (RL) makes it possible to learn (neural network) policies for decision making through continuous interaction with the environment. However, if the rewards received as feedback are sparse, improving the policy becomes difficult and inefficient. This thesis therefore focuses on improving policy learning under sparse rewards for autonomous agents tasked with reaching dedicated goal locations.
First, we present a novel spatial gradient (SG) strategy to select starting states at the boundary of the agents’ capabilities, which results in a curriculum that improves learning progress.
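The thesis's spatial gradient (SG) strategy itself is not detailed in this abstract; as a rough illustration of the general idea of selecting start states at the boundary of the agent's capabilities, the following sketch picks grid states whose estimated success rate is intermediate (neither trivially solved nor hopeless). The function name, the grid representation, and the thresholds are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def select_frontier_starts(success_rate, low=0.1, high=0.9, n_starts=4, rng=None):
    """Pick start states whose estimated success rate lies at the boundary
    of the agent's capabilities (neither trivial nor hopeless).

    success_rate: 2D array mapping each grid state to an estimated
    probability of reaching the goal under the current policy.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Frontier states: the agent sometimes succeeds, sometimes fails.
    states = np.argwhere((success_rate > low) & (success_rate < high))
    if len(states) == 0:
        # Fall back to uniform sampling over all states.
        states = np.argwhere(np.ones_like(success_rate, dtype=bool))
    idx = rng.choice(len(states), size=min(n_starts, len(states)), replace=False)
    return [tuple(s) for s in states[idx]]
```

As the policy improves, the success-rate estimates shift, so the sampled start states move outward, which is what turns frontier sampling into a curriculum.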
Afterwards, we combine planning over abstract sub-goals with reinforcement learning to obtain policies to reach these sub-goals. The resulting sub-tasks make policy learning easier.
We first present our hierarchical VI-RL policy architecture that utilizes a learned transition model for planning, which captures agent capabilities and enables generalization.
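For context on the planning component, here is a minimal tabular value iteration sketch: in the VI-RL setting the transition matrices would come from the learned model rather than being given, and the state space would consist of abstract sub-goals. The array shapes and parameter names are illustrative assumptions.

```python
import numpy as np

def value_iteration(P, R, gamma=0.5, iters=100):
    """Tabular value iteration over sub-goal states.

    P: (A, S, S) array of transition probabilities per action
       (in VI-RL this would be produced by a learned transition model).
    R: (S,) reward vector.
    Returns the (S,) value function used to pick the next sub-goal.
    """
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        # Bellman optimality backup: best action under the current values.
        V = R + gamma * np.max([P[a] @ V for a in range(P.shape[0])], axis=0)
    return V
```

Because the values are computed by planning over the (learned) model rather than memorized per task, re-running the backup with a new goal's reward vector generalizes to new goals without retraining.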
Subsequently, we improve the efficiency and performance of sub-goal planning by learning to locally refine simple shortest-path plans based on detailed local state information. Our proposed RL-trained Value Refinement Network (VRN) architecture additionally enables navigation in dynamic environments without repeated global re-planning.
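The "simple shortest path plan" that serves as the starting point for refinement can be made concrete with a BFS distance map on a grid; the learned VRN would then adjust such a coarse value map locally around the agent. This sketch covers only the coarse plan and a greedy sub-goal step, with illustrative names and a grid world assumed for simplicity.

```python
from collections import deque
import numpy as np

def shortest_path_values(grid, goal):
    """BFS distance-to-goal on a grid (0 = free, 1 = wall): the coarse
    plan that a learned refinement step could adjust locally."""
    dist = np.full(grid.shape, np.inf)
    dist[goal] = 0
    q = deque([goal])
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < grid.shape[0] and 0 <= nc < grid.shape[1]
                    and grid[nr, nc] == 0 and dist[nr, nc] == np.inf):
                dist[nr, nc] = dist[r, c] + 1
                q.append((nr, nc))
    return dist

def next_subgoal(dist, state):
    """Greedy step: move to the in-bounds neighbor with the lowest distance."""
    r, c = state
    nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < dist.shape[0] and 0 <= c + dc < dist.shape[1]]
    return min(nbrs, key=lambda s: dist[s])
```

In a dynamic environment, a purely global plan like this would have to be recomputed whenever obstacles move; refining the value map only in the agent's local neighborhood avoids that repeated global re-planning.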
Finally, we address the practically relevant setting where continuous environment interaction is not possible. Our HORIBLe-VRN algorithm learns our hierarchical planning-based policies from pre-collected data, incorporating latent sub-goal inference as well as offline RL to improve over sub-optimal demonstrations.
Document type PhD thesis
Language English