Reinforcement learning and planning for autonomous agent navigation: with a focus on sparse reward settings
| Authors | |
|---|---|
| Supervisors | |
| Co-supervisors | |
| Award date | 15-03-2024 |
| Number of pages | 148 |
| Organisations | |

**Abstract**
Being able to navigate our surroundings lets us humans interact freely with our environment, and it is therefore an important skill for truly autonomous technical systems as well. The machine learning paradigm of reinforcement learning (RL) enables learning (neural network) policies for decision making through continuous interaction with the environment. However, if the rewards received as feedback are sparse, improving the policy becomes difficult and inefficient. This thesis therefore focuses on improving policy learning under sparse rewards for autonomous agents tasked with reaching dedicated goal locations.
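To make the sparse-reward difficulty concrete, here is a minimal illustrative sketch (the function name, goal, and threshold are our own placeholders, not from the thesis): the agent receives a binary reward only upon reaching the goal, so a randomly exploring policy almost never sees a learning signal.

```python
import numpy as np

def sparse_reward(state, goal, threshold=0.5):
    """Binary sparse reward: 1.0 only when the agent is within
    `threshold` of the goal, 0.0 everywhere else (illustrative sketch)."""
    dist = np.linalg.norm(np.asarray(state, dtype=float) - np.asarray(goal, dtype=float))
    return 1.0 if dist < threshold else 0.0

# A random policy in a 10x10 world rarely stumbles onto the goal region,
# so nearly every sampled transition yields zero reward.
rng = np.random.default_rng(0)
rewards = [sparse_reward(rng.uniform(0, 10, size=2), goal=(9.0, 9.0))
           for _ in range(1000)]
```

With feedback this rare, gradient-based policy improvement stalls, which motivates the curriculum and hierarchical approaches summarised below.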
First, we present a novel spatial gradient (SG) strategy that selects starting states at the boundary of the agent's capabilities, yielding a curriculum that improves learning progress. We then combine planning over abstract sub-goals with reinforcement learning of policies that reach these sub-goals; the resulting sub-tasks make policy learning easier. Our hierarchical VI-RL policy architecture uses a learned transition model for planning, which captures agent capabilities and enables generalization. Subsequently, we improve the efficiency and performance of sub-goal planning by learning to locally refine simple shortest-path plans based on detailed local state information. The proposed RL-trained Value Refinement Network (VRN) architecture additionally enables navigation in dynamic environments without repeated global re-planning. Finally, we address the practically relevant setting in which continuous environment interaction is not possible: our HORIBLe-VRN algorithm makes it possible to learn our hierarchical planning-based policies from pre-collected data, incorporating latent sub-goal inference as well as offline RL to improve over sub-optimal demonstrations.
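The starting-state curriculum idea can be sketched as follows: train from start states where the agent's estimated success probability is intermediate, i.e. neither trivially solved nor hopeless. The success estimates, thresholds, and function names below are illustrative placeholders, not the thesis's actual SG criterion.

```python
import random

def select_start_states(candidates, success_rate, low=0.1, high=0.9, k=4):
    """Keep candidate start states whose estimated success rate lies at the
    boundary of the agent's capabilities (neither too easy nor too hard),
    then sample a training batch from them (illustrative sketch).

    `success_rate` maps a state to an empirical success estimate in [0, 1].
    """
    frontier = [s for s in candidates if low <= success_rate(s) <= high]
    if not frontier:  # fall back to all candidates if the band is empty
        frontier = list(candidates)
    return random.sample(frontier, min(k, len(frontier)))

# Toy example: states further from the goal are harder (lower success rate).
states = list(range(10))                   # state 0 is adjacent to the goal
rate = lambda s: max(0.0, 1.0 - 0.15 * s)  # hypothetical empirical estimates
batch = select_start_states(states, rate)
```

As the policy improves, the empirical success estimates shift, so the sampled frontier automatically moves toward harder start states, giving the curriculum effect described above.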
| Document type | PhD thesis |
| Language | English |
