Representation learning for sparse point cloud perception
| Authors | |
|---|---|
| Supervisors | |
| Cosupervisors | |
| Award date | 06-02-2026 |
| Number of pages | 112 |
| Organisations | |
| Document type | PhD thesis |
| Language | English |

Abstract
3D perception is a cornerstone of modern machine intelligence, underpinning critical applications such as autonomous driving, robotics, and spatial understanding. Among 3D data representations, point clouds offer a direct and geometry-rich description of the physical world, yet their inherent sparsity, non-uniform sampling, and high annotation cost impose fundamental limitations on perception performance. These challenges are further exacerbated by the closed-set assumptions and static taxonomies that dominate conventional 3D learning paradigms. This doctoral thesis investigates how effective point cloud representations can be learned to overcome these limitations and advance robust, scalable 3D perception.
The thesis addresses this question from four complementary perspectives. First, it studies how different scene representations encode distinct inductive priors and demonstrates that their integration can significantly improve urban-scale point cloud semantic segmentation. Second, to mitigate data sparsity and reduce annotation requirements, a self-supervised spatio-temporal pre-training framework is proposed that leverages temporal continuity across LiDAR sequences. Third, the thesis moves beyond open-vocabulary perception by introducing the task of 3D Auto-Vocabulary Segmentation, enabling a system to proactively discover, name, and segment semantic entities without human-provided category definitions. Finally, it explores how dynamically generated, scene-specific vocabularies can be used as supervision to alleviate category exposure bias and improve open-vocabulary 3D segmentation on large-scale, auto-labelled datasets. Collectively, this work advances point cloud representation learning through multimodal fusion, temporal modelling, autonomous semantic discovery, and generative supervision, contributing toward more scalable and robust 3D perception systems.
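As a concrete illustration of the second perspective, the sketch below shows one common way temporal continuity can be turned into a self-supervised objective: embeddings of points matched across consecutive LiDAR frames are treated as positive pairs under an InfoNCE loss. This is a minimal PyTorch sketch under assumed point correspondences (e.g. obtained via ego-motion compensation and nearest-neighbour association); the name `temporal_nce_loss` is hypothetical and is not taken from the thesis.

```python
# Minimal sketch: points matched across consecutive LiDAR frames act as
# positive pairs; all other pairings in the batch act as negatives.
# Hypothetical illustration only, not the thesis implementation.
import torch
import torch.nn.functional as F

def temporal_nce_loss(feat_t: torch.Tensor, feat_t1: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE over matched point features from frames t and t+1.

    feat_t, feat_t1: (N, D) embeddings of N corresponding points,
    where row i of feat_t matches row i of feat_t1.
    """
    z_t = F.normalize(feat_t, dim=1)
    z_t1 = F.normalize(feat_t1, dim=1)
    logits = z_t @ z_t1.T / temperature            # (N, N) similarity matrix
    targets = torch.arange(z_t.size(0), device=z_t.device)
    return F.cross_entropy(logits, targets)        # diagonal = positive pairs
```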
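Similarly, the third perspective can be pictured as a two-stage loop: first discover a scene-specific vocabulary, then segment the point cloud against it. The skeleton below is a hedged sketch only; `discover_vocabulary`, `embed_text`, and `auto_vocab_segmentation` are hypothetical stand-ins (a real system would use a vision-language captioner and a text encoder), and `point_feats` stands for per-point features from any 3D backbone aligned to the text embedding space.

```python
# Hedged sketch of an auto-vocabulary segmentation loop: the system
# proposes its own label set for a scene and then segments against it.
# All function names below are hypothetical stand-ins.
from typing import List
import numpy as np

def discover_vocabulary(scene_views) -> List[str]:
    # Stand-in: a real system would caption scene views with a
    # vision-language model and extract noun phrases as class names.
    return ["car", "traffic light", "bicycle"]  # illustrative output only

def embed_text(names: List[str], dim: int = 512) -> np.ndarray:
    # Stand-in for a text encoder (e.g. CLIP-style); random here so the
    # sketch runs without model weights.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(names), dim))

def auto_vocab_segmentation(point_feats: np.ndarray, scene_views):
    vocab = discover_vocabulary(scene_views)      # no human-given taxonomy
    text_embs = embed_text(vocab)                 # (C, D) class embeddings
    sims = point_feats @ text_embs.T              # (N, C) point-class scores
    return vocab, sims.argmax(axis=1)             # per-point discovered label
```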
