Representation learning for sparse point cloud perception
| Authors | |
|---|---|
| Supervisors | |
| Cosupervisors | |
| Award date | 06-02-2026 |
| Number of pages | 112 |
| Organisations | |
| Document type | PhD thesis |
| Language | English |

Abstract
3D perception is a cornerstone of modern machine intelligence, underpinning critical applications such as autonomous driving, robotics, and spatial understanding. Among 3D data representations, point clouds offer a direct and geometry-rich description of the physical world, yet their inherent sparsity, non-uniform sampling, and high annotation cost impose fundamental limitations on perception performance. These challenges are further exacerbated by the closed-set assumptions and static taxonomies that dominate conventional 3D learning paradigms. This doctoral thesis investigates how effective point cloud representations can be learned to overcome these limitations and advance robust, scalable 3D perception.
The thesis addresses this question from four complementary perspectives. First, it studies how different scene representations encode distinct inductive priors and demonstrates that their integration can significantly improve urban-scale point cloud semantic segmentation. Second, to mitigate data sparsity and reduce annotation requirements, a self-supervised spatio-temporal pre-training framework is proposed that leverages temporal continuity across LiDAR sequences. Third, the thesis moves beyond open-vocabulary perception by introducing the task of 3D Auto-Vocabulary Segmentation, enabling a system to proactively discover, name, and segment semantic entities without human-provided category definitions. Finally, it explores how dynamically generated, scene-specific vocabularies can be used as supervision to alleviate category exposure bias and improve open-vocabulary 3D segmentation on large-scale, auto-labelled datasets. Collectively, this work advances point cloud representation learning through multimodal fusion, temporal modelling, autonomous semantic discovery, and generative supervision, contributing toward more scalable and robust 3D perception systems.
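As a concrete illustration of the second perspective, the sketch below shows one common way temporal continuity can be turned into a self-supervised objective: embeddings of points matched across consecutive LiDAR frames are treated as positive pairs under an InfoNCE loss. This is a minimal PyTorch sketch under assumed point correspondences (e.g. obtained via ego-motion compensation and nearest-neighbour association); the name `temporal_nce_loss` is hypothetical and is not taken from the thesis.

```python
# Minimal sketch: points matched across consecutive LiDAR frames act as
# positive pairs; all other pairings in the batch act as negatives.
# Hypothetical illustration only, not the thesis implementation.
import torch
import torch.nn.functional as F

def temporal_nce_loss(feat_t: torch.Tensor, feat_t1: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE over matched point features from frames t and t+1.

    feat_t, feat_t1: (N, D) embeddings of N corresponding points,
    where row i of feat_t matches row i of feat_t1.
    """
    z_t = F.normalize(feat_t, dim=1)
    z_t1 = F.normalize(feat_t1, dim=1)
    logits = z_t @ z_t1.T / temperature            # (N, N) similarity matrix
    targets = torch.arange(z_t.size(0), device=z_t.device)
    return F.cross_entropy(logits, targets)        # diagonal = positive pairs
```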
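Similarly, the third perspective can be pictured as a two-stage loop: first discover a scene-specific vocabulary, then segment the point cloud against it. The skeleton below is a hedged sketch only; `discover_vocabulary`, `embed_text`, and `auto_vocab_segmentation` are hypothetical stand-ins (a real system would use a vision-language captioner and a text encoder), and `point_feats` stands for per-point features from any 3D backbone aligned to the text embedding space.

```python
# Hedged sketch of an auto-vocabulary segmentation loop: the system
# proposes its own label set for a scene and then segments against it.
# All function names below are hypothetical stand-ins.
from typing import List
import numpy as np

def discover_vocabulary(scene_views) -> List[str]:
    # Stand-in: a real system would caption scene views with a
    # vision-language model and extract noun phrases as class names.
    return ["car", "traffic light", "bicycle"]  # illustrative output only

def embed_text(names: List[str], dim: int = 512) -> np.ndarray:
    # Stand-in for a text encoder (e.g. CLIP-style); random here so the
    # sketch runs without model weights.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(names), dim))

def auto_vocab_segmentation(point_feats: np.ndarray, scene_views):
    vocab = discover_vocabulary(scene_views)      # no human-given taxonomy
    text_embs = embed_text(vocab)                 # (C, D) class embeddings
    sims = point_feats @ text_embs.T              # (N, C) point-class scores
    return vocab, sims.argmax(axis=1)             # per-point discovered label
```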
