T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning

Open Access
Authors
Publication date 2025
Host editors
  • A. Leonardis
  • E. Ricci
  • S. Roth
  • O. Russakovsky
  • T. Sattler
  • G. Varol
Book title Computer Vision – ECCV 2024
Book subtitle 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings
ISBN
  • 9783031732461
ISBN (electronic)
  • 9783031732478
Series Lecture Notes in Computer Science
Event The 18th European Conference on Computer Vision ECCV 2024
Volume XI
Pages (from-to) 178–195
Publisher Cham: Springer
Organisations
  • Faculty of Science (FNWI) - Institute for Biodiversity and Ecosystem Dynamics (IBED)
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
The scarcity of annotated data hinders effective representation learning for LiDAR point cloud understanding, which has prompted active research into effective self-supervised pre-training paradigms. However, the temporal information inherent in LiDAR point cloud sequences is consistently overlooked. To better exploit this property, we propose an effective pre-training strategy, Temporal Masked Auto-Encoders (T-MAE), which takes temporally adjacent frames as input and learns temporal dependencies. A SiamWCA backbone, comprising a Siamese encoder and a windowed cross-attention (WCA) module, is established for the two-frame input. Because the ego-vehicle's movement changes the view of the same instance, temporal modeling also acts as a robust and natural data augmentation that improves the comprehension of target objects. SiamWCA is a powerful architecture but relies heavily on annotated data; the T-MAE pre-training strategy alleviates this demand. Comprehensive experiments demonstrate that T-MAE achieves the best performance among competitive self-supervised approaches on both the Waymo and ONCE datasets.
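To illustrate the two-frame design described in the abstract, the sketch below shows the general pattern of a shared (Siamese) encoder applied to a past and a current LiDAR frame, followed by cross-attention that lets current-frame tokens query past-frame tokens. This is a minimal, assumption-laden sketch, not the paper's implementation: the toy MLP encoder, layer sizes, and the use of global (rather than windowed, voxel-based) cross-attention are placeholders chosen for brevity.

# Minimal PyTorch sketch of the Siamese-encoder + cross-attention pattern
# described in the abstract. All module details are illustrative assumptions;
# the actual SiamWCA backbone uses a windowed, sparse point cloud encoder.
import torch
import torch.nn as nn


class SiamCrossAttentionSketch(nn.Module):
    def __init__(self, in_dim: int = 3, embed_dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Shared ("Siamese") encoder applied to both frames (toy MLP stand-in).
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        # Cross-attention: queries from the current frame, keys/values from the
        # past frame (a global stand-in for the paper's windowed cross-attention).
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, past_pts: torch.Tensor, curr_pts: torch.Tensor) -> torch.Tensor:
        # past_pts, curr_pts: (B, N, in_dim) point coordinates or features.
        past_tokens = self.encoder(past_pts)
        curr_tokens = self.encoder(curr_pts)
        # Fuse temporal context from the past frame into current-frame tokens.
        fused, _ = self.cross_attn(query=curr_tokens, key=past_tokens, value=past_tokens)
        return self.norm(curr_tokens + fused)


if __name__ == "__main__":
    model = SiamCrossAttentionSketch()
    past = torch.randn(2, 1024, 3)   # previous LiDAR frame (random toy data)
    curr = torch.randn(2, 1024, 3)   # current LiDAR frame (random toy data)
    out = model(past, curr)
    print(out.shape)                 # torch.Size([2, 1024, 256])

In the paper's pre-training setting, the current frame would additionally be masked and reconstructed by a decoder; the sketch only covers the two-frame fusion step.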
Document type Conference contribution
Language English
Published at https://doi.org/10.1007/978-3-031-73247-8_11
Downloads
T-MAE (Final published version)
Supplementary materials