Temporally Consistent Semantic Segmentation using Spatially Aware Multi-view Semantic Fusion for Indoor RGB-D videos

Authors
Publication date 2023
Book title 2023 IEEE/CVF International Conference on Computer Vision Workshops
Book subtitle proceedings: ICCVW 2023: Paris, France, 2-6 October 2023
ISBN
  • 9798350307450
ISBN (electronic)
  • 9798350307443
Event 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
Pages (from-to) 4250-4259
Number of pages 10
Publisher Los Alamitos, California: IEEE Computer Society
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

Image semantic segmentation faces challenges in achieving consistent and robust results across a sequence of video frames. This problem becomes more prominent for indoor scenes, where small camera movements can lead to drastic appearance changes, occlusions, and loss of global context information. To overcome these challenges, this paper proposes a novel approach that combines multi-view semantic fusion with spatial reasoning to produce view-invariant semantic features for temporally consistent semantic segmentation of indoor RGB-D videos. The experiments are conducted on the ScanNet dataset, showing that the proposed spatially aware multi-view fusion mechanism significantly improves the state-of-the-art image semantic segmentation methods Mask2Former and ViT-Adapter. In particular, the proposed pipeline offers improvements of 5%, 9.9%, and 14.4% in 2D mIoU, cross-view consistency, and temporal consistency, respectively, when compared to Mask2Former. Similarly, when compared to ViT-Adapter, the proposed mechanism offers enhancements of 4.8%, 8.9%, and 10.9% in the same metrics.

Document type Conference contribution
Language English
Published at https://doi.org/10.1109/ICCVW60793.2023.00459
Other links https://www.proceedings.com/72202.html https://www.scopus.com/pages/publications/85182946608