Temporally Consistent Semantic Segmentation using Spatially Aware Multi-view Semantic Fusion for Indoor RGB-D videos

Authors
Publication date 2023
Book title 2023 IEEE/CVF International Conference on Computer Vision Workshops
Book subtitle proceedings: ICCVW 2023: Paris, France, 2-6 October 2023
ISBN
  • 9798350307450
ISBN (electronic)
  • 9798350307443
Event 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
Pages (from-to) 4250-4259
Number of pages 10
Publisher Los Alamitos, California: IEEE Computer Society
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

Image semantic segmentation faces challenges in achieving consistent and robust results across a sequence of video frames. This problem becomes more prominent for indoor scenes, where small camera movements can lead to drastic appearance changes, occlusions, and loss of global context information. To overcome these challenges, this paper proposes a novel approach that combines multi-view semantic fusion with spatial reasoning to produce view-invariant semantic features for temporally consistent semantic segmentation of indoor RGB-D videos. The experiments are conducted on the ScanNet dataset, showing that the proposed spatially aware multi-view fusion mechanism significantly improves the state-of-the-art image semantic segmentation methods Mask2Former and ViT-Adapter. In particular, the proposed pipeline offers improvements of 5%, 9.9%, and 14.4% in 2D mIoU, cross-view consistency, and temporal consistency, respectively, when compared to Mask2Former. Similarly, when compared to ViT-Adapter, the proposed mechanism offers enhancements of 4.8%, 8.9%, and 10.9% in the same metrics.

Document type Conference contribution
Language English
Published at https://doi.org/10.1109/ICCVW60793.2023.00459
Other links https://www.proceedings.com/72202.html https://www.scopus.com/pages/publications/85182946608