Learning Hierarchical Embedding for Video Instance Segmentation

Open Access
Authors
  • Y. Yin
Publication date 2021
Book title MM '21
Book subtitle Proceedings of the 29th ACM International Conference on Multimedia: October 20-24, 2021, Virtual Event, China
ISBN (electronic)
  • 9781450386517
Event 29th ACM International Conference on Multimedia, MM 2021
Pages (from-to) 1884-1892
Publisher New York, NY: Association for Computing Machinery
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
In this paper, we address video instance segmentation with a new generative model that learns effective representations of target and background appearance. We propose to exploit hierarchical structural embedding over the spatio-temporal space, which is compact, powerful, and flexible in contrast to current tracking-by-detection methods. Specifically, our model segments and tracks instances across space and time in a single forward pass, formulated as hierarchical embedding learning: the model is trained to locate the pixels belonging to specific instances over a video clip. We first introduce a novel mixing function to better fuse spatio-temporal embeddings. Moreover, we introduce normalizing flows to further improve the robustness of the learned appearance embedding, theoretically extending conventional generative flows to a factorized conditional scheme. Comprehensive experiments on the video instance segmentation benchmark YouTube-VIS demonstrate the effectiveness of the proposed approach. Furthermore, we evaluate our method on an unsupervised video object segmentation dataset to demonstrate its generalizability.
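The record does not include code. As a loose, hypothetical illustration of the general idea the abstract describes (assigning pixels to instances via learned embeddings), the sketch below clusters toy per-pixel embeddings against per-instance prototype embeddings with NumPy. All names and shapes here are illustrative assumptions; this is not the authors' actual model or the paper's mixing function or normalizing flows.

```python
import numpy as np

# Hypothetical sketch (not the paper's code): embedding-based instance
# assignment. Each pixel carries a D-dim embedding; each instance has a
# prototype embedding, and every pixel is assigned to its nearest prototype.

rng = np.random.default_rng(0)
H, W, D = 4, 4, 8      # toy frame size and embedding dimension (assumed)
K = 2                  # toy number of instances (assumed)

pixel_emb = rng.normal(size=(H, W, D))    # per-pixel embeddings
prototypes = rng.normal(size=(K, D))      # per-instance prototype embeddings

# Distance from every pixel embedding to every prototype: shape (H, W, K)
dists = np.linalg.norm(pixel_emb[..., None, :] - prototypes, axis=-1)

# Per-pixel instance map: each pixel labeled by its nearest prototype
labels = dists.argmin(axis=-1)            # shape (H, W), values in [0, K)
```

In the paper's setting the embeddings would additionally be fused across frames, so the same assignment mechanism yields consistent instance identities over the clip rather than per-frame masks.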
Document type Conference contribution
Note With supplemental material
Language English
Published at https://doi.org/10.1145/3474085.3475342