Learning Hierarchical Embedding for Video Instance Segmentation
| Authors | |
|---|---|
| Publication date | 2021 |
| Book title | MM '21 |
| Book subtitle | Proceedings of the 29th ACM International Conference on Multimedia : October 20-24, 2021, Virtual Event, China |
| ISBN (electronic) | |
| Event | 29th ACM International Conference on Multimedia, MM 2021 |
| Pages (from-to) | 1884-1892 |
| Publisher | New York, NY: Association for Computing Machinery |
| Organisations | |
| Abstract | In this paper, we address video instance segmentation using a new generative model that learns effective representations of the target and background appearance. We propose to exploit hierarchical structural embedding over spatio-temporal space, which is compact, powerful, and flexible in contrast to current tracking-by-detection methods. Specifically, our model segments and tracks instances across space and time in a single forward pass, which is formulated as hierarchical embedding learning. The model is trained to locate the pixels belonging to specific instances over a video clip. We first take advantage of a novel mixing function to better fuse spatio-temporal embeddings. Moreover, we introduce normalizing flows to further improve the robustness of the learned appearance embedding, which theoretically extends conventional generative flows to a factorized conditional scheme. Comprehensive experiments on the video instance segmentation benchmark, i.e., YouTube-VIS, demonstrate the effectiveness of the proposed approach. Furthermore, we evaluate our method on an unsupervised video object segmentation dataset to demonstrate its generalizability. |
| Document type | Conference contribution |
| Note | With supplemental material |
| Language | English |
| Published at | https://doi.org/10.1145/3474085.3475342 |
| Downloads | 3474085.3475342 (final published version) |
| Supplementary materials | |
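As a rough illustration of the embedding-based formulation the abstract describes (pixels of the same instance map to nearby points in embedding space, so segmentation and tracking reduce to one assignment step across frames), the following toy sketch may help. It is not the paper's model: the embedding dimension, noise level, and nearest-center assignment are all hypothetical assumptions for illustration only.

```python
import numpy as np

# Hypothetical setup (not from the paper): two instance "centers" in a
# 4-D embedding space, and simulated per-pixel embeddings for two
# consecutive frames drawn as small perturbations around each center.
rng = np.random.default_rng(0)
centers = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0]])
frame_t  = centers + 0.05 * rng.standard_normal(centers.shape)
frame_t1 = centers + 0.05 * rng.standard_normal(centers.shape)

def assign(pixels, centers):
    """Assign each pixel embedding to its nearest instance center."""
    dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=-1)
    return dists.argmin(axis=1)

# Because both frames' pixels stay close to their instance centers in
# embedding space, the same instance ids come out for both frames:
# segmentation and temporal association happen in one assignment.
ids_t = assign(frame_t, centers)
ids_t1 = assign(frame_t1, centers)
print(ids_t, ids_t1)  # → [0 1] [0 1]
```

The point of the sketch is only the single-pass idea: once embeddings are discriminative across space and time, no separate per-frame detection plus matching stage is needed, which is the contrast the abstract draws with tracking-by-detection.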