Dynamic Transformer for Few-shot Instance Segmentation

Authors
Publication date 2022
Book title MM '22
Book subtitle Proceedings of the 30th ACM International Conference on Multimedia: October 10-14, 2022, Lisboa, Portugal
ISBN (electronic)
  • 9781450392037
Event 30th ACM International Conference on Multimedia
Pages (from-to) 2969–2977
Publisher New York, NY: The Association for Computing Machinery
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Few-shot instance segmentation aims to train an instance segmentation model that can quickly adapt to novel classes given only a few reference images. Existing methods are usually derived from standard detection models and tackle few-shot instance segmentation indirectly: they perform classification, box regression, and mask prediction on a large set of redundant proposals, followed by indispensable post-processing such as Non-Maximum Suppression. These complicated hand-crafted procedures and hyperparameters lead to degraded optimization and insufficient generalization ability. In this work, we propose an end-to-end Dynamic Transformer Network (DTN for short) that directly segments all target object instances of the arbitrary categories specified by reference images, removing the need for dense proposal generation and post-processing. Specifically, a small set of Dynamic Queries, conditioned on the reference images, are exclusively assigned to target object instances and generate all instance segmentation masks of the reference categories simultaneously. Moreover, a Semantic-induced Transformer Decoder is introduced to constrain the cross-attention between the dynamic queries and the target image to the pixels of the reference category, which suppresses noisy interaction with the background and irrelevant categories. Extensive experiments are conducted on the COCO-20 dataset, and the results demonstrate that our proposed Dynamic Transformer Network significantly outperforms the state of the art.
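
To make the semantic-induced cross-attention idea from the abstract concrete, the following is a minimal, illustrative sketch, not the paper's implementation: cross-attention scores between a few dynamic queries and flattened target-image features are masked by a binary per-pixel map of the reference category, so that attention to background and irrelevant-category pixels is suppressed. The function name, tensor shapes, and the assumption that such a per-pixel mask is available are all hypothetical choices made for illustration.

# Minimal sketch (illustrative assumptions throughout; not the authors' code):
# dynamic queries attend to target-image features, with attention restricted
# to pixels marked as belonging to the reference category.
import torch


def semantic_masked_cross_attention(queries, image_feats, semantic_mask, w_q, w_k, w_v):
    """queries:       (B, Nq, C)  dynamic queries conditioned on reference images
       image_feats:   (B, HW, C)  flattened target-image features
       semantic_mask: (B, HW)     1 where a pixel is (assumed) to belong to the
                                  reference category, 0 elsewhere
       w_q, w_k, w_v: (C, C)      projection matrices
    """
    q = queries @ w_q                                        # (B, Nq, C)
    k = image_feats @ w_k                                     # (B, HW, C)
    v = image_feats @ w_v                                     # (B, HW, C)

    scores = q @ k.transpose(1, 2) / (q.shape[-1] ** 0.5)     # (B, Nq, HW)
    # Suppress interaction with background / irrelevant-category pixels by
    # pushing their scores to a large negative value before the softmax.
    scores = scores.masked_fill(semantic_mask.unsqueeze(1) == 0, -1e4)
    attn = torch.softmax(scores, dim=-1)                      # (B, Nq, HW)
    return attn @ v                                           # (B, Nq, C) updated queries


if __name__ == "__main__":
    B, Nq, HW, C = 2, 5, 64, 32
    w_q, w_k, w_v = (torch.randn(C, C) * 0.02 for _ in range(3))
    out = semantic_masked_cross_attention(
        torch.randn(B, Nq, C), torch.randn(B, HW, C),
        (torch.rand(B, HW) > 0.5).long(), w_q, w_k, w_v)
    print(out.shape)  # torch.Size([2, 5, 32])

In a full decoder such a masked cross-attention step would be stacked with self-attention and feed-forward layers, and the query outputs would be decoded into per-instance masks; those stages are omitted here.
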
Document type Conference contribution
Note With supplementary video
Language English
Published at https://doi.org/10.1145/3503161.3548227