MVT: Multi-view Vision Transformer for 3D object recognition

S. Chen; T. Yu; P. Li


Authors	S. Chen; T. Yu; P. Li
Publication date	2021
Book title	32nd British Machine Vision Conference 2021
Book subtitle	BMVC 2021, Online, November 22-25, 2021
Event	32nd British Machine Vision Conference
Article number	349
Number of pages	14
Publisher	BMVA Press
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Inspired by the great success of CNNs in image recognition, view-based methods have applied CNNs to the projected views of 3D objects and achieved excellent performance. Nevertheless, multi-view CNN models cannot model the communication between patches from different views, which limits their effectiveness in 3D object recognition. Inspired by the recent success of the Vision Transformer in image recognition, we propose a Multi-view Vision Transformer (MVT) for 3D object recognition. Since each patch feature in a Transformer block has a global receptive field, the model naturally achieves communication between patches from different views. Meanwhile, it relies on much weaker inductive biases than its CNN counterparts. To balance effectiveness and efficiency, we develop a global-local structure for our MVT. Our experiments on two public benchmarks, ModelNet40 and ModelNet10, demonstrate the competitive performance of our MVT.
Document type	Conference contribution
Language	English
Other links	https://dblp.org/db/conf/bmvc/bmvc2021.html https://www.bmvc2021-virtualconference.com/programme/accepted-papers/
Downloads	0264 (Final published version)
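The cross-view communication described in the abstract can be illustrated with a small sketch. It rests on hypothetical simplifications that are not taken from the paper: each view is a list of equal-length patch token vectors, attention is single-head with no learned projections, MLPs, or normalization, the local stage runs before the global stage, and the final shape descriptor is an average pool. The names `local_stage`, `global_stage`, and `mvt_sketch` are illustrative only. The sketch simply contrasts attention restricted to one view's patches with attention over the concatenated patches of all views.

```python
import math
import random
from typing import List

Token = List[float]  # one patch feature vector
View = List[Token]   # all patch tokens of one rendered view


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: View) -> View:
    """Single-head scaled dot-product self-attention without learned projections
    (a simplifying assumption): every token attends to every token in `tokens`."""
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in tokens])
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def residual(xs: View, ys: View) -> View:
    # Element-wise residual connection: x + attention(x).
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def local_stage(views: List[View]) -> List[View]:
    # Attention is restricted to patches of the same view: no cross-view communication.
    return [residual(v, self_attention(v)) for v in views]


def global_stage(views: List[View]) -> List[View]:
    # Patches of all views are concatenated, so each patch has a global receptive field.
    n = len(views[0])  # assumes every view has the same number of patches
    flat = [tok for v in views for tok in v]
    mixed = residual(flat, self_attention(flat))
    return [mixed[i * n:(i + 1) * n] for i in range(len(views))]


def mvt_sketch(views: List[View]):
    """Hypothetical local-then-global pipeline; returns the local and global outputs
    and a shape descriptor obtained by average-pooling all patch tokens."""
    local_out = local_stage(views)
    global_out = global_stage(local_out)
    flat = [tok for v in global_out for tok in v]
    d = len(flat[0])
    descriptor = [sum(t[i] for t in flat) / len(flat) for i in range(d)]
    return local_out, global_out, descriptor


if __name__ == "__main__":
    random.seed(0)
    # 2 views, 3 patches per view, 4-dimensional patch features (arbitrary sizes).
    views = [[[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)] for _ in range(2)]
    _, _, descriptor = mvt_sketch(views)
    print(descriptor)
```

Perturbing one view leaves the other view's local output unchanged but alters its global output, which is the cross-view communication the abstract attributes to the Transformer. As one plausible reading of the efficiency trade-off: with V views of N patches each, global attention scores (VN)² token pairs while local attention scores only V·N² pairs; for illustrative values of 12 views and 196 patches per view, that is 5,531,904 versus 460,992 pairs.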