Adding object detection skills to visual dialogue agents

G. Bani; D. Belli; G. Dagan; A. Geenen; A. Skliar; A. Venkatesh; T. Baumgärtner; E. Bruni; R. Fernández

doi: https://doi.org/10.1007/978-3-030-11018-5_17

Authors	G. Bani; D. Belli; G. Dagan; A. Geenen; A. Skliar; A. Venkatesh; T. Baumgärtner; E. Bruni; R. Fernández
Publication date	2019
Host editors	L. Leal-Taixé; S. Roth
Book title	Computer Vision – ECCV 2018 Workshops
Book subtitle	Munich, Germany, September 8-14, 2018 : proceedings
ISBN	9783030110178
ISBN (electronic)	9783030110185
Series	Lecture Notes in Computer Science
Event	15th European Conference on Computer Vision, Workshops
Volume | Issue number	IV
Pages (from-to)	180-187
Publisher	Cham: Springer
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	Our goal is to equip a dialogue agent that asks questions about a visual scene with object detection skills. We take the first steps in this direction within the GuessWhat?! game. We use Mask R-CNN object features as a replacement for ground-truth annotations in the Guesser module, achieving an accuracy of 57.92%. This shows that our system is a viable alternative to the original Guesser, which achieves an accuracy of 62.77% using ground-truth annotations; that figure should thus be considered an upper bound for our automated system. Crucially, we show that our system exploits the Mask R-CNN object features, in contrast to the original Guesser augmented with global VGG features. Furthermore, by automating object detection in GuessWhat?!, we open up a range of opportunities, such as playing the game with new, non-annotated images and using the more granular visual features to condition the other modules of the game architecture.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1007/978-3-030-11018-5_17 (Final published version)
Published at	https://staff.fnwi.uva.nl/r.fernandezrovira/papers/2018/BaniEtal-sivl2018.pdf (Accepted author manuscript)
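
The abstract's central idea is that the Guesser only needs a list of candidate objects (a category and a bounding box each) to score against the dialogue, so those candidates can come from an object detector instead of ground-truth annotations. The following is a minimal pure-Python sketch of that scoring step, not the authors' model. It assumes dot-product scoring between a dialogue encoding and per-object embeddings built from a category embedding plus an 8-d spatial vector, in the style of the original GuessWhat?! baseline. The function names (`guess`, `spatial_features`), the single tanh layer standing in for an MLP, the dimensions, and the random weights are all hypothetical, and the sketch omits any visual features a detector such as Mask R-CNN can also supply.

```python
import math
import random

# Hypothetical sizes, chosen only for illustration.
CAT_DIM = 4      # size of the (assumed) category embedding
SPATIAL_DIM = 8  # size of the spatial feature vector
HIDDEN_DIM = 6   # size of the (assumed) dialogue encoding


def spatial_features(box, img_w, img_h):
    """Encode a box (x_min, y_min, x_max, y_max) in pixels as an 8-d vector
    [x0, y0, x1, y1, x_center, y_center, width, height] rescaled to [-1, 1]."""
    x_min, y_min, x_max, y_max = box
    x0, x1 = 2 * x_min / img_w - 1, 2 * x_max / img_w - 1
    y0, y1 = 2 * y_min / img_h - 1, 2 * y_max / img_h - 1
    return [x0, y0, x1, y1, (x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def guess(dialogue_state, candidates, img_size, cat_emb, weights):
    """Return (best_index, probabilities) over the candidate objects.

    Each candidate is (category_id, box). With ground-truth annotations these
    come from the dataset; with automated detection they come from a detector.
    The scoring below is identical in both cases (hypothetical sketch).
    """
    img_w, img_h = img_size
    scores = []
    for category_id, box in candidates:
        feats = cat_emb[category_id] + spatial_features(box, img_w, img_h)
        # A single tanh layer stands in for the Guesser's object MLP.
        obj = [math.tanh(sum(w * f for w, f in zip(row, feats))) for row in weights]
        # Dot product between the object embedding and the dialogue encoding.
        scores.append(sum(o * d for o, d in zip(obj, dialogue_state)))
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Random stand-ins for learned parameters and a dialogue encoding.
rng = random.Random(0)
cat_emb = {c: [rng.uniform(-1, 1) for _ in range(CAT_DIM)] for c in range(3)}
weights = [[rng.uniform(-1, 1) for _ in range(CAT_DIM + SPATIAL_DIM)]
           for _ in range(HIDDEN_DIM)]
dialogue_state = [rng.uniform(-1, 1) for _ in range(HIDDEN_DIM)]

# Hypothetical detector output for a 640x480 image: (category_id, box).
detections = [(0, (10, 20, 110, 220)), (2, (300, 50, 400, 150)), (1, (0, 0, 640, 480))]

best, probs = guess(dialogue_state, detections, (640, 480), cat_emb, weights)
print(best, [round(p, 3) for p in probs])
```

Because `guess` only consumes (category, box) pairs, swapping annotations for detections leaves the scoring untouched; the accuracy drop from 62.77% to 57.92% reported above then reflects, among other things, detection errors in those pairs.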

UvA-DARE

Digital Academic Repository