Adding object detection skills to visual dialogue agents
| Authors | |
|---|---|
| Publication date | 2019 |
| Host editors | |
| Book title | Computer Vision – ECCV 2018 Workshops |
| Book subtitle | Munich, Germany, September 8-14, 2018 : proceedings |
| ISBN | |
| ISBN (electronic) | |
| Series | Lecture Notes in Computer Science |
| Event | 15th European Conference on Computer Vision, Workshops |
| Volume / Issue number | IV |
| Pages (from-to) | 180-187 |
| Publisher | Cham: Springer |
| Organisations | |
| Abstract | Our goal is to equip a dialogue agent that asks questions about a visual scene with object detection skills. We take the first steps in this direction within the GuessWhat?! game. We use Mask R-CNN object features as a replacement for ground-truth annotations in the Guesser module, achieving an accuracy of 57.92%. This shows that our system is a viable alternative to the original Guesser, which achieves an accuracy of 62.77% using ground-truth annotations and should thus be considered an upper bound for our automated system. Crucially, we show that our system exploits the Mask R-CNN object features, in contrast to the original Guesser augmented with global VGG features. Furthermore, by automating object detection in GuessWhat?!, we open up a spectrum of opportunities, such as playing the game with new, non-annotated images and using the more granular visual features to condition the other modules of the game architecture. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1007/978-3-030-11018-5_17 |
| Published at | https://staff.fnwi.uva.nl/r.fernandezrovira/papers/2018/BaniEtal-sivl2018.pdf |
| Downloads | BaniEtal-sivl2018 (Accepted author manuscript); Adding Object Detection Skills to Visual Dialogue Agents (Final published version) |