The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models
| Authors | |
|---|---|
| Publication date | 2023 |
| Host editors | |
| Book title | The 2023 Conference on Empirical Methods in Natural Language Processing |
| Book subtitle | EMNLP 2023: Proceedings of the Conference: December 6-10, 2023 |
| ISBN (electronic) | |
| Event | 2023 Conference on Empirical Methods in Natural Language Processing |
| Pages (from-to) | 5817–5830 |
| Publisher | Stroudsburg, PA: Association for Computational Linguistics |
| Organisations | |
| Abstract | Despite the impressive performance achieved by pre-trained language-and-vision models on downstream tasks, it remains an open question whether this reflects a proper understanding of image-text interaction. In this work, we explore to what extent such models handle basic linguistic constructions (active-passive voice, coordination, and relative clauses) that even preschool children typically master. We present BLA, a novel, automatically constructed benchmark that evaluates multimodal models on these Basic Language Abilities. We show that different types of Transformer-based systems, such as CLIP, ViLBERT, and BLIP2, generally struggle with BLA in a zero-shot setting, in line with previous findings. In particular, our experiments show that most of the tested models benefit only marginally from fine-tuning or prompting with construction-specific samples. Yet the generative BLIP2 shows promising trends, especially in an in-context learning setting. This opens the door to using BLA not only as an evaluation benchmark but also as a means to improve models' basic language abilities. |
| Document type | Conference contribution |
| Note | With supplementary video |
| Language | English |
| Related dataset | The BLA Benchmark: Investigating Basic Language Abilities of Multimodal Models |
| Published at | https://doi.org/10.18653/v1/2023.emnlp-main.356 |
| Downloads | 2023.emnlp-main.356 (Final published version) |
| Supplementary materials | |
| Permalink to this page | |
