Leveraging Query Expansion and Reformulation for Image Retrieval With Large Language and Vision-Language Models
| Authors | |
|---|---|
| Publication date | 2024 |
| Book title | 21st International Conference on Content-Based Multimedia Indexing |
| Book subtitle | CBMI 2024 : September 18-20, 2024, Reykjavik, Iceland : conference proceedings |
| ISBN | |
| ISBN (electronic) | |
| Event | 21st International Conference on Content-Based Multimedia Indexing |
| Pages (from-to) | 23-29 |
| Number of pages | 7 |
| Publisher | Piscataway, NJ: IEEE |
| Organisations | |
| Abstract | This research builds on novel text-based image retrieval (IR) methods that leverage vision-language models (VLMs) and large language models (LLMs). The study highlights the need for an image retrieval evaluation strategy that reflects the use of conversational IR systems in the real world, and introduces a novel evaluation framework for interactive ad-hoc text-based IR. Unimodal IR models that retrieve images based on automatically generated captions are compared against popular crossmodal IR models; the latter remain superior in performance. Several strategies for automated query expansion (QE) and reformulation (QR) are explored. A generative LLM is prompted to generate keywords related to the original query or to rephrase it based on artificial user relevance feedback (RF) deployed in the evaluation framework. In particular, the image captions of the relevant images retrieved in the first IR round are provided as context for the generative LLM. Our main observation is that the retrieval models based on a large VLM, such as BLIP-2, benefit more from QE, and that the QE strategies based on keyword extraction outperform QR alternatives based on summarization. However, the approach should be further investigated to determine how the results are influenced by the type and quality of image annotations. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1109/CBMI62980.2024.10859227 |
| Other links | https://www.proceedings.com/78720.html |
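
The relevance-feedback loop described in the abstract (retrieve, collect the captions of the relevant first-round images, derive keywords, expand the query) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, a bag-of-words cosine score stands in for the VLM-based retrieval model, and a simple term-frequency heuristic stands in for the prompted generative LLM. The set `relevant_ids` plays the role of the artificial user relevance feedback.

```python
from collections import Counter
import math
import re

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "on", "in", "with", "and", "is", "at"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, captions, k):
    """Rank image ids by caption similarity to the query.

    Stand-in for the retrieval model; ties are broken by image id.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), img_id) for img_id, c in captions.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [img_id for _, img_id in scored[:k]]


def extract_keywords(query, feedback_captions, n):
    """Stand-in for the generative LLM: most frequent caption terms not in the query."""
    seen = set(tokenize(query))
    counts = Counter(
        t for c in feedback_captions for t in tokenize(c) if t not in seen
    )
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


def expand_query(query, captions, relevant_ids, k=3, n_keywords=2):
    """One round of query expansion driven by simulated relevance feedback."""
    first_round = retrieve(query, captions, k)
    # Keep only the captions of first-round images marked relevant.
    feedback = [captions[i] for i in first_round if i in relevant_ids]
    keywords = extract_keywords(query, feedback, n_keywords)
    return " ".join([query] + keywords)
```

The expanded query returned by `expand_query` would then be passed to `retrieve` for a second round. In a setup closer to the one the abstract describes, `extract_keywords` would be replaced by a prompt to a generative LLM with the feedback captions as context, and `retrieve` by a crossmodal or caption-based retrieval model.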