Interactive Image Retrieval Meets Query Rewriting with Large Language and Vision Language Models

Open Access
Authors
Publication date 10-2025
Journal ACM Transactions on Multimedia Computing Communications and Applications
Article number 286
Volume | Issue number 21 | 10
Number of pages 23
Organisations
  • Faculty of Economics and Business (FEB) - Amsterdam Business School Research Institute (ABS-RI)
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Image search is a pivotal task in multimedia and computer vision, with applications across diverse domains ranging from internet search to medical diagnostics. Conventional image search systems accept textual or visual queries and retrieve the most relevant candidates from the database. However, prevalent methods often rely on single-turn procedures, which introduces potential inaccuracies and limits recall. These methods also face challenges such as vocabulary mismatch and the semantic gap, which constrain their overall effectiveness. To address these issues, we propose an interactive image retrieval system capable of refining queries based on user relevance feedback in a multi-turn setting. This system incorporates an image captioner based on a vision-language model (VLM) to enhance the quality of text-based queries, resulting in more informative queries with each iteration. Moreover, we introduce a denoiser based on a large language model (LLM) to refine text-based query expansions, mitigating inaccuracies in image descriptions generated by captioning models. To evaluate our system, we curate a new dataset by adapting the MSR-VTT and MSVD video retrieval datasets to the image retrieval task, offering multiple relevant ground-truth images for each query. Through comprehensive experiments, we validate the effectiveness of our proposed system against baseline methods, achieving state-of-the-art performance with a notable 10 [...] the integration of an LLM-based denoiser, the curation of a meticulously designed evaluation dataset, and thorough experimental validation.
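The multi-turn loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy image collection, the bag-of-words `retrieve` function, and the `caption` and `denoise` stubs are all hypothetical stand-ins for a real retrieval model, a VLM captioner, and an LLM denoiser. It only shows how relevance feedback might feed captions back into the query from one turn to the next.

```python
import math
from collections import Counter

# Hypothetical toy collection: image id -> text standing in for image content.
IMAGES = {
    "img1": "a man plays guitar on a stage",
    "img2": "a man plays guitar in a kitchen",
    "img3": "a dog runs on a beach",
    "img4": "a band performs on a stage with lights",
}
STOPWORDS = {"a", "on", "in", "the", "with"}


def embed(text):
    """Bag-of-words vector (assumption: replaces a learned text/image encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=2, exclude=()):
    """Return the k highest-scoring image ids not already in `exclude`."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), image_id)
              for image_id, text in IMAGES.items() if image_id not in exclude]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by id
    return [image_id for _, image_id in scored[:k]]


def caption(image_id):
    """Stub for the VLM captioner: here it simply returns the stored text."""
    return IMAGES[image_id]


def denoise(query, captions):
    """Stub for the LLM denoiser: append only new, non-stopword caption terms."""
    seen = set(query.lower().split())
    extra = []
    for cap in captions:
        for word in cap.lower().split():
            if word not in STOPWORDS and word not in seen:
                seen.add(word)
                extra.append(word)
    return " ".join([query] + extra)


def interactive_search(query, is_relevant, turns=2, k=2):
    """Multi-turn loop: retrieve, collect feedback, expand the query, repeat."""
    found = []
    for _ in range(turns):
        results = retrieve(query, k=k, exclude=found)
        relevant = [r for r in results if is_relevant(r)]  # user feedback
        if not relevant:
            break
        found.extend(relevant)
        query = denoise(query, [caption(r) for r in relevant])
    return query, found


final_query, found = interactive_search(
    "guitar stage", lambda image_id: image_id in {"img1", "img4"}
)
print(final_query)
print(found)
```

In this sketch the user feedback is simulated by the `is_relevant` callback; in an interactive system it would come from the user marking results after each turn.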
Document type Article
Language English
Published at https://doi.org/10.1145/3744910