ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding

Open Access
Authors
Publication date 2025
Book title MM '25
Book subtitle Proceedings of the 33rd ACM International Conference on Multimedia: October 27-31, 2025, Dublin, Ireland
ISBN (electronic)
  • 9798400720352
Event 33rd ACM International Conference on Multimedia
Pages (from-to) 6700-6709
Publisher New York, NY: Association for Computing Machinery
Organisations
  • Faculty of Economics and Business (FEB) - Amsterdam Business School Research Institute (ABS-RI)
  • Faculty of Economics and Business (FEB)
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Visual art understanding requires joint modeling of multiple perspectives and contextual inference rooted in cultural, historical, and stylistic knowledge. Recent multimodal large language models (MLLMs) demonstrate strong performance in generic captioning, which relies primarily on object recognition and on training with large-scale generic data. However, they struggle to provide captions that incorporate the multiple perspectives that fine art demands. In this work, we introduce ArtRAG, a novel training-free framework that integrates structured knowledge into a retrieval-augmented generation (RAG) pipeline for multi-perspective artwork explanation. ArtRAG automatically constructs an Art Context Knowledge Graph (ACKG) from domain-specific textual sources, organizing entities such as artists, themes, movements, and historical events into a rich, interpretable knowledge graph. At inference time, a multi-granular structured context retriever selects semantically and topologically relevant subgraphs to guide explanation generation. This approach enables MLLMs to produce contextually grounded, multi-perspective descriptions. Experiments on the SemArt and Artpedia datasets demonstrate that ArtRAG outperforms existing heavily trained baselines. Human evaluations further confirm ArtRAG's ability to generate coherent, informative, and culturally enriched interpretations of artworks.
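The retrieval idea in the abstract (pick graph nodes that match the query semantically, then widen the selection along graph edges) can be illustrated with a minimal sketch. This is not the ArtRAG implementation: the toy graph, the function names (`retrieve_subgraph`, `format_context`), the parameters `top_k` and `hops`, and the use of word-overlap (Jaccard) similarity in place of learned embeddings are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy graph (not data from the paper): node -> (type, description).
NODES = {
    "Caravaggio": ("artist", "italian baroque painter known for dramatic chiaroscuro"),
    "Baroque": ("movement", "seventeenth century movement with drama and strong contrast"),
    "Chiaroscuro": ("technique", "strong contrast between light and dark"),
    "Counter-Reformation": ("event", "catholic church response that shaped religious art"),
    "Impressionism": ("movement", "nineteenth century movement focused on light and colour outdoors"),
}
EDGES = [
    ("Caravaggio", "Baroque"),
    ("Caravaggio", "Chiaroscuro"),
    ("Baroque", "Counter-Reformation"),
]


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> neighbours mapping."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def jaccard(a, b):
    """Word-overlap similarity; an assumed stand-in for embedding similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve_subgraph(query, nodes, edges, top_k=1, hops=1):
    """Select seed nodes by similarity, then expand along edges up to `hops`."""
    adj = build_adjacency(edges)
    # Semantic step: rank nodes by similarity of "name + description" to the query.
    ranked = sorted(nodes, key=lambda n: (-jaccard(query, f"{n} {nodes[n][1]}"), n))
    seeds = ranked[:top_k]
    # Topological step: breadth-first expansion from the seeds.
    selected = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for neighbour in adj.get(node, ()):
            if neighbour not in selected:
                selected.add(neighbour)
                frontier.append((neighbour, depth + 1))
    kept_edges = [(a, b) for a, b in edges if a in selected and b in selected]
    return seeds, selected, kept_edges


def format_context(selected, kept_edges, nodes):
    """Serialise the subgraph as plain text that could be prepended to an MLLM prompt."""
    lines = [f"{n} ({nodes[n][0]}): {nodes[n][1]}" for n in sorted(selected)]
    lines += [f"{a} -- {b}" for a, b in kept_edges]
    return "\n".join(lines)


if __name__ == "__main__":
    query = "painting by caravaggio with dramatic chiaroscuro"
    seeds, selected, kept = retrieve_subgraph(query, NODES, EDGES)
    print(format_context(selected, kept, NODES))
```

In this sketch the retrieved text would simply be placed in the prompt alongside the image, which matches the training-free framing of the abstract; how ArtRAG actually scores, ranks, and serialises subgraphs is described in the paper itself and is not reproduced here.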
Document type Conference contribution
Language English
Published at https://doi.org/10.1145/3746027.3755673
Downloads
3746027.3755673 (Final published version)