ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding
| Authors | |
|---|---|
| Publication date | 2025 |
| Book title | MM '25 |
| Book subtitle | Proceedings of the 33rd ACM International Conference on Multimedia : October 27-31, 2025, Dublin, Ireland |
| Event | 33rd ACM International Conference on Multimedia |
| Pages (from-to) | 6700-6709 |
| Publisher | New York, NY: Association for Computing Machinery |
| Abstract | Visual art understanding requires joint modeling of multiple perspectives and contextual inference rooted in cultural, historical, and stylistic knowledge. Recent multimodal large language models (MLLMs) demonstrate strong performance in generic captioning, primarily based on object recognition and training on large-scale generic data. However, they struggle to provide captions that incorporate the multiple perspectives that fine art demands. In this work, we introduce ArtRAG, a novel training-free framework that integrates structured knowledge into a retrieval-augmented generation (RAG) pipeline for multi-perspective artwork explanation. ArtRAG automatically constructs an Art Context Knowledge Graph (ACKG) from domain-specific textual sources, organizing entities such as artists, themes, movements, and historical events into a rich, interpretable knowledge graph. At inference time, a multi-granular structured context retriever selects semantically and topologically relevant subgraphs to guide explanation generation. This approach enables MLLMs to produce contextually grounded, multi-perspective descriptions. Experiments on the SemArt and Artpedia datasets demonstrate that ArtRAG outperforms existing heavily trained baselines. Human evaluations further confirm ArtRAG's ability to generate coherent, informative, and culturally enriched interpretations of artworks. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/3746027.3755673 |
| Downloads | 3746027.3755673 (Final published version) |
