Multimodal representation learning for fine art analysis

Open Access
Award date 25-06-2026
ISBN
  • 9789465375632
Number of pages 276
Organisations
  • Faculty of Economics and Business (FEB) - Amsterdam Business School Research Institute (ABS-RI)
Abstract
This thesis develops multimodal representation learning methods for fine art analysis. Fine art exists within historically situated systems of creation, interpretation, and institutional evaluation in which artworks, artists, movements, and institutions are simultaneously represented as visual artifacts, documented in textual records, and interconnected through relational structures. The thesis addresses four fundamental challenges: enriching visual representations with structured knowledge graph relationships for artwork categorization; measuring abstract cultural constructs such as originality and canonization; modeling temporal dynamics of creative careers; and handling modality asymmetry in fine art knowledge graphs where different entity types are characterized by different modality combinations. First, the thesis introduces ArtSAGENet, a multimodal architecture that integrates visual representations of artworks with graph neural networks for style classification, artist attribution, creation period estimation, and tag prediction, improving performance by modeling semantic relationships between artists and artworks. Next, the thesis develops a computational measure of visual originality and examines its relationship to long-term canonization across expert, peer, and market evaluative regimes, defining visual originality as novelty expressed in the visual features of artworks relative to visually similar predecessors within their historical context. The thesis then introduces Set2Seq Transformer, an architecture for sequential multiple-instance learning that learns temporal and position-aware representations of sets across time. Finally, the thesis proposes VL-KGE, a multimodal knowledge graph embedding framework that integrates pretrained vision-language models with structured relational modeling to learn unified representations for heterogeneous fine art knowledge graphs.
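As an illustration only, the abstract's definition of visual originality (novelty in visual features relative to visually similar predecessors) could be sketched as follows. This is not the thesis's actual measure: the function names, the use of cosine similarity, the choice of k nearest predecessors, and the optional `window` parameter standing in for "historical context" are all assumptions made for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def visual_originality(features, years, index, k=2, window=None):
    """Hypothetical score: 1 minus the mean cosine similarity between
    artwork `index` and its k most similar predecessors.

    features: list of visual feature vectors, one per artwork (assumed given)
    years:    creation year of each artwork
    window:   optional limit, in years, on how far back predecessors may date
    """
    target_year = years[index]
    sims = []
    for j, (feat, year) in enumerate(zip(features, years)):
        # Only strictly earlier artworks count as predecessors.
        if j == index or year >= target_year:
            continue
        if window is not None and target_year - year > window:
            continue
        sims.append(cosine(features[index], feat))
    if not sims:
        return None  # no predecessors, so the score is undefined
    top = sorted(sims, reverse=True)[:k]  # the most visually similar predecessors
    return 1.0 - sum(top) / len(top)
```

Under these assumptions, an artwork that closely resembles its nearest predecessors scores near 0, and one that resembles none of them scores near 1.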
Document type PhD thesis
Language English