Multimodal representation learning for fine art analysis

A. Efthymiou

Authors	A. Efthymiou
Supervisors	N.M. Wijnberg, M. Worring
Award date	25-06-2026
ISBN	9789465375632
Number of pages	276
Organisations	Faculty of Economics and Business (FEB) - Amsterdam Business School Research Institute (ABS-RI)
Abstract	This thesis develops multimodal representation learning methods for fine art analysis. Fine art exists within historically situated systems of creation, interpretation, and institutional evaluation, in which artworks, artists, movements, and institutions are simultaneously represented as visual artifacts, documented in textual records, and interconnected through relational structures. The thesis addresses four fundamental challenges: enriching visual representations with structured knowledge graph relationships for artwork categorization; measuring abstract cultural constructs such as originality and canonization; modeling the temporal dynamics of creative careers; and handling modality asymmetry in fine art knowledge graphs, where different entity types are characterized by different combinations of modalities. First, the thesis introduces ArtSAGENet, a multimodal architecture that integrates visual representations of artworks with graph neural networks for style classification, artist attribution, creation period estimation, and tag prediction, and that improves performance on these tasks by modeling semantic relationships between artists and artworks. Next, the thesis defines visual originality as novelty expressed in the visual features of an artwork relative to visually similar predecessors within its historical context, develops a computational measure of it, and examines its relationship to long-term canonization across expert, peer, and market evaluative regimes. The thesis then introduces Set2Seq Transformer, an architecture for sequential multiple-instance learning that learns temporal and position-aware representations of sets across time. Finally, the thesis proposes VL-KGE, a multimodal knowledge graph embedding framework that integrates pretrained vision-language models with structured relational modeling to learn unified representations for heterogeneous fine art knowledge graphs.
Document type	PhD thesis
Language	English
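The abstract defines visual originality only in words. As a rough illustration of how such a definition can be operationalized, the following Python sketch scores an artwork by how dissimilar its visual embedding is from its most similar earlier works. Everything in it is an assumption rather than the measure used in the thesis: the function names `cosine_similarity` and `visual_originality` are hypothetical, the embeddings are assumed to come from an unspecified pretrained image encoder, and the specific score (one minus the mean cosine similarity to the k most similar predecessors, optionally restricted to a time window standing in for "historical context") is chosen here purely for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical sketch, NOT the measure defined in the thesis.
# Each artwork is assumed to be a visual embedding (from some pretrained
# image encoder) paired with a creation year.
Embedding = Sequence[float]


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def visual_originality(
    target: Embedding,
    target_year: int,
    corpus: List[Tuple[Embedding, int]],
    k: int = 5,
    window: Optional[int] = None,
) -> Optional[float]:
    """Assumed score: 1 minus the mean cosine similarity to the k most similar earlier works.

    Only works created strictly before `target_year` count as predecessors.
    If `window` is given, predecessors must also lie within `window` years of
    the target (an assumed proxy for "historical context").
    Returns None when the target has no predecessors.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    earliest = target_year - window if window is not None else None
    predecessors = [
        emb
        for emb, year in corpus
        if year < target_year and (earliest is None or year >= earliest)
    ]
    if not predecessors:
        return None
    # Rank predecessors by visual similarity and keep the k closest ones.
    sims = sorted((cosine_similarity(target, emb) for emb in predecessors), reverse=True)
    nearest = sims[:k]
    return 1.0 - sum(nearest) / len(nearest)


if __name__ == "__main__":
    # Toy 2-D "embeddings" with creation years; the 1920 work is never a predecessor of a 1900 work.
    corpus = [([1.0, 0.0], 1880), ([0.0, 1.0], 1885), ([1.0, 1.0], 1920)]
    print(visual_originality([1.0, 0.0], 1900, corpus, k=1))  # 0.0: identical to the 1880 work
    print(visual_originality([1.0, 0.0], 1900, corpus, k=1, window=18))  # 1.0: only the 1885 work is in range
```

Restricting candidates to strictly earlier works is what makes this a novelty score relative to predecessors rather than a generic outlier score; `k` and `window` are free parameters of this sketch, not values taken from the thesis.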

UvA-DARE

Digital Academic Repository