A multimodal turn in Digital Humanities. Using contrastive machine learning models to explore, enrich, and analyze digital visual historical collections

Open Access
Publication date 09-2023
Journal Digital Scholarship in the Humanities
Volume 38, Issue 3
Pages 1267–1280 (14 pages)
Organisations
  • Faculty of Humanities (FGw) - Amsterdam Institute for Humanities Research (AIHR) - Amsterdam School of Historical Studies (ASH)
Abstract

Until recently, most research in the Digital Humanities (DH) was monomodal, meaning that the object of analysis was either textual or visual. Seeking to integrate multimodality theory into the DH, this article demonstrates that recently developed multimodal deep learning models, such as Contrastive Language Image Pre-training (CLIP), offer new possibilities to explore and analyze image–text combinations at scale. These models, which are trained on image–text pairs, can be applied to a wide range of text-to-image, image-to-image, and image-to-text prediction tasks. Moreover, multimodal models show high accuracy in zero-shot classification, i.e. predicting unseen categories across heterogeneous datasets. Based on three exploratory case studies, we argue that this zero-shot capability opens the way for a multimodal turn in DH research. Such models also allow scholars to move past the artificial separation of text and images that has dominated the field and to analyze multimodal meaning at scale. However, we also need to be aware of the specific (historical) biases of multimodal deep learning models, which stem from biases in their training data.
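The zero-shot classification described in the abstract rests on a simple scoring step: an image embedding is compared against the embeddings of candidate label texts, and the closest label wins. The sketch below illustrates that step only, using toy unit vectors as stand-ins for real CLIP encoder outputs; the label strings and embedding values are hypothetical, and in practice the embeddings would come from CLIP's image and text encoders.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels):
    """Score candidate labels by cosine similarity, as in CLIP-style
    zero-shot classification, and return the best label plus scores."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img                              # cosine similarities
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax over labels
    return labels[int(np.argmax(probs))], probs

# Toy embeddings standing in for real CLIP encoder outputs.
image_emb = np.array([0.9, 0.1, 0.0])
text_embs = np.array([
    [1.0, 0.0, 0.0],   # "a printed advertisement"
    [0.0, 1.0, 0.0],   # "a political cartoon"
    [0.0, 0.0, 1.0],   # "a portrait photograph"
])
labels = ["a printed advertisement", "a political cartoon",
          "a portrait photograph"]

label, probs = zero_shot_classify(image_emb, text_embs, labels)
print(label)  # label whose text embedding is closest to the image
```

Because the candidate labels are ordinary text prompts, new categories can be scored without retraining, which is what makes the approach attractive for heterogeneous historical collections.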

Document type Article
Language English
Published at https://doi.org/10.1093/llc/fqad008
Other links https://www.scopus.com/pages/publications/85168633804
Downloads
fqad008 (Final published version)