Intrinsic evaluation of Mono- and Multilingual Dutch Language Models

Open Access
Authors
Publication date 2025
Journal Computational Linguistics in the Netherlands Journal
Event Computational Linguistics in the Netherlands (CLIN) 34
Volume / Issue number: 14
Pages (from-to) 525-553
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
  • Faculty of Humanities (FGw) - Amsterdam Institute for Humanities Research (AIHR)
Abstract
Through transfer learning, multilingual language models can produce good results on extrinsic, downstream NLP tasks in low-resource languages despite a lack of abundant training data. In most cases, however, monolingual models still perform better. Using the Dutch SimLex-999 dataset, we intrinsically evaluate several pre-trained monolingual stacked encoder LLMs for Dutch and compare them to several multilingual models that support Dutch, including two with parallel architectures (BERTje and mBERT). We also try to improve these models’ semantic representations by tuning the multilingual models on additional Dutch data. Furthermore, we explore the effect of tuning these models on written versus transcribed spoken data. While we can improve multilingual model performance through fine-tuning, we find that significant amounts of fine-tuning data and compute are required to outscore monolingual models on the intrinsic evaluation metric.
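As context for the intrinsic evaluation the abstract describes, the sketch below shows the standard SimLex-style protocol: score word pairs by the cosine similarity of their embeddings, then compute Spearman's rank correlation against human similarity ratings. The embeddings, word pairs, and ratings here are illustrative stand-ins, not the actual models or the Dutch SimLex-999 data used in the paper.

```python
# Hedged sketch of SimLex-style intrinsic evaluation: correlate model
# similarity scores with human ratings using Spearman's rank correlation.
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def spearman(xs, ys):
    """Spearman's rho for lists without tied values (ties would need averaged ranks)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Toy 2-d "embeddings" (hypothetical; a real evaluation would extract
# representations from a pre-trained model such as BERTje or mBERT).
emb = {
    "hond": [0.9, 0.1], "kat": [0.8, 0.2],
    "auto": [0.1, 0.9], "fiets": [0.3, 0.7],
}
# (word1, word2, human similarity rating) -- invented numbers for illustration.
pairs = [("hond", "kat", 8.5), ("auto", "fiets", 6.0), ("hond", "auto", 1.0)]

model_scores = [cosine(emb[a], emb[b]) for a, b, _ in pairs]
gold_scores = [g for _, _, g in pairs]
rho = spearman(model_scores, gold_scores)
```

A higher rho means the model's similarity judgments track human intuitions more closely; this is the intrinsic evaluation metric on which the paper compares monolingual and multilingual models.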
Document type Article
Language English
Published at https://www.clinjournal.org/clinj/article/view/215