Comparing domain-specific and domain-general BERT variants for inferred real-world knowledge through rare grammatical features in Serbian
| Authors | |
|---|---|
| Publication date | 2023 |
| Host editors | |
| Book title | The 9th Workshop on Slavic Natural Language Processing 2023 |
| Book subtitle | EACL 2023 : proceedings of the workshop (SlavicNLP 2023) : May 6, 2023 |
| ISBN (electronic) | |
| Event | The 9th Workshop on Slavic Natural Language Processing |
| Pages (from-to) | 47-60 |
| Number of pages | 14 |
| Publisher | Stroudsburg, PA: Association for Computational Linguistics |
| Organisations | |
| Abstract | Transfer learning is one of the prevailing approaches to training language-specific BERT models. However, some languages have uncommon features that may prove challenging for domain-general models but not for domain-specific ones. Comparing the performance of BERTić, a Bosnian-Croatian-Montenegrin-Serbian model, and Multilingual BERT on a Named-Entity Recognition (NER) task and a Masked Language Modelling (MLM) task built around a rare phenomenon in Serbian, indeclinable female foreign names, reveals how the different training approaches impact their performance. Multilingual BERT performs better than BERTić on the NER task, but BERTić greatly outperforms it on the MLM task. Thus, depending on the task at hand, there are applications for both domain-general and domain-specific training. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://aclanthology.org/2023.bsnlp-1.7 |
| Downloads | 2023.bsnlp-1.7 (Final published version) |
| Permalink to this page | |
