POS-tagging of Historical Dutch

D. Hupkes; R. Bod

Authors	D. Hupkes; R. Bod
Publication date	2016
Host editors	N. Calzolari; K. Choukri; T. Declerck; S. Goggi; M. Grobelnik; B. Maegaard; J. Mariani; H. Mazo; A. Moreno; J. Odijk; S. Piperidis
Book title	LREC 2016 : Tenth International Conference on Language Resources and Evaluation
Book subtitle	May 23-28, 2016, Grand Hotel Bernardin Conference Center, Portorož, Slovenia
ISBN (electronic)	9782951740891
Event	Language Resources and Evaluation Conference (LREC 2016)
Pages (from-to)	77-82
Publisher	Paris: European Language Resources Association (ELRA)
Organisations	Faculty of Science (FNWI) Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	We present a study of the adequacy of current methods that are used for POS-tagging historical Dutch texts, as well as an exploration of the influence of employing different techniques to improve upon the current practice. The main focus of this paper is on (unsupervised) methods that are easily adaptable for different domains without requiring extensive manual input. It was found that modernising the spelling of corpora prior to tagging them with a tagger trained on contemporary Dutch results in a large increase in accuracy, but that spelling normalisation alone is not sufficient to obtain state-of-the-art results. The best results were achieved by training a POS-tagger on a corpus automatically annotated by projecting (automatically assigned) POS-tags via word alignments from a contemporary corpus. This result is promising, as it was reached without including any domain knowledge or context dependencies. We argue that the insights of this study combined with semi-supervised learning techniques for domain adaptation can be used to develop a general-purpose diachronic tagger for Dutch.
Document type	Conference contribution
Language	English
Published at	http://www.lrec-conf.org/proceedings/lrec2016/summaries/196.html
Downloads	196_Paper (Final published version)
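
The tag projection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `project_tags`, the tag labels, the `UNK` placeholder, and the example sentence pair and alignment are all assumptions made for illustration. It only shows the general idea of copying POS-tags from a tagged contemporary sentence onto a historical sentence through word-alignment links.

```python
from typing import List, Tuple


def project_tags(
    modern_tags: List[str],
    alignment: List[Tuple[int, int]],
    n_historical: int,
    unaligned: str = "UNK",
) -> List[str]:
    """Copy tags from modern tokens to historical tokens via alignment links.

    alignment holds (modern_index, historical_index) pairs. Historical tokens
    without any link keep the placeholder tag; if a historical token has
    several links, the first link in the list wins (an arbitrary choice).
    """
    projected = [unaligned] * n_historical
    linked = [False] * n_historical
    for modern_idx, hist_idx in alignment:
        if not linked[hist_idx]:
            projected[hist_idx] = modern_tags[modern_idx]
            linked[hist_idx] = True
    return projected


# Hypothetical sentence pair; tags and alignment are made up for illustration.
modern_tokens = ["In", "het", "begin", "schiep", "God"]
modern_tags = ["ADP", "DET", "NOUN", "VERB", "PROPN"]
historical_tokens = ["In", "den", "beginne", "schiep", "Godt", "ende"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]  # "ende" is unaligned

historical_tags = project_tags(modern_tags, alignment, len(historical_tokens))
tagged_historical = list(zip(historical_tokens, historical_tags))
```

Sentences annotated this way could then serve as training data for a tagger on the historical variety, which is the setup the abstract reports as giving the best results.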