DutchParl: A corpus of parliamentary documents in Dutch

M. Marx; A. Schuth

Authors	M. Marx; A. Schuth
Publication date	2010
Book title	Proceedings of the 10th Dutch-Belgian Information Retrieval Workshop (DIR 2010)
Event	10th Dutch-Belgian Information Retrieval Workshop (DIR 2010), Nijmegen, the Netherlands
Pages (from-to)	82-83
Publisher	Nijmegen: Radboud Universiteit Nijmegen, Information Foraging Lab
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	DutchParl is a corpus that aims to contain all digitally available parliamentary documents written in the Dutch language. The first version of DutchParl contains documents from the parliaments of the Netherlands, Flanders and Belgium. The corpus is divided along three dimensions: by parliament, by scanned versus digital documents, and by written records of spoken text versus other documents. The digital collection contains more than 800 million tokens, and the scanned collection more than 1 billion. All documents are available as UTF-8 encoded XML files with extensive metadata in the Dublin Core standard. The text itself is divided into pages, which are in turn divided into paragraphs. Every document, page and paragraph has a unique URN that resolves to a web page. Every page element in the XML files is linked to a facsimile image of that page in PDF or JPEG format. We created a viewer in which both versions (the XML text and the facsimile image) can be inspected simultaneously. A search engine for the complete collection is available online, and the corpus is available for download in several formats. The corpus can be used for corpus-linguistic and political science research, and is suitable for scalability tests of XML information systems.
Document type	Conference contribution
Language	English
Published at	http://www.ru.nl/publish/pages/544689/proceedings_dir2010.pdf
Downloads	332665.pdf (Final published version)