The ParlaMint corpora of parliamentary proceedings

Open Access
Authors
  • T. Erjavec
  • M. Ogrodniczuk
  • P. Osenova
  • N. Ljubešić
  • K. Simov
  • A. Pančur
  • M. Rudolf
  • M. Kopp
  • S. Barkarson
  • S. Steingrímsson
  • Ç. Çöltekin
  • J. de Does
  • K. Depuydt
  • T. Agnoloni
  • G. Venturi
  • M.C. Pérez
  • L.D. de Macedo
  • C. Navarretta
  • G. Luxardo
  • M. Coole
  • P. Rayson
  • V. Morkevičius
  • T. Krilavičius
  • R. Darģis
  • O. Ring
  • R. van Heusden
  • M. Marx
  • D. Fišer
Publication date 03-2023
Journal Language Resources and Evaluation
Volume | Issue number 57 | 1
Pages (from-to) 415-448
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
This paper presents the ParlaMint corpora, which contain transcriptions of the sessions of 17 European national parliaments and total half a billion words. The corpora are uniformly encoded, contain rich metadata about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project’s GitHub repository. The complete corpora are openly available for download via the CLARIN.SI repository, and for online exploration and analysis through the NoSketch Engine and KonText concordancers and the Parlameter interface.
Document type Article
Note Publisher Copyright: © 2022, The Author(s).
Language English
Published at https://doi.org/10.1007/s10579-021-09574-0
Other links https://www.scopus.com/pages/publications/85124105199