Exemelification of parliamentary debates

Authors
Publication date 2009
Host editors
  • R. Aly
  • C. Hauff
  • I. den Hamer
  • D. Hiemstra
  • T. Huibers
  • F. de Jong
Book title Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop (DIR 2009)
Series CTIT Workshop Proceedings Series
Event 9th Dutch-Belgian Information Retrieval Workshop (DIR 2009), Enschede, The Netherlands
Pages (from-to) 19-25
Publisher Enschede: University of Twente, Centre for Telematics and Information Technology (CTIT)
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Parliamentary debates are an interesting domain in which to apply state-of-the-art information retrieval technology. They are highly structured transcripts of meetings of politicians in parliament. These debates are an important part of the cultural heritage of countries; they are often free of copyright; citizens often have a legal right to inspect them; and several countries make great efforts to digitize their entire historical collection and open it up to the general public. This provides many opportunities for the IR community.
In this paper we analyze the structure of parliamentary proceedings and sketch a widely applicable DTD. We show how proceedings in PDF format can be transformed into deeply nested XML. We call this process "exemelification". Having the proceedings in XML makes a wide range of applications possible. We elaborate on four of these: entry point retrieval; advanced content and structure search; automatic creation of tables of contents and hyperlinked navigation menus; and large savings on storage space and bandwidth for scanned documents.
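To make the idea of "exemelification" concrete, the following is a minimal sketch in Python of turning flat transcript text into nested XML and then using that structure for a table of contents and a speaker-restricted lookup. It is not the paper's method or DTD: the element names (meeting, topic, speech, p, note), the "TOPIC:" and "SPEAKER:" line markers, and the function names are all hypothetical, and the input is assumed to be text that has already been extracted from PDF.

```python
import xml.etree.ElementTree as ET

# Hypothetical line markers; real proceedings would need layout-based rules.
TOPIC_MARK = "TOPIC:"
SPEAKER_MARK = "SPEAKER:"


def exemelify(lines):
    """Nest flat transcript lines as meeting > topic > speech > p (assumed schema)."""
    meeting = ET.Element("meeting")
    topic = None
    speech = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(TOPIC_MARK):
            title = line[len(TOPIC_MARK):].strip()
            topic = ET.SubElement(meeting, "topic", title=title)
            speech = None  # a new topic closes the current speech
        elif line.startswith(SPEAKER_MARK):
            if topic is None:
                # Speech before any topic heading: open an untitled topic.
                topic = ET.SubElement(meeting, "topic", title="")
            speaker = line[len(SPEAKER_MARK):].strip()
            speech = ET.SubElement(topic, "speech", speaker=speaker)
        elif speech is not None:
            ET.SubElement(speech, "p").text = line
        else:
            # Text outside any speech is kept as a note on the nearest container.
            parent = topic if topic is not None else meeting
            ET.SubElement(parent, "note").text = line
    return meeting


def table_of_contents(meeting):
    """List topic titles in document order."""
    return [t.get("title") for t in meeting.findall("topic")]


def speeches_by(meeting, speaker):
    """Return all speech elements attributed to the given speaker."""
    return [s for s in meeting.iter("speech") if s.get("speaker") == speaker]


if __name__ == "__main__":
    sample = [
        "TOPIC: Budget",
        "SPEAKER: A",
        "First.",
        "Second.",
        "SPEAKER: B",
        "Reply.",
        "TOPIC: Health",
        "SPEAKER: A",
        "Third.",
    ]
    doc = exemelify(sample)
    print(ET.tostring(doc, encoding="unicode"))
    print(table_of_contents(doc))
```

Once the nesting exists, the table of contents and the speaker lookup are plain tree traversals, which is the kind of content-and-structure access the abstract refers to. A real pipeline would also need to validate the output against a DTD, which this sketch omits.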
Document type Conference contribution
Language English
Published at http://dir2009.cs.utwente.nl/dir2009proceedings.pdf