The importance of morphological normalization for XML retrieval

J. Kamps; M.J. Marx; M. de Rijke; B. Sigurbjörnsson

Authors	J. Kamps; M.J. Marx; M. de Rijke; B. Sigurbjörnsson
Publication date	2003
Host editors	N. Fuhr; N. Gö; G. Kazai
Book title	Proceedings of the First Annual Workshop of the Initiative for the Evaluation of XML Retrieval (INEX)
Pages (from-to)	41-48
Publisher	ERCIM Publications
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Current information retrieval systems typically ignore structural aspects of documents, solely focusing on the textual content instead. But documents containing additional structure in the form of HTML, XML, or SGML mark-up are pervasive on the Internet. The XML retrieval task presents a number of challenges for information retrieval, for we can no longer rely on the appropriate unit of retrieval to be fixed, or to be known beforehand. This implies that the effectiveness of standard IR techniques, such as morphological normalization methods, may not carry over to this particular task. This paper describes the fully automatic runs for the INEX 2002 task submitted by the Language and Inference Technology Group at the University of Amsterdam. We investigate the effectiveness of two standard approaches to morphological normalization, both a linguistically motivated stemming algorithm and a knowledge-poor character n-gramming technique. Our results show that morphological normalization is an important issue for XML retrieval. For all measurements, the combined run and the n-gram run perform better than the stemmed run.
Document type	Conference contribution