Filtering and clustering XML retrieval results

Authors
Publication date 2007
Host editors
  • N. Fuhr
  • M. Lalmas
  • A. Trotman
Book title Comparative Evaluation of XML Information Retrieval Systems
Book subtitle 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, Dagstuhl Castle, Germany, December 17-20, 2006 : revised and selected papers
ISBN
  • 9783540738879
ISBN (electronic)
  • 9783540738886
Series Lecture Notes in Computer Science
Event Comparative evaluation of XML information retrieval systems : 5th international workshop of the initiative for the evaluation of XML retrieval, INEX 2006
Pages (from-to) 121-136
Publisher Berlin: Springer
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
As part of the INEX 2006 Adhoc Track, we conducted a range of experiments with filtering and clustering XML element retrieval results. Our basic retrieval engine retrieves arbitrary elements from the collection (corresponding to the Thorough Task). These runs are filtered to remove textual overlap between elements (corresponding to the Focused Task). The resulting runs can be clustered per article (corresponding to the All in Context Task). Finally, we select the “best” element for each article (corresponding to the Best in Context Task). Our main findings are the following. First, a complete element index outperforms a restricted index based on section structure, although the differences are small. Second, grouping non-overlapping elements per article does not degrade performance and may even improve scores. Third, all restrictions of the “pure” element runs (removing overlap, grouping elements per article, or selecting a single element per article) lead to some, but only moderate, loss of precision.
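The four-stage pipeline described in the abstract (Thorough, Focused, All in Context, Best in Context) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Hit` record, the path-prefix overlap test, the greedy top-down filter, the ordering of articles by their best element score, and the choice of the highest-scoring element as the "best" one are all assumptions made for illustration, and the sample data is invented.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    """One retrieved element (hypothetical record layout)."""
    article: str   # article identifier
    path: str      # element path inside the article, e.g. "/article/sec[1]/p[2]"
    score: float   # retrieval score


def overlaps(a: Hit, b: Hit) -> bool:
    """Assumed overlap test: same article and one path is an ancestor of the other."""
    if a.article != b.article:
        return False
    return (a.path == b.path
            or a.path.startswith(b.path + "/")
            or b.path.startswith(a.path + "/"))


def remove_overlap(hits: List[Hit]) -> List[Hit]:
    """Focused-style run: walk the ranking top-down, keep non-overlapping hits."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return kept


def cluster_per_article(hits: List[Hit]) -> List[List[Hit]]:
    """All-in-Context-style run: group hits per article.

    Assumption: articles are ordered by the score of their best element.
    """
    groups: Dict[str, List[Hit]] = defaultdict(list)
    for hit in hits:
        groups[hit.article].append(hit)
    return sorted(groups.values(),
                  key=lambda g: max(h.score for h in g),
                  reverse=True)


def best_per_article(hits: List[Hit]) -> List[Hit]:
    """Best-in-Context-style run: keep the top-scoring element of each article."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.article not in best or hit.score > best[hit.article].score:
            best[hit.article] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


# Invented sample of a Thorough-style run (arbitrary, possibly nested elements).
thorough = [
    Hit("a1", "/article/sec[1]", 0.9),
    Hit("a1", "/article/sec[1]/p[2]", 0.8),
    Hit("a2", "/article", 0.7),
    Hit("a1", "/article/sec[2]", 0.6),
    Hit("a2", "/article/sec[1]", 0.5),
]

focused = remove_overlap(thorough)
in_context = cluster_per_article(focused)
best_in_context = best_per_article(focused)
```

Each stage consumes the output of the previous one, so every later run is a restriction of the original element ranking, which matches the abstract's framing of the tasks as successive restrictions of the "pure" element run.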
Document type Conference contribution
Language English
Published at https://doi.org/10.1007/978-3-540-73888-6_13