An Easter Egg Hunting Approach to Test Collection Building in Dynamic Domains

Open Access
Authors
Publication date 07-06-2016
Host editors
  • C. Clarke
  • E. Yilmaz
  • N. Kando
  • M.P. Kato
  • K. Kishida
  • S. Yamamoto
Book title Proceedings of the Seventh International Workshop on Evaluating Information Access (EVIA 2016)
Book subtitle a Satellite Workshop of the NTCIR-12 Conference, June 7, 2016, Tokyo, Japan
ISBN (electronic)
  • 9784860490720
Event Seventh International Workshop on Evaluating Information Access
Number of pages 8
Publisher Tokyo: National Institute of Informatics
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
Test collections for offline evaluation remain crucial for information retrieval research and industrial practice, yet the classical Sparck Jones and Van Rijsbergen approach to test collection building, based on pooling runs over a large collection, is expensive and is being pushed beyond its limits by the ever-increasing size and dynamic nature of collections. We experiment with a novel approach to building reusable test collections: we inject judged pages into an existing corpus and have systems retrieve pages from the extended corpus. Metaphorically, we hide Easter eggs for systems to retrieve. Our experiments exploit the unique setup of the TREC Contextual Suggestion Track, which allowed submissions both from a fixed corpus (ClueWeb12) and from the open web. We conduct an extensive analysis of the reusability of the test collection based on ClueWeb12 and find it too low for reliable offline testing. We then detail the expansion with judged pages from the open web, conduct an extensive analysis of the reusability of the resulting expanded test collection, and observe a dramatic increase in reusability. Our approach offers novel and cost-effective ways to build new test collections and to refresh and update existing ones. This work explores new ways of effectively maintaining offline test collections for dynamic domains such as the web.
Document type Conference contribution
Language English
Related publication Test Collection Building and Maintenance in Dynamic Domains
Published at http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings12/pdf/evia/01-EVIA2016-HashemiS.pdf
Other links http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings12/EVIA/toc_evia.html
Downloads
01-EVIA2016-HashemiS (Final published version)