Comparative analysis of clicks and judgments for IR evaluation

doi:https://doi.org/10.1145/1507509.1507522

Authors	J. Kamps, M. Koolen, A. Trotman
Publication date	2009
Book title	Proceedings of Workshop on Web Search Click Data (WSCD09)
Book subtitle	Barcelona, Spain, February 9, 2009
ISBN	9781605584348
Event	Workshop on Web Search Click Data (WSCD09), Barcelona, Spain
Pages (from-to)	80-87
Number of pages	8
Publisher	New York: ACM Press
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	Queries and click-through data taken from search engine transaction logs are an attractive alternative to traditional test collections, due to their volume and their direct relation to end-user querying. The overall aim of this paper is to answer the question: How does click-through data differ from explicit human relevance judgments in information retrieval evaluation? We compare a traditional test collection with manual judgments to transaction-log-based test collections, using queries as topics and subsequent clicks as pseudo-relevance judgments for the clicked results. Specifically, we investigate the following two research questions. First, are there significant differences between clicks and relevance judgments? Earlier research suggests that although clicks and explicit judgments show reasonable agreement, clicks are different from static absolute relevance judgments. Second, are there significant differences between system rankings based on clicks and those based on relevance judgments? This is an open question, but earlier research suggests that comparative evaluation in terms of system ranking is remarkably robust.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1145/1507509.1507522