Unbiased Learning to Rank Meets Reality: Lessons from Baidu's Large-Scale Search Dataset

P. Hager; R. Deffayet; Jean-Michel Renders; Onno Zoeter; Maarten de Rijke

doi:https://doi.org/10.1145/3626772.3657892

Authors	P. Hager; R. Deffayet; Jean-Michel Renders; Onno Zoeter; Maarten de Rijke
Publication date	2024
Book title	SIGIR '24
Book subtitle	Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval : July 14-18, 2024, Washington, DC, USA
ISBN (electronic)	9798400704314
Event	47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024
Pages (from-to)	1546-1556
Publisher	New York, NY: Association for Computing Machinery
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Unbiased learning-to-rank (ULTR) is a well-established framework for learning from user clicks, which are often biased by the ranker collecting the data. While theoretically justified and extensively tested in simulation, ULTR techniques lack empirical validation, especially on modern search engines. The Baidu-ULTR dataset released for the WSDM Cup 2023, collected from Baidu's search engine, offers a rare opportunity to assess the real-world performance of prominent ULTR techniques. Despite multiple submissions during the WSDM Cup 2023 and the subsequent NTCIR ULTRE-2 task, it remains unclear whether the observed improvements stem from applying ULTR or other learning techniques. In this work, we revisit and extend the available experiments on the Baidu-ULTR dataset. We find that standard unbiased learning-to-rank techniques robustly improve click predictions but struggle to consistently improve ranking performance, especially considering the stark differences obtained by choice of ranking loss and query-document features. Our experiments reveal that gains in click prediction do not necessarily translate to enhanced ranking performance on expert relevance annotations, implying that conclusions strongly depend on how success is measured in this benchmark.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1145/3626772.3657892 (Final published version)