Unifying Online and Counterfactual Learning to Rank

H. Oosterhuis; M. de Rijke

Authors	H. Oosterhuis; M. de Rijke
Publication date	8 December 2020
Number of pages	9
Publisher	ArXiv
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Optimizing ranking systems based on user interactions is a well-studied problem. State-of-the-art methods for this task are divided into online approaches, which learn by directly interacting with users, and counterfactual approaches, which learn from historical interactions. Existing online methods are hindered without online interventions and thus should not be applied counterfactually. Conversely, counterfactual methods cannot directly benefit from online interventions. We propose a novel intervention-aware estimator for both counterfactual and online Learning to Rank (LTR). With this estimator, we aim to bridge the division between online and counterfactual LTR, as it is shown to be highly effective in both scenarios. The estimator corrects for the effects of position bias, trust bias, and item-selection bias by using corrections based on the behavior of the logging policy and on online interventions, that is, changes to the logging policy made while click data is being gathered. Our experimental results, obtained in a semi-synthetic experimental setup, show that, unlike existing counterfactual LTR methods, the intervention-aware estimator can greatly benefit from online interventions.
Document type	Working paper
Language	English
Related publication	Unifying Online and Counterfactual Learning to Rank
Published at	https://arxiv.org/abs/2012.04426
Downloads	oosterhuis-2020-unifying-arxiv (Submitted manuscript)