When people change their mind: Off-policy evaluation in non-stationary recommendation environments

R. Jagerman; I. Markov; M. de Rijke

doi:https://doi.org/10.1145/3289600.3290958

Authors	R. Jagerman; I. Markov; M. de Rijke
Publication date	2019
Book title	WSDM'19
Book subtitle	proceedings of the Twelfth ACM International Conference on Web Search and Data Mining : February 11-15, 2019 : Melbourne, Australia
ISBN (electronic)	9781450359405
Event	12th ACM International Conference on Web Search and Data Mining, WSDM 2019
Pages (from-to)	447-455
Number of pages	9
Publisher	New York, NY: Association for Computing Machinery
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI); Faculty of Science (FNWI)
Abstract	We consider the novel problem of evaluating a recommendation policy offline in environments where the reward signal is non-stationary. Non-stationarity appears in many Information Retrieval (IR) applications such as recommendation and advertising, but its effect on off-policy evaluation has not been studied at all. We are the first to address this issue. First, we analyze standard off-policy estimators in non-stationary environments and show both theoretically and experimentally that their bias grows with time. Then, we propose new off-policy estimators with moving averages and show that their bias is independent of time and can be bounded. Furthermore, we provide a method to trade off bias and variance in a principled way to get an off-policy estimator that works well in both non-stationary and stationary environments. We experiment on publicly available recommendation datasets and show that our newly proposed moving average estimators accurately capture changes in non-stationary environments, while standard off-policy estimators fail to do so.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1145/3289600.3290958 (Final published version)
Other links	https://www.scopus.com/pages/publications/85061745088
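
As a rough illustration of the abstract's central claim, the Python sketch below contrasts a standard inverse propensity scoring (IPS) estimate computed over the whole log with the same estimate restricted to a recent window, which is one simple form of moving average. It is not the paper's implementation, and this record does not say whether the paper's estimators take exactly this form. The function names, the logged-tuple layout, the target policy, and the synthetic log are all assumptions made for the example.

```python
# Illustrative sketch only: not the paper's implementation.
# All names and the synthetic log below are assumptions made for this example.

def ips_estimate(logs, target_prob):
    """Standard IPS: average of importance-weighted rewards over all logged rounds.

    Each log entry is (context, action, reward, logging_prob), where
    logging_prob is the probability the logging policy gave to `action`.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        total += reward * target_prob(context, action) / logging_prob
    return total / len(logs)


def sliding_window_ips(logs, target_prob, window):
    """IPS over only the most recent `window` rounds (window must be >= 1)."""
    return ips_estimate(logs[-window:], target_prob)


def target_prob(context, action):
    # Hypothetical target policy: always recommends action 1.
    return 1.0 if action == 1 else 0.0


# Synthetic, deterministic log of 100 rounds. The logging policy is treated as
# uniform over two actions (propensity 0.5). Action 1 pays reward 1.0 during
# the first 50 rounds and 0.0 afterwards, so the environment is non-stationary.
logs = []
for t in range(100):
    action = t % 2
    reward = 1.0 if (action == 1 and t < 50) else 0.0
    logs.append((None, action, reward, 0.5))

full_history = ips_estimate(logs, target_prob)
recent_only = sliding_window_ips(logs, target_prob, window=20)
print(full_history, recent_only)
```

On this toy log the target policy currently earns a reward of 0.0. The full-history estimate is 0.5, because it averages in rounds from before the change, while the 20-round window gives 0.0. A shorter window tracks change faster but averages fewer samples, which is the kind of bias and variance trade-off the abstract refers to.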