When people change their mind: Off-policy evaluation in non-stationary recommendation environments
| Authors | |
|---|---|
| Publication date | 2019 |
| Book title | WSDM'19 |
| Book subtitle | proceedings of the Twelfth ACM International Conference on Web Search and Data Mining : February 11-15, 2019 : Melbourne, Australia |
| ISBN (electronic) | |
| Event | 12th ACM International Conference on Web Search and Data Mining, WSDM 2019 |
| Pages (from-to) | 447-455 |
| Number of pages | 9 |
| Publisher | New York, NY: Association for Computing Machinery |
| Organisations | |
| Abstract | We consider the novel problem of evaluating a recommendation policy offline in environments where the reward signal is non-stationary. Non-stationarity appears in many Information Retrieval (IR) applications such as recommendation and advertising, but its effect on off-policy evaluation has not been studied at all. We are the first to address this issue. First, we analyze standard off-policy estimators in non-stationary environments and show both theoretically and experimentally that their bias grows with time. Then, we propose new off-policy estimators with moving averages and show that their bias is independent of time and can be bounded. Furthermore, we provide a method to trade off bias and variance in a principled way to get an off-policy estimator that works well in both non-stationary and stationary environments. We experiment on publicly available recommendation datasets and show that our newly proposed moving average estimators accurately capture changes in non-stationary environments, while standard off-policy estimators fail to do so. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/3289600.3290958 |
| Other links | https://www.scopus.com/pages/publications/85061745088 |
| Permalink to this page | |
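
The abstract contrasts standard off-policy estimators with moving-average variants but gives no formulas. The following is a minimal sketch of that contrast, not the paper's estimators: it assumes plain inverse propensity scoring (IPS) as the "standard" estimator and a simple sliding window as the moving average. The function names, the `window` parameter, and the toy log are all hypothetical.

```python
from typing import Sequence


def ips_estimate(rewards: Sequence[float],
                 logging_probs: Sequence[float],
                 target_probs: Sequence[float]) -> float:
    """Plain IPS: average of importance-weighted rewards over the whole log."""
    if len(rewards) == 0:
        raise ValueError("the log must contain at least one interaction")
    total = 0.0
    for r, p_log, p_new in zip(rewards, logging_probs, target_probs):
        # Reweight each logged reward by how much more (or less) likely
        # the target policy was to take the logged action.
        total += r * p_new / p_log
    return total / len(rewards)


def sliding_window_ips(rewards: Sequence[float],
                       logging_probs: Sequence[float],
                       target_probs: Sequence[float],
                       window: int) -> float:
    """IPS restricted to the most recent `window` interactions (assumed variant)."""
    if window <= 0:
        raise ValueError("window must be positive")
    return ips_estimate(rewards[-window:],
                        logging_probs[-window:],
                        target_probs[-window:])


# Toy non-stationary log: the reward drops from 1.0 to 0.0 halfway through.
# Both policies pick the logged action with probability 0.5, so every
# importance weight equals 1 and only the time drift matters.
rewards = [1.0] * 100 + [0.0] * 100
probs = [0.5] * 200

full = ips_estimate(rewards, probs, probs)
recent = sliding_window_ips(rewards, probs, probs, window=50)
print(full, recent)
```

On this toy log the full-history estimate averages over both regimes (0.5), while the windowed estimate reflects only the current regime (0.0). A smaller window tracks change faster at the cost of higher variance, which is the bias-variance trade-off the abstract mentions.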
