Cascading Non-stationary Bandits: Online Learning to Rank in the Non-stationary Cascade Model
| Authors | |
|---|---|
| Publication date | 29-05-2019 |
| Edition | v1 |
| Number of pages | 12 |
| Publisher | Ithaca, NY: ArXiv |
| Organisations | |
| Abstract | Non-stationarity appears in many online applications such as web search and advertising. In this paper, we study the online learning to rank problem in a non-stationary environment where user preferences change abruptly at unknown moments in time. We consider the problem of identifying the K most attractive items and propose cascading non-stationary bandits, an online learning variant of the cascade model, in which a user browses a ranked list from top to bottom and clicks on the first attractive item. We propose two algorithms for solving this non-stationary problem: CascadeDUCB and CascadeSWUCB. We analyze their performance and derive gap-dependent upper bounds on their n-step regret. We also establish a lower bound on the regret for cascading non-stationary bandits and show that both algorithms match this lower bound up to a logarithmic factor. Finally, we evaluate their performance on a real-world web search click dataset. |
| Document type | Working paper |
| Note | Versions v2 and v3 (2019) are also available on arXiv.org. |
| Language | English |
| Related publication | Cascading non-stationary bandits: Online learning to rank in the non-stationary cascade model |
| Published at | https://arxiv.org/abs/1905.12370 |
| Downloads | 1905.12370v1 (Accepted author manuscript) |
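The abstract describes CascadeSWUCB as a sliding-window UCB algorithm for the cascade model, where the learner ranks K of L items, the user scans top-down, and only items up to the first click are observed. Below is a minimal illustrative sketch of that idea, not the paper's exact specification: the function name, the observation-based window, and the confidence-radius constant are assumptions made for the example.

```python
import math
import random
from collections import deque

def cascade_sw_ucb(attraction, K, n_steps, tau=500, seed=0):
    """Illustrative sketch of a sliding-window UCB for the cascade model.

    `attraction[i]` is the (unknown to the learner) click probability of
    item i; the learner only sees cascade feedback. Names and constants
    here are assumptions for the sketch, not the paper's specification.
    Returns the total number of clicks collected over `n_steps` rounds.
    """
    rng = random.Random(seed)
    L = len(attraction)
    window = deque()  # (item, reward) pairs from the most recent observations
    clicks = 0
    for t in range(1, n_steps + 1):
        # Sliding-window statistics: only observations still in the window count
        counts = [0] * L
        sums = [0.0] * L
        for i, r in window:
            counts[i] += 1
            sums[i] += r
        # UCB index: windowed mean plus a confidence radius; unseen items
        # get an infinite index so they are explored first
        def ucb(i):
            if counts[i] == 0:
                return float('inf')
            mean = sums[i] / counts[i]
            radius = math.sqrt(1.5 * math.log(min(t, tau)) / counts[i])
            return mean + radius
        ranked = sorted(range(L), key=ucb, reverse=True)[:K]
        # Cascade feedback: the user scans top-down and clicks the first
        # attractive item; items below the click are not observed
        for i in ranked:
            clicked = rng.random() < attraction[i]
            window.append((i, 1.0 if clicked else 0.0))
            if clicked:
                clicks += 1
                break
        # Forget observations that fall outside the window, which lets the
        # estimates track abrupt changes in the attraction probabilities
        while len(window) > tau:
            window.popleft()
    return clicks
```

With a strongly attractive best item (e.g. `attraction=[0.9, 0.2, 0.1, 0.05]`, `K=2`), the sketch quickly concentrates on the top items and collects a click in most rounds; shrinking `tau` trades estimation accuracy for faster adaptation after a change point.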
