Wireless channel selection with restless bandits

J. Kuhn; Y. Nazarathy

doi:https://doi.org/10.1007/978-3-319-47766-4_18

Authors	J. Kuhn; Y. Nazarathy
Publication date	2017
Host editors	R.J. Boucherie; N.M. van Dijk
Book title	Markov Decision Processes in Practice
ISBN	9783319477640
ISBN (electronic)	9783319477664
Series	International Series in Operations Research and Management Science
Pages (from-to)	463-485
Number of pages	23
Publisher	Cham: Springer
Organisations	Faculty of Science (FNWI) - Korteweg-de Vries Institute for Mathematics (KdVI)
Abstract	Wireless devices are often able to communicate on several alternative channels; for example, cellular phones may use several frequency bands and are equipped with base-station communication capability together with WiFi and Bluetooth communication. Automatic decision support systems in such devices need to decide which channels to use at any given time so as to maximize the long-run average throughput. A good decision policy needs to take into account that, due to cost, energy, technical, or performance constraints, the state of a channel is only sensed when it is selected for transmission. Therefore, the greedy strategy of always exploiting those channels assumed to yield the currently highest transmission rate is not necessarily optimal with respect to long-run average throughput. Rather, it may be favourable to give some priority to the exploration of channels of uncertain quality. In this chapter we model such on-line control problems as a special type of Restless Multi-Armed Bandit (RMAB) problem in a partially observable Markov decision process framework. We refer to such models as Reward-Observing Restless Multi-Armed Bandit (RORMAB) problems. These types of optimal control problems were previously considered in the literature in the context of (i) Gilbert-Elliott (GE) channels (where each channel is modelled as a two-state Markov chain), and (ii) Gaussian autoregressive (AR) channels of order 1. A virtue of this chapter is that we unify the presentation of both types of models under the umbrella of our newly defined RORMAB. Further, since RORMAB is a special type of RMAB, we also present an account of RMAB problems together with a pedagogical development of the Whittle index, which provides an approximately optimal control method. Numerical examples are provided.
Document type	Chapter
Language	English
Published at	https://doi.org/10.1007/978-3-319-47766-4_18 (Final published version)
Other links	https://www.scopus.com/pages/publications/85015031046
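
Illustrative sketch (not taken from the chapter): the abstract describes channels whose state is sensed only when they are selected, and a greedy strategy that always picks the channel believed to be best. The Python fragment below is a minimal sketch of that setting for two-state (GE-type) channels. It assumes a reward of 1 in the good state and 0 in the bad state, identical channels, and one channel selected per time slot. All names and parameters (`propagate`, `simulate_greedy`, `p01`, `p11`) are hypothetical and chosen here for illustration. The Whittle index policy mentioned in the abstract is not implemented.

```python
import random


def propagate(belief, p01, p11):
    """One-step belief update for a channel that was NOT sensed.

    belief: current probability that the channel is in the good state
    p01:    assumed probability of moving bad -> good
    p11:    assumed probability of staying good -> good
    """
    return belief * p11 + (1.0 - belief) * p01


def simulate_greedy(p01, p11, n_channels=3, horizon=10_000, seed=0):
    """Average throughput of the greedy (myopic) policy in simulation.

    Each slot, the channel with the highest belief is selected; its true
    state is observed and yields reward 1 if good, 0 if bad. The other
    channels are not observed, so only their beliefs are propagated.
    """
    rng = random.Random(seed)
    stationary = p01 / (1.0 - p11 + p01)  # long-run probability of "good"
    states = [rng.random() < stationary for _ in range(n_channels)]
    beliefs = [stationary] * n_channels
    total = 0
    for _ in range(horizon):
        chosen = max(range(n_channels), key=lambda i: beliefs[i])
        total += states[chosen]
        for i in range(n_channels):
            if i == chosen:
                # State observed: next-slot belief is a transition probability.
                beliefs[i] = p11 if states[i] else p01
            else:
                beliefs[i] = propagate(beliefs[i], p01, p11)
        # All channels evolve regardless of selection (the "restless" part).
        states = [rng.random() < (p11 if s else p01) for s in states]
    return total / horizon


if __name__ == "__main__":
    print(simulate_greedy(p01=0.2, p11=0.9))
```

The belief of an unselected channel drifts towards the stationary probability, which is why a policy may gain from occasionally selecting a channel of uncertain quality instead of the one that currently looks best.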

UvA-DARE

Digital Academic Repository