Wireless channel selection with restless bandits

Authors
Publication date 2017
Host editors
  • R.J. Boucherie
  • N.M. van Dijk
Book title Markov Decision Processes in Practice
ISBN
  • 9783319477640
ISBN (electronic)
  • 9783319477664
Series International Series in Operations Research and Management Science
Pages (from-to) 463-485
Number of pages 23
Publisher Cham: Springer
Organisations
  • Faculty of Science (FNWI) - Korteweg-de Vries Institute for Mathematics (KdVI)
Abstract

Wireless devices are often able to communicate on several alternative channels; for example, cellular phones may use several frequency bands and are equipped with base-station communication capability together with WiFi and Bluetooth communication. Automatic decision support systems in such devices need to decide which channels to use at any given time so as to maximize the long-run average throughput. A good decision policy needs to take into account that, due to cost, energy, technical, or performance constraints, the state of a channel is only sensed when it is selected for transmission. Therefore, the greedy strategy of always exploiting those channels assumed to yield the currently highest transmission rate is not necessarily optimal with respect to long-run average throughput. Rather, it may be favourable to give some priority to the exploration of channels of uncertain quality. In this chapter we model such on-line control problems as a special type of Restless Multi-Armed Bandit (RMAB) problem in a partially observable Markov decision process framework. We refer to such models as Reward-Observing Restless Multi-Armed Bandit (RORMAB) problems. Optimal control problems of this type were previously considered in the literature in the context of (i) Gilbert-Elliott (GE) channels, where each channel is modelled as a two-state Markov chain, and (ii) Gaussian autoregressive (AR) channels of order 1. A virtue of this chapter is that we unify the presentation of both types of models under the umbrella of our newly defined RORMAB. Further, since RORMAB is a special type of RMAB, we also present an account of RMAB problems together with a pedagogical development of the Whittle index, which provides an approximately optimal control method. Numerical examples are provided.
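
As an illustration of the setting described in the abstract, the following is a minimal sketch (not the chapter's own code) of the greedy policy on two-state Markov channels, where a channel's state is revealed only when it is selected. The function names, the parameters p01 and p11 (assumed probabilities of moving to the good state from the bad and good state respectively), and the binary reward are assumptions made for this example.

```python
import random


def propagate(belief, p01, p11):
    """One-step belief update for a channel that was NOT sensed.

    belief is the probability that the channel is currently good.
    """
    return belief * p11 + (1.0 - belief) * p01


def simulate_greedy(n_channels=3, p01=0.2, p11=0.8, steps=1000, seed=0):
    """Average reward of the greedy (myopic) policy on two-state channels.

    Assumption for this sketch: reward is 1 if the selected channel is
    good and 0 otherwise, and all channels share the same p01 and p11.
    """
    rng = random.Random(seed)
    stationary = p01 / (1.0 - p11 + p01)  # long-run probability of "good"
    states = [rng.random() < stationary for _ in range(n_channels)]
    beliefs = [stationary] * n_channels
    total = 0
    for _ in range(steps):
        # Greedy choice: the channel currently believed most likely good.
        k = max(range(n_channels), key=lambda i: beliefs[i])
        total += 1 if states[k] else 0
        # Beliefs about the NEXT time step: the selected channel's state
        # was observed; every other belief is propagated blindly.
        for i in range(n_channels):
            if i == k:
                beliefs[i] = p11 if states[i] else p01
            else:
                beliefs[i] = propagate(beliefs[i], p01, p11)
        # All channels evolve, whether selected or not (restlessness).
        states = [rng.random() < (p11 if s else p01) for s in states]
    return total / steps
```

Calling simulate_greedy() returns the empirical average reward per step under these assumptions. An index policy such as the Whittle index would replace the max over beliefs with a max over per-channel index values.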

Document type Chapter
Language English
Published at https://doi.org/10.1007/978-3-319-47766-4_18
Other links https://www.scopus.com/pages/publications/85015031046