Contextual-Bandit based MIMO Relay Selection Policy with Channel Uncertainty

Ankit Gupta, Naveen Mysore Balasubramanya, Mathini Sellathurai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)


In this work, we exploit the potential benefits of multi-armed bandit schemes in cooperative multiple-input multiple-output (MIMO) wireless networks. In particular, we consider an online policy for amplify-and-forward MIMO relay selection (RS), where relays are provided with uncertain channel state information (CSI). We design the RS policy as a sequential, experience-driven learning algorithm following a contextual bandit (CB) approach: the algorithm learns to select the optimal relay node using the imperfect CSI, provided as a context vector, together with the past experience of rewards procured under the current policy, with the aim of maximizing the cumulative mean reward over time. Further, through extensive simulation results, we demonstrate that the proposed CB-based RS policy achieves superior performance gains compared to the conventional Gram-Schmidt method.
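The abstract describes the general recipe: each relay's imperfect CSI serves as a context vector, a contextual bandit scores the relays, one relay is selected, and the observed reward updates the policy. The paper's exact algorithm and reward model are not given here, so the following is a minimal sketch using a standard LinUCB-style contextual bandit; the relay count, context dimension, noise levels, and linear reward model are all illustrative assumptions, not the authors' specification.

```python
import numpy as np

rng = np.random.default_rng(0)

N_RELAYS = 4   # candidate relay nodes (hypothetical)
D = 6          # CSI features per relay forming the context vector (hypothetical)
ALPHA = 1.0    # exploration strength of the UCB term
T = 2000       # sequential decision rounds

# Per-arm LinUCB statistics: A is the d x d regularized design matrix,
# b accumulates reward-weighted contexts for the ridge-regression estimate.
A = [np.eye(D) for _ in range(N_RELAYS)]
b = [np.zeros(D) for _ in range(N_RELAYS)]

# Hypothetical hidden channel-quality parameters, unknown to the learner.
theta_true = rng.normal(size=(N_RELAYS, D))

def choose_relay(contexts):
    """Pick the relay with the largest upper confidence bound (LinUCB rule)."""
    scores = []
    for k in range(N_RELAYS):
        A_inv = np.linalg.inv(A[k])
        theta_hat = A_inv @ b[k]          # ridge estimate from past rewards
        x = contexts[k]
        ucb = theta_hat @ x + ALPHA * np.sqrt(x @ A_inv @ x)
        scores.append(ucb)
    return int(np.argmax(scores))

cum_reward = 0.0
for t in range(T):
    x_true = rng.normal(size=(N_RELAYS, D))                 # true CSI features
    x_obs = x_true + 0.1 * rng.normal(size=(N_RELAYS, D))   # imperfect CSI context
    k = choose_relay(x_obs)
    # Reward depends on the true channel, plus noise (assumed linear model).
    reward = theta_true[k] @ x_true[k] + 0.1 * rng.normal()
    # Rank-1 update of the chosen arm's statistics only.
    A[k] += np.outer(x_obs[k], x_obs[k])
    b[k] += reward * x_obs[k]
    cum_reward += reward
```

Over the rounds, the confidence term shrinks for frequently chosen relays, so the policy shifts from exploring all relays to exploiting the one whose imperfect CSI best predicts reward, which is the trade-off the CB formulation is designed to balance.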

Original language: English
Title of host publication: 2020 IEEE International Conference on Communications (ICC)
ISBN (Electronic): 9781728150895
Publication status: Published - 27 Jul 2020
Event: 2020 IEEE International Conference on Communications - Dublin, Ireland
Duration: 7 Jun 2020 - 11 Jun 2020

Publication series

Name: IEEE International Conference on Communications
ISSN (Print): 1550-3607


Conference: 2020 IEEE International Conference on Communications
Abbreviated title: ICC 2020

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Electrical and Electronic Engineering


