A Sequential Experience-driven Contextual Bandit Policy for MIMO TWAF Online Relay Selection

Ankit Gupta, Mathini Sellathurai, Tharmalingam Ratnarajah

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this work, we derive sequential experience-driven contextual bandit (CB)-based policies for online relay selection in multiple-input multiple-output (MIMO) two-way amplify-and-forward (TWAF) relay networks, where the relays are provided with quantized imperfect channel gain information. The proposed CB-based policies learn which relay node is optimal by resolving the exploration-versus-exploitation dilemma. In particular, we propose a linear upper confidence bound (LinUCB)-based CB policy and an adaptive active greedy (AAG)-based CB policy that utilizes active learning heuristics. Simulation results show that the proposed CB-based policies can reduce the feedback overhead by a factor of eight and the time cost by 70% while outperforming the best conventional Gram-Schmidt (GS) algorithm.
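To make the LinUCB idea named in the abstract concrete, the following is a minimal sketch of the generic disjoint LinUCB rule applied to choosing one relay per round. It is not the authors' implementation: the abstract does not specify the context features, the reward, or the exploration weight, so the class `LinUCBArm`, the function `select_relay`, the two-dimensional context vectors, the reward value, and the parameter `alpha` are all hypothetical choices made for illustration.

```python
import math


class LinUCBArm:
    """One candidate relay (arm) with a ridge-regression reward model.

    Hypothetical helper: keeps A^{-1} and b for the standard disjoint
    LinUCB rule, where A = I + sum(x x^T) and b = sum(reward * x).
    """

    def __init__(self, dim):
        # A starts as the identity matrix, so its inverse is also I.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _A_inv_times(self, v):
        # Matrix-vector product A^{-1} v.
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x, alpha):
        """Estimated reward plus alpha times the confidence width for context x."""
        theta = self._A_inv_times(self.b)           # ridge estimate A^{-1} b
        a_inv_x = self._A_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, a_inv_x)), 0.0))
        return mean + alpha * width

    def update(self, x, reward):
        """Fold in one observation using a Sherman-Morrison rank-one update."""
        a_inv_x = self._A_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        dim = len(x)
        for i in range(dim):
            for j in range(dim):
                # A^{-1} is symmetric, so (A^{-1} x)(x^T A^{-1}) is an outer product.
                self.A_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(dim):
            self.b[i] += reward * x[i]


def select_relay(arms, contexts, alpha):
    """Return the index of the relay with the largest upper confidence bound."""
    return max(range(len(arms)), key=lambda k: arms[k].ucb(contexts[k], alpha))


# Illustrative round with made-up contexts (e.g. quantized channel features)
# and a made-up reward; none of these numbers come from the paper.
arms = [LinUCBArm(dim=2) for _ in range(3)]
contexts = [[1.0, 0.2], [0.5, 0.9], [0.3, 0.3]]
chosen = select_relay(arms, contexts, alpha=1.0)
arms[chosen].update(contexts[chosen], reward=0.7)
```

A larger `alpha` favors relays whose contexts have been observed less often (exploration), while `alpha = 0` reduces the rule to picking the relay with the highest estimated reward (exploitation).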

Original language: English
Title of host publication: 23rd IEEE International Workshop on Signal Processing Advances in Wireless Communication 2022
Publisher: IEEE
ISBN (Electronic): 9781665494557
Publication status: Published - 28 Jul 2022
Event: 23rd IEEE International Workshop on Signal Processing Advances in Wireless Communication 2022 - Oulu, Finland
Duration: 4 Jul 2022 → 6 Jul 2022

Conference

Conference: 23rd IEEE International Workshop on Signal Processing Advances in Wireless Communication 2022
Abbreviated title: SPAWC 2022
Country/Territory: Finland
City: Oulu
Period: 4/07/22 → 6/07/22

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Science Applications
  • Information Systems
