Abstract
In this work, we derive sequential, experience-driven contextual bandit (CB)-based policies for online relay selection in multiple-input multiple-output (MIMO) two-way amplify-and-forward (TWAF) relay networks, where the relays are provided with quantized, imperfect channel gain information. The proposed CB-based policies acquire information about the optimal relay node by resolving the exploration-versus-exploitation dilemma. In particular, we propose a linear upper confidence bound (LinUCB)-based CB policy and an adaptive active greedy (AAG)-based CB policy that utilizes active learning heuristics. Simulation results show that the proposed CB-based policies can reduce the feedback overhead by a factor of eight and the time cost by 70% while outperforming the best conventional Gram-Schmidt (GS) algorithm.
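To make the exploration-versus-exploitation trade-off mentioned in the abstract concrete, the following is a minimal sketch of the generic disjoint LinUCB rule applied to picking one relay per round. It is not the policy derived in the paper: the context vectors (standing in for quantized channel-gain features), the scalar reward, the exploration weight `alpha`, and the names `LinUCBArm` and `select_relay` are all assumptions made for illustration.

```python
import math


class LinUCBArm:
    """Per-arm state for disjoint LinUCB (one arm = one candidate relay, by assumption)."""

    def __init__(self, dim):
        # A_inv is the inverse of A = I + sum(x x^T); b accumulates reward * x.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _mat_vec(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x, alpha):
        """Estimated reward plus alpha times the confidence width for context x."""
        theta = self._mat_vec(self.b)            # ridge estimate A^{-1} b
        a_inv_x = self._mat_vec(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, a_inv_x)), 0.0))
        return mean + alpha * width

    def update(self, x, reward):
        """Add the observation (x, reward) using a Sherman-Morrison rank-one update."""
        a_inv_x = self._mat_vec(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        d = len(x)
        for i in range(d):
            for j in range(d):
                # A_inv is symmetric, so (A_inv x)(x^T A_inv) = a_inv_x a_inv_x^T.
                self.A_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_relay(arms, contexts, alpha):
    """Return the index of the arm with the largest upper confidence bound."""
    scores = [arm.ucb(x, alpha) for arm, x in zip(arms, contexts)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical two-relay round with 2-dimensional context features.
    arms = [LinUCBArm(2), LinUCBArm(2)]
    contexts = [[1.0, 0.0], [0.0, 1.0]]
    chosen = select_relay(arms, contexts, alpha=1.0)
    arms[chosen].update(contexts[chosen], reward=1.0)  # reward value is made up
    print("selected relay:", chosen)
```

A larger `alpha` favours relays whose contexts have been observed less often, while `alpha = 0` reduces the rule to pure exploitation of the current reward estimates.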
Original language | English |
---|---|
Title of host publication | 23rd IEEE International Workshop on Signal Processing Advances in Wireless Communication 2022 |
Publisher | IEEE |
ISBN (Electronic) | 9781665494557 |
Publication status | Published - 28 Jul 2022 |
Event | 23rd IEEE International Workshop on Signal Processing Advances in Wireless Communication 2022 - Oulu, Finland. Duration: 4 Jul 2022 → 6 Jul 2022 |
Conference
Conference | 23rd IEEE International Workshop on Signal Processing Advances in Wireless Communication 2022 |
---|---|
Abbreviated title | SPAWC 2022 |
Country/Territory | Finland |
City | Oulu |
Period | 4/07/22 → 6/07/22 |
ASJC Scopus subject areas
- Electrical and Electronic Engineering
- Computer Science Applications
- Information Systems