Abstract
In this work, we derive sequential experience-driven contextual bandit (CB)-based policies for online relay selection in multiple-input multiple-output (MIMO) two-way amplify-and-forward (TWAF) relay networks, where the relays are provided with quantized, imperfect channel gain information. The proposed CB-based policies learn which relay node is optimal by resolving the exploration-versus-exploitation dilemma. In particular, we propose a linear upper confidence bound (LinUCB)-based CB policy and an adaptive active greedy (AAG)-based CB policy that utilizes active learning heuristics. Simulation results show that the proposed CB-based policies can reduce the feedback overhead by a factor of eight and the time cost by 70% while outperforming the best conventional Gram-Schmidt (GS) algorithm.
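To illustrate the exploration-versus-exploitation mechanism behind a LinUCB-style policy, the following is a minimal sketch of the generic disjoint LinUCB rule applied to relay selection. It is not the policy derived in the paper: the context features (hypothetical 8-level quantized channel gains), the linear reward model, and the names `LinUCBArm`, `select_relay`, and `run_toy_episode` are all assumptions made for illustration.

```python
import math
import random


class LinUCBArm:
    """Ridge-regression state for one relay (one arm) in disjoint LinUCB."""

    def __init__(self, dim):
        # A = I initially, so its inverse is also the identity.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _A_inv_times(self, v):
        return [sum(a * vi for a, vi in zip(row, v)) for row in self.A_inv]

    def ucb(self, x, alpha):
        """Estimated reward plus alpha times the confidence width."""
        theta = self._A_inv_times(self.b)
        mean = sum(t * xi for t, xi in zip(theta, x))
        Ax = self._A_inv_times(x)
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, Ax)), 0.0))
        return mean + alpha * width

    def update(self, x, reward):
        """Rank-one update A <- A + x x^T via the Sherman-Morrison formula."""
        Ax = self._A_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, Ax))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_relay(arms, contexts, alpha):
    """Return the index of the relay with the largest upper confidence bound."""
    scores = [arm.ucb(x, alpha) for arm, x in zip(arms, contexts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_toy_episode(num_relays=4, dim=3, rounds=500, alpha=1.0, seed=0):
    """Toy simulation (assumed linear rewards); returns the best-relay hit rate."""
    rng = random.Random(seed)
    true_theta = [[rng.uniform(0, 1) for _ in range(dim)]
                  for _ in range(num_relays)]
    arms = [LinUCBArm(dim) for _ in range(num_relays)]
    hits = 0
    for _ in range(rounds):
        # Hypothetical context: channel features quantized to 8 levels in [0, 1].
        contexts = [[rng.randrange(8) / 7.0 for _ in range(dim)]
                    for _ in range(num_relays)]
        expected = [sum(t * x for t, x in zip(th, c))
                    for th, c in zip(true_theta, contexts)]
        k = select_relay(arms, contexts, alpha)
        reward = expected[k] + rng.gauss(0, 0.05)  # noisy observed reward
        arms[k].update(contexts[k], reward)
        hits += expected[k] == max(expected)
    return hits / rounds
```

In this sketch, a larger `alpha` favors relays whose reward estimates are still uncertain (exploration), while `alpha = 0` reduces the rule to a purely greedy choice (exploitation). Calling `run_toy_episode()` gives the fraction of rounds in which the selected relay was the best one under the assumed reward model.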
| Original language | English |
|---|---|
| Title of host publication | 23rd IEEE International Workshop on Signal Processing Advances in Wireless Communication 2022 |
| Publisher | IEEE |
| ISBN (Electronic) | 9781665494557 |
| Publication status | Published - 28 Jul 2022 |
| Event | 23rd IEEE International Workshop on Signal Processing Advances in Wireless Communication 2022 - Oulu, Finland. Duration: 4 Jul 2022 → 6 Jul 2022 |
Conference
| Conference | 23rd IEEE International Workshop on Signal Processing Advances in Wireless Communication 2022 |
|---|---|
| Abbreviated title | SPAWC 2022 |
| Country/Territory | Finland |
| City | Oulu |
| Period | 4/07/22 → 6/07/22 |
ASJC Scopus subject areas
- Electrical and Electronic Engineering
- Computer Science Applications
- Information Systems