TY - JOUR
T1 - Function Approximation Based Reinforcement Learning for Edge Caching in Massive MIMO Networks
AU - Garg, Navneet
AU - Sellathurai, Mathini
AU - Bhatia, Vimal
AU - Ratnarajah, Tharmalingam
N1 - Funding Information:
Manuscript received June 15, 2020; revised September 12, 2020 and December 15, 2020; accepted December 15, 2020. Date of publication December 28, 2020; date of current version April 16, 2021. This work was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) under grants EP/P009549/1, EP/P009670/1; UK-India Education and Research Initiative Thematic Partnerships under grants DST-UKIERI-2016-17-0060, DST/INT/UK/P-129/2016, UGC-UKIERI 2016-17-058, and SPARC/2018-2019/P148/SL. The associate editor coordinating the review of this article and approving it for publication was M. C. Gursoy. (Corresponding author: Navneet Garg.) Navneet Garg and Tharmalingam Ratnarajah are with the School of Engineering, The University of Edinburgh, Edinburgh EH8 9YL, U.K. (e-mail: [email protected]).
Publisher Copyright:
© 2021 IEEE.
PY - 2021/4
Y1 - 2021/4
AB - Caching popular contents in advance is an important technique for achieving low latency and reduced backhaul congestion in future wireless communication systems. In this article, a multi-cell massive multiple-input multiple-output (MIMO) system is considered, where the locations of base stations are distributed as a Poisson point process. Assuming probabilistic caching, the average success probability (ASP) of the system is derived for a known content popularity (CP) profile, which in practice is time-varying and unknown in advance. Further, modeling CP variations across time as a Markov process, reinforcement Q-learning is employed to learn the content placement strategy that optimizes the long-term discounted ASP and average cache refresh rate. In standard Q-learning, the number of Q-updates is large and proportional to the number of states and actions. To reduce the space complexity and update requirements towards scalable Q-learning, two novel function-approximation-based Q-learning approaches (linear and non-linear) are proposed, in which only a constant number of variables (4 and 3, respectively) needs to be updated, irrespective of the number of states and actions. The convergence of these approximation-based approaches is analyzed. Simulations verify that both approaches converge and learn a similar best content placement, which demonstrates the applicability and scalability of the proposed approximated Q-learning schemes.
KW - Linear function approximation
KW - massive MIMO
KW - non-linear function approximation
KW - Poisson point process
KW - Q-learning
KW - wireless edge caching
UR - http://www.scopus.com/inward/record.url?scp=85099104084&partnerID=8YFLogxK
DO - 10.1109/TCOMM.2020.3047658
M3 - Article
AN - SCOPUS:85099104084
SN - 0090-6778
VL - 69
SP - 2304
EP - 2316
JO - IEEE Transactions on Communications
JF - IEEE Transactions on Communications
IS - 4
ER -