Journal of Xidian University ›› 2022, Vol. 49 ›› Issue (4): 39-48. doi: 10.19665/j.issn1001-2400.2022.04.006

• Information and Communication Engineering •

  • About the author: LI Zhongjie (1974-), male, professor, Ph.D., E-mail: lizhongjie@mail.scuec.edu.cn
  • Funding: National Natural Science Foundation of China (61379028); National Natural Science Foundation of China (61671483); Natural Science Foundation of Hubei Province (2016CFA089); Fundamental Research Funds for the Central Universities (CZY19003)

DDPG method for joint beamforming and power control in mmWave communication

LI Zhongjie 1,2, GAO Wei 1,2, XIONG Jiyuan 1,2, LI Jianghong 1,2

  1. College of Electronic and Information Engineering, South-Central University for Nationalities, Wuhan 430074, China
    2. Hubei Key Laboratory of Intelligent Wireless Communications, South-Central University for Nationalities, Wuhan 430074, China
  • Received:2021-04-16 Online:2022-08-20 Published:2022-08-15
  • Contact: Wei GAO,Jiyuan XIONG,Jianghong LI


Abstract:

Most existing beamforming algorithms rely heavily on the quality of instantaneous channel state information (CSI), which makes them unsuitable for rapidly varying practical systems, and they ignore power control, resulting in severe inter-user interference and a reduced spectral efficiency. To address these problems, a deep reinforcement learning based joint beamforming and power control algorithm is proposed, which solves the beamforming design and the power control problem jointly without requiring perfect CSI. An information exchange protocol is proposed to help the base stations learn about their environment, and a dual-model system with a centralized-training, distributed-execution structure is designed to solve the joint optimization problem. First, each base station collects local samples and uploads them to the cloud; after receiving these samples, the cloud uses deep Q-learning (DQN) to design the beamforming. Then, since deep Q-learning is not applicable to continuous variables, the deep deterministic policy gradient (DDPG) algorithm replaces it to solve the power control problem. Once training is complete, the cloud model is broadcast to all base stations, which execute it in a distributed manner to acquire new local samples. Simulation results show that in a multi-user communication environment the spectral efficiency of the proposed algorithm exceeds that of traditional beamforming algorithms and of a DQN-based joint beamforming and power control algorithm, which verifies its effectiveness.
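The hybrid action structure described above, where a DQN head picks a beam from a finite codebook while a DDPG actor outputs a continuous transmit power, can be sketched as follows. This is a minimal illustration with toy linear "networks" and assumed placeholder dimensions (`N_BEAMS`, `STATE_DIM`, `P_MAX` are not taken from the paper), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_BEAMS = 16      # size of the discrete beamforming codebook (assumed)
STATE_DIM = 8     # size of a base station's local observation (assumed)
P_MAX = 1.0       # maximum transmit power, normalized (assumed)

# Toy linear stand-ins for the two trained networks:
# the DQN head scores every codebook beam, the DDPG actor
# maps the state to a continuous power level in [0, P_MAX].
W_q = rng.normal(size=(N_BEAMS, STATE_DIM))   # DQN Q-value weights
w_actor = rng.normal(size=STATE_DIM)          # DDPG actor weights

def select_action(state, explore_noise=0.1):
    """Hybrid action: discrete beam via DQN argmax, continuous power via DDPG."""
    q_values = W_q @ state                    # Q(s, b) for every beam b
    beam = int(np.argmax(q_values))           # greedy discrete beam choice
    # Deterministic policy mu(s), squashed into [0, P_MAX] by a sigmoid,
    # plus Gaussian exploration noise as in DDPG training.
    power = P_MAX / (1.0 + np.exp(-w_actor @ state))
    power = float(np.clip(power + rng.normal(0.0, explore_noise), 0.0, P_MAX))
    return beam, power

state = rng.normal(size=STATE_DIM)            # one local observation
beam, power = select_action(state)
print(beam, power)
```

In the centralized-training, distributed-execution scheme, each base station would run only this forward pass locally, while the gradient updates of `W_q` and `w_actor` happen in the cloud on the uploaded samples.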

Key words: deep reinforcement learning, deep deterministic policy gradient, beamforming, power control

CLC number: 

  • TN928