Journal of Xidian University ›› 2022, Vol. 49 ›› Issue (4): 39-48. doi: 10.19665/j.issn1001-2400.2022.04.006

• Information and Communications Engineering •

DDPG method for joint beamforming and power control in mmWave communication

LI Zhongjie (1,2), GAO Wei (1,2), XIONG Jiyuan (1,2), LI Jianghong (1,2)

  1. College of Electronic and Information Engineering, South-Central University for Nationalities, Wuhan 430074, China
  2. Hubei Key Laboratory of Intelligent Wireless Communications, South-Central University for Nationalities, Wuhan 430074, China
  • Received: 2021-04-16 Online: 2022-08-20 Published: 2022-08-15
  • Contact: Wei GAO, Jiyuan XIONG, Jianghong LI E-mail: lizhongjie@mail.scuec.edu.cn; 17779867105@163.com; 346142726@qq.com; 1070424651@qq.com

Abstract:

Most existing beamforming algorithms rely heavily on the quality of instantaneous channel state information (CSI), an assumption that is unsuitable for practical systems, and they ignore power control, which results in serious inter-user interference and lower spectral efficiency. A deep reinforcement learning-based technique is proposed that jointly tackles beamforming design and power control without requiring perfect CSI. First, an information exchange protocol is proposed to help each base station learn about its environment, and a dual-model system with a centralized-training, distributed-execution structure is designed to solve the joint optimization problem. The base stations collect local samples and upload them to the cloud, which uses a deep Q-network (DQN) to design the beamforming. Because deep Q-learning is not applicable to continuous variables, the deep deterministic policy gradient (DDPG) algorithm is employed to solve the power control problem. Once training is completed, the cloud model is broadcast to all base stations, which execute it in a distributed manner and acquire new local samples. Simulation results show that the proposed scheme significantly outperforms traditional beamforming algorithms in spectral efficiency.
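
The split between a discrete beamforming decision and a continuous power decision can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `select_beam` and `select_power`, the epsilon-greedy and Gaussian-noise exploration, the tanh squashing, and the example Q-values are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def select_beam(q_values, epsilon, rng):
    """DQN-style choice: epsilon-greedy over a discrete beam codebook.

    q_values holds one (hypothetical) Q-value per candidate beam.
    """
    if rng.random() < epsilon:
        # Explore: pick a random beam index.
        return rng.randrange(len(q_values))
    # Exploit: pick the beam with the largest Q-value.
    return max(range(len(q_values)), key=lambda i: q_values[i])


def select_power(actor_output, p_max, noise_std, rng):
    """DDPG-style choice: a deterministic actor output mapped to [0, p_max].

    actor_output stands in for the raw, unbounded output of an actor network.
    """
    # Squash to (0, p_max) with tanh, which cannot overflow.
    power = 0.5 * p_max * (1.0 + math.tanh(actor_output))
    # Add Gaussian exploration noise, then clip to the feasible range.
    power += rng.gauss(0.0, noise_std)
    return min(max(power, 0.0), p_max)


rng = random.Random(0)
q_values = [0.1, 0.7, 0.3, 0.2]  # assumed Q-values for a 4-beam codebook
beam = select_beam(q_values, epsilon=0.0, rng=rng)  # greedy choice: index 1
power = select_power(0.0, p_max=1.0, noise_std=0.0, rng=rng)  # 0.5
print(beam, power)
```

The point of the sketch is only that an argmax over Q-values needs a finite action set, whereas a deterministic actor can output any power level in a continuous range, which is the reason the abstract gives for pairing DQN with DDPG.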

Key words: deep reinforcement learning, deep deterministic policy gradient, beamforming, power control

CLC Number: 

  • TN928