Journal of Xidian University ›› 2024, Vol. 51 ›› Issue (1): 52-59. doi: 10.19665/j.issn1001-2400.20230310

• Information and Communication Engineering •

An Improved Double Deep Q Network Algorithm for Service Function Chain Deployment

LIU Daohua, WEI Dinger, XUAN Hejun, YU Changming, KOU Libo

  1. School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, Henan, China
  • Received: 2022-11-28  Online: 2024-01-20  Published: 2023-08-30
  • About the authors: LIU Daohua (1974-), male, professor, Ph.D., E-mail: ldhzzx@163.com;
    WEI Dinger (1998-), male, M.S. student at Xinyang Normal University, E-mail: 1186871688@qq.com;
    XUAN Hejun (1988-), male, associate professor, Ph.D., E-mail: xuanhejun0896@xynu.edu.cn;
    YU Changming (1999-), female, M.S. student at Xinyang Normal University, E-mail: ycmxxn@163.com;
    KOU Libo (1998-), female, M.S. student at Xinyang Normal University, E-mail: 1659100219@qq.com.
  • Supported by:
    National Natural Science Foundation of China (61572417); Science and Technology Research Project of Henan Province (222102210265); Research-Oriented Teaching Reform Project for Undergraduate Universities of Henan Province (2022SYJXLX061); Key Scientific Research Project of Higher Education Institutions of Henan Province (22A520007); Postgraduate Education Reform and Quality Improvement Project of Henan Province (YJS2024AL104)

Improved double deep Q network algorithm for service function chain deployment

LIU Daohua, WEI Dinger, XUAN Hejun, YU Changming, KOU Libo

  1. School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
  • Received: 2022-11-28  Online: 2024-01-20  Published: 2023-08-30

Abstract:

Network function virtualization has become a key technology for future communication networks, and the efficient deployment of dynamic service function chains is one of the problems that must be solved to improve network performance. To reduce the energy consumption of communication network servers and improve the quality of service of the communication network, an improved double deep Q network based algorithm for dynamic service function chain deployment is proposed. Because both the network state and the service function chains are dynamic, the service function chain deployment problem is first modeled as a Markov decision process. The reward value is computed from the resource state of the communication network and the selected action, and the double deep Q network is trained online to obtain the optimal deep neural network model, which determines the optimal online service function chain deployment policy. To solve the problem that traditional deep reinforcement learning samples experience uniformly from the experience replay pool, which leads to low learning efficiency of the neural network, a prioritized experience replay method based on importance sampling is designed to draw experience samples, effectively avoiding the high correlation between training samples and further improving the efficiency of the offline learning of the neural network. Simulation results show that the proposed service function chain deployment algorithm based on the improved double deep Q network increases the reward value and, compared with the traditional double deep Q network algorithm, reduces the energy consumption and the blocking rate by about 19.89% to 36.99% and 9.52% to 16.37%, respectively.
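As a concrete illustration of the double deep Q network update described above, the following is a minimal sketch, not the authors' implementation: the online network selects the next deployment action and the target network evaluates it when the training target is computed, which is what distinguishes DDQN from plain DQN. The network architecture, layer sizes, and variable names are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Simple MLP mapping a network-state vector to Q-values over deployment actions."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.layers(state)

def ddqn_target(online_net: QNet, target_net: QNet,
                rewards: torch.Tensor, next_states: torch.Tensor,
                dones: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """y = r + gamma * Q_target(s', argmax_a Q_online(s', a)) for non-terminal s'."""
    with torch.no_grad():
        # Action selection with the online network ...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ... action evaluation with the target network, which reduces overestimation.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * next_q * (1.0 - dones)
```

The absolute difference between this target and the online network's current Q-value (the TD error) is also the usual priority signal for the prioritized replay buffer sketched after the English abstract below.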

Key words: service function chain, Markov decision process, network energy consumption, double deep Q network

Abstract:

Network Function Virtualization (NFV) has become a key technology for next-generation communication networks, and Virtual Network Function Service Chain (VNF-SC) mapping is a key issue in NFV. To reduce the energy consumption of communication network servers and improve the quality of service, a Service Function Chain (SFC) deployment algorithm based on an improved Double Deep Q Network (DDQN) is proposed. Because the network state changes dynamically, the service function chain deployment problem is modeled as a Markov Decision Process (MDP). Based on the network state and the reward of the selected action, the DDQN is trained online to obtain the optimal deployment strategy for the service function chain. To solve the problem that traditional deep reinforcement learning draws experience samples uniformly from the experience replay pool, which leads to low learning efficiency of the neural network, a prioritized experience replay method based on importance sampling is designed to draw experience samples, thereby avoiding high correlation between training samples and improving the learning efficiency of the neural network. Experimental results show that the proposed SFC deployment algorithm based on the improved DDQN can increase the reward value and that, compared with the traditional DDQN algorithm, it reduces the energy consumption and the blocking rate by 19.89% to 36.99% and 9.52% to 16.37%, respectively.
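The prioritized experience replay mentioned in the abstract can be sketched as a generic proportional-priority buffer with importance-sampling correction: priorities are taken proportional to the absolute TD error raised to the power alpha, and each sampled transition carries a weight (N * P(i))^(-beta) normalized by the batch maximum. This is an assumed, textbook-style sketch; the capacity, alpha, and beta values are illustrative and are not taken from the paper.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized replay with importance-sampling correction."""

    def __init__(self, capacity: int, alpha: float = 0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling (0 = uniform)
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def push(self, transition):
        # New transitions get the current maximum priority so each is replayed at least once.
        max_p = self.priorities[:len(self.buffer)].max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4):
        p = self.priorities[:len(self.buffer)] ** self.alpha
        probs = p / p.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by non-uniform sampling.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors, eps: float = 1e-6):
        # Priority is the absolute TD error plus a small constant to keep it non-zero.
        self.priorities[idx] = np.abs(td_errors) + eps
```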

Key words: service function chain, Markov decision process, network energy consumption, DDQN

CLC number: 

  • TP393