Electronic Science and Technology (电子科技) ›› 2024, Vol. 37 ›› Issue (5): 9-17. doi: 10.16180/j.cnki.issn1007-7820.2024.05.002

Solving Combined Heat and Power Dynamic Economic Emission Dispatch with a Q-Learning Differential Evolution Algorithm

FANG Shuai (方帅), CHEN Xu (陈旭), LI Kangji (李康吉)

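  1. School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013
  • Received: 2022-12-09 Online: 2024-05-15 Published: 2024-05-21
  • About the authors: FANG Shuai (b. 1998), male, master's student. Research interests: intelligent algorithms and their application to power dispatch.
    CHEN Xu (b. 1988), male, PhD, associate professor. Research interests: computational intelligence, machine learning, and their application to power dispatch.
    LI Kangji (b. 1979), male, PhD, professor. Research interests: efficient modeling, optimization, and control methods for building environment systems.
  • Funding:
    National Natural Science Foundation of China (61873114); Youth Program of the Faculty of Agricultural Equipment, Jiangsu University (NZXB20210211)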

A Q-Learning Differential Evolution Algorithm for Combined Heat and Power Dynamic Economic Emission Dispatch

FANG Shuai, CHEN Xu, LI Kangji   

  1. School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
  • Received: 2022-12-09 Online: 2024-05-15 Published: 2024-05-21
  • Supported by:
    National Natural Science Foundation of China (61873114); Youth Program of the Faculty of Agricultural Equipment, Jiangsu University (NZXB20210211)

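Abstract (translated from Chinese):

Combined heat and power dynamic economic emission dispatch considers two objectives at once, fuel cost and pollutant gas emissions, and the heat and power output in each time period is affected by the output in the preceding period; it has become an important problem in power system operation in recent years. This paper proposes a Q-learning-enhanced multi-objective differential evolution algorithm (Q Learning Multi-Objective Differential Evolution, QLMODE) to solve the Combined Heat and Power Dynamic Economic Emission Dispatch (CHPDEED) problem. In QLMODE, Q-learning adjusts the algorithm's scaling factor parameter: during iteration, the dominance relationship between offspring and parent solutions determines the reward or penalty for each action, and Q-learning adjusts the parameter value accordingly to obtain the algorithm parameters best suited to the environment model. The proposed QLMODE is applied to CHPDEED problems with 11 units and with 33 units. Simulation results show that, compared with four established multi-objective optimization algorithms, QLMODE achieves the lowest fuel cost and the lowest pollutant gas emissions, outperforms the other four algorithms on convergence and diversity metrics, and obtains a better Pareto-optimal front on both problems.

Key words (translated from Chinese): Q-learning, reinforcement learning, multi-objective algorithm, differential evolution, combined heat and power, economic emission dispatch, dynamic dispatch, power system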

Abstract:

Combined heat and power dynamic economic emission dispatch accounts for both fuel cost and pollutant gas emissions, and the heat and power output in each period is affected by the output in the preceding period. It has become an important problem in power system operation in recent years. In this study, a new Q-Learning Multi-Objective Differential Evolution (QLMODE) algorithm is proposed to solve the Combined Heat and Power Dynamic Economic Emission Dispatch (CHPDEED) problem. In QLMODE, Q-learning is used to adjust the scaling factor parameter of the algorithm: during the iterative process, the reward or penalty for an action is determined by the dominance relationship between the offspring solution and the parent solution, and Q-learning adjusts the parameter value to obtain the algorithm parameters best suited to the environment model. The proposed QLMODE is applied to CHPDEED problems with 11 units and with 33 units. The simulation results show that, compared with four established multi-objective optimization algorithms, QLMODE achieves the lowest fuel cost and the lowest pollutant gas emissions, performs better than the other four algorithms on the convergence and diversity indices, and obtains a better Pareto-optimal front on both problems.

Key words: Q-learning, reinforcement learning, multi-objective algorithm, differential evolution, combined heat and power (cogeneration), economic emission dispatch, dynamic dispatch, power system

CLC Number (Chinese Library Classification):

  • TP18
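
The abstract describes the core mechanism only in outline: Q-learning chooses the differential evolution scaling factor, and the reward or penalty comes from the dominance relationship between offspring and parent. The following is a minimal Python sketch of that idea, not the authors' implementation. Everything the abstract leaves unspecified is an assumption made here for illustration: the candidate scaling factors, the three-valued state encoding, the reward values (+1, 0, -1), the learning parameters, the DE/rand/1 mutation with binomial crossover, the rule that an offspring replaces its parent only when it dominates it, and the toy two-objective function standing in for the CHPDEED cost and emission models.

```python
import random


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def reward(child_obj, parent_obj):
    """Assumed reward scheme: +1 if the child dominates the parent,
    -1 if the parent dominates the child, 0 if neither dominates."""
    if dominates(child_obj, parent_obj):
        return 1.0
    if dominates(parent_obj, child_obj):
        return -1.0
    return 0.0


class ScaleFactorAgent:
    """Tabular Q-learning over a discrete set of scaling factors (all assumed)."""

    def __init__(self, actions=(0.3, 0.5, 0.7, 0.9), n_states=3,
                 alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng or random.Random(0)
        self.q = [[0.0] * len(actions) for _ in range(n_states)]

    def select(self, state):
        # Epsilon-greedy choice of an action index.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.actions))
        row = self.q[state]
        return row.index(max(row))

    def update(self, state, action, r, next_state):
        # Standard one-step Q-learning update.
        target = r + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def objectives(x):
    # Toy bi-objective stand-in, NOT the CHPDEED fuel-cost/emission model.
    return (sum(v * v for v in x), sum((v - 2.0) ** 2 for v in x))


# Hypothetical state encoding: outcome of the previous trial.
STATE_OF_REWARD = {1.0: 0, 0.0: 1, -1.0: 2}


def run_demo(pop_size=20, dim=3, generations=50, crossover_rate=0.9, seed=0):
    rng = random.Random(seed)
    agent = ScaleFactorAgent(rng=rng)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    state = 1
    for _ in range(generations):
        for i in range(pop_size):
            action = agent.select(state)
            f = agent.actions[action]
            # DE/rand/1 mutation with binomial crossover.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            j_rand = rng.randrange(dim)
            trial = [
                a[k] + f * (b[k] - c[k])
                if (rng.random() < crossover_rate or k == j_rand)
                else pop[i][k]
                for k in range(dim)
            ]
            r = reward(objectives(trial), objectives(pop[i]))
            next_state = STATE_OF_REWARD[r]
            agent.update(state, action, r, next_state)
            if r > 0:  # simplified replacement: keep the child only if it dominates
                pop[i] = trial
            state = next_state
    return agent.q


if __name__ == "__main__":
    for row in run_demo():
        print([round(v, 3) for v in row])
```

A full multi-objective differential evolution would also maintain a non-dominated archive or use non-dominated sorting for selection, and CHPDEED would add unit capacity, ramp-rate, and heat and power balance constraints; those parts are omitted here because the abstract does not describe them.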