Journal of Xidian University ›› 2021, Vol. 48 ›› Issue (1): 160-167. doi: 10.19665/j.issn1001-2400.2021.01.018


Optimal method for the generation of the attack path based on the Q-learning decision

LI Teng, CAO Shijie, YIN Siwei, WEI Dawei, MA Xindi, MA Jianfeng

  1. School of Network and Information Security, Xidian University, Xi’an 710071, China
  • Received: 2020-08-18 Online: 2021-02-20 Published: 2021-02-03
  • About the authors: LI Teng (born 1991), male, lecturer, Ph.D., E-mail: tengli@xidian.edu.cn | CAO Shijie (born 2000), male, undergraduate student at Xidian University, E-mail: caoshijie@sechnic.com | YIN Siwei (born 1998), female, undergraduate student at Xidian University, E-mail: 1065150375@qq.com | WEI Dawei (born 1994), male, M.S., E-mail: dawei_wei@126.com | MA Xindi (born 1989), male, lecturer, Ph.D., E-mail: xdma@xidian.edu.cn | MA Jianfeng (born 1963), male, professor, Ph.D., E-mail: jfma@mail.xidian.edu.cn
  • Supported by:
    National Natural Science Foundation of China, Young Scientists Fund (61902291); National Natural Science Foundation of China, Young Scientists Fund (61902290); China Postdoctoral Science Foundation, General Program (2019M653567); Natural Science Foundation of Shaanxi Province, General Program (2019JM-425); Natural Science Foundation of Shaanxi Province, General Program (2019JM-109)

Abstract:

This paper studies a Q-learning-based method for dynamically finding the optimal attack path, with the aim of improving the efficiency and adaptability of attack path generation. Building on the Q-learning algorithm, the method takes network connectivity into account and partitions the network, simplifying the topology by deleting unreachable paths. It simulates hacker attacks through machine learning: by combining states and actions, the simulated attacker improves its adaptability and decision-making through continual learning, so that the optimal attack path can be generated efficiently. In experiments, the simulated attacker obtains the Q-learning state-value table (the Q table) in an environment equipped with IDS alarm devices, and obtains the optimal attack path sequence from the source host to the destination host by traversing the Q table, which verifies the validity and accuracy of the model and algorithm. In addition, analyzing host reachability partition by partition in advance prunes redundant nodes, which is a considerable advantage in large network topologies.

Key words: attack graph, network security, reinforcement learning, optimization algorithm, Q-learning

CLC Number:

  • TP309
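
The pipeline summarized in the abstract (prune unreachable hosts, learn a Q table in the presence of IDS alarms, then traverse the table to read off a path) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy topology, the host names, the reward values (step cost, IDS penalty, goal bonus), the hyperparameters, and the function names `prune_unreachable`, `train_q_table`, and `best_path` are all assumptions made for illustration, and the partition-based reduction is simplified here to a single breadth-first reachability pass.

```python
import random
from collections import deque


def prune_unreachable(graph, source):
    """Keep only hosts reachable from `source` (simplified topology reduction)."""
    seen = {source}
    queue = deque([source])
    while queue:
        host = queue.popleft()
        for nxt in graph.get(host, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {host: list(graph.get(host, [])) for host in seen}


def train_q_table(graph, source, target, ids_hosts, episodes=2000,
                  alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=50, seed=0):
    """Tabular Q-learning; a state is the current host, an action is the next hop."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in graph for a in graph[s]}
    for _ in range(episodes):
        state = source
        for _ in range(max_steps):
            if state == target or not graph[state]:
                break
            actions = graph[state]
            if rng.random() < epsilon:   # explore
                action = rng.choice(actions)
            else:                        # exploit the current estimate
                action = max(actions, key=lambda a: q[(state, a)])
            # Assumed rewards: -1 per hop, -10 for tripping an IDS, +100 at the goal.
            reward = -1.0
            if action in ids_hosts:
                reward -= 10.0
            if action == target:
                reward += 100.0
            future = 0.0
            if action != target:
                future = max((q[(action, a)] for a in graph[action]), default=0.0)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = action
    return q


def best_path(q, graph, source, target):
    """Greedy traversal of the Q table from source towards target."""
    path, state = [source], source
    while state != target and graph[state]:
        state = max(graph[state], key=lambda a: q[(state, a)])
        if state in path:  # guard against cycles in an undertrained table
            break
        path.append(state)
    return path


# Hypothetical topology: "printer" cannot be reached from the attacker.
TOPOLOGY = {
    "attacker": ["web", "mail"],
    "web": ["app"],
    "mail": ["app"],
    "app": ["db"],
    "db": [],
    "printer": ["db"],
}
IDS_HOSTS = {"web"}  # hosts assumed to be monitored by an IDS

pruned = prune_unreachable(TOPOLOGY, "attacker")
q_table = train_q_table(pruned, "attacker", "db", IDS_HOSTS)
print(best_path(q_table, pruned, "attacker", "db"))
```

With these assumed rewards, the learned path goes through `mail` rather than the IDS-monitored `web` host, and `printer` never enters the Q table because it is pruned before training. Pruning first keeps the table small, which is the benefit the abstract claims for large topologies.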