Journal of Xidian University ›› 2020, Vol. 47 ›› Issue (1): 37-43. doi: 10.19665/j.issn1001-2400.2020.01.006


Improved algorithm for detection of the malicious domain name based on the convolutional neural network

YANG Luhui1, LIU Guangjie1,2, ZHAI Jiangtao2, LIU Weiwei1, BAI Huiwen1, DAI Yuewei1,2

  1. School of Automation, Nanjing University of Science & Technology, Nanjing 210094, China
  2. School of Electronic & Information Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China
  • Received: 2019-09-06 Online: 2020-02-20 Published: 2020-03-19
  • About the author: YANG Luhui (born 1992), male, Ph.D. candidate at Nanjing University of Science & Technology, E-mail: yangluhui005@foxmail.com
  • Funding:
    National Natural Science Foundation of China (U1836104, 61602247, 61702235, U1636117); Natural Science Foundation of Jiangsu Province (BK20160840)

Abstract:

Existing detection methods are inefficient at detecting algorithmically generated malicious domain names, and their detection rates are especially low for several hard-to-detect types. To address this, an improved malicious domain name detection algorithm based on the convolutional neural network is proposed. Building on an existing convolutional neural network model, the algorithm adds convolutional branches that extract deeper character-level features, so that shallow and deep character-level features of malicious domain names are extracted simultaneously and fused. A focal loss function is introduced to address the low detection rate caused by the dual imbalance in sample difficulty and sample quantity, which improves the detection accuracy on hard samples. The improved algorithm achieves an average detection accuracy of 97.62% over 20 types of malicious domain names, 0.94% higher than the original algorithm, and the detection accuracy for four hard-to-detect domain name types increases by 3.71%, 4.6%, 11.18% and 17.8%, respectively. Experimental results show that the improved algorithm raises the detection accuracy for malicious domain names and, in particular, significantly improves it for some hard-to-detect types.

Key words: convolutional neural network, domain generation algorithms, deep learning, information security

CLC number:

  • TP309
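
The abstract names a focal loss but gives neither its formula nor its hyperparameters. As a rough illustration only, the sketch below uses the commonly cited binary form FL(p_t) = -α_t (1 - p_t)^γ log(p_t). The function names and the values γ = 2 and α = 0.25 are assumptions made for this example, not settings taken from the paper.

```python
import math


def cross_entropy(p, y):
    """Plain binary cross-entropy for one sample, used here as a baseline."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample.

    p     -- predicted probability of the positive (malicious) class, 0 < p < 1
    y     -- true label: 1 for malicious, 0 for benign
    gamma -- focusing parameter (assumed value, not from the paper)
    alpha -- positive-class weight (assumed value, not from the paper)
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # weight for class-count imbalance
    # (1 - p_t) ** gamma shrinks the loss of well-classified (easy) samples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy sample (p_t = 0.95) versus a hard one (p_t = 0.30), both malicious.
for name, p in (("easy", 0.95), ("hard", 0.30)):
    print(name, cross_entropy(p, 1), focal_loss(p, 1))
```

In this form the factor (1 - p_t)^γ targets the imbalance in sample difficulty and α_t targets the imbalance in sample quantity, which matches the two imbalances the abstract mentions. Setting γ = 0 reduces the expression to a class-weighted cross-entropy.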