Journal of Xidian University, 2023, Vol. 50, Issue (4): 139-147. doi: 10.19665/j.issn1001-2400.2023.04.014

• Special Column on Cyberspace Security •

Multi-view encrypted malicious traffic detection method combined with co-training

HUO Yuehua1,2, WU Wenhao1, ZHAO Faqi1, WANG Qiang3,4

  1. School of Mechanical Electronic & Information Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China
  2. Network and Information Center, China University of Mining and Technology-Beijing, Beijing 100083, China
  3. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100084, China
  4. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2023-01-15 Online: 2023-08-20 Published: 2023-10-17
  • About the authors: HUO Yuehua (born 1981), male, senior engineer, E-mail: huoyh@cumtb.edu.cn; WU Wenhao (born 1997), male, master's student at China University of Mining and Technology-Beijing, E-mail: ZQT2100407188@student.cumtb.edu.cn; ZHAO Faqi (born 1997), male, master's student at China University of Mining and Technology-Beijing, E-mail: ZQT2000407167@student.cumtb.edu.cn; WANG Qiang (born 1997), male, doctoral student at the Institute of Information Engineering, Chinese Academy of Sciences, E-mail: owangqiang@qq.com
  • Funding:
    Supported by the Fund of the Key Laboratory of Information System Security Technology (CNKLSTISS-6142111190501)

Abstract:

Machine learning-based methods for detecting malicious traffic encrypted with the Transport Layer Security (TLS) protocol depend heavily on labeled samples. To address this problem, a semi-supervised detection method for TLS-encrypted malicious traffic is proposed. With only a small number of labeled samples, a co-training strategy is used for the first time to combine two views of the encrypted traffic; unlabeled samples are introduced into training to expand the sample set and thereby reduce the dependence on labeled samples. First, flow metadata features and certificate features, which are strongly independent of each other, are extracted from the encrypted traffic features and used to construct the two views for co-training. Second, an XGBoost classifier and a random forest classifier are built for the two views, respectively. Finally, the two classifiers are combined through the co-training strategy into a multi-view co-training detection model, which is trained on a small set of labeled samples and a large number of unlabeled samples. On a public dataset, the model achieves an accuracy of 99.17%, a recall of 98.54%, and a false positive rate below 0.18%. Experimental results show that the proposed method effectively reduces the dependence on labeled samples when only a small number of labeled samples is available.

Key words: co-training, transport layer security, multi-view, feature selection, semi-supervised learning
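
The co-training loop summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a toy nearest-centroid learner stands in for both XGBoost and random forest so that the sketch runs with the Python standard library alone, the two "views" are synthetic Gaussian features rather than real flow metadata and certificate features, and the names `CentroidClassifier`, `co_train`, `predict`, `make_demo_data`, the `rounds` and `per_round` parameters, and the rule of trusting the more confident view at prediction time are all hypothetical.

```python
import random
from statistics import mean


class CentroidClassifier:
    """Toy nearest-centroid learner (a stand-in, NOT XGBoost or random forest)."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict_with_confidence(self, x):
        # Euclidean distance from x to each class centroid.
        dist = {
            label: sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
            for label, c in self.centroids.items()
        }
        best = min(dist, key=dist.get)
        total = sum(dist.values())
        # Confidence grows as x gets closer to the winning centroid
        # relative to the other centroids.
        conf = 1.0 - dist[best] / total if total > 0 else 1.0
        return best, conf


def co_train(view_a, view_b, labels, rounds=5, per_round=4):
    """labels[i] is 0/1 for labeled samples and None for unlabeled ones."""
    labels = list(labels)
    clf_a, clf_b = CentroidClassifier(), CentroidClassifier()

    def refit():
        idx = [i for i, t in enumerate(labels) if t is not None]
        y = [labels[i] for i in idx]
        clf_a.fit([view_a[i] for i in idx], y)
        clf_b.fit([view_b[i] for i in idx], y)

    for _ in range(rounds):
        refit()
        # Each view's classifier pseudo-labels the unlabeled samples it is
        # most confident about; both classifiers then retrain on the union.
        for clf, view in ((clf_a, view_a), (clf_b, view_b)):
            pool = [i for i, t in enumerate(labels) if t is None]
            scored = sorted(
                ((clf.predict_with_confidence(view[i]), i) for i in pool),
                key=lambda item: -item[0][1],
            )
            for (pred, _conf), i in scored[:per_round]:
                labels[i] = pred
    refit()
    return clf_a, clf_b, labels


def predict(clf_a, clf_b, xa, xb):
    # Assumed combination rule: trust whichever view is more confident.
    pa, ca = clf_a.predict_with_confidence(xa)
    pb, cb = clf_b.predict_with_confidence(xb)
    return pa if ca >= cb else pb


def make_demo_data(n=200, n_labeled=10, seed=0):
    # Synthetic stand-ins for the two views (0 = benign, 1 = malicious).
    rng = random.Random(seed)
    view_a, view_b, truth = [], [], []
    for i in range(n):
        t = i % 2
        view_a.append([rng.gauss(4.0 * t, 1.0) for _ in range(2)])
        view_b.append([rng.gauss(4.0 * t, 1.0) for _ in range(2)])
        truth.append(t)
    labels = [t if i < n_labeled else None for i, t in enumerate(truth)]
    return view_a, view_b, truth, labels


def demo_accuracy():
    view_a, view_b, truth, labels = make_demo_data()
    clf_a, clf_b, _ = co_train(view_a, view_b, labels)
    hits = sum(
        predict(clf_a, clf_b, xa, xb) == t
        for xa, xb, t in zip(view_a, view_b, truth)
    )
    return hits / len(truth)


print(f"accuracy on synthetic data: {demo_accuracy():.3f}")
```

In a setup closer to the one described, the two stand-in learners would presumably be replaced by an XGBoost classifier on one view and a random forest classifier on the other. The source does not state how many pseudo-labeled samples are added per round or how the two outputs are fused, so those choices here are placeholders.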

CLC Number:

  • TP393