Multi-view encryption malicious traffic detection method combined with co-training

doi:10.19665/j.issn1001-2400.2023.04.014

Abstract

Machine learning-based methods for detecting malicious traffic encrypted with the Transport Layer Security (TLS) protocol depend heavily on labeled samples. To address this problem, a semi-supervised learning-based method for detecting TLS-encrypted malicious traffic is proposed. With only a small number of labeled samples, the co-training strategy is used for the first time to combine two views of the encrypted traffic, and unlabeled samples are introduced into training to expand the sample set, thereby reducing the dependence on labeled samples. First, flow metadata features and certificate features, which are strongly independent of each other, are extracted from the encrypted traffic features to construct the two views used for co-training. Second, an XGBoost classifier and a random forest classifier are built for the two views, respectively. Finally, the co-training strategy trains the two classifiers collaboratively on a small number of labeled samples and a large number of unlabeled samples, forming a multi-view co-training classifier (MCC) detection model. On a public dataset, the model achieves an accuracy of 99.17%, a recall of 98.54%, and a false positive rate below 0.18%. Experimental results show that the proposed method effectively reduces the dependence on labeled samples when only a few labeled samples are available.
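The abstract describes the co-training procedure only at a high level. The following is a minimal, generic sketch of two-view co-training under stated assumptions, not the authors' implementation: to stay dependency-free it substitutes a toy nearest-centroid classifier (`NearestCentroid`, hypothetical) for XGBoost and random forest, lets each view's classifier add its `per_round` most confident pseudo-labels to a shared labeled pool in every round, and combines the views at prediction time by trusting the more confident classifier. The function names, the parameters `rounds` and `per_round`, and the combination rule are all assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class NearestCentroid:
    """Hypothetical stand-in classifier; the paper uses XGBoost (view 1) and RF (view 2)."""

    def fit(self, X: List[Vector], y: List[int]) -> "NearestCentroid":
        self.centroids: Dict[int, List[float]] = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_with_confidence(self, x: Vector) -> Tuple[int, float]:
        # Softmax over negative centroid distances, shifted by the minimum to avoid underflow.
        dists = {c: math.dist(x, m) for c, m in self.centroids.items()}
        best = min(dists, key=dists.get)
        weights = {c: math.exp(dists[best] - d) for c, d in dists.items()}
        return best, weights[best] / sum(weights.values())


def co_train(l1, l2, y_l, u1, u2, rounds=10, per_round=2):
    """Generic two-view co-training (Blum-Mitchell style) with a shared labeled pool.

    l1/l2: labeled samples in view 1/view 2; u1/u2: the same unlabeled samples in each view.
    """
    X1, X2, y = list(l1), list(l2), list(y_l)
    pool = list(range(len(u1)))  # indices of samples that are still unlabeled
    clf1, clf2 = NearestCentroid(), NearestCentroid()
    for _ in range(rounds):
        if not pool:
            break
        clf1.fit(X1, y)
        clf2.fit(X2, y)
        chosen: Dict[int, int] = {}
        for clf, view in ((clf1, u1), (clf2, u2)):
            # Each view's classifier pseudo-labels its most confident unlabeled samples.
            ranked = sorted(pool, key=lambda i: clf.predict_with_confidence(view[i])[1], reverse=True)
            for i in ranked[:per_round]:
                # If both views pick the same sample, the first (view 1) label is kept.
                chosen.setdefault(i, clf.predict_with_confidence(view[i])[0])
        for i, label in chosen.items():
            X1.append(u1[i])
            X2.append(u2[i])
            y.append(label)
            pool.remove(i)
    # Retrain once more so the final models see every pseudo-labeled sample.
    return clf1.fit(X1, y), clf2.fit(X2, y)


def predict(clf1, clf2, x1, x2) -> int:
    """Combine the two views by trusting whichever classifier is more confident."""
    label1, conf1 = clf1.predict_with_confidence(x1)
    label2, conf2 = clf2.predict_with_confidence(x2)
    return label1 if conf1 >= conf2 else label2


# Toy usage: one labeled sample per class plus four unlabeled samples, seen in both views.
labeled_v1, labeled_v2, labels = [[0.0, 0.0], [5.0, 5.0]], [[0.0], [5.0]], [0, 1]
unlabeled_v1 = [[0.5, 0.2], [4.8, 5.1], [0.1, 0.4], [5.2, 4.9]]
unlabeled_v2 = [[0.3], [4.9], [0.2], [5.1]]
clf1, clf2 = co_train(labeled_v1, labeled_v2, labels, unlabeled_v1, unlabeled_v2, rounds=3, per_round=1)
print(predict(clf1, clf2, [0.2, 0.1], [0.1]))  # 0
```

With real features, one could swap in XGBoost and random forest models that expose class probabilities; the confidence-ranked selection step would stay the same.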

Key words: co-training, transport layer security, multi-view, feature selection, semi-supervised learning

CLC Number: TP393

HUO Yuehua, WU Wenhao, ZHAO Faqi, WANG Qiang. Multi-view encryption malicious traffic detection method combined with co-training[J]. Journal of Xidian University, 2023, 50(4): 139-147.

Figures/Tables 10

References 26

Category           Type       Encrypted traffic   Total
Malicious traffic  CCleaner   2,679               19,740
                   Dridex     4,235
                   Emotet     1,536
                   Rzay       1,887
                   TrickBot   5,849
                   Zeus       3,554
Benign traffic     Benign     19,345              19,345
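As a quick consistency check on the table above, the per-family counts can be summed. The short sketch below (plain Python; the variable names are illustrative) reproduces the 19,740 malicious total and shows that the malicious and benign classes are close to balanced.

```python
# Per-family counts of encrypted malicious traffic, copied from the table above.
malicious = {
    "CCleaner": 2679,
    "Dridex": 4235,
    "Emotet": 1536,
    "Rzay": 1887,
    "TrickBot": 5849,
    "Zeus": 3554,
}
benign = 19345

malicious_total = sum(malicious.values())
share = malicious_total / (malicious_total + benign)  # fraction of samples that are malicious
print(malicious_total, f"{100 * share:.1f}%")  # 19740 50.5%
```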

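Feature                                                                           Importance
Minimum inter-arrival time between two consecutive packets in the forward flow   0.7854
Source port                                                                       0.0566
Maximum header length of backward packets (bytes)                                 0.0555
Window size of the last packet in the forward flow                                0.0261
Window size of the first packet in the backward flow                              0.0145
Minimum inter-arrival time between two consecutive packets                        0.0120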
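The excerpt does not say how these importance values drive feature selection. One common rule, shown here purely as an illustration and not as the authors' procedure, keeps the top-ranked features until their cumulative importance reaches a threshold. The 0.9 threshold and the short feature identifiers are hypothetical.

```python
# Importance values copied from the table above; the short identifiers are hypothetical.
importances = {
    "fwd_iat_min": 0.7854,
    "src_port": 0.0566,
    "bwd_header_len_max": 0.0555,
    "fwd_last_pkt_win": 0.0261,
    "bwd_first_pkt_win": 0.0145,
    "flow_iat_min": 0.0120,
}


def select_by_cumulative_importance(scores, threshold):
    """Keep the highest-ranked features until their summed importance reaches `threshold`."""
    selected, total = [], 0.0
    for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if total >= threshold:
            break
        selected.append(name)
        total += score
    return selected


print(select_by_cumulative_importance(importances, 0.9))
# ['fwd_iat_min', 'src_port', 'bwd_header_len_max', 'fwd_last_pkt_win']
```

With these values the first feature alone accounts for 0.7854, so a threshold of 0.9 keeps only four of the six listed features.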

Model                        A_cc/%   R_ec/%   F_PR/%
View 1 / XGBoost             99.90    99.86    0.07
View 2 / RF                  98.70    99.28    1.90
MCC (100 labeled samples)    99.17    98.54    0.04
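The table reports accuracy (A_cc), recall (R_ec), and false positive rate (F_PR), all in percent. Their formulas are not spelled out in this excerpt; the sketch below uses the standard confusion-matrix definitions, assuming malicious traffic is the positive class, and the counts in the example are hypothetical.

```python
def detection_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard definitions in percent, with malicious traffic as the positive class."""
    return {
        "A_cc": 100 * (tp + tn) / (tp + tn + fp + fn),  # accuracy
        "R_ec": 100 * tp / (tp + fn),                    # recall (detection rate)
        "F_PR": 100 * fp / (fp + tn),                    # false positive rate
    }


# Hypothetical confusion-matrix counts, for illustration only.
print(detection_metrics(tp=98, fn=2, fp=1, tn=99))
# {'A_cc': 98.5, 'R_ec': 98.0, 'F_PR': 1.0}
```

A low F_PR means few benign flows are flagged as malicious, which is why it is reported alongside recall rather than accuracy alone.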

Labeled samples   A_cc/%        R_ec/%        F_PR/%
20                96.88±3.40    95.37±6.90    1.57±1.62
30                97.31±2.88    95.82±6.18    1.17±1.21
40                98.76±0.52    98.08±1.25    0.55±0.63
50                98.84±0.45    98.09±1.04    0.39±0.65
60                99.01±0.55    98.69±1.07    0.66±0.86
70                98.86±0.54    98.85±1.21    0.36±0.48
80                98.98±0.44    98.34±0.87    0.36±0.52
90                99.14±0.70    98.46±1.39    0.16±0.28
100               99.17±0.36    98.54±0.62    0.04±0.37
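Each entry above is reported as mean±standard deviation, presumably over repeated runs at each labeling budget; the number of runs and whether the sample or population standard deviation was used are not stated in this excerpt. The sketch below assumes the sample standard deviation (`statistics.stdev`) and uses made-up accuracy values.

```python
import statistics


def mean_std(values, digits=2):
    """Format repeated-run results as mean±standard deviation (sample std is an assumption)."""
    m = statistics.mean(values)
    s = statistics.stdev(values) if len(values) > 1 else 0.0
    return f"{m:.{digits}f}±{s:.{digits}f}"


# Hypothetical accuracies (%) from three runs with the same number of labeled samples.
print(mean_std([99.0, 99.5, 98.8]))  # 99.10±0.36
```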
