Journal of Xidian University, 2023, Vol. 50, Issue (4): 139-147. doi: 10.19665/j.issn1001-2400.2023.04.014

• Special Issue on Cyberspace Security •

Multi-view encryption malicious traffic detection method combined with co-training

HUO Yuehua1,2, WU Wenhao1, ZHAO Faqi1, WANG Qiang3,4

  1. School of Mechanical Electronic & Information Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China
  2. School of Network and Information Center, China University of Mining and Technology-Beijing, Beijing 100083, China
  3. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100084, China
  4. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2023-01-15 Online: 2023-08-20 Published: 2023-10-17


To address the heavy dependence on labeled samples in machine learning-based methods for detecting malicious traffic encrypted with the Transport Layer Security (TLS) protocol, a semi-supervised detection method for TLS-encrypted malicious traffic is proposed. With only a small number of labeled samples, the co-training strategy is used for the first time to combine two views of the encrypted traffic, and unlabeled samples are introduced during training to expand the sample set and thereby reduce the dependence on labeled samples. First, strongly independent flow metadata features and certificate features are extracted from the encrypted traffic features, and each feature set forms one view for co-training. Second, an XGBoost classifier and a random forest classifier are constructed for the two views, respectively. Finally, the two classifiers are trained jointly through the co-training strategy to form a multi-view co-training detection model, using a small number of labeled samples and a large number of unlabeled samples. The model achieves an accuracy of 99.17%, a recall of 98.54%, and a false positive rate below 0.18% on a public dataset. The experimental results show that the proposed method effectively reduces the dependence on labeled samples when only a small number of labeled samples is available.
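
The co-training procedure summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. To keep it dependency-free, a simple nearest-centroid classifier stands in for both the XGBoost and random forest classifiers, and the two views are synthetic two-dimensional features rather than real flow metadata and certificate features. The names `CentroidClassifier`, `co_train`, `make_flows` and `predict`, the parameters `rounds`, `per_round` and `threshold`, and the rule for combining the two views at prediction time are all assumptions made for illustration.

```python
import random


class CentroidClassifier:
    """Stand-in per-view classifier (the paper uses XGBoost / random forest)."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_with_confidence(self, x):
        # Euclidean distance to each class centroid; closer means more confident.
        dist = {
            label: sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
            for label, c in self.centroids.items()
        }
        label = min(dist, key=dist.get)
        total = sum(dist.values())
        conf = 1.0 - dist[label] / total if total > 0 else 1.0
        return label, conf


def co_train(lab_a, lab_b, labels, unl_a, unl_b,
             rounds=5, per_round=2, threshold=0.6):
    """Co-training loop: each view's classifier pseudo-labels its most
    confident unlabeled samples, which then join the shared labeled set."""
    La, Lb, y = list(lab_a), list(lab_b), list(labels)
    pool = list(range(len(unl_a)))  # indices of still-unlabeled samples
    clf_a, clf_b = CentroidClassifier(), CentroidClassifier()
    for _ in range(rounds):
        clf_a.fit(La, y)
        clf_b.fit(Lb, y)
        picked = {}
        for clf, unl in ((clf_a, unl_a), (clf_b, unl_b)):
            scored = sorted(
                ((clf.predict_with_confidence(unl[i]), i) for i in pool),
                key=lambda s: s[0][1], reverse=True,
            )
            for (label, conf), i in scored[:per_round]:
                if conf >= threshold:
                    picked.setdefault(i, label)  # first classifier to pick wins
        if not picked:
            break
        for i, label in picked.items():
            La.append(unl_a[i])
            Lb.append(unl_b[i])
            y.append(label)
        pool = [i for i in pool if i not in picked]
    clf_a.fit(La, y)
    clf_b.fit(Lb, y)
    return clf_a, clf_b, len(y)


def predict(clf_a, clf_b, xa, xb):
    """Assumed combination rule: trust whichever view is more confident."""
    la, ca = clf_a.predict_with_confidence(xa)
    lb, cb = clf_b.predict_with_confidence(xb)
    return la if ca >= cb else lb


def make_flows(rng, n, label):
    """Synthetic two-view samples: class 0 (benign) near 0, class 1 (malicious) near 5."""
    center = 5.0 * label
    view_a = [[rng.gauss(center, 0.5), rng.gauss(center, 0.5)] for _ in range(n)]
    view_b = [[rng.gauss(center, 0.5), rng.gauss(center, 0.5)] for _ in range(n)]
    return view_a, view_b


rng = random.Random(0)
lab_a0, lab_b0 = make_flows(rng, 2, 0)   # only 2 labeled samples per class
lab_a1, lab_b1 = make_flows(rng, 2, 1)
unl_a0, unl_b0 = make_flows(rng, 20, 0)  # larger unlabeled pool
unl_a1, unl_b1 = make_flows(rng, 20, 1)
clf_a, clf_b, n_labeled = co_train(
    lab_a0 + lab_a1, lab_b0 + lab_b1, [0, 0, 1, 1],
    unl_a0 + unl_a1, unl_b0 + unl_b1,
)
print("labeled set size after co-training:", n_labeled)
print("prediction for a malicious-like flow:",
      predict(clf_a, clf_b, [5.0, 5.0], [5.0, 5.0]))
```

In this sketch the labeled set starts with four samples and grows each round as confident pseudo-labels are added, which mirrors the idea of expanding the sample set with unlabeled traffic. Replacing the stand-in classifier with real XGBoost and random forest models would only require objects that expose the same `fit` and `predict_with_confidence` interface assumed here.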

Key words: co-training, transport layer security, multi-view, feature selection, semi-supervised learning

CLC Number: TP393