Journal of Xidian University ›› 2021, Vol. 48 ›› Issue (4): 176-183. doi: 10.19665/j.issn1001-2400.2021.04.023

• Computer Science and Technology & Cyberspace Security •

Unbalanced data weighted boundary point integration undersampling method

HE Yunbin, LENG Xin, WAN Jing

  1. School of Computer Science and Technology, Harbin University, Harbin 150000, China
  • Received: 2020-05-11 Online: 2021-08-30 Published: 2021-08-31

Abstract:

To avoid deleting boundary points outright when undersampling unbalanced data, and to preserve the information carried by the majority class, a clustering-based weighted boundary point integration undersampling algorithm is proposed. First, the algorithm takes the size of the minority class set as the initial number of cluster centers and clusters the majority class set. Next, the coefficient of variation is introduced to identify boundary points, and the identified boundary points are weighted so that they take part in the processing of the unbalanced data instead of being discarded. The cluster density is then used to divide the majority class clusters into high-density and low-density clusters; the low-density clusters are deleted, which yields a reduced majority class sample set. Finally, the reduced majority class samples are combined with the minority class samples to form a balanced data set, which is used to train AdaBoost to obtain the final classification model. Because the data set is reduced, execution efficiency improves. The results show that the proposed method handles unbalanced data effectively and improves both the execution efficiency and the accuracy of undersampling.
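The undersampling steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the coefficient of variation is computed, how boundary points are weighted, or how cluster density is defined. Here the coefficient of variation is assumed to be taken over each point's nearest-neighbour distances, and the names `cv_threshold`, `boundary_weight`, and `smoothing`, as well as the median density cutoff, are hypothetical choices made for the example.

```python
import math
import random
import statistics


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of coordinate tuples; returns (labels, centers)."""
    centers = random.Random(seed).sample(points, k)

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centers[c]))

    for _ in range(iters):
        labels = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return [nearest(p) for p in points], centers


def variation_coefficient(i, points, n_neighbors=3):
    """Std/mean of the distances from points[i] to its nearest neighbours.

    Assumption: the abstract does not define the statistic this precisely.
    """
    d = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    d = d[:n_neighbors]
    mean = statistics.fmean(d)
    return statistics.pstdev(d) / mean if mean > 0 else 0.0


def undersample(majority, n_minority, cv_threshold=0.3,
                boundary_weight=2.0, smoothing=1.0):
    """Return (reduced_majority, weights) after dropping low-density clusters."""
    # Step 1: cluster the majority class with k = size of the minority class.
    labels, centers = kmeans(majority, n_minority)

    # Step 2: points with a high coefficient of variation are treated as
    # boundary points and up-weighted instead of being deleted (hypothetical rule).
    weights = [boundary_weight
               if variation_coefficient(i, majority) > cv_threshold else 1.0
               for i in range(len(majority))]

    # Step 3: weighted member count over smoothed mean radius stands in for
    # "cluster density"; clusters below the median density are deleted.
    density = {}
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        spread = statistics.fmean(math.dist(majority[i], centers[c]) for i in idx)
        density[c] = sum(weights[i] for i in idx) / (spread + smoothing)
    cutoff = statistics.median(density.values())

    keep = [i for i, lab in enumerate(labels) if density[lab] >= cutoff]
    return [majority[i] for i in keep], [weights[i] for i in keep]
```

The reduced majority samples and their weights would then be combined with the minority samples and passed to a boosting classifier. For example, scikit-learn's `AdaBoostClassifier.fit` accepts a `sample_weight` argument, which is one plausible way to carry the boundary point weights into training.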

Key words: sampling, clustering, unbalanced data, weighted boundary point

CLC Number: TP311.13