Journal of Xidian University, 2021, Vol. 48, Issue 4: 176-183. doi: 10.19665/j.issn1001-2400.2021.04.023
• Computer Science and Technology & Cyberspace Security •
HE Yunbin, LENG Xin, WAN Jing
Received: 2020-05-11
Online: 2021-08-30
Published: 2021-08-31
HE Yunbin, LENG Xin, WAN Jing. Unbalanced data weighted boundary point integration undersampling method[J]. Journal of Xidian University, 2021, 48(4): 176-183.

Dataset | C4.5 | SMOTE-Boost | PC-Boost | Ada-Boost | CWBUSC |
---|---|---|---|---|---|
Glass | 0.785 | 0.846 | 0.978 | 0.813 | 0.981 |
Vehicle | 0.879 | 0.617 | 0.862 | 0.554 | 0.901 |
Satimage | 0.564 | 0.678 | 0.657 | 0.647 | 0.784 |
Vowel | 0.937 | 0.955 | 0.975 | 0.971 | 0.991 |
Pima | 0.600 | 0.648 | 0.519 | 0.681 | 0.702 |
Haberman | 0.323 | 0.391 | 0.407 | 0.369 | 0.423 |
Letter | 0.947 | 0.986 | 0.969 | 0.691 | 0.951 |

Dataset | C4.5 | SMOTE-Boost | PC-Boost | Ada-Boost | CWBUSC |
---|---|---|---|---|---|
Glass | 0.859 | 0.911 | 0.949 | 0.894 | 0.950 |
Vehicle | 0.925 | 0.953 | 0.962 | 0.955 | 0.977 |
Satimage | 0.727 | 0.760 | 0.822 | 0.770 | 0.853 |
Vowel | 0.958 | 0.987 | 0.921 | 0.976 | 0.989 |
Pima | 0.661 | 0.724 | 0.767 | 0.703 | 0.789 |
Haberman | 0.486 | 0.556 | 0.674 | 0.523 | 0.680 |
Letter | 0.937 | 0.975 | 0.958 | 0.848 | 0.948 |