Journal of Xidian University, 2021, Vol. 48, Issue (4): 136-143. doi: 10.19665/j.issn1001-2400.2021.04.018

• Computer Science and Technology & Cyberspace Security •

DB-SMOTE and multi-layer stacking used for Arrhythmia recognition

WANG Bo, DENG Ke

  1. Ministry of Education Key Laboratory for Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an 710049, Shaanxi, China
  • Received: 2020-03-15 Online: 2021-08-30 Published: 2021-08-31
  • Corresponding author: DENG Ke
  • About the author: WANG Bo (born 1994), male, master's student at Xi’an Jiaotong University, E-mail: wangbo19941226@stu.xjtu.edu.cn
  • Funding:
    National Natural Science Foundation of China (61671364); National Natural Science Foundation of China (61671366); National Natural Science Foundation of China (61701390)

Abstract:

To improve electrocardiogram (ECG) recognition, and in particular the recognition accuracy for minority-class conditions, this paper proposes an arrhythmia recognition method based on a cluster-interpolation oversampling algorithm (DB-SMOTE) and a multi-layer stacking model. Because the classical synthetic minority oversampling technique (SMOTE) ignores the imbalance within the minority class itself, DB-SMOTE is proposed to generate boundary samples for the minority class. The algorithm uses density-based spatial clustering of applications with noise (DBSCAN) to divide the minority-class data into multiple clusters and filter out noise samples, then generates new samples mainly from the boundary data of each cluster; t-distributed stochastic neighbor embedding (t-SNE) is used to visualize and analyze the generated samples. Because a single classifier cannot meet the performance requirements, a multi-layer stacking model combines several different classifiers for recognition. The stacking model has two layers: in the first layer, the base models K-nearest neighbors (KNN), extreme gradient boosting (XGBoost) and gradient boosting decision trees (GBDT) map the feature F to F'; in the second layer, a logistic regression model classifies the feature F'. Tested on the MIT-BIH dataset, the method reaches a classification accuracy of 99.66% and considerably improves the recognition of minority-class samples, so it can be used effectively for arrhythmia recognition.
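
The oversampling steps named in the abstract (cluster the minority class with DBSCAN, discard noise, interpolate new samples from cluster boundary points) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation. The function names `dbscan` and `db_smote`, the parameters `eps`, `min_pts` and `k`, the reading of DBSCAN border points as the "boundary data", the uniform choice among clusters, and the toy data are all assumptions made for illustration.

```python
import math
import random

NOISE = -1  # label used for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns (labels, is_core); label -1 marks noise."""
    n = len(points)
    # Neighborhood of each point, including the point itself.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [NOISE] * n
    cluster = 0
    for i in range(n):
        if labels[i] != NOISE or not is_core[i]:
            continue  # only an unassigned core point can start a cluster
        labels[i] = cluster
        frontier = [i]
        while frontier:
            p = frontier.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster
                    frontier.append(q)
        cluster += 1
    return labels, is_core


def db_smote(minority, n_new, eps, min_pts, k=3, seed=0):
    """Generate n_new synthetic minority samples (assumed reading of DB-SMOTE)."""
    rng = random.Random(seed)
    labels, is_core = dbscan(minority, eps, min_pts)
    clusters = {}
    for idx, lab in enumerate(labels):
        if lab != NOISE:  # noise samples are filtered out here
            clusters.setdefault(lab, []).append(idx)
    pools = []
    for members in clusters.values():
        if len(members) < 2:
            continue  # interpolation needs two distinct points
        border = [i for i in members if not is_core[i]]
        # Assumption: fall back to all members when a cluster has no border points.
        pools.append((border or members, members))
    if not pools:
        return []
    synthetic = []
    for _ in range(n_new):
        # Assumption: clusters are drawn uniformly, regardless of their size.
        seeds, members = rng.choice(pools)
        a = rng.choice(seeds)
        nearest = sorted((m for m in members if m != a),
                         key=lambda m: math.dist(minority[a], minority[m]))[:k]
        b = rng.choice(nearest)
        gap = rng.random()  # SMOTE-style interpolation factor in [0, 1)
        synthetic.append(tuple(x + gap * (y - x)
                               for x, y in zip(minority[a], minority[b])))
    return synthetic


# Toy minority class: a cluster with one border point, a compact cluster, one outlier.
minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.25, 0.05),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
            (20.0, 20.0)]
new_points = db_smote(minority, n_new=6, eps=0.2, min_pts=4)
```

Drawing clusters with equal probability is one possible way to compensate for intra-class imbalance; the abstract does not say how the paper allocates samples among clusters.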

Key words: electrocardiogram, DB-SMOTE, t-SNE, stacking

CLC number:

  • TP181