Journal of Xidian University ›› 2019, Vol. 46 ›› Issue (5): 155-161. doi: 10.19665/j.issn1001-2400.2019.05.022

Algorithm for selection of features based on dynamic weights using redundancy

XIAO Lijun, GUO Jichang, GU Xiangyuan

  1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
  • Received: 2019-06-07 Online: 2019-10-20 Published: 2019-10-30
  • Contact: GUO Jichang
  • About the author: XIAO Lijun (born 1994), male, master's student at Tianjin University, E-mail: xljtju@163.com.
  • Funding:
    National Natural Science Foundation of China (61771334)

Abstract:

The relevance between a candidate feature and the class label, the interaction among the candidate feature, the selected features and the class label, and the redundancy between features are all important factors that a feature selection algorithm should consider. Some feature selection algorithms based on mutual information and three-dimensional mutual information do not consider relevance, interaction and redundancy at the same time, which degrades their performance. To address this problem, a feature selection algorithm based on dynamic redundancy weights is proposed. The algorithm uses symmetrical uncertainty and three-way interaction information as evaluation criteria and dynamically updates the weights of candidate features, so that the objective function accounts for redundancy between features in addition to relevance and interaction. Comparative experiments against typical mutual-information-based feature selection algorithms are conducted on ten datasets using three classifiers. The experimental results show that the proposed algorithm achieves better feature selection performance.

Key words: feature selection, redundancy, three-way interaction information, symmetrical uncertainty, classification
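
The abstract names symmetrical uncertainty and three-way interaction information as its evaluation criteria but does not give formulas. As a rough illustration only, the sketch below computes both quantities for discrete variables from their usual textbook definitions: SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) and I(X; Y; C) = I(X, Y; C) - I(X; C) - I(Y; C). The sign convention for the interaction term, the function names, and the toy data are assumptions made for this example; the paper's objective function and its dynamic weight update are not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more equal-length discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((count / n) * log2(count / n) for count in Counter(rows).values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from empirical frequencies."""
    return entropy(x) + entropy(y) - entropy(x, y)


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), a normalized value in [0, 1]."""
    denom = entropy(x) + entropy(y)
    return 2.0 * mutual_information(x, y) / denom if denom > 0 else 0.0


def interaction_information(x, y, c):
    """I(X; Y; C) = I(X, Y; C) - I(X; C) - I(Y; C).

    Assumed sign convention: positive values indicate that X and Y are
    jointly more informative about C than they are separately.
    """
    joint_xy = list(zip(x, y))  # treat the pair (X, Y) as a single variable
    return (mutual_information(joint_xy, c)
            - mutual_information(x, c)
            - mutual_information(y, c))


# Hypothetical toy data: the class label is the XOR of two binary features.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
label = [0, 1, 1, 0]

su_same = symmetrical_uncertainty(f1, f1)          # a feature against itself
su_indep = symmetrical_uncertainty(f1, f2)         # two independent features
inter = interaction_information(f1, f2, label)     # pure interaction case
print(su_same, su_indep, inter)
```

In the XOR toy case neither feature is individually relevant to the label, yet the pair determines it completely, which is the kind of interaction that a purely relevance-based criterion would miss. Symmetrical uncertainty between a feature and itself is maximal, which is how it can serve as a redundancy measure between features.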

CLC number:

  • TP301.6