Journal of Xidian University

• Research Paper

A multi-layer incremental feature extraction method for industrial big data

WANG Xing; HUANG Xiaoyu; LIU Xuanpu; KONG Xianguang; NIU Meng

  1. (School of Mechano-Electronic Engineering, Xidian University, Xi'an, Shaanxi 710071, China)
  • Received: 2017-11-11 Published: 2018-09-25
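  • Corresponding author: HUANG Xiaoyu (born 1994), female, master's student at Xidian University, E-mail: huangxy0828@126.com
  • About the author: WANG Xing (born 1989), male, assistant engineer, master's degree, E-mail: wangx@xidian.edu.cn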
  • Funding:

    Shaanxi Provincial International Science and Technology Cooperation and Exchange Program (2016KW-048, BD18016040001)

Multi-layer incremental feature extraction method for industrial big data

WANG Xing; HUANG Xiaoyu; LIU Xuanpu; KONG Xianguang; NIU Meng

  1. (School of Mechano-Electronic Engineering, Xidian University, Xi'an 710071, China)
  • Received: 2017-11-11 Published: 2018-09-25

Abstract:

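To address the failure of incremental linear discriminant analysis caused by high-dimensional, small-sample data in industrial big data, a multi-layer incremental feature extraction method for industrial big data is proposed. It reduces the dimensionality of high-dimensional small-sample data effectively while retaining as much of the samples' variance information and discriminant information as possible. First, a sliding window updates the data stream incrementally in real time, outliers are detected and filtered, incremental principal component analysis performs a preliminary feature extraction, and the Fisher criterion function quantifies the classification information contained in each principal component. Then, the entropy method determines the weights of each principal component's contribution rate and recognition ability, the principal components are screened, and the selected components form a new feature space. Finally, the high-dimensional data in the current window are projected by incremental linear discriminant analysis, which completes the second feature extraction and determines the sample classes at the same time. Experimental results show that the method extracts features from real-time data effectively while preserving their discriminative ability well.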

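Key words: industrial big data, high-dimensional small sample, feature extraction, incremental linear discriminant analysis, incremental principal component analysis, entropy method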
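
The component-screening step in the abstract (a Fisher criterion score per principal component, combined with the contribution rate through entropy-derived weights) can be illustrated with a small sketch. The abstract does not give the exact formulas, so this uses the textbook one-dimensional Fisher ratio and the standard entropy weight method as assumptions. The function names `fisher_score`, `entropy_weights`, and `rank_components`, and the toy data, are hypothetical and not taken from the paper.

```python
import math


def fisher_score(values, labels):
    """Between-class scatter divided by within-class scatter for one 1-D feature."""
    overall_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        members = [v for v, y in zip(values, labels) if y == cls]
        cls_mean = sum(members) / len(members)
        between += len(members) * (cls_mean - overall_mean) ** 2
        within += sum((v - cls_mean) ** 2 for v in members)
    if within == 0:
        raise ValueError("within-class scatter is zero; score is undefined")
    return between / within


def entropy_weights(columns):
    """Standard entropy weight method: one weight per indicator column.

    Assumes every column is non-negative with a positive sum and
    holds at least two items (one entry per principal component).
    """
    m = len(columns[0])
    divergences = []
    for col in columns:
        total = sum(col)
        probs = [x / total for x in col]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        divergences.append(1.0 - entropy)
    total_div = sum(divergences)
    if total_div == 0:  # all indicators uniform: fall back to equal weights
        return [1.0 / len(columns)] * len(columns)
    return [d / total_div for d in divergences]


def rank_components(contribution, fisher):
    """Combine the two indicators with entropy weights; return (order, scores)."""
    w_c, w_f = entropy_weights([contribution, fisher])
    norm_c = [c / sum(contribution) for c in contribution]
    norm_f = [f / sum(fisher) for f in fisher]
    scores = [w_c * c + w_f * f for c, f in zip(norm_c, norm_f)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy data (made up): scores of six samples on three principal components.
labels = [0, 0, 0, 1, 1, 1]
pc_scores = [
    [1.0, 1.2, 0.8, 1.3, 1.1, 0.9],  # largest variance share, weak class separation
    [0.1, 0.2, 0.0, 1.0, 1.1, 0.9],  # smaller variance share, strong class separation
    [0.5, 0.4, 0.6, 0.7, 0.5, 0.6],
]
contribution = [0.6, 0.3, 0.1]  # assumed explained-variance ratios
fisher = [fisher_score(s, labels) for s in pc_scores]
order, combined = rank_components(contribution, fisher)
print(order, combined)
```

In this toy setting the second component ranks first even though it explains less variance, because its Fisher score dominates. That is the kind of trade-off the weighting is meant to capture. How many top-ranked components to keep is left open here, since the abstract does not state a cutoff.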

Abstract:

This paper addresses the failure of incremental linear discriminant analysis (ILDA) on the high-dimensional, small-sample data found in industrial big data. A multi-layer incremental feature extraction method for industrial big data is proposed that reduces dimensionality effectively while keeping as much of the samples' variance information and discriminant information as possible. First, the data stream is updated incrementally in real time with a sliding window, and its outliers are detected and filtered. Second, incremental principal component analysis performs an initial feature extraction, and the Fisher criterion function is used to quantify the classification information contained in each principal component. Then the contribution rate and recognition ability of the principal components are weighted by the entropy method, and the principal components are selected accordingly. The selected principal components constitute a new feature space. Finally, ILDA is applied to the high-dimensional data in the current window to complete the second feature extraction and the classification. Experimental results indicate that the proposed method extracts features from real-time data effectively and preserves the discriminative ability of the industrial data well.

Key words: industrial big data, high-dimensional small sample, feature extraction, incremental linear discriminant analysis, incremental principal component analysis, entropy method
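
The first stage described in the abstract, a sliding window that is updated as samples arrive and that filters outliers, might look roughly like the sketch below. The abstract does not say which outlier test is used. The per-feature z-score rule, the class name `SlidingWindow`, and the parameters `size`, `z_threshold`, and `min_samples` are all assumptions made for illustration.

```python
import math
from collections import deque


class SlidingWindow:
    """Fixed-size window over a stream with a simple z-score outlier filter.

    The z-score rule is an assumed stand-in; the paper's detector may differ.
    """

    def __init__(self, size, z_threshold=3.0, min_samples=5):
        self.window = deque(maxlen=size)  # oldest sample drops out automatically
        self.z_threshold = z_threshold
        self.min_samples = min_samples

    def _mean_std(self):
        """Per-feature mean and population standard deviation of the window."""
        n = len(self.window)
        dim = len(self.window[0])
        means = [sum(row[j] for row in self.window) / n for j in range(dim)]
        stds = [
            math.sqrt(sum((row[j] - means[j]) ** 2 for row in self.window) / n)
            for j in range(dim)
        ]
        return means, stds

    def is_outlier(self, sample):
        """Flag a sample if any feature lies beyond z_threshold window stds."""
        if len(self.window) < self.min_samples:
            return False  # too little history to judge
        means, stds = self._mean_std()
        return any(
            s > 0 and abs(x - m) / s > self.z_threshold
            for x, m, s in zip(sample, means, stds)
        )

    def push(self, sample):
        """Return True if the sample enters the window, False if filtered out."""
        if self.is_outlier(sample):
            return False
        self.window.append(tuple(sample))
        return True


# Made-up two-feature stream with one obvious spike in the first feature.
stream = [
    (1.0, 10.0), (1.1, 10.2), (0.9, 9.8), (1.05, 10.1), (0.95, 9.9),
    (50.0, 10.0),
    (1.02, 10.05),
]
win = SlidingWindow(size=5)
accepted = [win.push(x) for x in stream]
print(accepted)
print(list(win.window))
```

The samples left in `win.window` are what a downstream incremental PCA or ILDA update would consume. Recomputing the window statistics on every push keeps the sketch short, and a running update of the mean and variance would be the natural refinement for long windows.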