Journal of Xidian University ›› 2019, Vol. 46 ›› Issue (1): 86-92. doi: 10.19665/j.issn1001-2400.2019.01.014

CFCC feature extraction fusing a power-law nonlinearity function and spectral subtraction

BAI Jing, SHI Yanyan, XUE Peiyun, GUO Qianyan

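  1. College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China
  • Received: 2018-06-26 Online: 2019-02-20 Published: 2019-03-05
  • Corresponding author: SHI Yanyan
  • About the author: BAI Jing (1965-), female, professor, Ph.D., E-mail: bj613@126.com
  • Funding:
    Shanxi Province Science and Technology Research (Social Development) Project (20120313013-6); Shanxi Province Youth Science and Technology Research Fund (2013021016-1)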

CFCC feature extraction for fusion of the power-law nonlinearity function and spectral subtraction

BAI Jing, SHI Yanyan, XUE Peiyun, GUO Qianyan

  1. College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
  • Received: 2018-06-26 Online: 2019-02-20 Published: 2019-03-05
  • Contact: Yanyan SHI

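Abstract (translated from the Chinese original):

To improve speech recognition accuracy in noisy environments, an improved speech feature extraction algorithm is proposed. The algorithm uses a power-law nonlinearity function that models the auditory characteristics of the human ear to extract a new cochlear filter cepstral coefficient, and introduces spectral subtraction at the front end of feature extraction to enhance the signal. The new feature and its first-order difference are combined into a hybrid feature parameter, which is then reduced in dimension with principal component analysis; the final feature is used in a speaker-independent, isolated-word, small-vocabulary speech recognition system. Experimental results show that the cochlear filter cepstral coefficient feature extracted with the power-law nonlinearity function clearly improves recognition accuracy over the traditional cochlear filter cepstral coefficient feature; that the hybrid feature parameter achieves better recognition performance than a single feature; and that the feature set obtained after principal component analysis can reach a recognition accuracy of 88.10% at a signal-to-noise ratio of 0 dB.

Keywords: speech recognition, power-law nonlinearity function, cochlear filter cepstral coefficients, spectral subtraction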
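
The abstract names spectral subtraction as the enhancement step placed in front of feature extraction but gives no parameters. As a rough illustration only, the following NumPy sketch shows basic magnitude spectral subtraction with overlap-add resynthesis. The frame length, hop size, number of noise-only frames, and spectral floor are all assumed values chosen for the example, and the function name `spectral_subtraction` is hypothetical; none of these come from the paper.

```python
import numpy as np

def spectral_subtraction(signal, frame_len=256, hop=128, noise_frames=5, floor=0.01):
    """Basic magnitude spectral subtraction (illustrative parameters only)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, Hann-windowed frames.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    magnitude, phase = np.abs(spectra), np.angle(spectra)
    # Assumption: the first `noise_frames` frames contain noise only.
    noise_mag = magnitude[:noise_frames].mean(axis=0)
    # Subtract the noise estimate; a small spectral floor avoids negative magnitudes.
    clean_mag = np.maximum(magnitude - noise_mag, floor * magnitude)
    # Reuse the noisy phase and return to the time domain.
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    # Overlap-add; with a Hann window and 50% overlap the window sum is close to 1.
    out = np.zeros(len(signal))
    for i in range(n_frames):
        out[i * hop : i * hop + frame_len] += clean_frames[i]
    return out
```

In a pipeline like the one described, the enhanced waveform would then be passed to the cochlear filter bank. Samples beyond the last full frame are left at zero in this sketch.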

Abstract:

This paper presents an improved speech feature extraction algorithm to raise speech recognition accuracy in noisy environments. A new cochlear filter cepstral coefficient (NCFCC) is extracted using a power-law nonlinearity function that simulates the auditory characteristics of the human ear. Spectral subtraction is introduced at the front end of feature extraction to enhance the signal, and the new feature and its first-order difference are combined into a hybrid feature parameter. Principal component analysis (PCA) is then applied to reduce the dimension of the hybrid feature, and the resulting feature is used in a speaker-independent, isolated-word, small-vocabulary speech recognition system. Experimental results show that, compared with the traditional cochlear filter cepstral coefficient (CFCC) feature, the CFCC feature extracted with the power-law nonlinearity function significantly improves recognition accuracy; that the hybrid feature parameter achieves better recognition performance than a single feature; and that the feature set obtained after PCA can reach a recognition accuracy of 88.10% at a signal-to-noise ratio (SNR) of 0 dB.

Key words: speech recognition, power-law nonlinearity function, cochlear filter cepstral coefficients, spectral subtraction
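
The feature-side steps named in the abstract (power-law compression, cepstral coefficients, first-order difference, PCA) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the input is taken to be a matrix of non-negative cochlear filter-bank energies (frames × bands) that has already been computed, the exponent 0.1 and the coefficient counts are placeholder values, and all function names are hypothetical.

```python
import numpy as np

def power_law_nonlinearity(energies, exponent=0.1):
    """Compress band energies with a power law (exponent is a placeholder value)."""
    return np.power(np.maximum(energies, 0.0), exponent)

def dct_ii(x):
    """Unnormalized DCT-II along the last axis."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (m + 0.5) / n)  # row k holds the k-th cosine basis
    return x @ basis.T

def first_order_difference(features):
    """Frame-to-frame difference; the first frame's difference is zero."""
    return np.diff(features, axis=0, prepend=features[:1])

def pca_reduce(features, n_components):
    """Project centered features onto the leading principal directions."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def hybrid_features(band_energies, n_cep=12, n_components=10, exponent=0.1):
    """Static cepstra plus first-order differences, reduced with PCA."""
    static = dct_ii(power_law_nonlinearity(band_energies, exponent))[:, :n_cep]
    delta = first_order_difference(static)
    return pca_reduce(np.hstack([static, delta]), n_components)
```

For example, `hybrid_features(band_energies)` on a 50 × 16 energy matrix returns a 50 × 10 feature matrix. In practice the PCA projection would be fitted on training data and reused at test time, and a regression-based delta is a common alternative to the simple difference used here.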

CLC Number: 

  • TN912.34