Journal of Xidian University

• Research Paper •

Nonlinear dynamics features of emotional speech

姚慧;孙颖;张雪英   

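  1. (College of Information Engineering, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China)
  • Received: 2015-06-15 Online: 2016-10-20 Published: 2016-12-02
  • About the author: YAO Hui (1991- ), female, master's student at Taiyuan University of Technology, E-mail:5366970@qq.com.
  • Supported by:

    National Natural Science Foundation of China (61371193); Shanxi Province Youth Science and Technology Research Fund (2013021016-2); Shanxi Province Research Project for Returned Overseas Scholars (2013-034)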

Research on nonlinear dynamics features of emotional speech

YAO Hui;SUN Ying;ZHANG Xueying   

  1. (College of Information Engineering, Taiyuan Univ. of Technology, Taiyuan  030024, China)
  • Received:2015-06-15 Online:2016-10-20 Published:2016-12-02

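Abstract:

Based on the chaotic characteristics of the speech production process, a method that combines a nonlinear dynamics model with emotional speech signal processing is proposed. The nonlinear features of emotional speech extracted under this model are the minimum delay time, correlation dimension, Kolmogorov entropy, largest Lyapunov exponent and Hurst exponent. Comparative emotional speech recognition experiments are designed to verify the performance of the nonlinear features. First, three emotions (happiness, sadness and anger) from the Berlin German emotional speech database and the self-recorded TYUT2.0 emotional speech database are selected as the experimental data. Second, nonlinear features, prosodic features and Mel-frequency cepstral coefficient (MFCC) features are extracted separately, and a support vector machine is used for emotion recognition. The results show that, on the Berlin database, the recognition rate of the nonlinear features is higher than that of the prosodic features but slightly lower than that of the MFCC features, which verifies that the nonlinear features are an effective set of features for distinguishing emotions. On the TYUT2.0 database, their recognition rate is higher than those of both the prosodic features and the MFCC features. On TYUT2.0, whose corpus is more realistic and natural, the nonlinear features achieve relatively higher recognition results and better robustness.

Key words: emotional speech recognition, chaotic characteristics, nonlinear features, dynamics model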

Abstract:

The application of nonlinear measures based on the chaotic characteristics of emotional speech is proposed. Five nonlinear features are extracted from the emotional speech signal: minimum delay time, correlation dimension, Kolmogorov entropy, largest Lyapunov exponent and Hurst exponent. Their performance is verified by comparing the recognition rates obtained with different feature sets (nonlinear features, prosodic features and MFCC features). The Berlin emotional speech database and the TYUT2.0 emotional speech database are used as separate corpora, each covering three emotion classes (anger, happiness and sadness), and a support vector machine is used as the classifier. The results show that nonlinear features outperform prosodic features on the Berlin emotional speech database, where they fall slightly below MFCC features, and outperform both prosodic features and MFCC features on the TYUT2.0 emotional speech database. In addition, nonlinear features show a clear advantage and better robustness on the more natural emotional speech of TYUT2.0.

Key words: emotional speech recognition, chaos theory, nonlinear features, dynamic model
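
As an illustration of two of the quantities named in the abstract, the following minimal sketch shows a time-delay embedding (the phase-space reconstruction that a delay time parameterizes) and a Hurst exponent estimate. The abstract does not state which estimators the authors used, so the rescaled-range (R/S) method here is an assumption, and the function names `delay_embed` and `hurst_rs`, the window sizes and the synthetic test signal are all hypothetical.

```python
import numpy as np


def delay_embed(x, dim, tau):
    """Time-delay embedding: row i is [x[i], x[i+tau], ..., x[i+(dim-1)*tau]]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("signal too short for this dim and tau")
    return np.stack([x[k * tau:k * tau + n] for k in range(dim)], axis=1)


def hurst_rs(x, min_window=8):
    """Estimate the Hurst exponent by rescaled-range (R/S) analysis (assumed method)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    log_w, log_rs = [], []
    w = min_window
    # Window sizes double from min_window up to half the signal length.
    while w <= n // 2:
        rs_values = []
        for start in range(0, n - w + 1, w):
            seg = x[start:start + w]
            dev = np.cumsum(seg - seg.mean())  # cumulative deviation from the mean
            r = dev.max() - dev.min()          # range of the cumulative deviation
            s = seg.std()                      # standard deviation of the segment
            if s > 0:
                rs_values.append(r / s)
        if rs_values:
            log_w.append(np.log(w))
            log_rs.append(np.log(np.mean(rs_values)))
        w *= 2
    # The Hurst exponent is the slope of log(R/S) against log(window size).
    slope, _ = np.polyfit(log_w, log_rs, 1)
    return slope


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(4096)  # synthetic stand-in for a speech frame
    print(delay_embed(frame, dim=3, tau=5).shape)
    print(hurst_rs(frame))
```

In a pipeline of the kind the abstract describes, values such as these would be computed per utterance or per frame and concatenated into the feature vector passed to the classifier.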