Journal of Xidian University ›› 2019, Vol. 46 ›› Issue (2): 89-94. doi: 10.19665/j.issn1001-2400.2019.02.015


Speech enhancement method based on the perceptual joint optimization deep neural network

YUAN Wenhao, LOU Yingxi, LIANG Chunyan, WANG Zhiqiang

  1. College of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China
  • Received: 2018-09-25 Online: 2019-04-20 Published: 2019-04-20
  • About the author: YUAN Wenhao (born 1985), male, lecturer, Ph.D. E-mail: why_sdut@126.com.
  • Funding:
    Young Scientists Fund of the National Natural Science Foundation of China (61701286, 11704229); Natural Science Foundation of Shandong Province (ZR2015FL003, ZR2017MF047, ZR2017LA011, ZR2018LF002)

Abstract:

Speech enhancement models based on deep neural networks (DNNs) are generally trained with the mean square error as the cost function, which is not optimized for the speech enhancement problem. To address this, two factors are considered: the correlation between the network outputs of adjacent frames, and the presence of speech in each time-frequency unit. By correlating the network outputs of adjacent frames in the cost function and designing a perceptual coefficient that reflects the presence of speech in each time-frequency unit, a speech enhancement method based on a perceptual joint optimization DNN is proposed. Experimental results show that, compared with the speech enhancement method based on the mean square error, the proposed method significantly improves the quality and intelligibility of the enhanced speech and achieves better speech enhancement performance.

Key words: speech enhancement, deep neural network, cost function, correlation
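
The abstract names two ingredients of the cost function (a link between adjacent output frames and a per-unit perceptual coefficient) but gives no formulas on this page. The following is only an illustrative sketch of how such a cost could differ from the plain mean square error. The function names, the form of the coefficient, the frame-difference term, and the `alpha` trade-off are all assumptions made for illustration, not the authors' published definitions.

```python
def mse_loss(est, ref):
    """Plain mean square error over all time-frequency units.
    est, ref: lists of frames, each frame a list of per-bin values."""
    errs = [(e - r) ** 2 for ef, rf in zip(est, ref) for e, r in zip(ef, rf)]
    return sum(errs) / len(errs)


def speech_presence_weight(clean_pow, noise_pow, eps=1e-12):
    """Hypothetical perceptual coefficient in [0, 1]: the share of a
    unit's power that comes from speech (assumed form, not the paper's)."""
    return clean_pow / (clean_pow + noise_pow + eps)


def joint_loss(est, ref, weight, alpha=0.5):
    """Weighted error plus an adjacent-frame term (assumed form).
    Needs at least two frames; alpha is a hypothetical trade-off factor."""
    # Per-unit squared error, scaled by the speech-presence coefficient.
    unit = [w * (e - r) ** 2
            for ef, rf, wf in zip(est, ref, weight)
            for e, r, w in zip(ef, rf, wf)]
    unit_term = sum(unit) / len(unit)

    # Adjacent-frame term: the frame-to-frame change of the estimate
    # should match the frame-to-frame change of the clean target.
    delta = []
    for t in range(1, len(est)):
        for k in range(len(est[t])):
            d_est = est[t][k] - est[t - 1][k]
            d_ref = ref[t][k] - ref[t - 1][k]
            delta.append((d_est - d_ref) ** 2)
    frame_term = sum(delta) / len(delta)

    return unit_term + alpha * frame_term
```

With all coefficients set to 1 and `alpha=0`, `joint_loss` reduces to `mse_loss`, which makes the two added terms easy to compare against the baseline in isolation.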

CLC Number:

  • TN912.3