Journal of Xidian University, 2019, Vol. 46, Issue (4): 130-136. doi: 10.19665/j.issn1001-2400.2019.04.018

Speech enhancement method based on the time-frequency smoothing deep neural network

YUAN Wenhao, LIANG Chunyan, LOU Yingxi, FANG Chao, WANG Zhiqiang

  1. School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China
  • Received: 2019-03-22 Online: 2019-08-20 Published: 2019-08-15
  • About the author: YUAN Wenhao (born 1985), male, lecturer, Ph.D. E-mail: why_sdut@126.com
  • Supported by:
    Young Scientists Fund of the National Natural Science Foundation of China (61701286, 11704229); Natural Science Foundation of Shandong Province (ZR2015FL003, ZR2017MF047, ZR2017LA011, ZR2018LF002)

Abstract:

Existing speech enhancement methods based on deep neural networks give little consideration to the characteristics of the speech enhancement problem itself when designing the network structure. To address this, and drawing on the different characteristics of speech enhancement along the time and frequency dimensions, a time-frequency smoothing network that processes the two dimensions differently is designed, inspired by the way traditional speech enhancement methods compute local features of noisy speech. The network uses a gated recurrent unit to model the correlation of noisy speech over time and a convolutional neural network to model its correlation across frequency, which realizes a time-frequency smoothing process similar to that of traditional speech enhancement methods. Experimental results show that, while keeping the speech enhancement system causal, the time-frequency smoothing network significantly improves speech enhancement performance compared with other networks, and that the enhanced speech has better quality and intelligibility.

Key words: speech enhancement, time-frequency smoothing, convolutional neural network, deep neural network

CLC Number:

  • TN912.3
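
The abstract says the network mirrors the time-frequency smoothing used in traditional speech enhancement. As a rough illustration of that classical operation only (not the authors' network), the sketch below smooths a noisy power spectrogram with a short window across frequency and a first-order recursive average across time. The function names, the smoothing factor `alpha`, and the three-tap `window` are assumptions chosen for illustration. In the proposed network, the fixed recursion would correspond to the gated recurrent unit and the fixed window to the convolutional layers, both learned from data.

```python
from typing import List, Sequence

Spectrogram = List[List[float]]  # frames (time) x bins (frequency)


def smooth_frequency(frame: Sequence[float], window: Sequence[float]) -> List[float]:
    """Weighted average over neighbouring frequency bins of one frame.

    `window` should have odd length. Weights are renormalised at the
    spectrum edges so that a constant frame stays constant.
    """
    half = len(window) // 2
    out = []
    for k in range(len(frame)):
        acc = 0.0
        norm = 0.0
        for i, w in enumerate(window):
            j = k + i - half
            if 0 <= j < len(frame):
                acc += w * frame[j]
                norm += w
        out.append(acc / norm)
    return out


def smooth_time_frequency(power: Spectrogram,
                          alpha: float = 0.8,
                          window: Sequence[float] = (0.25, 0.5, 0.25)) -> Spectrogram:
    """Causal time-frequency smoothing of a power spectrogram.

    Each frame is smoothed across frequency, then recursively averaged
    with the previous smoothed frame:
        S[l] = alpha * S[l-1] + (1 - alpha) * smooth_frequency(P[l])
    Only the current and past frames are used, so the process is causal.
    """
    smoothed: Spectrogram = []
    prev = None
    for frame in power:
        cur = smooth_frequency(frame, window)
        if prev is not None:
            cur = [alpha * p + (1.0 - alpha) * c for p, c in zip(prev, cur)]
        smoothed.append(cur)
        prev = cur
    return smoothed
```

Because each output frame depends only on the current and earlier input frames, changing a later frame leaves earlier outputs untouched, which is the causality property the abstract emphasizes.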