Journal of Xidian University, 2022, Vol. 49, Issue (3): 183-190. doi: 10.19665/j.issn1001-2400.2022.03.020

• Computer Science and Technology & Artificial Intelligence •

Convolutional quasi-recurrent network for real-time speech enhancement

SHI Yunlong, YUAN Wenhao, HU Shaodong, LOU Yingxi

  1. School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China
  • Received: 2021-05-25 Revised: 2021-12-08 Online: 2022-06-20 Published: 2022-07-04
  • Contact: Wenhao YUAN
  • About the authors: SHI Yunlong (born 1996), male, master's student at Shandong University of Technology, E-mail: syljoy@163.com | HU Shaodong (born 1996), male, master's student at Shandong University of Technology, E-mail: hsd_sdut@163.com | LOU Yingxi (born 1996), female, master's student at Shandong University of Technology, E-mail: lyx_joy@163.com
  • Funding:
    National Natural Science Foundation of China (61701286)

Abstract:

To further improve the speech enhancement performance of deep neural networks while maintaining real-time operation, a convolutional quasi-recurrent network for real-time speech enhancement is proposed. The network takes a causal input: it uses only the time-frequency features of the current and past frames of the noisy speech, which satisfies the input requirement of real-time speech enhancement. A quasi-recurrent neural network models the correlation of the noisy speech along the time dimension, and its ability to process the noisy speech sequence in parallel improves the computational efficiency of the model. Convolutional layers replace the way the quasi-recurrent neural network computes frequency-dimension features in its hidden layers, so that the model better exploits the local correlation between adjacent frequency bands of the noisy speech and achieves better enhancement performance. Experimental results show that, compared with the speech enhancement method based on the quasi-recurrent neural network, the method based on the convolutional quasi-recurrent network both improves enhancement performance and reduces the number of model parameters. Compared with other speech enhancement methods, the convolutional quasi-recurrent network, while keeping its input causal, effectively suppresses the interference of background noise with the target speech, reduces the distortion of the target speech, and achieves better speech enhancement performance. Finally, the real-time performance of the method is verified on different computing platforms.

Key words: speech enhancement, quasi-recurrent neural network, convolutional neural network, real-time performance

CLC number:

  • TN912
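
The mechanism summarized in the abstract (causal input, quasi-recurrent modeling along time, convolution along frequency) can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the single-channel layout, the kernel shape, the random weights, and the names `causal_freq_conv` and `conv_qrnn_layer` are all assumptions made for illustration. The sketch uses the standard quasi-recurrent fo-pooling recurrence, with gates computed by a small convolution that is causal in time and local in frequency.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def causal_freq_conv(x, kernel):
    """Correlate x (frames, bins) with kernel (k_t, k_f).

    Causal in time (only current and past frames), 'same' in frequency.
    Assumes k_f is odd (illustrative choice).
    """
    k_t, k_f = kernel.shape
    pad_f = k_f // 2
    # Pad only the past side of the time axis so frame t never sees frames > t.
    xp = np.pad(x, ((k_t - 1, 0), (pad_f, pad_f)))
    out = np.zeros_like(x, dtype=float)
    for i in range(k_t):
        for j in range(k_f):
            out += kernel[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def conv_qrnn_layer(x, w_z, w_f, w_o):
    """One hypothetical layer: convolutional gates followed by fo-pooling."""
    # Gates depend only on the input, so all frames are computed in parallel.
    z = np.tanh(causal_freq_conv(x, w_z))   # candidate values
    f = sigmoid(causal_freq_conv(x, w_f))   # forget gate
    o = sigmoid(causal_freq_conv(x, w_o))   # output gate
    # The only sequential part is a cheap elementwise recurrence over time.
    c = np.zeros(x.shape[1])
    h = np.empty_like(z)
    for t in range(x.shape[0]):
        c = f[t] * c + (1.0 - f[t]) * z[t]
        h[t] = o[t] * c
    return h

# Example run on random stand-in "noisy speech" features (20 frames, 16 bins).
rng = np.random.default_rng(0)
noisy = rng.standard_normal((20, 16))
w_z, w_f, w_o = (0.1 * rng.standard_normal((2, 3)) for _ in range(3))
h = conv_qrnn_layer(noisy, w_z, w_f, w_o)

# Changing frames 10 onward must leave the first 10 output frames untouched.
perturbed = noisy.copy()
perturbed[10:] += 1.0
h_perturbed = conv_qrnn_layer(perturbed, w_z, w_f, w_o)
```

In this sketch each gate uses a kernel of only k_t × k_f weights regardless of the number of frequency bins, whereas a dense map across F bins would need on the order of F × F weights per gate for each time tap. That is one plausible reading of why a convolutional hidden layer can reduce the parameter count; the actual layer sizes and savings are those reported in the paper, not the ones shown here.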