Journal of Xidian University ›› 2020, Vol. 47 ›› Issue (1): 104-110. doi: 10.19665/j.issn1001-2400.2020.01.015


  • About the author: CHANG Xinxu (1995—), male, M.S. candidate at the Beijing Institute of Computer Technology and Application. E-mail: 614032144@qq.com
  • Supported by: the "13th Five-Year" Pre-research Project of the Information System Bureau of the Equipment Development Department (31511040401); the Equipment Pre-research Field Fund (61400040201)

Speech enhancement method based on the multi-head self-attention mechanism

CHANG Xinxu,ZHANG Yang,YANG Lin,KOU Jinqiao,WANG Xin,XU Dongdong   

  1. Beijing Institute of Computer Technology and Application, Beijing 100854, China
  • Received:2019-09-28 Online:2020-02-20 Published:2020-03-19


Abstract:

Owing to the masking effect in human auditory perception, a signal with high energy masks other signals with lower energy. Inspired by this phenomenon, this paper combines the self-attention method and the multi-head attention method to propose a speech enhancement method based on the multi-head self-attention mechanism. By applying multi-head self-attention to the input noisy speech features, the clean speech part and the noise part of the features can be distinguished more clearly, enabling subsequent processing to suppress noise more effectively. Experimental results show that the proposed method outperforms a speech enhancement method based on the gated recurrent neural network, yielding higher speech quality and intelligibility in the enhanced speech.
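The core computation the abstract refers to, multi-head self-attention applied frame-by-frame to a noisy speech feature sequence, can be sketched as follows. This is a minimal NumPy illustration, not the paper's model: the projection weights are random stand-ins for learned parameters, and the feature dimension and head count are assumed values chosen for the example.

```python
import numpy as np

def multi_head_self_attention(x, num_heads=4, rng=None):
    """Scaled dot-product multi-head self-attention over a feature sequence.

    x: (T, d) array of per-frame speech features (e.g. log-power spectra).
    The query/key/value/output projections are drawn at random here purely
    for illustration; in a trained enhancement network they are learned.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    T, d = x.shape
    assert d % num_heads == 0, "feature dim must divide evenly across heads"
    d_k = d // num_heads

    # Random projection matrices (stand-ins for learned weights).
    w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) * d ** -0.5
                          for _ in range(4))

    # Project, then split the feature dim into num_heads sub-spaces.
    q = (x @ w_q).reshape(T, num_heads, d_k).transpose(1, 0, 2)  # (h, T, d_k)
    k = (x @ w_k).reshape(T, num_heads, d_k).transpose(1, 0, 2)
    v = (x @ w_v).reshape(T, num_heads, d_k).transpose(1, 0, 2)

    # Scaled dot-product attention: each frame attends to every frame.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)             # (h, T, T)
    scores -= scores.max(axis=-1, keepdims=True)                 # stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)               # softmax

    heads = weights @ v                                          # (h, T, d_k)
    # Concatenate the heads and apply the output projection.
    return heads.transpose(1, 0, 2).reshape(T, d) @ w_o

# A 100-frame utterance with 64-dimensional features.
features = np.random.default_rng(1).standard_normal((100, 64))
enhanced = multi_head_self_attention(features, num_heads=4)
print(enhanced.shape)  # (100, 64)
```

In the paper's setting, the attention weights let each frame pool evidence from acoustically similar frames across the utterance, which is what sharpens the separation between the clean-speech and noise components before the downstream enhancement layers.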

Key words: speech enhancement, deep neural network, self attention, multi-head attention, gated recurrent unit

CLC number: 

  • TN912.35