Journal of Xidian University, 2022, Vol. 49, Issue (2): 108-115. doi: 10.19665/j.issn1001-2400.2022.02.013

• Information and Communication Engineering •

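Speech enhancement combining the self-adaptive soft mask and mixed features

ZHANG Min, JIA Hairong, ZHANG Gangmin, WANG Suying

  1. College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China
  • Received: 2020-10-28 Online: 2022-04-20 Published: 2022-05-31
  • Corresponding author: JIA Hairong
  • About the authors: ZHANG Min (born 1997), female, master's student at Taiyuan University of Technology, E-mail: 1640167660@qq.com; ZHANG Gangmin (born 1997), female, master's student at Taiyuan University of Technology, E-mail: 1353430842@qq.com; WANG Suying (born 1998), female, master's student at Taiyuan University of Technology, E-mail: 2356275208@qq.com
  • Funding:
    National Natural Science Foundation of China (12004275); Shanxi Province Selective Funding for Science and Technology Activities of Returned Overseas Scholars (20200017); Research Project Supported by Shanxi Scholarship Council of China (2020042); Natural Science Foundation of the Shanxi Province Applied Basic Research Program (20210302123186)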

Speech enhancement combining the self-adaptive soft mask and mixed features

ZHANG Min, JIA Hairong, ZHANG Gangmin, WANG Suying

  1. College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
  • Received: 2020-10-28 Online: 2022-04-20 Published: 2022-05-31
  • Contact: Hairong JIA

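Abstract (translated from the Chinese):

To address the loss of useful features when Mel-domain features are used for speech enhancement, this paper extracts Gammatone-domain features of the noisy speech with a power function that better matches the compressive perception of the human ear, and deeply mixes them with Mel-domain features for speech enhancement, which mitigates the loss of useful features by Mel-domain filters at high frequencies. In addition, to capture the transient information of speech and the relationship between adjacent frames, the difference (delta) of the mixed features is computed and fused with the initial features to obtain the final mixed features. Furthermore, conventional time-frequency masking cannot adjust itself to different signal-to-noise ratios (SNRs), which degrades the intelligibility of the enhanced speech. So that the system improves speech quality while keeping speech distortion as low as possible, a soft mask is proposed that adapts automatically to the SNR of the speech, yields the mask value appropriate to each SNR condition, and incorporates phase-difference information, which improves speech intelligibility. Finally, experiments are carried out on multiple utterances under different noise backgrounds. The results show that speech enhancement using the mixed features and the adaptive soft mask preserves the integrity of the speech spectrum and improves both the subjective speech enhancement quality and the short-time objective intelligibility, which verifies the effectiveness of the proposed algorithm.

Key words: speech enhancement, neural networks, signal-to-noise ratio, mixed features, soft mask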
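The abstract describes the feature pipeline only in words: power-compressed Gammatone-domain features are mixed with Mel-domain features, and a difference (delta) term is fused in to capture transients and inter-frame context. The following is a minimal sketch of that shape, not the authors' implementation. Everything specific is an assumption: the two filterbank energy matrices are taken as precomputed inputs, the exponent 1/3, the log compression of the Mel energies, and the regression window width of 2 are illustrative choices, and the function names `power_compress`, `delta`, and `mixed_features` are hypothetical.

```python
import numpy as np

def power_compress(energies, exponent=1.0 / 3.0):
    """Power-law compression of filterbank energies (exponent is an assumed value)."""
    return np.power(np.maximum(energies, 0.0), exponent)

def delta(features, width=2):
    """Regression-based difference along the frame axis; features is (frames, dims)."""
    n_frames = features.shape[0]
    # Repeat the first and last frames so edge frames have neighbours.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + n_frames]
        behind = padded[width - k : width - k + n_frames]
        out += k * (ahead - behind)
    return out / denom

def mixed_features(gammatone_energy, mel_energy, exponent=1.0 / 3.0, width=2):
    """Concatenate compressed Gammatone and log-Mel features, then append their deltas."""
    gt = power_compress(gammatone_energy, exponent)
    mel = np.log(mel_energy + 1e-10)  # assumed log compression for the Mel branch
    static = np.concatenate([gt, mel], axis=1)
    return np.concatenate([static, delta(static, width)], axis=1)
```

With 4 Gammatone channels and 6 Mel channels per frame, `mixed_features` returns 20 values per frame: 10 static and 10 delta. How the paper actually performs the "deep mixing" is not stated in the abstract, so plain concatenation here is a stand-in.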

Abstract:

To address the loss of effective features when Mel-domain features are used for speech enhancement, this paper proposes extracting the Gammatone-domain features of noisy speech with a power function that is more in line with the compressive perception of the human ear, and deeply mixing them with Mel-domain features for speech enhancement, in order to overcome the limitation that Mel-domain filters lose effective features at high frequencies. At the same time, to capture the transient information of the speech and the connection between adjacent frames, the difference (delta) of the mixed features is computed and fused with the initial features to obtain the final mixed features. Second, since traditional time-frequency masking cannot be adjusted automatically according to the signal-to-noise ratio, the intelligibility of the enhanced speech is affected. To improve speech quality while keeping speech distortion as low as possible, a soft mask that adapts to the signal-to-noise ratio is proposed, and phase-difference information, which improves speech intelligibility, is incorporated into it. Finally, experiments are conducted on multiple utterances under different noise backgrounds. The experimental results show that speech enhancement using the mixed features and the self-adaptive soft mask improves the subjective speech quality and the short-time objective intelligibility of the enhanced speech, which verifies the effectiveness of the proposed algorithm.

Key words: speech enhancement, neural networks, signal-to-noise ratio, mixed feature, soft mask
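The abstract does not give the formula for the self-adaptive soft mask, so the sketch below is only one plausible reading of "a soft mask that adapts to the signal-to-noise ratio and incorporates phase difference". It swaps in two well-known ideas, a ratio mask with an exponent and a phase-sensitive cosine term, and lets the exponent vary with the local SNR. The sigmoid rule for `beta`, its range of 0.5 to 1.0, and the function name `adaptive_soft_mask` are assumptions for illustration, not the authors' definition.

```python
import numpy as np

def adaptive_soft_mask(clean_stft, noise_stft, eps=1e-10):
    """Training-target mask from complex clean-speech and noise STFTs (same shape)."""
    noisy_stft = clean_stft + noise_stft
    s_pow = np.abs(clean_stft) ** 2
    n_pow = np.abs(noise_stft) ** 2

    # Local SNR in dB for every time-frequency bin.
    snr_db = 10.0 * np.log10((s_pow + eps) / (n_pow + eps))

    # Assumed adaptation rule: exponent near 1.0 at low SNR (stronger
    # suppression), near 0.5 at high SNR (gentler, less speech distortion).
    beta = 0.5 + 0.5 / (1.0 + np.exp(snr_db / 10.0))
    ratio = (s_pow / (s_pow + n_pow + eps)) ** beta

    # Phase-sensitive term: attenuate bins where clean and noisy phase disagree.
    phase_diff = np.angle(clean_stft) - np.angle(noisy_stft)
    return np.clip(ratio * np.cos(phase_diff), 0.0, 1.0)
```

In a mask-based system of this kind, a network would typically be trained to predict such a mask from noisy-speech features, and the predicted mask would be multiplied with the noisy magnitude spectrum before resynthesis. Whether the paper follows exactly this arrangement is not stated in the abstract.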

CLC Number:

  • TN912.35