Journal of Xidian University, 2021, Vol. 48, Issue (3): 91-98. doi: 10.19665/j.issn1001-2400.2021.03.012

• Computer Science and Technology & Artificial Intelligence •

Combination of dynamic features with a new mask to optimize neural network speech enhancement

MEI Shulin1, JIA Hairong1, WANG Xiaogang2, WU Yifeng2

  1. College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
  2. Network Optimization Center, China Unicom Shanxi Branch, Taiyuan 030000, China
  • Received: 2019-12-12  Online: 2021-06-20  Published: 2021-07-05
  • Corresponding author: JIA Hairong
  • About the authors: MEI Shulin (b. 1996), female, master's student at Taiyuan University of Technology, E-mail: 1243748225@qq.com | WANG Xiaogang (b. 1977), male, E-mail: wangxg117@chinaunicom.cn | WU Yifeng (b. 1977), male, E-mail: wyf911@126.com
  • Funding: National Natural Science Foundation of China (12004275); Shanxi Province Merit-based Funding for Science and Technology Activities of Returned Overseas Scholars (20200017)

Abstract:

To address the poor quality of speech enhanced by neural networks, which arises because the selected features cannot fully represent the nonlinear structure of speech, this paper proposes a neural network speech enhancement method that combines dynamic features with a new mask. First, three features of the noisy speech are extracted and concatenated to form static features; their first- and second-order differences are then computed to capture transient information in the speech and fused into dynamic features. Combining the dynamic and static features lets them complement each other and reduces speech distortion. Second, to maximize the intelligibility and clarity of the enhanced speech simultaneously, a new adaptive mask is proposed that adaptively adjusts both the energy ratio of speech to noise and the mixing ratio between the traditional mask and the square-root mask. Gammatone channel weights are then used to modify the mask value in each channel, mimicking the human auditory system and further improving intelligibility. Finally, simulations on multiple utterances under different noise backgrounds show that, compared with several algorithms from the existing literature, the proposed algorithm achieves higher signal-to-noise ratio (SNR), subjective speech quality and short-time objective intelligibility (STOI) scores, which verifies its effectiveness.

Key words: dynamic features, adaptive mask, speech enhancement, neural network
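The abstract names two mechanisms without giving their formulas: static features built by concatenating three feature streams and extended with first- and second-order differences, and an adaptive mask that blends a traditional ratio mask with its square root and weights each channel. The pure-Python sketch below illustrates where each step enters, under explicit assumptions. The three input streams are hypothetical placeholders (the abstract does not name the features), `delta` uses the common central difference with edge-frame replication (one convention among several), and `adaptive_mask` uses a sigmoid of the local SNR as the blend weight together with caller-supplied `channel_weights`. All function names, the blend rule and the weight values are illustrative assumptions, not the authors' method.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = frames, columns = feature dimensions
EPS = 1e-12  # guards against division by zero and log(0)


def splice(*feature_sets: Matrix) -> Matrix:
    """Concatenate several per-frame feature matrices frame by frame (static features)."""
    return [sum(rows, []) for rows in zip(*feature_sets)]


def delta(features: Matrix) -> Matrix:
    """First-order time difference d[t] = (x[t+1] - x[t-1]) / 2.

    Edge frames are replicated, a common convention assumed here.
    """
    n = len(features)
    out: Matrix = []
    for t in range(n):
        prev_f = features[max(t - 1, 0)]
        next_f = features[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev_f, next_f)])
    return out


def dynamic_features(static: Matrix) -> Matrix:
    """Append first-order (delta) and second-order (delta-delta) differences."""
    d1 = delta(static)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


def adaptive_mask(speech_pow: List[float], noise_pow: List[float],
                  channel_weights: List[float]) -> List[float]:
    """Hypothetical adaptive mask for one frame of per-channel powers.

    The SNR-driven sigmoid blend and the channel weights are illustrative
    assumptions, not the paper's formulas.
    """
    mask: List[float] = []
    for s, n, w in zip(speech_pow, noise_pow, channel_weights):
        irm = s / (s + n + EPS)          # traditional (power) ratio mask
        sqrt_irm = math.sqrt(irm)        # square-root mask
        snr_db = 10.0 * math.log10((s + EPS) / (n + EPS))  # local speech/noise energy ratio
        alpha = 1.0 / (1.0 + math.exp(-snr_db / 10.0))     # hypothetical blend weight in (0, 1)
        m = w * (alpha * irm + (1.0 - alpha) * sqrt_irm)   # blend, then weight the channel
        mask.append(min(max(m, 0.0), 1.0))                 # clamp to [0, 1]
    return mask


if __name__ == "__main__":
    # Three hypothetical one-dimensional feature streams over four frames.
    f1 = [[0.0], [1.0], [2.0], [3.0]]
    f2 = [[1.0], [1.0], [1.0], [1.0]]
    f3 = [[2.0], [0.0], [2.0], [0.0]]
    feats = dynamic_features(splice(f1, f2, f3))
    print(len(feats), len(feats[0]))  # 4 frames, 9 dimensions each
    # Two channels: equal speech/noise power, then a noise-free channel.
    print(adaptive_mask([1.0, 1.0], [1.0, 0.0], [1.0, 0.8]))
```

For these inputs the mask is roughly 0.60 in the equal-power channel and 0.8 (its channel weight) in the noise-free one. Changing the blend rule or the weight profile changes these numbers; the sketch only shows where the speech/noise energy ratio, the mask mixing ratio and the per-channel weight each enter.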

CLC number: TN912.35