Journal of Xidian University ›› 2021, Vol. 48 ›› Issue (3): 91-98. doi: 10.19665/j.issn1001-2400.2021.03.012

• Computer Science and Technology & Artificial Intelligence •

Combination of dynamic features with a new mask to optimize neural network speech enhancement

MEI Shulin1, JIA Hairong1, WANG Xiaogang2, WU Yifeng2

  1. College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
  2. Network Optimization Center, China Unicom Shanxi Branch, Taiyuan 030000, China
  • Received: 2019-12-12 Online: 2021-06-20 Published: 2021-07-05
  • Contact: Hairong JIA E-mail: 1243748225@qq.com; helenjia722@163.com; wangxg117@chinaunicom.cn; wyf911@126.com

Abstract:

Neural network speech enhancement algorithms cannot fully represent the nonlinear structure of speech because of the way features are selected, which leads to speech distortion. To address this problem, this paper proposes combining dynamic features with a new mask to optimize neural network speech enhancement. First, three features of the noisy speech are extracted and spliced to obtain static features. Their first and second differences are then computed to capture the instantaneous information of speech and are fused into dynamic features. Combining the dynamic and static features makes the features complementary and reduces speech distortion. Second, to improve the intelligibility and clarity of speech at the same time, an adaptive mask is proposed that can adjust the energy ratio of speech to noise as well as the proportion between the traditional mask and the square root mask. Gammatone channel weights are used to modify the mask value in each channel, simulating the human auditory system and further improving speech intelligibility. Finally, simulations with multiple voices under different noise backgrounds show that, compared with algorithms from the literature, the proposed algorithm achieves a higher SNR, subjective speech quality, and short-time objective intelligibility, which verifies its effectiveness.
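
The step that derives dynamic features from static ones can be illustrated with a minimal sketch. It assumes a plain frame-to-frame difference with the first frame repeated as padding; the abstract does not give the paper's actual difference formula or name the three static features, so the function names and toy values below are hypothetical.

```python
import numpy as np


def first_difference(features):
    """Frame-to-frame difference along the time axis (axis 0).

    Assumption: a simple backward difference. The first frame is
    repeated at the start so the output keeps the input's frame count.
    """
    padded = np.concatenate([features[:1], features], axis=0)
    return np.diff(padded, axis=0)


def combine_static_dynamic(static):
    """Stack static features with their first and second differences."""
    delta = first_difference(static)   # first difference
    delta2 = first_difference(delta)   # second difference
    return np.concatenate([static, delta, delta2], axis=1)


# Toy input: 5 frames, 2 static feature dimensions (hypothetical values).
static = np.array([[0.0, 1.0],
                   [1.0, 3.0],
                   [3.0, 6.0],
                   [6.0, 10.0],
                   [10.0, 15.0]])
combined = combine_static_dynamic(static)
print(combined.shape)  # (5, 6)
```

Each frame's feature vector triples in width: the original static values, followed by their first and second differences, which is the combined static-plus-dynamic input the abstract describes feeding to the network.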

Key words: dynamic characteristics, adaptive mask, speech enhancement, neural network

CLC Number: 

  • TN912.35