Journal of Xidian University ›› 2022, Vol. 49 ›› Issue (2): 108-115. doi: 10.19665/j.issn1001-2400.2022.02.013

• Information and Communications Engineering •

Speech enhancement combining the self-adaptive soft mask and mixed features

ZHANG Min, JIA Hairong, ZHANG Gangmin, WANG Suying

  1. College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
  • Received: 2020-10-28 Online: 2022-04-20 Published: 2022-05-31
  • Contact: Hairong JIA E-mail: 1640167660@qq.com; helenjia722@163.com; 1353430842@qq.com; 2356275208@qq.com

Abstract:

First, to address the loss of effective features when Mel domain features are used for speech enhancement, this paper proposes a method that extracts the Gammatone domain features of noisy speech using a power function that better matches the compressive perception of the human ear, and deeply mixes them with Mel domain features. This overcomes the limitation that the Mel domain filter loses effective features at high frequencies. At the same time, to capture the connection between the transient information of the speech and the speech information in adjacent frames, the difference (derivative) of the mixed feature is computed and fused with the initial feature to obtain the final mixed feature. Second, because a traditional time-frequency mask cannot adjust automatically to differences in the signal-to-noise ratio, the intelligibility of the enhanced speech suffers. To improve speech quality and speech intelligibility together, a soft mask that adapts to the signal-to-noise ratio information is proposed, and the phase difference information of the speech is incorporated. Finally, experiments are conducted on multiple speech signals under different noise backgrounds. The results show that speech enhancement with the mixed features and the self-adaptive soft mask improves both the subjective speech quality and the short-time objective intelligibility of the enhanced speech, which verifies the effectiveness of the proposed algorithm.
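
The feature construction summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the compression exponent (1/15), the central-difference delta, and the use of plain per-frame concatenation as the "mixing" step are all assumptions made for illustration, and the function names `power_compress`, `delta`, and `fuse` are hypothetical. Inputs are assumed to be precomputed per-frame Gammatone filterbank energies and Mel domain features.

```python
def power_compress(energies, exponent=1.0 / 15.0):
    """Power-law compression of non-negative filterbank energies.

    The exponent is an assumed value; the abstract does not state one.
    """
    return [[e ** exponent for e in frame] for frame in energies]


def delta(frames):
    """First-order difference along time: (next - previous) / 2.

    Edge frames are replicated so the output has one delta vector per frame.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def fuse(gammatone_energies, mel_features):
    """Build one mixed feature vector per frame.

    Assumed layout: [compressed Gammatone | Mel | delta of both].
    """
    compressed = power_compress(gammatone_energies)
    static = [g + m for g, m in zip(compressed, mel_features)]
    return [s + d for s, d in zip(static, delta(static))]


# Toy input: 3 frames, 2 Gammatone channels, 1 Mel coefficient.
gt = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
mel = [[0.0], [2.0], [4.0]]
mixed = fuse(gt, mel)
```

Each output frame holds the static features followed by their deltas, so the feature dimension doubles. The self-adaptive soft mask is not sketched here because the abstract does not give its formula.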

Key words: speech enhancement, neural networks, signal-to-noise ratio, mixed feature, soft mask

CLC Number: TN912.35