Electronic Science and Technology, 2022, Vol. 35, Issue (3): 8-15. doi: 10.16180/j.cnki.issn1007-7820.2022.03.002

Speech Enhancement Method Based on Convolutional Recurrent Network and Non-Local Module

Hui LI¹, Hao JING², Kanghua YAN², Lianghao XU²

  1. School of Physics and Electronic Information Engineering, Henan Polytechnic University, Jiaozuo 454000, China
  2. School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo 454000, China
  • Received: 2020-11-16; Online: 2022-03-15; Published: 2022-04-02
  • Supported by:
    National Natural Science Foundation of China (11804081); Basic and Frontier Technology Research Program of Henan (152300410103)


Existing deep neural network (DNN) speech enhancement methods neglect phase spectrum learning, which leaves the quality of the enhanced speech unsatisfactory. To address this problem, this study proposes a speech enhancement method based on a convolutional recurrent network and non-local modules. In the designed encoder-decoder network, the time-domain representation of the speech signal serves as the encoder input for deep feature extraction, so that both the magnitude and the phase information of the speech signal are fully exploited. Non-local modules are added to the convolutional layers of the encoder and decoder to extract the key features of the speech sequence while suppressing irrelevant ones. A gated recurrent unit (GRU) network is introduced to capture the temporal correlations within the speech sequence. Experimental results on the ST-CMDS Chinese speech dataset show that, compared with the unprocessed noisy speech, the enhanced speech improves quality by 61% and intelligibility by 7.93% on average.
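To illustrate what a non-local module computes, here is a minimal NumPy sketch. It assumes the common embedded-Gaussian (softmax) form of a non-local block applied along the time axis of a (T, C) feature sequence. The function name `non_local_1d`, the projection size `Ci`, the random weights, and the zero-initialized output projection are illustrative assumptions, not the configuration used in this paper.

```python
import numpy as np


def softmax(a: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def non_local_1d(x: np.ndarray, w_theta: np.ndarray, w_phi: np.ndarray,
                 w_g: np.ndarray, w_z: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical embedded-Gaussian non-local block over a (T, C) sequence.

    x:       (T, C) features, one row per time step
    w_theta: (C, Ci) query projection
    w_phi:   (C, Ci) key projection
    w_g:     (C, Ci) value projection
    w_z:     (Ci, C) output projection back to C channels
    Returns (z, attn) with z = x + (softmax(theta @ phi.T) @ g) @ w_z.
    """
    theta = x @ w_theta                      # (T, Ci) queries
    phi = x @ w_phi                          # (T, Ci) keys
    g = x @ w_g                              # (T, Ci) values
    # Affinity between every pair of time steps i, j; each row sums to 1.
    attn = softmax(theta @ phi.T, axis=-1)   # (T, T)
    y = attn @ g                             # each step aggregates all steps
    z = x + y @ w_z                          # residual connection
    return z, attn


# Usage with illustrative sizes and random weights (assumptions).
rng = np.random.default_rng(0)
T, C, Ci = 50, 16, 8
x = rng.standard_normal((T, C))
w_theta, w_phi, w_g = (rng.standard_normal((C, Ci)) * 0.1 for _ in range(3))
w_z = np.zeros((Ci, C))  # zero init: the block starts as an identity map
z, attn = non_local_1d(x, w_theta, w_phi, w_g, w_z)
print(z.shape)  # (50, 16)
```

Each output step is a weighted sum over all time steps, so distant parts of the sequence can reinforce or suppress a feature. In a trained network the projections would be learned (typically as 1×1 convolutions), and a zero-initialized output projection lets the block be inserted into a convolutional layer without initially changing its output.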

Key words: speech enhancement, deep neural network, convolutional recurrent network, non-local module, supervised learning, gated recurrent unit, magnitude spectrum, phase spectrum

CLC Number: TN912.35