Journal of Xidian University ›› 2019, Vol. 46 ›› Issue (2): 89-94. doi: 10.19665/j.issn1001-2400.2019.02.015

Speech enhancement method based on the perceptual joint optimization deep neural network

YUAN Wenhao, LOU Yingxi, LIANG Chunyan, WANG Zhiqiang

  1. College of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China
  • Received: 2018-09-25 Online: 2019-04-20 Published: 2019-04-20

Abstract:

When speech enhancement models based on deep neural networks (DNNs) are trained, the mean square error is generally adopted as the cost function, even though it is not designed specifically for the speech enhancement problem. To address this, a speech enhancement method based on a joint optimization DNN is proposed. The method takes into account both the correlation between adjacent frames of the network's output and the presence of the speech component in each time-frequency unit: the cost function correlates adjacent output frames and includes a perceptual coefficient related to the presence of the speech component in each time-frequency unit. Experimental results show that, compared with the speech enhancement method trained with the mean square error, the proposed method significantly improves the quality and intelligibility of the enhanced speech.
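
To make the idea concrete, the sketch below shows one way a cost function could combine the two ingredients named in the abstract. It is not the formula from the paper, which the abstract does not give. The function name `weighted_joint_loss`, the parameters `alpha` and `floor`, the use of a speech-presence value in [0, 1] as the perceptual coefficient, and the frame-difference term used to tie adjacent frames together are all assumptions made for illustration.

```python
def mse(pred, target):
    """Plain mean square error over all time-frequency units."""
    errs = [(p - t) ** 2 for pf, tf in zip(pred, target) for p, t in zip(pf, tf)]
    return sum(errs) / len(errs)


def weighted_joint_loss(pred, target, presence, alpha=0.5, floor=0.1):
    """Hypothetical cost: presence-weighted MSE plus an adjacent-frame term.

    pred, target: lists of frames, each a list of spectral values per frequency bin.
    presence: same shape, values in [0, 1]; an assumed stand-in for the
              perceptual coefficient (1 = speech clearly present).
    alpha, floor: illustrative hyperparameters, not taken from the paper.
    """
    n_frames, n_bins = len(pred), len(pred[0])

    # Weight each time-frequency unit by speech presence; the floor keeps
    # noise-only units from being ignored entirely.
    unit_sum = 0.0
    for i in range(n_frames):
        for k in range(n_bins):
            weight = floor + (1.0 - floor) * presence[i][k]
            unit_sum += weight * (pred[i][k] - target[i][k]) ** 2
    unit_term = unit_sum / (n_frames * n_bins)

    # Tie adjacent frames together by penalising errors in the
    # frame-to-frame change of the output (an assumed form of correlation).
    delta_sum = 0.0
    for i in range(1, n_frames):
        for k in range(n_bins):
            d_pred = pred[i][k] - pred[i - 1][k]
            d_tgt = target[i][k] - target[i - 1][k]
            delta_sum += (d_pred - d_tgt) ** 2
    n_delta = (n_frames - 1) * n_bins
    frame_term = delta_sum / n_delta if n_delta else 0.0

    return unit_term + alpha * frame_term
```

With `presence` set to 1 everywhere and `alpha=0`, this sketch reduces to the ordinary mean square error, which makes it easy to compare against the baseline cost described in the abstract.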

Key words: speech enhancement, deep neural network, cost function, correlation

CLC Number: TN912.3