Electronic Science and Technology ›› 2025, Vol. 38 ›› Issue (1): 45-51. doi: 10.16180/j.cnki.issn1007-7820.2025.01.007


Speech Enhancement Based on the Logarithmic Processing and Time Frequency Masking Estimation

WANG Xianyun, DOU Shanshan, CHENG Chuhao

  1. Acoustic Department, China Electronics Technology Group Corporation Third Research Institute, Beijing 100015, China
  • Received: 2023-05-11 Revised: 2023-06-25 Online: 2025-01-15 Published: 2025-01-06
  • Supported by:
    Science and Technology on Near-Surface Detection Laboratory(6142414210405)

Abstract:

To address the inaccurate speech estimates produced by time-spectrum models, this study proposes a model transformation method that yields the logarithmic probability density functions of noise and speech. Using the logarithmic relationship among noisy speech, clean speech, and noise, together with MMSE (Minimum Mean Square Error) estimation theory, a time-frequency mask for estimating the log-spectrum of speech is derived. A soft mask is also derived from the logarithmic probability distributions of speech and noise; it weights the logarithmic subbands of noisy speech to reduce noise and improve the accuracy of speech estimation. Simulation results show that, relative to unprocessed noisy speech, the proposed method improves noise suppression by more than 3 dB. The MMSE-based time-frequency mask and the soft mask improve auditory perception by 27.7% and 29.4% on average, and intelligibility by 12.7% and 14.3% on average, respectively.
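
As a rough illustration of how a soft mask can weight log-spectral subbands, the following is a minimal sketch and not the derivation used in the paper. It assumes a log-max mixing approximation (the noisy log-spectrum is taken as the maximum of the speech and noise log-spectra) and a single Gaussian model each for log speech and log noise. The function names and all parameter values are hypothetical.

```python
import math
import numpy as np

# Element-wise erf built from the standard library (no SciPy required).
_erf = np.vectorize(math.erf, otypes=[float])

def _pdf(z):
    """Standard normal density."""
    return np.exp(-0.5 * np.square(z)) / math.sqrt(2.0 * math.pi)

def _cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def soft_mask(y, mu_x, sd_x, mu_n, sd_n):
    """Posterior probability that speech dominates a log-subband.

    Assumption: y = max(x, n), with independent Gaussian log speech x
    and log noise n. This is an illustrative model, not the paper's.
    """
    zx = (y - mu_x) / sd_x
    zn = (y - mu_n) / sd_n
    speech = _pdf(zx) / sd_x * _cdf(zn)  # x equals y, n lies below y
    noise = _pdf(zn) / sd_n * _cdf(zx)   # n equals y, x lies below y
    return speech / (speech + noise + 1e-300)

def enhance_log_spectrum(y, mu_x, sd_x, mu_n, sd_n):
    """MMSE-style estimate of the speech log-spectrum under the same model."""
    rho = soft_mask(y, mu_x, sd_x, mu_n, sd_n)
    zx = (y - mu_x) / sd_x
    # Mean of the speech Gaussian truncated to values below the observation.
    x_below = mu_x - sd_x * _pdf(zx) / np.maximum(_cdf(zx), 1e-12)
    return rho * y + (1.0 - rho) * x_below

# Hypothetical usage: three log-subbands of one frame.
y = np.array([0.0, 4.0, 10.0])
mask = soft_mask(y, mu_x=8.0, sd_x=2.0, mu_n=0.0, sd_n=1.0)
x_hat = enhance_log_spectrum(y, mu_x=8.0, sd_x=2.0, mu_n=0.0, sd_n=1.0)
```

In this sketch the mask approaches 1 where the observation is far more likely to be speech than noise, so the noisy log-subband passes through almost unchanged, and otherwise the estimate is pulled toward the part of the speech model lying below the observation.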

Key words: speech processing, speech enhancement, logarithmic probability density function, time-frequency mask, noise suppression, unsupervised learning, soft mask, logarithmic spectrum

CLC Number: 

  • TN912.35