Electronic Science and Technology ›› 2025, Vol. 38 ›› Issue (1): 45-51. doi: 10.16180/j.cnki.issn1007-7820.2025.01.007


Speech Enhancement Based on the Logarithmic Processing and Time Frequency Masking Estimation

WANG Xianyun, DOU Shanshan, CHENG Chuhao

  1. Acoustic Department, China Electronics Technology Group Corporation Third Research Institute, Beijing 100015, China
  • Received: 2023-05-11 Revised: 2023-06-25 Online: 2025-01-15 Published: 2025-01-06
  • Corresponding author: WANG Xianyun (b. 1984), male, Ph.D., senior engineer. Research interests: speech and audio signal processing. E-mail: wang2005ji@126.com
  • Supported by:
    Fund of the Science and Technology on Near-Surface Detection Laboratory (6142414210405)

Abstract:

To address the inaccurate speech estimates produced by time-frequency spectrum models, this study uses a model transformation to obtain the logarithmic probability density functions of noise and speech. Using the logarithmic relationship among noisy speech, clean speech, and noise together with minimum mean square error (MMSE) estimation theory, a time-frequency mask for estimating the log-spectrum of speech is derived. A soft mask is also derived from the logarithmic probability distributions of speech and noise; it weights the logarithmic subbands of the noisy speech to reduce noise and improve the accuracy of speech estimation. Simulation results show that, compared with unprocessed noisy speech, the proposed method improves noise suppression by more than 3 dB. The MMSE-based time-frequency mask and the soft mask improve auditory perception by 27.7% and 29.4% on average, and improve intelligibility by 12.7% and 14.3% on average, respectively.

Key words: speech processing, speech enhancement, logarithmic probability density function, time-frequency mask, noise suppression, unsupervised learning, soft mask, logarithmic spectrum

CLC Number:

  • TN912.35
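
The abstract refers to a logarithmic relationship among noisy speech, clean speech, and noise, but does not give its form. The following is a minimal sketch under one common assumption, namely that the power spectra of speech and noise add in each time-frequency bin, so that exp(y) = exp(x) + exp(n) for log-powers x, n, and y. The function names `mix_log_power`, `ratio_mask`, and `apply_mask_log` are hypothetical, and the ratio-type mask shown is a generic illustration of soft masking, not the MMSE mask or the soft mask derived in the paper.

```python
import math


def mix_log_power(x_log, n_log):
    """Log-power of noisy speech from the log-powers of clean speech and noise.

    Assumption (not from the paper): powers add, exp(y) = exp(x) + exp(n).
    The larger term is factored out to keep the exponentials from overflowing.
    """
    m = max(x_log, n_log)
    return m + math.log(math.exp(x_log - m) + math.exp(n_log - m))


def ratio_mask(x_log, n_log):
    """Generic ratio-type soft mask in [0, 1]: exp(x) / (exp(x) + exp(n)).

    Written as a numerically stable logistic function of (n_log - x_log).
    """
    d = n_log - x_log
    if d >= 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def apply_mask_log(y_log, mask, floor=1e-12):
    """Apply a multiplicative power-domain mask in the log domain.

    log(mask * exp(y)) = y + log(mask); the floor avoids log(0).
    """
    return y_log + math.log(max(mask, floor))


# Usage: one time-frequency bin with speech power 4 and noise power 1.
x = math.log(4.0)
n = math.log(1.0)
y = mix_log_power(x, n)
x_hat = apply_mask_log(y, ratio_mask(x, n))
```

Under the additive-power assumption, applying this mask to the noisy log-power recovers the clean log-power exactly, because the mask is computed from the true speech and noise values. In practice those values are unknown, which is why a method has to estimate the mask from statistical models of speech and noise.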