Speech enhancement combining the self-adaptive soft mask and mixed features

doi:10.19665/j.issn1001-2400.2022.02.013

Abstract

Mel-domain features lose effective information when they are used for speech enhancement, because Mel-domain filters discard useful features at high frequencies. To address this limitation, this paper proposes extracting Gammatone-domain features from noisy speech with a power function that better matches the compressive perception of the human ear, and deeply mixing them with Mel-domain features. In addition, to capture the transient information in the speech and its connection with the speech information in adjacent frames, the difference (derivative) of the mixed feature is computed and fused with the initial feature to form the final mixed feature. Second, because traditional time-frequency masking cannot adjust automatically to differences in the signal-to-noise ratio, the intelligibility of the enhanced speech suffers. To improve speech quality and speech intelligibility together, a soft mask that adapts to the signal-to-noise ratio information is proposed, and the phase difference information of the speech is incorporated into it. Finally, experiments are conducted on multiple speech signals under different noise backgrounds. The results show that using the mixed features and the self-adaptive soft mask improves both the subjective speech quality and the short-time objective intelligibility of the enhanced speech, which verifies the effectiveness of the proposed algorithm.
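The feature construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the compression exponent, the channel counts, the central-difference delta, and the use of plain concatenation as a stand-in for the "deep mixing" and fusion steps are all assumptions made for illustration, and the filterbank energies are random placeholders rather than real Gammatone or Mel outputs.

```python
import numpy as np


def power_compress(energies, exponent=1.0 / 3.0):
    """Power-law compression of non-negative filterbank energies.

    The exponent is a placeholder; the abstract does not state the value used.
    """
    return np.power(np.maximum(energies, 0.0), exponent)


def log_compress(energies, floor=1e-10):
    """Log compression, as commonly applied to Mel filterbank energies."""
    return np.log(np.maximum(energies, floor))


def delta(features):
    """First-order difference along the frame axis (axis 0).

    Uses a central difference with the first and last frames replicated,
    so the output has the same shape as the input.
    """
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0


def mixed_features(gammatone_energies, mel_energies):
    """Concatenate compressed Gammatone and Mel features, then append deltas."""
    static = np.concatenate(
        [power_compress(gammatone_energies), log_compress(mel_energies)], axis=1
    )
    return np.concatenate([static, delta(static)], axis=1)


# Hypothetical input: 100 frames, 64 Gammatone channels and 40 Mel channels.
rng = np.random.default_rng(0)
gt = rng.random((100, 64))
mel = rng.random((100, 40))
feats = mixed_features(gt, mel)
print(feats.shape)  # (100, 208)
```

Each frame ends up with 104 static values and 104 difference values. The difference half carries the frame-to-frame change that the abstract associates with transient information and adjacent-frame context.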

Key words: speech enhancement, neural networks, signal-to-noise ratio, mixed feature, soft mask

CLC Number: TN912.35

ZHANG Min, JIA Hairong, ZHANG Gangmin, WANG Suying. Speech enhancement combining the self-adaptive soft mask and mixed features[J]. Journal of Xidian University, 2022, 49(2): 108-115.

PESQ
Noise	SNR/dB	Noisy	Experiment 1	Experiment 2	Experiment 3
factory	-5	1.3438	1.5317	1.5432	1.8581
factory	0	1.5823	1.8660	2.1263	2.3183
factory	5	1.9014	2.4232	2.5386	2.6997
white	-5	1.3109	1.8465	1.9594	2.2113
white	0	1.4977	2.2356	2.3797	2.4652
white	5	1.7677	2.6336	2.7181	2.8912
pink	-5	1.2297	1.5418	1.7558	1.9842
pink	0	1.5165	2.0496	2.1992	2.3894
pink	5	1.8451	2.4862	2.6467	2.8123

STOI
Noise	SNR/dB	Noisy	Experiment 1	Experiment 2	Experiment 3
factory	-5	0.5622	0.6276	0.6473	0.6824
factory	0	0.6668	0.7552	0.7893	0.7945
factory	5	0.7801	0.8663	0.8798	0.8879
white	-5	0.6522	0.7576	0.7578	0.7746
white	0	0.7352	0.8375	0.8317	0.8497
white	5	0.8105	0.8977	0.8917	0.9001
pink	-5	0.5845	0.6859	0.7200	0.7369
pink	0	0.6859	0.8063	0.8205	0.8301
pink	5	0.7923	0.8834	0.8911	0.9067
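To summarize the two tables above, the short sketch below averages each experiment's improvement over the noisy input across all nine noise/SNR conditions. The scores are copied from the tables; the helper name `mean_gain` is hypothetical. This excerpt does not define what Experiments 1 to 3 are, so the code treats them only as column labels.

```python
# Scores copied from the tables: (noisy, Experiment 1, Experiment 2, Experiment 3).
PESQ = {
    ("factory", -5): (1.3438, 1.5317, 1.5432, 1.8581),
    ("factory", 0): (1.5823, 1.8660, 2.1263, 2.3183),
    ("factory", 5): (1.9014, 2.4232, 2.5386, 2.6997),
    ("white", -5): (1.3109, 1.8465, 1.9594, 2.2113),
    ("white", 0): (1.4977, 2.2356, 2.3797, 2.4652),
    ("white", 5): (1.7677, 2.6336, 2.7181, 2.8912),
    ("pink", -5): (1.2297, 1.5418, 1.7558, 1.9842),
    ("pink", 0): (1.5165, 2.0496, 2.1992, 2.3894),
    ("pink", 5): (1.8451, 2.4862, 2.6467, 2.8123),
}
STOI = {
    ("factory", -5): (0.5622, 0.6276, 0.6473, 0.6824),
    ("factory", 0): (0.6668, 0.7552, 0.7893, 0.7945),
    ("factory", 5): (0.7801, 0.8663, 0.8798, 0.8879),
    ("white", -5): (0.6522, 0.7576, 0.7578, 0.7746),
    ("white", 0): (0.7352, 0.8375, 0.8317, 0.8497),
    ("white", 5): (0.8105, 0.8977, 0.8917, 0.9001),
    ("pink", -5): (0.5845, 0.6859, 0.7200, 0.7369),
    ("pink", 0): (0.6859, 0.8063, 0.8205, 0.8301),
    ("pink", 5): (0.7923, 0.8834, 0.8911, 0.9067),
}


def mean_gain(table, column):
    """Average improvement of `column` (1 to 3) over the noisy input (column 0)."""
    gains = [row[column] - row[0] for row in table.values()]
    return sum(gains) / len(gains)


for name, table in (("PESQ", PESQ), ("STOI", STOI)):
    gains = [round(mean_gain(table, c), 4) for c in (1, 2, 3)]
    print(name, "mean gain over noisy for Experiments 1-3:", gains)
```

On these numbers the average gain grows from Experiment 1 to Experiment 3 for both metrics, reaching roughly 0.85 PESQ points and 0.12 STOI points for Experiment 3.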
