Combination of dynamic features with a new mask to optimize neural network speech enhancement

doi:10.19665/j.issn1001-2400.2021.03.012

Abstract

Neural network speech enhancement algorithms cannot fully represent the nonlinear structure of speech because of their feature selection, which leads to speech distortion. To address this problem, this paper proposes combining dynamic features with a new mask to optimize neural network speech enhancement. First, three features of the noisy speech are extracted and spliced to obtain static features. Their first- and second-order differences are then computed to capture the instantaneous behaviour of the speech and are fused into dynamic features. Combining the dynamic and static features makes the features complement one another and reduces speech distortion. Second, to improve speech intelligibility and clarity at the same time, an adaptive mask is proposed that adjusts the energy ratio of speech to noise as well as the proportion of the traditional mask and the square root mask. Gammatone channel weights are used to modify the mask value in each channel, simulating the human auditory system and further improving speech intelligibility. Finally, simulations with multiple voices under different noise backgrounds show that, compared with algorithms from the literature, the proposed algorithm achieves a higher SNR, a higher perceptual evaluation of speech quality (PESQ) score and a higher short-time objective intelligibility (STOI), which verifies its effectiveness.
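
The dynamic-feature step described above can be illustrated with a minimal sketch. The abstract does not give the difference formula or name the three static features, so the central difference with replicated edge frames used here, and the function names `delta` and `add_dynamic_features`, are assumptions for illustration only and may differ from the paper's method.

```python
def delta(frames):
    """Central first difference along time (assumed formula).

    frames is a list of per-frame feature vectors; the first and last
    frames are replicated so the output has the same number of frames.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def add_dynamic_features(static):
    """Concatenate static, first-difference and second-difference features."""
    d1 = delta(static)   # first-order difference
    d2 = delta(d1)       # second-order difference (difference of d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Toy input: 4 frames of a 2-dimensional static feature (made-up numbers).
static = [[0.0, 1.0], [1.0, 3.0], [4.0, 5.0], [9.0, 7.0]]
combined = add_dynamic_features(static)
```

Each output frame is three times as wide as the static frame, so a network trained on `combined` sees both the spectral shape and how it changes over neighbouring frames.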

Keywords: dynamic features, adaptive mask, speech enhancement, neural network

CLC Number: TN912.35

MEI Shulin, JIA Hairong, WANG Xiaogang, WU Yifeng. Combination of dynamic features with a new mask to optimize neural network speech enhancement[J]. Journal of Xidian University, 2021, 48(3): 91-98.

SegSNR
Noise	SNR/dB	Noisy speech	Algorithm 1	Algorithm 2	Algorithm 3
F16	10	-16.2015	2.4623	2.5309	3.9346
F16	5	-20.8089	0.4848	0.5467	1.6489
F16	0	-25.1112	-1.9372	-1.8394	-0.9659
F16	-5	-29.3359	-6.6609	-6.4237	-5.4234
F16	-10	-33.7867	-13.2356	-13.5672	-12.0134
Babble	10	-15.9128	0.8600	1.5378	2.3674
Babble	5	-20.5937	-1.8765	-1.4569	0.2442
Babble	0	-25.0504	-6.4012	-5.589	-4.5780
Babble	-5	-28.8228	-7.8811	-6.8898	-6.0356
Babble	-10	-32.1345	-14.1256	-13.1267	-12.3324
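
For readers unfamiliar with the metric, segmental SNR is commonly computed as the average of per-frame SNR values between the clean signal and the processed signal. The sketch below follows that common definition. The frame length, the use of non-overlapping frames, the absence of clamping and the function name `segmental_snr` are assumptions; the page does not state the settings used to produce the table.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, eps=1e-12):
    """Mean per-frame SNR in dB between a clean signal and an estimate.

    Uses non-overlapping frames and no clamping (both assumed here).
    eps guards against division by zero and log of zero.
    """
    n = min(len(clean), len(enhanced))
    values = []
    for start in range(0, n - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal = sum(x * x for x in s)                    # clean-frame energy
        noise = sum((x - y) ** 2 for x, y in zip(s, e))   # error energy
        values.append(10.0 * math.log10((signal + eps) / (noise + eps)))
    if not values:
        raise ValueError("signals are shorter than one frame")
    return sum(values) / len(values)


# Toy check: a constant signal with a 10 % amplitude error in every sample.
result = segmental_snr([1.0] * 8, [0.9] * 8, frame_len=4)
```

In the toy check the error energy is one hundredth of the signal energy in every frame, so `result` is close to 20 dB.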

PESQ
Noise	SNR/dB	Noisy speech	Algorithm 1	Algorithm 2	Algorithm 3
F16	10	2.1976	2.6238	2.6667	2.7197
F16	5	2.0312	2.1214	2.3539	2.6658
F16	0	1.6747	2.0168	2.1226	2.4284
F16	-5	1.4627	1.9506	1.5239	2.2615
F16	-10	1.1214	1.2107	1.4437	2.1837
Babble	10	2.2351	2.5585	2.6020	2.6103
Babble	5	1.7847	2.3294	2.5645	2.5909
Babble	0	1.4227	1.7968	2.1103	2.2538
Babble	-5	1.1437	1.5678	1.9947	2.1674
Babble	-10	0.9869	1.0994	1.2348	2.0102

STOI
Noise	SNR/dB	Noisy speech	Algorithm 1	Algorithm 2	Algorithm 3
F16	10	0.8000	0.8310	0.8704	0.9061
F16	5	0.7693	0.8001	0.8189	0.8599
F16	0	0.7120	0.7668	0.7940	0.8140
F16	-5	0.6663	0.7020	0.7590	0.7750
F16	-10	0.6430	0.6600	0.6789	0.6963
Babble	10	0.7994	0.8296	0.8751	0.8976
Babble	5	0.7520	0.8121	0.8230	0.8643
Babble	0	0.6956	0.8352	0.8361	0.8366
Babble	-5	0.6766	0.7192	0.7441	0.7873
Babble	-10	0.6107	0.6555	0.6600	0.6942
