Speech enhancement method based on the time-frequency smoothing deep neural network

doi:10.19665/j.issn1001-2400.2019.04.018

Abstract

Existing speech enhancement methods based on deep neural networks do not fully account for the characteristics of the speech enhancement problem when designing the network structure. To address this, and inspired by the way features are computed in traditional speech enhancement methods, a time-frequency smoothing network is designed that processes the time and frequency dimensions differently. In this network, a gated recurrent unit models the correlation of noisy speech over time, and a convolutional neural network models its correlation across frequency, which yields a time-frequency smoothing process similar to that of traditional speech enhancement methods. Experimental results show that, while preserving the causality of the speech enhancement system, the proposed time-frequency smoothing network significantly improves speech enhancement performance compared with other networks, and that the enhanced speech has better quality and intelligibility.
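The traditional time-frequency smoothing that the abstract takes as its inspiration can be sketched as follows. This is a minimal illustration, not the authors' network: it assumes the classical form in which each frame of the noisy power spectrum is first averaged over neighbouring frequency bins with a fixed window and then recursively averaged over time. The function names, the window `(0.25, 0.5, 0.25)` and the smoothing factor `alpha = 0.8` are hypothetical choices made only for this example. In the proposed network, the fixed recursion over time would correspond to the learned gated recurrent unit, and the fixed frequency window to the learned convolution.

```python
from typing import List, Sequence


def smooth_frequency(frame: Sequence[float], window: Sequence[float]) -> List[float]:
    """Weighted average over neighbouring frequency bins of one frame."""
    half = len(window) // 2
    n_bins = len(frame)
    out = []
    for k in range(n_bins):
        acc = 0.0
        norm = 0.0
        for i, w in enumerate(window):
            j = k + i - half
            if 0 <= j < n_bins:  # renormalise the window at the band edges
                acc += w * frame[j]
                norm += w
        out.append(acc / norm)
    return out


def smooth_time_frequency(
    power_spec: Sequence[Sequence[float]],
    window: Sequence[float] = (0.25, 0.5, 0.25),  # assumed frequency window
    alpha: float = 0.8,                           # assumed time smoothing factor
) -> List[List[float]]:
    """Frequency smoothing followed by causal first-order recursive averaging.

    power_spec is a list of frames, each a list of per-bin power values.
    Each output frame depends only on the current and earlier input frames.
    """
    smoothed: List[List[float]] = []
    prev = None
    for frame in power_spec:
        s_f = smooth_frequency(frame, window)
        if prev is None:
            cur = s_f  # first frame: nothing to average with yet
        else:
            cur = [alpha * p + (1.0 - alpha) * f for p, f in zip(prev, s_f)]
        smoothed.append(cur)
        prev = cur
    return smoothed
```

Because the time recursion only looks backwards, the sketch is causal in the same sense the abstract requires of the enhancement system: changing a later frame never alters the smoothed value of an earlier one.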

Key words: speech enhancement, time-frequency smoothing, convolutional neural network, deep neural network

CLC Number: TN912.3

YUAN Wenhao, LIANG Chunyan, LOU Yingxi, FANG Chao, WANG Zhiqiang. Speech enhancement method based on the time-frequency smoothing deep neural network[J]. Journal of Xidian University, 2019, 46(4): 130-136.



Noise	SNR/dB	Noisy speech	Fully connected neural network	Gated recurrent unit	Time-frequency smoothing network
N₁	-7	1.62	1.65	2.06	2.24
	0	2.08	2.25	2.65	2.78
	7	2.54	2.68	3.09	3.21
N₂	-7	1.29	1.24	1.67	1.98
	0	1.63	1.79	2.24	2.47
	7	2.08	2.32	2.73	2.90
N₃	-7	1.49	1.45	1.78	2.10
	0	1.81	1.92	2.32	2.62
	7	2.21	2.42	2.80	3.08
N₄	-7	1.30	1.23	1.59	2.02
	0	1.57	1.69	2.07	2.51
	7	1.97	2.17	2.56	2.92
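As a rough check on the comparison claimed in the abstract, the four score columns of the table above (noisy speech, fully connected neural network, gated recurrent unit, time-frequency smoothing network) can be averaged over all twelve noise and SNR conditions. The sketch below only copies the table values and takes unweighted column means. The metric is not named in this excerpt, so the variable names are neutral, and averaging across conditions is an assumption made for illustration rather than an analysis reported by the authors.

```python
from statistics import mean

METHODS = (
    "noisy speech",
    "fully connected neural network",
    "gated recurrent unit",
    "time-frequency smoothing network",
)

# One row per (noise, SNR) condition, copied from the table above.
SCORES = [
    (1.62, 1.65, 2.06, 2.24),
    (2.08, 2.25, 2.65, 2.78),
    (2.54, 2.68, 3.09, 3.21),
    (1.29, 1.24, 1.67, 1.98),
    (1.63, 1.79, 2.24, 2.47),
    (2.08, 2.32, 2.73, 2.90),
    (1.49, 1.45, 1.78, 2.10),
    (1.81, 1.92, 2.32, 2.62),
    (2.21, 2.42, 2.80, 3.08),
    (1.30, 1.23, 1.59, 2.02),
    (1.57, 1.69, 2.07, 2.51),
    (1.97, 2.17, 2.56, 2.92),
]


def column_means(rows):
    """Unweighted mean of each column across all conditions."""
    return [mean(col) for col in zip(*rows)]


averages = dict(zip(METHODS, column_means(SCORES)))
for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```

On these values the time-frequency smoothing network has the highest mean score and the highest score in every individual row, which is consistent with the ranking described in the abstract.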

Noise	SNR/dB	Noisy speech	Fully connected neural network	Gated recurrent unit	Time-frequency smoothing network
N₁	-7	0.61	0.58	0.71	0.71
	0	0.76	0.73	0.84	0.84
	7	0.87	0.82	0.91	0.90
N₂	-7	0.48	0.44	0.58	0.61
	0	0.63	0.61	0.75	0.76
	7	0.79	0.75	0.86	0.86
N₃	-7	0.53	0.45	0.60	0.67
	0	0.69	0.64	0.78	0.82
	7	0.84	0.80	0.88	0.90
N₄	-7	0.52	0.46	0.63	0.67
	0	0.69	0.65	0.78	0.80
	7	0.84	0.79	0.88	0.88

