[1] |
刘文举, 聂帅, 梁山 , 等. 基于深度学习语音分离技术的研究现状与进展[J]. 自动化学报, 2016,42(6):819-833.
doi: 10.16383/j.aas.2016.c150734
|
|
LIU Wenju, NIE Shuai, LIANG Shan , et al. Deep Learning Based Speech Separation Technology and Its Developments[J]. Acta Automatica Sinica, 2016,42(6):819-833.
doi: 10.16383/j.aas.2016.c150734
|
[2] |
WANG D L, CHEN J . Supervised Speech Separation Based on Deep Learning: An Overview[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2018,26(10):1702-1726.
|
[3] |
WANG Q, DU J, DAI L R , et al. A Multiobjective Learning and Ensembling Approach to High-performance Speech Enhancement with Compact Neural Network Architectures[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2018,26(7):1185-1197.
|
[4] |
WANG Y, WANG D L . Towards Scaling Up Classification-based Speech Separation[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2013,21(7):1381-1390.
|
[5] |
WANG Y, NARAYANAN A, WANG D L . On Training Targets for Supervised Speech Separation[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2014,22(12):1849-1858.
|
[6] |
WILLIAMSON D S, WANG D L . Time-frequency Masking in the Complex Domain for Speech Dereverberation and Denoising[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2017,25(7):1492-1501.
|
[7] |
XU Y, DU J, DAI L R , et al. An Experimental Study on Speech Enhancement Based on Deep Neural Networks[J]. IEEE Signal Processing Letters, 2014,21(1):65-68.
|
[8] |
XU Y, DU J, DAI L R , et al. A Regression Approach to Speech Enhancement Based on Deep Neural Networks[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2015,23(1):7-19.
|
[9] |
HUANG P S, KIM M, HASEGAWA-JOHNSON M , et al. Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2015,23(12):2136-2147.
|
[10] |
WENINGER F, ERDOGAN H, WATANABE S. et al. Speech Enhancement with LSTM Recurrent Neural Networks and Its Application to Noise-robust ASR [C]//Lecture Notes in Computer Science: 9237. Heidelberg: Springer Verlag, 2015: 91-99.
|
[11] |
CHEN J, WANG D . Long Short-term Memory for Speaker Generalization in Supervised Speech Separation[J]. Journal of the Acoustical Society of America, 2017,141(6):4705-4714.
|
[12] |
PARK S R, LEE J M. A Fully Convolutional Neural Network for Speech Enhancement [C]//Proceedings of the 2017 Annual Conference of the International Speech Communication Association. Baixas: International Speech Communication Association, 2017: 1993-1997.
|
[13] |
FU S W, TSAO Y, LU X. SNR-aware Convolutional Neural Network Modeling for Speech Enhancement [C]//Proceedings of the 2016 Annual Conference of the International Speech Communication Association. Baixas: International Speech Communication Association, 2016: 3768-3772.
|
[14] |
LOIZOU P C. Speech Enhancement: Theory and Practice[M]. Boca Raton: CRC Press, 2013.
|
[15] |
COHEN I . Noise Spectrum Estimation in Adverse Environments: Improved Minima Controlled Recursive Averaging[J]. IEEE Transactions on Speech and Audio Processing, 2003,11(5):466-475.
|
[16] |
GAROFOLO J S, LAMEL L F, FISHER W M , et al. TIMIT Acoustic-phonetic Continuous Speech Corpus [EB/OL]. [2018-09-10].https://catalog.ldc.upenn.edu/LDC93S1.
|
[17] |
HU G . 100 Nonspeech Environmental Sounds[EB/OL]. [ 2018- 09- 03]. http://web.cse.ohio-state.edu/pnl/corpus/HuNonspeech/HuCorpus.html.
|
[18] |
VARGA A, STEENEKEN H J M . Assessment for Automatic Speech Recognition: II. NOISEX-92: A Database and an Experiment to Study the Effect of Additive Noise on Speech Recognition Systems[J]. Speech Communication, 1993,12(3):247-251.
|
[19] |
RIX A W, BEERENDS J G, HOLLIER M P. et al. Perceptual Evaluation of Speech Quality (PESQ)-a New Method for Speech Quality Assessment of Telephone Networks and Codecs [C]//Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway: IEEE, 2001: 749-752.
|
[20] |
TAAL C H, HENDRIKS R C, HEUSDENS R , et al. An Algorithm for Intelligibility Prediction of Time-frequency Weighted Noisy Speech[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2011,19(7):2125-2136.
|