A Speech Classification Algorithm Using a Stacked Autoencoder

MA Hongfei, ZHAO Yuejiao, LIU Ke, LIU Hao

doi:10.3969/j.issn.1001-2400.2017.05.003

Abstract

To improve the accuracy of voiced/unvoiced/silence classification, a new method based on the stacked autoencoder (SAE) is proposed. The method is implemented as a deep neural network composed of an SAE followed by a Softmax classifier. First, the SAE is trained without supervision on a training sequence of speech parameters comprising the residual signal peak, gains, pitch periods, and line spectral frequencies (LSFs). The Softmax classifier is then trained with supervision, taking as input the SAE's outputs for the same training sequence. Finally, the whole network is fine-tuned with supervision to obtain its final parameters. Test results show that the proposed method classifies speech more accurately than traditional methods under various background noise conditions and signal-to-noise ratios (SNRs), especially at low SNRs.
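The abstract names three training stages: unsupervised SAE pretraining, supervised Softmax training on the SAE outputs, and supervised fine-tuning of the whole network. The following Python sketch illustrates that sequence under stated assumptions; it is not the authors' implementation. The abstract does not give layer sizes, activations, losses, or the optimizer, so the sketch assumes sigmoid units, squared-error reconstruction, cross-entropy for the Softmax layer, plain per-sample gradient descent, greedy layer-wise pretraining of two hidden layers (6 and 4 units), and no regularization. All function names are hypothetical, and the data are synthetic 5-D vectors in [0, 1] standing in for normalized speech parameters, not real residual-peak, gain, pitch, or LSF measurements.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def affine(W, b, x):
    # W is a list of rows (n_out x n_in); returns W x + b.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def init(n_out, n_in, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out

def forward(layers, x):
    acts = [x]  # activations of every encoder layer, input first
    for W, b in layers:
        acts.append([sigmoid(z) for z in affine(W, b, acts[-1])])
    return acts

def train_autoencoder(X, n_hidden, epochs=200, lr=0.5, seed=0):
    # Unsupervised: minimize 0.5 * ||decode(encode(x)) - x||^2 by per-sample gradient descent.
    rng = random.Random(seed)
    n_in = len(X[0])
    W, b = init(n_hidden, n_in, rng)  # encoder, kept
    V, c = init(n_in, n_hidden, rng)  # decoder, discarded after training
    for _ in range(epochs):
        for x in X:
            h = [sigmoid(z) for z in affine(W, b, x)]
            y = [sigmoid(z) for z in affine(V, c, h)]
            dy = [(yk - xk) * yk * (1 - yk) for yk, xk in zip(y, x)]
            dh = [sum(V[k][j] * dy[k] for k in range(n_in)) * h[j] * (1 - h[j]) for j in range(n_hidden)]
            for k in range(n_in):
                for j in range(n_hidden):
                    V[k][j] -= lr * dy[k] * h[j]
                c[k] -= lr * dy[k]
            for j in range(n_hidden):
                for i in range(n_in):
                    W[j][i] -= lr * dh[j] * x[i]
                b[j] -= lr * dh[j]
    return W, b

def train_supervised(layers, S, X, Y, epochs, lr, fine_tune):
    # Cross-entropy gradient descent on the Softmax layer S; if fine_tune, also update the encoders.
    Ws, bs = S
    for _ in range(epochs):
        for x, y in zip(X, Y):
            acts = forward(layers, x)
            p = softmax(affine(Ws, bs, acts[-1]))
            delta = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
            dh = [sum(Ws[k][j] * delta[k] for k in range(len(p))) for j in range(len(acts[-1]))]
            for k in range(len(p)):
                for j, a in enumerate(acts[-1]):
                    Ws[k][j] -= lr * delta[k] * a
                bs[k] -= lr * delta[k]
            if not fine_tune:
                continue
            for li in range(len(layers) - 1, -1, -1):  # backpropagate top-down
                W, b = layers[li]
                h, a_in = acts[li + 1], acts[li]
                d = [dh[j] * h[j] * (1 - h[j]) for j in range(len(h))]
                dh = [sum(W[j][i] * d[j] for j in range(len(h))) for i in range(len(a_in))]
                for j in range(len(h)):
                    for i in range(len(a_in)):
                        W[j][i] -= lr * d[j] * a_in[i]
                    b[j] -= lr * d[j]

def build_sae_classifier(X, Y, hidden=(6, 4), n_classes=3, seed=0):
    layers, feats = [], X
    for n_h in hidden:  # 1) greedy layer-wise unsupervised pretraining
        layers.append(train_autoencoder(feats, n_h, seed=seed))
        feats = [forward(layers[-1:], f)[-1] for f in feats]
    S = init(n_classes, hidden[-1], random.Random(seed + 1))
    train_supervised(layers, S, X, Y, epochs=200, lr=0.5, fine_tune=False)  # 2) Softmax only
    train_supervised(layers, S, X, Y, epochs=200, lr=0.2, fine_tune=True)  # 3) fine-tune all
    return layers, S

def predict(layers, S, x):
    p = softmax(affine(S[0], S[1], forward(layers, x)[-1]))
    return max(range(len(p)), key=p.__getitem__)

def make_toy_data(n_per_class=20, seed=42):
    # Synthetic stand-in data in [0, 1], not real speech parameters. 0=silence, 1=unvoiced, 2=voiced.
    centers = [[0.1, 0.1, 0.1, 0.1, 0.1], [0.2, 0.5, 0.1, 0.8, 0.8], [0.9, 0.9, 0.8, 0.3, 0.4]]
    rng = random.Random(seed)
    X, Y = [], []
    for label, c in enumerate(centers):
        for _ in range(n_per_class):
            X.append([min(1.0, max(0.0, v + rng.gauss(0, 0.05))) for v in c])
            Y.append(label)
    return X, Y
```

For example, `X, Y = make_toy_data()` followed by `layers, S = build_sae_classifier(X, Y)` trains the network, and `predict(layers, S, X[0])` returns a class index (0 = silence, 1 = unvoiced, 2 = voiced in this toy labeling). Keeping the Softmax-only stage separate from fine-tuning mirrors the order described in the abstract; a single `fine_tune` flag lets both supervised stages share one backpropagation routine.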

Keywords: deep learning, stacked autoencoder, speech processing, speech classification

MA Hongfei, ZHAO Yuejiao, LIU Ke, LIU Hao. A speech classification algorithm using a stacked autoencoder [J]. Journal of Xidian University, 2017, 44(5): 13-17.

