Journal of Xidian University


MA Hongfei; ZHAO Yuejiao; LIU Ke; LIU Hao

State Key Lab. of Integrated Service Networks, Xidian Univ., Xi'an 710071, China
Received: 2016-10-26; Online: 2017-10-20; Published: 2017-11-29

Abstract:

To improve the accuracy of voiced/unvoiced/silence classification, a new method based on the Stack Autoencoder (SAE) is proposed. The method is implemented as a deep neural network composed of an SAE and a Softmax layer. First, the SAE is trained unsupervised on a training sequence of speech parameters, including the residual signal peak, gains, pitch periods, and line spectral frequencies (LSF); the Softmax layer is then trained with supervision, taking the SAE's output for the same training sequence as its input. Supervised fine-tuning of the whole deep neural network is then conducted to obtain the final network parameters. Test results show that the classification accuracy of the proposed method exceeds that of traditional methods under different background noises and signal-to-noise ratios (SNR), especially at low SNR.
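The training pipeline described above, greedy unsupervised pretraining of the SAE followed by supervised fine-tuning with a Softmax classification head, can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' implementation; the feature dimension, layer widths, learning rates, epoch counts, and the random placeholder data are all assumptions for the sake of a self-contained example.

```python
# Minimal sketch of SAE pretraining + Softmax fine-tuning for
# voiced/unvoiced/silence frame classification. All dimensions and
# hyperparameters below are assumed, not taken from the paper.
import torch
import torch.nn as nn

FEAT_DIM = 16      # assumed size of the per-frame speech-parameter vector
                   # (residual peak, gains, pitch period, LSF coefficients)
HIDDEN = [64, 32]  # assumed SAE layer widths
N_CLASSES = 3      # voiced / unvoiced / silence

def pretrain_layer(encoder: nn.Linear, data: torch.Tensor, epochs: int = 10):
    """Unsupervised pretraining of one autoencoder layer via reconstruction loss."""
    decoder = nn.Linear(encoder.out_features, encoder.in_features)
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
    for _ in range(epochs):
        recon = decoder(torch.sigmoid(encoder(data)))
        loss = nn.functional.mse_loss(recon, data)
        opt.zero_grad(); loss.backward(); opt.step()
    return encoder  # the decoder is discarded after pretraining

# Greedy layer-wise pretraining of the stack
x = torch.randn(1024, FEAT_DIM)              # placeholder training features
encoders, h = [], x
for in_dim, out_dim in zip([FEAT_DIM] + HIDDEN[:-1], HIDDEN):
    enc = pretrain_layer(nn.Linear(in_dim, out_dim), h)
    with torch.no_grad():
        h = torch.sigmoid(enc(h))            # propagate codes to the next layer
    encoders.append(enc)

# Stack the pretrained encoders and add a classification head; the softmax
# itself is applied implicitly inside cross_entropy during training.
model = nn.Sequential(
    encoders[0], nn.Sigmoid(),
    encoders[1], nn.Sigmoid(),
    nn.Linear(HIDDEN[-1], N_CLASSES),
)

# Supervised fine-tuning of the whole network end to end
y = torch.randint(0, N_CLASSES, (1024,))     # placeholder frame labels
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(20):
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```

The two-stage structure is the point of the sketch: pretraining initializes the encoder weights from unlabeled speech parameters, and the final supervised pass adjusts all layers jointly, which is what the abstract refers to as fine-tuning.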

Key words: deep learning, stack autoencoder, speech processing, speech classification