Electronic Science and Technology ›› 2019, Vol. 32 ›› Issue (9): 76-79. doi: 10.16180/j.cnki.issn1007-7820.2019.09.016

Research on Perception Quantification-based Neural Speech Synthesis Methods

LIU Qingfeng, JIANG Yuan, HU Yajun, LIU Lijuan

  1. National Engineering Laboratory for Speech and Language Information Processing, Hefei 230027, China
  • Received: 2019-06-24 Online: 2019-09-15 Published: 2019-09-19
  • Supported by:
    National Natural Science Foundation of China (61871358)


Current neural network-based speech synthesis frameworks are designed for a single speaker, require at least a few hours of training data, and cannot make use of speech data from different speakers, languages, and styles. To address this problem, a perception quantification-based neural network speech synthesis method is proposed. In the proposed method, a perception quantification-based model is designed to learn representations for the different attributes of speech. A unified acoustic model is then built using the learned perception quantification representations for different speakers, languages, and styles. An adaptation method is introduced to transfer knowledge from the unified acoustic model to new speakers with limited speech data. The proposed method can effectively control the speaker, language, and style of synthetic speech and achieves cross-language and cross-style speech synthesis, while the adaptation method reduces the demand for training data to a few minutes. The proposed methods significantly improve the quality and flexibility of speech synthesis systems, and the naturalness of the synthesized speech is comparable to or better than that of an average Mandarin speaker.
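
The abstract does not specify the model architecture, so the following is only a toy sketch of the general idea, not the authors' implementation. It assumes that each speech attribute (speaker, language, style) is represented by a vector, that a shared acoustic model is conditioned on those vectors, and that adaptation to a new speaker fits only a new speaker vector while the shared weights stay frozen. The linear model and all names (`TABLES`, `predict_frame`, `adapt_speaker`) and sizes are hypothetical.

```python
import random

random.seed(0)

# Hypothetical sizes, chosen only for illustration.
EMB_DIM, LING_DIM, OUT_DIM = 4, 6, 3
IN_DIM = LING_DIM + 3 * EMB_DIM


def rand_vec(n, scale=1.0):
    """Return n Gaussian samples; stands in for learned parameters."""
    return [random.gauss(0.0, scale) for _ in range(n)]


# One vector table per attribute (assumed stand-ins for learned representations).
TABLES = {
    "speaker": [rand_vec(EMB_DIM) for _ in range(3)],
    "language": [rand_vec(EMB_DIM) for _ in range(2)],
    "style": [rand_vec(EMB_DIM) for _ in range(2)],
}

# Shared "acoustic model": a single linear layer, W[i][j] maps input i to output j.
W = [rand_vec(OUT_DIM, 0.1) for _ in range(IN_DIM)]


def predict_frame(ling, speaker_vec, language_id, style_id):
    """Predict acoustic features for one frame from linguistic features
    concatenated with the speaker, language, and style vectors."""
    x = (list(ling) + list(speaker_vec)
         + TABLES["language"][language_id] + TABLES["style"][style_id])
    return [sum(x[i] * W[i][j] for i in range(IN_DIM)) for j in range(OUT_DIM)]


def mse(frames, targets, speaker_vec, language_id, style_id):
    """Mean over frames of the squared prediction error."""
    total = 0.0
    for ling, tgt in zip(frames, targets):
        pred = predict_frame(ling, speaker_vec, language_id, style_id)
        total += sum((p - t) ** 2 for p, t in zip(pred, tgt))
    return total / len(frames)


def adapt_speaker(frames, targets, language_id, style_id, steps=200, lr=0.5):
    """Fit a new speaker vector by gradient descent on the MSE.
    The shared weights W and the other attribute tables are left untouched."""
    vec = [0.0] * EMB_DIM
    for _ in range(steps):
        grad = [0.0] * EMB_DIM
        for ling, tgt in zip(frames, targets):
            pred = predict_frame(ling, vec, language_id, style_id)
            for k in range(EMB_DIM):
                row = W[LING_DIM + k]  # weights that multiply speaker dimension k
                grad[k] += 2.0 * sum((p - t) * w for p, t, w in zip(pred, tgt, row))
        vec = [v - lr * g / len(frames) for v, g in zip(vec, grad)]
    return vec
```

In this sketch, switching `language_id` or `style_id` while keeping the same speaker vector corresponds loosely to the cross-language and cross-style control described above, and `adapt_speaker` mirrors the idea that a new speaker needs far fewer parameters, and hence less data, than training a full model.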

Key words: speech synthesis, perception quantification, neural networks, limited data, cross-language, style control

CLC Number: 

  • TN912.33