Current neural-network-based speech synthesis frameworks are typically designed for a single speaker, require at least a few hours of training data, and cannot make use of speech data from different speakers, languages, and styles. To address this problem, a perception-quantification-based neural network speech synthesis method was proposed. In the proposed method, a perception-quantification model was designed to learn representations for different attributes of speech. A unified acoustic model was then built using the learned perception-quantification representations for different speakers, languages, and styles. An adaptation method was introduced to transfer knowledge from the unified acoustic model to new speakers with limited speech data. The proposed method could effectively control the speaker, language, and style of the synthesized speech, achieving cross-language and cross-style speech synthesis, while the adaptation method reduced the amount of training data required to a few minutes. Together, these methods significantly improved the quality and flexibility of speech synthesis systems, and the naturalness of the synthesized speech was comparable to or better than that of an average Mandarin speaker.
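To illustrate the general idea of conditioning a single unified acoustic model on learned attribute representations, the following is a minimal, hypothetical sketch. All names, dimensions, and the stand-in linear model are assumptions for illustration, not the paper's actual architecture: each speaker, language, and style is mapped to a learnable embedding vector, and these vectors are concatenated with linguistic features before acoustic prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the abstract does not specify any dimensions.
N_SPEAKERS, N_LANGUAGES, N_STYLES = 10, 2, 3
EMB_DIM, PHONE_DIM, ACOUSTIC_DIM = 8, 16, 80

# Learnable attribute embedding tables (one row per speaker / language / style).
speaker_emb = rng.normal(size=(N_SPEAKERS, EMB_DIM))
language_emb = rng.normal(size=(N_LANGUAGES, EMB_DIM))
style_emb = rng.normal(size=(N_STYLES, EMB_DIM))

# A single linear layer standing in for the unified acoustic model.
W = rng.normal(size=(PHONE_DIM + 3 * EMB_DIM, ACOUSTIC_DIM))

def synthesize_frame(phone_feat, speaker_id, language_id, style_id):
    """Predict one acoustic frame conditioned on speaker, language, and style codes."""
    cond = np.concatenate([
        phone_feat,
        speaker_emb[speaker_id],
        language_emb[language_id],
        style_emb[style_id],
    ])
    # Predicted acoustic features (e.g. one mel-spectrogram frame).
    return cond @ W

frame = synthesize_frame(rng.normal(size=PHONE_DIM),
                         speaker_id=3, language_id=0, style_id=1)
print(frame.shape)  # (80,)
```

Under this view, swapping the speaker, language, or style index changes the corresponding attribute of the output without touching the shared model weights, and adapting to a new speaker with a few minutes of data would correspond to fitting only a new row of the speaker table rather than retraining the whole model.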