Electronic Science and Technology ›› 2023, Vol. 36 ›› Issue (10): 39-55. doi: 10.16180/j.cnki.issn1007-7820.2023.10.006

A Survey of Text-to-Image Synthesis Based on Generative Adversarial Network

LI Yueyang, TONG Guoxiang, ZHAO Yingzhi, LUO Qi

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2022-04-22 Online: 2023-10-15 Published: 2023-10-20
  • Supported by:
    National Key R&D Program of China (2018YFB1700902)

Abstract:

Text-to-image synthesis translates a sentence-level text description into an image whose semantics match the text. Early work on image generation relied mainly on keyword or sentence retrieval to align visual content with the text. With the introduction of the generative adversarial network (GAN), text-to-image synthesis methods have made great progress in visual realism, diversity, and semantic similarity. A GAN generates plausible, realistic images through adversarial training between a generator and a discriminator, and it has shown strong capability in image restoration and super-resolution. Based on a review and summary of the latest research results in text-to-image synthesis, this survey proposes a new classification of methods into four categories: attention enhancement, multi-stage enhancement, scene layout enhancement, and universality enhancement. The challenges and future development directions of text-to-image synthesis are also discussed.

Key words: image generation, aligning the visual content, text matching, generator, discriminator, semantic similarity, generative adversarial network, scene layout

CLC Number: 

  • TP391