Electronic Science and Technology, 2023, Vol. 36, Issue (10): 39-55. doi: 10.16180/j.cnki.issn1007-7820.2023.10.006

A Survey of Text-to-Image Synthesis Based on Generative Adversarial Network

LI Yueyang, TONG Guoxiang, ZHAO Yingzhi, LUO Qi

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2022-04-22  Online: 2023-10-15  Published: 2023-10-20
  • About the authors: LI Yueyang (1997-), male, master's degree candidate. Research interests: deep learning, machine learning. | TONG Guoxiang (1968-), female, Ph.D., associate professor. Research interests: artificial intelligence, Internet of Things applications. | ZHAO Yingzhi (1996-), male, master's degree candidate. Research interests: super-resolution image reconstruction.
  • Supported by:
    National Key R&D Program of China (2018YFB1700902)

Abstract:

Text-to-image synthesis is the task of translating a text description written as a sentence into an image whose semantics match the text. In early research, image generation relied mainly on retrieval by keyword or sentence to align visual content with the matching text. Since the emergence of generative adversarial networks (GANs), text-to-image synthesis methods have made significant progress in visual realism, diversity, and semantic similarity. A GAN produces plausible, realistic images through an adversarial game between a generator and a discriminator, and has shown strong performance in fields such as image restoration and super-resolution. Based on a review and summary of the latest research in text-to-image synthesis, this paper proposes a new taxonomy of methods: attention enhancement, multi-stage enhancement, scene layout enhancement, and universality enhancement. It also discusses the open challenges of text-to-image synthesis and directions for future development.
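As a rough illustration of the adversarial game between generator and discriminator described above, the sketch below computes text-conditioned GAN losses from scalar discriminator scores. It assumes the matching-aware formulation popularized by GAN-CLS (Reed et al., 2016), in which a real image paired with mismatched text is also labeled fake, so the discriminator must judge text-image consistency as well as realism. The function names and the scalar-probability interface are hypothetical simplifications, not the method of any specific model reviewed in this survey; real models compute these scores with neural networks over image and text embeddings.

```python
import math


def bce(p: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy of one probability p against a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(s_real_match: float,
                       s_real_mismatch: float,
                       s_fake_match: float) -> float:
    """Assumed matching-aware (GAN-CLS-style) discriminator loss.

    s_real_match:    D(real image, matching text)         -> target 1
    s_real_mismatch: D(real image, mismatched text)       -> target 0
    s_fake_match:    D(G(noise, text), matching text)     -> target 0
    """
    return bce(s_real_match, 1.0) + 0.5 * (
        bce(s_real_mismatch, 0.0) + bce(s_fake_match, 0.0)
    )


def generator_loss(s_fake_match: float) -> float:
    """Non-saturating generator loss: G wants D to score its images as real."""
    return bce(s_fake_match, 1.0)


if __name__ == "__main__":
    # A discriminator that is winning: accepts real matching pairs, rejects the rest.
    d_strong = discriminator_loss(0.9, 0.1, 0.1)
    # A discriminator that the generator has fooled.
    d_fooled = discriminator_loss(0.5, 0.5, 0.9)
    print(d_strong < d_fooled)                        # True
    print(generator_loss(0.9) < generator_loss(0.1))  # True
```

Training alternates between lowering `discriminator_loss` with respect to the discriminator's parameters and lowering `generator_loss` with respect to the generator's parameters; the two objectives pull the score on generated images in opposite directions, which is the adversarial game the abstract refers to.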

Key words: image generation, visual content alignment, text matching, generator, discriminator, semantic similarity, generative adversarial network, scene layout

CLC number: TP391