Journal of Xidian University ›› 2019, Vol. 46 ›› Issue (2): 152-157. doi: 10.19665/j.issn1001-2400.2019.02.025


CNN image caption generation

LI Yong1,2,3,CHENG Honghong1,2,3,LIANG Xinyan1,2,3,GUO Qian1,2,3,QIAN Yuhua1,2,3   

  1. Research Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China
    2. Key Lab. of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
    3. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  • Received: 2018-09-22 Online: 2019-04-20 Published: 2019-04-20

Abstract:

The image caption generation task requires producing a meaningful sentence that accurately describes the content of an image. Existing research typically uses a convolutional neural network to encode image information and a recurrent neural network to encode text information; however, the serial nature of the recurrent neural network limits performance. To solve this problem, we propose a model based entirely on convolutional neural networks, using different convolutional networks to process the data of the two modalities simultaneously. Benefiting from the parallel nature of the convolution operation, computational efficiency is significantly improved. Experiments on two public data sets show gains on the specified evaluation metrics, indicating the effectiveness of the model for the image caption generation task.
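The abstract's key point is that, unlike an RNN step (which must wait for the previous hidden state), each output position of a convolutional text encoder depends only on a fixed local window of tokens, so all positions can be computed independently. The following is a minimal sketch of that idea, assuming a causal 1-D convolution over token embeddings; the function name and shapes are illustrative and not taken from the paper.

```python
import numpy as np

def causal_conv1d(x, w, b):
    """Causal 1-D convolution over a token sequence.

    x: (seq_len, d_in) token embeddings
    w: (k, d_in, d_out) filter weights, kernel size k
    b: (d_out,) bias
    Returns (seq_len, d_out); position t sees only tokens <= t.
    """
    k, d_in, d_out = w.shape
    # Left-pad with k-1 zero vectors so the output keeps the input
    # length and position t never looks at future tokens (causality).
    x_pad = np.vstack([np.zeros((k - 1, d_in)), x])
    seq_len = x.shape[0]
    out = np.empty((seq_len, d_out))
    # Each iteration is independent of the others: unlike an RNN's
    # recurrence, this loop could run fully in parallel.
    for t in range(seq_len):
        window = x_pad[t:t + k]                       # (k, d_in)
        out[t] = np.einsum('ki,kio->o', window, w) + b
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))       # 5 tokens, 8-dim embeddings
w = rng.normal(size=(3, 8, 16))   # kernel size 3, 16 output channels
b = np.zeros(16)
h = causal_conv1d(x, w, b)
print(h.shape)  # (5, 16)
```

Because position t only reads tokens up to t, perturbing a later token leaves all earlier outputs unchanged, which is the property that makes such an encoder usable for left-to-right caption generation while still being parallelizable at training time.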

Key words: multi-modal data, image caption, long short-term memory, neural networks

CLC Number: 

  • TP183