Electronic Science and Technology ›› 2022, Vol. 35 ›› Issue (4): 20-27. doi: 10.16180/j.cnki.issn1007-7820.2022.04.004

Global and Local Scene Representation Method Based on Deep Convolutional Features

Chaowei LIN, Feifei LI, Qiu CHEN

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2020-11-21 Online: 2022-04-15 Published: 2022-04-15
  • About the authors: LIN Chaowei (b. 1996), male, master's student. Research interests: image processing and deep learning. | LI Feifei (b. 1970), female, Ph.D., professor. Research interests: multimedia information processing, image processing and pattern recognition, information retrieval. | CHEN Qiu (b. 1972), male, Ph.D., professor, doctoral supervisor. Research interests: image processing and pattern recognition, computer vision, information retrieval.
  • Supported by:
    The Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (ES2015XX)

Abstract:

Scene recognition is a fundamental task in computer vision. Unlike image classification, scene recognition must jointly consider the background (global layout) information of a scene, its local scene features, and its object features, which is why classic convolutional neural networks perform poorly on this task. To address this issue, this study proposes a global and local scene representation method based on deep convolutional features. The method transforms the deep convolutional features of a scene image to generate a comprehensive representation for each image. Specifically, class activation mapping (CAM) is used to discover local key regions, and a long short-term memory (LSTM) network encodes the convolutional features extracted from these regions to produce the local representation of the scene image. An attention mechanism fuses scene features and object features to form the global representation of the scene image. Finally, experiments on the MIT indoor 67 scene recognition data set show that the proposed method achieves a recognition accuracy of 87.59%.

Key words: scene recognition, convolutional neural networks, convolutional features, feature transform, class activation map (CAM), long short-term memory (LSTM), attention mechanism, end-to-end network

CLC Number:

  • TP391
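
The abstract names three building blocks (CAM-based region discovery, sequence encoding of local features, and attention-based fusion) without giving their details. The NumPy sketch below illustrates the first and third under stated assumptions, and is not the authors' implementation: the CAM is taken as the class-weighted sum of the last convolutional feature maps, local key regions are approximated by the top-k CAM locations, and the fusion is a softmax-weighted sum scored against a hypothetical `query` vector. All function names, shapes, and the random inputs are illustrative only. The LSTM encoder is left out; `local_seq` is the sequence it would consume.

```python
import numpy as np

def class_activation_map(features, class_weights):
    """CAM as a class-weighted sum of conv feature maps.

    features: (C, H, W) feature maps; class_weights: (C,) weights of one
    class in a linear classifier placed after global average pooling.
    Returns an (H, W) map rescaled to [0, 1].
    """
    cam = np.tensordot(class_weights, features, axes=([0], [0]))
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def top_k_locations(cam, k):
    """(row, col) positions of the k strongest CAM responses, strongest first."""
    flat = np.argsort(cam, axis=None)[::-1][:k]
    return [tuple(int(i) for i in np.unravel_index(idx, cam.shape)) for idx in flat]

def attention_fuse(scene_feat, object_feat, query):
    """Softmax-weighted sum of two feature vectors (assumed scoring: dot product with query)."""
    feats = np.stack([scene_feat, object_feat])      # (2, D)
    scores = feats @ query                           # (2,)
    scores = scores - scores.max()                   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ feats, weights

# Toy inputs standing in for real network outputs (assumption).
rng = np.random.default_rng(0)
features = rng.random((8, 7, 7))                     # C=8 channels on a 7x7 grid
class_weights = rng.standard_normal(8)

cam = class_activation_map(features, class_weights)
regions = top_k_locations(cam, k=3)
# Feature vectors at the selected locations, ordered by CAM strength;
# this is the sequence a recurrent encoder such as an LSTM would read.
local_seq = np.stack([features[:, r, c] for r, c in regions])   # (3, 8)

fused, weights = attention_fuse(rng.random(8), rng.random(8), rng.standard_normal(8))
```

In a full system the toy arrays would be replaced by the outputs of scene-centric and object-centric CNNs, and the local and global representations would be combined before the final classifier; the abstract does not say how, so that step is not sketched here.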