电子科技 (Electronic Science and Technology) ›› 2023, Vol. 36 ›› Issue (11): 19-27. doi: 10.16180/j.cnki.issn1007-7820.2023.11.004


Scene Recognition Algorithm Based on Deep Transfer Learning and Multi-Scale Feature Fusion

WANG Qiao, HU Chunyan, LI Feifei

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2022-05-24 Online: 2023-11-15 Published: 2023-11-20
  • About the authors: WANG Qiao (1993-), male, master's degree candidate. Research interests: computer vision and pattern recognition. | HU Chunyan (1976-), female, lecturer. Research interests: image processing and pattern recognition, computer vision, etc. | LI Feifei (1970-), female, PhD, professor. Research interests: multimedia information processing, image processing and pattern recognition, information retrieval, etc.
  • Supported by:
    Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (ES2015XX)


Abstract:

Convolutional neural networks (CNN) have achieved good results in scene recognition, but they do not fully account for the particular nature of scene images. Images of the same scene class show intra-class variation caused by differences in scale, viewpoint, and background at capture time, while objects shared across different scene classes give rise to a degree of inter-class similarity. Considering that the scale of a scene image also affects the size of the objects it contains, this study proposes a scene recognition method based on deep transfer learning and multi-scale feature fusion. First, transfer learning is used to transfer network parameters pre-trained on the Places dataset into the CNN model, which is then fine-tuned and retrained to reduce the training cost. Next, multi-scale image patches obtained from the class activation map are fed into the CNN for feature extraction, and the extracted features are fused into a single feature vector, giving a richer representation of the scene image. Experimental results on the SUN397 dataset show that the proposed method achieves higher scene recognition accuracy than other CNN-based algorithms.

Key words: scene recognition, convolutional neural network, SE-Block, class activation map, transfer learning, multi-scale, feature fusion, support vector machine
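The transfer-learning step described in the abstract (pre-training on Places, then fine-tuning the CNN on the target scene dataset) can be illustrated with a short PyTorch sketch. This is a minimal sketch under assumed details: the ResNet-50 backbone, the local checkpoint name "resnet50_places365.pth", the 397-way head for SUN397, and the learning rates are illustrative choices, not specifics taken from the paper.

```python
# A minimal sketch of the transfer-learning step: load Places-pre-trained weights
# into a CNN, replace the classifier head for SUN397, and fine-tune.
# Assumptions (not specified by the abstract): ResNet-50 backbone, a local
# checkpoint file "resnet50_places365.pth" holding a plain state_dict,
# and the learning-rate settings below.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 397  # SUN397 has 397 scene categories

# 1. Build the backbone and load the Places-pre-trained parameters,
#    dropping the old classification head so only transferable layers are kept.
model = models.resnet50(weights=None)
ckpt = torch.load("resnet50_places365.pth", map_location="cpu")
ckpt = {k: v for k, v in ckpt.items() if not k.startswith("fc.")}
model.load_state_dict(ckpt, strict=False)

# 2. Replace the classification head for the target scene dataset.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# 3. Fine-tune: small learning rate for transferred layers, larger for the new head.
backbone_params = [p for n, p in model.named_parameters() if not n.startswith("fc.")]
optimizer = torch.optim.SGD(
    [{"params": backbone_params, "lr": 1e-3},
     {"params": model.fc.parameters(), "lr": 1e-2}],
    momentum=0.9, weight_decay=1e-4,
)
criterion = nn.CrossEntropyLoss()

def finetune_step(images, labels):
    """One fine-tuning step on a batch of SUN397 images (NCHW float tensor, int labels)."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```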

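The second stage (class-activation-map-guided multi-scale patches, feature fusion, and the support-vector-machine classifier suggested by the keyword list) might look roughly like the sketch below. It reuses the fine-tuned model from the previous sketch; the crop scales, the CAM-peak-centred cropping rule, and the use of scikit-learn's LinearSVC are assumptions made for illustration rather than the paper's exact pipeline.

```python
# Illustrative sketch: CAM computation, multi-scale crops around the CAM peak,
# CNN feature extraction, feature fusion, and a linear SVM on the fused features.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.svm import LinearSVC

SCALES = (1.0, 0.75, 0.5)  # assumed multi-scale crop ratios

@torch.no_grad()
def cam_and_feature(model, image):
    """Return the class activation map and the pooled CNN feature of one image (3xHxW)."""
    model.eval()  # fixed batch-norm statistics for single-image inference
    x = model.conv1(image.unsqueeze(0))
    x = model.maxpool(model.relu(model.bn1(x)))
    x = model.layer4(model.layer3(model.layer2(model.layer1(x))))  # (1, 2048, h, w)
    feat = torch.flatten(model.avgpool(x), 1)                      # (1, 2048)
    cls = model.fc(feat).argmax(dim=1)                             # predicted class index
    weights = model.fc.weight[cls]                                 # (1, 2048)
    cam = (weights[..., None, None] * x).sum(dim=1)                # (1, h, w)
    return cam.squeeze(0), feat.squeeze(0)

@torch.no_grad()
def fused_feature(model, image):
    """Fuse CNN features of multi-scale crops centred on the CAM peak."""
    cam, global_feat = cam_and_feature(model, image)
    H, W = image.shape[-2:]
    cam = F.interpolate(cam[None, None], size=(H, W), mode="bilinear",
                        align_corners=False).squeeze()
    cy, cx = np.unravel_index(int(cam.argmax()), (H, W))           # most activated location
    feats = [global_feat]
    for s in SCALES:
        ch, cw = int(H * s), int(W * s)
        top = min(max(cy - ch // 2, 0), H - ch)
        left = min(max(cx - cw // 2, 0), W - cw)
        crop = image[:, top:top + ch, left:left + cw]
        crop = F.interpolate(crop[None], size=(H, W), mode="bilinear",
                             align_corners=False).squeeze(0)       # resize back to input size
        _, f = cam_and_feature(model, crop)
        feats.append(f)
    return torch.cat(feats).cpu().numpy()                          # fused multi-scale descriptor

def train_scene_classifier(model, images, labels):
    """Fit a linear SVM on fused features, per the 'support vector machine' keyword."""
    X = np.stack([fused_feature(model, img) for img in images])
    return LinearSVC(C=1.0).fit(X, labels)
```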

CLC Number:

  • TP391