Electronic Science and Technology ›› 2020, Vol. 33 ›› Issue (12): 54-58. doi: 10.16180/j.cnki.issn1007-7820.2020.12.011

Scene Recognition Method Based on Convolutional Neural Networks and Multi-Scale Spatial Encoding

MIAO Ran, LI Feifei, CHEN Qiu

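  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093
  • Received: 2019-09-14 Online: 2020-12-15 Published: 2020-12-22
  • About the authors: MIAO Ran (1995-), male, master's student. Research interests: image processing and pattern recognition. | LI Feifei (1970-), female, PhD, professor. Research interests: multimedia information processing, image processing and pattern recognition, information retrieval, etc. | CHEN Qiu (1972-), male, PhD, professor, doctoral supervisor. Research interests: image processing and pattern recognition, computer vision, information retrieval, etc.
  • Supported by:
    The Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (ES2015XX)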

Scene Recognition Algorithm Based on Convolutional Neural Networks and Multi-Scale Space Encoding

MIAO Ran, LI Feifei, CHEN Qiu

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2019-09-14 Online: 2020-12-15 Published: 2020-12-22
  • Supported by:
    The Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (ES2015XX)

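Abstract:

A scene image is usually composed of foreground objects and a background environment arranged in a certain spatial layout. Images of the same scene class show severe intra-class variation because of differences in scale, viewpoint, and background at capture time; objects shared between different scene classes also make images of different classes similar to some degree. For this reason, the paper proposes a scene description and recognition method based on CNN and multi-scale spatial encoding. The method combines multi-scale dense sampling, a convolutional network, and multi-scale spatial encoding. The multi-scale spatial encoding partitions the sampling grid spatially several times and aggregates the CNN features within the different sub-regions to generate a multi-scale spatial VLAD. Experiments on the Scene15 scene dataset show a test accuracy of 94.67%.

Key words: scene recognition, convolutional neural networks, K-means clustering, VLAD, principal component analysis, support vector machine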

Abstract:

A scene image is generally composed of foreground objects and background context arranged in a certain spatial layout. Because of differences in scale, viewpoint, and background, images within the same scene class show large intra-class variation. Conversely, objects shared across scene classes produce a degree of inter-class similarity among different scenes. To address both problems, this study proposes a scene representation based on convolutional neural networks (CNN) and multi-scale space encoding, which combines multi-scale dense sampling, CNN feature extraction, and multi-scale space encoding. The multi-scale space encoding spatially partitions the sampling grid multiple times and then aggregates the CNN features within sub-regions of different shapes to generate the multi-scale space VLAD. In experiments on the Scene15 scene dataset, the method reaches a test accuracy of 94.67%.

Key words: scene recognition, convolutional neural networks, K-means clustering, VLAD, PCA, SVM
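
The encoding step described in the abstract (partition the sampling grid several times, then aggregate the CNN features of each sub-region into a VLAD vector) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several details are assumptions made only for illustration: the n x n partition scheme and the `levels` parameter, the signed square-root and L2 normalisation, the function names `vlad` and `multiscale_spatial_vlad`, and the random arrays standing in for CNN descriptors and K-means cluster centers.

```python
import numpy as np

def vlad(features, centers):
    """Aggregate local descriptors (N, D) into one VLAD vector of length K*D."""
    k, d = centers.shape
    out = np.zeros((k, d))
    if len(features):
        # Squared distance from every descriptor to every center: shape (N, K).
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        for j in range(k):
            members = features[nearest == j]
            if len(members):
                # Sum of residuals to the assigned center.
                out[j] = (members - centers[j]).sum(axis=0)
    out = out.ravel()
    # Assumed normalisation: signed square root followed by L2.
    out = np.sign(out) * np.sqrt(np.abs(out))
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out

def multiscale_spatial_vlad(grid_features, centers, levels=(1, 2)):
    """Concatenate VLAD vectors over n x n partitions of an (H, W, D) grid.

    The n x n scheme is an assumption; the paper may use other sub-region shapes.
    """
    h, w, d = grid_features.shape
    parts = []
    for n in levels:
        row_edges = np.linspace(0, h, n + 1).astype(int)
        col_edges = np.linspace(0, w, n + 1).astype(int)
        for r in range(n):
            for c in range(n):
                cell = grid_features[row_edges[r]:row_edges[r + 1],
                                     col_edges[c]:col_edges[c + 1]]
                parts.append(vlad(cell.reshape(-1, d), centers))
    return np.concatenate(parts)

# Random stand-ins for CNN descriptors on a 4x4 grid and for 3 cluster centers.
rng = np.random.default_rng(0)
grid = rng.normal(size=(4, 4, 8))
centers = rng.normal(size=(3, 8))
desc = multiscale_spatial_vlad(grid, centers)
print(desc.shape)  # (120,)
```

With `levels=(1, 2)` the grid yields 1 + 4 = 5 sub-regions, each contributing a vector of length 3 x 8 = 24, so the final descriptor has 120 dimensions. In a full pipeline matching the key words, the centers would come from K-means clustering and the descriptor would typically pass through PCA before an SVM classifier; those stages are omitted here.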

CLC Number:

  • TP391