电子科技 (Electronic Science and Technology) ›› 2023, Vol. 36 ›› Issue (11): 19-27. doi: 10.16180/j.cnki.issn1007-7820.2023.11.004


Scene Recognition Algorithm Based on Deep Transfer Learning and Multi-Scale Feature Fusion

WANG Qiao, HU Chunyan, LI Feifei

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2022-05-24 Online: 2023-11-15 Published: 2023-11-20
  • About the authors: WANG Qiao (1993-), male, master's degree candidate. Research interests: computer vision and pattern recognition. | HU Chunyan (1976-), female, lecturer. Research interests: image processing and pattern recognition, computer vision, etc. | LI Feifei (1970-), female, PhD, professor. Research interests: multimedia information processing, image processing and pattern recognition, information retrieval, etc.
  • Supported by:
    Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (ES2015XX)


Abstract:

Convolutional neural networks (CNN) have achieved good results in scene recognition, but they do not fully account for the particular nature of scene images. Images of the same scene class show intra-class variation caused by differences in scale, viewpoint, and background at capture time, while objects shared across different scene classes give rise to a degree of inter-class similarity. Considering that the scale of a scene image also affects the size of the objects it contains, this study proposes a scene recognition method based on deep transfer learning and multi-scale feature fusion. First, transfer learning is used to transfer network parameters pre-trained on the Places dataset into the CNN model, which is then fine-tuned and retrained to reduce the training cost. Next, multi-scale image patches obtained from the class activation map are fed into the CNN for feature extraction, and the extracted features are fused into a single feature vector, giving a richer representation of the scene image. Experimental results on the SUN397 dataset show that the proposed method achieves higher scene recognition accuracy than other CNN-based algorithms.

Key words: scene recognition, convolutional neural network, SE-Block, class activation map, transfer learning, multi-scale, feature fusion, support vector machine
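The transfer-learning step described in the abstract (pre-training on Places, then fine-tuning the CNN on the target scene dataset) can be illustrated with a short PyTorch sketch. This is a minimal sketch under assumed details: the ResNet-50 backbone, the local checkpoint name "resnet50_places365.pth", the 397-way head for SUN397, and the learning rates are illustrative choices, not specifics taken from the paper.

```python
# A minimal sketch of the transfer-learning step: load Places-pre-trained weights
# into a CNN, replace the classifier head for SUN397, and fine-tune.
# Assumptions (not specified by the abstract): ResNet-50 backbone, a local
# checkpoint file "resnet50_places365.pth" holding a plain state_dict,
# and the learning-rate settings below.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 397  # SUN397 has 397 scene categories

# 1. Build the backbone and load the Places-pre-trained parameters,
#    dropping the old classification head so only transferable layers are kept.
model = models.resnet50(weights=None)
ckpt = torch.load("resnet50_places365.pth", map_location="cpu")
ckpt = {k: v for k, v in ckpt.items() if not k.startswith("fc.")}
model.load_state_dict(ckpt, strict=False)

# 2. Replace the classification head for the target scene dataset.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# 3. Fine-tune: small learning rate for transferred layers, larger for the new head.
backbone_params = [p for n, p in model.named_parameters() if not n.startswith("fc.")]
optimizer = torch.optim.SGD(
    [{"params": backbone_params, "lr": 1e-3},
     {"params": model.fc.parameters(), "lr": 1e-2}],
    momentum=0.9, weight_decay=1e-4,
)
criterion = nn.CrossEntropyLoss()

def finetune_step(images, labels):
    """One fine-tuning step on a batch of SUN397 images (NCHW float tensor, int labels)."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```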

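The second stage (class-activation-map-guided multi-scale patches, feature fusion, and the support-vector-machine classifier suggested by the keyword list) might look roughly like the sketch below. It reuses the fine-tuned model from the previous sketch; the crop scales, the CAM-peak-centred cropping rule, and the use of scikit-learn's LinearSVC are assumptions made for illustration rather than the paper's exact pipeline.

```python
# Illustrative sketch: CAM computation, multi-scale crops around the CAM peak,
# CNN feature extraction, feature fusion, and a linear SVM on the fused features.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.svm import LinearSVC

SCALES = (1.0, 0.75, 0.5)  # assumed multi-scale crop ratios

@torch.no_grad()
def cam_and_feature(model, image):
    """Return the class activation map and the pooled CNN feature of one image (3xHxW)."""
    model.eval()  # fixed batch-norm statistics for single-image inference
    x = model.conv1(image.unsqueeze(0))
    x = model.maxpool(model.relu(model.bn1(x)))
    x = model.layer4(model.layer3(model.layer2(model.layer1(x))))  # (1, 2048, h, w)
    feat = torch.flatten(model.avgpool(x), 1)                      # (1, 2048)
    cls = model.fc(feat).argmax(dim=1)                             # predicted class index
    weights = model.fc.weight[cls]                                 # (1, 2048)
    cam = (weights[..., None, None] * x).sum(dim=1)                # (1, h, w)
    return cam.squeeze(0), feat.squeeze(0)

@torch.no_grad()
def fused_feature(model, image):
    """Fuse CNN features of multi-scale crops centred on the CAM peak."""
    cam, global_feat = cam_and_feature(model, image)
    H, W = image.shape[-2:]
    cam = F.interpolate(cam[None, None], size=(H, W), mode="bilinear",
                        align_corners=False).squeeze()
    cy, cx = np.unravel_index(int(cam.argmax()), (H, W))           # most activated location
    feats = [global_feat]
    for s in SCALES:
        ch, cw = int(H * s), int(W * s)
        top = min(max(cy - ch // 2, 0), H - ch)
        left = min(max(cx - cw // 2, 0), W - cw)
        crop = image[:, top:top + ch, left:left + cw]
        crop = F.interpolate(crop[None], size=(H, W), mode="bilinear",
                             align_corners=False).squeeze(0)       # resize back to input size
        _, f = cam_and_feature(model, crop)
        feats.append(f)
    return torch.cat(feats).cpu().numpy()                          # fused multi-scale descriptor

def train_scene_classifier(model, images, labels):
    """Fit a linear SVM on fused features, per the 'support vector machine' keyword."""
    X = np.stack([fused_feature(model, img) for img in images])
    return LinearSVC(C=1.0).fit(X, labels)
```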

CLC Number:

  • TP391