Multi-label Image Annotation Algorithm Based on Transfer Learning and Weighted Ranking Support Vector Machine

Electronic Science and Technology, 2020, Vol. 33, Issue (3): 12-16. doi: 10.16180/j.cnki.issn1007-7820.2020.03.003

CHEN Lei, LI Feifei, CHEN Qiu

School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China

Received: 2019-01-24  Online: 2020-03-15  Published: 2020-03-25
About the authors: CHEN Lei (1995-), female, master's student. Research interests: computer vision and pattern recognition | LI Feifei (1970-), female, PhD, professor. Research interests: multimedia information processing, image processing and pattern recognition, information retrieval | CHEN Qiu (1972-), male, PhD, professor, doctoral supervisor. Research interests: image processing and pattern recognition, computer vision, information retrieval.
Supported by:
The Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (ES2012XX); The Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (ES2014XX)

Abstract

To address label imbalance in automatic multi-label image annotation, this paper proposes an annotation method based on transfer learning and a weighted ranking support vector machine (WRSVM). Because the chosen datasets are too small to train a convolutional neural network (CNN) well from scratch, transfer learning is adopted: the parameters of AlexNet trained on the ImageNet dataset are transferred to the CNN model used in this work, and the last fully connected layer is fine-tuned. A multi-label multi-hinge loss (MHL) function is then used to form a multi-class support vector machine. Finally, weighted ranking is applied to the low-frequency labels to obtain the multi-label annotation results. Experiments were performed on three datasets: Corel-5k, ESP-Game and IAPR-TC12. The average recall obtained by the WRSVM increased by 10%, 9% and 6%, respectively, and the average precision for low-frequency labels increased by 12%. The results indicate that the proposed CNN-WRSVM method improves average recall while also raising the average precision for low-frequency labels.

Key words: multi-label image annotation, transfer learning, WRSVM, CNN, MHL, low-frequency labels

CLC Number: TP391

CHEN Lei, LI Feifei, CHEN Qiu. Multi-label Image Annotation Algorithm Based on Transfer Learning and Weighted Ranking Support Vector Machine[J]. Electronic Science and Technology, 2020, 33(3): 12-16.

Figures/Tables (5): Figure 1, Table 1, Table 2, Table 3, Table 4

References (17)

[1]	Qin Yinghua, Li Feifei, Chen Qiu. Multi-label image annotation based on transfer learning[J]. Electronic Science and Technology, 2018, 31(8): 25-28.
[2]	Yang X, Qian X, Mei T. Learning salient visual word for scalable mobile image retrieval[J]. Pattern Recognition, 2015, 48(2): 3093-3101.
[3]	Cheng Q, Zhang Q, Fu P, et al. A survey and analysis on automatic image annotation[J]. Pattern Recognition, 2018, 79(3): 242-259.
[4]	Zhang R, Zhang L, Wang X J, et al. Multi-feature pLSA for combining visual features in image annotation[C]. Scottsdale: International Conference on Multimedia, 2011.
[5]	Kalayeh M M, Idrees H, Shah M. NMF-KNN: image annotation using weighted multi-view non-negative matrix factorization[C]. Columbus: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[6]	Verma Y, Jawahar C V. Image annotation using metric learning in semantic neighbourhoods[C]. Berlin: European Conference on Computer Vision, 2012.
[7]	Hou Y, Lin Z. Image tag completion and refinement by subspace clustering and matrix completion[C]. Chengdu: Visual Communications & Image Processing, 2016.
[8]	Lin Z, Ding G, Hu M, et al. Image tag completion via dual-view linear sparse reconstructions[J]. Computer Vision and Image Understanding, 2014, 124(6): 42-60.
[9]	Jing X Y, Wu F, Li Z, et al. Multi-label dictionary learning for image annotation[J]. IEEE Transactions on Image Processing, 2016, 25(6): 2712-2725.
[10]	Murthy V N, Can E F, Manmatha R. A hybrid model for automatic image annotation[C]. Glasgow: Proceedings of International Conference on Multimedia Retrieval, 2014.
[11]	Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]. Stateline: International Conference on Neural Information Processing Systems, 2012.
[12]	Pan S J, Yang Q. A survey on transfer learning[J]. IEEE Transactions on Knowledge & Data Engineering, 2010, 22(10): 1345-1359.
[13]	Yosinski J, Clune J, Bengio Y, et al. How transferable are features in deep neural networks[C]. Montréal: Advances in Neural Information Processing Systems, 2014.
[14]	Chen M, Zheng A, Weinberger K. Fast image tagging[C]. Atlanta: International Conference on Machine Learning, 2013.
[15]	Murthy V N, Maji S, Manmatha R. Automatic image annotation using deep learning representations[C]. Shanghai: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015.
[16]	Wang Peng, Zhang Aofan, Wang Liqin, et al. Image automatic annotation based on transfer learning and multi-label smoothing strategy[J]. Journal of Computer Applications, 2018, 38(11): 3199-3203.
[17]	Kashani M M, Amiri S H. Leveraging deep learning representation for search-based image annotation[C]. Shiraz: Artificial Intelligence & Signal Processing Conference, 2018.

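Layer	Parameters
Input layer	Image size 256×256
Convolutional layer 1	Kernel size 11×11, max pooling layer
Convolutional layer 2	Kernel size 5×5, max pooling layer
Convolutional layer 3	Kernel size 3×3
Convolutional layer 4	Kernel size 3×3
Convolutional layer 5	Kernel size 3×3, max pooling layer
Fully connected layer 1	Output feature vector dimension 4,096
Fully connected layer 2	Output feature vector dimension 4,096
Fully connected layer 3	Output feature vector dimension 4,096
Output layer	Multi-label loss layer; output vector dimension determined by the number of labels in the dataset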
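
The output layer in the table above is a multi-label loss layer, and the abstract describes a multi-label hinge loss combined with weighting for low-frequency labels. The page does not give the exact formulation, so the following is only a minimal sketch of one common pairwise hinge form. The function names, the per-label weights and the inverse-frequency weighting rule are all assumptions made for illustration, not the authors' definition.

```python
from typing import List, Sequence


def weighted_multilabel_hinge(
    scores: Sequence[float],
    positives: Sequence[int],
    weights: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Pairwise hinge loss for one image (illustrative, not the paper's exact loss).

    Every (positive, negative) label pair is penalised when the positive
    score does not exceed the negative score by at least `margin`.
    Each term is scaled by the weight of the positive label (assumed scheme).
    """
    pos = set(positives)
    negatives = [k for k in range(len(scores)) if k not in pos]
    loss = 0.0
    for j in pos:
        for k in negatives:
            loss += weights[j] * max(0.0, margin - (scores[j] - scores[k]))
    # Normalise by the number of labels so the value is comparable across datasets.
    return loss / len(scores)


def inverse_frequency_weights(label_counts: Sequence[int]) -> List[float]:
    """Assumed weighting rule: rarer labels get larger weights, mean weight is 1."""
    inverse = [1.0 / count for count in label_counts]
    mean_inverse = sum(inverse) / len(inverse)
    return [value / mean_inverse for value in inverse]
```

With uniform weights this reduces to an ordinary pairwise multi-label hinge loss; raising the weight of a rare label increases the penalty whenever that label is ranked below a label the image does not carry.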

Dataset	Corel-5k	ESP-Game	IAPR-TC12
Total images	5,000	20,770	19,627
Training images	4,500	18,689	17,665
Test images	499	2,081	1,962
Number of labels	260	268	291
Average labels per image	3.4	4.7	5.7
α	58.6	362.7	347.7

Dataset	Corel-5k				ESP-Game				IAPR-TC12
Method	P	R	F₁	N+	P	R	F₁	N+	P	R	F₁	N+
2PKNN-ML [6]	0.44	0.46	0.45	191	0.53	0.27	0.35	252	0.54	0.37	0.43	278
FastTag [14]	0.32	0.43	0.37	166	0.46	0.22	0.30	247	0.47	0.26	0.34	280
CCA-KNN [15]	0.42	0.52	0.46	201	0.46	0.36	0.41	260	0.45	0.38	0.41	278
MLDL [9]	0.45	0.49	0.47	198	0.56	0.31	0.40	259	0.56	0.40	0.47	282
CNN-MLSU [16]	0.37	0.49	0.42	-	-	-	-	-	0.44	0.38	0.41	-
NN-CNN [17]	0.42	0.45	0.43	187	0.50	0.28	0.36	252	0.54	0.31	0.40	272
CNN-WRSVM	0.44	0.62	0.51	137	0.42	0.45	0.44	222	0.48	0.42	0.45	222
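
The columns P, R, F₁ and N+ are not defined on this page. The sketch below assumes the definitions conventionally used in the image annotation literature: precision and recall are computed per label and averaged over all labels, F₁ is the harmonic mean of the two averages, and N+ counts the labels that are correctly predicted at least once. The function name and the set-based input format are hypothetical.

```python
from typing import List, Set, Tuple


def annotation_metrics(
    truth: List[Set[str]], predicted: List[Set[str]], labels: List[str]
) -> Tuple[float, float, float, int]:
    """Return (mean precision, mean recall, F1, N+) under the assumed definitions."""
    precisions, recalls = [], []
    n_plus = 0
    for label in labels:
        # Images where the label is both in the ground truth and in the prediction.
        tp = sum(1 for t, p in zip(truth, predicted) if label in t and label in p)
        n_pred = sum(1 for p in predicted if label in p)
        n_true = sum(1 for t in truth if label in t)
        precisions.append(tp / n_pred if n_pred else 0.0)
        recalls.append(tp / n_true if n_true else 0.0)
        if tp > 0:
            n_plus += 1  # label recalled at least once
    mean_p = sum(precisions) / len(labels)
    mean_r = sum(recalls) / len(labels)
    f1 = 2 * mean_p * mean_r / (mean_p + mean_r) if mean_p + mean_r else 0.0
    return mean_p, mean_r, f1, n_plus
```

As a check of the harmonic-mean relation against the table, the CNN-WRSVM row for Corel-5k reports P = 0.44 and R = 0.62, and 2 × 0.44 × 0.62 / (0.44 + 0.62) ≈ 0.51, which agrees with the listed F₁.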

Dataset	Corel-5k				ESP-Game				IAPR-TC12
Method	P	R	F₁	N+	P	R	F₁	N+	P	R	F₁	N+
CNN-SVM	0.25	0.39	0.31	91	0.43	0.31	0.36	188	0.50	0.30	0.38	166
CNN-WRSVM	0.37	0.32	0.34	72	0.55	0.25	0.35	163	0.32	0.25	0.38	148
