Instance Segmentation Based on Attention and Image Contour

doi:10.16180/j.cnki.issn1007-7820.2024.04.009

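Abstract

Abstract (translated from the Chinese original):

Contour-based instance segmentation methods represent an object with a small number of contour vertices, which reduces the number of parameters and improves running efficiency. However, their accuracy is lower than that of traditional pixel-by-pixel segmentation algorithms, and the resulting segmentations are of poorer quality. To improve accuracy, this paper proposes an instance segmentation model that combines image contours with an attention mechanism (Attend the Contour snake, AC-snake). An improved large convolution kernel (Largekernel+) is added to the backbone network to enlarge the model's receptive field and extract richer feature information. The network structure of the contour vertex deformation stage is improved by incorporating a dual-channel attention module (Dual Channel attention, DC-attention), which strengthens the useful information at the contour vertices, reduces ineffective parameters in the network being trained, and improves detection accuracy and training speed. Experimental results on the Cityscapes validation set show that the proposed model outperforms the original model.

Keywords: instance segmentation, image contour, contour vertex, pixel-by-pixel, attention mechanism, large convolution kernel, receptive field, feature information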

Abstract:

The contour-based instance segmentation method uses a small number of contour vertices to represent an object, which effectively reduces the number of parameters and improves running efficiency. However, its segmentation results are of poorer quality, and its accuracy does not match that of traditional pixel-by-pixel segmentation algorithms. To improve accuracy, this study proposes a refined instance segmentation model, Attend the Contour snake (AC-snake), which combines image contours with an attention mechanism. An improved large kernel (Largekernel+) is added to the backbone network to enlarge the receptive field of the model and extract richer feature information. The network structure of the contour vertex deformation stage is improved, and a Dual Channel attention (DC-attention) module is incorporated to enhance the effective information of the contour vertices, reduce invalid parameters in the network, and improve detection accuracy and training speed. Experimental results on the Cityscapes validation set show that the proposed model performs better than the original model.

Key words: instance segmentation, image contour, contour vertex, pixel-by-pixel, attention mechanism, large kernel, receptive field, feature information
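
The abstract's claim that a larger kernel enlarges the receptive field can be illustrated with the standard formula for the theoretical receptive field of stacked convolutions. The sketch below is only an illustration: the function name `receptive_field` and the layer configurations (five 3×3 layers, one 31×31 layer) are assumptions chosen for the example, not the Largekernel+ design, whose details are not given here.

```python
def receptive_field(layers):
    """Theoretical receptive field of a stack of convolution layers.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    """
    rf = 1    # before any layer, one unit corresponds to one input pixel
    jump = 1  # spacing, in input pixels, between adjacent units at this depth
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Hypothetical configurations, chosen only for illustration.
stacked_3x3 = [(3, 1)] * 5   # five 3x3 convolutions, stride 1
single_large = [(31, 1)]     # one 31x31 convolution, stride 1

print(receptive_field(stacked_3x3))   # 11
print(receptive_field(single_large))  # 31
```

Under these assumptions, a single large kernel covers in one layer a window that would take fifteen stacked 3×3 layers to reach. This is the theoretical window only; the region that actually influences a unit in a trained network can be smaller.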

CLC number:

TN247

GU Denghua, GU Chunhua. Instance Segmentation Based on Attention and Image Contour[J]. Electronic Science and Technology, 2024, 37(4): 62-68.

Figures and tables (11): Figures 1 to 4 and Tables 1 to 7

References (20)

[1]	He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]. Venice: Proceedings of the IEEE International Conference on Computer Vision, 2017:662-667.
[2]	Liu S, Qi L, Qin H, et al. Path aggregation network for instance segmentation[C]. Salt Lake City: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018:537-542.
[3]	Liang X, Lin L, Wei Y, et al. Proposal-free network for instance-level object segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(12):2978-2991. doi: 10.1109/TPAMI.2017.2775623
[4]	Bai M, Urtasun R. Deep watershed transform for instance segmentation[C]. Honolulu: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017:862-866.
[5]	Feng Furong, Zhang Zhaogong. Recent advances for object contour detection technology[J]. Computer Science, 2021, 48(S1):1-9. doi: 10.1063/1.31600
[6]	Kass M, Witkin A, Terzopoulos D. Snakes: Active contour models[J]. International Journal of Computer Vision, 1988, 1(4):321-331. doi: 10.1007/BF00133570
[7]	Xie E, Sun P, Song X, et al. PolarMask: Single shot instance segmentation with polar representation[C]. Seattle: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020:561-570.
[8]	Peng S, Jiang W, Pi H, et al. Deep snake for real-time instance segmentation[C]. Seattle: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020:589-593.
[9]	Luo W, Li Y, Urtasun R, et al. Understanding the effective receptive field in deep convolutional neural networks[EB/OL].(2017-01-15) [2022-10-11] https://arxiv.org/abs/1701.04128.
[10]	Zhou X, Wang D, Krähenbühl P. Objects as points[EB/OL].(2019-04-16) [2022-10-11] https://arxiv.org/abs/1904.07850.
[11]	Zhou X, Zhuo J, Krähenbühl P. Bottom-up object detection by grouping extreme and center points[C]. Long Beach: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019:377-382.
[12]	Gu J X, Wang Z H, Kuen J, et al. Recent advances in convolutional neural networks[J]. Pattern Recognition, 2018, 77(10):354-377. doi: 10.1016/j.patcog.2017.10.013
[13]	Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6):84-90. doi: 10.1145/3065386
[14]	Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[EB/OL].(2014-09-04) [2022-10-11] https://arxiv.org/abs/1409.1556.
[15]	Ding X, Zhang X, Han J, et al. Scaling up your kernels to 31×31: Revisiting large kernel design in CNNs[C]. New Orleans: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022:892-899.
[16]	Liu S, Chen T, Chen X, et al. More convnets in the 2020s: Scaling up kernels beyond 51×51 using sparsity[EB/OL].(2022-07-07) [2022-10-11] https://arxiv.org/abs/2207.03620.
[17]	Liu Z, Lin Y, Cao Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]. Montreal: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021:1587-1593.
[18]	Ding X, Zhang X, Ma N, et al. RepVGG: Making VGG-style ConvNets great again[C]. Kuala Lumpur: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021:933-938.
[19]	Xu Bowen, Lu Yinan. Urban road scene instance segmentation method based on improved SOLO network[J]. Journal of Jilin University(Science Edition), 2022, 60(6):1356-1362.
[20]	Cordts M, Omran M, Ramos S, et al. The Cityscapes dataset for semantic urban scene understanding[C]. Las Vegas: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016:159-166.

Parameter	Value	Parameter	Value
Dataset	Cityscapes	K	128
Num	5000	Epoch	200
Momentum	0.9	Learning Rate	0.0001
Optimizer	Adam	Batch size	16

Model	SE	DC-attention	Largekernel+	mAP50/%	AP/%
Model 1				59.4	33.3
Model 2	√			60.8	33.4
Model 3		√		60.6	33.7
Model 4			√	61.1	33.8
Model 5	√		√	61.9	34.3
Model 6		√	√	62.3	34.4
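
The SE column in the ablation table presumably denotes squeeze-and-excitation channel attention, the baseline that DC-attention is compared against. As a point of reference, here is a minimal pure-Python sketch of a generic SE-style gate. It is not the paper's DC-attention module, whose structure is not described here; the function name `se_channel_attention`, the weight shapes, and the omission of biases are all assumptions made for illustration.

```python
import math


def se_channel_attention(x, w1, w2):
    """Reweight the channels of x (C x H x W nested lists) with an SE-style gate.

    w1 has shape (C_reduced x C) and w2 has shape (C x C_reduced).
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gate = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[[g * v for v in row] for row in ch] for ch, g in zip(x, gate)]


# Toy input: 2 channels of size 2x2. With all-zero weights every gate is
# sigmoid(0) = 0.5, so each channel is simply halved.
x = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1 = [[0.0, 0.0]]
w2 = [[0.0], [0.0]]
out = se_channel_attention(x, w1, w2)
print(out[0])  # [[0.5, 1.0], [1.5, 2.0]]
```

In a trained network the weights are learned, so informative channels receive gates near 1 and uninformative ones are suppressed toward 0.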

Model	person/%	rider/%	car/%	truck/%
Baseline	37.3	27.0	56.3	32.1
Baseline+SE	36.6	27.0	56.4	32.5
Baseline+DC	37.7	28.7	57.0	33.6
Baseline+DC+Largekernel	37.4	27.6	57.0	33.8

Model	bus/%	train/%	motorcycle/%	bicycle/%
Baseline	54.1	20.8	19.0	19.4
Baseline+SE	53.8	24.3	18.6	17.5
Baseline+DC	54.8	19.9	17.6	19.8
Baseline+DC+Largekernel	54.9	27.1	19.0	18.7
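
As a consistency check, the overall AP in the ablation table should equal the mean of the eight per-class AP values, up to rounding. The short calculation below tests this. The pairing of the per-class rows with Models 1, 2, 3 and 6 of the ablation table is an assumption based on matching module names, and the variable names are illustrative.

```python
# Per-class AP (%) copied from the two per-class tables, in the order
# person, rider, car, truck, bus, train, motorcycle, bicycle.
per_class_ap = {
    "Baseline": [37.3, 27.0, 56.3, 32.1, 54.1, 20.8, 19.0, 19.4],
    "Baseline+SE": [36.6, 27.0, 56.4, 32.5, 53.8, 24.3, 18.6, 17.5],
    "Baseline+DC": [37.7, 28.7, 57.0, 33.6, 54.8, 19.9, 17.6, 19.8],
    "Baseline+DC+Largekernel": [37.4, 27.6, 57.0, 33.8, 54.9, 27.1, 19.0, 18.7],
}

# Overall AP (%) from the ablation table. Assumption: these rows correspond
# to Models 1, 2, 3 and 6 respectively.
reported_ap = {
    "Baseline": 33.3,
    "Baseline+SE": 33.4,
    "Baseline+DC": 33.7,
    "Baseline+DC+Largekernel": 34.4,
}

# Unweighted mean over the eight classes.
mean_ap = {name: sum(vals) / len(vals) for name, vals in per_class_ap.items()}

for name, value in mean_ap.items():
    print(f"{name}: class mean = {value:.2f}, reported AP = {reported_ap[name]}")
```

The class means come out at about 33.25, 33.34, 33.64 and 34.44, each within 0.1 of the reported AP. Differences of that size are what one would expect when both the per-class and the overall values have been rounded to one decimal place.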

Model	Training data	Input size	AP/%	mAP50/%
Mask R-CNN	fine	1024×2048	26.2	49.9
PANet	fine	1024×2048	31.8	57.1
PolarMask	fine	1024×2048	30.5	56.9
DeepSnake	fine	1024×2048	31.7	58.4
Ours	fine	1024×2048	31.9	58.9