Research on Image Segmentation Algorithm Based on Channel Feature Pyramid

doi:10.16180/j.cnki.issn1007-7820.2023.12.006

Abstract:

Semantic segmentation networks often carry a heavy computational cost and many redundant parameters. To address this, this study proposes a channel feature pyramid (CFP) module. It then combines the CFP module with a lightweight attention mechanism to build a network for real-time semantic segmentation. The CFP module provides a sufficiently large receptive field and makes dense use of contextual information. Starting from the second channel, it progressively combines feature maps by summation and then concatenates them to build the final hierarchical feature map. A convolutional block attention module (CBAM) is added after the regular convolutional layers to improve segmentation accuracy. No pretraining or post-processing is used. On the CamVid dataset, running on a single RTX 2080 Ti, the algorithm reaches 68.1% mean intersection over union (mIoU) with only 0.75 MB of parameters and 5.3 MB of memory. On the Cityscapes dataset it reaches 75.7% mIoU at an inference speed of 56 frames per second.

Key words: prediction task, semantic segmentation, inference speed, channel features, attention mechanism, receptive field, context information, mean intersection over union
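The abstract describes how the CFP module merges its parallel outputs. The first output is kept as is. Each later output is added element-wise to the running result, and all intermediate results are then concatenated. Below is a minimal pure-Python sketch of that fusion step only. It assumes that each "channel" in the description is one same-shaped branch output. Feature maps are modelled as flat lists in channel-major (C, H, W) order, so concatenation along channels becomes plain list concatenation. The branch convolutions are omitted entirely. `hierarchical_fusion` is a hypothetical name and is not the authors' code.

```python
from typing import List

# Illustrative stand-in for a tensor: a flattened feature map in channel-major (C, H, W) order.
FeatureMap = List[float]


def hierarchical_fusion(branches: List[FeatureMap]) -> FeatureMap:
    """Progressive-sum-then-concatenate fusion of same-shaped branch outputs.

    Branch 1 passes through unchanged. From branch 2 onward, each branch is added
    element-wise to the previous fused result. All fused results are concatenated
    along the channel axis (list concatenation in channel-major layout).
    """
    if not branches:
        raise ValueError("need at least one branch")
    size = len(branches[0])
    if any(len(b) != size for b in branches):
        raise ValueError("all branches must have the same shape")

    fused: List[FeatureMap] = [list(branches[0])]
    for branch in branches[1:]:
        prev = fused[-1]
        fused.append([p + x for p, x in zip(prev, branch)])

    # Concatenate the running sums to form the final hierarchical feature map.
    out: FeatureMap = []
    for f in fused:
        out.extend(f)
    return out


if __name__ == "__main__":
    print(hierarchical_fusion([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))
```

For three branches b1, b2 and b3, the output is [b1, b1+b2, b1+b2+b3]. Each later segment therefore mixes context from every earlier branch. Summation and concatenation add no weights, so this fusion costs no parameters.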

CLC number: TP391.41

SUN Hong, YANG Chen, MO Guangping. Research on Image Segmentation Algorithm Based on Channel Feature Pyramid[J]. Electronic Science and Technology, 2023, 36(12): 39-45.

Figures/Tables: 10 (Figures 1-5, Tables 1-5)

References (28)

[1]	Qing Chen, Yu Jing, Xiao Chuangbai, et al. Deep convolutional neural network for semantic image segmentation[J]. Journal of Image and Graphics, 2020, 25(6):1069-1090.
[2]	Chen Jinhong, Chen Wei, Yin Zhong. Semantic segmentation of streetscape based on improved ExfuseNet[J]. Electronic Science and Technology, 2022, 35(6):28-34.
[3]	Sandler M, Howard A, Zhu M, et al. Mobilenetv2:Inverted residuals and linear bottlenecks[C]. Salt Lake City: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018:4510-4520.
[4]	Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3[C]. Seoul: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019:1314-1324.
[5]	Zhang X, Zhou X, Lin M, et al. Shufflenet:An extremely efficient convolutional neural network for mobile devices[C]. Salt Lake City: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018:6848-6856.
[6]	Yu C, Wang J, Peng C, et al. BiseNet:Bilateral segmentation network for real-time semantic segmentation[C]. Munich: Proceedings of the European Conference on Computer Vision, 2018:325-341.
[7]	Li H, Xiong P, Fan H, et al. DFANet:Deep feature aggregation for real-time segmentation[C]. Long Beach: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019:9522-9531.
[8]	Ronneberger O, Fischer P, Brox T. U-net:Convolutional networks for biomedical image segmentation[C]. Munich: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, Cham, 2015:234-241.
[9]	Fu J, Liu J, Tian H, et al. Dual attention network for scene segmentation[C]. Long Beach: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019:3146-3154.
[10]	Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network[C]. Honolulu: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017:2881-2890.
[11]	Chen L C, Papandreou G, Kokkinos I, et al. Deeplab:Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(4):834-848. doi: 10.1109/TPAMI.2017.2699184
[12]	Paszke A, Chaurasia A, Kim S, et al. ENet:A deep neural network architecture for real-time semantic segmentation[C]. Honolulu: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017:253-263.
[13]	Mehta S, Rastegari M, Caspi A, et al. ESPNet:Efficient spatial pyramid of dilated convolutions for semantic segmentation[C]. Munich: Proceedings of the European Conference on Computer Vision, 2018:552-568.
[14]	Zhao H, Qi X, Shen X, et al. ICNet for real-time semantic segmentation on high-resolution images[C]. Munich: Proceedings of the European Conference on Computer Vision, 2018:405-420.
[15]	Chen L C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation[C]. Honolulu: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017:356-368.
[16]	Chen L C, Zhu Y, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]. Munich: Proceedings of the European Conference on Computer Vision, 2018:801-818.
[17]	Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]. Boston: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015:1-9.
[18]	Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision[C]. Las Vegas: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016:2818-2826.
[19]	Ioffe S, Szegedy C. Batch normalization:Accelerating deep network training by reducing internal covariate shift[C]. Lille: International Conference on Machine Learning, 2015:448-456.
[20]	Chollet F. Xception:Deep learning with depthwise separable convolutions[C]. Honolulu: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017:1251-1258.
[21]	Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
[22]	Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[C]. San Francisco: The Thirty-first AAAI Conference on Artificial Intelligence, 2017:4278-4284.
[23]	Li G, Yun I, Kim J, et al. DABNet:Depth-wise asymmetric bottleneck for real-time semantic segmentation[C]. Cardiff: British Machine Vision Conference, 2019:259-271.
[24]	Badrinarayanan V, Kendall A, Cipolla R. SegNet:A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12):2481-2495. doi: 10.1109/TPAMI.2016.2644615 pmid: 28060704
[25]	Yang M, Yu K, Zhang C, et al. DenseASPP for semantic segmentation in street scenes[C]. Salt Lake City: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018:3684-3692.
[26]	Yu C, Gao C, Wang J, et al. BiseNet v2:Bilateral network with guided aggregation for real-time semantic segmentation[J]. International Journal of Computer Vision, 2021, 129(11):3051-3068. doi: 10.1007/s11263-021-01515-2
[27]	Wang P, Chen P, Yuan Y, et al. Understanding convolution for semantic segmentation[C]. Lake Tahoe: IEEE Winter Conference on Applications of Computer Vision, 2018:1451-1460.
[28]	Romera E, Alvarez J M, Bergasa L M, et al. ERFNet:Efficient residual factorized convnet for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2017, 19(1):253-272.

Experimental setting	Configuration
GPU	RTX2080Ti
Memory	11 GB
Deep learning framework	PyTorch
Optimizer	Adam
Initial learning rate	0.01
Learning-rate schedule	Poly
Loss function	Cross-entropy
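The configuration uses a Poly learning-rate schedule with an initial rate of 0.01. The usual form of that schedule is lr = base_lr × (1 − iter / max_iter)^power. The table gives neither the exponent nor the total iteration count. The value power = 0.9 below is therefore an assumption, chosen because it is a widely used default. `poly_lr` is a hypothetical helper name.

```python
def poly_lr(base_lr: float, iteration: int, max_iter: int, power: float = 0.9) -> float:
    """Poly schedule: base_lr * (1 - iteration / max_iter) ** power.

    power=0.9 is an assumed default; the paper's exponent is not stated in this table.
    """
    if max_iter <= 0:
        raise ValueError("max_iter must be positive")
    if not 0 <= iteration <= max_iter:
        raise ValueError("iteration must lie in [0, max_iter]")
    return base_lr * (1 - iteration / max_iter) ** power


if __name__ == "__main__":
    # Learning rate at a few points of a hypothetical 1000-iteration run.
    for it in (0, 250, 500, 750, 1000):
        print(it, round(poly_lr(0.01, it, 1000), 6))
```

In PyTorch, the factor `(1 - it / max_it) ** 0.9` could be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`. That scheduler multiplies the optimizer's initial learning rate by the factor at each scheduler step.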

No.	Method	mIoU/%	Parameters/MB
1	CFP(1,2)	58.3	0.39
2	CFP(1,3)	60.9	0.43
3	CFP(2,2)	61.3	0.42
4	CFP(2,6)	62.7	0.58
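All accuracy figures in these tables are mIoU. Below is a minimal sketch of the usual way mIoU is computed from a per-class confusion matrix. It assumes integer class labels and an ignore label of 255; that ignore label is a common Cityscapes convention and is not stated in the paper. Classes that appear in neither the ground truth nor the prediction are left out of the mean, which is one of several conventions in use. The function names are illustrative.

```python
from typing import List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int,
                     ignore_index: int = 255) -> List[List[int]]:
    """Count pixels into a num_classes x num_classes matrix (rows: ground truth, cols: prediction)."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue  # unlabeled pixels do not contribute
        cm[g][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Mean over classes of IoU = TP / (TP + FP + FN), skipping classes with an empty union."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    cm = confusion_matrix([0, 0, 1, 1, 255], [0, 1, 1, 1, 0], num_classes=2)
    print(cm, mean_iou(cm))
```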

No.	CBAM₁	CBAM₂	CBAM₃	CBAM₄	mIoU/%
1	✓				64.1
2	✓	✓			65.3
3	✓	✓	✓		66.9
4	✓	✓	✓	✓	68.1

Model	mIoU/%	Parameters/MB
ENet^[12]	51.2	0.36
DeepLab v2^[11]	65.1	245.70
SegNet^[24]	55.3	28.90
PSPNet^[10]	70.8	250.80
DABNet^[23]	64.5	7.80
ESPNet^[13]	54.6	0.36
ICNet^[14]	66.5	26.30
BiseNet v2^[26]	67.3	49.00
Proposed model	68.1	0.75
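The size comparison in the CamVid table can be spelled out with a few lines of arithmetic. The numbers below are copied from the table, and the parameter unit is kept as reported (MB). `compare_to` is a hypothetical helper. It reports only relative size and mIoU gap, and says nothing about speed or memory.

```python
from typing import Dict, Tuple

# (parameters in MB, CamVid mIoU in %) copied from the comparison table above.
CAMVID: Dict[str, Tuple[float, float]] = {
    "ENet": (0.36, 51.2),
    "DeepLab v2": (245.70, 65.1),
    "SegNet": (28.90, 55.3),
    "PSPNet": (250.80, 70.8),
    "DABNet": (7.80, 64.5),
    "ESPNet": (0.36, 54.6),
    "ICNet": (26.30, 66.5),
    "BiseNet v2": (49.00, 67.3),
    "Proposed": (0.75, 68.1),
}


def compare_to(results: Dict[str, Tuple[float, float]],
               ref: str = "Proposed") -> Dict[str, Tuple[float, float]]:
    """Map each model to (size relative to `ref`, mIoU difference vs `ref` in points)."""
    ref_params, ref_miou = results[ref]
    return {
        name: (params / ref_params, miou - ref_miou)
        for name, (params, miou) in results.items()
    }


if __name__ == "__main__":
    rows = sorted(compare_to(CAMVID).items(), key=lambda kv: kv[1][0])
    for name, (ratio, gap) in rows:
        print(f"{name:12s} {ratio:8.2f}x params  {gap:+5.1f} mIoU points")
```

For example, BiseNet v2 is 49.00 / 0.75 ≈ 65.3 times larger than the proposed model and scores 0.8 points lower. PSPNet is the only listed model with a higher mIoU (70.8%).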

Model	Pretraining	Parameters/MB	Speed/(frame·s^-1)	mIoU/%
DeepLab v2^[11]	ImageNet	245.70	<1	68.3
PSPNet^[10]	ImageNet	250.80	<1	75.1
SegNet^[24]	ImageNet	29.50	15	54.2
ENet^[12]	None	0.36	76	56.1
SQNet^[27]	ImageNet	-	17	57.6
ERFNet^[28]	None	0.36	48	66.2
ICNet^[14]	ImageNet	7.60	30	67.5
BiseNet v2^[26]	ImageNet	49.00	73	71.8
DABNet^[23]	None	0.75	26	68.1
Proposed model	None	0.75	56	75.7
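Speeds in the Cityscapes table are given in frames per second. At 56 frame/s, the per-frame latency is 1000 / 56 ≈ 17.9 ms. Measured frame rates depend on input resolution, batch size and hardware, and the table does not restate any of these. Below is a minimal timing harness. It assumes a hypothetical zero-argument callable `infer` that runs one forward pass. In PyTorch, `infer` would wrap the model call in `torch.no_grad()` and end with `torch.cuda.synchronize()`, so that asynchronous GPU work is included in the timing.

```python
import time
from typing import Callable


def measure_fps(infer: Callable[[], object], warmup: int = 10, runs: int = 100) -> float:
    """Frames per second of a zero-argument inference callable; warm-up calls are not timed."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    for _ in range(warmup):
        infer()  # let caches, allocators and lazy initialization settle
    start = time.perf_counter()
    for _ in range(runs):
        infer()
    elapsed = time.perf_counter() - start
    return runs / elapsed


if __name__ == "__main__":
    fps = measure_fps(lambda: sum(range(10_000)))
    print(f"{fps:.1f} frame/s, {1000 / fps:.2f} ms per frame")
```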