Electronic Science and Technology ›› 2023, Vol. 36 ›› Issue (12): 39-45. doi: 10.16180/j.cnki.issn1007-7820.2023.12.006


Research on Image Segmentation Algorithm Based on Channel Feature Pyramid

SUN Hong, YANG Chen, MO Guangping

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2022-07-07 Publication date: 2023-12-15 Release date: 2023-12-05
  • About the authors: SUN Hong (b. 1964), female, Ph.D., associate professor. Research interests: control science and engineering, pattern recognition and intelligent systems. | YANG Chen (b. 1998), male, master's student. Research interests: computer vision and image processing. | MO Guangping (b. 1997), female, master's student. Research interests: computer vision and image processing.
  • Supported by:
    National Natural Science Foundation of China (61170277); National Natural Science Foundation of China (61472256); National Natural Science Foundation of China (61703277)

Abstract:

Semantic segmentation networks often carry a high computational cost and a large number of redundant parameters. To address this, this study proposes a channel feature pyramid module and, combining it with a lightweight attention mechanism, builds a network for real-time semantic segmentation. The channel feature pyramid module provides a sufficiently large receptive field and makes dense use of context information: starting from the second channel, feature maps are combined step by step by summation and then concatenated to form the final hierarchical feature map. A convolutional block attention mechanism is added after the regular convolutional layers to improve segmentation accuracy. Without any pre-training or post-processing, the algorithm achieves a segmentation accuracy of 68.1% on the CamVid data set with only 0.75 MB of parameters and 5.3 MB of memory on a single GTX2080Ti, and a mean intersection over union of 75.7% on the Cityscapes data set at an inference speed of 56 frames per second.
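The fusion step described in the abstract (summation starting from the second channel, followed by concatenation) can be illustrated with a minimal sketch. It is not the authors' implementation: the name `hierarchical_fusion`, the use of flattened lists in place of real feature maps, and the reading of "step by step" as a running sum over branches are all assumptions made for illustration. How each branch is produced is not stated in the abstract and is left out.

```python
from typing import List

FeatureMap = List[float]  # a flattened feature map, used here only for illustration


def hierarchical_fusion(branches: List[FeatureMap]) -> List[float]:
    """Sum branch outputs cumulatively from the second branch, then concatenate.

    Hypothetical helper: the running-sum interpretation is an assumption.
    """
    if not branches:
        raise ValueError("at least one branch is required")
    length = len(branches[0])
    if any(len(b) != length for b in branches):
        raise ValueError("all branches must have the same shape")

    fused = [list(branches[0])]  # the first branch passes through unchanged
    for branch in branches[1:]:
        previous = fused[-1]
        # each later branch is added to the running sum of the earlier ones
        fused.append([p + x for p, x in zip(previous, branch)])

    # concatenate the fused maps along the (here flattened) channel axis
    return [value for fmap in fused for value in fmap]


branches = [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]]
print(hierarchical_fusion(branches))
# → [1.0, 2.0, 11.0, 22.0, 111.0, 222.0]
```

Under this reading, each later slice of the concatenated output mixes information from all earlier branches, which is one way the module could build a hierarchical feature map from several receptive-field sizes.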

Key words: prediction task, semantic segmentation, inference speed, channel features, attention mechanism, receptive field, context information, mean intersection over union
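For readers unfamiliar with the mean intersection over union metric reported in the abstract, the sketch below shows the usual definition: per-class intersection over union, averaged over classes. The function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration; the paper's exact evaluation protocol is not given here.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average per-class IoU over classes that occur in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # pixels labelled c in both prediction and ground truth
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # pixels labelled c in either prediction or ground truth
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: ignore classes absent from both
            ious.append(intersection / union)

    if not ious:
        raise ValueError("no class from range(num_classes) occurs in the inputs")
    return sum(ious) / len(ious)


pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
# class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12
print(mean_iou(pred, target, num_classes=2))
```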

CLC Number:

  • TP391.41