Electronic Science and Technology (电子科技), 2025, Vol. 38, Issue (5): 83-88. doi: 10.16180/j.cnki.issn1007-7820.2025.05.012


Dual-Path Real-Time Semantic Segmentation Network with Channel-Level Global Attention Mechanism

HU Jinlei1, TIAN Engang1, QU Feng2

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  2. Changzhou Housing Provident Fund Management Center, Changzhou, Jiangsu 213003, China
  • Received: 2023-11-14 Revised: 2023-12-13 Online: 2025-05-15 Published: 2025-05-14
  • Contact: TIAN Engang E-mail: tianengang@163.com
  • About the author: HU Jinlei (1997-), male, master's student. Research interest: computer vision.
  • Supported by:
    National Natural Science Foundation of China (62173231)

Abstract:

Mainstream real-time semantic segmentation methods suffer from poor multi-scale feature extraction, the limited feature extraction capacity of lightweight backbone networks, and a lack of effective fusion of context information. To address these problems, this study proposes a dual-path real-time semantic segmentation model with a channel-level global attention mechanism. The Fast Pyramid Pooling Module (FPPM) is a lightweight redesign of the Pyramid Pooling Module (PPM) that speeds up the module while preserving multi-scale information extraction. A spatial information branch compensates for the performance loss caused by using a lightweight backbone network. The channel global attention mechanism fuses the spatial information with the semantic information extracted by the backbone network and lets global information interact, which improves the segmentation performance of the model. Without pre-training on other datasets, the proposed model achieves a mean intersection over union (mIoU) of 73.8% on the PASCAL VOC2012 validation set with 14.6 MB of parameters, and runs at 43 frame·s⁻¹ on an NVIDIA TITAN Xp, which indicates that the model strikes a good balance between accuracy and speed.
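
The accuracy figure reported in the abstract is a mean intersection over union (mIoU, 平均交并比). As a reminder of what that metric measures, the following is a minimal sketch of the standard confusion-matrix definition. It is not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou` are hypothetical, the inputs are flattened label lists, and the `ignore_index=255` default assumes the usual PASCAL VOC convention of marking void pixels with 255.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count pixels per (ground-truth class, predicted class) pair.

    pred and target are flat sequences of integer class labels.
    Pixels whose ground-truth label equals ignore_index are skipped
    (assumed void-label convention, not taken from the paper).
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        matrix[t][p] += 1  # rows: ground truth, columns: prediction
    return matrix


def mean_iou(matrix):
    """Average IoU = TP / (TP + FP + FN) over classes that occur at all."""
    num_classes = len(matrix)
    ious = []
    for c in range(num_classes):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                                  # missed pixels of class c
        fp = sum(matrix[r][c] for r in range(num_classes)) - tp   # pixels wrongly labelled c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with 3 classes; the last pixel is void and is ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
print(mean_iou(confusion_matrix(pred, target, num_classes=3)))
```

In the toy example the per-class IoUs are 1/2, 2/3 and 1, so the mean is 13/18. A dataset-level mIoU is normally obtained by accumulating one confusion matrix over all validation images before averaging, rather than averaging per-image scores.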

Key words: semantic segmentation, dual-path, attention mechanism, multi-scale pooling, real-time, deep learning, information fusion, lightweight

CLC Number:

  • TP391