Journal of Xidian University, 2023, Vol. 50, Issue (1): 118-128. doi: 10.19665/j.issn1001-2400.2023.01.014
LIU Bochong, CAI Huaiyu, YANG Shiyuan, LI Haotian, WANG Yi, CHEN Xiaodong
Received: 2022-03-04
Online: 2023-02-20
Published: 2023-03-21
Corresponding author: CAI Huaiyu (born 1965), female, professor
About the author: LIU Bochong (born 1998), male, master's student at Tianjin University
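Abstract:
In autonomous driving scenarios, semantic segmentation models deployed on in-vehicle hardware face limited memory and insufficient computing power, so a model that balances efficiency and accuracy well is needed. Adopting a single-branch structure, this paper designs a lightweight multi-scale bidirectional attention network. To extract features efficiently, a lightweight convolution unit is designed to form the feature-extraction backbone of the network. To better locate and segment objects of widely varying scale in road scenes, a multi-scale bidirectional attention module is proposed. It has a global multi-scale receptive field and, while encoding channel attention along one direction, preserves spatial position information along the other direction. Based on this attention module, a skip attention connection module and a feature attention fusion module are designed so that the output features carry both detail information and semantic information. On the Cityscapes dataset, the model achieves a mean intersection over union (mIoU) of 71.86% with 0.9 M parameters and an inference speed of 88 FPS on a single RTX 2080 Ti GPU. The experimental results show that the model achieves high segmentation accuracy, is suitable for deployment and application on in-vehicle hardware, and has practical value.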
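The idea of encoding attention along one direction while keeping position information along the other can be illustrated with a small NumPy sketch. It is not the paper's module: the multi-scale receptive field is omitted, and the function name `directional_attention` and the weight matrices `w_h` and `w_w` are hypothetical stand-ins for learned layers. Pooling over the width leaves one descriptor per channel and row, pooling over the height leaves one per channel and column, and the two resulting gates reweight the input feature map.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def directional_attention(x, w_h, w_w):
    """Reweight a feature map with two direction-aware gates.

    x   : array of shape (C, H, W), a single feature map.
    w_h : array of shape (C, C), hypothetical channel-mixing weights
          for the row (height) branch.
    w_w : array of shape (C, C), hypothetical channel-mixing weights
          for the column (width) branch.
    """
    # Average over the width: shape (C, H). Row positions are preserved.
    pooled_h = x.mean(axis=2)
    # Average over the height: shape (C, W). Column positions are preserved.
    pooled_w = x.mean(axis=1)

    # Mix channels and squash to (0, 1) to obtain one gate per direction.
    gate_h = sigmoid(w_h @ pooled_h)  # (C, H)
    gate_w = sigmoid(w_w @ pooled_w)  # (C, W)

    # Broadcast both gates back over the full (C, H, W) map.
    return x * gate_h[:, :, None] * gate_w[:, None, :]


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 8))
w_zero = np.zeros((4, 4))
out = directional_attention(x, w_zero, w_zero)
print(out.shape)  # (4, 6, 8)
```

With all-zero weights both gates equal 0.5 everywhere, so the output is the input scaled by 0.25, which makes a convenient sanity check before plugging in learned weights.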
LIU Bochong, CAI Huaiyu, YANG Shiyuan, LI Haotian, WANG Yi, CHEN Xiaodong. Lightweight semantic segmentation network for autonomous driving scenarios[J]. Journal of Xidian University, 2023, 50(1): 118-128.
Table 5
Comparison with recent lightweight semantic segmentation models on the Cityscapes test set
Algorithm | Backbone | mIoU/% | Parameters/M | FLOPs/G | FPS |
---|---|---|---|---|---|
CGNet | - | 65.99 | 0.50 | 6.98 | 83 |
LEDNet | - | 69.02 | 0.92 | 11.46 | 71 |
ESNet | - | 69.10 | 1.66 | 23.24 | 67 |
ERFNet | - | 68.13 | 2.06 | 25.79 | 55 |
BiSeNet | Xception39 | 69.00 | 5.80 | 2.90 | 71 |
DFANet | Xception A | 70.30 | 7.80 | 2.10 | 77 |
ICNet | ResNet50 | 69.50 | 26.13 | 9.63 | 37 |
LMBANet | - | 71.86 | 0.90 | 10.87 | 88 |
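One way to read Table 5 is to ask which models are not beaten on every column at once. The sketch below, a minimal illustration rather than an analysis from the paper, copies the table values and keeps the models that no other model dominates across mIoU (higher is better), parameters and FLOPs (lower is better), and FPS (higher is better). The `Model` tuple and the helper names `dominates` and `pareto_front` are assumptions introduced for this example, and the comparison takes the reported numbers at face value.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    miou: float      # mean intersection over union, %
    params_m: float  # parameters, millions
    gflops: float    # floating-point operations, G
    fps: float       # frames per second


# Values copied from Table 5.
TABLE5 = [
    Model("CGNet", 65.99, 0.50, 6.98, 83),
    Model("LEDNet", 69.02, 0.92, 11.46, 71),
    Model("ESNet", 69.10, 1.66, 23.24, 67),
    Model("ERFNet", 68.13, 2.06, 25.79, 55),
    Model("BiSeNet", 69.00, 5.80, 2.90, 71),
    Model("DFANet", 70.30, 7.80, 2.10, 77),
    Model("ICNet", 69.50, 26.13, 9.63, 37),
    Model("LMBANet", 71.86, 0.90, 10.87, 88),
]


def dominates(a: Model, b: Model) -> bool:
    """True if `a` is no worse than `b` on every metric and better on one."""
    no_worse = (
        a.miou >= b.miou
        and a.params_m <= b.params_m
        and a.gflops <= b.gflops
        and a.fps >= b.fps
    )
    strictly_better = (
        a.miou > b.miou
        or a.params_m < b.params_m
        or a.gflops < b.gflops
        or a.fps > b.fps
    )
    return no_worse and strictly_better


def pareto_front(models: List[Model]) -> List[Model]:
    """Keep the models that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


front_names = [m.name for m in pareto_front(TABLE5)]
print(front_names)
```

Under these four criteria the non-dominated set is CGNet, BiSeNet, DFANet, and LMBANet. LMBANet is the only entry that has both the highest mIoU and the highest FPS in the table, while CGNet keeps the smallest parameter count and DFANet the lowest FLOPs.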
References:
[1] WANG Ziyi, SU Yuting, LIU Yanyan, et al. An Improved DeeplabV3 Network Smoke Segmentation Algorithm[J]. Journal of Xidian University, 2019, 46(6): 52-59.
[2] HUI Haisheng, ZHANG Xueying, WU Zelin, et al. A Stroke Lesion Segmentation Method with Main and Auxiliary Path Attention Compensation[J]. Journal of Xidian University, 2021, 48(4): 200-208.
[3] SHELHAMER E, LONG J, DARRELL T. Fully Convolutional Networks for Semantic Segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39: 640-651. doi: 10.1109/TPAMI.2016.2572683 pmid: 27244717
[4] LI Y, SHI T, ZHANG Y, et al. Learning Deep Semantic Segmentation Network under Multiple Weakly-Supervised Constraints for Cross-Domain Remote Sensing Image Semantic Segmentation[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 175: 20-33. doi: 10.1016/j.isprsjprs.2021.02.009
[5] YUAN X, SHI J, GU L. A Review of Deep Learning Methods for Semantic Segmentation of Remote Sensing Imagery[J]. Expert Systems with Applications, 2021, 169: 114417. doi: 10.1016/j.eswa.2020.114417
[6] WANG Y, ZHOU Q, LIU J, et al. LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation[C]// Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP). Piscataway: IEEE, 2019: 1860-1864.
[7] WANG Y, ZHOU Q, XIONG J, et al. ESNet: An Efficient Symmetric Network for Real-Time Semantic Segmentation[C]// Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV). Cham: Springer, 2019: 41-52.
[8] ROMERA E, ALVAREZ J M, BERGASA L M, et al. ERFNet: Efficient Residual Factorized Convnet for Real-Time Semantic Segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2017, 19(1): 263-272. doi: 10.1109/TITS.2017.2750080
[9] WANG K, YANG J, YUAN S, et al. A Lightweight Network with Attention Decoder for Real-Time Semantic Segmentation[J]. The Visual Computer, 2021, 38(7): 2329-2339. doi: 10.1007/s00371-021-02115-4
[10] ZHUANG M, ZHONG X, GU D, et al. LRDNet: A Lightweight and Efficient Network with Refined Dual Attention Decoder for Real-Time Semantic Segmentation[J]. Neurocomputing, 2021, 459: 349-360. doi: 10.1016/j.neucom.2021.07.019
[11] YU C, WANG J, PENG C, et al. BiSeNet: Bilateral Segmentation Network for Real-Time Semantic Segmentation[C]// Proceedings of the European Conference on Computer Vision (ECCV). Cham: Springer, 2018: 334-349.
[12] WU T, TANG S, ZHANG R, et al. CGNet: A Light-weight Context Guided Network for Semantic Segmentation[J]. IEEE Transactions on Image Processing, 2021, 30: 1169-1179. doi: 10.1109/TIP.83
[13] GAO G, XU G, YU Y, et al. MSCFNet: A Lightweight Network with Multi-Scale Context Fusion for Real-Time Semantic Segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(12): 25489-25499. doi: 10.1109/TITS.2021.3098355
[14] YANG Q, CHEN T, FAN J, et al. EADNet: Efficient Asymmetric Dilated Network for Semantic Segmentation[C]// Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE, 2021: 2315-2319.
[15] HE K, ZHANG X, REN S, et al. Identity Mappings in Deep Residual Networks[C]// Proceedings of the European Conference on Computer Vision. Cham: Springer, 2016: 630-645.
[16] CHOLLET F. Xception: Deep Learning with Depthwise Separable Convolutions[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 1800-1807.
[17] PASZKE A, CHAURASIA A, KIM S, et al. ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation[J/OL]. [2016-06-07]. https://doi.org/10.48550/arXiv.1606.02147.
[18] ZHANG X, ZHOU X, LIN M, et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6848-6856.
[19] CHENG Lei, WANG Yue, TIAN Chunna. A Visual Object Tracking Algorithm with Residual Attention Mechanism[J]. Journal of Xidian University, 2020, 47(6): 148-157.
[20] SONG Jianfeng, MIAO Qiguang, WANG Chongxiao, et al. Multi-Scale Single Target Tracking Algorithm Based on Attention Mechanism[J]. Journal of Xidian University, 2021, 48(5): 110-116.
[21] CAO Y, XU J, LIN S, et al. GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Piscataway: IEEE, 2019: 1971-1980.
[22] CHANG Xinxu, ZHANG Yang, YANG Lin, et al. A Speech Enhancement Method Integrating Multi-Head Self-Attention Mechanism[J]. Journal of Xidian University, 2020, 47(1): 104-110.
[23] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional Block Attention Module[C]// Proceedings of the European Conference on Computer Vision (ECCV). Cham: Springer, 2018: 3-19.
[24] HU J, SHEN L, SUN G. Squeeze-and-Excitation Networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011-2023. doi: 10.1109/TPAMI.2019.2913372 pmid: 31034408
[25] TIAN Z, HE T, SHEN C, et al. Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 3121-3130.
[26] CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes Dataset for Semantic Urban Scene Understanding[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 3213-3223.
[27] KINGMA D P, BA J. Adam: A Method for Stochastic Optimization[J/OL]. [2017-01-30]. https://doi.org/10.48550/arXiv.1412.6980.
[28] REZATOFIGHI H, TSOI N, GWAK J, et al. Generalized Intersection over Union: A Metric and a Loss for Bounding Box Regression[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 658-666.
[29] LI H, XIONG P, FAN H, et al. DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 9514-9523.
[30] ZHAO H, QI X, SHEN X, et al. ICNet for Real-Time Semantic Segmentation on High-Resolution Images[C]// Proceedings of the European Conference on Computer Vision (ECCV). Cham: Springer, 2018: 418-434.
[31] HE K, ZHANG X, REN S, et al. Deep Residual Learning for Image Recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.