Journal of Xidian University ›› 2024, Vol. 51 ›› Issue (6): 25-39. doi: 10.19665/j.issn1001-2400.20240909

• Information and Communication Engineering •

Dual attention pedestrian detector for occlusion scenario based on feature calibration

TANG Shuyuan1,2,3,4, ZHOU Yiqing1,2,3,4, LI Jintao1,2,3,4, LIU Chang1,2,4, SHI Jinglin1,2,3,4

  1. State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    2. Wireless Communication Technology Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    3. University of Chinese Academy of Sciences, Beijing 100049, China
    4. Beijing Key Laboratory of Mobile Computing and Pervasive Device, Beijing 100190, China
  • Received: 2024-01-06 Online: 2024-12-20 Published: 2024-10-08
  • Corresponding author: ZHOU Yiqing (born 1975), female, researcher, E-mail: zhouyiqing@ict.ac.cn
  • About the authors: TANG Shuyuan (born 1985), female, Ph.D. candidate at the University of Chinese Academy of Sciences, E-mail: tangshuyuan20b@ict.ac.cn;
    LI Jintao (born 1962), male, researcher, E-mail: jtli@ict.ac.cn;
    LIU Chang (born 1986), male, assistant researcher, E-mail: liuchang@ict.ac.cn;
    SHI Jinglin (born 1972), male, researcher, E-mail: sjl@ict.ac.cn
  • Supported by:
    National Natural Science Foundation of China (U21A20449); Jiangsu Provincial Key Research and Development Program (BE2021013-2)

Abstract:

One of the main challenges in vision-based pedestrian detection is occlusion, which includes inter-class occlusion of pedestrians by objects in the natural environment and intra-class occlusion between pedestrians. These intertwined occlusion patterns limit the performance of pedestrian detectors. To address this problem, this paper proposes a dual-attention detection network based on feature calibration, built on the standard Faster R-CNN pedestrian detection framework. The network first generates attention masks through supervised learning to represent the spatial features of pedestrians in the image. The masks are then fused with the backbone features and combined with a channel attention mechanism to calibrate pedestrian regions. This approach enhances the visible regions of pedestrians while weakening the interference of occluded parts with classification and regression. In addition, a non-uniform sampling strategy based on occlusion rate is proposed that specifically samples hard examples, helping the network learn complex occlusion patterns more effectively. Experimental results show that, compared with the standard pedestrian detector, the proposed method improves performance by about 2.5% on the reasonable occlusion subset of the CityPersons validation set.

Key words: convolutional neural network, pedestrian detection, dual attention mechanism, feature calibration, hard example mining, occlusion rate
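
The feature calibration step described in the abstract (a spatial mask fused with backbone features, followed by channel attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the mask is fused or which channel attention variant is used, so element-wise multiplication and a squeeze-and-excitation style gate are assumptions, and the names `calibrate_features`, `mask_logits`, `w1`, and `w2` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def calibrate_features(features, mask_logits, w1, w2):
    """Apply spatial, then channel attention to a feature map.

    features:    (C, H, W) backbone feature map.
    mask_logits: (H, W) raw output of a supervised mask branch
                 (hypothetical; high values mean "visible pedestrian").
    w1, w2:      (C // r, C) and (C, C // r) weights of a two-layer
                 gating MLP (assumed squeeze-and-excitation style).
    """
    # Spatial attention: squash the mask to (0, 1) and scale every channel.
    # Fusion by multiplication is an assumption, not taken from the paper.
    spatial = sigmoid(mask_logits)
    fused = features * spatial[None, :, :]

    # Channel attention: global average pool -> ReLU MLP -> sigmoid gate.
    squeezed = fused.mean(axis=(1, 2))
    hidden = np.maximum(w1 @ squeezed, 0.0)
    gate = sigmoid(w2 @ hidden)

    # Re-weight each channel of the spatially attended features.
    return fused * gate[:, None, None]


# Usage: the left half of the map is treated as occluded (low mask logits).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
features = rng.standard_normal((C, H, W))
mask_logits = np.full((H, W), -10.0)
mask_logits[:, 2:] = 10.0
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
calibrated = calibrate_features(features, mask_logits, w1, w2)
print(calibrated.shape)
```

Because both gates lie in (0, 1), responses under low mask values are pushed toward zero while responses in high-mask regions are largely preserved, which is the qualitative effect the abstract attributes to calibration.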

CLC Number:

  • TP391
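
The occlusion-rate-based non-uniform sampling mentioned in the abstract can be sketched as follows. The abstract does not give the weighting function, so the linear weight `1 + alpha * rate`, the parameter `alpha`, and the function names are hypothetical. The occlusion rate is assumed to be one minus the ratio of the visible-box area to the full-box area.

```python
import numpy as np


def occlusion_rate(full_box, visible_box):
    """Return 1 - visible_area / full_area for (x1, y1, x2, y2) boxes."""
    full_area = max(full_box[2] - full_box[0], 0.0) * max(full_box[3] - full_box[1], 0.0)
    if full_area == 0.0:
        return 0.0
    vis_area = max(visible_box[2] - visible_box[0], 0.0) * max(visible_box[3] - visible_box[1], 0.0)
    return float(np.clip(1.0 - vis_area / full_area, 0.0, 1.0))


def sampling_probs(rates, alpha=2.0):
    """Map occlusion rates to sampling probabilities.

    The weight 1 + alpha * rate is an assumed form: it keeps every example
    reachable while favouring heavily occluded (hard) ones.
    """
    weights = 1.0 + alpha * np.asarray(rates, dtype=float)
    return weights / weights.sum()


def sample_examples(rates, k, alpha=2.0, seed=0):
    """Draw k distinct example indices, biased toward high occlusion."""
    rng = np.random.default_rng(seed)
    probs = sampling_probs(rates, alpha)
    return rng.choice(len(probs), size=k, replace=False, p=probs)


# Usage: three pedestrians with increasing occlusion.
full = (0.0, 0.0, 10.0, 20.0)
visible_boxes = [(0.0, 0.0, 10.0, 20.0), (0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 5.0, 10.0)]
rates = [occlusion_rate(full, v) for v in visible_boxes]
print(rates, sampling_probs(rates))
print(sample_examples(rates, k=2))
```

Setting `alpha` to 0 recovers uniform sampling, so the parameter can be read as how strongly training is skewed toward occluded pedestrians.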