Journal of Xidian University ›› 2022, Vol. 49 ›› Issue (3): 160-170. doi: 10.19665/j.issn1001-2400.2022.03.018

• Computer Science and Technology & Artificial Intelligence •

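Feature enhanced single-stage remote sensing image object detection model

WANG Xili, LIANG Min, LIU Tao

  1. School of Computer Science, Shaanxi Normal University, Xi’an 710119, Shaanxi, China
  • Received: 2021-01-27 Revised: 2021-11-24 Online: 2022-06-20 Published: 2022-07-04
  • About the authors: WANG Xili (b. 1969), female, professor, Ph.D., E-mail: wangxili@snnu.edu.cn | LIANG Min (b. 1997), female, master's student at Shaanxi Normal University, E-mail: liangmin521@snnu.edu.cn | LIU Tao (b. 1995), male, master's student at Shaanxi Normal University, E-mail: 18220765210@163.com
  • Funding:
    Qinghai-Tibet Plateau Scientific Expedition Program of the Ministry of Science and Technology (2019QZKK0405)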

Feature enhanced single-stage remote sensing image object detection model

WANG Xili, LIANG Min, LIU Tao

  1. School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
  • Received: 2021-01-27 Revised: 2021-11-24 Online: 2022-06-20 Published: 2022-07-04

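Abstract:

With the development of convolutional neural networks, object detection in remote sensing images has improved markedly, but the complexity of scenes and the diversity of object sizes and shapes still pose challenges. This paper studies the detection of objects of different sizes under complex conditions. The feature pyramid structure is an effective way to detect objects of different sizes, but passing features layer by layer may lose features, so a feature pyramid module with shortcut (skip) connections is proposed to enhance the semantic and detail information of each level of the pyramid. In addition, using spatial attention to strengthen the features of object regions is an effective way to raise the detection rate and helps detection in complex scenes, but existing spatial attention often also strengthens imprecise predictions, which interferes with the final result. An anchor-based spatial attention module is therefore proposed to strengthen the feature regions that are more likely to yield precise predictions. The two modules are embedded into the RetinaNet model to form FENet (Feature Enhanced Network), an end-to-end feature-enhanced single-stage object detection model for remote sensing images. In experiments on object detection in complex remote sensing imagery, the mAP of FENet is 1.78% higher than that of FAN (Face Attention Network) on the UCAS-AOD dataset and 1.48% higher than that of FAN on the RSOD dataset, and it surpasses other advanced models. In addition, FENet takes 0.058 s to test an 800×800 image on a single Titan X GPU. The experimental results show that, compared with similar models, the proposed model strengthens the feature extraction of objects and thereby improves detection performance.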
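The point about layer-by-layer transfer diluting information can be illustrated with a toy sketch. The abstract does not say how the shortcut connections are wired, so everything below is an assumption made for illustration: 1-D lists stand in for feature maps, a fixed 0.5 average stands in for the learned fusion convolutions, the single shortcut from C5 to P3 is hypothetical, and the names `upsample2x`, `merge` and `top_down` are not from the paper.

```python
from typing import List, Tuple

Feature = List[float]  # 1-D stand-in for a 2-D feature map (assumption)


def upsample2x(x: Feature) -> Feature:
    """Nearest-neighbour upsampling by a factor of two."""
    return [v for v in x for _ in range(2)]


def merge(lateral: Feature, top: Feature) -> Feature:
    """Average two same-sized maps; a toy stand-in for learned fusion."""
    assert len(lateral) == len(top), "maps must have the same resolution"
    return [0.5 * (a + b) for a, b in zip(lateral, top)]


def top_down(c3: Feature, c4: Feature, c5: Feature,
             shortcut: bool = False) -> Tuple[Feature, Feature, Feature]:
    """Build (p3, p4, p5) from backbone maps c3 (fine) to c5 (coarse)."""
    p5 = c5
    p4 = merge(c4, upsample2x(p5))  # C5 reaches P4 at half weight
    p3 = merge(c3, upsample2x(p4))  # C5 reaches P3 at quarter weight
    if shortcut:
        # Hypothetical shortcut: add C5, upsampled 4x, directly into P3.
        direct = upsample2x(upsample2x(c5))
        p3 = [a + b for a, b in zip(p3, direct)]
    return p3, p4, p5


if __name__ == "__main__":
    c3, c4, c5 = [0.0] * 8, [0.0] * 4, [8.0, 0.0]
    print(top_down(c3, c4, c5)[0])                 # layer-by-layer only
    print(top_down(c3, c4, c5, shortcut=True)[0])  # with the shortcut
```

In this toy setting the coarse-level response of 8.0 arrives at the finest level as 2.0 when passed layer by layer, and as 10.0 once the direct path is added. A real model would use learned convolutions rather than fixed weights.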

Keywords: remote sensing image, feature pyramid, spatial attention, anchor box, single-stage object detection

Abstract:

Purpose: The performance of object detection in remote sensing images has improved greatly with the development of convolutional neural networks. However, the complexity of scenes and the diversity of object sizes and shapes remain challenging. This paper therefore studies deep models for detecting objects of different sizes in complex scenes. Methods: Feature pyramids are an effective way to detect objects of different sizes, but transferring features layer by layer may lose information. This paper therefore proposes a feature pyramid network with shortcut connections, which enhances the semantic and detail information of each feature level in the pyramid. Moreover, using spatial attention weights to strengthen likely target areas is an effective way to improve the detection rate, and it helps object detection in complex scenes. Existing spatial attention, however, also strengthens imprecise predictions, which may interfere with the final results. To address this, the paper proposes an anchor-based spatial attention module that mainly strengthens the feature regions more likely to produce accurate predictions. The feature pyramid network with shortcut connections and the anchor-based spatial attention module are embedded into RetinaNet to form an end-to-end feature-enhanced single-stage remote sensing object detection model, FENet (Feature Enhanced Network). Results: On the UCAS-AOD remote sensing dataset, the mAP of FENet is 1.78% higher than that of FAN (Face Attention Network), and on the RSOD dataset it is 1.48% higher than that of FAN. The mAP of FENet also exceeds that of the other compared models. In addition, the test time of FENet for an 800×800-pixel image on a single Titan X GPU is 0.058 s. Conclusions: The experimental results show that the proposed model effectively enhances object feature extraction and thus improves detection performance.
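The contrast between attention over whole object regions and attention restricted to well-matched anchors can be sketched as follows. This is only an illustration under assumptions the abstract does not confirm: one square anchor per feature-map cell, an IoU threshold of 0.5, and the hypothetical helper names `box_mask` and `anchor_mask`. The paper's actual module may be defined differently.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def box_mask(h: int, w: int, stride: int,
             gts: Sequence[Box]) -> List[List[int]]:
    """Mark every cell whose centre falls inside a ground-truth box."""
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            if any(g[0] <= cx < g[2] and g[1] <= cy < g[3] for g in gts):
                mask[y][x] = 1
    return mask


def anchor_mask(h: int, w: int, stride: int, anchor_size: float,
                gts: Sequence[Box], thresh: float = 0.5) -> List[List[int]]:
    """Mark only cells whose square anchor has IoU >= thresh with a box."""
    half = anchor_size / 2.0
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            anchor = (cx - half, cy - half, cx + half, cy + half)
            if any(iou(anchor, g) >= thresh for g in gts):
                mask[y][x] = 1
    return mask


if __name__ == "__main__":
    gts = [(4.0, 4.0, 20.0, 20.0)]  # one made-up 16x16 object
    for row in box_mask(4, 4, 8, gts):
        print(row)
    print()
    for row in anchor_mask(4, 4, 8, 16.0, gts):
        print(row)
```

For the made-up object above, the region mask marks four cells, while the anchor-based mask keeps only the one cell whose anchor overlaps the object closely. That is the sense in which attention is limited to locations more likely to give accurate boxes.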

Key words: remote sensing image, feature pyramids, spatial attention, anchor box, single-stage object detection

CLC number:

  • TP391.4