Journal of Xidian University ›› 2021, Vol. 48 ›› Issue (4): 151-158. doi: 10.19665/j.issn1001-2400.2021.04.020

• Computer Science and Technology & Cyberspace Security •

Vehicle detection in UAV video combined with an interframe target regression network

ZHANG Zhi1, ZHENG Jin2

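  1. School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
  2. School of Computer, Beihang University, Beijing 100191, China
  • Received: 2020-05-19 Online: 2021-08-30 Published: 2021-08-31
  • Corresponding author: ZHENG Jin
  • About the author: ZHANG Zhi (born 1993), male, assistant experimentalist, E-mail: zhangz@cauc.edu.cn
  • Funding:
    Fundamental Research Funds for the Central Universities, Civil Aviation University of China Special Program (3122019123)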

Interframe target regression network for vehicle detection in UAV video

ZHANG Zhi1, ZHENG Jin2

  1. School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
  2. School of Computer, Beihang University, Beijing 100191, China
  • Received: 2020-05-19 Online: 2021-08-30 Published: 2021-08-31
  • Contact: Jin ZHENG

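Abstract (translated from Chinese):

UAV video offers flexible viewing angles, a continuous field of view, and wide surveillance coverage, but it also suffers from densely distributed targets and strong motion noise, which make accurate target detection difficult. To address these problems, a UAV video vehicle detection algorithm combined with an interframe target regression network is proposed. Because vehicles are densely distributed in UAV video, soft non-maximum suppression is proposed as the detection-box merging strategy for fully convolutional one-stage object detection (FCOS), and a single-frame vehicle detector is built on this basis. A single-frame detector applied directly to video is easily disturbed by motion noise, which causes the confidence of the same target to vary; to handle this, an interframe target regression network is designed that uses interframe motion continuity to fuse the target features of several adjacent frames, then matches the fused features against the target features of the current frame and regresses the prediction. Finally, the predictions are corrected with the single-frame detection results, which improves detection performance. A more comprehensive UAV video vehicle dataset is built by filtering, merging, and supplementing the annotations of existing UAV datasets. On this dataset, the average precision of the method is about 2% and 5% higher than that of FCOS and of flow-guided feature aggregation for video object detection (FGFA), respectively, reaching 47.42%. Experimental results show that the method outperforms video object detection algorithms such as FCOS and FGFA and has better robustness and generalization.

Key words: UAV video, vehicle detection, interframe motion, feature fusion, interframe target fusion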

Abstract:

UAV video offers a flexible viewing angle, a continuous field of view, and a wide monitoring scope, but it also suffers from crowded targets and strong motion noise, which make accurate target detection difficult. To solve these problems, this paper proposes a video vehicle detection algorithm based on an interframe target regression network. Because vehicles are crowded in UAV video, soft non-maximum suppression is proposed as the detection-box merging strategy of FCOS, and a single-frame vehicle detector is constructed on this basis. A single-frame detector applied directly to video detection is easily disturbed by motion noise, which causes the confidence level of the same target to vary; to deal with this problem, an interframe target regression network is designed. The network fuses the target features of multiple adjacent frames by exploiting interframe motion continuity, and matches the fused features with the target features of the current frame to output the prediction results. Finally, the detection performance is improved by correcting the prediction results with the single-frame detection results. Compared with FCOS and FGFA, the average precision of the proposed algorithm is improved by about 2% and 5%, respectively, reaching 47.42%. Experimental results show that the algorithm outperforms the existing FCOS and FGFA and has better robustness and generalization.
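The soft non-maximum suppression step named in the abstract can be illustrated with a minimal sketch. The abstract does not state which decay function or parameters the authors use, so the Gaussian decay, the `sigma` and `score_thresh` values, and the names `iou` and `soft_nms` below are assumptions made for illustration only, not the paper's implementation.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(
    boxes: List[Box],
    scores: List[float],
    sigma: float = 0.5,          # assumed Gaussian decay width
    score_thresh: float = 0.001  # assumed cutoff for discarding boxes
) -> List[Tuple[Box, float]]:
    """Gaussian Soft-NMS: decay the scores of overlapping boxes
    instead of deleting them outright, as hard NMS would."""
    candidates = list(zip(boxes, scores))
    kept: List[Tuple[Box, float]] = []
    while candidates:
        # Select the highest-scoring remaining box.
        best_idx = max(range(len(candidates)), key=lambda i: candidates[i][1])
        best_box, best_score = candidates.pop(best_idx)
        kept.append((best_box, best_score))
        # Down-weight every other box according to its overlap with it.
        rescored = []
        for box, score in candidates:
            overlap = iou(best_box, box)
            new_score = score * math.exp(-(overlap ** 2) / sigma)
            if new_score >= score_thresh:
                rescored.append((box, new_score))
        candidates = rescored
    return kept
```

In a crowded scene, two adjacent vehicles can produce boxes that overlap heavily. Hard NMS with a fixed IoU threshold would delete the lower-scoring one, whereas this sketch keeps it with a reduced score, which is the property that makes score decay attractive for densely distributed targets.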

Key words: UAV video, vehicle detection, interframe motion, feature fusion, interframe target regression

CLC Number:

  • TP391.4