Electronic Science and Technology ›› 2024, Vol. 37 ›› Issue (4): 77-86. doi: 10.16180/j.cnki.issn1007-7820.2024.04.011


Research on Fast 3D Hand Keypoint Detection Algorithm Based on Anchor

QIN Xiaofei, HE Wen, BAN Dongxian, GUO Hongyu, YU Jing

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2022-12-17 Online: 2024-04-15 Published: 2024-04-19
  • About the authors: QIN Xiaofei (b. 1982), male, Ph.D., senior engineer. Research interest: artificial intelligence algorithms.
    HE Wen (b. 1996), male, master's student. Research interests: human-robot collaboration, computer vision.
  • Supported by:
    National Natural Science Foundation of China (92048205); China Scholarship Council (202008310014)

Abstract:

In human-robot collaboration tasks, hand keypoint detection provides target point coordinates for the robotic arm. A2J (Anchor-to-Joint) is a representative method that uses anchor points for keypoint detection. A2J takes a depth map as input and achieves good detection results, but its ability to capture global features is limited. In this study, a Global-Local Feature Fusion (GLFF) module is designed to fuse the shallow and deep features of the backbone network. To improve detection speed, the backbone network of A2J is replaced with a modified ShuffleNetv2 in which the 3×3 depthwise separable convolutions are replaced with 5×5 depthwise separable convolutions; this enlarges the receptive field and improves the backbone network's ability to extract global features. An Efficient Channel Attention (ECA) module is introduced into the anchor weight estimation branch to increase the network's attention to important anchor points. Training and testing on the mainstream ICVL and NYU datasets show that, compared with A2J, the proposed method reduces the mean error by 0.09 mm and 0.15 mm, respectively. A detection rate of 151 frame·s⁻¹ is achieved on a GTX1080Ti graphics card, which meets the real-time requirement of human-robot collaboration tasks.

Key words: human-robot collaboration, 3D hand keypoint detection, anchor point, depth map, global-local feature fusion, ShuffleNetv2, depthwise separable convolution, efficient channel attention

CLC Number:

  • TP391.41
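
The abstract describes A2J as predicting keypoints from anchor points, with a branch that estimates how much each anchor should count. A minimal sketch of that idea follows, assuming the usual anchor-to-joint formulation in which every anchor proposes a joint position (its own location plus a predicted offset) and the proposals are averaged with softmax-normalized anchor weights. The function names, argument layout, and toy values are hypothetical and chosen for illustration; they are not taken from the paper's implementation, and the GLFF, ShuffleNetv2, and ECA components are not modeled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_joint(anchors, offsets, depths, responses):
    """Estimate one joint as a softmax-weighted vote over all anchors.

    anchors   : list of (x, y) anchor positions in pixels
    offsets   : list of (dx, dy) in-plane offsets predicted per anchor
    depths    : list of depth values (mm) predicted per anchor
    responses : list of raw anchor scores (hypothetical output of the
                anchor weight estimation branch)
    """
    weights = softmax(responses)
    x = sum(w * (a[0] + o[0]) for w, a, o in zip(weights, anchors, offsets))
    y = sum(w * (a[1] + o[1]) for w, a, o in zip(weights, anchors, offsets))
    z = sum(w * d for w, d in zip(weights, depths))
    return x, y, z


# Toy data (assumed values): four anchors on a 4-pixel grid whose offsets
# all point at the same location, with equal raw scores.
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
offsets = [(2.0, 2.0), (-2.0, 2.0), (2.0, -2.0), (-2.0, -2.0)]
depths = [300.0, 300.0, 300.0, 300.0]
responses = [0.0, 0.0, 0.0, 0.0]

joint = aggregate_joint(anchors, offsets, depths, responses)
print(joint)
```

In this toy case every anchor proposes the same point, so the weighted vote returns that point regardless of the weights. When the proposals disagree, raising one anchor's score pulls the estimate toward that anchor's proposal, which is why the quality of the anchor weight estimation branch matters for the final error.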