Electronic Science and Technology (电子科技), 2024, Vol. 37, Issue (10): 64-70. doi: 10.16180/j.cnki.issn1007-7820.2024.10.009

Research on Action Recognition Method Based on Deep Learning

XIN Tenghao (忻腾浩), LI Feifei (李菲菲)

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2023-03-14 Online: 2024-10-15 Published: 2024-11-04
  • About the authors: XIN Tenghao (1996-), male, master's student. Research interest: computer vision.
    LI Feifei (1970-), female, Ph.D., professor. Research interests: multimedia information processing, image processing and pattern recognition, and information retrieval.
  • Supported by:
    The Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (ES2015XX)

Abstract:

The key to deep-learning-based action recognition algorithms lies in improving the accuracy and stability of key point extraction, so that target actions can be recognized more accurately. However, many algorithms simply add an attention mechanism that appears to perform well to the feature extraction stage, overlooking the fact that different attention mechanisms affect different models and tasks differently. This paper therefore proposes a pose estimation model based on different attention mechanisms, and compares the effect of each mechanism on the model to show why the choice of attention mechanism matters. To improve the stability of key point extraction, the model's initialization is also fine-tuned: a check on the weights is added to determine the type of each network layer, so that a more suitable initialization method can be selected and performance improved. Compared with the baseline network model, the proposed model improves every evaluation metric on the CrowdPose dataset in both the multi-scale and non-multi-scale settings, with average precision rising by more than 1% in both cases.

Key words: action recognition, pose estimation, computer vision, graph convolutional neural network, key points, HRNet, attention mechanism, average precision
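
The abstract describes selecting an initialization method according to the type of each network layer, but gives no details. The following is only a minimal sketch of that general idea, not the authors' implementation: the `Layer` class, the `init_layer` and `init_model` functions, the three layer kinds, and the specific rules (He normal for convolutions, identity for normalization layers, small-variance normal for linear layers) are all assumptions chosen for illustration.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Hypothetical stand-in for a network layer; not a real framework class."""
    kind: str        # assumed kinds: "conv", "batchnorm", "linear"
    fan_in: int      # number of inputs feeding each output unit
    n_weights: int   # number of weight values to create
    n_bias: int      # number of bias values to create
    weight: list = field(default_factory=list)
    bias: list = field(default_factory=list)


def init_layer(layer: Layer, rng: random.Random) -> None:
    """Choose an initializer from the layer kind (illustrative rules only)."""
    if layer.kind == "conv":
        # He (Kaiming) normal: std = sqrt(2 / fan_in).
        std = math.sqrt(2.0 / layer.fan_in)
        layer.weight = [rng.gauss(0.0, std) for _ in range(layer.n_weights)]
    elif layer.kind == "batchnorm":
        # Normalization layers start as the identity: scale 1, shift 0.
        layer.weight = [1.0] * layer.n_weights
    elif layer.kind == "linear":
        # Small-variance normal for fully connected layers.
        layer.weight = [rng.gauss(0.0, 0.01) for _ in range(layer.n_weights)]
    else:
        raise ValueError(f"no initializer for layer kind {layer.kind!r}")
    # In this sketch every layer kind starts with zero biases.
    layer.bias = [0.0] * layer.n_bias


def init_model(layers, seed=0):
    """Initialize every layer in order with a seeded generator."""
    rng = random.Random(seed)
    for layer in layers:
        init_layer(layer, rng)
    return layers


if __name__ == "__main__":
    model = init_model([
        Layer("conv", fan_in=27, n_weights=1000, n_bias=8),
        Layer("batchnorm", fan_in=8, n_weights=8, n_bias=8),
        Layer("linear", fan_in=8, n_weights=16, n_bias=2),
    ])
    for layer in model:
        print(layer.kind, len(layer.weight), len(layer.bias))
```

Dispatching on the layer kind keeps the rule for each kind in one place, so a different rule can be swapped in for one kind without touching the others. In a real framework the same dispatch would typically inspect the layer's class rather than a string tag.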

CLC Number:

  • TP391