Electronic Science and Technology (电子科技), 2022, Vol. 35, Issue (4): 35-39. doi: 10.16180/j.cnki.issn1007-7820.2022.04.006


Video Retrieval Algorithm Based on 3D Convolution and Hash Method

Hanqing CHEN, Feifei LI, Qiu CHEN

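  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2020-11-24 Published: 2022-04-15 Released online: 2022-04-15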
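  • About the authors: CHEN Hanqing (1995-), male, master's student; research interests: image processing and pattern recognition. | LI Feifei (1970-), female, Ph.D., professor; research interests: multimedia information processing, image processing and pattern recognition, information retrieval. | CHEN Qiu (1972-), male, Ph.D., professor, doctoral supervisor; research interests: image processing and pattern recognition, computer vision, information retrieval.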
  • Funding:
    The Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (ES2015XX)

Video Retrieval Algorithm Based on 3D Convolution and Hash Method

Hanqing CHEN,Feifei LI,Qiu CHEN   

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received:2020-11-24 Online:2022-04-15 Published:2022-04-15
  • Supported by:
    The Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (ES2015XX)

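Abstract:

The main difference between video retrieval and other kinds of multimedia retrieval is that videos carry a large amount of information, so computing the similarity between videos is expensive. In addition, video feature extraction often ignores the temporal correlation between frames, which leaves the features incomplete and lowers retrieval accuracy. To address these problems, this paper proposes a video retrieval method based on 3D convolution and hashing. The method builds an end-to-end framework that uses a 3D convolutional neural network to extract features from representative frames of a video, maps the video features into a low-dimensional Hamming space, and computes similarity in that space. Experimental results on two video datasets show that the proposed method substantially improves accuracy over current state-of-the-art video retrieval algorithms.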

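Keywords: video retrieval, 3D convolution, feature representation, hashing, supervised learning, feature dimensionality reduction, Hamming space, similarity matching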
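The abstract attributes weak video features to ignoring the temporal correlation between frames. A 3D convolution addresses this because its kernel spans several consecutive frames as well as a spatial window. The sketch below is a minimal, single-channel illustration under stated assumptions: it is not the paper's network, and the function name `conv3d_valid`, the stride of 1 and the absence of padding are choices made here for illustration only.

```python
from typing import List

# A clip is indexed as clip[t][y][x]: frames, rows, columns (single channel).
Volume = List[List[List[float]]]


def conv3d_valid(clip: Volume, kernel: Volume) -> Volume:
    """Single-channel 3D cross-correlation with stride 1 and no padding.

    Each output value sums over kt consecutive frames as well as a
    kh x kw spatial window, so it can respond to change over time,
    which a 2D filter applied to one frame at a time cannot.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


if __name__ == "__main__":
    # Three 1x3 frames in which a bright pixel moves one step right per frame.
    clip = [[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0]]]
    # Temporal-difference kernel (kt=2, kh=kw=1): next frame minus current frame.
    kernel = [[[-1.0]], [[1.0]]]
    print(conv3d_valid(clip, kernel))  # [[[-1.0, 1.0, 0.0]], [[0.0, -1.0, 1.0]]]
```

The temporal-difference kernel responds only where the pixel appears or disappears between frames. No filter that sees a single frame can produce that signal, which is the kind of motion cue a 3D CNN can learn.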

Abstract:

Unlike other forms of multimedia retrieval, video retrieval requires heavy computation for similarity measurement, because videos contain a large amount of information. In addition, feature extraction often ignores the temporal correlation between video frames, which leaves the extracted features incomplete and reduces retrieval accuracy. To address these problems, this study proposes a video retrieval method based on 3D convolution and hashing. The method builds an end-to-end framework that uses a 3D convolutional neural network to extract features from representative frames selected from each video, maps these features into a low-dimensional Hamming space, and computes similarity in that space. Experimental results on two video datasets show that the proposed method achieves a substantial improvement in accuracy over current state-of-the-art video retrieval algorithms.

Key words: video retrieval, 3D convolution, feature representation, hashing, supervised learning, feature dimensionality reduction, Hamming space, similarity matching
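The abstract's efficiency argument is that real-valued video features are mapped to short binary codes. Once that is done, similarity becomes a Hamming distance, which reduces to an XOR followed by a bit count. In the paper this mapping is learned end to end with supervision. In the minimal sketch below, fixed random Gaussian hyperplanes (an LSH-style technique) stand in for that learned hash layer, purely as an assumption for illustration. All function names and the 16-bit code length are hypothetical.

```python
import random
from typing import List, Sequence


def make_projection(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Random Gaussian hyperplanes (LSH-style stand-in for a learned hash layer)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(feature: Sequence[float], proj: List[List[float]]) -> int:
    """Binarize a real-valued feature: bit i is 1 if the i-th projection is >= 0."""
    code = 0
    for i, w in enumerate(proj):
        if sum(a * b for a, b in zip(w, feature)) >= 0.0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits: XOR the packed codes, then count the ones."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[int]:
    """Database indices sorted by Hamming distance to the query; ties keep index order."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


if __name__ == "__main__":
    proj = make_projection(dim=4, n_bits=16)
    query = [1.0, 2.0, 3.0, 4.0]  # hypothetical clip-level feature
    db_features = [query, [0.5, -1.0, 2.0, 0.0], [-x for x in query]]
    db_codes = [hash_code(f, proj) for f in db_features]
    print(rank_by_hamming(hash_code(query, proj), db_codes))
```

Packing the bits into one integer is a design choice: comparing two videos then costs a single XOR and a popcount instead of a floating-point dot product over the full feature. On Python 3.10 and later, `(a ^ b).bit_count()` is an equivalent way to count the differing bits.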

CLC number:

  • TP391