Journal of Xidian University, 2021, Vol. 48, Issue 5: 8-14. doi: 10.19665/j.issn1001-2400.2021.05.002

Video super-resolution based on multi-scale 3D convolution

ZHAN Keyu, SUN Yue, LI Ying

  1. State Key Laboratory of Integrated Service Networks, Xidian University, Xi’an 710071, China
  • Received: 2020-05-12 Online: 2021-10-20 Published: 2021-11-09
  • Contact: Yue SUN
  • About the authors: ZHAN Keyu (b. 1996), male, master's student at Xidian University, E-mail: qq_zky@163.com | LI Ying (b. 1973), female, professor, Ph.D., E-mail: yli@mail.xidian.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61971333)

Abstract:

Video super-resolution restores high-resolution video from low-resolution video and can effectively improve display quality. Unlike single-image super-resolution, video super-resolution depends heavily on how the information shared between adjacent frames is exploited. To improve reconstruction performance and make full use of the spatio-temporal correlation between video frames, a video super-resolution model based on multi-scale 3D convolution is proposed. The model takes several consecutive video frames as input and outputs the super-resolution reconstruction of the middle frame. It consists of three modules: multi-scale feature extraction, feature fusion, and high-resolution reconstruction. First, multi-scale 3D convolutions perform preliminary feature extraction. Then, a 3D convolutional residual structure fuses the features, and the feature maps are split along the channel dimension, which effectively reduces the number of network parameters while features of different scales are fused. Finally, multiple residual dense blocks and sub-pixel convolution perform high-resolution reconstruction, and their output is combined with a global residual connection to produce the reconstructed high-resolution frame. Experimental results for 3× and 4× super-resolution on the Vid4 dataset show that, compared with other existing methods, the proposed method effectively improves the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) and achieves better visual quality.
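
The sub-pixel convolution step mentioned in the abstract can be illustrated with a minimal sketch. It shows only the rearrangement (often called pixel shuffle or depth-to-space) that turns C·r² low-resolution feature maps into C maps that are r times larger in each spatial dimension. The function name `pixel_shuffle`, the nested-list representation, and the channel ordering (the convention commonly used in deep-learning frameworks) are assumptions for illustration. The abstract does not specify the authors' exact layout, and the convolution that produces the C·r² channels is omitted.

```python
def pixel_shuffle(features, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Assumed channel ordering (hypothetical, not taken from the paper):
    out[c][h*r + i][w*r + j] = features[c*r*r + i*r + j][h][w]
    """
    channels_in = len(features)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    height = len(features[0])
    width = len(features[0][0])
    channels_out = channels_in // (r * r)

    # Allocate the upscaled output, initially filled with zeros.
    out = [
        [[0.0] * (width * r) for _ in range(height * r)]
        for _ in range(channels_out)
    ]

    # Each group of r*r input channels fills one r-by-r block per output pixel.
    for c in range(channels_out):
        for i in range(r):
            for j in range(r):
                src = features[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Four 1x1 feature maps become one 2x2 map for a scale factor of 2.
low_res = [[[1]], [[2]], [[3]], [[4]]]
high_res = pixel_shuffle(low_res, 2)
print(high_res)  # [[[1, 2], [3, 4]]]
```

For the 3× and 4× settings reported in the abstract, the convolution before this step would need to output 9 or 16 channels per output channel, respectively. A global residual connection would then add an upsampled copy of the input frame to the result.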

Key words: video super-resolution, 3D convolution, residual network, spatio-temporal correlation

CLC number:

  • TP391.4