Electronic Science and Technology (电子科技), 2025, Vol. 38, Issue 9: 1-8. doi: 10.16180/j.cnki.issn1007-7820.2025.09.001

Infrared and Visible Image Matching Based on Deep Learning

XIONG Ziheng, ZHANG Xuanxiong

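  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2024-01-30 Revised: 2024-02-22 Online: 2025-09-15 Published: 2025-09-23
  • Corresponding author: XIONG Ziheng, E-mail: Andy_xiong123@163.com
  • About the author: ZHANG Xuanxiong (b. 1965), male, PhD, professor. Research interest: micro-electro-mechanical systems.
  • Funding:
    National Natural Science Foundation of China (62276167)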

Deep Learning-Based Image Matching of Infrared and Visible Image

XIONG Ziheng, ZHANG Xuanxiong

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2024-01-30 Revised: 2024-02-22 Online: 2025-09-15 Published: 2025-09-23
  • Contact: XIONG Ziheng, E-mail: Andy_xiong123@163.com
  • Supported by:
    National Natural Science Foundation of China (62276167)

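Abstract:

Image similarity detection algorithms based on convolutional neural networks (CNNs) represent image features poorly, are suited only to a single specific task, and are prone to overfitting. To address these problems, this paper proposes a deep learning-based method for matching infrared and visible images (Deep Learning-based Image Matching of Infrared and Visible Image, DLIVM). The method uses batch channel normalization (BCN), an attention mechanism, metric learning, and the Frobenius norm to improve matching performance and generalization. A ResNet-50 (Residual Neural Network-50) network whose batch normalization (BN) layers are replaced with BCN serves as the backbone for extracting image features, and an attention mechanism is added inside the residual units. The objective function combines binary cross-entropy loss with metric learning to make the feature representation more discriminative. The model parameters are regularized with the Frobenius norm to prevent overfitting. Results on three widely used infrared and visible datasets show that, relative to the comparison methods, DLIVM improves accuracy by 3.30%, 0.86%, 2.00%, 7.50%, 1.50%, and 0.69%, respectively.

Keywords: image matching, convolutional neural network, batch channel normalization, heterogeneous image matching, attention mechanism, metric learning, deep learning, modality invariance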

Abstract:

Image similarity detection algorithms based on convolutional neural networks (CNNs) express image features poorly, suit only a single specific task, and are prone to overfitting. To address these problems, this study proposes DLIVM (Deep Learning-based Image Matching of Infrared and Visible Image). The method uses batch channel normalization (BCN), an attention mechanism, metric learning, and the Frobenius norm to improve image matching performance and generalization ability. A ResNet-50 (Residual Neural Network-50) network in which the batch normalization (BN) layers are replaced with BCN is used as the backbone to extract image features, and the attention mechanism is added inside the residual units. The objective function is constructed by combining binary cross-entropy loss with metric learning to improve the discriminative power of the feature representation. The model parameters are regularized using the Frobenius norm to prevent overfitting. The results show that, on three widely used infrared and visible datasets, DLIVM improves accuracy by 3.30%, 0.86%, 2.00%, 7.50%, 1.50%, and 0.69%, respectively, relative to the comparison methods.

Keywords: image matching, convolutional neural network, batch channel normalization, heterogeneous image matching, attention mechanism, metric learning, deep learning, modality invariance
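
The objective described in the abstract (binary cross-entropy combined with a metric-learning term, plus Frobenius-norm regularization of the parameters) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the metric-learning loss, so a contrastive loss is assumed here; the penalty is assumed to be the squared Frobenius norm; and the weights `alpha` and `lam`, the `margin`, and all function names are hypothetical.

```python
import math


def bce_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between match probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def contrastive_loss(feats_a, feats_b, labels, margin=1.0):
    """Assumed metric-learning term: pull matching pairs (label 1) together,
    push non-matching pairs (label 0) at least `margin` apart."""
    total = 0.0
    for a, b, y in zip(feats_a, feats_b, labels):
        d = math.dist(a, b)  # Euclidean distance between the two embeddings
        total += y * d ** 2 + (1 - y) * max(0.0, margin - d) ** 2
    return total / len(labels)


def frobenius_penalty(weight_matrices):
    """Sum of squared Frobenius norms over a list of 2-D weight matrices."""
    return sum(w * w for m in weight_matrices for row in m for w in row)


def total_objective(probs, feats_a, feats_b, labels, weights,
                    alpha=1.0, lam=1e-4, margin=1.0):
    """BCE + alpha * contrastive term + lam * Frobenius penalty (all assumed weights)."""
    return (bce_loss(probs, labels)
            + alpha * contrastive_loss(feats_a, feats_b, labels, margin)
            + lam * frobenius_penalty(weights))
```

Here `probs` would hold the predicted match probabilities for infrared/visible pairs, `feats_a` and `feats_b` the two branches' embeddings, and `weights` the parameter matrices being regularized. In a real training loop these terms would be computed with a tensor library so that gradients flow through the backbone.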

Chinese Library Classification (CLC) number:

  • TP391