Journal of Xidian University ›› 2024, Vol. 51 ›› Issue (1): 125-134. doi: 10.19665/j.issn1001-2400.20230304

• Computer Science and Technology •

Self-supervised contrastive representation learning for semantic segmentation

LIU Bochong, CAI Huaiyu, WANG Yi, CHEN Xiaodong

  1. Ministry of Education Key Laboratory of Optoelectronic Information Technology, School of Precision Instrument and Optoelectronic Engineering, Tianjin University, Tianjin 300072, China
  • Received: 2022-10-25  Online: 2024-01-20  Published: 2023-08-22
  • Corresponding author: CAI Huaiyu (b. 1965), female, professor, E-mail: hycai@tju.edu.cn
  • About the authors: LIU Bochong (b. 1998), male, M.S. student at Tianjin University, E-mail: 2020202002@tju.edu.cn
    WANG Yi (b. 1981), female, associate professor, E-mail: koala_wy@tju.edu.cn
    CHEN Xiaodong (b. 1975), male, professor, E-mail: xdchen@tju.edu.cn
  • Supported by:
    the Tianjin Science and Technology Plan Project (17ZXGGX00140)

Abstract:

To improve the accuracy of semantic segmentation models and to reduce the labor and time costs of pixel-wise annotation for large-scale semantic segmentation datasets, this paper studies pre-training methods based on self-supervised contrastive representation learning and designs the Global-Local Cross Contrastive Learning (GLCCL) method around the characteristics of the semantic segmentation task. The method feeds the global image and a series of locally cropped image patches into the network to encode global and local visual representations, respectively, and guides training with a loss function comprising global contrast, local contrast, and global-local cross contrast terms, so that the network learns both global and local visual representations as well as cross-region semantic correlations. When BiSeNet is pre-trained with this method and transferred to the semantic segmentation task, it achieves mean intersection-over-union (MIoU) improvements of 0.24% and 0.9% over existing self-supervised contrastive representation learning and supervised pre-training methods, respectively. Experimental results show that the method improves segmentation performance by pre-training the segmentation model on unlabeled data, and is therefore of practical value.
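As an illustrative sketch only (the abstract does not give the exact loss formulation, so the InfoNCE form, the mean-pooling of patch embeddings, the weighting parameters, and the names info_nce and glccl_loss below are assumptions rather than the authors' implementation), a combined global, local, and global-local cross contrastive loss of the kind described above could be assembled in PyTorch as follows:

    import torch
    import torch.nn.functional as F

    def info_nce(queries, keys, temperature=0.1):
        # Standard InfoNCE: the i-th query should match the i-th key (its
        # positive); every other key in the batch serves as a negative.
        q = F.normalize(queries, dim=1)
        k = F.normalize(keys, dim=1)
        logits = q @ k.t() / temperature                    # (N, N) similarities
        targets = torch.arange(q.size(0), device=q.device)
        return F.cross_entropy(logits, targets)

    def glccl_loss(g1, g2, p1, p2, w_global=1.0, w_local=1.0, w_cross=1.0):
        # g1, g2: (B, D) global embeddings of two augmented views of each image.
        # p1, p2: (B*P, D) embeddings of the P local patches of each view.
        B, D = g1.shape
        loss_global = info_nce(g1, g2)                      # global contrast
        loss_local = info_nce(p1, p2)                       # local contrast
        # Global-local cross contrast (assumed realization): match each view's
        # global embedding against the mean-pooled patch embeddings of the
        # other view.
        p1_pooled = p1.view(B, -1, D).mean(dim=1)
        p2_pooled = p2.view(B, -1, D).mean(dim=1)
        loss_cross = 0.5 * (info_nce(g1, p2_pooled) + info_nce(g2, p1_pooled))
        return w_global * loss_global + w_local * loss_local + w_cross * loss_cross

    # Example usage with random embeddings: batch of 8 images, 4 patches each,
    # 128-dimensional features.
    g1, g2 = torch.randn(8, 128), torch.randn(8, 128)
    p1, p2 = torch.randn(32, 128), torch.randn(32, 128)
    print(glccl_loss(g1, g2, p1, p2))

In this sketch each term is a standard InfoNCE loss; the cross term simply pairs each view's global embedding with the pooled patch embeddings of the other view, which is one plausible way to encourage the cross-regional semantic consistency the abstract describes.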

Key words: semantic segmentation, self-supervised representation learning, contrastive learning, deep learning

CLC number: 

  • TP391.4