Electronic Science and Technology ›› 2024, Vol. 37 ›› Issue (10): 30-39. doi: 10.16180/j.cnki.issn1007-7820.2024.10.005


  • About the authors: FENG Ting (1998-), female, master's degree candidate. Research interests: computer vision and photoelectric detection technology.
    YING Jie (1973-), female, Ph.D., associate professor. Research interests: intelligent detection, medical image processing, and pattern recognition.

Detection of Cervical Lesions Based on Multi-Scale Features and Attention Mechanism

FENG Ting1, YING Jie1, YANG Haima1, LI Fang2   

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
    2. School of Medicine, Tongji University, Shanghai 200120, China
  • Received:2023-03-06 Online:2024-10-15 Published:2024-11-04
  • Supported by:
    Shanghai Science and Technology Innovation Action Plan(21S31904200);Shanghai Science and Technology Innovation Action Plan(22S31903700)


Abstract:

Cervical intraepithelial neoplasia (CIN) is a precancerous lesion of the cervix that is highly correlated with invasive cervical cancer, and accurate detection and classification of CIN helps reduce the incidence of severe cervical cancer. To address the low accuracy of cervical lesion detection and classification, a cervical lesion image detection method named YOLOv5-CBTR (You Only Look Once version 5-Convolutional Block Transformer), which combines multi-scale features with multiple attention mechanisms, is proposed. The backbone network employs SE-CSP (SENet-BottleneckCSP) modules with the SENet (Squeeze-and-Excitation Networks) attention mechanism for feature extraction. A Transformer encoder module is introduced to fuse and amplify multi-feature information, and its multi-head attention mechanism enhances feature extraction in lesion regions. Convolutional attention modules are introduced into the feature fusion layer for multi-scale fusion of lesion feature information. A power transformation is introduced into the bounding-box regression calculation, which accelerates the convergence of the model's loss function; together, these components realize the detection and classification of cervical lesions. The experimental results show that the accuracy, recall, mAP (mean Average Precision), and F value of the YOLOv5-CBTR model for the detection and classification of RGB (white-light) cervical lesion images are 93.99%, 92.91%, 92.80%, and 93.45%, respectively, while the mAP and F values of the model for multispectral cervical image detection and classification are 97.68% and 95.23%, respectively.
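The SENet attention used in the SE-CSP backbone follows the standard squeeze-and-excitation design: global average pooling squeezes each channel to a scalar, a two-layer bottleneck with ReLU and sigmoid produces per-channel weights, and the feature map is rescaled channel-wise. As background, here is a minimal NumPy sketch of a generic SE block; it is an illustration of the mechanism, not the authors' SE-CSP implementation, and all weight shapes and the reduction ratio are hypothetical.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation channel attention on a feature map x of shape (C, H, W)."""
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: bottleneck FC (C -> C/r) with ReLU, then FC (C/r -> C) with sigmoid
    s = np.maximum(0.0, w1 @ z + b1)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))   # per-channel weights in (0, 1)
    # Scale: reweight each channel of the input feature map
    return x * s[:, None, None]

# Demo with random weights: C = 8 channels, reduction ratio r = 2 (hypothetical sizes)
rng = np.random.default_rng(0)
C, r = 8, 2
x = rng.standard_normal((C, 5, 5))
w1, b1 = 0.1 * rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = 0.1 * rng.standard_normal((C, C // r)), np.zeros(C)
out = se_block(x, w1, b1, w2, b2)
print(out.shape)  # (8, 5, 5)
```

Because the sigmoid keeps every channel weight in (0, 1), the block can only attenuate channels relative to the input, which is how it suppresses uninformative channels while preserving the feature map's shape.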

Key words: cervical image, lesion detection, multi-scale features, attention mechanism, multispectral image, Transformer encoder, power transformation, deep learning

CLC number: 

  • TP391