Journal of Xidian University ›› 2024, Vol. 51 ›› Issue (3): 113-123. doi: 10.19665/j.issn1001-2400.20230801

• Computer Science and Technology & Artificial Intelligence •

Complex text detection based on polygon feature pooling and fusion

ZHANG Xiangnan (张相南)1, GAO Xinbo (高新波)2, TIAN Chunna (田春娜)1

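  1. School of Electronic Engineering, Xidian University, Xi’an 710071, Shaanxi, China
  2. Chongqing Key Laboratory of Image Cognition, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2023-03-13 Publication date: 2024-06-20 Release date: 2023-08-22
  • Corresponding author: TIAN Chunna (b. 1980), female, professor, E-mail: chnatian@xidian.edu.cn
  • About the authors: ZHANG Xiangnan (b. 1991), male, Ph.D. candidate at Xidian University, E-mail: zxnn81@outlook.com
    GAO Xinbo (b. 1972), male, professor, E-mail: gaoxb@cqupt.edu.cn
  • Funding:
    National Natural Science Foundation of China (62173265); National Natural Science Foundation of China (62036007)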

Complex text region detection based on polygon feature pooling and the transformer

ZHANG Xiangnan1, GAO Xinbo2, TIAN Chunna1

  1. School of Electronic Engineering, Xidian University, Xi’an 710071, China
  2. Chongqing Key Laboratory of Image Cognition, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2023-03-13 Online: 2024-06-20 Published: 2023-08-22

Abstract:

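Text detection plays an important role in image understanding. Deep-learning-based text detection is the current mainstream approach and falls into two categories, single-stage and two-stage methods, with the latter often achieving higher detection accuracy than the former. Two-stage detection methods usually include a region-of-interest (RoI) feature pooling operation, which provides local region features of a specific dimension for subsequent detection and recognition tasks. However, for complex text regions such as curved text, existing pooling methods based on rectangular RoIs are no longer applicable, while methods that replace region features with point features lose spatial information. To address this problem, a complex text region detection method based on polygon feature pooling and the Transformer is proposed. First, polygon feature pooling is applied to the RoIs in complex text region detection: the shape of the pooled region is extended from rectangles to polygons without fitting any other shape, so the features of a polygonal region can be pooled directly into a feature sequence of fixed dimension, which avoids the errors introduced by a fitting process. Then, the pooled features are treated as a sequence with spatial relationships, and a Transformer is used to fuse the contextual relationships among the visual features, which reduces training difficulty and improves detection precision. Experimental results on the ICDAR2015, MLT, Total Text and CTW1500 datasets, which contain complex text such as curved text, show that the proposed two-stage detection algorithm extracts RoI features better and achieves better detection results than existing methods.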

Key words: text detection, two-stage methods, polygon, feature pooling, Transformer

Abstract:

Text detection plays an important role in image understanding, and deep-learning-based algorithms, which comprise single-stage and two-stage methods, are the prevailing approach. Two-stage text detection methods usually achieve higher accuracy than single-stage ones. A two-stage text detection method usually includes a feature pooling operation on the region of interest (RoI), which provides local region features with fixed dimensions for subsequent detection and recognition tasks. However, for complex text areas such as curved text, existing pooling methods based on the rectangular RoI are no longer applicable, and replacing area features with point features loses spatial information. To address this issue, we propose a complex text region detection method based on polygon feature pooling and the Transformer. First, we extend the feature pooling shape of the RoI from the rectangle to the polygon. No shape fitting is needed, and the features of the polygon RoI are pooled into fixed dimensions, which avoids the error introduced by a fitting process. Furthermore, the pooled polygon region features are regarded as context-sensitive sequences and are input to the Transformer, which fuses the context of the visual features to reduce the training difficulty and improve the detection accuracy. Our experiments on complex text region datasets, such as ICDAR2015, MLT, Total Text and CTW1500, show that the proposed two-stage detection algorithm extracts RoI features well and achieves better detection results than the state-of-the-art methods.

Key words: text region detection, two-stage methods, polygon, feature pooling, Transformer

CLC number:

  • TP391
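
The two steps named in the abstract, pooling a polygonal RoI into a fixed-length feature sequence and fusing that sequence with attention, can be illustrated with a small sketch. This is not the authors' implementation, whose details the abstract does not give. It assumes a simple scheme in which the polygon's horizontal extent is split into equal strips and the features of in-polygon cells are averaged per strip, followed by single-head scaled dot-product self-attention with identity projections (no learned weights). The names `point_in_polygon`, `polygon_pool` and `self_attention` are hypothetical.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Toggle on every edge crossed by a horizontal ray towards +x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_pool(feature_map, polygon, num_bins):
    """Average in-polygon features over num_bins vertical strips (assumed scheme).

    feature_map[row][col] is a list of C channel values.
    Returns num_bins vectors of length C (zeros for an empty strip).
    """
    channels = len(feature_map[0][0])
    xs = [p[0] for p in polygon]
    x_min, x_max = min(xs), max(xs)
    strip_width = (x_max - x_min) / num_bins
    sums = [[0.0] * channels for _ in range(num_bins)]
    counts = [0] * num_bins
    for row, line in enumerate(feature_map):
        for col, vec in enumerate(line):
            cx, cy = col + 0.5, row + 0.5  # cell centre
            if not point_in_polygon(cx, cy, polygon):
                continue
            b = min(int((cx - x_min) / strip_width), num_bins - 1)
            counts[b] += 1
            for c in range(channels):
                sums[b][c] += vec[c]
    return [[s / n if n else 0.0 for s in vec_sum]
            for vec_sum, n in zip(sums, counts)]


def self_attention(sequence):
    """Single-head scaled dot-product self-attention, identity projections."""
    dim = len(sequence[0])
    fused = []
    for query in sequence:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in sequence]
        top = max(scores)  # subtract the maximum for a stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        fused.append([sum(w / total * value[c]
                          for w, value in zip(weights, sequence))
                      for c in range(dim)])
    return fused


# Toy 4x6 feature map with 2 channels; each cell stores [col, row].
feature_map = [[[float(c), float(r)] for c in range(6)] for r in range(4)]
# A non-rectangular (pentagonal) region given as (x, y) vertices.
polygon = [(0, 0), (6, 0), (6, 2), (3, 4), (0, 2)]
pooled = polygon_pool(feature_map, polygon, num_bins=3)
fused = self_attention(pooled)
print(len(pooled), len(pooled[0]))
print(fused)
```

Whatever the polygon's shape, the pooled output has `num_bins` vectors, which is what lets a sequence model consume it. A trained detector would use learned query, key and value projections and multiple heads in place of the identity attention used here.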