Electronic Science and Technology, 2019, Vol. 32, Issue (12): 32-36. doi: 10.16180/j.cnki.issn1007-7820.2019.12.007


Natural Scene Text Recognition Algorithm Based on Attention-CTC

HE Wenjie, LIU Jingbiao, PAN Mian, LÜ Shuaishuai

  1. School of Electronics and Information, Hangzhou Dianzi University, Hangzhou 310018, China
  • Received: 2018-12-07 Online: 2019-12-15 Published: 2019-12-24
  • About the authors: HE Wenjie (1994-), male, master's degree candidate; research interest: machine vision. | LIU Jingbiao (1964-), male, PhD, professor; research interest: electronic system integration technology.
  • Supported by:
    National Natural Science Foundation of China (61871164); National Natural Science Foundation of China (61501155)

Abstract:

Natural scene text recognition has two problems: individual characters are difficult to segment, and recognition accuracy depends on a lexicon. To address them, this paper proposes a text recognition algorithm that combines an attention mechanism with the connectionist temporal classification (CTC) loss. A convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) network encode the image into a feature sequence. An Attention-CTC structure then decodes that sequence, which effectively solves the problem that attention decoding is unconstrained. The algorithm requires neither additional alignment preprocessing of the labels nor subsequent grammatical post-processing. It speeds up training convergence and significantly improves the text recognition rate. Experimental results show that the algorithm is robust to text images with blurred fonts and complex backgrounds.

Key words: text recognition, connectionist temporal classification, convolutional neural network, recurrent neural network, multi-scale feature extraction, attention mechanism
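The abstract credits the CTC component with removing the need to pre-align labels. The following pure-Python sketch illustrates why. It is not the authors' implementation; it shows only the standard CTC technique. `ctc_greedy_decode` performs best-path decoding: it takes the argmax at each frame, merges repeats, and drops blanks. `ctc_loss` computes the CTC negative log-likelihood with the forward algorithm over per-frame class probabilities, such as a CNN-BiLSTM encoder followed by a softmax might produce. Several assumptions apply. Class index 0 is the blank. Inputs are plain lists that are already softmaxed. Both function names are hypothetical. The attention branch, and however the paper couples it with CTC, is not modeled.

```python
import math
from itertools import groupby

BLANK = 0  # assumption: class index 0 is reserved for the CTC blank


def ctc_greedy_decode(frame_probs):
    """Best-path decoding: argmax per frame, merge repeated symbols, drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    return [k for k, _ in groupby(best_path) if k != BLANK]


def _logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_loss(frame_probs, label):
    """-ln p(label | x), summed over every frame-level path that collapses to label.

    frame_probs: T lists of per-frame class probabilities (already softmaxed).
    label: target class indices without blanks; no per-frame alignment is given.
    """
    ext = [BLANK]
    for c in label:
        ext += [c, BLANK]  # extended label: blank, l1, blank, l2, ..., blank
    S, T = len(ext), len(frame_probs)
    log_p = [[math.log(p) if p > 0 else -math.inf for p in frame] for frame in frame_probs]

    # alpha[s]: log-probability of all path prefixes ending in ext[s] at frame t
    alpha = [-math.inf] * S
    alpha[0] = log_p[0][ext[0]]
    if S > 1:
        alpha[1] = log_p[0][ext[1]]
    for t in range(1, T):
        new = [-math.inf] * S
        for s in range(S):
            terms = [alpha[s]]                  # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])      # advance by one position
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])      # skip a blank between distinct symbols
            new[s] = _logsumexp(*terms) + log_p[t][ext[s]]
        alpha = new

    # valid paths end on the last symbol or on the trailing blank
    total = alpha[0] if S == 1 else _logsumexp(alpha[S - 1], alpha[S - 2])
    return -total


# Toy usage with classes {0: blank, 1: 'a', 2: 'b'}
probs = [[0.1, 0.9, 0.0], [0.2, 0.8, 0.0], [0.9, 0.05, 0.05], [0.1, 0.9, 0.0], [0.1, 0.1, 0.8]]
print(ctc_greedy_decode(probs))                              # [1, 1, 2]
print(round(ctc_loss([[0.4, 0.6], [0.7, 0.3]], [1]), 4))     # 0.3285
```

`ctc_loss` marginalizes over all alignments: in the two-frame example, the paths "aa", "a-", and "-a" all collapse to "a". Training therefore needs only the transcript, not character positions. In the decoding example, the blank in the middle frame is what allows two consecutive `1`s to survive as a doubled character.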

CLC number:

  • TP391