Natural Scene Text Recognition Algorithm Based on Attention-CTC

doi:10.16180/j.cnki.issn1007-7820.2019.12.007

Abstract

To address the difficulty of character segmentation in natural scene text recognition and the dependence of recognition accuracy on a dictionary, a text recognition algorithm combining an attention mechanism with a connectionist temporal classification (CTC) loss was proposed. A convolutional neural network and a bidirectional long short-term memory (BiLSTM) network were used to extract image features. The Attention-CTC structure was then used to decode the feature sequence, which effectively addressed the lack of constraints in attention decoding. The algorithm avoided additional alignment preprocessing of the labels and subsequent syntax processing, which sped up training convergence and significantly improved the text recognition rate. Experimental results showed that the algorithm was robust to text with complex fonts and complex backgrounds.
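
The abstract does not state how the attention and CTC objectives are combined. A common formulation in joint CTC-attention work is a weighted sum, L = λ·L_CTC + (1 − λ)·L_att; the sketch below assumes that form and is not taken from the paper. The function names, the blank index 0, and the toy inputs are all hypothetical, and whether the parenthesised values in the results table (0.2, 0.5, 0.8) are such a weight is not confirmed by this page. The CTC term is computed with the standard forward algorithm, which sums over all frame-level alignments and is the reason no character segmentation or pre-aligned labels are needed.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over log-domain values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC loss for one sequence via the forward algorithm.

    log_probs: list of T frames, each a list of per-class log-probabilities.
    target:    list of class indices without blanks.
    """
    # Interleave blanks around every label: [b, y1, b, y2, ..., b].
    ext = [blank]
    for c in target:
        ext += [c, blank]
    size = len(ext)

    # A path may start on the leading blank or on the first label.
    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * size
        for s in range(size):
            terms = [alpha[s]]              # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])  # advance by one symbol
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new

    # A path may end on the last label or on the trailing blank.
    tail = [alpha[size - 1]]
    if size > 1:
        tail.append(alpha[size - 2])
    return -logsumexp(*tail)


def attention_neg_log_likelihood(step_log_probs, target):
    """Cross-entropy of an attention decoder: one distribution per output character."""
    return -sum(step_log_probs[i][c] for i, c in enumerate(target))


def joint_loss(ctc_loss, att_loss, lam):
    """Assumed interpolation: lam weights the CTC term, (1 - lam) the attention term."""
    return lam * ctc_loss + (1.0 - lam) * att_loss


# Toy example: 2 frames, 2 classes (0 = blank, 1 = 'a'), uniform predictions.
frames = [[math.log(0.5), math.log(0.5)]] * 2
ctc = ctc_neg_log_likelihood(frames, [1])          # 3 of 4 alignments collapse to "a"
att = attention_neg_log_likelihood([[math.log(0.1), math.log(0.9)]], [1])
total = joint_loss(ctc, att, lam=0.2)
```

In the toy example the alignments (a, a), (a, blank) and (blank, a) all collapse to "a", so the CTC probability is 0.75. A small λ lets the attention decoder dominate while the CTC term still penalises outputs that cannot be aligned monotonically with the feature sequence.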

Key words: text recognition, connectionist temporal classification, convolutional neural network, recurrent neural network, multi-scale feature extraction, attention mechanism

CLC Number: TP391

HE Wenjie, LIU Jingbiao, PAN Mian, LÜ Shuaishuai. Natural Scene Text Recognition Algorithm Based on Attention-CTC [J]. Electronic Science and Technology, 2019, 32(12): 32-36.

Figures/Tables 5

Model	ICDAR03	IIIT5K	SVT
CRNN	0.76	0.68	0.70
Attention	0.80	0.74	0.77
Proposed (0.2)	0.86	0.76	0.78
Proposed (0.5)	0.81	0.73	0.76
Proposed (0.8)	0.78	0.70	0.71
