›› 2017, Vol. 30 ›› Issue (9): 46-.

• Article

Document Image Retrieval Based on Text Layout Block Distance

WANG Mudan, WU Chunxue

  1. (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
  • Online: 2017-09-15 Published: 2017-11-03
  • About the authors: WANG Mudan (b. 1990), female, master's student; research interest: digital image processing. WU Chunxue (b. 1964), male, Ph.D., professor; research interests: embedded systems and their applications, among others.
  • Supported by:

    National Natural Science Foundation of China (61202376); Chenguang Program of the Shanghai Education Foundation (10CG49)

Abstract:

Existing document retrieval methods that first convert document images into text cannot cope with today's very large digital image collections. This paper proposes a document image retrieval method based on text layout blocks. A new distance function is defined from the distance features between text layout blocks and is used to compute the distance matrix between text layout blocks. The Hungarian algorithm is then applied to this matrix to obtain the best matching result for the document images. Extensive experiments show that the proposed method effectively improves document image retrieval accuracy and maintains a correct rate of 78.2%.

Key words: image document retrieval, document image segmentation, text layout block, distance function, Hungarian algorithm
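
The pipeline outlined in the abstract (block-to-block distances, a distance matrix, then a Hungarian assignment) can be illustrated with a minimal sketch. The abstract does not give the paper's distance function, so `block_distance` below is a purely hypothetical stand-in (Manhattan distance over block centers and sizes), and the `(x, y, width, height)` block representation, `hungarian`, and `document_distance` are likewise illustrative names rather than the authors' implementation.

```python
from typing import List, Sequence, Tuple

# A layout block as (x, y, width, height), normalized to the page size.
# This representation is an assumption made for the sketch.
Block = Tuple[float, float, float, float]


def block_distance(a: Block, b: Block) -> float:
    """Hypothetical block distance: center offset plus size difference."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    center = abs((ax + aw / 2) - (bx + bw / 2)) + abs((ay + ah / 2) - (by + bh / 2))
    size = abs(aw - bw) + abs(ah - bh)
    return center + size


def hungarian(cost: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Minimum-cost assignment for an n x m matrix with 1 <= n <= m.

    Returns (assignment, total), where assignment[i] is the column
    matched to row i. Uses the O(n^2 * m) potentials formulation.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (m + 1)   # column potentials (1-based)
    p = [0] * (m + 1)     # p[j]: row matched to column j, 0 if none
    way = [0] * (m + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path back to column 0.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return assignment, total


def document_distance(query: Sequence[Block], candidate: Sequence[Block]) -> float:
    """Total cost of the best block matching (both pages must be non-empty)."""
    # Put the page with fewer blocks on the rows so that n <= m holds.
    rows, cols = (query, candidate) if len(query) <= len(candidate) else (candidate, query)
    matrix = [[block_distance(r, c) for c in cols] for r in rows]
    _, total = hungarian(matrix)
    return total
```

In a retrieval setting, candidates could then be ranked by `document_distance(query_blocks, candidate_blocks)` in ascending order. In this sketch, unmatched blocks on the larger page are simply ignored; whether the paper penalizes them is not stated in the abstract.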

CLC number:

  • TN911.73