J4 ›› 2014, Vol. 41 ›› Issue (6): 174-180.doi: 10.3969/j.issn.1001-2400.2014.06.029

• 研究论文 • 上一篇    下一篇

一种改进的多模式匹配算法

褚衍杰;李云照;魏强   

  1. (盲信号处理重点实验室,四川 成都  610041)
  • 收稿日期:2013-07-17 出版日期:2014-12-20 发布日期:2015-01-19
  • 通讯作者: 褚衍杰
  • 作者简介:褚衍杰(1982-),男,盲信号处理重点实验室博士研究生,E-mail:chuyanjie@tsinghua.org.cn.

Improved multi-pattern matching algorithm

CHU Yanjie;LI Yunzhao;WEI Qiang   

  1.  (Key Lab. of Science and Technology on Blind Signal Processing, Chengdu  610041, China)
  • Received:2013-07-17 Online:2014-12-20 Published:2015-01-19
  • Contact: CHU Yanjie

摘要:

针对WM算法在模式集规模大且最短模式长度小的情况下性能较低的问题,分析了WM算法及其改进的快速WM(QWM)算法的优缺点,在此基础上提出了模式分集思想,并优化了跳跃和确认机制,设计了子集WM(SWM)算法;然后针对该算法在域名过滤中的应用,对hash函数、匹配顺序等进行进一步优化.针对域名过滤的实验结果表明,当模式数量超过10000条时,SWM算法匹配时间是WM算法的8.9%~11.6%,说明SWM算法在模式集规模较大时,匹配速度能显著提高.

关键词: 模式匹配, 字符串匹配, WM算法, SWM算法, 域名过滤

Abstract:

To resolve the problem that when the number of rules is large and the length of the shortest rule is short, the performance of the WM algorithm will become less efficient, the paper analyzes the WM algorithm and an improved algorithm named the QWM algorithm, then proposes a new algorithm—the SWM algorithm. The new algorithm uses the idea of the sub pattern set and optimizes the shifting and affirming method. To use the SWM algorithm in domain name filtering, a new hash function and a new matching order are designed specially. The results in domain name filtering indicate that the SWM algorithm's matching time is about 8.9%~11.6% that of the WM algorithm when the number of patterns is more than 10000. The SWM algorithm can improve the speed of matching when the scale of the pattern is large.

Key words: pattern matching, string matching, Wu-Manber algorithm, subest Wu-Manber algorithm, domain name filtering