2011, Vol. 24, Issue (4): 24-.

• Paper •

Web News Retrieval Algorithm Based on the Semantic Group Vector Space Model

WANG Qiang, ZHAN Zhong-Li, ZHANG Feng-Jun

  1. (1. Department of Computer Science, Jilin Vocational College of Electronic Information, Jilin 132021, China;
    2. School of Computer Science, Beihua University, Jilin 132021, China)
  • Online: 2011-04-15 Published: 2011-03-31
  • About the authors: WANG Qiang (1972-), male, M.S., lecturer; research interests: Web information retrieval and artificial intelligence. ZHAN Zhong-Li (1972-), female, associate professor; research interests: multimedia technology. ZHANG Feng-Jun (1978-), male, M.S., lecturer; research interests: database technology.

Abstract:

Considering the structural and content features of Web news, and based on an analysis of the shortcomings of the traditional vector space model, this paper proposes a vector space model in which feature words are grouped semantically. The model divides the feature words of a news report into four relatively independent semantic groups (time, place, person, and event), forming four vector spaces, and computes feature-term weights and similarity within each vector space. Theoretical analysis and experimental results show that the improved model is better suited to Web news retrieval and improves precision, recall, and query speed.

Key words: vector space model; semantic group; information retrieval; precision; recall
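The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting formula or say how the four per-group similarities are combined, so the sketch assumes smoothed TF-IDF weights, cosine similarity within each group, and a weighted sum across groups. All names here (`GROUPS`, `idf_table`, `weigh`, `cosine`, `grouped_similarity`, `group_weights`) are hypothetical, and the feature words are assumed to be already assigned to their groups.

```python
import math
from collections import Counter

# The four semantic groups named in the abstract.
GROUPS = ("time", "place", "person", "event")


def idf_table(docs, group):
    """Return a function mapping a term to its smoothed IDF within one group.

    Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1; the paper's own
    weighting formula is not stated in the abstract.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc.get(group, [])))
    return lambda term: math.log((1 + n) / (1 + df[term])) + 1.0


def weigh(terms, idf):
    """Turn a list of terms into a sparse TF-IDF vector (dict: term -> weight)."""
    return {t: c * idf(t) for t, c in Counter(terms).items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def grouped_similarity(query, docs, group_weights=None):
    """Score each document by a weighted sum of per-group cosine similarities.

    Assumption: the weighted-sum combination and equal default weights are
    illustrative choices, not taken from the paper.
    """
    if group_weights is None:
        group_weights = {g: 1.0 / len(GROUPS) for g in GROUPS}
    scores = [0.0] * len(docs)
    for g in GROUPS:
        idf = idf_table(docs, g)
        qvec = weigh(query.get(g, []), idf)
        for i, doc in enumerate(docs):
            dvec = weigh(doc.get(g, []), idf)
            scores[i] += group_weights[g] * cosine(qvec, dvec)
    return scores


# Hypothetical toy data: each report is a dict of group -> feature words.
docs = [
    {"time": ["2011"], "place": ["jilin"], "person": ["wang"], "event": ["flood", "rescue"]},
    {"time": ["2010"], "place": ["beijing"], "person": ["li"], "event": ["election"]},
]
query = {"place": ["jilin"], "event": ["flood"]}
print(grouped_similarity(query, docs))
```

Because each group has its own, much smaller vocabulary, a query that names only a place and an event is compared only against those two vector spaces. This is one plausible reading of why the grouped model could reduce query time, though the abstract itself does not explain the mechanism.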

CLC Number:

  • TP391