2013, Vol. 26, Issue (7): 7-.

• Article

Keyword Extraction Method for Science and Technology Project Applications

LUO Hao (罗灏), XU Xiaoliang (徐小良), LU Yuehua (吕跃华)

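  1. (1. School of Computer Science, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018; 2. Network Center, Institute of Scientific and Technological Information of Zhejiang Province, Hangzhou, Zhejiang 310006)
  • Publication date: 2013-07-15; Release date: 2013-07-16
  • About the authors: LUO Hao (born 1989), male, master's student. Research interests: knowledge discovery and data engineering, middleware. E-mail: 12234200@163.com. XU Xiaoliang (born 1976), male, professor, master's supervisor. Research interests: knowledge discovery and data engineering, distributed computing. LU Yuehua (born 1978), male, master's student. Research interests: knowledge discovery and data engineering.
  • Funding:

    Supported by the Major Science and Technology Special Project of Zhejiang Province (No. 2008C11102)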

Keywords Extraction for Technology Project Application

LUO Hao, XU Xiaoliang, LU Yuehua

  1. (1. School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China; 2. Network Center, Institute of Scientific and Technological Information of Zhejiang Province, Hangzhou 310006, China)
  • Online: 2013-07-15 Published: 2013-07-16

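Abstract:

Keyword extraction is used in text similarity computation. Traditional keyword extraction methods ignore the unknown (out-of-vocabulary) words in a text and lack an understanding of word semantics. For science and technology project applications, this study proposes a keyword extraction method based on unknown-word recognition and semantics. Word segmentation is performed by combining Lucene with statistical methods, and the unknown words that are recognized are taken as one part of the application's keywords; a word semantic similarity network is then built according to social network theory, and word association degrees are computed to extract the application's remaining keywords. Experimental results show that, compared with traditional keyword extraction methods, the new method extracts more accurate keywords and gives better results when checking the similarity of science and technology projects.

Keywords: keyword extraction; unknown words; social network theory; semantic similarity network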
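
The abstract does not say which statistic is used to recognize unknown words, so the following is only a minimal sketch of one common choice, not the authors' method. It assumes that a base segmenter (for example a Lucene analyzer) splits an unknown word into adjacent tokens, and it merges adjacent token pairs that co-occur often and have high pointwise mutual information (PMI). The function name `find_unknown_words` and the thresholds `min_count` and `min_pmi` are hypothetical.

```python
import math
from collections import Counter


def find_unknown_words(token_docs, min_count=2, min_pmi=1.0):
    """Return {merged_word: pmi} for adjacent token pairs that look like one word.

    token_docs: list of token lists produced by some base segmenter (assumed).
    min_count, min_pmi: illustrative thresholds, not values from the paper.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in token_docs:
        unigrams.update(tokens)
        # Count each pair of neighbouring tokens within a document.
        bigrams.update(zip(tokens, tokens[1:]))

    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    candidates = {}
    for (a, b), n in bigrams.items():
        if n < min_count:
            continue  # too rare to trust the statistic
        p_ab = n / total_bi
        p_a = unigrams[a] / total_uni
        p_b = unigrams[b] / total_uni
        pmi = math.log2(p_ab / (p_a * p_b))
        if pmi >= min_pmi:
            # Chinese tokens are joined without a separator.
            candidates[a + b] = pmi
    return candidates
```

Candidates returned this way would still need filtering (stop words, length limits) before being accepted as keywords; that step is omitted here.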

Abstract:

Keyword extraction is important for text similarity computation. Traditional keyword extraction algorithms ignore the unknown words in a text and do not model word semantics. This paper proposes an extraction algorithm for technology project applications based on unknown-word recognition and semantics. The algorithm combines Lucene with a statistical method for word segmentation and unknown-word recognition, which yields one part of the keywords. The remaining keywords of an application are then extracted by constructing a word semantic similarity network based on social network theory and calculating the association degrees of words. Experimental results show that, compared with the traditional keyword extraction algorithm, the proposed algorithm extracts more accurate keywords and gives better results when checking the similarity of technology projects.

Key words: keyword extraction; unknown words; social network theory; semantic similarity network
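
The second stage, ranking words in a semantic similarity network, can be illustrated as follows. This is a sketch under stated assumptions rather than the paper's algorithm: the abstract does not define the similarity measure or the association degree, so the code takes a caller-supplied `similarity` function and uses weighted degree (the sum of a word's edge weights), a standard social-network centrality, as a stand-in. The names `rank_by_association`, `threshold`, and `top_k` are hypothetical.

```python
from itertools import combinations


def rank_by_association(words, similarity, threshold=0.5, top_k=3):
    """Rank candidate words by weighted degree in a similarity network.

    words: candidate words from one document.
    similarity: function (a, b) -> float in [0, 1]; supplied by the caller.
    threshold: minimum similarity for an edge (illustrative value).
    """
    unique = list(dict.fromkeys(words))  # drop duplicates, keep order
    strength = {w: 0.0 for w in unique}
    for a, b in combinations(unique, 2):
        s = similarity(a, b)
        if s >= threshold:
            # An edge a-b with weight s contributes to both endpoints.
            strength[a] += s
            strength[b] += s
    # Highest strength first; ties broken alphabetically for determinism.
    ranked = sorted(unique, key=lambda w: (-strength[w], w))
    return ranked[:top_k]
```

Words that are semantically close to many other words in the document score highest, which matches the intuition that central nodes in the network describe the document's topic.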

CLC number:

  • TP391