2016, Vol. 29, Issue (2): 53-.

• Papers •

Co-word Clustering Analysis Based on PubMed

MAO Chunli, CAO Chunping

  1. (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
  • Online: 2016-02-15 Published: 2016-02-25
  • About the authors: MAO Chunli (born 1991), female, master's student. Research interest: data mining. CAO Chunping (born 1968), female, associate professor. Research interests: intelligent decision-making knowledge systems, personalized services.
  • Supported by:

    National High Technology Research and Development Program of China (863 Program) (2014AA021502)

Abstract:

In traditional co-word clustering analysis, the co-word matrix does not fully reflect the associations between subject terms. To address this, the paper proposes building the co-word matrix from the co-occurrence of high-frequency subject terms across the multiple content formats of a single document. Traditional clustering algorithms also perform poorly when clusters are non-spherical or differ widely in size, so an improved CRUE clustering algorithm is used to cluster the co-word matrix. A co-word clustering analysis of the lung cancer literature in PubMed experimentally demonstrates the feasibility of the improved method.
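The matrix-construction idea in the abstract can be illustrated with a minimal sketch. It assumes each document is represented as a mapping from field name to a list of terms; the field names (`mesh`, `title`, `abstract`), the toy records, the `min_docs` threshold, and both function names are hypothetical and not taken from the paper. The abstract does not say which content formats are used or whether they are weighted, so this sketch simply merges all fields and counts each term pair once per document. The clustering step is not shown.

```python
from collections import Counter
from itertools import combinations


def document_terms(fields):
    """Merge the terms of every field of one document into a single set."""
    return set().union(*fields.values())


def high_frequency_terms(docs, min_docs):
    """Return, sorted, the terms that occur in at least `min_docs` documents."""
    doc_freq = Counter()
    for fields in docs:
        # A term counts once per document, whichever field it appears in.
        doc_freq.update(document_terms(fields))
    return sorted(t for t, n in doc_freq.items() if n >= min_docs)


def coword_matrix(docs, terms):
    """Build a symmetric co-occurrence matrix over `terms`.

    Entry [i][j] is the number of documents containing both terms[i] and
    terms[j] in any field; the diagonal holds each term's document count.
    """
    index = {t: i for i, t in enumerate(terms)}
    matrix = [[0] * len(terms) for _ in terms]
    for fields in docs:
        present = sorted(t for t in document_terms(fields) if t in index)
        for t in present:
            matrix[index[t]][index[t]] += 1
        for a, b in combinations(present, 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return matrix


# Toy records (hypothetical, for illustration only).
docs = [
    {"mesh": ["lung neoplasms", "smoking"], "title": ["lung neoplasms"],
     "abstract": ["smoking", "risk factors"]},
    {"mesh": ["lung neoplasms", "egfr"], "title": ["egfr"],
     "abstract": ["lung neoplasms", "mutation"]},
    {"mesh": ["lung neoplasms"], "title": ["smoking"], "abstract": ["egfr"]},
]

terms = high_frequency_terms(docs, min_docs=2)
matrix = coword_matrix(docs, terms)
for term, row in zip(terms, row_iter := matrix):
    print(term, row)
```

In the third toy record each term sits in a different field, so a matrix built from one field alone would record no co-occurrence for that document, whereas the merged view does.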

Key words: co-word clustering analysis; co-word matrix; CRUE clustering algorithm; PubMed

CLC Number:

  • G354