2015, Vol. 28, Issue (11): 47-.

• Article

Improved K-means Algorithm Based on Density and Clustering Index

MAO Xiu, MAO Chunli, DING Yuewei

  1. (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
  • Online: 2015-11-15 Published: 2015-12-15
  • About the author: MAO Xiu (born 1991), female, master's student. Research interest: data mining. E-mail: 1064627898@qq.com
  • Supported by:

    National High Technology Research and Development Program of China (2014AA021502)

Abstract:

In the traditional K-means algorithm, different randomly chosen initial cluster centers lead to different clusterings, and a manually specified k rarely matches the actual number of clusters. To address these problems, this paper proposes an improved K-means clustering algorithm based on density and a clustering index. The algorithm builds a high-density set, HP (High Point), from the densities of all objects and takes the means of the two pairs of objects in HP that are farthest apart as the cluster centers for the first round of clustering. Each new cluster center is then chosen from the remaining objects in HP by the maximum distance product method. As k grows, the algorithm evaluates the clustering quality with the clustering index and uses it to determine a suitable number of clusters. Experiments show that the algorithm achieves high accuracy and validity.

Key words: K-means algorithm; initial cluster center; high-density set; maximum distance product; clustering index

CLC number:

  • TP306.1
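
The seeding idea summarized in the abstract can be illustrated with a minimal Python sketch. Everything below that the abstract does not specify is an assumption made for illustration only: density is taken to be the number of other objects within a fixed radius, HP keeps the objects whose density is at least the mean density, and the first two centers are simply the two farthest-apart objects in HP (a simplification of the paper's "means of two pairs" step). The function names and the radius parameter are hypothetical. The clustering index used to choose k is not defined in the abstract, so it is not sketched here.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def high_density_set(points: Sequence[Point], radius: float) -> List[Point]:
    """Return the objects whose density is at least the mean density.

    Assumption: density = number of other objects within `radius`.
    """
    density = [
        sum(1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]
    mean_density = sum(density) / len(points)
    return [p for p, d in zip(points, density) if d >= mean_density]


def seed_centers(hp: Sequence[Point], k: int) -> List[Point]:
    """Pick k initial centers from the high-density set `hp` (needs 2 <= k <= len(hp)).

    Assumption: the first two centers are the farthest-apart pair in hp.
    Each further center maximizes the product of its distances to all
    centers chosen so far (maximum distance product rule).
    """
    first, second = max(combinations(hp, 2), key=lambda pair: math.dist(*pair))
    centers = [first, second]
    while len(centers) < k:
        candidates = [p for p in hp if p not in centers]
        centers.append(max(
            candidates,
            key=lambda p: math.prod(math.dist(p, c) for c in centers),
        ))
    return centers


# Toy data: three tight groups plus one isolated outlier.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0),
    (4.0, 12.0), (4.0, 13.0), (5.0, 12.0), (5.0, 13.0),
    (50.0, 50.0),
]
hp = high_density_set(points, radius=2.0)
centers = seed_centers(hp, k=3)
```

On this toy data the outlier (50.0, 50.0) has no neighbors within the radius and is left out of HP, and the three seeds fall in three different groups. Multiplying the distances, rather than summing them, penalizes a candidate that lies close to any one existing center, which is why the third seed lands in the group that has no center yet.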