Improved K-means Algorithm Based on Density and Clustering Index

2015, Vol. 28, Issue (11): 47-.

MAO Xiu, MAO Chunli, DING Yuewei

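(School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)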

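Publication date: 2015-11-15 Release date: 2015-12-15
About the author: MAO Xiu (b. 1991), female, master's student. Research interest: data mining. E-mail: 1064627898@qq.com
Funding:
National High Technology Research and Development Program of China (2014AA021502)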

Improved K-means Algorithm Based on Density and Clustering Index

MAO Xiu, MAO Chunli, DING Yuewei

(School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)

Online: 2015-11-15 Published: 2015-12-15

Abstract

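Abstract (translated from the Chinese):

In the traditional K-means algorithm, different randomly selected initial cluster centers yield different clusters, and a manually specified value of k rarely matches the actual number of clusters. To address these problems, this paper proposes an improved K-means clustering algorithm based on density and a clustering index. A high-density set HP is obtained according to density; from this set, the means of the two pairs of objects with the greatest mutual distance are taken as the cluster centers for the first round of clustering. New cluster centers are obtained by the maximum distance product method, and a suitable value of k is determined with reference to the clustering index. Experiments confirm that the algorithm has high accuracy and validity.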

关键词: k均值算法, 初始聚类中心, 高密度集, 最大距离积法, 聚类指数

Abstract:

The traditional K-means clustering algorithm initializes cluster centers randomly, so different runs can produce different clustering results. In practice, it is also difficult to give the algorithm the exact number of clusters k in advance. To address these problems, an improved K-means algorithm based on density and a clustering index is presented. The algorithm builds a high-density set, named High Point (HP), from the densities of all objects, and takes the means of the two pairs of sample objects in HP with the greatest mutual distance as the initial centers for the first round of clustering. Each new initial center is then obtained from the remaining objects in HP by the maximum distance product method. As k grows, the algorithm automatically evaluates the clustering quality and determines the optimal number of clusters by means of the clustering index. Experiments show that the new algorithm achieves higher accuracy and validity.
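
The seeding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not define them. Density is taken here to be the number of neighbors within a hypothetical `radius`, and HP is taken to be the points whose density is at least the mean density. The first two centers are simply the two HP points farthest apart, which simplifies the pair-mean step described above. Each further center is the remaining HP point with the largest product of distances to the centers already chosen. The clustering index used to pick k is not specified in the abstract, so k is passed in directly. The function names `high_density_set` and `initial_centers` are illustrative.

```python
import math

def high_density_set(points, radius):
    """Return the points whose neighbor count is at least the mean count.

    Assumption: density = number of other points within `radius`.
    """
    density = [
        sum(1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]
    mean_density = sum(density) / len(density)
    return [p for p, d in zip(points, density) if d >= mean_density]

def initial_centers(hp, k):
    """Pick k seeds from the high-density set `hp`.

    Assumption: the first two seeds are the farthest pair in hp; each later
    seed maximizes the product of its distances to the seeds chosen so far.
    """
    pairs = [(math.dist(hp[i], hp[j]), i, j)
             for i in range(len(hp)) for j in range(i + 1, len(hp))]
    _, i, j = max(pairs)
    centers = [hp[i], hp[j]]
    rest = [p for idx, p in enumerate(hp) if idx not in (i, j)]
    while len(centers) < k and rest:
        best = max(rest,
                   key=lambda p: math.prod(math.dist(p, c) for c in centers))
        centers.append(best)
        rest.remove(best)
    return centers

# Three tight groups plus one isolated outlier (made-up illustration data).
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (4, 10), (4, 11), (5, 10), (5, 11),
    (30, 30),
]
hp = high_density_set(points, radius=1.5)
centers = initial_centers(hp, k=3)
print(centers)
```

On this toy data the outlier has no neighbors and is left out of HP, so it cannot become a seed, and the three seeds land in three different groups. The seeds would then be handed to an ordinary K-means loop as its starting centers.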

Key words: K-means algorithm; initial clustering center; high density set; maximum distances product; clustering index

CLC number:

TP306.1

MAO Xiu, MAO Chunli, DING Yuewei. Improved K-means Algorithm Based on Density and Clustering Index[J]. , 2015, 28(11): 47-.