Efficient Algorithm of Canopy-Kmeans Based on Hadoop Platform

2014, Vol. 27, Issue (2): 29-

ZHAO Qing

(School of Electronic Engineering, Xidian University, Xi'an 710071, China)

Publication date: 2014-02-15   Release date: 2014-01-12
About the author: ZHAO Qing (born 1988), male, master's degree candidate. Research interests: cloud computing; big data and large-scale data mining on the Hadoop platform. E-mail: 522698733@qq.com

Abstract:

This paper introduces the MapReduce programming model on the Hadoop platform, analyzes the advantages and disadvantages of the traditional Kmeans and Canopy clustering algorithms, and proposes an improved Kmeans algorithm based on Canopy. Because Canopy centers are selected at random in the Canopy-Kmeans algorithm, the algorithm is improved with the "minimum-maximum principle", which avoids selecting Canopy centers blindly. The algorithm is parallelized with MapReduce and applied to clustering massive volumes of news articles. Experimental results show that the method achieves higher accuracy and stability than the traditional Kmeans and Canopy algorithms.

Key words: Hadoop; MapReduce; Canopy-Kmeans algorithm; clustering
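The abstract does not spell out the procedure, so the following Python sketch is only one plausible reading of it, not the author's implementation. It assumes the "minimum-maximum principle" is the classic max-min distance rule: start from a fixed point, repeatedly promote the point farthest from all chosen centers, and stop once that distance falls below a threshold. The threshold, the choice of `points[0]` as the first center, and all function names (`maximin_centers`, `kmeans_map`, `kmeans_reduce`, `canopy_kmeans`) are hypothetical. The selected centers fix K and seed Kmeans, whose iterations are written as a map step (assign each point to its nearest center) and a reduce step (average each group). The sketch runs in a single process, does not use the Hadoop API, and skips the overlapping canopy-membership bookkeeping of the standard Canopy algorithm.

```python
import math
from collections import defaultdict

Point = tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maximin_centers(points: list[Point], threshold: float) -> list[Point]:
    """Pick canopy centers with the max-min distance rule (no randomness).

    ASSUMPTION: the first center is points[0], and selection stops when the
    largest nearest-center distance drops below `threshold` (hypothetical).
    """
    centers = [points[0]]
    # min_d[i]: distance from points[i] to its nearest chosen center
    min_d = [dist(p, centers[0]) for p in points]
    while True:
        far = max(range(len(points)), key=lambda i: min_d[i])
        if min_d[far] == 0.0 or min_d[far] < threshold:
            break  # every point lies within `threshold` of a chosen center
        centers.append(points[far])
        min_d = [min(d, dist(p, points[far])) for d, p in zip(min_d, points)]
    return centers


def kmeans_map(point: Point, centers: list[Point]) -> tuple[int, Point]:
    """Map step: emit (index of nearest center, point)."""
    idx = min(range(len(centers)), key=lambda c: dist(point, centers[c]))
    return idx, point


def kmeans_reduce(group: list[Point]) -> Point:
    """Reduce step: the new center is the mean of the points in one group."""
    n = len(group)
    return tuple(sum(coords) / n for coords in zip(*group))


def canopy_kmeans(points: list[Point], threshold: float,
                  max_iter: int = 20, tol: float = 1e-9) -> list[Point]:
    """Seed Kmeans with max-min centers, then iterate map/shuffle/reduce."""
    centers = maximin_centers(points, threshold)
    for _ in range(max_iter):
        groups: dict[int, list[Point]] = defaultdict(list)
        for p in points:  # map phase; grouping by key plays the shuffle role
            key, value = kmeans_map(p, centers)
            groups[key].append(value)
        # reduce phase; an empty cluster keeps its previous center
        new_centers = [kmeans_reduce(groups[i]) if groups[i] else centers[i]
                       for i in range(len(centers))]
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break  # centers stopped moving
    return centers
```

On two well-separated groups of three 2-D points with `threshold=5.0`, the max-min step picks two seeds and Kmeans converges to the two group means. Because the first center is fixed rather than drawn at random, repeated runs on the same input return the same centers, which illustrates why replacing random selection can make results more repeatable.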

CLC number: TP301.6

ZHAO Qing. Efficient Algorithm of Canopy-Kmeans Based on Hadoop Platform[J]. , 2014, 27(2): 29-.
