2014, Vol. 27, Issue (2): 29-.

• Articles

Efficient Algorithm of Canopy-Kmeans Based on Hadoop Platform

 ZHAO Qing   

  1. (School of Electronic Engineering, Xidian University, Xi'an 710071, China)
  • Online: 2014-02-15 Published: 2014-01-12

Abstract:

This paper studies the MapReduce programming model on the Hadoop platform, analyzes the advantages and disadvantages of the traditional Kmeans and Canopy algorithms, and proposes an improved Kmeans algorithm based on Canopy. The "minimum maximum principle" is used to reduce the randomness of the Canopy-Kmeans algorithm and to avoid the blindness of Canopy. The algorithm is implemented with the MapReduce parallel programming method and applied to massive news aggregation. Experiments show that this method achieves higher accuracy and stability than the traditional Kmeans and Canopy algorithms.

Key words: Hadoop; MapReduce; Canopy-Kmeans algorithm; clustering

CLC Number: 

  • TP301.6