Electronic Science and Technology ›› 2019, Vol. 32 ›› Issue (5): 38-44. doi: 10.16180/j.cnki.issn1007-7820.2019.05.008


Improved CFSFDP Algorithm Based on Spark Framework

LI Qi, ZHANG Xin, ZHANG Pingkang, ZHANG Hang

  1. School of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
  • Received: 2018-04-29 Online: 2019-05-15 Published: 2019-05-06
  • Supported by:
    International Science & Technology Cooperation Program of China (2014DFA00670); Postgraduate Education Reform Project of Guizhou Province (课题黔教研合JG字[2016]15); Key Industry Project of Guizhou Science and Technology Agency (黔科合GY字[2010]3056)

Abstract:

CFSFDP (clustering by fast search and find of density peaks) is a density-based clustering algorithm. To remove its dependence on manually selecting cluster centers from the decision graph, this paper uses the idea of slope to compute the demarcation point between cluster centers and non-center points. This improvement eliminates subjective human error and makes the selection of center points automatic. The algorithm was then parallelized on the Spark framework. Experiments showed that the improved algorithm is suitable for cluster analysis of massive data: removing manual selection improved efficiency, and the parallel version achieved a good speedup ratio and good scalability.
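
The abstract does not spell out the slope rule, so the following is only a minimal sketch of how slope-based center selection might look, not the authors' method. It assumes the standard CFSFDP quantities (local density rho with a cutoff distance d_c, distance delta to the nearest denser point, and the product gamma = rho * delta), and it assumes that the demarcation point is the steepest drop in the descending gamma curve. The function names, the toy data, and the cutoff value are all hypothetical.

```python
import math


def decision_values(points, d_c):
    """Return gamma_i = rho_i * delta_i for every point (standard CFSFDP quantities)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cutoff distance d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        # distances to all strictly denser points
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # a point with no denser neighbor gets its largest distance to any point
        delta.append(min(higher) if higher else max(dist[i]))
    return [r * d for r, d in zip(rho, delta)]


def pick_centers(gamma):
    """Assumed rule: cut the descending gamma curve at its steepest drop."""
    order = sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)
    ranked = [gamma[i] for i in order]
    # slope between consecutive ranked values (the step along the x axis is 1)
    slopes = [ranked[k + 1] - ranked[k] for k in range(len(ranked) - 1)]
    cut = min(range(len(slopes)), key=lambda k: slopes[k])  # most negative slope
    return order[:cut + 1]


# Toy data (hypothetical): two well-separated clusters, each with one dense center.
points = [
    (0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),      # cluster A, center at index 0
    (10, 0), (11, 0), (9, 0), (10, 1), (10, -1),   # cluster B, center at index 5
]
centers = pick_centers(decision_values(points, d_c=1.2))
print(sorted(centers))  # -> [0, 5]
```

In this sketch the two center points have a gamma of 44 while every other point has a gamma of 1, so the steepest drop falls right after the second center. On real data the gamma curve is noisier, and a practical rule would probably need smoothing or a threshold on the slope change.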

Key words: Spark, CFSFDP algorithm, decision graph, density peaks, clustering, parallel

CLC Number: 

  • TP301.6