Electronic Science and Technology ›› 2019, Vol. 32 ›› Issue (8): 70-74. doi: 10.16180/j.cnki.issn1007-7820.2019.08.015


Design of Data Mining System Based on Cloud Computing

LAN Jiman

  1. Huizhou Engineering Vocational College, Huizhou, Guangdong 516023, China
  • Received: 2018-10-15 Online: 2019-08-15 Published: 2019-08-12
  • About the author: LAN Jiman (b. 1983), male, lecturer. Research interests: big data analysis technology.

Abstract:

To process exponentially growing volumes of data efficiently and to improve data storage and computing capacity, this paper proposes the design of a data mining system based on cloud computing. The work first analyzes the component structure and operating mechanism of Spark, a mainstream cloud computing platform, and studies the programming principles of its computing architecture in depth. Spark is then used to parallelize the C4.5 algorithm and the K-medoids clustering algorithm, which improves the running speed, the convergence speed, and the stability of the results. Tests show that, when analyzing and processing massive data, the proposed cloud computing platform effectively increases the computing speed of the overall system while staying within the allowable classification error, and that classification efficiency also improves substantially.

Key words: cloud computing, data mining, Spark, C4.5 algorithm, K-medoids clustering algorithm
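
To make the idea of parallelizing C4.5 concrete, the following is a minimal sketch of how the gain ratio for one attribute could be computed from per-partition counts. It is not the paper's implementation: plain Python lists stand in for Spark partitions, and the function names (`partition_counts`, `entropy`, `gain_ratio`) as well as the row layout (class label in the last column) are assumptions made for illustration only.

```python
from collections import Counter
from functools import reduce
from math import log2


def partition_counts(rows, attr_index):
    """Map step: count (attribute value, class label) pairs in one partition."""
    return Counter((row[attr_index], row[-1]) for row in rows)


def entropy(counts):
    """Shannon entropy in bits of a collection of non-negative counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def gain_ratio(partitions, attr_index):
    """Reduce step: merge partition counts, then compute the C4.5 gain ratio."""
    merged = reduce(
        lambda a, b: a + b,
        (partition_counts(p, attr_index) for p in partitions),
        Counter(),
    )
    class_totals = Counter()
    value_totals = Counter()
    for (value, label), n in merged.items():
        class_totals[label] += n
        value_totals[value] += n
    total = sum(value_totals.values())

    # Information gain = H(class) - sum over values v of p(v) * H(class | v)
    conditional = 0.0
    for value, n_v in value_totals.items():
        label_counts = [n for (v, _), n in merged.items() if v == value]
        conditional += n_v / total * entropy(label_counts)
    info_gain = entropy(class_totals.values()) - conditional

    # Split information penalizes attributes with many distinct values.
    split_info = entropy(value_totals.values())
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: (attribute value, class label), split into two partitions.
partitions = [
    [("sunny", "no"), ("sunny", "no"), ("rainy", "yes")],
    [("rainy", "yes"), ("sunny", "yes"), ("rainy", "no")],
]
ratio = gain_ratio(partitions, attr_index=0)
```

Because only the counts leave each partition, the merge step is associative and the result does not depend on how the rows are distributed, which is the property a Spark-based version would rely on.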

CLC Number:

  • TN99