2018, Vol. 31, Issue (3): 48-.

• Paper

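Index Optimization for Heterogeneous Data Based on a Decision Tree Classification Algorithm

 ZHENG Bowen, ZHAO Fengyu

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology
  • Published: 2018-03-15 Online: 2018-03-15
  • About the authors: ZHENG Bowen (b. 1992), male, master's student. Research interests: software engineering, data mining. ZHAO Fengyu (b. 1963), male, Ph.D., professor. Research interests: software engineering, software quality control.
  • Supported by:

    National Natural Science Foundation of China, Young Scientists Fund (61402288)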

Optimization for Heterogeneous Data Index Based on Decision Tree Classification

ZHENG Bowen, ZHAO Fengyu

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology
  • Online: 2018-03-15 Published: 2018-03-15

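Abstract (translated from the Chinese):

Indexing massive data is an important means of improving query performance over massive data in distributed environments. To build efficient index structures, a variety of index optimization methods for heterogeneous data have been proposed. This paper presents an index optimization method based on a decision tree classification algorithm. An index decision tree is first built with the decision tree classification algorithm. The tree is then used to make a decision on each attribute column of every subspace table, which yields an index table. Indexes are built from the index table data, and a global index is then constructed from the indexes on the individual subspaces. This two-level index structure provides technical support for locating index information quickly. Experimental results show that the index decision tree is a suitable method for optimizing indexes over heterogeneous data.

Key words: decision tree, index structure, big data, index optimization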

Abstract:

Indexing massive data is an important means of improving query efficiency over massive data in distributed environments. To construct efficient index structures, several index optimization methods for heterogeneous data have been proposed. This paper presents an index optimization method based on decision tree classification. First, an index decision tree is built from the data tables and their indexes. Then, an index structure is obtained for each subspace according to the decisions given by the decision tree. Finally, a global index is created from the local indexes. The resulting two-level index structure can be used to locate index information rapidly and to reduce data search time. Experimental results show that the index decision tree is a suitable method for optimizing indexes over heterogeneous spatial data.

Key words: decision tree; index structure; big data; index optimization
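
The pipeline outlined in the abstract (train an index decision tree, decide per attribute column of each subspace table, build local indexes, then derive a global index) can be illustrated with a minimal Python sketch. The abstract does not specify the features, the training data, or the index layout, so everything concrete here is an assumption made for illustration: the two column features (selectivity and query frequency), the toy training set `TRAIN_ROWS`/`TRAIN_LABELS`, the small CART-style learner using Gini impurity, and the dictionary-based indexes. It is not the authors' implementation.

```python
from collections import Counter

# Hypothetical training set: (selectivity, query frequency) -> "index this column?"
TRAIN_ROWS = [(0.9, 0.8), (0.8, 0.6), (0.1, 0.9), (0.9, 0.05), (0.2, 0.1), (1.0, 0.5)]
TRAIN_LABELS = [True, True, False, False, False, True]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, depth=0, max_depth=3):
    """Train a small CART-style tree; a leaf is a bool, a node is (feature, threshold, low, high)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no split separates the rows
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth))

def predict(tree, features):
    """Walk the tree down to a leaf and return its bool decision."""
    while isinstance(tree, tuple):
        f, t, low, high = tree
        tree = low if features[f] <= t else high
    return tree

def column_features(table, column, query_freq):
    """Assumed features: fraction of distinct values, and how often the column is queried."""
    values = [row[column] for row in table]
    return (len(set(values)) / len(values), query_freq.get(column, 0.0))

def build_two_level_index(subspaces, query_freq, tree):
    """Local index: subspace -> column -> value -> row ids. Global index: (column, value) -> subspaces."""
    local_indexes, global_index = {}, {}
    for name, table in subspaces.items():
        chosen = [c for c in table[0] if predict(tree, column_features(table, c, query_freq))]
        index = {c: {} for c in chosen}
        for row_id, row in enumerate(table):
            for c in chosen:
                index[c].setdefault(row[c], []).append(row_id)
                global_index.setdefault((c, row[c]), set()).add(name)
        local_indexes[name] = index
    return local_indexes, global_index

def lookup(global_index, local_indexes, column, value):
    """Use the global index to pick subspaces, then each local index to find row ids."""
    hits = global_index.get((column, value), ())
    return {s: local_indexes[s][column][value] for s in sorted(hits)}

if __name__ == "__main__":
    tree = build_tree(TRAIN_ROWS, TRAIN_LABELS)
    subspaces = {
        "s1": [{"city": "Shanghai", "status": "ok"}, {"city": "Beijing", "status": "ok"}],
        "s2": [{"city": "Shanghai", "status": "ok"}, {"city": "Hangzhou", "status": "ok"}],
    }
    local, glob = build_two_level_index(subspaces, {"city": 0.7, "status": 0.01}, tree)
    print(lookup(glob, local, "city", "Shanghai"))
```

In this sketch a query first consults the global index to find which subspaces hold a value and only then touches those subspaces' local indexes, which is one plausible reading of the two-level structure the abstract describes.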

CLC Number:

  • TN915