J4 ›› 2014, Vol. 41 ›› Issue (2): 191-196. doi: 10.3969/j.issn.1001-2400.2014.02.031

• Original Articles •

Prefetching method for Hadoop MapReduce environments

ZHANG Xiaohong (1,2); LUO Fen (2); JIA Zongpu (2); SHEN Jiquan (3)

  1. Shenzhen Institute of Advanced Technology, Chinese Academy of Science, Shenzhen 518055, China
  2. School of Computer Science and Technology, Henan Polytechnic Univ., Jiaozuo 454000, China
  3. Center of Modern Education, Henan Polytechnic Univ., Jiaozuo 454000, China
  • Received: 2013-01-13 Online: 2014-04-20 Published: 2014-05-30
  • Contact: ZHANG Xiaohong, E-mail: xh.zhang@hpu.edu.cn

Abstract:

Because of data dependencies and the task execution model in MapReduce environments, reduce tasks incur substantial remote data access delays and unnecessary resource competition, which degrade system performance. To address this problem, we propose a pre-fetching method based on pre-scheduling. The method hides remote data access delay through pre-fetching and limits resource competition by adjusting the resources allocated to reduce tasks. The method is implemented in Hadoop-0.20.2. Experimental results show that it improves system performance by more than 10% compared with both default Hadoop MapReduce and Hadoop Online Prototype.
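
The general idea of hiding remote access delay by overlapping fetches with map execution can be illustrated with a toy timing model. This is only a sketch under stated assumptions and not the authors' algorithm or actual Hadoop behavior: the function names, the serial one-at-a-time fetch, the simplified baseline that fetches only after every map task has finished, and the sample timings are all hypothetical.

```python
from typing import List, Tuple

# Each map task is modeled as (finish_time, fetch_time):
# when its output becomes available, and how long the reducer
# needs to copy that output over the network. Hypothetical units.
MapTask = Tuple[float, float]


def finish_without_prefetch(maps: List[MapTask]) -> float:
    """Simplified baseline: fetching starts only after the last map ends."""
    start = max(done for done, _ in maps)
    return start + sum(fetch for _, fetch in maps)


def finish_with_prefetch(maps: List[MapTask]) -> float:
    """Pre-scheduled reducer: copy each output as soon as it is available."""
    clock = 0.0
    for done, fetch in sorted(maps):
        # Wait for the map output if it is not ready yet, then copy it.
        clock = max(clock, done) + fetch
    return clock


# Illustrative timings only, not measurements from the paper.
maps = [(10.0, 4.0), (12.0, 4.0), (20.0, 4.0)]
print(finish_without_prefetch(maps))  # 32.0
print(finish_with_prefetch(maps))     # 24.0
```

In this toy model, only the fetch of the last map output remains on the critical path; the earlier fetches are hidden behind map execution.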

Key words: MapReduce, distributed computing, pre-fetching, scheduling

CLC Number: 

  • TP316.4