A data prefetching method for Hadoop MapReduce environments

doi:10.3969/j.issn.1001-2400.2014.02.031

Abstract

Because of data dependencies and the particular task execution mode of MapReduce environments, reduce tasks incur substantial remote data access delay and unnecessary resource contention, which degrades system performance. To address this problem, we propose a pre-fetching method based on pre-scheduling. The method hides remote data access delay through pre-fetching and limits resource contention by adjusting the resources allocated to reduce tasks. It is implemented in Hadoop-0.20.2. Experimental results show that it improves system performance by more than 10% compared with both default Hadoop MapReduce and the Hadoop Online Prototype.
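
The general idea of hiding access delay by fetching ahead can be illustrated with a small sketch. This is not the authors' implementation and does not use any Hadoop API: `prefetching_iter`, `fake_remote_fetch`, `reduce_sum`, and the `buffer_size` parameter are hypothetical names chosen for illustration. A background thread fetches the next partitions while the consumer processes the current one, and a bounded buffer caps how much the prefetcher may hold at once.

```python
import queue
import threading


def prefetching_iter(fetch, keys, buffer_size=2):
    """Yield fetch(k) for each key, fetching ahead in a background thread.

    buffer_size bounds how many fetched items may wait in memory, which
    caps the resources the prefetcher can hold at any one time.
    """
    buf = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def worker():
        try:
            for k in keys:
                buf.put(fetch(k))  # blocks while the buffer is full
        finally:
            buf.put(done)  # always unblock the consumer

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buf.get()
        if item is done:
            return
        yield item


def fake_remote_fetch(partition_id):
    # Hypothetical stand-in for pulling one map-output partition remotely.
    return [partition_id * 10 + i for i in range(3)]


def reduce_sum(partitions):
    # Toy reduce function: sum every value in every fetched partition.
    return sum(value for part in partitions for value in part)


total = reduce_sum(prefetching_iter(fake_remote_fetch, range(4)))
```

Items arrive in the same order as `keys`, so the consumer sees the same stream it would get from fetching sequentially. Error reporting is left out of the sketch: if `fetch` raises, the stream simply ends early.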

Key words: MapReduce, distributed computing, pre-fetching, scheduling

CLC Number: TP316.4

ZHANG Xiaohong; LUO Fen; JIA Zongpu; SHEN Jiquan. Prefetching method for Hadoop MapReduce environments [J]. J4, 2014, 41(2): 191-196.
