Journal of Xidian University ›› 2025, Vol. 52 ›› Issue (1): 181-195. doi: 10.19665/j.issn1001-2400.20241005

• Computer Science and Technology & Cyberspace Security •

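  • Corresponding author: WANG Jing (王静, b. 1981), female, associate professor, Ph.D., E-mail: wangjing@mail.xidian.edu.cn
  • About the authors: ZHI Wentao (支文韬, b. 1999), male, master's student at Xidian University, E-mail: 22031212306@stu.xidian.edu.cn
    ZHAO Hui (赵辉, b. 1983), male, associate professor, Ph.D., E-mail: hzhao@mail.xidian.edu.cn
    MENG Fanxin (孟繁鑫, b. 1998), male, master's student at Xidian University, E-mail: 20031211543@stu.xidian.edu.cn
    WAN Bo (万波, b. 1976), male, professor, Ph.D., E-mail: wanbo@xidian.edu.cn
    WANG Quan (王泉, b. 1970), male, professor, Ph.D., E-mail: qwang@xidian.edu.cn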
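  • Funding:
    Key Research and Development Program of Shaanxi Province (2024GX-YBXM-010); Key Research and Development Program of Shaanxi Province (2024GX-YBXM-140); Key Research and Development Program of Shaanxi Province (2024GX-YBXM-039); Innovation Capability Support Program of Shaanxi Province (2023-CX-TD-08); Shaanxi Qinchuangyuan "Scientist + Engineer" Team (2023KXJ-040); Fundamental Research Funds for the Central Universities (ZYTS24089)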

Multi-workflow fault-tolerant scheduling strategy for WaaS platforms

ZHI Wentao 1,2; ZHAO Hui 1,2,3; MENG Fanxin 1; WANG Jing 1; WAN Bo 1,2,3; WANG Quan 1,3

  1. School of Computer Science and Technology, Xidian University, Xi’an 710071, China
  2. Hangzhou Institute of Technology, Xidian University, Hangzhou 311231, China
  3. Shaanxi Province Key Laboratory of Smart Human-Computer Interaction and Wearable Technology, Xi’an 710071, China
  • Received: 2024-01-06 Online: 2024-10-24 Published: 2024-10-24

Abstract:

As scientific computations grow more complex, workflows have become an essential model for automating them. Workflow as a Service (WaaS) platforms rent virtual machines from Infrastructure as a Service (IaaS) providers to offer users scientific workflow computing services. However, existing research on workflow scheduling for WaaS platforms considers neither task failures caused by virtual machine downtime nor delays in virtual machine provisioning. To address this issue, this paper proposes a multi-workflow fault-tolerant scheduling strategy for WaaS platforms. First, since WaaS platforms do not schedule hardware resources directly but schedule workflows at the level of virtual machines and containers, we establish a workflow scheduling model suited to WaaS platforms that accounts for the impact of virtual machine downtime and provisioning delays on scheduling. Second, we propose a multi-workflow fault-tolerant scheduling strategy with four stages: preprocessing, fault-tolerance method selection, task allocation, and resource adjustment. Specifically, an improved deadline division algorithm determines the scheduling order; a fault-tolerance selection algorithm combines task replication and resubmission; virtual machine selection and task allocation take task attributes and virtual machine provisioning delays into account; and a resource adjustment algorithm deploys resources in advance for upcoming tasks, so that they do not wait on the provisioning delay of virtual machines or containers. Finally, experiments comparing the proposed strategy with other algorithms under different virtual machine downtime probabilities, workloads, and deadline constraints demonstrate its effectiveness.

Key words: multi-workflow, fault-tolerant scheduling algorithm, Workflow as a Service (WaaS) platform, resource provisioning delay

CLC number:

  • TP301.6
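
The abstract names the scheduling stages but does not give their rules, so the Python sketch below is illustrative only and is not the paper's algorithm. It assumes a single linear chain of tasks, divides the workflow deadline in proportion to cumulative execution time, and picks resubmission when one failed attempt plus a fresh virtual machine provisioning delay would still finish before the task's sub-deadline, falling back to replication otherwise. The function names (`divide_deadline`, `choose_fault_tolerance`, `plan`) and all numbers are hypothetical.

```python
from typing import List


def divide_deadline(exec_times: List[float], deadline: float) -> List[float]:
    """Split a workflow deadline across a linear chain of tasks.

    Assumption: each task receives a sub-deadline proportional to the
    cumulative execution time up to and including that task.
    """
    total = sum(exec_times)
    sub_deadlines = []
    elapsed = 0.0
    for t in exec_times:
        elapsed += t
        sub_deadlines.append(deadline * elapsed / total)
    return sub_deadlines


def choose_fault_tolerance(ready_time: float, exec_time: float,
                           sub_deadline: float,
                           provisioning_delay: float) -> str:
    """Pick a fault-tolerance method for one task (hypothetical rule).

    Worst case for resubmission: the first attempt fails at its very end,
    a replacement VM has to be provisioned, and the task reruns from scratch.
    """
    worst_case_finish = ready_time + 2 * exec_time + provisioning_delay
    if worst_case_finish <= sub_deadline:
        return "resubmission"  # enough slack to retry after a failure
    return "replication"       # too tight: run a copy in parallel instead


def plan(exec_times: List[float], deadline: float,
         provisioning_delay: float) -> List[str]:
    """Choose a fault-tolerance method for every task in the chain."""
    sub_deadlines = divide_deadline(exec_times, deadline)
    choices = []
    ready_time = 0.0  # assumption: each task starts when its predecessor ends
    for exec_time, sub_deadline in zip(exec_times, sub_deadlines):
        choices.append(choose_fault_tolerance(
            ready_time, exec_time, sub_deadline, provisioning_delay))
        ready_time += exec_time
    return choices


if __name__ == "__main__":
    # Three tasks, a tight 90 s deadline, 5 s VM provisioning delay.
    print(plan([10, 30, 20], deadline=90, provisioning_delay=5))
    # The same tasks with a loose 180 s deadline.
    print(plan([10, 30, 20], deadline=180, provisioning_delay=5))
```

Under these assumptions, a tight deadline pushes early tasks toward replication because a retry would overrun their sub-deadlines, while a loose deadline lets every task rely on the cheaper resubmission path.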