Scalable Adaptive Optimizations for Stream-based Workflows in Multi-HPC-Clusters and Cloud Infrastructures

Liang Liang, Rosa Filgueira, Yan Yan, Thomas Heinis

Research output: Contribution to journal › Article › peer-review

Abstract

This work presents three new adaptive optimization techniques to maximize the performance of dispel4py workflows. dispel4py is a parallel, Python-based, stream-oriented dataflow framework that acts as a bridge to existing parallel programming frameworks such as MPI or Python multiprocessing. When a user runs a dispel4py workflow, the original framework distributes the workload in a fixed way among the processes available for the run. This allocation does not take the characteristics of the workflow into account, which can cause scalability issues, especially for data-intensive scientific workflows. Our aim, therefore, is to improve the performance of dispel4py workflows by testing different workload strategies that automatically adapt to workflows at runtime. To achieve this objective, we have implemented three new techniques, called Naive Assignment, Staging, and Dynamic Scheduling. We have evaluated our proposal with several workflows from different domains and across different computing resources. The results show that the proposed techniques significantly improve the performance of the original dispel4py framework, by up to 10X.
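
To make the contrast between a fixed distribution and a runtime-adaptive one concrete, here is a minimal sketch. It does not use the dispel4py API, and it is not the paper's Dynamic Scheduling algorithm. It assumes the fixed strategy is a round-robin split decided before the run, and it models the adaptive strategy as a shared queue in which the next item goes to whichever worker becomes free first. The function names and the cost list are hypothetical.

```python
import heapq


def fixed_makespan(costs, workers):
    """Assumed fixed strategy: item i is pinned to worker i % workers up front."""
    loads = [0] * workers
    for i, cost in enumerate(costs):
        loads[i % workers] += cost
    # The run finishes when the most heavily loaded worker finishes.
    return max(loads)


def dynamic_makespan(costs, workers):
    """Assumed adaptive strategy: each item goes to the worker that is free first."""
    free_at = [0] * workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)         # earliest-free worker takes the item
        heapq.heappush(free_at, start + cost)  # it is busy until start + cost
    return max(free_at)


# Hypothetical, skewed per-item processing costs (arbitrary time units).
costs = [8, 1, 1, 1, 8, 1, 1, 1]

print(fixed_makespan(costs, 4))    # 16: both expensive items land on worker 0
print(dynamic_makespan(costs, 4))  # 9: the second expensive item goes to an idle worker
```

With skewed costs, the fixed split lets one worker accumulate both expensive items while the others sit idle. The queue-based variant spreads them out, which is the kind of imbalance an adaptive strategy is meant to avoid.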
Original language: English
Journal: Future Generation Computer Systems
Early online date: 8 Oct 2021
Publication status: E-pub ahead of print - 8 Oct 2021

Keywords

  • Scientific workflow
  • Stream-based workflow
  • Workflow optimization
  • dispel4py
  • Distributed systems

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
