Transparently resilient task parallelism for Chapel

Konstantina Panagiotopoulou, Hans-Wolfgang Loidl

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)



Hardware failure in High-Performance Computing systems is the norm. Failure data collected over a nine-year period across 22 large-scale systems at Los Alamos National Laboratory, each with up to a few thousand NUMA or SMP nodes, show averages of 20-1000 failures per year. This paper describes the design and outlines the implementation of transparent resilience for task parallelism in Chapel, a high-performance language developed for productive parallel programming. We detail the design directions and implement a transparent resilience mechanism within Chapel's runtime system. Our primary goal is to ensure program termination in the presence of hardware failure of one or more nodes in the system. We evaluate our implementation using a set of five synthetic microbenchmarks covering Chapel's task-parallel constructs, and we quantify and discuss the small overheads and speedups observed for the resilient implementation compared to the latest non-resilient Chapel release.

Original language: English
Title of host publication: 2016 IEEE International Parallel and Distributed Processing Symposium Workshops
Number of pages: 10
ISBN (Electronic): 9781509036820
ISBN (Print): 9781509036837
Publication status: Published - 4 Aug 2016
Event: 30th IEEE International Parallel and Distributed Processing Symposium Workshops 2016 - Chicago, United States
Duration: 23 May 2016 to 27 May 2016


Conference: 30th IEEE International Parallel and Distributed Processing Symposium Workshops 2016
Abbreviated title: IPDPSW 2016
Country/Territory: United States


Keywords

  • Automatic task adoption
  • Fault detection
  • Fault recovery
  • Fault tolerance
  • Parallelism
  • PGAS
  • Resilience
  • Runtime system
  • Transparency

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Hardware and Architecture
  • Software

