The HdpH DSLs for scalable reliable computation

Patrick Maier, Robert Stewart, Phil Trinder

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

11 Citations (Scopus)

Abstract

The statelessness of functional computations facilitates both parallelism and fault recovery. Faults and non-uniform communication topologies are key challenges for emergent large-scale parallel architectures. We report on HdpH and HdpH-RS, a pair of Haskell DSLs designed to address these challenges for irregular task-parallel computations on large distributed-memory architectures. Both DSLs share an API combining explicit task placement with sophisticated work stealing. HdpH focuses on scalability by making placement and stealing topology-aware, whereas HdpH-RS delivers reliability by means of fault-tolerant work stealing. We present operational semantics for both DSLs and investigate conditions for semantic equivalence of HdpH and HdpH-RS programs, that is, conditions under which topology awareness can be transparently traded for fault tolerance. We detail how the DSL implementations realise topology awareness and fault tolerance. We report an initial evaluation of scalability and fault tolerance on a 256-core cluster and on up to 32K cores of an HPC platform.

Original language: English
Title of host publication: Haskell 2014 - Proceedings of the 2014 ACM SIGPLAN symposium on Haskell
Publisher: Association for Computing Machinery
Pages: 65-76
Number of pages: 12
ISBN (Print): 978-1-4503-3041-1
Publication status: Published - 2014
Event: 6th ACM SIGPLAN symposium on Haskell - Gothenburg, Sweden
Duration: 4 Sept 2014 to 5 Sept 2014

Conference

Conference: 6th ACM SIGPLAN symposium on Haskell
Abbreviated title: Haskell 2014
Country/Territory: Sweden
City: Gothenburg
Period: 4/09/14 to 5/09/14

Keywords

  • embedded domain specific languages
  • fault tolerance
  • parallelism
  • topology awareness

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Software
