Reliable scalable symbolic computation: the design of SymGridPar2

Robert Stewart, Patrick Maier, Philip William Trinder

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Symbolic computation is an important area of both Mathematics and Computer Science, with many large computations that would benefit from parallel execution. Symbolic computations are, however, challenging to parallelise as they have complex data and control structures, and both dynamic and highly irregular parallelism. The SymGridPar framework has been developed to address these challenges on small-scale parallel architectures. However, the multicore revolution means that the number of cores and the number of failures are growing exponentially, and that the communication topology is becoming increasingly complex. Hence an improved parallel symbolic computation framework is required.

This paper presents the design and initial evaluation of SymGridPar2 (SGP2), a successor to SymGridPar that is designed to provide scalability onto 10^6 cores, and hence also provide fault tolerance. We present the SGP2 design goals, principles and architecture. We describe how scalability is achieved using layering and by allowing the programmer to control task placement. We outline how fault tolerance is provided by supervising remote computations, and outline higher-level fault tolerance abstractions.

We describe the SGP2 implementation status and development plans. We report the scalability and efficiency on approximately 2000 cores, and investigate the overheads of tolerating faults for simple symbolic computations.

Original language: English
Title of host publication: Proceedings of the 28th Annual ACM Symposium on Applied Computing
Publisher: Association for Computing Machinery
Pages: 1674-1681
Number of pages: 8
ISBN (Electronic): 9781450316569
Publication status: Published - 2013
Event: 28th ACM Symposium on Applied Computing 2013 - Coimbra, Portugal
Duration: 18 Mar 2013 → 22 Mar 2013
Conference number: 28
http://oldwww.acm.org/conferences/sac/sac2013/

Conference

Conference: 28th ACM Symposium on Applied Computing 2013
Abbreviated title: SAC 2013
Country/Territory: Portugal
City: Coimbra
Period: 18/03/13 → 22/03/13

Keywords

  • fault tolerance
  • locality control
  • parallel functional programming
