Rethinking high performance computing platforms: Challenges, opportunities and recommendations

Ole Weidner, Malcolm P. Atkinson, Adam Barker, Rosa Filgueira Vicente

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Citations (Scopus)

Abstract

A growing number of "second generation" high-performance computing applications with heterogeneous, dynamic and data-intensive properties have an extended set of requirements, which cover application deployment, resource allocation and control, and I/O scheduling. These requirements are not met by the current production HPC platform models and policies. This results in a loss of opportunity, productivity and innovation for new computational methods and tools. It also decreases effective system utilization for platform providers due to unsupervised workarounds and "rogue" resource management strategies implemented in application space. In this paper we critically discuss the dominant HPC platform model and describe the challenges it creates for second generation applications because of its asymmetric resource view, interfaces and software deployment policies. We present an extended, more symmetric and application-centric platform model that adds decentralized deployment, introspection, bidirectional control and information flow, and more comprehensive resource scheduling. We describe cHPC: an early prototype of a non-disruptive implementation based on Linux Containers (LXC). It can operate alongside existing batch queuing systems and exposes a symmetric platform API without interfering with existing applications and usage modes. We see our approach as a viable, incremental next step in HPC platform evolution that benefits applications and platform providers alike. To demonstrate this further, we lay out a roadmap for future research and experimental evaluation.

Original language: English
Title of host publication: DIDC '16: Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing
Publisher: Association for Computing Machinery
Pages: 19-26
Number of pages: 8
ISBN (Electronic): 9781450343527
Publication status: Published - 1 Jun 2016
Event: 6th ACM International Workshop on Data-Intensive Distributed Computing 2016 - Kyoto, Japan
Duration: 1 Jun 2016 → …

Conference

Conference: 6th ACM International Workshop on Data-Intensive Distributed Computing 2016
Abbreviated title: DIDC 2016
Country/Territory: Japan
City: Kyoto
Period: 1/06/16 → …

Keywords

  • HPC platform APIs
  • HPC platform models
  • Linux containers
  • OS-level virtualization
  • Resource management
  • Usability

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Applied Mathematics
