dispel4py: An Open Source Python Framework for Encoding, Mapping and Reusing Seismic Continuous Data Streams: Intensive Analysis and Data Mining

Rosa Filgueira, Amrey Krause, Malcolm Atkinson, Alessandro Spinuso, Iraklis Klampanos, Federica Magnoni, Emanuele Casarotti, Jean-Pierre Vilotte

Research output: Contribution to conference › Paper › peer-review

Abstract

Many scientific communities, such as seismology, need scientific workflows because they make applications easy to compose and execute, letting scientists focus on their research rather than on arranging computation and data management. However, challenges remain. In many systems users have to adapt their codes and data movement when they move from one HPC architecture to another, and they still need to be aware of the available computing architectures to achieve the best application performance. We present dispel4py, an open-source Python library for encoding and automating data-intensive scientific methods as a graph of operations coupled together by data streams. It enables scientists to develop and experiment with their own data-intensive applications in their familiar work environment. These applications are then automatically mapped to a variety of HPC architectures and frameworks, namely MPI, multiprocessing, Storm and Spark, increasing the chances of reusing them on different computing resources. dispel4py comes with data provenance, as shown in the screenshot, and with an information registry that can be accessed transparently from within workflows. It has been enhanced with a new run-time adaptive compression strategy that reduces data-stream volume, and with a diagnostic tool that monitors workflow performance and computes the most efficient parallelisation to use. Seismologists in the VERCE project have used dispel4py for seismic ambient noise cross-correlation applications and for orchestrated HPC wave simulation and data misfit analysis workflows, two data-intensive problems that are common in today's research practice. Both were tested on several local computing resources and later submitted without change to a variety of European PRACE HPC architectures (e.g. SuperMUC and CINECA) for longer runs. Results show that dispel4py is an easy tool for developing, sharing and reusing data-intensive scientific methods.
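
The core idea in the abstract, one abstract graph of operations that is written once and then mapped to different execution back ends, can be illustrated with a minimal sketch in plain Python. This is not the dispel4py API: every name below (demean, peak_amplitude, apply_chain, run_sequential, run_threaded) is hypothetical, the graph is reduced to a linear chain, and a thread pool merely stands in for a parallel mapping such as MPI or multiprocessing.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain

# Hypothetical names throughout; this is NOT the dispel4py API.

def demean(trace):
    """Stage 1: remove the mean from one trace (a list of samples)."""
    mean = sum(trace) / len(trace)
    yield [x - mean for x in trace]

def peak_amplitude(trace):
    """Stage 2: emit the largest absolute sample of one trace."""
    yield max(abs(x) for x in trace)

def apply_chain(item, stages):
    """Push one stream item through every stage, in order."""
    items = [item]
    for stage in stages:
        # A stage may emit zero, one or many outputs per input item.
        items = list(chain.from_iterable(stage(i) for i in items))
    return items

def run_sequential(source, stages):
    """Mapping 1: process the stream one item at a time."""
    return list(chain.from_iterable(apply_chain(i, stages) for i in source))

def run_threaded(source, stages, workers=4):
    """Mapping 2: the same stages, with items spread over a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order.
        results = pool.map(lambda i: apply_chain(i, stages), source)
        return list(chain.from_iterable(results))

# The workflow is described once, independently of how it will be run.
stages = [demean, peak_amplitude]
traces = [[1.0, 2.0, 3.0], [10.0, 10.0, 13.0]]

print(run_sequential(traces, stages))
print(run_threaded(traces, stages))
```

The point of the sketch is that the stage definitions and the `stages` list never change; only the function that executes them does, which is the property the abstract credits with making applications reusable across computing resources.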
Original language: English
Publication status: Published - Dec 2015
Event: American Geoscience Union Fall Meeting 2015 - San Francisco, United States
Duration: 14 Dec 2015 → 18 Dec 2015

Conference

Conference: American Geoscience Union Fall Meeting 2015
Country/Territory: United States
City: San Francisco
Period: 14/12/15 → 18/12/15
