TY - JOUR
T1 - Interoperability and FAIRness through a novel combination of Web technologies
AU - Wilkinson, Mark D.
AU - Verborgh, Ruben
AU - Olavo Bonino da Silva Santos, Luiz
AU - Clark, Tim
AU - Swertz, Morris A.
AU - Kelpin, Fleur D. L.
AU - Gray, Alasdair J. G.
AU - Schultes, Erik A.
AU - van Mulligen, Erik M.
AU - Ciccarese, Paolo
AU - Kuzniar, Arnold
AU - Gavai, Anand
AU - Thompson, Mark
AU - Kaliyaperumal, Rajaram
AU - Bolleman, Jerven T.
AU - Dumontier, Michel
PY - 2017/04/24/
Y1 - 2017/04/24/
N2 - Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.
KW - FAIR Data
KW - Interoperability
KW - Data Integration
KW - Semantic Web
KW - Linked Data
KW - REST
U2 - 10.7717/peerj-cs.110
DO - 10.7717/peerj-cs.110
M3 - Article
SN - 2376-5992
VL - 3
JO - PeerJ Computer Science
JF - PeerJ Computer Science
M1 - e110
ER -