The health care and life sciences community profile for dataset descriptions

Michel Dumontier, Alasdair J G Gray, M. Scott Marshall, Vladimir Alexiev, Peter Ansell, Gary Bader, Joachim Baran, Jerven T. Bolleman, Alison Callahan, José Cruz-Toledo, Pascale Gaudet, Erich A. Gombocz, Alejandra Gonzalez-Beltran, Paul Groth, Melissa Haendel, Maori Ito, Simon Jupp, Nick Juty, Toshiaki Katayama, Norio Kobayashi, Kalpana Krishnaswami, Camille Laibe, Nicolas Le Novère, Simon Lin, James Malone, Michael Miller, Christopher J. Mungall, Laurens Rietveld, Sarala M. Wimalaratne, Atsuko Yamaguchi

Research output: Contribution to journal › Article

Abstract

Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. To provide a practical guide for producing high-quality descriptions of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. It reuses existing vocabularies and is intended to meet key functional requirements, including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.
Language: English
Article number: e2331
Journal: PeerJ
Volume: 4
DOI: https://doi.org/10.7717/peerj.2331
Publication status: Published - 16 Aug 2016
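The abstract describes the profile only at the level of element categories (description, identification, attribution, versioning, provenance, content summarization). The Python sketch below shows, under stated assumptions, what a machine-readable description of one dataset release might look like when serialized as N-Triples using only the standard library. The split into a summary, a version, and a distribution description, the example.org IRIs, the hypothetical helpers `describe_release` and `to_ntriples`, and the particular Dublin Core Terms, PAV, DCAT and VoID properties chosen here are illustrative assumptions; the profile itself is the authority on which elements are required, recommended, or optional.

```python
# Hypothetical sketch: describing one release of a dataset as RDF (N-Triples).
# The summary/version/distribution split and the property choices below are
# illustrative assumptions, not a transcription of the HCLS profile's tables.
from typing import List, Tuple

# Namespace IRIs of the vocabularies used in this sketch.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"
DCTYPES = "http://purl.org/dc/dcmitype/"
PAV = "http://purl.org/pav/"
DCAT = "http://www.w3.org/ns/dcat#"
VOID = "http://rdfs.org/ns/void#"
XSD = "http://www.w3.org/2001/XMLSchema#"

Triple = Tuple[str, str, str]


def iri(value: str) -> str:
    """Render an IRI in N-Triples syntax."""
    return f"<{value}>"


def literal(value: str, datatype: str = "") -> str:
    """Render a literal, escaping characters N-Triples does not allow raw."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"' + (f"^^<{datatype}>" if datatype else "")


def describe_release(summary: str, version: str, distribution: str, title: str,
                     version_label: str, download_url: str,
                     triple_count: int) -> List[Triple]:
    """Hypothetical helper: link a summary, one versioned release, one file."""
    return [
        # Summary level: version-independent identity and description.
        (iri(summary), iri(RDF_TYPE), iri(DCTYPES + "Dataset")),
        (iri(summary), iri(DCT + "title"), literal(title)),
        # Version level: one release, tied back to the summary.
        (iri(version), iri(RDF_TYPE), iri(DCTYPES + "Dataset")),
        (iri(version), iri(DCT + "isVersionOf"), iri(summary)),
        (iri(version), iri(PAV + "version"), literal(version_label)),
        (iri(version), iri(DCAT + "distribution"), iri(distribution)),
        # Distribution level: a concrete file plus a content statistic.
        (iri(distribution), iri(RDF_TYPE), iri(DCAT + "Distribution")),
        (iri(distribution), iri(DCAT + "downloadURL"), iri(download_url)),
        (iri(distribution), iri(VOID + "triples"),
         literal(str(triple_count), XSD + "integer")),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples, one statement per line, each terminated by ' .'."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


# Usage with placeholder IRIs (example.org) and invented example values.
example = describe_release(
    summary="http://example.org/dataset/demo",
    version="http://example.org/dataset/demo/v1",
    distribution="http://example.org/dataset/demo/v1/nt",
    title="Demo dataset",
    version_label="1",
    download_url="http://example.org/files/demo-v1.nt",
    triple_count=1200,
)
output = to_ntriples(example)
print(output)
```

Writing N-Triples by hand keeps the sketch dependency-free; a real publisher would more likely use an RDF library, and would add the identification, attribution, and provenance elements the abstract lists.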

Keywords

  • Data Profiling
  • Dataset descriptions
  • Metadata
  • Metadata standards
  • Provenance
  • FAIR Data

Cite this

Dumontier, M., Gray, A. J. G., Marshall, M. S., Alexiev, V., Ansell, P., Bader, G., ... Yamaguchi, A. (2016). The health care and life sciences community profile for dataset descriptions. PeerJ, 4, [e2331]. https://doi.org/10.7717/peerj.2331
Dumontier, Michel ; Gray, Alasdair J G ; Marshall, M. Scott ; Alexiev, Vladimir ; Ansell, Peter ; Bader, Gary ; Baran, Joachim ; Bolleman, Jerven T. ; Callahan, Alison ; Cruz-Toledo, José ; Gaudet, Pascale ; Gombocz, Erich A. ; Gonzalez-Beltran, Alejandra ; Groth, Paul ; Haendel, Melissa ; Ito, Maori ; Jupp, Simon ; Juty, Nick ; Katayama, Toshiaki ; Kobayashi, Norio ; Krishnaswami, Kalpana ; Laibe, Camille ; Le Novère, Nicolas ; Lin, Simon ; Malone, James ; Miller, Michael ; Mungall, Christopher J. ; Rietveld, Laurens ; Wimalaratne, Sarala M. ; Yamaguchi, Atsuko. / The health care and life sciences community profile for dataset descriptions. In: PeerJ. 2016 ; Vol. 4.
@article{8a6dabc8e1da4554b4eb6edac71d8ee1,
title = "The health care and life sciences community profile for dataset descriptions",
abstract = "Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.",
keywords = "Data Profiling, Dataset descriptions, Metadata, Metadata standards, Provenance, FAIR Data",
author = "Michel Dumontier and Gray, {Alasdair J G} and Marshall, {M. Scott} and Vladimir Alexiev and Peter Ansell and Gary Bader and Joachim Baran and Bolleman, {Jerven T.} and Alison Callahan and Jos{\'e} Cruz-Toledo and Pascale Gaudet and Gombocz, {Erich A.} and Alejandra Gonzalez-Beltran and Paul Groth and Melissa Haendel and Maori Ito and Simon Jupp and Nick Juty and Toshiaki Katayama and Norio Kobayashi and Kalpana Krishnaswami and Camille Laibe and {Le Nov{\`e}re}, Nicolas and Simon Lin and James Malone and Michael Miller and Mungall, {Christopher J.} and Laurens Rietveld and Wimalaratne, {Sarala M.} and Atsuko Yamaguchi",
year = "2016",
month = "8",
day = "16",
doi = "10.7717/peerj.2331",
language = "English",
volume = "4",
journal = "PeerJ",
issn = "2167-8359",
publisher = "PeerJ",
pages = "e2331",
}

Dumontier, M, Gray, AJG, Marshall, MS, Alexiev, V, Ansell, P, Bader, G, Baran, J, Bolleman, JT, Callahan, A, Cruz-Toledo, J, Gaudet, P, Gombocz, EA, Gonzalez-Beltran, A, Groth, P, Haendel, M, Ito, M, Jupp, S, Juty, N, Katayama, T, Kobayashi, N, Krishnaswami, K, Laibe, C, Le Novère, N, Lin, S, Malone, J, Miller, M, Mungall, CJ, Rietveld, L, Wimalaratne, SM & Yamaguchi, A 2016, 'The health care and life sciences community profile for dataset descriptions', PeerJ, vol. 4, e2331. https://doi.org/10.7717/peerj.2331

The health care and life sciences community profile for dataset descriptions. / Dumontier, Michel; Gray, Alasdair J G; Marshall, M. Scott; Alexiev, Vladimir; Ansell, Peter; Bader, Gary; Baran, Joachim; Bolleman, Jerven T.; Callahan, Alison; Cruz-Toledo, José; Gaudet, Pascale; Gombocz, Erich A.; Gonzalez-Beltran, Alejandra; Groth, Paul; Haendel, Melissa; Ito, Maori; Jupp, Simon; Juty, Nick; Katayama, Toshiaki; Kobayashi, Norio; Krishnaswami, Kalpana; Laibe, Camille; Le Novère, Nicolas; Lin, Simon; Malone, James; Miller, Michael; Mungall, Christopher J.; Rietveld, Laurens; Wimalaratne, Sarala M.; Yamaguchi, Atsuko.

In: PeerJ, Vol. 4, e2331, 16.08.2016.

TY  - JOUR
T1  - The health care and life sciences community profile for dataset descriptions
AU  - Dumontier, Michel
AU  - Gray, Alasdair J G
AU  - Marshall, M. Scott
AU  - Alexiev, Vladimir
AU  - Ansell, Peter
AU  - Bader, Gary
AU  - Baran, Joachim
AU  - Bolleman, Jerven T.
AU  - Callahan, Alison
AU  - Cruz-Toledo, José
AU  - Gaudet, Pascale
AU  - Gombocz, Erich A.
AU  - Gonzalez-Beltran, Alejandra
AU  - Groth, Paul
AU  - Haendel, Melissa
AU  - Ito, Maori
AU  - Jupp, Simon
AU  - Juty, Nick
AU  - Katayama, Toshiaki
AU  - Kobayashi, Norio
AU  - Krishnaswami, Kalpana
AU  - Laibe, Camille
AU  - Le Novère, Nicolas
AU  - Lin, Simon
AU  - Malone, James
AU  - Miller, Michael
AU  - Mungall, Christopher J.
AU  - Rietveld, Laurens
AU  - Wimalaratne, Sarala M.
AU  - Yamaguchi, Atsuko
PY  - 2016/8/16
Y1  - 2016/8/16
N2  - Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.
AB  - Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.
KW  - Data Profiling
KW  - Dataset descriptions
KW  - Metadata
KW  - Metadata standards
KW  - Provenance
KW  - FAIR Data
U2  - 10.7717/peerj.2331
DO  - 10.7717/peerj.2331
M3  - Article
VL  - 4
JO  - PeerJ
T2  - PeerJ
JF  - PeerJ
SN  - 2167-8359
M1  - e2331
ER  -

Dumontier M, Gray AJG, Marshall MS, Alexiev V, Ansell P, Bader G et al. The health care and life sciences community profile for dataset descriptions. PeerJ. 2016 Aug 16;4. e2331. https://doi.org/10.7717/peerj.2331