Creating and Exploiting the Intrinsically Disordered Protein Knowledge Graph (IDP-KG)

Alasdair J. G. Gray, Petros Papadopoulos, Imran Asif, Ivan Mičetić, András Hatos

Research output: Contribution to journal › Conference article › peer-review

Abstract

There are many data sources containing overlapping information about Intrinsically Disordered Proteins (IDP). IDPcentral aims to be a registry to aid the discovery of data about proteins known to be intrinsically disordered by aggregating the content from these sources. Traditional ETL approaches for populating IDPcentral require the API and data model of each source to be wrapped and then transformed into a common model. In this paper, we investigate using Bioschemas markup as a mechanism to populate the IDPcentral registry by constructing the Intrinsically Disordered Protein Knowledge Graph (IDP-KG). Bioschemas markup is a machine-readable, lightweight representation of the content of each page in the site that is embedded in the HTML. For any site, it is accessible through an HTTP request. We harvest the Bioschemas markup in three IDP sources and show that the resulting IDP-KG has the same breadth of proteins available as the original sources, and can be used to gain deeper insight into their content by querying them as a single, consolidated knowledge graph.
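To illustrate what "markup embedded in the HTML" can look like in practice, the following is a minimal sketch, not the authors' harvesting pipeline. It assumes the markup is serialised as JSON-LD inside `<script type="application/ld+json">` elements (RDFa or microdata would need a different parser). The names `JsonLdExtractor`, `extract_jsonld`, and `SAMPLE_PAGE`, the example.org identifier, and the sample record are all hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        # Only JSON-LD script elements are of interest; other scripts are ignored.
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            text = "".join(self._buffer).strip()
            try:
                self.documents.append(json.loads(text))
            except json.JSONDecodeError:
                # Skip empty or malformed blocks rather than abort the harvest.
                pass


def extract_jsonld(html):
    """Return every JSON-LD document embedded in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Hypothetical page: the identifier and values are made up for illustration.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org",
 "@type": "Protein",
 "@id": "https://example.org/protein/P00001",
 "name": "Example protein"}
</script>
</head><body>...</body></html>
"""

if __name__ == "__main__":
    for doc in extract_jsonld(SAMPLE_PAGE):
        print(doc["@type"], doc["@id"])
```

For a live page, the HTML string could be obtained with `urllib.request.urlopen(url).read().decode("utf-8")` and passed to `extract_jsonld`. Pages that inject their markup with JavaScript would need a rendering step first, which this sketch does not cover.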
Original language: English
Pages (from-to): 1-10
Number of pages: 10
Journal: CEUR Workshop Proceedings
Volume: 3127
Publication status: Published - 21 Apr 2022
Event: 13th International Semantic Web Applications and Tools for Health Care and Life Sciences Conference 2022 - Leiden, Netherlands
Duration: 10 Jan 2022 - 13 Jan 2022
http://www.swat4ls.org/

Keywords

  • Bioschemas
  • Findable
  • Intrinsically Disordered Proteins
  • Knowledge Graphs
  • Schema.org

ASJC Scopus subject areas

  • General Computer Science
