Description of impact
Data has been an underutilised output of research projects for many years because of the challenges of finding, understanding, and reusing it. Research at Heriot-Watt Computer Science by Dr Gray contributed substantially to the definition of the FAIR Data Principles (2016) and led to a global Health Care and Life Sciences community recommendation for describing datasets for discovery and reuse. This community recommendation has been adopted by several data providers, including the RDF Platform of the European Bioinformatics Institute (EBI). The standard has also been adopted and used internally in major pharmaceutical companies, including AstraZeneca, so that datasets complying with the FAIR Data Principles can be more readily reused and exploited.
Narrative
The increasing use of computers to support researchers in gathering data, processing and analysing data, and publishing data and research results has led to a step-change in the way research is conducted. It was vitally important to pharmaceutical companies that the Open PHACTS system could provide provenance on the returned query answers, stating where the data originated (ChEMBL, DrugBank, UniProt, etc.) and which version of the dataset was used. Dr Gray developed the Open PHACTS Dataset Description to provide the needed metadata about the data consumed, including important properties such as the version and format of the ingested data. This allowed specific provenance information to be returned on the platform’s query answers and increased trust in the analysis resulting from the data.
Building on the Open PHACTS Dataset Descriptions, subsequent research in the period 2013-2015 led to the FAIR Data Principles, published in March 2016, which set out desirable criteria to enable the discovery, retrieval, understanding, and reuse of data associated with research, particularly research funded by public bodies such as UK Research & Innovation (UKRI), the European Research Council (ERC), and the National Institutes of Health (NIH).
The FAIR Data Principles built on Dr Gray’s work on dataset descriptions, particularly with respect to the definitions of principles F1, F2, F3, A1, A2, I1, I3, R1.1, R1.2, R1.3. Dr Gray collaborated in the development of the FAIR Principles and engaged in activities to publicise them and to train people to FAIRify their data. The FAIR Data Principles were endorsed in the G20 Leaders’ Communique from the Hangzhou Summit, September 2016, which stated:
“We support effort to promote voluntary knowledge diffusion and technology transfer on mutually agreed terms and conditions. Consistent with this approach, we support appropriate efforts to promote open science and facilitate appropriate access to publicly funded research results on findable, accessible, interoperable and reusable (FAIR) principles”.
The Principles have subsequently led to interest within industry and academia in further exploiting data that has previously been collected, either internally by companies or publicly by academia. This has been particularly the case within the health care and life sciences community, where pharmaceutical companies have initiated or funded initiatives to retroactively make existing datasets comply with the FAIR Data Principles so that they can be more readily reused and exploited.
The W3C HCLS Dataset Description Profile makes it possible to meet FAIR principle R1.3 and has been adopted internally in major pharmaceutical companies, including AstraZeneca, as a means to make their internal data more discoverable and reusable by a wider set of research labs across the world. AstraZeneca’s Director, Oncology Translational Medicine, Data Strategy Lead stated:
“We recognised the costs associated with continual curation and reshaping of data as new questions arise beyond the original collection intent and have found the alignment and implementation of the FAIR principles as a way to solve this challenge”, and, “we found the use of the DCAT standard and the W3C HCLC recommendations to be critical to implementing the FAIR data set management”.
The profile has been deployed within major data repositories including the European Bioinformatics Institute’s (EBI) RDF Platform, the Swiss Institute of Bioinformatics (SIB), and the Japanese RIKEN MetaDatabase portal for life sciences data. At the EBI, the profile was used to automate the data ingestion pipeline for the RDF Platform. The approach allowed the EBI to perform various quality control checks on the metadata, which improved the quality of the data and also saved time.
An independent study recognised that Biopharma Research and Development (R&D) productivity can be improved by implementing the FAIR Data Principles, and that doing so is an enabler for the digital transformation of Biopharma R&D. The study went on to highlight the impact for one company that had implemented a FAIR platform for 3,000 users across three main sites: “since running the FAIR platform for 2 months, the company collected usage activity data based on click counts per user. This FAIR platform had 900,000 page views in 60 days. The projection for the year gave an estimation of ∼5.5M page views. A very conservative assumption that each of these FAIR-enhanced views saved ∼5 s, by providing better search results with direct access to the target repository, led to a calculation of ∼3.5 full-time employees (FTEs) worth of time saved per year”.
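The arithmetic behind the study's estimate can be reproduced with a short sketch. The page views, pilot length, and seconds saved per view are the figures quoted above; the number of working hours in one FTE-year is not given in the quote, so the value of 2,080 hours (40 hours over 52 weeks) used here is an assumption, and the function names are illustrative only.

```python
# Figures quoted in the study
PAGE_VIEWS_IN_PILOT = 900_000
PILOT_DAYS = 60
SECONDS_SAVED_PER_VIEW = 5

# Assumption (not stated in the study): working hours in one FTE-year
HOURS_PER_FTE_YEAR = 2_080


def projected_annual_views(views, days, days_per_year=365):
    """Linearly scale the pilot's page views to a full year."""
    return views / days * days_per_year


def fte_saved(annual_views, seconds_per_view, hours_per_fte_year):
    """Convert the seconds saved across all views into FTE-years."""
    hours_saved = annual_views * seconds_per_view / 3600
    return hours_saved / hours_per_fte_year


annual_views = projected_annual_views(PAGE_VIEWS_IN_PILOT, PILOT_DAYS)
ftes = fte_saved(annual_views, SECONDS_SAVED_PER_VIEW, HOURS_PER_FTE_YEAR)
print(f"{annual_views:,.0f} views per year, about {ftes:.1f} FTEs saved")
```

Under these assumptions the projection comes to roughly 5.5M page views per year, and the time saved lands between 3.5 and 4 FTEs, in line with the study's figure of ∼3.5. The exact result depends on the hours assumed per FTE-year.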
In Boston on 14 May 2020, the Pistoia Alliance, a global, not-for-profit alliance that works to lower barriers to innovation in life sciences R&D, launched a freely accessible toolkit to help companies implement the FAIR (Findable, Accessible, Interoperable, Reusable) guiding principles for data management and stewardship. Collated by experts in the field, the toolkit contains numerous methods and tools, training and change management material, and use cases, allowing organizations to learn from industry successes. The Alliance recognised that, as the life sciences industry continues to digitize, the FAIR guiding principles would help organizations realise their digital transformation.
“At Roche, we know that implementing the FAIR principles can be difficult for biotech and pharma organizations of every size, so we are very pleased to lead on this project and help make the process easier,” commented the Principal Scientist at Roche. “The toolkit will help to smooth the path to greater data sharing within and between industries, which is critical to future research efforts. We see the FAIR guiding principles as a worthy goal, and one which will help the industry realize the value of technologies like deep learning.”
| Impact status | Achieved |
|---|---|
| Impact date | 1 Jan 2014 → 31 Dec 2020 |
| Category of impact | Legal, Technological |
| Impact level | European |
Keywords
- 2021
Documents & Links
- Making research data more findable, accessible, interoperable, and reusable