B3634 - Curating UK COVID-19 diagnostics data to catalyse research and innovation - 19/10/2020

B number: 
B3634
Principal applicant name: 
Phil Quinlan | Computer Science, University of Nottingham (UK)
Co-applicants: 
Prof Nicholas Timpson
Title of project: 
Curating UK COVID-19 diagnostics data to catalyse research and innovation
Proposal summary: 

The UK has rich, globally important COVID-19 datasets, including large serology cohort studies funded by UKRI, Wellcome, DHSC/NHS, NIHR and the devolved administrations. However, this breadth of data creates a risk of fragmentation, inconsistent structure and access processes, severely limiting utility, timeliness and impact.
Our vision is to transform UK COVID-19 diagnostic datasets to be Findable, Accessible, Interoperable and Reusable (FAIR) and couple this with expert data engineering, enabled by Health Data Research (HDR) UK, to catalyse responsible and trustworthy use of the data for research and innovation.
We propose to support PIs and data custodians to link COVID-19 cohort, serology and other health and non-health datasets. This longitudinal linkage is vital to derive new scientific insights and deliver informed decisions about how best to control the spread of SARS-CoV-2. At present there are >30 independent studies with no streamlined approach to linkage with other health and non-health related datasets, a lack of data standardisation, and no strategic approach to synthesising analyses across studies.
SAGE (9th June) asked HDR to work with partners to develop a UK-wide serology and testing data research asset that is linkable to other data sources.
This proposal has been prepared in response to this request. We have brought together 41 leaders from 29 different organisations and 44 data sources to address a major data engineering challenge by building upon existing UKRI investments, including the HDR BREATHE Hub, to create a ‘one-stop’ service for trustworthy, multi-stakeholder utilisation of curated COVID-19 data for public, private and third sector benefit.

Impact of research: 
Importance: The UK has >30 relevant research-funded datasets (Table 1). Currently, these are held by multiple data controllers with different governance models. A key ambition is the standardised capture of granular data within NHS laboratory systems, permitting uniform analysis that adds value. Common data asset structures are vital if we are to deliver maximum research and innovation potency.
Deliverables: We will deliver four new capabilities: (i) a platform for discoverability and feasibility analyses to understand if the required data and/or populations can answer research questions; (ii) the ability for researchers to perform meta-analysis and exploratory analysis UK-wide; (iii) the ability to link COVID-19 UK cohorts to multi-dimensional health and non-health related datasets UK-wide; (iv) development of the robust UK health data infrastructure to enable long-term impact beyond COVID-19.
Expertise: We have brought together the PIs of research cohorts, NHS system leaders in diagnostic testing, the expertise and infrastructure of the HDR BREATHE Hub (Director Sheikh) and the HDR Gateway, the scalable global software of BC Platforms, the UKCRC Tissue Directory (200 tissue banks) and the four NHS national trusted research environments (TREs).
Resources: We will develop the ability to discover, request access to and analyse data from 44 data sources, drawing on: (i) improved access to new serology cohorts, including SIREN; (ii) existing cohorts that have been augmented with serology and other COVID-19 related data; and (iii) NHS serology resources, to drive new insights into COVID-19. The HDR BREATHE Hub already has 17 COVID-19 related research cohorts registered in the HDR Innovation Gateway.
Date proposal received: 
Friday, 9 October, 2020
Date proposal approved: 
Monday, 12 October, 2020
Keywords: 
Health Services Research/Health Systems Research, Infection, Data management, COVID19