B3304 - HDR Data Discovery Preparation for ATLAS - 02/05/2019

B number: 
B3304
Principal applicant name: 
Philip Quinlan | University of Nottingham (UK)
Co-applicants: 
Matt Styles, Tom Giles
Title of project: 
HDR Data Discovery: Preparation for ATLAS
Proposal summary: 

Medical research in the UK is still hindered by a perceived lack of suitable data and sample resources that can be used by the research community. This perception contrasts starkly with the knowledge that there are hundreds of potential resources that can supply data and samples. Work by the Tissue Directory and Coordination Centre, backed by the Medicines Discovery Catapult, has shown that the actual problem experienced by both academic and commercial researchers is the inability to find suitable resources. The consequence is that researchers tend to use resources that are physically close (because they can go and talk to the resource directly) rather than those closest to their research goals. This approach persists because there is no easy mechanism by which to discover other potential national resources.
The Tissue Directory and Coordination Centre (TDCC) is the UK's national centre tasked with coordinating biobanks (samples and data) to ensure that researchers re-use existing resources before seeking to collect new samples or datasets. TDCC is a joint endeavour between the University of Nottingham and University College London. TDCC has created a Tissue Directory to allow a high-level search on aggregated metadata that works well for clinical and disease-orientated biobanks. In part the Tissue Directory has begun to solve the issue of findability, but the challenge has evolved. Researchers can search on very high-level classifications of disease, gender and age, and resources matching those criteria are displayed. This Directory acts as a first filter but does not allow researchers to ask detailed feasibility questions. Although candidate resources are listed, the confidence that the required data or samples actually exist remains low once sub-typing is considered. Assuming the identified resources have a search portal (many do not), the researcher would have to create an account with each one in order to perform a more detailed search. This effort often goes unrewarded, as the ultimate answer is frequently that, once more detailed criteria are added, the resource does not have the data or samples required.

It is clear that, in order to support and utilise existing UK investment in health care resources, we must have a more efficient mechanism by which a researcher can ask a relatively detailed question and understand whether resources exist that could support them. It is especially important that researchers find out quickly if something is not feasible (rather than investing time only to find that it is not) and, equally, that referrals from the search system to the resources (such as ALSPAC) do not create irrelevant traffic. By bringing together key academic, technology and research infrastructure stakeholders, we are seeking to make the necessary change.

Impact of research: 
Date proposal received: 
Monday, 29 April, 2019
Date proposal approved: 
Tuesday, 30 April, 2019
Keywords: 
Data Discovery at scale