B3438 - Novel statistical methods for the analysis of high-dimensional epigenetic data - 10/01/2020

B number: 
B3438
Principal applicant name: 
Haeran Cho | University of Bristol (United Kingdom)
Co-applicants: 
Prof Kate Tilling, Dr Josine Min, Dr Claire Gormley, Prof Jonathan Rougier
Title of project: 
Novel statistical methods for the analysis of high-dimensional epigenetic data
Proposal summary: 

We propose to address the problem of handling large-scale genome-wide DNA methylation data. To this end, we will develop a novel technique for clustering DNA methylation (DNAm) sites, which will help reduce the complexity of the subsequent epigenome-wide association study (EWAS). For example, a DNAm site that is hypo-methylated in the smoker cohort but hyper-methylated in the non-smoker cohort merits further analysis for significant association with smoking, while sites exhibiting no difference between the two cohorts do not.
We will investigate the use of algorithms for large matrix factorisation under constraints to provide a natural clustering of DNAm sites, and study which statistical guarantees are achievable and under what conditions. To verify the suitability of the proposed method, we propose to use the DNA methylation data available from ARIES.

Impact of research: 
The findings from the proposed study will help lay a solid foundation for addressing the additional challenges brought on by epigenetic data analysis, including (i) handling continuous exposures beyond discrete variables (e.g., smoker/non-smoker), and (ii) accounting for cell heterogeneity. In particular, further research into (ii) is highly relevant since samples are measured in bulk rather than at the single-cell level, so the methylome obtained for each sample contains signals aggregated from distinct cell types. Few existing methods can identify the risk-DNAm sites for each individual cell type, missing the opportunity to obtain finer-scale results in EWAS. We plan to address the above problems based on the insights gained from the proposed research.
Date proposal received: 
Thursday, 9 January, 2020
Date proposal approved: 
Friday, 10 January, 2020
Keywords: 
Statistics/methodology, DNA sequencing, Statistical methods, Genetic epidemiology, Genome wide association study