B3438 - Novel statistical methods for the analysis of high-dimensional epigenetic data - 10/01/2020
We propose to address the problem of handling large-scale, genome-wide DNA methylation data. To this end, we will develop a novel technique for clustering DNA methylation (DNAm) sites, which will help reduce the complexity of the subsequent epigenome-wide association study (EWAS). For example, a DNAm site that is hypo-methylated in the smoker cohort but hyper-methylated in the non-smoker cohort merits further analysis for a significant association with smoking, whereas sites exhibiting no difference between the two cohorts do not.
We will investigate the use of algorithms for large-scale matrix factorisation under constraints to provide a natural clustering of DNAm sites, and study which statistical guarantees are achievable and under what conditions. To verify the suitability of the proposed method, we propose to use the DNA methylation data available from ARIES.
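
As an illustration of how a constrained matrix factorisation can induce a clustering of DNAm sites, the following is a minimal sketch only, not the proposed method. It assumes non-negativity as the constraint (non-negative matrix factorisation with standard multiplicative updates), which is one possible choice and is not specified in this proposal. The function name nmf_cluster, the toy cohort sizes and the simulated methylation levels are all hypothetical; no ARIES data are used.

```python
import numpy as np


def nmf_cluster(X, k, n_iter=500, seed=0, eps=1e-9):
    """Cluster the rows of a non-negative matrix X (sites x samples).

    Assumption: the constraint is non-negativity, i.e. X ~ W @ H with
    W, H >= 0, fitted by multiplicative updates for the squared error.
    Each site is assigned to the component with the largest loading in W.
    """
    rng = np.random.default_rng(seed)
    n_sites, n_samples = X.shape
    W = rng.random((n_sites, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative throughout.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Rescale so every row of H has unit norm; this makes the loadings
    # in W comparable across components before taking the argmax.
    norms = np.linalg.norm(H, axis=1) + eps
    H = H / norms[:, None]
    W = W * norms[None, :]
    labels = W.argmax(axis=1)
    return labels, W, H


# Hypothetical toy data: 60 sites, 10 smokers followed by 10 non-smokers.
rng = np.random.default_rng(1)
hypo_in_smokers = np.r_[np.full(10, 0.1), np.full(10, 0.9)]
hyper_in_smokers = np.r_[np.full(10, 0.9), np.full(10, 0.1)]
X = np.vstack([
    np.tile(hypo_in_smokers, (30, 1)),   # sites 0-29
    np.tile(hyper_in_smokers, (30, 1)),  # sites 30-59
])
# Add small measurement noise and keep values in the valid range [0, 1].
X = np.clip(X + rng.normal(0.0, 0.02, X.shape), 0.0, 1.0)

labels, W, H = nmf_cluster(X, k=2)
```

In this toy setting, sites sharing a methylation pattern across the two cohorts are expected to receive the same label, so a downstream association analysis could be run per cluster rather than per site. Other constraints (for example sparsity or orthogonality) would lead to different algorithms and different guarantees.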