B3020 - Investigating the causal role of CpG methylation in complex traits - 15/12/2017

B number: 
B3020
Principal applicant name: 
Heather Cordell | Newcastle University (UK)
Co-applicants: 
Mr James Fryett, Dr Richard Howey
Title of project: 
Investigating the causal role of CpG methylation in complex traits
Proposal summary: 

Genome-wide association studies (GWAS) have successfully identified regions of the genome associated with disease (1). However, a better understanding of the biological pathways underlying these associations is needed to identify the causal variants and genes for disease and to aid the development of disease therapies (2). To this end, transcriptome-wide association studies (TWAS) have been proposed and implemented in the PrediXcan (3), MetaXcan (4) and FUSION (5) methods. This approach uses genotype data to predict gene expression values, then tests these predicted values for association with phenotypes, identifying potentially causal genes whose expression may be involved in the phenotype of interest. It has improved knowledge of the role of gene expression in a range of diseases (6-8).
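The two-stage logic of TWAS described above can be written out as a short sketch. The notation here (genotypes x, weights w, predicted expression E-hat, phenotype y) is introduced for illustration only and is not taken from the cited methods, which differ in how the weights are estimated and in the form of the association test.

```latex
% Stage 1: for individual i, predict the expression of one gene from
% K SNP genotypes x_{ik}, using weights \hat{w}_k estimated in a reference
% panel in which both genotype and expression were measured.
\hat{E}_{i} = \sum_{k=1}^{K} \hat{w}_{k} \, x_{ik}

% Stage 2: test the predicted expression for association with the
% phenotype y_i (a linear model is assumed here for a quantitative trait).
y_{i} = \alpha + \beta \, \hat{E}_{i} + \varepsilon_{i},
\qquad H_{0} : \beta = 0
```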

DNA methylation at CpG sites across the genome is known to be important in disease, and epigenome-wide association studies (EWAS) of CpG methylation have become a common tool for identifying CpG probes related to disease (9). Using a methodology similar to TWAS, we propose to investigate prediction of genome-wide CpG methylation status using genotype and methylation data from ALSPAC. Models that predict methylation status from genotype will be built using the genotype and methylation data collected as part of the ARIES project. These prediction models will then be applied to the remaining samples in ALSPAC to impute methylation values, and the imputed values will be tested for association with a range of phenotypes to identify potentially causal methylation probes that may be involved in disease.
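The build, impute and test steps described above can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the project's pipeline: ridge regression is used as a simple stand-in for whichever penalised model is eventually chosen, all data are simulated, and every name (`fit_ridge`, `predict`, `association`, the sample sizes and effect sizes) is hypothetical.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ridge(genotypes, methylation, penalty=1.0):
    """Fit methylation ~ genotypes by ridge regression; return (intercept, weights)."""
    p = len(genotypes[0])
    snp_means = [mean([row[j] for row in genotypes]) for j in range(p)]
    y_mean = mean(methylation)
    xc = [[row[j] - snp_means[j] for j in range(p)] for row in genotypes]
    yc = [y - y_mean for y in methylation]
    # Normal equations with the ridge penalty added to the diagonal.
    xtx = [[sum(r[j] * r[k] for r in xc) + (penalty if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * y for r, y in zip(xc, yc)) for j in range(p)]
    weights = solve_linear(xtx, xty)
    intercept = y_mean - sum(w * s for w, s in zip(weights, snp_means))
    return intercept, weights

def predict(model, genotypes):
    """Impute methylation from genotypes using a fitted (intercept, weights) model."""
    intercept, weights = model
    return [intercept + sum(w * g for w, g in zip(weights, row)) for row in genotypes]

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def association(x, y):
    """Simple linear regression of y on x; return (slope, t statistic)."""
    n, mx, my = len(x), mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    rss = sum((b - my - beta * (a - mx)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, beta / se

# Simulated toy data (assumption): 5 SNPs near one CpG probe, 3 with true effects.
rng = random.Random(2017)
true_weights = [0.5, -0.3, 0.0, 0.2, 0.0]

def simulate(n):
    geno = [[sum(rng.random() < 0.3 for _ in range(2)) for _ in true_weights]
            for _ in range(n)]
    meth = [0.5 + sum(w * g for w, g in zip(true_weights, row)) + rng.gauss(0, 0.3)
            for row in geno]
    pheno = [m + rng.gauss(0, 1.0) for m in meth]
    return geno, meth, pheno

# Reference set: genotype and methylation both measured.
ref_geno, ref_meth, _ = simulate(200)
# Target set: methylation is treated as unmeasured and kept only to check accuracy.
tgt_geno, tgt_meth, tgt_pheno = simulate(800)

model = fit_ridge(ref_geno, ref_meth)
imputed = predict(model, tgt_geno)
r_imputed = pearson(imputed, tgt_meth)
beta, t_stat = association(imputed, tgt_pheno)
print(f"imputation r = {r_imputed:.2f}, beta = {beta:.2f}, t = {t_stat:.1f}")
```

In a real analysis the penalty would be tuned by cross-validation and one model would be fitted per CpG probe; the sketch fixes the penalty and uses a single probe to keep the three steps visible.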

Detection of an association between two phenotypes does not prove causality. Causal inference and causal modelling methods can be used to identify the causal biological mechanisms underlying genetic associations, and to determine the causal effects of biological intermediates such as DNA methylation and gene expression on phenotypes of interest. Of particular interest are structural equation modelling (SEM) and Bayesian networks, which have been used successfully to identify the roles of intermediates (including DNA methylation and gene expression) in a number of traits (10-12). These methods will be applied to genomic loci of interest to identify causal pathways for disease, which represent promising targets for the development of disease therapies.
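The reasoning behind such methods can be illustrated with a simple conditional-independence check. The sketch below is not SEM or a Bayesian network fit; it is a much simpler stand-in, using partial correlations on data simulated under an assumed genotype -> methylation -> phenotype model. All variable names and effect sizes are hypothetical.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after adjusting both for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Simulate the mediation model G -> M -> Y (assumed for illustration).
rng = random.Random(3020)
n = 2000
g = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]  # genotype 0/1/2
m = [0.8 * gi + rng.gauss(0, 1) for gi in g]                       # methylation
y = [0.7 * mi + rng.gauss(0, 1) for mi in m]                       # phenotype

r_gy = pearson(g, y)                  # marginal genotype-phenotype association
r_gy_given_m = partial_corr(g, y, m)  # should vanish if M fully mediates G -> Y
r_gm_given_y = partial_corr(g, m, y)  # should persist under G -> M -> Y

print(f"r(G,Y) = {r_gy:.3f}")
print(f"r(G,Y | M) = {r_gy_given_m:.3f}")
print(f"r(G,M | Y) = {r_gm_given_y:.3f}")
```

Under the simulated model, genotype and phenotype are associated marginally but become approximately independent once methylation is adjusted for, whereas the genotype-methylation association survives adjustment for phenotype. SEM and Bayesian network approaches formalise this kind of comparison across competing causal structures.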

References
1. MacArthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E, et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 2017;45(D1):D896-D901.
2. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am J Hum Genet. 2017;101(1):5-22.
3. Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV, Aquino-Michaels K, Carroll RJ, et al. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet. 2015;47(9):1091-8.
4. Barbeira AN, Dickinson SP, Torres JM, Bonazzola R, Zheng J, Torstenson ES, et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. bioRxiv. 2017.
5. Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BW, et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet. 2016;48(3):245-52.
6. Jin Y, Andersen G, Yorgov D, Ferrara TM, Ben S, Brownson KM, et al. Genome-wide association studies of autoimmune vitiligo identify 23 new risk loci and highlight key pathways and regulatory variants. Nat Genet. 2016;48(11):1418-24.
7. Kiryluk K, Li Y, Moldoveanu Z, Suzuki H, Reily C, Hou P, et al. GWAS for serum galactose-deficient IgA1 implicates critical genes of the O-glycosylation pathway. PLoS Genet. 2017;13(2):e1006609.
8. Mancuso N, Shi H, Goddard P, Kichaev G, Gusev A, Pasaniuc B. Integrating Gene Expression with Summary Association Statistics to Identify Genes Associated with 30 Complex Traits. Am J Hum Genet. 2017;100(3):473-87.
9. Rakyan VK, Down TA, Balding DJ, Beck S. Epigenome-wide association studies for common human diseases. Nat Rev Genet. 2011;12(8):529-41.
10. Ainsworth HF, Shin SY, Cordell HJ. A comparison of methods for inferring causal relationships between genotype and phenotype using additional biological measurements. Genet Epidemiol. 2017;41(7):577-86.
11. Shin SY, Petersen AK, Wahl S, Zhai G, Romisch-Margl W, Small KS, et al. Interrogating causal pathways linking genetic variants, small molecule metabolites, and circulating lipids. Genome Med. 2014;6(3):25.
12. Zhu J, Sova P, Xu Q, Dombek KM, Xu EY, Vu H, et al. Stitching together multiple data dimensions reveals interacting metabolomic and transcriptomic networks that modulate cell regulation. PLoS Biol. 2012;10(4):e1001301.
13. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88(1):76-82.
14. Gaunt TR, Shihab HA, Hemani G, Min JL, Woodward G, Lyttleton O, et al. Systematic identification of genetic influences on methylation across the human life course. Genome Biol. 2016;17:61.
15. Quon G, Lippert C, Heckerman D, Listgarten J. Patterns of methylation heritability in a genome-wide analysis of four brain regions. Nucleic Acids Res. 2013;41(4):2095-104.
16. Tibshirani R. Regression Shrinkage and Selection via the Lasso. J R Stat Soc Series B Methodol. 1996;58:267-88.
17. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Series B Stat Methodol. 2005;67(2):301-20.
18. Hoerl AE, Kennard RW. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics. 2000;42(1):80-6.
19. Ng B, White CC, Klein HU, Sieberts SK, McCabe C, Patrick E, et al. An xQTL map integrates the genetic architecture of the human brain's transcriptome and epigenome. Nat Neurosci. 2017;20(10):1418-26.

Date proposal received: 
Wednesday, 13 December, 2017
Date proposal approved: 
Thursday, 14 December, 2017
Keywords: 
Genetics, Allergy, Hypertension, Obesity, Respiratory - asthma, Computer simulations/modelling/algorithms, Epigenetics, Gene expression, GWAS, Metabolomics, RNA, Statistical methods, Biomarkers - e.g. cotinine, fatty acids, haemoglobin, etc., Blood pressure, BMI, Cardiovascular, Genetics - e.g. epigenetics, mendelian randomisation, UK10K, sequencing, etc., Metabolic - metabolism, Methods - e.g. cross cohort analysis, data mining, mendelian randomisation, etc.