B3297 - Methods to identifying genetically regulated components using transcriptome data - 09/05/2019

B number: 
B3297
Principal applicant name: 
Jin Liu | Duke-NUS Medical School, National University of Singapore (Singapore)
Co-applicants: 
Title of project: 
Methods to identifying genetically regulated components using transcriptome data
Proposal summary: 

Although genome-wide association studies (GWAS) have been very successful in identifying genetic variants associated with complex traits, the mechanistic links between these variants and complex traits remain largely unknown. One scientific hypothesis is that genetic variants influence complex traits by affecting cellular traits, e.g., by regulating gene expression and altering protein abundance. Yet how to examine this hypothesis systematically remains an open problem.
In recent years, large genomic consortia have been generating vast volumes of genomic data to comprehensively characterize the regulatory role of genetic variants. For example, the Genotype-Tissue Expression (GTEx) Project has collected genomic profiles of 449 individuals, comprising measurements of gene expression from multiple tissues and about 12.5 million DNA bases, providing unprecedented opportunities to dissect genetic contributions to complex traits. Statistical methods that effectively harness multilayer data (genetic variants, cellular traits and organismal traits) for mechanistic interpretation are in high demand. Because genetic variants individually have weak effects on complex traits, statistical modeling is very challenging: the models must exploit information efficiently in low signal-to-noise ratio (SNR) regimes.
In the proposed research, we aim to develop statistical methods that advance mechanistic understanding of the role of associated variants in complex traits. The key idea builds on emerging scientific evidence that genetic effects at the cellular level are much stronger than those at the organismal level. In our pilot study, we proposed a statistical approach, AUDIS, that accounts for uncertainty when dissecting genetic contributions to complex traits by leveraging regulatory information. We also developed a parameter-expanded expectation-maximization (PX-EM) algorithm to ensure that AUDIS performs stably in low SNR regimes. Interestingly, two popular methods in this field, SKAT and PrediXcan, can be connected via a stagewise view of AUDIS, as supported by results from comprehensive simulation studies. We then applied stagewise AUDIS to analyze 20 complex traits in GERA by incorporating the transcriptome data from the Genetic European Variation in Health and Disease (GEUVADIS) Project. The real data analysis shows that AUDIS can identify more genetically regulated genes that are significantly associated with complex traits, without inflated type I error. To continue in this promising direction, we propose generalized AUDIS models that allow detection of trait-associated expression quantitative trait loci (eQTLs), as well as integrative analysis of GWAS data with transcriptome data from multiple tissues. We also propose to investigate theoretical properties of our methods, e.g., why AUDIS can be immune to model mis-specification.
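The general two-stage idea behind transcriptome-based association methods of the PrediXcan type can be sketched as follows. This is a minimal illustration on simulated data, not the AUDIS model or the PrediXcan software: the ridge penalty, the function names (`ridge_weights`, `two_stage_association`) and all simulation settings are assumptions made for the example. Stage 1 learns SNP-to-expression weights in a reference panel; stage 2 imputes expression in the GWAS sample and tests its association with the trait using a point estimate of the weights only.

```python
import numpy as np

def ridge_weights(G, y, lam):
    """Ridge estimate of SNP-to-expression weights on centered data."""
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    p = Gc.shape[1]
    return np.linalg.solve(Gc.T @ Gc + lam * np.eye(p), Gc.T @ yc)

def two_stage_association(G_ref, expr_ref, G_gwas, trait, lam=1.0):
    """Stage 1: learn weights in the reference panel.
    Stage 2: impute expression in the GWAS sample and regress the trait on it."""
    w = ridge_weights(G_ref, expr_ref, lam)      # point estimate only
    imputed = G_gwas @ w                         # genetically predicted expression
    x = imputed - imputed.mean()
    y = trait - trait.mean()
    beta = (x @ y) / (x @ x)                     # simple linear regression slope
    resid = y - beta * x
    se = np.sqrt((resid @ resid) / (len(y) - 2) / (x @ x))
    return beta, beta / se                       # effect estimate and z-score

# Simulated example (all settings are hypothetical).
rng = np.random.default_rng(0)
n_ref, n_gwas, p = 300, 2000, 20
G_ref = rng.binomial(2, 0.3, size=(n_ref, p)).astype(float)
G_gwas = rng.binomial(2, 0.3, size=(n_gwas, p)).astype(float)
w_true = rng.normal(0.0, 0.3, size=p)
expr_ref = G_ref @ w_true + rng.normal(size=n_ref)
trait = 0.2 * (G_gwas @ w_true) + rng.normal(size=n_gwas)

beta, z = two_stage_association(G_ref, expr_ref, G_gwas, trait)
print(f"effect estimate {beta:.3f}, z-score {z:.2f}")
```

Because stage 2 treats the imputed expression as if it were observed, the uncertainty in the estimated weights is ignored; that is the gap that accounting for uncertainty is meant to address.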
The novelty of this research is a statistically efficient and computationally feasible framework for dissecting genetic contributions to complex traits. Specifically, the proposed methods (AUDIS and its generalizations) can effectively borrow regulatory information at the cellular level and thereby boost the statistical power to identify genetically regulated, trait-associated genes, even in low SNR regimes. The stagewise view of AUDIS not only offers a statistical connection between existing approaches, but also highlights the importance of statistically rigorous design. Unlike conventional engineering approaches, which often consider only a point estimate or a maximum a posteriori estimate, the stagewise view of AUDIS implies that the posterior distribution obtained in the former stage should naturally be incorporated as the prior in the later stage. This gives rise to a modularized design of the whole data analytics system, which greatly helps biomedical researchers gain new scientific insights. The statistical and computational techniques developed here are also broadly useful in many other data science applications.
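The contrast between a point estimate and a full posterior can be illustrated with a conjugate Bayesian linear regression. This is a sketch under stated assumptions, not the AUDIS model: the Gaussian prior on the weights, the known noise variance, and the function names (`weight_posterior`, `imputed_expression`) are all hypothetical choices for the example. The posterior over SNP-to-expression weights yields both a mean and a variance for each imputed expression value; a point-estimate pipeline keeps only the mean.

```python
import numpy as np

def weight_posterior(G, y, prior_var, noise_var):
    """Posterior N(mean, cov) of weights under w ~ N(0, prior_var * I)
    and y = G w + e with e ~ N(0, noise_var * I)."""
    p = G.shape[1]
    precision = G.T @ G / noise_var + np.eye(p) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ (G.T @ y) / noise_var
    return mean, cov

def imputed_expression(G_new, mean, cov):
    """Posterior mean and variance of G_new @ w for each new individual."""
    pred_mean = G_new @ mean
    pred_var = np.einsum("ij,jk,ik->i", G_new, cov, G_new)  # g' cov g per row
    return pred_mean, pred_var

# Simulated example (all settings are hypothetical).
rng = np.random.default_rng(1)
p = 10
G_big = rng.binomial(2, 0.3, size=(500, p)).astype(float)
w_true = rng.normal(0.0, 0.3, size=p)
y_big = G_big @ w_true + rng.normal(size=500)
G_new = rng.binomial(2, 0.3, size=(5, p)).astype(float)

# A small reference panel (first 50 individuals) versus the full panel.
m_small, S_small = weight_posterior(G_big[:50], y_big[:50], prior_var=1.0, noise_var=1.0)
m_big, S_big = weight_posterior(G_big, y_big, prior_var=1.0, noise_var=1.0)

_, var_small = imputed_expression(G_new, m_small, S_small)
_, var_big = imputed_expression(G_new, m_big, S_big)
print("imputation variance, small panel:", np.round(var_small, 4))
print("imputation variance, full panel: ", np.round(var_big, 4))
```

The imputation variance shrinks as the reference panel grows, so with small transcriptome panels the uncertainty discarded by a point estimate can be substantial. Passing the posterior forward as the prior for the next stage keeps that information.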

Impact of research: 
The impact of our research is twofold. First, we will develop statistically rigorous methods for integrative analysis of multi-platform genetic/genomic data. Second, the results of our analysis will help researchers better understand the mechanistic link by which genetic variants influence complex traits through cellular traits.
Date proposal received: 
Tuesday, 16 April, 2019
Date proposal approved: 
Tuesday, 30 April, 2019
Keywords: 
Bioinformatics, Bone disorders - arthritis, osteoporosis, Developmental disorders - autism, Cancer, Diabetes, Eating disorders - anorexia, bulimia, Eczema, Hypertension, Mental health, Obesity, Computer simulations/modelling/algorithms, GWAS, Medical imaging, Statistical methods