B1493 - Utilising population-based collections from the UK to identify genetic risk factors for idiopathic scoliosis - 31/01/2013
Aims
We aim to carry out the first population-based genome-wide association studies for the presence of scoliosis using white European populations, and to investigate whether similar genetic associations are seen in different ethnic groups. In addition, by utilising resources already available from ALSPAC, we will carry out preliminary functional investigations on any identified genetic loci.
To investigate the genetic contributions to scoliosis using the genetic data already collected, we will:
1. Investigate if the SNP rs11190870 is associated with the presence of scoliosis
2. Perform GWAS in ALSPAC
3. Perform a meta-analysis of GWAS from ALSPAC and Twins-UK
4. Evaluate whether any novel SNPs identified are likely to have biological relevance, both bioinformatically and by utilising gene expression, DNA methylation and metabolomic data
5. Continue to develop a model to predict the progression of scoliosis
Methods 1: GWAS
Only common genotypic variants with a minor allele frequency greater than 5% will be studied, and only SNPs that pass an exact test of Hardy-Weinberg equilibrium will be considered for analysis. Association analysis will be performed using logistic regression based on log-additive models, adjusting for age, gender and other appropriate variables. We will assess genome-wide data for associations with scoliosis in ALSPAC alone, setting genome-wide significance at P<=5x10^-8.
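The two variant filters described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline the study will use: the function names, the genotype-count inputs and the Hardy-Weinberg p-value threshold (hwe_p_min) are assumptions made for the example, since the proposal does not state a cut-off for the exact test. The exact test here sums the probabilities of every possible heterozygote count that is no more likely than the one observed, conditional on the allele counts.

```python
from math import exp, lgamma, log


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg test p-value from genotype counts."""
    n = n_aa + n_ab + n_bb      # number of individuals
    n_a = 2 * n_aa + n_ab       # copies of allele A
    n_b = 2 * n_bb + n_ab       # copies of allele B

    def log_prob(het):
        # Log probability of `het` heterozygotes given n, n_a and n_b.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2.0)
                + lgamma(n + 1) - lgamma(hom_a + 1)
                - lgamma(het + 1) - lgamma(hom_b + 1)
                + lgamma(n_a + 1) + lgamma(n_b + 1)
                - lgamma(2 * n + 1))

    p_obs = exp(log_prob(n_ab))
    # Heterozygote counts must share the parity of the allele counts.
    possible_hets = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = (exp(log_prob(h)) for h in possible_hets)
    # Small tolerance so ties with the observed count are included.
    p_value = sum(p for p in probs if p <= p_obs * (1.0 + 1e-9))
    return min(1.0, p_value)


def passes_qc(n_aa, n_ab, n_bb, maf_min=0.05, hwe_p_min=1e-6):
    """Apply both filters; hwe_p_min is a hypothetical threshold."""
    return (minor_allele_frequency(n_aa, n_ab, n_bb) > maf_min
            and hwe_exact_p(n_aa, n_ab, n_bb) >= hwe_p_min)
```

A SNP with genotype counts (25, 50, 25) would pass both filters, whereas one with counts (990, 10, 0) would fail on minor allele frequency and one with counts (50, 0, 50) would fail the Hardy-Weinberg test.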
Methods 2: Power
The power of our GWAS depends on the minor allele frequency of SNPs, the likely size of effect and our cut-off for defining scoliosis. Given that the two previous genome-wide studies found effect sizes of 1.56 and 1.85, and that rs11190870 has an allele frequency of 0.437 in ALSPAC, we have reasonable power to detect de novo effects of a magnitude comparable to those previously published.
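How power varies with these inputs can be illustrated with a rough normal-approximation calculation for a case-control comparison of allele frequencies. This is only a sketch under stated assumptions: it treats the published effect sizes as per-allele odds ratios and 0.437 as the allele frequency in controls, neither of which the proposal states explicitly, and the case and control numbers used in the usage note are hypothetical placeholders rather than ALSPAC sample sizes.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_frequency(control_freq, odds_ratio):
    """Allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_test_power(control_freq, odds_ratio, n_cases, n_controls,
                       alpha=5e-8):
    """Approximate power of a two-sided allelic test at level alpha.

    Uses a two-proportion normal approximation on allele counts
    (two alleles per person) and ignores the negligible chance of
    rejecting in the wrong direction.
    """
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    se = sqrt(p1 * (1.0 - p1) / (2 * n_cases)
              + p0 * (1.0 - p0) / (2 * n_controls))
    z_crit = -NormalDist().inv_cdf(alpha / 2.0)
    z_effect = abs(p1 - p0) / se
    return NormalDist().cdf(z_effect - z_crit)
```

For example, calling allelic_test_power(0.437, 1.56, n_cases, n_controls) and allelic_test_power(0.437, 1.85, n_cases, n_controls) with the same hypothetical sample sizes shows how much more readily the larger of the two published effects would be detected at a threshold of 5x10^-8.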
Methods 3: Meta-analysis
We will perform a meta-analysis of the ALSPAC and Twins-UK GWAS. Prior to meta-analysis, poorly imputed SNPs and those with an allele frequency less than 5% will be excluded. Inverse variance fixed-effects meta-analyses will be undertaken using METAL. Genomic control corrections will be applied before reporting SNPs which reach genome-wide significance (P<1x10^-5) for further investigation.
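The arithmetic behind an inverse variance fixed-effects meta-analysis with genomic control can be sketched as below. This is a plain Python illustration of the calculation for a single SNP, not METAL's interface or its exact implementation; the function names are assumptions made for the example, and inflating standard errors by the square root of lambda (only when lambda exceeds 1) is one common way of applying the correction.

```python
from math import sqrt
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(z_scores):
    """Inflation factor: median observed chi-square over its expectation."""
    return median([z * z for z in z_scores]) / CHI2_1DF_MEDIAN


def apply_genomic_control(se, lam):
    """Inflate a standard error by sqrt(lambda); no change if lambda <= 1."""
    return se * sqrt(max(lam, 1.0))


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted pooled effect, standard error and p-value.

    betas: per-study effect estimates (e.g. log odds ratios) for one SNP.
    ses:   the matching standard errors, after any genomic control.
    """
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = sqrt(1.0 / total_weight)
    z = beta / se
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return beta, se, p_value
```

With two studies reporting effects of 0.4 and 0.6 and equal standard errors of 0.1, the pooled effect is the simple average, 0.5, because equal standard errors give equal weights; the pooled standard error shrinks to 0.1 divided by the square root of 2.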
Methods 4: Replication
We will select signal loci (+/-500mb) with the greatest evidence for association with the risk of scoliosis for replication within data from Japanese and Hong Kong case-control studies that have GWAS data already available. Population linkage disequilibrium (LD) will be taken into account and, where possible, used to aid fine mapping. The Chinese Hong Kong disease cohort will then be genotyped for our top 10 SNP hits. Associations between SNPs and scoliosis will then be analysed as described above. We will then repeat the meta-analysis based on all five cohorts.
Methods 5: Bioinformatics
We will use existing knowledge to analyse each identified SNP in order to establish its status as a potential functional variant for scoliosis. A bioinformatic approach will be taken to help identify which gene the signal is in, and its likely functional networks. The potential impact of coding variation will be assessed with a series of predictive approaches including SIFT and PolyPhen, and we will also follow up non-coding functional elements recently annotated by the ENCODE consortium. This will generate prioritised lists of variants for further functional examination in future research projects.
Methods 6: Integrated use of available expression, methylation and metabolomic data
We will examine the impact of our identified SNPs on patterns of protein expression in cell lines and on DNA methylation, and through examination of detailed banks of metabolomic data. Along with a sub-set of ALSPAC and Twins-UK samples with transformed multi-tissue specific expression data, the BBSRC-funded ARIES study provides an opportunity to examine the likely effect of our identified SNPs on methylation, as a potential mechanism of gene-environment interaction. This will allow us to extend our primary goal and to begin to unpick the contribution of genetic loci to scoliosis risk in a unique study environment.
Exposure variables
Common genotypic variants genotyped using the Illumina HumanHap550 platform.
Outcome variable
Scoliosis (yes/no as a binary variable), identified by the DSM at ages 9 and 15, will be defined as a curve >=10 degrees, as this has substantial repeatability (Kappa of 0.74). Sensitivity analyses will be carried out using a lower cut-off of >=6 degrees to define scoliosis, as the DSM underestimates curve size by approximately 40%, although this lower cut-off has only moderate repeatability (Kappa of 0.56). In addition, we will explore the use of angle size as a continuous variable.
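The relationship between the two cut-offs can be made concrete with a short sketch. It assumes one reading of the rationale above, namely that a curve with a true size of 10 degrees would be measured by the DSM as roughly 10 x (1 - 0.40) = 6 degrees; the function and constant names are illustrative only.

```python
PRIMARY_CUTOFF_DEGREES = 10.0      # main definition of scoliosis
SENSITIVITY_CUTOFF_DEGREES = 6.0   # lower cut-off for sensitivity analyses


def expected_dsm_reading(true_curve_degrees, underestimate_fraction=0.40):
    """DSM reading expected if curve size is underestimated by ~40%."""
    return true_curve_degrees * (1.0 - underestimate_fraction)


def has_scoliosis(dsm_curve_degrees, cutoff_degrees=PRIMARY_CUTOFF_DEGREES):
    """Binary outcome: True when the measured curve meets the cut-off."""
    return dsm_curve_degrees >= cutoff_degrees
```

Under this sketch, a measured curve of 8 degrees is not classed as scoliosis in the main analysis, but is classed as scoliosis in the sensitivity analysis when has_scoliosis is called with SENSITIVITY_CUTOFF_DEGREES.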
Confounding and other variables
Age and gender
Gene expression data
DNA methylation data
Metabolomic data