B1186 - Genome-wide copy number variation association studies of blood pressure and lung function - 16/06/2011

B number: 
B1186
Principal applicant name: 
Dr Louise Wain (University of Leicester, UK)
Co-applicants: 
Prof Martin Tobin, Nick Timpson (University of Bristol, UK), Dr Dave Evans, Prof John Henderson
Title of project: 
Genome-wide copy number variation association studies of blood pressure and lung function
Proposal summary: 

Background

The extent of copy number variation (CNV) across the genome is still debated, although genome-wide maps of common copy number variation have now begun to emerge. Methods for detecting and characterising both common and rare copy number variants are still far from perfect, and no single approach is optimal for detecting CNVs of all sizes. Consequently, much remains to be learned about the extent and impact of this form of variation on human health and disease. Early studies of the role of common copy number variation in disease suggest that much of this variation may already be well tagged by SNPs on the most recent genotyping platforms. These include the Wellcome Trust Case Control Consortium (WTCCC) CNV project, in which we were involved. However, rare and low-frequency copy number variants, which are not expected to be tagged by common SNPs, have been shown to play a role in diseases such as autism and schizophrenia. The contribution of copy number variants to quantitative traits such as blood pressure and lung function is still largely unexplored. Algorithms have been developed that use the raw allelic intensities from contiguous SNPs on SNP genotyping arrays to detect and quantify copy number variation. These detection approaches have raised substantial challenges. Even so, the most recent genome-wide SNP arrays allow improved detection of copy number because they include many more probes. We have recent experience and growing expertise in CNV analyses (1,2,3,4), and other studies (listed in the following paragraph) have approved collaborative CNV analyses that we will undertake. Building on both, we propose collaborative CNV studies of blood pressure and lung function together with ALSPAC investigators. These studies will help to gain the maximum benefit from the newly available genome-wide association study data in ALSPAC. They will also build upon the SNP association analyses that ALSPAC investigators plan to undertake.

Identifying the genetic determinants of lung function and blood pressure may reveal potentially modifiable intermediate phenotypes that could be targets for public health interventions. Analyses of CNV association with blood pressure and lung function will be undertaken to further our understanding of the genetic architecture of these traits. SNP association studies have demonstrated that very large sample sizes are necessary to detect variants with small effect sizes. The inclusion of 9000 ALSPAC individuals will substantially boost the power of our planned study. The study currently includes adults from the Busselton Health Study (n ~1500, with ~3600 expected in 2011) and the British 1958 Birth Cohort (n ~8000). It also includes children from the Raine Study (n ~1500) and children and young adults from the Cardiovascular Risk in Young Finns Study (n ~2450).
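The following minimal sketch illustrates why adding ALSPAC matters for power. It is not a pre-specified power calculation for this study. It pools the current sample sizes quoted above, assuming a single pooled analysis rather than a meta-analysis across cohorts. It then applies the standard normal approximation for a 1-degree-of-freedom linear-regression test with non-centrality N·R²/(1 − R²). The effect size (a CNV explaining 0.1% of trait variance) and the significance threshold (α = 5 × 10⁻⁸) are illustrative assumptions, and the function name `approx_power` is hypothetical.

```python
from statistics import NormalDist

# Current approximate sample sizes quoted in the proposal.
EXISTING = {
    "Busselton Health Study": 1500,
    "British 1958 Birth Cohort": 8000,
    "Raine Study": 1500,
    "Cardiovascular Risk in Young Finns Study": 2450,
}
ALSPAC = 9000


def approx_power(n, r2, alpha):
    """Approximate power of a 1-df regression test for a variant explaining a
    fraction r2 of trait variance (normal approximation; the opposite tail,
    which contributes negligibly, is ignored)."""
    ncp = n * r2 / (1.0 - r2)                       # non-centrality parameter
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    return NormalDist().cdf(ncp ** 0.5 - z_crit)


if __name__ == "__main__":
    r2 = 0.001     # ASSUMED effect size: 0.1% of variance explained
    alpha = 5e-8   # ASSUMED genome-wide significance threshold
    n_existing = sum(EXISTING.values())
    for label, n in [("without ALSPAC", n_existing),
                     ("with ALSPAC", n_existing + ALSPAC)]:
        print(f"{label}: N={n}, power={approx_power(n, r2, alpha):.2f}")
```

Because power depends on N through √(N·R²), a fixed-size effect that is barely detectable in the existing cohorts becomes considerably more detectable once the ALSPAC samples are added.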

In a subset of individuals, robust findings from these analyses may be validated, either by experimental assay or by direct genotyping of tag SNPs where available. They may then be followed up with breakpoint sequencing to fine-map the precise genomic location of the CNV. If it appears appropriate and scientifically beneficial to include ALSPAC participants at this stage, this will be the subject of a separate application.

Approaches

Louise Wain will work closely with Dave Evans or Nicholas Timpson to undertake the analyses described below. In the first instance, the analyses would need to be run by Louise Wain, but the project also aims to develop this capacity at the University of Bristol. The analyses could be run on the high-performance computing cluster at the University of Bristol if, for example, Louise Wain were offered visitor status. Alternatively, they could use a similar new £2m facility in Leicester, as ALSPAC prefers. A further option would be to provide secure remote login access to Bristol servers. Whatever the preferred mechanism, we propose that Louise Wain visits Bristol for regular meetings with a named ALSPAC investigator. These meetings will be supplemented by project progress reports. The reports will be shared and discussed as required with all named co-applicants from the University of Bristol and the University of Leicester.

Analysis of CNV using SNP chip data requires the raw X and Y intensities and/or measures derived from them (log R ratio and B allele frequency). Because these data are inherently noisy, quality control (QC) and the definition of a suitable filtering strategy will be key stages in generating CNV genotype calls (2).
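The sketch below shows how the two derived measures relate to the raw intensities. It assumes Illumina-style normalised X and Y intensities and the commonly used polar transform θ = (2/π)·arctan(Y/X), R = X + Y. In this transform, the log R ratio (LRR) is log2 of the observed R divided by the R expected at the probe's θ. The B allele frequency (BAF) linearly interpolates θ between the canonical AA, AB and BB cluster positions. The `clusters` input and the function names are hypothetical. In practice, cluster positions are estimated from a reference set of samples, and genotyping software reports LRR and BAF directly.

```python
import math


def theta_r(x, y):
    """Polar transform of normalised X/Y intensities: theta 0 ~ AA, 1 ~ BB."""
    theta = (2.0 / math.pi) * math.atan2(y, x)
    return theta, x + y


def _interp(t, t0, t1, v0, v1):
    """Linear interpolation of v between (t0, v0) and (t1, v1)."""
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def lrr_baf(x, y, clusters):
    """Return (LRR, BAF) for one probe in one sample.

    clusters: HYPOTHETICAL dict mapping 'AA', 'AB', 'BB' to (theta, R)
    canonical cluster centres for this probe.
    """
    theta, r = theta_r(x, y)
    (t_aa, r_aa), (t_ab, r_ab), (t_bb, r_bb) = (
        clusters["AA"], clusters["AB"], clusters["BB"])
    if theta <= t_aa:
        baf, r_exp = 0.0, r_aa
    elif theta <= t_ab:
        baf = 0.5 * (theta - t_aa) / (t_ab - t_aa)
        r_exp = _interp(theta, t_aa, t_ab, r_aa, r_ab)
    elif theta <= t_bb:
        baf = 0.5 + 0.5 * (theta - t_ab) / (t_bb - t_ab)
        r_exp = _interp(theta, t_ab, t_bb, r_ab, r_bb)
    else:
        baf, r_exp = 1.0, r_bb
    return math.log2(r / r_exp), baf
```

Consider a probe whose θ sits on the AB cluster and whose total intensity equals the expected R: this returns LRR = 0 and BAF = 0.5. Halving R at the same θ gives LRR = −1. Sample-level noise in exactly these quantities is what the QC step has to control.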

QC will be applied to each dataset to exclude outlying samples. Exclusions may be made on the basis of the SNP genotype data, for example samples with substantial missing genotype data or samples showing ancestral differences in principal components analysis (PCA). They may also be made on the basis of the intensity data, for example samples with large standard deviations of genome-wide intensity measures. Subsets of samples that may have been processed differently, identifiable by PCA as likely batch effects, will also be excluded. Several algorithms are available for detecting and measuring copy number variation in SNP genotyping data from the raw allelic intensities or their derivatives (e.g. QuantiSNP and PennCNV, both based on hidden Markov models). Raw CNV calls will then be subject to further rigorous QC, with filtering thresholds defined by simulation. Sensitivity analyses will be undertaken to evaluate the consequences of the chosen filtering strategy for downstream association testing.
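The hidden Markov model idea behind callers such as PennCNV and QuantiSNP can be illustrated with a deliberately simplified, hypothetical three-state model (deletion, normal, duplication). The model is decoded with the Viterbi algorithm over LRR values ordered along a chromosome. All parameter values below are assumptions chosen for illustration. The real tools use more copy-number states, model BAF jointly with LRR, and scale transition probabilities by the distance between probes, so this is not a substitute for either program.

```python
import math

# ASSUMED toy parameters, not those of PennCNV or QuantiSNP.
STATES = ("del", "normal", "dup")
LRR_MEAN = {"del": -0.5, "normal": 0.0, "dup": 0.3}
LRR_SD = 0.2
START = {"del": 0.01, "normal": 0.98, "dup": 0.01}
P_SWITCH = 1e-3  # probability of leaving the current state between adjacent probes


def log_gauss(x, mu, sd):
    """Log density of a normal emission."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def log_trans(a, b):
    """Log transition probability; switching is split evenly between other states."""
    return math.log(1 - P_SWITCH) if a == b else math.log(P_SWITCH / (len(STATES) - 1))


def viterbi(lrr):
    """Most likely hidden copy-number state for each probe, in order."""
    score = {s: math.log(START[s]) + log_gauss(lrr[0], LRR_MEAN[s], LRR_SD)
             for s in STATES}
    back = []
    for x in lrr[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_trans(p, s))
            new_score[s] = (score[prev] + log_trans(prev, s)
                            + log_gauss(x, LRR_MEAN[s], LRR_SD))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    state = max(STATES, key=score.get)  # best final state, then trace back
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def segments(path):
    """Collapse a state path into (state, first_probe, last_probe) non-normal runs."""
    runs, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            if path[start] != "normal":
                runs.append((path[start], start, i - 1))
            start = i
    return runs
```

The switch penalty (two transitions per call) is weighed against the emission gain of each probe. As a result, short or weak excursions are absorbed into the normal state while sustained shifts are called. This trade-off is one reason why post-calling filters, such as a minimum number of probes per CNV, need to be tuned.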

CNVs will be tested for association with blood pressure (systolic and diastolic blood pressure, SBP and DBP) and lung function (forced expiratory volume in 1 second, FEV1, and its ratio to forced vital capacity, FEV1/FVC) using linear regression. In addition, previously published CNV maps will be used to define the boundaries of common CNVs, and these CNVs will be tested for association using methods such as CNVtools. Validation will be undertaken in selected subsamples from these studies.
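A per-CNV linear-regression test can be sketched as an ordinary least-squares fit of a trait (e.g. SBP) on copy number, with optional covariate adjustment. Several choices here are assumptions for illustration: coding copy number additively as a dosage, the choice of covariates, and using a normal approximation to the t distribution for the p-value (reasonable at these sample sizes). The function name `cnv_association` is hypothetical. The sketch does not reproduce CNVtools, which fits a mixture model to a CNV intensity summary and tests association within that model rather than regressing on hard copy-number calls.

```python
import math

import numpy as np


def cnv_association(trait, copy_number, covariates=None):
    """OLS of a quantitative trait on CNV copy number (additive dosage),
    optionally adjusted for covariates. Returns (beta, se, p) for the CNV term."""
    y = np.asarray(trait, dtype=float)
    cols = [np.ones_like(y), np.asarray(copy_number, dtype=float)]
    if covariates is not None:
        # Accept a single covariate vector or an (n, k) matrix.
        C = np.asarray(covariates, dtype=float).reshape(len(y), -1)
        cols.extend(C.T)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # covariance of estimates
    b, se = float(beta[1]), math.sqrt(cov[1, 1])
    p = math.erfc(abs(b / se) / math.sqrt(2))     # two-sided, normal approx.
    return b, se, p
```

In use, `trait` would be, say, SBP for each individual and `copy_number` the called copy number (1, 2, 3, ...) at one CNV region. Repeating the fit across CNV regions gives one test per CNV, so the multiple-testing burden scales with the number of regions tested.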

Additional benefit

The CNV methods, or the CNV calls generated, could be applied to the study of a wide range of additional phenotypes in ALSPAC. The experience gained from these kinds of analyses will also benefit analysts working with ALSPAC data.

Genotype data required

This project requires the intensity data for each SNP and non-polymorphic probe on the array (log R ratio and B allele frequency, and raw X and Y intensities if possible).

References:

1. L. V. Wain, J. A. Armour, M. D. Tobin, Lancet 374, 340 (Jul 25, 2009).

2. L. V. Wain et al., PLoS One 4, e8175 (2009).

3. N. Craddock et al., Nature 464, 713 (Apr 1, 2010).

4. H. M. Blauw et al., Hum Mol Genet (published online Aug 10, 2010).

Date proposal received: 
Thursday, 16 June, 2011
Date proposal approved: 
Thursday, 16 June, 2011
Keywords: 
Allergies, Genetics, Respiratory, Atopy
Primary keyword: