B972 - Statistical modelling of longitudinal data from cohort studies to better understand phenotypes of asthma - 23/06/2010

B number: 
B972
Principal applicant name: 
Dr Raquel Granell (University of Bristol, UK)
Co-applicants: 
Prof John Henderson, Prof Jonathan Sterne
Title of project: 
Statistical modelling of longitudinal data from cohort studies to better understand phenotypes of asthma
Proposal summary: 

Objectives

1. To develop and apply latent class and other statistical models that combine symptoms and physiological measurements to define and refine asthma phenotypes in children in the Avon Longitudinal Study of Parents & Children (ALSPAC).

2. To develop and apply methods to validate and replicate phenotypes in other large, population-based cohort studies and to compare phenotype definitions and their associations with asthma risk factors.

3. To examine associations between genetic variants identified through genomewide association studies of asthma with asthma phenotypes, in order to understand the biological pathways leading to different phenotypes.

Summary of proposal

Several studies have demonstrated the utility of the latent class approach in identifying phenotypes that have clear associations with clinically relevant outcomes [1-4]. Different approaches to phenotype definition appear to elucidate different aspects of the relationship between symptoms and asthma-related markers. There remains a need to develop these approaches to further investigate the validity and cross-cohort replicability of proposed phenotypes, and their suitability as proxies for underlying pathophysiological constructs of asthma.

The ALSPAC cohort (13,000 children born 1991-92) includes longitudinal reporting of respiratory symptoms (cough, wheeze, breathlessness, trigger factors) collected prospectively from birth to 17 years, and direct measurements of lung function, bronchial responsiveness and allergy at different ages. In my previous work on the ALSPAC study I used reported wheezing at 7 time points to define wheezing phenotypes, in a "single-dimensional" approach [1]. To refine and improve our understanding of asthma phenotypes, I will investigate "multidimensional" approaches in which different types of phenotypic information (for example, frequency and duration of respiratory symptoms, atopy and bronchial responsiveness) are included in latent class models. A multidimensional approach that uses a full range of symptoms, measures of allergy and measures of lung function to define phenotypes may better capture the distinctive pathophysiological features that underlie the complexity and heterogeneity of asthma. Latent class models will be fitted using Mplus software (with which I am already familiar). Pre-selection of the most relevant variables using principal components analysis may be required to limit the number of parameters in the model.
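As an illustration only, the idea behind a latent class model for repeated binary symptom reports can be sketched with a small expectation-maximisation (EM) routine. This is not Mplus and not the fitting procedure proposed here: the function name `fit_latent_classes`, the two-class setup, the four time points and all data and starting values below are hypothetical and chosen purely to make the sketch runnable.

```python
import math


def fit_latent_classes(patterns, weights, item_probs, n_iter=50):
    """EM for a latent class model with binary items (illustrative sketch).

    patterns   : list of tuples of 0/1 (one tuple per child, one entry per time point)
    weights    : starting class prevalences, summing to 1
    item_probs : item_probs[c][j] = starting P(item j = 1 | class c), strictly in (0, 1)
    Returns the updated (weights, item_probs, log_likelihood_trace).
    """
    n_items = len(patterns[0])
    n_classes = len(weights)
    trace = []
    for _ in range(n_iter):
        # E-step: posterior class membership probabilities for each pattern.
        resp = []
        loglik = 0.0
        for y in patterns:
            joint = []
            for w, probs in zip(weights, item_probs):
                p = w
                for yj, q in zip(y, probs):
                    p *= q if yj else 1.0 - q
                joint.append(p)
            total = sum(joint)
            loglik += math.log(total)
            resp.append([p / total for p in joint])
        trace.append(loglik)  # log-likelihood at the parameters before this update
        # M-step: re-estimate prevalences and class-specific item probabilities.
        class_mass = [sum(r[c] for r in resp) for c in range(n_classes)]
        weights = [mass / len(patterns) for mass in class_mass]
        item_probs = [
            [sum(r[c] * y[j] for r, y in zip(resp, patterns)) / class_mass[c]
             for j in range(n_items)]
            for c in range(n_classes)
        ]
    return weights, item_probs, trace


# Hypothetical wheeze reports (1 = wheeze reported) at four time points.
patterns = (
    [(0, 0, 0, 0)] * 8 + [(1, 0, 0, 0)] * 2 + [(1, 1, 0, 0)] * 3
    + [(1, 1, 1, 1)] * 4 + [(0, 1, 1, 1)] * 2 + [(0, 0, 1, 0)] * 1
)
# Hypothetical starting values for a two-class model.
weights, item_probs, trace = fit_latent_classes(
    patterns, [0.5, 0.5], [[0.2] * 4, [0.7] * 4]
)
```

Each row of `item_probs` can be read as a class-specific trajectory of symptom probability over time, which is the sense in which latent classes are interpreted as phenotypes. In practice the number of classes, starting values and model checks would need far more care than this sketch suggests.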

The next step will be validation and replication of the phenotypes identified in analyses of ALSPAC data. Three population-based childhood cohorts have agreed to contribute data to these analyses: the Stockholm Children Allergy and Environmental Prospective Birth Cohort Study (BAMSE) [5] (4,089 children, born 1994); the Prevention and Incidence of Asthma and Mite Allergy cohort (PIAMA) [6], based in Groningen (3,291 children, born 1996-97); and the Western Australian Pregnancy Cohort (Raine) Study [7] (2,868 children, born 1989-91). All four cohorts, including ALSPAC, have respiratory symptom data from repeated questionnaires, together with lung function measurements, bronchial challenge and allergy tests at different intervals through childhood.

In order to validate and replicate the derived asthma phenotypes in other cohorts, I will have to take into account differences in the nature, number and timing of measures among the cohorts. I will use cubic splines (nonlinear functions defined by smoothly joined piecewise polynomials) [8] to derive parametric curves representing the ALSPAC-derived asthma phenotypes. These curves will be used to interpolate model parameters appropriate to the number and timing of measurements in each dataset.
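A minimal sketch of the interpolation step, under stated assumptions: a natural cubic spline is fitted through class-specific symptom probabilities at the ages where they were estimated, and then evaluated at the ages measured in another cohort. The ages, probabilities and target ages are hypothetical, and interpolating on the logit scale (so that results stay between 0 and 1) is an assumption of this sketch rather than something stated in the proposal.

```python
import math
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1  # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots; zero at both ends
    if n >= 2:
        # Tridiagonal system for the interior knots, solved by the Thomas algorithm.
        diag = [2.0 * (h[j] + h[j + 1]) for j in range(n - 1)]
        rhs = [
            6.0 * ((ys[j + 2] - ys[j + 1]) / h[j + 1] - (ys[j + 1] - ys[j]) / h[j])
            for j in range(n - 1)
        ]
        for j in range(1, n - 1):
            w = h[j] / diag[j - 1]
            diag[j] -= w * h[j]
            rhs[j] -= w * rhs[j - 1]
        m[n - 1] = rhs[n - 2] / diag[n - 2]
        for j in range(n - 3, -1, -1):
            m[j + 1] = (rhs[j] - h[j + 1] * m[j + 2]) / diag[j]

    def evaluate(t):
        if t < xs[0] or t > xs[-1]:
            raise ValueError("t is outside the fitted range; no extrapolation")
        i = min(bisect_right(xs, t) - 1, n - 1)
        a, b = xs[i + 1] - t, t - xs[i]
        return (
            (m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b
        )

    return evaluate


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical: wheeze probabilities for one phenotype at seven ages (years).
source_ages = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
source_probs = [0.55, 0.50, 0.40, 0.28, 0.18, 0.12, 0.08]
curve = natural_cubic_spline(source_ages, [logit(p) for p in source_probs])

# Hypothetical measurement ages in a second cohort, within the fitted range.
target_ages = [1.0, 2.0, 4.0, 6.5]
interpolated = [expit(curve(age)) for age in target_ages]
```

The sketch refuses to extrapolate beyond the first and last source ages, since a cubic spline is not constrained outside its knots.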

The accuracy and generalisability of the models will be validated across the independent cohorts. To investigate phenotypic variation among cohorts I will measure the loss of fit using differences in deviance (-2 × log-likelihood). Such an approach has been used previously to derive generalisable prognostic models for HIV-infected patients starting antiretroviral therapy [9]. The performance of candidate latent class models in single cohorts will be measured by comparing the fit using the parameters estimated from the full data set with the fit using parameters re-estimated from single cohort data alone. The best-generalising model is defined as that for which the loss of fit in single cohort data is smallest.
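The loss-of-fit comparison can be sketched as follows, assuming binary symptom items and a latent class likelihood. The function names, the response patterns and every parameter value are hypothetical. The example uses one-class models only so that the cohort-specific estimates can be checked by hand (they are simply the item means).

```python
import math


def lca_log_likelihood(patterns, class_weights, item_probs):
    """Log-likelihood of binary response patterns under a latent class model.

    class_weights : class prevalences, summing to 1
    item_probs    : item_probs[c][j] = P(item j = 1 | class c)
    """
    total = 0.0
    for pattern in patterns:
        lik = 0.0
        for w, probs in zip(class_weights, item_probs):
            p = w
            for y, q in zip(pattern, probs):
                p *= q if y == 1 else 1.0 - q
            lik += p
        total += math.log(lik)
    return total


def deviance(patterns, class_weights, item_probs):
    """Deviance, defined as -2 x log-likelihood."""
    return -2.0 * lca_log_likelihood(patterns, class_weights, item_probs)


def loss_of_fit(patterns, pooled_params, cohort_params):
    """Deviance under pooled parameters minus deviance under cohort-specific ones."""
    return deviance(patterns, *pooled_params) - deviance(patterns, *cohort_params)


# Hypothetical single-cohort data: two binary items for four children.
cohort_patterns = [(1, 1), (1, 0), (0, 0), (1, 1)]

# Parameters re-estimated from this cohort alone (item means: 0.75 and 0.5).
cohort_fit = ([1.0], [[0.75, 0.5]])

# Hypothetical parameters from two candidate models fitted to the full data set.
candidates = {
    "model_a": ([1.0], [[0.7, 0.5]]),
    "model_b": ([1.0], [[0.4, 0.8]]),
}

losses = {
    name: loss_of_fit(cohort_patterns, pooled, cohort_fit)
    for name, pooled in candidates.items()
}
# The candidate with the smallest loss of fit generalises best to this cohort.
best = min(losses, key=losses.get)
```

When the cohort-specific parameters are maximum likelihood estimates for the same model structure, the loss of fit cannot be negative, so values near zero indicate that the pooled parameters describe the single cohort almost as well as its own estimates.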

ALSPAC, BAMSE, PIAMA and the cross-sectional study MAGICS/ISAAC (Hannover, 1,400 children) will have comparable genomewide data available through participation in the GABRIEL Study [10]. GABRIEL is a genomewide association study involving 14 European countries. Common genetic variants that are associated with childhood asthma have been identified, and more are likely to be reported over the next 12 to 24 months. Replication among these cohorts will enable me to investigate associations of candidate-gene variants in known functional pathways with different asthma phenotypes, and hence to determine if there is genetic evidence for biological differences between derived phenotypes. Between-phenotype variation in associations with genetic variants will also provide objective evidence for the validity of derived phenotypes.

Date proposal received: 
Wednesday, 23 June, 2010
Date proposal approved: 
Wednesday, 23 June, 2010
Keywords: 
Allergies, Respiratory, Atopy
Primary keyword: